| 01:46.05 | *** join/#brlcad IriX64 (n=IriX64@bas3-sudbury98-1168057882.dsl.bell.ca) | |
| 02:42.55 | *** join/#brlcad Twingy (n=justin@74.92.144.217) | |
| 03:22.11 | Maloeran | Erik or brlcad, are you there? I could use your knowledge of non-Linux platforms |
| 03:23.14 | Maloeran | Basically, I would like to solve the problem of running on NUMA platforms. Having a copy of the dataset in each memory bank, and having specific threads on specific cores working on each copy, is something that can be done on Linux |
| 03:24.26 | Maloeran | Since it might be tricky to support all platforms this way, I was thinking about a more general approach... Each die, each chunk of cores accessing the same memory bank, could have its own process running; the processes would work together by distributed processing |
| 03:25.22 | Maloeran | Are other platforms clever enough to allocate memory in the memory bank specific to the processor a process is running on? Are they bright enough to put all the threads of a process on the cores of the same die? |
| 03:25.57 | Maloeran | That really would be a simple solution. The processes can synchronize with each other through shared memory to avoid most of the networking overhead |
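What Maloeran describes maps fairly directly onto Linux's libnuma interface. Below is a minimal sketch of the per-bank replication idea, assuming Linux with libnuma installed (link with -lnuma); the DATASET_SIZE constant and the load_dataset() loader are hypothetical placeholders, not anything from the discussion.

```c
/* Sketch: one copy of the dataset per NUMA node, Linux + libnuma.
 * DATASET_SIZE and load_dataset() are illustrative placeholders. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

#define DATASET_SIZE (64UL * 1024 * 1024)

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;
    void **copies = malloc(nodes * sizeof(void *));

    /* Place one copy of the dataset in each node's local bank. */
    for (int n = 0; n < nodes; n++) {
        copies[n] = numa_alloc_onnode(DATASET_SIZE, n);
        /* load_dataset(copies[n]);  -- hypothetical loader */
    }

    /* A worker thread dedicated to node n would call
     * numa_run_on_node(n) so it executes on that node's cores,
     * then touch only copies[n], keeping all accesses bank-local. */

    for (int n = 0; n < nodes; n++)
        numa_free(copies[n], DATASET_SIZE);
    free(copies);
    return 0;
}
```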
| 03:29.28 | ``Erik | not always |
| 03:29.44 | Maloeran | Can it be manually forced? |
| 03:29.51 | ``Erik | fbsd doesn't seem to do bank association on amd64 from dorking around... no way to force it |
| 03:30.00 | Maloeran | I think that's far less trouble than having NUMA awareness built in; just having a process per memory bank |
| 03:30.05 | Maloeran | Ouch! |
| 03:30.32 | ``Erik | and if you have one hard-running thread on a dual-proc mac, it'll aggressively rotate it between procs to keep temps even |
| 03:30.43 | Maloeran | This is terrible |
| 03:30.53 | ``Erik | *shrug* it's the way things go |
| 03:31.22 | ``Erik | (the bsd thing needs to be fixed... if I had free time, I'd get elbow deep into the allocator and scheduler and make it happen... but time is a rare commodity) |
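For the record, the manual forcing Maloeran asks about does exist on Linux: thread-to-core pinning via the GNU-specific pthread_setaffinity_np() extension. A minimal sketch, assuming glibc and compiling with -pthread; the choice of core 0 is arbitrary. Erik's point stands for the FreeBSD of this era, which exposed no comparable knob.

```c
/* Sketch: pin the calling thread to one core on Linux.
 * GNU extension (hence the _np suffix); not portable to fbsd here. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pin_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* Returns 0 on success, an error number otherwise. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    if (pin_self_to_core(0) != 0)
        fprintf(stderr, "could not set thread affinity\n");
    else
        printf("pinned to core 0\n");
    return 0;
}
```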
| 03:31.58 | Maloeran | Do you have any thoughts about numa-aware code within a single process, storing multiple copies of the dataset, versus just having multiple synchronized processes? |
| 03:32.32 | ``Erik | that all depends on if there's enough ram *shrug* |
| 03:32.32 | Maloeran | The second way seems easier to get to work on different OSes, if the OSes themselves are numa-aware |
| 03:32.40 | Maloeran | Right, of course |
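A minimal sketch of the multi-process variant Maloeran prefers, assuming POSIX shared memory and process-shared mutexes on Linux (link with -lrt -pthread); the segment name /numa_demo and the work-cursor layout are invented for illustration, and a real setup would let only the first process run the initialization step.

```c
/* Sketch: per-die processes coordinating through POSIX shared
 * memory instead of the network. Name and layout are made up. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct shared_state {
    pthread_mutex_t lock;  /* process-shared mutex */
    long next_work_item;   /* shared work-queue cursor */
};

int main(void)
{
    int fd = shm_open("/numa_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct shared_state));

    struct shared_state *st = mmap(NULL, sizeof(*st),
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);

    /* First process only: mark the mutex PROCESS_SHARED so
     * sibling processes on other dies can take it too. */
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&st->lock, &attr);
    st->next_work_item = 0;

    /* Every process claims work under the shared lock. */
    pthread_mutex_lock(&st->lock);
    long mine = st->next_work_item++;
    pthread_mutex_unlock(&st->lock);
    printf("claimed work item %ld\n", mine);

    munmap(st, sizeof(*st));
    close(fd);
    shm_unlink("/numa_demo");
    return 0;
}
```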
| 03:35.22 | Maloeran | I'm surprised that, even manually, one can't force threads onto cores and allocations into banks... That's probably part of the explanation of why clusters don't run BSD |
| 03:37.29 | brlcad | a lot of concepts similar to numa |
| 03:38.01 | brlcad | additional reading with details on threading: http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/POWER5+Architecture |
| 03:38.34 | Maloeran | Thanks brlcad, seems similar to the Opteron docs I read at first glance |
| 03:39.31 | Maloeran | I'm mostly wondering about how to solve the software aspect of the problem |
| 03:41.01 | brlcad | eh, the devil's in the details .. exceptionally high-end server market, no commodity aspects |
| 03:41.15 | brlcad | the documents go into software implications |
| 03:41.23 | Maloeran | Right, great |
| 03:41.48 | brlcad | in particular the latter that details execution, threading, and memory management |
| 03:42.25 | brlcad | could probably get an account on an sp4 to play with |
| 03:42.47 | Maloeran | Wouldn't that be Power5-OSX specific? It's awfully specific to the OS, there's no standard for NUMA management |
| 03:43.26 | brlcad | os x doesn't run on power5 |
| 03:43.45 | Maloeran | MacOS9 then :), I really didn't follow that line of software |
| 03:44.06 | brlcad | the power series are what are used by the high-end supercomputers |
| 03:44.21 | brlcad | they have no relation to apple/mac |
| 03:45.05 | Maloeran | Oh. Great |
| 03:46.24 | brlcad | the G4 and G5 have architecture aspects similar to the power series, and some have suggested that the G5 is effectively the Power3 or Power4 with some of the high-end supercomputing facilities removed (data management, simultaneous core execution, larger L1/L2/L3 memories, etc, etc) |
| 03:47.02 | Maloeran | Thanks, that clears things up |
| 03:47.28 | ``Erik | 'cept the g[45] series have altivec, ibm/ppc doesn't |
| 03:47.44 | ``Erik | 'cluster' is an awfully broad term o.O |
| 03:47.57 | Maloeran | Exactly :) |
| 03:48.16 | ``Erik | that's like saying you want to learn how to write assembly for computers... |
| 03:49.11 | Maloeran | The comparison is valid; learning assembly for the main architectures, or learning scalable software for the main cluster architectures |
| 03:51.33 | brlcad | valid, but potentially very misleading -- comparing athlon/G5/P4/whatever to the Power architecture is sort of like comparing the GeForce 2 to the Quadro FX .. there are correlations, but one is the exceptional high-end with various features that can be leveraged for extra order(s) of magnitude of performance |
| 03:54.53 | ``Erik | heh, my analogy was to point out how vague mal's statement was, as there are many radically different archs... as there are cluster technologies *shrug* |
| 03:55.32 | ``Erik | heh, yeah, the power line displaced the mips line, its immediate ancestor.. :D |
| 03:56.04 | brlcad | yeah, and it's been king ever since.. for what? a decade now? |
| 03:56.26 | brlcad | since at least 1998 iirc |
| 03:56.48 | Maloeran | Erik, and I'm interested in learning scalable programming for the main ones |
| 03:56.48 | ``Erik | the unf/$ leans more towards opterons, though *shrug* |
| 03:56.52 | brlcad | opteron has certainly been on the rise with the revival of cray |
| 03:57.10 | ``Erik | some amusing quotes from seymour |
| 03:58.08 | Maloeran | NUMA-aware threading code isn't too much trouble on Linux, but as for some other OSes.. |
| 03:58.08 | ``Erik | 'numa' is a pretty broad category |
| 03:58.13 | Maloeran | Assigning threads to memory banks is a fairly simple concept |
| 03:58.47 | ``Erik | the simplest of forms and provided the OS exposes it, sure *shrug* |
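On Linux, the OS exposes exactly this through mbind(2)/set_mempolicy(2). A minimal sketch binding an anonymous mapping to a single bank, assuming the libnuma headers are installed (link with -lnuma); node 0 is an arbitrary choice.

```c
/* Sketch: bind a mapping to one NUMA node's bank via mbind(2).
 * This is the kind of explicit hook the discussion assumes; on
 * systems without such an interface there is nothing to call. */
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1 << 20;  /* 1 MiB demo region */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    unsigned long nodemask = 1UL << 0;  /* node 0 only */
    if (mbind(buf, len, MPOL_BIND, &nodemask,
              sizeof(nodemask) * 8, 0) != 0)
        perror("mbind");
    /* Pages faulted in after this land in node 0's bank. */

    munmap(buf, len);
    return 0;
}
```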
| 03:59.01 | brlcad | at the top 500 level, it rarely has to do with $$.. it's reliability and performance first, followed probably by support and installation impact |
| 03:59.35 | brlcad | the technology is usually second to just computing things as fast as possible |
| 08:33.22 | *** join/#brlcad IriX64 (n=IriX64@bas3-sudbury98-1168057882.dsl.bell.ca) | |
| 09:17.33 | *** join/#brlcad dtidrow (n=dtidrow@c-69-255-182-248.hsd1.va.comcast.net) | |
| 09:59.21 | *** join/#brlcad clock_ (n=clock@zux221-122-143.adsl.green.ch) | |
| 10:03.35 | *** join/#brlcad cad32 (n=503708da@bz.bzflag.bz) | |
| 13:52.38 | *** join/#brlcad b0ef (n=b0ef@084202025057.customer.alfanett.no) | |
| 14:55.18 | *** join/#brlcad docelic (n=docelic@212.15.183.78) | |
| 15:27.54 | *** join/#brlcad docelic (n=docelic@212.15.174.172) | |
| 15:55.47 | *** join/#brlcad brlcad (n=sean@bz.bzflag.bz) [NETSPLIT VICTIM] | |
| 15:59.33 | *** join/#brlcad b0ef (n=b0ef@084202025057.customer.alfanett.no) [NETSPLIT VICTIM] | |
| 16:00.07 | *** join/#brlcad docelic (n=docelic@212.15.174.172) [NETSPLIT VICTIM] | |
| 16:00.33 | *** mode/#brlcad [+o brlcad] by ChanServ | |
| 16:58.10 | *** join/#brlcad docelic (n=docelic@212.15.185.121) | |
| 17:55.50 | *** join/#brlcad debarshi (n=rishi@202.141.130.198) | |
| 22:37.02 | *** join/#brlcad FthrNtr (n=IriX64@bas3-sudbury98-1168056909.dsl.bell.ca) | |