01:46.05 | *** join/#brlcad IriX64 (n=IriX64@bas3-sudbury98-1168057882.dsl.bell.ca)
02:42.55 | *** join/#brlcad Twingy (n=justin@74.92.144.217)
03:22.11 | Maloeran | Erik or brlcad, are you there? I could use your knowledge of non-Linux platforms
03:23.14 | Maloeran | Basically, I would like to solve the problem of running on NUMA platforms. Having a copy of the datasets in each memory bank, and having specific threads on specific cores working on each copy, is something that can be done on Linux
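A minimal sketch of what that Linux-only approach could look like, assuming libnuma is available (link with -lnuma): one worker thread per NUMA node, each pinned to that node's cores and working on its own copy of the data allocated from that node's bank. The dataset buffer, its size, and the worker body are placeholders, not BRL-CAD code.

/*
 * One worker thread per NUMA node, each pinned to that node's cores and
 * working on a private replica of the dataset allocated from that node's
 * memory bank.  Compile with: cc -pthread numa_workers.c -lnuma
 */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DATASET_SIZE (64UL * 1024 * 1024)   /* hypothetical dataset size */

static const char *dataset;                 /* master copy, loaded elsewhere */

struct worker_arg { int node; };

static void *worker(void *p)
{
    struct worker_arg *arg = p;

    /* Restrict this thread to the cores attached to its NUMA node. */
    numa_run_on_node(arg->node);

    /* Allocate the replica from this node's bank and copy the data in. */
    char *local = numa_alloc_onnode(DATASET_SIZE, arg->node);
    if (!local)
        return NULL;
    memcpy(local, dataset, DATASET_SIZE);

    /* ... do the real work against the node-local copy here ... */

    numa_free(local, DATASET_SIZE);
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }

    dataset = calloc(1, DATASET_SIZE);      /* stand-in for the real data */

    int nodes = numa_max_node() + 1;
    pthread_t *tids = calloc(nodes, sizeof *tids);
    struct worker_arg *args = calloc(nodes, sizeof *args);

    for (int n = 0; n < nodes; n++) {
        args[n].node = n;
        pthread_create(&tids[n], NULL, worker, &args[n]);
    }
    for (int n = 0; n < nodes; n++)
        pthread_join(tids[n], NULL);

    return 0;
}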
03:24.26 | Maloeran | Since it might be tricky to support all platforms this way, I was thinking about a more general way... Each die, each chunk of cores accessing the same memory bank, could have its own process running; the processes would work together by distributed processing
03:25.22 | Maloeran | Are other platforms clever enough to allocate memory into the memory bank specific to the processor a process is running on? Are they bright enough to put all the threads of the process on the cores of the same die?
03:25.57 | Maloeran | That really would be a simple solution. The processes can synchronize with each other through shared memory to avoid most of the networking overhead
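As a rough illustration of that multi-process idea, here is a sketch of coordinating one-process-per-bank workers through a POSIX shared-memory segment with a process-shared mutex instead of sockets. The segment name /rt_sync, the jobs_done counter, and the create-vs-join handling are all hypothetical, and real code would also need a startup barrier before joiners touch the lock.

/*
 * Coordination between worker processes via POSIX shared memory.
 * Compile with: cc -pthread shm_sync.c  (add -lrt on older glibc)
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct shared_state {
    pthread_mutex_t lock;     /* must be PTHREAD_PROCESS_SHARED */
    long            jobs_done;
};

int main(void)
{
    /* The first process creates the segment; later processes just open it. */
    int created = 1;
    int fd = shm_open("/rt_sync", O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) {
        created = 0;
        fd = shm_open("/rt_sync", O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
    }
    if (created && ftruncate(fd, sizeof(struct shared_state)) != 0) {
        perror("ftruncate");
        return 1;
    }

    struct shared_state *st = mmap(NULL, sizeof *st,
                                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (st == MAP_FAILED) { perror("mmap"); return 1; }

    /* Only the creating process sets up the process-shared mutex. */
    if (created) {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&st->lock, &attr);
    }

    /* Every worker then updates shared state with no networking involved. */
    pthread_mutex_lock(&st->lock);
    st->jobs_done++;
    pthread_mutex_unlock(&st->lock);

    printf("jobs done so far: %ld\n", st->jobs_done);
    return 0;
}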
03:29.28 | ``Erik | not always
03:29.44 | Maloeran | Can it be manually forced?
03:29.51 | ``Erik | fbsd doesn't seem to do bank association on amd64 from dorking around... no way to force it
03:30.00 | Maloeran | I think that's far less trouble than having NUMA-awareness built in; just having a process per memory bank
03:30.05 | Maloeran | Ouch!
03:30.32 | ``Erik | and if you have one hard-running thread on a dual-proc mac, it'll aggressively rotate it between procs to keep temps even
03:30.43 | Maloeran | This is terrible
03:30.53 | ``Erik | *shrug* it's the way things go
03:31.22 | ``Erik | (the bsd thing needs to be fixed... if I had free time, I'd get elbow deep into the allocator and scheduler and make it happen... but time is a rare commodity)
03:31.58 | Maloeran | Do you have any thoughts about numa-aware code within a single process, storing multiple copies of the dataset, or just having multiple synchronized processes?
03:32.32 | ``Erik | that all depends on if there's enough ram *shrug*
03:32.32 | Maloeran | The second way seems easier to get to work on different OSes, if the OSes themselves are numa-aware
03:32.40 | Maloeran | Right, of course
03:35.22 | Maloeran | I'm surprised that, even manually, one can't force threads onto cores and allocations into banks... That's probably part of the explanation of why clusters don't run BSD
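For contrast, this is roughly what the manual forcing looks like on Linux: sched_setaffinity() to pin the calling thread to a core, and mbind() from <numaif.h> (link with -lnuma) to pin a mapping to one bank. The core and node numbers here are arbitrary examples.

/*
 * Pin the calling thread to core 0 and a 16 MB mapping to NUMA node 0.
 * Compile with: cc pin.c -lnuma
 */
#define _GNU_SOURCE
#include <numaif.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* 1. Pin this thread to core 0. */
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    if (sched_setaffinity(0, sizeof(cpus), &cpus) != 0)
        perror("sched_setaffinity");

    /* 2. Force a 16 MB anonymous mapping onto NUMA node 0. */
    size_t len = 16UL * 1024 * 1024;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long nodemask = 1UL << 0;       /* bit 0 == node 0 */
    if (mbind(buf, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) != 0)
        perror("mbind");

    /* Pages land on node 0 when first touched. */
    ((char *)buf)[0] = 1;
    printf("bound and touched %zu bytes on node 0\n", len);
    return 0;
}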
03:37.29 | brlcad | a lot of similar concepts to numa
03:38.01 | brlcad | additional reading with details on threading: http://www-941.ibm.com/collaboration/wiki/display/WikiPtype/POWER5+Architecture
03:38.34 | Maloeran | Thanks brlcad, seems similar to the Opteron docs I read at first glance
03:39.31 | Maloeran | I'm mostly wondering about how to solve the software aspect of the problem
03:41.01 | brlcad | eh, the devil's in the details .. exceptionally high-end server market, no commodity aspects
03:41.15 | brlcad | the documents go into software implications
03:41.23 | Maloeran | Right, great
03:41.48 | brlcad | in particular the latter, which details execution, threading, and memory management
03:42.25 | brlcad | could probably get an account on an sp4 to play with
03:42.47 | Maloeran | Wouldn't that be Power5-OSX specific? It's awfully specific to the OS, there's no standard for NUMA management
03:43.26 | brlcad | os x doesn't run on power5
03:43.45 | Maloeran | MacOS9 then :), I really didn't follow that line of software
03:44.06 | brlcad | the power series are what are used by the high-end supercomputers
03:44.21 | brlcad | they have no relation to apple/mac
03:45.05 | Maloeran | Oh. Great
03:46.24 | brlcad | the G4 and G5 have architecture aspects similar to the power series, and some have suggested that the G5 is effectively the Power3 or Power4 with some of the high-end supercomputing facilities removed (data management, simultaneous core execution, larger L1/L2/L3 memories, etc, etc)
03:47.02 | Maloeran | Thanks, that clears things up
03:47.28 | ``Erik | 'cept the g[45] series have altivec, ibm/ppc doesn't
03:47.44 | ``Erik | 'cluster' is an awfully broad term o.O
03:47.57 | Maloeran | Exactly :)
03:48.16 | ``Erik | that's like saying you want to learn how to write assembly for computers...
03:49.11 | Maloeran | The comparison is valid; learning assembly for the main architectures, or learning scalable software for the main cluster architectures
03:51.33 | brlcad | valid, but potentially very misleading -- comparing athlon/G5/P4/whatever to the Power architecture is sort of like comparing the GeForce 2 to the Quadro FX .. there are correlations, but one is the exceptional high-end with various features that can be leveraged for extra order(s) of performance
03:54.53 | ``Erik | heh, my analogy was to point out how vague the notion in mal's statement was, as there are many radically different archs... as there are cluster technologies *shrug*
03:55.32 | ``Erik | heh, yeah, the power line displaced the mips line, its immediate ancestor.. :D
03:56.04 | brlcad | yeah, and it's been king ever since.. for what? a decade now?
03:56.26 | brlcad | since at least 1998 iirc
03:56.48 | Maloeran | Erik, and I'm interested in learning scalable programming for the main ones
03:56.48 | ``Erik | the unf/$ leans more towards opterons, though *shrug*
03:56.52 | brlcad | opteron has certainly been on the rise with the revival of cray
03:57.10 | ``Erik | some amusing quotes from seymour
03:58.08 | Maloeran | NUMA-aware threading code isn't too much trouble on Linux, but as for some other OSes..
03:58.08 | ``Erik | 'numa' is a pretty broad category
03:58.13 | Maloeran | Assigning threads to memory banks is a fairly simple concept
03:58.47 | ``Erik | in the simplest of forms, and provided the OS exposes it, sure *shrug*
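That "fairly simple concept", written out as a small sketch: a helper a worker thread could call once at startup so that both its execution and its later allocations stay on one bank. It assumes Linux with libnuma (link with -lnuma); bind_thread_to_bank() is a hypothetical name, not an existing API.

#include <numa.h>
#include <stdio.h>

/* Returns 0 on success, -1 if NUMA is unavailable or the node is invalid. */
static int bind_thread_to_bank(int node)
{
    if (numa_available() < 0 || node > numa_max_node())
        return -1;

    /* Run only on the cores attached to this bank... */
    if (numa_run_on_node(node) != 0)
        return -1;

    /* ...and prefer that bank for this thread's future allocations. */
    numa_set_preferred(node);
    return 0;
}

int main(void)
{
    /* Example: pin the main thread (and its allocations) to bank 0. */
    if (bind_thread_to_bank(0) != 0) {
        fprintf(stderr, "could not bind to bank 0\n");
        return 1;
    }
    printf("bound to bank 0\n");
    return 0;
}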
03:59.01 | brlcad | at the top 500 level, it rarely has to do with $$.. it's reliability and performance first, followed probably by support and installation impact
03:59.35 | brlcad | the technology is usually second to just computing things as fast as possible
08:33.22 | *** join/#brlcad IriX64 (n=IriX64@bas3-sudbury98-1168057882.dsl.bell.ca)
09:17.33 | *** join/#brlcad dtidrow (n=dtidrow@c-69-255-182-248.hsd1.va.comcast.net)
09:59.21 | *** join/#brlcad clock_ (n=clock@zux221-122-143.adsl.green.ch)
10:03.35 | *** join/#brlcad cad32 (n=503708da@bz.bzflag.bz)
13:52.38 | *** join/#brlcad b0ef (n=b0ef@084202025057.customer.alfanett.no)
14:55.18 | *** join/#brlcad docelic (n=docelic@212.15.183.78)
15:27.54 | *** join/#brlcad docelic (n=docelic@212.15.174.172)
15:55.47 | *** join/#brlcad brlcad (n=sean@bz.bzflag.bz) [NETSPLIT VICTIM]
15:59.33 | *** join/#brlcad b0ef (n=b0ef@084202025057.customer.alfanett.no) [NETSPLIT VICTIM]
16:00.07 | *** join/#brlcad docelic (n=docelic@212.15.174.172) [NETSPLIT VICTIM]
16:00.33 | *** mode/#brlcad [+o brlcad] by ChanServ
16:58.10 | *** join/#brlcad docelic (n=docelic@212.15.185.121)
17:55.50 | *** join/#brlcad debarshi (n=rishi@202.141.130.198)
22:37.02 | *** join/#brlcad FthrNtr (n=IriX64@bas3-sudbury98-1168056909.dsl.bell.ca)