01:00.10 |
*** join/#brlcad DTRemenak
(n=DTRemena@adsl-68-126-0-210.dsl.irvnca.pacbell.net) |
03:01.12 |
*** join/#brlcad digitalfredy
(n=digitalf@200.71.62.161) |
03:09.40 |
Maloeran |
That 1.7 million triangles frigate really
kills the raytracing performance, with all its diagonal ropes
through the scene. Very stressful test for a raytracer... I would
be interested in knowing how my 200mb of RAM use on this compares
with ADRT |
03:10.44 |
Maloeran |
Or 400mb if I push the quality ( and
performance ) high |
03:19.04 |
CIA-9 |
BRL-CAD: 03brlcad * 10brlcad/sh/ (footer.sh
header.sh): add support for C++ and Objective-C/C++ to the
mix |
04:04.52 |
*** join/#brlcad digitalfredy
(n=digitalf@200.71.62.161) |
04:05.19 |
*** join/#brlcad dan_falck
(n=danfalck@pool-71-111-76-8.ptldor.dsl-w.verizon.net) |
04:19.38 |
*** join/#brlcad IriX64
(n=Who@bas3-sudbury98-1168052970.dsl.bell.ca) |
04:54.19 |
*** join/#brlcad DTRemenak
(n=DTRemena@adsl-68-126-0-210.dsl.irvnca.pacbell.net) |
05:46.36 |
*** join/#brlcad clock_
(i=clock@84-72-60-185.dclient.hispeed.ch) |
07:20.50 |
*** join/#brlcad clock_
(n=clock@zux221-122-143.adsl.green.ch) |
11:20.58 |
*** join/#brlcad rossberg
(n=rossberg@bz.bzflag.bz) |
11:54.20 |
CIA-9 |
BRL-CAD: 03d_rossberg * 10brlcad/BUGS: fixed
rendering toyjeep.g on Windows bug (on 7/6/2006) by using a less
rigorous function to invert a 4x4 matrix in
rt_bend_pipe_prep |
12:35.38 |
*** join/#brlcad Twingy
(n=justin@74.92.144.217) |
13:04.20 |
Maloeran |
Does anyone have a recommendation for the best
reference for doxygen comments in the BRL-CAD code? |
13:05.14 |
Maloeran |
I noticed Lee working on libuu's doxygen
documentation, though I'm not sure where that libuu is. Not much
comes out on find |
13:05.56 |
Maloeran |
Ah, or perhaps it was libbu |
14:01.31 |
Maloeran |
Eh, Doxygen is confused about GCC's
__attribute__() |
14:22.54 |
``Erik |
O.o |
14:24.44 |
Maloeran |
Feeling any better, Erik? |
14:34.32 |
``Erik |
not much, heh |
14:35.00 |
Maloeran |
:/ Did you go through an x-ray scan just to
make sure? |
14:38.24 |
``Erik |
yeah, several xrays and a catscan |
14:38.50 |
``Erik |
btw, I think I may have an idea on why your
code doesn't run so hot on g4/g5 ... gcc 4.0.0 |
14:39.20 |
Maloeran |
Oh hum, that's a possibility. The assembly
looked very poor, as little as I know that arch |
14:39.49 |
Maloeran |
The demo now loads the 1.7 million triangles
frigate with caching, if you want |
14:40.19 |
``Erik |
yeah, been building for a few
minutes |
14:40.23 |
``Erik |
it segfaults on my amd64 |
14:40.35 |
``Erik |
#0 0x0000000801758c88 in stepComputeValue
(step=0x522030) at ../../../RF/prepmodel.c:701 |
14:40.35 |
``Erik |
701 step->linkcost[RF_EDGE_MAXZ] =
WALK_LINKCOUNT_COST( step->linkcount[ RF_EDGE_MAXZ ]
); |
14:40.44 |
Maloeran |
Hum. Okay |
14:41.34 |
Maloeran |
I seriously need to speed up that prep
eventually, it does a decent job but isn't fast at it |
14:42.29 |
Maloeran |
Could you p step->linkcount[ RF_EDGE_MAXZ
] on that segfault? It's rather curious |
14:44.33 |
Maloeran |
Even with low preparation quality, the 'prep'
can eat up to 500mb ; if it takes minutes, I think you are
swapping... |
14:50.23 |
brlcad |
src/lib*, there's a list of what each of the
various libs do in HACKING and src/README |
14:51.11 |
Maloeran |
I was more looking for the best reference for
the desired doxygen comment style, rather than a specific
library |
14:54.48 |
``Erik |
it's consuming one whole cpu, 508.19m real,
719.54m virtual, and has been going for 22 minutes |
14:54.55 |
Maloeran |
Thanks Erik, bug reproducible if I fill all
malloc'ed memory with garbage |
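Filling fresh allocations with a garbage pattern, as described above, is a common way to make reads of uninitialized memory fail the same way on every platform. A minimal sketch of that idea follows; the wrapper name `debug_malloc` and the `0xA5` fill byte are hypothetical and not taken from the RF sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill byte; any non-zero pattern serves the purpose. */
#define DEBUG_FILL_BYTE 0xA5

/* Allocate like malloc(), then overwrite the block with a garbage pattern
 * so that code which reads before writing misbehaves consistently instead
 * of depending on whether the platform happened to hand back zeroed pages. */
static void *debug_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, DEBUG_FILL_BYTE, size);
    return p;
}
```

Swapping such a wrapper in for `malloc()` during debugging would be one way to reproduce a crash that otherwise only shows up on some systems.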
14:55.09 |
``Erik |
mal: yet another linux vs restoftheworld type
issue |
14:55.28 |
Maloeran |
Woah, it takes less than a minute on a good
Athlon |
14:56.10 |
``Erik |
I got 1.3m r/s with the m1 a couple days
ago |
14:56.27 |
``Erik |
I'm wondering if maybe it's caught in an
infinite loop due to different rounding behaviors or
something |
14:56.33 |
Maloeran |
I got 2.5-3.0m on my desktop, but the frigate
is much more demanding |
14:56.55 |
Maloeran |
That shouldn't happen, then again, I might
have missed something in this new prep written from
scratch |
14:56.58 |
``Erik |
oh, and *HUGE* stalls on some ops,
heh |
14:57.09 |
``Erik |
but I think it's a compiler problem more than
anything else :/ |
14:57.20 |
``Erik |
and stupid darwinports won't compile
gcc42 |
14:58.01 |
Maloeran |
Yes, the dotproduct4 assembly code was loading
all the values just before working on them, instead of scheduling a
bit |
14:58.30 |
``Erik |
hm, 'real' memory dropped a bit and is
creeping back up |
14:58.33 |
``Erik |
it must still be doing SOMETHING |
14:58.36 |
``Erik |
uh |
14:58.38 |
Maloeran |
Ahah |
14:58.42 |
``Erik |
you don't do something like realloc in that
prep, do you? |
14:59.04 |
Maloeran |
Very rarely, but it will happen |
14:59.08 |
``Erik |
hrm |
14:59.30 |
``Erik |
it's horrendously expensive on the bsd family
since phkmalloc and dmalloc work differently |
14:59.32 |
Maloeran |
I realloc the table of pages for pointer
directories, for sectors/steps/nodes |
14:59.37 |
Maloeran |
I see. |
15:00.20 |
``Erik |
phkmalloc tries to keep things more secure
from mmu smashes, so it tries to force memory to be contiguous on
the wire, which means a realloc is an ugly naive alloc/copy/dealloc
instead of dmalloc's page mangling |
15:00.33 |
Maloeran |
Gah! |
15:00.44 |
``Erik |
MOST unix has a very very slow
realloc |
15:01.42 |
``Erik |
but mallocing more than you need is 'free', it
won't actually hit wire until it's written to, so malloc 2g, use
what you want, don't worry about it *shrug* :) |
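One common way to limit how often `realloc()` is reached, whatever it costs on a given platform, is to grow a table geometrically rather than by a fixed step. The sketch below assumes a simple table of page pointers; `page_table`, `page_table_push` and the starting capacity of 16 are hypothetical, and nothing here reflects how mm.c is actually written.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable table of page pointers. */
typedef struct {
    void **pages;
    size_t count;
    size_t capacity;
    size_t realloc_calls;   /* instrumentation only, to show the call count */
} page_table;

/* Append one pointer, doubling the capacity when the table is full.
 * Returns 0 on success and -1 if realloc() fails (the table is unchanged). */
static int page_table_push(page_table *t, void *page)
{
    assert(t != NULL);
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 16;
        void **grown = realloc(t->pages, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->pages = grown;
        t->capacity = new_cap;
        t->realloc_calls++;
    }
    t->pages[t->count++] = page;
    return 0;
}
```

With doubling, the number of `realloc()` calls grows with the logarithm of the element count, so 1000 appends cost only a handful of reallocations.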
15:02.28 |
Maloeran |
Then it's swapping around happily, hence why
it takes 22 minutes instead of 40 seconds |
15:02.56 |
``Erik |
swap is totally unused right now |
15:03.15 |
Maloeran |
What is system doing? |
15:03.21 |
``Erik |
I d'no *shrug* |
15:03.37 |
``Erik |
you're making system calls (wrapped via libc
calls, I'm sure) that are expensive |
15:04.04 |
Maloeran |
There are no system calls but
malloc/free/realloc in there |
15:04.19 |
``Erik |
malloc and free should be fast |
15:04.21 |
``Erik |
there it is |
15:04.24 |
``Erik |
realloc is dog slow |
15:04.44 |
Maloeran |
It's really realloc? The one in mmDir* in mm.c
? |
15:07.33 |
``Erik |
hrm, in the raytrace portion, 9.6% of the
time is spent on one op... "cror" (but it's stalled pretty
heavy) |
15:08.16 |
Maloeran |
In the dot product again? :) |
15:08.54 |
``Erik |
graphTraceDualOut line 635, the
"if(dstdist<=0.0)", which looks like it has to do two sequential
tests and then or the results before choosing to branch |
15:09.37 |
``Erik |
so to the machine, it looks like "if(
dstdist<0.0 || dstdist==0.0 )", requiring both to get out of the
pipeline, then feed back in for the or? *shrug* |
15:09.52 |
Maloeran |
That's quite possible, weird chip you
got |
15:09.53 |
``Erik |
vs if(!(dstdist>0.0)) which can be
streamed |
15:09.58 |
``Erik |
it's risc *shrug* |
15:10.12 |
Maloeran |
dstdist < 0.0 if you prefer, won't make a
difference |
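The forms of the test being compared here are interchangeable for ordinary values but not for NaN, which matters if a rewrite is meant to be a pure optimization. A small sketch, assuming `dstdist` is a `double` (the log does not say) and using hypothetical helper names:

```c
#include <assert.h>
#include <math.h>

/* The test as quoted from line 635. */
static int behind_le(double dstdist)
{
    return dstdist <= 0.0;
}

/* The single-comparison form suggested in the chat. */
static int behind_not_gt(double dstdist)
{
    return !(dstdist > 0.0);
}
```

Every comparison against NaN is false, so `behind_le(NAN)` yields 0 while `behind_not_gt(NAN)` yields 1. Likewise `dstdist < 0.0` differs from `dstdist <= 0.0` exactly at zero, so each variant is only a safe substitute if those inputs cannot occur or are handled elsewhere.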
15:10.32 |
``Erik |
I'm kinda guessing based on what the little
comment in shark says, heh |
15:10.48 |
Maloeran |
Yes I remember |
15:10.50 |
``Erik |
14% of compute time is on that dstdist =
_mathPlanePoint(tri->plane, dst) on 634 |
15:11.22 |
``Erik |
' |
15:11.24 |
``Erik |
gheh |
15:12.29 |
Maloeran |
So I suppose it finished prep'in in the end.
Care to profile that part?.. |
15:12.48 |
Maloeran |
I can't see what would take so long, as lazy
as some of the code is |
15:15.37 |
Maloeran |
If you do so, make sure to delete the cache or
it will just load it |
15:24.40 |
``Erik |
sure, uh, I'll gzip the cache instead,
heh... |
15:24.55 |
``Erik |
rtch ? |
15:24.58 |
Maloeran |
Right |
15:25.13 |
``Erik |
100 meg file, huh |
15:25.55 |
Maloeran |
I was aiming for a bit packed version earlier,
I'll switch back to that later |
15:26.19 |
Maloeran |
( So if you need 13 bits to identify a sector,
it will use that instead of 32 bits ) |
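The bit-packing idea mentioned above (storing a sector index in 13 bits rather than 32) can be sketched as a pair of helpers that write and read fields at arbitrary bit offsets. The names `bits_write` and `bits_read` are hypothetical, and the one-bit-at-a-time loop favors clarity over speed; the actual cache format is not shown in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `bits` bits of `value` at bit offset `pos` in `buf`.
 * `buf` must start zeroed and be large enough; `bits` must be 1..32. */
static void bits_write(uint8_t *buf, size_t pos, unsigned bits, uint32_t value)
{
    assert(bits >= 1 && bits <= 32);
    for (unsigned i = 0; i < bits; i++, pos++) {
        if ((value >> i) & 1u)
            buf[pos >> 3] |= (uint8_t)(1u << (pos & 7));
    }
}

/* Read back a `bits`-wide field stored at bit offset `pos`. */
static uint32_t bits_read(const uint8_t *buf, size_t pos, unsigned bits)
{
    uint32_t value = 0;
    assert(bits >= 1 && bits <= 32);
    for (unsigned i = 0; i < bits; i++, pos++) {
        if ((buf[pos >> 3] >> (pos & 7)) & 1u)
            value |= (uint32_t)1 << i;
    }
    return value;
}
```

Eight 13-bit indices fit in 13 bytes this way, against 32 bytes when each one takes a full 32-bit word.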
15:26.46 |
``Erik |
interesting, it starts very user based, and
linearly ramps to very system based |
15:27.21 |
Maloeran |
Anything more precise on what's going on in
system? |
15:29.49 |
``Erik |
"shandler" sounds familiar? |
15:31.03 |
Maloeran |
Hum, no? |
15:32.25 |
``Erik |
only 15.6 spent outside of
mach_kernel |
15:32.48 |
``Erik |
the biggest single symbol being
vm_map_enter |
15:33.02 |
``Erik |
which kinda smells like lots of small
alloc's |
15:33.33 |
``Erik |
O.O holy forshizzle |
15:33.56 |
``Erik |
chunk->prev = (void *)&(mmList); is
grievously expensive, if I'm reading this right |
15:34.18 |
Maloeran |
But... how? |
15:34.27 |
``Erik |
stw r0,12(r3) |
15:35.24 |
``Erik |
okie, readin that wrong... |
15:35.48 |
``Erik |
of the 3% of program time, that op was the big
consumer there... still less than 3% total |
15:36.07 |
Maloeran |
:) I prefer that |
15:53.44 |
``Erik |
*shrug* comments and docs would allow other
people to understand your stuff more readily and maybe make
comments on possible concerns or bottlenecks that you'd otherwise
spend a lot of time tracking |
15:53.55 |
``Erik |
especially since your environment is pretty
homogenous |
15:54.31 |
Maloeran |
I wanted to try Justin's fbsd box but it only
has 256mb of ram |
15:55.22 |
``Erik |
mine only has 384, heh |
15:55.39 |
``Erik |
my home one, that is |
15:56.17 |
Maloeran |
I just tried profiling in gprof, and it
doesn't profile anything in shared libraries :p, so I profiled my
main.c |
15:56.45 |
``Erik |
you need to build profiling forms of the
shared libraries |
15:57.06 |
``Erik |
uhmmm, on fbsd, you'd see like libc.so and
libc_p.so where _p.so is for the profiling lib |
15:57.25 |
``Erik |
I'm too out of leenewx to remember there,
heh |
15:57.26 |
Maloeran |
Shared libraries were built with -pg as well,
anything else? |
15:58.47 |
Maloeran |
Any sensitive results out of Sharp? |
16:01.50 |
Maloeran |
"Support for gprof profiling of shared
libraries is available on 32-bit systems only." What
the... |
16:02.20 |
Maloeran |
Sorry, nevermind that, specific to
HP-UX |
16:02.22 |
``Erik |
shark? I don't think I ran it right, so I'm
rerunning it :/ |
16:06.12 |
``Erik |
stepSampleSort is a bit pricey |
16:06.56 |
Maloeran |
Like 5% or 40%? |
16:07.05 |
``Erik |
22.6 |
16:07.31 |
Maloeran |
Okay. That's one of the things I have marked to
fix, I'm more wondering about the time spent on "system" |
16:08.27 |
``Erik |
sampleAddTri() is a tiny bit expensive,
... |
16:09.30 |
Maloeran |
Yes... and I'm not even using these lists yet,
planning ahead for improvements of the prep |
16:10.28 |
Maloeran |
Can you throw all the profiling text at
me? |
16:12.01 |
``Erik |
uhmmmmm, I'm running another set with
different time variables |
16:35.39 |
Maloeran |
So 50% is spent outside the executable itself,
that's... cute ;) |
16:36.39 |
``Erik |
I d'no if that's because it's a single thread
on a dual proc machine, or if it's just not seeing the frame stack
correctly when it samples, or if sdl throws threads, or
what |
16:39.42 |
Maloeran |
The model is built before SDL is initialized,
and you mentioned the system share starts growing later
on |
16:40.06 |
``Erik |
hm, part of sdl is initialized before main()
iirc |
16:40.20 |
``Erik |
it immediately pops up an sdl icon in the
doc |
16:40.23 |
``Erik |
before the window appears |
16:40.24 |
``Erik |
dock |
16:41.17 |
Maloeran |
Right I see |
16:47.26 |
Maloeran |
I think I would know how to build shared
libraries for gprof'iling, except that everything goes through this
libtool thing |
16:48.33 |
``Erik |
yeah, I'm not terribly keen on libtool, but
dynamic libraries are different on every os :/ |
16:49.09 |
``Erik |
btw, I msg'd the url there because I can't msg
here and I don't know how public you want that info... I'll delete
it if you want |
16:50.20 |
Maloeran |
Ah, nothing sensitive in there |
16:53.49 |
``Erik |
ok, thandler is the 'trap handler' and
shandler is the 'syscall handler', in the mach kernel (micro, so
it's handled via messages and 'servers', not function
calls) |
16:54.33 |
Maloeran |
Trap handler sounds like handling of page
faults when running out of ram |
16:54.53 |
Maloeran |
Syscall handler... Growing the heap size? 25%
of the processing time? Gez. |
17:06.47 |
``Erik |
hrm, dude, I have 2g of ram and I'm only using
like 200m |
17:06.53 |
``Erik |
and I never touched swap |
17:07.13 |
``Erik |
now the trap might be cache line related or
something else *shrug* and it might be system wide, not just
applied to your application |
17:09.17 |
``Erik |
I just ran a program to allocate a gig in 1m
chunks and write crap to every page... almost no system time
consumed in that (16s user, 3s sys) |
17:09.35 |
``Erik |
no slowdown in it, so no swap hit |
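The experiment described above (allocate memory in fixed-size chunks and write to every page) can be approximated with the sketch below. The original test program is not shown in the log, so the function name `touch_chunks` and its parameters are hypothetical, and the page size is passed in rather than queried.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate `chunk_count` chunks of `chunk_size` bytes and write one byte to
 * each page, so the kernel has to back every page with real memory.
 * Returns the number of pages touched, or 0 if an allocation fails. */
static size_t touch_chunks(size_t chunk_count, size_t chunk_size, size_t page_size)
{
    size_t touched = 0;
    assert(page_size > 0);

    char **chunks = calloc(chunk_count, sizeof *chunks);
    if (chunks == NULL)
        return 0;

    for (size_t c = 0; c < chunk_count; c++) {
        chunks[c] = malloc(chunk_size);
        if (chunks[c] == NULL) {
            touched = 0;
            break;
        }
        /* volatile keeps the compiler from discarding the stores */
        volatile char *p = chunks[c];
        for (size_t off = 0; off < chunk_size; off += page_size) {
            p[off] = (char)c;
            touched++;
        }
    }

    for (size_t c = 0; c < chunk_count; c++)
        free(chunks[c]);    /* free(NULL) is a no-op for unfilled slots */
    free(chunks);
    return touched;
}
```

Calling `touch_chunks(1024, 1 << 20, 4096)` under `time` would roughly match the 1 GB run in 1 MB chunks, assuming 4 KB pages; comparing user and system time then shows how much the kernel spends supplying the pages.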
17:10.23 |
``Erik |
about 1.5g I start seeing swap hits |
17:11.28 |
Maloeran |
Right. I could be mistaken, but the trap
handler handles page faults and I don't see what else could be
causing faults.. |
17:13.59 |
``Erik |
page fault is just one kind of trap |
17:16.26 |
``Erik |
ok, in the midst of the ugly, the syscall
handler is 54% and the trap handler is 21.5%, |
17:16.40 |
``Erik |
the trap that consumes most time looks to be
"ml_set_interrupts_enabled" |
17:17.07 |
``Erik |
only 1% of the time is vm_fault |
17:17.28 |
Maloeran |
I can't think of any other syscall being made
but malloc() and friends |
17:17.43 |
``Erik |
"isync" is the big trap abuse |
17:17.57 |
``Erik |
context switches force traps and shit,
too |
17:19.21 |
``Erik |
ok, isync stops new ops from entering the
pipeline and waits until the pipeline is empty, "This instruction
is context synchronizing" |
17:19.39 |
``Erik |
for OS memory management tasks, like changes
in the mmu |
17:23.22 |
``Erik |
"large_and_huge_malloc" might be related, in
mmAlloc under sampleAddTri |
17:24.46 |
Maloeran |
20-40k is "large and huge" ? |
17:25.15 |
``Erik |
bigger than a page *shrug* I d'no, heh, I'm
looking through this stuff more or less lost... |
17:25.18 |
``Erik |
<-- doesn't know ppc asm :) |
17:25.26 |
Maloeran |
#define SAMPLE_TRIANGLES_PER_LIST (4096)
could be set to 200k or something *shrug*, to have fewer
calls |
17:51.38 |
Maloeran |
Erik, could one of OSX's "security features" be
to zero malloc() chunks or something? I'm running out of
hypotheses |
17:52.38 |
``Erik |
might be *shrug* I d'no |
17:55.12 |
Maloeran |
"The default malloc on OS X causes a large
performance degradation relative to the default mallocs on Linux
and Solaris." |
17:55.16 |
Maloeran |
Gah. |
17:56.42 |
Maloeran |
50% slower, nothing of the scale we saw
here |
18:07.06 |
``Erik |
interesting, a significant portion of time
looks like it's attributed to handling l2 cache misses |
18:09.45 |
``Erik |
ahhhhhhhhh |
18:10.05 |
``Erik |
mmAlloc() cooks up time in a kernel function
called "Zero Fill" |
18:10.15 |
Maloeran |
AHH!! |
18:10.26 |
``Erik |
which'd explain cache thrashing |
18:10.35 |
Maloeran |
_That_ is the reason, I'm allocating a whole
bunch and freeing, sometimes without even using the
chunks |
18:10.59 |
``Erik |
learn somethin' new every day |
18:11.22 |
Maloeran |
Can you fix that? |
18:11.30 |
Maloeran |
Can you make malloc() behave in a sane
manner? |
18:12.32 |
``Erik |
googling for that now... and 'sane' is a
phrase that can be argued against... :D quit abusing malloc?
*duck* |
18:12.37 |
``Erik |
http://lists.apple.com/archives/Darwin-development/2003/Apr/msg00217.html
mentions some |
18:12.46 |
Maloeran |
Maybe there are multiple memory managers on
OSX, as there are multiple threading libraries on fbsd ( and the
default one is horrible too ) |
18:13.17 |
Maloeran |
Why would an OS ever memset() malloc'ed
chunks? I can do that myself I need it, that's absurd |
18:13.28 |
Maloeran |
if* I need it |
18:13.49 |
Maloeran |
The segfault mentioned earlier was fixed
too |
18:13.56 |
``Erik |
http://lists.apple.com/archives/Darwin-development/2003/Apr/msg00210.html
answers that, heh |
18:14.01 |
``Erik |
security mechanism |
18:14.11 |
Maloeran |
Absurd. |
18:16.01 |
``Erik |
http://developer.apple.com/tools/performance/optimizingwithsystemtrace.html
and search for "zero-fill" |
18:17.25 |
Maloeran |
So I have to write my own full-featured memory
manager because the OSX manager is too incompetent to care about
performance |
18:17.48 |
``Erik |
well, the converse argument is that the linux
memory manager is too incompetent to care about security |
18:17.52 |
Maloeran |
That also explains why even the m1a2 was
taking so long to prep on your laptops, it's supposed to be a few
seconds |
18:18.26 |
Maloeran |
If a process puts sensitive stuff in RAM, it's
the duty of _that_ process to mlock() the memory and clear it
accordingly |
18:18.44 |
Maloeran |
Don't slow down the whole OS for a few chunks
of ram that might possibly contain something sensitive |
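The approach argued for above, where the process itself locks and clears memory holding sensitive data, might look like the following sketch. `mlock()` and `munlock()` are the POSIX calls; the helper names and the demo callback are hypothetical. `mlock()` can fail under a low `RLIMIT_MEMLOCK`, and this sketch then simply carries on unlocked, which a real program would probably treat as an error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Zero a buffer through a volatile pointer so the stores cannot be
 * optimized away just because the buffer is about to be freed. */
static void secure_clear(void *p, size_t n)
{
    volatile unsigned char *v = p;
    while (n--)
        *v++ = 0;
}

/* Hypothetical stand-in for code that handles secret data. */
static void fill_demo(unsigned char *buf, size_t n)
{
    memset(buf, 0x42, n);
}

/* Run `use` on a scratch buffer that is locked in RAM if possible, then
 * wipe it before releasing it. Returns 0 on success, -1 if malloc fails. */
static int with_sensitive_buffer(size_t size, void (*use)(unsigned char *, size_t))
{
    assert(use != NULL);
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1;

    int locked = (mlock(buf, size) == 0);   /* keep the pages out of swap */
    use(buf, size);
    secure_clear(buf, size);                /* wipe before anyone can reuse it */
    if (locked)
        munlock(buf, size);
    free(buf);
    return 0;
}
```

POSIX lets an implementation require a page-aligned address for `mlock()`, so portable code would allocate the buffer with page alignment rather than plain `malloc()`.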
18:19.09 |
``Erik |
heh |
18:19.22 |
``Erik |
in the land of incompetent coders...
:) |
18:19.32 |
Maloeran |
mlock() and related functions exist for a good
reason |
18:19.45 |
``Erik |
yes, as do calloc(), etc... |
18:20.33 |
Maloeran |
Grah, this is so absurd |
18:20.58 |
``Erik |
freebsd does the same thing,
apparently |
18:21.04 |
``Erik |
http://kerneltrap.org/node/72 |
18:22.55 |
Maloeran |
Seriously, this makes no sense at all. There
are POSIX functions to take care of storing sensitive information
in RAM |
18:23.16 |
``Erik |
... and if people USED them, then os's
wouldn't have to step up and cover |
18:24.06 |
Maloeran |
This is a _very_ bad fix. Fix the software,
don't hack a slow and patchy solution in the OS |
18:24.46 |
``Erik |
heh, and it seems to be a hot issue in linux
kernel development right now |
18:25.23 |
``Erik |
(and if the software is designed to break the
os? malicious code exists :/ ) |
18:26.08 |
Maloeran |
Okay. Do you have a full-featured and complete
memory manager in BRL-CAD already? |
18:26.22 |
``Erik |
http://lists.apple.com/archives/darwin-development/2003/Apr/msg00227.html
has more |
18:26.29 |
``Erik |
yeah, in libbu |
18:26.31 |
``Erik |
um |
18:26.52 |
``Erik |
but the behavior of "lots of allocs and
deallocs" is gonna be slow if it's passed to the os... |
18:27.02 |
Maloeran |
Seriously, the OS could bzero() pages as the
heap grows, but OSX seems to clear even reused pages ; malloc'ing
without expanding the heap |
18:27.30 |
Maloeran |
Normally, malloc() only reaches the OS if the
heap has to be extended. Otherwise, it stays entirely in user
space |
18:27.34 |
Maloeran |
On a sane and decent OS anyway |
18:28.02 |
``Erik |
erm, ... vm and wm are different,
dude |
18:29.13 |
``Erik |
(heh, and this is exactly where compacting
gc's shine) |
18:29.45 |
Maloeran |
Checking libbu, I only saw red-black tree
stuff there last time |
18:30.34 |
``Erik |
I'm pretty sure the libbu memory management is
just portable passthrough stuff, though |
18:31.36 |
``Erik |
stupid headache *grr* |
18:32.00 |
Maloeran |
I really don't feel like writing a memory
manager to handle broken malloc() implementations, but if I
must.. |
18:32.17 |
``Erik |
<-- thinks it's less broken than linux's
:( |
18:33.03 |
Maloeran |
Surely you agree that if software deals with
sensitive information, there are robust and _efficient_ mechanisms
to deal with this, instead of having every malloc() call being
zero'ed? |
18:33.28 |
``Erik |
given the quality of 95% of coders writing
'real' applications, no. I don't. |
18:33.34 |
Maloeran |
malloc()'ed memory is not supposed to be
cleared, it's supposed to be fast |
18:34.07 |
``Erik |
hm, I've never thought of malloc as a fast
operation *shrug* if you want fast, allocate a big honkin' heap and
do it yourself in that... |
18:34.33 |
Maloeran |
Clearing the new pages as the heap grows would
have made a certain sense, but for every malloc call, this is
highly absurd |
18:34.48 |
``Erik |
... |
18:35.00 |
``Erik |
you cannot make that statement because of how
mmu's work. |
18:35.21 |
``Erik |
you can free 4k, and then "immediately" alloc
4k, and you are not guaranteed that you got the same 4k
back |
18:35.33 |
``Erik |
you could've gotten one of my pages, or a
completely different page altogether |
18:35.50 |
Maloeran |
Of course not, but it's likely to be within
the heap for the process address space |
18:36.11 |
``Erik |
... for the process address space, yes... but
not the wired address space |
18:36.33 |
``Erik |
physical memory doesn't line up to process
memory, that's what the mmu does... |
18:36.39 |
Maloeran |
The heap never shrinks, the OS doesn't know
that the page is now unused |
18:36.59 |
``Erik |
erm, which heap? heh |
18:37.27 |
Maloeran |
The heap of the process ; the memory manager
is likely to reuse that page and you'll get what you had previously
stored there, without ever making a syscall |
18:37.28 |
``Erik |
free() is to mark a heap as unused so it can
be culled... |
18:37.43 |
``Erik |
and it disassociates it from the wired
page |
18:37.45 |
Maloeran |
So the heap can shrink on OSX? It never does
on Linux |
18:38.56 |
Maloeran |
That seems to be a logical explanation as to
why every malloc() call is zero'ed |
18:40.27 |
``Erik |
the process heap should be able to shrink on
every os :/ |
18:40.46 |
``Erik |
now the memory address of new allocations is
up in the air, but *shrug* |
18:42.04 |
Maloeran |
You can't shrink the heap on Linux. If it
grows high and shrinks, unused high pages will eventually be put on
swap to make room for other processes, and just forgotten |
18:42.15 |
Maloeran |
That design has its flaws too ( the swapping
) |
18:42.17 |
*** join/#brlcad cadguy
(n=butler@bz.bzflag.bz) |
18:42.26 |
``Erik |
heh, and eventually oom |
18:42.56 |
cadguy |
Yo! How is everyone? |
18:43.00 |
``Erik |
(might be why I've seen ugly oom's on linux,
its malloc is broken... O:-) ) |
18:43.18 |
Maloeran |
Good afternoon Lee |
18:43.35 |
``Erik |
email is sent, lee... subj "Sql" |
18:43.36 |
Maloeran |
BSD's malloc() seems less broken than OSX
still, it clears new pages but not the content of every malloc()
call |
18:43.44 |
cadguy |
Howdy Maloeran |
18:44.06 |
Maloeran |
Just having a long debate with Erik about why
the raytracer's prep is so terribly slow on OSX |
18:44.09 |
``Erik |
osX only zerofills when the freshly allocated
page is touched, as far as I can tell |
18:45.26 |
Maloeran |
Now reading libbu's memory manager, I suppose
that's the solution to work around inefficient malloc
implementations |
18:45.27 |
cadguy |
Hmm. How many pages are we allocating?
Lots? |
18:45.40 |
Maloeran |
Lots of pages, which are often just unused and
freed |
18:45.57 |
Maloeran |
malloc() is quite fast on Linux as pages are
never cleared |
18:45.58 |
cadguy |
Yes, that's a notorious performance
killer. |
18:46.09 |
cadguy |
That's a security issue. |
18:46.45 |
Maloeran |
When dealing with sensitive information,
processes can mlock() the memory, there are POSIX functions to take
care of that |
18:47.25 |
Maloeran |
But as Erik argued, a dirty and inefficient
fix at the OS level seems to be required due to the amount of bad
software out there... *shakes head* |
18:47.42 |
cadguy |
The usual technique is to keep a buffer pool
if you want to alloc/free a lot to keep the code easy. Then
allocate through your own buffer pool. |
18:48.31 |
``Erik |
*nod* allocate a slew of pages, keep 'free'
and 'used' linked lists, when one is freed or allocated, just
change which list it lives in |
18:48.39 |
cadguy |
Yea. Lots of lame code mucking around with
privileges. Remember mlock() didn't appear until
4.4BSD. |
18:48.41 |
Maloeran |
Right. I'm checking libbu, but I won't hide
that I'm used to dealing with an efficient malloc
implementation |
18:49.13 |
``Erik |
if you allocate with nothing in the free list,
malloc more... if you're worried about memory consumption, free()
some out of the free list when it reaches a threshold |
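The buffer-pool technique described in the last few messages can be sketched as a fixed-size block pool with a free list and a cap on how many idle blocks it keeps. All names here (`pool`, `pool_alloc`, `pool_free`, `max_free`) are hypothetical. The sketch tracks only the free list, on the assumption that the caller already holds pointers to the blocks in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free block stores the link to the next free block in its own memory. */
typedef struct pool_node { struct pool_node *next; } pool_node;

typedef struct {
    size_t block_size;      /* size of every block handed out */
    pool_node *free_list;   /* idle blocks, most recently freed first */
    size_t free_count;
    size_t max_free;        /* above this, blocks go back to free() */
} pool;

static void pool_init(pool *p, size_t block_size, size_t max_free)
{
    assert(p != NULL);
    p->block_size = block_size < sizeof(pool_node) ? sizeof(pool_node) : block_size;
    p->free_list = NULL;
    p->free_count = 0;
    p->max_free = max_free;
}

/* Reuse an idle block if there is one, otherwise fall back to malloc(). */
static void *pool_alloc(pool *p)
{
    if (p->free_list != NULL) {
        pool_node *n = p->free_list;
        p->free_list = n->next;
        p->free_count--;
        return n;
    }
    return malloc(p->block_size);
}

/* Park the block on the free list, or release it once the cap is reached. */
static void pool_free(pool *p, void *block)
{
    if (block == NULL)
        return;
    if (p->free_count >= p->max_free) {
        free(block);
        return;
    }
    pool_node *n = block;
    n->next = p->free_list;
    p->free_list = n;
    p->free_count++;
}

/* Release every idle block; blocks still in use remain the caller's job. */
static void pool_destroy(pool *p)
{
    while (p->free_list != NULL) {
        pool_node *n = p->free_list;
        p->free_list = n->next;
        free(n);
    }
    p->free_count = 0;
}
```

A block that is freed and allocated again never reaches the system allocator, so a pattern of frequent alloc/free pairs stays in user space no matter how the platform's `malloc()` behaves.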
18:49.28 |
``Erik |
s/efficient/insecure/ :) |
18:49.51 |
Maloeran |
Yes yes, I got that to deal with many small
chunks. I haven't got a full memory manager to deal with chunks of
all sizes and shapes |
18:49.52 |
cadguy |
No reason to hide. Just be aware that there
are space/time/security tradeoffs that different OS's
make. |
18:50.03 |
``Erik |
my bike goes 20kph and stays together, yours
goes 30 and kicks the wheels off every 50km |
18:50.05 |
``Erik |
:D |
18:50.52 |
Maloeran |
:) Eh well, time to write a memory manager
then! |
18:51.24 |
``Erik |
<-- thought that's what mm was supposed to
be o.O :) |
18:51.56 |
Maloeran |
It's not a full-blown memory manager, it has
efficient handling of packed tiny chunks, balanced trees,
etc. |
18:52.47 |
Maloeran |
since Linux's malloc() always performed
decently for management of medium to large sized chunks |
18:53.26 |
cadguy |
In general, any time you can avoid a system
call, it is worth doing. |
18:54.46 |
Maloeran |
On Linux, free() never shrinks the heap, so
malloc() will always remain in user-space unless the heap has to
grow. I realize it's quite different on OSX |
18:55.36 |
cadguy |
And different on solaris and other
Unix's |
19:07.54 |
Maloeran |
That model really is a challenge for any
acceleration structure, the planned second 'prep' pass should
improve things a bit... but mostly, ray bundles will |
19:08.05 |
Maloeran |
That and threads |
19:09.48 |
``Erik |
oohhhhh, rfTraceRays() calls malloc,
too |
19:11.58 |
Maloeran |
Only if there are no already allocated 'job'
struct in the list, nothing to worry about there |
19:15.13 |
``Erik |
that dstdist=mathPlanePoint() line (634) is a
major contributor to L2 cache misses (27.5%) |
19:15.39 |
``Erik |
second being line 582
"if(src[linkflags&RF_NODE_AXIS_MASK]<NODE(root)->plane)"
at 6.6% |
19:16.56 |
Maloeran |
The prototype had prefetch instructions for
caching triangles before the actual tests, that should
help |
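Prefetching triangle data ahead of the plane test, as mentioned above, could look roughly like this. `__builtin_prefetch` is a GCC/Clang builtin; the `tri_t` layout, the `plane_point` helper (a guess at what `_mathPlanePoint` computes: the plane equation evaluated at a point) and the one-element lookahead are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical triangle record: plane coefficients a, b, c, d. */
typedef struct { float plane[4]; } tri_t;

/* Signed plane value at point p: a*x + b*y + c*z + d. */
static float plane_point(const float plane[4], const float p[3])
{
    return plane[0] * p[0] + plane[1] * p[1] + plane[2] * p[2] + plane[3];
}

/* Count the triangles whose plane value at `p` is <= 0, asking the CPU to
 * start loading the next triangle while the current one is tested. */
static size_t count_behind(const tri_t *const *tris, size_t n, const float p[3])
{
    size_t behind = 0;
    assert(tris != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + 1 < n)
            __builtin_prefetch(tris[i + 1], 0, 1);   /* read access, low reuse */
#endif
        if (plane_point(tris[i]->plane, p) <= 0.0f)
            behind++;
    }
    return behind;
}
```

Whether a prefetch helps depends on how far ahead it is issued relative to memory latency, so the lookahead distance is something to tune with a profiler rather than assume.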
19:17.19 |
``Erik |
memory bandwidth looks like, um, around
200-300 MB/s read and 20-30MB/s write |
19:17.32 |
Maloeran |
You know, I really like your profiler
:) |
19:17.50 |
``Erik |
heh, me too, this thing is gnarly |
19:18.06 |
cadguy |
You really should try to pick it up. |
19:18.39 |
cadguy |
Want me to talk with Mark? |
19:19.30 |
Maloeran |
Thanks, just give me 33 hours to receive my
first real pay check from Survice assuming the 30 days delay after
the end of the month is respected |
19:20.19 |
``Erik |
you got your travel expenses and per diem all
sorted out, correct? |
19:21.07 |
CIA-9 |
BRL-CAD: 03lbutler * 10brlcad/sh/gforge.sh:
script for querying a gforge site |
19:21.34 |
Maloeran |
I had no per diem expenses in August, but
sure |
19:23.24 |
*** join/#brlcad IriX64
(n=IriX64@bas3-sudbury98-1168052970.dsl.bell.ca) |
19:23.38 |
``Erik |
dude, if you ever do work related travel, the
employer should set everything up and take care of all the
(reasonable) expenses... |
19:24.34 |
``Erik |
it's chump change to them, a no brainer
investment... |
19:27.16 |
Maloeran |
Ah don't worry, I'll be quite fine. The 30
days delay for a monthly pay is just a bit annoying, after 2-3
months of unpaid vacation anyway ;) |
19:27.41 |
``Erik |
rtiBatchNsCallback() is your flat shadow-less
shader? |
19:27.53 |
Maloeran |
Somewhat, yes |
20:57.14 |
CIA-9 |
BRL-CAD: 03lbutler * 10brlcad/sh/gforge.sh:
make script adaptable to host |
21:13.00 |
Maloeran |
Erik, before I write a bunch of code, do you
have Hoard handy to see if the memory manager does a better
job? |
21:13.25 |
Maloeran |
It might clear pages the BSD way even on
OSX |
22:43.50 |
``Erik |
hoard? nope |
23:16.54 |
Maloeran |
Oh well. Everything but sectors and steps are
now allocated by sliced blocks, these chunks of variable size will
have their own personal little memory manager |