| 01:00.10 | *** join/#brlcad DTRemenak (n=DTRemena@adsl-68-126-0-210.dsl.irvnca.pacbell.net) | |
| 03:01.12 | *** join/#brlcad digitalfredy (n=digitalf@200.71.62.161) | |
| 03:09.40 | Maloeran | That 1.7 million triangles frigate really kills the raytracing performance, with all its diagonal ropes through the scene. Very stressful test for a raytracer... I would be interested in knowing how my 200mb of RAM use on this compares with ADRT |
| 03:10.44 | Maloeran | Or 400mb if I push the quality ( and performance ) high |
| 03:19.04 | CIA-9 | BRL-CAD: 03brlcad * 10brlcad/sh/ (footer.sh header.sh): add support for C++ and Objective-C/C++ to the mix |
| 04:04.52 | *** join/#brlcad digitalfredy (n=digitalf@200.71.62.161) | |
| 04:05.19 | *** join/#brlcad dan_falck (n=danfalck@pool-71-111-76-8.ptldor.dsl-w.verizon.net) | |
| 04:19.38 | *** join/#brlcad IriX64 (n=Who@bas3-sudbury98-1168052970.dsl.bell.ca) | |
| 04:54.19 | *** join/#brlcad DTRemenak (n=DTRemena@adsl-68-126-0-210.dsl.irvnca.pacbell.net) | |
| 05:46.36 | *** join/#brlcad clock_ (i=clock@84-72-60-185.dclient.hispeed.ch) | |
| 07:20.50 | *** join/#brlcad clock_ (n=clock@zux221-122-143.adsl.green.ch) | |
| 11:20.58 | *** join/#brlcad rossberg (n=rossberg@bz.bzflag.bz) | |
| 11:54.20 | CIA-9 | BRL-CAD: 03d_rossberg * 10brlcad/BUGS: fixed rendering toyjeep.g on Windows bug (on 7/6/2006) by using a less rigorous function to invert a 4x4 matrix in rt_bend_pipe_prep |
| 12:35.38 | *** join/#brlcad Twingy (n=justin@74.92.144.217) | |
| 13:04.20 | Maloeran | Does anyone have a recommendation for the best reference for doxygen comments in the BRL-CAD code? |
| 13:05.14 | Maloeran | I noticed Lee working on libuu's doxygen documentation, though I'm not sure where that libuu is. Not much comes out on find |
| 13:05.56 | Maloeran | Ah, or perhaps it was libbu |
| 14:01.31 | Maloeran | Eh, Doxygen is confused about GCC's __attribute__() |
| 14:22.54 | ``Erik | O.o |
| 14:24.44 | Maloeran | Feeling any better, Erik? |
| 14:34.32 | ``Erik | not much, heh |
| 14:35.00 | Maloeran | :/ Did you go through a x-ray scan just to make sure? |
| 14:38.24 | ``Erik | yeah, several xrays and a catscan |
| 14:38.50 | ``Erik | btw, I think I may have an idea on why your code doesn't run so hot on g4/g5 ... gcc 4.0.0 |
| 14:39.20 | Maloeran | Oh hum, that's a possibility. The assembly looked very poor, as little as I know that arch |
| 14:39.49 | Maloeran | The demo now loads the 1.7 million triangles frigate with caching, if you want |
| 14:40.19 | ``Erik | yeah, been building for a few minutes |
| 14:40.23 | ``Erik | it segfaults on my amd64 |
| 14:40.35 | ``Erik | #0 0x0000000801758c88 in stepComputeValue (step=0x522030) at ../../../RF/prepmodel.c:701 |
| 14:40.35 | ``Erik | 701 step->linkcost[RF_EDGE_MAXZ] = WALK_LINKCOUNT_COST( step->linkcount[ RF_EDGE_MAXZ ] ); |
| 14:40.44 | Maloeran | Hum. Okay |
| 14:41.34 | Maloeran | I seriously need to speed up that prep eventually, it does a decent job but isn't fast at it |
| 14:42.29 | Maloeran | Could you p step->linkcount[ RF_EDGE_MAXZ ] on that segfault? It's rather curious |
| 14:44.33 | Maloeran | Even with low preparation quality, the 'prep' can eat up to 500mb ; if it takes minutes, I think you are swapping... |
| 14:50.23 | brlcad | src/lib*, there's a list of what each of the various libs do in HACKING and src/README |
| 14:51.11 | Maloeran | I was more looking for the best reference for the desired doxygen comment style, rather than a specific library |
| 14:54.48 | ``Erik | it's consuming one whole cpu, 508.19m real, 719.54m virtual, and has been going for 22 minutes |
| 14:54.55 | Maloeran | Thanks Erik, bug reproducible if I fill all malloc'ed memory with garbage |
| 14:55.09 | ``Erik | mal: yet another linux vs restoftheworld type issue |
| 14:55.28 | Maloeran | Woah, it takes less than a minute on a good Athlon |
| 14:56.10 | ``Erik | I got 1.3m r/s with the m1 a couple days ago |
| 14:56.27 | ``Erik | I'm wondering if maybe it's caught in an infinite loop due to different rounding behaviors or something |
| 14:56.33 | Maloeran | I got 2.5-3.0m on my desktop, but the frigate is much more demanding |
| 14:56.55 | Maloeran | That shouldn't happen, then again, I might have missed something in this new prep written from scratch |
| 14:56.58 | ``Erik | oh, and *HUGE* stalls on some ops, heh |
| 14:57.09 | ``Erik | but I think it's a compiler problem more than anything else :/ |
| 14:57.20 | ``Erik | and stupid darwinports won't compile gcc42 |
| 14:58.01 | Maloeran | Yes, the dotproduct4 assembly code was loading all the values just before working on them, instead of scheduling a bit |
| 14:58.30 | ``Erik | hm, 'real' memory dropped a bit and is creeping back up |
| 14:58.33 | ``Erik | it must still be doing SOMETHING |
| 14:58.36 | ``Erik | uh |
| 14:58.38 | Maloeran | Ahah |
| 14:58.42 | ``Erik | you don't do something like realloc in that prep, do you? |
| 14:59.04 | Maloeran | Very rarely, but it will happen |
| 14:59.08 | ``Erik | hrm |
| 14:59.30 | ``Erik | it's horrendously expensive on the bsd family since phkmalloc and dmalloc work differently |
| 14:59.32 | Maloeran | I realloc the table of pages for pointer directories, for sectors/steps/nodes |
| 14:59.37 | Maloeran | I see. |
| 15:00.20 | ``Erik | phkmalloc tries to keep things more secure from mmu smashes, so it tries to force memory to be contiguous on the wire, which means a realloc is an ugly naive alloc/copy/dealloc instead of dmalloc's page mangling |
| 15:00.33 | Maloeran | Gah! |
| 15:00.44 | ``Erik | MOST unix has a very very slow realloc |
| 15:01.42 | ``Erik | but mallocing more than you need is 'free', it won't actually hit wire until it's written to, so malloc 2g, use what you want, don't worry about it *shrug* :) |
| 15:02.28 | Maloeran | Then it's swapping around happily, hence why it takes 22 minutes instead of 40 seconds |
| 15:02.56 | ``Erik | swap is totally unused right now |
| 15:03.15 | Maloeran | What is system doing? |
| 15:03.21 | ``Erik | I d'no *shrug* |
| 15:03.37 | ``Erik | you're making system calls (wrapped via libc calls, I'm sure) that are expensive |
| 15:04.04 | Maloeran | There are no system calls but malloc/free/realloc in there |
| 15:04.19 | ``Erik | malloc and free should be fast |
| 15:04.21 | ``Erik | there it is |
| 15:04.24 | ``Erik | realloc is dog slow |
| 15:04.44 | Maloeran | It's really realloc? The one in mmDir* in mm.c ? |
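The realloc concern raised above can be reduced regardless of how a given libc implements it: growing a table geometrically keeps the number of realloc calls logarithmic in the final size. The sketch below is only an illustration of that idea. `PageTable`, `page_table_push` and `page_table_free` are hypothetical names, not the code in mm.c, and the starting capacity of 16 is an arbitrary assumption.

```c
#include <stdlib.h>

/* Hypothetical growable table of page pointers (illustrative names only). */
typedef struct {
    void **pages;     /* array of page pointers */
    size_t count;     /* entries in use */
    size_t capacity;  /* entries allocated */
} PageTable;

/* Append one pointer, doubling the capacity when the table is full.
 * Returns 0 on success, -1 if realloc fails (the table is left unchanged). */
int page_table_push(PageTable *t, void *page)
{
    if (t->count == t->capacity) {
        size_t newcap = t->capacity ? t->capacity * 2 : 16;
        void **grown = realloc(t->pages, newcap * sizeof(void *));
        if (grown == NULL)
            return -1;
        t->pages = grown;
        t->capacity = newcap;
    }
    t->pages[t->count++] = page;
    return 0;
}

/* Release the table itself; the pages it points at are not owned by it. */
void page_table_free(PageTable *t)
{
    free(t->pages);
    t->pages = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

With doubling, pushing 1000 entries triggers seven realloc calls instead of one per entry, so even an alloc/copy/dealloc realloc stays cheap in aggregate.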
| 15:07.33 | ``Erik | hrm, in the raytrace portion, 9.6% of the time is spent on one op... "cror" (but it's stalled pretty heavy) |
| 15:08.16 | Maloeran | In the dot product again? :) |
| 15:08.54 | ``Erik | graphTraceDualOut line 635, the "if(dstdist<=0.0)", which looks like it has to do two sequential tests and then or the results before choosing to branch |
| 15:09.37 | ``Erik | so to the machine, it looks like "if( dstdist<0.0 || dstdist==0.0 )", requiring both to get out of the pipeline, then feed back in for the or? *shrug* |
| 15:09.52 | Maloeran | That's quite possible, weird chip you got |
| 15:09.53 | ``Erik | vs if(!(dstdist>0.0)) which can be streamed |
| 15:09.58 | ``Erik | it's risc *shrug* |
| 15:10.12 | Maloeran | dstdist < 0.0 if you prefer, won't make a difference |
| 15:10.32 | ``Erik | I'm kinda guessing based on what the little comment in shark says, heh |
| 15:10.48 | Maloeran | Yes I remember |
| 15:10.50 | ``Erik | 14% of compute time is on that dstdist = _mathPlanePoint(tri->plane, dst) on 634 |
| 15:11.22 | ``Erik | ' |
| 15:11.24 | ``Erik | gheh |
| 15:12.29 | Maloeran | So I suppose it finished prep'in in the end. Care to profile that part?.. |
| 15:12.48 | Maloeran | I can't see what would take so long, as lazy as some of the code is |
| 15:15.37 | Maloeran | If you do so, make sure to delete the cache or it will just load it |
| 15:24.40 | ``Erik | sure, uh, I'll gzip the cache instead, heh... |
| 15:24.55 | ``Erik | rtch ? |
| 15:24.58 | Maloeran | Right |
| 15:25.13 | ``Erik | 100 meg file, huh |
| 15:25.55 | Maloeran | I was aiming for a bit packed version earlier, I'll switch back to that later |
| 15:26.19 | Maloeran | ( So if you need 13 bits to identify a sector, it will use that instead of 32 bits ) |
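The bit-packed cache idea (13 bits per sector identifier instead of 32) can be sketched as a pair of helpers that write and read fields of arbitrary width at a bit offset. This is an assumption about how such packing might look, not the actual rtch cache format; `bits_put` and `bits_get` are hypothetical names, and the least-significant-bit-first layout is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low `bits` bits of `value` at bit offset `pos` in `buf`.
 * Assumes `buf` was zero-initialised, is large enough, and bits is 1..32. */
void bits_put(uint8_t *buf, size_t pos, unsigned bits, uint32_t value)
{
    for (unsigned i = 0; i < bits; i++, pos++) {
        if ((value >> i) & 1u)
            buf[pos >> 3] |= (uint8_t)(1u << (pos & 7));
    }
}

/* Read back a `bits`-wide field starting at bit offset `pos`. */
uint32_t bits_get(const uint8_t *buf, size_t pos, unsigned bits)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < bits; i++, pos++) {
        if ((buf[pos >> 3] >> (pos & 7)) & 1u)
            value |= (uint32_t)1 << i;
    }
    return value;
}
```

Three 13-bit identifiers occupy 39 bits, so they fit in 5 bytes rather than the 12 bytes needed for three 32-bit values. A production version would likely move whole words at a time instead of looping per bit.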
| 15:26.46 | ``Erik | interesting, it starts very user based, and linearly ramps to very system based |
| 15:27.21 | Maloeran | Anything more precise on what's going on in system? |
| 15:29.49 | ``Erik | "shandler" sounds familiar? |
| 15:31.03 | Maloeran | Hum, no? |
| 15:32.25 | ``Erik | only 15.6% spent outside of mach_kernel |
| 15:32.48 | ``Erik | the biggest single symbol being vm_map_enter |
| 15:33.02 | ``Erik | which kinda smells like lots of small alloc's |
| 15:33.33 | ``Erik | O.O holy forshizzle |
| 15:33.56 | ``Erik | chunk->prev = (void *)&(mmList); is grievously expensive, if I'm reading this right |
| 15:34.18 | Maloeran | But... how? |
| 15:34.27 | ``Erik | stw r0,12(r3) |
| 15:35.24 | ``Erik | okie, readin that wrong... |
| 15:35.48 | ``Erik | of the 3% of program time, that op was the big consumer there... still less than 3% total |
| 15:36.07 | Maloeran | :) I prefer that |
| 15:53.44 | ``Erik | *shrug* comments and docs would allow other people to understand your stuff more readily and maybe make comments on possible concerns or bottlenecks that you'd otherwise spend a lot of time tracking |
| 15:53.55 | ``Erik | especially since your environment is pretty homogenous |
| 15:54.31 | Maloeran | I wanted to try Justin's fbsd box but it only has 256mb of ram |
| 15:55.22 | ``Erik | mine only has 384, heh |
| 15:55.39 | ``Erik | my home one, that is |
| 15:56.17 | Maloeran | I just tried profiling in gprof, and it doesn't profile anything in shared libraries :p, so I profiled my main.c |
| 15:56.45 | ``Erik | you need to build profiling forms of the shared libraries |
| 15:57.06 | ``Erik | uhmmm, on fbsd, you'd see like libc.so and libc_p.so where _p.so is for the profiling lib |
| 15:57.25 | ``Erik | I'm too out of leenewx to remember there, heh |
| 15:57.26 | Maloeran | Shared libraries were built with -pg as well, anything else? |
| 15:58.47 | Maloeran | Any sensitive results out of Sharp? |
| 16:01.50 | Maloeran | "Support for gprof profiling of shared libraries is available on 32-bit systems only." What the... |
| 16:02.20 | Maloeran | Sorry, nevermind that, specific to HP-UX |
| 16:02.22 | ``Erik | shark? I don't think I ran it right, so I'm rerunning it :/ |
| 16:06.12 | ``Erik | stepSampleSort is a bit pricey |
| 16:06.56 | Maloeran | Like 5% or 40%? |
| 16:07.05 | ``Erik | 22.6 |
| 16:07.31 | Maloeran | Okay. That's one of the things I have marked to fix, I'm more wondering about the time spent on "system" |
| 16:08.27 | ``Erik | sampleAddTri() is a tiny bit expensive, ... |
| 16:09.30 | Maloeran | Yes... and I'm not even using these lists yet, planning ahead for improvements of the prep |
| 16:10.28 | Maloeran | Can you throw all the profiling text at me? |
| 16:12.01 | ``Erik | uhmmmmm, I'm running another set with different time variables |
| 16:35.39 | Maloeran | So 50% is spent outside the executable itself, that's... cute ;) |
| 16:36.39 | ``Erik | I d'no if that's because it's a single thread on a dual proc machine, or if it's just not seeing the frame stack correctly when it samples, or if sdl throws threads, or what |
| 16:39.42 | Maloeran | The model is built before SDL is initialized, and you mentioned the system share starts growing later on |
| 16:40.06 | ``Erik | hm, part of sdl is initialized before main() iirc |
| 16:40.20 | ``Erik | it immediately pops up an sdl icon in the doc |
| 16:40.23 | ``Erik | before the window appears |
| 16:40.24 | ``Erik | dock |
| 16:41.17 | Maloeran | Right I see |
| 16:47.26 | Maloeran | I think I would know how to build shared libraries for gprof'iling, except that everything goes through this libtool thing |
| 16:48.33 | ``Erik | yeah, I'm not terribly keen on libtool, but dynamic libraries are different on every os :/ |
| 16:49.09 | ``Erik | btw, I msg'd the url there because I can't msg here and I don't know how public you want that info... I'll delete it if you want |
| 16:50.20 | Maloeran | Ah, nothing sensitive in there |
| 16:53.49 | ``Erik | ok, thandler is the 'trap handler' and shandler is the 'syscall handler', in the mach kernel (micro, so it's handled via messages and 'servers', not function calls) |
| 16:54.33 | Maloeran | Trap handler sounds like handling of page faults when running out of ram |
| 16:54.53 | Maloeran | Syscall handler... Growing the heap size? 25% of the processing time? Gez. |
| 17:06.47 | ``Erik | hrm, dude, I have 2g of ram and I'm only using like 200m |
| 17:06.53 | ``Erik | and I never touched swap |
| 17:07.13 | ``Erik | now the trap might be cache line related or something else *shrug* and it might be system wide, not just applied to your application |
| 17:09.17 | ``Erik | I just ran a program to allocate a gig in 1m chunks and write crap to every page... almost no system time consumed in that (16s user, 3s sys) |
| 17:09.35 | ``Erik | no slowdown in it, so no swap hit |
| 17:10.23 | ``Erik | about 1.5g I start seeing swap hits |
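Erik's test program is not shown in the log. A rough equivalent might look like the sketch below: allocate a number of fixed-size chunks and write one byte per page so that each page is really faulted in. The function name `touch_chunks` and the explicit page-size parameter are assumptions made for illustration; on a real system the page size would normally come from `sysconf(_SC_PAGESIZE)`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate `chunks` blocks of `chunk_size` bytes and write one byte every
 * `page_size` bytes, so each page is touched rather than merely reserved.
 * Returns the number of pages written, or 0 on allocation failure or when
 * page_size is 0. All memory is released before returning. */
size_t touch_chunks(size_t chunks, size_t chunk_size, size_t page_size)
{
    size_t touched = 0;
    char **blocks;

    if (page_size == 0)
        return 0;
    blocks = calloc(chunks, sizeof *blocks);   /* zeroed, so free() is safe */
    if (blocks == NULL)
        return 0;

    for (size_t i = 0; i < chunks; i++) {
        blocks[i] = malloc(chunk_size);
        if (blocks[i] == NULL) {
            touched = 0;
            break;
        }
        /* volatile keeps the compiler from discarding the stores */
        volatile char *p = blocks[i];
        for (size_t off = 0; off < chunk_size; off += page_size) {
            p[off] = (char)(i + off);
            touched++;
        }
    }

    for (size_t i = 0; i < chunks; i++)
        free(blocks[i]);
    free(blocks);
    return touched;
}
```

Calling `touch_chunks(1024, 1 << 20, 4096)` under `time` would approximate the "a gig in 1m chunks" experiment, with the user/sys split showing how much of the cost is kernel-side page handling.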
| 17:11.28 | Maloeran | Right. I could be mistaken, but the trap handler handles page faults and I don't see what else could be causing faults.. |
| 17:13.59 | ``Erik | page fault is just one kind of trap |
| 17:16.26 | ``Erik | ok, in the midst of the ugly, the syscall handler is 54% and the trap handler is 21.5%, |
| 17:16.40 | ``Erik | the trap that consumes most time looks to be "ml_set_interrupts_enabled" |
| 17:17.07 | ``Erik | only 1% of the time is vm_fault |
| 17:17.28 | Maloeran | I can't think of any other syscall being made but malloc() and friends |
| 17:17.43 | ``Erik | "isync" is the big trap abuse |
| 17:17.57 | ``Erik | context switches force traps and shit, too |
| 17:19.21 | ``Erik | ok, isync stops new ops from entering the pipeline and waits until the pipeline is empty, "This instruction is context synchronizing" |
| 17:19.39 | ``Erik | for OS memory management tasks, like changes in the mmu |
| 17:23.22 | ``Erik | "large_and_huge_malloc" might be related, in mmAlloc under sampleAddTri |
| 17:24.46 | Maloeran | 20-40k is "large and huge" ? |
| 17:25.15 | ``Erik | bigger than a page *shrug* I d'no, heh, I'm looking through this stuff more or less lost... |
| 17:25.18 | ``Erik | <-- doesn't know ppc asm :) |
| 17:25.26 | Maloeran | #define SAMPLE_TRIANGLES_PER_LIST (4096) could be set to 200k or something *shrug*, to have fewer calls |
| 17:51.38 | Maloeran | Erik, could one of OSX's "security features" be to zero malloc() chunks or something? I'm running out of hypotheses |
| 17:52.38 | ``Erik | might be *shrug* I d'no |
| 17:55.12 | Maloeran | "The default malloc on OS X causes a large performance degradation relative to the default mallocs on Linux and Solaris." |
| 17:55.16 | Maloeran | Gah. |
| 17:56.42 | Maloeran | 50% slower, nothing of the scale we saw here |
| 18:07.06 | ``Erik | interesting, a significant portion of time looks like it's attributed to handling l2 cache misses |
| 18:09.45 | ``Erik | ahhhhhhhhh |
| 18:10.05 | ``Erik | mmAlloc() cooks up time in a kernel function called "Zero Fill" |
| 18:10.15 | Maloeran | AHH!! |
| 18:10.26 | ``Erik | which'd explain cache thrashing |
| 18:10.35 | Maloeran | _That_ is the reason, I'm allocating a whole bunch and freeing, sometimes without even using the chunks |
| 18:10.59 | ``Erik | learn somethin' new every day |
| 18:11.22 | Maloeran | Can you fix that? |
| 18:11.30 | Maloeran | Can you make malloc() behave in a sane manner? |
| 18:12.32 | ``Erik | googling for that now... and 'sane' is a phrase that can be argued against... :D quit abusing malloc? *duck* |
| 18:12.37 | ``Erik | http://lists.apple.com/archives/Darwin-development/2003/Apr/msg00217.html mentions some |
| 18:12.46 | Maloeran | Maybe there are multiple memory managers on OSX, as there are multiple threading libraries on fbsd ( and the default one is horrible too ) |
| 18:13.17 | Maloeran | Why would an OS ever memset() malloc'ed chunks? I can do that myself I need it, that's absurd |
| 18:13.28 | Maloeran | if* I need it |
| 18:13.49 | Maloeran | The segfault mentioned earlier was fixed too |
| 18:13.56 | ``Erik | http://lists.apple.com/archives/Darwin-development/2003/Apr/msg00210.html answers that, heh |
| 18:14.01 | ``Erik | security mechanism |
| 18:14.11 | Maloeran | Absurd. |
| 18:16.01 | ``Erik | http://developer.apple.com/tools/performance/optimizingwithsystemtrace.html and search for "zero-fill" |
| 18:17.25 | Maloeran | So I have to write my own full-featured memory manager because the OSX manager is too incompetent to care about performance |
| 18:17.48 | ``Erik | well, the converse argument is that the linux memory manager is too incompetent to care about security |
| 18:17.52 | Maloeran | That also explains why even the m1a2 was taking so long to prep on your laptops, it's supposed to be a few seconds |
| 18:18.26 | Maloeran | If a process puts sensitive stuff in RAM, it's the duty of _that_ process to mlock() the memory and clear it accordingly |
| 18:18.44 | Maloeran | Don't slow down the whole OS for a few chunks of ram that might possibly contain something sensitive |
| 18:19.09 | ``Erik | heh |
| 18:19.22 | ``Erik | in the land of incompetent coders... :) |
| 18:19.32 | Maloeran | mlock() and related functions exist for a good reason |
| 18:19.45 | ``Erik | yes, as do calloc(), etc... |
| 18:20.33 | Maloeran | Grah, this is so absurd |
| 18:20.58 | ``Erik | freebsd does the same thing, apparently |
| 18:21.04 | ``Erik | http://kerneltrap.org/node/72 |
| 18:22.55 | Maloeran | Seriously, this makes no sense at all. There are POSIX functions to take care of storing sensitive information in RAM |
| 18:23.16 | ``Erik | ... and if people USED them, then os's wouldn't have to step up and cover |
| 18:24.06 | Maloeran | This is a _very_ bad fix. Fix the software, don't hack a slow and patchy solution in the OS |
| 18:24.46 | ``Erik | heh, and it seems to be a hot issue in linux kernel development right now |
| 18:25.23 | ``Erik | (and if the software is designed to break the os? malicious code exists :/ ) |
| 18:26.08 | Maloeran | Okay. Do you have a full-featured and complete memory manager in BRL-CAD already? |
| 18:26.22 | ``Erik | http://lists.apple.com/archives/darwin-development/2003/Apr/msg00227.html has more |
| 18:26.29 | ``Erik | yeah, in libbu |
| 18:26.31 | ``Erik | um |
| 18:26.52 | ``Erik | but the behavior of "lots of allocs and deallocs" is gonna be slow if it's passed to the os... |
| 18:27.02 | Maloeran | Seriously, the OS could bzero() pages as the heap grows, but OSX seems to clear even reused pages ; malloc'ing without expanding the heap |
| 18:27.30 | Maloeran | Normally, malloc() only reaches the OS if the heap has to be extended. Otherwise, it stays entirely in user space |
| 18:27.34 | Maloeran | On a sane and decent OS anyway |
| 18:28.02 | ``Erik | erm, ... vm and wm are different, dude |
| 18:29.13 | ``Erik | (heh, and this is exactly where compacting gc's shine) |
| 18:29.45 | Maloeran | Checking libbu, I only saw red-black tree stuff there last time |
| 18:30.34 | ``Erik | I'm pretty sure the libbu memory management is just portable passthrough stuff, though |
| 18:31.36 | ``Erik | stupid headache *grr* |
| 18:32.00 | Maloeran | I really don't feel like writing a memory manager to handle broken malloc() implementations, but if I must.. |
| 18:32.17 | ``Erik | <-- thinks it's less broken than linux's :( |
| 18:33.03 | Maloeran | Surely you agree that if software deals with sensitive information, there are robust and _efficient_ mechanisms to deal with this, instead of having every malloc() call being zero'ed? |
| 18:33.28 | ``Erik | given the quality of 95% of coders writing 'real' applications, no. I don't. |
| 18:33.34 | Maloeran | malloc()'ed memory is not supposed to be cleared, it's supposed to be fast |
| 18:34.07 | ``Erik | hm, I've never thought of malloc as a fast operation *shrug* if you want fast, allocate a big honkin' heap and do it yourself in that... |
| 18:34.33 | Maloeran | Clearing the new pages as the heap grows would have made a certain sense, but for every malloc call, this is highly absurd |
| 18:34.48 | ``Erik | ... |
| 18:35.00 | ``Erik | you cannot make that statement because of how mmu's work. |
| 18:35.21 | ``Erik | you can free 4k, and then "immediately" alloc 4k, and you are not guaranteed that you got the same 4k back |
| 18:35.33 | ``Erik | you could've gotten one of my pages, or a completely different page altogether |
| 18:35.50 | Maloeran | Of course not, but it's likely to be within the heap for the process address space |
| 18:36.11 | ``Erik | ... for the process address space, yes... but not the wired address space |
| 18:36.33 | ``Erik | physical memory doesn't line up to process memory, that's what the mmu does... |
| 18:36.39 | Maloeran | The heap never shrinks, the OS doesn't know that the page is now unused |
| 18:36.59 | ``Erik | erm, which heap? heh |
| 18:37.27 | Maloeran | The heap of the process ; the memory manager is likely to reuse that page and you'll get what you had previously stored there, without ever making a syscall |
| 18:37.28 | ``Erik | free() is to mark a heap as unused so it can be culled... |
| 18:37.43 | ``Erik | and it disassociates it from the wired page |
| 18:37.45 | Maloeran | So the heap can shrink on OSX? It never does on Linux |
| 18:38.56 | Maloeran | That seems to be a logical explanation as to why every malloc() call is zero'ed |
| 18:40.27 | ``Erik | the process heap should be able to shrink on every os :/ |
| 18:40.46 | ``Erik | now the memory address of new allocations is up in the air, but *shrug* |
| 18:42.04 | Maloeran | You can't shrink the heap on Linux. If it grows high and shrinks, unused high pages will eventually be put on swap to make room for other processes, and just forgotten |
| 18:42.15 | Maloeran | That design has its flaws too ( the swapping ) |
| 18:42.17 | *** join/#brlcad cadguy (n=butler@bz.bzflag.bz) | |
| 18:42.26 | ``Erik | heh, and eventually oom |
| 18:42.56 | cadguy | Yo! How is everyone? |
| 18:43.00 | ``Erik | (might be why I've seen ugly oom's on linux, its malloc is broken... O:-) ) |
| 18:43.18 | Maloeran | Good afternoon Lee |
| 18:43.35 | ``Erik | email is sent, lee... subj "Sql" |
| 18:43.36 | Maloeran | BSD's malloc() seems less broken than OSX still, it clears new pages but not the content of every malloc() call |
| 18:43.44 | cadguy | Howdy Maloeran |
| 18:44.06 | Maloeran | Just having a long debate with Erik about why the raytracer's prep is so terribly slow on OSX |
| 18:44.09 | ``Erik | osX only zerofills when the freshly allocated page is touched, as far as I can tell |
| 18:45.26 | Maloeran | Now reading libbu's memory manager, I suppose that's the solution to work around inefficient malloc implementations |
| 18:45.27 | cadguy | Hmm. How many pages are we allocating? Lots? |
| 18:45.40 | Maloeran | Lots of pages, which are often just unused and freed |
| 18:45.57 | Maloeran | malloc() is quite fast on Linux as pages are never cleared |
| 18:45.58 | cadguy | Yes, that's a notorious performance killer. |
| 18:46.09 | cadguy | That's a security issue. |
| 18:46.45 | Maloeran | When dealing with sensitive information, processes can mlock() the memory, there are POSIX functions to take care of that |
| 18:47.25 | Maloeran | But as Erik argued, a dirty and inefficient fix at the OS level seems to be required due to the amount of bad software out there... *shakes head* |
| 18:47.42 | cadguy | The usual technique is to keep a buffer pool if you want to alloc/free a lot to keep the code easy. Then allocate through your own buffer pool. |
| 18:48.31 | ``Erik | *nod* allocate a slew of pages, keep 'free' and 'used' linked lists, when one is freed or allocated, just change which list it lives in |
| 18:48.39 | cadguy | Yea. Lots of lame code mucking around with privileges. Remember mlock() didn't appear until 4.4BSD. |
| 18:48.41 | Maloeran | Right. I'm checking libbu, but I won't hide that I'm used to dealing with an efficient malloc implementation |
| 18:49.13 | ``Erik | if you allocate with nothing in the free list, malloc more... if you're worried about memory consumption, free() some out of the free list when it reaches a threshold |
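The buffer-pool technique described by cadguy and Erik can be sketched as below. This is a minimal single-threaded illustration under stated assumptions, not libbu's or mm.c's implementation: the names `Pool`, `pool_alloc`, `pool_release` and `pool_destroy` are hypothetical, all blocks share one size, and only the free list is tracked (the link is stored inside the freed block itself), where Erik's description also keeps a 'used' list.

```c
#include <stddef.h>
#include <stdlib.h>

/* A freed block is reused as a list node, so no extra memory is needed. */
typedef struct PoolBlock {
    struct PoolBlock *next;
} PoolBlock;

typedef struct {
    PoolBlock *free_list;  /* blocks ready for reuse */
    size_t block_size;     /* usable bytes per block */
    size_t free_count;     /* blocks currently on the free list */
    size_t max_free;       /* above this, blocks go back to free() */
} Pool;

void pool_init(Pool *p, size_t block_size, size_t max_free)
{
    p->free_list = NULL;
    /* a block must be able to hold the list link once it is freed */
    p->block_size = block_size < sizeof(PoolBlock) ? sizeof(PoolBlock)
                                                   : block_size;
    p->free_count = 0;
    p->max_free = max_free;
}

/* Reuse a pooled block if one exists; otherwise fall back to malloc(). */
void *pool_alloc(Pool *p)
{
    if (p->free_list != NULL) {
        PoolBlock *b = p->free_list;
        p->free_list = b->next;
        p->free_count--;
        return b;
    }
    return malloc(p->block_size);
}

/* Park the block on the free list, or hand it back to the system
 * once the list has reached its threshold. */
void pool_release(Pool *p, void *ptr)
{
    PoolBlock *b = ptr;
    if (ptr == NULL)
        return;
    if (p->free_count >= p->max_free) {
        free(ptr);
        return;
    }
    b->next = p->free_list;
    p->free_list = b;
    p->free_count++;
}

/* Free every pooled block; blocks still held by callers are untouched. */
void pool_destroy(Pool *p)
{
    while (p->free_list != NULL) {
        PoolBlock *b = p->free_list;
        p->free_list = b->next;
        free(b);
    }
    p->free_count = 0;
}
```

Because a released block goes straight back to the caller on the next `pool_alloc`, repeated alloc/free cycles never reach the system allocator, which is the point cadguy makes about avoiding system calls. Reused blocks keep their old contents, so anything sensitive has to be cleared by the caller.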
| 18:49.28 | ``Erik | s/efficient/insecure/ :) |
| 18:49.51 | Maloeran | Yes yes, I got that to deal with many small chunks. I haven't got a full memory manager to deal with chunks of all sizes and shapes |
| 18:49.52 | cadguy | No reason to hide. Just be aware that there are space/time/security tradeoffs that different OS's make. |
| 18:50.03 | ``Erik | my bike goes 20kph and stays together, yours goes 30 and kicks the wheels off every 50km |
| 18:50.05 | ``Erik | :D |
| 18:50.52 | Maloeran | :) Eh well, time to write a memory manager then! |
| 18:51.24 | ``Erik | <-- thought that's what mm was supposed to be o.O :) |
| 18:51.56 | Maloeran | It's not a full-blown memory manager, it has efficient handling of packed tiny chunks, balanced trees, etc. |
| 18:52.47 | Maloeran | since Linux's malloc() always performed decently for management of medium to large sized chunks |
| 18:53.26 | cadguy | In general, any time you can avoid a system call, it is worth doing. |
| 18:54.46 | Maloeran | On Linux, free() never shrinks the heap, so malloc() will always remain in user-space unless the heap has to grow. I realize it's quite different on OSX |
| 18:55.36 | cadguy | And different on solaris and other Unix's |
| 19:07.54 | Maloeran | That model really is a challenge for any acceleration structure, the planned second 'prep' pass should improve things a bit... but mostly, ray bundles will |
| 19:08.05 | Maloeran | That and threads |
| 19:09.48 | ``Erik | oohhhhh, rfTraceRays() calls malloc, too |
| 19:11.58 | Maloeran | Only if there are no already allocated 'job' struct in the list, nothing to worry about there |
| 19:15.13 | ``Erik | that dstdist=mathPlanePoint() line (634) is a major contributor to L2 cache misses (27.5%) |
| 19:15.39 | ``Erik | second being line 582 "if(src[linkflags&RF_NODE_AXIS_MASK]<NODE(root)->plane)" at 6.6% |
| 19:16.56 | Maloeran | The prototype had prefetch instructions for caching triangles before the actual tests, that should help |
| 19:17.19 | ``Erik | memory bandwidth looks like, um, around 200-300 MB/s read and 20-30MB/s write |
| 19:17.32 | Maloeran | You know, I really like your profiler :) |
| 19:17.50 | ``Erik | heh, me too, this thing is gnarly |
| 19:18.06 | cadguy | You really should try to pick it up. |
| 19:18.39 | cadguy | Want me to talk with Mark? |
| 19:19.30 | Maloeran | Thanks, just give me 33 hours to receive my first real pay check from Survice assuming the 30 days delay after the end of the month is respected |
| 19:20.19 | ``Erik | you got your travel expenses and per diem all sorted out, correct? |
| 19:21.07 | CIA-9 | BRL-CAD: 03lbutler * 10brlcad/sh/gforge.sh: script for querying a gforge site |
| 19:21.34 | Maloeran | I had no per diem expenses in August, but sure |
| 19:23.24 | *** join/#brlcad IriX64 (n=IriX64@bas3-sudbury98-1168052970.dsl.bell.ca) | |
| 19:23.38 | ``Erik | dude, if you ever do work related travel, the employer should set everything up and take care of all the (reasonable) expenses... |
| 19:24.34 | ``Erik | it's chump change to them, a no brainer investment... |
| 19:27.16 | Maloeran | Ah don't worry, I'll be quite fine. The 30 days delay for a monthly pay is just a bit annoying, after 2-3 months of unpaid vacation anyway ;) |
| 19:27.41 | ``Erik | rtiBatchNsCallback() is your flat shadow-less shader? |
| 19:27.53 | Maloeran | Somewhat, yes |
| 20:57.14 | CIA-9 | BRL-CAD: 03lbutler * 10brlcad/sh/gforge.sh: make script adaptable to host |
| 21:13.00 | Maloeran | Erik, before I write a bunch of code, do you have Hoard handy to see if the memory manager does a better job? |
| 21:13.25 | Maloeran | It might clear pages the BSD way even on OSX |
| 22:43.50 | ``Erik | hoard? nope |
| 23:16.54 | Maloeran | Oh well. Everything but sectors and steps is now allocated by sliced blocks, these chunks of variable size will have their own personal little memory manager |