| 01:01.00 | Maloeran | I know I have whined about this before, but the C parser of Doxygen sure is broken on many points |
| 01:02.07 | Twingy | but Lee sure seems to love it |
| 01:02.18 | Maloeran | Arrays of function pointers are functions, struct variables with gnu99 __attribute__ are "class methods" ( hello, this is C ) |
| 01:03.14 | Maloeran | It's nice on some aspects, but the parser makes some gross mistakes |
| 01:07.43 | Maloeran | Whenever the data type of something is complex, it thinks it's a function or a method. Oh, and I had Doxygen tell me that for(;;) and if() were undocumented "functions" before |
| 01:11.06 | ``Erik | nice |
| 01:11.21 | ``Erik | if only for() and if() WERE functions in C |
| 01:11.22 | ``Erik | *sigh* |
| 01:12.40 | Maloeran | Lisp is nice, but I'm not against having some fundamental language constructs that are not functions |
| 01:35.56 | ``Erik | http://www.calarts.edu/~jwhite/gbj/SeeHim.html |
| 01:36.10 | ``Erik | lithp and thcheme have language constructs that aren't functions |
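For reference, the two kinds of declaration Maloeran describes at 01:02 look roughly like the following. This is a minimal sketch, assuming GCC or Clang for the `__attribute__` syntax; every name in it (`add`, `sub`, `ops`, `vec4`, `origin`, `apply_all`) is hypothetical and is not taken from the code being documented. Both `ops` and `origin` are ordinary objects in C, not functions or methods.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, used only to populate the table below. */
static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* An array of two function pointers: an object, not a function. */
static int (*const ops[2])(int, int) = { add, sub };

/* Hypothetical struct for illustration. */
struct vec4 { float x, y, z, w; };

/* A struct variable carrying a GNU attribute: a plain variable, not a method. */
static struct vec4 origin __attribute__((aligned(16))) = { 0.0f, 0.0f, 0.0f, 1.0f };

/* Calls every entry of the function-pointer table and sums the results. */
int apply_all(int a, int b)
{
    int total = 0;
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        total += ops[i](a, b);
    }
    assert(origin.w == 1.0f); /* the attribute does not change the value */
    return total;
}
```

Calling `apply_all(5, 3)` from a test program runs both table entries, which is one way to confirm that the compiler treats `ops` as data.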
| 07:08.23 | *** join/#brlcad clock_ (i=clock@84-72-60-30.dclient.hispeed.ch) | |
| 08:15.17 | *** join/#brlcad clock_ (n=clock@zux221-122-143.adsl.green.ch) | |
| 09:21.10 | *** join/#brlcad clock_ (n=clock@zux221-122-143.adsl.green.ch) | |
| 10:49.29 | *** join/#brlcad docelic (i=docelic@ri01-092.dialin.iskon.hr) | |
| 14:51.31 | *** join/#brlcad docelic (i=docelic@ri02-084.dialin.iskon.hr) | |
| 16:51.15 | brlcad | ``Erik: you have any 6.2 discs burnt? |
| 17:18.03 | ``Erik | uhmmmmmmmm, no, 6.2 isn't released yet |
| 17:18.21 | ``Erik | it's rc1 |
| 17:45.45 | *** join/#brlcad ntroutman (n=nathanie@prox.snu.edu) | |
| 18:02.42 | *** join/#brlcad ntroutman (n=nathanie@prox.snu.edu) | |
| 18:36.22 | ``Erik | mal: fixes rendering, yup, but generation is still effed up |
| 18:36.32 | ``Erik | linux 8core amd64... |
| 18:36.33 | ``Erik | linkListAddPair (listhead0=0x2a95d3a668, listhead1=0x2a95bfe290, step0=0x2a95d3a618, step1=0x2a95bfe238, memblock=0x513bf0) at ../../../RF/prepmodel.c:718 |
| 18:36.34 | ``Erik | 718 if( ( linklist1->used == LINKS_PER_LIST ) ) { |
| 18:36.34 | ``Erik | (gdb) print *linklist1 |
| 18:36.34 | ``Erik | Cannot access memory at address 0x0 |
| 18:37.02 | ``Erik | (and fbsd just sits, no crash) |
| 18:40.45 | Maloeran | Okay, every time or sometimes? |
| 18:42.52 | Maloeran | And fbsd freezes constantly or it's less consistent? I'm just trying to get an idea of what's going on |
| 18:43.25 | *** join/#brlcad docelic (i=docelic@ri01-139.dialin.iskon.hr) | |
| 18:44.06 | ``Erik | linux: |
| 18:44.07 | ``Erik | 0x0000002a9585ba41 in jobModelPrepStep (engine=0x50aa20, job=Variable "job" is not available. |
| 18:44.07 | ``Erik | ) at ../../../RF/prepmodel.c:1720 |
| 18:44.07 | ``Erik | 1720 if( steplink->edge[axismin] < plane ) { |
| 18:44.47 | Maloeran | Right okay, not too consistent, though I guess it does crash pretty much all the time |
| 18:45.19 | Maloeran | And always in the same pass, okay |
| 18:46.18 | ``Erik | if there's a valid cache, it's all grand... gettin' ~40fps on both the 4 core fbsd and the 8 core linux (hardcoded at 4 threads, I guess) |
| 18:46.58 | ``Erik | hah, linux: Can't attach LWP -1023750857: No such process |
| 18:47.10 | ``Erik | fbsd seems stuck |
| 18:47.11 | Maloeran | Yes, fix rfdemo.c to increase that count, hard-coded #define |
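One alternative to editing the hard-coded thread count by hand is to ask the OS how many processors are online. The sketch below is an assumption about how that could be done, not what rfdemo.c does: `detect_thread_count` is a hypothetical helper, and `_SC_NPROCESSORS_ONLN` is a widely available `sysconf` extension (Linux, FreeBSD, Solaris) rather than something POSIX guarantees.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Returns the number of processors currently online, or `fallback`
 * when the platform cannot report it. Hypothetical helper.
 */
int detect_thread_count(int fallback)
{
    assert(fallback > 0);
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n > 0) {
        return (int)n;
    }
#endif
    return fallback; /* e.g. the previous hard-coded value */
}
```

Passing the old constant as `fallback` keeps the current behavior on platforms where the query is unavailable.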
| 18:47.21 | Maloeran | No such process, eh? :) |
| 18:48.47 | Maloeran | I guess I now have some information to figure out this bug |
| 18:50.22 | Maloeran | The more backtrace you can throw at me, the easier I think it would be to see what's happening in there |
| 18:51.17 | ``Erik | 0x0000002a9585ad7f in jobModelPrepEvaluate (engine=0xc4f5d489c4f61b90, job=Variable "job" is not available. |
| 18:51.17 | ``Erik | ) at ../../../RF/prepmodel.c:974 |
| 18:51.17 | ``Erik | 974 } while( --a ); |
| 18:51.19 | ``Erik | wonky |
| 18:51.32 | ``Erik | bt on that is a blown stack |
| 18:51.53 | Maloeran | Groovy |
| 18:52.55 | ``Erik | meh, paste.lisp.org seems to be down |
| 18:53.06 | Maloeran | rafb.net/paste of course |
| 18:53.17 | ``Erik | <-- couldn't remember the url |
| 18:56.10 | ``Erik | http://rafb.net/paste/results/kONUFQ20.html |
| 18:56.24 | ``Erik | those seem to be the 4 cases I get on linux 2.6 amd64 |
| 18:56.51 | Maloeran | Great, thank you |
| 18:56.56 | ``Erik | np |
| 18:57.10 | ``Erik | I have a 2 hour meeting starting in 30 minutes... |
| 18:57.51 | Maloeran | :) Have fun! |
| 19:00.16 | ``Erik | heh |
| 19:00.34 | ``Erik | just letting you know in case you have any modifications you want to see run on multi-cache machines |
| 19:01.56 | ``Erik | hrm, I'm going to try to bump the thread count to something a bit more straining... hope that linux doesn't drop the ball on it... I know it's ok on fbsd and solaris |
| 19:02.25 | ``Erik | heh, gdb might be the problem here :) |
| 19:04.07 | ``Erik | goddamn, it's slugging down ugly :( |
| 19:11.06 | ``Erik | http://rafb.net/paste/results/wJKR5372.html <-- new error |
| 19:11.10 | ``Erik | 128 threads |
| 19:11.28 | ``Erik | that the 'job' variable is unavailable on all the errors is... suspicious |
| 19:19.29 | Maloeran | New error, always the same pass, and consistent with a corruption of the lists of reverse links |
| 19:20.06 | Maloeran | 128 threads, I haven't tried that yet :). I think I can figure out what's going on, the problem has been narrowed a bit |
| 20:01.31 | *** join/#brlcad dtidrow (n=dtidrow@c-69-255-182-248.hsd1.va.comcast.net) | |
| 20:16.54 | *** join/#brlcad clock_ (i=clock@84-72-63-88.dclient.hispeed.ch) | |
| 21:09.42 | *** join/#brlcad dtidrow_work (n=dtidrow@host169.objectsciences.com) | |
| 21:14.10 | *** join/#brlcad ntroutman (n=nathanie@prox.snu.edu) | |
| 22:05.43 | *** join/#brlcad Twingy (n=justin@74.92.144.217) | |
| 22:31.01 | brlcad | looks like Maloeran needs some valgrindage |
| 22:31.37 | Maloeran | I can't reproduce the bug, it's of no use ;) |
| 22:31.55 | brlcad | valgrind will still report the memory problems |
| 22:32.36 | brlcad | assuming it's not platform-specific but actually just a problem being masked by linux behavior |
| 22:33.10 | Maloeran | It's a problem caused by multiple cores not sharing the same cache, probably a faulty mutex somewhere |
| 22:33.19 | brlcad | valgrind is very good in what it does, way better than the memory bounds checkers |
| 22:34.36 | Maloeran | Indeed, I'm just skeptical on its ability to catch a bug that requires multiple caches to occur |
| 22:34.52 | Maloeran | Anyway. If I haven't solved this by tomorrow, I'll call a friend to ask him to boot his 4-core desktop into Linux |
| 22:35.53 | brlcad | you've narrowed it down that far for sure, or guessing? :) |
| 22:36.30 | Maloeran | I can never reproduce the bug when it is running on a single core, or dual cores but with shared cache |
| 22:36.59 | brlcad | you mean you can't reproduce the crash |
| 22:38.19 | brlcad | point being that you could run for years on a single-thread single core machine and never see bad behavior, yet there could still be a memory problem (that is simply masked by OS or C lib behavior) |
| 22:38.42 | brlcad | running gentoo? |
| 22:38.52 | Maloeran | Of course so, but that memory problem might not be a "problem" in that case just because threads all share the same cache |
| 22:38.55 | Maloeran | Yes |
| 22:40.58 | dtidrow_work | gonna get nasty cold around here tonight :-( |
| 22:41.00 | Maloeran | An unsafe memory instruction might just happen to have a direct memory operand, so you'll never see a problem if threads are executed one at a time on the same cache |
| 22:41.49 | Maloeran | And no debugger could ever figure that out, unless you trigger the bug with threads running on distinct caches |
| 22:42.22 | brlcad | you're already assuming that's the bug too (and maybe it is, but it's generally not good to assume when hunting stack corruption) |
| 22:42.45 | Maloeran | The stack corruption was caused by something else, probably a consequence of the first bug |
| 22:42.51 | brlcad | easy enough to recompile once, run and get a valgrind report |
| 22:43.11 | brlcad | if it comes up clean, then at least basic operation is sound |
| 22:43.43 | brlcad | if you get lucky, though, it might indicate a problem elsewhere that hadn't even been considered, or thought to be fine |
| 22:44.28 | ``Erik | heh |
| 22:44.33 | Maloeran | Syscall param write() points to uninitialized bytes in XOpenDisplay(). Eheh, okay, that was unexpected |
| 22:57.16 | Maloeran | ( It's not done prep'ing, I think I'll get results before tomorrow ) |
| 23:10.16 | *** join/#brlcad ntroutman (n=nathanie@prox.snu.edu) | |
| 23:11.30 | Maloeran | No error found by Valgrind, unfortunately |
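The "faulty mutex" theory from 22:33 can be illustrated with a small sketch. It is not the prepmodel.c code: `struct link_list`, `worker`, `run_workers` and both constants are hypothetical, and only the `used` field name echoes the gdb output earlier in the log. The point is that `used++` is a read-modify-write; if the lock around it were missing or did not cover every writer, threads running on separate cores could lose updates even though a single-core run looks fine.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4          /* assumed value, for illustration only */
#define ADDS_PER_THREAD 10000  /* assumed value, for illustration only */

/* Hypothetical list header; not the actual layout used in prepmodel.c. */
struct link_list {
    pthread_mutex_t lock;
    size_t used;
};

static struct link_list shared = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Each worker bumps the shared counter, holding the lock for every update. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ADDS_PER_THREAD; i++) {
        pthread_mutex_lock(&shared.lock);
        shared.used++; /* read-modify-write: racy without the lock */
        pthread_mutex_unlock(&shared.lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter. */
size_t run_workers(void)
{
    pthread_t threads[NUM_THREADS];

    shared.used = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return shared.used;
}
```

With the lock in place the result is always `NUM_THREADS * ADDS_PER_THREAD`. Removing the lock and unlock calls turns this into a data race whose symptoms depend on how the threads are scheduled across cores, which fits a crash that only shows up on some machines.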
| 23:19.59 | ``Erik | hum |
| 23:20.16 | ``Erik | what about grabbing bochs, setting it up as a true smp and grabbing a leenewx smp disk image |
| 23:20.19 | ``Erik | and see if it can crash in that? |
| 23:20.32 | ``Erik | (won't be fast, but it might be a way to reproduce the error) |
| 23:49.04 | *** join/#brlcad ntroutman_ (n=nathanie@prox.snu.edu) | |