IRC log for #brlcad on 20190325

00:02.51 *** join/#brlcad merzo (~merzo@43-61-133-95.pool.ukrtel.net)
00:42.43 *** join/#brlcad merzo (~merzo@43-61-133-95.pool.ukrtel.net)
00:57.09 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
01:04.00 *** join/#brlcad LordOfBikes (~armin@dslb-088-065-188-154.088.065.pools.vodafone-ip.de)
02:25.27 *** join/#brlcad kintel (~textual@unaffiliated/kintel)
07:10.14 brlcad getting closer, but going to have to continue this debugging in a few hours with fresh eyes
07:28.16 *** join/#brlcad KimK (~Kim__@2001:579:d00c:600:4a5b:39ff:fe0b:57d2)
08:44.51 *** join/#brlcad hightower2 (~hightower@141-210.dsl.iskon.hr)
08:45.25 *** join/#brlcad hightower2 (~hightower@unaffiliated/hightower2)
08:57.55 *** join/#brlcad merzo (~merzo@43-61-133-95.pool.ukrtel.net)
09:48.37 *** join/#brlcad teepee_ (~teepee@unaffiliated/teepee)
10:39.26 *** join/#brlcad merzo (~merzo@195.20.130.10)
11:01.19 *** join/#brlcad teepee- (bc5c2133@gateway/web/freenode/ip.188.92.33.51)
13:38.21 *** join/#brlcad kintel (~textual@unaffiliated/kintel)
14:14.09 *** join/#brlcad merzo (~merzo@195.20.130.10)
15:11.33 *** join/#brlcad kintel (~textual@unaffiliated/kintel)
15:34.26 starseeker brlcad: Don't know if they'll be any use for debugging, but I'm trying to get cache tests set up that will ensure things stay working once we hammer out the last issue(s)
15:36.35 starseeker is there any way to make an attempt to bu_free a null pointer fatal? A quick look in the code suggests there isn't... Could we set up an environment variable or something we could set for testing purposes to make bu_free exit on attempt to free null?
15:37.12 brlcad I noticed the test, sounds like a good plan
15:37.59 brlcad the usual way to catch that would be a sanity macro before the free call
15:38.51 starseeker if we knew where to put it...
15:39.44 starseeker what I'm after is that spewage of bu_free message we got turning into a fatal error, because in a lot of cases things will "keep working" after that happens
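
The idea discussed above (making a free of a null pointer fatal when an environment variable is set) could be sketched roughly as follows. This is only an illustration: `checked_free` and the `STRICT_FREE_FATAL` variable are hypothetical names invented here, not part of libbu, and the sketch wraps plain `free()` rather than `bu_free`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical environment variable name; not an existing BRL-CAD setting. */
#define STRICT_FREE_ENV "STRICT_FREE_FATAL"

/* Returns 1 when strict mode is requested via the environment, else 0. */
static int strict_free_enabled(void)
{
    const char *v = getenv(STRICT_FREE_ENV);
    return (v != NULL && v[0] != '\0' && v[0] != '0');
}

/*
 * Hypothetical wrapper around free().
 * Returns 0 after freeing a valid pointer.
 * On NULL it logs a message and returns -1, or aborts in strict mode
 * so a test run fails loudly instead of "continuing to work".
 */
static int checked_free(void *ptr, const char *label)
{
    if (ptr == NULL) {
        fprintf(stderr, "checked_free: NULL pointer (%s)\n", label);
        if (strict_free_enabled())
            abort();
        return -1;
    }
    free(ptr);
    return 0;
}
```

With the variable unset, a null free only logs and returns -1; running the tests with `STRICT_FREE_FATAL=1` in the environment would turn the same event into an abort with a core dump.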
15:39.45 brlcad fwiw, i'm debugging a db_close() failure. there's some memory management issue when two processes attempt to cache at the same time
15:40.06 brlcad i think that's the same bug
15:40.27 starseeker brlcad: cool. Hope I didn't stomp on anything - I figured I'd work on the tests this morning after I saw you were digging into the main code
15:40.45 brlcad easily reproduced because it only happens the first time an object is created and another process tries to create it too
15:40.53 brlcad working on tests is golden
15:41.13 brlcad I'm close, it feels like something simple
15:41.46 starseeker so you're seeing an actual multi-process issue (e.g. two different programs at the same time), or just multiple threads?
15:42.32 brlcad in theory, it can happen with two threads or processes -- it's whenever two try to cache at the same time.
15:42.42 starseeker nods
15:42.52 brlcad whoever gets there second ends up with bad dbip book-keeping reliably
15:43.27 Stragus Multiple processes playing in the same files, without file locking? Neat, a bit tricky too
15:43.29 brlcad I'm probably adding the wrong dbip to the hash or something stupid
15:43.54 starseeker Stragus: an efficient way to go bald, so far
15:44.17 starseeker Stragus: locking is fine, if we need to - we're just missing a guard somewhere
15:44.24 brlcad Stragus: yeah, that part actually works alright -- it's when they go to clean up their memory ... one of them stashed a bad pointer
15:45.00 brlcad I don't think this is a guard issue, there's no indication -- it seems like a straight up book-keeping bug
15:45.14 starseeker ah, k
15:45.24 brlcad could be, of course, but so far it's straight up repeatable, not raced
15:46.09 starseeker considers whether a raced bug might be hiding behind the current straight up failure, shudders, and goes back to test writing
15:47.38 Stragus So it's just memory and unrelated to file locking/access... If you get desperate and want to try something new, I wrote a LD_PRELOAD memory debugger: putting each allocation between pages that core dump on access, tracking all memory allocations and their full backtraces (handy to trace memory leaks, etc.), and so on
15:47.57 Stragus I actually use it all the time, it beats Valgrind for me
15:48.16 starseeker I probably shouldn't be doing two rtips at once in the same program... I don't know that that is actually intended to work...
15:48.19 starseeker Stragus: cool
15:48.24 starseeker is that up on your site?
15:49.54 Stragus I never really shared it anywhere. I wrote that once out of desperation to track a bug
15:51.16 starseeker Stragus: you should send it to the Valgrind people and have them build it in. --totally-desperate or some such option ;-)
15:52.07 Stragus Eheh. Technically Valgrind is fancier, but #1 Valgrind is too slow for many uses #2 I want to core dump INSTANTLY when I access where I shouldn't, not some time later "when I use the value" as decided by Valgrind
15:52.39 starseeker nods
15:52.40 Stragus Ah, and #3 I like getting a detailed list of all memory allocations at any time while the code is running
15:53.01 Stragus Like this: http://www.rayforce.net/mmdebug.log
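
The guard-page technique Stragus describes can be sketched as below. This is not his mmdebug code (which was never published); `guarded_alloc` and `guarded_free` are hypothetical names, and the sketch assumes a POSIX system where `mmap` supports `MAP_ANONYMOUS`. Each allocation gets its own data pages with an inaccessible page on either side, so with 4096-byte pages a small allocation costs three pages, which would be consistent with the 12288 bytes of overhead mentioned later in the log.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical guarded allocator: [guard page][data pages][guard page].
 * The returned block is right-aligned against the trailing guard page,
 * so touching the first byte past the end faults immediately.
 * Caveat: right-alignment means the pointer may not be suitably aligned
 * for every type when size is not a multiple of the alignment.
 */
static void *guarded_alloc(size_t size)
{
    if (size == 0)
        return NULL;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    size_t total = (data_pages + 2) * page;

    /* Reserve everything as inaccessible, then open up the middle. */
    unsigned char *base = mmap(NULL, total, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page, data_pages * page,
                 PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + page + data_pages * page - size;
}

/* Releases a block from guarded_alloc; size must match the allocation. */
static int guarded_free(void *ptr, size_t size)
{
    if (ptr == NULL || size == 0)
        return -1;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    unsigned char *base =
        (unsigned char *)ptr + size - (data_pages + 1) * page;
    return munmap(base, (data_pages + 2) * page);
}
```

A real tool of this kind would interpose on `malloc` and `free` (for example through `LD_PRELOAD`) and record the size and backtrace per allocation; the sketch leaves that out and asks the caller to pass the size back.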
15:54.51 starseeker nice
16:04.04 brlcad think I just found the bug
16:04.18 Stragus cheers for bug-crushing brlcad
16:05.11 brlcad just a stray db_close() where we shouldn't be closing anything
16:06.15 Stragus Sounds like a typical double-free. I thought you would a debugging mode #define for bu_alloc and bu_free (hence the wrappers), catching that stuff and more
16:06.30 Stragus you would +have
16:06.33 brlcad starseeker: simultaneous rtips should work just fine...
16:10.37 brlcad Stragus: a bit more complicated. this isn't allocation-related, it's a handle to reference counted memory mapped files that are stored in a hash -- code pulled the handle from the hash to see if it was there... closed it (mind you it's still in the hash), then went to use it again. that was all good and fine until it came time to clean up and shut down, and a bit of naive code simply iterated over the
16:10.42 brlcad hash and closed everything (because it should only be there if it's open)
16:11.14 brlcad could've caught that it was already closed, but that would have just masked the mistake. just took a bit to find where it was getting closed prematurely
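
The book-keeping pattern brlcad describes (reference-counted handles kept in a hash, with a shutdown pass that closes whatever is still registered) might look like the toy model below. It is not BRL-CAD code: the `handle` struct, the fixed-size `registry` array standing in for the hash, and all function names are hypothetical. The point it tries to show is the invariant behind the bug: a lookup that only checks for presence must not release the handle, and a handle must leave the table at the moment it is really closed.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_HANDLES 16

/* Hypothetical stand-in for a reference-counted mapped-file handle. */
struct handle {
    char name[64];
    int refs; /* number of outstanding users */
};

/* Fixed-size table standing in for the hash. */
static struct handle *registry[MAX_HANDLES];

static int find_slot(const char *name)
{
    for (int i = 0; i < MAX_HANDLES; i++)
        if (registry[i] && strcmp(registry[i]->name, name) == 0)
            return i;
    return -1;
}

/* Presence check only: takes no reference, so the caller must not release. */
static struct handle *handle_peek(const char *name)
{
    int i = find_slot(name);
    return i < 0 ? NULL : registry[i];
}

/* Opens or reuses a handle and takes one reference on it. */
static struct handle *handle_acquire(const char *name)
{
    int i = find_slot(name);
    if (i >= 0) {
        registry[i]->refs++;
        return registry[i];
    }
    for (i = 0; i < MAX_HANDLES; i++) {
        if (!registry[i]) {
            struct handle *h = calloc(1, sizeof *h);
            if (!h)
                return NULL;
            strncpy(h->name, name, sizeof h->name - 1);
            h->refs = 1;
            registry[i] = h;
            return h;
        }
    }
    return NULL; /* table full */
}

/*
 * Drops one reference. On the last one, the entry is removed from the
 * table before it is freed, so shutdown never sees a closed handle.
 * Returns the remaining count, or -1 if h is not registered.
 */
static int handle_release(struct handle *h)
{
    for (int i = 0; h != NULL && i < MAX_HANDLES; i++) {
        if (registry[i] == h) {
            if (--h->refs > 0)
                return h->refs;
            registry[i] = NULL;
            free(h);
            return 0;
        }
    }
    return -1;
}

/* Shutdown pass: closes whatever is still registered; returns the count. */
static int registry_shutdown(void)
{
    int closed = 0;
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (registry[i]) {
            free(registry[i]);
            registry[i] = NULL;
            closed++;
        }
    }
    return closed;
}
```

In this model the failure from the log would correspond to calling a release after `handle_peek` instead of after `handle_acquire`: the count drops too early, and a later user or the shutdown pass ends up touching a handle that has already been closed.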
16:12.07 Stragus nods
16:20.45 brlcad Stragus: sounds like you implemented _FORTIFY_SOURCE=2
16:21.10 brlcad mind you, with some fancy stack printing instead of just detect and abort
16:24.47 Stragus I thought _FORTIFY_SOURCE was only for glibc calls?
16:25.09 brlcad forget which OS, but there was one out there (maybe openbsd, solaris, I forget) for a while whose libc had all allocations set up to intentionally incur a fault. some of that carried over to Mac with their libc as well.
16:25.18 Stragus I want a core dump the moment I step over a byte I shouldn't, anywhere
16:25.26 Stragus Cool
16:27.21 Stragus (technically, my mmdebug has mechanisms to exclude some allocations from the mmap stuff, 12288 bytes of overhead per malloc() can cause trouble)
16:28.17 brlcad yeah, _FORTIFY_SOURCE tries to do the least expensive and only checks when access via some call (iirc, they could have gotten more advanced), but the OS-level one was definitely any access. read one byte past a char[12] - boom, segfault.
16:29.13 Stragus Neat. Yes, I'm not surprised others have done it before, it's very handy
16:29.18 brlcad there was a lot of commotion back at the time because so many apps wouldn't run when they turned it on
16:29.26 Stragus Ahah
16:31.01 Stragus I had to put some exclusion mechanisms for various reasons, for example the NVIDIA GL drivers malloc'ed 3 bytes then later read 4 bytes from that address
16:36.53 starseeker probably openbsd, that sounds like their style
16:43.55 brlcad starseeker: just slammed through a bunch of tests including deleting cache mid-processing, read-only, dozens of simultaneous collisions ... so far looking good.
19:49.06 starseeker brlcad: sweet
19:49.33 starseeker I've got at least some of the tests in place (not actually shooting the rays yet, but the cache bit is there)
19:49.59 *** join/#brlcad merzo (~merzo@48-10-132-95.pool.ukrtel.net)
19:50.02 starseeker I haven't figured out what I'm doing wrong with the rtip yet
19:53.00 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
20:09.05 *** join/#brlcad kintel (~textual@unaffiliated/kintel)
20:36.35 starseeker brlcad: are you able to run the cache test rt_cache 5 10 successfully?
20:51.56 *** join/#brlcad merzo (~merzo@48-10-132-95.pool.ukrtel.net)
20:52.16 starseeker hang on, I'm doing something wrong with mappedfile.c...
21:23.01 *** join/#brlcad merzo (~merzo@48-10-132-95.pool.ukrtel.net)
21:39.27 starseeker brlcad: OK, r72749 and r72751 may do it - need to try some real tests and have a go on Windows
21:44.47 starseeker brlcad: should the rt_cache tests follow through and actually do a shot, or do we not need that level of validation? (I'm going to add some basic sanity checks on number of files present, but setting up the all-up shot validation is a bit more infrastructure...)
23:23.38 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)

Generated by irclog2html.pl Modified by Tim Riker to work with infobot.