IRC log for #brlcad on 20190325

`00:02.51`	`*** join/#brlcad merzo (~merzo@43-61-133-95.pool.ukrtel.net)`
`00:42.43`	`*** join/#brlcad merzo (~merzo@43-61-133-95.pool.ukrtel.net)`
`00:57.09`	`*** join/#brlcad teepee (~teepee@unaffiliated/teepee)`
`01:04.00`	`*** join/#brlcad LordOfBikes (~armin@dslb-088-065-188-154.088.065.pools.vodafone-ip.de)`
`02:25.27`	`*** join/#brlcad kintel (~textual@unaffiliated/kintel)`
`07:10.14`	`brlcad`	`getting closer, but going to have to continue this debugging in a few hours with fresh eyes`
`07:28.16`	`*** join/#brlcad KimK (~Kim__@2001:579:d00c:600:4a5b:39ff:fe0b:57d2)`
`08:44.51`	`*** join/#brlcad hightower2 (~hightower@141-210.dsl.iskon.hr)`
`08:45.25`	`*** join/#brlcad hightower2 (~hightower@unaffiliated/hightower2)`
`08:57.55`	`*** join/#brlcad merzo (~merzo@43-61-133-95.pool.ukrtel.net)`
`09:48.37`	`*** join/#brlcad teepee_ (~teepee@unaffiliated/teepee)`
`10:39.26`	`*** join/#brlcad merzo (~merzo@195.20.130.10)`
`11:01.19`	`*** join/#brlcad teepee- (bc5c2133@gateway/web/freenode/ip.188.92.33.51)`
`13:38.21`	`*** join/#brlcad kintel (~textual@unaffiliated/kintel)`
`14:14.09`	`*** join/#brlcad merzo (~merzo@195.20.130.10)`
`15:11.33`	`*** join/#brlcad kintel (~textual@unaffiliated/kintel)`
`15:34.26`	`starseeker`	`brlcad: Don't know if they'll be any use for debugging, but I'm trying to get cache tests set up that will ensure things stay working once we hammer out the last issue(s)`
`15:36.35`	`starseeker`	`is there any way to make an attempt to bu_free a null pointer fatal? A quick look in the code suggests there isn't... Could we set up an environment variable or something we could set for testing purposes to make bu_free exit on attempt to free null?`
`15:37.12`	`brlcad`	`I noticed the test, sounds like a good plan`
`15:37.59`	`brlcad`	`the usual way to catch that would be a sanity macro before the free call`
`15:38.51`	`starseeker`	`if we knew where to put it...`
`15:39.44`	`starseeker`	`what I'm after is that spewage of bu_free message we got turning into a fatal error, because in a lot of cases things will "keep working" after that happens`
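
A minimal sketch of the idea discussed above: a free wrapper that normally just reports a NULL pointer, but aborts when an environment variable is set for testing. `checked_free` and the variable name `FREE_NULL_FATAL` are hypothetical names chosen for illustration. This is not BRL-CAD's `bu_free`, and libbu is not claimed to offer such a switch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper for illustration only; not BRL-CAD's bu_free.
 * Returns 0 after freeing, -1 if asked to free NULL.
 * If the (assumed) environment variable FREE_NULL_FATAL is set,
 * a NULL free aborts instead of returning, so a test run fails loudly. */
static int checked_free(void *ptr, const char *label)
{
    if (ptr == NULL) {
        fprintf(stderr, "checked_free: NULL pointer (%s)\n", label);
        if (getenv("FREE_NULL_FATAL") != NULL)
            abort();
        return -1;
    }
    free(ptr);
    return 0;
}
```

A test harness could export the variable before running, turning the "keeps working" case into a hard failure without changing behavior for normal runs.
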
`15:39.45`	`brlcad`	`fwiw, i'm debugging a db_close() failure. there's some memory management issue when two processes attempt to cache at the same time`
`15:40.06`	`brlcad`	`i think that's the same bug`
`15:40.27`	`starseeker`	`brlcad: cool. Hope I didn't stomp on anything - I figured I'd work on the tests this morning after I saw you were digging into the main code`
`15:40.45`	`brlcad`	`easily reproduced because it only happens the first time an object is created and another process tries to create it too`
`15:40.53`	`brlcad`	`working on tests is golden`
`15:41.13`	`brlcad`	`I'm close, it feels like something simple`
`15:41.46`	`starseeker`	`so you're seeing an actual multi-process issue (e.g. two different programs at the same time), or just multiple threads?`
`15:42.32`	`brlcad`	`in theory, it can happen with two threads or processes -- it's whenever two try to cache at the same time.`
`15:42.42`	`starseeker`	`nods`
`15:42.52`	`brlcad`	`whoever gets there second ends up with bad dbip book-keeping reliably`
`15:43.27`	`Stragus`	`Multiple processes playing in the same files, without file locking? Neat, a bit tricky too`
`15:43.29`	`brlcad`	`I'm probably adding the wrong dbip to the hash or something stupid`
`15:43.54`	`starseeker`	`Stragus: an efficient way to go bald, so far`
`15:44.17`	`starseeker`	`Stragus: locking is fine, if we need to - we're just missing a guard somewhere`
`15:44.24`	`brlcad`	`Stragus: yeah, that part actually works alright -- it's when they go to clean up their memory ... one of them stashed a bad pointer`
`15:45.00`	`brlcad`	`I don't think this is a guard issue, there's no indication -- it seems like a straight up book-keeping bug`
`15:45.14`	`starseeker`	`ah, k`
`15:45.24`	`brlcad`	`could be, of course, but so far it's straight up repeatable, not raced`
`15:46.09`	`starseeker`	`considers whether a raced bug might be hiding behind the current straight up failure, shudders, and goes back to test writing`
`15:47.38`	`Stragus`	`So it's just memory and unrelated to file locking/access... If you get desperate and want to try something new, I wrote a LD_PRELOAD memory debugger: putting each allocation between pages that core dump on access, tracking all memory allocations and their full backtraces (handy to trace memory leaks, etc.), and so on`
`15:47.57`	`Stragus`	`I actually use it all the time, it beats Valgrind for me`
`15:48.16`	`starseeker`	`I probably shouldn't be doing two rtips at once in the same program... I don't know that that is actually intended to work...`
`15:48.19`	`starseeker`	`Stragus: cool`
`15:48.24`	`starseeker`	`is that up on your site?`
`15:49.54`	`Stragus`	`I never really shared it anywhere. I wrote that once out of desperation to track a bug`
`15:51.16`	`starseeker`	`Stragus: you should send it to the Valgrind people and have them build it in. --totally-desperate or some such option ;-)`
`15:52.07`	`Stragus`	`Eheh. Technically Valgrind is fancier, but #1 Valgrind is too slow for many uses #2 I want to core dump INSTANTLY when I access where I shouldn't, not some time later "when I use the value" as decided by Valgrind`
`15:52.39`	`starseeker`	`nods`
`15:52.40`	`Stragus`	`Ah, and #3 I like getting a detailed list of all memory allocations at any time while the code is running`
`15:53.01`	`Stragus`	`Like this: http://www.rayforce.net/mmdebug.log`
`15:54.51`	`starseeker`	`nice`
`16:04.04`	`brlcad`	`think I just found the bug`
`16:04.18`	`Stragus`	`cheers for bug-crushing brlcad`
`16:05.11`	`brlcad`	`just a stray db_close() where we shouldn't be closing anything`
`16:06.15`	`Stragus`	`Sounds like a typical double-free. I thought you would a debugging mode #define for bu_alloc and bu_free (hence the wrappers), catching that stuff and more`
`16:06.30`	`Stragus`	`you would +have`
`16:06.33`	`brlcad`	`starseeker: simultaneous rtips should work just fine...`
`16:10.37`	`brlcad`	`Stragus: a bit more complicated. this isn't allocation-related, it's a handle to reference counted memory mapped files that are stored in a hash -- code pulled the handle from the hash to see if it was there... closed it (mind you it's still in the hash), then went to use it again. that was all good and fine until it came time to clean up and shut down, and a bit of naive code simply iterated over the`
`16:10.42`	`brlcad`	`hash and closed everything (because it should only be there if it's open)`
`16:11.14`	`brlcad`	`could've caught that it was already closed, but that would have just masked the mistake. just took a bit to find where it was getting closed prematurely`
`16:12.07`	`Stragus`	`nods`
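
The failure mode brlcad describes can be sketched in a few lines: a handle stays registered in a table, a stray close drops its last reference, and the shutdown pass that closes every table entry then hits an already-closed handle. Every name here (`struct handle`, `registry_add`, `registry_find`, `handle_close`, `registry_close_all`) is hypothetical, and a fixed array stands in for the hash. This is not the BRL-CAD code, only an illustration of the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLES 8

/* Hypothetical stand-in for a reference-counted mapped-file handle. */
struct handle {
    const char *name;
    int refs;               /* outstanding opens */
};

/* A fixed array stands in for the hash of open handles. */
static struct handle *table[MAX_HANDLES];

/* Register a handle; the table owns one reference. */
static int registry_add(struct handle *h)
{
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (table[i] == NULL) {
            table[i] = h;
            h->refs++;
            return 0;
        }
    }
    return -1;              /* table full */
}

/* Look up by name; does NOT take a reference. */
static struct handle *registry_find(const char *name)
{
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (table[i] != NULL && strcmp(table[i]->name, name) == 0)
            return table[i];
    }
    return NULL;
}

/* Drop one reference; -1 means the handle was already closed. */
static int handle_close(struct handle *h)
{
    if (h->refs <= 0)
        return -1;
    h->refs--;
    return 0;
}

/* Shutdown pass: close every entry, assuming "in the table" means
 * "open". Returns how many entries turned out to be closed already. */
static int registry_close_all(void)
{
    int already_closed = 0;
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (table[i] == NULL)
            continue;
        if (handle_close(table[i]) != 0)
            already_closed++;
        table[i] = NULL;
    }
    return already_closed;
}
```

Calling `handle_close(registry_find("obj"))` on a registered handle reproduces the stray close: nothing fails at that point, and the error only surfaces when `registry_close_all()` runs at shutdown, which fits why the premature close took a while to find.
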
`16:20.45`	`brlcad`	`Stragus: sounds like you implemented _FORTIFY_SOURCE=2`
`16:21.10`	`brlcad`	`mind you, with some fancy stack printing instead of just detect and abort`
`16:24.47`	`Stragus`	`I thought _FORTIFY_SOURCE was only for glibc calls?`
`16:25.09`	`brlcad`	`forget which OS, but there was one out there (maybe openbsd, solaris, I forget) for a while whose libc had all allocations set up to intentionally incur a fault. some of that carried over to Mac with their libc as well.`
`16:25.18`	`Stragus`	`I want a core dump the moment I step over a byte I shouldn't, anywhere`
`16:25.26`	`Stragus`	`Cool`
`16:27.21`	`Stragus`	`(technically, my mmdebug has mechanisms to exclude some allocations from the mmap stuff, 12288 bytes of overhead per malloc() can cause trouble)`
`16:28.17`	`brlcad`	`yeah, _FORTIFY_SOURCE tries to do the least expensive and only checks when access is via some call (iirc, they could have gotten more advanced), but the OS-level one was definitely any access. read one byte past a char[12] - boom, segfault.`
`16:29.13`	`Stragus`	`Neat. Yes, I'm not surprised others have done it before, it's very handy`
`16:29.18`	`brlcad`	`there was a lot of commotion back at the time because so many apps wouldn't run when they turned it on`
`16:29.26`	`Stragus`	`Ahah`
`16:31.01`	`Stragus`	`I had to put some exclusion mechanisms for various reasons, for example the NVIDIA GL drivers malloc'ed 3 bytes then later read 4 bytes from that address`
`16:36.53`	`starseeker`	`probably openbsd, that sounds like their style`
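
For readers unfamiliar with the guard-page technique discussed above, here is a rough sketch, assuming a POSIX system with `mmap`, `mprotect` and `MAP_ANONYMOUS`. `guarded_alloc` and `guarded_free` are hypothetical names; this is not Stragus's mmdebug code, and it omits the tracking and backtraces he mentions. Each allocation is placed between two inaccessible pages, with its end flush against the trailing guard so that reading one byte past it faults immediately.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole pages needed to hold size bytes (at least one). */
static size_t data_page_count(size_t size, size_t page)
{
    size_t n = (size + page - 1) / page;
    return n ? n : 1;
}

/* Hypothetical guard-page allocator. Layout:
 *   [guard page][data pages ... block ends here][guard page]
 * The returned pointer is NOT suitably aligned for every type,
 * because the block is pushed up against the trailing guard. */
static void *guarded_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = data_page_count(size, page);
    size_t total = (npages + 2) * page;

    /* Reserve everything with no access, then open up the middle. */
    unsigned char *base = mmap(NULL, total, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if ((void *)base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page, npages * page, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + page + npages * page - size;
}

/* The caller must pass the same size that was given to guarded_alloc. */
static void guarded_free(void *ptr, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = data_page_count(size, page);
    unsigned char *base = (unsigned char *)ptr + size - npages * page - page;
    munmap(base, (npages + 2) * page);
}
```

With 4096-byte pages a small allocation costs three pages in this layout, which is consistent with the 12288-byte overhead figure mentioned in the discussion. A caller that writes past the requested size also faults, which is how an out-of-bounds access would surface at the moment it happens rather than later.
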
`16:43.55`	`brlcad`	`starseeker: just slammed through a bunch of tests including deleting cache mid-processing, read-only, dozens of simultaneous collisions ... so far looking good.`
`19:49.06`	`starseeker`	`brlcad: sweet`
`19:49.33`	`starseeker`	`I've got at least some of the tests in place (not actually shooting the rays yet, but the cache bit is there)`
`19:49.59`	`*** join/#brlcad merzo (~merzo@48-10-132-95.pool.ukrtel.net)`
`19:50.02`	`starseeker`	`I haven't figured out what I'm doing wrong with the rtip yet`
`19:53.00`	`*** join/#brlcad teepee (~teepee@unaffiliated/teepee)`
`20:09.05`	`*** join/#brlcad kintel (~textual@unaffiliated/kintel)`
`20:36.35`	`starseeker`	`brlcad: are you able to run the cache test rt_cache 5 10 successfully?`
`20:51.56`	`*** join/#brlcad merzo (~merzo@48-10-132-95.pool.ukrtel.net)`
`20:52.16`	`starseeker`	`hang on, I'm doing something wrong with mappedfile.c...`
`21:23.01`	`*** join/#brlcad merzo (~merzo@48-10-132-95.pool.ukrtel.net)`
`21:39.27`	`starseeker`	`brlcad: OK, r72749 and r72751 may do it - need to try some real tests and have a go on Windows`
`21:44.47`	`starseeker`	`brlcad: should the rt_cache tests follow through and actually do a shot, or do we not need that level of validation? (I'm going to add some basic sanity checks on number of files present, but setting up the all-up shot validation is a bit more infrastructure...)`
`23:23.38`	`*** join/#brlcad teepee (~teepee@unaffiliated/teepee)`
