recent bugs · brlcad · Zulip Chat Archive

@starseeker did you happen to look at "Primitive select (mouse behavior) causes drawn solids to disappear instead of being highlighted. When hitting escape (return to normal mouse behavior) all solids reappear." ?

starseeker (Jul 21 2020 at 11:33):

That rings a bell with a change Nick made a while back - I'll have to check the history, something about transparency support in MGED. Don't know if it was that specifically, but there was some sort of drawing problem...

Sean (Jul 21 2020 at 20:42):

yeah, I thought it was nick's change, but I couldn't confirm it and there was no NEWS entry. probably a set of commits deep down in my inbox I haven't gotten to reviewing yet.

Sean (Aug 05 2020 at 02:46):

@starseeker: sorry I should have tested the process I/O change more carefully.

Sean (Aug 05 2020 at 02:49):

that is a little unsettling that it broke rt in archer on windows... do you have any more info on why? presumably fileno() is not returning 0/1/2 for stdin/out/err, which should imply something else is seriously wrong.

Sean (Aug 05 2020 at 03:03):

it is mildly concerning that the api is assuming 0/1/2 are in/out/err. not just being pedantic. it's going to be terribly difficult to debug out of context, particularly any code that happens to do perfectly normal pipe operations on 0/1/2. it'll likely result in i/o just not working right mysteriously, things failing inconsistently.
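To make the concern concrete, here is a minimal Python sketch (not BRL-CAD code) of asking each stream for its descriptor rather than hard-coding 0/1/2. The helper names `stream_fd` and `describe_standard_streams` are hypothetical and exist only for illustration.

```python
import sys


def stream_fd(stream):
    """Return the OS-level descriptor behind `stream`, or None if it has none.

    Hypothetical helper: it asks the stream instead of assuming that
    stdin/stdout/stderr are descriptors 0/1/2.
    """
    if stream is None:  # e.g. a GUI process started without a console
        return None
    try:
        return stream.fileno()
    except (AttributeError, OSError, ValueError):
        # In-memory or closed streams have no usable descriptor.
        return None


def describe_standard_streams():
    """Map each standard stream name to its real descriptor (or None)."""
    return {
        "stdin": stream_fd(sys.stdin),
        "stdout": stream_fd(sys.stdout),
        "stderr": stream_fd(sys.stderr),
    }
```

A caller that gets `None` back knows the channel is unavailable and can report that explicitly, instead of silently reading or writing the wrong descriptor.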

Sean (Aug 05 2020 at 03:04):

I wouldn't be surprised if the rt-archer breakage was a reverse assumption elsewhere in the code...

Sean (Aug 05 2020 at 03:10):

on a mildly related note to your commit comment yesterday, how about just jumping to what we talked about last year -- using capnproto for commands to talk back? the protocol could be just simple err/out log messages for now, set up and handled internal to the bu_process api. then you'd only need one descriptor (e.g., stdout) which could transmit both out+err log messages from the process.
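As a rough illustration of carrying both out and err log messages over one descriptor, the sketch below tags each message with its kind and length. It is plain length-prefixed framing in Python, not capnproto and not the bu_process API; the tag values and function names are assumptions made up for the example.

```python
import struct

# Hypothetical one-byte tags for the two log kinds.
KIND_OUT = 1
KIND_ERR = 2

# Tag byte followed by a 4-byte big-endian payload length (5 bytes total).
_HEADER = struct.Struct("!BI")


def write_message(stream, kind, text):
    """Frame one log message and write it to a binary stream."""
    payload = text.encode("utf-8")
    stream.write(_HEADER.pack(kind, len(payload)))
    stream.write(payload)


def read_messages(stream):
    """Yield (kind, text) pairs until the stream ends cleanly.

    Assumes a buffered binary stream whose read(n) returns n bytes
    unless the end of the stream is reached.
    """
    while True:
        header = stream.read(_HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < _HEADER.size:
            raise ValueError("truncated message header")
        kind, length = _HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message payload")
        yield kind, payload.decode("utf-8")
```

The reader can then route each message to the right log by its tag, so the receiving side never has to know which descriptor number the child wrote to.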

starseeker (Aug 05 2020 at 12:44):

I switched to using an enum to specify which channel we're intending to use, so hopefully that will alleviate the issue?
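The idea of naming the channel with an enum instead of passing a raw integer could look roughly like this in Python. This is only a sketch of the technique; `Channel`, `channel_stream`, and `spawn_python` are hypothetical names and do not reflect the actual libbu change.

```python
import enum
import subprocess
import sys


class Channel(enum.Enum):
    """Names the three standard channels instead of using raw 0/1/2."""
    STDIN = "stdin"
    STDOUT = "stdout"
    STDERR = "stderr"


def spawn_python(code):
    """Start a child Python process with all three channels piped."""
    return subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


def channel_stream(proc, channel):
    """Return the pipe object of `proc` that corresponds to `channel`."""
    if not isinstance(channel, Channel):
        # Reject raw integers so callers cannot smuggle in 0/1/2 assumptions.
        raise TypeError("channel must be a Channel member")
    return getattr(proc, channel.value)
```

Because the lookup goes through the process object, the caller never depends on which descriptor numbers the platform happened to assign.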

starseeker (Aug 05 2020 at 12:46):

The rt-archer breakage (when I debugged) was the code testing fileno on Windows never getting an expected value for any of the three inputs (stdin/stdout/stderr) and simply giving up - that put us in an anomalous position because to archer it looked like the subprocess wasn't returning any info at all on any valid channel.

starseeker (Aug 05 2020 at 12:47):

I'd like to try the capnproto approach, but I'm not sure how easy/hard that will be to get working - it's not an area of programming I'm terribly familiar with, so there'd likely be a significant spin-up cost.

starseeker (Aug 05 2020 at 12:50):

I'm trying to excise bu_list from as much as possible of the libged drawing layer, both to make it easier to understand what the various pieces are doing and as a step towards being able to more easily use the bu_magic mechanism to validate things getting passed around as void *. That leads of course to the vlist and solid containers... another learning experience, but one I can no longer avoid if I'm going to really be able to follow what libtclcad/archer are doing about drawing.

starseeker (Aug 05 2020 at 12:51):

The gsh bits hooking up callbacks were simply trying to set up so I could get a simpler-to-debug (i.e. non-Tcl) method of executing subprocess commands for testing.

starseeker (Aug 05 2020 at 13:05):

@Sean I'm sure you've got quite a bit more expertise than I do, so I'd appreciate any insights, but what it's looking like to me so far:

starseeker (Aug 05 2020 at 13:18):

ASIO (https://think-async.com/Asio/asio-1.16.1/doc/asio/overview.html) has support for file descriptors and HANDLEs, but doesn't (so far as I can tell) wrap both mechanisms under one API we could use. (That by the way also appears to be what happens in Tcl, which is why we have the file-descriptor/Tcl_Channel ifdef for the tclcad I/O callbacks. I tried once to consolidate that into just Tcl_Channel, but it didn't work on Linux...)

starseeker (Aug 05 2020 at 13:24):

In some ways I'm actually tempted to see what it would take to extract the Tcl bits for defining these particular events and I/O management into libbu - it's proven to work, and so far I've not come across any simple, stand-alone drop-in alternative...

starseeker (Aug 05 2020 at 13:47):

Hmm... looking again I see the capnproto code does seem to have IPC logic, but I can't tell if they can work without the socket APIs...

starseeker (Aug 05 2020 at 14:00):

OK, so the question is - can gsh be made to work using capnproto for IPC, events and content?

starseeker (Aug 05 2020 at 14:02):

Or does this need to be wired in at the libbu subprocess management level for things like reading and writing?

starseeker (Aug 05 2020 at 14:10):

We need to be able to allow the Tcl_Event loop to manage the callback invocation for MGED/Archer to allow current behavior...

Sean (Aug 05 2020 at 14:27):

That's interesting. It makes sense on Windows because there isn't a standard input descriptor set up by default for GUI apps on Windows unless it's a console application.

Sean (Aug 05 2020 at 14:29):

That probably means something else opened up an input pipe -- and that code wherever it is didn't register it as stdin. Probably is the same bug for out/err too.

Sean (Aug 05 2020 at 14:39):

you're right that capnproto doesn't solve IPC by itself because it turns it into an RPC solution. I actually wouldn't recommend going down that route until you're ready to abandon IPC because of the obvious performance implications. from a technique perspective, though, RPC is quite a bit simpler than dealing with cross-platform IPC.

Sean (Aug 05 2020 at 14:41):

an alternative to capn that might be worth trying instead is zeromq -- it supports in-process (inter-thread communication) and inter-process (IPC, ports) communication in addition to capn-style benefits for the data being exchanged.

Sean (Aug 05 2020 at 14:51):

Used asio a while back and wouldn't recommend it -- it's really meant for async client/server communication (e.g., pkg alternative) aside from it pulling in the boost ecosystem.

Sean (Aug 05 2020 at 14:59):

there are a bunch of ways to do IPC and lots have wrapped it, so I'm sure you can find one that works. some of them may just rely on a particular IPC method and that'll require changing code a little bit. for example, this one (https://github.com/jarikomppa/ipc/) uses shared memory. so instead of using fwrite, calls get changed to sprintf since shared memory works like a malloc'd buffer with both sides of the pipe able to read/write that memory.

Sean (Aug 05 2020 at 15:15):

I'd be cool with that! It really isn't much code that we're talking about. Only issue would be that it's essentially the same problem of consolidating to Tcl_Channel. From what I saw in the code, there's no reason it shouldn't work on linux, so there's almost certainly some other mistaken assumption going on somewhere in the code and until that assumption is found and eliminated, none of these solutions are going to work.

Sean (Aug 05 2020 at 15:17):

It's very much related to the concern I have with using 0/1/2 integers and assuming they are a particular port. I don't know if it's related to this specific problem, but this is the kind of problem that causes. Really hard to debug without unwinding the port from creation to destruction on both sides of the port.

Sean (Aug 05 2020 at 15:19):

No, it'd be the other way around -- you would adapt gsh to capnproto rpc approach instead of events and ipc. It'd look different, but it can work (just not as performant as IPC).

Sean (Aug 05 2020 at 15:21):

and when I say "adapt gsh" that doesn't preclude this belonging in libbu. libbu would ideally provide a call like subprocess_write and subprocess_read or something similar to abstract from the method underneath. The benefit of file descriptors is trying to avoid needing to do that so you can just use read/write or sprintf/sscanf.
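A small Python sketch of that kind of abstraction follows: callers use only `write` and `read`, and the method underneath (an OS pipe here, or an in-memory buffer standing in for shared memory) can be swapped. All class and function names are hypothetical; nothing here is the libbu API.

```python
import abc
import os


class Transport(abc.ABC):
    """Hypothetical read/write interface that hides the IPC method."""

    @abc.abstractmethod
    def write(self, data):
        """Send bytes; return the number of bytes accepted."""

    @abc.abstractmethod
    def read(self, size):
        """Receive up to `size` bytes."""

    @abc.abstractmethod
    def close(self):
        """Release whatever the transport holds."""


class PipeTransport(Transport):
    """Backed by an OS pipe (the file-descriptor method)."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()

    def write(self, data):
        return os.write(self._wfd, data)

    def read(self, size):
        return os.read(self._rfd, size)

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)


class BufferTransport(Transport):
    """Backed by a plain in-memory buffer, standing in for shared memory."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data):
        self._buf += data
        return len(data)

    def read(self, size):
        chunk = bytes(self._buf[:size])
        del self._buf[:size]
        return chunk

    def close(self):
        self._buf.clear()


def echo_through(transport, message):
    """Caller code is identical no matter which transport is used."""
    transport.write(message)
    return transport.read(len(message))
```

The trade-off is the one described above: raw file descriptors let existing read/write code work unchanged, while an interface like this costs a wrapper but lets the transport change later.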

starseeker (Aug 05 2020 at 15:26):

The Tcl refactor is in some ways the most incremental change, assuming it doesn't turn gnarly - even if we eventually opt for another solution, that has the advantage of knowing exactly what it should do if the migration is successful (since it's already working in place.)

starseeker (Aug 05 2020 at 15:30):

This is probably an embarrassing question, but what are the implications of abandoning IPC for RPC? I thought RPC was just one form of IPC?

Sean (Aug 05 2020 at 15:34):

So I know that's a lot and probably talking through too many issues to make sense of it all. In summary, I would recommend 1) trying again to consolidate to Tcl_Channel as whatever is making that not work likely will affect other solutions until the assumption is inadvertently ripped out, 2) try one of the many wrapped options like shared memory or named pipes, starting with zeromq or a simpler header-only one, and finally 3) switching to RPC for libged with Capnproto after you've abandoned hope on IPC. ;)

starseeker (Aug 05 2020 at 15:36):

/me nods - the Tcl_Channel thing bothered me last time, and if I take it far enough apart to digest it for extraction I should be able to run it to ground one way or the other.

starseeker (Aug 05 2020 at 15:37):

Not to mention squashing another WIN32 ifdef... those are getting hard to remove these days...

Sean (Aug 05 2020 at 15:38):

Only embarrassing questions are the ones not asked. IPC uses a specific operating system method for allowing two processes to exchange data. typical examples are files (and file descriptors), named pipes, shared memory, message passing, and sockets. each method has significant implications on how you set up communication and how data is exchanged which is to say it's not generally possible to create a generic IPC interface that uses different methods.

Sean (Aug 05 2020 at 15:39):

you typically find one method that is implemented on different platforms similarly wrapped by a library

Sean (Aug 05 2020 at 15:40):

RPC for example is typically associated with the message passing form of IPC and message passing typically relies on the socket method of IPC data exchange

starseeker (Aug 05 2020 at 15:42):

So capnproto's RPC API won't guarantee a specific method of communication (say, pipes vs. sockets) even if it sometimes uses pipes under the hood?

Sean (Aug 05 2020 at 15:43):

RPC is an IPC method, but it's more strongly associated with sockets and that's what I was referring to when I mentioned "abandoning IPC" .. which really was "abandon file/pipe method of IPC"

Sean (Aug 05 2020 at 15:44):

I don't know for sure, but when I was reading their docs, I didn't see any support for file/pipe-based methods, only socket-based methods

Sean (Aug 05 2020 at 15:47):

again, though, nearly every method is going to require adopting a data exchange method, whether that's reading/writing on pipes/files (this is your closest fit currently) or reading/writing on sockets (this is typical in client+server apps) or reading/writing buffers of memory

starseeker (Aug 05 2020 at 15:49):

Hmm... ZeroMQ is LGPLv3 and looks like they're working towards an MPL2 relicense. OK, that's workable...

Sean (Aug 05 2020 at 15:49):

So yeah, looks like capn can -- "As of version 0.4, the only supported way to communicate between threads is over pipes or socketpairs."

starseeker (Aug 05 2020 at 15:51):

/me would ideally prefer to avoid getting user bug reports that parts of the application can't talk to each other...

Sean (Aug 05 2020 at 15:51):

capn notes in https://capnproto.org/encoding.html that he adopted streaming as the data exchange method (implying file/pipe or socket method, not message passing or shared memory)

Sean (Aug 05 2020 at 15:57):

if he doesn't, that might be a case for something like that header only lib that used a shared memory method -- looks like you can just point capn to it

starseeker (Aug 05 2020 at 21:22):

@Sean one other note about capnproto - if we do adopt it, it bumps our minimum required C++ to C++14. Personally I'm OK with that, but I wanted to raise it in case it's of concern to you.

Sean (Aug 05 2020 at 21:26):

I’m okay with it for this, if it solves the need of communication with her commands. I would probably hesitate elsewhere but capnproto has compelling capability.

Sean (Feb 24 2021 at 17:01):

@starseeker related to earlier discussion, this appears to be a consistent hard crasher: mged> search ./ebm.r /pnts.r

ERROR: bad pointer 0x7ffe677688f8: s/b db_full_path(x64626670), was librt directory(x5551212), file /Users/morrison/brlcad.trunk/src/librt/db_fullpath.c, line 264

Sean (Feb 25 2021 at 17:45):

Cool. Now if only I could figure out why that resulted in zombie processes ... haven't seen them in ages!

starseeker (Apr 19 2021 at 18:57):

Almost looks like they're trying to move a binary from one system to another incompatible system, but if I'm reading that correctly it's the result of building and running on the same machine?

Sean (Apr 20 2021 at 16:10):

Yes, they appear to have compiled it themselves with BRLCAD_BUNDLED_LIBS=ON.. any response we can give them? The archer error looks like a tcl/tk 8.6 error...

starseeker (Apr 20 2021 at 16:26):

/me shakes head - for the OpenGL bit all I could suggest is they try different drivers (maybe the modern Mesa gallium software rasterizing setup) and for the Archer bit I guess my first thought would be to see if bwish can run.

Sean (Apr 20 2021 at 17:59):

From the backtrace, it looks like they're already using Mesa. Being on Power9, their options are probably limited to Mesa or straight X. MGED did work with OpenGL disabled.

The archer failure is a little more concerning... as "hv3::formmanager" is from us. Main reason I can think for an "invalid command name" on it would be because the tclIndex or pkgIndex.tcl didn't get created/loaded.. which might imply something is wrong in our tcl/tk build system.

starseeker (Apr 20 2021 at 18:18):

That's why I was wondering what bwish does - it will pull in most of the packages but not hv3 (which is the web viewer, iirc) so that might help scope what's wrong.

Sean (Jul 21 2021 at 05:48):

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
XOPENGL_glu_LIBRARY (ADVANCED)
    linked by target "dm-ogl" in directory /home/sean/brlcad.main/src/libdm/glx

Is that new? This is a default build on a remote Linux system (that may or may not have glu, but I presume it doesn't and the logic isn't taking that into account correctly to disable X).

starseeker (Jul 21 2021 at 13:13):

It may be a side effect of my refactor a while back to contain the X OpenGL logic. Does 320af5adad96f fix it?

Sean (Jul 21 2021 at 14:31):

checking, I just yanked the glu line and it worked for me, but suspected a better fix.

starkaiser (Jul 21 2021 at 15:38):

Hi everyone! So I have been learning BRL-CAD for the past couple of weeks, but I have encountered a bug in archer that I was unable to solve. I use Linux Mint 20.2. I have downloaded the source from main and built it. All the tests passed with no error, but when using snap grid in archer I get an error. I tried to look at the code, but I don't know tcl and can't solve it. I also tried to build BRL-CAD on FreeBSD 13.0, but I get the same error in archer. Snap grid works in MGED though. Has anyone else got this error? archer_error.png

bch (Jul 21 2021 at 18:55):

I’d do a couple things:
1) find out what $v and $c are, for curiosity's sake; before the first if in the vscale proc, put something like:
set fh [open ~/starkaiser_debug.log a]; puts $fh "$v // $c"; chan close $fh

bch (Jul 21 2021 at 18:58):

This will (not surprisingly) open a log file in your homedir w the contents of v and c. The last line (which will have caused the crash) is the most interesting

starkaiser (Jul 21 2021 at 19:46):

Peek-2021-07-21-22-19.gif The error appears for both edges and faces for all arbs. I have added those lines and the values in the log file are:
0 0 0 // 1.000000
-nan -nan -nan // 1.000000

bch (Jul 21 2021 at 20:34):

The “-nan” gives us a specific clue. I’ve got a general patch candidate I’ll work on later that may be useful. Hopefully this isn’t a show-stopper for you…
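For readers following along, here is a hedged Python sketch of the kind of guard that turns a silently propagating NaN into an early, readable error. It is not the patch mentioned above and does not reproduce the real Tcl `vscale` proc; `parse_vector` and this `vscale` are hypothetical stand-ins.

```python
import math


def parse_vector(text):
    """Parse a whitespace-separated vector such as "0 0 0" into floats."""
    return [float(token) for token in text.split()]


def vscale(v, c):
    """Scale vector `v` by scalar `c`, refusing non-finite input.

    Hypothetical stand-in for the Tcl proc discussed above; the real
    proc's behavior is not reproduced here.
    """
    values = list(v) + [c]
    if not all(math.isfinite(x) for x in values):
        raise ValueError(f"non-finite value in vscale input: {v!r} * {c!r}")
    return [x * c for x in v]
```

Failing at the first non-finite value points at whichever caller produced it, which is usually more useful than an error several calls later.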

starkaiser (Jul 21 2021 at 20:39):

Thank you! I have tried to solve it myself, but I have only managed to hop around different files, trying to understand the code. Mged works fine, so I'm using it to learn the basics

bch (Jul 21 2021 at 20:42):

I personally really like mged. Hopefully you come to enjoy it too. Definitely a “learning wall” associated with it, but it pays off

starkaiser (Jul 21 2021 at 21:05):

7.32.3 I think. I compiled the latest versions from the main repository on github because older stable versions were giving me other errors, more serious. I compiled the version that I am currently using two days ago

starseeker (Jul 21 2021 at 21:10):

@starkaiser Just so you're aware - you're well into the "not-well-tested" aspects of the software interacting with Archer for geometry creation. MGED will usually be the more stable of the interfaces, since Archer doesn't currently get as much use/attention.

starseeker (Jul 21 2021 at 21:13):

Cool that you're digging into it - Archer's also got some of our most advanced GUI features (File->Open can trigger some of the converters to open other file types, for example.)

bch (Jul 21 2021 at 21:21):

@starseeker - I’m looking at some general fixes that may have knock-on effects. We’ll see how it turns out 🧐

Sean (Jul 22 2021 at 07:52):

That did the trick for libdm, but now there's another error from libpng being built wrong: undefined reference to `brl_png_init_filter_functions_vsx'

Looking at the png source code, it looks like we're missing all the platform-specific subdirs where that function comes from.

starseeker (Jul 22 2021 at 14:43):

What platform triggers the failure? Surprisingly, I've not encountered that error before...

starseeker (Jul 22 2021 at 15:14):

It looks like the POWERPC platform is defining PNG_FILTER_OPTIMIZATIONS. Unless we need that defined, my inclination would be to simply not define it.

starseeker (Jul 22 2021 at 15:16):

If I'm interpreting this correctly, PNG_FILTER_OPTIMIZATIONS is for platform specific optimization logic and there is a generic fallback we can use.

Sean (Jul 23 2021 at 05:17):

Can we get make clean fixed?.. if make builds it, make clean should still clear it.

Sean (Jul 23 2021 at 05:21):

curiously distclean appears to have deleted files I would have thought it had no business deleting, and still left bin/osdemo and src/other/libosmesa

Sean (Jul 23 2021 at 05:44):

But why is it customized? I would think it's far less complexity and risk to drop in as vanilla as strictly possible. It seems to be just a few files, so I can't see an argument for space/complexity/savings. No idea what the runtime implications are.

Plus, it's an impediment to upgrades and there's a real risk of cost... which it now incurred. I mean, between all the builds, rebuilding, inspecting, having to shift what I was doing elsewhere, trying again, I've now spent at least 4 hours unproductively because of it. :(

I have to hope there was some need or benefit beyond tidying up files? A benefit that saves us time and effort?

starseeker (Jul 23 2021 at 13:16):

My understanding of how ExternalProject_Add works suggests that this will be difficult. ExternalProject builds are decoupled from the primary CMake logic internally, and each individual project's logic isn't even guaranteed to define a clean target at all. CMake doesn't provide a lot of good options for customizing the "make clean" target as far as I know...

starseeker (Jul 23 2021 at 13:19):

That is curious - latest main doesn't do that for me using a build folder. Are you doing a build from the src dir? It doesn't reproduce for me there either...

starseeker (Jul 23 2021 at 13:28):

Looks like I did that back in 2016 (r68360) when I was trying to scrub everything we don't need out of src/other to reduce our overall tarball size.

starseeker (Jul 23 2021 at 13:37):

For src/other/ext, the obvious thing to try would be to define some custom logic to be executed by the ExternalProject_Add build steps that generates a list of all files added by the build step that the parent build could then remove, but I don't know of a way to customize the clean target in the parent CMake build to that degree.

starseeker (Jul 23 2021 at 13:44):

We could probably produce a clean-ext target that would invoke the needed steps...

Sean (Jul 24 2021 at 06:35):

Nope, I'm using a build folder. I don't know what it'd do in a src tree. It was a straight up fresh cloning, cmake in build, and make calls, then make distclean when clean didn't clear out libpng. I was testing your libpng change, but couldn't get it to recompile again even with the file edited, so tried to make clean which failed, then distclean -- which left turds.

Sean (Jul 24 2021 at 06:51):

Yeah, I don't think we should keep that then, long term, especially as upgrades happen. That was a really expensive impact..

Also, that edit requires a human in the loop at all future upgrade points (i.e., more time) and docs/knowledge of the edits complicating upgrades. That in turn puts us in a position where upgrades are resisted (e.g., gdal, opennurbs, stepcode, ...). Not a healthy pattern.

Sean (Jul 24 2021 at 06:54):

Of course, opennurbs has other reasons, so that one's not entirely equivalent, but it is a bit involved to upgrade in part because of cullings (in addition to our code edits).

starseeker (Jul 24 2021 at 13:32):

I saw those emails, but I'm wondering if the behavior they describe is out of date - I don't think I've ever seen the ExternalProject_Add builds follow a make clean...
exttest.tar.gz
I made a small test (attached) and the behavior I'm seeing here on a make clean is that p1 is removed, but bin/p2 is intact.

starseeker (Jul 24 2021 at 13:34):

"tried to make clean which failed, then distclean -- which left turds."

Which files were left after the distclean? One possibility is that if there are files left from older build states, distclean based on updated CMakeLists.txt files won't know it needs to remove them...

starseeker (Jul 24 2021 at 14:02):

My hope is that once it is properly matured, the new src/other/ext approach to building will make vanilla upstreams more practical. Since the new logic (so far at least) is capable of replicating the CMake RPath magic without needing all up build system replacements, the incentive to clean up the third party directories goes down.

When I'm having to write and/or maintain the build systems myself, those messy directories are a problem - that was the other reason I was stripping them down, to make it easier to understand what I had to write build logic for. When I had to do major build system work on third party deps, all future upgrades needed a human in the loop anyway, so the simplification was an overall win. I'm trying to let the dust settle on the src/other/ext system before I introduce the additional complication of swapping in things like the upstream GDAL build, and I also wanted as much in the way of automated cross platform testing in place as possible before trying that step. (I haven't had bandwidth to do it anyway, but even if I had I would have been hesitant to pile the native build systems on top of everything else.)

Even if we get to completely vanilla src/other/ext, there's still going to be some disincentive to disrupt things by upgrading those deps. I'm thinking it might be helpful if we go ahead and break ext into its own git repo and add it as a submodule. If git will support this, we could set it up as follows:

Sean (Jul 28 2021 at 13:06):

Like I said, that was the entirety of its existence, so no prior build states, no git pulls besides the edit you made to fix the png issue. It was a clean checkout, cmake + make + cmake + make (tried diff compiler), and eventually make clean, then make distclean, which then left bin/osdemo and src/other/libosmesa with a few files in there.

Sean (Jul 28 2021 at 13:19):

The plan you describe sounds good except I would suggest we keep ext branching simpler. Having to hunt for which branch has the deps that works would be a bit .. frustrating to say the least.

I'd think we just have main track ext main and STABLE track ext STABLE and leave it at that for starters. I.e., what worked for the last release, and whatever is currently needed for main. This would make main be your ACTIVE+TESTING+STAGING branches and it'd be on us to make branches while testing risky efforts, but without any required formality beyond main and STABLE. I like the idea of possibly having main track ext STABLE for some added stability, but I could go either way. I'd hope any instability is very short lived.

starkaiser (Jul 30 2021 at 16:35):

I just downloaded and compiled the 7.32.4 release and now the snap to grid mode in Archer works great!

starseeker (Jul 30 2021 at 17:41):

The 7.32.4 release is based on older code, and doesn't incorporate most of the changes in main

starseeker (Jul 30 2021 at 17:42):

The most likely problem for issues in main is refactoring work I was doing to shift logic down the library stacks (primarily out of libtclcad, but also some out of libged into lower layers.)

starseeker (Jul 30 2021 at 17:44):

I haven't gotten to the editing modes yet - they'll be one of the very last things to shift to the Qt GUI, because of the amount of work involved - and so the snapping behaviors haven't yet been tested post refactor (or rather, your Archer test has served as an inadvertent test).

starseeker (Aug 25 2021 at 14:32):

@Sean do we have a standard way to get SSIZE_MAX on Windows? limits.h doesn't seem to have it...

Sean (Sep 01 2021 at 02:28):

belated sorry about that, but I saw your fix and seems good enough. there's not a standard way other than including limits.h which we already do.

Sean (Sep 01 2021 at 02:29):

I could probably key off some other limit as it just needs to be some imposed limit to satisfy the cert/stig issue, tainted input sanitization

Sean (Nov 30 2021 at 08:14):

Both ogl and X framebuffer appear to be non-functional (on Mac) ... not sure since when as I've been in a different section of the code, but ogl fails and X crashes.

starseeker (Dec 01 2021 at 02:16):

@Sean Not sure about ogl, but I think I addressed the X issue. I don't see an ogl failure on Linux...

Sean (Jan 08 2022 at 06:37):

oh, didn't report back on this until now, but the ogl issue never went away... still a hard failure on mac, no archer, no /dev/ogl

Sean (Jan 08 2022 at 06:40):

=============== Current Selection ================
bu_shmget failed, errno=22
bu_shmget: Invalid argument
ogl_getmem:  Unable to attach to shared memory, using private
fb_ogl_open: double buffering not available. Using single buffer.
Assertion failed: (glx_dpy), function __glXSendError, file ../src/glx/glx_error.c, line 44.

=============== Current Selection ================
ogl_getmem: shmget failed, errno=22
ogl_getmem:  Unable to attach to shared memory.
Description: Silicon Graphics OpenGL
Device: /dev/ogl
Max width height: 16384 16384
Default width height: 512 512
Usage: /dev/ogl[option letters]
   p   Private memory - else shared
   l   Lingering window
   t   Transient window
   d   Suppress dithering - else dither if not 24-bit buffer
   c   Perform software colormap - else use hardware colormap if possible
   s   Single buffer -  else double buffer if possible
   b   Fast pan and zoom using backbuffer copy -  else normal
   D   Don't update screen until fb_flush() is called.  (Double buffer sim)
   z   Zap (free) shared memory.  Can also be done with fbfree command

Current internal state:
    mi_doublebuffer=1
    mi_cmap_flag=0
    ogl_nwindows=1
X11 Visual:
    TrueColor: Fixed RGB maps, pixel RGB subfield indices
    RGB Masks: 0xff0000 0xff00 0xff
    Colormap Size: 256
    Bits per RGB: 8
    screen: 0
    depth (total bits per pixel): 24

Sean (Jan 08 2022 at 06:43):

also, definitely seeing some regression in the conversion code. was doing an obj-g conversion, was giving me some new errors -- checked against a rando prior release (7.30 I think) and prior succeeded where current main does not (completes, but results in bad/flipped faces).

Sean (Jan 08 2022 at 06:46):

Here's that geometry if you want to see if you can track it down.. PoliceLifterSpeed.obj

starseeker (Jan 11 2022 at 14:44):

Can you double check what version succeeded? I've tried a number of 7.30 obj-g conversions, and so far they all produce the bad geometry here.

Sean (Jan 14 2022 at 07:22):

Oof, I thought I grabbed 7.30, but it's looking like if I just opened the .g file, then it would have fired up a 7.24 release.

Sean (Jan 27 2022 at 05:28):

@starseeker not sure if it’s recent but cmake summary has a blank entry for Iwidgets. I took a look but couldn’t follow the logic, appeared to be handled differently from the other _BUILD vars. Would you take a look? TCL is ON, Tk is Disabled, Itcl/Itk is ON (Itcl only), and Iwidgets is blank.

starseeker (Jan 27 2022 at 13:11):

(my real motivator for the Qt work - get rid of all the Tcl/Tk build logic ;-) )

Sean (Feb 14 2022 at 16:48):

@starseeker Here's one of the errors that a couple of them got:
warning: error while sourcing archer_launch.tcl: couldn't read file "tclscripts/archer/itk_redefines.tcl": no such file or directory

Sean (Feb 15 2022 at 03:39):

@starseeker I've run into that particular itk_redefines.tcl error before as well, if that helps. I'm not sure of the conditions but it doesn't seem to interfere with the build as much as it was on Windows

Sean (Feb 15 2022 at 03:40):

I did get a trace on the all-apps-crashing again bug -- it appears to be something inside libdm during application shutdown. Valgrind is pointing at some unknown symbols in that library:

--11359-- Discarding syms at 0x107420000-0x107474000 in /Users/morrison/brlcad.main/.build/lib/libdm.20.0.1.dylib (have_dinfo 1)
--11359-- Discarding syms at 0x1074cc000-0x1074d4000 in /Users/morrison/brlcad.main/.build/lib/libpkg.20.0.1.dylib (have_dinfo 1)
--11359-- Discarding syms at 0x1092e4000-0x1092f0000 in /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ps.dylib (have_dinfo 1)
==11359== Jump to the invalid address stated on the next line
==11359==    at 0x107446636: ???
==11359==    by 0x10744657E: ???
==11359==    by 0x10743CA42: ???
==11359==    by 0x7FFF2071ED24: ??? (in /dev/ttys000)
==11359==    by 0x7FFF2071F00F: ??? (in /dev/ttys000)
==11359==    by 0x7FFF2080AF43: ??? (in /dev/ttys000)
==11359==  Address 0x107446636 is not stack'd, malloc'd or (recently) free'd

Sean (Feb 15 2022 at 03:41):

Actually, if that output is strictly ordered, that may be the issue. It's unloaded libdm, but then goes to unload dm/libdm-ps.dylib and a call is made into libdm...

starseeker (Feb 15 2022 at 03:42):

Sean (Feb 15 2022 at 03:44):

I could be wrong on that interpretation, but it def appears to be something libdm plugin-related

starseeker (Feb 15 2022 at 03:45):

For the itk_redefines.tcl error, the question I have is whether share/tclscripts/archer/itk_redefines.tcl is present. If not, it may be a missing dependency on the cp target for that file; if it is, then it's something about the paths in the Tcl environment.

starseeker (Feb 15 2022 at 03:45):

starseeker (Feb 15 2022 at 03:46):

Let me see if I can make the target for copying that file an explicit dependency of archer...

Sean (Feb 15 2022 at 03:46):

I'll see if I can trigger the itk_redefines.tcl error, and check -- it was pretty consistent for me for a while, but I'd been ignoring it

Sean (Feb 15 2022 at 03:47):

starseeker (Feb 15 2022 at 03:47):

Sean (Feb 15 2022 at 03:47):

I'm not seeing where/why in the code the plugins have any code that would be getting called

Sean (Feb 15 2022 at 03:48):

there's no apparent atexit handler. oh, maybe dlopen registers one.. that might be.

starseeker (Feb 15 2022 at 03:48):

If it's the unloading code, a quick check would be to comment out the unloading bits in libdm_clear

starseeker (Feb 15 2022 at 03:49):

Sean (Feb 15 2022 at 03:49):

(base) morrison@agua .build % nm libexec/dm/libdm-ps.dylib
                 U _Tcl_AppendStringsToObj
                 U _Tcl_DuplicateObj
                 U _Tcl_GetObjResult
                 U _Tcl_SetObjResult
                 U ___stack_chk_fail
                 U ___stack_chk_guard
00000000000100d0 d __dyld_private
                 U _bu_calloc
                 U _bu_free
                 U _bu_log
                 U _bu_vls_addr
                 U _bu_vls_free
                 U _bu_vls_init
                 U _bu_vls_printf
                 U _bu_vls_sprintf
                 U _bu_vls_strcpy
0000000000010550 b _disp_mat
000000000000b770 T _dm_plugin_info
0000000000010138 d _dm_ps
0000000000010150 d _dm_ps_impl
                 U _draw_Line3D
                 U _fclose
                 U _fflush
                 U _fopen
                 U _fprintf
                 U _fputs
00000000000106a8 s _head_ps_vars
                 U _memcpy
                 U _memset
00000000000104d0 b _mod_mat
                 U _null_String2DBBox
                 U _null_SwapBuffers
                 U _null_beginDList
                 U _null_configureWin
                 U _null_doevent
                 U _null_drawDList
                 U _null_drawPoint3D
                 U _null_drawPoints3D
                 U _null_endDList
                 U _null_freeDLists
                 U _null_genDLists
                 U _null_getDisplayImage
                 U _null_loadPMatrix
                 U _null_makeCurrent
                 U _null_openFb
                 U _null_reshape
                 U _null_setDepthMask
                 U _null_setLight
                 U _null_setTransparency
                 U _null_setZBuffer
000000000000c010 s _pinfo
0000000000009120 t _ps_close
000000000000b6a0 t _ps_debug
000000000000b290 t _ps_draw
00000000000092f0 t _ps_drawBegin
0000000000009360 t _ps_drawEnd
0000000000009af0 t _ps_drawLine2D
0000000000009c00 t _ps_drawLine3D
0000000000009c60 t _ps_drawLines3D
0000000000009cf0 t _ps_drawPoint2D
0000000000009940 t _ps_drawString2D
0000000000009d60 t _ps_drawVList
0000000000010690 b _ps_drawVList.fin
0000000000010650 b _ps_drawVList.last
0000000000010670 b _ps_drawVList.start
0000000000009430 t _ps_hud_begin
00000000000094a0 t _ps_hud_end
0000000000009510 t _ps_loadMatrix
000000000000b700 t _ps_logfile
0000000000008310 t _ps_open
00000000000104c0 b _ps_open.count
000000000000b3f0 t _ps_setBGColor
000000000000b340 t _ps_setFGColor
000000000000b480 t _ps_setLineAttr
000000000000b540 t _ps_setWinBounds
00000000000100e0 d _ps_usage
00000000000092a0 t _ps_viable
00000000000105d0 b _psmat
                 U _setbuf
                 U _sscanf
                 U _vclip
                 U dyld_stub_binder

starseeker (Feb 15 2022 at 03:50):

Sean (Feb 15 2022 at 03:51):

oh, there it is! I was looking for whatever was triggering the unloading... plain as day in dm_init.cpp

Sean (Feb 15 2022 at 03:51):

Sean (Feb 15 2022 at 03:52):

starseeker (Feb 15 2022 at 03:58):

Sean (Feb 15 2022 at 04:00):

testing a fix for dlclosure issue, and I'm coincidentally getting the itk_redefines issue so I'll update here when I can and test

Sean (Feb 15 2022 at 04:01):

may be related to this huge blather: WARNING - bu_dir's bin value is set to ., but binary being run is located in /Users/morrison/brlcad.main/.build. This probably means you are running btclsh from a non-install directory with BRL-CAD already present in . - be aware that .tcl files from . will be loaded INSTEAD OF local files. Tcl script changes made to source files for testing purposes will not be loaded, even though btclsh will most likely 'work'. To test local changes, either clear ., specify a different install prefix (i.e. a directory without BRL-CAD installed) while building, or manually set the BRLCAD_ROOT environment variable.

starseeker (Feb 15 2022 at 04:02):

Urm. I've seen that too, but didn't seem to trigger the issue for me. However, that shouldn't be happening, so I'll see if I can take a quick look...

Sean (Feb 15 2022 at 04:02):

Sean (Feb 15 2022 at 04:15):

okay pushed.. I think what was going on is that the iterator was registering ABC and then closing ABC .. and badness was happening. now unloads in reverse order, so ABC->CBA, and that appears to have resolved whatever dependency tracking badness was going on. I suspect it's either plugins that refer to other plugins (thus needing to be in order) or the dynamic linker doing recursive reference counting and thinking it was done with libdm as the dlclose() plugins were unloaded and references got updated.
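
The reverse-order unload described above can be sketched as follows. This is a minimal illustration, not the libdm code: `PluginRegistry` and its members are hypothetical names, and plugin names stand in for the real handles so the ordering is easy to inspect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a plugin registry; not the libdm code.
// Plugins are recorded in load order and released in reverse.
struct PluginRegistry {
    std::vector<std::string> loaded;   // load order, e.g. A, B, C
    std::vector<std::string> closed;   // order in which close was called

    void load(const std::string &name) {
        assert(!name.empty());
        loaded.push_back(name);        // a real version would dlopen() here
    }

    void unload_all() {
        // Walk from the most recently loaded plugin back to the first.
        for (auto it = loaded.rbegin(); it != loaded.rend(); ++it)
            closed.push_back(*it);     // a real version would dlclose() here
        loaded.clear();
    }
};
```

Loading A, B, C and then calling `unload_all()` records the closes as C, B, A. A vector is used here because it preserves load order; a container sorted by key would not.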

Sean (Feb 15 2022 at 04:19):

Sean (Feb 15 2022 at 04:20):

cpack appears to have some built-in stuff too for what it produces, though that doesn't address the build tree like that module seems to

starseeker (Feb 15 2022 at 04:30):

starseeker (Feb 15 2022 at 04:38):

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fe6ea5 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007ffff7fe6ea5 in ?? () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff74e98b8 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd300, operate=<optimized out>,
    args=<optimized out>) at dl-error-skeleton.c:208
#2  0x00007ffff74e9983 in __GI__dl_catch_error (objname=0x55555558b830, errstring=0x55555558b838,
    mallocedp=0x55555558b828, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:227
#3  0x00007ffff39dab59 in _dlerror_run (operate=operate@entry=0x7ffff39da420 <dlclose_doit>, args=0x6) at dlerror.c:170
#4  0x00007ffff39da468 in __dlclose (handle=<optimized out>) at dlclose.c:46
#5  0x00007ffff4bd8a9a in bu_dlclose (handle=0x6) at /home/cyapp/brlcad/src/libbu/dylib.c:66
#6  0x00007ffff48ad034 in libdm_clear () at /home/cyapp/brlcad/src/libdm/dm_init.cpp:201
#7  0x00007ffff48ad654 in libdm_initializer::~libdm_initializer (this=0x7ffff48e8590 <LIBDM>,
    __in_chrg=<optimized out>) at /home/cyapp/brlcad/src/libdm/dm_init.cpp:217
#8  0x00007ffff73d015e in __cxa_finalize (d=0x7ffff48e6c20) at cxa_finalize.c:83
#9  0x00007ffff48970f7 in __do_global_dtors_aux () from /home/cyapp/brlcad-build/lib/libdm.so.20
#10 0x00007fffffffdd90 in ?? ()

starseeker (Feb 15 2022 at 04:39):

Sean (Feb 15 2022 at 05:01):

it indeed is working much better for me, but that 0x6 handle in your stack there is suspicious. I think I may have been careless with the iterator.

Sean (Feb 15 2022 at 05:02):

yeah, end() shouldn't be valid and that's where I made it start. surprisingly works...

Sean (Feb 15 2022 at 05:05):

Sean (Feb 15 2022 at 05:17):

you're good enough to catch mistakes like that! heh. it was an invalid loop! bogus handle was a dead give-away..
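
For reference, a sketch of the iterator pitfall mentioned above, using integers in place of library handles. The broken loop shown in the comment is a guess at the general shape of such a mistake, not a copy of the committed code, and `reverse_order` is a hypothetical helper.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Collect the elements of a std::set from last to first.
// Hypothetical helper for illustration only.
std::vector<int> reverse_order(const std::set<int> &handles) {
    std::vector<int> out;
    // Broken variant (do not use): it starts at end(), which must not be
    // dereferenced, and it stops before visiting *begin():
    //   for (auto it = handles.end(); it != handles.begin(); --it) use(*it);
    for (auto it = handles.rbegin(); it != handles.rend(); ++it)
        out.push_back(*it);
    assert(out.size() == handles.size());
    return out;
}
```

Note that a std::set iterates in key order, so reversing a set of pointers gives reverse address order, which is not necessarily reverse load order.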

Sean (Feb 15 2022 at 05:26):

I feel first response should never be to inject platform identifiers... revert if needed, or at least give me a chance to fix it... see if latest is any better.

starseeker (Feb 15 2022 at 13:43):

Sean (Feb 15 2022 at 14:15):

@starseeker bad news is that the crash isn't gone... declared victory too soon. there's still something very distinctly wrong in the loading/unloading..

starseeker (Feb 15 2022 at 14:17):

That's weird. As I recall that code is pretty straightforward - load on initialize, unload on exit. Not sure where to go hunting for trouble...

starseeker (Feb 15 2022 at 14:18):

starseeker (Feb 15 2022 at 14:23):

@Sean I don't know if it helps any, but src/libbu/tests/dylib is intended to be a small, self-contained testing of that mechanism...

starseeker (Feb 15 2022 at 14:25):

@Sean do you know when this started? (i.e. has it been doing it ever since the dm/ged plugin work, or did some more recent change kick it off?)

starseeker (Feb 15 2022 at 14:29):

To me the strangest thing is that neither the local mac here nor the CI runners seem to be exhibiting it. And the CI build for the mac indicates it's running the ASCII to .g conversions, which (at least on the Linux box here) did trigger crashing when the unloading wasn't working.

Sean (Feb 15 2022 at 15:06):

It's the same behavior I've been seeing for months, I think since the dm/ged plugin work. It doesn't appear to be 100% deterministic as it seems to depend what symbols are in use, implying it's involving the dynamic linker and when a particular symbol or set of symbols are encountered.

Sean (Feb 15 2022 at 15:09):

It doesn't appear to affect more complicated apps that call lots of symbols (e.g., mged or gcv, etc) as much (or at least as visibly). Seems to be most noticeable on a handful of smaller simpler apps that essentially do nothing (but still load and unload nearly everything), and every now and then on something more complicated.

Sean (Feb 15 2022 at 15:13):

I think there's possibly something fundamental in play here (like the ordering) and Mac happens to be provoking it. When I watch the binary's DYLD loading/unloading, there is some strangeness going on. The libged plugins are loading, and then it loads dm and its dependency libraries. It appears to be choking up when it goes to unload dm and friends.

starseeker (Feb 15 2022 at 15:21):

Sean (Feb 15 2022 at 15:48):

Honestly, I'm not sure.. I just got a clue that it may be related to dm-X and dm-ogl, and that latter is still fully busted on Mac for me -- so that might be something you could check on -- if mged, archer, and such work for you on mac from a build dir. If it works, we can maybe trace backwards to figure out where things diverge.

starseeker (Feb 15 2022 at 15:50):

OK, I'll check on that. I know qged worked with Qt6 on mac, but I didn't try archer and I'm not sure if qged was doing its swrast fallback or not...

Sean (Feb 15 2022 at 15:50):

What I think I'm seeing is that apps that already link libdm and/or X11 have no problem. It's when an app doesn't use either, but then libged loads libged-dm.dylib, that loads libdm and all its deps. It's when libdm's deps get unloaded that it segfaults.

Sean (Feb 15 2022 at 15:50):

Sean (Feb 15 2022 at 15:51):

I don't have a qt build. I've been living in mged -c land for a while once ogl stopped working.

Sean (Feb 15 2022 at 15:52):

I haven't tried an opengl-disabled build to see if non-classic mode will fire up X correctly

starseeker (Feb 15 2022 at 15:52):

OH! Yeah, I had to refactor some code for a libgcv plugin because of that - apparently a dynamically loaded lib can't go and load another dynamically loaded lib.

starseeker (Feb 15 2022 at 15:54):

starseeker (Feb 15 2022 at 16:05):

Sean (Feb 15 2022 at 16:10):

Okay, I think I just ruled out X11/ogl -- if I remove libdm-X and libdm-ogl, it still segfaults

Sean (Feb 15 2022 at 16:11):

dyld: unloaded: <970A62D7-21A7-3363-92AC-41D3E3ED2AF5> /Users/morrison/brlcad.main/.build/libexec/ged/libged-autoview.dylib
!!! REMOVING 0x7fd514c08d00 unknown
dyld: unloaded: <D8509635-B237-3585-B70C-823C95F4B5CB> /Users/morrison/brlcad.main/.build/libexec/ged/libged-attr.dylib
!!! REMOVING 0x7fd514c08ba0 unknown
dyld: unloaded: <F4435FE5-243C-3286-B0D3-CEDC50774EEE> /Users/morrison/brlcad.main/.build/libexec/ged/libged-arot.dylib
!!! REMOVING 0x7fd514e05d90 unknown
dyld: unloaded: <B9E74045-8CE7-3438-B699-54645171BFC3> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-swrast.dylib
dyld: unloaded: <21900CBB-094E-349C-A1B2-BAD779BDCF15> /Users/morrison/brlcad.main/.build/lib/libosmesa.dylib
!!! REMOVING 0x7fd514e059b0 unknown
dyld: unloaded: <C363B743-FE6B-3D4A-8513-953A5F6FAF28> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ps.dylib
!!! REMOVING 0x7fd514e054c0 unknown
dyld: unloaded: <E8A365D0-5923-386F-A9BD-7DA434D46324> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-plot.dylib
!!! REMOVING 0x7fd514d069b0 unknown
dyld: unloaded: <DE144727-5E38-36CE-BFDD-A11CB151703E> /Users/morrison/brlcad.main/.build/lib/libdm.20.dylib
dyld: unloaded: <219AC144-E743-3037-8F1C-9B313D82BB1A> /Users/morrison/brlcad.main/.build/lib/libpkg.20.dylib
dyld: unloaded: <0AC2C158-06D9-3273-962E-FD0F51813D60> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-txt.dylib
zsh: segmentation fault  DYLD_PRINT_LIBRARIES=1 bin/cad_user 2>&1

starseeker (Feb 15 2022 at 16:12):

Sean (Feb 15 2022 at 16:13):

so what's going on there is it's unloading everything, is unloading the last libdm-*.dylib (libdm-plot.dylib in this example) and it unloads libdm itself since reference-counting-wise, nothing else is using it.

starseeker (Feb 15 2022 at 16:13):

Sean (Feb 15 2022 at 16:14):

no, that's my manual debug printing, I'm printing out all the library pointers on load and unload. You put them in a std::set, so the name is unknown, but it's basically the lines that follow -- and in the full log, the pointer address can be matched to the load statement where the name was known

starseeker (Feb 15 2022 at 16:14):

Sean (Feb 15 2022 at 16:15):

starseeker (Feb 15 2022 at 16:27):

Sean (Feb 15 2022 at 16:33):

1) libged static initializer runs and plugins get dlopened, each one getting resolved by the dynamic linker which loads its dependent libraries, among those being..
2) libged-dm loads, which dynamic loads libdm, which static initializer runs and plugins get dlopened, each one getting... yada yada, and then
3) app runs, does its thing, returns from main
4) libged destructor runs, starts unloading ged plugins, libged-dm unloads for example but dependencies are not yet unloaded
5) libdm destructor runs, starts unloading dm plugins and dependencies (perhaps asynchronously), and when it gets to the last plugin ...
6) dynamic linker unloads libdm itself, and this appears to happen while libdm's destructor is still running
7) seg faults, presumably on next iteration of the loop or on return from the destructor
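
The constructor/destructor mechanism in the steps above can be sketched as below. This is only an illustration of the pattern under simplifying assumptions: `PluginInitializer` and `events()` are hypothetical names, no real dlopen()/dlclose() calls are made, and the objects are scoped locals so the ordering is deterministic. In the real case the instances are file-scope statics in separate shared libraries, and the relative timing is up to the dynamic linker.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Shared event log so the ordering can be inspected.
inline std::vector<std::string> &events() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a library-level initializer class.
// The constructor runs when an instance is created (for a file-scope
// static: when the library is loaded) and the destructor when it is
// destroyed (for a file-scope static: at library unload or process exit).
class PluginInitializer {
public:
    explicit PluginInitializer(std::string lib) : lib_(std::move(lib)) {
        assert(!lib_.empty());
        events().push_back(lib_ + ": load plugins");
    }
    ~PluginInitializer() {
        events().push_back(lib_ + ": unload plugins");
    }
private:
    std::string lib_;
};
```

Constructing a "libged" instance and then a "libdm" instance in the same scope logs the two loads, then the libdm unload, then the libged unload, since locals are destroyed in reverse order of construction. The crash discussed here is about what happens when that tidy ordering is not guaranteed.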

Sean (Feb 15 2022 at 16:36):

please don't go shotgunning the plugins just yet! -- I have a swath of unpushed commits rebased on main, hundreds of changes to eliminate the per-command API

starseeker (Feb 15 2022 at 16:36):

Don't worry, I'm not going to do anything drastic. Just trying to get a sense of what we're facing

Sean (Feb 15 2022 at 16:37):

it'll conflict for sure if you go ripping on it too much
I think what is needed is to either ensure destruction is deferred, or order is somehow guaranteed by symbols

starseeker (Feb 15 2022 at 16:38):

If I absolutely have to I can ditch the dm command as a libged command and make it available some other way, but it's still a potential issue if anyone else happens to set up a similar conundrum for the unloaders...

Sean (Feb 15 2022 at 16:38):

I mean there is one possibility of simply not auto-loading everything. Only load as called.

starseeker (Feb 15 2022 at 16:39):

Would that help in the unloading calls though? Or do you mean immediately unloading after execution as well?

Sean (Feb 15 2022 at 16:39):

yeah, I think the dm plugin just provokes the issue, and isn't the issue itself. seems reasonable/likely that future plugin will require some lib. only issue might be like you said -- a dylib with a static initializer that loaded another dylib with a static initializer, and trying to avoid that

Sean (Feb 15 2022 at 16:40):

oh gosh, no, not unloading after execution. only loading what is used, and unloading everything that was loaded on shutdown. that would handle this specific case (because very little uses the dm plugin)
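
A rough sketch of that load-on-demand idea, assuming a simple table keyed by command name. `LazyPlugins` is a hypothetical class, not part of libged, and the dlopen()/dlclose() calls are only indicated in comments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical on-demand command table; the names are illustrative.
class LazyPlugins {
public:
    // Load the plugin for `cmd` the first time it is requested.
    // Returns how many times this command has been requested so far.
    int use(const std::string &cmd) {
        assert(!cmd.empty());
        auto it = loaded_.find(cmd);
        if (it == loaded_.end())
            it = loaded_.emplace(cmd, 0).first;  // a real version would dlopen() here
        return ++it->second;
    }

    // Number of plugins actually loaded so far.
    std::size_t loaded_count() const { return loaded_.size(); }

    // Unload everything that was loaded; intended to be called once at shutdown.
    std::size_t shutdown() {
        std::size_t n = loaded_.size();
        loaded_.clear();                         // a real version would dlclose() each
        return n;
    }

private:
    std::map<std::string, int> loaded_;
};
```

With this shape, an app that never runs the dm command never loads its plugin, so there is nothing for it to unload at exit.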

Sean (Feb 15 2022 at 16:41):

and apps that DO use the dm plugin appear to be gui and link dm, so it's never unloaded

starseeker (Feb 15 2022 at 16:41):

starseeker (Feb 15 2022 at 16:42):

Sean (Feb 15 2022 at 16:42):

the test would be to run something that dynamically loads libdm, run the dm command, and see if it behaves on exit

Sean (Feb 15 2022 at 16:43):

yeah, it'd require removing libdm from gsh's lib list, run dm command, and see if exit behaves

starseeker (Feb 15 2022 at 16:45):

I'll try that here... one sec. Looks like I've got some actual dm library calls in there, so I'll have to turn off a couple things.

Sean (Feb 15 2022 at 16:46):

interesting. so if I remove all dm plugins, it still loads libdm dynamic, and eventually unloads it some time after libged-dm is unloaded seemingly without issue. valgrind is clean.

Sean (Feb 15 2022 at 16:49):

which is to say it's not simply returning from libdm's destructor that's causing the seg fault. it's when it is in the plugin unloading loop and unloads a plugin that the corruption happens

Sean (Feb 15 2022 at 16:49):

Sean (Feb 15 2022 at 16:50):

Is there anything different about the libdm plugins compared to the libged plugins?

starseeker (Feb 15 2022 at 16:52):

OK, confirm - if I take out the libdm explicit library calls from gsh, it crashes on exit after running "dm types"

starseeker (Feb 15 2022 at 16:53):

I'll go ahead and commit that turned off so we have a simple test case - will be easy to turn back on later.

Sean (Feb 15 2022 at 16:56):

so this all centers around the c++ trick of using static initialization with a class we're using to ensure constructor/destructor code is called when a library is loaded/unloaded, and that's what is not playing nicely -- it's unloading the library before the destructor is done

Sean (Feb 15 2022 at 17:03):

Options I think are....
1) make libdm not plugin-based, as that would avoid a dynamic lib loading other dynamic-loading/unloading libs,
2) make libged only load plugins on-demand and hope any plugins like dm that load other dynamic-loading libs will already be loaded,
3) defer unloading to libbu unloading -- basically make bu_dlclose schedule something for closure and wait,
4) find a different mechanism (avoid using constructor/destructor since that's at the heart of why this fails)

starseeker (Feb 15 2022 at 17:05):

1) is possible - it was done primarily to keep Tcl out of the core libs, but I can also just put those backends requiring it behind an ENABLE_TCL check like that one shader in liboptical.

starseeker (Feb 15 2022 at 17:06):

My bigger concern is what happens if we start supporting 3rd party GED commands and someone else adds a command that does their own libdm-esque magic behind the scenes.

starseeker (Feb 15 2022 at 17:06):

starseeker (Feb 15 2022 at 17:08):

1) is probably the shortest path back to working reliably, and realistically it's pretty unlikely we're going to get a lot of custom libdm backend implementations anytime soon to take advantage of the modularity.

starseeker (Feb 15 2022 at 17:10):

I also wonder what will happen if we expose libgcv through any of the GED commands - mightn't there be a similar issue?

starseeker (Feb 15 2022 at 17:22):

Would 4) involve (say) making ged_init and ged_free be responsible for plugin loading and unloading?

Sean (Feb 15 2022 at 17:25):

yeah, something like that - making the loading and unloading a little more explicit. I suspect just having the loop that does destruction be explicitly called would avoid the segfault because the dynamic loader would know that it can't unload the parent dm/ged/gcv library
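
One way option 4 might look, as a sketch only: the load/unload is tied to explicit, reference-counted calls instead of a static object's constructor and destructor. `PluginLifecycle` is a hypothetical name and this is not the ged_init/ged_free implementation.

```cpp
#include <cassert>

// Hypothetical explicit lifecycle; not the libged API.
class PluginLifecycle {
public:
    // First caller triggers the load; later callers only bump the count.
    void init() {
        if (refs_++ == 0)
            loaded_ = true;    // a real version would dlopen() plugins here
    }

    // Last caller triggers the unload, at a point where the caller is
    // still actively using the owning library.
    void release() {
        assert(refs_ > 0);
        if (--refs_ == 0)
            loaded_ = false;   // a real version would dlclose() plugins here
    }

    bool loaded() const { return loaded_; }

private:
    int refs_ = 0;
    bool loaded_ = false;
};
```

The design trade-off is that callers now have to pair every `init()` with a `release()`, which the static-initializer approach avoided.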

Sean (Feb 15 2022 at 17:31):

I think I can try #3 pretty quickly, and see if it does the trick. I suspect it will. The downside is memory use until libbu is unloaded. Probably could have API forcibly unload on demand if that becomes an issue, but unlikely an issue in our case until we're talking about thousands of plugins.

starseeker (Feb 15 2022 at 17:33):

Sounds good. If that doesn't work let me know if you want me to do either 1) or 4)

Sean (Feb 15 2022 at 17:37):

may still be a benefit to doing #2 (faster load times) -- there is some occasional huge pause on certain (usually infrequent) runs that I assume is something the dynamic loader is doing. I've seen the pause especially on Windows, 30-60+sec before mged displays.

Sean (Feb 15 2022 at 18:46):

...
!!! REMOVING 0x7fdd55c0a030 unknown
!!! REMOVING 0x7fdd55c09dc0 unknown
!!! REMOVING 0x7fdd55c09b50 unknown
!!! REMOVING 0x7fdd55c098e0 unknown
!!! REMOVING 0x7fdd55c09670 unknown
!!! REMOVING 0x7fdd55c09330 unknown
!!! REMOVING 0x7fdd55c09130 unknown
!!! REMOVING 0x7fdd55c08ef0 unknown
!!! REMOVING 0x7fdd55c08d90 unknown
!!! LIBDM DESTRUCTOR
!!! REMOVING 0x7fdd579054c0 unknown
!!! REMOVING 0x7fdd57905140 unknown
!!! REMOVING 0x7fdd57904a50 unknown
!!! REMOVING 0x7fdd5780ecc0 unknown
!!! REMOVING 0x7fdd55e06400 unknown
!!! REMOVING 0x7fdd55d07010 unknown
dyld: unloaded: <484BDA57-EC5C-3533-8271-1213BE720173> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ogl.dylib
dyld: unloaded: <7CD794FB-07E7-3E51-B7CE-CB9585477278> /usr/local/opt/libxrender/lib/libXrender.1.dylib
dyld: unloaded: <466439D8-1576-33B8-AE38-F4AD4CBCDC3F> /opt/X11/lib/libGLU.1.dylib
dyld: unloaded: <0AC2C158-06D9-3273-962E-FD0F51813D60> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-txt.dylib
dyld: unloaded: <E8A365D0-5923-386F-A9BD-7DA434D46324> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-plot.dylib
dyld: unloaded: <C363B743-FE6B-3D4A-8513-953A5F6FAF28> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ps.dylib
dyld: unloaded: <B9E74045-8CE7-3438-B699-54645171BFC3> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-swrast.dylib
dyld: unloaded: <21900CBB-094E-349C-A1B2-BAD779BDCF15> /Users/morrison/brlcad.main/.build/lib/libosmesa.dylib
dyld: unloaded: <6461ED77-30C4-3D90-8FFE-224EF5B8365F> /Users/morrison/brlcad.main/.build/libexec/ged/libged-dsp.dylib
dyld: unloaded: <1536E3E1-0B02-3F94-92A2-00D48E37B256> /Users/morrison/brlcad.main/.build/libexec/ged/libged-edmater.dylib
dyld: unloaded: <F0DAA927-CCFA-3F6D-B79B-BC27BDB6A3A8> /Users/morrison/brlcad.main/.build/libexec/ged/libged-env.dylib
dyld: unloaded: <74AF84F3-50D3-398F-9470-8C0F4DC17813> /Users/morrison/brlcad.main/.build/libexec/ged/libged-erase.dylib
dyld: unloaded: <52AACC7B-7DD1-3EA6-BF05-7D1073E5ADC1> /Users/morrison/brlcad.main/.build/libexec/ged/libged-exists.dylib
dyld: unloaded: <35739FC4-A62C-3F93-8E41-B355D7E4D5A2> /Users/morrison/brlcad.main/.build/libexec/ged/libged-expand.dylib
dyld: unloaded: <C079326A-9961-3C29-9CB0-18D9CCA48C32> /Users/morrison/brlcad.main/.build/libexec/ged/libged-eye_pos.dylib
dyld: unloaded: <F40B173E-8839-3244-A200-C1BEAC11EB7E> /Users/morrison/brlcad.main/.build/libexec/ged/libged-facetize.dylib
dyld: unloaded: <47715816-3B66-3BDF-85E8-915D193BDDD4> /Users/morrison/brlcad.main/.build/libexec/ged/libged-fb2pix.dylib
dyld: unloaded: <CA5B33DE-CF2C-3D33-95D8-CDCD86B4C109> /Users/morrison/brlcad.main/.build/libexec/ged/libged-fbclear.dylib
dyld: unloaded: <479418C5-FF5B-3D14-BEEB-D095AD4D4C55> /Users/morrison/brlcad.main/.build/libexec/ged/libged-find.dylib
dyld: unloaded: <7F8475E5-81F6-3032-9465-72E7D321179A> /Users/morrison/brlcad.main/.build/libexec/ged/libged-form.dylib
dyld: unloaded: <0DABEEDD-2EBE-327A-8B17-6C9FFEDA693B> /Users/morrison/brlcad.main/.build/libexec/ged/libged-fracture.dylib
dyld: unloaded: <5E8181FA-7584-37BF-96BE-7E9819B89D52> /Users/morrison/brlcad.main/.build/libexec/ged/libged-gdiff.dylib
...

starseeker (Feb 15 2022 at 19:15):

FYA, I'm trying to get set up with Visual Studio 2022 now - I think the Github CI system made the upgrade.

starseeker (Feb 15 2022 at 19:16):

Rather worrisome in that the openNURBS build appears to be failing with an internal compiler error...

Sean (Feb 15 2022 at 19:20):

basically it cruises through the destructor and schedules all the dylibs for closing. then when libbu is unloaded or an explicit dlunload() is called, it actually closes them all.
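
The schedule-then-close behavior described above could look roughly like this. It is a sketch under assumptions, not the bu_dlclose implementation: `DeferredCloser` is a hypothetical class, close actions are plain callables, and running them in scheduling order is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical deferred-close queue; not the bu_dlclose implementation.
class DeferredCloser {
public:
    // Record the close action instead of running it now.
    void schedule(std::function<void()> close_fn) {
        assert(close_fn);
        pending_.push_back(std::move(close_fn));
    }

    // Run every close that was pending when flush() was called.
    // Returns how many were run.
    std::size_t flush() {
        std::vector<std::function<void()>> work;
        work.swap(pending_);   // closes that schedule more work stay queued
        for (auto &fn : work)
            fn();
        return work.size();
    }

    std::size_t pending() const { return pending_.size(); }

    // Stands in for "the owning library is being unloaded".
    ~DeferredCloser() { flush(); }

private:
    std::vector<std::function<void()>> pending_;
};
```

Nothing is closed while a plugin destructor is still looping over its handles; the actual closes happen later, either on an explicit `flush()` or when the queue itself is destroyed. The cost is that the libraries stay mapped until then.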

Sean (Feb 15 2022 at 19:21):

Sean (Feb 15 2022 at 19:22):

I'm still sorting through compiler errors with the tamu students.. almost all ran into issues. any idea why CHECK_CXX_FLAG(fsanitize=fuzzer) would be passing on Windows??? It did, and then proceeded to fail during compile because of the flag.

starseeker (Feb 15 2022 at 19:23):

Sean (Feb 15 2022 at 19:23):

another tried in WSL, which I've done myself, but their build ended up unable to find Tcl's configure for some reason

starseeker (Feb 15 2022 at 19:23):

starseeker (Feb 15 2022 at 19:31):

CHECK_START: Performing Test FSANITIZE_FUZZER_CXX_FLAG_FOUND
CHECK_PASS: Success
Performing C++ SOURCE FILE Test FSANITIZE_FUZZER_CXX_FLAG_FOUND succeeded with the following output:
Change Dir: C:/brlcad-build/CMakeFiles/CMakeTmp

Run Build Command(s):C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe cmTC_30228.vcxproj /p:Configuration=Debug /p:Platform=x64 /p:VisualStudioVersion=17.0 /v:m && Microsoft (R) Build Engine version 17.1.0+ae57d105c for .NETFramework
Copyright (C) Microsoft Corporation. All rights reserved.

  Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31104 for x64
  Copyright (C) Microsoft Corporation.  All rights reserved.
  cl /c /Zi /W3 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D _POSIX_C_SOURCE=200809L /D _XOPEN_SOURCE=700 /D FSANITIZE_FUZZER_CXX_FLAG_FOUND /D "CMAKE_INTDIR=\"Debug\"" /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"cmTC_30228.dir\Debug\\" /Fd"cmTC_30228.dir\Debug\vc143.pdb" /external:W3 /Gd /TP /errorReport:queue  -fsanitize=fuzzer "C:\brlcad-build\CMakeFiles\CMakeTmp\src.cxx"
  src.cxx
  cmTC_30228.vcxproj -> C:\brlcad-build\CMakeFiles\CMakeTmp\Debug\cmTC_30228.exe


Source file was:
int main() { return 0; }

Sean (Feb 15 2022 at 19:31):

Sean (Feb 15 2022 at 19:35):

looks like the top-level unprotected one is stray. we have a fuzz regression test that does a direct test and links proper

Sean (Feb 15 2022 at 19:37):

starseeker (Feb 15 2022 at 21:13):

starseeker (Feb 15 2022 at 21:22):

starseeker (Feb 15 2022 at 21:41):

It narrows down fairly quickly to trying to call methods on the const_cast<ON_SerialNumberMap*>(this) pointer. Not sure yet how to work around it.

starseeker (Feb 15 2022 at 21:57):

Grr. I don't have access to the Microsoft compiler bug page referenced in the vcpkg discussion.

starseeker (Feb 15 2022 at 21:57):

Looks like all we can tell students until a workaround is found or Microsoft pushes a fix is to use VS2019

starseeker (Feb 15 2022 at 21:58):

starseeker (Feb 15 2022 at 22:13):

OK... from the "cheap but functional" school... It looks like that particular class method isn't actually used anywhere, so we can just turn it off completely.

starseeker (Feb 15 2022 at 22:17):

Sean (Feb 15 2022 at 22:20):

I wonder if their build system works, implying it being something we're passing in that's untested. I don't see reference to that error in their tracker. If it uniquely affects us, it's probably our combination of flags..

Sean (Feb 15 2022 at 22:21):

looks like cmake guys encountered an issue with the /FS flag recently that they addressed, maybe related

starseeker (Feb 15 2022 at 22:26):

confirmed on the mac here that gsh now shuts down clean. That was some nice work @Sean

starseeker (Feb 15 2022 at 23:14):

@Sean as far as OpenGL is concerned - I think I may have asked you this already, but does glxgears or one of the other X11 OpenGL demos run successfully on your Mac?

starseeker (Feb 16 2022 at 00:19):

Sean (Feb 17 2022 at 21:06):

Sean (Feb 17 2022 at 21:07):

Sean (Feb 17 2022 at 21:08):

Yeah, I was going to say, I have a clean 2022 from two students now .. but one has those errors in the external project builds (maybe all of them)

starseeker (Feb 17 2022 at 21:55):

That's one of those rather unhelpful Visual Studio errors you get when a custom target fails.

starseeker (Feb 17 2022 at 21:55):

starseeker (Feb 17 2022 at 21:59):

For libdm+opengl, what does running ./src/libdm/tests/dm_test from the build directory show?

Sean (Feb 18 2022 at 17:39):

@starseeker I figured that one out. The full log had better detail. Turns out MSVC automatically updated/updates itself, so the compiler that CMake had originally detected no longer existed.

Sean (Feb 18 2022 at 17:41):

I think that may explain a couple build failures commonly encountered by people who have recently installed MSVC. I'm not sure if we can detect that situation as that error is absolutely inscrutable, or maybe put some advice into the Compiling page instructions to ensure MSVC is completely updated before proceeding with CMake (but then MSVC could update at any time).

Sean (Feb 18 2022 at 17:41):

I suppose it's not as common on Mac/Linux/BSD simply because the compiler isn't sitting in a versioned directory like msvc's compiler is.

Sean (Feb 18 2022 at 17:44):

Also figured out one of the other common build errors some of them ran into. If you do a Git for Windows clone of the code, the build will fail in WSL (Ubuntu) because some of the build logic appears to require unix line endings (e.g., libpng seems to be running awk).
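
As an illustration of what checking for that situation could involve, here is a tiny helper that reports whether a text buffer contains Windows-style line endings. `has_crlf` is a hypothetical function, not something in the BRL-CAD build logic, and reading the file into the string is left to the caller.

```cpp
#include <string>

// Return true if the text contains at least one CRLF ("\r\n") line ending.
// Hypothetical helper for illustration only.
bool has_crlf(const std::string &text) {
    return text.find("\r\n") != std::string::npos;
}
```

A check along these lines on a known script in the source tree would be one way to flag a checkout made with CRLF conversion before the build gets as far as the failing step.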

Sean (Feb 18 2022 at 17:46):

Not sure that can be detected either, but can put a note in Compiling that one must fully start in WSL if you're going that route.

Sean (Feb 18 2022 at 17:46):

Sean (Feb 18 2022 at 17:47):

(base) morrison@agua .build % src/libdm/tests/dm_test
load msgs: dlsym(0x7f80706048d0, fb_plugin_info): symbol not found
Unable to load symbols from './libexec/dm/libdm-plot.dylib' (skipping)
Could not find 'fb_plugin_info' symbol in plugin
dlsym(0x7f807040bf60, fb_plugin_info): symbol not found
Unable to load symbols from './libexec/dm/libdm-ps.dylib' (skipping)
Could not find 'fb_plugin_info' symbol in plugin

Available types:
    ogl
    X
    plot
    ps
    swrast
    txt
    nu
nu valid: 1
plot valid: 1
X valid: 1
ogl valid: 1
osgl valid: 0
wgl valid: 0
dmp name: nu
open called
dmp name: txt
close called
recommended type: ogl

Sean (Feb 28 2022 at 21:26):

starseeker (Feb 28 2022 at 21:29):

Maybe check whether the older (working) versions are linking to any libraries that are different from the newer version?

I don't know how much trouble it would be, but it would be interesting to know if qged works on that platform or not (the qged setup shouldn't require X11 opengl, so I'm curious as to whether the problem also manifests if we take X out of the equation...)

starseeker (Mar 01 2022 at 20:40):

@Sean I think the bzflag reboot must have introduced a new default compiler - libbu's sort.c is suddenly making it unhappy...

Sean (Mar 01 2022 at 20:41):

starseeker (Mar 01 2022 at 20:41):

Ah, OK. I can see where the error is coming from, but I'm not sure what the "correct" thing to do instead is...

Sean (Mar 01 2022 at 20:43):

I can take a look at it, I hadn't gotten to compiling there yet. Been chasing fires, reviewing PR commits, and answering questions all day.

Sean (Mar 01 2022 at 20:43):

Erik (Mar 02 2022 at 02:12):

@starseeker sorry, that was me bashing around. Clang 13 now, gcc 10.3 is also installed, so -DCMAKE_C_COMPILER=gcc

starseeker (Mar 02 2022 at 02:26):

@Erik no worries - we just need to fix the issue. I'm not confident I know what the "right" answer should be yet...

Erik (Mar 02 2022 at 02:28):

the subtracting a null ptr thing is buried in a macro from what I saw, could take a bit of doing to tease out. Using gcc pulls a sysinfo bridge header that mucks up libbu linking, that test should be moved from 'have the header' to 'can link the symbol' I think
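
For context, a minimal sketch of the kind of macro that tends to trigger this, and one common replacement. The macro name `MISALIGNED` and the function `is_long_aligned` are made up for illustration and are not the actual libbu sort.c code. The assumption here is that the complaint comes from subtracting a null pointer to turn an address into an integer, which is undefined behavior in C and which newer Clang releases diagnose.

```c
#include <stddef.h>
#include <stdint.h>

/* Pattern that gets flagged (shown only as a comment):
 *   #define MISALIGNED(p) (((char *)(p) - (char *)0) % sizeof(long))
 * The subtraction involves a null pointer, which is not part of any object.
 */

/* Same alignment test using an integer conversion instead of
 * pointer subtraction. */
#define MISALIGNED(p) ((uintptr_t)(const void *)(p) % sizeof(long))

/* Return 1 if p is aligned on a sizeof(long) boundary, 0 otherwise. */
int
is_long_aligned(const void *p)
{
    return MISALIGNED(p) == 0;
}
```

`uintptr_t` is technically an optional type, but it is available on the compilers discussed here.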

Sean (Mar 08 2022 at 16:20):

I just did a build on the latest Fedora, compiled clean, mged runs, but then abruptly closes after drawing anything and running rt…. Rt window displays the rendering. Terminal output says mged was Killed.

Sean (Mar 08 2022 at 16:22):

ran in gdb and there’s nothing to break on, as there indeed appears to be something in the system that sends mged the kill signal after forking off the rt process.

Sean (Mar 08 2022 at 16:22):

I’ve only seen that before when a process attempts to allocate too much memory, but haven’t yet seen evidence that’s what’s going on here

Sean (Mar 08 2022 at 16:31):

Oof, okay I found the evidence. It is getting killed by the Out of memory monitor. Not seeing why as it only appears to be using 1mb…

Sean (Mar 08 2022 at 16:57):

Ah, so turns out mged is using 5.4GB just with mged open… and that laptop is my low-resource test box, only has 4GB + 4GB swap. It is running out of memory. Seems a bit nuts that mged is using that much with essentially nothing open.

Sean (Mar 08 2022 at 17:23):

Looks like it’s something in DM. Every attach X is adding 1.5GB usage. Kicking off the tcltk gui adds over 4GB (presumably from the dm+fb).

Sean (Mar 08 2022 at 17:37):

@starseeker can you see what mged does for you if you run mged -c share/db/moss.g, then attach nu, attach X, close the window, attach X again, and then draw all.g?

starseeker (Mar 08 2022 at 20:32):

X Error of failed request:  BadDrawable (invalid Pixmap or Window parameter)
  Major opcode of failed request:  62 (X_CopyArea)
  Resource id in failed request:  0x460000a
  Serial number of failed request:  3473
  Current serial number in output stream:  3474

starseeker (Mar 08 2022 at 20:45):

starseeker (Mar 08 2022 at 20:46):

(As an aside, here's something a little weird from valgrind when I run attach X):

==1409798== Invalid read of size 1
==1409798==    at 0x483FEF0: strcmp (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1409798==    by 0x90A999E: _XimUnRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x9090892: XUnregisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x90A9866: _XimRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x909080C: XRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x5A392F2: TkpOpenDisplay (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0701: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0567: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0F4E: TkCreateMainWindow (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59ABA1D: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59AB40C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A349C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==  Address 0xf205dc1 is 1 bytes inside a block of size 9 free'd
==1409798==    at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1409798==    by 0x909FB3F: XSetLocaleModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x5A39ACA: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x5A39A4F: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x90A9866: _XimRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x909080C: XRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x5A392F2: TkpOpenDisplay (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0701: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0567: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0F4E: TkCreateMainWindow (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59ABA1D: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59AB40C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==  Block was alloc'd at
==1409798==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1409798==    by 0x909F756: _XlcDefaultMapModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x909FB2A: XSetLocaleModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798==    by 0x5A39ACA: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x5A392D9: TkpOpenDisplay (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0701: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0567: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A0F4E: TkCreateMainWindow (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59ABA1D: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59AB40C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x59A349C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798==    by 0x129B98: gui_setup (attach.c:333)

starseeker (Mar 08 2022 at 20:57):

Memory is being allocated at if_X24.c:2065, from a size calculation at if_X24.c:1997

starseeker (Mar 08 2022 at 20:58):

starseeker (Mar 08 2022 at 21:00):

starseeker (Mar 08 2022 at 21:04):

starseeker (Mar 08 2022 at 21:06):

ATTACHING ogl (X Windows with OpenGL graphics)
mged> X Error of failed request:  GLXBadDrawable
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  1252
  Current serial number in output stream:  1252

starseeker (Mar 08 2022 at 21:26):

Also seems to be specific to the first attach - if I attach multiple windows and close the second, I can attach a new one successfully.

starseeker (Mar 08 2022 at 21:27):

starseeker (Mar 08 2022 at 21:30):

None of dm_close, fb_close nor fb_close_existing gets triggered when the window closes.

starseeker (Mar 08 2022 at 21:38):

Oof. The first thing that comes to mind is to have the dm_open command bind some Tcl command that will call dm_close to the Tk <Destroy> event....

Erik (Mar 09 2022 at 14:32):

Sean (Mar 09 2022 at 14:50):

yeah, I noticed that too. I think that may also be new, but more concerning is the crash. The 20480x20480 change was made back in 7.16.0 and just testing a 7.24 version, it doesn't appear to explode memory use and seems to release when windows are closed. I reduced the number down to 8096x8096 anyways, but some other change is likely involved, and I think the crash is definitely new.
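
As a rough back-of-the-envelope check on those sizes, here is a minimal sketch of the arithmetic. `fb_buffer_bytes` is a hypothetical helper, and the 4 bytes per pixel used in the note below is an assumption; the real value depends on how if_X24.c sizes its buffer.

```c
#include <stdint.h>

/* Bytes needed for a width x height pixel buffer.
 * bytes_per_pixel is an assumption supplied by the caller. */
uint64_t
fb_buffer_bytes(uint64_t width, uint64_t height, uint64_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

With 4 bytes per pixel, 20480x20480 comes to 1,677,721,600 bytes (about 1.56 GiB), which is in the same range as the roughly 1.5GB per attach reported earlier, while 8096x8096 comes to 262,180,864 bytes (about 250 MiB).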

Sean (Mar 09 2022 at 14:56):

I think this is at the heart of the issue, but I'm not yet grokking that stack trace.. will try to catch it on Mac to see if it gives a different path or at least more complete symbols -- looks like your build isn't enable-all'd.

starseeker (Mar 09 2022 at 17:45):

@Sean I may have messed up the dm bookkeeping in MGED at some point - my recollection of MGED's management of those can be summed up as "messy", so it's actually quite likely I messed up somewhere. I'll see if I can tease a 7.24 build into working and try to figure out how the fb memory got freed...

starseeker (Mar 09 2022 at 17:52):

One likely culprit of the increased memory usage may be my attempt to set up things so each dm has a built-in embedded fb by default. Don't know if that's behind the window crash but it's likely why the dm's are suddenly taking up memory they didn't previously

starseeker (Mar 09 2022 at 18:29):

OK, got 7.24.4 building - here's what I'm seeing so far (I'm getting X11 windows and the first ogl window, but the second wipes out):

BRL-CAD Release 7.24.4  Geometry Editor (MGED)
    Wed, 09 Mar 2022 13:25:05 -0500, Compilation 0
    cyapp@ubuntu2019

attach (nu|txt|X|ogl)[nu]?
mged> attach X
ATTACHING X (X Window System (X11))
mged> attach X
ATTACHING X (X Window System (X11))
mged> attach ogl
ATTACHING ogl (X Windows with OpenGL graphics)
mged> attach ogl
ATTACHING ogl (X Windows with OpenGL graphics)
mged> X Error of failed request:  GLXBadDrawable
  Major opcode of failed request:  151 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  1181
  Current serial number in output stream:  1181

starseeker (Mar 09 2022 at 18:43):

Huh - now I'm seeing the exact same thing with latest main, fwiw. I can't get 7.22.0 to build easily - will probably need to set up a VM if I need to go that far back.

starseeker (Mar 09 2022 at 18:48):

(Oh - should make clear I'm closing each window above before proceeding to the next attach)

starseeker (Mar 09 2022 at 18:53):

Aaaaand now I can't get the X attach to reproduce the failure, even with the fb memory set large... what on earth...

starseeker (Mar 09 2022 at 22:03):

OK, per recent discussion, drawing in the second "attach X" does indeed crash in latest main.

starseeker (Mar 09 2022 at 22:16):

starseeker (Mar 09 2022 at 22:18):

starseeker (Mar 09 2022 at 22:24):

Sean (Mar 12 2022 at 21:01):

Okay, I swear I'd tested 7.24 and it worked, but it's bombing for me too. I documented it.

starseeker (Mar 14 2022 at 18:54):

Sean (Mar 21 2022 at 16:05):

Just FYI, I'm working on the build Action testing issues from the recent materials merge.

Himanshu (Mar 22 2022 at 04:59):

Himanshu (Mar 22 2022 at 12:37):

/me sometimes wonders why msvc shows a weird "file not found" message when the file is still there. Now builds fine.

Sean (Mar 22 2022 at 22:26):

@Himanshu Sekhar Nayak hm, don't know what to say about that other than it helps to turn up the compilation verbosity (under Options -> Project and Solutions -> Build and Run). I typically set output to Normal and log to Detailed. That way, I can get to what exactly happened if needed.

Sean (Apr 05 2022 at 19:07):

One of the unexpected side effects of the plugin changes is frequently running into runtime crashes now whenever something changes outside the plugin dll/so/dylib that is incompatible with whatever's going on inside the plugin (as it does not appear to automatically recompile). At least that seems to be what's going on. For example, just pulled latest view changes, compiled, and then all tools exhibit hard corruption, assert failures, bu_bombing, etc. Cleaning and recompiling is apparently more often than not necessary now. Rather unexpected and unintuitive that it's not updating/recompiling the plugins. Maybe some dependencies aren't listed correctly?

Sean (Apr 05 2022 at 19:09):

Also working with a student on a hard database I/O corruption situation that seems to be new. Any database creation on his system is resulting in corrupted .g files. Others with the same setup, same msvc, etc. are not experiencing the corruption. It appears to have just started in the past two weeks.

starseeker (Apr 05 2022 at 19:19):

That is unexpected - I would have figured the logic would rebuild anything that would result in such a pronounced failure.

starseeker (Apr 05 2022 at 19:19):

@Sean I'll switch to working in a branch for this, so I don't keep disrupting everyone else.

starseeker (Apr 05 2022 at 19:22):

It's probable I wouldn't see that breakage mode myself, as my normal MO is to clear and rebuild.

Sean (Apr 14 2022 at 17:52):

@starseeker I’m away from a computer to test, but getting multiple reports that mged is busted in recent updates, draw not working. Can you or someone else check?

starseeker (Apr 14 2022 at 18:11):

Erik (May 11 2022 at 21:43):

Sean (May 11 2022 at 21:44):

No that’s my doing. The test is detecting a change due to new material object management and I need to resolve it.

Sean (May 11 2022 at 21:44):

Erik (May 11 2022 at 22:03):

and that durn kryptonite slips in... :) I was mucking with converting jenkins to a pipeline (can be dropped into the repo as /Jenkinsfile and revision controlled)

Sean (May 12 2022 at 17:19):

starseeker (May 12 2022 at 18:13):

@Daniel Rossberg any chance we could wire up your cubes examples as unit/regression tests to make sure the gqa behavior stays correct in the future?

Sean (May 12 2022 at 18:16):

I was just looking at that PR too. Very interesting! Does it still interleave as resolution doubles? That is one of the current features, no ray is shot twice -- it (is supposed to) refines the gaps in-between recursively without ever reshooting the same ray.
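
To illustrate the interleaving idea in one dimension, here is a minimal sketch. It is not gqa's code: `refine_interleaved` and the `shoot_fn` callback are hypothetical names, and the grid is simplified to the unit interval. Each refinement halves the spacing and shoots only at the new in-between positions, so no position is visited twice.

```c
#include <stddef.h>

/* Hypothetical callback that shoots one ray at position pos. */
typedef void (*shoot_fn)(double pos, void *ctx);

/* Refine a 1-D grid over [0,1] `levels` times, shooting only at
 * positions not visited on a coarser level. Returns the total number
 * of shots. shoot may be NULL to just count. */
size_t
refine_interleaved(int levels, shoot_fn shoot, void *ctx)
{
    size_t shots = 0;
    int level;

    /* Level 0: the two end points. */
    if (shoot) {
        shoot(0.0, ctx);
        shoot(1.0, ctx);
    }
    shots += 2;

    for (level = 1; level <= levels; level++) {
        size_t intervals = (size_t)1 << level;
        size_t i;

        /* Even indices coincide with points from the previous level,
         * so only odd indices are new. */
        for (i = 1; i < intervals; i += 2) {
            if (shoot)
                shoot((double)i / (double)intervals, ctx);
            shots++;
        }
    }
    return shots;
}
```

After L refinements the total is 2^L + 1 shots, exactly the number of points on the finest grid, whereas reshooting every level from scratch would roughly double the work.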

Daniel Rossberg (May 13 2022 at 11:46):

BTW, that's why I made a PR and didn't commit it directly: To give it a better review and discuss it first.

Daniel Rossberg (May 13 2022 at 11:49):

I'll look into this and see how much effort it would be. Unfortunately, the result of gqa is a print-out, which has to be interpreted first.

Sean (May 13 2022 at 16:18):

@Daniel Rossberg even if it reshoots, correct is obviously more important than performance. I was just more wondering if that behavior changed (and the potential effect as the grid size continues to double, if half the rays are repeat work each level)

Daniel Rossberg (May 13 2022 at 18:00):

It didn't reshoot, but it also didn't reuse the old ray traces. I changed it back to the old grid generation.

The main fault was that in lines 1003-1005 the grid sizes were recomputed with the wrong number of steps (state->steps instead of state->steps-1).

The next improvement was to use gridSpacing there too. With every refinement the "old" moments have to be reduced, and it's a problem if they were computed with one value and readjusted based on a different one.
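
A minimal sketch of the off-by-one, under the assumption that the step count is the number of sample positions along an axis, so that steps samples leave steps-1 gaps between them. `grid_spacing` is a hypothetical helper, not the gqa code at the lines mentioned.

```c
/* Spacing between adjacent samples when `steps` samples span `span`
 * end to end. There are steps-1 gaps, so dividing by steps would make
 * the grid slightly too fine and fall short of the far edge. */
double
grid_spacing(double span, int steps)
{
    if (steps < 2)
        return span; /* a single sample has no neighbor to be spaced from */
    return span / (double)(steps - 1);
}
```

For example, 11 samples across a span of 10 units sit exactly 1 unit apart, while dividing by 11 instead would give a spacing of about 0.909.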

Sean (Jun 02 2022 at 16:45):

@Daniel Rossberg thank you for that detail! really helps to understand what's going on there. that's awesome that you caught that off-by-one bug... would take me quite a while to fully re-understand what is going on in there, so glad you figured out what was wrong. :)

starseeker (Jul 11 2022 at 03:03):

Sean (Jul 11 2022 at 06:48):

Thanks @starseeker and sorry, should be fixed now! I hadn't cycled back to mac or linux yet as I was really trying to immerse in a windows dev workflow as much as possible last week so I could address categoric issues from that side I'm seeing in our stig listings. Took a heck of a lot longer than expected to get things off the ground (still not done, but putting a thumbtack in it for now).

Sean (Jul 11 2022 at 13:37):

thanks for clearing the last two. was waiting for the scan to see what else was left and you'd fixed it before I got to see the next (as it's building for me locally clean)

Sean (Jul 11 2022 at 13:39):

let me know if bio.h causes a problem; might get away with these vanilla environments, but I suspect that'll need to be handled differently to be fully portable

starseeker (Aug 02 2022 at 13:00):

/brlcad/src/conv/off/off-g.c: In function ‘off2nmg’:
/brlcad/src/conv/off/off-g.c:208:39: error: ‘%s’ directive output may be truncated writing up to 63 bytes into a region of size 62 [-Werror=format-truncation=]
  208 |     snprintf(sname, sizeof(sname), "s.%s", title);
      |                                       ^~   ~~~~~
/brlcad/src/conv/off/off-g.c:208:5: note: ‘snprintf’ output between 3 and 66 bytes into a destination of size 64
  208 |     snprintf(sname, sizeof(sname), "s.%s", title);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/brlcad/src/conv/off/off-g.c:209:39: error: ‘%s’ directive output may be truncated writing up to 63 bytes into a region of size 62 [-Werror=format-truncation=]
  209 |     snprintf(rname, sizeof(sname), "r.%s", title);
      |                                       ^~   ~~~~~
/brlcad/src/conv/off/off-g.c:209:5: note: ‘snprintf’ output between 3 and 66 bytes into a destination of size 64
  209 |     snprintf(rname, sizeof(sname), "r.%s", title);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
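
The note in that output says a 2-character prefix plus a title of up to 63 characters can need 66 bytes, two more than the 64-byte destination. Here is a minimal sketch of one way to handle it, by checking snprintf's return value and sizing the destination for the prefix. `make_name` and `TITLE_LEN` are hypothetical names, and this is not necessarily the fix that was committed. The log also shows the second call passing sizeof(sname) while writing into rname, which is worth a look on its own.

```c
#include <stdio.h>
#include <string.h>

#define TITLE_LEN 64 /* assumed size of the title buffer */

/* Build "<prefix><title>" into dst. Returns 0 on success, or -1 if the
 * result did not fit (dst then holds a truncated, NUL-terminated string). */
int
make_name(char *dst, size_t dst_size, const char *prefix, const char *title)
{
    int n = snprintf(dst, dst_size, "%s%s", prefix, title);

    /* snprintf returns the length it wanted to write, excluding the NUL. */
    if (n < 0 || (size_t)n >= dst_size)
        return -1;
    return 0;
}
```

Declaring the destination as `char sname[TITLE_LEN + 2]` leaves room for the "s." prefix, so the longest possible title still fits.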

Sean (Aug 02 2022 at 13:26):

Sean (Aug 02 2022 at 13:33):

That's a fun one .. fixing one issue let it detect another underlying. Commit fix pushed.

Sean (Aug 02 2022 at 13:34):

I now appear to have a full build, so I'm going to let the gitlab folks know we're good to go. Hopefully we can stay stable until the end of this week for their demo.

starseeker (Aug 02 2022 at 22:08):

starseeker (Aug 02 2022 at 22:56):

starseeker (Aug 03 2022 at 23:56):

Sean (Aug 04 2022 at 01:32):

Sean (Aug 04 2022 at 14:33):

Files known to Git are not accounted for in build logic:
doc/docbook/resources/brlcad/CMakeLists.txt
doc/docbook/resources/brlcad/brlcad-article-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-article-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-book-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-book-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-common.xsl.in
doc/docbook/resources/brlcad/brlcad-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-fonts.xsl.in
doc/docbook/resources/brlcad/brlcad-gendata.xsl
doc/docbook/resources/brlcad/brlcad-lesson-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-lesson-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-man-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-man-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-man-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-presentation-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-presentation-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-specification-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-specification-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-xhtml-header-navigation.xsl
doc/docbook/resources/brlcad/brlcad-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/center-table-print.xsl
doc/docbook/resources/brlcad/images/brlcad-logo-669966.svg
doc/docbook/resources/brlcad/images/brlcad-logo-6699cc.svg
doc/docbook/resources/brlcad/images/brlcad-logo-blue.svg
doc/docbook/resources/brlcad/images/brlcad-logo-cc6666.svg
doc/docbook/resources/brlcad/images/brlcad-logo-cc9966.svg
doc/docbook/resources/brlcad/images/brlcad-logo-green.svg
doc/docbook/resources/brlcad/images/brlcad-logo-limegreen.svg
doc/docbook/resources/brlcad/images/brlcad-logo-red.svg
doc/docbook/resources/brlcad/images/logo-vm-gears.png
doc/docbook/resources/brlcad/images/logo-vm-gears.svg
doc/docbook/resources/brlcad/presentation.xsl.in
doc/docbook/resources/brlcad/tutorial-cover-template.xsl.in
doc/docbook/resources/brlcad/tutorial-template.xsl.in
doc/docbook/resources/brlcad/wordpress.xsl.in

Files mentioned in build logic are not checked into the repository:
doc/docbook/resourcesCMakeLists.txt
doc/docbook/resourcesbrlcad-article-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-article-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-book-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-book-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-common.xsl.in
doc/docbook/resourcesbrlcad-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-fonts.xsl.in
doc/docbook/resourcesbrlcad-gendata.xsl
doc/docbook/resourcesbrlcad-lesson-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-lesson-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-man-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-man-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-man-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-presentation-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-presentation-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-specification-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-specification-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-xhtml-header-navigation.xsl
doc/docbook/resourcesbrlcad-xhtml-stylesheet.xsl.in
doc/docbook/resourcescenter-table-print.xsl
doc/docbook/resourcesimages/brlcad-logo-669966.svg
doc/docbook/resourcesimages/brlcad-logo-6699cc.svg
doc/docbook/resourcesimages/brlcad-logo-blue.svg
doc/docbook/resourcesimages/brlcad-logo-cc6666.svg
doc/docbook/resourcesimages/brlcad-logo-cc9966.svg
doc/docbook/resourcesimages/brlcad-logo-green.svg
doc/docbook/resourcesimages/brlcad-logo-limegreen.svg
doc/docbook/resourcesimages/brlcad-logo-red.svg
doc/docbook/resourcesimages/logo-vm-gears.png
doc/docbook/resourcesimages/logo-vm-gears.svg
doc/docbook/resourcespresentation.xsl.in
doc/docbook/resourcestutorial-cover-template.xsl.in
doc/docbook/resourcestutorial-template.xsl.in
doc/docbook/resourceswordpress.xsl.in

CMake Error at CMakeTmp/distcheck_repo_verify.cmake:228 (message):
ERROR: Distcheck cannot proceed until build files and repo are in sync (set
-DFORCE_DISTCHECK=ON to override)

Sean (Aug 04 2022 at 19:31):

pushed the fix last night, should be good to go. Possibly related, I'm seeing two "Attempt to add a custom rule to output" cmake error messages on libnetpbm.a.rule and libgdal.a.rule
Any ideas?

starseeker (Aug 05 2022 at 13:23):

Not offhand - the logic doing that management is src/other/ext/CMake/ExternalProject_Target.cmake:442 - it in turn uses the fcfgcpy function which defines custom rules

starseeker (Aug 05 2022 at 13:26):

You could try some message statements in those functions to see if you can bracket where that error is being generated

Sean (Sep 15 2022 at 03:13):

Sean (Sep 16 2022 at 00:59):

Sean (Apr 24 2023 at 15:23):

@Christopher looks like a few dirs are missing from the latest commit? (fbx, dxf, pbrt in regress/gcv)

Christopher (Apr 24 2023 at 15:28):

GregoryLi (May 18 2023 at 08:55):

It seems we have some problems with the brep command. ged_brep_core used to receive four arguments. Now we only get two.
image.png

GregoryLi (May 18 2023 at 08:56):

Daniel Rossberg (May 18 2023 at 14:25):

Just tested with a clean build of current brlcad on Linux. I got "arb8.s.brep is made." Can you repeat your test with a clean build from scratch?

GregoryLi (May 19 2023 at 03:47):

Sean (May 19 2023 at 20:30):

libged commands are loaded dynamically (as dynamic libs) and for some reason they don't always rebuild when a file has been edited despite having dependencies set in cmake (or perhaps one is missing).

Sean (May 19 2023 at 20:32):

so if anyone edits a header, especially a structure, they all need to be rebuilt and that doesn't always happen automatically. would be great if someone could make that not be a problem, but currently I make sure to delete the libged and libdm libs at a minimum so they're rebuilt.
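
To make the failure mode concrete, here is a minimal sketch of what a stale plugin sees after a struct changes. The structs `view_v1` and `view_v2` and the function `stale_read_height` are invented for illustration and do not correspond to real BRL-CAD types. The plugin was compiled against the old layout, so it still reads a field at the old byte offset.

```c
#include <stddef.h>
#include <string.h>

/* Layout the plugin was compiled against (hypothetical). */
struct view_v1 {
    int width;
    int height;
};

/* Layout after a header edit adds a field ahead of the others. */
struct view_v2 {
    int flags;
    int width;
    int height;
};

/* What a plugin built against view_v1 effectively does when handed a
 * view_v2: it fetches `height` from the v1 offset and gets whatever
 * now lives there. */
int
stale_read_height(const struct view_v2 *v)
{
    int value;

    memcpy(&value, (const char *)v + offsetof(struct view_v1, height),
           sizeof(value));
    return value;
}
```

Nothing fails at load time because the symbol names are unchanged; the plugin simply gets the wrong field, which would be consistent with the corruption and assert failures described above.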

GregoryLi (Oct 03 2023 at 08:57):

Hi, I just pulled the newest code and found I can't open a .g database.
image.png

GregoryLi (Oct 03 2023 at 08:58):

GregoryLi (Oct 03 2023 at 09:18):

It's quite strange... For me, the problem has existed for many commits (it was already there before Sep 13), and it worked well on Aug 27. Does anyone else have this problem? Do I need to use the git bisect command to locate it?

starseeker (Oct 03 2023 at 15:31):

That might need a bisect - it's probably related to the work I did with the open/opendb GED command work.

starseeker (Oct 03 2023 at 15:32):

A naive guess is that I didn't change something from open to opendb, but it could be something else.

GregoryLi (Oct 04 2023 at 01:25):

I just located the error using bisect. a7bba28a948a1939e53ab224fdc4e4a381cddb23 is the first bad commit.

starseeker (Oct 04 2023 at 02:20):

@GregoryLi OK, that confirms somewhere in the Archer startup stack we're calling "open" where we should be calling "opendb"

starseeker (Oct 08 2023 at 01:50):

GregoryLi (Oct 09 2023 at 00:29):

starseeker (Nov 04 2023 at 16:38):

starseeker (Nov 04 2023 at 22:26):

I don't believe this - facetizing tor with a tolerance of r=0.0001 is causing an nmg_mdl_to_bot failure just on the mac, which seems to be why the lod drawing test is failing.

Sean (Oct 08 2024 at 16:28):

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001018c4884 libOpenNURBS.dylib`ON_Object::IsKindOf(ON_ClassId const*) const + 20
    frame #1: 0x00000001017aeebc libOpenNURBS.dylib`ON_Geometry::Cast(ON_Object*) + 32
    frame #2: 0x00000001007be114 librt.20.dylib`brep_dbi2on(rt_db_internal const*, ONX_Model&) + 176
    frame #3: 0x00000001007be560 librt.20.dylib`rt_brep_export5 + 168
    frame #4: 0x0000000100809088 librt.20.dylib`rt_generic_xform + 340
    frame #5: 0x000000010003bdec mged`vls_solid + 168
    frame #6: 0x000000010004c618 mged`refresh + 1040
    frame #7: 0x0000000100049988 mged`main + 7080
    frame #8: 0x000000019234ff28 dyld`start + 2236

Sean (Oct 08 2024 at 16:46):

M
M $args
M 1 0 0
adc $args
adc draw
ae
aip f
attach
center
draw $esol_control($id,name)
has_embedded_fb
ill
ill -e -i $ri $spath
ill -e -i 1 $path
ill -e -i 1 [lindex $spath_and_pos 0
ill -e -n -i $ri $spath
ill -i 1 [lindex $paths 0
ill -i 1 \$mged_gui($id,mgs_path)
in $mged_gui($id,solid_name) dsp f \
keep
keep db_glob
ls -c
ls -r
make $mged_gui($id,solid_name) $type} msg
make_name $mged_default(solid_name_fmt)} name
make_name comb@\
matpick
matpick $item
matpick -n $path_pos
matpick -n \$item
matpick [lindex $spath_and_pos 1
nirt $args
opendb
pl
postscript
press
press oill
press reject
press reset
press sill
qray basename
qray echo
qray effects
qray evencolor
qray fmt f
qray fmt g
qray fmt h
qray fmt m
qray fmt o
qray fmt p
qray fmt r
qray oddcolor
qray overlapcolor
qray script
qray voidcolor
quit
rset grid anchor
rt
saveview
sed $mged_gui($id,solid_name)}
sed -i 1 $item
sed -i 1 $spath
size
size $size
status state
tie
tie $id
tie $id $mged_gui($id,active_dm)
tree
tree $args} result
units $mged_display(units)
view
view center
view size
view_ring
view_ring next
view_ring prev
view_ring toggle
who
who phony
x -1
x -2

Sean (Oct 08 2024 at 16:53):

starseeker (Oct 08 2024 at 17:10):

1738 const ON_ClassId* p = ClassId();
(gdb) print *this
$5 = {_vptr.ON_Object = 0x5d00000032, static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {
static m_p0 = 0x7ffff3764e00 <ON_3dmObjectAttributes::m_ON_3dmObjectAttributes_class_rtti>,
static m_p1 = 0x7ffff37801e0 <ON_RdkUserData::m_ON_RdkUserData_class_rtti>, static m_mark0 = 0,
m_pNext = 0x7ffff376ed20 <ON_HistoryRecord::m_ON_HistoryRecord_class_rtti>, m_pBaseClassId = 0x0, m_sClassName = "ON_Object", '\000' <repeats 70 times>,
m_sBaseClassName = "0", '\000' <repeats 78 times>, m_create = 0x0, m_uuid = {Data1 = 1622531005, Data2 = 58976, Data3 = 4563,
Data4 = "\277\344\000\020\203\001", <incomplete sequence \360>}, m_mark = -2147483648, m_class_id_version = 0, m_f1 = 0x0, m_f2 = 0x0, m_f3 = 0x0, m_f4 = 0x0,
m_f5 = 0x0, m_f6 = 0x0, m_f7 = 0x0, m_f8 = 0x0}, m_userdata_list = 0x200000003a}

starseeker (Oct 08 2024 at 17:11):

#define ON_VIRTUAL_OBJECT_IMPLEMENT( cls, basecls, uuid ) \
void* cls::m_s_##cls##_ptr = nullptr; \
const ON_ClassId cls::m_##cls##_class_rtti(#cls,#basecls,0,uuid);\
cls * cls::Cast( ON_Object* p) {return(p&&p->IsKindOf(&cls::m_##cls##_class_rtti))?static_cast< cls *>(p):nullptr;} \
const cls * cls::Cast( const ON_Object* p) {return(p&&p->IsKindOf(&cls::m_##cls##_class_rtti))?static_cast<const cls *>(p):nullptr;} \
const ON_ClassId* cls::ClassId() const {return &cls::m_##cls##_class_rtti;} \
bool cls::CopyFrom(const ON_Object*) {return false;} \
cls * cls::Duplicate() const {return static_cast< cls *>(this->Internal_DeepCopy());} \
ON_Object* cls::Internal_DeepCopy() const {return nullptr;}

starseeker (Oct 08 2024 at 17:16):

(gdb) print *bi->brep
$6 = {<ON_Geometry> = {<ON_Object> = {_vptr.ON_Object = 0x5d00000032, static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {

starseeker (Oct 08 2024 at 18:14):

(gdb) print *this
$3 = {_vptr.ON_Object = 0x7ffff372ed10 <vtable for ON_Brep+16>, static m_s_ON_Object_ptr = 0x0,

starseeker (Oct 08 2024 at 18:14):

(gdb) print *this
$4 = {_vptr.ON_Object = 0xc00000004, static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {

starseeker (Oct 08 2024 at 18:21):

#0 brep_dbi2on (intern=0x7fffffffd1c0, model=...) at /home/user/brlcad/src/librt/primitives/brep/brep.cpp:2321
#1 0x00007ffff75b4c82 in rt_brep_get (logstr=0x5555556a70a0, intern=0x7fffffffd1c0, attr=0x0)

(gdb) print *bi->brep
$1 = {<ON_Geometry> = {<ON_Object> = {_vptr.ON_Object = 0x7ffff372ed10 <vtable for ON_Brep+16>,
static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {

starseeker (Oct 08 2024 at 18:23):

#0 brep_dbi2on (intern=0x55555565e320 <es_int>, model=...)
at /home/user/brlcad/src/librt/primitives/brep/brep.cpp:2331
#1 0x00007ffff75b544f in rt_brep_export5 (ep=0x7fffffffd1a0, ip=0x55555565e320 <es_int>, UNUSED_local2mm=1,

$3 = {<ON_Geometry> = {<ON_Object> = {_vptr.ON_Object = 0x2c00000030, static m_s_ON_Object_ptr = 0x0,
static m_ON_Object_class_rtti = {

starseeker (Oct 08 2024 at 18:36):

==690562== Invalid read of size 8
==690562== at 0x9569A20: ON_Object::IsKindOf(ON_ClassId const*) const (opennurbs_object.cpp:1738)
==690562== by 0x93E6722: ON_Geometry::Cast(ON_Object*) (opennurbs_geometry.cpp:24)
==690562== by 0x4CD2A99: brep_dbi2on(rt_db_internal const*, ONX_Model&) (brep.cpp:2345)
==690562== by 0x4CD344E: rt_brep_export5 (brep.cpp:2422)
==690562== by 0x4DB07D5: rt_generic_xform (generic.c:85)
==690562== by 0x4F7B733: rt_matrix_transform (transform.c:39)
==690562== by 0x16D989: transform_editing_solid (edsol.c:2712)
==690562== by 0x19247E: vls_solid (edsol.c:7349)
==690562== by 0x1D9835: create_text_overlay (titles.c:89)
==690562== by 0x1B9195: refresh (mged.c:2316)
==690562== by 0x1B72B0: main (mged.c:1695)
==690562== Address 0x13c29970 is 784 bytes inside an unallocated block of size 2,432 in arena "client"
==690562==
==690562== Invalid read of size 8
==690562== at 0x9569A23: ON_Object::IsKindOf(ON_ClassId const*) const (opennurbs_object.cpp:1738)
==690562== by 0x93E6722: ON_Geometry::Cast(ON_Object*) (opennurbs_geometry.cpp:24)
==690562== by 0x4CD2A99: brep_dbi2on(rt_db_internal const*, ONX_Model&) (brep.cpp:2345)
==690562== by 0x4CD344E: rt_brep_export5 (brep.cpp:2422)
==690562== by 0x4DB07D5: rt_generic_xform (generic.c:85)
==690562== by 0x4F7B733: rt_matrix_transform (transform.c:39)
==690562== by 0x16D989: transform_editing_solid (edsol.c:2712)
==690562== by 0x19247E: vls_solid (edsol.c:7349)
==690562== by 0x1D9835: create_text_overlay (titles.c:89)
==690562== by 0x1B9195: refresh (mged.c:2316)
==690562== by 0x1B72B0: main (mged.c:1695)
==690562== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==690562==
==690562==
==690562== Process terminating with default action of signal 11 (SIGSEGV)
==690562== Access not within mapped region at address 0x0
==690562== at 0x9569A23: ON_Object::IsKindOf(ON_ClassId const*) const (opennurbs_object.cpp:1738)
==690562== by 0x93E6722: ON_Geometry::Cast(ON_Object*) (opennurbs_geometry.cpp:24)
==690562== by 0x4CD2A99: brep_dbi2on(rt_db_internal const*, ONX_Model&) (brep.cpp:2345)
==690562== by 0x4CD344E: rt_brep_export5 (brep.cpp:2422)
==690562== by 0x4DB07D5: rt_generic_xform (generic.c:85)
==690562== by 0x4F7B733: rt_matrix_transform (transform.c:39)
==690562== by 0x16D989: transform_editing_solid (edsol.c:2712)
==690562== by 0x19247E: vls_solid (edsol.c:7349)
==690562== by 0x1D9835: create_text_overlay (titles.c:89)
==690562== by 0x1B9195: refresh (mged.c:2316)
==690562== by 0x1B72B0: main (mged.c:1695)

starseeker (Oct 08 2024 at 20:33):

@Sean I might have fixed it - let me know if the latest commit works for you. (I didn't put a NEWS item in yet, want more confirmation than just "works on my box" for this sucker...)

Sean (Oct 20 2024 at 05:54):

morrison@Miniagua TCL_BLD-build % make

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -c -I"." -I/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/unix -I/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic -I/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/libtommath -O2 -pipe  -I/Volumes/X10/brlcad/.build/bext_output/install/include  -Wall -Wpointer-arith -fno-common -DBUILD_tcl -DPACKAGE_NAME=\"tcl\" -DPACKAGE_TARNAME=\"tcl\" -DPACKAGE_VERSION=\"8.6\" -DPACKAGE_STRING=\"tcl\ 8.6\" -DPACKAGE_BUGREPORT=\"\" -DNO_DIRENT_H=1 -DNO_VALUES_H=1 -DNO_STDLIB_H=1 -DNO_STRING_H=1 -DNO_SYS_WAIT_H=1 -DNO_DLFCN_H=1 -DUSE_THREAD_ALLOC=1 -D_REENTRANT=1 -D_THREAD_SAFE=1 -DHAVE_PTHREAD_ATTR_SETSTACKSIZE=1 -DHAVE_PTHREAD_ATFORK=1 -DTCL_THREADS=1 -DTCL_CFGVAL_ENCODING=\"iso8859-1\" -DHAVE_ZLIB=1 -DMODULE_SCOPE=extern\ __attribute__\(\(__visibility__\(\"hidden\"\)\)\) -DHAVE_HIDDEN=1 -DMAC_OSX_TCL=1 -DHAVE_CAST_TO_UNION=1 -DHAVE_VFORK=1 -DHAVE_POSIX_SPAWNP=1 -DHAVE_POSIX_SPAWN_FILE_ACTIONS_ADDDUP2=1 -DHAVE_POSIX_SPAWNATTR_SETFLAGS=1 -DTCL_SHLIB_EXT=\".dylib\" -DNDEBUG=1 -DTCL_CFG_OPTIMIZED=1 -DTCL_TOMMATH=1 -DMP_PREC=4 -DTCL_WIDE_INT_IS_LONG=1 -DWORDS_BIGENDIAN=1 -DHAVE_GETCWD=1 -DHAVE_MKSTEMP=1 -DHAVE_OPENDIR=1 -DHAVE_STRTOL=1 -DHAVE_WAITPID=1 -DHAVE_GETNAMEINFO=1 -DHAVE_GETADDRINFO=1 -DHAVE_FREEADDRINFO=1 -DHAVE_GAI_STRERROR=1 -DNEED_FAKE_RFC2553=1 -DHAVE_MTSAFE_GETHOSTBYNAME=1 -DHAVE_MTSAFE_GETHOSTBYADDR=1 -DNO_FD_SET=1 -DHAVE_GMTIME_R=1 -DHAVE_LOCALTIME_R=1 -DHAVE_MKTIME=1 -Dmode_t=int -Dpid_t=int -Dsize_t=unsigned -Duid_t=int -Dgid_t=int -Dsocklen_t=int -DNO_UNION_WAIT=1 -DGETTOD_NOT_DECLARED=1 -DHAVE_SIGNED_CHAR=1 -DHAVE_PUTENV_THAT_COPIES=1 -DHAVE_CHFLAGS=1 -DHAVE_MKSTEMPS=1 -DNO_ISNAN=1 -DHAVE_GETATTRLIST=1 -DHAVE_COPYFILE=1 -DTCL_DEFAULT_ENCODING=\"utf-8\" -DTCL_LOAD_FROM_MEMORY=1 -DTCL_WIDE_CLICKS=1 -DTCL_UNLOAD_DLLS=1     -DSTATIC_BUILD -fno-lto 
/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclStubLib.c

In file included from /Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclStubLib.c:14:

In file included from /Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclInt.h:36:

In file included from /Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclPort.h:23:
/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/unix/tclUnixPort.h:32:10: fatal error: 'errno.h' file not found

#include <errno.h>

         ^~~~~~~~~

1 error generated.

make: *** [tclStubLib.o] Error 1

starseeker (Oct 21 2024 at 13:15):

Sean (Oct 21 2024 at 16:47):

@starseeker It most certainly does and always has. Nothing on the system has changed. Debug build worked just fine. Just the default build is dying on that error during Tcl's bext build.

Sean (Oct 21 2024 at 16:49):

Only thing I can see is all the -DNO_*_H=1 flags also look wrong, like something is wrong during/after tcl's configure phase.
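
To see at a glance which headers Tcl's configure step decided were missing, the `-DNO_*_H=1` defines can be pulled out of the compile line. This is only a minimal sketch: `no_header_flags` is a hypothetical helper name, and the sample input uses a handful of the flags from the failing command above rather than the full line.

```shell
# Hypothetical helper: read a compile command on stdin and print each
# -DNO_<HEADER>_H=1 define on its own line.
no_header_flags() {
    # Split on spaces so every flag sits on its own line, then keep only
    # the "header not found" defines (names ending in _H).
    tr ' ' '\n' | grep -E '^-DNO_[A-Z0-9_]+_H=1$'
}

# Example with a few flags copied from the failing compile line.
printf '%s\n' 'cc -c -O2 -DNO_DIRENT_H=1 -DHAVE_ZLIB=1 -DNO_STDLIB_H=1 -DNO_FD_SET=1 tclStubLib.c' \
    | no_header_flags
# prints:
# -DNO_DIRENT_H=1
# -DNO_STDLIB_H=1
```

If basic headers such as `stdlib.h` and `string.h` show up in that list, it suggests the configure-time header checks themselves failed, which would be consistent with the later `errno.h` error.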

Sean (Oct 21 2024 at 16:49):

Sean (Oct 21 2024 at 16:50):

release and debug builds both seem to have worked, but I've not deleted them to check from scratch as I'm working on something else and the default build just surprised me that it's failing basic setup.

Sean (Jan 14 2025 at 20:21):

@starseeker you see the rtwiz failures? Looks like it's rendering with a different/higher opacity, less transparency. Any ideas?

Christopher (Jan 14 2025 at 20:26):

Sean (Jan 15 2025 at 20:44):

@Christopher when I ran make regress on a Debug build on Mac, I got two failures:

morrison@Miniagua .build % ctest -R rtwiz
Test project /Users/morrison/brlcad.main/.build
    Start 1076: regress-rtwiz_m35_A
1/8 Test #1076: regress-rtwiz_m35_A .................   Passed    2.45 sec
    Start 1077: regress-rtwiz_m35_B
2/8 Test #1077: regress-rtwiz_m35_B .................   Passed    1.67 sec
    Start 1078: regress-rtwiz_m35_C
3/8 Test #1078: regress-rtwiz_m35_C .................   Passed    2.92 sec
    Start 1079: regress-rtwiz_m35_D
4/8 Test #1079: regress-rtwiz_m35_D .................   Passed    2.40 sec
    Start 1080: regress-rtwiz_m35_E
5/8 Test #1080: regress-rtwiz_m35_E .................***Failed    2.27 sec
    Start 1081: regress-rtwiz_m35_F
6/8 Test #1081: regress-rtwiz_m35_F .................***Failed    3.19 sec
    Start 1082: regress-rtwiz_m35_edge_only
7/8 Test #1082: regress-rtwiz_m35_edge_only .........   Passed    1.68 sec
    Start 1083: regress-rtwiz_m35_edge_only_color
8/8 Test #1083: regress-rtwiz_m35_edge_only_color ...   Passed    1.67 sec

Looking into the failures, the rt images are at a different opacity for some reason.
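
For longer runs, the failing test names can be filtered out of the ctest summary instead of being read off by eye. A minimal sketch, assuming the summary line layout shown above; `failed_tests` is a hypothetical helper name, not part of ctest or BRL-CAD.

```shell
# Hypothetical helper: read ctest output on stdin and print the name of
# every test whose summary line is marked ***Failed.
failed_tests() {
    # Summary lines look like:
    #   5/8 Test #1080: regress-rtwiz_m35_E .......***Failed    2.27 sec
    # so the test name is the fourth whitespace-separated field.
    awk '/\*\*\*Failed/ { print $4 }'
}

# Example using lines from the run above.
failed_tests <<'EOF'
4/8 Test #1079: regress-rtwiz_m35_D .................   Passed    2.40 sec
5/8 Test #1080: regress-rtwiz_m35_E .................***Failed    2.27 sec
6/8 Test #1081: regress-rtwiz_m35_F .................***Failed    3.19 sec
EOF
# prints:
# regress-rtwiz_m35_E
# regress-rtwiz_m35_F
```

ctest can also show the output of failing tests directly with `--output-on-failure`, and re-run only the previous failures with `--rerun-failed`.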

Christopher (Jan 15 2025 at 21:07):

@Sean Josh updated the default intensity a while back (ee673972d23b5efaa5b0d1029dd855b494dfb102). We need to update the comparison. I'll grab it in a bit

Sean (Jan 16 2025 at 04:57):

Sean (Jan 16 2025 at 04:58):

I saw that the source didn't match, but overlooked it thinking the tcl code overrode -- rtwizard's tcl code sets it to 12...

Sean (Jan 16 2025 at 04:59):

Christopher (Jan 16 2025 at 05:08):

Yes, failing on Windows too. If I revert the commit it passes - so it's definitely at play

Sean (Jan 16 2025 at 05:10):

Huh, okay.. well that's disconcerting that nobody noticed for a year. I swear I've run regression tests since then that have passed but ... something must have been out of date/wrong.

Christopher (Jan 16 2025 at 05:12):

Whoa, didn't notice the date stamp on that commit... I guess none of the runners are checking regress? Does seem suspicious that it wouldn't have cropped up before now.

Sean (Jan 16 2025 at 05:15):

I recently fully blew away that build dir, so maybe something prevented the test from failing unless it started fresh?? Def weird.

Sean (Jan 16 2025 at 05:15):

starseeker (Jan 17 2025 at 00:44):

rtwizard in particular is disabled by default, IIRC - because of how the fbserv stuff is working right now, a distcheck-full can cause trouble with multiple builds trying to run it simultaneously.

starseeker (Jan 17 2025 at 00:45):

Sean (Jan 17 2025 at 18:05):

starseeker (Jan 18 2025 at 00:08):

It's filtered out by the STAND_ALONE flag, which is filtered out by the "check" target in BRLCAD_Test_Wrappers.cmake

starseeker (Jan 18 2025 at 00:08):

Sean (Jan 18 2025 at 00:14):

starseeker (Jan 18 2025 at 00:54):

A ctest -R regress will probably pull it in, but neither regress nor check (the build targets) should

starseeker (Jan 18 2025 at 00:55):

ctest doesn't run the build targets in the build system, it's got a separate setup

Sean (Jan 19 2025 at 05:02):

I didn't originally see it via ctest. I only ran/pasted that way to show just the re-run of the rtwiz tests. Didn't want to run all of them. I guess I must have run make test and noticed there.

starseeker (Jan 19 2025 at 14:18):

Yeah, IIRC make test will run everything - that's why we ended up defining a separate make check.
