@starseeker did you happen to look at "Primitive select (mouse behavior) causes drawn solids to disappear instead of being highlighted. When hitting escape (return to normal mouse behavior) all solids reappear." ?
That rings a bell with a change Nick made a while back - I'll have to check the history, something about transparency support in MGED. Don't know if it was that specifically, but there was some sort of drawing problem...
yeah, I thought it was nick's change, but I couldn't confirm it and there was no NEWS entry. probably a set of commits deep down in my inbox I haven't gotten to reviewing yet.
@starseeker: sorry, I should have tested the process I/O change more carefully.
that is a little unsettling that it broke rt in archer on windows... do you have any more info on why? presumably fileno() is not returning 0/1/2 for stdin/err/out, which should imply something else is seriously wrong.
it is mildly concerning that the api is assuming 0/1/2 are in/err/out. not just being pedantic. it's going to be terribly difficult to debug out of context, particularly any code that happens to do perfectly normal pipe operations on 0/1/2. likely result in i/o just not working right mysteriously, things failing inconsistently.
I wouldn't be surprised if the rt-archer breakage was a reverse assumption elsewhere in the code...
on a mildly related note to your commit comment yesterday, how about just jumping to what we talked about last year -- using capnproto for commands to talk back? the protocol could be just simple err/out log messages for now, set up and handled internal to the bu_process api. then you'd only need one descriptor (e.g., stdout) which could transmit both out+err log messages from the process.
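A rough sketch of the single-descriptor idea, in Python rather than libbu C and with plain `struct` framing standing in for capnproto. The frame layout (1-byte channel tag, 4-byte length, payload) and the names `write_frame` / `read_frames` are made up for illustration; nothing here is the bu_process API.

```python
import io
import struct

# Hypothetical frame layout: 1-byte channel tag, 4-byte big-endian length, payload.
OUT, ERR = 1, 2
HEADER = struct.Struct(">BI")

def write_frame(stream, channel, text):
    """Tag a log message with its logical channel and write it to one stream."""
    payload = text.encode("utf-8")
    stream.write(HEADER.pack(channel, len(payload)) + payload)

def read_frames(stream):
    """Yield (channel, text) pairs until the stream runs out of complete headers."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        channel, length = HEADER.unpack(header)
        yield channel, stream.read(length).decode("utf-8")

def demo():
    # BytesIO stands in for the one pipe (e.g. stdout) shared by both message kinds.
    pipe = io.BytesIO()
    write_frame(pipe, OUT, "rendered 512x512")
    write_frame(pipe, ERR, "warning: no lights")
    pipe.seek(0)
    return list(read_frames(pipe))
```

The reader never has to guess which descriptor a message arrived on, since the tag travels with the message.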
When I spot checked, fileno(stdin) returned -2 on Windows.
I switched to using an enum to specify which channel we're intending to use, so hopefully that will alleviate the issue?
The rt-archer breakage (when I debugged) was the code testing fileno on Windows never getting an expected value for any of the three inputs (stdin/stdout/stderr) and simply giving up - that put us in an anomalous position because to archer it looked like the subprocess wasn't returning any info at all on any valid channel.
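To illustrate the enum approach (a Python sketch, not the actual libbu change; `Channel` and `channel_stream` are hypothetical names): the caller names the channel it wants, and the lookup fails loudly if that channel was never set up, instead of comparing against 0/1/2 and silently giving up.

```python
import enum
import subprocess
import sys

class Channel(enum.Enum):
    """Logical channels, named instead of assumed to be descriptors 0/1/2."""
    STDIN = "stdin"
    STDOUT = "stdout"
    STDERR = "stderr"

def channel_stream(proc, channel):
    """Return the pipe for a logical channel, or raise if it was never opened."""
    if not isinstance(channel, Channel):
        raise TypeError("channel must be a Channel member, not a raw descriptor")
    stream = getattr(proc, channel.value)
    if stream is None:
        raise ValueError(f"{channel.name} was not opened as a pipe")
    return stream

def demo():
    child = "import sys; print('out'); print('err', file=sys.stderr)"
    with subprocess.Popen([sys.executable, "-c", child],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          text=True) as proc:
        out = channel_stream(proc, Channel.STDOUT).read().strip()
        err = channel_stream(proc, Channel.STDERR).read().strip()
        try:
            # stdin was not requested as a pipe, so this reports an error
            # rather than quietly returning nothing.
            channel_stream(proc, Channel.STDIN)
            stdin_available = True
        except ValueError:
            stdin_available = False
    return out, err, stdin_available
```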
I'd like to try the capnproto approach, but I'm not sure how easy/hard that will be to get working - it's not an area of programming I'm terribly familiar with, so there'd likely be a significant spin-up cost.
I'm trying to excise bu_list from as much as possible of the libged drawing layer, both to make it easier to understand what the various pieces are doing and as a step towards being able to more easily use the bu_magic mechanism to validate things getting passed around as void *. That leads of course to the vlist and solid containers... another learning experience, but one I can no longer avoid if I'm going to really be able to follow what libtclcad/archer are doing about drawing.
The gsh bits hooking up callbacks were simply trying to set up so I could get a simpler-to-debug (i.e. non-Tcl) method of executing subprocess commands for testing.
@Sean I'm sure you've got quite a bit more expertise than I do, so I'd appreciate any insights, but what it's looking like to me so far:
capnproto will allow us to serialize/de-serialize information being sent over the stdin/stdout/stderr channels, allowing for richer communication, but it doesn't itself solve the Inter-Process Communication problem.
Since we've done well over the years with stdin/stdout/stderr piped IPC (which is what allows the MGED/rt connections to work cross platform) that seems like a good way to keep going, with capnproto being used to structure the I/O so we can easily/safely move much more than we currently do over those channels (right now the most complex communication I know of is rtcheck, which uses stdout and stderr for text/vlist drawing information and simply assumes what is coming back over each channel.)
If we're going to use stdin/stdout/stderr, we don't actually want the parent process to do what I've currently got gsh doing, which is to periodically check in a thread whether any new input has arrived for processing. Rather, we would want to do an event based setup where action is triggered when the subprocess sends something down the pipe. I went with the simplest thing that got what I needed working for gsh, but that's not what we want/need to do long term.
Setting up events based on activity in the I/O channels appears to be very very platform specific. I've been hunting around trying to find a small encapsulation of the necessary logic, and so far have come up pretty dry.
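For contrast with the polling thread, here is a minimal event-driven sketch using Python's `selectors` module on an `os.pipe()`. It is POSIX-only, which is exactly the platform-specific wrinkle: on Windows the same module only handles sockets, not pipes. The function name `collect_events` is made up, and this is not how Tcl or libbu do it.

```python
import os
import selectors

def collect_events(messages):
    """Write each message to a pipe and let the selector trigger the read callback."""
    read_fd, write_fd = os.pipe()
    sel = selectors.DefaultSelector()
    received = []

    def on_readable(fd):
        # Called only when the selector reports activity on fd.
        data = os.read(fd, 4096)
        if data:
            received.append(data.decode())
        return bool(data)

    sel.register(read_fd, selectors.EVENT_READ, on_readable)
    for msg in messages:
        os.write(write_fd, msg.encode())
        # Block until the pipe has data; no periodic checking involved.
        for key, _events in sel.select(timeout=1.0):
            key.data(key.fileobj)

    os.close(write_fd)
    # Closing the write end makes the read end readable with EOF.
    for key, _events in sel.select(timeout=1.0):
        if not key.data(key.fileobj):
            sel.unregister(key.fileobj)
    sel.close()
    os.close(read_fd)
    return received
```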
ASIO (https://think-async.com/Asio/asio-1.16.1/doc/asio/overview.html) has support for file descriptors and HANDLEs, but doesn't (so far as I can tell) wrap both mechanisms under one API we could use. (That by the way also appears to be what happens in Tcl, which is why we have the file-descriptor/Tcl_Channel ifdef for the tclcad I/O callbacks. I tried once to consolidate that into just Tcl_Channel, but it didn't work on Linux...)
Chromium's IPC solves a similar problem (https://www.chromium.org/developers/design-documents/inter-process-communication) but it's not stand-alone and it's not clear to me if it would adapt easily to what we need.
https://source.chromium.org/chromium/chromium/src/+/master:ipc/
In some ways I'm actually tempted to see what it would take to extract the Tcl bits for defining these particular events and I/O management into libbu - it's proven to work, and so far I've not come across any simple, stand-alone drop-in alternative...
Hmm... looking again I see the capnproto code does seem to have IPC logic, but I can't tell if they can work without the socket APIs...
Ah, there it is... looks like we use pipe if we go with "kj::newOneWayPipe()"
/me 's brain hurts when bending it in purely C++ directions... oof.
OK, so the question is - can gsh be made to work using capnproto for IPC, events and content?
Or does this need to be wired in at the libbu subprocess management level for things like reading and writing?
We need to be able to allow the Tcl_Event loop to manage the callback invocation for MGED/Archer to allow current behavior...
starseeker said:
When I spot checked, fileno(stdin) returned -2 on Windows.
That's interesting. It makes sense on Windows because there isn't a standard input descriptor set up by default for GUI apps unless it's a console application.
That probably means something else opened up an input pipe -- and that code wherever it is didn't register it as stdin. Probably is the same bug for out/err too.
you're right that capnproto doesn't solve IPC by itself because it turns it into an RPC solution. I actually wouldn't recommend going down that route until you're ready to abandon IPC because of the obvious performance implications. from a technique perspective, though, RPC is quite a bit simpler than dealing with cross-platform IPC.
an alternative to capn that might be worth trying instead is zeromq -- it supports in-process (inter-thread communication) and inter-process (IPC, ports) communication in addition to capn-style benefits for the data being exchanged.
Used asio a while back and wouldn't recommend it -- it's really meant for async client/server communication (e.g., pkg alternative) aside from it pulling in the boost ecosystem.
Boost.interprocess (https://www.boost.org/doc/libs/1_63_0/doc/html/interprocess.html) would be the one that does what you're needing for ports, but I'd still try 10 other things before pulling in boost myself... :)
there are a bunch of ways to do IPC and lots have wrapped it, so I'm sure you can find one that works. some of them may just rely on a particular IPC method and that'll require changing code a little bit. for example, this one (https://github.com/jarikomppa/ipc/) uses shared memory. so instead of using fwrite, calls get changed to sprintf since shared memory works like a malloc'd buffer with both sides of the pipe able to read/write that memory.
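A small sketch of what "works like a malloc'd buffer" means in practice, using Python's `multiprocessing.shared_memory` (Python 3.8+) rather than the linked C library. Both handles live in one process here just to keep it short; in real use the second handle would be opened by name from the other process. The message content is invented.

```python
from multiprocessing import shared_memory

def demo():
    # "Writer" side creates a named block of shared memory.
    writer = shared_memory.SharedMemory(create=True, size=64)
    try:
        # "Reader" side attaches to the same block by name.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            msg = b"vlist: 3 segments"
            # Writing is a buffer copy (sprintf-style), not a write() on a descriptor.
            writer.buf[:len(msg)] = msg
            return bytes(reader.buf[:len(msg)])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()
```

Note there is no built-in notification that new data arrived; any signalling between the two sides has to be layered on top.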
starseeker said:
In some ways I'm actually tempted to see what it would take to extract the Tcl bits for defining these particular events and I/O management into libbu - it's proven to work, and so far I've not come across any simple, stand-alone drop-in alternative...
I'd be cool with that! It really isn't much code that we're talking about. Only issue would be that it's essentially the same problem of consolidating to Tcl_Channel. From what I saw in the code, there's no reason it shouldn't work on linux, so there's almost certainly some other mistaken assumption going on somewhere in the code and until that assumption is found and eliminated, none of these solutions are going to work.
It's very much related to the concern I have with using 0/1/2 integers and assuming they are a particular port. I don't know if it's related to this specific problem, but this is the kind of problem that causes. Really hard to debug without unwinding the port from creation to destruction on both sides of the port.
starseeker said:
OK, so the question is - can gsh be made to work using capnproto for IPC, events and content?
No, it'd be the other way around -- you would adapt gsh to capnproto rpc approach instead of events and ipc. It'd look different, but it can work (just not as performant as IPC).
starseeker said:
Or does this need to be wired in at the libbu subprocess management level for things like reading and writing?
and when I say "adapt gsh" that doesn't preclude this belonging in libbu. libbu would ideally provide a call like subprocess_write and subprocess_read or something similar to abstract from the method underneath. The benefit of file descriptors is trying to avoid needing to do that so you can just use read/write or sprintf/sscanf.
FWIW, there is a stand-alone ASIO that doesn't need boost...
The Tcl refactor is in some ways the most incremental change, assuming it doesn't turn gnarly - even if we eventually opt for another solution, that has the advantage of knowing exactly what it should do if the migration is successful (since it's already working in place.)
This is probably an embarrassing question, but what are the implications of abandoning IPC for RPC? I thought RPC was just one form of IPC?
So I know that's a lot and probably talking through too many issues to make sense of it all. In summary, I would recommend 1) trying again to consolidate to Tcl_Channel, as whatever is making that not work will likely affect other solutions until the assumption is inadvertently ripped out, 2) trying one of the many wrapped options like shared memory or named pipes, starting with zeromq or a simpler header-only one, and finally 3) switching to RPC for libged with Capnproto after you've abandoned hope on IPC. ;)
/me nods - the Tcl_Channel thing bothered me last time, and if I take it far enough apart to digest it for extraction I should be able to run it to ground one way or the other.
Not to mention squashing another WIN32 ifdef... those are getting hard to remove these days...
starseeker said:
This is probably an embarrassing question, but what are the implications of abandoning IPC for RPC? I thought RPC was just one form of IPC?
Only embarrassing questions are the ones not asked. IPC uses a specific operating system method for allowing two processes to exchange data. typical examples are files (and file descriptors), named pipes, shared memory, message passing, and sockets. each method has significant implications on how you set up communication and how data is exchanged which is to say it's not generally possible to create a generic IPC interface that uses different methods.
you typically find one method that is implemented on different platforms similarly wrapped by a library
RPC for example is typically associated with the message passing form of IPC and message passing typically relies on the socket method of IPC data exchange
MPI is the elephant example of RPC/IPC
So capnproto's RPC API won't guarantee a specific method of communication (say, pipes vs. sockets) even if it sometimes uses pipes under the hood?
RPC is an IPC method, but it's more strongly associated with sockets and that's what I was referring to when I mentioned "abandoning IPC" .. which really was "abandon file/pipe method of IPC"
starseeker said:
So capnproto's RPC API won't guarantee a specific method of communication (say, pipes vs. sockets) even if it sometimes uses pipes under the hood?
I don't know for sure, but when I was reading their docs, I didn't see any support for file/pipe-based methods, only socket-based methods
https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/kj/async-io-unix.c%2B%2B#L1700
whereas zeromq explicitly calls them all out
yeah, that code would lead me to believe capnproto may also support pipes
I just didn't see any examples (didn't look very hard either though)
(maybe?) https://github.com/capnproto/capnproto/blob/master/c%2B%2B/src/kj/async-io-test.c%2B%2B#L311
again, though, nearly every method is going to require adopting a data exchange method, whether that's reading/writing on pipes/files (this is your closest fit currently) or reading/writing on sockets (this is typical in client+server apps) or reading/writing buffers of memory
The capnproto docs leave a bit to be desired, IMHO... at least for newbies
/me nods
Hmm... ZeroMQ is LGPLv3 and looks like they're working towards an MPL2 relicense. OK, that's workable...
So yeah, looks like capn can -- "As of version 0.4, the only supported way to communicate between threads is over pipes or socketpairs."
/me would ideally prefer to avoid getting user bug reports that parts of the application can't talk to each other...
capn notes in https://capnproto.org/encoding.html that he adopted streaming as the data exchange method (implying file/pipe or socket method, not message passing or shared memory)
looks like https://capnproto.org/cxx.html has more specific detail, under Messages and I/O
still unclear if he has helpers that wrap the communication pipe setup
if he doesn't, that might be a case for something like that header only lib that used a shared memory method -- looks like you can just point capn to it
@Sean one other note about capnproto - if we do adopt it, it bumps our minimum required C++ to C++14. Personally I'm OK with that, but I wanted to raise it in case it's of concern to you.
I’m okay with it for this, if it solves the need of communication with her commands. I would probably hesitate elsewhere but capnproto has compelling capability.
User reported a bug launching MGED on Windows 7 64-bit. They found an article mentioning a KB update, but that didn't fix it apparently. Any ideas?
147093015_10160483351802542_1761671887773690017_n.jpg
Windows 7 is too old for that function: https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime
src/libbu/datetime.c
Maybe we could use https://stackoverflow.com/a/27856440 to implement this?
I didn't realize there were still any users on Windows 7. If https://en.wikipedia.org/wiki/Windows_7 has it right even extended support ended January 2020.
@starseeker related to earlier discussion, this appears to be a consistent hard crasher: mged> search ./ebm.r /pnts.r
ERROR: bad pointer 0x7ffe677688f8: s/b db_full_path(x64626670), was librt directory(x5551212), file /Users/morrison/brlcad.trunk/src/librt/db_fullpath.c, line 264
ERROR: bad pointer 0x7ffe677688f8: s/b db_full_path(x64626670), was librt directory(x5551212), file /Users/morrison/brlcad.trunk/src/librt/db_fullpath.c, line 264
mixing relative and full path terms apparently makes it unhappy
r78317 should fix it.
Cool. Now if only I could figure out why that resulted in zombie processes ... haven't seen them in ages!
@starseeker any ideas: https://sourceforge.net/p/brlcad/bugs/394/
Whoa. Those are wacky looking.
Almost looks like they're trying to move a binary from one system to another incompatible system, but if I'm reading that correctly it's the result of building and running on the same machine?
it's a power8, not an x86, if that means anything
Those errors look like it's not finding system libs correctly
starseeker said:
Almost looks like they're trying to move a binary from one system to another incompatible system, but if I'm reading that correctly it's the result of building and running on the same machine?
Yes, they appear to have compiled it themselves with BRLCAD_BUNDLED_LIBS=ON.. any response we can give them? The archer error looks like a tcl/tk 8.6 error...
/me shakes head - for the OpenGL bit all I could suggest is they try different drivers (maybe the modern Mesa gallium software rasterizing setup) and for the Archer bit I guess my first thought would be to see if bwish can run.
From the backtrace, it looks like they're already using Mesa. Being on Power9, their options are probably limited to Mesa or straight X. MGED did work with OpenGL disabled.
The archer failure is a little more concerning... as "hv3::formmanager" is from us. Main reason I can think for an "invalid command name" on it would be because the tclIndex or pkgIndex.tcl didn't get created/loaded.. which might imply something is wrong in our tcl/tk build system.
That's why I was wondering what bwish does - it will pull in most of the packages but not hv3 (which is the web viewer, iirc) so that might help scope what's wrong.
@starseeker Getting:
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
XOPENGL_glu_LIBRARY (ADVANCED)
linked by target "dm-ogl" in directory /home/sean/brlcad.main/src/libdm/glx
Is that new? This is a default build on a remote Linux system (that may or may not have glu, but I presume it doesn't and the logic isn't taking that into account correctly to disable X).
It may be a side effect of my refactor a while back to contain the X OpenGL logic. Does 320af5adad96f fix it?
checking, I just yanked the glu line and it worked for me, but suspected a better fix.
Hi everyone! So I have been learning BRL-CAD for the past couple of weeks, but I have encountered a bug in archer that I was unable to solve. I use Linux Mint 20.2. I have downloaded the source from main and built it. All the tests passed with no error, but when using snap grid in archer I get an error. I tried to look at the code, but I don't know tcl and can't solve it. I also tried to build BRL-CAD on FreeBSD 13.0, but I get the same error in archer. Snap grid works in MGED though. Has anyone else got this error? archer_error.png
@starkaiser is this simple for you to reproduce?
I’d do a couple things:
1) find out what $v and $c are, for curiosity's sake; before the first if in the vscale proc, put something like:
set fh [open ~/starkaiser_debug.log a]; puts $fh "$v // $c"; chan close $fh
This will (not surprisingly) open a log file in your homedir w the contents of v and c. The last line (which will have caused the crash) is the most interesting
Peek-2021-07-21-22-19.gif The error appears for both edges and faces for all arbs. I have added those lines and the values in the log file are:
0 0 0 // 1.000000
-nan -nan -nan // 1.000000
The “-nan” gives us a specific clue. I’ve got a general patch candidate I’ll work on later that may be useful. Hopefully this isn’t a show-stopper for you…
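The two logged lines can be checked mechanically. A small Python sketch (not the Tcl/C code in Archer; `parse_log_line` and `check_vscale_args` are made-up names) that parses the `$v // $c` log format and rejects non-finite components before any scaling happens. Where the NaN actually originates isn't established here; this only shows the kind of guard that would surface it earlier.

```python
import math

def parse_log_line(line):
    """Split a '<x> <y> <z> // <scale>' debug line into a vector and a scalar."""
    vec_text, scale_text = line.split("//")
    vec = [float(tok) for tok in vec_text.split()]  # float() accepts "-nan"
    return vec, float(scale_text)

def check_vscale_args(vec, scale):
    """Scale vec by scale, refusing NaN or infinite inputs up front."""
    bad = [x for x in vec + [scale] if not math.isfinite(x)]
    if bad:
        raise ValueError(f"non-finite vscale input: v={vec} c={scale}")
    return [x * scale for x in vec]

def demo():
    results = []
    for line in ("0 0 0 // 1.000000", "-nan -nan -nan // 1.000000"):
        vec, scale = parse_log_line(line)
        try:
            results.append(check_vscale_args(vec, scale))
        except ValueError:
            results.append("rejected")
    return results
```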
Thank you! I have tried to solve it myself, but I have only managed to hop around different files, trying to understand the code. Mged works fine, so I'm using it to learn the basics
I personally really like mged. Hopefully you come to enjoy it too. Definitely a “learning wall” associated with it, but it pays off
@starkaiser - which version of brl-cad are you running?
7.32.3 I think. I compiled the latest versions from the main repository on github because older stable versions were giving me other errors, more serious. I compiled the version that I am currently using two days ago
@starkaiser Just so you're aware - you're well into the "not-well-tested" aspects of the software interacting with Archer for geometry creation. MGED will usually be the more stable of the interfaces, since Archer doesn't currently get as much use/attention.
Cool that you're digging into it - Archer's also got some of our most advanced GUI features (File->Open can trigger some of the converters to open other file types, for example.)
@starkaiser - ok, so you’re working w the tip of main, then. I can do same
@starseeker - I’m looking at some general fixes that may have knock-on effects. We’ll see how it turns out 🧐
starseeker said:
It may be a side effect of my refactor a while back to contain the X OpenGL logic. Does 320af5adad96f fix it?
That did the trick for libdm, but now there's another error from libpng being built wrong: undefined reference to `brl_png_init_filter_functions_vsx'
Looking at the png source code, it looks like we're missing all the platform-specific subdirs where that function comes from.
Can you fix that?
What platform triggers the failure? Surprisingly, I've not encountered that error before...
It looks like the POWERPC platform is defining PNG_FILTER_OPTIMIZATIONS. Unless we need that defined, my inclination would be to simply not define it.
If I'm interpreting this correctly, PNG_FILTER_OPTIMIZATIONS is for platform specific optimization logic and there is a generic fallback we can use.
@Sean does d4ecd86137fa fix it?
how does one clear out the src/other/ext builds?? make clean isn't working...
Can we get make clean fixed?.. if make builds it, make clean should still clear it.
curiously distclean appears to have deleted files I would have thought it had no business deleting, and still left bin/osdemo and src/other/libosmesa
starseeker said:
It looks like the POWERPC platform is defining PNG_FILTER_OPTIMIZATIONS. Unless we need that defined, my inclination would be to simply not define it.
But why is it customized? I would think it's far less complexity and risk to drop in as vanilla as strictly possible. It seems to be just a few files, so I can't see an argument for space/complexity/savings. No idea what the runtime implications are.
Plus, it's an impediment to upgrades and there's a real risk of cost... which it now incurred. I mean, between all the builds, rebuilding, inspecting, having to shift what I was doing elsewhere, trying again, I've now spent at least 4 hours unproductively because of it. :(
I have to hope there was some need or benefit beyond tidying up files? A benefit that saves us time and effort?
starseeker said:
Sean does d4ecd86137fa fix it?
That does seem to have fixed it. Thank you.
Sean said:
Can we get make clean fixed?.. if make builds it, make clean should still clear it.
My understanding of how ExternalProject_Add works suggests that this will be difficult. ExternalProject builds are decoupled from the primary CMake logic internally, and each individual project's logic isn't even guaranteed to define a clean target at all. CMake doesn't provide a lot of good options for customizing the "make clean" target as far as I know...
Sean said:
curiously distclean appears to have deleted files I would have thought it had no business deleting, and still left bin/osdemo and src/other/libosmesa
That is curious - latest main doesn't do that for me using a build folder. Are you doing a build from the src dir? It doesn't reproduce for me there either...
Sean said:
I have to hope there was some need or benefit beyond tidying up files? A benefit that saves us time and effort?
Looks like I did that back in 2016 (r68360) when I was trying to scrub everything we don't need out of src/other to reduce our overall tarball size.
For src/other/ext, the obvious thing to try would be to define some custom logic to be executed by the ExternalProject_Add build steps that generates a list of all files added by the build step that the parent build could then remove, but I don't know of a way to customize the clean target in the parent CMake build to that degree.
I think these folks have a similar issue: https://github.com/klee/klee/issues/718
We could probably produce a clean-ext target that would invoke the needed steps...
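The file-list idea, sketched in Python rather than CMake just to show the logic: snapshot the tree before a build step, diff afterwards, and keep the difference as a manifest a clean step can remove. All names here (`snapshot`, `run_tracked`, `clean_from_manifest`, `bin/tool`) are hypothetical, and how a clean-ext target would actually be wired into the parent build is left open.

```python
import os
import tempfile

def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def run_tracked(root, build_step):
    """Run build_step and return a sorted manifest of the files it added."""
    before = snapshot(root)
    build_step(root)
    return sorted(snapshot(root) - before)

def clean_from_manifest(root, manifest):
    """Remove only the files listed in the manifest."""
    for rel in manifest:
        path = os.path.join(root, rel)
        if os.path.exists(path):
            os.remove(path)

def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "bin"))
        with open(os.path.join(root, "CMakeCache.txt"), "w") as fh:
            fh.write("preexisting\n")

        def fake_build(build_root):
            # Stand-in for an external project's install step.
            with open(os.path.join(build_root, "bin", "tool"), "w") as fh:
                fh.write("binary\n")

        manifest = run_tracked(root, fake_build)
        clean_from_manifest(root, manifest)
        return manifest, sorted(snapshot(root))
```

One limitation worth noting: a before/after diff only catches added files, not files the build step overwrote.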
starseeker said:
My understanding of how ExternalProject_Add works suggests that this will be difficult. ExternalProject builds are decoupled from the primary CMake logic internally, and each individual project's logic isn't even guaranteed to define a clean target at all. CMake doesn't provide a lot of good options for customizing the "make clean" target as far as I know...
From my reading, e.g., https://cmake.org/pipermail/cmake/2012-February/049208.html it looks like cmake is supposed to run clean on ExternalProjects. That person is trying to stop the behavior I'm expecting.
It didn't look like any of them cleaned, even if they had a cmake build..
starseeker said:
That is curious - latest main doesn't do that for me using a build folder. Are you doing a build from the src dir? It doesn't reproduce for me there either...
Nope, I'm using a build folder. I don't know what it'd do in a src tree. It was a straight up fresh cloning, cmake in build, and make calls, then make distclean when clean didn't clear out libpng. I was testing your libpng change, but couldn't get it to recompile again even with the file edited, so tried to make clean which failed, then distclean -- which left turds.
starseeker said:
Looks like I did that back in 2016 (r68360) when I was trying to scrub everything we don't need out of src/other to reduce our overall tarball size.
Oof. :(
Yeah, I don't think we should keep that then, long term, especially as upgrades happen. That was a really expensive impact..
Also, that edit requires a human in the loop at all future upgrade points (i.e., more time) and docs/knowledge of the edits complicating upgrades. That in turn puts us in a position where upgrades are resisted (e.g., gdal, opennurbs, stepcode, ...). Not a healthy pattern.
Of course, opennurbs has other reasons, so that one's not entirely equivalent, but it is a bit involved to upgrade in part because of cullings (in addition to our code edits).
I saw those emails, but I'm wondering if the behavior they describe is out of date - I don't think I've ever seen the ExternalProject_Add builds follow a make clean...
exttest.tar.gz
I made a small test (attached) and the behavior I'm seeing here on a make clean is that p1 is removed, but bin/p2 is intact.
Sean said:
tried to make clean which failed, then distclean -- which left turds.
Which files were left after the distclean? One possibility is that if there are files left from older build states, distclean based on updated CMakeLists.txt files won't know it needs to remove them...
Sean said:
Also, that edit requires a human in the loop at all future upgrade points (i.e., more time) and docs/knowledge of the edits complicating upgrades. That in turn puts us in a position where upgrades are resisted (e.g., gdal, opennurbs, stepcode, ...). Not a healthy pattern.
My hope is that once it is properly matured, the new src/other/ext approach to building will make vanilla upstreams more practical. Since the new logic (so far at least) is capable of replicating the CMake RPath magic without needing all up build system replacements, the incentive to clean up the third party directories goes down.
When I'm having to write and/or maintain the build systems myself, those messy directories are a problem - that was the other reason I was stripping them down, to make it easier to understand what I had to write build logic for. When I had to do major build system work on third party deps, all future upgrades needed a human in the loop anyway, so the simplification was an overall win. I'm trying to let the dust settle on the src/other/ext system before I introduce the additional complication of swapping in things like the upstream GDAL build, and I also wanted as much in the way of automated cross platform testing in place as possible before trying that step. (I haven't had bandwidth to do it anyway, but even if I had I would have been hesitant to pile the native build systems on top of everything else.)
Even if we get to completely vanilla src/other/ext, there's still going to be some disincentive to disrupt things by upgrading those deps. I'm thinking it might be helpful if we go ahead and break ext into its own git repo and add it as a submodule. If git will support this, we could set it up as follows:
Point the src/other/ext submodule in the main repo to an equivalent of "STABLE" in the ext repo by default. (I.e., an out of the box recursive checkout of brlcad will pull a known working ext configuration.)
In the ext repo, we can then upgrade third party deps without impinging on brlcad itself. We can then test both the ext build and its integration with the parent brlcad build by checking out different versions of ext within the brlcad checkout.
If the above works (and I'm not 100% sure if it can, I need to do some experimenting with submodules) we might even be able to set up CI on the ext repo to do continual, ongoing merge and integration testing with upstream repos like GDAL that are also on github. We might define branches like ACTIVE (used by brlcad), STABLE (holds the latest released version of each of the deps, should match ACTIVE in most situations unless we need to patch a stable release for a CVE or some such), TESTING (used to keep an eye on the latest development versions of all the deps, expected to break regularly), and STAGING (similar to testing but where we can adjust problematic upstream versions back a bit if we need to to keep everything else going.)
starseeker said:
Sean said:
tried to make clean which failed, then distclean -- which left turds.
Which files were left after the distclean? One possibility is that if there are files left from older build states, distclean based on updated CMakeLists.txt files won't know it needs to remove them...
Like I said, that was the entirety of its existence, so no prior build states, no git pulls besides the edit you made to fix the png issue. It was a clean checkout, cmake + make + cmake + make (tried diff compiler), and eventually make clean, then make distclean, which then left bin/osdemo and src/other/libosmesa with a few files in there.
The plan you describe sounds good except I would suggest we keep ext branching simpler. Having to hunt for which branch has the deps that works would be a bit .. frustrating to say the least.
I'd think we just have main track ext main and STABLE track ext STABLE and leave it at that for starters. I.e., what worked for the last release, and whatever is currently needed for main. This would make main be your ACTIVE+TESTING+STAGING branches and it'd be on us to make branches while testing risky efforts, but without any required formality beyond main and STABLE. I like the idea of possibly having main track ext STABLE for some added stability, but I could go either way. I'd hope any instability is very short lived.
vanilla ? :)
starkaiser said:
Peek-2021-07-21-22-19.gif The error appears for both edges and faces for all arbs. I have added those lines and the values in the log file are:
0 0 0 // 1.000000
-nan -nan -nan // 1.000000
I just downloaded and compiled the 7.32.4 release and now the snap to grid mode in Archer works great!
The 7.32.4 release is based on older code, and doesn't incorporate most of the changes in main
The most likely problem for issues in main is refactoring work I was doing to shift logic down the library stacks (primarily out of libtclcad, but also some out of libged into lower layers.)
I may have missed a step when moving the snapping functions.
I haven't gotten to the editing modes yet - they'll be one of the very last things to shift to the Qt GUI, because of the amount of work involved - and so the snapping behaviors haven't yet been tested post refactor (or rather, your Archer test has served as an inadvertent test).
@Sean do we have a standard way to get SSIZE_MAX on Windows? limits.h doesn't seem to have it...
Hmm. The only other use in our code is an ifndef test in common.h...
belated sorry about that, but I saw your fix and seems good enough. there's not a standard way other than including limits.h which we already do.
I could probably key off some other limit as it just needs to be some imposed limit to satisfy the cert/stig issue, tainted input sanitization
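For reference, a minimal sketch of what a fallback could look like when limits.h doesn't provide SSIZE_MAX. This is not the fix that was committed and not what common.h does; keying off PTRDIFF_MAX and the helper name size_is_sane are assumptions made purely for illustration of "some imposed limit" for input sanitization.

```cpp
#include <climits>   // provides SSIZE_MAX on POSIX systems, not with MSVC
#include <cstddef>   // std::size_t
#include <cstdint>   // PTRDIFF_MAX

// Assumed fallback: ptrdiff_t is normally the same width as ssize_t,
// so its maximum serves as a stand-in where SSIZE_MAX is missing.
#ifndef SSIZE_MAX
#  define SSIZE_MAX PTRDIFF_MAX
#endif

// Hypothetical helper: reject a tainted size that is zero or larger
// than the imposed limit.
inline bool size_is_sane(std::size_t requested)
{
    return requested > 0 &&
           requested <= static_cast<std::size_t>(SSIZE_MAX);
}
```

Any other fixed upper bound would satisfy the same check; the only point is that the limit exists on every platform.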
Both ogl and X framebuffer appear to be non-functional (on Mac) ... not sure since when as I've been in a different section of the code, but ogl fails and X crashes.
Looks like it's bcc41b5798 that's causing the fb issues, at least for X.
@Sean Not sure about ogl, but I think I addressed the X issue. I don't see an ogl failure on Linux...
oh, didn't report back on this until now, but the ogl issue never went away... still a hard failure on mac, no archer, no /dev/ogl
fbhelp reports:
=============== Current Selection ================
bu_shmget failed, errno=22
bu_shmget: Invalid argument
ogl_getmem: Unable to attach to shared memory, using private
fb_ogl_open: double buffering not available. Using single buffer.
Assertion failed: (glx_dpy), function __glXSendError, file ../src/glx/glx_error.c, line 44.
Should look like this:
=============== Current Selection ================
ogl_getmem: shmget failed, errno=22
ogl_getmem: Unable to attach to shared memory.
Description: Silicon Graphics OpenGL
Device: /dev/ogl
Max width height: 16384 16384
Default width height: 512 512
Usage: /dev/ogl[option letters]
p Private memory - else shared
l Lingering window
t Transient window
d Suppress dithering - else dither if not 24-bit buffer
c Perform software colormap - else use hardware colormap if possible
s Single buffer - else double buffer if possible
b Fast pan and zoom using backbuffer copy - else normal
D Don't update screen until fb_flush() is called. (Double buffer sim)
z Zap (free) shared memory. Can also be done with fbfree command
Current internal state:
mi_doublebuffer=1
mi_cmap_flag=0
ogl_nwindows=1
X11 Visual:
TrueColor: Fixed RGB maps, pixel RGB subfield indices
RGB Masks: 0xff0000 0xff00 0xff
Colormap Size: 256
Bits per RGB: 8
screen: 0
depth (total bits per pixel): 24
also, definitely seeing some regression in the conversion code. was doing an obj-g conversion, was giving me some new errors -- checked against a rando prior release (7.30 I think) and prior succeeded where current main does not (completes, but results in bad/flipped faces).
Here's a visual example, left is prev, right is curr: image.png
Here's that geometry if you want to see if you can track it down.. PoliceLifterSpeed.obj
Shouldn't need this but just in case: PoliceLifterSpeed.mtl
Can you double check what version succeeded? I've tried a number of 7.30 obj-g conversions, and so far they all produce the bad geometry here.
7.28.2 shows bad as well
Oof, I thought I grabbed 7.30, but it's looking like if I just opened the .g file, then it would have fired up a 7.24 release.
looks like it might have been 7.26 actually
So I guess the good news is it's not a release blocker, thanks for checking it
@starseeker not sure if it’s recent but cmake summary has a blank entry for Iwidgets. I took a look but couldn’t follow the logic, appeared to be handled differently from the other _BUILD vars. Would you take a look? TCL is ON, Tk is Disabled, Itcl/Itk is ON (Itcl only), and Iwidgets is blank.
This is a server / no-X build on Ubuntu.
@Sean got it, thanks - was missing an else case in the iwidgets.cmake flow
(my real motivator for the Qt work - get rid of all the Tcl/Tk build logic ;-) )
@starseeker Here's one of the errors that a couple of them got:
warning: error while sourcing archer_launch.tcl: couldn't read file "tclscripts/archer/itk_redefines.tcl": no such file or directory
@starseeker I've run into that particular itk_redefines.tcl error before as well, if that helps. I'm not sure of the conditions but it doesn't seem to interfere with the build as much as it was on Windows
I did get a trace on the all-apps-crashing again bug -- it appears to be something inside libdm during application shutdown. Valgrind is pointing at some unknown symbols in that library:
--11359-- Discarding syms at 0x107420000-0x107474000 in /Users/morrison/brlcad.main/.build/lib/libdm.20.0.1.dylib (have_dinfo 1)
--11359-- Discarding syms at 0x1074cc000-0x1074d4000 in /Users/morrison/brlcad.main/.build/lib/libpkg.20.0.1.dylib (have_dinfo 1)
--11359-- Discarding syms at 0x1092e4000-0x1092f0000 in /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ps.dylib (have_dinfo 1)
==11359== Jump to the invalid address stated on the next line
==11359== at 0x107446636: ???
==11359== by 0x10744657E: ???
==11359== by 0x10743CA42: ???
==11359== by 0x7FFF2071ED24: ??? (in /dev/ttys000)
==11359== by 0x7FFF2071F00F: ??? (in /dev/ttys000)
==11359== by 0x7FFF2080AF43: ??? (in /dev/ttys000)
==11359== Address 0x107446636 is not stack'd, malloc'd or (recently) free'd
Actually, if that output is strictly ordered, that may be the issue. It's unloaded libdm, but then goes to unload dm/libdm-ps.dylib and a call is made into libdm...
Um. How do we control that ordering?
I could be wrong on that interpretation, but it def appears to be something libdm plugin-related
For the itk_redefines.tcl error, the question I have is whether share/tclscripts/archer/itk_redefines.tcl is present - if not, it may be a missing dependency on the cp target for that file; if it is, then it's something about the paths in the Tcl environment.
Can we check if there are any undefined symbols in the so files?
Let me see if I can make the target for copying that file an explicit dependency of archer...
I'll see if I can trigger the itk_redefines.tcl error, and check -- it was pretty consistent for me for a while, but I'd been ignoring it
which so files? libdm-plot or libdm or ?
the libdm-ps.dylib file, if that's the one factoring into the crash above
I'm not seeing where/why in the code the plugins have any code that would be getting called
there's no apparent atexit handler. oh, maybe dlopen registers one.. that might be.
If it's the unloading code, a quick check would be to comment out the unloading bits in libdm_clear
in dm_init.cpp
(base) morrison@agua .build % nm libexec/dm/libdm-ps.dylib
U _Tcl_AppendStringsToObj
U _Tcl_DuplicateObj
U _Tcl_GetObjResult
U _Tcl_SetObjResult
U ___stack_chk_fail
U ___stack_chk_guard
00000000000100d0 d __dyld_private
U _bu_calloc
U _bu_free
U _bu_log
U _bu_vls_addr
U _bu_vls_free
U _bu_vls_init
U _bu_vls_printf
U _bu_vls_sprintf
U _bu_vls_strcpy
0000000000010550 b _disp_mat
000000000000b770 T _dm_plugin_info
0000000000010138 d _dm_ps
0000000000010150 d _dm_ps_impl
U _draw_Line3D
U _fclose
U _fflush
U _fopen
U _fprintf
U _fputs
00000000000106a8 s _head_ps_vars
U _memcpy
U _memset
00000000000104d0 b _mod_mat
U _null_String2DBBox
U _null_SwapBuffers
U _null_beginDList
U _null_configureWin
U _null_doevent
U _null_drawDList
U _null_drawPoint3D
U _null_drawPoints3D
U _null_endDList
U _null_freeDLists
U _null_genDLists
U _null_getDisplayImage
U _null_loadPMatrix
U _null_makeCurrent
U _null_openFb
U _null_reshape
U _null_setDepthMask
U _null_setLight
U _null_setTransparency
U _null_setZBuffer
000000000000c010 s _pinfo
0000000000009120 t _ps_close
000000000000b6a0 t _ps_debug
000000000000b290 t _ps_draw
00000000000092f0 t _ps_drawBegin
0000000000009360 t _ps_drawEnd
0000000000009af0 t _ps_drawLine2D
0000000000009c00 t _ps_drawLine3D
0000000000009c60 t _ps_drawLines3D
0000000000009cf0 t _ps_drawPoint2D
0000000000009940 t _ps_drawString2D
0000000000009d60 t _ps_drawVList
0000000000010690 b _ps_drawVList.fin
0000000000010650 b _ps_drawVList.last
0000000000010670 b _ps_drawVList.start
0000000000009430 t _ps_hud_begin
00000000000094a0 t _ps_hud_end
0000000000009510 t _ps_loadMatrix
000000000000b700 t _ps_logfile
0000000000008310 t _ps_open
00000000000104c0 b _ps_open.count
000000000000b3f0 t _ps_setBGColor
000000000000b340 t _ps_setFGColor
000000000000b480 t _ps_setLineAttr
000000000000b540 t _ps_setWinBounds
00000000000100e0 d _ps_usage
00000000000092a0 t _ps_viable
00000000000105d0 b _psmat
U _setbuf
U _sscanf
U _vclip
U dyld_stub_binder
U is undefined?
oh, there it is! I was looking for whatever was triggering the unloading... plain as day in dm_init.cpp
yep
oh, I wonder if this is that age-old STL issue...
0a1de42aee may help with the Archer bit
testing a fix for dlclosure issue, and I'm coincidentally getting the itk_redefines issue so I'll update here when I can and test
may be related to this huge blather: WARNING - bu_dir's bin value is set to ., but binary being run is located in /Users/morrison/brlcad.main/.build. This probably means you are running btclsh from a non-install directory with BRL-CAD already present in . - be aware that .tcl files from . will be loaded INSTEAD OF local files. Tcl script changes made to source files for testing purposes will not be loaded, even though btclsh will most likely 'work'. To test local changes, either clear ., specify a different install prefix (i.e. a directory without BRL-CAD installed) while building, or manually set the BRLCAD_ROOT environment variable.
Urm. I've seen that too, but didn't seem to trigger the issue for me. However, that shouldn't be happening, so I'll see if I can take a quick look...
my change appears to have fixed the dynamic unloading crashes
I'll push that up here in a sec
okay pushed.. I think what was going on is that the iterator was registering ABC and then closing ABC .. and badness was happening. now unloads in reverse order, so ABC->CBA, and that appears to have resolved whatever dependency tracking badness was going on. I suspect it's either plugins that refer to other plugins (thus needing to be in order) or the dynamic linker doing recursive reference counting and thinking it was done with libdm as the dlclose() plugins were unloaded and references got updated.
on an unrelated note, we're probably going to need to set up code signing before our next major release. that's apparently the solution to all the firewall triggers that go off every build. Looks like these guys have a module: https://github.com/Monetra/libmonetra/blob/master/CMakeModules/CodeSign.cmake
cpack appears to have some built-in stuff too for what it produces, though that doesn't address the build tree like that module seems to
Where do we get a certificate?
@Sean c7cd672c8 appears to break Linux:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fe6ea5 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0 0x00007ffff7fe6ea5 in ?? () from /lib64/ld-linux-x86-64.so.2
#1 0x00007ffff74e98b8 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd300, operate=<optimized out>,
args=<optimized out>) at dl-error-skeleton.c:208
#2 0x00007ffff74e9983 in __GI__dl_catch_error (objname=0x55555558b830, errstring=0x55555558b838,
mallocedp=0x55555558b828, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:227
#3 0x00007ffff39dab59 in _dlerror_run (operate=operate@entry=0x7ffff39da420 <dlclose_doit>, args=0x6) at dlerror.c:170
#4 0x00007ffff39da468 in __dlclose (handle=<optimized out>) at dlclose.c:46
#5 0x00007ffff4bd8a9a in bu_dlclose (handle=0x6) at /home/cyapp/brlcad/src/libbu/dylib.c:66
#6 0x00007ffff48ad034 in libdm_clear () at /home/cyapp/brlcad/src/libdm/dm_init.cpp:201
#7 0x00007ffff48ad654 in libdm_initializer::~libdm_initializer (this=0x7ffff48e8590 <LIBDM>,
__in_chrg=<optimized out>) at /home/cyapp/brlcad/src/libdm/dm_init.cpp:217
#8 0x00007ffff73d015e in __cxa_finalize (d=0x7ffff48e6c20) at cxa_finalize.c:83
#9 0x00007ffff48970f7 in __do_global_dtors_aux () from /home/cyapp/brlcad-build/lib/libdm.so.20
#10 0x00007fffffffdd90 in ?? ()
Don't tell me Mac wants it one way and Linux the other...
it indeed is working much better for me, but that 0x6 handle in your stack there is suspicious. I think I may have been careless with the iterator.
yeah, end() shouldn't be valid and that's where I made it start. surprisingly works...
fixing.
you're good enough to catch mistakes like that! heh. it was an invalid loop! bogus handle was a dead give-away..
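A minimal sketch of the reverse-order walk discussed above, using reverse iterators so that end() is never dereferenced. This is not the code in dm_init.cpp: close_in_reverse and close_fn are hypothetical names, close_fn stands in for whatever wraps dlclose(), and a std::vector is used here (rather than the std::set mentioned later in the thread) because a set of pointers is ordered by address, not by load order.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper: close handles in the reverse of the order in
// which they were recorded (loaded A,B,C -> closed C,B,A).
inline void close_in_reverse(std::vector<void *> &handles,
                             const std::function<void(void *)> &close_fn)
{
    // rbegin() refers to the last valid element and rend() is the
    // stopping point, so no invalid iterator is ever dereferenced.
    for (auto it = handles.rbegin(); it != handles.rend(); ++it) {
        close_fn(*it);
    }
    handles.clear();
}
```

Starting a forward iterator at end() and decrementing can be made to work, but dereferencing end() itself is undefined behavior, which would fit the bogus 0x6 handle in the backtrace.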
I feel first response should never be to inject platform identifiers... revert if needed, or at least give me a chance to fix it... see if latest is any better.
@Sean confirmed, that's got it
@starseeker bad news is that the crash isn't gone... declared victory too soon. there's still something very distinctly wrong in the loading/unloading..
That's weird. As I recall that code is pretty straightforward - load on initialize, unload on exit. Not sure where to go hunting for trouble...
unless the bu_dlopen/bu_dlclose wrappers are missing something maybe?
@Sean I don't know if it helps any, but src/libbu/tests/dylib is intended to be a small, self-contained testing of that mechanism...
@Sean do you know when this started? (i.e. has it been doing it ever since the dm/ged plugin work, or did some more recent change kick it off?)
To me the strangest thing is that neither the local mac here nor the CI runners seem to be exhibiting it. And the CI build for the mac indicates it's running the ASCII to .g conversions, which (at least on the Linux box here) did trigger crashing when the unloading wasn't working.
It's the same behavior I've been seeing for months, I think since the dm/ged plugin work. It doesn't appear to be 100% deterministic as it seems to depend on what symbols are in use, implying it's involving the dynamic linker and when a particular symbol or set of symbols are encountered.
It doesn't appear to affect more complicated apps that call lots of symbols (e.g., mged or gcv, etc) as much (or at least as visibly). Seems to be most noticeable on a handful of smaller simpler apps that essentially do nothing (but still load and unload nearly everything), and every now and then on something more complicated.
I think there's possibly something fundamental in play here (like the ordering) and Mac happens to be provoking. When I watch the binary's DYLD loading/unloading, there is some strangeness going on. The libged plugins are loading, and then it loads dm and its dependency libraries. It appears to be choking up when it goes to unload dm and friends.
Anything I can do to help?
Honestly, I'm not sure.. I just got a clue that it may be related to dm-X and dm-ogl, and that latter is still fully busted on Mac for me -- so that might be something you could check on -- if mged, archer, and such work for you on mac from a build dir. If it works, we can maybe trace backwards to figure out where things diverge.
OK, I'll check on that. I know qged worked with Qt6 on mac, but I didn't try archer and I'm not sure if qged was doing its swrast fallback or not...
What I think I'm seeing is that apps that already link libdm and/or X11 have no problem. It's when an app doesn't use either, but then libged loads libged-dm.dylib, that loads libdm and all its deps. It's when libdm's deps get unloaded that it segfaults.
which makes sense since an app using libdm is not going to unload it
I don't have a qt build. I've been living in mged -c land for a while once ogl stopped working.
I haven't tried an opengl-disabled build to see if non-classic mode will fire up X correctly
OH! Yeah, I had to refactor some code for a libgcv plugin because of that - apparently a dynamically loaded lib can't go and load another dynamically loaded lib.
Hmm. I'd hate to give up the libged dm command...
Archer does work from the build dir
Okay, I think I just ruled out X11/ogl -- if I remove libdm-X and libdm-ogl, it still segfaults
dyld: unloaded: <970A62D7-21A7-3363-92AC-41D3E3ED2AF5> /Users/morrison/brlcad.main/.build/libexec/ged/libged-autoview.dylib
!!! REMOVING 0x7fd514c08d00 unknown
dyld: unloaded: <D8509635-B237-3585-B70C-823C95F4B5CB> /Users/morrison/brlcad.main/.build/libexec/ged/libged-attr.dylib
!!! REMOVING 0x7fd514c08ba0 unknown
dyld: unloaded: <F4435FE5-243C-3286-B0D3-CEDC50774EEE> /Users/morrison/brlcad.main/.build/libexec/ged/libged-arot.dylib
!!! REMOVING 0x7fd514e05d90 unknown
dyld: unloaded: <B9E74045-8CE7-3438-B699-54645171BFC3> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-swrast.dylib
dyld: unloaded: <21900CBB-094E-349C-A1B2-BAD779BDCF15> /Users/morrison/brlcad.main/.build/lib/libosmesa.dylib
!!! REMOVING 0x7fd514e059b0 unknown
dyld: unloaded: <C363B743-FE6B-3D4A-8513-953A5F6FAF28> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ps.dylib
!!! REMOVING 0x7fd514e054c0 unknown
dyld: unloaded: <E8A365D0-5923-386F-A9BD-7DA434D46324> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-plot.dylib
!!! REMOVING 0x7fd514d069b0 unknown
dyld: unloaded: <DE144727-5E38-36CE-BFDD-A11CB151703E> /Users/morrison/brlcad.main/.build/lib/libdm.20.dylib
dyld: unloaded: <219AC144-E743-3037-8F1C-9B313D82BB1A> /Users/morrison/brlcad.main/.build/lib/libpkg.20.dylib
dyld: unloaded: <0AC2C158-06D9-3273-962E-FD0F51813D60> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-txt.dylib
zsh: segmentation fault DYLD_PRINT_LIBRARIES=1 bin/cad_user 2>&1
So the "REMOVING <address> unknown" warnings are the problem?
so what's going on there is it's unloading everything, is unloading the last libdm-*.dylib (libdm-plot.dylib in this example) and it unloads libdm itself since reference-counting-wise, nothing else is using it.
Ah, so it's getting to libdm before libdm-txt, which is a no-no?
no, that's my manual debug printing, I'm printing out all the library pointers on load and unload. You put them in a std::set, so the name is unknown, but it's basically the lines that follow -- and in the full log, the pointer address can be matched to the load statement where the name was known
Oh, OK.
I think I see what's wrong, but I'm not sure what to do about it.
Do I need to redesign or back off the plugin approach?
1) libged static initializer runs and plugins get dlopened, each one getting resolved by the dynamic linker which loads its dependent libraries, among those being..
2) libged-dm loads, which dynamic loads libdm, which static initializer runs and plugins get dlopened, each one getting... yada yada, and then
3) app runs, does its thing, returns from main
4) libged destructor runs, starts unloading ged plugins, libged-dm unloads for example but dependencies are not yet unloaded
5) libdm destructor runs, starts unloading dm plugins and dependencies (perhaps asynchronously), and when it gets to the last plugin ...
6) dynamic linker unloads libdm itself, and this appears to happen while libdm's destructor is still running
7) seg faults, presumably on next iteration of the loop or on return from the destructor
please don't go shotgunning the plugins just yet! -- I have a swath of unpushed commits rebased on main, hundreds of changes to eliminate the per-command API
Don't worry, I'm not going to do anything drastic. Just trying to get a sense of what we're facing
it'll conflict for sure if you go ripping on it too much
I think what is needed is to either ensure destruction is deferred, or order is somehow guaranteed by symbols
If I absolutely have to I can ditch the dm command as a libged command and make it available some other way, but it's still a potential issue if anyone else happens to set up a similar conundrum for the unloaders...
I mean there is one possibility of simply not auto-loading everything. Only load as called.
Would that help in the unloading calls though? Or do you mean immediately unloading after execution as well?
yeah, I think the dm plugin just provokes the issue, and isn't the issue itself. seems reasonable/likely that future plugin will require some lib. only issue might be like you said -- a dylib with a static initializer that loaded another dylib with a static initializer, and trying to avoid that
oh gosh, no, not unloading after execution. only loading what is used, and unloading everything that was loaded on shutdown. that would handle this specific case (because very little uses the dm plugin)
and apps that DO use the dm plugin appear to be gui and link dm, so it's never unloaded
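A rough sketch of the "only load as called" idea, assuming a per-command lookup that loads a plugin the first time it is requested and reuses the handle afterwards. Nothing here is libged API: lazy_plugins, get, and the loader callback are hypothetical names, and the callback stands in for a dlopen-style call.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical on-demand plugin table: nothing is loaded until a
// command is actually requested.
class lazy_plugins {
public:
    using loader = std::function<void *(const std::string &)>;

    explicit lazy_plugins(loader l) : load_(std::move(l)) {}

    // Return the handle for cmd, loading it on first use only.
    void *get(const std::string &cmd)
    {
        auto it = handles_.find(cmd);
        if (it != handles_.end())
            return it->second;
        void *h = load_(cmd);
        if (h)
            handles_[cmd] = h;  // remember only successful loads
        return h;
    }

    // Number of plugins that would need unloading at shutdown.
    std::size_t loaded_count() const { return handles_.size(); }

private:
    loader load_;
    std::map<std::string, void *> handles_;
};
```

With this shape, an app that never runs the dm command never pulls in libdm, so there is nothing for the unloader to trip over on exit; it does not by itself fix the case where the dm command is actually used.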
Ah. So the gsh tool should provoke the issue then, if the dm command is used.
Oh, nope - I added libdm to that lib list
the test would be to run something that dynamically loads libdm, run the dm command, and see if it behaves on exit
yeah, it'd require removing libdm from gsh's lib list, run dm command, and see if exit behaves
I'll try that here... one sec. Looks like I've got some actual dm library calls in there, so I'll have to turn off a couple things.
interesting. so if I remove all dm plugins, it still loads libdm dynamic, and eventually unloads it some time after libged-dm is unloaded seemingly without issue. valgrind is clean.
which is to say it's not simply returning from libdm's destructor that's causing the seg fault. it's while it is in the plugin unloading loop and unloads a plugin that the corruption happens
happens even with just the txt plugin and no others..
Is there anything different about the libdm plugins compared to the libged plugins?
Not intentionally...
OK, confirm - if I take out the libdm explicit library calls from gsh, it crashes on exit after running "dm types"
I'll go ahead and commit that turned off so we have a simple test case - will be easy to turn back on later.
so this all centers around the c++ trick of using static initialization with a class we're using to ensure constructor/destructor code is called when a library is loaded/unloaded, and that's what is not playing -- it's unloading the library before the destructor is done
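For anyone following along, the trick looks roughly like this. The backtrace earlier shows libdm_initializer::~libdm_initializer calling libdm_clear; the sketch below is a simplified stand-in with hypothetical names (plugin_registry, REGISTRY), and ints take the place of real dlopen() handles.

```cpp
#include <vector>

// Hypothetical stand-in for a library's plugin bookkeeping.
struct plugin_registry {
    std::vector<int> loaded;  // ints stand in for dlopen() handles
    static int live;          // how many registries currently exist

    // Runs when the library (or program) is loaded: "open plugins".
    plugin_registry() : loaded{1, 2, 3} { ++live; }

    // Runs when the library is unloaded or the program exits:
    // "close plugins". This is the loop that was still running when
    // the dynamic linker dropped the parent library.
    ~plugin_registry()
    {
        loaded.clear();
        --live;
    }
};
int plugin_registry::live = 0;

// One object with static storage duration per library: construction
// happens before main() and destruction after main() returns.
static plugin_registry REGISTRY;
```

The convenience is that no caller has to remember an init or free call; the cost is that the destructor's timing is decided by the runtime and the dynamic linker rather than by us.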
Options I think are....
1) make libdm not plugin-based, as that would avoid a dynamic lib loading other dynamic-loading/unloading libs,
2) make libged only load plugins on-demand and hope any plugins like dm that load other dynamic-loading libs will already be loaded,
3) defer unloading to libbu unloading -- basically make bu_dlclose schedule something for closure and wait,
4) find a different mechanism (avoid using constructor/destructor since that's at the heart of why this fails)
1) is possible - it was done primarily to keep Tcl out of the core libs, but I can also just put those backends requiring it behind an ENABLE_TCL check like that one shader in liboptical.
My bigger concern is what happens if we start supporting 3rd party GED commands and someone else adds a command that does their own libdm-esque magic behind the scenes.
3) appeals, but I don't know how practical it is
1) is probably the shortest path back to working reliably, and realistically it's pretty unlikely we're going to get a lot of custom libdm backend implementations anytime soon to take advantage of the modularity.
I also wonder what will happen if we expose libgcv through any of the GED commands - mightn't there be a similar issue?
Would 4) involve (say) making ged_init and ged_free be responsible for plugin loading and unloading?
yeah, something like that - making the loading and unloading a little more explicit. I suspect just having the loop that does destruction be explicitly called would avoid the segfault because the dynamic loader would know that it can't unload the parent dm/ged/gcv library
I think I can try #3 pretty quickly, and see if it does the trick. I suspect it will. The downside is memory use until libbu is unloaded. Probably could have API forcibly unload on demand if that becomes an issue, but unlikely an issue in our case until we're talking about thousands of plugins.
Sounds good. If that doesn't work let me know if you want me to do either 1) or 4)
may still be a benefit to doing #2 (faster load times) -- there is some occasional huge pause on certain (usually infrequent) runs that I assume is something the dynamic loader is doing. seen the pause especially on Windows, 30-60+sec before mged displays.
Deferred appears to work nicely:
...
!!! REMOVING 0x7fdd55c0a030 unknown
!!! REMOVING 0x7fdd55c09dc0 unknown
!!! REMOVING 0x7fdd55c09b50 unknown
!!! REMOVING 0x7fdd55c098e0 unknown
!!! REMOVING 0x7fdd55c09670 unknown
!!! REMOVING 0x7fdd55c09330 unknown
!!! REMOVING 0x7fdd55c09130 unknown
!!! REMOVING 0x7fdd55c08ef0 unknown
!!! REMOVING 0x7fdd55c08d90 unknown
!!! LIBDM DESTRUCTOR
!!! REMOVING 0x7fdd579054c0 unknown
!!! REMOVING 0x7fdd57905140 unknown
!!! REMOVING 0x7fdd57904a50 unknown
!!! REMOVING 0x7fdd5780ecc0 unknown
!!! REMOVING 0x7fdd55e06400 unknown
!!! REMOVING 0x7fdd55d07010 unknown
dyld: unloaded: <484BDA57-EC5C-3533-8271-1213BE720173> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ogl.dylib
dyld: unloaded: <7CD794FB-07E7-3E51-B7CE-CB9585477278> /usr/local/opt/libxrender/lib/libXrender.1.dylib
dyld: unloaded: <466439D8-1576-33B8-AE38-F4AD4CBCDC3F> /opt/X11/lib/libGLU.1.dylib
dyld: unloaded: <0AC2C158-06D9-3273-962E-FD0F51813D60> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-txt.dylib
dyld: unloaded: <E8A365D0-5923-386F-A9BD-7DA434D46324> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-plot.dylib
dyld: unloaded: <C363B743-FE6B-3D4A-8513-953A5F6FAF28> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-ps.dylib
dyld: unloaded: <B9E74045-8CE7-3438-B699-54645171BFC3> /Users/morrison/brlcad.main/.build/libexec/dm/libdm-swrast.dylib
dyld: unloaded: <21900CBB-094E-349C-A1B2-BAD779BDCF15> /Users/morrison/brlcad.main/.build/lib/libosmesa.dylib
dyld: unloaded: <6461ED77-30C4-3D90-8FFE-224EF5B8365F> /Users/morrison/brlcad.main/.build/libexec/ged/libged-dsp.dylib
dyld: unloaded: <1536E3E1-0B02-3F94-92A2-00D48E37B256> /Users/morrison/brlcad.main/.build/libexec/ged/libged-edmater.dylib
dyld: unloaded: <F0DAA927-CCFA-3F6D-B79B-BC27BDB6A3A8> /Users/morrison/brlcad.main/.build/libexec/ged/libged-env.dylib
dyld: unloaded: <74AF84F3-50D3-398F-9470-8C0F4DC17813> /Users/morrison/brlcad.main/.build/libexec/ged/libged-erase.dylib
dyld: unloaded: <52AACC7B-7DD1-3EA6-BF05-7D1073E5ADC1> /Users/morrison/brlcad.main/.build/libexec/ged/libged-exists.dylib
dyld: unloaded: <35739FC4-A62C-3F93-8E41-B355D7E4D5A2> /Users/morrison/brlcad.main/.build/libexec/ged/libged-expand.dylib
dyld: unloaded: <C079326A-9961-3C29-9CB0-18D9CCA48C32> /Users/morrison/brlcad.main/.build/libexec/ged/libged-eye_pos.dylib
dyld: unloaded: <F40B173E-8839-3244-A200-C1BEAC11EB7E> /Users/morrison/brlcad.main/.build/libexec/ged/libged-facetize.dylib
dyld: unloaded: <47715816-3B66-3BDF-85E8-915D193BDDD4> /Users/morrison/brlcad.main/.build/libexec/ged/libged-fb2pix.dylib
dyld: unloaded: <CA5B33DE-CF2C-3D33-95D8-CDCD86B4C109> /Users/morrison/brlcad.main/.build/libexec/ged/libged-fbclear.dylib
dyld: unloaded: <479418C5-FF5B-3D14-BEEB-D095AD4D4C55> /Users/morrison/brlcad.main/.build/libexec/ged/libged-find.dylib
dyld: unloaded: <7F8475E5-81F6-3032-9465-72E7D321179A> /Users/morrison/brlcad.main/.build/libexec/ged/libged-form.dylib
dyld: unloaded: <0DABEEDD-2EBE-327A-8B17-6C9FFEDA693B> /Users/morrison/brlcad.main/.build/libexec/ged/libged-fracture.dylib
dyld: unloaded: <5E8181FA-7584-37BF-96BE-7E9819B89D52> /Users/morrison/brlcad.main/.build/libexec/ged/libged-gdiff.dylib
...
FYA, I'm trying to get set up with Visual Studio 2022 now - I think the Github CI system made the upgrade.
Rather worrisome in that the openNURBS build appears to be failing with an internal compiler error...
basically it cruises through the destructor and schedules all the dylibs for closing. then when libbu is unloaded or an explicit dlunload() is called, it actually closes them all.
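A minimal sketch of that deferral, assuming a small queue object owned by the lowest-level library. This is not the libbu implementation: deferred_closer, schedule, and flush are hypothetical names, close_fn stands in for the real dlclose() wrapper, and closing in the order scheduled is just a choice made for the sketch.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical deferred-close queue: plugin destructors only record
// handles; the actual close happens later, in one place.
class deferred_closer {
public:
    explicit deferred_closer(std::function<void(void *)> close_fn)
        : close_fn_(std::move(close_fn)) {}

    // Flush on destruction so nothing stays open past shutdown.
    ~deferred_closer() { flush(); }

    // Called where a destructor would otherwise close immediately.
    void schedule(void *handle) { pending_.push_back(handle); }

    // Close everything scheduled so far; returns how many were closed.
    std::size_t flush()
    {
        std::size_t n = 0;
        for (void *h : pending_) {
            close_fn_(h);
            ++n;
        }
        pending_.clear();
        return n;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::function<void(void *)> close_fn_;
    std::vector<void *> pending_;
};
```

Because the library whose destructor scheduled the closes is still mapped when its destructor returns, the loop can't have the ground pulled out from under it; the tradeoff is the memory held until the flush.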
alrighty.. all tests back to passing. still no ogl, but progress!
I'm still sorting through compiler errors with the tamu students.. almost all ran into issues. any idea why CHECK_CXX_FLAG(fsanitize=fuzzer) would be passing on Windows??? It did, and then proceeded to fail during compile because of the flag.
Not yet - it's something about Visual Studio 2022
another tried in WSL, which I've done myself, but their build ended up unable to find Tcl's configure for some reason
I'm seeing it myself here, but I don't know yet why that test would pass
CHECK_START: Performing Test FSANITIZE_FUZZER_CXX_FLAG_FOUND
CHECK_PASS: Success
Performing C++ SOURCE FILE Test FSANITIZE_FUZZER_CXX_FLAG_FOUND succeeded with the following output:
Change Dir: C:/brlcad-build/CMakeFiles/CMakeTmp
Run Build Command(s):C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe cmTC_30228.vcxproj /p:Configuration=Debug /p:Platform=x64 /p:VisualStudioVersion=17.0 /v:m && Microsoft (R) Build Engine version 17.1.0+ae57d105c for .NETFramework^M
Copyright (C) Microsoft Corporation. All rights reserved.^M
^M
Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31104 for x64^M
Copyright (C) Microsoft Corporation. All rights reserved.^M
cl /c /Zi /W3 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D _POSIX_C_SOURCE=200809L /D _XOPEN_SOURCE=700 /D FSANITIZE_FUZZER_CXX_FLAG_FOUND /D "CMAKE_INTDIR=\"Debug\"" /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /Fo"cmTC_30228.dir\Debug\\" /Fd"cmTC_30228.dir\Debug\vc143.pdb" /external:W3 /Gd /TP /errorReport:queue -fsanitize=fuzzer "C:\brlcad-build\CMakeFiles\CMakeTmp\src.cxx"^M
src.cxx^M
cmTC_30228.vcxproj -> C:\brlcad-build\CMakeFiles\CMakeTmp\Debug\cmTC_30228.exe^M
Source file was:
int main() { return 0; }
they actually added it
so then the question is why does it fail later...
looks like the top-level unprotected one is stray. we have a fuzz regression test that does a direct test and links proper
I removed it, doesn't appear to be used
https://github.com/microsoft/vcpkg/issues/19561
It narrows down fairly quickly to trying to call methods on the const_cast<ON_SerialNumberMap*>(this) pointer. Not sure yet how to work around it.
Grr. I don't have access to the Microsoft compiler bug page referenced in the vcpkg discussion.
Looks like all we can tell students until a workaround is found or Microsoft pushes a fix is to use VS2019
removing the consts and the casts doesn't seem to help
OK... from the "cheap but functional" school... It looks like that particular class method isn't actually used anywhere, so we can just turn it off completely.
Blast, typoed the summary line. It's unused
I wonder if their build system works, implying it being something we're passing in that's untested. I don't see reference to that error in their tracker. If it uniquely affects us, it's probably our combination of flags..
looks like cmake guys encountered an issue with the /FS flag recently that they addressed, maybe related
confirmed on the mac here that gsh now shuts down clean. That was some nice work @Sean
@Sean as far as OpenGL is concerned - I think I may have asked you this already, but does glxgears or one of the other X11 OpenGL demos run successfully on your Mac?
Just got a successful build with Visual Studio 2022 (needed a clean build dir)
@starseeker yep, no problems with glxgears or previous mgeds for that matter
any idea what this is about? MicrosoftTeams-image.png
Yeah, I was going to say, I have a clean 2022 from two students now .. but one has those errors in the external project builds (maybe all of them)
maddening... why doesn't it fail with the mac here???
That's one of those rather unhelpful Visual Studio errors you get when a custom target fails.
First question - what version of CMake are they using?
For libdm+opengl, what does running ./src/libdm/tests/dm_test from the build directory show?
@starseeker I figured that one out. The full log had better detail. Turns out MSVC automatically updated/updates itself, so the compiler that CMake had originally detected no longer existed.
I think that may explain a couple build failures commonly encountered by people who have recently installed MSVC. I'm not sure if we can detect that situation as that error is absolutely inscrutable, or maybe put some advice into the Compiling page instructions to ensure MSVC is completely updated before proceeding with CMake (but then MSVC could update at any time).
I suppose it's not as common on Mac/Linux/BSD simply because the compiler isn't sitting in a versioned directory like msvc's compiler is.
Also figured out one of the other common build errors some of them ran into. If you do a Git for Windows clone of the code, the build will fail in WSL (Ubuntu) because some of the build logic appears to require unix line endings (e.g., libpng seems to be running awk).
Not sure that can be detected either, but can put a note in Compiling that one must fully start in WSL if you're going that route.
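For what it's worth, spotting the line-ending problem itself is cheap. A minimal sketch, assuming we only want to flag files containing CRLF pairs before handing them to unix tools (the function names are hypothetical, and which files to scan is left open):

```python
def has_crlf(data: bytes) -> bool:
    """True if the byte string contains at least one Windows (CRLF) line ending."""
    return b"\r\n" in data


def crlf_files(paths):
    """Return the subset of paths whose contents contain CRLF line endings."""
    offenders = []
    for path in paths:
        # Read as bytes so no newline translation hides the carriage returns.
        with open(path, "rb") as handle:
            if has_crlf(handle.read()):
                offenders.append(path)
    return offenders
```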
starseeker said:
First question - what version of CMake are they using?
Always the latest.
starseeker said:
For libdm+opengl, what does running ./src/libdm/tests/dm_test from the build directory show?
(base) morrison@agua .build % src/libdm/tests/dm_test
load msgs: dlsym(0x7f80706048d0, fb_plugin_info): symbol not found
Unable to load symbols from './libexec/dm/libdm-plot.dylib' (skipping)
Could not find 'fb_plugin_info' symbol in plugin
dlsym(0x7f807040bf60, fb_plugin_info): symbol not found
Unable to load symbols from './libexec/dm/libdm-ps.dylib' (skipping)
Could not find 'fb_plugin_info' symbol in plugin
Available types:
ogl
X
plot
ps
swrast
txt
nu
nu valid: 1
plot valid: 1
X valid: 1
ogl valid: 1
osgl valid: 0
wgl valid: 0
dmp name: nu
open called
dmp name: txt
close called
recommended type: ogl
anything else to check @starseeker ?
Maybe check whether the older (working) versions are linking to any libraries that are different from the newer version?
I don't know how much trouble it would be, but it would be interesting to know if qged works on that platform or not (the qged setup shouldn't require X11 opengl, so I'm curious as to whether the problem also manifests if we take X out of the equation...)
@Sean I think the bzflag reboot must have introduced a new default compiler - libbu's sort.c is suddenly making it unhappy...
@starseeker yes, see announcement -- major OS upgrade happened
Ah, OK. I can see where the error is coming from, but I'm not sure what the "correct" thing to do instead is...
I can take a look at it, I hadn't gotten to compiling there yet. Been chasing fires, reviewing PR commits, and answering questions all day.
speaking of which... I'll create another thread for an e-mail that came in
@starseeker sorry, that was me bashing around. Clang 13 now, gcc 10.3 is also installed, so -DCMAKE_C_COMPILER=gcc
@Erik no worries - we just need to fix the issue. I'm not confident I know what the "right" answer should be yet...
the subtracting a null ptr thing is buried in a macro from what I saw, could take a bit of doing to tease out. Using gcc pulls a sysinfo bridge header that mucks up libbu linking, that test should be moved from 'have the header' to 'can link the symbol' I think
I just did a build on the latest Fedora: it compiled clean and mged runs, but then abruptly closes after drawing anything and running rt. The rt window displays the rendering. Terminal output says mged was Killed.
ran in gdb and there’s nothing to break on as there indeed appears to be something in the system that sends mged the kill signal after forking off the rt process.
I’ve only seen that before when a process attempts to allocate too much memory, but haven’t yet seen evidence that’s what’s going on here
Oof, okay I found the evidence. It is getting killed by the Out of memory monitor. Not seeing why as it only appears to be using 1mb…
Ah, so turns out mged is using 5.4GB just with mged open… and that laptop is my low-resource test box, only has 4GB + 4GB swap. It is running out of memory. Seems a bit nuts that mged is using that much with essentially nothing open.
Looks like it’s something in DM. Every attach X is adding 1.5GB usage. Kicking off the tcltk gui adds over 4GB (presumably from the dm+fb).
@starseeker can you see what mged does for you if you run mged -c share/db/moss.g, attach nu, attach X, close the window, attach X again, then draw all.g?
@Sean Wipes out with the following error:
X Error of failed request: BadDrawable (invalid Pixmap or Window parameter)
Major opcode of failed request: 62 (X_CopyArea)
Resource id in failed request: 0x460000a
Serial number of failed request: 3473
Current serial number in output stream: 3474
I'm also seeing a memory bump here. Not sure why yet.
(As an aside, here's something a little weird from valgrind when I run attach X):
==1409798== Invalid read of size 1
==1409798== at 0x483FEF0: strcmp (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1409798== by 0x90A999E: _XimUnRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x9090892: XUnregisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x90A9866: _XimRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x909080C: XRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x5A392F2: TkpOpenDisplay (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0701: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0567: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0F4E: TkCreateMainWindow (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59ABA1D: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59AB40C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A349C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== Address 0xf205dc1 is 1 bytes inside a block of size 9 free'd
==1409798== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1409798== by 0x909FB3F: XSetLocaleModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x5A39ACA: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x5A39A4F: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x90A9866: _XimRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x909080C: XRegisterIMInstantiateCallback (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x5A392F2: TkpOpenDisplay (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0701: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0567: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0F4E: TkCreateMainWindow (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59ABA1D: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59AB40C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== Block was alloc'd at
==1409798== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1409798== by 0x909F756: _XlcDefaultMapModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x909FB2A: XSetLocaleModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==1409798== by 0x5A39ACA: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x5A392D9: TkpOpenDisplay (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0701: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0567: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A0F4E: TkCreateMainWindow (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59ABA1D: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59AB40C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x59A349C: ??? (in /usr/lib/x86_64-linux-gnu/libtk8.6.so)
==1409798== by 0x129B98: gui_setup (attach.c:333)
Memory is being allocated at if_X24.c:2065, from a size calculation at if_X24.c:1997
both ifp->i->if_max_height and ifp->i->if_max_width are set to 20480
I think that's coming from src/libdm/include/private.h:129
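Quick sanity check on the numbers, as a sketch. The 4 bytes per pixel is my assumption, not something confirmed from if_X24.c, but it lines up with the roughly 1.5GB per attach reported above:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def framebuffer_bytes(width, height, bytes_per_pixel=4):
    """Memory needed for one width x height buffer (bytes_per_pixel is assumed)."""
    return width * height * bytes_per_pixel


# Maximum framebuffer dimensions mentioned in the discussion.
big = framebuffer_bytes(20480, 20480)
print(f"20480x20480: {big / GIB:.4f} GiB")
```

At 20480x20480 that comes to 1.5625 GiB for a single buffer under the 4-byte assumption.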
However, that doesn't explain what's going on when the window is closed...
Interestingly, the same general problem happens with ogl:
ATTACHING ogl (X Windows with OpenGL graphics)
mged> X Error of failed request: GLXBadDrawable
Major opcode of failed request: 151 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 1252
Current serial number in output stream: 1252
Also seems to be specific to the first attach - if I attach multiple windows and close the second, I can attach a new one successfully.
Clearly memory is not being freed when the window is closed...
None of dm_close, fb_close nor fb_close_existing gets triggered when the window closes.
Oof. The first thing that comes to mind is to have the dm_open command bind a Tcl command that will call dm_close to the Tk <Destroy> event....
starseeker said:
Clearly memory is not being freed when the window is closed...
yeah, I noticed that too. I think that may also be new, but more concerning is the crash. The 20480x20480 change was made back in 7.16.0 and just testing a 7.24 version, it doesn't appear to explode memory use and seems to release when windows are closed. I reduced the number down to 8096x8096 anyways, but some other change is likely involved, and I think the crash is definitely new.
starseeker said:
Sean Wipes out with the following error:
X Error of failed request: BadDrawable (invalid Pixmap or Window parameter) Major opcode of failed request: 62 (X_CopyArea) Resource id in failed request: 0x460000a Serial number of failed request: 3473 Current serial number in output stream: 3474
I think this is at the heart of the issue, but I'm not yet grokking that stack trace.. will try to catch it on Mac to see if it gives a different path or at least more complete symbols -- looks like your build isn't enable-all'd.
@Sean I may have messed up the dm bookkeeping in MGED at some point - my recollection of MGED's management of those can be summed up as "messy", so it's actually quite likely I messed up somewhere. I'll see if I can tease a 7.24 build into working and try to figure out how the fb memory got freed...
One likely culprit of the increased memory usage may be my attempt to set up things so each dm has a built-in embedded fb by default. Don't know if that's behind the window crash but it's likely why the dm's are suddenly taking up memory they didn't previously
OK, got 7.24.4 building - here's what I'm seeing so far (I'm getting X11 windows and the first ogl window, but the second wipes out):
BRL-CAD Release 7.24.4 Geometry Editor (MGED)
Wed, 09 Mar 2022 13:25:05 -0500, Compilation 0
cyapp@ubuntu2019
attach (nu|txt|X|ogl)[nu]?
mged> attach X
ATTACHING X (X Window System (X11))
mged> attach X
ATTACHING X (X Window System (X11))
mged> attach ogl
ATTACHING ogl (X Windows with OpenGL graphics)
mged> attach ogl
ATTACHING ogl (X Windows with OpenGL graphics)
mged> X Error of failed request: GLXBadDrawable
Major opcode of failed request: 151 (GLX)
Minor opcode of failed request: 5 (X_GLXMakeCurrent)
Serial number of failed request: 1181
Current serial number in output stream: 1181
Huh - now I'm seeing the exact same thing with latest main, fwiw. I can't get 7.22.0 to build easily - will probably need to set up a VM if I need to go that far back.
(Oh - should make clear I'm closing each window above before proceeding to the next attach)
Aaaaand now I can't get the X attach to reproduce the failure, even with the fb memory set large... what on earth...
OK, per recent discussion, drawing in the second "attach X" does indeed crash in latest main.
Also crashes in 69b1b1bed2 (Tcl/Tk 8.5, just before the 8.6 switch)
Same thing with rel-7-24-4
7.24.0 is too old to readily build on this machine...
Okay, I swear I'd tested 7.24 and it worked, but it's bombing for me too. I documented it.
I believe you :smile: Interesting problem, in a hair-pulling sort of way...
Just FYI, I'm working on the build Action testing issues from the recent materials merge.
Sean said:
Just FYI, I'm working on the build Action testing issues from the recent materials merge.
Latest commit failing in Windows.
/me sometimes wonders why msvc shows a weird "file not found" message when the file is still there. Now builds fine.
@Himanshu Sekhar Nayak hm, don't know what to say about that other than it helps to turn up the compilation verbosity (under Options -> Projects and Solutions -> Build and Run). I typically set output to Normal and log to Detailed. That way, I can get to what exactly happened if needed.
One of the unexpected side effects of the plugin changes is frequently running into runtime crashes now whenever something changes outside the plugin dll/so/dylib that is incompatible with whatever's going on inside the plugin (as it does not appear to automatically recompile). At least that seems to be what's going on. For example, just pulled latest view changes, compiled, and then all tools exhibit hard corruption, assert failures, bu_bombing, etc. Cleaning and recompiling is apparently more often than not necessary now. Rather unexpected and unintuitive that it's not updating/recompiling the plugins. Maybe some dependencies aren't listed correctly?
Also working with a student on a hard database I/O corruption situation that seems to be new. Any database creation on his system is resulting in corrupted .g files. Others with the same setup, same msvc, etc. are not experiencing the corruption. It appears to have just started in the past two weeks.
That is unexpected - I would have figured the logic would rebuild anything that would result in such a pronounced failure.
@Sean I'll switch to working in a branch for this, so I don't keep disrupting everyone else.
It's probable I wouldn't see that breakage mode myself, as my normal MO is to clear and rebuild.
@starseeker I’m away from a computer to test, but getting multiple reports that mged is busted after recent updates, draw not working. Can you or someone else check?
@Sean I just pushed a reversion that should put it back.
so, uh, 'sup with make test failing with asc and weight? :D is that just me?
No that’s my doing. The test is detecting a change due to new material object management and I need to resolve it.
Probably will yank the attribute sync code but needs a bit of testing
and that durn kryptonite slips in... :) I was mucking with converting jenkins to a pipeline (can be dropped into the repo as /Jenkinsfile and revision controlled)
That's cool. I've been wanting to do that myself too. IaC FTW.
@Daniel Rossberg any chance we could wire up your cubes examples as unit/regression tests to make sure the gqa behavior stays correct in the future?
I was just looking at that PR too. Very interesting! Does it still interleave as resolution doubles? That is one of the current features, no ray is shot twice -- it (is supposed to) refines the gaps in-between recursively without ever reshooting the same ray.
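To make the interleaving idea concrete, here's a one-dimensional sketch. It is not the gqa code, just an illustration of the property described: each refinement halves the spacing and only shoots the new in-between positions, so no position is ever repeated. The function name and parameters are made up.

```python
def refinement_levels(lo, hi, initial_steps, levels):
    """Yield, per level, only the sample positions that are new at that level."""
    spacing = (hi - lo) / (initial_steps - 1)
    # Level 0: the coarse grid, endpoints included.
    yield [lo + i * spacing for i in range(initial_steps)]
    count = initial_steps
    for _ in range(levels):
        spacing /= 2
        # New samples sit at odd multiples of the halved spacing,
        # i.e. in the gaps between samples from all earlier levels.
        yield [lo + (2 * i + 1) * spacing for i in range(count - 1)]
        count = 2 * count - 1


for level, positions in enumerate(refinement_levels(0.0, 8.0, 3, 2)):
    print(level, positions)
```

Every level adds roughly as many samples as all previous levels combined, so reshooting the old ones would about double the work at each step.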
Sean said:
I was just looking at that PR too. Very interesting! Does it still interleave as resolution doubles? That is one of the current features, no ray is shot twice -- it (is supposed to) refines the gaps in-between recursively without ever reshooting the same ray.
You may have a point here. I'll review it.
BTW, that's why I made a PR and didn't commit it directly: to give it a better review and discuss it first.
starseeker said:
Daniel Rossberg any chance we could wire up your cubes examples as unit/regression tests to make sure the gqa behavior stays correct in the future?
I'll look into this and see how much effort it would be. Unfortunately, the result of gqa is a print-out, which would have to be interpreted first.
@Daniel Rossberg even if it reshoots, correct is obviously more important than performance. I was just more wondering if that behavior changed (and the potential effect as the grid size continues to double, if half the rays are repeat work each level)
It didn't reshoot, but it also didn't reuse the old ray traces. Changed back to the old grid generation.
The main fault was that in lines 1003-1005 the grid sizes were recomputed with the wrong number of steps (state->steps instead of state->steps-1).
The next improvement was to use gridSpacing there too. With every refinement the "old" moments have to be reduced, and it's a problem if they were computed with one value and readjusted based on a different one.
state.steps+1 in lines 2619-2661 ensures that the rays reach mdl_max.
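The steps versus steps-1 distinction is the classic fencepost relationship: N sample positions span N-1 intervals. A small sketch (not the gqa code; `grid_positions` and its `divisor` parameter are hypothetical) showing why dividing by the wrong count stops the grid short of the maximum:

```python
def grid_positions(lo, hi, steps, divisor):
    """Place `steps` samples starting at lo, spaced (hi - lo) / divisor apart."""
    spacing = (hi - lo) / divisor
    return [lo + i * spacing for i in range(steps)]


steps = 5
# Dividing by steps - 1 puts the last sample exactly on the maximum.
print(grid_positions(0.0, 10.0, steps, steps - 1))
# Dividing by steps leaves the last sample one interval short of it.
print(grid_positions(0.0, 10.0, steps, steps))
```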
@Daniel Rossberg thank you for that detail! really helps to understand what's going on there. that's awesome that you caught that off-by-one bug... would take me quite a while to fully re-understand what is going on in there, so glad you figured out what was wrong. :)
@Sean Looks like the recent MSVC warning changes broke gcc linux building
Thanks @starseeker and sorry, should be fixed now! I hadn't cycled back to mac or linux yet as I was really trying to immerse in a windows dev workflow as much as possible last week so I could address categoric issues from that side I'm seeing in our stig listings. Took a heck of a lot longer than expected to get things off the ground (still not done, but putting a thumbtack in it for now).
thanks for clearing the last two. was waiting for the scan to see what else was left and you'd fixed it before I got to see the next (as it's building for me locally clean)
let me know if bio.h causes a problem; might get away with these vanilla environments, but I suspect that'll need to be handled differently to be fully portable
@Sean gcc errors with latest changes:
/brlcad/src/conv/off/off-g.c: In function ‘off2nmg’:
/brlcad/src/conv/off/off-g.c:208:39: error: ‘%s’ directive output may be truncated writing up to 63 bytes into a region of size 62 [-Werror=format-truncation=]
208 | snprintf(sname, sizeof(sname), "s.%s", title);
| ^~ ~~~~~
/brlcad/src/conv/off/off-g.c:208:5: note: ‘snprintf’ output between 3 and 66 bytes into a destination of size 64
208 | snprintf(sname, sizeof(sname), "s.%s", title);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/brlcad/src/conv/off/off-g.c:209:39: error: ‘%s’ directive output may be truncated writing up to 63 bytes into a region of size 62 [-Werror=format-truncation=]
209 | snprintf(rname, sizeof(sname), "r.%s", title);
| ^~ ~~~~~
/brlcad/src/conv/off/off-g.c:209:5: note: ‘snprintf’ output between 3 and 66 bytes into a destination of size 64
209 | snprintf(rname, sizeof(sname), "r.%s", title);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
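The arithmetic behind that warning, as a sketch: gcc is adding the literal prefix, the longest string the argument could hold, and the terminating NUL. The 63-byte maximum for title comes from the message itself; the helper names are made up.

```python
def snprintf_worst_case(prefix, max_arg_len):
    """Bytes snprintf may need: literal prefix + longest argument + NUL."""
    return len(prefix.encode()) + max_arg_len + 1


def fits(prefix, max_arg_len, dest_size):
    """True if the worst-case output fits in a destination of dest_size bytes."""
    return snprintf_worst_case(prefix, max_arg_len) <= dest_size


# "s.%s" with a title of up to 63 bytes, written into a 64-byte buffer.
print(snprintf_worst_case("s.", 0), snprintf_worst_case("s.", 63))
print(fits("s.", 63, 64))
```

That reproduces the "between 3 and 66 bytes into a destination of size 64" range from the note: either the destination has to grow by two bytes or the title has to be capped at 61.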
Thank you! Home stretch here with musl ... hopefully one of the last issues.
That's a fun one .. fixing one issue let it detect another underlying. Commit fix pushed.
I now appear to have a full build, so I'm going to let the gitlab folks know we're good to go. Hopefully we can stay stable until the end of this week for their demo.
@Sean Hint taken - I'll shift to a branch (sorry, didn't see this until now)
@Sean do you want me to merge to RELEASE so we can start 7.34.0 shakedown?
@Sean - FYI, CheckCompilerFlag is 3.19 and newer: https://cmake.org/cmake/help/latest/module/CheckCompilerFlag.html
Yeah I just discovered that earlier today.. I fixed it but didn’t push yet.
Files known to Git are not accounted for in build logic:
doc/docbook/resources/brlcad/CMakeLists.txt
doc/docbook/resources/brlcad/brlcad-article-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-article-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-book-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-book-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-common.xsl.in
doc/docbook/resources/brlcad/brlcad-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-fonts.xsl.in
doc/docbook/resources/brlcad/brlcad-gendata.xsl
doc/docbook/resources/brlcad/brlcad-lesson-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-lesson-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-man-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-man-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-man-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-presentation-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-presentation-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-specification-fo-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-specification-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/brlcad-xhtml-header-navigation.xsl
doc/docbook/resources/brlcad/brlcad-xhtml-stylesheet.xsl.in
doc/docbook/resources/brlcad/center-table-print.xsl
doc/docbook/resources/brlcad/images/brlcad-logo-669966.svg
doc/docbook/resources/brlcad/images/brlcad-logo-6699cc.svg
doc/docbook/resources/brlcad/images/brlcad-logo-blue.svg
doc/docbook/resources/brlcad/images/brlcad-logo-cc6666.svg
doc/docbook/resources/brlcad/images/brlcad-logo-cc9966.svg
doc/docbook/resources/brlcad/images/brlcad-logo-green.svg
doc/docbook/resources/brlcad/images/brlcad-logo-limegreen.svg
doc/docbook/resources/brlcad/images/brlcad-logo-red.svg
doc/docbook/resources/brlcad/images/logo-vm-gears.png
doc/docbook/resources/brlcad/images/logo-vm-gears.svg
doc/docbook/resources/brlcad/presentation.xsl.in
doc/docbook/resources/brlcad/tutorial-cover-template.xsl.in
doc/docbook/resources/brlcad/tutorial-template.xsl.in
doc/docbook/resources/brlcad/wordpress.xsl.in
Files mentioned in build logic are not checked into the repository:
doc/docbook/resourcesCMakeLists.txt
doc/docbook/resourcesbrlcad-article-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-article-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-book-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-book-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-common.xsl.in
doc/docbook/resourcesbrlcad-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-fonts.xsl.in
doc/docbook/resourcesbrlcad-gendata.xsl
doc/docbook/resourcesbrlcad-lesson-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-lesson-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-man-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-man-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-man-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-presentation-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-presentation-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-specification-fo-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-specification-xhtml-stylesheet.xsl.in
doc/docbook/resourcesbrlcad-xhtml-header-navigation.xsl
doc/docbook/resourcesbrlcad-xhtml-stylesheet.xsl.in
doc/docbook/resourcescenter-table-print.xsl
doc/docbook/resourcesimages/brlcad-logo-669966.svg
doc/docbook/resourcesimages/brlcad-logo-6699cc.svg
doc/docbook/resourcesimages/brlcad-logo-blue.svg
doc/docbook/resourcesimages/brlcad-logo-cc6666.svg
doc/docbook/resourcesimages/brlcad-logo-cc9966.svg
doc/docbook/resourcesimages/brlcad-logo-green.svg
doc/docbook/resourcesimages/brlcad-logo-limegreen.svg
doc/docbook/resourcesimages/brlcad-logo-red.svg
doc/docbook/resourcesimages/logo-vm-gears.png
doc/docbook/resourcesimages/logo-vm-gears.svg
doc/docbook/resourcespresentation.xsl.in
doc/docbook/resourcestutorial-cover-template.xsl.in
doc/docbook/resourcestutorial-template.xsl.in
doc/docbook/resourceswordpress.xsl.in
CMake Error at CMakeTmp/distcheck_repo_verify.cmake:228 (message):
ERROR: Distcheck cannot proceed until build files and repo are in sync (set
-DFORCE_DISTCHECK=ON to override)
Sean said:
Yeah I just discovered that earlier today.. I fixed it but didn’t push yet.
pushed the fix last night, should be good to go. Possibly related, I'm seeing two "Attempt to add a custom rule to output" cmake error messages on libnetpbm.a.rule and libgdal.a.rule
Any ideas?
Not offhand - the logic doing that management is src/other/ext/CMake/ExternalProject_Target.cmake:442 - it in turn uses the fcfgcpy function which defines custom rules
You could try some message statements in those functions to see if you can bracket where that error is being generated
@starseeker appears to be a recent regression on draw -m2 ...
Your recent fix appears to have fixed it, nice! Thank you.
@Christopher looks like a few dirs are missing from the latest commit? (fbx, dxf, pbrt in regress/gcv)
Forgot some cleanup. Fix pushed
It seems we have some problems with the brep command. ged_brep_core used to receive four arguments; now we only get two.
image.png
GregoryLi said:
It seems we have some problems with the brep command. ged_brep_core used to receive four arguments; now we only get two.
Just tested with a clean build of current brlcad on Linux. I got "arb8.s.brep is made." Can you repeat your test with a clean build from scratch?
Sorry, a clean build works well. :grinning:
libged commands are loaded dynamically (as dynamic libs) and for some reason they don't always rebuild when a file has been edited despite having dependencies set in cmake (or perhaps one is missing).
so if anyone edits a header, especially a structure, they all need to be rebuilt and that doesn't always happen automatically. would be great if someone could make that not be a problem, but currently I make sure to delete the libged and libdm libs at a minimum so they're rebuilt.
Hi, I just pulled the newest code and found I can't open a .g database.
image.png
I'm working on Ubuntu 20.04 at commit 753ca33.
It's quite strange... For me, the problem existed many commits ago (it was already there before Sep 13), and it worked well on Aug 27. Does anyone else have this problem? Do I need to use the git bisect command to determine the location?
That might need a bisect - it's probably related to the work I did with the open/opendb GED command work.
A naive guess is that I didn't change something from open to opendb, but it could be something else.
I just located the error using bisect. a7bba28a948a1939e53ab224fdc4e4a381cddb23 is the first bad commit.
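For anyone who hasn't used it, the search bisect performs is a plain binary search over history. A minimal sketch of the idea (not git's implementation; it assumes a linear list of commits ordered oldest to newest, with the first known good and the last known bad, and `is_bad` standing in for "build and test this commit"):

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad]
```

Each test halves the remaining range, which is why even a few weeks of commits only takes a handful of builds to narrow down.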
@GregoryLi OK, that confirms somewhere in the Archer startup stack we're calling "open" where we should be calling "opendb"
@GregoryLi Can you check if things are working again in the latest?
starseeker said:
GregoryLi Can you check if things are working again in the latest?
Yeah, it works well now. :smile:
Bah - bu_vls_vprintf tests 22 and 32 fail on Alpine Linux.
I don't believe this - facetizing tor with a tolerance of r=0.0001 is causing an nmg_mdl_to_bot failure just on the mac, which seems to be why the lod drawing test is failing.
Here's a crash reading/writing BREP:
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x00000001018c4884 libOpenNURBS.dylib`ON_Object::IsKindOf(ON_ClassId const*) const + 20
frame #1: 0x00000001017aeebc libOpenNURBS.dylib`ON_Geometry::Cast(ON_Object*) + 32
frame #2: 0x00000001007be114 librt.20.dylib`brep_dbi2on(rt_db_internal const*, ONX_Model&) + 176
frame #3: 0x00000001007be560 librt.20.dylib`rt_brep_export5 + 168
frame #4: 0x0000000100809088 librt.20.dylib`rt_generic_xform + 340
frame #5: 0x000000010003bdec mged`vls_solid + 168
frame #6: 0x000000010004c618 mged`refresh + 1040
frame #7: 0x0000000100049988 mged`main + 7080
frame #8: 0x000000019234ff28 dyld`start + 2236
commands invoked by mged:
M
M $args
M 1 0 0
adc $args
adc draw
ae
aip f
attach
center
draw $esol_control($id,name)
has_embedded_fb
ill
ill -e -i $ri $spath
ill -e -i 1 $path
ill -e -i 1 [lindex $spath_and_pos 0
ill -e -n -i $ri $spath
ill -i 1 [lindex $paths 0
ill -i 1 \$mged_gui($id,mgs_path)
in $mged_gui($id,solid_name) dsp f \
keep
keep db_glob
ls -c
ls -r
make $mged_gui($id,solid_name) $type} msg
make_name $mged_default(solid_name_fmt)} name
make_name comb@\
matpick
matpick $item
matpick -n $path_pos
matpick -n \$item
matpick [lindex $spath_and_pos 1
nirt $args
opendb
pl
postscript
press
press oill
press reject
press reset
press sill
qray basename
qray echo
qray effects
qray evencolor
qray fmt f
qray fmt g
qray fmt h
qray fmt m
qray fmt o
qray fmt p
qray fmt r
qray oddcolor
qray overlapcolor
qray script
qray voidcolor
quit
rset grid anchor
rt
saveview
sed $mged_gui($id,solid_name)}
sed -i 1 $item
sed -i 1 $spath
size
size $size
status state
tie
tie $id
tie $id $mged_gui($id,active_dm)
tree
tree $args} result
units $mged_display(units)
view
view center
view size
view_ring
view_ring next
view_ring prev
view_ring toggle
who
who phony
x -1
x -2
generated from:
grep -r -e '[^a-z]_mged_' * | sed 's/.*_mged_//g' | sed 's/].*//g' | sed 's/;.*//g' | sed 's/".*//g' | sort | uniq
1738 const ON_ClassId* p = ClassId();
(gdb) print *this
$5 = {_vptr.ON_Object = 0x5d00000032, static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {
static m_p0 = 0x7ffff3764e00 <ON_3dmObjectAttributes::m_ON_3dmObjectAttributes_class_rtti>,
static m_p1 = 0x7ffff37801e0 <ON_RdkUserData::m_ON_RdkUserData_class_rtti>, static m_mark0 = 0,
m_pNext = 0x7ffff376ed20 <ON_HistoryRecord::m_ON_HistoryRecord_class_rtti>, m_pBaseClassId = 0x0, m_sClassName = "ON_Object", '\000' <repeats 70 times>,
m_sBaseClassName = "0", '\000' <repeats 78 times>, m_create = 0x0, m_uuid = {Data1 = 1622531005, Data2 = 58976, Data3 = 4563,
Data4 = "\277\344\000\020\203\001", <incomplete sequence \360>}, m_mark = -2147483648, m_class_id_version = 0, m_f1 = 0x0, m_f2 = 0x0, m_f3 = 0x0, m_f4 = 0x0,
m_f5 = 0x0, m_f6 = 0x0, m_f7 = 0x0, m_f8 = 0x0}, m_userdata_list = 0x200000003a}
opennurbs_object.h:
#define ON_VIRTUAL_OBJECT_IMPLEMENT( cls, basecls, uuid ) \
void* cls::m_s_##cls##_ptr = nullptr; \
const ON_ClassId cls::m_##cls##_class_rtti(#cls,#basecls,0,uuid);\
cls * cls::Cast( ON_Object* p) {return(p&&p->IsKindOf(&cls::m_##cls##_class_rtti))?static_cast< cls *>(p):nullptr;} \
const cls * cls::Cast( const ON_Object* p) {return(p&&p->IsKindOf(&cls::m_##cls##_class_rtti))?static_cast<const cls *>(p):nullptr;} \
const ON_ClassId* cls::ClassId() const {return &cls::m_##cls##_class_rtti;} \
bool cls::CopyFrom(const ON_Object*) {return false;} \
cls * cls::Duplicate() const {return static_cast< cls *>(this->Internal_DeepCopy());} \
ON_Object* cls::Internal_DeepCopy() const {return nullptr;}
(gdb) print *bi->brep
$6 = {<ON_Geometry> = {<ON_Object> = {_vptr.ON_Object = 0x5d00000032, static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {
(gdb) print *this
$3 = {_vptr.ON_Object = 0x7ffff372ed10 <vtable for ON_Brep+16>, static m_s_ON_Object_ptr = 0x0,
(gdb) print *this
$4 = {_vptr.ON_Object = 0xc00000004, static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {
#0 brep_dbi2on (intern=0x7fffffffd1c0, model=...) at /home/user/brlcad/src/librt/primitives/brep/brep.cpp:2321
#1 0x00007ffff75b4c82 in rt_brep_get (logstr=0x5555556a70a0, intern=0x7fffffffd1c0, attr=0x0)
(gdb) print *bi->brep
$1 = {<ON_Geometry> = {<ON_Object> = {_vptr.ON_Object = 0x7ffff372ed10 <vtable for ON_Brep+16>,
static m_s_ON_Object_ptr = 0x0, static m_ON_Object_class_rtti = {
#0 brep_dbi2on (intern=0x55555565e320 <es_int>, model=...)
at /home/user/brlcad/src/librt/primitives/brep/brep.cpp:2331
#1 0x00007ffff75b544f in rt_brep_export5 (ep=0x7fffffffd1a0, ip=0x55555565e320 <es_int>, UNUSED_local2mm=1,
$3 = {<ON_Geometry> = {<ON_Object> = {_vptr.ON_Object = 0x2c00000030, static m_s_ON_Object_ptr = 0x0,
static m_ON_Object_class_rtti = {
==690562== Invalid read of size 8
==690562== at 0x9569A20: ON_Object::IsKindOf(ON_ClassId const*) const (opennurbs_object.cpp:1738)
==690562== by 0x93E6722: ON_Geometry::Cast(ON_Object*) (opennurbs_geometry.cpp:24)
==690562== by 0x4CD2A99: brep_dbi2on(rt_db_internal const*, ONX_Model&) (brep.cpp:2345)
==690562== by 0x4CD344E: rt_brep_export5 (brep.cpp:2422)
==690562== by 0x4DB07D5: rt_generic_xform (generic.c:85)
==690562== by 0x4F7B733: rt_matrix_transform (transform.c:39)
==690562== by 0x16D989: transform_editing_solid (edsol.c:2712)
==690562== by 0x19247E: vls_solid (edsol.c:7349)
==690562== by 0x1D9835: create_text_overlay (titles.c:89)
==690562== by 0x1B9195: refresh (mged.c:2316)
==690562== by 0x1B72B0: main (mged.c:1695)
==690562== Address 0x13c29970 is 784 bytes inside an unallocated block of size 2,432 in arena "client"
==690562==
==690562== Invalid read of size 8
==690562== at 0x9569A23: ON_Object::IsKindOf(ON_ClassId const*) const (opennurbs_object.cpp:1738)
==690562== by 0x93E6722: ON_Geometry::Cast(ON_Object*) (opennurbs_geometry.cpp:24)
==690562== by 0x4CD2A99: brep_dbi2on(rt_db_internal const*, ONX_Model&) (brep.cpp:2345)
==690562== by 0x4CD344E: rt_brep_export5 (brep.cpp:2422)
==690562== by 0x4DB07D5: rt_generic_xform (generic.c:85)
==690562== by 0x4F7B733: rt_matrix_transform (transform.c:39)
==690562== by 0x16D989: transform_editing_solid (edsol.c:2712)
==690562== by 0x19247E: vls_solid (edsol.c:7349)
==690562== by 0x1D9835: create_text_overlay (titles.c:89)
==690562== by 0x1B9195: refresh (mged.c:2316)
==690562== by 0x1B72B0: main (mged.c:1695)
==690562== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==690562==
==690562==
==690562== Process terminating with default action of signal 11 (SIGSEGV)
==690562== Access not within mapped region at address 0x0
==690562== at 0x9569A23: ON_Object::IsKindOf(ON_ClassId const*) const (opennurbs_object.cpp:1738)
==690562== by 0x93E6722: ON_Geometry::Cast(ON_Object*) (opennurbs_geometry.cpp:24)
==690562== by 0x4CD2A99: brep_dbi2on(rt_db_internal const*, ONX_Model&) (brep.cpp:2345)
==690562== by 0x4CD344E: rt_brep_export5 (brep.cpp:2422)
==690562== by 0x4DB07D5: rt_generic_xform (generic.c:85)
==690562== by 0x4F7B733: rt_matrix_transform (transform.c:39)
==690562== by 0x16D989: transform_editing_solid (edsol.c:2712)
==690562== by 0x19247E: vls_solid (edsol.c:7349)
==690562== by 0x1D9835: create_text_overlay (titles.c:89)
==690562== by 0x1B9195: refresh (mged.c:2316)
==690562== by 0x1B72B0: main (mged.c:1695)
@Sean I might have fixed it - let me know if the latest commit works for you. (I didn't put a NEWS item in yet, want more confirmation than just "works on my box" for this sucker...)
New build error on a default build (on Mac):
morrison@Miniagua TCL_BLD-build % make
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -c -I"." -I/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/unix -I/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic -I/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/libtommath -O2 -pipe -I/Volumes/X10/brlcad/.build/bext_output/install/include -Wall -Wpointer-arith -fno-common -DBUILD_tcl -DPACKAGE_NAME=\"tcl\" -DPACKAGE_TARNAME=\"tcl\" -DPACKAGE_VERSION=\"8.6\" -DPACKAGE_STRING=\"tcl\ 8.6\" -DPACKAGE_BUGREPORT=\"\" -DNO_DIRENT_H=1 -DNO_VALUES_H=1 -DNO_STDLIB_H=1 -DNO_STRING_H=1 -DNO_SYS_WAIT_H=1 -DNO_DLFCN_H=1 -DUSE_THREAD_ALLOC=1 -D_REENTRANT=1 -D_THREAD_SAFE=1 -DHAVE_PTHREAD_ATTR_SETSTACKSIZE=1 -DHAVE_PTHREAD_ATFORK=1 -DTCL_THREADS=1 -DTCL_CFGVAL_ENCODING=\"iso8859-1\" -DHAVE_ZLIB=1 -DMODULE_SCOPE=extern\ __attribute__\(\(__visibility__\(\"hidden\"\)\)\) -DHAVE_HIDDEN=1 -DMAC_OSX_TCL=1 -DHAVE_CAST_TO_UNION=1 -DHAVE_VFORK=1 -DHAVE_POSIX_SPAWNP=1 -DHAVE_POSIX_SPAWN_FILE_ACTIONS_ADDDUP2=1 -DHAVE_POSIX_SPAWNATTR_SETFLAGS=1 -DTCL_SHLIB_EXT=\".dylib\" -DNDEBUG=1 -DTCL_CFG_OPTIMIZED=1 -DTCL_TOMMATH=1 -DMP_PREC=4 -DTCL_WIDE_INT_IS_LONG=1 -DWORDS_BIGENDIAN=1 -DHAVE_GETCWD=1 -DHAVE_MKSTEMP=1 -DHAVE_OPENDIR=1 -DHAVE_STRTOL=1 -DHAVE_WAITPID=1 -DHAVE_GETNAMEINFO=1 -DHAVE_GETADDRINFO=1 -DHAVE_FREEADDRINFO=1 -DHAVE_GAI_STRERROR=1 -DNEED_FAKE_RFC2553=1 -DHAVE_MTSAFE_GETHOSTBYNAME=1 -DHAVE_MTSAFE_GETHOSTBYADDR=1 -DNO_FD_SET=1 -DHAVE_GMTIME_R=1 -DHAVE_LOCALTIME_R=1 -DHAVE_MKTIME=1 -Dmode_t=int -Dpid_t=int -Dsize_t=unsigned -Duid_t=int -Dgid_t=int -Dsocklen_t=int -DNO_UNION_WAIT=1 -DGETTOD_NOT_DECLARED=1 -DHAVE_SIGNED_CHAR=1 -DHAVE_PUTENV_THAT_COPIES=1 -DHAVE_CHFLAGS=1 -DHAVE_MKSTEMPS=1 -DNO_ISNAN=1 -DHAVE_GETATTRLIST=1 -DHAVE_COPYFILE=1 -DTCL_DEFAULT_ENCODING=\"utf-8\" -DTCL_LOAD_FROM_MEMORY=1 -DTCL_WIDE_CLICKS=1 -DTCL_UNLOAD_DLLS=1 -DSTATIC_BUILD -fno-lto 
/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclStubLib.c
In file included from /Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclStubLib.c:14:
In file included from /Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclInt.h:36:
In file included from /Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/generic/tclPort.h:23:
/Volumes/X10/brlcad/.build/bext_build/tcl/TCL_BLD-prefix/src/TCL_BLD/unix/tclUnixPort.h:32:10: fatal error: 'errno.h' file not found
#include <errno.h>
         ^~~~~~~~~
1 error generated.
make: *** [tclStubLib.o] Error 1
Does OSX not have errno.h?
@starseeker It most certainly does and always has. Nothing on the system has changed. Debug build worked just fine. Just the default build is dying on that error during Tcl's bext build.
The only other thing I can see is that all the -DNO_*_H=1 flags look wrong too, as if something went wrong during or after Tcl's configure phase.
e.g., they claim there is no string.h or stdlib.h either.
The release and debug builds both seem to have worked, but I haven't deleted them to re-check from scratch because I'm working on something else. The default build failing at basic setup just caught me by surprise.
Last updated: Jan 07 2025 at 00:46 UTC