@starseeker Looks like build is Sad Mac. It finds system Tcl/Tk but then fails to find TCL_STUB_LIBRARY:
Compile Tcl ........................: OFF Compile Tk .........................: OFF Compile Itcl/Itk ...................: OFF Compile Iwidgets ...................: OFF Compile Tkhtml .....................: ON Compile Tktable ....................: OFF Compile libpng .....................: OFF Compile libregex ...................: OFF Compile zlib .......................: OFF Compile Utah Raster Toolkit ........: ON Compile openNURBS ..................: ON Compile STEPcode....................: ON OpenGL support (optional) ..........: ON X11 support (optional) .............: ON Qt support (optional) ..............: OFF Run-time debuggability (optional) ..: ON Build 32/64-bit release ............: 64BIT (Auto) Build optimized release ............: OFF Build static libraries .............: ON Build dynamic libraries ............: ON Install example geometry models ....: ON Generate extra docs ................: ON (html/man) Elapsed configuration time: 2 minutes 22 seconds -- Configuring done CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files: TCL_STUB_LIBRARY linked by target "Tkhtml" in directory /Users/morrison/brlcad.trunk/src/other/tkhtml -- Generating done CMake Generate step failed. Build files cannot be regenerated correctly.
Erm. Does Apple's install of Tcl/Tk have the stub library? (Alternately, we might try just TCL_LIBRARY on Mac...)
What does ENABLE_ALL do?
@starseeker build is still sad mac with enable all too
[ 56%] Building C object src/other/tcl/CMakeFiles/tcl.dir/unix/tclLoadDyld.c.o [ 56%] Linking C shared library ../../../lib/libtcl.dylib duplicate symbol '_TclpDlopen' in: CMakeFiles/tcl.dir/unix/tclLoadDl.c.o CMakeFiles/tcl.dir/unix/tclLoadDyld.c.o duplicate symbol '_TclGuessPackageName' in: CMakeFiles/tcl.dir/unix/tclLoadDl.c.o CMakeFiles/tcl.dir/unix/tclLoadDyld.c.o ld: 2 duplicate symbols for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) make[2]: *** [lib/libtcl.8.6.10.dylib] Error 1 make[1]: *** [src/other/tcl/CMakeFiles/tcl.dir/all] Error 2
Looks like Mac is even more sad I skip that error. Termio compile errors next, missing libbu symbols.
Ah, that's me - I switched a couple logging statements and string opts to bu versions
forgot to update libtermio CMakeLists.txt
I think I fixed where OSX build was pulling in the file with colliding symbols.
The (Linux)build is still broken?
Build Time: Mon May 4 18:01:26 2020
[ 0%] Built target timestamp
[ 0%] Built target lcheck
[ 0%] Built target toplevel_DOCFILES_cp
[ 0%] Built target lemon
[ 1%] Built target re2c_bootstrap
[ 1%] Built target re2c
[ 1%] Built target perplex_template_cp
[ 2%] Built target perplex
[ 2%] Built target astyle
[ 2%] Built target env2c
[ 2%] Built target debug2c
[ 2%] Built target 04238f4c3429a37ca335f9352ba2dc81_cp
[ 2%] Built target openNURBS_headers_cp
[ 2%] Built target lz4
[ 2%] Built target lz4-static
[ 2%] Built target netpbm-static
[ 3%] Built target netpbm
[ 4%] Built target utahrle-static
[ 4%] Built target utahrle
[ 4%] Built target itclstub
[ 4%] Linking C shared library ../../../lib/libitcl.so
/usr/bin/ld: -ltclstub kann nicht gefunden werden
collect2: error: ld returned 1 exit status
make[2]: *** [src/other/itcl3/CMakeFiles/itcl.dir/build.make:250: lib/libitcl.so.3.4] Fehler 1
make[1]: *** [CMakeFiles/Makefile2:3600: src/other/itcl3/CMakeFiles/itcl.dir/all] Fehler 2
make: *** [Makefile:163: all] Fehler 2
This could be the same issue I ran across. I think the underlying issue was a system libtcl that did not install libtclstub? Or perhaps the cmake detection of it failed? I don't recall the detail but does enable all work for you¿
Right, libtcl8.6 is there, but no libtclstub. And, there seems to be no Debian package providing it.
It looks like the old find-TCL hasn't found the system TCL library, but the new one does.
@Sean Are you able to build trunk on OSX now, or is it still busted?
Trunk built yesterday, but I haven't tested it since. I'll kick off a build
@starseeker new opennurbs-zlib-related compilation error encountered after a fresh cmake:
/usr/local/Cellar/llvm/10.0.0_3/bin/clang++ -w -m64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -mmacosx-version-min=10.14 -dynamiclib -Wl,-headerpad_max_install_names -m64 -current_version 2012.10.245 -o ../../../lib/libopenNURBS.2012.10.245.dylib -install_name @rpath/libopenNURBS.2012.10.245.dylib CMakeFiles/openNURBS-obj.dir/opennurbs_basic.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_changesrf.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_kinky.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_x.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_3dm_attributes.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_3dm_properties.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_3dm_settings.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_annotation.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_annotation2.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_arc.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_arccurve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_archive.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_array.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_base32.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_base64.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_beam.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_bezier.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_beziervolume.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_bitmap.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_bounding_box.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_box.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_extrude.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_io.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_isvalid.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_region.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_tools.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_brep_v2valid.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_circle.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_color.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_compress.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_cone.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_crc.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_curve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_curveonsurface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_curveproxy.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_cylinder.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_defines.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_detail.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_dimstyle.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_dll.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_ellipse.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_embedded_file.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_error.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_error_message.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_evaluate_nurbs.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_extensions.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_font.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_fsp.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_geometry.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_group.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_hatch.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_instance.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_intersect.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_knot.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_layer.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_light.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_line.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_linecurve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_linetype.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_lookup.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_material.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_math.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_massprop.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_matrix.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_memory.c.o CMakeFiles/openNURBS-obj.dir/opennurbs_memory_util.c.o CMakeFiles/openNURBS-obj.dir/opennurbs_mesh.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_mesh_ngon.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_mesh_tools.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_morph.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_nurbscurve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_nurbssurface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_nurbsvolume.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_object.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_object_history.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_objref.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_offsetsurface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_optimize.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_plane.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_planesurface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_pluginlist.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_point.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_pointcloud.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_pointgeometry.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_pointgrid.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_polycurve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_polyedgecurve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_polyline.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_polylinecurve.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_rand.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_revsurface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_rtree.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_sort.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_sphere.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_string.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_sum.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_sumsurface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_surface.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_surfaceproxy.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_textlog.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_torus.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_unicode.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_userdata.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_uuid.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_viewport.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_workspace.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_wstring.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_xform.cpp.o CMakeFiles/openNURBS-obj.dir/opennurbs_zlib.cpp.o /usr/lib/libz.dylib
Undefined symbols for architecture x86_64:
"_brl_deflate", referenced from:
ON_CompressStream::In(unsigned long long, void const*) in opennurbs_compress.cpp.o
ON_CompressStream::End() in opennurbs_compress.cpp.o
ON_BinaryArchive::WriteDeflate(unsigned long, void const*) in opennurbs_zlib.cpp.o
ON_CompressedBuffer::DeflateHelper(ON_CompressedBufferHelper*, unsigned long, void const*) in opennurbs_zlib.cpp.o
"_brl_deflateEnd", referenced from:
ON_CompressStream::End() in opennurbs_compress.cpp.o
ON_BinaryArchive::CompressionEnd() in opennurbs_zlib.cpp.o
ON_CompressedBuffer::CompressionEnd(ON_CompressedBufferHelper*) const in opennurbs_zlib.cpp.o
"_brl_deflateInit_", referenced from:
ON_CompressStream::Begin() in opennurbs_compress.cpp.o
ON_BinaryArchive::CompressionInit() in opennurbs_zlib.cpp.o
ON_CompressedBuffer::CompressionInit(ON_CompressedBufferHelper*) const in opennurbs_zlib.cpp.o
"_brl_inflate", referenced from:
ON_UncompressStream::In(unsigned long long, void const*) in opennurbs_compress.cpp.o
ON_UncompressStream::End() in opennurbs_compress.cpp.o
ON_BinaryArchive::ReadInflate(unsigned long, void*) in opennurbs_zlib.cpp.o
ON_CompressedBuffer::InflateHelper(ON_CompressedBufferHelper*, unsigned long, void*) const in opennurbs_zlib.cpp.o
"_brl_inflateEnd", referenced from:
ON_UncompressStream::End() in opennurbs_compress.cpp.o
ON_BinaryArchive::CompressionEnd() in opennurbs_zlib.cpp.o
ON_CompressedBuffer::CompressionEnd(ON_CompressedBufferHelper*) const in opennurbs_zlib.cpp.o
"_brl_inflateInit_", referenced from:
ON_UncompressStream::Begin() in opennurbs_compress.cpp.o
ON_BinaryArchive::CompressionInit() in opennurbs_zlib.cpp.o
ON_CompressedBuffer::CompressionInit(ON_CompressedBufferHelper*) const in opennurbs_zlib.cpp.o
ld: symbol(s) not found for architecture x86_64
this was on mac
looks like it's still linking system libz there at the end
That's... weird. Wonder why it's doing that only on one platform.
The license scanning is in place now - might be time to subsume the openNURBS build under libbrep.
Because we're only testing 2-3 platforms?
Characterizing it diminutively as "only on one platform" is not exactly a healthy perspective for the build... That was absolutely broken elsewhere as well.
It depends how ld is configured on the system, the type of library being compiled, and options in use (two of those are in the user's purview). Libraries can be resolved at compilation, at linkage, or at runtime (or combinations thereof). Mac happens to default to resolved dynamic libraries, but Linux, UNIX, and BSD linking can be configured this way as well (and some distros do).
All that was probably needed was listing our renamed libz as a dependendency to cmake and it would have probably done the right thing.
We've hit that issue before repeatedly over the years. Usually from Mac, but also from some Linux platforms. Cmake figures it out if things are fully declared correctly.
@starseeker build now fails with a different set of issues:
In file included from /Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array.h:1788:
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:120:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
[ 12%] Built target SPSR
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:434:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:463:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:557:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:583:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:875:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:901:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:966:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
[ 12%] Building C object src/libnmg/CMakeFiles/libnmg-obj.dir/copy.c.o
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:1585:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:1627:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:1663:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:1890:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
[ 12%] Building C object src/libbu/CMakeFiles/libbu-obj.dir/whereis.c.o
/Users/morrison/brlcad.trunk/src/libbrep/openNURBS/opennurbs_array_defs.h:1970:36: error: unknown
warning group '-Wclass-memaccess', ignored [-Werror,-Wunknown-warning-option]
# pragma clang diagnostic ignored "-Wclass-memaccess"
^
trying a different compiler...
also horked. looks like more pragmas. did you add those or were they already there? some seem to be in include/defines.h, others in opennurbs files.
@starseeker I see you riddled it with the pragmas. Please, please...
Please don't make huge changes like this. I get why as we've talked about moving opennurbs, but this is so unstable an approach with pragma hacks on top of a major dependency restructure.. all to fix what should have been a single line declaration that was missing.
Reverted. FWIW, wasn't just for that - figured it would avoid the Windows zlib issue as well.
Still need to figure out why openNURBS is getting system zlib on the mac - by the time it gets to src/other/openNURBS FindZLIB should be satisfied
I'm sure it wasn't, but it's just the unexpected stress of everything coming to a halt. Was dead in the water again and that went in a more unstable direction with maintenance issues embedded.
fair enough. I'll see if I can coax a mac build out of github - that's failing when Ubuntu, BSD and Windows don't so I'm having a hard time predicting failures reliably.
Sorry if my frustration is coming through. I hear Lee from 15 years ago in my messages. Just needing things to be a bit more stable on trunk to make progress debugging.
No problem - I'm getting a bit greedy trying to push through things that I've wanted to do for a long time, and now isn't the time for it.
BSD was also broken with the pragmas, fwiw.
right, that one's on me - the class-memaccess warning must be unique to GCC right now.
(probably a real issue though - that may bite us someday with the openNURBS code...)
there were 3-4 pragmas it was complaining about, different files, that was just the first
Yeah, clang and GCC define different sets. Right now the src/other builds deliberately don't enable any of the warnings so we don't see any of that.
and I get why, but pragmas should be last resort imho, when there is not a build system solution because that is intrinsically unstable
a -w flag forced on src/libbrep/openNURBS would have been better I think
or maybe -Wno-error
Probably what would have had to happen in that case is including src/libbrep/openNURBS with -isystem
Anyway, moot
pragmas went into our headers too, which means the instability infection spread
Well, the revert should be clean - two commits only, not mixed with anything
I'm less sure what to do about the system zlib creeping in - can you send me your CMakeCache.txt file?
sure
any idea what this is about? I don't get it all the time and it doesn't halt cmake despite the message:
CMake Error: The source directory "/Users/morrison/brlcad.trunk/.build/remove" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
CMake Error: The source directory "/Users/morrison/brlcad.trunk/.build/remove" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
CMake Error: The source directory "/Users/morrison/brlcad.trunk/.build/remove" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
...
Not sure - I've been seeing that too, but haven't run it down yet.
Happens repeatedly intermixed throughout the cmake output, so it's undoubtedly in some macro/function
that message isn't in our code (at least I didnt' find it)
not sure what to make of 'remove' either as that's not a file in our repo. we do test for the remove function, but that seems entirely unrelated to the messages and this message appearing just in the past few days.
I think I noticed it when I started using a newer CMake, but I'll have to confirm that. It didn't seem to break anything, and I'm using a way newer CMake than most distros, so I wasn't sure if it needed to be run down for general use yet... Since you're seeing it too, that decides it
I'm on a newer cmake too 3.16
also have 3.12 to test with when I'm trying to make sure things work back a ways
well hot diggity, looks like github workflows will let me kick off a Mac build
And now (of course) it doesn't want to give me the remove error. Did it happen on a clean CMake run, or is it when reconfiguring with an existing build dir?
growl... the non-enable-all github osx build didn't help any - it built
enable_all might be more promising, but also quite slow...
arrrrgh. that built as well.
@Sean I've got Xcode_11.4.1.app with AppleClang 11.0.3.11030032 on the runner - is that radically different from yours?
may be a different tool chain or different compiler version or different cmake. i'll see if I can repeat the error with everything reverted.
appreciate you checking. maybe something in my newer cmake's find zlib or something with newer clang (I have multiple compilers installed, xcode's isn't sufficient for some things, e.g., fuzzing)
The log and cache files may help, once you get them. If we absolutely have to we can revert back to pre 8.6.
Yeah, struck out - got a successful make check run on github's osx runner. Only build failure I've got right now is on OpenBSD, and that's something totally different.
OK, pretty sure I found the source of the spurious .build/remove messages (cmake --trace to the rescue.)
@Sean There's a slight chance that the "build/remove" messages error could have had an impact on what was being found on the Mac - if configurations were switched between system and local, that was the piece of logic that was supposed to be clearing out stale files. The files didn't end up removed. My initial guess would have been that that failure was more likely to result in a local copy being found instead of a system copy rather than the other way around, but it certainly could have caused something unexpected to happen.
okay, I've got several builds going at the moment so I'll let you know how things are looking
I think that's about as good as it's going to get for OpenBSD right now, at least until they add a way to get the executable pathname.
how's that? none of the implemented methods work?
Apparently not. Poked around online a bit and it seems like that's just a feature that OpenBSD either deliberately doesn't support or hasn't bothered to support.
but one of the methods is a gcc method
or are they llvm only?
Primary compilation is clang/llvm now - don't know if gcc is still in there or not, but my understanding is that it's on its way out if it is
HAVE_PROGRAM_INVOCATION_NAME is false?
Looking at the whoami.c code, I think they need some runtime support from the kernel to be able to know their location
er, whereami.c rather
https://stackoverflow.com/questions/31494901/how-to-get-the-executable-path-on-openbsd
This seems to be the email referenced in the second comment: https://marc.info/?l=openbsd-misc&m=144987773230417&w=2
I think streams are getting crossed there. Theo is talking about the kernel providing some intrinsic storage/lookup capability. Similarly one of the responses is about whether it's immediately available, but even that is half wrong.
you don't need either of those.
in fact the second SO response is closer to being on track
guarantee there's a way to figure it out. worst case, we can repeat what the shell does to invoke the binary (which I originally had in there but found it to be unnecessary complexity, even as a fallback, so I stripped it out).
still, I'm surprised llvm doesn't supply program_invocation_name -- can you check the cmakeerror?
ld: error: undefined symbol: program_invocation_name
from CheckFunctionExists test
yeah, darn. that's the easiest. clang totally could have stashed it during _start
Does getprogname do anything useful? https://man.openbsd.org/getprogname.3
well yeah, but it should already be testing for that and using it
grep GETPROGNAME in the cache
Doesn't look like it's realpathed anyway...
bu_getprogname() uses it
neither is program_invocation_name, they are just faster alternatives to stashing argv0
<not> - what we need is the all-up full path
<nod> rather
it's undesirable to have to call setprogname(argv[0]) in every application, so half the code in question is simply trying to figure out the name of the running app.
the other half figures out the path to that app
if you have the name, you can always find the path brute force in all but theoretical edge cases that we don't care about.
we've got getprogname detected - it's the whereami.c code that doesn't know how to resolve the full path on OpenBSD
whereami's that external code you added for bu_dir right?
right
you recall what exactly in bu_dir is using it? I just see the include.
wai_getExecutablePath
ah, I found it
did whereis.c compile?
Originally, it bailed.
I added a "always return -1" fallback
not whereami.c .. the other, whereis.c
ah - let me check
yes, compiled - makes a .o file
so then we have the means
we just have to merge or replace whereami.c with whereis.c
they do the same thing, there were just platform robustness differences
I don't recall the exact motivation for why you pulled in whereami.c but I think you had some concern with whereis.c for some reason. You remember what the issue was?
it wasn't robust to arbitrary maneuvering on the file system, IIRC
Illustrated simply:
mged> bu_brlcad_root bin
/home/user/brlcad/build/bin
mged> cd ../../
mged> bu_brlcad_root bin
mged> bu_dir bin
/home/user/brlcad/build/bin
(that's on Linux)
that's not bu_whereis() though ... what's the bu_whereis() call -- I'm not sure how whereis would even be put to work there because bin is a directory, not a command
maybe bu_brlcad_root had/has misuse?
No, bu_brlcad_root doesn't use whereis. The above is just illustrating why I put whereami.c in.
heh, yeah I seemed to remember it being some bug and instead of debugging, you pulled in someone else's code. :)
and now we have code debt from both :D
and a working feature :P
not on openbsd apparently
I don't think it ever worked there with any of the methods except probably your shell mimicking code.
bu_dir didn't, you're right -- that was all stubbed empty
bu_brlcad_root is still just a straight up bug -- it shouldn't be invariant to cwd at all
s/invariant/variant/
the method that whereami code is using is the method to use on windows, and I think it's using the same as other code on linux
In fairness - the above mged log I showed used an mged invoked from the build/lib dir as follows:
../bin/mged -c -a nu ../share/db/moss.g
I think the only immediate solution is extending bu_dir with different methods
or, hm
So I should more precisely have said it isn't just cwd but cwd+invocation argv0
getExecutablePath() intrinsically relies on a global program name, we could just feed it the name we know
you mean the relative path from argv[0]?
bu_getprogname()
which worst case will just be argv[0] passed from the app
it can call bu_whereis to get the resolved path as another method either in getExecutablePath() or in bu_dir() if we want to keep our code separate
what will happen once the cwd changes though? Do we replace the recorded argv[0] with it's resolution immediately upon program launch before the app has a chance to change it's working directory?
I think you're getting issues mixed up -- that's some bu_brlcad_root issue
getprogname doesn't stop working after cwd
Maybe - to me the "user level" issue is I want bu_dir to always resolve correctly.
bu_which logic has nothing that introspects cwd
heh, um, yes... and me too...
Are you saying that the bu_brlcad_root issue is independent of the program name?
i'm saying nothing about the bu_brlcad_root issue at the moment -- there is some issue there to be looked into at some point, but I didn't refer to it
OK
the issue at hand I thought was getting bu_dir to return the path that getExecutablePath() is/should be providing because there's some compilation issue on obsd
we have everything needed to get the current executable path on obsd
OK... that's the whereis.c code?
that can be got using bu_getprogname and bu_which, or we could tweak the logic in whereami.c -- there is a bsd section in there, so I'm a little surprised i it doesn't work
er, yes s/which/whereis/
I tried - OpenBSD doesn't define KERN_PROC_PATHNAME
sure, but what about KERN_PROC_ARGS and KERN_PROC_ARGV ?
assuming it supplies access to argv globally, then we either call bu_whereis/bu_which or replicate that logic in whereami.c for obsd
I believe it has those...
that's what I was originally saying -- the choice is between modifying whereami.c to extend it with openbsd support or using existing support at the higher call in bu_dir to call getprogname+whereis
Thomas' response at https://stackoverflow.com/questions/31494901/how-to-get-the-executable-path-on-openbsd looks like a near drop-in for whereami.c if you want to try it on openbsd; that might make a good upstream patch anyways
whereami.c supports a nice variety of platforms already, so second choice would be to extend it with a LIBBU fallback method using getprogname+whereis
I think either of those are superior to doing anything in bu_dir
Agreed - I guess I'm going to have to try it before it will "click"
I suppose this is something relatively safe for me to chew on...
for what it's worth, that stack overflow response is basically doing a argv0 lookup followed by a PATH search, which is essentially the exact same code as calling bu_getprogname+bu_whereis
bu_whereis searches the system path a lot harder, so I guess his method is actually identical to bu_which()
searching env(PATH)
Am I wrong, or will bu_whereis fail in the case with a relative path to an executable that isn't in any normal binary launching path?
I.e. not in PATH, not in any standard system bin dir
it should fail, that's what 'whereis' does. 'which' will return the relative path. calling code must call realpath on relative paths.
But realpath will only succeed if the current path is still the same as it was when the original invocation occurred?
bingo
and isn't that the problem?
(and yes, that's a problem)
my understanding was that that is the issue whereami.c was trying to resolve
hah, my sausage fingers are slow tonight .. fell down the stairs
yikes!
gonna be sore tomorrow
it is, it's the same reason bu_argv0_full_path() exists too
So... won't what we're proposing to implement for OpenBSD end up failing in those cases without something like KERN_PROC_PATHNAME to fall back on?
same reason it's deprecated, but almost impossible to remove. the only full-proof way for it to work is to either do a realpath() lookup right away (which should be happening when bu_setprogname() is called) or otherwise stashing the cwd on init
if you just pull that persons code as-is, yes -- it will be suseptiple to path changes, but we did resolve the initial path issue in our code
bu_getiwd()
bu_setprogname is currently just copying argv0 - no realpath resolution at all that I can see...
that's likely the bit missing that makes all the realpath() callers in libbu fail after a chdir
will try that tomorrow - might be able to ditch the whereami.c code if that does the trick...
like I said, I wouldn't ditch it.
it's got good platform case variety and they're good, just not piecewise generalized
that's why I didn't complain about it being added, it's a good path forward
ah - so, bu_realpath the progname at set time, then use it as a fallback implementation if we don't have a platform answer?
s/as/to support a/
to be ditched would probably be path_normalize and all the bu_brlcad* routines and the bu_argv0_full_path logic (and callers)
Actually, I think I use path_normalize for some .g path operations, not just filesystem stuff
would need to check, but I think search calls it
yeah, libged/search.c
it's just redundant with something else in libbu iirc
O.o
I sure missed it if it is...
I could be wrong
it's just one func with no system calls too, so not a big deal even if that logic is somewhere, and like you said it might not be
If you're thinking of realpath_bsd.c, that's the actual filesystem resolver from OpenBSD - using that to avoid that crazy crasher case we hit trying to use the Linux realpath
OK, that's enough finger torture - get some rest!
yeah you're right, file_path_normalize() is cool. I think I was remembering bu_file_realpath() and GetFullPathName()'s behavior, but it's not the same as realpath().
@Sean To be sure I understand - we shouldn't be using bio.h in our public header files, correct?
@Sean you probably told me, but I'm afraid I don't recall - what's the reason we don't call bu_setprogname from libbu_init along with bu_getiwd?
starseeker said:
Sean To be sure I understand - we shouldn't be using bio.h in our public header files, correct?
No, it's fine for use in our headers. I mean, usage should be minimized, but there's not a problem. Why you ask?
The repository.sh script has a test that says "make sure nobody includes private headers like bio.h in a public header"
starseeker said:
Sean you probably told me, but I'm afraid I don't recall - what's the reason we don't call bu_setprogname from libbu_init along with bu_getiwd?
Because we don't have the argument to pass that it needs. bu_init() is called before main(), before argv0 exists.
Ah.
I implemented a C++ version that looked for bio.h in include/ and there were a fair number. I scrubbed them back out, but I was wondering if the comment was wrong or the test was broken...
"why not both"?
I'm looking
yeah, that test was broken. I removed it. the single quotes stop expansion of ${i} so grep will never report a match
but also, the restriction that bio.h and friends not be included was removed a few years ago, so it's no longer needed.
Well crud
good catch
I'll put it back for the windows.h includes then - those are the messy ones.
note the other bio check is still good, make sure no redundant bio.h and stdio.h inclusions together
I think we've even used the wrapper headers in src/other .. though maybe not any longer
yeah looks like no longer
you mean put the bio.h public header inclusion test back? no entiendo
brep/defines.h bu/tcl.h and fb/fb_wgl.h were using bio.h because they need window.h - I switched it to just the guarded windows.h include, but it's a lot more verbose.
So I'll put bio.h back in those three headers
Ah, I hadn't noticed you'd removed them yet
I compile tested the snot out of it this time, so hopefully it shouldn't hurt anything...
you're on point, duplication and verbosity trump as being problematic
the only reason they had that rule at the beginning is because they weren't originally safe for public header inclusion. and once we had something they provided needed in a couple places, the refactoring and testing cleanup happened to make sure they were safe.
Ah, yeah I remember that now.
OK. I think I've got a working whereami.c fallback for OpenBSD now - once that's in I'll go test VS2019 and do a final across the board check. If that's all good it will mean trunk is building on everything I have access to.
the main issue is their usage of cmake-tested symbols to define/set/unset logic (for example HAVE_SYS_TIME_H is defined, so we know we can #include <sys/time.h>). we suddenly have a header that is dependent on compilation logic that may not be true on an installed system where the headers get used.
the solution was to simply minimize cmake-test-symbol usage as much possible (which is a good practice for public headers regardless) and to at last document their usage in each header.
starseeker said:
OK. I think I've got a working whereami.c fallback for OpenBSD now - once that's in I'll go test VS2019 and do a final across the board check. If that's all good it will mean trunk is building on everything I have access to.
cool, did you end up using bu_getiwd() or doing something else?
Yep, bu_getiwd
I have a distcheck-full running on RELEASE going, so should have a status check there once that finishes
did you end up getting any CMakeCache.txt files or logs with failures?
I need to test RELEASE with a large .g file.
that's what's running. I'm going methodically through settings to help make sure a failure doesn't come from something from a previous test, or different compiler, etc
OK, sounds good. Once you've got that take a shot at trunk, if you can - I'm trying to get trunk into a solid state, and then I'll switch off into branches for a while to keep it stable.
Yep, RELEASE on Windows was able to open a big .g test - phew
alright, let's see what VS2019 has to say...
shouldn't 75729 be reverted? looks like all of the bio.h's pushed down into headers were because the bu headers didn' declare something they needed (FILE), which means the public headers are incomplete, possibly broken now
I don't think we have a test in there any more that tests isolated header compilation, but that'd probably be good to restore
hm, looks like we need that -- doing a quick test, looks like we have a dozen or so headers on trunk that have type declaration errors, missing headers, and most look like relatively recent changes. would be neat to get cmake doing something like this:
find include -name \*.h -exec g++ -Iinclude -I.build/include -Isrc/other/openNURBS -fsyntax-only -Wall -Wextra -Wno-deprecated {} \;
Can revert it, but I build tested on multiple platforms... you're thinking it's still broken somewhere?
I added stdio.h in lieu of bio.h when it just needed FILE, for example
Hmm. That might be possible, actually...
I'll try it in the branch
Erm. How do I configure test for a compiler flag that by design produces no .o file?
@Sean A rough attempt at such a header testing target is in trunk:
cmake .. -DBRLCAD_HDR_CHECK=ON && make check-headers -k
I don't see any at first glance that look like they're due to r75729...
Do you want me to start fixing them? The two I'm not sure how to handle are vector_x86.h and vector_fpu.h
starseeker said:
Can revert it, but I build tested on multiple platforms... you're thinking it's still broken somewhere?
Not a big deal yet, but there is an issue there.
The headers are likely broken by the nature of using a symbol that hasn't been declared yet, but compilation won't stop because you added the header in all the places it was used (thus masking the fact the header didn't declare it like it should have).
Later someone goes to use the header and gets the error, and they either perpetuate by adding stdio/bio to their source or the fix the header leaving a lot of unnecessary redundant includes in all the callers.
Wouldn't the g++ check you ran see the undeclared symbols? Maybe I'm misunderstanding what' that's telling us...
starseeker said:
I added stdio.h in lieu of bio.h when it just needed FILE, for example
if you actually did catch them all, then they're not broken, but then it means all the bio.h in the usage locations are redundant/unnecessary right?
Some yes, some no - open/close calls need unistd.h
which is to say those headers aren't right
sorry, I was talking about the source files
you're talking about bio.h usage in the headers?
if the symbols were only in the source files then it's certainly fine
there are only 3 bio.h usages in the headers, and they should all be for windows.h (things like HANDLE, iirc, or getting ahead of opennurbs.h in one case.)
I mean symbols in the public headers that were satisfied by bio.h (regardless of the comment about it being for whatever reason)
right, I tried to get all of those when I made the switch.
but did you get them by making sure they included what they declared, or did you get them by putting a header in a source file
including what they declared
then there's no problem
a little surprising, but not a problem if true ;)
if that means what I think it means - i.e. bu_function(FILE *stream) needs stdio in the header because the function signature exposes the FILE type
yep, that's exactly it
you tested windows? iirc, there's a handful of things that come from stdio.h on linux that come from windows.h on windows.
I've built on windows - haven't tried your g++ style test there (can it be done with msvc?)
I don't think we have any in public headers, but wouldn't bet on it
I think I listed the platforms I tested in the commit message... let me see... ah:
Passes make check on MSVC 2017, Linux GCC 9.2, Linux clang 9, OSX (github runner), and OpenBSD (clang 8)
starseeker said:
Do you want me to start fixing them? The two I'm not sure how to handle are vector_x86.h and vector_fpu.h
If you want, but I wouldn't mix it with that branch work. Adding missing headers should be low risk (because in theory all the callers will simply have redundant includes once they get fixed).
would be fun to test the new gcc that came out today
it's been a year I think?
The branch is pretty well closed out at this point - adding the header check in basic form was simple and non-invasive, and if you decide you want it in that form we can juice it up later in trunk.
Last thing I should need to do in trunk (unless your mac builds identify more problems) is clean up the bot command refactor I started.
don't understand -- you added the header check to trunk
just now, yeah - I did the initial experimentation in the branch
Once it proved simple to add, I went ahead and merged it to trunk
ah, it's no risk so shrug
the header check is cool, thanks!
It's a little more work to tie in compiler flag detection and hook it into make regress (which presumably is where you eventually want it to go) but for now now it's a quick manual check if the hard-coded build flags work for the current platform.
is there a way to get rid of all the line added to every header dir though? it's ancillary to that file's purpose and begs questions to newcomers
make regress only after all headers are fixed. they were all last clean and verified maybe 5 years ago
yes, but there are trade-offs - using the lists this way we can be sure we aren't testing stray files stashed in the directories (CMAKEFILES ignored files, for example.) I could do one toplevel build that reaches down into the lower directories, but then any renaming done a lower levels will have non-obvious breakage consequences.
this is one of the rare instances where recursive globbing would have been kosher :)
how do we exclude files though?
besides CMakeLists.txt I mean...
For example, bu has column.h and tbl.h stashed in it - a recursive glob will grab those, which we don't want
I suppose I can try it and see if any of those cause breakage - maybe it won't matter
why not?
I would expect to glob on *.h .. and if there's a header in our include folder, good to know if it's busted.
if they work, then no problem, great to know if we ever do add a broken one
if they don't work, they they specifically could be excluded ... the downside is completely localized to the test itself
A couple are "headers in progress" I left there years ago. I suppose if they're a problem they've just now qualified for removal as a maintenance burden ;-)
it's also super easy to fix headers in general if they are broken. it's just they propagate a slew of unnecessary includes and #ifdef logic to callers when they're broken
bsd used to be notorious for that
OK, r75745 localizes it
r75746 gets most of them - we're down to the weirdly recursive trio of bn/dvec.h bn/vector_fpu.h and bn/vector_x86.h and RtServerImpl.h wanting JNIEXPORT (which our headers don't define anywhere.)
my vote would be to yank RtServerImpl.h, unless someone is still using it...
I've tangled with the bn/vec* files once or twice in the past and lost - IIRC they have something to do with the NURBS code
@starseeker back to clean build state and confirmed. distcheck-full failures are consistent and repeatable. iges error is real, i'm back debugging it. Looks like it's crashing during a boolean evaluation export.
@Sean excellent
There we go - make check-headers is now clean!
once project I did many many mmmmany years ago (before I met yall), I had a 'no recursive include' policy and occasionally tried commenting out includes, the compile time was amazeballs
Heh - I'd hate to even contemplate what that would take for BRL-CAD
Heh - seven commits away from 75757
@starseeker haven't yet isolated if it's bsd-specific, probably not, but it is only exhibiting on freebsd (thus far) and only in optimized builds. it's not limited to g-iges, just happens to be the tool more reliably triggering the race. error appears to be not new, so it doesn't have to be a release blocker.
freebsd, not OSX?
@Daniel Rossberg Is RELEASE branch still good for you?
I don't think I've changed anything since your last test, but it's now last call ;-)
No, nothing has changed. It's still revision 75755.
I.e., it's still good.
OK. Changelog updates are going in, release date gets updated (again), and then we're tagging!
Whew
Lot of fun for a "patch" release...
Thank you, OpenBSD. Getting a core file from MGED (but no other failure report in make regress, so don't know what's producing it) and the old gdb can't read it. Compiled the new gdb from ports, ran egdb... and gdb itself crashed trying to read the core file. Now I have a gdb core dump.
That's a big help.
starseeker said:
freebsd, not OSX?
ugh, I literally lost track of which platform I was debugging on (I have like 20 terminals open atm) and somehow got into an "i'm on freebsd" because the debugging looked exactly like the issue we've been having on freebsd.
I was wrong, very wrong. the corruption is on mac, release builds, not limited to g-iges.
talk about a massive brain fart.
anyways, I have a reliable stack trace and have a lead on the cause. going to keep at it this weekend to see if I can suss it out.
I think you might need to change your brain drawers, boy :D
I do. Got too much going on.
well, poop, I grabbed an rpi3b+ to pop up a new backup server and they only have usb2, I think writing to the disk may be slower than my network O.o :/
@starseeker Just did an update and now seeing this error again:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -std=c11 -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700 -pipe -fno-strict-aliasing -fn\
o-common -fexceptions -m64 -ggdb -Qunused-arguments -fstack-protector-all -fno-omit-frame-pointer -pedantic -pedantic-errors -Wall -Wextra -Wundef -Wfloat-equal -Wshadow -Wbad-\
function-cast -Wc++-compat -Winline -Wno-long-long -Wno-variadic-macros -Wdocumentation -Wno-c11-extensions -Werror -isysroot /Applications/Xcode.app/Contents/Developer/Platfor\
ms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -mmacosx-version-min=10.14 -dynamiclib -Wl,-headerpad_max_install_names -m64 -ggdb -compatibility_version 20.0.0 -current_ver\
sion 20.0.1 -o ../../lib/libicv.20.0.1.dylib -install_name @rpath/libicv.20.dylib CMakeFiles/libicv-obj.dir/fileformat.c.o CMakeFiles/libicv-obj.dir/rot.c.o CMakeFiles/libicv-o\
bj.dir/color_space.c.o CMakeFiles/libicv-obj.dir/crop.c.o CMakeFiles/libicv-obj.dir/filter.c.o CMakeFiles/libicv-obj.dir/encoding.c.o CMakeFiles/libicv-obj.dir/operations.c.o C\
MakeFiles/libicv-obj.dir/stat.c.o CMakeFiles/libicv-obj.dir/size.c.o CMakeFiles/libicv-obj.dir/pix.c.o CMakeFiles/libicv-obj.dir/png.c.o CMakeFiles/libicv-obj.dir/ppm.c.o CMake\
Files/libicv-obj.dir/bw.c.o CMakeFiles/libicv-obj.dir/dpix.c.o -Wl,-rpath,/Users/morrison/brlcad.trunk/.build/lib ../../lib/libbn.20.0.1.dylib /usr/local/lib/libpng.dylib ../.\
./lib/libnetpbm.dylib ../../lib/libbu.20.0.1.dylib -framework Foundation -ldl -framework System /usr/lib/libc.dylib -lm
Undefined symbols for architecture x86_64:
"_brl_png_create_info_struct", referenced from:
_png_write in png.c.o
_png_read in png.c.o
"_brl_png_create_read_struct", referenced from:
_png_read in png.c.o
"_brl_png_create_write_struct", referenced from:
_png_write in png.c.o
"_brl_png_destroy_read_struct", referenced from:
_png_write in png.c.o
"_brl_png_destroy_write_struct", referenced from:
_png_write in png.c.o
"_brl_png_get_bKGD", referenced from:
_png_read in png.c.o
"_brl_png_get_bit_depth", referenced from:
_png_read in png.c.o
"_brl_png_get_color_type", referenced from:
_png_read in png.c.o
"_brl_png_get_gAMA", referenced from:
_png_read in png.c.o
"_brl_png_get_image_height", referenced from:
_png_read in png.c.o
"_brl_png_get_image_width", referenced from:
_png_read in png.c.o
"_brl_png_init_io", referenced from:
_png_write in png.c.o
_png_read in png.c.o
"_brl_png_read_image", referenced from:
_png_read in png.c.o
"_brl_png_read_info", referenced from:
_png_read in png.c.o
"_brl_png_read_update_info", referenced from:
_png_read in png.c.o
"_brl_png_set_IHDR", referenced from:
_png_write in png.c.o
"_brl_png_set_background", referenced from:
_png_read in png.c.o
"_brl_png_set_expand", referenced from:
_png_read in png.c.o
"_brl_png_set_gAMA", referenced from:
_png_read in png.c.o
"_brl_png_set_gray_to_rgb", referenced from:
_png_read in png.c.o
"_brl_png_set_longjmp_fn", referenced from:
_png_write in png.c.o
"_brl_png_set_sig_bytes", referenced from:
_png_read in png.c.o
"_brl_png_set_strip_16", referenced from:
_png_read in png.c.o
"_brl_png_sig_cmp", referenced from:
_png_read in png.c.o
"_brl_png_write_end", referenced from:
_png_write in png.c.o
"_brl_png_write_info", referenced from:
_png_write in png.c.o
"_brl_png_write_row", referenced from:
_png_write in png.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [lib/libicv.20.0.1.dylib] Error 1
Any ideas? It seems like maybe something stateful as it didn't error on a fresh build, but did after a simple re-run that had nothing to do with libicv.
So it got /usr/local/lib/libpng.dylib somehow.... weird.
@Sean CMakeCache.txt file available?
Somehow PNG_LIBRARIES must be getting overridden - that's the variable libicv is referencing. When you say "rerun" is that a CMake re-run or a re-build?
If it's a CMake re-run you'll probably see something in the CMake output about "finding" PNG - that would be when things are getting messed up. The question would be why it thinks it needs to look for PNG in the first place. I'll see if I can try a double-configure on github...
Sorry, I already wiped some files out and have it working again
it was an svn up followed by make, which re-ran cmake, which then failed. did make clean, wiped out cmake cache, wiped out cmakefiles dir, and still failed. deleted build/cmake* and build/include, re-ran cmake, and seems to be back working. is there perhaps something getting written into the brlcad_config that gets screwed up on a second pass?
this is a simple default "cmake .." so it's the auto-detect default and there is a proper libpng for it to find
this is the fresh start cmakecache results that succeeds:
agua:.build morrison$ grep -r -i png CMakeCache.txt
CMakeCache.txt://define BRLCAD_PNG
CMakeCache.txt:BRLCAD_PNG:STRING=SYSTEM (AUTO)
CMakeCache.txt://PNG_INCLUDE_DIR
CMakeCache.txt:PNG_INCLUDE_DIR:STRING=PNG-NOTFOUND
CMakeCache.txt://PNG include directory
CMakeCache.txt:PNG_INCLUDE_DIRS:STRING=/usr/local/include
CMakeCache.txt://PNG_LIBRARIES
CMakeCache.txt:PNG_LIBRARIES:STRING=/usr/local/lib/libpng.dylib
CMakeCache.txt://PNG_LIBRARY
CMakeCache.txt:PNG_LIBRARY:STRING=PNG-NOTFOUND
CMakeCache.txt:PNG_LIBRARY_DEBUG:FILEPATH=PNG_LIBRARY_DEBUG-NOTFOUND
CMakeCache.txt:PNG_LIBRARY_RELEASE:FILEPATH=/usr/local/lib/libpng.dylib
CMakeCache.txt://Set PNG_MAN_DIR to the global MAN_DIR
CMakeCache.txt:PNG_MAN_DIR:STRING=share/man
CMakeCache.txt://Option to disable Console IO in PNG
CMakeCache.txt:PNG_NO_CONSOLE_IO:BOOL=OFF
CMakeCache.txt://Option to disable STDIO in PNG
CMakeCache.txt:PNG_NO_STDIO:BOOL=OFF
CMakeCache.txt:PNG_PNG_INCLUDE_DIR:PATH=/usr/local/include
CMakeCache.txt://BRL-CAD prefix for libpng
CMakeCache.txt:PNG_PREFIX:STRING=brl_
CMakeCache.txt://Disable building png test executables
CMakeCache.txt:PNG_TESTS:STRING=0
CMakeCache.txt:libdm_LIB_DEPENDS:STATIC=general;librt;general;libfb;general;/usr/X11/lib/libSM.dylib;general;/usr/X11/lib/libICE.dylib;general;/usr/X11/lib/libX11.dylib;general;/usr/X11/lib/libXext.dylib;general;/usr/X11/lib/libXi.dylib;general;/usr/lib/libtcl.dylib;general;/usr/lib/libtk.dylib;general;/usr/local/lib/libpng.dylib;
CMakeCache.txt:libfb_LIB_DEPENDS:STATIC=general;libbu;general;libpkg;general;/usr/local/lib/libpng.dylib;general;/usr/lib/libtcl.dylib;general;/usr/X11/lib/libSM.dylib;general;/usr/X11/lib/libICE.dylib;general;/usr/X11/lib/libX11.dylib;general;/usr/X11/lib/libXext.dylib;general;/usr/X11/lib/libXi.dylib;general;/usr/X11/lib/libGLU.dylib;general;/usr/X11/lib/libGL.dylib;general;/usr/X11/lib/libSM.dylib;general;/usr/X11/lib/libICE.dylib;general;/usr/X11/lib/libX11.dylib;general;/usr/X11/lib/libXext.dylib;general;/usr/X11/lib/libXi.dylib;general;/usr/lib/libtk.dylib;
CMakeCache.txt:libged_LIB_DEPENDS:STATIC=general;libwdb;general;liboptical;general;librt;general;libbrep;general;libnmg;general;libfb;general;libbg;general;libbn;general;libbu;general;libicv;general;libanalyze;general;/usr/local/lib/libpng.dylib;general;/usr/lib/libc.dylib;general;m;
CMakeCache.txt:libicv_LIB_DEPENDS:STATIC=general;libbu;general;libbn;general;/usr/local/lib/libpng.dylib;general;netpbm;
CMakeCache.txt://STRINGS property for variable: BRLCAD_PNG
CMakeCache.txt:BRLCAD_PNG-STRINGS:INTERNAL=AUTO;BUNDLED;SYSTEM
CMakeCache.txt://Details about finding PNG
CMakeCache.txt:FIND_PACKAGE_MESSAGE_DETAILS_PNG:INTERNAL=[/usr/local/lib/libpng.dylib][/usr/local/include][v1.6.37()]
CMakeCache.txt://ADVANCED property for variable: PNG_INCLUDE_DIR
CMakeCache.txt:PNG_INCLUDE_DIR-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_INCLUDE_DIRS
CMakeCache.txt:PNG_INCLUDE_DIRS-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_LIBRARIES
CMakeCache.txt:PNG_LIBRARIES-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_LIBRARY
CMakeCache.txt:PNG_LIBRARY-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_LIBRARY_DEBUG
CMakeCache.txt:PNG_LIBRARY_DEBUG-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_LIBRARY_RELEASE
CMakeCache.txt:PNG_LIBRARY_RELEASE-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_MAN_DIR
CMakeCache.txt:PNG_MAN_DIR-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_NO_CONSOLE_IO
CMakeCache.txt:PNG_NO_CONSOLE_IO-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_NO_STDIO
CMakeCache.txt:PNG_NO_STDIO-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_PNG_INCLUDE_DIR
CMakeCache.txt:PNG_PNG_INCLUDE_DIR-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_PREFIX
CMakeCache.txt:PNG_PREFIX-ADVANCED:INTERNAL=1
CMakeCache.txt://ADVANCED property for variable: PNG_TESTS
CMakeCache.txt:PNG_TESTS-ADVANCED:INTERNAL=1
OK, so the error was the prefix definition being set when it should not have been set?
presumably
so second pass, somehow the header has _brl_ getting defined, so libicv can't resolve when it tries to link
I'm a little surprised that succeeded - PNG_PREFIX is set.
testing a simple touch CMakelists.txt src/libicv/* + make, cmake rerunning now to see if it triggers
that worked, so it's something a bit more complex than just re-running cmake
A stale pnglibconf.h is one possibility
$ nm src/libicv/CMakeFiles/libicv-obj.dir/png.c.o
does not have _brl_
If it's finding a local pnglibconf.h but a system libpng, that'll most likely blow up
Ah, that could very well be actually. pnglibconf.h, IIRC, is generated as a build output. Just switching the third party setting probably isn't clearing it out. Let me see where it ends up in the output...
Looking at the cache file, an initial configure leaves PNG_INCLUDE_DIR set to NOTFOUND. The ENABLE_ALL is actually setting that. Let me see if libicv is looking at the wrong PNG include var...
no...
@Sean with that result, something somehow is getting an ENABLE_ALL header with a system libpng, since clearly the PNG_PREFIX CMake variable by itself didn't trigger the issue (and I see why it doesn't, looking at libpng)
Maybe I should go back to getting that third party build system rework done, even if it does mean a rewrite of the top level flow... I've lost count of the number of times that ThirdParty.cmake state management has bitten me subtly over the years...
(no, I'm not going to, don't worry...)
Don't think it needs to a nuclear option. It's almost certainly a single var mispelled, misplaced, ifdef/ifndef switcheroo or something else really simple.
Updated, and now seeing after make test:
881 - regress-flawfinder (Failed)
882 - regress-mged (Failed)
901 - regress-gcv-dem (Failed)
905 - regress-pkg (Timeout)
@Sean make test is the one that won't work reliably - it runs everything, even the non-working tests. You probably wanted make check?
I know that, but I thought the first three were working
I know the pkg one is nwe
Flawfinder I don't think has ever worked fully (or at least, not in a long time.) If that mged test is the old one I was working on it hasn't worked in a long time. The gcv-dem is too sensitive, IIRC - it's md5summing the output dsp
okay, hrm. thanks! will ignore.
Is Archer working in the latest revision?
also MGED
It should be...
What are the errors?
Which Visual Studio are you using? The testing I've done was with 2017, so 2019 might have some issues...
Archer
C:\summer\brlcad-code\build\Debug\bin\archer.exe
ERROR: Requisite display manager is not available.
BRL-CAD may need to be recompiled with support for:
Run 'fbhelp' for a list of available display managers.
Unexpected error encountered while running Archer.
Aborting.
Process finished with exit code 1
MGED
This is what MGED looks like. image.png
Visual Studio 16 2019, no special build options used
OK. Something about the new display manager setup must be unhappy. We've shifted to plugins, and that failure mode indicates that it can't locate the plugins.
Let me see what 2017 does, then I'll see if I can try 2019
I'll check that
Do you have a "libexec" directory in your build output (in either Release or Debug, depending on which configuration you used to build...)
I didn't build ALL_BUILD, but archer
Ah. OK, you'll need to build dm-wgl as well
That's a point, archer and mged should explicitly depend on those targets now...
It works @starseeker . Thanks
I thought building archer builds all dependancies
Is it not what used to happen? (or did I build ALL_BUILD last time, because it used to work)
It should. I didn't think through all the implications of moving to plugin based architectures - there's no longer a link time dependency between the backends and applications, so the build system won't automatically make the connection
The introduction of plugins into trunk is very new, so you might not have seen it previously, but ALL_BUILD would also have avoided it (that's why I didn't see it in earlier testing)
Easy to fix - just need to specify the required backends as build dependencies.
Testing now, hang on...
cool
r76220 should handle it for the most obvious cases - there are more that will need similar logic to build in isolation, but those will catch the most common cases.
-- Could NOT find TCL (missing: TCL_STUB_LIBRARY TCL_INCLUDE_PATH)
-- Could NOT find TCLTK (missing: TCL_STUB_LIBRARY TCL_INCLUDE_PATH TK_STUB_LIBRARY TK_INCLUDE_PATH)
-- Could NOT find TK (missing: TK_STUB_LIBRARY TK_INCLUDE_PATH)
-- Could NOT find TCLTK (missing: TK_STUB_LIBRARY TK_INCLUDE_PATH)
-- Could NOT find TK (missing: TK_STUB_LIBRARY TK_INCLUDE_PATH)
Any idea why this happens eve after I installed required libraries under https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/doc/README.Linux
(apt-get install xserver-xorg-dev libx11-dev libxi-dev libxext-dev libfontconfig-dev libglu1-mesa-dev)
See everything is installed.
root@sadeep-VirtualBox:/media/sf_summer/brlcad-code/build_linux# apt-get install xserver-xorg-dev libx11-dev libxi-dev libxext-dev libfontconfig-dev libglu1-mesa-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libfontconfig1-dev' instead of 'libfontconfig-dev'
libfontconfig1-dev is already the newest version (2.13.1-2ubuntu3).
libglu1-mesa-dev is already the newest version (9.0.1-1build1).
libx11-dev is already the newest version (2:1.6.9-2ubuntu1).
libxext-dev is already the newest version (2:1.3.4-0ubuntu1).
libxi-dev is already the newest version (2:1.7.10-0ubuntu1).
xserver-xorg-dev is already the newest version (2:1.20.8-2ubuntu2.1).
0 upgraded, 0 newly installed, 0 to remove and 218 not upgraded.
(Ubuntu)
Fixed this. Had to delete the cmake directory and reload after installing missing libraries. Simply reloading was not sufficient.
Yeah, it will usually default to cache'd results, so you have to nuke it all.
What is the background of revision 75933 "ON_DLL_EXPORTS/ON_DLL_IMPORTS is no longer specific enough for an MSVC only override."? It breaks my MS Visual Studio brlcad.dll build with static libs.
The import/export logic was generalized to be enabled both on MSVC and on GCC/clang - the latter has had such an ability for a little while, but we didn't use it. When enabling it (mostly so we could detect changes that would break the import/export bit on platforms other than Windows) we had to adjust the import/export logic to be more general. OpenNURBS was one of the libraries that had to be adjusted to accommodate it.
If I recall correctly, that change triggered MSVC only logic in opennurbs, which is what prompted the change - I think your adjustment should be fine.
Okay, good.
@starseeker got a new failure -- btclsh is crashing on launch on Mac inside dm_init()
frame #4: 0x000000010077cd2e libdm.20.dylib`libdm_init() at dm_init.cpp:109:18
dm_get_name() returned NULL, code didn't handle it
er, not just btclsh. looks like mged is crashing too, anything doing dm init
trunk or RELEASE?
trunk
I haven't tested RELEASE in a long while.
OK. Let me see what happens here... I've been hitting RELEASE so something may have slipped on trunk...
That's a strange error... it indicates that the plugin loaded but didn't define a display manager with a name?
Linux doesn't reproduce it...
Looking at the code, dmp must be NULL so there are at least two issues in the code. First issue is introducing a bunch of getter functions in dm-generic.c that can return NULL in r76200 -- every place those are called must check for NULL or the functions should get changed so NULL is not returnable. Second issue is obviously whatever caused dmp to be NULL.
@Sean I just committed a check and err msg to dm_init.cpp - can you rebuild and run src/libdm/tests/dm_test to see what happens?
looks like that list is dm_interp, dm_get_fb, dm_get_dm_name, dm_get_dm_lname, dm_get_bg, dm_get_fg, dm_get_pathname, dm_get_name, dm_get_dname, dm_get_graphics_system, dm_get_tkname, dm_get_vp, dm_get_vparse, and dm_get_mvars
sure
hm, now cmake is failing
/me closes eyes in pain
It's a build from scratch, but I wouldn't expect that necessarily for the minor changes just made.. don't yet see why it's actually halting.
agua:.build morrison$ make
**********************************************************
*** Configuring BRL-CAD Release 7.31.0, Build 20200814 ***
**********************************************************
X11 detected and enabled
Defining HAVE_X11_XLIB_H
Defining HAVE_X11_EXTENSIONS_XINPUT_H
-- Could NOT find Appleseed (missing: Appleseed_INCLUDE_DIR Appleseed_LIBRARY)
^[[D^[[D
-------------------- BRL-CAD Release 7.31.0, Build 20200814 --------------------
Prefix: /usr/brlcad/dev-7.31.0
Binaries: /usr/brlcad/dev-7.31.0/bin
Libraries: /usr/brlcad/dev-7.31.0/lib
Manual pages: /usr/brlcad/dev-7.31.0/share/man
Data resources: /usr/brlcad/dev-7.31.0/share
CC = /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
CXX = /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
CFLAGS = -std=c11 -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700 -pipe
-fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions
-m64 -ggdb -Qunused-arguments -fstack-protector-all
-fno-omit-frame-pointer -pedantic -pedantic-errors -Wall -Wextra
-Wundef -Wfloat-equal -Wshadow -Wbad-function-cast -Wc++-compat
-Winline -Wno-long-long -Wno-variadic-macros -Wdocumentation
-Wno-c11-extensions -Werror
CXXFLAGS = -std=c++11 -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700 -pipe
-fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions
-ftemplate-depth-128 -m64 -ggdb -Qunused-arguments
-fstack-protector-all -fno-omit-frame-pointer -pedantic -Wall
-Wextra -Wundef -Wfloat-equal -Wshadow -Wbad-function-cast -Winline
-Wno-long-long -Wno-variadic-macros -Wdocumentation
-Wno-c11-extensions -Werror
LDFLAGS = -m64 -ggdb
Compile Tcl ........................: ON
Compile Tk .........................: ON
Compile Itcl/Itk ...................: ON
Compile Iwidgets ...................: ON
Compile Tkhtml .....................: ON
Compile Tktable ....................: ON
Compile libpng .....................: ON
Compile libregex ...................: ON
Compile zlib .......................: ON
Compile Utah Raster Toolkit ........: ON
Compile openNURBS ..................: ON
Compile STEPcode....................: ON
OpenGL support (optional) ..........: ON
X11 support (optional) .............: ON
Qt support (optional) ..............: OFF
Run-time debuggability (optional) ..: ON
Build 32/64-bit release ............: 64BIT (Auto)
Build optimized release ............: OFF
Build static libraries .............: ON
Build dynamic libraries ............: ON
Install example geometry models ....: ON
Generate extra docs ................: ON (html/man)
"/Users/morrison/brlcad.trunk/INSTALL" is out of date. An updated version has been generated at "/Users/morrison/brlcad.trunk/.build/INSTALL.new"
To clear this warning, replace "/Users/morrison/brlcad.trunk/INSTALL" with "/Users/morrison/brlcad.trunk/.build/INSTALL.new"
"/Users/morrison/brlcad.trunk/configure" is out of date. An updated version has been generated at "/Users/morrison/brlcad.trunk/.build/configure.new"
To clear this warning, replace "/Users/morrison/brlcad.trunk/configure" with "/Users/morrison/brlcad.trunk/.build/configure.new"
CMake Error at misc/CMake/BRLCAD_Util.cmake:90 (_message):
Configure haulted.
Call Stack (most recent call first):
CMakeLists.txt:3728 (message)
-- Configuring incomplete, errors occurred!
See also "/Users/morrison/brlcad.trunk/.build/CMakeFiles/CMakeOutput.log".
See also "/Users/morrison/brlcad.trunk/.build/CMakeFiles/CMakeError.log".
Oh. For whatever reason, it's generated INSTALL and configure files that don't match the ones in the source tree
I've gotten those messages about INSTALL and configure before, so I can't imagine it's those are causing the halt?
Plus it says they are warnings...
They're supposed to be fatal errors. The only reason they're warnings is so we can check for both files before halting the configure
what's the diff between INSTALL.new and INSTALL?
Hm. Is there not a way for it to at least report them as errors instead of warnings, and still check both?
We could alter the message that's printed
CMakeLists.txt line 3702
configure.new is missing all of the dependency toggles (e.g., --enable-scl, --enable-step, --enable-gdiam, etc)
O.o
Which version of CMake are you using?
that's the same problem with INSTALL.new .. they're all gone
cmake version 3.16.20200125-g33e7bd6
I wouldn't think that would be an issue...
this was fine a week ago, so I'm not sure what changed
This is from a clean build dir?
of course not, I have a lot of in-progress work in there
I can blow away the cmake bits though
Maybe try removing CMakeCache.txt
rm -rfing cmake* CMake*
how is the INSTALL/configure file contents built up? implies there's some bug there at least.
should be obvious/categoric
updated the error message
The BRLCAD_OPTION function in misc/CMake/BRLCAD_Options.cmake does that, calling some helper functions to generate the text. BRLCAD_OPTION is usually invoked by ThirdParty.cmake
looks like the OPTIONS file it wrote has them
at least now, wish I'd looked before running clean
starseeker said:
The BRLCAD_OPTION function in misc/CMake/BRLCAD_Options.cmake does that, calling some helper functions to generate the text. BRLCAD_OPTION is usually invoked by ThirdParty.cmake
Thanks, will see if I can catch it next time.
gah, now the X11 error again. I really need to figure out what's wrong there.
I'm not too surprised if you're running for long periods of time in the same build dir without clearing it - that's not a mode I usually work in (it tends to produce weird intermediate build system states that are hard to debug) so I won't usually know ahead of time if those issues are in there...
Clearing the CMake files should do it though.
I know, you say that every time. To me it's a mode we've always supported and that's even worked in our cmake era robustly-enough until relatively recently.
maybe an indication that issues are piling up, maybe just random chance, but it is absolutely more unproductive time if I have to wait for cmake to fully re-run every single time. these failures aren't that often, so they really should be isolated issues.
the fact that it works at least 9/10 times is a testament that it can work and usually does
or 19/20, whatever it is
Heh. OK, fair enough - I don't re-run every time either. Hopefully clearing the CMake files will right the ship
case in point, I have to wait another 10 minutes (5min into compile, 5 to re-run cmake) to re-run because default build has been non-functional since the tcl upgrade. have to enable all and run cmake again, then rebuild.
ugh, and it doesn't help when I typo the darn var name like I just did and have to wait for cmake yet again...
/me nods. That's a good point - I forget how slow it is to run CMake on the mac. I'm spoiled on Linux - it's fast enough there to be a much less severe pain point.
The github macOS runner does build successfully - maybe because it doesn't have Tk enabled?
even when I'm on the 128 core linux box and compile takes 2min, it's annoying that cmake takes 2 min too :)
starseeker said:
The github macOS runner does build successfully - maybe because it doesn't have Tk enabled?
probably, I don't remember the specifics. I spent time debugging months back, but didn't get it resolved. it was broken on the branch and working on trunk and it'd been merged, so I didn't look too much into it. eventually appeared on trunk too.
I wish Mac had some kind of remote open source development environments we could set up with for testing - the main environment I can't test is the one you use as your primary environment.
Short of buying a Mac (which I suppose is what Apple wants) it's a conundrum
Even the github CI doesn't seem to help much
I think cmake is doing the right thing -- it found a viable system Tcl/Tk. the problem (I think) is that it's finding Tk's X11 stub headers before the actual system X11 headers.
I've seen both compilation and linkage errors, so likely two issues.
Oh. That reminds me of the fink/macports issues - I never did know what the "right" answer to that was. I ended up forcing an ordering, but I think you really didn't like what I did... it was pretty quirky/custom and specific to that setup.
Example build output (parallel unfortunately so it's a bit messy to read):
[ 61%] Building CXX object src/gtools/CMakeFiles/glint.dir/glint.cpp.o
Scanning dependencies of target gex
Scanning dependencies of target libdm
[ 61%] Linking CXX shared library ../../lib/libdm.dylib
[ 61%] Linking C executable gtransfer
[ 61%] Building CXX object src/gtools/CMakeFiles/gex.dir/gex.cpp.o
[ 61%] Linking C executable ../../../bin/vdeck
[ 61%] Built target gtransfer
Scanning dependencies of target libwdb
[ 61%] Built target libdm
[ 61%] Linking CXX shared library ../../lib/libwdb.dylib
Scanning dependencies of target liboptical
[ 61%] Linking C shared library ../../lib/liboptical.dylib
[ 61%] Built target vdeck
[ 61%] Linking CXX static library ../../lib/libdm.a
[ 61%] Built target libwdb
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: ../../lib/libdm.a(knob.c.o) has no symbols
Scanning dependencies of target dm-X
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: ../../lib/libdm.a(knob.c.o) has no symbols
[ 61%] Built target liboptical
[ 61%] Built target libdm-static
Scanning dependencies of target dm-ogl
Scanning dependencies of target dm-plot
Scanning dependencies of target dm-ps
[ 61%] Building C object src/libdm/X/CMakeFiles/dm-X.dir/dm-X.c.o
[ 61%] Building C object src/libdm/plot/CMakeFiles/dm-plot.dir/dm-plot.c.o
[ 61%] Building C object src/libdm/postscript/CMakeFiles/dm-ps.dir/dm-ps.c.o
[ 61%] Linking CXX executable gen-attributes-file
[ 61%] Building C object src/libdm/glx/CMakeFiles/dm-ogl.dir/dm-ogl.c.o
[ 61%] Built target libanalyze-obj
[ 61%] Built target gen-attributes-file
[ 61%] Building C object src/libdm/X/CMakeFiles/dm-X.dir/color.c.o
Scanning dependencies of target dm-txt
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:118:10: error: implicit
declaration of function 'XAllocColor' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
st = XAllocColor(dpy, cmap, color);
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:182:5: error: implicit
declaration of function 'XGetWindowAttributes' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XGetWindowAttributes(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:88:5: error: implicit
declaration of function 'XQueryColors' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XQueryColors(dpy, src, colors, ncolors);
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:212:7: error: implicit
declaration of function 'XLoadQueryFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XLoadQueryFont(pubvars->dpy, FONT9)) == NULL) {
^
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:91:2: error: implicit
declaration of function 'XStoreColors' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XStoreColors(dpy, dest, colors, ncolors);
^
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:91:2: note: did you mean
'XQueryColors'?
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:88:5: note: 'XQueryColors'
declared here
XQueryColors(dpy, src, colors, ncolors);
^
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:94:6: error: implicit
declaration of function 'XAllocColor' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XAllocColor(dpy, dest, &colors[i]);
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:211:27: error: incompatible
integer to pointer conversion assigning to 'XFontStruct *' from 'int'
[-Werror,-Wint-conversion]
if ((pubvars->fontstruct =
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:214:31: error: incompatible
integer to pointer conversion assigning to 'XFontStruct *' from 'int'
[-Werror,-Wint-conversion]
if ((pubvars->fontstruct =
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:222:2: error: implicit
declaration of function 'XChangeGC' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XChangeGC(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:139:7: error: implicit
declaration of function 'XStoreColor' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XStoreColor(dpy, cmap, &color);
^
/Users/morrison/brlcad.trunk/src/libdm/X/color.c:141:7: error: implicit
declaration of function 'XAllocColor' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XAllocColor(dpy, cmap, &color);
^
5 errors generated.
make[2]: *** [src/libdm/X/CMakeFiles/dm-X.dir/color.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:231:27: error: implicit
declaration of function 'XLoadQueryFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:231:25: error: incompatible
integer to pointer conversion assigning to 'XFontStruct *' from 'int'
[-Werror,-Wint-conversion]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:233:3: error: implicit
declaration of function 'XFreeFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XFreeFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:233:3: note: did you mean
'Tk_FreeFont'?
/Users/morrison/brlcad.trunk/.build/include/tkDecls.h:270:14: note:
'Tk_FreeFont' declared here
EXTERN void Tk_FreeFont(Tk_Font f);
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:237:3: error: implicit
declaration of function 'XChangeGC' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XChangeGC(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:243:27: error: implicit
declaration of function 'XLoadQueryFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:243:25: error: incompatible
integer to pointer conversion assigning to 'XFontStruct *' from 'int'
[-Werror,-Wint-conversion]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:245:3: error: implicit
declaration of function 'XFreeFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XFreeFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:249:3: error: implicit
declaration of function 'XChangeGC' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XChangeGC(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:255:27: error: implicit
declaration of function 'XLoadQueryFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:255:25: error: incompatible
integer to pointer conversion assigning to 'XFontStruct *' from 'int'
[-Werror,-Wint-conversion]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:257:3: error: implicit
declaration of function 'XFreeFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XFreeFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:261:3: error: implicit
declaration of function 'XChangeGC' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XChangeGC(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/X/dm-X.c:267:27: error: implicit
declaration of function 'XLoadQueryFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
if ((newfontstruct = XLoadQueryFont(pubvars->dpy,
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
[ 61%] Building C object src/libdm/glx/CMakeFiles/dm-ogl.dir/if_ogl.c.o
20 errors generated.
make[2]: *** [src/libdm/X/CMakeFiles/dm-X.dir/dm-X.c.o] Error 1
make[1]: *** [src/libdm/X/CMakeFiles/dm-X.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 61%] Linking C shared library ../../../libexec/dm/libdm-ps.dylib
[ 61%] Linking C shared library ../../../libexec/dm/libdm-plot.dylib
[ 61%] Building C object src/libdm/txt/CMakeFiles/dm-txt.dir/if_debug.c.o
[ 61%] Building C object src/libdm/txt/CMakeFiles/dm-txt.dir/dm-txt.c.o
/Users/morrison/brlcad.trunk/src/libdm/glx/dm-ogl.c:236:5: error: implicit
declaration of function 'XGetWindowAttributes' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XGetWindowAttributes(pubvars->dpy,
^
[ 61%] Built target libgcv_fastgen4-obj
/Users/morrison/brlcad.trunk/src/libdm/glx/dm-ogl.c:250:7: error: implicit
declaration of function 'XLoadQueryFont' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
XLoadQueryFont(pubvars->dpy,
^
/Users/morrison/brlcad.trunk/src/libdm/glx/dm-ogl.c:249:27: error: incompatible
integer to pointer conversion assigning to 'XFontStruct *' from 'int'
[-Werror,-Wint-conversion]
if ((pubvars->fontstruct =
^
Would putting the X11 header includes before the Tk includes deal with that?
I actually don't know the status of fink/macports... homebrew kind of took over that market
maybe, like I said, I didn't dig
/me looks...
looks like at least for dm-ogl.c that they're already first
probably need to do a header-trace to see where the declarations are coming from (or not coming from)
hah, that's great! (just saw the benchmark results)
"Run 'C:/RELEASE-build/Release/bin/benchmark clean' to remove generated pix files." ... Seeing a windows path there might be a first
@Sean one thing to make sure of with the plugins is that any stale .so files are cleared out of the libexec directory - if they aren't removed they'll stand a decent change of causing problems.
src/libdm/tests/dm_test is my "diagnostic" goto to figure out what's going on, at least with the libdm plugins.
starseeker said:
Sean one thing to make sure of with the plugins is that any stale .so files are cleared out of the libexec directory - if they aren't removed they'll stand a decent change of causing problems.
Why is that? None of the other binary products have an issue with that I'm aware of.
that's not something i'm likely going to remember a month from now, so if something isn't stable, that should get fixed.
The library initializes at runtime, looking for whatever .so files are in the directory. It has no way of knowing what is and isn't a stale file.
what does stale mean in that context? I would still expect the plugins to recompile if they were changed
even with dynamic loading, it's possible to detect actual incompatibility and we may need to if this is a potential issue
I'm not sure what staleness potential you're referring to though..
Stale would be a plugin file that dates to an older build (say, a dm backend that was removed from the build and then the plugin API changed.) Attempting to load the old .so file in that case wouldn't work.
Also might be an issue if the .so file linked to older versions of BRL-CAD libraries that no longer are compatible with the current environment.
well, so that's a different issue no? that's got nothing to do with my build dir and needing to remember to wipe out .so files
unless you just recently changed the plugin interface and I only partially recompile, that's not so much a concern. the headers will have changed and the plugins will recompile (at least I would expect them to)
Well, if you do a lot of builds over time in a build dir without doing make clean, it's possible to accumulate such files (similar to how stale symlinks were sometimes left over). The scenario I was thinking of:
The old libdm plot plugin would still be in libexec in that scenario.
I see, so how about preventing that then. sounds like this is a new category of concern.
I'm open to suggestions...
could prevent it just by putting a plugin api version number into the header file. if the loaded version is less, skips that plugin. just have to remember to bump the api version whenever the calling interface changes.
could make the api number simply be the composite of the version number -- that way they're always locked to their compiled version only.
So it would be a "bu_dlsym" to look for something like "dm_api_version" from the plugin?
something like:
#define DM_API ((BRLCAD_MAJOR*10000) + (BRLCAD_MINOR*100) + BRLCAD_PATCH)
so 7.32.12 becomes api version 703212
you'll have to make it plugin-specific, like an: int dm_ogl_version=DM_API;
then bu_dlsym that out after loading the plugin, before looking at any symbols
if var < DM_API, skip it
heck, could start if var != DM_API, skip it
Why plugin specific?
that ties the variable name to the file name of the plugin...
oh my, you're using the same symbol names across all plugins??
ah, I see, yes and no
they symbols are all unique, but in a dm_plugin pinfo struct
Yes. That's a single point of entry - went that way for simplicity.
hm, well that is fine for now I guess but it's a little awkward for dynamic libs to do that. I don't think that's entirely portable in the general case for dlsym, though those cases aren't necessarily a major concern any more.
it's the same reason why Tcl has the library name the loading function specifically LIB_Init()
it also precludes portability to any new systems that don't (yet) support dynamic libraries and precludes static linking if for some case we later need/want that.
OS X was out for years before they added support for dynamic libs for example. now Mac does, of course, but the potential for other/new OS environments is definitely still a modern concern.
we can cross that bring later. it's trivial to change pinfo to ogl_pinfo or whatever, and to change the version var.
hm, it will screw with tags (msvc, emacs, vim jump-to-symbol/definition support) so maybe still something to consider
would simplify the code by eliminating the preprocessor conditional, one benefit
Are those environments capable of following those linkages? I figured as a runtime-only connection they were left out anyway...
What do you mean?
I just am not clear what conditions would allow emacs/vim to jump to the name - as a plugin, wouldn't it just be a pointer anyway?
the editors? they can certainly follow a symbol if I'm looking at a header on struct dm_plugin and ask it to show me the definition
it's going to jump to one of them
some do dynamic symbol searching, some do string matching
(most do the latter I think, especially the 'tags' system)
OK, so just directly (say) return dm_ogl instead of pinfo
Except I thought we needed that static wrapper
it's whatever symbol you load -- I'm guessing you load "dm_plugin_info" currently, so that would change to "dm_ogl_plugin" or whatever. that pinfo struct is already static so it's not going to collide with anything, even if compiled static.
OK. So I just need something then to make sure the names match the filenames (so the init function knows what to ask for for any given file)
could put the version field right into the dm_impl as the first field
then you dont' need to actually introduce another lookup
Sort of like the bu_magic trick?
sort of
I take it we should get this in pre-release?
I would .. this is going to be 7.32 right?
seems pretty major
Yes. (Or what's in RELEASE will be, at any rate - I branched off of trunk a while back.)
also warrants a NEWS mention I think, dynamic behavior is a user-visible architecture change
OK - I'll hit that first, then see if I can make rtweight do something saner.
Really? I had thought if I got it right the user wouldn't notice anything...
you're right, user shouldn't -- but it's a different runtime and exposure profile, and definitely results in different runtime behavior when something is not what we expected
it's like when we made shaders optionally load dynamically. user didn't necessarily see that except it resulted in dynamic load failure messages when a typo'd shader name wasn't found. not something that was anticipated, but the arch change was user visible and the messages obviously could be traced to it's introduction.
documented
so same deal for the ged commands
@Sean was https://sourceforge.net/p/brlcad/code/76767/ what you were thinking (more or less)?
(Want to settle on a pattern before I do libged, that'll be more painful...)
starseeker said:
Sean was https://sourceforge.net/p/brlcad/code/76767/ what you were thinking (more or less)?
Yes, that's spot on! You could even make it an explicit plugin->api_version check if you want to support the version moving around the struct, but what you have checking the first byte is a good contract to require too.
2020-08-15T14:51:35.0885582Z /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_debug/brlcad-7.31.0/include/bu/magic.h:244:21: error: the address of ‘jaunt_tbl’ will always evaluate as ‘true’ [-Werror=address]
2020-08-15T14:51:35.0886308Z if (UNLIKELY(( (!(_ptr)) /* non-NULL pointer */ \
2020-08-15T14:51:35.0886756Z ^
2020-08-15T14:51:35.0887896Z /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_debug/brlcad-7.31.0/include/common.h:364:50: note: in definition of macro ‘UNLIKELY’
2020-08-15T14:51:35.0888280Z # define UNLIKELY(expression) __builtin_expect((expression), 0)
2020-08-15T14:51:35.0888663Z ^~~~~~~~~~
2020-08-15T14:51:35.0889257Z /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_debug/brlcad-7.31.0/include/bu/ptbl.h:65:24: note: in expansion of macro ‘BU_CKMAG’
2020-08-15T14:51:35.0889945Z #define BU_CK_PTBL(_p) BU_CKMAG(_p, BU_PTBL_MAGIC, "bu_ptbl")
2020-08-15T14:51:35.0890236Z ^~~~~~~~
2020-08-15T14:51:35.0890818Z /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_debug/brlcad-7.31.0/src/libnmg/mod.c:3061:2: note: in expansion of macro ‘BU_CK_PTBL’
2020-08-15T14:51:35.0891507Z BU_CK_PTBL(&jaunt_tbl);
Oh, I see - r76785
If we want the extra test, maybe devise a configure time check to see whether the platform in question lets us do it without the error?
@Sean What issues do you know of that you would consider release blockers?
ah, right, constant expression. hrm.
I have vague recollection of that issue being the reason it was removed now too..
can you recheck the runner with the latest? trying a different approach.
starseeker said:
Sean What issues do you know of that you would consider release blockers?
rtweight no longer reporting volume unless there's a material defined is pretty significant change. the spew wouldn't be a blocker, but I think lacking a report is. also didn't seem to be outputting the report when run from mged -c, but I'd need to verify with a clean build.
only issue that come to mind is peer review isn't yet complete on all outstanding commits, and that's supposed to be a blocker for IA compliance, particularly it it's a build intended to be deployed (which I think it is, yes?) I think I can get through the remaining commits by tuesday if you can hold off tagging.
also, still seeing this:
[ 80%] Linking C shared library ../../../libexec/dm/libdm-ps.dylib
Undefined symbols for architecture x86_64:
"_Tcl_AppendStringsToObj", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
"_Tcl_DuplicateObj", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
"_Tcl_GetObjResult", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
"_Tcl_SetObjResult", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
Huh, that means it's still not linking Tcl - I thought I fixed that.
I saw the commit and thought so too. it's re-run cmake cleanly since.
Yeah, I can hold off til tuesday - still banging my head on the Ninja Windows build anyway (The msbuild run on the runner is less than 100% reliable, so far...)
That's really strange then. Is TCL_LIBRARY set correctly in CMakeCache.txt?
agua:.build morrison$ grep TCL_LIBRARY CMakeCache.txt
//ITCL_LIBRARY
ITCL_LIBRARY:STRING=itcl
//TCL_LIBRARY
TCL_LIBRARY:STRING=tcl
//TCL_LIBRARY
//ADVANCED property for variable: ITCL_LIBRARY
ITCL_LIBRARY-ADVANCED:INTERNAL=1
//ADVANCED property for variable: TCL_LIBRARY
TCL_LIBRARY-ADVANCED:INTERNAL=1
OK, that should be correct.
What does otool say about what the dm-ps libexec lib is seeing?
here's verbose:
[100%] Linking C shared library ../../../libexec/dm/libdm-ps.dylib
cd /Users/morrison/brlcad.trunk/.build/src/libdm/postscript && /Users/morrison/Applications/bin/cmake -E cmake_link_script CMakeFiles/dm-ps.dir/link.txt --verbose=1
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -std=c11 -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -m64 -ggdb -Qunused-arguments -fstack-protector-all -fno-omit-frame-pointer -pedantic -pedantic-errors -Wall -Wextra -Wundef -Wfloat-equal -Wshadow -Wbad-function-cast -Wc++-compat -Winline -Wno-long-long -Wno-variadic-macros -Wdocumentation -Wno-c11-extensions -Werror -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -mmacosx-version-min=10.14 -dynamiclib -Wl,-headerpad_max_install_names -m64 -ggdb -o ../../../libexec/dm/libdm-ps.dylib -install_name @rpath/libdm-ps.dylib CMakeFiles/dm-ps.dir/dm-ps.c.o -Wl,-rpath,/Users/morrison/brlcad.trunk/.build/lib ../../../lib/libdm.20.0.1.dylib ../../../lib/librt.20.0.1.dylib ../../../lib/libgdiam.dylib ../../../lib/libvds.dylib ../../../lib/libbrep.20.0.1.dylib ../../../lib/libnmg.dylib ../../../lib/libbg.20.0.1.dylib ../../../lib/libSPSR.dylib ../../../lib/libopenNURBS.2012.10.245.dylib ../../../lib/libpoly2tri.dylib ../../../lib/libicv.20.0.1.dylib ../../../lib/libbn.20.0.1.dylib ../../../lib/libnetpbm.dylib ../../../lib/libpkg.20.0.1.dylib ../../../lib/libbu.20.0.1.dylib -framework Foundation -ldl -framework System ../../../lib/libregex_brl.1.0.4.dylib -lm /usr/X11/lib/libGLU.dylib /usr/X11/lib/libGL.dylib /usr/X11/lib/libSM.dylib /usr/X11/lib/libICE.dylib /usr/X11/lib/libX11.dylib /usr/X11/lib/libXext.dylib /usr/X11/lib/libXi.dylib ../../../lib/libpng_brl.16.37.0.dylib ../../../lib/libz_brl.1.2.11.dylib
Undefined symbols for architecture x86_64:
"_Tcl_AppendStringsToObj", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
"_Tcl_DuplicateObj", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
"_Tcl_GetObjResult", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
"_Tcl_SetObjResult", referenced from:
_ps_open in dm-ps.c.o
_ps_loadMatrix in dm-ps.c.o
Yeah, I don't see tcl in that list.. what the heck?
there's no tcl listed
Does line 20 in src/libdm/postscript/CMakeLists.txt have TCL_LIBRARY on it?
hm, it didn't but does now -- different tree got updated. this is probably my bad. rebuilding.
yep, my issue. sorry! enable-all is all good.
No problem. Github runner kicked off - I'm trying with Ninja again, so if that fails I'll have to go again with msbuild, but either way should have an answer within about 2-3 hours.
Actually, the Ubuntu GCC should work either way, come to think of it
2020-08-16T19:39:35.8704061Z FAILED: src/librt/CMakeFiles/librt-obj.dir/reduce_db.cpp.o
2020-08-16T19:39:35.8706430Z /usr/bin/c++ -DBRLCADBUILD -DDB5_DLL_EXPORTS -DGDIAM_DLL_IMPORTS -DHAVE_CONFIG_H -DRT_DLL_EXPORTS -DTIE_DLL_EXPORTS -DVDS_DLL_IMPORTS -I/home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/include -Iinclude -I/home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/librt -Iinclude/brlcad -isystem src/other/openNURBS -isystem src/other/libz -isystem src/other/libregex -isystem src/other/libvds -isystem src/other/libgdiam -isystem /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/other/poly2tri -isystem /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/other/openNURBS -isystem /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/other/libz -isystem /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/other/libregex -isystem /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/other/libvds -isystem /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/other/libgdiam -std=c++11 -D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -g -ggdb3 -O3 -fipa-pta -fstrength-reduce -fexpensive-optimizations -finline-functions -flto -fno-omit-frame-pointer -pedantic -Wall -Wextra -Wundef -Wfloat-equal -Wshadow -Wno-inline -Wno-long-long -Wno-variadic-macros -Werror -fPIC -std=c++11 -MD -MT src/librt/CMakeFiles/librt-obj.dir/reduce_db.cpp.o -MF src/librt/CMakeFiles/librt-obj.dir/reduce_db.cpp.o.d -o src/librt/CMakeFiles/librt-obj.dir/reduce_db.cpp.o -c /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/librt/reduce_db.cpp
2020-08-16T19:39:35.8707398Z In file included from /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/librt/reduce_db.cpp:27:0:
2020-08-16T19:39:35.8708379Z /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/src/librt/reduce_db.cpp: In function ‘reduce_db::remove_dead_references(db_i&)’:
2020-08-16T19:39:35.8708883Z /home/runner/work/cadcitest/cadcitest/build/distcheck-enableall_release/brlcad-7.31.0/include/bu/magic.h:245:6: error: nonnull argument ‘db’ compared to NULL [-Werror=nonnull-compare]
2020-08-16T19:39:35.8709297Z if (UNLIKELY(( ((uintptr_t)(_ptr) == (uintptr_t)NULL) /* non-NULL pointer */ \
2020-08-16T19:39:35.8709452Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2020-08-16T19:39:35.8709757Z || ((uintptr_t)(_ptr) == 0) /* non-zero pointer */ \
2020-08-16T19:39:35.8709894Z ^~~~~~~~~~~~~~~~~~~~~~~~~~~
2020-08-16T19:39:35.8710114Z cc1plus: all warnings being treated as errors
rtweight breakage looks like it was introduced by commit 72190
OK, now I'm really confused. In the 7.26.4 in non-interactive mode src/mged/cmd.c line 1043 is executed, the program quits by eventually calling Tcl_Exit(), and then the queued up stdout output is dumped to the terminal. In trunk this doesn't happen, even though the same sequence still seems to be followed, and I am now getting output in the interactive classic mode.
that's a terrible commit message. :P
My that was painful. @Sean I think I've gotten the output restored in all three modes - GUI, interactive classic, and command execution. Only tested on Linux so far.
IMG_0399.jpeg @starseeker any thoughts on the source of this? recent build on windows by bill.
cmake 3.18.1, so pretty recent
build succeeded so it is apparently innocuous (perhaps regress is busted if it could try)
That matches a problem I had trying to run the "clean" target with msbuild
Strange issue - when I grepped through the directory I could only find one match for a regress target, so I'm not quite sure where it's getting the duplicate.
https://github.com/dotnet/msbuild/issues/3019 might be related if CMake's generation of names is doing something a little quirky, but not sure.
Let me try again with 3.18.1 - I don't recall getting a GUI pop-up, it showed only on the command prompt. 'course I was using a newer VS, that may make a difference too.
Yeah, 2017 loaded without that message, with 3.18.1. Not sure if I have the space to install 2015 on this laptop...
Which version of CAD is he building?
lastest checkout
OK, yeah - that's what I tried too. Weird.
@starseeker did you test if_disk and if_remote?
and what's the deal with bu_exit()?? that's not ignorable/replaceable. there are over 1500 calls to bu_exit
if it's crashing, we have to investigate and fix it
if bu_exit() is crashing dauto, that's pretty much guaranteed to be corruption
I'm still working through all the if_* cases - should be done by the end of the day.
I didn't have much luck debugging the dauto bit - it cropped up when I was trying the usage.sh script, appeared only some of the time, and the only thing GDB could tell me when I popped a sleep into bu_bomb so I could attach was that the BU_SETJUMP test was true when bu_exit was called.
I couldn't figure out where corruption might be introduced - dauto basically did the isatty checks and then quit immediately.
bu_setprogname was the only other call.
The only other thing I can think of is that maybe we need to initialize the bu_jmpbuf array somehow...
AH! My bad - I misinterpreted where the failure was occurring. I thought bu_bomb would be called only when something when wrong, but that's not the case. I put my test in the wrong part of the code.
dauto needs to validate the atoi conversion before mallocing
Grr. Now I can't get fbstretch to repeat it's bomb log file generation, with or without the dauto failure in the usage.sh run. Will just have to keep an eye out to see if it repeats.
starseeker said:
I couldn't figure out where corruption might be introduced - dauto basically did the isatty checks and then quit immediately.
sounds like you figured it out, but another thing to keep in mind is all static initialization also occurs before main(). this includes globals and statics getting initialized (which in the case of C++ can be arbitrarily complex) as well as any library initialization like we're doing in libbu.
starseeker said:
Grr. Now I can't get fbstretch to repeat it's bomb log file generation, with or without the dauto failure in the usage.sh run. Will just have to keep an eye out to see if it repeats.
yeah, we actually had to remove the __attribute__ "never returns" marking on it because technically it can (and often does) return control -- bu_exit() is guaranteed to not return (it calls _exit() hard), but bu_bomb is not.
@Sean how exactly do I test the disk and stack interfaces? I'm not seeing where they get used.
You can/could specify all types via the -F option, though some of the builtins use conventions (e.g., if_network is used when you used a hostname). I believe disk is used when you specify a filepath, mem uses a convention iirc. They have been documented in several places over the decades (especially the old green manuals).
I think I recall some examples in the code. Don't know but libfb man page might have relevant info (where FB_FILE is mentioned).
the env var is another way they can be set
The mem gets used by rtwizard, and -F<filepath> seems to work for disk.
stack I can trigger and it wants some sort of args (but don't know what to feed it, old or new config)
Ah, there we go (buried in one of the old emails)
export FB_FILE="/dev/stack /dev/debug;/home/user/moss.pix;"
./bin/rt share/db/moss.g all.g
OK, looks like we're in business.
btw: not reproducible, but this happened once if it's of interest:
bu_semaphore_free(): pthread_mutex_destroy() failed on [18] of [20]
I don't recall stack being explicit, could be wrong though.
thought it let you do things like -F"/dev/mem -" to read from stdin and write to a memory buffer
html/ReleaseNotes/email3.0.html was where I pulled that from, FWIW
ah, great good find then
First time I've actually used one of those old emails practically... nifty!
how's it actually going through /dev/disk without it being in that switch list?
fb_open - falls through to the disk interface if no other matches
dm_plugins.cpp:405
cool, and remote? triggered on something else?
if (fb_totally_numeric(file) || strchr(file, ':') != NULL) {
hm, interesting
Not sure what happens with a full C:\ windows path, that could get ugly...
That's been there though, I don't think I changed any of that...
yeah, probably wrong for that, but probably always been wrong for that
I would have thought -Fbrlcad.org would use the default port, but it might have always required the colon
/me double checks the rel-7-30-10 code...
yep, same conditional
/me checks 7.24, though it's been broken it was probably broken long before that
At least as far back as 1990 per git blame
svn:revision:4527
/me must concede that was fun... git blame FTW
cool
yeah, 7.24 treats it as a file
don't see a documented way to override in the man page, good to go
svn:revision:3680
I think that's where it's original form got added.
er, so did you test that remote actually works? :) run fbserv on brlcad.org something
something like -Fbrlcad.org:file.pix
that should be a remote /dev/disk
What is that supposed to do? Upload a file to brlcad.org?
wherever the framebuffer server is, yeah
manpage example is using it to launch remote transient windows
Do I have permissions to make that work?
like from brlcad.org, you could -Fyour_local_ip:/dev/Xl assuming networking routes are open
try localhost -- that should still go through the interface
-/
things like -Flocalhost:/dev/X and -Flocalhost:file.pix
can see examples in brlman libfb
How do I launch an fbserver to listen for that?
man fbserv ;)
fbserv 0 /dev/X
So localhost:/dev/X is an alternative to just specifying the port number?
no
at least I don't think so -- it's saying launch a transient /dev/X on the remote host
./bin/fbserv 0 /dev/X
./bin/rt -Flocalhost:/dev/X share/db/moss.g all.g
pkg_open(localhost, remotefb): unknown service
pkg_open: client connect: errno=111
rem_open: can't connect to remotefb server on host "localhost".
fb_open: can't open device "localhost:/dev/X", ret=-4.
rt: can't open frame buffer
Oh, duh
./bin/rt -Flocalhost:0:/dev/X share/db/moss.g all.g
OK, so yes - works in trunk
testing 24 behavior
hm, it's not doing anything with anything after the :0 if you put a port
That whole aspect of the framebuffer I/O gets very little exercise, to the best of my knowledge - the closest I know of that happens semi-regularly is when rtwizard does its wackier stuff with multiple programs targeting one fb.
looks like it's a space if the man page is right
yeah, it's not working even following the man page example in 7.24 so not a release issue at least. maybe broken long ago.
The only other instance I can think of where any of the over-the-network features were at play was your distributed rendering
That's a thought - wonder if remrt works
give it a go
I'm not much good with the multiple machines and network communication stuff Sean - I can try, but the only thing that comes to mind is running a remrt here and trying to enlist bz
that'll work -- I've used it before that way
fb_open(name?name:framebuffer, xx, yy)) is it's invocation pattern - is that what we need to test here?
Do you usually launch it from within MGED?
it's really just replace an rt call with remrt, then run rtsrv pointing it to that host
no, that'd get ugly
usually start with a saveview script
Man page calls out "rrt remrt -M -s###"
then edit the script to replace rt with remrt
rrt should work, but that's introducing a whole other thing too just fyi
er, and rrt -M is going to wait for stdin
/me just noting that the man page could use a step-by-step example
which is a lil tricky from mged
i mean, with all the proc changes, should also make sure rrt still works ...
but would test that separate from remrt
just try like rrt rtedge
/me nods
but yeah, remrt is usually for big long-running jobs -- no way I'd do that inside mged
First let me see if I can get anything to work... considering how long it too me to get hello world going with libpkg this could be good for some comedy
so to assist, here's what you'll need (from memory). 0) start an fbserv -S2048 0 /dev/mem, 1) run mgd saveview, 2) edit script to call remrt, 3) edit script switches to remove -o outputfile and file logging lines, 4) edit script to -F0 or whatever your fbserv is running on, 5) run script make sure it's waiting for clients, 6) run rtsrv from two hosts
you'll of course want to make sure the render job is expensive enough to not immediately terminate. one of the nurbs samples with ambient occlusion should easily do, so you could add -c "set ambSamples=1024 ambSlow=1" to the script
unrelated 76891 needs NEWS
also, your 76890 comment sounds like you might not be familiar with that convention, but '-' means stdin or stdout depending on the context and tool, it's like a magic name so a user can specify stdin/stdout via args
we're not consistent but a few of our tools support that, that many also don't yet
/me nods. Not familiar with it, so good to know.
can see examples of it in action: grep \"-\" src/util/*.c
usually it'll be something like "cat file2.pix | pixdiff file1.pix -"
So... How will I know when the rtsrv command has connected with remrt?
and of course -- typically means "stop processing args, pass the rest along to the next thing that processes args"
Blast - can't contact
remrt logs it getting jobs
rtsrv: unable to contact <ip>, port <num>
it's instantaneous when everything is correct
use rtsrv -d for debugging
OK the port it was listening on wasn't the port that remrt claimed it was listening on
Uh... "setpgid: Operation not permitted" and rtsrv died.
defaults to port 4446
08/22 20:45:45 Automatic REMRT on ubuntu2019
08/22 20:45:45 Listening at port 24081, reading script on stdin
what port is it referring to there?
the "Listening at port .. #" is that whole automatic transient port thing we were talking about months ago
Oh, that business. ugh.
that's not the connection port, that's just the internal listening reassignment the kernel gave it
there's not really much value in it printing it .. obviously misleading
all it really means is that it's listening, and a rtsrv -d localhost 4446 should attach
It does, but rtsrv is dying when it tries to fork with the setpgid error
uh oh, looks like fbclear -c is broken
And launching rtsrv from bz isn't getting through to remrt
@Sean I don't think my machine is accessible from the outside back in...
try via localhost
and yes, you'd definitely need to configure your router if you're not using a DMZ
remrt sees a connection attempt, but that's when I get setpgid: Operation not permitted
and rtsrv dies
add -d
PROTOCOL_VERSION='BRL-CAD REMRT Protocol v2.0'
using 12 of 12 cpus
ph_loglvl 1
ph_dirbuild: NIST_MBE_PMI_2.g
ph_dirbuild: rt_dirbuild(NIST_MBE_PMI_2.g) failure
make sure the .g is in cwd
there's ways to tell remrt to transfer it, but I don't know how to set that up
Ah, there we go. I thought it would send the .g over the wire
default assumes all rtsrv's have access
-d or not, it should probably print the file not found error - that setpgid thing was extremely confusing
looking at the code, setpgid is probably just a perror warning
meaning it's not authorized to run in a proper daemon server mode
which is part of what it's trying to set up there
debug mode skips all that including skipping the fork+exec
If you don't mind, I'm going to add these steps as an example to the remrt man page...
heh, why would I mind?? It is a whole section in the advanced rendering guide I started last year
I really need to format that into docbook
Wanted to make sure you weren't already altering the man page.
no, as awesome as it is for production rendering, it's been strictly minimally maintained thus far -- every big render I've done I've come across dozens of bugs and inflexibilities and things that need to be improved.
I've only done the minimum needed to get it working, try to justify making a few improvements until the next time.
things like setpgid() and the error reporting just scratch the surface.
I don't want to get distracted from the release, but I'll at least try to write down what it took to get it running so I don't have to repeat the flopping a second time...
I'm not sure what's wrong, but remrt isn't working here for me
and fbclear -c does seem to be fully busted
dont' know if that's new
I'll check in a few minutes, once I'm done with the man page... one sec...
@Sean is the fbclear breakage new?
If it's hanging indefinitely, I'm seeing that in 7.30.10. if_remote.c:702 is doing a pkg_send, then waiting on 719 for a response that's apparently not coming.
what's up with r76889? sounds like you disabled a failing regression test...
I partially re-enabled a test that I had fully disabled earlier. The input file being fed to asc2dsp for the other part is reporting an invalid character - I can't tell if it ever worked or even should have worked...
That's a drawback with how some of the regression tests are set up (or at least, it has been - I've not done a systematic audit) - intermediate commands can fail, but as long as the final command executed by the script returns 0 the intermediated failures don't always fail the test.
If I remember correctly with asc2dsp, the input file names passed to asc2dsp were incorrect (probably after something got moved or renamed). However, asc2dsp still creates an empty output file in that case. So the test created two empty output files, which cmp found were the same, ergo the test passed.
I'm not sure if asc2dsp just doesn't really support the "old" format, or this is one of those cases where a .dsp file got tagged as ASCII rather than binary (or vice versa) and the file contents at SVN checkout don't match what asc2dsp is expecting.
Barring bugs or new deviations introduced, all of the tests were structured to accumulate any errors and return that as a result. Relying on the last command's status is not a good idea for a whole host of reasons.
To that specific commit, sounds good if the test was already disabled and it's a partial re-enabling. We shouldn't disable a test if it was working was the only concern.
It's not doing any regression testing from the looks of the code either, so since it's presenting a maintenance cost, may make sense to remove it.
@starseeker on the matter of NULL checking, given the state of all the calling sites needing a null check and most still not really looking like they'd gracefully handle a NULL condition at all, it's worth considering making the API never return NULL (i.e., making it impossible).
You mean returning an "(NULL)" string?
General way to do that would be to make the error condition a special value instead of a special pointer. Any value could work but ideally on that might make sense when printed.
I'm working my way through the list of functions you itemized earlier, trying to make sure they do something sane.
so yeah, "(NULL)" could work though that's what stdio uses on Linux, so wouldn't be able to disambiguate.
I noticed, appreciated as there were a slew of potential null derefs introduced -- that's why I mention it.
@Sean I'm leery of going too far down this rabbit hole, since the case of a null DMP isn't going to come up in normal usage and the entire structure of this whole thing (IMHO) needs some serious TLC
yes
that's why I mention it, I was leery too -- adding all the null checks doesn't feel right because it's decreasing readability, increasing complexity
bu_list everywhere, globals galore... feels like trying to paint a rust bucket, in some ways
but that's the API. so either the API should change to not return NULL, or the callers are technically required to test it, however rare a condition
I'm seeing a fair number of null checks already, actually, as I'm going through - found a few missing
whether the calling code does something "sane" after doing the check is another matter, but that was presumably baked in from the get go...
I think the reason it looks funky now is mostly that I straight up replaced all variable accesses with function calls, which resulted in a lot of double calls in if tests and bu_vls string outputs - that's most of what I'm trying to clean up now.
I'm game to make it return not null, but I'm not sure what to do about the functions returning bu_vls strings in that respect.
Do we malloc a vls internally in the null case, and then the caller has to check for a null vls and free it if that's what they get? that feels wrong...
starseeker said:
bu_list everywhere, globals galore... feels like trying to paint a rust bucket, in some ways
That can be said of almost any code, but I don't think it's a constructive perspective. Dismissing any issue in code and more importantly allowing new ones to be introduced because someone else dismissed something is not something I support (and HACKING has explicitly called out since inception). That's very much a canonical fallacy in my experience.
starseeker said:
I think the reason it looks funky now is mostly that I straight up replaced all variable accesses with function calls, which resulted in a lot of double calls in if tests and bu_vls string outputs - that's most of what I'm trying to clean up now.
I realize that, but therein is now a new issue that did not exist before as functions express contracts, variables do not. So you kind of birthed this beast :smile:
Public struct members don't constitute a contract?
starseeker said:
I'm game to make it return not null, but I'm not sure what to do about the functions returning bu_vls strings in that respect.
That's tricky as there's dynamic memory associated. How about eliminating them?
Not trivial to eliminate. Calling code assumes ability to stash the tcl path in them, IIRC
Tk window path rather
starseeker said:
Public struct members don't constitute a contract?
They're just containers. There is no expectations for C or C++. They often are assumptions on those containers, but they're all generally subject to error. That's why for C and C++ at least the best you can generally do is dictate a convention of who owns the memory.
C++ formalizes that a fair bit better, but in C it's typically whomever allocated is responsible for releasing.
so whoever created the struct would be responsible for ensuring it's embedded values and pointers are "sane". That's also still not a contract on those values by callers though -- it was just the creator/initializer's reponsibility to minimize the risk, ideally provide a safe API, which is what codifies expectations.
So, meta issue for a second... up until now we've not considered the libdm/libfb APIs truly public, since they were not intended to survive in their current form. That's still true - this refactor is not what I would propose if the intent was to establish a proper API. It feels like you're wanting to treat the function wrappers as something more "public" than the old libdm/fb structs - am I misunderstanding the intent?
starseeker said:
Not trivial to eliminate. Calling code assumes ability to stash the tcl path in them, IIRC
How about making the caller pass a vls to it
Not sure how hard that would be - don't know if calling functions are set up to properly manage memory. We might have to make application global vls strings for that to work in MGED.
Would probably work, but even in MGED introducing new globals just feels wrong...
To the meta point, no -- public API has a much higher burden still. This is an issue with any functions, not one just applied to public functions.
The issue is functions that generate NULL. It's one thing to be handed NULL and then pass that along, it's another to be the returning source of NULL.
For the code having issues from a NULL from one of these functions though, wouldn't it have had the same problems trying to access an invalid struct member in the old code?
The only place that should be given a pass is performance-critical code, so you will/should find deviations in librt and some parts of libbn/libbu used in those performance paths.
starseeker said:
For the code having issues from a NULL from one of these functions though, wouldn't it have had the same problems trying to access an invalid struct member in the old code?
It really depends on the specifics, devil in the details. There are entire theories of corporate development that prohibit refactoring in NULL as it's exceptionally common for code to coincidentally work. So there could be swaths of logic that happen to gracefully handle a variety of invalid conditions by accident, but handled none the less. I don't think that's our issue here or concern -- it's the simple principle of having a function that can return NULL, you're supposed to check it otherwise there is no point whatsoever in returning NULL and the consequence will be eventual obscure crashing.
We've ran into this issue many times over the years too, so it's not just a rare hypothetical.
So is the shortest path forward for to finish the check through the code for the null checks, or should I try to look at excising the bu_vls slots from the dm implementation struct?
Need to tie this off - I'm game to do the work, but need to have a finish line where this can be called done
Shortest and safest path depends if we're talking long or short term, but the difference is probably only an hour or two effort either way.
All right - what in your estimate is the best way forward?
Make it so caller doesn't have to check null, make caller pass in container if it would otherwise result in an allocation
if null must be returned for some deeper hard-to-untangle reason, then the call sites should get checked
I'll see what I can do.
btw - is this the last big blocker, or have you see other things that'll need to be handled?
(so far - I know you're still commit reviewing...)
I've been getting through about 100 commits per day
so more than half way there now
course that includes reviewing the new stuff as it's coming in, so it's an uphill struggle :)
there's a lot of little undocumented things that I'm concerned about but not enough to hold things up
I could hold off and do one big commit to try and clean up libdm :-P
lots of things that beg testing :(
that'd just screw up the commit review rate :P
you heard about paragon's give to linux kernel? :)
just heard about that yesterday
No, haven't been plugged in to outside world much lately - what'd they give?
they submitted their ntfs driver
it's a 27k line pull request
<snork> Given how the Linux folks reacted to ZFS, I can't wait to see the comments on that one...
Looks like we've got 4 vls strings in the dm container - two Tcl/Tk drawing window strings, the display name, and a log.
paragon is known for having one of the best ntfs drivers and linux needs it, but they're also notorious for having crappy issues in their drivers (including the ntfs one)
starseeker said:
Looks like we've got 4 vls strings in the dm container - two Tcl/Tk drawing window strings, the display name, and a log.
A real cheap 'punt' that avoids lots of restructuring could be to convert them into char[]
then you'd only need to worry about the handful of sites that write
/me nods - got to check where they're being accessed.
like if you made them four char[255] arrays, just need to make sure nobody writes past that
dm_log looks like it should have been a flag from the comment
unless it's the actual log, then the comment is just wrong
Probably want the Tk paths a bit longer - some of the Itk names can get long.
I'll have to check, but I think it's supposed to be a log filename.
the other three look like they could get away with #define MAXNAME 255 ; char[MAXNAME] or similar
ah, if it's a filepath, then we really gobble up memory but that's fine
for filesystem paths use MAXPATHLEN
Does Tk have a maximum path length for their window names? If so that's what we'd want there...
/me flips a type to see what blows up in the build...
I doubt it.
in practice for dm, though, the path names are things like .id_0.some_window.childsite.widgetfoo_window.dialog
that would be a pretty extreme example too. most are far more concise
OK - as long as Archer doesn't do anything crazy - I remember Bob command line manipulating Itk windows once, and the names were kinda nutty
I've never seen any get close to 256
/me nods we can give 255 a shot. I imagine we'll know pretty quick if there's an issue...
I was thinking indexing when I wrote that 0-255 .. size should be 256 so it stays aligned in memory
that's also where the calling sites that write into the buffer can then just bu_bomb justifiably and continuing code can assume it always works
oh wait
so there is another solution here that may be even less work
and can keep the vls
the issue is strictly that they return NULL, so like dm_get_pathname() you could just make it halt on that condition. then callers never need to check. It's not a great pattern to do aggressively, but it certainly seems reasonable for the four vls functions as they're in the dmp.
i.e., they're not pointers in the dmp
dm_get_vp() is a good counter-example. it returns null and is potentially null itself in the dmp.
Make it hault? You mean bu_exit?
no, bu_exit is for applications
bu_bomb
or a magic number check
(which bu_bombs)
since that's conceptually what you're doing with the tests anyways, just checking the dmp
might as well shove a magic in there and BU_CKMAG it
then no null return possible
@Sean a quick experiment with a BU_CKMAG in the dm causes a couple of MGED crashes...
Working through them...
There we go - was r76921 what you were thinking?
(along with r76922)
If so I'll have to make another pass in the calling code to handle the (DM-NULL) returns instead of NULL (there may still be cases were we don't want to feed that to a Tcl script for execution...)
starseeker said:
Sean a quick experiment with a BU_CKMAG in the dm causes a couple of MGED crashes...
Nominally means NULL is/was being returned at least somewhere/sometime! That's why all the fuss... even if the condition is an unexpected/unintended behavior or even if it happened to work today (might not tomorrow after some other change or recompile).
I noticed several of the funcs are unused -- what's up with that? :)
At least one of them I just added as a get/set pairing, when only set was needed
Actually, crash ended up being silly - just me missing one of the magic initialization cases in the allocation logic.
starseeker said:
At least one of them I just added as a get/set pairing, when only set was needed
But if nothing reads them, why bother setting them... can they be removed?
It's read internally by the ogl backend, IIRC - a parameter passed from the app to the backend. I'm sure it can be refactored somehow - there's a TODO note about it - but I haven't had time to fool with it. That level of the drawing logic, it's easy to break things in subtle ways that are hard to catch.
OK, so upon further investigation (at least in latest trunk) the X dm seems to work embedded in Tk on Linux - it's only the dmtype set (runtime switching of the display manager types) that's failing.
Setting it in the .mgedrc works fine. That's not too surprising, really - dmtype was always a rather evil hack...
does attach work?
attach X .. attach ogl
Yes, attach works.
both simultaneously?
Yes.
mged -c -a X share/db/moss brings up the X based version, then I can "attach ogl" and also get an OpenGL version. "e all.g" draws to both.
That's great. It should accept an arbitrary number of attach calls. That's wicked fun way to set up mged on CAVE walls and projection displays.
just fyi, in regex [12] and [1-2] are equivalent (saw the real error, missing a right bracket)
@Sean any more issues pop up in commit review?
quite a few flagged as needing testing, but none yet that warrant release blocking. some still to document; more likely to miss one or two of those :/ .. I think I'm over 90% through, so should be done soon.
/me stares at the MGED and dm-ogl code, and reluctantly concludes that a major shift in the way dm/fb rendering events are managed is too much of a time sink to tangle with for now...
Will require a deep dive into Tk window setup, management, and events... blegh.
@starseeker one thing you could check for me -- can you see if obj export is working? see if an object from bot_dump and g-obj will open in anything else.
was reported and recorded but don't think it got checked at all as a release blocker, don't think we have anything that validates obj, only verification
/me is down to 165 remaining, woot!
bot_dump'ed a facetized sph to an obj file, and ran ./bin/g-obj -o g_obj.obj sph_bot.g sph.s on the same sph - both .obj files opened successfully in meshlab.
So working at a basic level, at least on Linux.
Is it me, or is sourceforge slow today?
huh, okay. thanks for checking! that's really bizarre because I do recall that distinctly not working. Does Blender open them? I'll have to recheck xcode and cura, or maybe it's somehow mac-specific.
I'll try on Windows - rtwizard is barking only on Windows, so maybe it'll expose this as well...
Seems OK on Windows as well. Only thing noticed so far was an empty file getting created if I tried the -b option with -t obj for bot_dump. It provided an informative error message, but ideally shouldn't have produced the empty output file.
@starseeker do you have any more information you can give about the NUL file getting created -- what does "test -f /dev/null" report?
I'll see if I can tell - it only fails when run inside of CTest - a straight command-line run succeeds
all the more bizarre.. 'test' is used all throughout the benchmark and other scripts run by ctest.
very concerning if something this fundamental isn't reliable
I'm not sure what to make of this. If I put the following in:
if test -f /dev/null ; then
echo $?
echo "have /dev/null"
else
echo $?
echo "no /dev/null"
fi
I get:
875: 1
875: no /dev/null
I get that both in and out of CTest, but if I make the "NUL" file that gets created read-only CTest will abort after trying to write to it and failing, but running outside of CTest benchmark still proceeds.
put in something more explicit like "echo ls -la /dev/null
" before the test and
those were backticked
could just put ls -la /dev/null to make sure it's not running in some kind of environment, also "which test" and see if /bin/test -f /dev/null and i [ -f /dev/null ] behaves differently...
crw-rw-rw- 1 root root 1, 3 Sep 2 16:50 /dev/null
/bin/test and /usr/bin/test don't seem to make a difference...
what about: if [ -f /dev/null ] ; then ...
No change.
and this is linux you said??
Yes - Linux + CTest
how are you invoking ctest?
ctest -R benchmark --verbose
dvn=$(test -f /dev/null)
echo $dvn
echo $?
returns:
0
and silly question, but sure it's regenerating the benchmark script, picking up the changs?
this literally makes no sense...
I'm directly editing the copy in bin/benchmark, and not re-running CMake (not silly question, I made that mistake when I first started trying to fix)
with ctest --verbose, does it show the actual shell command it's invoking?
/usr/bin/sh "/home/cyapp/RELEASE/build2/bin/benchmark" "run" "TIMEFRAME=1"
But if I run that straight up, it does start running even with the NUL file read only (all the printouts seem to be the same though...)
is there a trace flag or something that can give more info?
-x to the shell script, but that's not helping much...
+ ls -la /dev/null
+ echo crw-rw-rw- 1 root root 1, 3 Sep 2 16:50 /dev/null
crw-rw-rw- 1 root root 1, 3 Sep 2 16:50 /dev/null
+ test -f /dev/null
+ dvn=
+ echo
+ echo 0
0
+ [ -f /dev/null ]
+ echo 1
1
+ echo no /dev/null
no /dev/null
Interesting, I didn't notice before, but I'm seeing the behavior here in ctest too
this is beyond messed up...
oh crap, right, huh.
my mistake all along.
While we're at it - should I go ahead and pull in the gcv changes?
fixed
no, I'd hold up on them. lots of potential for brokenness.
/me nods - agreed. So RELEASE isn't quite equal to trunk, I misspoke earlier
I didn't look in detail, but at a glance appeared to just wildcard it
The gcv changes? Well, the intent was to explicitly set the context, instead of assuming MODEL.
Essentially remove the implicit assumption, and check extensions and the like against more types than just the model types.
right, but a plugin could be multiple / any contexts
Right. In which case, it sets the generalized match, the gcv program will pass its inputs to the plugin regardless, and it's up to the plugin to figure it out.
which then begs what are the contexts for, just complicates api and callers if most of our tools will want to allow multiple types
I'm not sure they will - for example, wouldn't icv want to accept just image types?
already ran into a problem with sumanga's plugin where he had to force PNG mime type because some other plugin was happily parsing it as pix data
no, probably not .. same reasons it's a problem in gcv. someone might want to import image data from a spreadsheet or some random application/vendor proprietary type
That's going to place a massive input validation burden on each individual plugin though - that feels wrong.
that would be wrong, I agree, and it's worse than that
pix reader plugin isn't going to have any basis for denying input really
for exampl
Right. That's I think the original motivation for the contexts - there's no guarantee that file extensions are unique mappings to types, so we may need the context to disambiguate if (for example) an image extension and a geometry extension are the same.
it's up to icv/gcv to somehow prioritize unless overridden based on file suffix and/or signature or some other measure .. a callback can't/shouldn't know about other formats
that's a case of clear ambiguity, that's fine
that has to be called out, it's fine if that case has to be explicit
How do we allow the user to specify the resolution?
before even getting to options, there's a question of whether the file is even getting routed to the right plugin. maybe a scoring system?
I don't quite follow about the callback - the callbacks can't really be format specific. GDAL is currently the canonical example, but assimp will be similar.
what do you mean the callbacks can't be format specific?
they are plugin specific
a plugin certainly could be specific to a format
and their callback would be specific to that format...
Right, but plugins in general aren't necessarily format specific. Hence the "wildcard" specifier for types when hooking in a plugin
(if that plugin does support multiple types - otherwise you specify the one it does support)
Unless you're looking for some way for a plugin to register itself for multiple formats - that'd be a completely different architecture than what we've got now...
sure, but that's a different statement than "can't" .. they can and are .. and they might not be, and sometimes aren't
if they're all wildcard, then there wouldn't really be much point in typing at all and the api complexity it entail
Sure, but the majority aren't general and if we can limit the types of data those limited ones get fed it bounds the data validation problem somewhat.
sure but it doesn't solve the problem :)
i mean png-to-vol is a good example. if we marked the mime type as auto, then gdal proceeded to grab the input and crash-n-burned on it. if marked as png, then the new plugin got to try.
new plugin was made to use icv, so in theory it handles any icv type now, but can't set a type that gets it routed to it without it being specific
My off-the-cuff thought there was to set a generic type and an image context - that way the plugin would get a crack at anything gcv identifies as image data.
The png only version would get a PNG image type instead of the generic type and the image context, which means it would only be invoked for PNG image data.
I mean in fairness, this is probably a case where there's a genuine conflict -- both gdal and an icv reader can read images
so needs someway to force which and/or fallback to another when the one chose fails and there were alternatives
/me nods - I don't think we've worked out that part of the command option set yet - this may be the first case that's popped up that actually results in it tripping on the issue.
in this case, we didn't actually want a png-only version -- want it to be any icv format
Right, but that's an interesting point. If we did have a PNG specific version and the generic icv version both present, it would be a reasonable inference that the format specific one is in there because it could do a better job on that specific format than the generic alternative. (Otherwise, why have both in the first place?)
That breaks down when we get to more complex things like geometry where it's too complex to guarantee one of anything is better than another, of course, but the inference will likely still be drawn.
I don't really have any good answers - the existing gcv app was implemented with the mindset of replacing all of the -g and g- tools (which didn't impose any of the multiple plugin issues of the more general solution.) So it's probably got quite a few limitations of that sort...
yeah, I don't think we want to build assumptions like that into the system. we either need to find a way to let plugins encode/grade/characterize themselves or let it get resolved explicitly by the caller
just because one plugin reads multiple formats (e.g., assimp) and another one format (e.g., stl) doesn't mean either is better at it and likely isn't the case
I know, but a naive user looking for solutions will see they have an stl file, see "stl" in the listed plugins, and almost certainly jump to that one. If we're lucky, they may think to look for other options if that one fails, but even money they won't.
if it were the same author, then I could see the case for inferring "png" plugin trumps "*"plugin when presented with a png file, but that's not an assumption we can make
starseeker said:
I know, but a naive user looking for solutions will see they have an stl file, see "stl" in the listed plugins, and almost certainly jump to that one. If we're lucky, they may think to look for other options if that one fails, but even money they won't.
sure and that behavior doesn't present a problem... let them explore
they can infer and assume whatever they like
it's whether the code is inferring and assuming anything that we have control over
I'm suggesting the code should not, for example, "prefer" sending a .png file to a particular plugin because it's name has 'png' in it or it's type was set to 'image/png" whereas the icv and gdal and other plugins declared multiple image support
But it will need to prefer something, unless we want to scatter-gun multiple conversion attempts in parallel automatically and try to detect the "best" result somehow...
i.e., specific type shouldn't intrinsically trump a wildcard type; wildcard should just be like a globbed expansion of types (and maybe we need to make it be something like that)
We can establish a plugin ranking for plugins we are providing, but if users start hooking in their own 3rd party plugins how will that interact?
it could need to pick something, but not necessarily "prefer"
Oh, blast it:
Files present after distclean in /home/user/RELEASE-build/distcheck-autodetect_release/build:
bench/NUL
yeah, I don't see how ranking would work (and not b totally gameable)
starseeker said:
Oh, blast it:
Files present after distclean in /home/user/RELEASE-build/distcheck-autodetect_release/build: bench/NUL
that's certainly from earlier testing.
/me double checks to make sure he's got the test -e update from trunk
I don't get a NUL any more
Doesn't look like it - let me make absolutely sure and start clean again...
kind of funny that it seemed to work writing out a NUL file.. it almost certainly wasn't fully behaving and just appeared to work
make sure you don't still have a customized installed script
with debug printings and such
/me nods - clean SVN release branch, updated, with the trunk fix pulled in.
it all makes sense now. I got my flags mixed up forgetting that /dev/null is a file in the eyes of the filesystem, but not in the eyes of the test command where it only distinguishes "regular" files
note, test -f /dev/null did not return 0 for you. looking back, your snippet printed the return code of running the 'echo' command. ;-)
OK, works in a clean toplevel test... kicking off distcheck-full again
it returns 1
Ah. Figures. I still can't believe I'm enabling you to keep shell around by making it work on Windows... ;-)
It's even worse than regex expressions
OpenBSD behaved itself, by the way.
I'd still love to create a proper geometry shell, gash or whatever for navigating a geometry filesystem using libged commands and shell intrinsics
then we could have some seriously (more) powerful scripting constructs for creating and manipulating geometry
we're so close
gsh is a start - still just a raw argc/argv interface right now, and I'm not happy with the subprocess callback, but it's heading in that direction.
gsh is cool, but that's completely on the other end of the spectrum.
Oh, just so I know where to look - are you planning to update the NEWS file in trunk, or RELEASE?
i'm talking about taking something like zsh or bash and replacing all the libc I/O calls with libg I/O calls
Ah - you mean a complete port of the shell environment, not just some shell interaction constructs on top of the argc/argv calls
having a full "geometry shell" environment including notion of current working dir, most of the built-in commands like cd, pwd, ls, stat, etc, plus command-line editing, command history, key bindings, terminal interfacing, etc
right, full deal
we're literally less than a gsoc-project away from that being realizable
that's why I started putting in the libbu/librt dir.h API as that's the foundation of the shell
I considered for a while how we could actually use shells unmodified, but that would require some pretty heavy FUSE integration work. when I last looked, it felt like creating a full FUSE filesystem driver was going to be quite a bit more work than porting and customizing a shell.
plus there are some legacy shell constructs we probably don't want to keep
/me nods. How would terminal interaction work with Windows though? Only native sh port I'm familiar with other than the Git thing is the Windows zsh port from about 10 years ago... zsh-nt or some such.
terminal is a separate issue from the shell. can display a shell in any text environment, even dumb ones like a Qt text widget (just a lot won't display and behave right)
terminal would likely be something like what git-bash is using (i.e., a customized msys)
Phew! There we go - one of the distcheck tests passed clean. Must have had a stale file somewhere in the old build.
@Sean distcheck full succeeded
(Initial success was on Ubuntu - also checking CentOS and GhostBSD)
distcheck-full passed on GhostBSD except for what looks like another manifestation of the BSD threading issue.
CentOS 8 distcheck-full passed.
@starseeker you are off by one throughout r 77191 .. strl*() all take the size of the allocated buffer. they ensure the last byte is nul so you don't want to drop the +1 that was added for nul.
@Sean - did I follow correctly? (r77193)
nope, wrong way
now off by 2
it's the size of the allocation
that's what strl wants
so find the buffer if it's static or the malloc/calloc size if it's dynamic
that's the size to put... they made it simple
so it's:
char str[COUNT];
func(..., str, COUNT);
You're saying inside func, it would be bu_strlcat(str, in_str, COUNT+1) to correctly copy in_str into str?
heh, no...
str[COUNT] ... strlcat(str, str2, COUNT)
But... isn't that what I had in 77191?
I saw str = malloc(cnt+1, ...) ... strlcat(str, .., cnt)
They may have both...
So looking at get_style_tag_for_cell...
fort.c:2397 it looks like it's getting cell_style_tag, which is TEXT_STYLE_TAG_MAX_SIZE
There was F_MALLOC((sz + 1) that was in the logic that got all up replaced with a call to bu_strdup in 77191....
yeah, that's probably the sz+1 I saw replaced with a strlcat sz, but then the definition of sz changed too
As far as I can tell, it's these 4 functions that are calling the strl logic:
get_style_tag_for_cell
get_reset_style_tag_for_cell
get_style_tag_for_content
get_reset_style_tag_for_content
They're all, as far as I can tell, writing into static buffers defined with constants passed to sz. The only more complex case is line 4469.
And I think that's right?
(I reverted 77193, btw)
4469 looks right
yeah, I should have read the file instead of the patch, greater context
Wonder if they'd take a strlcat patch upstream? As far as I know strlcpy is still non-standard...
Although we might be able to use strncpy there, actually...
can always use strn .. it's just much more error prone, especially when cat'ing into existing buffers like the 4469 case where it's appending
Did glibc ever add strlcpy? I know they were adamantly opposed to it for a long time...
Oh well, no biggie - easy to adjust for regress once the pattern is clear.
Now just need to make sure it works on Windows...
Ah, there we go.
https://sourceware.org/glibc/wiki/strlcpy
sigh. figures
looks like there was a second patch 6 years ago that's getting better reception but still not integrated / released
bu_strlcpy it is
I see MSVCRT isn't on board either
@starseeker seeing a weird ged plugin issue. the bigdb gtools test is failing on mac. It works when called directly in the build dir (e.g., src/gtools/tests/bigdb 1), but fails when run via ctest with:
Start 874: slow-bigdb_1gb
874: Test command: /Users/morrison/brlcad.trunk/.build/src/gtools/tests/bigdb "1"
874: Test timeout computed to be: 1500
874: bigdb_tops="unknown command: tops"
874: bigdb_idlen=43
874: bigdb_szlen=1073741825
874: bigdb_lenmatch=0
874: bigdb_strmatch=0
1/1 Test #874: slow-bigdb_1gb ...................***Failed 8.97 sec
I can only imagine this has something to do with the plugin system, but don't have time to debug it at the moment.
all the rtwiz tests have also been out for a while with tclcad init failures, assumed you know about those but maybe not? they report "can't find package cadwidgets::RtImage"
just fyi, haven't tried a rebuild today if you made a recent change as I'm in the middle of debugging session.
I think r77523 will take care of the bigdb error. Wasn't aware of rtwiz failures - they're not part of the standard tests because they can't be safely run in parallel. I'll see if I can run it to ground.
How are you launching the rtwizard tests? ninja regress-rtwizard just succeeded on Linux...
Ah, fails on BSD
OK...
Oh - it's a bundled vs system tcl issue, I'll bet...
@Sean I think r77525 should have it.
just got a successful regress-rtwizard on bz
@starseeker I don't know how long it's been an issue, but I'm seeing a pretty big behavior change on trunk with rt blocking until the framebuffer window is closed. This is on a default build on Mac, affecting both classic and non-classic runtime.
Anyone else seeing similar? (just run "rt" in windows on some geometry)
You mean launching rt from within MGED? Not seeing that on Linux...
Yes, it's blocking for some reason.
Might be the r77436-77454 libtclcad changes to fix an issue with Windows.
I finally got a test Mac graphical setup. Can confirm the MGED prompt ignoring input while rt is processing, although it seems to be a bit more subtle than just the framebuffer being up - if I wait long enough, it will resume taking input with the fb lingering.
It's almost as if rt and MGED are sharing channels. Definitely something to do with the rt callback handler rework
Very surprised this is unique to the Mac... wonder what's different?
the bad news is right at this second I have absolutely no idea how to fix it...
r77435 isn't any better, so it's an older issue than those changes...
r77073 has the same problem...
r76199 works
r76226 works
r76621 works
r76654 works
r76670 works
r76903 does not work
r76800 works
r76850 does not work
r76825 does not work
r76824 is the one that broke the GUI MGED interactivity
@Sean Can you see if r77998 fixes your issue on the Mac?
Sure, will do
Confirmed, appears to fix the issue.
nice work
so what was the problem?
I set up a handler on STDOUT, trying to fix a crash on Windows. It avoided the crash on Windows, but it was the wrong way to do it. It looks in the end like a copy-paste error setting up the original logic meant I was deleting a structure before stderr was actually clear (checked stdout twice instead of stdout+stderr), which (surprise) messed up the application. A stdout handler avoided the crash all right, but messed with a channel the callbacks shouldn't have been hooked up to.
It might explain a bit of quirky I/O behavior that's been reported on Windows too, although it's hard to know that for sure.
In retrospect I'm surprised it only showed in the Mac GUI configuration. Tcl/Tk has a rather irritating habit of "almost" working and masking problems, sometimes...
I'll have to double check everything is working now across all platforms, but if that's got it the last significant blocker I know of now is the gqa plot file problem. That was a user report, so I'll need to run it down before releasing.
@Sean I shifted a fair number of things down in the "do before release" queue - let me know if any of the ones I moved are "need to haves".
Okay will do.
The only concerning thing I've noticed is that I encountered mged command line corruption again recently. I need to fully rebuild clean to confirm as it's tricky to reproduce, but it's the same behavior we had a year ago when there was a bug in static initialization.
Becomes apparent when jumping around the mged command line in console mode, cutting to end of line, pasting, navigating with error keys, up arrow. Resulted in unpredictable but obvious command line corruption.
Hmm. Not ringing a bell - it was caused by static initialization error?
Yes, fortunately/unfortunately the bug was found before release and never made public, but that means there's no NEWS breadcrumb on what the exact cause was not that it matters here. I seem to recall the prior being bu init related and I don't think that code has changed. It's probably just some other corruption.
But like I said, I need to confirm with a clean build before throwing up red flags. Just a yellow caution for now. :)
Was having some errors compiling libbu
because there was an unresolved external symbol BU_SEM_DATETIME
. Seems like in bu_init.cpp
, there is extern "C" int BU_SEM_DATETIME
, but BU_SEM_DATETIME
is defined in datetime.cpp
which is a C++ file. Removing the "C" part fixed the error for me.
^ would changing something like this cause issues outside of Windows?
Possibly - what if we do r78245 instead? Does that work?
I'm still compiling r78244 right now, but I'll let you know.
Last updated: Jan 09 2025 at 00:46 UTC