Anyone else seeing distcheck-full errors on main? I'm getting a handful that look like valid bugs:
...
35/922 Test #33: bu_file ....................................................***Failed 0.01 sec
file [FAIL] test input file bu_file_test_dir/bu_file_1 is incorrectly reported to be executable
...
373/922 Test #374: bu_color_to_rgb_floats_1 ...................................***Failed 0.00 sec
Result: 0.752941,0.305882,0.839216
...
Start 922: ged_test_drawing_quad
919/922 Test #922: ged_test_drawing_quad ......................................Subprocess aborted***Exception: 0.00 sec
dyld[94599]: Library not loaded: @executable_path/../lib/libtcl8.6.dylib
Referenced from: <77324E71-4FC8-3FA9-B662-6DD9330C4344> /Volumes/X10/brlcad/.build.debug/distcheck-in_src_dir/brlcad-7.40.1/src/libged/tests/draw/ged_test_quad
Reason: tried: '/Volumes/X10/brlcad/.build.debug/distcheck-in_src_dir/brlcad-7.40.1/src/libged/tests/lib/libtcl8.6.dylib' (no such file)
920/922 Test #907: rt_cache_serial_multiple_different_objects ................. Passed 7.23 sec
921/922 Test #555: bu_mappedfile_serial_16384 ................................. Passed 100.26 sec
922/922 Test #556: bu_mappedfile_parallel_16384 ............................... Passed 120.57 sec
99% tests passed, 12 tests failed out of 922
Total Test time (real) = 125.61 sec
The following tests FAILED:
33 - bu_file (Failed)
374 - bu_color_to_rgb_floats_1 (Failed)
910 - rt_search_tests (Subprocess aborted)
914 - ged_test_tops_moss (Subprocess aborted)
915 - ged_test_list (Subprocess aborted)
916 - ged_test_material (Subprocess aborted)
917 - ged_test_search (Subprocess aborted)
918 - ged_test_drawing_basic (Subprocess aborted)
919 - ged_test_drawing_faceplate (Subprocess aborted)
920 - ged_test_drawing_lod (Subprocess aborted)
921 - ged_test_drawing_select (Subprocess aborted)
922 - ged_test_drawing_quad (Subprocess aborted)
Errors while running CTest
The first two look like bugs; the subprocess aborts look like something may be wrong in the relocatability dylib/executable editing.
I've not seen those errors. Are they only in the in-src-dir config, or do they pop up in others as well?
@starseeker I just ran distcheck-full and didn't see any other errors. I haven't run a plain distcheck yet, but was going to check that as well.
Just an update: I ran make check and the errors aren't specific to the distcheck config. Those tests fail in both debug and release, so it's something newish.
The same failures also show up on the RELEASE branch.
Looking into the lib issues, it looks like the unit tests are wired wrong, as if they were installed into bin:
morrison@Miniagua tests % ./ged_test_tops
dyld[47269]: Library not loaded: @executable_path/../lib/libtcl8.6.dylib
Referenced from: <82312D51-F510-33AC-9283-067C3FBAFF54> /Volumes/X10/brlcad.RELEASE/.build.release/src/libged/tests/ged_test_tops
Reason: tried: '/Volumes/X10/brlcad.RELEASE/.build.release/src/libged/lib/libtcl8.6.dylib' (no such file)
zsh: abort ./ged_test_tops
morrison@Miniagua tests % DYLD_LIBRARY_PATH=../../../lib ./ged_test_tops
Usage: ./ged_test_tops file.g
morrison@Miniagua tests % otool -L ./ged_test_tops
./ged_test_tops:
@rpath/libged.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libanalyze.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libwdb.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/liboptical.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@executable_path/../lib/libtcl8.6.dylib (compatibility version 8.6.0, current version 8.6.14)
@rpath/libdm.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libicv.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libnetpbm.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libutahrle.19.dylib (compatibility version 19.0.0, current version 19.0.1)
@rpath/librt.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libbrep.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libnmg.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libbv.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libbg.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libbn.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libOpenNURBS.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libmanifold.2.dylib (compatibility version 2.0.0, current version 2.4.5)
@rpath/libassimp.5.dylib (compatibility version 5.0.0, current version 5.4.1)
@rpath/libgeogram.1.dylib (compatibility version 1.0.0, current version 1.9.0)
@rpath/libpkg.20.dylib (compatibility version 20.0.0, current version 20.0.1)
@rpath/libbu.20.dylib (compatibility version 20.0.0, current version 20.0.1)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
@rpath/png.framework/Versions/1.6.43/png (compatibility version 0.0.0, current version 0.0.0)
@rpath/libz_brl.1.dylib (compatibility version 1.0.0, current version 1.3.1)
@rpath/liblmdb.dylib (compatibility version 0.0.0, current version 0.0.0)
@rpath/libregex_brl.1.dylib (compatibility version 1.0.0, current version 1.0.4)
morrison@Miniagua tests % pwd
/Volumes/X10/brlcad.RELEASE/.build.release/src/libged/tests
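A simplified sketch of why the @executable_path name breaks for binaries that don't live in bin, while an @rpath name can be satisfied. It only models two substitution rules (treating @loader_path like @executable_path is a simplification) and does not model DYLD_LIBRARY_PATH, which dyld searches by leaf name first and is why the DYLD_LIBRARY_PATH=../../../lib run above got past the load. The rpath entry used in the second call is hypothetical.

```python
import posixpath
from typing import List

def _expand(path: str, exe_dir: str) -> str:
    # Replace a leading @executable_path/@loader_path token with the executable's
    # directory. Treating @loader_path this way only holds when the loading image
    # is the executable itself (a simplification for this sketch).
    for token in ("@executable_path", "@loader_path"):
        if path.startswith(token + "/"):
            return posixpath.normpath(posixpath.join(exe_dir, path[len(token) + 1:]))
    return posixpath.normpath(path)

def dyld_candidates(install_name: str, executable_path: str, rpaths: List[str]) -> List[str]:
    """Return the paths a (much simplified) dyld would try for one load command."""
    exe_dir = posixpath.dirname(executable_path)
    if install_name.startswith("@rpath/"):
        leaf = install_name[len("@rpath/"):]
        # Each rpath entry is tried in order with the leaf name appended.
        return [posixpath.join(_expand(rp, exe_dir), leaf) for rp in rpaths]
    # An @executable_path name resolves to exactly one place, fixed by where the binary sits.
    return [_expand(install_name, exe_dir)]

exe = "/Volumes/X10/brlcad.RELEASE/.build.release/src/libged/tests/ged_test_tops"
print(dyld_candidates("@executable_path/../lib/libtcl8.6.dylib", exe, []))
# Hypothetical rpath entry pointing back at the build tree's lib directory:
print(dyld_candidates("@rpath/libtcl8.6.dylib", exe, ["@executable_path/../../../lib"]))
```

The first call reproduces the src/libged/lib path from the dyld error; the second lands in the build tree's top-level lib, the same directory the DYLD_LIBRARY_PATH workaround pointed at.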
@starseeker I investigated the dyld issue for a few hours and don't understand why libtcl is being changed/installed with an @executable_path install name baked into the lib. Do you recall why?
That seems to be the fundamental issue. If I replace @executable_path/../lib/libtcl8.6.dylib with @rpath/libtcl8.6.dylib like all the other libs, it works just fine. I had to replace them in libtcl itself as well as all the libs and apps that link Tcl, but presumably they're inheriting from libtcl's original setting.
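Finding every binary that still carries the @executable_path reference can be scripted from `otool -L` output. A minimal sketch, assuming the output format shown in the transcript above; it only builds `install_name_tool -change <old> <new> <file>` commands and does not run them. For libtcl itself, the library's own ID would be changed with `install_name_tool -id` rather than `-change`.

```python
import posixpath
from typing import List

# Install names expected to stay absolute (system libraries).
SYSTEM_PREFIXES = ("/usr/lib/", "/System/Library/")

def parse_otool_L(text: str) -> List[str]:
    """Extract install names from `otool -L` output; the first line names the file itself."""
    names = []
    for line in text.splitlines()[1:]:
        line = line.strip()
        if line:
            # Entries look like "<install name> (compatibility version ..., current version ...)".
            names.append(line.split(" (compatibility", 1)[0])
    return names

def rpath_fixups(binary: str, otool_text: str) -> List[List[str]]:
    """Build install_name_tool commands rewriting non-@rpath, non-system names to @rpath/<leaf>."""
    cmds = []
    for name in parse_otool_L(otool_text):
        if name.startswith("@rpath/") or name.startswith(SYSTEM_PREFIXES):
            continue
        cmds.append(["install_name_tool", "-change", name,
                     "@rpath/" + posixpath.basename(name), binary])
    return cmds

sample = """./ged_test_tops:
\t@rpath/libged.20.dylib (compatibility version 20.0.0, current version 20.0.1)
\t@executable_path/../lib/libtcl8.6.dylib (compatibility version 8.6.0, current version 8.6.14)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
"""
for cmd in rpath_fixups("./ged_test_tops", sample):
    print(" ".join(cmd))
```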
I found the section in the patch file where you override it, but wanted to check why before mucking with it.
At a guess I probably was using the settings from elsewhere in our build, and didn't properly clue in that I needed to go that route for libtcl
No other library appears to do that; I don't see it anywhere else, but maybe I'm missing it. That's why I was wondering why you have that specific patch being applied to Tcl's build logic, to make it use @executable_path.
I don't recall anything specific. My guess would be an error on my part.
I might have an idea about what I was thinking, looking at the Tcl patch. The first part of the patch, with the logic for CFG_RUNTIME and CFG_INSTALL, depends on Tcl_GetNameOfExecutable having something that will work with the relative paths.
It's basically similar to what happens when _bu_dir_brlcad_root calls wai_getExecutablePath, except in the Tcl case we don't have the equivalent of a fallback to wai_getModulePath.
In a "normal" Tcl install those paths are absolute to the install path, so it's not an issue, but that's not an option for a relocatable Tcl install.
That may be why I went ahead and set that DYLIB_INSTALL_DIR the way I did, since there wouldn't be any expectation of things working without a suitable path from Tcl_GetNameOfExecutable
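To illustrate the missing fallback: a sketch of root discovery that tries the executable's location first and then the library's own location. This is not Tcl's or libbu's code; the two-levels-up layout rule and the marker file are assumptions (a standard Tcl install keeps init.tcl under lib/tcl8.6). A unit test binary in src/libged/tests only finds the root through the module fallback.

```python
import posixpath
from typing import Callable, Optional

# Assumed marker: Tcl's script library normally ships init.tcl under <root>/lib/tcl8.6.
MARKER = "lib/tcl8.6/init.tcl"

def find_root(exe_path: Optional[str], module_path: Optional[str],
              exists: Callable[[str], bool]) -> Optional[str]:
    """Guess a relocatable install root, trying the executable first, then the module.

    Assumes <root>/bin/<exe> and <root>/lib/<library> layouts, so the root is two
    directory levels above either image. `exists` is injected to keep this testable.
    """
    for image in (exe_path, module_path):
        if not image:
            continue
        root = posixpath.dirname(posixpath.dirname(image))
        if exists(posixpath.join(root, MARKER)):
            return root
    return None

files = {"/b/lib/tcl8.6/init.tcl"}
print(find_root("/b/src/libged/tests/ged_test_tops", "/b/lib/libtcl8.6.dylib", files.__contains__))
```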
I'll try the @rpath setting anyway and we can see what happens
Pushed to main
At least on an auto debug rebuild, I'm hitting an error that is probably unrelated, but I think it's the same error Bill was getting:
[ 6%] Building C object CMakeFiles/itcl3.4.dir/generic/itclStubInit.c.o
-- stderr output is:
In file included from /Volumes/X10/brlcad/.build/bext_build/itcl/ITCL_BLD-prefix/src/ITCL_BLD/generic/itclStubInit.c:12:
/Volumes/X10/brlcad/.build/bext_build/itcl/ITCL_BLD-prefix/src/ITCL_BLD/generic/itclInt.h:50:10: fatal error: 'tclInt.h' file not found
#include "tclInt.h"
^~~~~~~~~~
1 error generated.
make[5]: *** [CMakeFiles/itcl3.4.dir/generic/itclStubInit.c.o] Error 1
On a debug BRLCAD_BEXT_DIR build, it gets much, much farther, but fails during asc2g:
[ 62%] Built target asc2g
[ 62%] Generating ../share/db/bldg391.g
[ 62%] Built target bldg391.g
[ 62%] Generating ../share/db/m35.g
[ 62%] Built target m35.g
[ 62%] Generating ../share/db/moss.g
[ 62%] Built target moss.g
[ 62%] Generating ../share/db/sphflake.g
[ 62%] Built target sphflake.g
[ 62%] Generating ../share/db/star.g
[ 62%] Built target star.g
[ 62%] Generating ../share/db/world.g
[ 62%] Built target world.g
[ 62%] Generating ../share/db/aet.g
CMake Error at aet.cmake:36 (message):
[aet] Failure: 1
/Volumes/X10/brlcad/.build.debug/bin/asc2g
/Volumes/X10/brlcad/db/aet.asc;/Volumes/X10/brlcad/.build.debug/share/db/aet.g"
Failed to process input file (/Volumes/X10/brlcad/db/aet.asc)!
unknown command: title
Release also fails on the same asc2g step.
OK. The latter means Tcl isn't functioning correctly, probably not initializing correctly. If you try running btclsh or bwish manually, do you get some kind of error report? I would expect one.
The tclInt.h bit is Itcl not finding an internal Tcl header it needs to build
Hold on about the latter: it doesn't look like it applied your rpath change
The Itcl CMakeLists.txt file should be telling the Itcl build where to go looking for Tcl private headers - we should be passing in TCL_SOURCE_DIR to the parent ExternalProject_Add for Itcl
-DTCL_SOURCE_DIR=${CMAKE_SOURCE_DIR}/tcl/tcl
Sigh... so I must be crazy, but short of blowing away an entire bext build dir, how "should" I be getting libtcl to recompile correctly?
If the tcl/tcl submodule isn't populated, that's when I would expect it not to find the header
I clear the build subdirectory for the specific target I'm wanting to reset. So, if I needed to start over with Tcl, I'd clear the build/tcl subdirectory
CMake should automatically re-run and reset things
Just doing a git pull and rebuild is what I'd expect to work, and that didn't.
Once the ExternalProject_Add outputs are populated, an update of the source directory doesn't (in my experience) reliably reset things
So, make clean in bext/.build/tcl?
That might do it, but I would straight up remove bext/.build/tcl
You can then do make TCL_BLD to just redo the Tcl build and dependencies
starseeker said:
Once the ExternalProject_Add outputs are populated, an update of the source directory doesn't (in my experience) reliably reset things
Is there any intuition as to why that's not working? Is it because of something in our CMake? What can we do about it, short of massive deletions?
Because that's going to bite: every time we git pull on bext, we're going to have to either blow everything away whenever there's an update (which means crazy dev cycle times) or hope to catch which subdirs changed in each pull and manually make clean each of them (make clean in the tcl subdir worked, btw).
I'm not completely sure of the reasons. I do have the parent build always re-executing the build steps every time the targets are run, specifically to try to catch such things, but I think that may be only the build step.
I did notice it seems to reliably avoid rework once compile + install is complete
I don't know if it will re-execute the configure step reliably - that may be up to the build system itself (the way CMake spots and knows to re-run configure if the build files change.)
So if Tcl isn't smart about that, we may not be getting reconfigures we need after a source update
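A sketch of the kind of check that would catch a source update the configure step misses: compare the newest file in the source tree against the configure step's stamp file. This is not how ExternalProject works internally, and the stamp file name in the example is an assumption. On the CMake side, ExternalProject_Add_StepDependencies (which adds file-level dependencies to a named step such as configure) may be worth looking at.

```python
import os

def newest_source_mtime(source_dir: str) -> float:
    """Newest modification time of any file under source_dir, skipping .git directories."""
    newest = 0.0
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
    return newest

def needs_reconfigure(source_dir: str, configure_stamp: str) -> bool:
    """True if configure never ran or any source file is newer than its stamp."""
    if not os.path.exists(configure_stamp):
        return True
    return newest_source_mtime(source_dir) > os.path.getmtime(configure_stamp)
```

Walking a large tree like Qt on every build has a cost, which is one reason build systems prefer explicit dependency lists over blanket timestamp scans.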
Here's another error from a bext release rebuild (after clearing and rebuilding tcl):
CMake Error at /Volumes/X10/brlcad.bext/.build.release/qt/Qt6_BLD-prefix/src/Qt6_BLD-stamp/Qt6_BLD-build-Release.cmake:37 (message):
Command failed: 1
'/opt/homebrew/Cellar/cmake/3.28.3/bin/cmake' '-DCMAKE_BUILD_TYPE=Release' '-P' '/Volumes/X10/brlcad.bext/.build.release/qt/qt_build.cmake'
See also
/Volumes/X10/brlcad.bext/.build.release/qt/Qt6_BLD-prefix/src/Qt6_BLD-stamp/Qt6_BLD-build-*.log
-- stdout output is:
-- stderr output is:
CMake Error at /Volumes/X10/brlcad.bext/.build.release/qt/qt_build.cmake:18 (message):
Qt build failed: [ 0%] Built target syncqt
[ 1%] Built target Core_lib_pri
[ 1%] Built target qmodule_pri
Error copying directory from
"/Volumes/X10/brlcad.bext/.build.release/qt/qt6-build/include/QtCore/.syncqt_staging"
to
"/Volumes/X10/brlcad.bext/.build.release/qt/qt6-build/lib/QtCore.framework/Versions/A/Headers".
make[5]: *** [src/corelib/CMakeFiles/Core_copy_fw_sync_headers] Error 1
make[4]: *** [src/corelib/CMakeFiles/Core_copy_fw_sync_headers.dir/all]
That one is surprising; it appears to be internal to the Qt build itself
That's why I remove the tcl subdirectory in such a case - it reliably reconfigures and rebuilds from scratch
CMake Warning (dev) at /opt/homebrew/Cellar/cmake/3.28.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
The package name passed to find_package_handle_standard_args (OpenCV) does not match the name of the calling package (OPENCV). This can lead to problems in calling code that expects find_package result variables (e.g., _FOUND) to follow a certain pattern.
Call Stack (most recent call first):
/opt/homebrew/lib/cmake/opencv4/OPENCVConfig.cmake:354 (find_package_handle_standard_args)
CMakeLists.txt:374 (find_package)
opencv/CMakeLists.txt:4 (bext_enable)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at /opt/homebrew/Cellar/cmake/3.28.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:447 (message):
find_package() specify a version range but the module TCL does not support this capability. Only the lower endpoint of the range will be used.
Call Stack (most recent call first):
CMake/FindTCL.cmake:461 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
tcl/CMakeLists.txt:24 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
Looks like the @rpath fix to Tcl is working: at least in a preliminary build, make check is back to working without that issue.
Remaining check failures are:
The following tests FAILED:
33 - bu_file (Failed)
374 - bu_color_to_rgb_floats_1 (Failed)
918 - ged_test_drawing_basic (Failed)
920 - ged_test_drawing_lod (Failed)
921 - ged_test_drawing_select (Failed)
fixed bu_file, test had a flawed assumption about defaults
fixed bu_color_to_rgb_floats_1
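For reference, the reported floats are exactly 192/255, 78/255 and 214/255 printed to six places. The thread doesn't say what the test's input was or what the fix was, so the byte values below are inferred from the printed result. The sketch just shows that such values only compare reliably with a tolerance.

```python
from typing import Sequence, Tuple

def rgb_bytes_to_floats(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Map 0-255 channel values to 0.0-1.0 floats."""
    return tuple(c / 255.0 for c in rgb)

def floats_match(a: Sequence[float], b: Sequence[float], tol: float = 1e-6) -> bool:
    """Channel-wise comparison with an absolute tolerance instead of exact equality."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

result = rgb_bytes_to_floats((192, 78, 214))
print(",".join(f"{v:.6f}" for v in result))
```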
(note @rpath is fixed on bext main, not bext RELEASE)
OK, I got a look at the drawing failures
The lod failure is doing an exact check on an image that is off by a single pixel, so that should probably just be switched to an approximate check.
There may be more failures after that one, so I'll have to try and see what happens.
Actually, it's even just one pixel off by one.
The basic test difference is a bit larger, but looking at the diff image it appears to be a difference in how the tor got tessellated
Or more likely (since the prior draw of the bot didn't trigger) it's a difference in which lines ended up visible in the "hidden line" draw mode.
Looks like the approximate compare is still a bit too strict for that one, so I'll adjust
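A minimal sketch of an approximate image comparison with two knobs: how far a channel may drift before a pixel counts as bad, and how many bad pixels are tolerated. The parameter names and thresholds are hypothetical, not the ones used by the drawing tests. A one-pixel, off-by-one difference passes with small settings, while a tessellation or hidden-line difference produces many large deltas and needs a looser pixel budget.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

def compare_images(a: Image, b: Image, max_channel_delta: int = 1,
                   max_bad_pixels: int = 0) -> Tuple[bool, int, int]:
    """Return (passed, differing_pixels, bad_pixels) for two equally sized RGB images.

    A pixel is "bad" when its largest per-channel difference exceeds max_channel_delta.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("image sizes differ")
    differing = bad = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            delta = max(abs(x - y) for x, y in zip(pa, pb))
            if delta:
                differing += 1
            if delta > max_channel_delta:
                bad += 1
    return bad <= max_bad_pixels, differing, bad

ref = [[(0, 0, 0)] * 4 for _ in range(4)]
out = [row[:] for row in ref]
out[2][1] = (1, 0, 0)  # one pixel off by one
print(compare_images(ref, out, max_channel_delta=0))  # exact check
print(compare_images(ref, out))                       # approximate check
```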
The real problem is the third case.
To my astonishment, it is in fact failing to tessellate the tor primitive
This is due to the select test changing the relative tolerance before facetizing the tor
Running the process manually we can see more info:
mged> facetize -vv tor t2.bot
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods NMG --method-opts NMG nmg_debug=0x00000000 tol_abs=0.00000000000000000 tol_rel=0.00010000000000000 tol_norm=0.00000000000000000 --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
cut_unimonotone(): infinite loop 0x60000231aac0
cut_unimonotone(): infinite loop
FAILED.
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods CM --method-opts "CM" --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
feature_size: 0.000000
feature_scale: 0.150000
target_feature_size: 0.761658
CM: error at size 10.1554
CM: retrying with size 1.01554
CM: error at size 1.01554
CM: retrying with size 0.507772
CM: surface reconstruction failed: tor
FAILED.
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods CO3NE --method-opts "CO3NE" --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
feature_size: 0.000000
feature_scale: 0.150000
target_feature_size: 0.761658
Geogram result not manifold
o-[reconstruct ] Elapsed time: 3.013s
FAILED.
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods SPSR --method-opts "SPSR" --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
feature_size: 0.000000
feature_scale: 0.150000
target_feature_size: 0.761658
SPSR: decimating with feature size: 1.30208
bu_mtx_trylock() failed
bu_mtx_trylock() failed
Saving stack trace to facetize_process-83790-bomb.log
bu_mtx_lock() failed
bu_mtx_lock() failed
FAILED.
Primitive tessellation summary:
1 object(s) failed:
tor
It's annoying that the fallback methods fail (the SPSR decimation is where those bu_mtx failures are coming from) but the real problem is the NMG "cut_unimonotone" failure.
For a basic tor like that we should never have gotten to the fallbacks at all
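A toy model of the control flow visible in the log: NMG first, then CM, CO3NE and SPSR, each in its own facetize_process run. It is not facetize's code; the callables stand in for the per-method runs. It just makes the point that a working NMG result short-circuits everything after it, so a fallback failure only surfaces once NMG has already failed.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Method = Tuple[str, Callable[[str], bool]]

def facetize_with_fallbacks(obj: str, methods: Sequence[Method]) -> Tuple[Optional[str], List[str]]:
    """Try each method in order; return the first that succeeds plus the names that failed."""
    failed: List[str] = []
    for name, run in methods:
        if run(obj):
            return name, failed
        failed.append(name)
    return None, failed

# Stand-ins for the method runs seen in the log; each returns True on success.
order = ["NMG", "CM", "CO3NE", "SPSR"]
all_fail = [(m, lambda o: False) for m in order]
print(facetize_with_fallbacks("tor", all_fail))
```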
The tol setting from the select.cpp setup is: "tol rel 0.0001"
A facetize without changing the rel tolerance does succeed
Looks like in the LoD test I changed this to 0.0002, probably to avoid a similar failure. Not sure why I missed the select.
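The thread doesn't say why 0.0001 trips cut_unimonotone. As a rough illustration of what tightening the tolerance does to a round primitive: a chord spanning angle θ strays at most r(1 - cos(θ/2)) from a circle of radius r, so an absolute tolerance t needs n = ceil(π / acos(1 - t/r)) segments. Assuming the rel value is turned into an absolute tolerance by scaling with some object size (both the scaling rule and the size below are assumptions, not BRL-CAD's exact formula), halving rel multiplies the segment count by about sqrt(2), since acos(1 - x) ≈ sqrt(2x) for small x.

```python
import math

def segments_for_circle(radius: float, abs_tol: float) -> int:
    """Chords needed so no chord strays more than abs_tol from a circle of this radius."""
    # A chord spanning angle theta deviates from the arc by radius * (1 - cos(theta / 2)).
    theta = 2.0 * math.acos(1.0 - abs_tol / radius)
    return math.ceil(2.0 * math.pi / theta)

# Hypothetical scaling: absolute tolerance = rel * object size.
size, radius = 2.0, 1.0
counts = {rel: segments_for_circle(radius, rel * size) for rel in (0.0002, 0.0001)}
print(counts)
```

More, thinner facets per ring is one plausible way a tighter rel setting could stress the NMG triangulation, but that remains a hypothesis.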
OK, we'll see if a248e798bc2c passes. If so I'll pull the necessary changes over to RELEASE.
At some point we'll need to do something about that NMG failure, but that could be quite the rabbit hole
Last updated: Jan 09 2025 at 00:46 UTC