Stream: brlcad

Topic: distcheck


view this post on Zulip Sean (Aug 01 2024 at 17:30):

Anyone else seeing distcheck-full errors on main? I'm getting a handful that look like valid bugs:

...
 35/922 Test  #33: bu_file ....................................................***Failed    0.01 sec

file [FAIL] test input file bu_file_test_dir/bu_file_1 is incorrectly reported to be executable
...
373/922 Test #374: bu_color_to_rgb_floats_1 ...................................***Failed    0.00 sec
Result: 0.752941,0.305882,0.839216
...
Start 922: ged_test_drawing_quad
919/922 Test #922: ged_test_drawing_quad ......................................Subprocess aborted***Exception:   0.00 sec
dyld[94599]: Library not loaded: @executable_path/../lib/libtcl8.6.dylib
  Referenced from: <77324E71-4FC8-3FA9-B662-6DD9330C4344> /Volumes/X10/brlcad/.build.debug/distcheck-in_src_dir/brlcad-7.40.1/src/libged/tests/draw/ged_test_quad
  Reason: tried: '/Volumes/X10/brlcad/.build.debug/distcheck-in_src_dir/brlcad-7.40.1/src/libged/tests/lib/libtcl8.6.dylib' (no such file)

920/922 Test #907: rt_cache_serial_multiple_different_objects .................   Passed    7.23 sec
921/922 Test #555: bu_mappedfile_serial_16384 .................................   Passed  100.26 sec
922/922 Test #556: bu_mappedfile_parallel_16384 ...............................   Passed  120.57 sec

99% tests passed, 12 tests failed out of 922

Total Test time (real) = 125.61 sec

The following tests FAILED:
         33 - bu_file (Failed)
        374 - bu_color_to_rgb_floats_1 (Failed)
        910 - rt_search_tests (Subprocess aborted)
        914 - ged_test_tops_moss (Subprocess aborted)
        915 - ged_test_list (Subprocess aborted)
        916 - ged_test_material (Subprocess aborted)
        917 - ged_test_search (Subprocess aborted)
        918 - ged_test_drawing_basic (Subprocess aborted)
        919 - ged_test_drawing_faceplate (Subprocess aborted)
        920 - ged_test_drawing_lod (Subprocess aborted)
        921 - ged_test_drawing_select (Subprocess aborted)
        922 - ged_test_drawing_quad (Subprocess aborted)
Errors while running CTest

The first two look like bugs; the subprocess aborts look like something may be wrong in the relocatability dylib/exec editing.
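The split between plain failures and aborted subprocesses can be pulled out of the CTest summary mechanically. A minimal sketch, assuming the summary format shown above; the function name `group_ctest_failures` is hypothetical and the sample is an excerpt of the list in this post:

```python
import re
from collections import defaultdict

def group_ctest_failures(summary: str) -> dict:
    """Group lines like '33 - bu_file (Failed)' by their failure kind."""
    pattern = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\((.+)\)\s*$")
    groups = defaultdict(list)
    for line in summary.splitlines():
        m = pattern.match(line)
        if m:
            # group(2) is the test name, group(3) the reason in parentheses
            groups[m.group(3)].append(m.group(2))
    return dict(groups)

sample = """\
The following tests FAILED:
         33 - bu_file (Failed)
        374 - bu_color_to_rgb_floats_1 (Failed)
        910 - rt_search_tests (Subprocess aborted)
        922 - ged_test_drawing_quad (Subprocess aborted)
"""

for kind, names in group_ctest_failures(sample).items():
    print(kind, names)
```

Tests that abort before running any test logic tend to share one environmental cause (here, the dyld load failure), while the "Failed" entries are more likely individual bugs.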

view this post on Zulip starseeker (Aug 01 2024 at 18:52):

I've not seen those errors - are they only in the in-src-dir config or do they pop up in others as well?

view this post on Zulip Sean (Aug 01 2024 at 21:31):

@starseeker I just ran distcheck-full… didn’t see any other errors. I haven’t run a plain distcheck yet, but was going to check that as well.

view this post on Zulip Sean (Aug 02 2024 at 01:07):

Just an update: ran make check and the errors are build-type agnostic. Those tests fail in both debug and release, so it's something newish.

view this post on Zulip Sean (Aug 02 2024 at 06:08):

Failures are also on RELEASE branch just the same.

Looking into the lib issues, looks like it's because the unit tests are wired wrong, as if they were installed into bin:

morrison@Miniagua tests % ./ged_test_tops
dyld[47269]: Library not loaded: @executable_path/../lib/libtcl8.6.dylib
  Referenced from: <82312D51-F510-33AC-9283-067C3FBAFF54> /Volumes/X10/brlcad.RELEASE/.build.release/src/libged/tests/ged_test_tops
  Reason: tried: '/Volumes/X10/brlcad.RELEASE/.build.release/src/libged/lib/libtcl8.6.dylib' (no such file)
zsh: abort      ./ged_test_tops
morrison@Miniagua tests % DYLD_LIBRARY_PATH=../../../lib ./ged_test_tops
Usage: ./ged_test_tops file.g
morrison@Miniagua tests % otool -L ./ged_test_tops
./ged_test_tops:
    @rpath/libged.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libanalyze.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libwdb.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/liboptical.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @executable_path/../lib/libtcl8.6.dylib (compatibility version 8.6.0, current version 8.6.14)
    @rpath/libdm.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libicv.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libnetpbm.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libutahrle.19.dylib (compatibility version 19.0.0, current version 19.0.1)
    @rpath/librt.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libbrep.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libnmg.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libbv.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libbg.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libbn.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libOpenNURBS.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libmanifold.2.dylib (compatibility version 2.0.0, current version 2.4.5)
    @rpath/libassimp.5.dylib (compatibility version 5.0.0, current version 5.4.1)
    @rpath/libgeogram.1.dylib (compatibility version 1.0.0, current version 1.9.0)
    @rpath/libpkg.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @rpath/libbu.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
    @rpath/png.framework/Versions/1.6.43/png (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libz_brl.1.dylib (compatibility version 1.0.0, current version 1.3.1)
    @rpath/liblmdb.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libregex_brl.1.dylib (compatibility version 1.0.0, current version 1.0.4)
morrison@Miniagua tests % pwd
/Volumes/X10/brlcad.RELEASE/.build.release/src/libged/tests
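The path dyld "tried" follows directly from how `@executable_path` expands: it is replaced by the directory containing the running executable. A minimal sketch of that expansion, using the paths from the output above; `resolve_executable_path` is a hypothetical helper, not part of dyld or BRL-CAD:

```python
import posixpath

def resolve_executable_path(install_name: str, exe: str) -> str:
    """Expand a leading @executable_path to the directory holding `exe`."""
    prefix = "@executable_path"
    if not install_name.startswith(prefix):
        return install_name
    exe_dir = posixpath.dirname(exe)
    return posixpath.normpath(exe_dir + install_name[len(prefix):])

name = "@executable_path/../lib/libtcl8.6.dylib"

# A unit test living in src/libged/tests: lands in src/libged/lib (does not exist).
test_exe = "/Volumes/X10/brlcad.RELEASE/.build.release/src/libged/tests/ged_test_tops"
print(resolve_executable_path(name, test_exe))

# A program living in <build>/bin: lands in <build>/lib, which is why it works there.
bin_exe = "/Volumes/X10/brlcad/.build.debug/bin/asc2g"
print(resolve_executable_path(name, bin_exe))
```

The first result matches the location in the dyld error; only executables sitting one level below the directory that contains `lib` resolve correctly.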

view this post on Zulip Sean (Aug 02 2024 at 20:24):

@starseeker I investigated the dyld issue for a few hours and don't understand why libtcl is being changed/installed with an @executable_path/../lib install name. Do you recall why?

That seems to fundamentally be the issue. If I replace @executable_path/../lib/libtcl8.6.dylib with @rpath/libtcl8.6.dylib like all the other libs, it works just fine. I had to replace them in libtcl itself as well as all the libs and apps that link tcl, but presumably they're inheriting from libtcl's original setting.

I found the section in the patch file where you override it, but wanted to check why before mucking with it.
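Finding every binary that still carries the `@executable_path` reference means scanning `otool -L` output for each lib and app. A minimal sketch of the text-parsing half, assuming the `otool -L` layout shown earlier in the thread; the helper name is hypothetical and the sample is an excerpt of that output:

```python
def executable_path_dependencies(otool_output: str) -> list:
    """Return dependencies recorded with an @executable_path install name."""
    deps = []
    # The first line of `otool -L` output is the inspected file itself.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        name = line.split(" (compatibility", 1)[0]
        if name.startswith("@executable_path"):
            deps.append(name)
    return deps

sample_otool = """\
./ged_test_tops:
    @rpath/libged.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    @executable_path/../lib/libtcl8.6.dylib (compatibility version 8.6.0, current version 8.6.14)
    @rpath/libbu.20.dylib (compatibility version 20.0.0, current version 20.0.1)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1)
"""

print(executable_path_dependencies(sample_otool))
```

Running something like this over every built binary would list what needs its libtcl reference rewritten to `@rpath/libtcl8.6.dylib`.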

view this post on Zulip starseeker (Aug 02 2024 at 22:27):

At a guess I probably was using the settings from elsewhere in our build, and didn't properly clue in that I needed to go that route for libtcl

view this post on Zulip Sean (Aug 03 2024 at 05:50):

No other library appears to do that. I don't see it anywhere else, but maybe I'm missing it. That's why I was wondering why you have that specific patch being applied to Tcl's build logic, to make it use @executable_path.

view this post on Zulip starseeker (Aug 03 2024 at 13:14):

I don't recall anything specific - most likely an error on my part, would be my guess

view this post on Zulip starseeker (Aug 05 2024 at 22:33):

I might have an idea about what I was thinking, looking at the Tcl patch. The first part of the patch, with the logic for CFG_RUNTIME and CFG_INSTALL, depends on Tcl_GetNameOfExecutable having something that will work with the relative paths.

view this post on Zulip starseeker (Aug 05 2024 at 22:34):

It's basically similar to what happens when _bu_dir_brlcad_root calls wai_getExecutablePath - except in the Tcl case we don't have the equivalent of a fallback to wai_getModulePath

view this post on Zulip starseeker (Aug 05 2024 at 22:36):

In a "normal" Tcl install those paths are absolute to the install path, so it's not an issue, but that's not an option for a relocatable Tcl install.

view this post on Zulip starseeker (Aug 05 2024 at 22:38):

That may be why I went ahead and set that DYLIB_INSTALL_DIR the way I did, since there wouldn't be any expectation of things working without a suitable path from Tcl_GetNameOfExecutable
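The dependency described here, locating a relocatable install from the executable's own path, can be illustrated in a few lines. This is only a sketch of the general idea, not the logic in the Tcl patch or in `_bu_dir_brlcad_root`; the function name and the `".."` relative offset are assumptions for illustration:

```python
import posixpath
from typing import Optional

def root_from_executable(exe: Optional[str], rel: str = "..") -> Optional[str]:
    """Derive an install root from the executable's path, or None if unknown."""
    if not exe:
        # Without an executable name there is nothing to anchor relative
        # paths to, and (per the thread) Tcl has no module-path fallback.
        return None
    return posixpath.normpath(posixpath.join(posixpath.dirname(exe), rel))

print(root_from_executable("/Volumes/X10/brlcad/.build.debug/bin/asc2g"))
print(root_from_executable(None))
```

When the executable path is unavailable the lookup has no answer at all, which is the case where a fallback such as `wai_getModulePath` matters on the BRL-CAD side.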

view this post on Zulip starseeker (Aug 05 2024 at 23:28):

I'll try the @rpath setting anyway and we can see what happens

view this post on Zulip starseeker (Aug 05 2024 at 23:30):

Pushed to main

view this post on Zulip Sean (Aug 06 2024 at 14:38):

At least on an auto debug rebuild, I'm hitting an error that is probably unrelated, but I think it's the same error bill was getting:

[  6%] Building C object CMakeFiles/itcl3.4.dir/generic/itclStubInit.c.o
-- stderr output is:
In file included from /Volumes/X10/brlcad/.build/bext_build/itcl/ITCL_BLD-prefix/src/ITCL_BLD/generic/itclStubInit.c:12:
/Volumes/X10/brlcad/.build/bext_build/itcl/ITCL_BLD-prefix/src/ITCL_BLD/generic/itclInt.h:50:10: fatal error: 'tclInt.h' file not found
#include "tclInt.h"
^~~~~~~~~~
1 error generated.
make[5]: *** [CMakeFiles/itcl3.4.dir/generic/itclStubInit.c.o] Error 1

view this post on Zulip Sean (Aug 06 2024 at 14:39):

On a debug BRLCAD_BEXT_DIR build, it gets much much farther, but fails during asc2g:

[ 62%] Built target asc2g
[ 62%] Generating ../share/db/bldg391.g
[ 62%] Built target bldg391.g
[ 62%] Generating ../share/db/m35.g
[ 62%] Built target m35.g
[ 62%] Generating ../share/db/moss.g
[ 62%] Built target moss.g
[ 62%] Generating ../share/db/sphflake.g
[ 62%] Built target sphflake.g
[ 62%] Generating ../share/db/star.g
[ 62%] Built target star.g
[ 62%] Generating ../share/db/world.g
[ 62%] Built target world.g
[ 62%] Generating ../share/db/aet.g
CMake Error at aet.cmake:36 (message):
[aet] Failure: 1
/Volumes/X10/brlcad/.build.debug/bin/asc2g
/Volumes/X10/brlcad/db/aet.asc;/Volumes/X10/brlcad/.build.debug/share/db/aet.g"
Failed to process input file (/Volumes/X10/brlcad/db/aet.asc)!
unknown command: title

view this post on Zulip Sean (Aug 06 2024 at 14:39):

release fails also on same asc2g

view this post on Zulip starseeker (Aug 06 2024 at 14:40):

OK. The latter means Tcl isn't functioning correctly - probably not initializing correctly. If you try manually running btclsh or bwish I would expect some kind of error report?

view this post on Zulip starseeker (Aug 06 2024 at 14:41):

The tclInt.h bit is Itcl not finding an internal Tcl header it needs to build

view this post on Zulip Sean (Aug 06 2024 at 14:41):

Hold on the latter -- it doesn't look like it applied your rpath change

view this post on Zulip starseeker (Aug 06 2024 at 14:43):

The Itcl CMakeLists.txt file should be telling the Itcl build where to go looking for Tcl private headers - we should be passing in TCL_SOURCE_DIR to the parent ExternalProject_Add for Itcl

view this post on Zulip starseeker (Aug 06 2024 at 14:43):

-DTCL_SOURCE_DIR=${CMAKE_SOURCE_DIR}/tcl/tcl

view this post on Zulip Sean (Aug 06 2024 at 14:44):

sigh .. so I must be crazy, but outside of blowing away an entire bext build dir, how "should" I be getting libtcl to recompile correctly?

view this post on Zulip starseeker (Aug 06 2024 at 14:44):

If the tcl/tcl submodule isn't populated, that's when I would expect it not to find the header

view this post on Zulip starseeker (Aug 06 2024 at 14:44):

I clear the build subdirectory for the specific target I'm wanting to reset. So, if I needed to start over with Tcl, I'd clear the build/tcl subdirectory

view this post on Zulip starseeker (Aug 06 2024 at 14:45):

CMake should automatically re-run and reset things

view this post on Zulip Sean (Aug 06 2024 at 14:45):

just doing a git pull and rebuild is what I'd expect and that didn't work.

view this post on Zulip starseeker (Aug 06 2024 at 14:45):

Once the ExternalProject_Add outputs are populated, an update of the source directory doesn't (in my experience) reliably reset things

view this post on Zulip Sean (Aug 06 2024 at 14:45):

so make clean in bext/.build/tcl ?

view this post on Zulip starseeker (Aug 06 2024 at 14:46):

That might do it - I would straight up remove bext/.build/tcl

view this post on Zulip starseeker (Aug 06 2024 at 14:46):

You can then do make TCL_BLD to just redo the Tcl build and dependencies
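The reset procedure described here (remove one target's build subdirectory, then rebuild just that target) could be scripted roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, the subdirectory and target names are passed in explicitly because their mapping is not uniform, and the helper only returns the rebuild command rather than running it:

```python
import shutil
import tempfile
from pathlib import Path

def reset_external_target(build_root: Path, subdir: str, target: str) -> list:
    """Delete one external project's build subdir; return the rebuild command."""
    target_dir = build_root / subdir
    if target_dir.is_dir():
        # Removing the whole subdir forces a fresh configure, build and install.
        shutil.rmtree(target_dir)
    return ["make", target]

# Demonstration against a throwaway directory standing in for bext/.build
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tcl").mkdir()
    (root / "tcl" / "CMakeCache.txt").write_text("stale configuration")
    print(reset_external_target(root, "tcl", "TCL_BLD"))
    print((root / "tcl").exists())
```

In a real tree the returned command would be run from the bext build directory, for example with `subprocess.run(cmd, cwd=build_root, check=True)`.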

view this post on Zulip Sean (Aug 06 2024 at 14:52):

starseeker said:

Once the ExternalProject_Add outputs are populated, an update of the source directory doesn't (in my experience) reliably reset things

Is there intuition as to why that's not working? Is that because of things in our cmake? What can we do about it barring massive deletions?

view this post on Zulip Sean (Aug 06 2024 at 14:54):

Because that's going to bite. Every time we git pull on bext, we'll either have to blow everything away whenever there's an update (which is going to mean crazy dev cycle times) or hope to catch which subdirs changed in each pull and manually make-clean each of them (make clean in the tcl subdir worked, btw).

view this post on Zulip starseeker (Aug 06 2024 at 14:58):

I'm not completely sure of the reasons. I do have the parent build always re-executing the build steps every time the targets are run, specifically to try to catch such things, but I think that may be only the build step.

view this post on Zulip Sean (Aug 06 2024 at 14:58):

I did notice it seems to reliably avoid rework once compile + install is complete

view this post on Zulip starseeker (Aug 06 2024 at 14:58):

I don't know if it will re-execute the configure step reliably - that may be up to the build system itself (the way CMake spots and knows to re-run configure if the build files change.)

view this post on Zulip starseeker (Aug 06 2024 at 14:59):

So if Tcl isn't smart about that, we may not be getting reconfigures we need after a source update

view this post on Zulip Sean (Aug 06 2024 at 14:59):

Here's another error, bext release rebuild (after clearing and rebuilding tcl):

CMake Error at /Volumes/X10/brlcad.bext/.build.release/qt/Qt6_BLD-prefix/src/Qt6_BLD-stamp/Qt6_BLD-build-Release.cmake:37 (message):
Command failed: 1
'/opt/homebrew/Cellar/cmake/3.28.3/bin/cmake' '-DCMAKE_BUILD_TYPE=Release' '-P' '/Volumes/X10/brlcad.bext/.build.release/qt/qt_build.cmake'
See also
/Volumes/X10/brlcad.bext/.build.release/qt/Qt6_BLD-prefix/src/Qt6_BLD-stamp/Qt6_BLD-build-*.log
-- stdout output is:
-- stderr output is:
CMake Error at /Volumes/X10/brlcad.bext/.build.release/qt/qt_build.cmake:18 (message):
Qt build failed: [ 0%] Built target syncqt
[ 1%] Built target Core_lib_pri
[ 1%] Built target qmodule_pri
Error copying directory from
"/Volumes/X10/brlcad.bext/.build.release/qt/qt6-build/include/QtCore/.syncqt_staging"
to
"/Volumes/X10/brlcad.bext/.build.release/qt/qt6-build/lib/QtCore.framework/Versions/A/Headers".
make[5]: *** [src/corelib/CMakeFiles/Core_copy_fw_sync_headers] Error 1
make[4]: *** [src/corelib/CMakeFiles/Core_copy_fw_sync_headers.dir/all]

view this post on Zulip starseeker (Aug 06 2024 at 15:00):

That one is surprising - that appears to be internal to the Qt build itself

view this post on Zulip starseeker (Aug 06 2024 at 15:01):

That's why I remove the tcl subdirectory in such a case - it reliably reconfigures and rebuilds from scratch

view this post on Zulip Sean (Aug 06 2024 at 15:22):


CMake Warning (dev) at /opt/homebrew/Cellar/cmake/3.28.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
The package name passed to find_package_handle_standard_args (OpenCV)
does not match the name of the calling package (OPENCV).  This can lead to
problems in calling code that expects find_package result variables
(e.g., _FOUND) to follow a certain pattern.
Call Stack (most recent call first):
/opt/homebrew/lib/cmake/opencv4/OPENCVConfig.cmake:354 (find_package_handle_standard_args)
CMakeLists.txt:374 (find_package)
opencv/CMakeLists.txt:4 (bext_enable)
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at /opt/homebrew/Cellar/cmake/3.28.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:447 (message):
find_package() specify a version range but the module TCL does not
support this capability.  Only the lower endpoint of the range will be
used.
Call Stack (most recent call first):
CMake/FindTCL.cmake:461 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
tcl/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

view this post on Zulip Sean (Aug 07 2024 at 15:27):

Looks like the @rpath fix to Tcl is working, at least preliminary build and make check are back to working again without that issue.

Remaining check failures are:

The following tests FAILED:
         33 - bu_file (Failed)
        374 - bu_color_to_rgb_floats_1 (Failed)
        918 - ged_test_drawing_basic (Failed)
        920 - ged_test_drawing_lod (Failed)
        921 - ged_test_drawing_select (Failed)

view this post on Zulip Sean (Aug 07 2024 at 17:25):

fixed bu_file, test had a flawed assumption about defaults

view this post on Zulip Sean (Aug 07 2024 at 18:28):

fixed bu_color_to_rgb_floats_1

view this post on Zulip Sean (Aug 07 2024 at 18:30):

(note @rpath is fixed on bext main, not bext RELEASE)

view this post on Zulip starseeker (Aug 14 2024 at 15:28):

OK, I got a look at the drawing failures

view this post on Zulip starseeker (Aug 14 2024 at 15:38):

The lod failure is doing an exact check on an image that is off by a single pixel, so that should probably just be switched to an approximate check.

view this post on Zulip starseeker (Aug 14 2024 at 15:38):

There may be more failures after that one, so I'll have to try and see what happens.

view this post on Zulip starseeker (Aug 14 2024 at 15:39):

Actually it's one pixel off by one, even
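The difference between an exact and an approximate image check can be sketched as below. This is not BRL-CAD's comparison code; the function name, the pixel representation (a flat list of RGB tuples), and both thresholds are assumptions chosen to illustrate tolerating "one pixel off by one":

```python
def approx_equal(img_a, img_b, max_channel_delta=1, max_diff_pixels=1):
    """Compare two images given as equal-length lists of (r, g, b) tuples."""
    if len(img_a) != len(img_b):
        return False
    differing = 0
    for pa, pb in zip(img_a, img_b):
        delta = max(abs(a - b) for a, b in zip(pa, pb))
        if delta > max_channel_delta:
            return False  # a single pixel is too far off
        if delta > 0:
            differing += 1
    return differing <= max_diff_pixels

expected = [(0, 0, 0), (255, 255, 255), (10, 20, 30)]
off_by_one = [(0, 0, 0), (255, 255, 255), (10, 21, 30)]

print(expected == off_by_one)              # exact check
print(approx_equal(expected, off_by_one))  # approximate check
```

An exact check rejects the off-by-one image while the approximate check accepts it; larger differences, such as a changed set of visible lines, would need looser thresholds or a different metric.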

view this post on Zulip starseeker (Aug 14 2024 at 15:40):

The basic test difference is a bit larger, but looking at the diff image it appears to be a difference in how the tor got tessellated

view this post on Zulip starseeker (Aug 14 2024 at 15:40):

v_23_diff.png

view this post on Zulip starseeker (Aug 14 2024 at 15:42):

Or more likely (since the prior draw of the bot didn't trigger) it's a difference in which lines ended up visible in the "hidden line" draw mode.

view this post on Zulip starseeker (Aug 14 2024 at 15:44):

Looks like the approximate compare is still a bit too strict for that one, so I'll adjust

view this post on Zulip starseeker (Aug 14 2024 at 15:44):

The real problem is the third case.

view this post on Zulip starseeker (Aug 14 2024 at 15:45):

To my astonishment, it is in fact failing to tessellate the tor primitive

view this post on Zulip starseeker (Aug 14 2024 at 15:45):

This is due to the select test changing the relative tolerance before facetizing the tor

view this post on Zulip starseeker (Aug 14 2024 at 15:46):

Running the process manually we can see more info:

mged> facetize -vv tor t2.bot
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods NMG --method-opts NMG nmg_debug=0x00000000 tol_abs=0.00000000000000000 tol_rel=0.00010000000000000 tol_norm=0.00000000000000000 --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
cut_unimonotone(): infinite loop 0x60000231aac0

cut_unimonotone(): infinite loop

 FAILED.
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods CM --method-opts "CM" --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
feature_size: 0.000000
feature_scale: 0.150000
target_feature_size: 0.761658
CM: error at size 10.1554
CM: retrying with size 1.01554
CM: error at size 1.01554
CM: retrying with size 0.507772
CM: surface reconstruction failed: tor
 FAILED.
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods CO3NE --method-opts "CO3NE" --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
feature_size: 0.000000
feature_scale: 0.150000
target_feature_size: 0.761658
Geogram result not manifold
o-[reconstruct ] Elapsed time: 3.013s
 FAILED.
/Users/user/brlcad/build/bin/ged_exec facetize_process -O /Users/user/.cache/BRL-CAD/facetize_5733869552563525943/facetize_moss_select_tmp.g --methods SPSR --method-opts "SPSR" --cache-dir /Users/user/.cache/BRL-CAD tor
Attempting to triangulate tor...
feature_size: 0.000000
feature_scale: 0.150000
target_feature_size: 0.761658
SPSR: decimating with feature size: 1.30208

bu_mtx_trylock() failed

bu_mtx_trylock() failed
Saving stack trace to facetize_process-83790-bomb.log

bu_mtx_lock() failed

bu_mtx_lock() failed
 FAILED.

Primitive tessellation summary:

1 object(s) failed:
        tor

view this post on Zulip starseeker (Aug 14 2024 at 15:47):

It's annoying that the fallback methods fail (the SPSR decimation is where those bu_mtx failures are coming from) but the real problem is the NMG "cut_unimonotone" failure.

view this post on Zulip starseeker (Aug 14 2024 at 15:47):

For a basic tor like that we should never have gotten to the fallbacks at all

view this post on Zulip starseeker (Aug 14 2024 at 15:48):

The tol setting from the select.cpp setup is: "tol rel 0.0001"

view this post on Zulip starseeker (Aug 14 2024 at 15:48):

A facetize without changing the rel tolerance does succeed

view this post on Zulip starseeker (Aug 14 2024 at 15:49):

Looks like in the LoD test I changed this to 0.0002, probably to avoid a similar failure. Not sure why I missed the select.

view this post on Zulip starseeker (Aug 14 2024 at 16:29):

OK, we'll see if a248e798bc2c passes. If so I'll pull the necessary changes over to RELEASE.

view this post on Zulip starseeker (Aug 14 2024 at 16:30):

At some point we'll need to do something about that NMG failure, but that could be quite the rabbit hole


Last updated: Oct 09 2024 at 00:44 UTC