regress-pkg · brlcad · Zulip Chat Archive

@Sean I'm not able to reproduce r75847 on GhostBSD, OpenBSD or bz - are there specific conditions that trigger it?

Sean (May 20 2020 at 19:05):

Sean (May 20 2020 at 19:06):

or did at least once. i haven't been back to check it, been debugging it for a while.

Sean (May 20 2020 at 19:14):

@starseeker debugging is going slow though. took a while just to get at the actual calls.

the regression tests are a bit of a mess.. some using macros, some using functions, some using nothing, some logging, some not logging, all slightly different from the rest. any relief planned?

even noticed undefined global keywords like EXCLUDE_FROM_DEFAULT_BUILD and EXCLUDE_FROM _REGRESS that trigger special logic elsewhere (e.g., top-level cmakelists.txt file ). that makes it super hard to figure out what's going on, frustrating because of the unnecessary spaghet.

starseeker (May 20 2020 at 19:18):

Yeah, most of those grew organically and weren't well designed for multi-config setups.

Sean (May 20 2020 at 19:19):

starseeker (May 20 2020 at 19:21):

Mostly has to do with the need to launch built executable programs from configuration dependent output directories. I need to look into whether generator expressions in newer CMake can help clean that up, but it was a real pain originally

Sean (May 20 2020 at 19:21):

starseeker (May 20 2020 at 19:22):

It may now - that wasn't an option when most of them were written. regress-repository and regress-license experiment with using those expressions - so far successfully

Sean (May 20 2020 at 19:22):

starseeker (May 20 2020 at 19:23):

Log and stamp files are another issue though - if you switch configs in MSVC, you need everything to look for configuration dependent stamp files to know to re-run tests.

starseeker (May 20 2020 at 19:23):

Those aren't an issue with repo and license - those just run every time the target is called. Maybe that's what all the regress targets should do - would simplify that a fair bit

starseeker (May 20 2020 at 19:25):

Also depends on how much you want logged to text files - Running things from execute_process in CMake allows capture of stdout and stderror to per-test log files.

Sean (May 20 2020 at 19:25):

30/37 Test #33: regress-nurbs-nirt-NIST03_MISS_01 .........   Passed   37.87 sec
31/37 Test  #3: regress-solids ............................   Passed    8.65 sec
32/37 Test #48: regress-repository ........................   Passed   47.49 sec
33/37 Test #46: regress-gchecker ..........................   Passed   21.88 sec
34/37 Test  #6: regress-rtedge ............................   Passed   60.01 sec
35/37 Test #47: regress-licenses ..........................   Passed   26.60 sec
36/37 Test #14: regress-red ...............................   Passed   68.78 sec
37/37 Test #36: regress-pkg ...............................***Failed  1009.37 sec

starseeker (May 20 2020 at 19:26):

(looks like latest jenkins tests either succeeded or are failing on usual rtedge issue unless I'm looking at the wrong thing...)

Sean (May 20 2020 at 19:26):

starseeker (May 20 2020 at 19:26):

starseeker (May 20 2020 at 19:27):

starseeker (May 20 2020 at 19:28):

I thought I set regress-pkg to not be part of the make check/make regress targets - let me see if I goofed that up somehow

starseeker (May 20 2020 at 19:29):

Huh. r75832 should have taken it out of both regress and check - only make test or a specific run of the regress-pkg target should trigger it.

starseeker (May 20 2020 at 19:31):

@Sean will the jenkins instance run multiple builds in parallel (i.e. if commits some in one after the other?)

starseeker (May 20 2020 at 19:32):

Sean (May 20 2020 at 19:32):

Sean (May 20 2020 at 19:34):

the stall feels like a race condition and that's what the log seems to confirm to some extent, like the client runs before the server, exits, and then the server runs and waits forever, just due to how the threads got scheduled.

Sean (May 20 2020 at 19:35):

this probably needs to be accommodated in the client code. I noticed it exits immediately, which is not the behavior needed here.

Sean (May 20 2020 at 19:36):

starseeker (May 20 2020 at 19:38):

So you're thinking around line 94 to loop for a little while if the initial connection attempt fails?

starseeker (May 20 2020 at 19:39):

Probably could also have the sever time out after a second or so - this test should be virtually instantaneous if it's running correctly

starseeker (May 20 2020 at 19:41):

@Sean you want me to go ahead and add some timing logic to see if I can force it to behave a little better if something goes wrong?

starseeker (May 20 2020 at 20:06):

r75850 should help some. I'm not seeing a non-blocking option to pkg_process in pkg.h offhand...

Sean (May 20 2020 at 20:13):

a second is likely not enough, that depends on the kernel. starting a process can commonly take 0-5 depending on operating system, level of concurrency, and other factors

Sean (May 20 2020 at 20:13):

Sean (May 20 2020 at 20:14):

should document those magic numbers (which is what the BU_SEC2USEC macro does in effect, or a comment)

Sean (May 20 2020 at 20:14):

not sure about making the server non-blocking on getting clients, that doesn't seem right

Sean (May 20 2020 at 20:15):

that's just going to make it spin hot and/or fall right into the error condition

starseeker (May 20 2020 at 20:16):

starseeker (May 20 2020 at 20:17):

Only other option that came to mind was launching the connection wait from a thread, which seemed like a fair bit of complication for what's supposed to be something simple...

Sean (May 20 2020 at 20:42):

no, you're right -- the sleep alleviates it running hot, but it does fall right into the condition

note, that it doesn't mean it's busy. null return just means there's no client to get.

Sean (May 20 2020 at 20:42):

this is informative, the server is now running and timing out before the client even starts

starseeker (May 20 2020 at 20:45):

starseeker (May 20 2020 at 20:46):

Sean (May 20 2020 at 20:47):

starseeker (May 20 2020 at 20:48):

Sean (May 20 2020 at 20:48):

34/37 Test #36: regress-pkg ...............................***Failed    2.10 sec
Launching server
SERVER thread starting
Listening on port 2000
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Launching client
CLIENT thread starting
Connection inactive for > 1 second, quitting.

Timeout - inactive
SERVER thread ending
Connecting to 127.0.0.1, port 2000
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61

starseeker (May 20 2020 at 20:50):

So... the 1 second wait isn't long enough? It looks like CLIENT thread is starting just before the server bails?

Sean (May 20 2020 at 20:52):

Sean (May 20 2020 at 21:09):

Launching server
SERVER thread starting
Listening on port 2000
No client input available, waiting...
No client input available, waiting...
Launching client
CLIENT thread starting
No client input available, waiting...
No client input available, waiting...
Waiting for client to exit
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
Connection inactive for > 5 seconds, quitting.

Timeout - inactive
SERVER thread ending
Connecting to 127.0.0.1, port 2000
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
Connection to 127.0.0.1, port 2000, failed.

starseeker (May 20 2020 at 21:10):

Sean (May 20 2020 at 21:11):

so the client thread starts, but the client process does not until the second process completes. I wonder, this could be a security or system implementation issue that prevents simultaneous std::system() calls from multiple threads

Sean (May 20 2020 at 21:11):

starseeker (May 20 2020 at 21:13):

Um. Unfortunately, to the best of my knowledge CTest doesn't provide a way to define a test that launches multiple executables. (The controlling regress_pkg program was an attempt to get cute and work around that.)

Sean (May 20 2020 at 21:14):

starseeker (May 20 2020 at 21:14):

Sean (May 20 2020 at 21:14):

starseeker (May 20 2020 at 21:16):

That'll probably do for exercising libpkg. That wasn't why I originally put that example together, so I was probably to focused on the multiple programs sharing information aspect - I'll try the renamed mains and see if communication-across-threads works.

Sean (May 20 2020 at 21:16):

Sean (May 20 2020 at 21:17):

starseeker (May 20 2020 at 21:17):

Docs for add_test don't look like it takes multiple COMMANDs - even if it did, if it's like the other add_ functions it would execute in series.

Sean (May 20 2020 at 21:18):

I thought we had examples calling multiple commands, things that replaced shell scripts

Sean (May 20 2020 at 21:18):

in series is fine -- server is run (stays running), client is run (exits), server is shut down

starseeker (May 20 2020 at 21:19):

I don't think it will launch the client until the server completes - e.g. std::system()

Sean (May 20 2020 at 21:19):

if cmake can run multiple commands in series or otherwise, it's doable as multiprocess

starseeker (May 20 2020 at 21:20):

Er, sorry - wasn't clear. As far as I know, CMake script execution can't do async process launching - it runs one program to exit, then launches the next

Sean (May 20 2020 at 21:20):

starseeker (May 20 2020 at 21:21):

Not following - it would wait for the server to finish, then launch the client - would do just what OSX is doing now.

Sean (May 20 2020 at 21:21):

you run the server, it detaches and keeps running, the process exits (server remains running)

starseeker (May 20 2020 at 21:21):

Sean (May 20 2020 at 21:21):

well not now, we're talking about what it could be made to do in a serial cmake context

Sean (May 20 2020 at 21:22):

starseeker (May 20 2020 at 21:22):

I don't think CMake detaches either - looking for docs now, but I think a script doesn't proceed until program return.

Sean (May 20 2020 at 21:22):

Sean (May 20 2020 at 21:23):

Sean (May 20 2020 at 21:24):

we don't have pkg unit testing, that would be interesting
(and maybe tricky to get right)

starseeker (May 20 2020 at 21:25):

I'll try the multithreading - at this point I'm regretting stepping on the land mine, so if that's the shortest way out of the patch I'm all for it ;-)

Sean (May 20 2020 at 21:38):

agua:.build morrison$ clang++ test2.cpp
agua:.build morrison$ ./a.out
running ONE
running ONE
running ONE
running ONE
running ONE
running ONE
running TWO
running TWO
running TWO
running TWO
running TWO
running TWO

Sean (May 20 2020 at 21:39):

starseeker (May 20 2020 at 21:39):

I just got a multithreaded test working (I think) - I'll pull in your updates and commit so you can give it a whirl

Sean (May 20 2020 at 21:45):

starseeker (May 20 2020 at 21:47):

Sean (May 20 2020 at 22:00):

I think you should do the threaded approach. it's still simpler and avoids std::system (which will show up on a security audit anyways as a high-vulnerability code issue).

starseeker (May 20 2020 at 22:00):

@Sean btw, add_executable in CMake won't install out of the box - you need an install() as well. The BRLCAD_ADDEXEC call does all that automatically, so that's why it has the NO_INSTALL flag. What you did works fine, just thought I'd mention it for clarity

starseeker (May 20 2020 at 22:01):

Sean (May 20 2020 at 22:01):

starseeker (May 20 2020 at 22:03):

I'll sometimes just us add_executable when I don't need to link in lots of stuff, worry about DLL defines, and all the other complex pieces - I shouldn't really, since I've also been bit on a few occasions when I was wrong about not needing it after trying on Windows...

starseeker (May 20 2020 at 22:04):

If the free OSes are going to implement something similar to that DLL import/export business, we should probably enable it even if we don't otherwise care just to more easily test the Windows behavior

Sean (May 20 2020 at 22:05):

I think it's good practice not to just because it creates outliers that beg questions to whomever didn't write that logic

starseeker (May 20 2020 at 22:06):

I really need to get a CI setup somewhere using the new Ninja/CMake multiconfig build - up until that was added it was only Xcode and Visual Studio that could exercise that part of things.

Sean (May 20 2020 at 22:06):

starseeker (May 20 2020 at 22:06):

Sean (May 20 2020 at 22:06):

Sean (May 20 2020 at 22:07):

we could also adopt the forced import/export behavior on linux/mac so it's the same everywhere

Sean (May 20 2020 at 22:08):

starseeker (May 20 2020 at 22:08):

I've never actually seen that done on any platform other than Windows - how does it work?

Sean (May 20 2020 at 22:08):

Sean (May 20 2020 at 22:09):

starseeker (May 20 2020 at 22:09):

Sean (May 20 2020 at 22:09):

starseeker (May 20 2020 at 22:09):

Sean (May 20 2020 at 22:09):

starseeker (May 20 2020 at 22:10):

Much as it irks me in some ways, we should probably mandate it - that's one of the major rolling pain points building on Windows.

Sean (May 20 2020 at 22:11):

will have to do some performance testing as it does affect what is visible to the optimizer as well (could be better or worse, why it warrants testing)

Sean (May 20 2020 at 22:12):

then of course the appropriate dllimport/export decls we use on windows should be used

Sean (May 20 2020 at 22:13):

/me thinks __attribute__ ((dllimport)) is stupid given they made it fully compatible with __declspec(dllimport)

starseeker (May 20 2020 at 22:14):

Trying to remember where we made the DLL bits windows specific... first I'd better make sure I can test for -fvisibility=hidden.

starseeker (May 20 2020 at 22:15):

starseeker (May 20 2020 at 22:20):

Sean (May 21 2020 at 06:03):

starseeker (May 21 2020 at 16:45):

@Sean about the regression testing - I'll go through and convert everything I can to functions.

I can probably remove the stamp file mechanism from the regression tests altogether - that's important for custom build steps, but is unnecessary for regression tests since we want them to run each time one of the targets is invoked. I originally based the regression test code on the other custom runs for the build, and it migrated in with that IIRC.

EXCLUDE_FROM_REGRESS and the other implicit naming conventions are trickier. It might be possible to define a better wrapper function which hides the details to assemble the check and regress targets, but I don't know if you'd like that any better than what's there... I'll take a stab at it, but we'll have to see.

1) current solution - launch test programs with execute_process inside a .cmake script which is run by cmake -P via add_test in the parent build. This captures I/O to log files, but does complicate execution. In particular, the .cmake script has to know how to launch the executable specific to the current configuration, as well as writing out the log files in the appropriate directories.

2) Directly run the regression executable with add_test, no wrapping .cmake file. This has the virtue of simplicity, but to the best of my knowledge this approach has no way to capture the I/O from the test executable to a file. It is captured by ctest, which I believe does do some organization of its reported outputs, but as far as I can discover it has no sophisticated logging options so we give up per-test logging files.

starseeker (May 25 2020 at 18:00):

@Sean upon inspection It looks like the regress-flawfinder target is successfully running, so the failure is flawfinder not liking something

starseeker (May 25 2020 at 19:56):

@Sean bioh branch has the rework of the regression testing system. It should be at a point now where it's worth you taking a look and seeing if it addresses your concerns

starseeker (May 26 2020 at 02:18):

The system is now the BRLCAD_REGRESSION_TEST function in misc/CMake/BRLCAD_Targets.cmake along with the *.cmake.in files and CMakeLists.txt files in the regress directory and subdirectories.

Sean (May 27 2020 at 05:20):

Oh cool. I thought flawinder was left at a really low setting that always detected real code smells, but wasn't sure.

Sean (May 27 2020 at 05:38):

starseeker (May 31 2020 at 01:58):

@Sean Unless something is clearly broken, I'd like to go ahead and merge the regress updates - even if that's not its final form, I think it's an improvement over what was there previously

Sean (May 31 2020 at 08:23):

@starseeker the branch looks good to me, all running well. love the unified logging. making it work with/without existing test definitions was a nice addition.

Stream: brlcad

Topic: regress-pkg

starseeker (May 20 2020 at 18:14):

Sean (May 20 2020 at 19:05):

Sean (May 20 2020 at 19:06):

Sean (May 20 2020 at 19:14):

starseeker (May 20 2020 at 19:18):

Sean (May 20 2020 at 19:19):

Sean (May 20 2020 at 19:19):

starseeker (May 20 2020 at 19:21):

Sean (May 20 2020 at 19:21):

starseeker (May 20 2020 at 19:22):

Sean (May 20 2020 at 19:22):

starseeker (May 20 2020 at 19:23):

starseeker (May 20 2020 at 19:23):

starseeker (May 20 2020 at 19:25):

Sean (May 20 2020 at 19:25):

Sean (May 20 2020 at 19:25):

starseeker (May 20 2020 at 19:26):

Sean (May 20 2020 at 19:26):

starseeker (May 20 2020 at 19:26):

starseeker (May 20 2020 at 19:27):

starseeker (May 20 2020 at 19:28):

starseeker (May 20 2020 at 19:29):

starseeker (May 20 2020 at 19:31):

starseeker (May 20 2020 at 19:32):

Sean (May 20 2020 at 19:32):

Sean (May 20 2020 at 19:34):

Sean (May 20 2020 at 19:35):

Sean (May 20 2020 at 19:36):

starseeker (May 20 2020 at 19:38):

starseeker (May 20 2020 at 19:39):

starseeker (May 20 2020 at 19:41):

starseeker (May 20 2020 at 20:06):

Sean (May 20 2020 at 20:13):

Sean (May 20 2020 at 20:13):

Sean (May 20 2020 at 20:14):

Sean (May 20 2020 at 20:14):

Sean (May 20 2020 at 20:15):

starseeker (May 20 2020 at 20:16):

starseeker (May 20 2020 at 20:17):

Sean (May 20 2020 at 20:42):

Sean (May 20 2020 at 20:42):

starseeker (May 20 2020 at 20:45):

starseeker (May 20 2020 at 20:46):

Sean (May 20 2020 at 20:47):

starseeker (May 20 2020 at 20:48):

Sean (May 20 2020 at 20:48):

Sean (May 20 2020 at 20:48):

starseeker (May 20 2020 at 20:50):

Sean (May 20 2020 at 20:52):

Sean (May 20 2020 at 21:09):

starseeker (May 20 2020 at 21:10):

Sean (May 20 2020 at 21:11):

Sean (May 20 2020 at 21:11):

starseeker (May 20 2020 at 21:13):

Sean (May 20 2020 at 21:14):

starseeker (May 20 2020 at 21:14):

Sean (May 20 2020 at 21:14):

starseeker (May 20 2020 at 21:16):

Sean (May 20 2020 at 21:16):

Sean (May 20 2020 at 21:17):

starseeker (May 20 2020 at 21:17):

Sean (May 20 2020 at 21:18):

Sean (May 20 2020 at 21:18):

starseeker (May 20 2020 at 21:19):

Sean (May 20 2020 at 21:19):

Sean (May 20 2020 at 21:19):

starseeker (May 20 2020 at 21:20):

Sean (May 20 2020 at 21:20):

starseeker (May 20 2020 at 21:21):

Sean (May 20 2020 at 21:21):

starseeker (May 20 2020 at 21:21):

Sean (May 20 2020 at 21:21):

Sean (May 20 2020 at 21:22):

starseeker (May 20 2020 at 21:22):

Sean (May 20 2020 at 21:22):

Sean (May 20 2020 at 21:22):

Sean (May 20 2020 at 21:22):

Sean (May 20 2020 at 21:23):