Stream: brlcad

Topic: regress-pkg


view this post on Zulip starseeker (May 20 2020 at 18:14):

@Sean I'm not able to reproduce r75847 on GhostBSD, OpenBSD or bz - are there specific conditions that trigger it?

view this post on Zulip Sean (May 20 2020 at 19:05):

no conditions, default build + make regress
.bz is stalling (see jenkins)

view this post on Zulip Sean (May 20 2020 at 19:06):

or did at least once. i haven't been back to check it, been debugging it for a while.

view this post on Zulip Sean (May 20 2020 at 19:14):

@starseeker debugging is going slow though. took a while just to get at the actual calls.

the regression tests are a bit of a mess.. some using macros, some using functions, some using nothing, some logging, some not logging, all slightly different from the rest. any relief planned?

even noticed undefined global keywords like EXCLUDE_FROM_DEFAULT_BUILD and EXCLUDE_FROM _REGRESS that trigger special logic elsewhere (e.g., top-level cmakelists.txt file ). that makes it super hard to figure out what's going on, frustrating because of the unnecessary spaghet.

view this post on Zulip starseeker (May 20 2020 at 19:18):

Yeah, most of those grew organically and weren't well designed for multi-config setups.

view this post on Zulip Sean (May 20 2020 at 19:19):

is that because of the log files?

view this post on Zulip Sean (May 20 2020 at 19:19):

how does multiconfig complicate it?

view this post on Zulip starseeker (May 20 2020 at 19:21):

Mostly has to do with the need to launch built executable programs from configuration dependent output directories. I need to look into whether generator expressions in newer CMake can help clean that up, but it was a real pain originally

view this post on Zulip Sean (May 20 2020 at 19:21):

$<TARGET_FILE: doesn't take care of that?

view this post on Zulip starseeker (May 20 2020 at 19:22):

It may now - that wasn't an option when most of them were written. regress-repository and regress-license experiment with using those expressions - so far successfully

view this post on Zulip Sean (May 20 2020 at 19:22):

ah okay

view this post on Zulip starseeker (May 20 2020 at 19:23):

Log and stamp files are another issue though - if you switch configs in MSVC, you need everything to look for configuration dependent stamp files to know to re-run tests.

view this post on Zulip starseeker (May 20 2020 at 19:23):

Those aren't an issue with repo and license - those just run every time the target is called. Maybe that's what all the regress targets should do - would simplify that a fair bit

view this post on Zulip starseeker (May 20 2020 at 19:25):

Also depends on how much you want logged to text files - Running things from execute_process in CMake allows capture of stdout and stderror to per-test log files.

view this post on Zulip Sean (May 20 2020 at 19:25):

just for some context, here's the hot tests on mac laptop (ssd):

view this post on Zulip Sean (May 20 2020 at 19:25):

30/37 Test #33: regress-nurbs-nirt-NIST03_MISS_01 .........   Passed   37.87 sec
31/37 Test  #3: regress-solids ............................   Passed    8.65 sec
32/37 Test #48: regress-repository ........................   Passed   47.49 sec
33/37 Test #46: regress-gchecker ..........................   Passed   21.88 sec
34/37 Test  #6: regress-rtedge ............................   Passed   60.01 sec
35/37 Test #47: regress-licenses ..........................   Passed   26.60 sec
36/37 Test #14: regress-red ...............................   Passed   68.78 sec
37/37 Test #36: regress-pkg ...............................***Failed  1009.37 sec

view this post on Zulip starseeker (May 20 2020 at 19:26):

(looks like latest jenkins tests either succeeded or are failing on usual rtedge issue unless I'm looking at the wrong thing...)

view this post on Zulip Sean (May 20 2020 at 19:26):

pkg only aborted after I ended up killing the server process manually.

view this post on Zulip starseeker (May 20 2020 at 19:26):

O.o

view this post on Zulip starseeker (May 20 2020 at 19:27):

OK... if you run stand-alone what happens?

./bin/regress_pkg ./bin/regress_pkg_server ./bin/regress_pkg_client

view this post on Zulip starseeker (May 20 2020 at 19:28):

I thought I set regress-pkg to not be part of the make check/make regress targets - let me see if I goofed that up somehow

view this post on Zulip starseeker (May 20 2020 at 19:29):

Huh. r75832 should have taken it out of both regress and check - only make test or a specific run of the regress-pkg target should trigger it.

view this post on Zulip starseeker (May 20 2020 at 19:31):

@Sean will the jenkins instance run multiple builds in parallel (i.e. if commits some in one after the other?)

view this post on Zulip starseeker (May 20 2020 at 19:32):

that could definitely hang it, if multiple pkg test programs start mixing it up

view this post on Zulip Sean (May 20 2020 at 19:32):

it's configured to test commits one at a time

view this post on Zulip Sean (May 20 2020 at 19:34):

the stall feels like a race condition and that's what the log seems to confirm to some extent, like the client runs before the server, exits, and then the server runs and waits forever, just due to how the threads got scheduled.

view this post on Zulip Sean (May 20 2020 at 19:35):

this probably needs to be accommodated in the client code. I noticed it exits immediately, which is not the behavior needed here.

view this post on Zulip Sean (May 20 2020 at 19:36):

client should probably try a few times and for a few seconds before giving up

view this post on Zulip starseeker (May 20 2020 at 19:38):

So you're thinking around line 94 to loop for a little while if the initial connection attempt fails?

view this post on Zulip starseeker (May 20 2020 at 19:39):

Probably could also have the sever time out after a second or so - this test should be virtually instantaneous if it's running correctly

view this post on Zulip starseeker (May 20 2020 at 19:41):

@Sean you want me to go ahead and add some timing logic to see if I can force it to behave a little better if something goes wrong?

view this post on Zulip starseeker (May 20 2020 at 20:06):

r75850 should help some. I'm not seeing a non-blocking option to pkg_process in pkg.h offhand...

view this post on Zulip Sean (May 20 2020 at 20:13):

a second is likely not enough, that depends on the kernel. starting a process can commonly take 0-5 depending on operating system, level of concurrency, and other factors

view this post on Zulip Sean (May 20 2020 at 20:13):

but yeah, that looks like the right direction

view this post on Zulip Sean (May 20 2020 at 20:14):

should document those magic numbers (which is what the BU_SEC2USEC macro does in effect, or a comment)

view this post on Zulip Sean (May 20 2020 at 20:14):

not sure about making the server non-blocking on getting clients, that doesn't seem right

view this post on Zulip Sean (May 20 2020 at 20:15):

that's just going to make it spin hot and/or fall right into the error condition

view this post on Zulip starseeker (May 20 2020 at 20:16):

The snooze was intended to keep it from spinning hot - will that not work?

view this post on Zulip starseeker (May 20 2020 at 20:17):

Only other option that came to mind was launching the connection wait from a thread, which seemed like a fair bit of complication for what's supposed to be something simple...

view this post on Zulip Sean (May 20 2020 at 20:42):

no, you're right -- the sleep alleviates it running hot, but it does fall right into the condition

note, that it doesn't mean it's busy. null return just means there's no client to get.

view this post on Zulip Sean (May 20 2020 at 20:42):

this is informative, the server is now running and timing out before the client even starts

view this post on Zulip starseeker (May 20 2020 at 20:45):

With a 10 second wait time? O.o

view this post on Zulip starseeker (May 20 2020 at 20:46):

Is the machine thread starved somehow?

view this post on Zulip Sean (May 20 2020 at 20:47):

nope

view this post on Zulip starseeker (May 20 2020 at 20:48):

Or is the port just blocked outright? Not sure how that would manifest...

view this post on Zulip Sean (May 20 2020 at 20:48):

nope

view this post on Zulip Sean (May 20 2020 at 20:48):

34/37 Test #36: regress-pkg ...............................***Failed    2.10 sec
Launching server
SERVER thread starting
Listening on port 2000
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Connection seems to be busy, waiting...
Launching client
CLIENT thread starting
Connection inactive for > 1 second, quitting.

Timeout - inactive
SERVER thread ending
Connecting to 127.0.0.1, port 2000
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61

view this post on Zulip starseeker (May 20 2020 at 20:50):

So... the 1 second wait isn't long enough? It looks like CLIENT thread is starting just before the server bails?

view this post on Zulip Sean (May 20 2020 at 20:52):

running some tests

view this post on Zulip Sean (May 20 2020 at 21:09):

curiouser and curiouser

Launching server
SERVER thread starting
Listening on port 2000
No client input available, waiting...
No client input available, waiting...
Launching client
CLIENT thread starting
No client input available, waiting...
No client input available, waiting...
Waiting for client to exit
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
No client input available, waiting...
Connection inactive for > 5 seconds, quitting.

Timeout - inactive
SERVER thread ending
Connecting to 127.0.0.1, port 2000
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
pkg_open: client connect: errno=61
Connection to 127.0.0.1, port 2000, failed.

view this post on Zulip starseeker (May 20 2020 at 21:10):

Is that BSD? or OSX?

view this post on Zulip Sean (May 20 2020 at 21:11):

so the client thread starts, but the client process does not until the second process completes. I wonder, this could be a security or system implementation issue that prevents simultaneous std::system() calls from multiple threads

view this post on Zulip Sean (May 20 2020 at 21:11):

this is OSX

view this post on Zulip starseeker (May 20 2020 at 21:13):

Um. Unfortunately, to the best of my knowledge CTest doesn't provide a way to define a test that launches multiple executables. (The controlling regress_pkg program was an attempt to get cute and work around that.)

view this post on Zulip Sean (May 20 2020 at 21:14):

why make it multiprocess?

view this post on Zulip starseeker (May 20 2020 at 21:14):

To verify that information moved successfully between two different programs

view this post on Zulip Sean (May 20 2020 at 21:14):

instead of std::system, just call their (renamed) main functions accordingly

view this post on Zulip starseeker (May 20 2020 at 21:16):

That'll probably do for exercising libpkg. That wasn't why I originally put that example together, so I was probably to focused on the multiple programs sharing information aspect - I'll try the renamed mains and see if communication-across-threads works.

view this post on Zulip Sean (May 20 2020 at 21:16):

add_test can't take multiple COMMAND lines?

view this post on Zulip Sean (May 20 2020 at 21:17):

or custom_target

view this post on Zulip starseeker (May 20 2020 at 21:17):

Docs for add_test don't look like it takes multiple COMMANDs - even if it did, if it's like the other add_ functions it would execute in series.

view this post on Zulip Sean (May 20 2020 at 21:18):

I thought we had examples calling multiple commands, things that replaced shell scripts

view this post on Zulip Sean (May 20 2020 at 21:18):

in series is fine -- server is run (stays running), client is run (exits), server is shut down

view this post on Zulip starseeker (May 20 2020 at 21:19):

I don't think it will launch the client until the server completes - e.g. std::system()

view this post on Zulip Sean (May 20 2020 at 21:19):

it wouldn't be using std::system, pkg_regress would go away

view this post on Zulip Sean (May 20 2020 at 21:19):

if cmake can run multiple commands in series or otherwise, it's doable as multiprocess

view this post on Zulip starseeker (May 20 2020 at 21:20):

Er, sorry - wasn't clear. As far as I know, CMake script execution can't do async process launching - it runs one program to exit, then launches the next

view this post on Zulip Sean (May 20 2020 at 21:20):

that'd work

view this post on Zulip starseeker (May 20 2020 at 21:21):

Not following - it would wait for the server to finish, then launch the client - would do just what OSX is doing now.

view this post on Zulip Sean (May 20 2020 at 21:21):

you run the server, it detaches and keeps running, the process exits (server remains running)

view this post on Zulip starseeker (May 20 2020 at 21:21):

It doesn't detach (as far as I know)

view this post on Zulip Sean (May 20 2020 at 21:21):

well not now, we're talking about what it could be made to do in a serial cmake context

view this post on Zulip Sean (May 20 2020 at 21:22):

it could be made to detach

view this post on Zulip starseeker (May 20 2020 at 21:22):

I don't think CMake detaches either - looking for docs now, but I think a script doesn't proceed until program return.

view this post on Zulip Sean (May 20 2020 at 21:22):

still, no it wouldn't you'd have to write the code to make it detach

view this post on Zulip Sean (May 20 2020 at 21:22):

I mean, I think multithreaded tests is adequate myself

view this post on Zulip Sean (May 20 2020 at 21:22):

but sounds doable

view this post on Zulip Sean (May 20 2020 at 21:23):

we have multiprocess libpkg testing, several regress scripts render via pkg

view this post on Zulip Sean (May 20 2020 at 21:24):

we don't have pkg unit testing, that would be interesting
(and maybe tricky to get right)

view this post on Zulip starseeker (May 20 2020 at 21:25):

I'll try the multithreading - at this point I'm regretting stepping on the land mine, so if that's the shortest way out of the patch I'm all for it ;-)

view this post on Zulip Sean (May 20 2020 at 21:38):

test2.cpp test2.sh

view this post on Zulip Sean (May 20 2020 at 21:38):

looks like that's definitely what's going on

view this post on Zulip Sean (May 20 2020 at 21:38):

agua:.build morrison$ clang++ test2.cpp
agua:.build morrison$ ./a.out
running ONE
running ONE
running ONE
running ONE
running ONE
running ONE
running TWO
running TWO
running TWO
running TWO
running TWO
running TWO

view this post on Zulip Sean (May 20 2020 at 21:39):

I wonder...

view this post on Zulip starseeker (May 20 2020 at 21:39):

I just got a multithreaded test working (I think) - I'll pull in your updates and commit so you can give it a whirl

view this post on Zulip Sean (May 20 2020 at 21:45):

I found an alternative that should work for multiprocess too

view this post on Zulip Sean (May 20 2020 at 21:45):

if it doesn't pan out, but testing indicates it will

view this post on Zulip starseeker (May 20 2020 at 21:47):

OK, I'll hold off

view this post on Zulip Sean (May 20 2020 at 22:00):

I think you should do the threaded approach. it's still simpler and avoids std::system (which will show up on a security audit anyways as a high-vulnerability code issue).

view this post on Zulip starseeker (May 20 2020 at 22:00):

@Sean btw, add_executable in CMake won't install out of the box - you need an install() as well. The BRLCAD_ADDEXEC call does all that automatically, so that's why it has the NO_INSTALL flag. What you did works fine, just thought I'd mention it for clarity

view this post on Zulip starseeker (May 20 2020 at 22:01):

alrightie - committing. Let me know what the Mac thinks.

view this post on Zulip Sean (May 20 2020 at 22:01):

ah, gotcha. it was more that I noticed they were oddities in my build/bin dir.

view this post on Zulip starseeker (May 20 2020 at 22:03):

I'll sometimes just us add_executable when I don't need to link in lots of stuff, worry about DLL defines, and all the other complex pieces - I shouldn't really, since I've also been bit on a few occasions when I was wrong about not needing it after trying on Windows...

view this post on Zulip starseeker (May 20 2020 at 22:04):

If the free OSes are going to implement something similar to that DLL import/export business, we should probably enable it even if we don't otherwise care just to more easily test the Windows behavior

view this post on Zulip Sean (May 20 2020 at 22:05):

I think it's good practice not to just because it creates outliers that beg questions to whomever didn't write that logic

view this post on Zulip starseeker (May 20 2020 at 22:06):

I really need to get a CI setup somewhere using the new Ninja/CMake multiconfig build - up until that was added it was only Xcode and Visual Studio that could exercise that part of things.

view this post on Zulip Sean (May 20 2020 at 22:06):

won't know why one is different from the rest

view this post on Zulip starseeker (May 20 2020 at 22:06):

true

view this post on Zulip Sean (May 20 2020 at 22:06):

looks like there are only 6 instances currently

view this post on Zulip Sean (May 20 2020 at 22:07):

we could also adopt the forced import/export behavior on linux/mac so it's the same everywhere

view this post on Zulip Sean (May 20 2020 at 22:08):

it would eliminate the conditionals and make everyone fail like windows

view this post on Zulip starseeker (May 20 2020 at 22:08):

I've never actually seen that done on any platform other than Windows - how does it work?

view this post on Zulip Sean (May 20 2020 at 22:08):

it's identical

view this post on Zulip Sean (May 20 2020 at 22:09):

I think clang (or maybe gcc or both?? I forget) even adopted the same syntax

view this post on Zulip starseeker (May 20 2020 at 22:09):

__declspec(dllexport) and __declspec(dllimport) ?

view this post on Zulip Sean (May 20 2020 at 22:09):

yep

view this post on Zulip starseeker (May 20 2020 at 22:09):

hrm. Well, I guess if configure tested for support for it...

view this post on Zulip Sean (May 20 2020 at 22:09):

there's another cflag that makes them required iirc

view this post on Zulip starseeker (May 20 2020 at 22:10):

Much as it irks me in some ways, we should probably mandate it - that's one of the major rolling pain points building on Windows.

view this post on Zulip Sean (May 20 2020 at 22:11):

here it is: https://gcc.gnu.org/wiki/Visibility

view this post on Zulip Sean (May 20 2020 at 22:11):

will have to do some performance testing as it does affect what is visible to the optimizer as well (could be better or worse, why it warrants testing)

view this post on Zulip Sean (May 20 2020 at 22:12):

there it is -- just have to add -fvisibility=hidden

view this post on Zulip Sean (May 20 2020 at 22:12):

then of course the appropriate dllimport/export decls we use on windows should be used

view this post on Zulip Sean (May 20 2020 at 22:13):

/me thinks __attribute__ ((dllimport)) is stupid given they made it fully compatible with __declspec(dllimport)

view this post on Zulip starseeker (May 20 2020 at 22:14):

agreed - if that works I see no reason to complicate things.

view this post on Zulip starseeker (May 20 2020 at 22:14):

Trying to remember where we made the DLL bits windows specific... first I'd better make sure I can test for -fvisibility=hidden.

view this post on Zulip starseeker (May 20 2020 at 22:15):

@Sean Did the threaded version of regress_pkg build on OSX?

view this post on Zulip starseeker (May 20 2020 at 22:20):

be back in a bit

view this post on Zulip Sean (May 21 2020 at 06:03):

builds and passes now

view this post on Zulip starseeker (May 21 2020 at 16:45):

@Sean about the regression testing - I'll go through and convert everything I can to functions.

I can probably remove the stamp file mechanism from the regression tests altogether - that's important for custom build steps, but is unnecessary for regression tests since we want them to run each time one of the targets is invoked. I originally based the regression test code on the other custom runs for the build, and it migrated in with that IIRC.

The EXCLUDE_FROM_DEFAULT_BUILD property is a standard CMake option that keeps a target from being run by "Build Solution" in Visual Studio. I never run that target myself, so I'd be OK with skipping it:
https://cmake.org/cmake/help/latest/prop_tgt/EXCLUDE_FROM_DEFAULT_BUILD.html

EXCLUDE_FROM_REGRESS and the other implicit naming conventions are trickier. It might be possible to define a better wrapper function which hides the details to assemble the check and regress targets, but I don't know if you'd like that any better than what's there... I'll take a stab at it, but we'll have to see.

For test logging, here are the options I know of:

1) current solution - launch test programs with execute_process inside a .cmake script which is run by cmake -P via add_test in the parent build. This captures I/O to log files, but does complicate execution. In particular, the .cmake script has to know how to launch the executable specific to the current configuration, as well as writing out the log files in the appropriate directories.

2) Directly run the regression executable with add_test, no wrapping .cmake file. This has the virtue of simplicity, but to the best of my knowledge this approach has no way to capture the I/O from the test executable to a file. It is captured by ctest, which I believe does do some organization of its reported outputs, but as far as I can discover it has no sophisticated logging options so we give up per-test logging files.

view this post on Zulip starseeker (May 25 2020 at 18:00):

@Sean upon inspection It looks like the regress-flawfinder target is successfully running, so the failure is flawfinder not liking something

view this post on Zulip starseeker (May 25 2020 at 19:56):

@Sean bioh branch has the rework of the regression testing system. It should be at a point now where it's worth you taking a look and seeing if it addresses your concerns

view this post on Zulip starseeker (May 26 2020 at 02:18):

The system is now the BRLCAD_REGRESSION_TEST function in misc/CMake/BRLCAD_Targets.cmake along with the *.cmake.in files and CMakeLists.txt files in the regress directory and subdirectories.

view this post on Zulip Sean (May 27 2020 at 05:20):

starseeker said:

Sean upon inspection It looks like the regress-flawfinder target is successfully running, so the failure is flawfinder not liking something

Oh cool. I thought flawinder was left at a really low setting that always detected real code smells, but wasn't sure.

view this post on Zulip Sean (May 27 2020 at 05:38):

starseeker said:

Sean bioh branch has the rework of the regression testing system. It should be at a point now where it's worth you taking a look and seeing if it addresses your concerns

K, will take a look!

view this post on Zulip starseeker (May 31 2020 at 01:58):

@Sean Unless something is clearly broken, I'd like to go ahead and merge the regress updates - even if that's not its final form, I think it's an improvement over what was there previously

view this post on Zulip Sean (May 31 2020 at 08:23):

@starseeker the branch looks good to me, all running well. love the unified logging. making it work with/without existing test definitions was a nice addition.


Last updated: Oct 09 2024 at 00:44 UTC