what CI system are you using?
Jenkins at the moment
oh
i use travis for my repo
where is the repo which holds the yml file?
I don't think our Jenkins CI files are checked in to a repository at the moment - they're a setup on the server
ohh
can i see the file or is it on a private server?
i like seeing those CI files and how they are set up
There's not much to see - it's a pretty basic svn checkout, configure and build
i know but still they are cool and so i want to see
see if i can understand Jenkins CI files
Jenkins isn't typically controlled with files - it's managed through a web interface
oh
that's kinda cool
ok then
There is one bash script we have that is run from Jenkins (when we're set up for it): https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/misc/clang-static-analyzer-run.sh
That's a lot more complicated than the normal process - it is used to check the code using the clang compiler's static analysis tool.
The same script could be launched by any CI system, but that is the core of the testing logic
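As a rough sketch of what that might look like from another CI system (the checkout URL and the bare invocation here are my guesses, not our actual Jenkins job config):
# hypothetical CI step: check out the tree and run the analysis script
svn checkout https://svn.code.sf.net/p/brlcad/code/brlcad/trunk brlcad
cd brlcad
sh misc/clang-static-analyzer-run.sh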
i know travis and circleci use config files
but didn't know that Jenkins could be set up using a web interface
https://www.jenkins.io/doc/tutorials/
ok gotta go
my classes are going on
thanks for the information about jenkins
@Sumagna Das We can get you an account set up for our Jenkins. Actually, try requesting one. Not sure what it does or how it notifies. Go to https://ci.brlcad.org/status/ and select "create an account"
hm, my script blockers 'n stuff may've screwed things up, it gives me a single field for the password and complains that the passwords don't match
@Erik what do you mean? you mean if you try to create a jenkins account? note that's not apache... it's all going through a proxy redirect on a different port, talks directly to jenkins.
hey @Sean did you open jenkins recently? if not, i just opened it and it shows these warnings. See if you can fix them
Screenshot-from-2020-08-28-13-53-23.png
Here's an example of the bizarre intermittent failure I'm seeing on the Linux github runners: lemon_build_failure_github_runner.txt
maybe the perplex template copying is involved...
Don't see how though...
It almost looks like the lemon.c.o file hasn't had enough time to fully write out to disk before ld tries to turn it into an executable...
are you sure they're not being run concurrently?
it's not clear from the log, but they could be simultaneously executing
Even if they are, it shouldn't matter - looking closer, lemon is a standalone single C file, and it's the lemon executable that's not getting generated correctly.
There's another failure mode that popped up where the build step completes but then the lemon binary doesn't run.
I wish it was some kind of dependency issue - that at least I might be able to do something with. Lemon is just a straight add_executable call in CMake, about as basic as it gets.
I put in a ticket with the github runner project - maybe they can offer some insight into what might be causing it.
yeah, I see that
only thing maybe coming into play is that lemon.c is fairly big
Grrr. And now CMP (pixcmp) doesn't want to run on the Windows version for some reason...
do you have access to the .o file? can you confirm whether it has _main
No, unfortunately - I've not figured out yet how to capture what the actions are generating.
/me more or less accidentally got Windows benchmark working on his laptop... Will have to try to figure it out again.
maybe add a stage to the CI runner that runs "nm misc/tools/lemon/CMakeFiles/lemon.dir/lemon.c.o"
That's doable, I think - one sec.
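Maybe something like this as the extra step (path copied from the suggestion above; the grep is just to make main's presence or absence obvious):
# hypothetical diagnostic step, run right after the build
nm misc/tools/lemon/CMakeFiles/lemon.dir/lemon.c.o | grep -w main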
there is one oddity in the build flags
it's compiling lemon.c with /usr/bin/gcc -Iinclude/brlcad -w -fPIE -std=gnu99 -MD -MT
those last two look like MSVC flags
hm, apparently they are gcc flags too. just hadn't seen them before.
okay, MF is dependencies (cool), MT is the output target, and MD I don't fully understand but they're all valid preprocessor flags
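For the record, the gcc docs describe them roughly like this (file names below are just illustrative):
# -MD        write a .d dependency file as a side effect of compiling
# -MT <tgt>  set the target name used in the generated dependency rule
# -MF <file> set the file the dependencies get written to
gcc -MD -MT lemon.c.o -MF lemon.c.o.d -c lemon.c -o lemon.c.o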
And now of course it doesn't want to fail...
Only suggestion I have is to add a "sync" command before running make/ninja/whatever, to make sure all the container I/O has settled down before compilation begins
that way at least should rule out it being anything prior to compilation
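Something like this, just as a sketch of the idea (the build command is a placeholder for whatever the workflow actually runs):
# flush pending container I/O before kicking off the build
sync
cmake --build .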
lemon_build_failure_github_runner2.txt
As you say, lemon.o is pretty large - I wonder if we really are getting some kind of "I/O claiming it's complete when it isn't yet" issue...
nah, just looking at that log, I think you probably found a ninja bug
i mean it could be provoked by some container I/O issue, but the logs do indicate there are multiple threads working there (notice the interrupted nm output)
I wouldn't be surprised if there's some race, like ninja is waiting on the output file to exist and then kicking off the linker rule
yet file existence is insufficient
/me nods - could be, but would have thought for sure that kind of mistake on ninja's part would have blown up spectacularly on our normal builds.
except we don't have any builds with IO latency that provoke the bug
the container is almost certainly running on a virtualized, distributed, or maybe even a network filesystem
Hmm. I'll flip back to make and see what we get.
it's undoubtedly the environment setting up the conditions, but could easily be some flawed assumption on ninja's part -- is there a way to control ninja's multithreading?
On the command line yes - I'll have to check with the cmake --build trigger.
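For reference, either of these should limit it to one job (cmake --build grew a -j option in 3.12):
# single-job build through cmake's build driver (CMake 3.12+)
cmake --build . -j 1
# or pass the flag straight through to ninja
cmake --build . -- -j 1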
just the fact that the .o has main afterwards but gcc can't find it during linking means the file exists but is not yet fully written.
that's easily possible with concurrency. it's also possible with FS caching that it's simply not sync'd to disk yet despite the command exiting, but then the only fix would be a sync command prior to the second gcc call.
With both make and single threaded ninja the build order is different, so may be harder to reproduce (or at least a lot slower.)
It'd be interesting to let make run parallel - it's a lot older and already has a lot of 'sync'ing and gating going on.
it's seen some sh!t
I left make in parallel mode
Hrm. This is Windows related, so not what we're seeing, but sounds similar in some ways: https://github.com/ninja-build/ninja/issues/1802
Whoops, this link was what I meant: https://github.com/ninja-build/ninja/issues/1794
Sigh. OK, we can go with default tools for the moment and see if things eventually shake down with ninja.
Now, what's Windows complaining about...
1802 appears to be the issue, and I think the naivety of the response is indicative. they've not encountered anything but "nearly instantaneous" local filesystems, so that totally explains it
I commented on the bug, maybe he'll reopen it
Gah. "test -e /dev/null" is succeeding in benchmark.sh, but when we get to "fopen" in the code it doesn't always work - running from inside msbuild, that fopen call fails even though the same executable will succeed from the msys prompt.
Running from inside benchmark.sh, actually - pixcmp.exe works in isolation, but not from within the script. Grr...
this is on Windows?
Yeah.
what fopen then?
pixcmp.exe
Initial CMP test in the benchmark script.
the sanity check to make sure CMP runs?
yes
so it's running: eval "path/to/pixcmp" /dev/null /dev/null > /dev/null 2>&1
Using [D:/a/cadcitest/cadcitest/build/Release/bin/rt] for RT
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/db] for DB
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/pix] for PIX
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/pix] for LOG
Using [D:/a/cadcitest/cadcitest/build/Release/bin/pixcmp] for CMP
Using [D:/a/cadcitest/cadcitest/build/Release/bin/elapsed.sh] for ELP
CUSTOMBUILD : error : CMP does not seem to work as expected [D:\a\cadcitest\cadcitest\build\check.vcxproj]
Should be, yes (if test -e /dev/null is succeeding within the script)
Seems to be in the debugger on my local laptop
so first question is -- is there a /dev/null
or better, can we confirm what exactly it's running
set +
Um - set +? You mean in benchmark.sh?
test -e /dev/null
echo $?
0
(from the MSYS command prompt)
ls -la /dev/null
Reports there.
crw-rw-rw- 1 user 197612 1, 3 Sep 23 16:10 /dev/null
okay, so it actually has a /dev/null -- that should be fine
so cmp is genuinely failing
Yes - fopen("/dev/null") is failing.
o.O need a perror() on that failure
looks like it should already be perror()ing
what's printed if you run "pixcmp /dev/null /dev/null"
It is calling perror(), but I'm not sure where to check for output in Visual Studio?
the script is sinking the output intentionally
so either need to unsink it, or add another call
i.e., remove the >$NUL 2>&1
then it should be in the log
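i.e. roughly this change in the script (a sketch only - the $CMP and $NUL variable names are my guess at what benchmark.sh uses, based on the eval line above):
# before: pixcmp's perror() output is thrown away
# eval "$CMP" /dev/null /dev/null > $NUL 2>&1
# after: let the error text land in the CI log
eval "$CMP" /dev/null /dev/null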
/dev/null: No such file or directory
(And yes, ls -la /dev/null succeeds from the same command prompt...)
It's almost like when it's launching pixcmp.exe, it's losing the wrapping environment that's providing /dev/null
I wonder...
think I found it, gimmie a sec
try that @starseeker
Nope, still getting /dev/null
you said you tried running "test -e /dev/null" what about trying this:
MSYS_NO_PATHCONV=1 MSYS2_ARG_CONV_EXCL="\*" test -e /dev/null
or more explicitly: MSYS_NO_PATHCONV=1 MSYS2_ARG_CONV_EXCL="/dev/null" test -e /dev/null
echo $?
0
Test still succeeds
MSYS seems to be the uncanny valley of shells, in some ways - almost but not quite...
oh right, because it does exist in the shell environment ... just curiously not in the compiled environment
did that change have any effect?
The committed change? No, didn't seem to
Hmm, the build issues go beyond just ninja - I had forgotten cmake --build defaults to single-threaded make. If I enable the -j flag explicitly, we get a different failure in a libpng cmake script execution.
so then let me try to change the test the other way around.
And that script execution failure mode looks like an execute_process isn't properly completing its I/O before it tries the next script step. Auuuuugh.
So we may be limited to single threaded building on the runners, not make-only building. That's... annoying.
c'est la vie!
applied a different change to bench/run.sh
Same failure, still saying no such file/directory for /dev/null
You might be able to try a test script on a github runner to iterate faster - would you like me to set up an example?
I'll have to add some diagnostics
@Sean - I set up a little test repo that you can fork on github - it will let you edit bin/benchmark, push the changes, and try to run the benchmark script on Windows:
https://github.com/starseeker/wintest
okay, will take a look!
okay, so I'm new to this. forked it, and it told me that there are/were workflows in it but they weren't going to be enabled. so now I don't see how to enable the benchmark.yml one...
couldn't figure it out, so just created a duplicate and that ran
there, that fixed it. when in doubt, rip it out? heh.
Jenkins seems to be unhappy now?
ERROR: CMP does not seem to work as expected
(output was [
Different file sizes found: -(3) and -(72). Cannot perform pixcmp.])
I think you can enable Actions in a forked repo from the Actions tab, but I haven't tried it myself.
fwiw, the failure seems to be specific to bz - my Linux box works fine.
I doubt it's what you want to do long term, but r77207 seems to run - apparently BSD doesn't like the "CMP - -" bit for some reason?
Yeah, looks like Mac has the same issue as BSD.
starseeker said:
I think you can enable Actions in a forked repo from the Actions tab, but I haven't tried it myself.
I did that and then it displayed no workflows, didn't run any when I committed.
Once I added another workflow, it ran both.
interesting on the failure -- I tested it on my Mac. pixcmp parses "-" so it should work unless we gots a bug
interesting. assuming you don't have a '-' file accidentally in that dir, it's basically saying stdin's sizes are different between the two stat calls...
::facepalm:: - methinks Github Actions still have some teething pains...
working on pixcmp
Does running the benchmark script itself work on your mac with the "-" style input? It fails on the runner...
/me is trying to figure out how to save artifacts from the actions.
yep, worked here
it's a timing issue, so not unexpected
that's telling pixcmp to read from stdin for pixel comparisons and for whatever reason, there's 3 bytes the first time it checks and a few more bytes the next time. what's curious is that it's more than 1 byte. need to make pixcmp print out those byte streams.
fwiw, it is failing for me interactively on bz (or did this morning, at any rate)
should just be a nul byte, 1 byte .. so that's also an issue
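if that's right, something like this ought to be a minimal interactive check (just my guess at what the sanity test is effectively doing, not pulled from the script):
# pipe a single NUL byte and have pixcmp read both inputs from stdin via "-"
printf '\0' | pixcmp - -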
ok
fwiw, I'm sure 77207 worked, but I'm intentionally avoiding hitting the disk (and working to remove the few remaining bits that violate it from benchmark). more importantly, my knowledge needs to be improved or pixcmp has a bug if stat'ing stdin twice is wrong.
I know - that's why I mentioned I didn't think you'd want to stay with that solution. I just needed something that would work for the CI testing.
What about pixdiffing one of the reference images with itself? That's a read from the disk but not a write, and we'll be reading from the disk for benchmark anyway...
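Roughly like this (the file name is just a placeholder for one of the reference images, and $CMP/$PIX are the script variables from the log above):
# hypothetical alternative sanity check: compare a reference image against itself
eval "$CMP" "$PIX/sphflake.pix" "$PIX/sphflake.pix"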
don't worry about it... ;)
I'm going to find out what's wrong with - - because that's something that should clearly work from my understanding.
so either my understanding is very wrong or pixcmp has a bug. either way, this involves all of like 4 lines of code, so it's easy to diagnose.
so, no git conversion this week then ;-)
Is it failing for you on BSD at least? It'll be hell to debug if it's only failing in the runners
/me sees another random github build failure on the runner, even in single threaded mode...
Maybe if I add another sync after git clone finishes...
Another new one:
D:\a\cadcitest\cadcitest\src\other\libspsr\Src\SPSR.cpp : fatal error
C1083: Cannot open compiler generated file: 'D:\a\cadcitest\cadcitest\build\src\other\libspsr\SPSR.dir\Release\SPSR.obj': Permission denied
[D:\a\cadcitest\cadcitest\build\src\other\libspsr\SPSR.vcxproj]