what CI system are you using?
Jenkins at the moment
oh
i use travis for my repo
where is the repo which holds the yml file?
I don't think our Jenkins CI files are checked in to a repository at the moment - they're a setup on the server
ohh
can i see the file or is it on a private server?
i like seeing those CI files and how they are set up
There's not much to see - it's a pretty basic svn checkout, configure and build
i know but still they are cool and so i want to see
see if i can understand Jenkins CI files
Jenkins isn't typically controlled with files - it's managed through a web interface
oh
that's kinda cool
ok then
There is one bash script we have that is run from Jenkins (when we're set up for it): https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/misc/clang-static-analyzer-run.sh
That's a lot more complicated than the normal process - it is used to check the code using the clang compiler's static analysis tool.
The same script could be launched by any CI system, but that is the core of the testing logic
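As a rough sketch of what that might look like from another CI system (the checkout URL and the bare invocation here are my guesses, not our actual Jenkins job config):
# hypothetical CI step: check out the tree and run the analysis script
svn checkout https://svn.code.sf.net/p/brlcad/code/brlcad/trunk brlcad
cd brlcad
sh misc/clang-static-analyzer-run.sh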
i know travis and circleci use config files
but didn't know that Jenkins could be set up using a web interface
https://www.jenkins.io/doc/tutorials/
ok gotta go
my classes are going on
thanks for the information about jenkins
@Sumagna Das We can get you an account set up for our Jenkins. Actually, try requesting one. Not sure what it does or how it notifies. Go to https://ci.brlcad.org/status/ and select "create an account"
hm, my script blockers 'n stuff may've screwed things up, it gives me a single field for the password and complains that the passwords don't match
@Erik what do you mean? you mean if you try to create a jenkins account? note that's not apache... it's all going through a proxy redirect on a different port, talks directly to jenkins.
hey @Sean did you open jenkins recently? if not, i just opened it and it shows these warnings. See if you can fix them
Screenshot-from-2020-08-28-13-53-23.png
Here's an example of the bizarre intermittent failure I'm seeing on the Linux github runners: lemon_build_failure_github_runner.txt
maybe the perplex template copying is involved...
Don't see how though...
It almost looks like the lemon.c.o file hasn't had enough time to fully write out to disk before ld tries to turn it into an executable...
are you sure they're not being run concurrently?
it's not clear from the log, but they could be simultaneously executing
Even if they are, it shouldn't matter - looking closer, lemon is a standalone single C file, and it's the lemon executable that's not getting generated correctly.
There's another failure mode that popped up where the build step completes but then the lemon binary doesn't run.
I wish it was some kind of dependency issue - that at least I might be able to do something with. Lemon is just a straight add_executable call in CMake, about as basic as it gets.
I put in a ticket with the github runner project - maybe they can offer some insight into what might be causing it.
yeah, I see that
only thing maybe coming into play is that lemon.c is fairly big
Grrr. And now CMP (pixcmp) doesn't want to run on the Windows version for some reason...
do you have access to the .o file? can you confirm whether it has _main
No, unfortunately - I've not figured out yet how to capture what the actions are generating.
/me more or less accidentally got Windows benchmark working on his laptop... Will have to try to figure it out again.
maybe add a stage to the CI runner that runs "nm misc/tools/lemon/CMakeFiles/lemon.dir/lemon.c.o"
That's doable, I think - one sec.
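Maybe something like this as the extra step (path copied from the suggestion above; the grep is just to make main's presence or absence obvious):
# hypothetical diagnostic step, run right after the build
nm misc/tools/lemon/CMakeFiles/lemon.dir/lemon.c.o | grep -w main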
there is one oddity in the build flags
it's compiling lemon.c with /usr/bin/gcc -Iinclude/brlcad -w -fPIE -std=gnu99 -MD -MT
those last two look like MSVC flags
hm, apparently they are gcc flags too. just hadn't seen them before.
okay, MF is dependencies (cool), MT is the output target, and MD I don't fully understand but they're all valid preprocessor flags
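For the record, the gcc docs describe them roughly like this (file names below are just illustrative):
# -MD        write a .d dependency file as a side effect of compiling
# -MT <tgt>  set the target name used in the generated dependency rule
# -MF <file> set the file the dependencies get written to
gcc -MD -MT lemon.c.o -MF lemon.c.o.d -c lemon.c -o lemon.c.o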
And now of course it doesn't want to fail...
Only suggestion I have is to add a "sync" command before running make/ninja/whatever, to make sure all the container I/O has settled down before compilation begins
that way at least should rule out it being anything prior to compilation
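Something like this, just as a sketch of the idea (the build command is a placeholder for whatever the workflow actually runs):
# flush pending container I/O before kicking off the build
sync
cmake --build .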
lemon_build_failure_github_runner2.txt
As you say, lemon.o is pretty large - I wonder if we really are getting some kind of "I/O claiming it's complete when it isn't yet" issue...
nah, just looking at that log, I think you probably found a ninja bug
i mean it could be provoked by some container I/O issue, but the logs do indicate there are multiple threads working there (notice the interrupted nm output)
I wouldn't be surprised if there's some race, like ninja is waiting on the output file to exist and then kicking off the linker rule
yet file existence is insufficient
/me nods - could be, but would have thought for sure that kind of mistake on ninja's part would have blown up spectacularly on our normal builds.
except we don't have any builds with IO latency that provoke the bug
the container is almost certainly running on a virtualized, distributed, or maybe even a network filesystem
Hmm. I'll flip back to make and see what we get.
it's undoubtedly the environment setting up the conditions, but could easily be some flawed assumption on ninja's part -- is there a way to control ninja's multithreading?
On the command line yes - I'll have to check with the cmake --build trigger.
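For reference, either of these should limit it to one job (cmake --build grew a -j option in 3.12):
# single-job build through cmake's build driver (CMake 3.12+)
cmake --build . -j 1
# or pass the flag straight through to ninja
cmake --build . -- -j 1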
just the fact that the .o has main afterwards but gcc can't find it during linking means the file exists but is not yet fully written.
that's easily possible with concurrency. it's also possible with FS caching that it's simply not sync'd to disk yet despite the command exiting, but then the only fix would be a sync command prior to the second gcc call.
With both make and single threaded ninja the build order is different, so may be harder to reproduce (or at least a lot slower.)
It'd be interesting to let make run parallel - it's a lot older and already has a lot of 'sync'ing and gating going on.
it's seen some sh!t
I left make in parallel mode
Hrm. This is Windows related, so not what we're seeing, but sounds similar in some ways: https://github.com/ninja-build/ninja/issues/1802
Whoops, this link was what I meant: https://github.com/ninja-build/ninja/issues/1794
Sigh. OK, we can go with default tools for the moment and see if things eventually shake down with ninja.
Now, what's Windows complaining about...
1802 appears to be the issue, and I think the naivety of the response is indicative. they've not encountered anything but "nearly instantaneous" local filesystems, so that totally explains it
I commented on the bug, maybe he'll reopen it
Gah. "test -e /dev/null" is succeeding in benchmark.sh, but when we get to "fopen" in the code it doesn't always work - running from inside msbuild, that fopen call fails even though the same executable will succeed from the msys prompt.
Running from inside benchmark.sh, actually - pixcmp.exe works in isolation, but not from within the script. Grr...
this is on Windows?
Yeah.
what fopen then?
pixcmp.exe
Initial CMP test in the benchmark script.
the sanity check to make sure CMP runs?
yes
so it's running: eval "path/to/pixcmp" /dev/null /dev/null > /dev/null 2>&1
Using [D:/a/cadcitest/cadcitest/build/Release/bin/rt] for RT
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/db] for DB
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/pix] for PIX
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/pix] for LOG
Using [D:/a/cadcitest/cadcitest/build/Release/bin/pixcmp] for CMP
Using [D:/a/cadcitest/cadcitest/build/Release/bin/elapsed.sh] for ELP
CUSTOMBUILD : error : CMP does not seem to work as expected [D:\a\cadcitest\cadcitest\build\check.vcxproj]
Should be, yes (if test -e /dev/null is succeeding within the script)
Seems to be in the debugger on my local laptop
so first question is -- is there a /dev/null
or better, can we confirm what exactly it's running
set +
Um - set +? You mean in benchmark.sh?
test -e /dev/null
echo $?
0
(from the MSYS command prompt)
ls -la /dev/null
Reports there.
crw-rw-rw- 1 user 197612 1, 3 Sep 23 16:10 /dev/null
okay, so it actually has a /dev/null -- that should be fine
so cmp is genuinely failing
Yes - fopen("/dev/null") is failing.
o.O need a perror() on that failure
looks like it should already be perror()ing
what's printed if you run "pixcmp /dev/null /dev/null"
It is calling perror(), but I'm not sure where to check for output in Visual Studio?
the script is sinking the output intentionally
so either need to unsink it, or add another call
i.e., remove the >$NUL 2>&1
then it should be in the log
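i.e. roughly this change in the script (a sketch only - the $CMP and $NUL variable names are my guess at what benchmark.sh uses, based on the eval line above):
# before: pixcmp's perror() output is thrown away
# eval "$CMP" /dev/null /dev/null > $NUL 2>&1
# after: let the error text land in the CI log
eval "$CMP" /dev/null /dev/null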
/dev/null: No such file or directory
(And yes, ls -la /dev/null succeeds from the same command prompt...)
It's almost like when it's launching pixcmp.exe, it's losing the wrapping environment that's providing /dev/null
I wonder...
think I found it, gimmie a sec
try that @starseeker
Nope, still getting /dev/null
you said you tried running "test -e /dev/null" what about trying this:
MSYS_NO_PATHCONV=1 MSYS2_ARG_CONV_EXCL="\*" test -e /dev/null
or more explicitly: MSYS_NO_PATHCONV=1 MSYS2_ARG_CONV_EXCL="/dev/null" test -e /dev/null
echo $?
0
Test still succeeds
MSYS seems to be the uncanny valley of shells, in some ways - almost but not quite...
oh right, because it does exist in the shell environment ... just curiously not in the compiled environment
did that change have any effect?
The committed change? No, didn't seem to
Hmm, the build issues go beyond just ninja - I had forgotten cmake --build defaults to single-threaded make. If I enable the -j flag explicitly, we get a different failure in a libpng cmake script execution.
so then let me try to change the test the other way around.
And that script execution failure mode looks like an execute_process isn't properly completing its I/O before it tries the next script step. Auuuuugh.
So we may be limited to single threaded building on the runners, not make-only building. That's... annoying.
c'est la vie!
applied a different change to bench/run.sh
Same failure, still saying no such file/directory for /dev/null
You might be able to try a test script on a github runner to iterate faster - would you like me to set up an example?
I'll have to add some diagnostics
@Sean - I set up a little test repo that you can fork on github - it will let you edit bin/benchmark, push the changes, and try to run the benchmark script on Windows:
https://github.com/starseeker/wintest
okay, will take a look!
okay, so I'm new to this. forked it, and it told me that there are/were workflows in it but they weren't going to be enabled. so now I don't see how to enable the benchmark.yml one...
couldn't figure it out, so just created a duplicate and that ran
there, that fixed it. when in doubt, rip it out? heh.
Jenkins seems to be unhappy now?
ERROR: CMP does not seem to work as expected
(output was [
Different file sizes found: -(3) and -(72). Cannot perform pixcmp.])
I think you can enable Actions in a forked repo from the Actions tab, but I haven't tried it myself.
fwiw, the failure seems to be specific to bz - my Linux box works fine.
I doubt it's what you want to do long term, but r77207 seems to run - apparently BSD doesn't like the "CMP - -" bit for some reason?
Yeah, looks like Mac has the same issue as BSD.
starseeker said:
I think you can enable Actions in a forked repo from the Actions tab, but I haven't tried it myself.
I did that and then it displayed no workflows, didn't run any when I committed.
Once I added another workflow, it ran both.
interesting on the failure -- I tested it on my Mac. pixcmp parses "-" so it should work unless we gots a bug
interesting. assuming you don't have a '-' file accidentally in that dir, it's basically saying stdin's sizes are different between the two stat calls...
::facepalm:: - methinks Github Actions still have some teething pains...
working on pixcmp
Does running the benchmark script itself work on your mac with the "-" style input? It fails on the runner...
/me is trying to figure out how to save artifacts from the actions.
yep, worked here
it's a timing issue, so not unexpected
that's telling pixcmp to read from stdin for pixel comparisons and for whatever reason, there's 3 bytes the first time it checks and a few more bytes the next time. what's curious is that it's more than 1 byte. need to make pixcmp print out those byte streams.
fwiw, it is failing for me interactively on bz (or did this morning, at any rate)
should just be a nul byte, 1 byte .. so that's also an issue
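if that's right, something like this ought to be a minimal interactive check (just my guess at what the sanity test is effectively doing, not pulled from the script):
# pipe a single NUL byte and have pixcmp read both inputs from stdin via "-"
printf '\0' | pixcmp - -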
ok
fwiw, I'm sure 77207 worked, but I'm intentionally avoiding hitting the disk (and working to remove the few remaining bits that violate it from benchmark). more importantly, my knowledge needs to be improved or pixcmp has a bug if stat'ing stdin twice is wrong.
I know - that's why I mentioned I didn't think you'd want to stay with that solution. I just needed something that would work for the CI testing.
What about pixdiffing one of the reference images with itself? That's a read from the disk but not a write, and we'll be reading from the disk for benchmark anyway...
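Roughly like this (the file name is just a placeholder for one of the reference images, and $CMP/$PIX are the script variables from the log above):
# hypothetical alternative sanity check: compare a reference image against itself
eval "$CMP" "$PIX/sphflake.pix" "$PIX/sphflake.pix"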
don't worry about it... ;)
I'm going to find out what's wrong with - - because that's something that should clearly work from my understanding.
so either my understanding is very wrong or pixcmp has a bug. either way, this involves all of like 4 lines of code, so it's easy to diagnose.
so, no git conversion this week then ;-)
Is it failing for you on BSD at least? It'll be hell to debug if it's only failing in the runners
/me sees another random github build failure on the runner, even in single threaded mode...
Maybe if I add another sync after git clone finishes...
Another new one:
D:\a\cadcitest\cadcitest\src\other\libspsr\Src\SPSR.cpp : fatal error
C1083: Cannot open compiler generated file: 'D:\a\cadcitest\cadcitest\build\src\other\libspsr\SPSR.dir\Release\SPSR.obj': Permission denied
[D:\a\cadcitest\cadcitest\build\src\other\libspsr\SPSR.vcxproj]