Stream: brlcad

Topic: Continuous integration


view this post on Zulip Sumagna Das (Jun 13 2020 at 03:18):

what CI system are you using?

view this post on Zulip starseeker (Jun 13 2020 at 03:19):

Jenkins at the moment

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:19):

oh

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:19):

i use travis for my repo

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:20):

where is the repo which holds the yml file?

view this post on Zulip starseeker (Jun 13 2020 at 03:21):

I don't think our Jenkins CI files are checked in to a repository at the moment - they're a setup on the server

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:21):

ohh

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:22):

can i see the file or is it on a private server?

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:22):

i like seeing those CI files and how they are set up

view this post on Zulip starseeker (Jun 13 2020 at 03:22):

There's not much to see - it's a pretty basic svn checkout, configure and build

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:23):

i know but still they are cool and so i want to see

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:23):

see if i can understand Jenkins CI files

view this post on Zulip starseeker (Jun 13 2020 at 03:24):

Jenkins isn't typically controlled with files - it's managed through a web interface

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:24):

oh

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:25):

thats kinda cool

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:25):

ok then

view this post on Zulip starseeker (Jun 13 2020 at 03:25):

There is one bash script we have that is run from Jenkins (when we're set up for it): https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/misc/clang-static-analyzer-run.sh

view this post on Zulip starseeker (Jun 13 2020 at 03:26):

That's a lot more complicated than the normal process - it is used to check the code using the clang compiler's static analysis tool.

view this post on Zulip starseeker (Jun 13 2020 at 03:26):

The same script could be launched by any CI system, but that is the core of the testing logic

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:27):

i know travis and circleci use config files

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:27):

but didn't know that jenkins could be set up using a web interface

view this post on Zulip starseeker (Jun 13 2020 at 03:28):

https://www.jenkins.io/doc/tutorials/

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:28):

ok gotta go

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:29):

my classes are going on

view this post on Zulip Sumagna Das (Jun 13 2020 at 03:29):

thanks for the information about jenkins

view this post on Zulip Sean (Jun 15 2020 at 14:52):

@Sumagna Das We can get you an account set up for our Jenkins. Actually try requesting one. Not sure what it does, how it notifies. Go to https://ci.brlcad.org/status/ and select "create an account"

view this post on Zulip Erik (Jun 15 2020 at 22:38):

hm, my script blockers 'n stuff may've screwed things up, it gives me a single field for the password and complains that the passwords don't match

view this post on Zulip Sean (Jun 25 2020 at 12:54):

@Erik what do you mean? you mean if you try to create a jenkins account? note that's not apache... it's all going through a proxy redirect on a different port, talks directly to jenkins.

view this post on Zulip Sumagna Das (Aug 28 2020 at 08:24):

hey @Sean did you open jenkins recently? if not, i just opened it and it shows these warnings. See if you can fix them
Screenshot-from-2020-08-28-13-53-23.png

view this post on Zulip starseeker (Sep 23 2020 at 12:57):

Here's an example of the bizarre intermittent failure I'm seeing on the Linux github runners: lemon_build_failure_github_runner.txt

view this post on Zulip starseeker (Sep 23 2020 at 12:58):

maybe the perplex template copying is involved...

view this post on Zulip starseeker (Sep 23 2020 at 13:02):

Don't see how though...

view this post on Zulip starseeker (Sep 23 2020 at 13:03):

It almost looks like the lemon.c.o file hasn't had enough time to fully write out to disk before ld tries to turn it into an executable...

view this post on Zulip Sean (Sep 23 2020 at 15:40):

are you sure they're not being run concurrently?

view this post on Zulip Sean (Sep 23 2020 at 15:40):

it's not clear from the log, but they could be simultaneously executing

view this post on Zulip starseeker (Sep 23 2020 at 16:01):

Even if they are it shouldn't matter, looking closer - lemon is a stand alone single C file, and it's the lemon executable that's not getting generated correctly.

view this post on Zulip starseeker (Sep 23 2020 at 16:01):

There's another failure mode that popped up where the build step completes but then the lemon binary doesn't run.

view this post on Zulip starseeker (Sep 23 2020 at 16:06):

I wish it was some kind of dependency issue - that at least I might be able to do something with. Lemon is just a straight add_executable call in CMake, about as basic as it gets.

I put in a ticket with the github runner project - maybe they can offer some insight into what might be causing it.

view this post on Zulip Sean (Sep 23 2020 at 16:10):

yeah, I see that

view this post on Zulip Sean (Sep 23 2020 at 16:11):

only thing maybe coming into play is lemon.c is fairly big

view this post on Zulip starseeker (Sep 23 2020 at 16:11):

Grrr. And now CMP (pixcmp) doesn't want to run on the Windows version for some reason...

view this post on Zulip Sean (Sep 23 2020 at 16:11):

do you have access to the .o file, can you confirm whether it has _main

view this post on Zulip starseeker (Sep 23 2020 at 16:11):

No, unfortunately - I've not figured out yet how to capture what the actions are generating.

view this post on Zulip starseeker (Sep 23 2020 at 16:12):

/me more or less accidentally got Windows benchmark working on his laptop... Will have to try to figure it out again.

view this post on Zulip Sean (Sep 23 2020 at 16:13):

maybe add a stage to the CI runner that runs "nm misc/tools/lemon/CMakeFiles/lemon.dir/lemon.c.o"
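
A minimal sketch of the check suggested above, assuming a POSIX shell and that `nm` is available on the runner. The helper name `defines_main` is hypothetical. It reads `nm` output on stdin and succeeds only if `main` (or `_main`, as some platforms spell it) is listed as a defined text symbol.

```sh
#!/bin/sh
# Hypothetical helper: reads `nm` output on stdin and succeeds only if
# main (or _main) is listed as a defined text-section symbol ("T").
defines_main() {
    grep -Eq '[[:space:]]T[[:space:]]+_?main$'
}

# Example use in a CI step (object path taken from the suggestion above):
#   nm misc/tools/lemon/CMakeFiles/lemon.dir/lemon.c.o | defines_main \
#       || echo "lemon.c.o does not define main (yet?)" >&2
```

Running this immediately before the link step, and again after a failure, would show whether the symbol is missing at link time or only appears later.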

view this post on Zulip starseeker (Sep 23 2020 at 16:13):

That's doable, I think - one sec.

view this post on Zulip Sean (Sep 23 2020 at 16:13):

there is one oddity in the build flags

view this post on Zulip Sean (Sep 23 2020 at 16:13):

it's compiling lemon.c with /usr/bin/gcc -Iinclude/brlcad -w -fPIE -std=gnu99 -MD -MT

view this post on Zulip Sean (Sep 23 2020 at 16:13):

those last two look like MSVC flags

view this post on Zulip Sean (Sep 23 2020 at 16:17):

hm, apparently they are gcc flags too. just hadn't seen them before.

view this post on Zulip Sean (Sep 23 2020 at 16:20):

okay, MF is dependencies (cool), MT is the output target, and MD I don't fully understand but they're all valid preprocessor flags
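
For reference, a small sketch of what these flags do, assuming `gcc` is installed. The file names are made up for illustration. In GCC, `-MD` writes a dependency file as a side effect of a normal compile (it does not imply `-E`), `-MT` sets the target name used in the generated make rule, and `-MF` names the file the rule is written to.

```sh
#!/bin/sh
# Sketch: compile a throwaway C file with -MD -MT -MF and print the
# dependency rule GCC writes as a side effect. All file names are made up.
if command -v gcc >/dev/null 2>&1; then
    work=$(mktemp -d)
    printf 'int main(void) { return 0; }\n' > "$work/hello.c"
    # -MD: emit dependencies while compiling normally
    # -MT: target name to use in the generated rule
    # -MF: file that receives the rule
    gcc -MD -MT hello.o -MF "$work/hello.d" -c "$work/hello.c" -o "$work/hello.o"
    cat "$work/hello.d"
else
    echo "gcc not found; skipping" >&2
fi
```

The printed rule starts with `hello.o:` followed by the source file and any headers it pulled in, which is what CMake-generated build files use to track header dependencies.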

view this post on Zulip starseeker (Sep 23 2020 at 16:25):

And now of course it doesn't want to fail...

view this post on Zulip Sean (Sep 23 2020 at 16:28):

Only suggestion I have is to add a "sync" command before running make/ninja/whatever, to make sure all the container I/O has settled down before compilation begins

view this post on Zulip Sean (Sep 23 2020 at 16:28):

that way at least should rule out it being anything prior to compilation
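
A minimal sketch of that suggestion, assuming a POSIX shell. The wrapper name `run_after_sync` is hypothetical. Note that `sync` only asks the kernel to flush pending writes, so this rules out local buffering but may not help if the runner's storage layer is the source of the delay.

```sh
#!/bin/sh
# Hypothetical wrapper: flush pending filesystem writes, then run the
# given command and pass its exit status through unchanged.
run_after_sync() {
    sync
    "$@"
}

# Example use in a CI step:
#   run_after_sync cmake --build build
```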

view this post on Zulip starseeker (Sep 23 2020 at 16:28):

lemon_build_failure_github_runner2.txt

view this post on Zulip starseeker (Sep 23 2020 at 16:30):

As you say, lemon.o is pretty large - I wonder if we really are getting some kind of "I/O claiming it's complete when it isn't yet" issue...

view this post on Zulip Sean (Sep 23 2020 at 16:31):

nah, just looking at that log, I think you probably found a ninja bug

view this post on Zulip Sean (Sep 23 2020 at 16:32):

i mean it could be provoked by some container I/O issue, but the logs do indicate there are multiple threads working there (notice the interrupted nm output)

view this post on Zulip Sean (Sep 23 2020 at 16:33):

I wouldn't be surprised if there's some race, like ninja is waiting on the output file to exist and then kicking off the linker rule

view this post on Zulip Sean (Sep 23 2020 at 16:33):

yet file existence is insufficient

view this post on Zulip starseeker (Sep 23 2020 at 16:35):

/me nods - could be, but would have thought for sure that kind of mistake on ninja's part would have blown up spectacularly on our normal builds.

view this post on Zulip Sean (Sep 23 2020 at 16:36):

except we don't have any builds with IO latency that provoke the bug

view this post on Zulip Sean (Sep 23 2020 at 16:37):

the container is almost certainly running on a virtualized, distributed, or maybe even a network filesystem

view this post on Zulip starseeker (Sep 23 2020 at 16:37):

Hmm. I'll flip back to make and see what we get.

view this post on Zulip Sean (Sep 23 2020 at 16:39):

it's undoubtedly the environment setting up the conditions, but could easily be some flawed assumption on ninja's part -- is there a way to control ninja's multithreading?

view this post on Zulip starseeker (Sep 23 2020 at 16:39):

On the command line yes - I'll have to check with the cmake --build trigger.
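
A sketch of how the job count could be pinned when building through CMake, assuming CMake 3.12 or later (where `cmake --build` accepts `--parallel N`). The helper name and the `DRY_RUN` variable are hypothetical. When calling ninja directly the equivalent is `ninja -j N`, and with older CMake versions native tool options can be passed after `--`.

```sh
#!/bin/sh
# Hypothetical helper: build the given directory with a fixed job count.
# With DRY_RUN set, only print the command that would be run.
build_with_jobs() {
    dir=$1
    jobs=$2
    if [ -n "${DRY_RUN:-}" ]; then
        echo "cmake --build $dir --parallel $jobs"
    else
        cmake --build "$dir" --parallel "$jobs"
    fi
}

# Example: force a single-threaded build of ./build
#   build_with_jobs build 1
```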

view this post on Zulip Sean (Sep 23 2020 at 16:40):

just by the fact that the .o has main afterwards but gcc can't find it during linking means the file exists but is not yet written.

view this post on Zulip Sean (Sep 23 2020 at 16:42):

that's easily possible with concurrency. it's also possible with FS caching that it's simply not sync'd to disk yet despite the command exiting, but then the only fix would be a sync command prior to the second gcc call.

view this post on Zulip starseeker (Sep 23 2020 at 17:06):

With both make and single threaded ninja the build order is different, so may be harder to reproduce (or at least a lot slower.)

view this post on Zulip Sean (Sep 23 2020 at 17:11):

It'd be interesting to let make run parallel - it's a lot older and already has a lot of 'sync'ing and gating going on.

view this post on Zulip Sean (Sep 23 2020 at 17:12):

it's seen some sh!t

view this post on Zulip starseeker (Sep 23 2020 at 17:14):

I left make in parallel mode

view this post on Zulip starseeker (Sep 23 2020 at 17:21):

Hrm. This is Windows related, so not what we're seeing, but sounds similar in some ways: https://github.com/ninja-build/ninja/issues/1802

view this post on Zulip starseeker (Sep 23 2020 at 17:40):

Whoops, this link was what I meant: https://github.com/ninja-build/ninja/issues/1794

view this post on Zulip starseeker (Sep 23 2020 at 18:01):

Sigh. OK, we can go with default tools for the moment and see if things eventually shake down with ninja.

view this post on Zulip starseeker (Sep 23 2020 at 18:14):

Now, what's Windows complaining about...

view this post on Zulip Sean (Sep 23 2020 at 19:39):

1802 appears to be the issue, and I think the naivety of the response is indicative. they've not encountered anything but "nearly instantaneous" local filesystems, so that totally explains it

view this post on Zulip Sean (Sep 23 2020 at 19:48):

I commented on the bug, maybe he'll reopen it

view this post on Zulip starseeker (Sep 23 2020 at 19:56):

Gah. "test -e /dev/null" is succeeding in benchmark.sh, but when we get to "fopen" in the code it doesn't always work - running from inside msbuild, that fopen call fails even though the same executable will succeed from the msys prompt.

view this post on Zulip starseeker (Sep 23 2020 at 20:01):

Running from inside benchmark.sh, actually - pixcmp.exe works in isolation, but not from within the script. Grr...

view this post on Zulip Sean (Sep 23 2020 at 20:02):

this is on Windows?

view this post on Zulip starseeker (Sep 23 2020 at 20:02):

Yeah.

view this post on Zulip Sean (Sep 23 2020 at 20:02):

what fopen then?

view this post on Zulip starseeker (Sep 23 2020 at 20:02):

pixcmp.exe

view this post on Zulip starseeker (Sep 23 2020 at 20:02):

Initial CMP test in the benchmark script.

view this post on Zulip Sean (Sep 23 2020 at 20:04):

the sanity check to make sure CMP runs?

view this post on Zulip starseeker (Sep 23 2020 at 20:04):

yes

view this post on Zulip Sean (Sep 23 2020 at 20:05):

so it's running: eval "path/to/pixcmp" /dev/null /dev/null > /dev/null 2>&1

view this post on Zulip starseeker (Sep 23 2020 at 20:05):

Using [D:/a/cadcitest/cadcitest/build/Release/bin/rt] for RT
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/db] for DB
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/pix] for PIX
Using [D:/a/cadcitest/cadcitest/build/Release/bin/../share/pix] for LOG
Using [D:/a/cadcitest/cadcitest/build/Release/bin/pixcmp] for CMP
Using [D:/a/cadcitest/cadcitest/build/Release/bin/elapsed.sh] for ELP

CUSTOMBUILD : error : CMP does not seem to work as expected [D:\a\cadcitest\cadcitest\build\check.vcxproj]

view this post on Zulip starseeker (Sep 23 2020 at 20:06):

Should be, yes (if test -e /dev/null is succeeding within the script)

view this post on Zulip starseeker (Sep 23 2020 at 20:06):

Seems to be in the debugger on my local laptop

view this post on Zulip Sean (Sep 23 2020 at 20:06):

so first question is -- is there a /dev/null

view this post on Zulip Sean (Sep 23 2020 at 20:07):

or better, can we confirm what exactly it's running

view this post on Zulip Sean (Sep 23 2020 at 20:07):

set +

view this post on Zulip starseeker (Sep 23 2020 at 20:07):

Um - set +? You mean in benchmark.sh?

view this post on Zulip starseeker (Sep 23 2020 at 20:08):

test -e /dev/null
echo $?
0
(from the MSYS command prompt)

view this post on Zulip Sean (Sep 23 2020 at 20:09):

ls -la /dev/null

view this post on Zulip starseeker (Sep 23 2020 at 20:10):

Reports there.

view this post on Zulip starseeker (Sep 23 2020 at 20:11):

crw-rw-rw- 1 user 197612 1, 3 Sep 23 16:10 /dev/null

view this post on Zulip Sean (Sep 23 2020 at 20:12):

okay, so it actually has a /dev/null -- that should be fine

view this post on Zulip Sean (Sep 23 2020 at 20:12):

so cmp is genuinely failing

view this post on Zulip starseeker (Sep 23 2020 at 20:12):

Yes - fopen("/dev/null") is failing.

view this post on Zulip Sean (Sep 23 2020 at 20:13):

o.O need a perror() on that failure

view this post on Zulip Sean (Sep 23 2020 at 20:13):

looks like it should already be perror()ing

view this post on Zulip Sean (Sep 23 2020 at 20:14):

what's printed if you run "pixcmp /dev/null /dev/null"

view this post on Zulip starseeker (Sep 23 2020 at 20:18):

It is perror() calling, but I'm not sure where to check for output in visual studio?

view this post on Zulip Sean (Sep 23 2020 at 20:19):

the script is sinking the output intentionally

view this post on Zulip Sean (Sep 23 2020 at 20:19):

so either need to unsink it, or add another call

view this post on Zulip Sean (Sep 23 2020 at 20:19):

i.e., remove the >$NUL 2>&1

view this post on Zulip Sean (Sep 23 2020 at 20:19):

then it should be in the log
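
One way to "unsink" the output without making every successful run noisy is to capture it and print it only on failure. This is a sketch under that assumption, not the actual benchmark code. The function name `check_cmp` is hypothetical, and the tool to test is passed as its first argument.

```sh
#!/bin/sh
# Hypothetical variant of the sanity check: keep the tool's output and
# show it only when the check fails, instead of discarding it.
check_cmp() {
    # $1 is the comparison tool to test (pixcmp in the real script)
    if out=$("$1" /dev/null /dev/null 2>&1); then
        return 0
    fi
    echo "ERROR: $1 does not seem to work as expected" >&2
    echo "(output was [$out])" >&2
    return 1
}

# Example:
#   check_cmp path/to/pixcmp || exit 1
```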

view this post on Zulip starseeker (Sep 23 2020 at 20:21):

/dev/null: No such file or directory

view this post on Zulip starseeker (Sep 23 2020 at 20:22):

(And yes, ls -la /dev/null succeeds from the same command prompt...)

view this post on Zulip starseeker (Sep 23 2020 at 20:22):

It's almost like when it's launching pixcmp.exe, it's losing the wrapping environment that's providing /dev/null

view this post on Zulip Sean (Sep 23 2020 at 20:23):

I wonder...

view this post on Zulip Sean (Sep 23 2020 at 20:27):

think I found it, gimmie a sec

view this post on Zulip Sean (Sep 23 2020 at 20:32):

try that @starseeker

view this post on Zulip starseeker (Sep 23 2020 at 20:37):

Nope, still getting /dev/null

view this post on Zulip Sean (Sep 23 2020 at 20:39):

you said you tried running "test -e /dev/null" what about trying this:
MSYS_NO_PATHCONV=1 MSYS2_ARG_CONV_EXCL="*" test -e /dev/null

view this post on Zulip Sean (Sep 23 2020 at 20:40):

or more explicitly: MSYS_NO_PATHCONV=1 MSYS2_ARG_CONV_EXCL="/dev/null" test -e /dev/null

view this post on Zulip starseeker (Sep 23 2020 at 20:43):

echo $?
0

view this post on Zulip starseeker (Sep 23 2020 at 20:43):

Test still succeeds

view this post on Zulip starseeker (Sep 23 2020 at 20:51):

MSYS seems to be the uncanny valley of shells, in some ways - almost but not quite...

view this post on Zulip Sean (Sep 23 2020 at 21:02):

oh right, because it does exist in the shell environment ... just curiously not in the compiled environment
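
A sketch of one possible workaround, assuming the program being launched is a native Windows executable: `/dev/null` is provided by the MSYS runtime, while native Windows programs know the null device as `NUL`. The helper name `null_device_for` is hypothetical, and this is not necessarily the change that was committed.

```sh
#!/bin/sh
# Hypothetical helper: choose a null-device name based on the kernel
# name reported by `uname -s`.
null_device_for() {
    case "$1" in
        MINGW*|MSYS*) echo "NUL" ;;        # MSYS/MinGW shells on Windows
        *)            echo "/dev/null" ;;  # everything else
    esac
}

NUL=$(null_device_for "$(uname -s)")
# Example: "$CMP" "$NUL" "$NUL"
```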

view this post on Zulip Sean (Sep 23 2020 at 21:02):

did that change have any effect?

view this post on Zulip starseeker (Sep 23 2020 at 21:08):

The committed change? No, didn't seem to

view this post on Zulip starseeker (Sep 23 2020 at 21:09):

Hmm, the build issues go beyond just ninja - I had forgotten cmake --build defaults to single threaded make. If I enable the -j flag explicitly, we get a different failure in a libpng cmake script execution.

view this post on Zulip Sean (Sep 23 2020 at 21:10):

so then let me try to change the test the other way around.

view this post on Zulip starseeker (Sep 23 2020 at 21:19):

And that script execution failure mode looks like an execute_process isn't properly completing its I/O before it tries the next script step. Auuuuugh.

view this post on Zulip starseeker (Sep 23 2020 at 21:20):

So we may be limited to single threaded building on the runners, not make-only building. That's... annoying.

view this post on Zulip Sean (Sep 23 2020 at 21:23):

c'est la vie!

view this post on Zulip Sean (Sep 23 2020 at 21:32):

applied a different change to bench/run.sh

view this post on Zulip starseeker (Sep 23 2020 at 21:35):

Same failure, still saying no such file/directory for /dev/null

view this post on Zulip starseeker (Sep 23 2020 at 21:36):

You might be able to try a test script on a github runner to iterate faster - would you like me to set up an example?

view this post on Zulip Sean (Sep 23 2020 at 21:36):

I'll have to add some diagnostics

view this post on Zulip starseeker (Sep 23 2020 at 22:27):

@Sean - I set up a little test repo that you can fork on github - it will let you edit bin/benchmark, push the changes, and try to run the benchmark script on Windows:
https://github.com/starseeker/wintest

view this post on Zulip Sean (Sep 24 2020 at 04:16):

okay, will take a look!

view this post on Zulip Sean (Sep 24 2020 at 04:25):

okay, so new to this. forked it, and it told me that there are/were workflows in it but they weren't going to be enabled. so now I don't see how to enable the benchmark.yml one...

view this post on Zulip Sean (Sep 24 2020 at 04:39):

couldn't figure it out, so just created a duplicate and that ran

view this post on Zulip Sean (Sep 24 2020 at 06:47):

there, that fixed it. when in doubt, rip it out? heh.

view this post on Zulip starseeker (Sep 24 2020 at 11:18):

Jenkins seems to be unhappy now?

ERROR: CMP does not seem to work as expected
(output was [
Different file sizes found: -(3) and -(72). Cannot perform pixcmp.])

view this post on Zulip starseeker (Sep 24 2020 at 11:21):

I think you can enable Actions in a forked repo from the Actions tab, but I haven't tried it myself.

view this post on Zulip starseeker (Sep 24 2020 at 11:35):

fwiw, the failure seems to be specific to bz - my Linux box works fine.

view this post on Zulip starseeker (Sep 24 2020 at 11:45):

I doubt it's what you want to do long term, but r77207 seems to run - apparently BSD doesn't like the "CMP - -" bit for some reason?

view this post on Zulip starseeker (Sep 24 2020 at 12:22):

Yeah, looks like Mac has the same issue as BSD.

view this post on Zulip Sean (Sep 24 2020 at 13:45):

starseeker said:

I think you can enable Actions in a forked repo from the Actions tab, but I haven't tried it myself.

I did that and then it displayed no workflows, didn't run any when I committed.

view this post on Zulip Sean (Sep 24 2020 at 13:46):

Once I added another workflow, it ran both.

view this post on Zulip Sean (Sep 24 2020 at 13:59):

interesting on the failure -- I tested it on my Mac. pixcmp parses "-" so it should work unless we gots a bug

view this post on Zulip Sean (Sep 24 2020 at 14:01):

interesting. assuming you don't have a '-' file accidentally in that dir, it's basically saying stdin's sizes are different between the two stat calls...

view this post on Zulip starseeker (Sep 24 2020 at 14:17):

::facepalm:: - methinks Github Actions still have some teething pains...

view this post on Zulip Sean (Sep 24 2020 at 14:17):

working on pixcmp

view this post on Zulip starseeker (Sep 24 2020 at 14:17):

Does running the benchmark script itself work on your mac with the "-" style input? It fails on the runner...

view this post on Zulip starseeker (Sep 24 2020 at 14:18):

/me is trying to figure out how to save artifacts from the actions.

view this post on Zulip Sean (Sep 24 2020 at 14:19):

yep, worked here

view this post on Zulip Sean (Sep 24 2020 at 14:19):

it's a timing issue, so not unexpected

view this post on Zulip Sean (Sep 24 2020 at 14:21):

that's telling pixcmp to read from stdin for pixel comparisons and for whatever reason, there's 3 bytes the first time it checks and a few more bytes the next time. what's curious is that it's more than 1 byte. need to make pixcmp print out those byte streams.
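
One thing worth checking when debugging this is what stdin actually is in each environment. POSIX only defines `st_size` for regular files, symbolic links, and shared or typed memory objects, so a size reported for a pipe or terminal may not be meaningful. Whether that explains the failure above is not established in this thread. The sketch below assumes a shell where `/dev/stdin` can be tested (bash special-cases it, and Linux provides it), and the function name `stdin_kind` is hypothetical.

```sh
#!/bin/sh
# Hypothetical diagnostic: report what kind of file stdin currently is.
stdin_kind() {
    if [ -f /dev/stdin ]; then
        echo regular
    elif [ -p /dev/stdin ]; then
        echo pipe
    elif [ -c /dev/stdin ]; then
        echo chardev
    else
        echo other
    fi
}

# Example: print the kind just before the tool that reads stdin runs
#   echo "stdin is: $(stdin_kind)" >&2
```

Logging this just before the `CMP - -` call on the Jenkins host, the runner, and a working desktop would show whether the environments differ in what they hand the tool as stdin.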

view this post on Zulip starseeker (Sep 24 2020 at 14:22):

fwiw, it is failing for me interactively on bz (or did this morning, at any rate)

view this post on Zulip Sean (Sep 24 2020 at 14:22):

should just be a nul byte, 1 byte .. so that's also an issue

view this post on Zulip Sean (Sep 24 2020 at 14:22):

ok

view this post on Zulip Sean (Sep 24 2020 at 17:00):

fwiw, I'm sure 77207 worked, but I'm intentionally avoiding hitting the disk (and working to remove the few remaining bits that violate it from benchmark). more importantly, my knowledge needs to be improved or pixcmp has a bug if stat'ing stdin twice is wrong.

view this post on Zulip starseeker (Sep 24 2020 at 18:02):

I know - that's why I mentioned I didn't think you'd want to stay with that solution. I just needed something that would work for the CI testing.

view this post on Zulip starseeker (Sep 24 2020 at 18:02):

What about pixdiffing one of the reference images with itself? That's a read from the disk but not a write, and we'll be reading from the disk for benchmark anyway...

view this post on Zulip Sean (Sep 24 2020 at 18:03):

don't worry about it... ;)

view this post on Zulip Sean (Sep 24 2020 at 18:03):

I'm going to find out what's wrong with - - because that's something that should clearly work from my understanding.

view this post on Zulip Sean (Sep 24 2020 at 18:04):

so either my understanding is very wrong or pixcmp has a bug. either way, this involves all of like 4 lines of code, so it's easy to diagnose.

view this post on Zulip starseeker (Sep 24 2020 at 18:05):

so, no git conversion this week then ;-)

view this post on Zulip starseeker (Sep 24 2020 at 18:05):

Is it failing for you on BSD at least? It'll be hell to debug if it's only failing in the runners

view this post on Zulip starseeker (Sep 24 2020 at 18:06):

/me sees another random github build failure on the runner, even in single threaded mode...

view this post on Zulip starseeker (Sep 24 2020 at 18:12):

Maybe if I add another sync after git clone finishes...

view this post on Zulip starseeker (Sep 24 2020 at 19:10):

Another new one:

D:\a\cadcitest\cadcitest\src\other\libspsr\Src\SPSR.cpp : fatal error C1083: Cannot open compiler generated file: 'D:\a\cadcitest\cadcitest\build\src\other\libspsr\SPSR.dir\Release\SPSR.obj': Permission denied [D:\a\cadcitest\cadcitest\build\src\other\libspsr\SPSR.vcxproj]

Last updated: Oct 09 2024 at 00:44 UTC