Interesting, some preliminary results coming in from testing main vs 7.36.0 on Mac.
Overall, main is definitely succeeding more. There are some that time out after 5min and some that throw mtx errors and "SHOULD NOT HAPPEN" errors for both old and new at a seemingly similar rate.
Looks like conversion/facetize is also taking about 2x longer with the new approach. Most simple objects that succeed seem to do so in about 3s, whereas they average 1.5s in 7.36.0. Running the conversion across lots of files, that of course adds up, but it's then partially offset by slightly more objects in the old version hitting the 5min timeout limit (I think - still have to verify).
For the fully successful run, I had MAXTIME=5000 as my upper limit - 300s was definitely much too short, so the timeouts are no surprise.
I noticed the simple cases being slower too - I think, at least in my case, what appeared to be happening was that the overhead of starting up the subprocess to process a single primitive was adding the extra time. That would explain the 1.5s vs 3s difference - two process startups/teardowns vs. one for 7.36.0 - since the actual primitive facetize in such cases should be virtually instantaneous.
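A quick way to sanity-check that theory would be to time bare subprocess round trips and compare against the observed delta. Rough, untested sketch - the "true" child and the loop count of 2 are my stand-ins, not anything from the tree; point it at whatever the actual per-primitive worker command is:

```c
/* Untested sketch: time two bare fork/exec/wait round trips to see how much
 * of the extra ~1.5s is plain process startup/teardown.  "true" is just a
 * stand-in child; substitute the real per-primitive worker command to
 * measure the actual cost, including its library/database init. */
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    for (int i = 0; i < 2; i++) {    /* two launches, as in the new approach */
        pid_t pid = fork();
        if (pid == 0) {
            execlp("true", "true", (char *)NULL);    /* replace with the real worker */
            _exit(127);
        }
        waitpid(pid, NULL, 0);
    }

    gettimeofday(&t1, NULL);
    double ms = (t1.tv_sec - t0.tv_sec) * 1000.0
              + (t1.tv_usec - t0.tv_usec) / 1000.0;
    printf("2 subprocess round trips: %.1f ms\n", ms);
    return 0;
}
```

If the bare round trips come back in milliseconds, the extra time is in what the child does at startup (library init, opening the .g file) rather than process creation itself.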
For the SHOULD NOT HAPPEN errors, one possible approach would be to get the newer mmesh version of that logic working: https://github.com/BRL-CAD/mmesh
When I took a quick look it's not quite a 1:1 drop-in replacement for the gct calls, so I could use a little help getting it figured out and set up.
Still don't have the delta yet, but here are the full run results I got on Linux w/ main:
Summary
=======
Converted: 96.9% ( 9923 of 10245 objects, 40 files )
Passed: 9923 ( 9974 NMG 10231 BoT 10116 Brep )
Failed: 303 ( 243 NMG 12 BoT 126 Brep )
Timeout: 19 ( 28 NMG 2 BoT 3 Brep )
NMG rate: 97.4% ( 9974 of 10245 )
BoT rate: 99.9% ( 10231 of 10245 )
Brep rate: 98.7% ( 10116 of 10245 )
Prim rate: 99.7% ( 7029 of 7050 )
Reg rate: 95.1% ( 2316 of 2436 )
Elapsed: 61697.0 seconds
Average: 6.0 seconds per object
That's a surprisingly high failure rate for BoT - the input test set is our sample models?
Yep, it's a straight-up run on db/, no additions.
0.1% is a high failure rate?
On my local machine, at least with a timeout of 5000, I had a completely clean run
I was more surprised that BoT succeeds more often than NMG -- in theory we could go bot-to-nmg on those failure cases to get both in sync.
NMG has to use the old NMG boolean - that'll fail more often
Manifold is BoT only
NMG has to result in an NMG...
so if we had a bot-to-nmg(), which is pretty straightforward, they could be in sync, more success all around.
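Something along these lines, roughly - an untested sketch, not code from the tree, and the nmg_* construction calls (nmg_mm, nmg_mrsv, nmg_cmface, nmg_vertex_gv, nmg_fu_planeeqn, nmg_region_a) have shifted signatures a bit between releases, so treat it as pseudocode with real names:

```c
/* Untested sketch of a bot-to-nmg fallback: rebuild each BoT triangle as an
 * NMG face.  Error checking (degenerate triangles, nmg_fu_planeeqn failures)
 * is omitted, and exact headers/signatures may differ by release. */
#include "vmath.h"
#include "nmg.h"
#include "raytrace.h"

struct model *
bot_to_nmg(const struct rt_bot_internal *bot, const struct bn_tol *tol)
{
    struct model *m = nmg_mm();                /* empty model */
    struct nmgregion *r = nmg_mrsv(m);         /* one region containing one shell */
    struct shell *s = BU_LIST_FIRST(shell, &r->s_hd);

    /* one struct vertex slot per BoT vertex so shared corners fuse topologically */
    struct vertex **verts = (struct vertex **)bu_calloc(bot->num_vertices,
                            sizeof(struct vertex *), "bot_to_nmg verts");

    for (size_t i = 0; i < bot->num_faces; i++) {
        struct vertex **face_verts[3];
        for (int j = 0; j < 3; j++)
            face_verts[j] = &verts[bot->faces[3*i + j]];

        struct faceuse *fu = nmg_cmface(s, face_verts, 3);

        /* assign coordinates to any vertices that don't have geometry yet */
        for (int j = 0; j < 3; j++) {
            size_t vi = (size_t)bot->faces[3*i + j];
            if (!verts[vi]->vg_p)
                nmg_vertex_gv(verts[vi], &bot->vertices[3*vi]);
        }
        nmg_fu_planeeqn(fu, tol);              /* face plane from the three points */
    }

    nmg_region_a(r, tol);                      /* region bounding info */
    bu_free(verts, "bot_to_nmg verts");
    return m;
}
```

If rt_bot_tess() already does essentially this, wiring it into the NMG failure path may be all that's needed.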
I mean, if you're willing to ditch the fancy polygons for NMG triangles I guess - but doesn't that defeat the point of NMG in the first place?
nmg would then still give nice quad-mesh results for some things, but would still give some mesh where it'd otherwise fail.
No no, that's why I said when nmg fails
Oh, gotcha
a fail is useless for everyone
True - I suppose the only time you'd want to see it would be a dev trying to use the Manifold techniques with NMG data types
I'm really curious what primitives out-and-out failed. Timeout I can see - the fallback methods can be expensive - but failure...
starseeker said:
On my local machine, at least with a timeout of 5000, I had a completely clean run
That could be the difference. I used the default 5min limit. 50min is nuts imho... :) That said, good point about the distinction between succeeding slowly and not succeeding at all.
I'll re-run with a higher limit, but there's definitely a UX argument to be made given how simple all the sample models are. As an outsider, I would expect them all to be sub-minute or something is "wrong". Even 5min (again, per object) seems pretty generous.
Also, I haven't dived into the log yet to see if they failed because of timeout, so I'll have to check that.
I will have to get it running in parallel before doing that -- it took a super long time to get through everything as it is.
plate mode to vol and brep point sampling are the two worst offenders
Generic twin booleans are nothing to sneeze at, even if you pre-convert the plate modes
17 hours to run everything...
(with a 5min timeout!)
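(For the arithmetic: the 61697-second elapsed figure in the summary above works out to about 17.1 hours, so the two numbers line up.)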
my numbers are in the facetize thread, IIRC
I'm kind of assuming that real models are going to take 10-100x longer for full conversions.
<nod> Not claiming we're "done" in any way - all the fallback method uses except DSP are an indication of a problem in our primitive conversions that should be fixed.
Given where we have been historically, I was willing to take any sort of "working" I could get to start with. Frankly I didn't expect them all to succeed even with the long timeout; I was rather surprised when they did.
That's not a dig or push, just talking out loud my thoughts on implications
If your run's failures did all turn out to be timeouts, we need to fix the summary - what you posted above made it look like there were some timeouts and some actual failures
I kinda buy the fallback methods being quirky depending on the environment, since the point sampling is pseudorandom
Immediate goal is still to get a conversion trajectory over time, but I've hit a load of build issues that I'm working through. Would be nice to track the finish-line progress on % conversion, time, and conversion methods, and dashboard it up onto a graph. Then throw more models into the mix.
starseeker said:
If your run did have all timeouts, we need to fix the summary - what you posted above made it look like some timeouts and some failures
Can change it, but the pragmatic issue is picking a line in the sand that is "too much" no matter the reason. If a single object conversion took 10 days, I would kind of say it doesn't matter -- that's a fail for all intents and purposes.
Original line was 5min as a general rule of thumb: a full model would potentially have two orders of magnitude more objects, which would put a full model at approximately a full day to convert. Something that doesn't complete overnight is a difficult proposition.
Bumping to an hour per object increases that by roughly another order of magnitude, so over a week to convert... potentially useful to know where we are algorithmically, but definitely long enough to give anyone pause.
(on a real model)
Still, point taken that it can+should include the second number (#timeouts) in the summary just so it doesn't conflate the two. That's definitely important for our own purposes.
Sean said:
Still, point taken that it can+should include the second number (#timeouts) in the summary just so it doesn't conflate the two. That's definitely important for our own purposes.
and I'm blind... it already calls out timeouts. There were only 2 BoT timeouts. The rest were actual failures.
So it does affect the success percentages, but not terribly so for BoT. It should probably list the timeout setting and other details (like the version identifier) in the summary for sure.
The failures are what surprise me - I didn't see those here
Not terribly surprising if there are some random perturbations involved. That'd make it non-deterministic.
Even if there's somehow no randomness involved (which would be a little surprising), floating-point fuzz can definitely still do it. Floating-point issues often only show up across platforms or compilation-setting changes. To be expected unless that was pretty exhaustively tested for specifically.
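For reference, a toy illustration (not from this thread) of the kind of flip involved: two ways of computing the "same" quantity that differ only in the last bits will fail an exact comparison while passing a tolerance-based one, and cross-platform or cross-flag differences (FMA contraction, x87 vs. SSE, different libm) produce exactly that sort of last-bit drift. The 0.0005 tolerance below is just an illustrative value:

```c
/* Toy illustration: a and b are the same quantity computed two ways and
 * differ only in the last bits, so the exact test fails while the tolerance
 * test passes.  Last-bit drift from different platforms or compile flags is
 * enough to flip a zero-tolerance geometric predicate in the same way. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 0.1 + 0.2;    /* 0.30000000000000004... */
    double b = 0.3;

    printf("exact:     %s\n", (a == b) ? "equal" : "not equal");
    printf("tolerance: %s\n", (fabs(a - b) < 0.0005) ? "equal" : "not equal");
    return 0;
}
```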