@starseeker did you happen to test 77843+ to make see if our step converter still works on some real world (non-nist) step files? I seem to remember there being some issue the last time we tried to merge upstream.
probably shouldn't be on that branch since it's a significant and unrelated change. that's probably an update that could be made on trunk but it'll need it's own testing regardless.
and upon further inspection, I see it's not only a major update of the dependency, but you also made edits to the dep that cascaded edits to our converter code... please -- you know that really makes it terribly hard to review and even harder to test with any sort of isolation or assurance. I get how one change led to another.
Example critique unrelated to extbuild but complicating the review, you added a Raw() accessor to stepcode.. why? the union 'ptr' field looks like it is/was already public, Raw() is public. If I were reviewing that commit by itself, I'd reject it as unnecessary/unfounded complexity.
So here was the thinking, fwiw: the extbuild stepcode controlling logic and stepcode build logic were causing problems in testing - faced with needing to change them, it made more sense (to me) to see if I could invest the time in getting the upstream stepcode to behave as we needed rather than (further) changing our own copy.
In the course of testing with the upstream stepcode, I identified a few cases (like the Raw case) where upstream had made previously public class members protected. Most of the upstream stepcode compatibility changes to the converter code could easily be added back into trunk and help keep the extbuild branch closer to trunk, if we just made the minor adjustments to our bundled stepcode to supply the accessor methods the upstream code has migrated to. I didn't make ptr protected in our bundled stepcode, because that might have had other implications for changing things in stepcode itself. Since the point is to get off of our fork eventually, I only wanted to make the minimal changes necessary for the purpose.
Having identified the necessary changes for the upstream stepcode, I can probably make similar changes to our bundled stepcode and avoid upgrading the stepcode dependency in the extbuild branch.
That raises a question, as to whether stepcode should be synced with upstream first or the build system rework should go in before doing so. If the former I'd need to do the update to src/other/stepcode in trunk and test the converter, if the latter I can try to put the non-upstream back in extbuild and make the changes needed to support the extbuild style of building.
@Sean Just for future reference, how should I have made 77837's commit message clearer? (That's the one that added the Raw() accessor to trunk's stepcode)
I think merging with upstream is great, and totally worthwhile. The example was just that -- an example. It's discussion-provoking all on it's own with rippling changes, potential issues and considerations... that don't really belong on that branch.
maybe it would be better to undo the move of ptr from private to protected -- begs the question what that motivation was, or what alternative was being provided
adding an accessor violates the scope encapsulation of making something protected, so even on the upstream, I can't imagine that being a solution more desirable than just move the ptr back to public.
it's not about having a clear commit message -- it's that it's updating a dep and changing a dep, which doesn't really have anything to do with the "extbuild" restructuring
/me winces. Alright, I'll revert and try to do the minimal mods to src/other/stepcode to make extbuild work - just so it's understood that that moves our stepcode further from upstream.
NOOoooo...
it's mixing the two that's the problem -- can just make the step code changes on trunk and then merge that to extbuild
That's what I ment
(sorry, wasn't clear)
i'm not talking about minimal mods, i'm saying you could do the upstream merge on trunk or on a branch
something that can be tested in isolation from the branch work
then that could be merged to the extbuild branch
Oh. I know it probably doesn't seem like it, but I've been trying not to disturb trunk too much lately...
I noticed and have appreciated that :)
I think we should release ...
A stepcode upstream merge will be disruptive, most likely - I can probably revert the wrappers in upstream, although I don't yet know how much work it will be and that may dislocate whoever added them in the first place...
so maybe on a branch, and then merge that to extbuild
that'll still make it hard to test the extbuild branch if anything is wrong, but it will make the stepcode update separate and easier to test at least
Do we have enough changes for a release?
can ask or check the history, but I doubt it's a big deal if we need to make ptr public again. I'd be more concerned that there's some other intended access method.
/me looks...
sure, there's like 30 items there
There's at least one showstopper MGED bug not run to ground I'll have to prioritize if we do that, IIRC
the subprocess bug?
I was thinking of the dm window not closing properly
What's subprocess up to? I addressed the Windows issue, IIRC...
I haven't retested it lately but it's in the todo -- locking up mged after running an rt until you close the framebuffer window
Ah, right - that one was platform specific, IIRC. I don't know if I can reproduce it...
OK, so two showstoppers
as for stepcode, I'm 75% sure making ptr protected was just part of general cleanup. if there's no indication in the log that was done for any reason other than good modularity and code updating, I don't think it's an issue to re-expose it. I can ask on the stepcode list tomorrow, or you can, but I'd definitely do that before injecting a workaround accessor (getters/setters are often frowned upon these days, especially if the alternative is a straight pass-through that has no type encapsulation benefit).
would be curious to see if exppp accesses that field, if/how it was changed
OK, so gameplan:
that sounds like a solid to me!
damn, that's clarity there
could even do a 3.5 merge step upgrade to extbuild so 4 is minimal/nothing
I know it's probably not practical, but in principle I'd like to try and establish some time limit/cutoff where the "OK to merge" window is declared - these large branches are very difficult to keep current for long timeframes - lots of conflicting merges - and I'd like to put an upper limit on how long that will be necessary.
that's sounds reasonable.
what'd you have in mind?
I wanted to ask you how things stood with the work that's motivating a stable trunk.
I think the time-table is centered around releasability and testability. so I'd think the natural way to force closure of a time window is to release, then make a destabilizing merge, then test+dev+fix and then release again, then another destabilizing change, then test+dev+fix etc. like ships going through the panama canal, the big ships line up one at a time to go through. trunk's just already had a big change that's not yet stable.
Um. I've had the impression you were doing some significant work on the threading subsystem and needed stability for that?
Personally I'd like an upper limit on the order of 3 months or so for the stepcode and extbuild branches, starting from when I get them both working cleanly (distcheck pass).
yes, I was/am but it's going slow due to a certain project taking / stealing priority. every time trunk had some big destabilize, I'd have to start over. it was happening so frequently that I literally have/had 4 debugging sessions on pause at one point and even lost a couple sessions. that's not on you, that's on me for not figuring out the bugs fast enough. but then there'd be another change, and I'd have to start over (again) on the big one.
nobody can guarantee how fast bugs are found - if it were deterministic we could just replace ourselves with shell scripts ;-) That's why I wanted to ask you though, since the cost/benefit tradeoff is branch maintenance costs vs debugging dislocation costs.
might make sense to move the debugging to a branch at this point, but it's still an issue of having an instability change on trunk followed by more changes that destabilize. I'm not sure what can be done about that better honestly as you're obviously unable to reproduce or even see some of these issues.
and branch is really only an option for the threading bug. the 3 other debugging sessions I had were investigating trunk instability, disrupted by other trunk instability after you'd move on
That's part of why the extbuild work has gone on so long, to be honest - I've been trying really hard to hammer it on as many platforms as I can reach. There always seems to be one more thing though, however hard I try...
Once we've migrated to github, might it make more sense to switch to a more branch+merge request style workflow?
Arguably much of the work I've been doing in extbuild and its predecessor would have been better done in my own local branches until ready, but SVN really doesn't work well for that...
nah, I don't think that would have been better. that's exactly the thing I hate about git ... it results in larger branch changes that become even harder to review, so changes either slip in that shouldn't happen or they don't even get seen/reviewed/tested or the patch sits for a long time because it's really hard to give it due diligence.
this is a perfect example -- you'd have gone down the stepcode rabbit hole on extbuild and faced with the extbuild merge it'd basically be a huge wad of take-it-or-leave it, which is a really sucky situation to get into for the code's health
I can't deny though that trying to review all of the changes in thirdparty_rework and extbuild would be a waste, since at least half of it is me following blind alleys until I find out why something won't work. That's just useless noise for anyone else...
There is an option though in the extbuild situation - in a sense, it's exactly the one we will pursue. The change is too big, so break it up into smaller pieces and feed them in one at a time.
I don't know if you get the S2 commits, but they do this on the regular and it's why they don't have pleasant interactions and why there will be a dozen unrelated thing with each scr merge. I really don't want us to get into that situation if we can avoid it.
my bigger concern right now is this feels like a fire sale...
The risk is that whoever made the change won't be willing to do the work to break it up - in my case we know darn well I'm too stubborn to give up, so it's fairly safe
?
I'd rather have the conversation while we're doing the work instead of weeks/months later when it becomes harder and more time consuming to change
something gets intertwined inevitably
Fair enough, but the countervailing consideration is getting overwhelmed. The last couple review processes have been painful for you, and it would be nice to find an approach that's not quite so rough...
I've been doing better to keep up since the last release.
(Still not following the "fire sale"...)
aside from the debugging session issues, 2 of which are nominally on me
oh the fire sale aspect is this surge of relatively major changes that feel like they have to all happen at once right now with some urgency because we're under an unusual situation with fewer eyes on us and other distracting demands on our time.
Ah. I suppose that's part of it, but (for me personally) I guess it's also an attempt to create the project I want to be working on/in...
complete restructuring of libged, major rewiring of tcl/tk, fundamental rewiring of command invocations, major build system rewrite, display management rewrite/merging, hard dependency decoupling (libpng, libz, libregex), repo migration, ... did I forget any? :)
I think those are all the ones you've seen :-P
I can probably point at unresolved issues with each of those, when in any other context we probably wouldn't have moved on to more than one or two simultaneously without investigating and doing something to get closures
honestly, what have I missed? haha
Some out-of-repo work on software rasterization to try and create a platform-agnostic software fallback for 3D rendering
ah, I knew about that one but yeah you're right, out of sight out of mind
That one doesn't matter unless/until it becomes viable anyway
to me that one is up there with my incomplete but significant progress side projects (openvdb, ospray, appleseed, libcsg, benchmark, ...). the instability affects nobody else yet. :)
okay, gotta run.. I have a couple dozen renders to kick off here asap... to be continued
Roger. over and out.
@Sean - any suggestions for a known working step file to try with the new stepcode?
I'd have to go back to a prior release to confirm, but I just downloaded random stuff from grabcad and typically had good success (9/10 would succeed if it was a 203). I'm may have a few tucked away that I saved, I'll check.
wonder how hard it would be to add the 214 entities. the current parsing seems to almost work even with the wrong schema. ran into this problem just a month ago with a model tim needed. he had vrml1 (we only do vrml2 and x3d), got step but then it was 214. had to pull it into commercial to work with it. #fail
Looks like https://grabcad.com/library/planetary-gearbox works as well with stepsync as it does with trunk.
The brep generator isn't up to handling all the pieces, but the stepcode portion of the parsing for this file went fine in both trunk and stepsync: https://grabcad.com/library/xs-650-flat-track-1
Woo hoo! The stepcode upgrade fixed the ninja build on github (old step-g wouldn't run on the Windows runner, but the stepsync test just passed on all three platforms.)
Finally, progress... that github windows runner has been driving me nuts for weeks.
cool, any idea what fixed it?
rather different cmake logic in stepcode upstream, right?
A bit, not enormously - I actually spoke too soon. stepsync branch works across the board, but extbuild doesn't - I'm guessing it's probably the std::string exposure and stepcode getting built slightly differently in extbuild. I'll have a look and see if I can get std::string out of the public facing APIs - we're probably not the only codebase that'll have issues with that at some point.
Been getting away with it by building stepcode as part of the primary build all along, but if we're going to treat stepcode as a properly independent 3rd party code the time has probably come to bite the bullet and deal with it.
Yep, applying WinDBG to the ninja build zeros in on std::string, on a part of stepcode where #pragma warning( disable: 4251 ) is applied... which looks like exactly what I'm seeing: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-1-c4251?view=msvc-160
Looks like ignoring that warning is no longer an option... doing a quick count of those pragmas... 24 of them in stepcode headers.
/me rolls up sleeves...
That's stuff really hard to get right, either using std or plain c-strings. It's very difficult memory management for something recursive like step.
even your c-string changes may not hold up...
OK... so what's the recommended path forward?
I'm not saying it's wrong,... don't get me wrong! I'm saying it's super hard to get right...
it's a huge "it depends"
like is char *str = std::string("foo") ... when is that char * that was likely dynamically allocated invalid?
it's clearly valid for a scope
once it goes out of scope, it depends
I think what likely is wrong in the current code is that it's passing a reference around everywhere
that's done for performance, avoids a copy constructor
but it also means it's subject to the original allocation scope
and will be invalid if/when it leaves that scope
I'm open to suggestions on how to proceed - got to do something unless we want to just keep building stepcode as part of our own build. (Which is the easy punt, but ignores the issue...)
No no, you're good, this is just food for thought -- things to keep an eye on as you go. Because this is subtle corruption territory.
(As an aside, WinDBG is very handy working with a Ninja Windows build - don't need all of Visual Studio up, but let me backtrace to the issue quickly.)
one easy change that "should" be safe but will be wasteful is to convert those references to non-references, which will cause the strings to get copied
I wouldn't convert them all, but if you see a problem/corruption, that's a possible solution in isolation
Hmm. Worth a try...
my c++ is actually getting rusty on the basics. I'd need to write a snippet to see if char *foo = std::string("foo"); is same as std::string s("foo"); char foop = s.c_str(); return foop; /bad */
the latter is definitely bad juju because s is on the stack and only lives in that scope, so they c_str() we're returning is invalidated once the scope is left.
I don't know the prior for sure, I should
it's this issue: https://stackoverflow.com/questions/6456359/what-is-stdstringc-str-lifetime
Would the former even work at all? Assigning a std::string constructor to a C string pointer?
eh, that was a slight typo
it's the general case of the scope on a std::string str();
like your first change, Schema::AddFunction() -- you create a locally scoped std::string fstr(f);
then push that back. What's getting pushed? If it's a copy, then it's all good. If it's a reference or pointer to fstr, then I think it'll be invalidated when the scope is gone. It'll happen to work until some other allocation stomps on fstr's memory and something tries to read the _function_list
I thought a push back made a copy, but I guess it depends on the vector/list/map type?
yes
I think you could also wrap it in autoptr to avoid having the destructor called until there's no more references to the data.
this is one of the messier areas of c++ that become obvious after weeks of having one's nose in a debugger. which I haven't lately, so I'm amazed how much of this I've forgotten... besides the gist that it's complicated :)
Didn't meant to interrupt, carry on -- just might want to try enabling a memory guard library to catch corruption the second it happens in your testing. Because having a pointer to destructed data is going to work .. at first. It'll just present as random corruption later if it's wrong.
/me nods - good advise. Since I know roughly where I have to go now I'll probably start on Linux - any recommended guard techniques these days?
valgrind memcheck is a classic, no?
I believe llvm has something good too.
right -fsanitize=memory compilation option. slows everything down, but really helps isolate memory issues.
It is, although it doesn't see any of our current problems as far as I can tell - the list elements test reports clean
it could be all good, I'd have to dig deeper on that container
AddSupertype_Stmt() looks good for example -- you pass in the char, then allocate a std:;string off it that is stashed in the class (so it stays in scope).
Except that didn't fix anything - Windows still had a problem in the same place :-/
that's the sort of random corruption I would expect if this is wrong anywhere in the code. you'll hit a problem when something else happens to allocate over a thing that was released.
The target std::string looked odd in the debugger, but I'm not sure why...
so that could just happen to be where the corruption becomes visible, not necessarily the cause point
That would explain the quirky failure patterns I was seeing earlier.
Or lack of patterns, rather.
OK, that points to a general cleanup of all the 4251 code as an initial step.
always a good thing
Almost wish I could deliberately induce the Windows 4251 behavior in the Linux C++ libs
if you still see badness, I'd suggest slowly start passing/storing copies
if you're in the debugger and have a reproducible case, you may be able to set a watch point on specific memory as it's allocated, and watch where it goes, what happens when it's pushed into a container, read from it, etc
advanced debugging, but should be able to tell if / when the destructor is called
but will require looking at the context of the current code to determine whether a copy makes sense or if that will screw up the entity management
/me nods. The hard part is going to be the generated code - that'll be a nightmare to follow around, if it comes to that
Yeah, copy didn't fix it either. OK, stay tuned - 4251 week coming up.
@Sean I'd like to simplify the stepcode sources a bit and reorganize - consolidate exppp and base into express, for example. In a quick look at the C++ step iso pdf the only thing that jumped out at me with regards to library or file specifics was the "sdai.h" header - am I missing something that dictates the current structure specifically?
I know function names, etc. are generated according to patterns - I'm thinking more about the stepcode libs and headers themselves rather than what they generate.
what do you mean by consolidate? exppp is a pretty-printing tool. I know nothing about base.
from what I remember, it's the headers and API that I recall being specified in conformance APs -- i.e., what's must be in the API and how you call into it (e.g., header include names). probably a good litmus is whether you have to change application logic (esp. our step-g code) or are the changes all internal.
exppp isn't part of anything, it a pretty-printer iirc. It's just a simple example how to call into libexpress. exppp's code doesn't belong in libexpress any more than a given proc-db tool belongs in libwdb, for example. but where the example lives doesn't matter much so long as it's an identifiable example and not lost alongside other code.
looked at base, it looks like it's pretty irrelevant bu-style code that could get absorbed/disappear
looks like sc_getopt() is used by the fedex-linking binaries express
looks like lots use sc_memmgr(), so libexpress works there too
exp2cxx is the only place that uses sc_trace_printf, so it could move into the exp2cxx processor
sc_benchmark.h appears to be used in cllazyfile test and a p21read test
Sorry, meant libexppp not the exppp executable itself.
Does anything else link it? I don't think that code belongs in libexpress. It's not really express API, it's just some arbitrary function separation someone came up with for exppp. Maybe separated for unit testing or something. You could eliminate libexpp and just make it part of exppp's sources if nothing else calls into it.
exp2cxx links it
/me figured pretty printing express schemas was a reasonable fit for libexpress... you disagree?
/me checks to see if exp2cxx is actually using libexppp...
yep, get undefined references in exp2cxx without it
I can see how you'd see it like that, but I think the original author(s) was/were on point keeping them separate from a design perspective.
Pretty printing is a decorative / stylistic presentation property. It's model vs view in MVC. It's css vs html in web. They certainly can be combined, but it's rarely an improvement (design-wise) unless text presentation is intrinsically required due to the API (which it's not for EXPRESS).
In this instance, it'd add some 50 symbols to a library that already has 400+. Combining them becomes a (relatively minor) detriment for both users and non-users of either lib. Having around 50 highly interrelated symbols in a library is actually rather nice. Our libs aren't exemplary in that regard.
That all said, it's also not a big enough detriment to draw a strong opinion from me. Demodularization is an antipractice in general, but if I were to inspect the libexpp symbols in more depth, I'd likely find some that belong in express and others that don't.
There we go - finally. Success on all three platforms.
/me braces himself and fires up the BSD vms...
Woo-hoo! extbuild distcheck full passing on GhostBSD and Ubuntu, plus a clean github CI run on all three platforms.
1. CHECK revert the trunk step changes
2. CHECK make a branch for the step upgrade
3. CHECK do the upstream sync in the branch
4. CHECK adjust extbuild's version to match whatever the step upgrade branch has
5. fix key bugs for release
6. do release
7. wait for window when trunk can be unstable/dislocated
8. merge stepcode, then extbuild to trunk.
OK, on to release bugs. Time to sample the joys of the MGED dm management system...
Last updated: Jan 09 2025 at 00:46 UTC