As some of you already know, we're planning on moving our main repository and operations from SourceForge to GitHub real soon now. It's taken approximately two years (yes, years, though worked predominantly on weekends and evenings) to get the entirety of BRL-CAD's repository converted from Subversion to Git. This work, by Cliff Yapp, has included fairly extensive and complicated mappings to preserve as much data as possible, fix old corruption, track changes across major disruptions, and verify and validate that everything is preserved.
As this is a big change to our development operations, this is an intentional "open comments" period for folks to talk, adjust, ask questions, give feedback, get prepared, explore tutorials, etc. The intention is to flip the switch in a few weeks.
One question that's already been a point of discussion (and some of you have already shared your views privately, thank you) is how to handle the commit e-mail associated with past commits. If people want them associated with their current GitHub profile/e-mail, then we'll need to set those before migration is complete. As it currently stands, everyone's commits are associated with a fictitious "USER@sf" e-mail.
If you'd like your commits associated with a specific name and/or address, please contact me privately or make the change yourself in misc/repoconv/account-map.
@Sean so I guess it will be possible to fork from the BRL-CAD git repo?
@scorp08 Yes, of course it will be possible. Technically it's not hard to fork the SVN repo now, but it will become even easier.
We'll still be maintaining a central repository structure to encourage collaboration and accelerated development, but it's all good. If people feel more empowered to work on the code in a fork than they do in a clone, I'll still be happy to see their development. Hopefully it won't get too messy and we can actually improve coordination and make it even easier for new developers to get involved with improving the code base.
Git?
I need a couple more days to contact the last remaining committers to get their e-mails, create aliases for the handful that aren't reachable, then assume another 2 weeks for @starseeker to run the reconstruction, followed by maybe 1 more week of validation and testing while uploading to GitHub, and if all goes well, we should be up and running by the end of the month!
<squeaky wheel noise>
@Sean status? anything anyone can do to help? do I need to swing by the farm supply store for a salt lick and a cattle prod to do the "carrot and stick" thing? :D
ping...
Fewer hearts, more answers, boy. What's the holdup? My git-fu is pretty strong these days, and my drives tend to be more than 8 gigs (the size of the drive my home server used when we did cvs->svn), so I'm not complaining about repo size :D
@Erik If you want a preview, you can take a look at https://github.com/starseeker/git_conv_test - it's about 5 months out of date now and the non-email committer names mess with github's stat calculators, but otherwise it should be a pretty fair representation of what the conversion will end up looking like. Feel free to check it out to see how it behaves for you (and see if you spot anything wrong). If you want the git notes that have the SVN numbers, you'll need to explicitly grab the notes as well with: git fetch origin refs/notes/commits:refs/notes/commits
The hideous conversion process is laid out in misc/repoconv/CONVERT.sh - it's about as ugly as it gets: C++ mixed with shell scripts mixed with sed and stream-of-consciousness, quick-and-dirty hackery, but it seems to (slowly) get the job done. I'm still not very skilled with using git day-to-day, but I now know quite a bit more than I wanted to about fast-import, fast-export and friends.
@Erik he's been patiently waiting on me. I'm the holdup. I've been confirming with every past committer, since I have contacts for nearly everyone and they've responded with a plethora of e-mails to use. I just had a few remaining to contact, which got delayed by a tasker at the office and GCI and GSoC prep and server issues and ... delays. Now with all this ample time on hand (hah), at least time at the keyboard, I've been getting through mad backloggage, so I think we should be able to wrap this up with the final pass this week.
It's not a space issue, it's about having a complete history that doesn't lose anything, which @starseeker has gone to exceptional lengths to preserve. The rest comes from GitHub limitations: it requires real contact info if we want real stats preserved.
git history is rewritable. mistakes at this stage can be fixed. (rewriting git history is dangerous and expert-friendly, but we're not ... committed.)
good luck wrapping up the last few, if'n ya'll need git or shell help, lemme know, I've been using git almost exclusively since... hm, was it '13 that I left arl to check out the modern world? :D
We may not be committed, but once we go live with the new github repo and people start forking, rewriting the full history would be highly disruptive. On the order of what we nearly had to do with the Great SVN Duplicate Commit ID crisis a number of years back.
The chaining of SHA1 hashes is neat for repository integrity, but it means there's no such thing as a local history change. I spent a lot of time thrashing trying to figure out if I could splice the newer SVN conversion onto the older CVS git conversion, and it took me longer than it should have to realize that it's actually structurally impossible to do that with anything other than a full commit replay of the post-CVS commits on top of the CVS conversion (hello, rabbit hole).
So since I REALLY don't want to have to wade through all of that any more times than I need to (there are some finicky manual steps that have to get updated each time the committer emails change, not to mention the delightful experience of mucking around in the swamp mud of my conversion logic) I'm willing to wait for @Sean to get it right the first time :-)
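A toy model of that SHA1 chaining, as a minimal sketch: `commit_hash` and `build_history` are hypothetical helpers that hash only a parent id and a message. Real git commit objects also cover the tree, author and committer lines, so these ids are not real git hashes. It is just enough to show why editing one commit changes the id of every commit after it.

```python
import hashlib

def commit_hash(parent, message):
    """Toy commit id: SHA-1 over the parent id and message only.

    Real git commit objects also hash the tree, author and committer
    lines behind a 'commit <size>' header; only the parent link matters here.
    """
    return hashlib.sha1(f"parent {parent}\n\n{message}".encode("utf-8")).hexdigest()

def build_history(messages):
    """Return the ids of a linear history, each commit chained to its parent."""
    ids, parent = [], ""
    for msg in messages:
        parent = commit_hash(parent, msg)
        ids.append(parent)
    return ids

original = build_history(["initial import", "fix typo", "add ell.c"])

# Edit only the FIRST message: every later id changes as well.
edited_first = build_history(["initial import (fixed)", "fix typo", "add ell.c"])
print([a != b for a, b in zip(original, edited_first)])  # [True, True, True]

# Edit only the LAST message: earlier ids are unaffected.
edited_last = build_history(["initial import", "fix typo", "add ell.c v2"])
print([a != b for a, b in zip(original, edited_last)])   # [False, False, True]
```

This is the flip side of the integrity guarantee. History before an edit is untouched, but nothing after the edit can keep its id, so splicing one conversion onto another means replaying every later commit.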
Whadya mean "modern"? We use CMake and everything these days! We're even embracing this newfangled C++11 thing! Now get off my lawn! :-P
hehe, ch'know, there's a c++17 now :D
(still c++, though... swift and go are way nicer... I hear good things about rust, too)
@Daniel Rossberg The plan right now is to also convert all the smaller project histories (including rt^3) to their own individual git repos. Does git+github work for you for rt^3? (We have to change the name to rt_3 - the ^ character causes some problems for git..)
Well, sure, I didn't create this name. However, why don't we call it rt3?
That's fine too, assuming it works for the conversions - rt_3 was just what I had put in the original svn-fast-export mapping files when I found out ^ wouldn't work.
The "rt-cubed" name doesn't need to be preserved. I would suggest renaming the repo to "moose" since that's the name we decided on.
or MOOSE ?
would be good to disambiguate from https://en.wikipedia.org/wiki/MOOSE_(software) in some manner, maybe moose++ or just be fine with moose or ...
@Erik git history is typically rewritable, but we're (thus far) using a feature of git (notes) that precludes rewriting history without rebuilding hashes. that's because the way git notes are currently implemented, they attach to specific hashes and are not updated on history edits. they get orphaned. it's lame, but it's the best solution so far for attaching svn's metadata to specific commits. open to other solutions.
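A minimal sketch of why the notes get orphaned, using the same simplified hashing (hypothetical helpers, not git's real object format). The notes live in a separate map keyed by commit id. Once a rewrite produces new ids, the lookups miss, and the old entries point at commits that are no longer in the history. The `svn:revision:N` note text is an assumption modeled on the grep pattern used later in the thread.

```python
import hashlib

def commit_hash(parent, message):
    # Toy commit id over parent id + message only (not git's real object format).
    return hashlib.sha1(f"parent {parent}\n\n{message}".encode("utf-8")).hexdigest()

def build_history(messages):
    ids, parent = [], ""
    for msg in messages:
        parent = commit_hash(parent, msg)
        ids.append(parent)
    return ids

messages = ["initial import", "fix typo", "add ell.c"]
ids = build_history(messages)

# Notes are stored apart from the commits, keyed by the id they annotate.
notes = {sha: f"svn:revision:{rev}" for sha, rev in zip(ids, [1, 2, 3])}

# "Rewrite" history by editing the first message; all ids change.
rewritten = build_history(["initial import (fixed)"] + messages[1:])
live = set(rewritten)

# No rewritten commit finds its note, and every note now points at an id
# that is no longer part of the history.
found = [notes.get(sha) for sha in rewritten]
orphaned = [sha for sha in notes if sha not in live]
print(found)          # [None, None, None]
print(len(orphaned))  # 3
```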
Sean said:
The "rt-cubed" name doesn't need to be preserved. I would suggest renaming the repo to "moose" since that's the name we decided on.
I recommend staying with the "rt-cubed" name for the conversion, because this branch is more of a sandbox for experimental extensions than something specific to the C++ interface.
However, I agree with you that we should aim for a separate moose repository for the C++ interface and its belongings in the future.
Sean said:
would be good to disambiguate from https://en.wikipedia.org/wiki/MOOSE_(software) in some manner, maybe moose++ or just be fine with moose or ...
I would officially name it BRL-CAD MOOSE, for "BRL-CAD Modular Object Oriented Software Extension". I.e., the MOOSE acronym only makes sense with the BRL-CAD prefix.
software extension? I thought we had a better backronym. ;)
Modular Object-Oriented Solidity Engine
Whatever :grinning_face_with_smiling_eyes:
What's important is the name MOOSE, with its wonderful logo.
so true
@Erik about the git-notes usage - I did that so the git commit messages could exactly match their SVN counterparts, which allows for a fairly straightforward analysis to map SVN ids to older commits.
For the CVS portion of the conversion (i.e. the commits put in Git straight from the cvs repo), the ordering and specifics of the generated commits vary a bit from the cvs->svn results (which is one of the reasons I went to all this trouble - cvs-git produced better results with the very early commits). That means the commit messages (when unique) are the best available way to find SVN id mappings to older git commits, hence I needed to keep them the same in both conversions. (Even that isn't enough to reliably peg all cvs->git commits with SVN ids, but an upside of using notes is that if someone someday wants to do a better job of SVN id mapping than I managed, they can do so without disturbing the main Git history.)
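As a rough sketch of the message-based matching described above (a hypothetical `map_svn_ids`, not the actual misc/repoconv code), a git commit only gets an SVN revision when its message appears exactly once on both sides. Anything ambiguous is left unmapped for later inspection. The sketch ignores timestamps, which the real analysis also takes into account.

```python
from collections import Counter

def map_svn_ids(svn_commits, git_commits):
    """Assign SVN revisions to git commits whose message is unique on both sides.

    svn_commits: list of (revision, message)
    git_commits: list of (sha, message)
    Returns {sha: revision}; duplicated or unmatched messages are skipped.
    """
    svn_counts = Counter(msg for _, msg in svn_commits)
    git_counts = Counter(msg for _, msg in git_commits)
    svn_by_msg = {msg: rev for rev, msg in svn_commits if svn_counts[msg] == 1}
    mapping = {}
    for sha, msg in git_commits:
        if git_counts[msg] == 1 and msg in svn_by_msg:
            mapping[sha] = svn_by_msg[msg]
    return mapping

svn = [(10, "add ell.c"), (11, "fix typo"), (12, "fix typo"), (13, "update docs")]
git = [("a1", "add ell.c"), ("b2", "fix typo"), ("c3", "update docs"), ("d4", "fix typo")]
print(map_svn_ids(svn, git))  # {'a1': 10, 'c3': 13}
```

The two "fix typo" commits stay unmapped. That is the same kind of gap that keeps some cvs->git commits from getting an SVN id.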
ping...
ping...
does he need rebooted?
Heh - just high load compared to available bandwidth
ping...
traceroute?
ping...
alacaPING
@Sean Are we still waiting on responses?
No, I've not had time to work on it, chasing other issues.
I need to set up aliases and update a couple things and it'll be good to go.
Ah, K.
ping...
@Sean would it help if you sent me the updated info and I integrated it into updated author maps? As long as those are finalized we can start the conversion without actually requiring the aliases be present on bz...
ping...
I'm starting to wonder if starseeker's pinger is broken
Heh, not broken, almost there
Lots going on
tautological. High interest topic, so nosey noses want to know :)
Like whether everyone got their invite -- I re-sent it for a third time to all; did you get yours?
(burndown list? blockers we can help with? rough eta? have you sourced adequate caffeine?)
I just got a new coffee grinder, it's been amazing
yup, enrolling now
my grinder broke :( I have to drive 5 minutes to the office for real coffee
hey, neato, I guess I'm a "mentor" now
Ah - so that's mentor invites not github invites?
That's quite a contraption @Erik .. you got that 5+ years ago I think, right?
<snort> from the looks of that contraption you're lucky breakage didn't involve an explosion
There must be quite a science to proper coffee grinding
I have a similar espresso machine I've used for 15+ years. It sounds like it could explode any minute, but that's just how they work: they build up steam pressure to heat and force liquid through the grounds. Which also means you want really fine grounds, not the same grind used in drip coffee machines.
the work one has two cafe-grade grinders to the left of it; it's a beast (and was some of my first on-the-job training at this place). I'm stuck with Keurigs, a moka and an old Target espresso maker that gathers dust :/
github? ping?
I'm spending all my time fixing builds and debugging. Want to help? You could figure out why mysqld is using so much memory and see if it can be cut in half.
I'm looking at whether/how Jenkins's usage can be reduced
sure, people are using databases. remove the db's and terminate access, problem solved
I mean, uh, O:-) for jenkins, you might be able to tune max vm size, but java historically has a habit of not releasing much memory; it likes to hold onto it for its own allocator
heh, well I'm almost certain the largest offender is the wiki .. but that's a hypothesis and still doesn't mean there's not some configuration options that might reduce usage too
yeah, I know java is a notorious pig, but almost certainly can get it to use less than 6GB
mysql probably has vm tuning options, too
I suspect it's loading a lot of stuff from our side that it doesn't need to
mysql is the one that's almost certainly loaded with attack attempts
website gets hit constantly, and could just be gradual accumulation of crap
probably... could try just restarting those services and see what happens, certainly something we could tune to fix, but might be a quick bandaid^Wadhesive bandage
mysql gets restarted frequently, it's sitting around 2GB
I tuned down a couple of its buffers; looks like it's at 1/2 vm and 1/3 res, we'll see how far it drifts up.
and/or what asplodes :D pooters are fun!
it's set to use zero swap, so it's at 50% capacity if everything goes resident
@Sean If I'm still breaking the OSX build, I can shift entirely to working in branches until we finish the github migration...
Alternately, I can put a snapshot of trunk up on my own github and see if I can figure out how to hook up the OSX CI system
ping...
ping...
Eager minds want to know how close this sausage is to being made. :)
ping...
what is this channel for?
discussion of an eventual transition of the BRL-CAD source repository to using the Git version control system
you guys were talking about CI system, right?
Continuous Integration is a separate topic
ok
ping...
what resource is missing to put a bow on this? lack of Seans? we can do the star trek thing and split him into the saucer Sean section and the other Chris section, right? "Make it so, number :poop: "
@starseeker Probably missed your window, but ... it'll be there for whenever you get back.
It's done!
Aliases have been added and lots of confirmations and updates for others. Apparently took some 20+ hours to finish it up. Lots of proper e-mails in there, though, so worth it. Lots of simple awareness too.
a bunch of folks link through an alias to a noreply@ address in cases where I didn't have and couldn't find any contact information or if they were unreachable.
Awesome - thanks! Will have to wait to kick off the main run, but I should be able to start on some of the preliminaries (in particular, updating the bridging commits between cvs and svn, which require manual adjustment.)
There's still a + in front of jgrosh - is that significant?
starseeker said:
There's still a + in front of jgrosh - is that significant?
Oh, good catch. Yes, significant, as it means I hadn't reconciled his yet. Yay for book-keeping that served its purpose! No response, so he's replaced with an alias.
At this point, any remaining uncertainty or issues I'm just replacing with brlcad.org aliases. Frankly, they could have all been replaced with brlcad.org aliases and captured inside GitHub, but this way I don't have to be in the loop (as they are DNS MX records).
@Sean The only other thing I noticed is the cvs_authormap has a "jebbly" entry for Jeffrey Liu, which wasn't in the svn map (probably my fault.) Should that just be jebbly@brlcad.org ?
Actually, per r75095 that's a recent committer?
Presumably the same Jeffery Liu in the chat now... I got misled by seeing the name only in the CVS authormap.
yeah, I did not reconcile against the other file, so another good one to catch
I pulled a list of committers from the svn log and compared it to the ones in the map - I think we're good now. I'll run a basic conversion of the CVS history and upload it to github to see what happens with the new email addresses
If there's any problem, we should either just switch EVERYTHING to brlcad.org aliases for the historic commits (to preserve the username as-is, even duplicates), or we should check out what GitLab does with the same info.
/me nods
I got the CVS part to convert, which will hopefully be enough to tell the tale
https://github.com/starseeker/brlcad_cvs_git/
I imagine the stats will have to crunch for a little while
Ah. Well, contributors so far only appear to be those tied to a github account? https://github.com/starseeker/brlcad_cvs_git/graphs/contributors
Might be worth a support question...
Not sure if individuals' profiles will pick up retroactively on commits made before they joined github...
Also, this being in my own personal grouping (as opposed to an org) might have some impact...
They do, the commits go all the way back (e.g., look at John's)
We could create github accounts for all of the brlcad.org aliased accounts, that way they'd at least show up.
at least for one in particular...
so... let's see. there are 102 entries of which 88 are unique. minus the 19 accounts it found. minus 26 aliased.
that leaves 43 unaccounted and unaccountable.
284281400@qq.com
abhijit.nandy@gmail.com
agkphysics@gmail.com
andrecastelo@gmail.com
anuragmurty@gmail.com
ben.e.saunders@gmail.com
bhinesley@gmail.com
bilmer1@comcast.net
brlcad@mail.lordofbikes.de
carl.nuzman@nokia-bell-labs.com
carlm0404@gmail.com
cdueck93@gmail.com
cezar.elnazli2@gmail.com
cprecup@cisco.com
dgodbey@yahoo.com
dloman77@gmail.com
doug@survice.com
ebautu@gmail.com
g.sayol@gmail.com
indianlarry@verizon.net
jdoliner@gmail.com
kunigami@gmail.com
manuel.montezelo@gmail.com
marcodomingues20@gmail.com
maths22@gmail.com
michael.j.gillich@gmail.com
mireastefangabriel@gmail.com
mohitdaga.lnmiit@gmail.com
nreed1@umbc.edu
popescu.andrei1991@gmail.com
robert.reschly@gmail.com
sam@hocevar.net
sharan.nyn@gmail.com
shubhamrathore1947@gmail.com
thedawnthomas@gmail.com
tim@jvsw.com
tom.browder@gmail.com
u2isaac@gmail.com
vladbogolin@gmail.com
zaqcloud@hotmail.com
indianlarry has an account, so could check with him to see what's up, what he set it to
several of those are surprising
question for you @starseeker , looking at the docs it looks like both username and email get recorded. are the old usernames being preserved or collapsed? just wondering.
@Sean Right now they're collapsed - you would need to pull the map file to associate a sourceforge name with the github id, and for individuals with multiple svn ids I didn't preserve which commit was made with which id. Could probably do so using the notes mechanism, now that I think about it, if that's of interest.
Would github allow the creation of accounts by someone other than the individual in question? My thought was to inquire if there was any way to have the contributors page report non-github contributors in some fashion...
btw, did you switch Erik's email as a test of the alias mechanism? He had given us another email earlier, if I'm remembering correctly.
When I look at (say) your individual page, it's only reporting your contribution activity back to when you joined github in 2011 - it doesn't look like it's picking up on the older commits and associating them with your account
might be another question for the github folks, if anyone has good contacts there...
eh, I think I asked to be erik@brlcad.org
@Erik Oh, OK - good. I couldn't remember.
@Sean What do you think? Want to go ahead with the conversion with the email addresses as-is? Or try to contact github to find out more?
I'm thinking it's probably not worth it to tweak too much more, unless you want to track down the root cause of the "surprising" accounts that ought to show up even by github's current contributor criteria but aren't... We can provide something like https://brlcad.org/~starseeker/git_stats/general.html on our own project site to document contributions with more control.
Not ideal certainly - it would be nice if we could get the github site to more accurately reflect the full history - but so far I'm not having much luck trying to research whether that is doable...
(one thought - did indianlarry commit to the repository prior to our conversion to SVN? If he doesn't have any CVS commits he wouldn't show up in this test...)
He had a run in the long long ago, perhaps under rcs, right?
2009 was indianlarry's earliest commit, according to the previous git conversion.
Yep, post-dates CVS - last commit there was end of 2007. OK, that explains it.
Better test will be once I've got the SVN history spliced on, but that's the hard part. Looks like it's time to update the bridge commits...
(or more precisely, I'm out of excuses to avoid updating the bridging commits... ick.)
@starseeker I think it's worth doing both - asking github if there's a way to show/list contributors that don't have a github account, just to make sure we're not doing something wrong, and proceeding ahead.
The surprising accounts are probably worth looking into, just to double-check that they're not a typo or dead e-mail. There were only a couple, one of which I just fixed yesterday. Some people we had gmail accounts for have switched to different gmail accounts.
I think if we can account for everyone and the vast majority - say 95% - show up under the contributor list, we're good. We may get to that percentage just by ensuring one or two accounts.
https://github.com/starseeker/brlcad_convtest has what I'd gotten as of this morning - up to 33 github contributor links
indianlarry is in there
I created an account for mike, so that should get another ten thousand commits. I'm looking through the list and going to see if any heavy contributors are missing.
I'll do another upload in the next day or two, if my laptop doesn't die
parker has a curious commit count.. that page is showing 4431 but I'm seeing 5105 in svn; is that because it's only played through 2012?
tbrowder is similar -- shows 57, but svn has 1637
Yeah, only into the 40000s on commits
okay, maybe around the right time before browder did a lot
It'll be next week sometime before I get all the way through
looks like we're only missing one >1k commit author
why would someone have more commits in git than they had in svn?
If we're missing any from the cvs era that's a problem - newer SVN committers (post 2011) won't show yet.
cvs -> git conversion may have broken out the commits differently
(than cvs2svn)
Can you check gdurf? He had 682 in svn, 710 in git.
don't know if you have an easy way to compare, I just did an svn log > log and counted them up
git log --author="Glenn Durfee" --pretty=oneline - gives a quick overview of the git commits
SVN is harder...
I can already see early commits in the git history that have the same git message - that's a probable source
Are they mistakes or denoting something like a file move?
My understanding is that cvs-fast-export had to deduce which cvs operations in different files denote "commits" for git, when those timestamps don't exactly line up. The tolerance on how much the time span is allowed to vary before a new commit is declared is one of the settings that can be altered on the tool.
I haven't tried to adjust that too much - it only impacts the CVS portion of the history. The CONVERT.sh script has the setup used for the initial cvs conversion, and that's quite fast if you want to do some experimentation.
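Here is roughly how that time-window grouping works. This is a simplified cvsps-style sketch, not cvs-fast-export's actual algorithm, which also has to deal with branches, tags and other details. Per-file revisions with the same author and message are merged as long as each one lands within `window_seconds` of the previous one. The `group_changesets` name and the 300-second default are assumptions for illustration.

```python
def group_changesets(file_revs, window_seconds=300):
    """Group per-file CVS revisions into changesets.

    file_revs: list of (timestamp, author, message, path), in any order.
    A revision joins the open changeset for its (author, message) if it is
    within window_seconds of that changeset's latest revision; otherwise it
    starts a new changeset.
    """
    changesets = []
    open_sets = {}  # (author, message) -> changeset currently being extended
    for ts, author, msg, path in sorted(file_revs):
        key = (author, msg)
        current = open_sets.get(key)
        if current is not None and ts - current["last"] <= window_seconds:
            current["paths"].append(path)
            current["last"] = ts
        else:
            current = {"author": author, "message": msg,
                       "first": ts, "last": ts, "paths": [path]}
            open_sets[key] = current
            changesets.append(current)
    return changesets

revs = [
    (0, "gdurf", "fix bug", "a.c"),
    (30, "gdurf", "fix bug", "b.c"),
    (400, "gdurf", "fix bug", "c.c"),   # arrives 370 s after b.c
    (10, "mike", "cleanup", "d.c"),
]
print(len(group_changesets(revs)))                      # 3
print(len(group_changesets(revs, window_seconds=600)))  # 2
```

With the default window, the "fix bug" revisions split into two commits with the same message. That is one way the duplicate messages noted above can appear. Widening the window merges them back into one.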
I don't want to experiment, but I would like to confirm that is exactly what's happening here, as opposed to some other unexpected behavior or a bug or bad data or ...
if you look at a couple of the duplicates, do they differ in files, timestamps separated by a few seconds, or something else? might be concerning if they're different changes to the same files.
I see a commit 1996-03-25 16:42:45 that doesn't have a corresponding svn commit id
That means the analysis scripts couldn't find a commit message with a close timestamp
what's the actual content of the commit? does the other commit with the matching log message match an svn commit id? does its change match the svn change? (should it?)
That one doesn't have a matching message. I'm seeing more that don't (at least 10 so far) which is a bit surprising. There is one at 1994-12-16 15:33:48 that does have a subsequent commit with an SVN id assigned - 1994-12-16 15:38:40
r10215 looks like it got split up into a couple commits in git, and the time ordering is slightly different.
okay, cool ... does that fully explain it? how much time are we talking about? couple seconds?
looks like a few minutes
If there's a way to get git and svn to generate identically formatted diffs, we could identify when we have actually differing commits - that would be the best/only way to get true assignment of SVN commits to exactly corresponding commits. My estimate was that the maximal utility was to go ahead and assign the numbers based on the commit ids, since it would localize the commit to the general portion of the git history containing the corresponding changes.
The next best thing would be to generate a list of all commits that don't have svn ids assigned and inspect what's happening around them.
git log --notes --invert-grep --grep=".svn." --pretty=format:"%h %an %ad %s"
gitk --notes --invert-grep --grep=".svn." allows for inspection in gitk
Bah - github's web history doesn't have --follow enabled, from the looks of it - ell.c history stops at the restructure.
Sigh... https://github.com/isaacs/github/issues/900
@Sean Today's upload sees Mike as a contributor: https://github.com/starseeker/brlcad_convtest2/graphs/contributors
Looks like github doesn't retroactively add contributors when they create accounts - yesterday's upload still doesn't show him.
So I'll need to make sure others are created before final upload. Good to know. Was going to crunch the numbers, but I think carl is the only >1k committer missing. He's not likely to create a github account, so he can be switched to a brlcad.org alias.
@Sean looks like it will be a few more days at least to finish up the test run (in the mid 60000s range now) - I'll upload it as before once it's done so we can inspect the github integration.
Seeing as we now appear to be getting very close, what's the procedure for flipping the switch from sf to github? Lock the SVN repo as read-only, upload the github repo, and update the web page links are the obvious first steps - will we keep using the existing email lists for the time being?
Possibly of interest: https://github.com/cmungall/gosf2github
yeah, I'd definitely like to import the feature request, support, and bug report trackers to issues, so that's more than of interest. I've come across a couple similar efforts to import sf data. the one you link looks pretty good. suggests we utilize a non-dev account, which is probably a good idea. the patches tracker is a separate beast and will need to be dealt with differently.
would help to have a checklist on the wiki so we don't miss an action. willing to write down what you know so we can look into ordering and making sure we got everything? I can add my notes as well.
Made a quick start here: https://brlcad.org/wiki/Github_Migration
I've got notes scattered around (most in misc/repoconv/NOTES) with more details.
Cool, I'll add some of mine to it. Awesome! Thanks!
@Erik We're still a few weeks out (at a minimum I need to re-run the conversion with the final account-map in place) but we can see the finish line now
@Sean Is it worth putting out an email to the brlcad-devel list with a "last chance" call for any account info updates? Or is that not needed?
Good idea, I'll write up and send an announcement.
Test conversion complete: https://github.com/starseeker/brlcad_conv3
Unless there are more email changes needed, we should now be ready to begin the final conversion run.
/me looks
Nice! Looks like it's recognizing 55 contributors now. Not too shabby. Did you start that before I changed Carl's address?
Unfortunately, yes - doesn't have that change nor Ben's. That's why I'll have to run one more time.
Also why I was suggesting sending out the "last chance" email - once I kick off this time, we're locked in.
okay, I have a couple more to make and we should be good to go
yep
I can send the email now then, unless you wanted to send it? appreciated seeing the draft.
If that looks good I can send it - you're the better wordsmith, so I wanted you to have a crack at it
Assuming a repeat performance, it looks like a bit shy of two weeks for the run - so around the beginning of August we should plan to lock the SVN repository and open up the github repo.
I'd reduce it down a bit and put the main point first, but it looks good enough as is too.
OK, go ahead and send it - makes more sense really for you to do so since you've been POC for the emails all along
I like the automatic stale branch designation
I'm planning to start the run on the 19th, if you want a fixed deadline for the email
We may want to adjust our tag names going forward, so the tar.gz file github generates from the tag will be more meaningful name wise - right now we get "rel-7-30-8.tar.gz"
sure, I can put "before Monday" in the mail
starseeker said:
We may want to adjust our tag names going forward, so the tar.gz file github generates from the tag will be more meaningful name wise - right now we get "rel-7-30-8.tar.gz"
I think we're good. If anything, we could adopt Semantic Versioning (which is simply v1.2.3), but we don't fully comply so that would be a bit misleading.
The tag downloads aren't necessarily meant to be release tarballs (though they obviously should be the same), or at least their convenience is geared toward devs on the command line, not downloaders. Some notable examples:
https://github.com/tensorflow/tensorflow/tags
https://github.com/torvalds/linux/tags
https://github.com/redis/redis/tags
OK, cool. As long as nobody complains that our Github tar.gz download links aren't compliant with HACKING ;-)
What we have is simple and self-consistent, historically accurate.
Ah, so Releases are the fancier version. OK, that's a corner of Github I've not delved into yet
Yeah, like I said, those won't necessarily be our source tarballs. They'll be wherever we host our binary platform releases, since we still need those too.
Binary downloads aren't something GitHub supports directly -- some use GitHub's LFS, some self-host, and others still use SourceForge for downloads, since that's the one thing they're actually still good at.
https://docs.github.com/en/github/administering-a-repository/managing-releases-in-a-repository seems to suggest you can upload binaries?
(#7 in that list)
Well what do you know, they added it.
heh, looks like they added it 7 years ago. shows how closely I've been paying attention to it.
<grin> I've mostly been trying to figure out the CI bits - I don't have any projects that do releases, so I hadn't noticed either
/me doesn't speak fluent YAML yet...
okay, so looks like we're at 89% commit coverage across those 55 accounts
63715 commits out of 71289
Carl's will get us to 93%. I think a few more will probably get us into the 95-98% ballpark.
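Spelling out the arithmetic: the 55 recognized accounts cover about 89.4% of commits. Reaching the 95% mark mentioned earlier would take roughly 4,010 more attributed commits.

```latex
\begin{aligned}
\frac{63715}{71289} &\approx 0.8938 \quad (\approx 89.4\%) \\
0.95 \times 71289 &= 67724.55 \;\Rightarrow\; 67725 - 63715 = 4010 \text{ more attributed commits}
\end{aligned}
```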
That's a lot better than I expected, to be honest
I just wanted to be sure we didn't miss anyone currently active on github who wanted to be correctly linked into the history
curiously, tom browder's commits are actually the first i've noticed lower than his Svn count... should there be any reason for that?
Is his svn count counting any repos other than the main BRL-CAD history?
nope, it's just an svn log dump off trunk
If it's an admin dump that'd be everything (rt^3, geomcore, etc.)
Ah, you mean from a checkout
one sec...
can check on your end: svn log > log && grep -E '^r[[:digit:]]+[[:space:]]\|' log | awk '{print $3}' | sort | uniq -c | sort -n
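The same per-author count can be sketched in Python, which makes it easier to compare against git numbers programmatically. `count_svn_authors` is a hypothetical helper. Like the awk one-liner, it assumes the default `svn log` header format (`rNNN | author | date | N lines`), so a message body line that happens to look like a header would be miscounted. Parsing `svn log --xml` output would be sturdier.

```python
import re
from collections import Counter

# Header lines in plain `svn log` output look like:
# r3 | tbrowder2 | 2007-12-31 12:00:00 -0500 (Mon, 31 Dec 2007) | 1 line
HEADER = re.compile(r"^r(\d+) \| ([^|]+?) \| ")

def count_svn_authors(log_text):
    """Count commits per author from plain `svn log` text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = HEADER.match(line)
        if m:
            counts[m.group(2)] += 1
    return counts

sample = """\
------------------------------------------------------------------------
r3 | tbrowder2 | 2007-12-31 12:00:00 -0500 (Mon, 31 Dec 2007) | 1 line

mention r2 in a message body
------------------------------------------------------------------------
r2 | gdurf | 2007-12-31 11:00:00 -0500 (Mon, 31 Dec 2007) | 1 line

fix typo
------------------------------------------------------------------------
r1 | tbrowder2 | 2007-12-31 10:00:00 -0500 (Mon, 31 Dec 2007) | 1 line

initial import
------------------------------------------------------------------------
"""
counts = count_svn_authors(sample)
print(counts["tbrowder2"], counts["gdurf"])  # 2 1
```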
tom's showing 1637 commits. git count is 1630.
I'm not sure what to make of this ... https://github.com/starseeker/brlcad_conv3/pulse
oh, I get it. That's activity in the last N days.
git log --pretty=oneline --author="Thomas Browder" --branches="*" |wc -l gives me a count of 1634
Working on svn
https://github.com/starseeker/brlcad_conv3/graphs/contributors lists tom at 1630 .. so then there's two discrepancies
I don't know that the contributors is looking across all branches
probably not, but seems odd that tom would commit .. 4 times to a branch
easy enough to verify
I had something in the notes for doing a deeper dive into the git history... one sec...
When I do your svn log count on my local brlcad_repo copy I get:
1988 tbrowder2
That's across everything
o.O
send me your log
sent
heh, you know you can just drag n drop them into here? :)
got it
Ah, right - sorry, my reflexes still think this is a fancy version of irssi
If I run the following script I end up with 73882 commit messages, where github (and git log itself) give only 71289 log.sh
so he did make a lot of commits into the ova repository, so that's one difference albeit to be expected
and he did make a branch for working on binary attributes, so that's explaining the 297 additional commits. if you ran off trunk, you would have seen 1637.
Which is also interesting in itself ... github is not counting branch commits?
yeah, says it right on the page -- Contributions to master
/me is thinking it's probably still worthwhile to have our own gitstats page...
which reminds me
So only the original question remains -- why his commit count is 7 commits short.
(btw, the attached script reports 1930 commits for Tom in git.) log_browder.sh
so similar question since svn is saying he had 1988
I have some C++ code in misc/repoconv (I think it's in the svn_map_commit_revs.cxx file) which could probably be repurposed to actually diff the logs, with some work - svn makes those types of comparisons very annoying...
wouldn't be unusual except that everyone else is slightly higher in git with the fake/duplicate commits
Did he ever adjust svn ignore properties or mime types?
Those commits didn't translate, so we may have lost a few there if he did do that and didn't do any move+change commits
Should be able to figure this out easily by process of elimination.
I just got the repo cloned -- how do we access the svn rev?
It's in the notes. I have a convenience script in misc/repoconv/NOTES
if we can get a list of svn-to-sha, they can be eliminated from the svn list or sha list and vice versa .. should be just a handful remaining in both
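A sketch of that elimination with standard tools, assuming each side has first been reduced to bare revision numbers, one per line (the `unmapped_revs` name and the file names are hypothetical):

```shell
# Print revisions that appear in the first list but not in the second,
# in numeric order.
# Usage: unmapped_revs SVN_REVS_FILE GIT_REVS_FILE
unmapped_revs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"            # comm needs both inputs sorted the same way
    sort -u "$2" > "$b"
    comm -23 "$a" "$b" | sort -n   # -23: keep lines unique to the first file
    rm -f "$a" "$b"
}
```

One way to produce the inputs, assuming the notes use the `svn:revision:N` form: `svn log -q | awk '/^r[0-9]+ \|/ { sub(/^r/, "", $1); print $1 }' > svn_revs.txt` and `git log --all --pretty=format:%N | sed -n 's/^svn:revision:\([0-9][0-9]*\).*/\1/p' > git_revs.txt`. Swapping the two arguments lists revisions present in git but missing from the SVN log.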
.gitconfig helpers section
svnrev
i'm okay unaliased, rather know what's going on first
got it: git log --all --pretty=format:"%H %N" --grep svn:revision:29886|awk '{system("git checkout "$1)}'
It's probably pretty slow for scripting - I wasn't trying to performance optimize, thinking it was just for checking out one svn rev...
Make sure you cloned the notes too, by the way, or that won't work: git fetch origin refs/notes/commits:refs/notes/commits
ah, was just about to say - I have no notes
is there a way to fetch all?
git clone --mirror https://github.com/starseeker/brlcad_conv3.git
I think that'll do it
k
it drives me nuts that git won't pull the notes by default... probably another one of those decisions like not tracking file moves.
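Git can be told to fetch notes along with branches by adding a notes refspec to the remote's fetch configuration. A minimal sketch (the function name is hypothetical; `origin` is assumed to be the GitHub remote):

```shell
# Add a fetch refspec so a plain `git fetch` / `git pull` also updates
# refs/notes/* (by default only branches and tags are fetched).
# Usage: fetch_notes_by_default [remote]
fetch_notes_by_default() {
    remote=${1:-origin}
    git config --add "remote.$remote.fetch" '+refs/notes/*:refs/notes/*'
}
```

After running it once in a clone, `git fetch` brings down `refs/notes/commits`, and `git log` displays the notes without extra options since that is the default notes ref. It is per-clone configuration, so it would still have to be documented for each developer.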
okay, so now it's just a process of elimination
Was Tom an SVN era only committer or did he have CVS commits? Things get a lot more wonky when we cross the CVS threshold...
I wish bob would put at least his name on his github account - his account name by itself looks rather bleak
this is going to take a lil while, but getting closer .. lots of curious little discrepancies to chase down
just looking at svn revisions, some clearly didn't map, so it'll be easy to find those -- I suspect they're something categoric like adding directories or moving files
or attributes like you mentioned
@Sean Are you checking just Tom's commits, or doing a whole-history analysis? If the latter you'll probably see on the order of a couple thousand commits that won't line up, at a guess.
Any git commit without a note doesn't have a matching SVN commit (or at least, an identified one) although the "preliminary move commit" commits arguably do map to specific revisions (I just didn't bother assigning the rev number, since the subsequent change commit is the one that should actually restore the tree to the state that matches the SVN commit.)
heh, why would I expand scope on a specific discrepancy? that'd be terrible way to go about v&v :)
just checking tom's to understand this delta. it's 7 commits, should be easy to isolate and understand.
Wasn't sure what you were up to - "lots of curious little discrepancies" sounded ominous
At various points when debugging the conversion, I generated lists of sets of unmapped commits. Can't say I'd look forward to it, but if you need me to I can prepare a complete list of SVN brlcad commits and the corresponding git log and produce the sets of commit deltas.
well, in trying to pin it down, a couple more numbers aren't adding up. If I do a git log on browder and pull all that have an svn ID, I get 1626 commits on trunk and 1919 on all, which is slightly off from the 1630 on the public site and the 1930 you reported via some script
If I remember correctly commits that are only locatable by tags won't show up in the default git log listings, which was the reason for that crazy script to introspect everything.
No need, it's easy to pull the mapping with: git log --pretty=format:"%H %N" | grep revision | sed 's/svn:revision://g'
can write an awk to show the gaps or just inverse grep as needed
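The gap-finding awk could be as small as this sketch (`rev_gaps` is a hypothetical name). It expects one revision number per line and prints each missing run as a single number or a `first-last` range:

```shell
# Read revision numbers on stdin, print the missing ones as numbers/ranges.
rev_gaps() {
    sort -n | awk '
        NR > 1 && $1 > prev + 1 {
            if ($1 == prev + 2) print prev + 1            # single missing rev
            else                print (prev + 1) "-" ($1 - 1)
        }
        { prev = $1 }'
}
```

For the notes side, assuming each note carries an `svn:revision:N` line: `git log --all --pretty=format:%N | sed -n 's/^svn:revision:\([0-9][0-9]*\).*/\1/p' | rev_gaps`. Some gaps are expected (property-only changes, revisions belonging to other projects in the same SVN repository), so the output is a list of candidates to explain rather than a list of errors.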
Just FYI, now that we're really close, I'm planning on doing actual v&v on the repo to sanity check everything. I don't expect to find anything cause I know you poured heart and soul into the previous revisions, but still better to find any problems now rather than later.
Basically just looking for any actual data loss, like commits missing that shouldn't be missing or something that's off by one or some other bug.
Nothing exhaustive, nothing to hold anything up either. Just basic comparative testing to see if we understand and expect all the differences.
tom's actually a rather convenient delta to investigate.
@starseeker so is the account name actually preserved anywhere? It's okay if it's not, but I'm not seeing it and thought it was getting collapsed and preserved somewhere.
It's not, except in the account-map file. Only other approach I can think of is to add another note line to the commits with the cvs/svn commit name, and that'd be a bit of a job to do right.
I'm afraid to touch the main logic at this point if I don't absolutely have to, which would mean appending another note line with a post-conversion analysis. Doable, but not trivial.
If it's helpful, here's the list my logic generates of SVN commits that have no identifiable corresponding git commit (at least, without analyzing the contents of the diffs themselves, which I have not attempted): svn_list.txt
hmm, interesting
starseeker said:
If it's helpful, here's the list my logic generates of SVN commits that have no identifiable corresponding git commit (at least, without analyzing the contents of the diffs themselves, which I have not attempted): svn_list.txt
Have you gone through the list already exhaustively? Anything unexpected?
obviously there are lots of categoric ones not to worry about, like all the generated ones and tag commits, which are non-issues.
starseeker said:
It's not, except in the account-map file. Only other approach I can think of is to add another note line to the commits with the cvs/svn commit name, and that'd be a bit of a job to do right.
I have mixed feelings on this. On one hand, it would be nice to preserve the actual user name recorded on that specific commit, but the historic merit is questionable (beyond provenance, which is already lost) and can't think of an actual use case unless mappings are wrong (which reminds me, should check the first and last author in the mapping file specifically).
It's fine without. If you want to add it, that'd be fine too.
Not exhaustively, and that particular list is CVS era only - I'm working on a more comprehensive one.
I'm inclined to skip it for now - since git notes can be added without impacting the main sha1 repo history, we can always go back and generate the mappings later if we discover it's worthwhile. (I plan to put the original CVS and SVN repos up in a single archived git repository on the project, to preserve them for potential use when something comes along to dethrone git and some poor sucker gets to do this again.)
These might be a bit more interesting - I disabled the limiter and ran the check for all svn commits, as well as printing out unmapped (or at least, not uniquely mapped by commit message) git commits.
svn_list.txt
git_list.txt
The git version is less sophisticated - duplicate commit messages on different commits will show up - but it's a start.
Update - better version of the git_list.txt file that also removes unique timestamp + message matches. Under 2k as opposed to almost 6k, and visually most of them look like cvs-fast-export breaking down commits differently:
git_list.txt
The branch delete commits are needed to preserve when a branch was removed in SVN, since we can't actually delete the branches in git without unreachable commits being garbage collected.
First number in both lists is the timestamp, so they're sorted chronologically. SVN has commit ids, and git has sha1 hashes. Then for both commit message is shown, which usually gives a hint as to why there's no mapping in the other system.
posted announcement to mailing lists, facebook, and twitter
git-stats looks like it's pretty much working - probably need to tweak the output some for our specific needs, but right general idea:
https://brlcad.org/~starseeker/git_stats/authors/best_authors.html
/me pushes his luck by feeding the full 70k+ commit history through the git->fossil converter... curious to see if fossil can handle this.
of course it can, no reason it shouldn't. he's pretty consistent in making things robust to scale.
hey question on the svn revisions and git notes...
I know this is coming late and maybe we hashed it out earlier(??), but given the tooling issues, what about just stashing the cvs/svn rev info as the last line in the commit log?
@Daniel Rossberg per your e-mail, you're also welcome to use your brlcad.org alias (rossberg).. which can be pointed to anything, and can be claimed in your github account as an additional address.
Took most of a day to run, but it did work - cool! I present BRL-CAD, in fossil:
brlcad-fossil.jpg
Theoretically possible to stash it there, but once we do we lose the trivial 1-1 commit message correspondence with the earlier repositories. The latter is what let me generate the svn_list.txt and git_list.txt files above - I know we could work around adding the extra info, but the git notes appealed to me semantically (metadata on the commit, rather than part of the core message/data/parent relationship)
Also, I can't incorporate it into the CVS portion of the history without trying to hack the cvs-git tool in some weird way - I'm taking their git output and assigning the notes with our ID numbers post-conversion, rather than during.
Once they're in git, the log messages can be edited, so CVS could still be annotated too.
Editing the log messages is (I think) like editing the commit names - it will propagate invalidating the SHA1 hashes all the way up the chain.
I get the appeal, but the downsides are starting to dominate the more I work with it.
invalidating the sha hashes was a notes issue though, wasn't it
if we're not using notes, then that's no longer an issue
Actually, when you asked the git list they gave us a theoretical way around that.
I never tested it, but it's not a huge issue - in principle I could do a complete regeneration of all the notes information given timestamps and commit messages that match the CVS/SVN messages.
What downsides are you encountering?
right but that's the whole point -- it's really a half-baked feature that isn't working well. the log message is part of the commit and the only reliable place to stash it.
I could do it for the SVN portion of the history, although there is a risk I'll break something - CVS is much harder.
How much do you envision using that information? I was figuring the "svnrev" alias for the gitconfig file would cover the most common use case - check out an svn revision - and those ids would grow steadily less relevant with time... Is the part you're not liking that you don't get the notes in a default git clone?
Actually, doing it even with the SVN history would be a substantial effort as I look at it - over 300 commits would have to be manually updated, plus the correct surgery on the C++ commit header generation code.
well let's see.. there's:
1) people have to be told that notes exist and use a command they've probably never used before to pull them
2) additional options that must be learned to work with them (e.g., --pretty=format: %N)
3) 72354 commits to add them that show up in log, have to be ignored or scripted around
4) the restriction that if we change any historic commit, we'll need to do surgery to reattach the note
5) the general feeling that notes are half-baked and they're not prioritized to change anytime soon
6) needing to have additional customizations/macros that have to be remembered, maintained, explained
7) the fact that it presents to users simply as the last line of the log, so it didn't really buy us more than logical separation
8) logical separation isn't compelling by itself as it could just as easily be stripped from logs (with less machinery than adding it)
9) the svn revs are not visible to an observer without it being explained...
I suppose #1 and #9 are related, but separate points on needing to know they exist, and needing to actively take steps to do something about it
@Sean Another possibility is to write a utility to take the completed conversion and construct a new repository from that, incorporating the notes as commits.
(essentially, "replay" the history again, but this time from git->git rather than through all the custom insanity.)
That's probably the most practical option by a long shot, actually, now that I think about it.
er, incorporating the notes on the end of the commit messages rather.
starseeker said:
How much do you envision using that information? I was figuring the "svnrev" alias for the gitconfig file would cover the most common use case - check out an svn revision - and those ids would grow steadily less relevant with time... Is the part you're not liking that you don't get the notes in a default git clone?
I actually envision using it on the regular for at least a while until references in trackers and notes and other places become less frequent.. but again, I don't need machinery to do that. I just need it somewhere. A file in the repo would work if the revs didn't change. Since they do, the log becomes the next best place I think. One can grep a log and grab a sha.
So you're wanting something robust even to a full history rewrite, if it comes to that?
Is there value in the branch note? aren't they on their respective branches?
Not really, git doesn't have the same notion of branch specific histories that svn does
If you want the ability to find the commits made to a branch, and only those commits, you need the branch notes
I think I've got a link somewhere that explains how that works - it's a low level consequence of Git's world view
I do recall the conversation a while back
I guess I've just not needed to know that specifically
and can't it be derived? I mean I can pull a git tree view and see all the commits on that branch
it's of course squirrelly when commits are cherry picked over, but from svn's perspective, they would have presented as being made on the branch too,
unless one peeks at the mergeinfo
You'll see the commits, but git doesn't retain the origin branch for the commit. Once the commit is referenced by multiple branches, they're equal - there's nothing that remembers what the "first" branch was. It will work up to a point, but once you start merging multiple directions between branches you lose the origin information
but... it'd be the first chronologically
https://stackoverflow.com/questions/4629358/show-only-history-of-one-branch-in-a-git-log discusses some of the issues
and even then, I'm not sure what knowing the branch is going to help with. knowing the committer, sure. knowing when or a commit message saying why, sure.
I sometimes want it to know if a particular change took place while I was working in a topic branch, or whether the change took place in trunk.
so that joker already does something I really despise.. --squash
@Sean If I remember correctly, you can see the issue by trying to look at the history of the bullet branch - use git's own tools, and then the method I have in the NOTES file using the branch notes.
starseeker said:
I sometimes want it to know if a particular change took place while I was working in a topic branch, or whether the change took place in trunk.
but that's my point, if you annotate the line and find the hash, and look at the first instance on a git tree, won't you know that?
If I'm trying to review what was done in the branch, but I've merged in trunk/master, it gets hard because suddenly a whole bunch of "master" commits are now part of that branches history, interwoven with the commits made on the branch
I'd have to try that for an individual commit, but if both branches that reference the individual commit are older than the commit itself I don't think you can distinguish which one created it.
Another concrete case - if I want to look at the original development of the CMake build system in the cmake branch, in SVN I can log just in that branch and not see any trunk commits that happened while that branch was live. In Git, once I merged the cmake branch back into master, suddenly all the master commits that took place while the cmake branch was live are effectively part of the history of both branches.
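Git's closest built-in approximation is `--first-parent`, which follows only the first parent of each merge. A sketch, assuming the branch's own commits are its first parents (true when master was merged into the branch, but not guaranteed for converted history, which is why the branch notes exist); `branch_mainline` is a hypothetical name:

```shell
# List a branch's history following only first parents, so commits that
# arrived through a merge from another branch show up only as that merge.
# Usage: branch_mainline BRANCH
branch_mainline() {
    git log --first-parent --format='%h %s' "$1"
}
```

This does not recover which branch a commit was originally made on; it only hides the other side of each merge, and it breaks down once merges have gone in both directions.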
I'm still not seeing how that's a problem that needs to be solved. So commits are interwoven... that means cherry picking might be hard. It probably means I should merge more frequently or will make me merge less frequently or, better yet, not be working on a branch for a long time.
It makes it hard for me to follow the commit history of a particular feature's development, without interference from commits in other branches. If I'm the only one that has the problem it doesn't matter particularly, but that was my motivation since it is something that can be done now in SVN (and I have done on occasion).
I may use it more than I realize, but I'm still struggling to come up with a case where knowing the branch is going to change my behavior or awareness on something. I'm usually wondering "who wrote this chunk of code, why was it written". I suppose knowing a branch might help indicate that but to date the info's either not existed or come from log messages because branch use has historically been big isolated things.
Right - that's the point though, in Git we lose that isolation. Hang on, let me see if I can give you a concrete example with bullet...
like I might consider the binary attributes or opencl branches, they both have lots of changes, so it might be nice to know what changes aren't on trunk
@Sean do you want me to start trying to figure out how to replay the history and consolidate the notes into the commit message?
but then maybe I should check out those histories, because I expect the tree view to clearly show what was done on the branch
in my experience it does not
I may be missing something - see if you can use (say) gitk to visualize the history of the bullet branch
(by the way, for general history browsing I generally use gitk --branches="*" to avoid seeing the notes commits)
okay, so that convinced me it's worth keeping for now -- the branch info -- if only because we have a dozen branches with work worth isolating and if it helps isolate them, fair enough
OK, so in the NOTES file I have two aliases defined - logb and logsvnb. The former tries to use git's "standard" information to follow the branch history, and the logsvnb alias uses the notes.
If you checkout the bullet branch, then do:
git logb
you'll get one result, and
git logsvnb
will produce another.
(you can also do what the aliases are doing in scripts, that was just an easy way for me to achieve the result)
after I set something up, right, to get those aliases?
Screen-Shot-2020-07-17-at-5.09.07-PM.png <-- another downside...
Try --branches="*" instead of --all - does that help?
Yes, the NOTES definitions get added to your ~/.gitconfig file
ah, need a [alias] header
Oh, sorry - I figured for the docs to put in a fully populated .gitconfig file as an example, but I haven't assembled it yet (if we decide not to keep the notes in this form it's moot anyway).
I know this is all one-time setup, but it really does feel clunky -- I think if we can make it work as the last two lines of the log message, we should, and most if not all of this customization can go away
Well, logsvnb won't but --all will behave better, that's true
well there simply won't be 72k commits that sometimes appear and have to be explained/ignored/parsed over/etc
so that'd be a plus
curious -- are they on a branch or something?
It's some sort of separate mechanism (.git/refs/notes/commits), I think
very bizarre presentation
when I inspected one, it was presented as a change to file /dev/null
user "CVS_SVN_GIT Mapper <cvs_svn_git>" which I presume you set
yes
and are predominantly at the end of the git log --all listing, but then are partially interwoven ...odd ordering
I wish git embraced a feature like svn attributes. I think mercurial supports arbitrary key/value attributes on their objects. sigh
https://github.com/newren/git-filter-repo/ might have some possibilities
you sure it's not easier to update the tooling? seems like it should be easier to not write notes and simply append to the log messages as they are committed.
I think I could also probably write a script that adds them to the existing log if that'd help
I've got over 300 manually adjusted commits which would have to be updated by hand (and being off by one character length in any of them will halt the commit) - plus it's now been close to a year since I've mucked in the code that generates the commit headers. And that's still just the SVN portion of the history - I'd need something like git-filter-repo anyway to get the CVS version.
ah, you're not rebuilding the cvs portion repeatedly?
just picking up at 17k or wherever
Correct - cvs-git generates that, I then post-process it to match SVN commits to CVS->GIT commits
~29k, IIRC
ah, right, the reorg was cvs
and that was nearly 23k
/me nods - I could have put the svn numbers in the commit messages when I was originally writing that code - in fact I considered it - but it wouldn't have been a universal solution and it complicated the commit message mappings, which had to happen for CVS anyway.
/me nods
it wasn't really apparent the burden or full implications until working with it more
If you want to help, you could take a look at https://github.com/newren/git-filter-repo/ and see if that provides enough power to rewrite the history by pulling the note (if any) from each commit and appending it to the commit message.
The notes associate the information with the commit, so the problem becomes to (for each commit) retrieve the information and assemble the new commit message. Then, it needs to be applied and the history above it rewritten to accommodate the new sha1.
Even with a well tuned process that'll be quite slow, especially for the older commits...
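For reference, a sketch of the whole-history rewrite using git's built-in `git filter-branch` (slow, and officially discouraged in favor of git-filter-repo), whose `--msg-filter` receives each original message on stdin with the original commit id in `$GIT_COMMIT`. The function name is hypothetical, and it assumes the notes live in the default `refs/notes/commits`:

```shell
# Rewrite all branches and tags so each commit's note (if any) is appended
# to its message. filter-branch carries over the original author and
# committer names and dates. Every SHA1 changes: run only in a scratch clone.
append_notes_to_messages() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --msg-filter '
            cat
            if note=$(git notes show "$GIT_COMMIT" 2>/dev/null); then
                printf "\n%s\n" "$note"
            fi' \
        --tag-name-filter cat \
        -- --branches --tags
}
```

filter-branch forks several processes per commit, so expect a long run on 70k+ commits. Limiting it to `--branches --tags` keeps it from rewriting the `refs/notes/commits` history itself, and `--tag-name-filter cat` re-points tags at the rewritten commits (tag signatures are dropped).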
@Sean if you're OK with a mapping file, what about a mapping file for timestamp plus commit message to SVN id? That should be robust if we can supply a way to look up a given commit using those inputs, even if we skip the notes
(by the way, since a default git clone from github doesn't pull the notes, they're not going to be an issue for people unless they go looking for them...)
I would just shell script it myself, something like:
oldmessage=$(git log ...)
git commit --amend -m "$oldmessage" -m "svn:revision:$revision"
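Filled out for a single commit, that idea might look like the sketch below (hypothetical function name; assumes the note holds exactly the text to append). Plain `--amend` keeps the author but stamps the current user and time as committer, so this copies the original committer fields back in:

```shell
# Append HEAD's note (e.g. "svn:revision:NNNN") to HEAD's commit message,
# preserving the original committer name, e-mail and date.
append_note_to_head() {
    # Nothing to do if HEAD has no note.
    note=$(git notes show HEAD 2>/dev/null) || return 0
    # Assumes nothing is staged: --amend would fold staged changes in too.
    GIT_COMMITTER_NAME=$(git log -1 --format=%cn) \
    GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce) \
    GIT_COMMITTER_DATE=$(git log -1 --format='%ct %cz') \
        git commit --amend --allow-empty --no-verify --quiet \
            -m "$(git log -1 --format=%B)" -m "$note"
}
```

Amending only rewrites the tip commit; for anything older, every descendant has to be re-created as well. The note stays attached to the old SHA1 (unless `notes.rewriteRef` is configured), which is harmless once its text is in the message.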
If you want to give that a go, the repo on github now should be a suitable test
I can, but it might delay things for monday -- still working through yesterday's validation check and need to create a few more accounts for the final upload
If we need to figure out another solution that involves the conversion process, Monday is shot anyway...
a couple other things I wanted to test too, like what happens if we garbage collect -- are there any orphans now?
git fsck --lost-found can check for that, IIRC
also, what happens after deleting all the note commits. . and then garbage collecting. is there more to clean up.
right, I know -- that's just one of a couple dozen validation things to check on my list
common ops someone might do to their checkout
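A sketch of the orphan checks (hypothetical function names; run only in a scratch clone): count objects no ref can reach, then drop the notes ref and count again. Whatever the second count reports is what a `git gc --prune=now` would then delete.

```shell
# Count objects unreachable from any ref, ignoring reflogs.
unreachable_count() {
    git fsck --unreachable --no-reflogs 2>/dev/null | grep -c '^unreachable ' || true
}

# Delete the notes ref and report how many objects that orphans.
drop_notes_and_count() {
    git update-ref -d refs/notes/commits
    unreachable_count
}
```

On a healthy conversion the first count should be zero, and the second should consist only of the notes commits, trees and blobs.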
/me shakes his head - I think we'd better not plan on Monday. You may find more issues, so let's just wait until you're either confident or have identified specifically where we need to end up to be ready.
We did originally plan for there being about 2 weeks of validation. I was going to try and cram as much as possible into 4 days :smile:
okay, time to stretch legs.. oof. giving myself nerve issues with so much sitting for months now.
/me nods. Let me know.
FWIW, this might generate a SHA1 independent map:
git log --all --pretty=format:"%ct%nGITMSG%n%B%nGITMSGEND%n%N%n"
oh nice, that eliminates the indentation too.... I was just going to sed that out, but this is better.
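Turning that output into a revision lookup table could be done with a small awk state machine. A sketch (hypothetical `map_to_revs` name) that keeps only commits whose note has an `svn:revision:N` line and emits revision, committer timestamp and the first message line, tab-separated:

```shell
# Read the GITMSG/GITMSGEND-delimited log format on stdin and print
# REV<TAB>TIMESTAMP<TAB>SUBJECT for each commit with an svn:revision note.
map_to_revs() {
    awk '
        $0 == "GITMSG"    { inmsg = 1; subj = ""; next }
        $0 == "GITMSGEND" { inmsg = 0; innote = 1; next }
        inmsg             { if (subj == "") subj = $0; next }
        innote && /^svn:revision:[0-9]+$/ {
            rev = $0; sub(/^svn:revision:/, "", rev)
            print rev "\t" ts "\t" subj
            next
        }
        /^[0-9]+$/        { ts = $0; innote = 0 }   # %ct starts a new record'
}
```

For example: `git log --all --pretty=format:"%ct%nGITMSG%n%B%nGITMSGEND%n%N%n" | map_to_revs > svn_map.tsv`. It assumes no commit message contains a line that is exactly `GITMSG` or `GITMSGEND`.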
We could just commit that, and then the notes wouldn't matter much... most clones wouldn't have them.
That's one thing about git I unreservedly approve of over SVN - it is way way better about programmatic extraction of information.
If you re-clone from github, without pulling the notes, your git log (and gitk) won't show the notes commits even with the --all option.
Here's a demonstration of a command pair that can use a timestamp and message to checkout a specific commit:
sha1=$(git log -F --after=1047583133 --before=1047583133 --grep="* empty log message *" --pretty=format:"%H") && git checkout $sha1
starseeker said:
We could just commit that, and then the notes wouldn't matter much... most clones wouldn't have them.
Just commit what?
good to know about the notes, so the extra commits wouldn't be ongoing nuisance unless someone pulls them. but if they're on the commit log, would there be a reason for keeping both? or is that not what you meant?
If we generate a script that is capable of checking out the matching git commit without requiring the sha1, based on the timestamp and some or all of the commit message, then the git notes won't be needed anymore.
I suppose we could strip them, but I'd rather leave them (at least in the primary github repo, even if we don't tell people to grab them by default) in case whatever script we come up with proves to have some sort of problem - then they'd be available as a fallback.
Won't they get disassociated when the commits get amended? guess we can find out..
Amended? Why would we do that, if the script can check out the SVN id?
If we eventually have to change the repo for some reason we'd have to either try the solution the git folks gave us or re-generate the notes, I suppose
maybe talking about different things?
to append to svn rev the log message, that's an amend
Right - I'm trying to avoid having to do that
i'm not following then what the suggestion was
To generate a shell script that can accept a SVN revision number as an input, and do the appropriate checkout based on timestamp and commit message matching to check out the corresponding git commit.
That won't tie the script to a particular sha1 hash, and so should be robust.
Then we won't need to worry particularly about notes, updating log messages, etc.
how's it mapping svn rev to commit message? talking to origin or something?
It would hard code the timestamp and message associations into a case statement, which would use the SVN rev as the lookup key
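A sketch of that script, reusing the lookup from the demonstration above. The `case` table would be generated from the timestamp/message map; the single entry shown uses the demonstration's timestamp and message with a made-up revision number (12345), since the real mapping isn't given here. Function names are hypothetical:

```shell
# Look up the committer timestamp and message recorded for an SVN revision.
# Sets $ts and $msg; returns 1 for unknown revisions.
svnrev_key() {
    case "$1" in
        12345) ts=1047583133; msg='* empty log message *' ;;  # hypothetical entry
        *) return 1 ;;
    esac
}

# Check out the unique git commit matching that timestamp and message.
svnrev_checkout() {
    svnrev_key "$1" || { echo "unknown SVN revision: $1" >&2; return 1; }
    sha1=$(git log --all -F --after="$ts" --before="$ts" --grep="$msg" --pretty=format:%H)
    n=$(printf '%s\n' "$sha1" | grep -c .)
    if [ "$n" -ne 1 ]; then
        echo "r$1 matched $n git commits" >&2
        return 1
    fi
    git checkout "$sha1"
}
```

Because the lookup keys on committer timestamp and message rather than SHA1, it survives a history rewrite as long as those two fields are preserved; the count check guards against non-unique timestamp/message pairs.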
part of the issue was also one of simplicity and obviousness, not having to know some special knowledge to discover the svn rev or have it explained or documented
Changing the commit messages is the most disruptive of all the options - are you sure it's worth it?
You, Nick and I are probably the most likely to need SVN revs, and we're the most able to handle something less obvious...
agreed, but looking up commits wasn't the only issue
kanzure's first response weighs on me
his comment, questioning where the revs are and hoping we saved them somewhere.
it's not just that the data is or isn't available.
it's that he had to ask
and even then, that's still only 3 of the 7 issues that came to mind..
with all the churn and back and forth, an e-mail change seems inevitable ... like if github suddenly becomes persona non grata and we move to gitlab. we might want/need to rewrite all those stupid github privacy aliases .. talk about f'ing vendor lock in.
of course, that one is less compelling as future us may have other options
Heh - that's an argument to go to all brlcad.org emails, in some ways...
I thought about that
but then I think we'd end up with even less coverage displayed
Ah - because people have to add the brlcad.org email to their profile?
unlikely to get the old devs like gary to associate an alias he's never used to his github account that he probably never uses.
right
I mean, 55 out of about 75 is not too shabby
OK, I'll see if I can figure out the amending thing, since there are multiple potential applications/use cases. Just be aware I'm running out of steam, to a degree.
I know, I hated bringing it up. Don't mean to cause more work.
The usability implications have been somewhat jarring/unexpected, and simpler may be better. We're not losing anything.
And we could probably revert back to notes, attributes, or some other feature that ends up getting developed. I have to imagine something eventually will..
/me winces. Once the repo goes live, a change of that sort will be disruptive for all forks even if we figure out how to do it.
yeah, I'm thinking more like if/when we change hosts again
it took us so long to get off sourceforge that github is bound to be obsolete soon.
</sarcasm>
Um. Even then, in principle we could migrate the git repo without breaking forks, if I understand correctly - it would just be a change in origins. The breakage would be if we needed to change emails on old commits (as opposed to associating them with the new accounts, say...)
well, yeah -- I think that'd be implicit because of all the github-specific aliases. that only works on github.
Heh - how many times did sourceforge get sold before they started having trouble? That might be a decent yardstick...
if github were shuttered, there'd be no way to authenticate/claim those addresses
Hm. That's true enough.
I think people just assume they'd rewrite their author names. I would if I were using one.
I find the idea of using a content provider's e-mail alias a bit wonky personally. Unless it's something "too big to fail" like gmail.com ... tech is notoriously unreliable, even fickle ... looking at you yahoo.com
/me likes the idea of spam going to /dev/null with the noreply email...
so just put your email address someplace permanent, like geocities.com
To be honest, I'm not all that worried (on a personal level) about my commits showing up anywhere - they haven't for a decade, and I'll live if they don't... as long as the project's stats behave reasonably, whether it ties to my account is secondary.
@Erik you mentioned your git fu being strong - we have a case where help would be appreciated, if you have any ideas
a git repo is a git repo, they can be rewritten, there is no "exporting", it just is
We need to rewrite a git history to take those commits that have notes, and append them to the end of the commit message instead.
shrug so slap a hook in and clone it to fire it or something
words like append, rebase, filter-branch, and such hover around this question, but there are additional challenges - such as preserving the original timestamps while doing all this.
we are striving for a degree of fidelity in history preservation that I conclude is somewhat unusual among git users...
there are several types of dates kept in git.. author date and commit date on every commit, plus tagger date on annotated tags... um, read all the formatting options in the pretty printing section of man git-log
it's probably the quickest most comprehensive way to grok what git stores
Is there some sort of standard "advanced" script for a situation like this, needing extensive (and non-unique) commit msg updates?
If push comes to shove I can manipulate the data at whatever level is required, but it would be nice if there's a pre-packaged answer...
/me realizes he should probably eat dinner...
there's a hooks directory that can be used for crap like this, the script just has to do one commit, then ask git to clone using it, or filter if you want to try to do it in place, or whatever. Or just write a script to iterate the commits and --amend them. Or ...
and it's a dvcs, so, y'know, if you break it, just grab another copy
It's not the breaking it, it's the 200 iterations of breaking it before I manage not to break it...
Sean said:
Daniel Rossberg per your e-mail, you're also welcome to use your brlcad.org alias (rossberg).. which can be pointed to anything, and can be claimed in your github account as an additional address.
For github I want to stay with my github address.
Another issue is that my brlcad.org address is dead. It points to a sourceforge address, which doesn't accept mail from outside. ~/.forward seems to not work.
@Daniel Rossberg your alias no longer points to any sourceforge addresses -- they were all updated recently for everyone for that very reason.
just fyi.
@Sean As long as I'm doing this anyway, would you prefer a different format for the SVN revision and branch info than what I was using? It wouldn't be too much more work to change the formatting once I get the initial logic working, if you would prefer something different.
Daniel Rossberg said:
For github I want to stay with my github address.
Another issue is that my brlcad.org address is dead. It points to a sourceforge address, which doesn't accept mail from outside. ~/.forward seems to not work.
And of course, it's not a problem to keep it on your github address either, can be whatever you want. Just was letting you know it was an option. The aliases are DNS MX records, so they are aliased before they even hit a mail server.
starseeker said:
Sean As long as I'm doing this anyway, would you prefer a different format for the SVN revision and branch info than what I was using? It wouldn't be too much more work to change the formatting once I get the initial logic working, if you would prefer something different.
I think what you used is perfectly reasonable.
https://github.com/starseeker/brlcad_nonotes
how is that 5 fewer contributors?
I'll have to check... just got it working a couple hours ago, pretty fried.
I'm actually seeing more?
60 on the new vs 55 on the previous?
might be taking time to populate
interesting, now it says sixty here too. guess it wasn't done processing
I got 7 more names sorted out, so we should be up to 62 now
Ugh. Alright, can't wrap this up tonight (quite.)
So close...
still got a bit more validation too, but yeah, the last couple names I got were huge wins
this is looking good. we're up to 74% of authors - 64 of 86 - and will be up to at least 95% of commits after the next run. that should do it!
it should be at least 96.4%
there are flags on just three accounts with anomalies that I'll need to investigate. one with too few, one with way too many, and one not linking to their github
Here's an upload with all the bells and whistles - converted all the emails as of account-maps earlier this morning, notes consolidated into commit messages, and just for grins I also wrapped single line commit messages to 72 chars:
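A rough equivalent of the single-line wrapping step, using Python's standard `textwrap` module rather than the TextFlow algorithm used in the actual conversion. `wrap_single_line` is a hypothetical name. The 72-column width comes from this message. Disabling word and hyphen breaking is an assumption, chosen so that URLs and identifiers stay on one line even when they exceed the width.

```python
import textwrap


def wrap_single_line(message: str, width: int = 72) -> str:
    """Wrap a one-line commit message (no trailing newline) to `width` columns.

    Multi-line messages and messages that already fit are returned
    unchanged, mirroring the "single line commits only" restriction.
    """
    if "\n" in message or len(message) <= width:
        return message
    return textwrap.fill(
        message,
        width=width,
        break_long_words=False,  # never split a URL or long identifier
        break_on_hyphens=False,  # keep hyphenated paths and flags intact
    )
```

Only the single spaces at break points are removed. So for a message with single spaces between words, joining the wrapped lines with spaces recovers the original, which gives one way to check the output against the SVN log.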
https://github.com/starseeker/brlcad_conv4
Needs a validation check to make sure I didn't accidentally mess something up in the processing, still...
Ah crud, typo'ed a couple of the mappings. screeech, rerun...
There we go (still populating site info...)
https://github.com/starseeker/brlcad_conv5
@Sean is this what you were looking for with the svn commit names?
https://github.com/starseeker/brlcad_conv6/commit/6dc9436d0fc5f17176a0a5fc5d00b54b1194f75c
I think I'm pretty much out of stuff I know still has to be done (aside from pulling newer commits of course) - let me know if you spot anything else.
Ugh, 80 col experiment didn't work well. conv6 removed, replaced:
https://github.com/starseeker/brlcad_conv7
(github must be wondering what on earth I'm doing...)
/me proceeds to unplug brain for recharging...
yeah, I've got very strongly mixed feelings about inserting newlines where they didn't exist. I feel that's just bad git presentation defaults. Apparently they can be overcome (e.g., default interactive pager is LESS=-S even though it can auto-wrap to screen correctly).
starseeker said:
I think I'm pretty much out of stuff I know still has to be done (aside from pulling newer commits of course) - let me know if you spot anything else.
I have commits for three accounts to investigate, which I hope to finish up with tomorrow. I'm done with accounts -- we nearly got everyone that made at least 100 commits (woot!). We're definitely getting super close.
The log additions for branch and revision look like they were flawless. Trying to find one with no log message to see what it did...
@Sean That was my experience with wrapping - gitk I know can deal with it, but doesn't by default??? (I can only conclude that it's a deliberate design decision, given the feature does exist and works...)
I've got notes somewhere on which option to set (at least for gitk), which we'll probably still want to advise people to do regardless because there are some cases I don't detect as wrappable.
My motivation for wrapping was two fold - 1) if we wrap lines, we'll get better behavior for new users with default tool settings and 2) interfaces/websites/tools that assume "standard" git commit message settings may behave better.
It's quite literally an option in the post-processing tool, so trivial to disable if you decide we shouldn't wrap them.
@Sean If you just want any commit without a note migration, here's one:
https://github.com/starseeker/brlcad_conv7/commit/a8161859aa2d1d3935a257be9e725daff89e8157
git log --invert-grep --grep="svn:revision" will list the ones without an svn tag
Here's a no-svn-id commit with a longer message:
https://github.com/starseeker/brlcad_conv7/commit/0758d43db1ef6e2bb518d4e1db355bf6dc864527
CVS era commit without svn id (i.e. not an artifact of SVN conversion)
https://github.com/starseeker/brlcad_conv7/commit/dd3a2e848c19e8610c82f81a40e6c9d7fdbc8c81
I don't think we had any commits with an actual empty string in the git history - the preliminary conversions produced some like this:
https://github.com/starseeker/brlcad_conv7/commit/a4ad5e277ff55f47cc70bb36dd12097b31d03c02
Ah, whoops - sorry Sean, just messed up that repo with an experiment. Hang on, creating a new one.
https://github.com/starseeker/brlcad_r76458
Nice, I like the annotations.
how about "account" instead of "author" though
author has implications that may or may not be true
"account" or "username"
What about "committer" ? Other than that I'd vote for account.
@Sean can I do anything to help analyze the remaining concerns?
(fwiw, "committer" is the equivalent term from git)
I would stick with svn's nomenclature
it's an account username, so either works
Well, I'd prefer to match git, but it's not worth bikeshedding - account it is.
Looks like the distcheck-repo_verify adaptation for Git is working.
/me should confirm the svn-fast-export method is working for the other repos, actually - been a while since I tested that.
Rather crude, but this should encapsulate what's needed for doing a verification between git and svn (at least, as far back as the end of the CVS history)
https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/misc/repoconv/verify.cpp
Will need to run it against this version, with fixed svn branch names:
https://github.com/starseeker/brlcad_conv8
I've not run it myself, beyond a few commits to see if it looks like it's working - it will be very slow, and there may be more optimal ways to go about checking - this is very much a brute force approach.
We can compare the CVS portion of the history as well if you want to, but I'm not sure what we'd do about any discrepancies - I'm just using the output from cvs-fast-export, so any changes would be quite difficult.
And in that case the true "ground truth" would actually be the equivalent CVS checkout, if we can map the svn revisions back to CVS in some fashion.
I must be out of my mind, but I taught https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/misc/repoconv/verify/verify.cpp to check both the SVN and the CVS repositories, once things get that far back. (recommended to replace the sphflake.pix,v file as documented in the beginning of CONVERT.sh)
@Sean when you said there were authors you need to check, was that in the conversion or the Github integration?
(trying to think of anything else useful I can do...)
In the conversion
I hope to have them all inspected today, will ask if there are questions. It's more a matter of tracing down all their master commit shas and seeing what the delta is against their trunk commits, to make sure all the differences can be explained.
starseeker said:
Sean That was my experience with wrapping - gitk I know can deal with it, but doesn't by default??? (I can only conclude that it's a deliberate design decision, given the feature does exist and works...
I'll just note that is your assumption, and not one I would make. Yet you're using it to justify a subsequent decision that all have to live with.
From my perspective, this is a feature that git and github have wrong. Line wrapping is a presentation issue that is trivially handled by apps. Other distributed vcs didn't make the same decisions, and if we'd picked another we wouldn't even be having this consideration. Which is to say that it's possibly something we'll regret in the future when we migrate to git's successor. Unfortunately, it's trivial to add newlines but it's not trivial to remove them.
I've got notes somewhere on which option to set (at least for gitk), which we'll probably still want to advise people to do regardless because there are some cases I don't detect as wrappable.
We will also because I have little intention of manually injecting newlines once we're on github for command-line commits. I don't do it for other git repos and don't plan to on ours either except when the commit warrants a longer description and I'm in an editor.
My motivation for wrapping was two fold - 1) if we wrap lines, we'll get better behavior for new users with default tool settings and 2) interfaces/websites/tools that assume "standard" git commit message settings may behave better.
That said, these are sound reasons, save for the caveat I just stated -- that it just means the historic commits might be pretty but not the more recent ones.
It's quite literally an option in the post-processing tool, so trivial to disable if you decide we shouldn't wrap them.
I don't feel that strongly to oppose it. I do like it neat and tidy though it begs a couple questions (like what column did you wrap on? what about things like URLs? what about punctuation? ..).
It's a little concerning that it's not preserving what actually was written. It's slightly complicating the review process because they don't match and I have to do additional scripting (but I'll deal, just slows things down). Those are not strong enough to argue against though. I think you said you limited it to commits that had only 1-line comments? That's probably a good balance.
Sean said:
From my perspective, this is a feature that git and github have wrong.
Github actually appears to handle the long lines just fine (ellipses on presentation). This really is just a git tooling convention / defaults issue. I think I even read how one can make git log behave for a different format line.
But again, not enough to fight against it, just sharing my perspective. I probably wouldn't, but if you want to inject them on the single line commits, I won't fuss too much. :)
Hm... one possibility comes to mind. Putting svn:log:wrapped could be used to denote the ones wrapped, which would then make them invertible and an encoding of the original data.
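To make that concrete, here is a sketch of the inverse step under stated assumptions. It assumes wrapped messages carry a hypothetical `svn:log:wrapped` line in the tag block, which follows the message body after a blank line. It also assumes only single-paragraph messages were wrapped and that the originals used single spaces between words. Under those assumptions, rejoining the body lines with spaces restores the original text. `unwrap` is a hypothetical helper.

```python
WRAPPED_TAG = "svn:log:wrapped"  # hypothetical marker from the proposal above


def unwrap(message: str) -> str:
    """Undo line wrapping on a message tagged with WRAPPED_TAG.

    Assumed layout: a single wrapped paragraph, one blank line, then
    one "key:value" tag per line (e.g. svn:revision:...).
    """
    body, sep, tags = message.partition("\n\n")
    tag_lines = tags.splitlines()
    if not sep or WRAPPED_TAG not in tag_lines:
        return message  # not marked as wrapped, leave it alone
    original = " ".join(body.split("\n"))
    kept = [t for t in tag_lines if t != WRAPPED_TAG]
    return original + ("\n\n" + "\n".join(kept) if kept else "")
```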
I'll just note that is your assumption, and not one I would make. Yet you're using it to justify a subsequent decision that all have to live with.
Didn't intend it to be a justification - more an assessment of likelihood of it being changed.
From my perspective, this is a feature that git and github have wrong. Line wrapping is a presentation issue that is trivially handled by apps. Other distributed vcs didn't make the same decisions, and if we'd picked another we wouldn't even be having this consideration. Which is to say that it's possibly something we'll regret in the future when we migrate to git's successor. Unfortunately, it's trivial to add newlines but it's not trivial to remove them.
Point. I wasn't strongly advocating for it - I just put it in the test conversion as a demonstration of what I could achieve if it was of interest.
We will also because I have little intention of manually injecting newlines once we're on github for command-line commits. I don't do it for other git repos and don't plan to on ours either except when the commit warrants a longer description and I'm in an editor.
Fair enough.
That said, these are sound reasons, save for the caveat I just stated -- that it just means the historic commits might be pretty but not the more recent ones.
Which actually argues against doing it - don't want newer stuff to look "worse" in some sense.
I don't feel that strongly to oppose it. I do like it neat and tidy though it begs a couple questions (like what column did you wrap on? what about things like URLs? what about punctuation? ..).
Column 72 - used the "TextFlow" algorithm, which I gather is similar to what editors do for word wrapping.
It's a little concerning that it's not preserving what actually was written. It's slightly complicating the review process because they don't match and I have to do additional scripting (but I'll deal, just slows things down). Those are not strong enough to argue against though. I think you said you limited it to commits that had only 1-line comments? That's probably a good balance.
I wish you'd said something, I could have generated another version without the wrapping. (Still can, for that matter...)
The posted version is not the final version anyway... Over the weekend I think I figured out how to actually audit and fix the CVS era commits so the git checkout for each commit will match what cvs would produce (still testing, and will take a while to run, but initial results are promising.)
Column 74 is I think the minimum only because Git defaults to presenting 4 char indents on log output.
starseeker said:
I wish you'd said something, I could have generated another version without the wrapping. (Still can, for that matter...)
The posted version is not the final version anyway... Over the weekend I think I figured out how to actually audit and fix the CVS era commits so the git checkout for each commit will match what cvs would produce (still testing, and will take a while to run, but initial results are promising.)
Only slightly. But each upload has meant I need to regenerate my list of comparison hashes... ;)
On the latter point, what do you think about having the log tags actually denote cvs:revision:### (in addition to the svn revision) for the cvs portion?
if there's a way to record the actual account name used (not just the mapped account name) for both cvs and svn, that would be a nice-to-have preservation. if not, no biggie.
and wouldn't do it to only get cvs or only svn.. that could be confusing
by the way, I updated https://brlcad.org/wiki/Github_Migration with all the migration steps as I'd envisioned them. I may have forgotten a step or two, but I think most of it is there. I did try to make sure it incorporated all the points you mentioned in your (more elaborate) discussion.
Of course, some of the verification steps may cause more verification steps, but it's got the gist of what's needed.
Growl... well, I can change CVS era commits but auditing them is proving trickier than I'd hoped in some ways... specifically, what do I check out from CVS when Git says a particular commit is on a dozen branches?
About cvs:revision - correct me if I'm wrong, but did the CVS tool actually have revision numbers? I thought all we had was the numbers SVN assigned various commits when the cvs2svn migration occurred.
I'm checking out by date and -r tag (when trunk/master isn't available) - is there another option?
Are the CVS commit names different from the SVN authors? I'd been assuming a 1-1 mapping there, but perhaps I'm wrong?
I suppose one possibility might be to add the cvs checkout lines corresponding to each commit...
cvs:checkout:cvs co -ko -D "<date>" [-r tag] -P brlcad
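If that route were taken, the command could be generated per commit. This is a minimal sketch that assumes the date string is already in a form `cvs -D` accepts and that `brlcad` is the module name, as in the line above. `cvs_checkout_argv` is a hypothetical helper. Building an argument list rather than one shell string avoids quoting problems with dates that contain spaces.

```python
from typing import List, Optional


def cvs_checkout_argv(date: str, tag: Optional[str] = None,
                      module: str = "brlcad") -> List[str]:
    """Return the argv for `cvs co -ko -D <date> [-r tag] -P <module>`."""
    # -ko keeps RCS keywords unexpanded; -D selects the state as of <date>.
    argv = ["cvs", "co", "-ko", "-D", date]
    if tag:  # only needed when the commit isn't on trunk
        argv += ["-r", tag]
    # -P prunes directories that end up empty.
    argv += ["-P", module]
    return argv
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`.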
probably overkill
starseeker said:
About cvs:revision - correct me if I'm wrong, but did the CVS tool actually have revision numbers? I thought all we had was the numbers SVN assigned various commits when the cvs2svn migration occurred.
revisions in cvs are per file -- akin to git. there is no global number like svn.
starseeker said:
Are the CVS commit names different from the SVN authors? I'd been assuming a 1-1 mapping there, but perhaps I'm wrong?
They're different.
At least, there's a swath of names that only exist in cvs, a swath that exist in cvs and svn, and a swath that are only in svn
it's whatever account we committed from
so for example, Markowski had commits as 'mmark' under rcs and 'mm' under cvs (or vice versa). I had commits as 'morrison' under cvs, never as that via svn though.
not a terrible loss, but it'd be really cool if we could preserve that original commit account name per commit. there's some semantic repo history that would be preserved just by knowing the name.
So if we can ID which account names are unique to CVS, we could flag them. A quick check shows svn:account:mmark and svn:account:mm both present in the conversion, so the names made it. A cvs prefix could probably be added based on which commits originally came from the CVS conversion - I'd have to think about that, but it's probably possible.
Actually, that might be best - just prefix with cvs:account or svn:account based on which VCS the commits came from.
revisions will always be svn:revision (those commits that have it) since the numbers came from SVN.
branches are trickier, but based on my experiences so far I'd rather just leave the svn branches alone - my brain hurts trying to sort out the various mappings, and I doubt it's terribly critical as long as git blame can walk back through the history successfully.
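The tagging scheme being converged on here can be summarized as a small function. This is only a sketch. The `svn:revision:` and `<vcs>:account:` forms follow the discussion. The `svn:branch:` spelling, the order of the lines, and the name `annotation_lines` are assumptions for illustration.

```python
from typing import List, Optional


def annotation_lines(origin: str, account: str,
                     svn_revision: Optional[int] = None,
                     svn_branch: Optional[str] = None) -> List[str]:
    """Build the tag lines appended to a converted commit message.

    `origin` is the VCS the commit was originally made in ("cvs" or
    "svn"). Revision and branch tags always use the svn: prefix because
    those values come from SVN.
    """
    if origin not in ("cvs", "svn"):
        raise ValueError(f"unknown origin: {origin!r}")
    lines = []
    if svn_revision is not None:
        lines.append(f"svn:revision:{svn_revision}")
    if svn_branch:
        lines.append(f"svn:branch:{svn_branch}")
    # The account name is kept exactly as originally committed.
    lines.append(f"{origin}:account:{account}")
    return lines
```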
Actually since you mentioned it about mmark and mm, it looks like you already have it doing the right thing -- it's using the account username as originally committed for both svn and cvs. That's great!
starseeker said:
Actually, that might be best - just prefix with cvs:account or svn:account based on which VCS the commits came from.
That would be a pretty slick detail. We'd actually be able to distinguish three "generations" of commits, hah.
I think I've found a way to associate the author ids (and cvs-fast-export's branch analysis) with the comments in the final conversion. I'll need to actually test applying the data in repowork, but I've got a script now that looks like it is successfully extracting the information. (misc/repoconv/cvs_info.sh)
OK, managed to apply the CVS account/branch information, FWIW:
https://github.com/starseeker/brlcad_conv9
@Sean Is there anything more I can do? I'm not sure it makes sense to have me do the check steps, since I'd basically be re-using the same logic I put together to do the conversion in the first place, but if there's anything that will move the process forward I'd like to help...
the best you can do is probably just having a bit of patience, however frustrating.. :) you're right -- you can't / shouldn't verify since you may unintentionally dismiss or overlook something whereas someone else won't know to. sumanga and I don't know your conversion logic at all, so this is nice indep validation. :)
OK, will do.
I had to make one adjustment post brlcad_conv9 to get the spacing right for the CVS-only comments - should I upload that version of the repo?
(I know you mentioned the changing sha1 values in the various versions was a pain, so I wanted to check...)
starseeker said:
OK, managed to apply the CVS account/branch information, FWIW:
https://github.com/starseeker/brlcad_conv9
should i pull this one right now and run the script? :worried:
or should i operate on the brlcad_conv8 repo?
brlcad_conv8 is fine.
The newer ones are just minor variations on the commit message formatting
thanks for the info :smile:
I'll let you know if one appears that would motivate a restart in the check, but unless someone finds an actual error I doubt it will be necessary at this point...
@Sumagna Das how are the checks going?
somehow the local copy of the github repo got wiped and all i know was that the last revision being checked was 75007
now if i clone it, it will start from the beginning
Can you tell your script to start at a lower revision?
it starts first with the github repo commits, check them and then checkout the svn revision
i have to find a way to go backwards then
@Sumagna Das The github checkout has the svn revisions in the comments - could you just filter out any commits that have a number higher than 75007?
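One way to do that filtering, sketched under the assumption that each converted message carries at most one `svn:revision:<number>` line, as in the test conversions. The regex and the names `svn_revision` and `should_skip` are hypothetical. Commits without the tag (the same set `git log --invert-grep --grep="svn:revision"` lists) return `None` and are not skipped.

```python
import re
from typing import Optional

# Matches a whole line of the form "svn:revision:75007".
SVN_REV_RE = re.compile(r"^svn:revision:(\d+)[ \t]*$", re.MULTILINE)


def svn_revision(message: str) -> Optional[int]:
    """Return the SVN revision recorded in a commit message, if any."""
    m = SVN_REV_RE.search(message)
    return int(m.group(1)) if m else None


def should_skip(message: str, resume_at: int = 75007) -> bool:
    """True for commits whose SVN revision is above the resume point."""
    rev = svn_revision(message)
    return rev is not None and rev > resume_at
```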
i was thinking about that
(btw, if you're going to check it out again go with https://github.com/starseeker/brlcad_conv10 )
i will make it skip them but not save them in the skipped_commits.txt
starseeker said:
(btw, if you're going to check it out again go with https://github.com/starseeker/brlcad_conv10 )
will be a good idea
/me cleans up older conversion tests...
There we go - now my github account looks less manically busy.
:grinning_face_with_smiling_eyes:
found it through grep, checked it out and restarted from the same point
my script starts checking from the commit which is checked out at the moment on the git repo
@Sean ping?
Got through two checks the past week, looking good so far. Few more to go. Hoping we will be able to go live soon, maybe next weekend if these check go good.
How’s your scan going @Sadeep Darshana ?
I believe he finished it, results in the "issues in migrated repo" thread
When I checked, all the SVN era differences I saw were when his script tried to compare brep-debug commits with trunk. CVS era is messier, as expected.
how much of the migration is left?
@Sumagna Das not sure - @Sean , is https://brlcad.org/wiki/Github_Migration still current?
@Sumagna Das we're shaking down for a release, which will slow things up a bit
if any help is needed, i will try to help if i can
starseeker said:
Sumagna Das not sure - Sean , is https://brlcad.org/wiki/Github_Migration still current?
I completed a couple more tasks, will update.
ping? (I know commit reviews are competing with this...)
not really a competition, it's been a full-stop shift to eye-bleeding commit reading for hours on end...
/me winces. Well, hopefully the commit storm will be letting up after this for a while.
Anything helpful I can do? (Testing, etc?)
@Sean Just FYI, realized my updates were missing the svn commit ids for newer commits, in case you were using my github test conversion. New version, current as of last night with all commits, up at https://github.com/starseeker/brlcad_conv11
how much of the migration is done or left?
https://brlcad.org/wiki/Github_Migration is the place to watch
nothing changed i think
From a technical standpoint the main SVN->Git conversion is essentially complete (barring discovery of some significant, heretofore unnoticed problem).
The migration of the secondary data hasn't been as thoroughly explored - that'll probably be tricky, and hasn't (yet) been tested.
@Sumagna Das If you want to do a little experimenting you might see if you can figure out how https://github.com/cmungall/gosf2github works...
Ah, right, now I remember. Unfortunately I don't have admin privileges on BRL-CAD necessary to do the export...
or at least someone who has admin privileges who can give the exported stuff needed
I don't think any of my old projects have anything to export in this department, certainly not on a scale like BRL-CAD's...
@Sumagna Das One question I don't know the answer to yet is what the best way to handle unmerged patches is. On github they're pull requests, but on sourceforge they're patch files... I don't know off hand how we're going to handle patch file submissions to github. Have you seen anything about how people address that problem?
Maybe the gosf2github script migrates them somehow, since it looks like sourceforge categorizes bugs, patches and feature requests as tickets...
that is the main question....if it can migrate them correctly
/me is not quite sure what gosf2github is talking about with setting up oauth... never done that before
OK, I think the "Personal access tokens" will work, but the perl script is a bit cranky...
Blegh. This begs for a detailed, step-by-step guide for folks unfamiliar with any of this...
OK, It looks like to get the "collaborators" list needed by gosf2github the repo needs to be an organization-owned repository: https://docs.github.com/en/rest/reference/repos#collaborators
Yeesh. I guess someone needs to experiment with this stuff on a test import of BRL-CAD in the org project...
Oh well... at least it's not necessary to stand up the primary VCS repos.
/me makes a note to check more recently updated fork at https://github.com/n-soda/gosf2github
remotely relevant, https://github.com/github/renaming .. so we could / probably should adopt main instead of master
/me tries renaming...
OK, we can rename master->main - proved out on brlcad_conv11
Essentially painless if we do it before pushing to github - I'll just note it in the CONVERT.sh script.
Good note to be aware of when we eventually try migrating issues (last 2 comments in particular): https://github.com/beanshell/beanshell/issues/44
OK, looks like our contributions stats are still there after the default branch rename too.
Phew. Had a few bad moments wondering if we were going to have to re-run the whole thing again to get commits reassigned...
ping?
@starseeker @Sean any update on the migration?
whatcha renaming master to, "tyrant"? :D
"main" appears to be the new convention. I like it - it's shorter and still starts with the same letters.
@Sean ping?
I worked on it some this past weekend. Will update the checklist with things done tomorrow to see where we're at.
We've got a problem - my incremental conversion process just broke.
Not Good.
/me apprehensively re-runs to see if he can diagnose the failure...
OK, I think I know what happened... let's see if I can adjust and recover.
Alright, run kicked off - I've got to crash
phew. Looks like that adjustment got past it.
@Sean ping?
@Sean ping?
@Sean ping?
is he doing that thing where he defers working on it a week every time someone bugs him about it? :D
No :P
https://github.com/starseeker/brlcad_conv11 is updated through r77867
http://bsdimp.blogspot.com/2020/09/freebsd-subversion-to-git-migration.html
leave it to OpenBSD to try and improve on the git frontend: http://gameoftrees.org/
excellent, I'm going to try and push on it this friday now that a particular render task is finishing up.
(excellent == updated through r...)
Now through 77924
@Sean Just wanted to check if you were/are able to push on the Git conversion - I can make more of an effort to keep the github repo in sync with SVN if it is helpful, but otherwise it's a little simpler to only do it every few hundred commits...
Current through r77936
ping?
https://github.com/starseeker/brlcad_conv11 is updated through r77978
@Sean any chance we'll be able to move before 2021?
Hey, cool - you can select a range on the commit graph for Github.
You're on fire!
Yeah, I think so. Was thinking the same thing myself. working on it!
Heh, well, like you said, it's a fire sale :-P
according to those graphs, the fire's been raging for a couple years
Gotta say, I like the dark github theme - previously their website was the brightest thing on my desktop
/me blinks - rsyncing the SVN repo from sf.net didn't complete. That's a new one...
There we go. Github brlcad_conv11 updated through r78038
@Sean barring something unforeseen, that's probably my last update of both SVN and Github for the year.
I've not done the full cross platform distcheck-full hammering for release testing since the gqa multithreaded test will currently fail, but otherwise things are generally looking like they're in fairly good shape...
Hey can i get the link for the github repo where i can look for beginner level, easy to fix problems and try to fix them
Hey @Aniket Khandagale, welcome to the BRL-CAD Community! I think you can have a look at the BRL-CAD Wiki www.brlcad.org/wiki and start with compiling BRL-CAD on your PC. You can find build instructions here: https://brlcad.org/wiki/Building_from_SVN
If you want assistance, ask the community; you can also ask Sean and starseeker.
Aniket Khandagale said:
Hey can i get the link for the github repo where i can look for beginner level, easy to fix problems and try to fix them
In this case, I am not sure about the Github Repo, please help @Sean
Aniket Khandagale said:
Hey can i get the link for the github repo where i can look for beginner level, easy to fix problems and try to fix them
@Aniket Khandagale BRL-CAD is available on sourceforge (SVN). it is being migrated from sourceforge (svn) to github (git), so there is no (official) github repo. (there is one where the migration is happening, but it is not up to date and is behind the main repo by a couple of commits, or revisions as per SVN terminology.)
Thanks @Sumagna Das should i wait till the time its been migrated to github?
@Aniket Khandagale you dont need to wait for the migration.
@Sumagna Das can i get the link for sourceforge
@starseeker Is it possible to fork from brlcad github and push stuff, or should we wait for the migration to finish?
It's technically possible to fork, but the repository of record is still SVN at the moment. I'd recommend waiting for us to complete the migration.
Yes, best to wait or things could get messy when it comes time to switch it. I've been going through the repo so it hopefully won't be a long additional wait for folks.
ping?
ping?
I'll post an announcement to the email list as well, but my plan is to lock the SVN repository sometime on Friday, Jan. 29th to finalize the repository contents for the Git conversion.
Does this mean that the migration is done, or are there some more things left?
@Sean is doing final review - I'm going to start uploading the secondary repositories while he finishes looking at the main repository.
Okay
tell me when the whole operation is done.
@Daniel Rossberg I've uploaded the svn-all-fast-export conversions of all the projects except BRL-CAD itself to https://github.com/BRL-CAD - can you take a look at rt-cubed and make sure it looks OK to you before anyone starts committing to it?
@starseeker It looks good to me.
Git doesn't know empty directories, that's why they got lost from src/other/ogre. But, as far as I know, Ogre isn't used anywhere.
@Sean status?
I spent most of the weekend validating and reviewing. It's looking really fantastic to me. I have questions, but no show-stoppers. I actually got through the laundry list I'd written up to identify all the deltas and document discrepancies. Filed a support request for --follow and am doing one more pass through the log of missing commits now. Planning to upload the repo myself today so I know the process, unless there's some reason not to.
@Sean brlcad_conv11 is current with the latest commits.
Any other questions I can help answer?
if an empty directory needs to be kept in git, typically a placeholder .do_not_delete file is touched
I'm not aware of any situation where we actually need an empty directory in the raw source repo - if nothing else, it's simple to have the build system or the code create such directories on the fly...
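For the rare case where an empty directory really must survive in Git, the placeholder trick can be scripted. This is a minimal sketch, not part of misc/repoconv: the function name add_placeholders is hypothetical, and the .do_not_delete name is only the convention mentioned above (.gitkeep is another common choice; Git treats neither name specially).

```shell
#!/usr/bin/env bash
# Hypothetical helper: touch a placeholder file in every empty directory
# under a tree so that Git (which tracks files, not directories) keeps it.
# $1 = root directory (default "."), $2 = placeholder name.
add_placeholders() {
    local root="${1:-.}"
    local name="${2:-.do_not_delete}"
    # -empty matches directories with no entries; the .git directory is
    # pruned so the repository's own metadata is never touched.
    find "$root" -path "$root/.git" -prune -o -type d -empty -print |
        while IFS= read -r dir; do
            touch "$dir/$name"
        done
}
```

As noted above, having the build system create such directories on the fly is usually the cleaner option; this only helps when the directory must exist in the raw checkout.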
yup, just quipping what I done seened :)
@Erik It looked like your git isst repo had everything from SVN's isst as well - is that correct?
I'm sure it was just an import, I hope I tried to make the introduction commit as basic as possible and did the "tidy" as a next commit... it's been a few, yo :)
if not, I hope someone is archiving the svn repo and the latest snapshot, y'know, "just in case". I'm sure the DoD can still afford the bits :)
https://github.com/BRL-CAD/vcs-history
neato :D (keep an in-house copy or 20)
I may have to "top off" the SVN portion depending on whether I need to make more SVN commits before the final switch, but if some poor soul has to repeat the VCS conversions for whatever reason they should have the necessary inputs to work with.
(straitjacket not included)
@starseeker still no show-stoppers but found a couple oddities. there are about 100 "* empty log message *" cvs commits that exist in git and svn but are missing the corresponding svn:revision:#### line. would that be because of timestamps or something else?
some appear to have it while others do not.
there are also 139 empty log message cvs commits in addition to the 100 that don't seem to be in git, but I'm writing them off as different cvs2git vs cvs2svn translation until I see evidence otherwise.
@Sean I was probably somewhat hesitant about mapping SVN numbers to those commits - with such ambiguous messages, all I had to go on for those was the timestamps, and the git and svn conversions of CVS didn't always end up exactly mapping those.
svn number assignment logic is in misc/repoconv/svn_map_commit_revs.cxx FWIW
Looking at the logic, I don't know that I did a whole lot with the empty log message commits.
I think by that point I figured we were well into diminishing returns.
I see that I categorized some commits as "non-unique, has exact timestamp match" but I think without manual inspection of the diffs I wouldn't have had the confidence to assign them SVN ids
Even the "Initial revision" commit assignments, which I did make, are a bit dubious
(non-unique in that context would be "non-unique commit message string")
AH! I think I found the processing log
Looks like when I ran that it was against a git repo that didn't have newer 76300+ commits. The following is cleaned up and sorted:
So 735 there is an example -- it's the first in my list. It doesn't have a timestamp match, so you didn't know which svn:revision that was (which is curious in itself)
at least maybe? that's the curious part because you did know it was 735 ...
under what conditions would cvs commits get or not get the svn:revision:### note?
I know about SVN commit 735, but in the git repository I don't have a commit with an exact timestamp match with the same commit message
What you're seeing in that log is a processing of a detailed log from SVN, combined with a log of available git commits.
For that printout, all SVN commits were checked against what was/is available in git
CVS commits would get the svn:revision note under the following conditions:
1) there exists an exact, unique commit message that is shared by an SVN commit and a Git commit
2) There exists an SVN commit with a non-unique commit message match that also shares an exact timestamp with a Git commit having the same commit message
3) The special case commit message "Initial revision" when there exists a Git commit with an exact timestamp match, and the timestamp match is outside the known "bad" range of early commits with unreliable timestamps.
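The three rules above can be sketched in a few lines of awk to make the decision order explicit. This is an illustration only: the real logic lives in misc/repoconv/svn_map_commit_revs.cxx, and the function name map_revs, the tab-separated input format, and the epoch cutoff standing in for the "bad" timestamp range are all assumptions made here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the three svn:revision assignment rules.
# Assumed input format: one commit per line, tab-separated
#   id <TAB> unix-timestamp <TAB> single-line commit message
# $1 = SVN commits, $2 = Git commits,
# $3 = assumed epoch cutoff: "Initial revision" commits older than this
#      are in the unreliable-timestamp range and are never mapped.
# Prints "sha;rev" lines. The FNR == NR idiom assumes $1 is non-empty.
map_revs() {
    awk -v FS='\t' -v cutoff="${3:-0}" '
        FNR == NR {                       # first file: SVN log
            n++; rev[n] = $1; ts[n] = $2; msg[n] = $3
            svn_count[$3]++
            next
        }
        {                                 # second file: Git log
            git_count[$3]++
            by_msg[$3] = $1               # only used when the message is unique
            by_ts_msg[$2 SUBSEP $3] = $1  # exact (timestamp, message) lookup
        }
        END {
            for (i = 1; i <= n; i++) {
                m = msg[i]; k = ts[i] SUBSEP m
                if (m == "Initial revision") {
                    # Rule 3: exact timestamp match outside the bad range.
                    if (ts[i] + 0 >= cutoff + 0 && (k in by_ts_msg))
                        print by_ts_msg[k] ";" rev[i]
                } else if (svn_count[m] == 1 && git_count[m] == 1) {
                    # Rule 1: message is unique on both sides.
                    print by_msg[m] ";" rev[i]
                } else if (k in by_ts_msg) {
                    # Rule 2: non-unique message plus exact timestamp match.
                    print by_ts_msg[k] ";" rev[i]
                }
            }
        }' "$1" "$2"
}
```

Anything that falls through all three branches stays unmapped, which is exactly the population of missing svn:revision notes being discussed here (for example, a repeated message whose timestamps differ by a few seconds).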
Yeah, that's odd. The timestamp you have for 735 is different in cvs2git from what svn had...
Looks like r735 is 771b3183f9e315f6e1451a1e3462e6f84724a9cd
svn lists that date as 1986-08-12 23:18:25 -0400
git is a solid ten minutes off at Wed Aug 13 03:08:40 1986 +0000
(that's not to suggest git's is wrong -- svn could of course be wrong)
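Because the two timestamps above are written with different UTC offsets, it helps to normalize both to epoch seconds before comparing. A minimal sketch, assuming GNU date from coreutils (BSD/macOS date does not accept -d this way); ts_delta is a hypothetical name, and the Git date has been rewritten in ISO form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: absolute difference, in seconds, between two
# timestamps that may carry different UTC offsets. Requires GNU date.
ts_delta() {
    local a b
    a=$(date -u -d "$1" +%s) || return 1
    b=$(date -u -d "$2" +%s) || return 1
    echo $(( a > b ? a - b : b - a ))
}

# r735 as reported by SVN vs. by the Git conversion (values from above).
ts_delta '1986-08-12 23:18:25 -0400' '1986-08-13 03:08:40 +0000'
```

This prints 585, i.e. 9 minutes 45 seconds: 23:18:25 at -0400 is 03:18:25 UTC, so the roughly ten-minute gap is real and not a time-zone artifact.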
I will readily admit I didn't delve into the details of how cvs2git and cvs2svn differed in their processing, so I can't say which one is right or better. For myself I wasn't worried about it - in some circumstance where precision for a given commit's timing mattered, I'd want to query CVS directly...
A word of caution - now that we're no longer using git notes for svn revision information, any updates to add more mappings are going to be difficult (not impossible, but it will be another custom processing implementation in repowork.)
No worries, not seeing any reason to reprocess anything -- just accounting to make sure nothing is missing. And simply trying to understand.
I was able to rule out all 81 "Initial revision" commits for example, as they're clearly all categorically present, just not labeled (likely your #3 above).
I think #1 may also account for a lot of the 400 remaining. Many have a repeated commit message that was made (sometimes seconds) later. If there's some discrepancy between the clocks being used, that could also account for more. I should know more definitively here in a bit.
@Sean If you're doing the grunt work to go identify mappings manually, you may as well make a note of the mappings. If you're going to that degree of trouble, I might as well do the extra work to capture it in the commit messages...
Just something like:
sha1;#
sha1;#
...
should do it.
One of the problems though is I don't expect some of them to have exact 1-1 mappings at all, since cvs2git may have grouped things differently.
Pardon, my terminology was loose - the tool we're using is cvs-fast-export, not cvs2git.
Hm, yeah, I have that info. I basically wrote two 1-liners to pull diffs of the missing svn revs and of all git revs, then a 1-liner to make sure they're all accounted for. I could make it print which commit is actually which missing rev.
You're actually comparing the commit diffs themselves? nifty
technically I'm comparing the md5 sum of just the changed/added/removed lines, but yeah. it was also needed to figure out which missing commits were because they were just propset changes.
they show up as empty diffs, so easy to cull them from review
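The idea of hashing only the changed lines can be sketched as a small filter that works on either git or svn diff output, since file headers, "Index:" lines, and hunk offsets are all dropped. This is a sketch, not Sean's actual 1-liners: patch_sig is a hypothetical name, and md5sum is assumed to be the GNU coreutils tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a unified diff on stdin and print an MD5 of
# just its added/removed content lines, or "EMPTY" if there are none.
patch_sig() {
    local body
    # Keep +/- lines but drop the "---"/"+++" file headers. A removed line
    # whose content itself starts with "-- " is dropped too (sketch-level).
    body=$(grep -E '^[-+]' | grep -vE '^(\+\+\+|---)( |$)' || true)
    if [ -z "$body" ]; then
        echo EMPTY
    else
        printf '%s\n' "$body" | md5sum | cut -d' ' -f1
    fi
}
```

For example, `git show --format= <sha> | patch_sig` and `svn diff -c <rev> | patch_sig` can be compared directly; revisions that come back EMPTY are candidates for the property-only changes mentioned above.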
Ah, I hadn't thought of extracting just the diff lines - good call
@Sean upon reflection, I'm second guessing myself - if I update the older commit messages, it changes all the sha1s again and arguably we would need to do more verification to make sure the new step didn't mess with anything. Maybe it's not worth it for the stray svn:revision tags?
With a diff based approach, you might in principle be able to spot if any of the Initial Revision commits ended up mapped wrong despite exact timestamp matches...
If I ended up assigning demonstrably incorrect numbers, that's probably worth fixing...
Sure, can revisit the decision -- my priority has been on finding / validating they're there somewhere. If they're all there and just not tagged, I agree that's less of a concern. I mean it'd be cool to have them all tagged, but that could even happen at a later date if we make everyone re-clone.
Hey, can you take a look at a couple commits and tell me what I'm seeing...
4850989e3a2f9624127ae043c6094076a60bc472 and 97d02527843ffb84f8bb3da0e64ef5f7db6df28c
I'm not entirely sure what those are - some artifact of the cvs-to-git conversion process, obviously, but I'm not entirely clear on what they're trying to represent.
I haven't considered trying to "clean up" any of the cvs era artifacts of the conversion, since I don't know which of them might be added to preserve content that would otherwise be garbage collected out.
at a quick glance, they look like the entire repository was deleted. they're the two largest commits in the git repo. they're fortunately in branches, but would be good to understand what's going on there because it smells like something went wrong
I recall the 7.0 branch and don't remember any sort of merge event like that happening ...
The "cvsconvert" tool did generate some sort of audit...
@Sean How do you want to proceed?
@Sean FWIW, I think I've gotten the necessary piece in place to do the sha1;rev# updating successfully. I'd still want to run your diff check on the final results and probably inspect the updated commits to be sure, but a quick test with your 735 example succeeded.
bbl
starseeker said:
Sean How do you want to proceed?
It would be nice to understand why either of those branches appears to wipe out everything (if that's indeed what happened), even if we're not going to do anything about it. I think that'd entail checking out one of those branches and looking at the commits before/after to see if there's an explanation. Not a show-stopper since they're on branches, but concerning from a data anomaly perspective.
Good to know about the sha/rev updating. At a quick glance, lookup succeeded on about 1/2 to 2/3rds of the commits missing. I'm looking at the ones that didn't match to see if they're actually missing or if there's something in the diffing method pooching things. There are a few dozen that map 1:many that we can either ignore or map manually by their date, but I wasn't going to worry about them.
@Sean The immediate question is whether they did do that or cvs-fast-export is misinterpreting some aspect of the CVS data.
FWIW, I think a685e85ff730450f669a0d853c69ef545c30b46f may be related to the 97d02527843ffb84f8bb3da0e64ef5f7db6df28c commit
"remove the cvs tag relic" may be why the prior incomplete tag commit removed everything?
Ah, wait a minute - I wasn't looking closely enough. "merge-to-head" incomplete tag (4850989e3a2f9624127ae043c6094076a60bc472) is an SVN era commit, and also seems to have an associated commit (dd2bb79965568f5aab4f7458606d875d22b74b40)
Yeah, those are both SVN era commits - my apologies.
I was fooled by the "cvs" in the commit messages and didn't look closely enough
OK, so checking more carefully, here's the breakdown:
97d02527843ffb84f8bb3da0e64ef5f7db6df28c - Synthetic commit for incomplete tag release-7-0 - CVS era commit
a685e85ff730450f669a0d853c69ef545c30b46f - child of 97d02, SVN era commit. Message:
clearly not actually release 7.0 .. remove the cvs tag relic that was made on a few files just before the project was converted to open source. (svn branch delete)
4850989e3a2f9624127ae043c6094076a60bc472 - Synthetic commit for incomplete tag merge-to-head-20051223 - CVS era commit
dd2bb79965568f5aab4f7458606d875d22b74b40 - child of 485098, SVN era commit. Message:
move cvs branch tagging artifact removal (svn branch delete)
So, my guess is that the CVS conversions (evidently both of them, cvs2svn and cvs-fast-export) found something in the data that prompted tagging. Based on the 7.0 message, it looks like a stray tag was on a few files, and the converter interpreted that as a tag in Git that preserved only those files and removed everything else (hence the massive diff.)
Back in 2011, you did some cleanup on the SVN branches and spotted those as spurious. So, we've got the cvs-fast-export generated tags and associated branches, and then the 2011 SVN cleanup of the cvs2svn versions of the same thing.
@Sean Your call how you want to handle the 1-many - I'm pretty sure I can handle that in the svn:revision assignment, as long as each sha1 maps to only one SVN rev.
(I think if we delete the two branches in question from Git we can probably garbage collect them out, by the way - do we want to preserve that, or would it be better to remove?)
starseeker said:
So, my guess is that the CVS conversions (evidently both of them, cvs2svn and cvs-fast-export) found something in the data that prompted tagging. Based on the 7.0 message, it looks like a stray tag was on a few files, and the converter interpreted that as a tag in Git that preserved only those files and removed everything else (hence the massive diff.)
Back in 2011, you did some cleanup on the SVN branches and spotted those as spurious. So, we've got the cvs-fast-export generated tags and associated branches, and then the 2011 SVN cleanup of the cvs2svn versions of the same thing.
This is the explanation I was hoping for!
Yeah, okay, I can see that happening and how it might have gotten interpreted -- a branch was tagged, the branch was removed, but a few stray files from that branch ended up remaining tagged/referenced, so it generated delete commits to preserve their lineage.
There were actually a dozen or two commits very similar to those, which is also why it was concerning (they were just the biggest two), but I think that fully explains them.
starseeker said:
(I think if we delete the two branches in question from Git we can probably garbage collect them out, by the way - do we want to preserve that, or would it be better to remove?)
I think we can just ignore them for now. They're not the only ones, they just stood out during validation as potential processing corruption.
Knowing that they're not, they are out of sight, out of mind. Excellent!
might wanna docco that with stashed history
@Erik Did folks manually edit CVS files to tag releases or some such? I know from what Sean said the history was edited at least once to deal with some Tcl/Tk issues (which can be seen comparing CVS checkouts vs git checkouts, actually)...
I have no recollection of manually tweaking CVS files for a release O.o I was slid off to muves3 around that time I think, I think 7 happened without me
I mostly just did fbsd support and autoconf before reassignment (plus a few side projects, uh, some parser for matrex federations, uh, something else for Geoff, too... )
Fair enough. The more I see of all this the more grateful I am that I got to come on board just as SVN was introduced.
CVS is... weird. At one point I even considered https://github.com/rcls/crap as an alternative to cvs-fast-export, since it seems to reproduce in Git what CVS checks out, but after discussions with Sean (and I think I noticed this myself at one point) I learned even CVS itself won't accurately check out some parts of our history (accurately in the sense of reproducing the tree that the users would have seen at the time) due to the edits made to work around the libtcl/libtk problems.
CVS is a "remote RCS server", grokking rcs is kinda important for grokking cvs
<snort> I guess as a young whippersnapper I joined the software community too late to properly appreciate them. "RCS" to me mostly means annoying tags at the beginning of files that complicate diffing :-P
Which is not to say I could have designed anything better than RCS back in the day, of course - I get the sense that VCS is one of those problems where only experience with the day-to-day requirements of the problems at scale can really result in good designs.
I don't know of CVS files being edited for releases. They were mostly edited to "fix" things CVS couldn't do, like renaming a directory or eliminating a bad commit.
@starseeker another something to investigate... do you know why this doesn't work?
git diff 2686445fedcfeadcbc8a2960fd8690f2d0ccbf47~1 2686445fedcfeadcbc8a2960fd8690f2d0ccbf47
show works, but can't diff it... somehow it doesn't have an ancestor or has multiple or ... ?
Author: Douglas Kingston <dpk@randomnotes.org>
Date: Fri Dec 16 00:10:31 1983 +0000
Original 4.2 Distribution Source
svn:revision:2
cvs:account:dpk
cvs:branch:trunk
That's the earliest commit in the history - it doesn't have an ancestor
OH!
g'dammit.. okay, I can special case it. haha.
I've noticed git tools aren't always graceful when they encounter that case.
I was able to identify most of the missing revisions, but there are about 160 that didn't match, and when I investigated it was because git's show syntax doesn't match diff syntax for merge commits. manually looking at one of the ones that didn't match, it was indeed a merge commit that didn't match because of the format. so I regen'd the diffs but it barfed on that one.
/me grins - you found the last turtle!
there must be some other diff syntax that will get that commit?
tried ^ ...
git show will print it... don't know about diff.
show is the wrong format
Conceptually, what are we diffing against?
I want the patch for that commit
the commit
so patch format, it'll be lines added.
git diff 4b825dc642cb6eb9a060e54bf8d69288fbee4904 2686445fedcfeadcbc8a2960fd8690f2d0ccbf47
^! seems to work, but I'm not sure what that means...
https://stackoverflow.com/a/40884093
Does that work?
eh... that's f'ing retarded.
I'm sure it will. There's got to be some shorthand for that shit.
Maybe ^! is a shorthand for that? Dunno, haven't encountered that syntax before - @Erik ?
I found that on some other SO but can't find it in the docs to know what it means.
Yeah, that's a tough one to google...
I dunno !
^ means "previous"
~<n> means "nth previous"
here we go: The r1^! notation includes commit r1 but excludes all of its parents. By itself, this notation denotes the single commit r1.
sounds like that might be correct
ah, neat
'git show' is what I use for a single commit
me too, but its syntax is wrong for merge commits (at least for patch and diffing purposes)
it does ++ and -- lines
there's undoubtedly other options to change the format, but diff is the command that does it in the right format by default, so it was a hunt to find the right syntax for "just this commit"
apparently it's rev^! ...
/me reruns a diff dump.. should have answers within the hour
I think I would have ended up doing "git diff rev^..rev"
or something-ish
yep, did that and that's what barfs when it encounters the last rev
er, first commit
git diff rev~ rev
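The two gotchas above (the root commit has no parent to diff against, and `git show` prints merges in combined "++"/"--" form) can both be handled in one helper. A minimal sketch, assuming the goal is an ordinary unified diff against the first parent; commit_patch is a hypothetical name, and the empty-tree hash is computed with `git hash-object` rather than hard-coding 4b825dc642cb6eb9a060e54bf8d69288fbee4904.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "the patch for this commit" as a plain
# unified diff, run from inside a Git working tree.
# - Merge commits are diffed against their FIRST parent only, giving
#   ordinary +/- lines instead of the combined format used by git show.
# - The root commit (no parent) is diffed against the empty tree, so the
#   initial import shows up as added lines instead of an error.
commit_patch() {
    local c="$1" empty
    if git rev-parse --verify -q "$c^1" >/dev/null; then
        git diff "$c^1" "$c"
    else
        empty=$(git hash-object -t tree /dev/null)
        git diff "$empty" "$c"
    fi
}
```

`git rev-list --max-parents=0 HEAD` lists the root commit(s), which is a quick way to find the special case in advance instead of discovering it when a diff dump fails.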
@Sean If you think it's worthwhile, I'd stick your verification script or at least notes about the key gotchas associated with creating it in misc/repoconv once this is all over - it can't be any worse than my conversion logic, and it might be useful someday if we ever have to dive back into this swamp...
sure
I've been stashing notes just in case I need to reference one of the 1-liners later, and notes on missing revs as they've been explained
/me is embarrassed that he didn't think of cutting down the diff into +/- lines - should have considered that when the commit messages didn't resolve things unambiguously
well still remains to be seen -- they may need to be sorted too, but wasn't going to do that until there's evidence it's needed
e.g., if there are multiple file changes and svn shows A, B, C but then git displays C, B, A or similar ... shouldn't but might be possible. so far I'm thinking not just because so many are matching.
but the bigger set was the empty merges, so next I should see how many of the 160 this eliminates
/me will be curious to see if any of the commit message + timestamp based mappings prove to be incorrect.
I can re-run on everything next but immediate priority was just identifying potentially missing commits
/me nods
Whatever you think best - just want to do whatever I can to put a bow on this sucker.
that will require pulling all the svn diffs, which takes a while. took longer to pull 720 svn diffs from sf than it took to pull 70000 git diffs locally ... not much longer but still was a while
I'm fine just making sure we're not missing data. if a commit is mis-tagged, that could be fixed later.
Might be faster to rsync the SVN repo and pull it locally - that's how I've worked with it
oh it definitely would
I just didn't bother
that'd take like two lines.
i'm all about the 1-liners
K. If you've got the data to hand though, now that I've got what should be a means to correct them implemented, I'd kinda like to go ahead and fix them. Remember, if we have to ask everyone to re-clone, it's also going to wipe out any pull requests, etc. that folks may have open on github.
Remember what happened when I messed up the web git repo
ah, okay, good point. I'll poke that next then.
One trick will be the known cases where cvs-fast-export split things more finely than cvs2svn with those desc tags from CVS - any commit with that in play won't match in diff - for those cases (most of them, anyway) the commit message would actually be more reliable.
So I guess the priority ordering would be:
1) unique commit message mapping
2) diff match
Starts getting a bit more iffy if we have non-unique commit message, matching timestamp, but non-matching diff (with no other exact matching diff) - if the diff is a subset of the SVN diff a case could be made for assigning the number, but that'd probably take some laborious manual inspection...
Hopefully we'll have few/no cases that fall into those categories.
Also, fair warning - I'd expect some differences (due to line endings especially) in the CVS era commits.
If you want to focus on just the commits that are currently unmapped, and ignore trying to validate all of them based on diffs, I'd personally be fine with that given the difficulties of the latter.
I haven't run into any of those yet, but split commits could be handled pretty easily I think. If they're already tagged, I'd just ignore them and rely on all the unsplit matching as sufficient validation.
Update -- my reprocessing using diff format indeed improved things significantly. Found more than half the remaining missing commits. Down to just 64 commits unidentified.
Digging in, turns out at least a portion of them are due to changed lines with internal whitespace differences. It's some sort of expanded tabs issue, possibly where cvs-fast-export preserved tabs correctly whereas cvs2svn did not preserve them. That's unconfirmed, but matches the commit I was checking. Rerunning it now with internal space stripped and should know in the morning what's left.
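The whitespace-stripping rerun can be sketched as a variant of a "hash only the changed lines" signature that deletes spaces, tabs, and carriage returns inside each +/- line before hashing, so a tab-indented line and its space-expanded twin compare equal. The name patch_sig_nows is hypothetical and GNU md5sum is assumed. Like any normalization, it can also hide real changes that differ only in whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whitespace-insensitive signature of a unified diff
# read on stdin. File headers are dropped; every space, tab and CR inside
# the remaining +/- lines is removed before hashing (newlines are kept,
# so line boundaries still matter).
patch_sig_nows() {
    grep -E '^[-+]' | grep -vE '^(\+\+\+|---)( |$)' |
        tr -d ' \t\r' |
        md5sum | cut -d' ' -f1
}
```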
Huh - I expected some line ending oddities, but I'm surprised there's actual internal whitespace diffs.
@Sean anything useful I can do?
/me shuts his mouth O:-)
is there a public preview of the current incarnation of the git repo?
https://github.com/starseeker/brlcad_conv11
@Erik Unless @Sean spots something, the only planned remaining changes (other than updates until SVN closes) are the application of some additional SVN commit -> Git commit mappings that @Sean has identified during his validation.
I'm down to reviewing the last few remaining missing commits -- it's down to about 50 missing, so I should hopefully figure out what happened without too much trouble (e.g., if they're categoric processing artifacts or actually missing data). It's a manual process for the few remaining until I find a categoric pattern.
So far, I'm genuinely having trouble finding one of them, but not done hunting for it (I found a fragment but then couldn't find its commit, so have to re-find the fragment to see if that was combined/merged with something else or just a coincidental edit to the same line in an unrelated commit.)
@Sean you're much deeper in than I at this point, but is there anything I might be able to help with?
Confound it - @Sean any SHA1s after r77842 are most likely going to change
I'm going to see if I can arrange a partial re-run, but there's a glitch in one of my processing filters
Phew. Tightrope walked, looks like: https://github.com/starseeker/brlcad_conv12
@starseeker can you take a look at 4d401a8617869d3594b5948de12a374a5bd292fe and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942 and r19440
there are no remappings or missings after 77231 so it's fine
pulling the new repo
Sean said:
starseeker can you take a look at 4d401a8617869d3594b5948de12a374a5bd292fe and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942 and r19440
If I'm interpreting this correctly, 4d401a8617869d3594b5948de12a374a5bd292fe matches the r19440 change on trunk, and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942 is the same change applied to the rel-5-3 branch. However, the r19440 label was applied to the branch commit rather than the trunk commit.
Which, since SVN reports a diff in trunk for r19440, means the timestamp must have matched for the branch application, but it should instead have been applied to the trunk version
So the "correct" fix there would be to apply the SVN revision to the trunk commit and strip it from the branch commit. I can do the former, but I'll have to tweak things to support the latter.
If you want, we can establish a line with the convention SHA1; to denote commits that I should clear an SVN revision assignment from.
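Since the mapping file is edited by hand, a quick sanity check before handing it over could catch typos. A minimal sketch, assuming the format discussed in this thread ("sha1;rev" to assign, "sha1;" to clear) and the one-rev-per-sha1 constraint mentioned earlier; check_map is a hypothetical name, and the intervals-free regex keeps it working under older awks such as mawk.

```shell
#!/usr/bin/env bash
# Hypothetical checker for a "sha1;rev" / "sha1;" mapping file.
# Reports malformed lines and any SHA1 given conflicting entries;
# exits non-zero if anything is wrong.
check_map() {
    awk -F ';' '
        NF == 0 { next }                              # ignore blank lines
        NF != 2 || length($1) != 40 ||
        $1 !~ /^[0-9a-f]+$/ || $2 !~ /^[0-9]*$/ {
            printf "line %d: malformed: %s\n", NR, $0
            bad = 1
            next
        }
        ($1 in rev) {                                 # SHA1 seen before
            if (rev[$1] != $2) {
                printf "line %d: %s already mapped to [%s]\n", NR, $1, rev[$1]
                bad = 1
            }
            next
        }
        { rev[$1] = $2 }
        END { exit bad }' "$1"
}
```

If merged commits do end up carrying several revision tags, the duplicate check would need to become an allow-list rather than an error.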
woo hoo, resolved another 13... damn commit messages :)
I don't think it's a huge deal, I'm not sure how many of those there are. possibly quite unlikely if it was just because those commits were within a few seconds of each other?
I'm getting a count now -- there's some number of trunk commits that are tagged on branches
If cvs2svn consolidated the timestamps on those two commits as "identical" and picked the newer timestamp, then it will happen every time cvs-fast-export resolved those cases into individual commits and the branch commit was the newer of the two.
Might as well fix them if it's easy to pull the data set - it won't be appreciably more work than adding the missing mappings in the first place.
If I've counted correctly, there are at least 121 commit revisions that were on trunk but are tagged in git on a branch.
Blegh.
OK, I'll set up to fix 'em
Fortunately, you've identified a test case already ;-)
that one was an anomaly... wasn't even looking, I was collapsing the multiple-match revs manually for the 30 or so that match multiple diffs and that matched two... and noticed it seemed flipped
Ah. Well, either way, good catch.
I'm going to double-check that 121 too. That seems high to me.
Is the branch/trunk label reliable? I've been assuming it was generated based off commit location with no guessing involved, but realize I should double-check that assumption.
For SVN it should be reliable. CVS identifications were up to cvs-fast-export/cvs2svn and I'm not as certain there
The cvs:branch labels were based off of a fairly low-level analysis of the git conversion data - misc/repoconv/cvs_info.sh IIRC
The root was the git rev-list --first-parent reporting, which depends on cvs-fast-export correctly assigning the first parent based on CVS branch data.
(and on me having correctly interpreted the information, of course)
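To spot-check a single commit's trunk/branch classification, the same first-parent information can be queried directly. A minimal sketch; on_first_parent is a hypothetical name. It only reflects how the converter ordered parents, so it is not independent evidence of where a change was originally made.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if commit $1 lies on the
# first-parent chain of $2 (default HEAD), i.e. it was made "on" that
# line of development rather than merged in from a side branch.
# Exits 1 if not, 2 if $1 does not name a commit.
on_first_parent() {
    local c tip
    c=$(git rev-parse --verify -q "$1^{commit}") || return 2
    tip="${2:-HEAD}"
    git rev-list --first-parent "$tip" | grep -qx "$c"
}
```

For example, `on_first_parent <sha> main && echo trunk || echo branch` could be compared against the cvs:branch note on that commit.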
@Sean are we still at about 40 unresolved?
There was a categoric anomaly so I cleaned up and changed some things to check, and am re-running the comparison to make sure.
The 40 count was wrong (it was higher). On the plus side, scripting is cleaned up (had to rewrite everything) to the point that it can check all revs easily now. Got svn cloned too so it can do that quickly. Got it matching files and log messages cleanly now too. It's running through re-processing the missing batch now and should have an update in the morning.
Can you see if you can find c1644? There's a number of initial rev commits like that that I can't find. I'd hope it simply got merged with something else, but trying to verify that on one of them like 1644.
Hmm. Well, if I cheat a bit and use https://stackoverflow.com/a/13598028 to find when the files added in c1644 were added in Git, I get:
git log --diff-filter=A -- util/pl-X.c
commit 86a7fcc40057934832f61255b606c0bd6f7fc12b
Author: Phillip Dykstra <phil@pdykstra.com>
Date: Thu Apr 28 17:40:50 1988 +0000
Unix-plot to X Window System display (X11)
cvs:branch:trunk
cvs:account:phil
and
git log --diff-filter=A -- util/pl-X10.c
commit a6feb76ce1551b09222463514f15e65db0343b55
Author: Phillip Dykstra <phil@pdykstra.com>
Date: Thu Apr 28 17:43:26 1988 +0000
Unix-plot to X Window System Display (X10R4)
cvs:branch:trunk
cvs:account:phil
I didn't do a detailed diff analysis, but it looks like the difference is that the commit was split up to get the distinct commit messages?
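The per-file `git log --diff-filter=A` lookup above can be looped over every file an old revision added, which makes split commits easy to spot. A minimal sketch; added_in is a hypothetical name, and getting the file list (e.g. from `svn log -v`) is left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each path argument, print "<sha> <path>" for
# every commit reachable from HEAD that added that path.
# (Add --all to the git log call to search branches as well.)
added_in() {
    local f sha
    for f in "$@"; do
        git log --diff-filter=A --format=%H -- "$f" |
            while IFS= read -r sha; do
                printf '%s %s\n' "$sha" "$f"
            done
    done
}
```

For instance, `added_in util/pl-X.c util/pl-X10.c | cut -d' ' -f1 | sort -u | wc -l` would report 2 for the c1644 case above, suggesting one source revision became two Git commits.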
Cool, that was super helpful. I'm not sure about the general case but I'm guessing it's split them up because they were far apart enough in time (couple min), so cvs2git decided to handle them differently. Checking down through, that rules out a bunch but I have to figure out how to automate the check across all 135 missing. I have checks for matching diffs vs logs vs changed files but obviously doesn't catch split/merge changes unless all that changed was the log message (did verify a slew with that lil trick).
Initial revisions seem to be a large portion of the bulk missing. Took some work to figure out they're not just on branches.
Three commits you could check on for me are r51428, r54352, and r64428. They're fairly modern commits, so they stick out like a sore thumb for not matching. Haven't dove in to figure out what's up with them.
Set up the check across all svn commits and that's chugging along now. When that finishes up, should have a list of commits that are mistagged on branch vs trunk.
@Sean Starting with r51428... The checkouts of the files are identical, so i pulled the diffs:
git format-patch -1 be5072cb90113d7c0d75839cc4f183d8cde1646b
svn diff -c51428 > r51428.patch
The patch formatting is different, so I brought them up in meld and applied all the SVN style headers to the git patch. Doing that, I was left with:
diff.png
It looks like git and svn made very slightly different decisions on where to start and end their patch blocks.
r54352 is similar, but less subtle - identical files in checkouts, but different ordering on the subtraction line instructions in the diff: diff_r54352.png
r64428 is the most spectacularly different of the diffs, but checking the Git and SVN checkouts of r64427 and 64428 all files appear to agree, so the two different diffs appear to end up doing the same job.
@Sean was that what you were looking for, or is there something else about those commits that is concerning?
No that was great, helpful. I hypothesized that'd happen but hadn't actually seen it (or at least hadn't noticed). Those stuck out because they were new. I've been going through the list ruling out others like those.
Any more you'd like me to check?
@Sean How did the re-run go?
Went well! Took a while to process, but went really well. I'm double-checking a couple of lists, but here's the list of trunk commits that are misattributed to branches in git. It's not as many as originally seemed, fortunately, but there are a few:
mistagged_trunk_commits.log
Looks good - thanks!
r66607 is surprising - I wouldn't have expected any issues like that in the SVN era
OK. It looks like r66607 was a multi-branch commit, making changes to both the branch and trunk in the same commit. I didn't realize we had any of those in the modern era - all the instances I had spotted were much earlier.
What the conversion ended up doing was to apply the changes from r66607 to trunk in commit r66672.
Which will also mean that the r66672 diff won't match that from SVN, since the SVN change was just the HAVE_ANALYZER_NORETURN test.
That'll be tricky to fix. Hmm.
Going through the rest of the list, I haven't identified obvious reassignment candidates yet for the following:
19033
19757
19759
19761
19763
I think I've got the trunk portion of r66607 spliced in correctly.
Identified 19033
Ah, I see. The other four are cvs2svn artifacts - so rather than reassigning them, we just remove the assignments, since they simply don't have direct analog commits at all.
@Sean OK, next! ;-)
Cool, glad you could deduce them. I wasn't 100% sure if you have it tagging revs separate from branches. I didn't check whether the :branch: tag was correct or not, only that that rev definitely didn't happen on a branch.
Next set is the inverse -- looking a lot better (halfway through, it's found just one :trunk mis-assignment) but it's taking longer to process for some reason. Should be done here soon.
Will share the list of found assignments missing in the morn.
starseeker said:
I think I've got the trunk portion of r66607 spliced in correctly.
What am I looking at in that github date view?
The insertion of this commit into the history: https://github.com/starseeker/brlcad_conv13/commit/e977c035ec8a79967cb3d2a0874af08d86a89764
I used the date view to illustrate it's not just an isolated commit in the repo, but part of the main history
@Sean Where are we with the list of previously unidentified SVN id matches found by your diffing method? I'd be glad to help if you have a set of commits for manual review.
Also, just conceptually, what is your preference for cases like the one identified earlier where a single cvs2svn commit got split up into multiple git commits? Did you want to assign the SVN id to each "portion" commit in Git, if they can be identified?
It would be totally awesome to tag both commits, and similarly to tag merged commits with multiple revision tags. I know of some of them but haven't been fully tracking. I do think there are probably 100-200 in that category.
Tagging multiple svn revs onto a single Git commit would require some rework of the assignment code - let me know if that's something you definitely want to do.
If you want to, go for it, but I don't think it's strictly necessary. So long as the commit is tagged somewhere on one of the rev parts, that should be sufficient for tracing.
I finished checking the inverse and the only anomaly was 30804. It's tagged as "svn:branch:trunk-UNNAMED-BRANCH" but was branch "unlabeled-2.5.1" in svn.
I'm seeing some other anomalies on these tagged revisions. What's going on with r30687? The tags don't appear to match svn at all.
Another curious one is 46324 -- it's tagged as being on four branches but it was a tag, never committed to branches. Saw some others like that.
Even if tags are treated as branches, it's tagged on:
svn:branch:ansi-20040316-freeze
svn:branch:bobWinPort-20051223-freeze
svn:branch:ctj-4-5-post
svn:branch:ctj-4-5-pre
svn:branch:hartley-6-0-post
svn:branch:offsite-5-3-pre
svn:branch:opensource-pre
svn:branch:windows-20040315-freeze
but it was on these tags in svn:
ansi-20040316-freeze
ansi-20040405-merged
autoconf-freeze
bobWinPort-20051223-freeze
ctj-4-5-post
ctj-4-5-pre
hartley-6-0-post
hartley-6-0-pre
offsite-5-3-pre
opensource-post
opensource-pre
windows-20040315-freeze
Sean said:
I finished checking the inverse and the only anomaly was 30804. It's tagged as "svn:branch:trunk-UNNAMED-BRANCH" but was branch "unlabeled-2.5.1" in svn.
Looks like 30688 is also tagged as trunk-UNNAMED-BRANCH, and as cjohnson-mac-hack as well, but I don't see that in svn. Svn only lists it affecting:
unlabeled-1.1.1
unlabeled-1.1.2
unlabeled-1.2.1
unlabeled-11.1.1
unlabeled-2.12.1
unlabeled-2.6.1
unlabeled-9.1.1
unlabeled-9.10.1
unlabeled-9.12.1
unlabeled-9.2.1
unlabeled-9.3.1
unlabeled-9.7.1
unlabeled-9.9.1
@Sean I've added the ability to correct the r30804 and r30688 branch assignments.
Looking at r46324, here's what I'm seeing:
the svn:revision:46324 label is on four commits:
commit 44e3d7341c5680250d65091b2aff6ed051720a11 (HEAD, origin/itcl3-2, itcl3-2)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Tue Aug 23 12:19:43 2011 +0000
revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)
svn:revision:46324
svn:branch:itcl3-2
svn:account:brlcad
commit a988903bbe27985e0dd94228e07079e91e98be4d (origin/libpng_1_0_2, libpng_1_0_2)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Tue Aug 23 12:19:43 2011 +0000
revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)
svn:revision:46324
svn:branch:libpng_1_0_2
svn:account:brlcad
commit c54b9b07158d4a904aabddae264290854ecb250c (origin/tcl8-3, tcl8-3)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Tue Aug 23 12:19:43 2011 +0000
revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)
svn:revision:46324
svn:branch:tcl8-3
svn:account:brlcad
commit 03af105da8dd3cf85a29cc7f056513cc8e79d751 (origin/tk8-3, tk8-3)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Tue Aug 23 12:19:43 2011 +0000
revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)
svn:revision:46324
svn:branch:tk8-3
svn:account:brlcad
When I look at what r46324 did in SVN, it eliminated branches/tags/itcl3-2, branches/tags/tcl8-3, branches/tags/tk8-3, and branches/tags/libpng_1_0_2 - this seems to correspond to what is recorded in those Git commits. (Git can't actually delete those branches without the commits referenced only by them getting garbage collected.)
Similarly, for r30687:
$ git log --all --grep 30687
commit 004ec0ae439f0ca3c814d22a46957012cd8fb239
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Wed Apr 16 14:40:20 2008 +0000
remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)
svn:revision:30687
svn:branch:Original
svn:account:brlcad
commit 720f9b9b75588e35d3cce0f9f5b802abea2259ab
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Wed Apr 16 14:40:20 2008 +0000
remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)
svn:revision:30687
svn:branch:itcl3-2
svn:account:brlcad
commit f206b315ca475d3a3e55e98ec42d772c6b05baee
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Wed Apr 16 14:40:20 2008 +0000
remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)
svn:revision:30687
svn:branch:libpng_1_0_2
svn:account:brlcad
commit cbff64617866cc3fc2b25db15cd610e651561958
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Wed Apr 16 14:40:20 2008 +0000
remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)
svn:revision:30687
svn:branch:tcl8-3
svn:account:brlcad
commit 87bc784daf7f15cc8d9c9fa980a934a98a17de95
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Wed Apr 16 14:40:20 2008 +0000
remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)
svn:revision:30687
svn:branch:tk8-3
svn:account:brlcad
commit d748c2ea214b699008563e18f5a7105de39faba9
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Wed Apr 16 14:40:20 2008 +0000
remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)
svn:revision:30687
svn:branch:zlib_1_0_4
svn:account:brlcad
I think you're right that SVN tags are getting treated as branches - that's the only way to handle SVN tags with edits - and I doubt I attempted to distinguish when assigning the svn:branch labels.
@Sean I'm not following how you're getting an association between (say) svn:branch:ansi-20040316-freeze and r46324 ?
I guess I could try to take a list of commits made to tags instead of branches and update their svn:branch: labels to be svn:tag: labels instead?
Ah. I think this might actually work:
svn log file:///home/user/brlcad_repo/brlcad/tags|grep \^r|awk '{print $1}' > tags.log
That gives us (more or less) the set of tag commits. If we then look for any of them that match commit messages, we get a set of commits. tag_commits.txt
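A minimal Python sketch of that relabeling, assuming the tag revisions come from the tags.log pipeline above and that each Git commit message carries trailer lines like svn:revision:46324 and svn:branch:itcl3-2, as in the log excerpts. The helper names are hypothetical, not taken from the conversion scripts.

```python
import re

# Trailer lines as they appear in the converted commit messages, e.g. "svn:revision:46324"
REV_LINE = re.compile(r"^svn:revision:(\d+)$", re.MULTILINE)
BRANCH_LINE = re.compile(r"^svn:branch:(.+)$", re.MULTILINE)
# One token per line from the `svn log ... | awk '{print $1}'` pipeline, e.g. "r46324"
REV_TOKEN = re.compile(r"^r(\d+)$")


def load_tag_revs(lines):
    """Collect revision numbers from tags.log, skipping stray tokens such as 'removed'."""
    revs = set()
    for line in lines:
        m = REV_TOKEN.match(line.strip())
        if m:
            revs.add(int(m.group(1)))
    return revs


def relabel_tag_commit(message, tag_revs):
    """Turn svn:branch:X into svn:tag:X when the commit's svn:revision was made to a tag."""
    m = REV_LINE.search(message)
    if m is None or int(m.group(1)) not in tag_revs:
        return message  # not a tag commit, leave it alone
    return BRANCH_LINE.sub(r"svn:tag:\1", message)
```

The strict `r<digits>` filter also drops log-message lines that happen to start with "r", which is part of why the grep only gives the tag set "more or less".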
So those commit labels could then be switched from svn:branch:* to svn:tag:*
I can generate a list of all branches/tags associated with each commit easily enough -- that's what I was doing to validate specific sets, just not systematically on all commits.
Did my breakdown of r46324 make sense? I may be missing something
you mean the question about the ansi one?
or in general?
I wasn't (am not) seeing how you associated that commit with that branch, either in SVN or git?
let me check where I got it from because I agree, I'm only seeing it on four git commits now... maybe misprocessed on a subsequent validation
ah, yeah, looks like i wrote the wrong rev here in the chat.. 46324 is good...
that list was for 46322 ... which looks like it matches so I just got those two crossed when I was checking them manually
cool, that's great -- could be more thorough but that's good enough for non-branch commits -- means all non-branch commits that are tagged look like they're mostly tagged correctly besides the two trunk-UNNAMED-BRANCH commits.
Those are partially my fault - a regex match was too loose and turned master-UNNAMED-BRANCH into trunk-UNNAMED-BRANCH. Either way though they had the wrong branch somehow, so I added corrections
uploading a mapping of all revs to non-trunk branches
can you check on something unusual... commits 21570 through 21634 in svn
I got nothing but a log message.
perhaps cvs2svn garbage of some sort? did cvs2git fix/import any of those better?
Are there equivalent git commits?
Hmm. No matching commit message anywhere for 21570
Here's the portion of brlcad/h/Attic/tclIntPlatDecls.h,v from CVS that seems to have generated that commit:
1.1
log
@file tclIntPlatDecls.h was initially added on branch windows-6-0-branch.
@
text
@d1 585
@
1.1.2.1
log
At a guess, cvs2svn put in an empty commit and cvs-fast-export ignored it as an empty commit...
what about one of the revs in the middle?
that's a huge range of commits, all with detailed log messages indicating activity
I mean, I guess it's garbage or old cvs issue of some sort, so not a problem, but odd
also, how'd you manage to catch/fix r62027 ? looks like it was added alongside trunk and you somehow fixed it (or at least tagged it better) as being a branch
Same deal with 21600 from brlcad/libpkg/Attic/libpkg.dsp,v
1.1
log
@file libpkg.dsp was initially added on branch windows-6-0-branch.
@
text
@d1 115
@
1.1.2.1
I was the one who messed that up, so I knew it was coming and did some manual work in the initial conversion to special case that.
neat
Okay! Finally... here's the list of commits that appear to have applied to multiple branches at the same time: commits_to_multiple_branches.txt
Might want to double-check me there, but that's only looking at the svn side. You may already be handling some of them differently like the branches AUTOCONF vs autoconf-branch ?
Maybe. I recognize 19033 - it's one of the ones you flagged as being missing on trunk. I had removed its commit id from the branch, but if that's right it actually needs to be on both
Blegh. Well, I uploaded the latest state at brlcad_conv14 to demonstrate the switch to svn:tag: labeling for those commits made to tags, but don't use that for SHA1 lists of any sort - stick to brlcad_conv12. Clearly the post-processing isn't done yet...
That transcript is derived by pulling a diff of all commits and extracting all the filepaths that changed.
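As a rough illustration of that path extraction, assuming svn diff "Index:" lines of the form brlcad/trunk/..., brlcad/branches/<name>/... or brlcad/tags/<name>/..., a commit can be flagged as multi-branch when its changed paths fall under more than one root. This is a hypothetical sketch, not the script that produced the list.

```python
import re

# Assumed layout: brlcad/trunk/..., brlcad/branches/<name>/..., brlcad/tags/<name>/...
ROOT_RE = re.compile(r"^brlcad/(trunk|branches/[^/]+|tags/[^/]+)(?:/|$)")


def roots_touched(diff_lines):
    """Return the trunk/branch/tag roots named by 'Index:' lines in `svn diff -c REV` output."""
    roots = set()
    for line in diff_lines:
        if line.startswith("Index: "):
            m = ROOT_RE.match(line[len("Index: "):].strip())
            if m:
                roots.add(m.group(1))
    return roots


def is_multi_branch(diff_lines):
    """A commit counts as multi-branch if its changed paths span more than one root."""
    return len(roots_touched(diff_lines)) > 1
```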
I'll take a run through - probably it's just going to mean an adjustment/expansion of the branch and/or trunk commits I need to manually specify revisions for
The other list I know we still need is the svn commit IDs you were able to identify that I had never mapped, like 735 - did that prove practical or were there roadblocks?
yes, that's in one of the windows ... Screen-Shot-2021-02-19-at-12.41.50-PM.png :)
/me grins - it's like my desk, but with yellow terminals instead of dark gray!
I'm not sure what to make of 18999 - I'm not seeing two commits associated with that in Git
I was dark, but eyes needed a different hue some months back
/me nods - it's surprising how much difference that makes over long stretches of time.
/me goes through the list to see if he can quickly spot any candidates for svn revision labels...
@Sean How authoritative was the cvs2svn branch identification for commits? A lot of these in git are tagged as rel-5-2 rather than rel-5-1-branch - given the process I used to try and determine which branch was the "origin" branch in CVS relied on the git conversion itself, it's possible I've not correctly identified the original branches...
A sizable chunk of these are proving to be the mirror image of the other case - instead of the branch getting the svn id and trunk not getting it, it's trunk that got the id and the branch didn't.
How are the rev updates committed in the repo still valid? Doesn't assigning a different tag on earlier commits affect the future commit shas?
Can you give me an example of the rel-5-2 vs. rel-5-1-branch case? Could be a bug, but the processing was pretty straightforward: it just reports what actually changed.
Yes. Every time I have to do that, I have to upload a new repository. That's why the brlcad_conv13 and brlcad_conv14 repos were up briefly
Sure, one sec
OK, to pick one example (this is typical): r19440 in your list has:
branches/rel-5-1-branch trunk
The corresponding commits in the Git conversion report:
cvs:branch:rel-5-3 cvs:branch:trunk
4d401a8617869d3594b5948de12a374a5bd292fe and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942
morrison@agua brlcad_conv11 % svn diff -c 19440 svn+ssh://brlcad@svn.code.sf.net/p/brlcad/code|grep "^Index: brlcad"
Index: brlcad/branches/rel-5-1-branch/tclscripts/mged/grid.tcl
Index: brlcad/trunk/tclscripts/mged/grid.tcl
No mention of rel-5-3 ...
Looks like the git commit log and diff are correct, just incorrectly associated with rel-5-3
How is the branch/tag figured out? I sort of assumed it was coming from the processing. If they're suspect, that might explain some of the missing trunk tags.
The SVN era branch assignments should come directly from repository information. The CVS era branch assignments were done using the script in misc/repoconv/cvs_info.sh
Fundamentally, it uses git rev-list --first-parent to follow commit chains back up the branches.
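A toy model of what a first-parent walk does, using an in-memory parent map with made-up commit names instead of a real repository. Walking from the tip of a branch that was forked from another branch passes through the parent branch's commits and then trunk's, so the walk by itself can't say which branch a commit originated on.

```python
def first_parent_chain(parents, tip):
    """Follow only the first parent of each commit, the way `git rev-list --first-parent` does."""
    chain = []
    commit = tip
    while commit is not None:
        chain.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None
    return chain


# Made-up history: trunk A-B-C, branch X forked at B, branch Y forked from X at X1.
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "X1": ["B"], "X2": ["X1"],
    "Y1": ["X1"],
}
```

Here `first_parent_chain(parents, "Y1")` visits Y1, X1, B and A, picking up X1 from branch X and B and A from trunk along the way.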
I was trying to identify the commit chains independent of the SVN history.
So my question was whether cvs2svn was more likely to assign the correct branch of origin to each commit. If that's the case, then I'll have to reassign the CVS era branches somehow.
I'm not conversant enough with CVS to know how to try and directly coax the information out of the original repo, so my reasoning was that since the cvs-fast-export conversion was the one we were using from the CVS era the branch assignments were the ones to use for that part of the history.
FWIW, SVN commit r19990 "Release 5.3" was right in amongst the latter of the multibranch commits SVN reported as being on rel-5-1-branch. It seems a bit suspect that all the multibranch commits would be originating on rel-5-1-branch when they were about to release 5.3...
I can't say for sure, but I do recall that branches in cvs are recorded explicitly so there's no guessing. Any tool converting has perfect branch knowledge so I would expect cvs2svn (and cvs-to-git) to correctly reflect what was in cvs in svn.
Perhaps --first-parent isn't appropriate? What if something is a branch of a branch or similar? Git could be tracking through to a grandparent branch.
the branch names are in the ,v files, if you want to see if/when rel-5-1-branch vs 5-3 branch are associated with a particular commit. They're in a "symbolic names:" block near the top.
Some explanation here: https://www.astro.princeton.edu/~rhl/cvs-branches.html#branchnumbers
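A sketch of the numbering convention described there: a revision such as 1.1.2.1 lives on branch 1.1.2, and a regular branch tag in the symbolic names block is stored in "magic" form with a 0 in the next-to-last position (1.1.0.2 for branch 1.1.2), while vendor branches such as 1.1.1 appear as-is. The symbol table in the test data is hypothetical, not read from a real BRL-CAD ,v file.

```python
def branch_of(revision):
    """'1.1.2.1' -> '1.1.2'; trunk revisions such as '1.5' -> None."""
    parts = revision.split(".")
    return ".".join(parts[:-1]) if len(parts) > 2 else None


def symbol_to_branch(number):
    """Decode a symbolic-names entry: magic branch tag '1.1.0.2' -> branch '1.1.2'.

    Entries without the 0 (vendor branch '1.1.1', plain revision tags like '1.2')
    are returned unchanged.
    """
    parts = number.split(".")
    if len(parts) >= 4 and parts[-2] == "0":
        return ".".join(parts[:-2] + parts[-1:])
    return number


def branch_names_for(revision, symbols):
    """Symbolic names whose branch contains `revision` (empty list for trunk revisions)."""
    branch = branch_of(revision)
    if branch is None:
        return []
    return sorted(name for name, num in symbols.items() if symbol_to_branch(num) == branch)
```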
I'm beginning to think git just literally doesn't track this properly at all, at ANY level. If I'm interpreting these numbers correctly per the Princeton site, it looks like SVN has it correct.
That's really, really annoying.
@Sean I don't suppose in that pile of scripts you've got one that will generate the set of branches for all SVN commits?
Nevermind, got it.
OK, there we go. Can now scrub out the existing cvs:branch labels and replace them with SVN data.
Yeah, that finished processing. Careful if you used the previous script, had a bug.
Here's all revisions, all branches and tags:
commits_to_multiple_branches2.txt
Er, rather that's all multiple branchpoint commits. This is all commits in the repo: all_branches2.log
note the multiple branches list did update, if that changes anything on the processing
OK, I think I've got the branch assignments working using SVN data now. Here's the diff that shows the changes to the commit messages in brlcad_conv12 diff.txt
The sha1s won't match, but I can upload that version of the repository if it is useful.
Updated diff file: diff.txt
@Sean Any luck with generating the mappings? As an alternative if you want you can post the brlcad_conv11 repo you were using (I haven't kept that iteration so I'd need a copy of what you're using) and your existing SHA1 sets - I think I've hammered out an update script now.
Yeah, it's processing now.
Should be done soon. Taking a while to recompute all the hashes. Looks like the first few hundred ended up unmodified (same sha) but once a commit message changed, everything after had to be re-associated with the new shas and that process takes a couple hours (and it's a couple hours in, so almost done).
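Why one changed message rewrites everything after it: each Git commit object records its parent's hash, so its own hash depends on the whole chain before it. The sketch below uses a simplified commit object (real ones also carry author and committer lines), so the hashes are not the ones Git would compute, but the cascade behaves the same way.

```python
import hashlib


def commit_sha(tree, parent, message):
    """SHA-1 of a simplified commit object; real Git commits also include author/committer lines."""
    body = f"tree {tree}\n"
    if parent is not None:
        body += f"parent {parent}\n"
    body += f"\n{message}\n"
    data = body.encode()
    # Git hashes "<type> <size>\0" followed by the object contents
    return hashlib.sha1(f"commit {len(data)}\0".encode() + data).hexdigest()


def chain_shas(history):
    """history: (tree, message) pairs, oldest first. Each hash feeds into the next commit."""
    shas, parent = [], None
    for tree, message in history:
        parent = commit_sha(tree, parent, message)
        shas.append(parent)
    return shas
```

Editing the message of the third commit in a four-commit history leaves the first two hashes alone and changes the last two, which matches the "first few hundred unmodified" observation.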
One curiosity that you can maybe help explain / educate me on ... do you know why a commit like 944 would be in git log --all but not in git log --follow . ?
maybe a bad example -- I didn't check if it was a commit to a different repo or something, just the first I noticed
Hmm. If I save the log output of git log --follow . to a file and then search for c037a5e3a6eb97d2f9455225bbafeffec5b79be4 (which I think is the commit corresponding to 944 in brlcad_conv12) it is there.
I do know in general that git log --all will incorporate the history from all branches, not just the currently checked out branch.
https://stackoverflow.com/a/7203551 is sometimes useful in the context of tracking back specific files.
This also seems to work: git log --follow --full-history -- src/fb/fb-orle.c
Ah, whoops, sorry - that's not the right commit/hash. One sec...
Checking SVN, that's a property change - so the bug is the SVN revision getting assigned at all.
I.e. it shouldn't be in either git log --all or git log --follow .
OK. Here are my thoughts so far: 944 looks like a timestamp match with 52036a8b4569b8ffe90e2e8fb0b43f5ed36ba040. It's got one of the generic log messages, so my revision assignment code went ahead and assigned it that revision.
Based on the diff report from SVN, that's an incorrect assignment and needs to be changed/cleared. Hopefully the diff based checking will catch that.
That probably explains why it doesn't show in git log --follow . - that search is based on all the files in the currently checked out branch, working backwards. Since the incorrectly identified "944" has no files associated with it, there's no way for git to associate it with the history walking backwards from the tree as a starting point.
Or, another possibility - even if it can associate it following the commit chains, an empty commit won't match the "." specifier.
OK - I see "Added fb_close", which is the parent of 52036a8b4569b8ffe90e2e8fb0b43f5ed36ba040, does make it into the git log --follow . output. That suggests it's following the chain through that commit, but not matching "." and skipping reporting it.
(Sorry, that's probably a little more stream of consciousness than you were looking for...)
In some ways it's tempting to try to scrub empty commits like that with generic commit messages out, but at this juncture I'd be worried about inadvertently breaking something else...
Hmm. Actually, repowork already has the info to detect empty commits, in principle, and even categorize them...
Some I know we need (branch creation/deletion), some are marginal (commits removing empty directories, which are no-ops in git) and some of them are useless (empty generic message, empty contents).
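A rough sketch of how those three buckets might be assigned, assuming per-commit metadata that a tool like repowork could plausibly supply; the CommitInfo fields and the placeholder message text are assumptions, not repowork's actual data model. Anything that doesn't clearly fit goes to manual review rather than removal.

```python
from dataclasses import dataclass

# Assumed placeholder text for CVS commits made without a message; extend as needed.
GENERIC_MESSAGES = {"*** empty log message ***"}


@dataclass
class CommitInfo:                    # hypothetical per-commit summary
    message: str
    changed_files: int               # files whose content changed in this commit
    branch_op: bool = False          # creates or deletes a branch/tag
    empty_dir_removal: bool = False  # only removes empty directories (a no-op in Git)


def classify_empty(c):
    """Sort empty commits into keep / marginal / remove; anything unclear goes to review."""
    if c.changed_files > 0:
        return "not-empty"
    if c.branch_op:
        return "keep"
    if c.empty_dir_removal:
        return "marginal"
    if c.message.strip() in GENERIC_MESSAGES:
        return "remove"
    return "review"
```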
Here's the CVS era empties: empty.log
@Sean What do you think - should I scrub out the empty commits with "* empty log message*" and maybe some of the other obvious ones?
"BRL CAD Distribution Release 1.10" has a couple non-empties in addition to the 4 empties, for example...
Yeah, it's a variation on the splicing problem. Have the ability to remove specified commits now.
starseeker said:
Hmm. If I save the log output of git log --follow . to a file and then search for c037a5e3a6eb97d2f9455225bbafeffec5b79be4 (which I think is the commit corresponding to 944 in brlcad_conv12) it is there.
by the way what is the difference between conv11 and 12 ?? :)
I don't recall at this point - they were iterative refinements to the process of correcting the output from the main svnfexport conversion (merging git notes into comments, correcting emails, etc.)
conv12 is the "target" for a third series of refinements at this point, mainly because I need stable SHA1s to target for processing. (In principle I've prepared a script to translate between old and new repositories if necessary, but I'd rather not have to use it... this is already complicated enough.)
I see you're getting ahead of my own validation pace... Sorry it's taking so long. I'm chasing down issues in the multi-assignment and a couple of bugs in the scripting, and I wanted the list I give you to be more certain than a blanket wash, as I'm seeing lots of little discrepancies and ways to mis-associate.
@starseeker can you validate this list against the assignments you made: svn.to.git.complete_matches.log
I don't see any collisions - you've got about a dozen that I haven't got yet, but I suspect that's probably because I forgot to use the version of the SVN repository that had the RCS tags scrubbed down.
I'll have to look more closely at the ones that popped up on mine as matching you don't have... may be an issue I haven't found yet.
420b6c86aebaab8d233b9124aac2dfcaab390158;2253
3626fd67e335d89391ce624b8a3246bd99adffec;2470
231fd989a63e842f6ed485d8ac49caec4eee3660;2471
bbd7e8166d10d1f8c1c3355f87814fd9c4e652df;2489
1a79a71444aa3900b25c61c321c270d3f83d7065;2657
68245f26449e72b3fa8362bfdaa8ec4b458566bc;2841
0857eeb72eb573cad76f86b770e765349a85a671;2875
b026f5c0fe0aa8e2d9ca34051a20fb9afb92162a;2884
3732cf651af0b526eb3ec6bdf5893892f22afef4;2886
84c054fee5394109f52dfaf15add46d671ede196;2890
de144fde847a9fe45cac391c97bd7abaeacc3b0b;2900
d0f9348a8847c22a1f5cb4846f9c7414c7c1081b;3578
Just for reference, those are the ones in your set I've not spotted yet.
I can pick up 68245f26449e72b3fa8362bfdaa8ec4b458566bc;2841 if I sort the diff contents ahead of doing the md5sum.
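A sketch of that normalization, assuming plain git diff and svn diff output (git format-patch appends a "-- " signature line that would need stripping first): keep only the +/- lines inside hunks, sort them, and hash. Sorting absorbs reordering within the changed lines (the r54352 kind of difference), but it does not help when the two tools choose different lines to move.

```python
import hashlib


def normalized_diff_digest(diff_text):
    """MD5 of the sorted +/- lines inside hunks, ignoring file headers and hunk ranges."""
    changes, in_hunk = [], False
    for line in diff_text.splitlines():
        if line.startswith(("diff ", "Index: ")):
            in_hunk = False          # a new file section starts; its ---/+++ headers follow
        elif line.startswith("@@"):
            in_hunk = True           # hunk ranges differ between tools, so skip the header itself
        elif in_hunk and line.startswith(("+", "-")):
            changes.append(line)
    return hashlib.md5("\n".join(sorted(changes)).encode()).hexdigest()
```

For the r4778 case further down, git removes and re-adds raytrace.h while SVN does the same with db.h, so the digests still differ; those only get settled by comparing checkouts or applying the diffs.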
I have others that partially match, I just haven't validated them so didn't share them yet.
Ah, it looks like the rest categorized in my processing as having non-unique content matching.
/me inspects...
OK. Diff content wasn't unique for r2253 - matches with r22190 - so it takes the path and/or date to resolve. 420b6c86aebaa is correct
@Sean Other matches all check out - your list looks good.
@Sean Sorry, just read back up through chat history - not trying to replace your work (defeats the point of independent V&V) - goal was/is to get representative inputs to make sure my repo updating logic can handle something similar to what the final pass will look like. Just stashed the various bits and pieces (and notes) in case they prove to be useful.
@starseeker e6417be98f27d570d863744f566f5aaf738abbe6 .. I'm seeing listed as branch commit, but it was a trunk commit 19763
/me nods. I'll add it to branch_corrections.txt
Here's a second update with about 70 more matches svn.to.git.complete_matches2.log
had a bug that had to get sorted out in verifying the other 70
One sec while I merge/verify.
Here are more that appear to be mistagged as branch/tag commits:
19033 LOG+FILE MATCH ON c365a032935f99d5cbcc5e0b7316253e918183f5
19211 LOG+FILE MATCH ON 2ec20a87d6e216cc3af62da933a2917e96459ce2
19282 LOG+FILE MATCH ON 4d5fe4e8afa57a275c04f0a11cbf20c1378ce600
19283 LOG+FILE MATCH ON 4af5f01acc93a65ba8e158c1e407e6fa30f0a867
19288 LOG+FILE MATCH ON 9f4472b6c4a9d77005a25bac0e6ea9d0b45c6829
19289 LOG+FILE MATCH ON 3312597ec11da607ad8cdecb8e86ecd6cd43a21c
19440 LOG+FILE MATCH ON ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942
19449 LOG+FILE MATCH ON af33297408e4ec0b38fa37d211104ae8e3f4b850
19558 LOG+FILE MATCH ON 3a6fdd142e59c7fee7dfb06fdaecc3b30f28d633
19587 LOG+FILE MATCH ON a53d24a82016e59e54ad3fa0750238b077313a33
19720 LOG+FILE MATCH ON f1c200f10e9d5c0f896508b2967f644abafad234
19723 LOG+FILE MATCH ON 45a67834524348e32e2c1d34071b59dbb1360d9e
19763 FILE+DIFF MATCH ON e6417be98f27d570d863744f566f5aaf738abbe6
19772 LOG+FILE MATCH ON 6f4104bd83cf4a930bda9cbaa1b811d3e0d236b3
19783 LOG+FILE MATCH ON 6af6602bcdb5227c51a6b467226d5fc70d321855
19797 LOG+FILE MATCH ON 4b51763bd75123f81f069bba1b873c4538776530
19798 LOG+FILE MATCH ON d82708b47d89c008a20ce23ba23ce4aca80cf232
19839 LOG+FILE MATCH ON 8dcb60d4529dc5e0cf99729338e05869cf270c06
Confirmed - no collisions.
This one is an outlier I'm not sure about, 11077485329842c81213eab68006fe5d58b5925f ...
it says it was 21565 but that was a trunk cvs2svn conversion commit. Commit message on 11077.. is that of 21564
21564 is not found tagged in git
/me nods. Probably means it should be 21564
I need to investigate why 21564 isn't in my list of missing commits... should have caught that but didn't
@Sean am I correct that all the commits you listed are on trunk?
I think I've got 19033 set up as follows: aec4367dafd37a7b0657c4b27414caa21ac4c1be is the trunk portion of that commit, and c365a032935f99d5cbcc5e0b7316253e918183f5 is the rel-5-1-branch portion
starseeker said:
Sean am I correct that all the commits you listed are on trunk?
I'll have to confirm that myself, as I've been toggling between processing all commits and only those on trunk.
aha! yes, that explains it. that's why 21564 wasn't in my list. thought I was going crazy. that was a branch commit.
so in svn, 21564 was committed to a branch, then 21565 committed to trunk to compensate?? I'm not sure what cvs2svn did there.
regardless, in git .. 21564's diff turned into 11077485.. and perhaps properly tagged as branch, despite being tagged as trunk commit 21565. do I have that right?
21565 appears to be empty in svn
In git, if I'm interpreting gitk's display properly, 11077485329842c81213eab68006fe5d58b5925f is a branch commit. If 21564 was the branch commit in SVN, that's probably what it should be in Git. Not 100% sure why it got the 21565 assignment instead.
Best guess is something funky happened because the timestamps of those two commits are identical in SVN, as far as I can tell.
Okay, yeah, that's what I thought I was seeing as well. Don't see how it got 21565 either. Is there a way to check, see if that happened anywhere else? Not too worried but if it's scannable, we can do a quick check.
Only thing I can think of would be to look for identical timestamp commits in SVN and double check the Git assignments, but not sure how script-able that is (especially since we're accumulating a fair set of revision number assignments/updates.)
f5a1b0037fec2927cba073d118db24cdbd681975
a098425430db227021617976961e6b51ce5569cb
e6417be98f27d570d863744f566f5aaf738abbe6
Those might be worth checking - I think they also had incorrect revision numbers
It might get to the point where I should run the updates we've accumulated and establish a new baseline for additional comparisons, so we can focus without re-discovering what we've already fixed, but I know that would require regenerating the sha1/md5 mappings again. Let me know if you think things reach the point where that would be worthwhile.
Yeah, I'm ignoring timestamps because it'd be a fair bit of work to parse the date string into something that could be fuzzy compared in script land
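For what it's worth, in Python the parsing is short. A sketch assuming the default git log date format and the date field from svn log headers, both in an English/C locale; the two-second tolerance is an arbitrary choice, and same_timestamp_groups is a hypothetical helper for spotting SVN revisions that share a timestamp, like 21564 and 21565.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_git_date(s):
    """Default `git log` date, e.g. 'Tue Aug 23 12:19:43 2011 +0000' (English locale assumed)."""
    return datetime.strptime(s.strip(), "%a %b %d %H:%M:%S %Y %z")


def parse_svn_date(s):
    """Date field of an `svn log` header, e.g. '2011-08-23 12:19:43 +0000 (Tue, 23 Aug 2011)'."""
    return datetime.strptime(s.strip()[:25], "%Y-%m-%d %H:%M:%S %z")


def close_enough(a, b, tolerance_s=2):
    """Fuzzy timestamp comparison; the 2 s default is an arbitrary choice."""
    return abs(a - b) <= timedelta(seconds=tolerance_s)


def same_timestamp_groups(rev_dates):
    """Group SVN revisions that share an identical timestamp."""
    groups = defaultdict(list)
    for rev, when in rev_dates.items():
        groups[when].append(rev)
    return sorted(sorted(revs) for revs in groups.values() if len(revs) > 1)
```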
You may already have, but here's a couple outliers that are partial matches, appear to be probably split commits?:
2125 LOG+DIFF MATCH ON 0c1f4a88c5c960bd7de51ef8a05e7f53f00fb1a2 (NOT TAGGED)
3102 LOG+DIFF MATCH ON 402419dac49d3abe9bd6036f76696b43a70a66f5 (NOT TAGGED)
awesome! got it doing the comparisons in parallel now... that should speed things up a bit!
I've put up a demo repo at https://github.com/starseeker/brlcad_conv15 showing all the accumulated changes thus far.
Simple way to compare the brlcad_conv12 and the brlcad_conv15 logs to see changes seems to be:
git log --all |grep -v ^commit |grep -v ^Merge > all_nc.log
That filters out the sha1s so the message and other changes can be seen easily in a diff.
@Sean It's looking like SVN and git use subtly different diffing algorithms, so the diff file changes don't always map up.
I think I've pretty well reached my limits: https://github.com/starseeker/brlcad_conv16
@Sean what comparisons do you have it doing?
I compared every rev.
/me winces. Yeah, that's a slow process.
Cool thing is that it now takes about 3-4 hours total to test every rev.
Crunches in parallel.
Finished over the weekend pretty quickly actually, but I was too exhausted to verify+upload it.. sorry.
Did a workout on Sat that wiped me out.
I have a laundry list now.. will post it in the categoric sets here in a few min.
Np, happens. I ended up manually hunting up a bunch of Git commits in SVN - hopefully that'll be helpful.
Yeah, you may have already found/fixed a lot or all of them.
I've not done anything with 15 or 16. I can kick off a final pass on 17, assuming there are a few updates, but I'm still working on 12 to keep SHAs in sync.
/me nods - sounds good.
Hopefully there won't be too much more to do...
FWIW, I'm not convinced all the CVS era commits will be diff free, even if the revisions line up.
Yeah, I think we already found a few differences where commits were split differently. They seem to be very few overall.
I think cvs2svn and cvs-fast-export might have picked different contents for their "synthetic commit to represent incomplete tag" commits... I suppose a case can be made either way for assigning the corresponding SVN revs if that's what happened. I went ahead and did so, but I could go either way.
r4778 actually is a nice compact illustration of different diff picks - at least with the svn and git versions I have, git produces:
diff --git a/librt/db_io.c b/librt/db_io.c
index 3645cea1dc..7faa9be6ba 100644
--- a/librt/db_io.c
+++ b/librt/db_io.c
@@ -32,8 +32,8 @@ static char RCSid[] = "@(#)$Header$ (BRL)";
#include "machine.h"
#include "vmath.h"
-#include "raytrace.h"
#include "db.h"
+#include "raytrace.h"
#include "./debug.h"
and SVN produces:
Index: brlcad/trunk/librt/db_io.c
===================================================================
--- brlcad/trunk/librt/db_io.c (revision 4777)
+++ brlcad/trunk/librt/db_io.c (revision 4778)
@@ -32,8 +32,8 @@
#include "machine.h"
#include "vmath.h"
+#include "db.h"
#include "raytrace.h"
-#include "db.h"
#include "./debug.h"
Git moves raytrace.h down, and SVN moves db.h up, both to the same effect.
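That can be checked directly by applying each hunk to the same pre-image. A minimal hunk applier (context and removed lines must match, added lines are emitted, and the hunk's start line is assumed to be known already) is enough to show both versions of the r4778 change produce the same file.

```python
def apply_hunk(old_lines, start, hunk):
    """Apply one unified-diff hunk body to old_lines, starting at 1-based line `start`.

    Context (' ') and removed ('-') lines must match the old file; added ('+') lines are emitted.
    """
    new = old_lines[:start - 1]
    i = start - 1
    for line in hunk:
        op, text = (line[0], line[1:]) if line else (" ", "")
        if op in " -":
            if old_lines[i] != text:
                raise ValueError(f"hunk does not apply at old line {i + 1}")
            if op == " ":
                new.append(text)
            i += 1
        elif op == "+":
            new.append(text)
        # other markers such as "\ No newline at end of file" are ignored
    return new + old_lines[i:]
```

Using the five #include lines from that hunk as the pre-image (starting at line 1 for brevity instead of 32), both the git and the SVN hunk yield the same five lines, with db.h ahead of raytrace.h.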
Shouldn't impact a full-up revision check of course, but does illustrate the limits of diff comparisons nicely.
I should have one of the lists cleaned up here soon now. Trying to make sure I don't feed you bad data... so much scripting...
The good news is I'd say the vast majority match and map well.
/me can imagine - once this is done I'm going to have to scrub my home dir to clean out a truly amazing pile of intermediate scripting files, checkouts, test dirs, etc.
Yeah, I noticed some of the different diffs like that. Pretty interesting. I found a couple more complex cases where an entire function appeared to be added/removed when in reality all that happened was the end parenthesis on one function was moved and the signature on the next function had an edit. Somehow git's diff engine decided it would represent that as some mangled movement.
@Sean I'm seeing a big swath of differences between r702 and r3735 - given the timing I'd guess that's tied up with that timestamp business in the SVN conversion?
Datestamp wise it lines up, as near as I can tell.
yeah, I noticed them a while back. found many/most of them (or ruled them out as splits/inconsequential).
@Sean if we hit a situation where a commit message matches to one revision but the change matches a different revision, which mapping do you prefer to use?
o.O I'd wonder how that happened...
regardless, I think it's more important the rev match the diff since we're notionally using these numbers to trace back changes in a file
unrelated, here's a neat little find in the commits. there appear to be exactly 7 commits that were perfectly duplicated on branches and trunk:
10 19514 LOG+FILE+DIFF PERFECT MATCH ON c9cc663089d441f8a7d40f63757b0080dec5af10 f5419dcbab0e9edc78c90af24b5318b04686a7b2 (TAGGED MISMATCH f5419dcbab0e9edc78c90af24b5318b04686a7b2)
10 19595 LOG+FILE+DIFF PERFECT MATCH ON 0c2cb0cf51b8f543cd740e758ea3ebe2be964336 afbcb106f05606065ae3ce11b602fa566efb0031 (TAGGED MISMATCH afbcb106f05606065ae3ce11b602fa566efb0031)
10 19605 LOG+FILE+DIFF PERFECT MATCH ON 9bacc2b9ac94977113d3d68617ac4c896a37da60 c614ed067a631ba7d56fee51d1fc289359efb64b (TAGGED MISMATCH 9bacc2b9ac94977113d3d68617ac4c896a37da60)
10 19697 LOG+FILE+DIFF PERFECT MATCH ON e49447b2d924385b7272c6ba8d78e490590f1778 f363b6cbec7bdd415f20e77a9d3734ecfa6cbf98 (TAGGED MISMATCH f363b6cbec7bdd415f20e77a9d3734ecfa6cbf98)
10 19892 LOG+FILE+DIFF PERFECT MATCH ON 200ca9ba685b57dbc4bd0dcd9600649a7bec8117 f5787013aff6a38adc807bcc5a8db617510818a3 (TAGGED MISMATCH 200ca9ba685b57dbc4bd0dcd9600649a7bec8117)
10 19992 LOG+FILE+DIFF PERFECT MATCH ON 1b8fd04c74f8b99551e35ec87d4980bb27735a62 ae67110218bc3d71c5f3301707b5d86a60564cf7 (TAGGED MISMATCH 1b8fd04c74f8b99551e35ec87d4980bb27735a62)
10 64506 LOG+FILE+DIFF PERFECT MATCH ON eb5c98bf8799083d4d946f1f63f9e1edd8e61631 2ca450a34b29f37d58b4ed8288c3f41a4b155a78 (TAGGED MISMATCH eb5c98bf8799083d4d946f1f63f9e1edd8e61631)
Ignore the mismatch, I manually verified and they're all correct in git. It was just interesting because there appear to be so few of those. I kind of expected more, but they were apparently pretty rare to be exactly the same message, the same files, same diff.
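The "ignore the mismatch" conclusion amounts to checking that the tagged commit is one of the perfect-match candidates on each line. A small awk sketch of that check, relying only on the `ON <sha> <sha> (TAGGED MISMATCH <sha>)` layout of the report lines above; the function name is hypothetical:

```shell
# For each "PERFECT MATCH ON" report line on stdin, print the SVN revision
# (second field) and "ok" if the tagged commit is among the matching
# candidates, otherwise "check".
check_perfect_matches() {
  awk '
    / PERFECT MATCH ON / {
      split("", cand)                          # reset the candidate set
      for (i = 1; i <= NF; i++) if ($i == "ON") break
      for (i++; i <= NF && substr($i, 1, 1) != "("; i++)
        cand[$i] = 1                           # candidate shas follow "ON"
      status = "check"
      for (; i <= NF; i++) {                   # rest: (TAGGED MISMATCH <sha>...)
        s = $i
        gsub(/[()]/, "", s)
        if (s in cand) status = "ok"
      }
      print $2, status
    }'
}
```

Fed the seven lines above, every one reports `ok`, since in each case the tagged sha is one of the two candidates.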
How's it going?
So, here's a question - 7496c761e580e1935607fc336ff85bf06c524caf was initially unassigned. It got assigned r10209 based on commit message and history position, but diffing it with the SVN checkout indicates some of the changes for r10209 in SVN got grouped into the git commit labeled r10210 instead.
So we can assign r10209 to 7496c761e5 and be "approximately" correct - presumably the best match available in the git history to that SVN commit, but with a checkout that won't match - or skip assigning r10209 to any commit (losing some mapping info, but skipping a mapping that can't produce a matching output.)
What's the preferred answer in such situations?
So I've been down a rabbit hole trying to sort out how git handles encoding, but it's looking like it's not just that -- I think there are potentially a couple of categorical issues. Check these out:
b17a2836c85b43422c15faf7b111088bc4e445e3
a9daa166161d57ee6ed486cc9488880ffc5da843
ed4c28dcc1f17520d6596192e2ccae808d44ba4f
bc320ea12852890495809d142600a97eb241bd6f
d1e7455ffff304d2b8f25aba0cf144c6dc0fb4b4
9594f3ce737b98e902379066be02337eabc8db53
18ea6afa636886ee2ba5fb7d7807a920db3ee35e
8a97709dae7e86479bc04ab8d52dcaa65c2b4beb
9ae7c9024838f140c1cb20d0ddaf0606e2e486ef
c13ba71962660bcd2bb471671a08d61c94827e30
9ef20d544982d92f0b1d9183477c42543c4d45c4
4d6d7aad28eed5f23e31aa3f3fc37576de05b6dc
03eab0819b8a74d2a046273443ff14122f2d7e98
92cc90f7397cf45802a70f70260cfa2f57b1fc3b
106637f9c2913d3cc43d8a02a0f955c9709f67d4
6c20c610b10b3c098ad8c8bd53fc111791bca7e6
9f1e2c92eb250b39ac64b981c6246236f0cdb2c5
4c103440e2947d6990386e2767b9778266dd1517
b7a0eb56822e52c1a18ca30f312abea93ead6867
c90bfc8e507ea27863d82ea9ff514d2c79253b98
cb8ebedb7da7eb7981d0038fc826b61f4315e699
b0f3314a23e067051d520b11da483d068b73ebe6
there's clearly some utf-8 going on there that wasn't preserved, but then there's also some utf-8 getting added where it previously did not exist. I didn't scan all commits for the condition -- these are the ones that came up as matching DIFF+FILES but not matching the log message.
looks like about half of them have messed-up log messages where there was an apostrophe or a double quote. I checked svn and they were indeed just simple single/double quotes, so I'm thinking it's something in the scripting
As to your question (sorry, had to offload before I lost the context) ... I have that commit matching 10209 and 10210 as well because of the log message match. They match these git commits:
5222348e9f8c57c3a7623700413d0f37a1d74122
7496c761e580e1935607fc336ff85bf06c524caf
46472340020700642675b9613c7ddce85c391bea
becf17cb8e73ddbef7a0e840090712714ef4cff0
5846eaff72182de5f744baf4ef8c757b1e44b615
So you could tag them all or just the first, shrug, all valid enough choices I think
presumably some are 10209 and some are 10210
I have a list of others like that: 1156 that only match the log message
when I gave them a prelim scan, it looked like most are commits split up differently than they were in svn
So looking at the first one on that list (b17a2836c85b43422c15faf7b111088bc4e445e3) I'm seeing the following:
CVS
add Roßberg to list of contributors
SVN
add Roßberg to list of contributors
Git:
add Roßberg to list of contributors
You're saying your scripts indicate the SVN and Git messages don't match? All three lines appear to have the same utf8 character, at least here...
I'll take a look at the rest of the list tomorrow...
yeah, I don't get the utf chars here when I query git. I could have done something that caused them, but if I just run git show, I get an encoded mess
another set of oddities to check on, search git for svn 30687
appears to be tagged across a variety of branches (which maybe happened, I hadn't checked that yet)
okay, so looks like that's part of the story:
svn diff -c30687 file:///Users/morrison/brlcad.github/svn.sfmirror/code | grep ^Index | cut -f3 -d/ | sort | uniq
VendorARL
libpng
scriptics
zlib
yet the git side of things is:
for i in 004ec0ae439f0ca3c814d22a46957012cd8fb239 \
720f9b9b75588e35d3cce0f9f5b802abea2259ab \
f206b315ca475d3a3e55e98ec42d772c6b05baee \
cbff64617866cc3fc2b25db15cd610e651561958 \
87bc784daf7f15cc8d9c9fa980a934a98a17de95 \
d748c2ea214b699008563e18f5a7105de39faba9 ; do git show -s "$i" | grep svn:branch ; done | sort -u
svn:branch:Original
svn:branch:itcl3-2
svn:branch:libpng_1_0_2
svn:branch:tcl8-3
svn:branch:tk8-3
svn:branch:zlib_1_0_4
@Sean what version of Git are you using?
Is https://stackoverflow.com/a/19436421 related?
Also, what do you see in gitk as opposed to on the console?
OK, so it looks like the git commits are spurious - I may have messed up a correction or some such. Of the 4 branches from r30687, only VendorARL is present and it looks like that's because I custom-added it.
r15365 created the libpng branch in SVN, if I'm not mistaken. In Git, that revision got assigned to f8fa716f5077cdde438f676c1b24244a09eb3fcd
r15338 created the zlib branch in SVN. In Git, that looks like e85b06be0fa6632e097d8c728506ab5251a2b635
The scriptics branch has 4 commits - r19756, r19758, r19760 and r19762. Those don't have assignments right now, but it looks like the corresponding commits have r19757, r19759, r19761 and r19763. Looking at them, I'd say the four earlier commits are probably the better content choices for assignment (not to mention having the matching commit messages.)
@Sean OK, I think I've got the corrective files in place for r30687. Basically, since cvs-fast-export put the commits on other branches, we don't have png, zlib or scriptics branch deletes. I added the proper VendorARL delete, and removed the spurious itcl3-2, etc. deletes incorrectly associated with r30687 in Git.
Also updated the scriptics commit revision assignments.
I don't think the encoding was a git version issue, I think it's just encoding. I think I have it sorted out.
Looks like the git command I used to dump the log and the svn command used to dump the log ended up dumping differently is all.
So that's pretty much the entirety of commits that had UTF-8 characters in them. The suspicious quote-related ones look like they're actually smart single quotes, probably copy-pasted from some output.
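Smart quotes (U+2019 is the three bytes E2 80 99 in UTF-8) are easy to miss when eyeballing log dumps. A minimal sketch for flagging message lines that contain any byte outside 7-bit ASCII, assuming the messages have already been dumped to text (for instance with `git log --format=%B` on one side and `svn log` on the other); the function name is hypothetical:

```shell
# Print, with line numbers, every stdin line containing a byte >= 0x80,
# i.e. anything outside 7-bit ASCII such as UTF-8 smart quotes or "ß".
flag_non_ascii() {
  # In the C locale grep works byte by byte, so this bracket expression
  # matches any single byte in the range 0x80-0xFF.
  LC_ALL=C grep -n "$(printf '[\200-\377]')"
}
```

Typical use would be `git log --format=%B | flag_non_ascii`; the reported line numbers refer to the dump, not to individual commits.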
Ah, cute.
starseeker said:
OK, so it looks like the git commits are spurious - I may have messed up a correction or some such. Of the 4 branches from r30687, only VendorARL is present and it looks like that's because I custom-added it.
I can pull the rest... r30687 was just one example. There are others like that.
Well, that was mostly a blind alley I should have known better than to chase, but it did result in characterizing some of the commit diffs... looks like cvs-fast-export and cvs2svn sometimes picked different commit ordering for commits with the same timestamps.
@Sean unless you feel really strongly about that I'd rather not try to switch them around - it'll take some effort on the repowork code to support doing so.
We need some kind of "good enough" criteria... my sense is that chasing down all the CVS vs SVN vs Git differences has the potential to be nearly endless...
Yeah, I'm not worried about commit ordering. The oddity was the multitude of seemingly unrelated branches. Working on pulling that list still, had some diffs that had to get recomputed and worked on tallying where we're at.
My criterion has been to identify or explain all the non-empty trunk commits, i.e., that all are tagged or otherwise accounted for correctly (with something matching, or as a split commit). We're definitely closing that gap.
Here are the others that were like that:
30687 NOT FOUND (empty files) (TAGGED MISMATCH 004ec0ae439f0ca3c814d22a46957012cd8fb239 720f9b9b75588e35d3cce0f9f5b802abea2259ab f206b315ca475d3a3e55e98ec42d772c6b05baee cbff64617866cc3fc2b25db15cd610e651561958 87bc784daf7f15cc8d9c9fa980a934a98a17de95 d748c2ea214b699008563e18f5a7105de39faba9 )
30688 NOT FOUND (empty files) (TAGGED MISMATCH 3882bb89a329277499b8b6c2246be115544740a0 68aeb784b3ee698c854878c190eb4b229b88e1fe )
30690 NOT FOUND (empty files) (TAGGED MISMATCH bab9cb74c7cf403e3c6ffb862367e7e921d5de5e 1f1d7a7f607d5b4c673d1d73cc7bcb126b0da82b ebaea28c7f234f5af88bbe8f60e8cae1026d7f08 9a4972e8d397e2fe1457987252531bbb08aae2b5 2000a7fd53ba7f017eadb55168ed737a0e6d2906 47ca01661701d59ee6aa948cd914b42e9ae9e36e )
36471 NOT FOUND (empty files) (TAGGED MISMATCH 5d5a16ac1af3bef7ea3acd9df913a882ecb2c450 cf54441bbb9da781638c782f0330e2399b114ba2 f3402be29c09993717319df0a8045087c3c1efcc 29fb00141b4040de08c9319404bfe44946ef43f2 2c43fbad65f4bc373dfa80a6254077b5913623d0 e19308e9b43771204ad04daa015bb646ffda7077 )
36472 LOG+FILE MATCH ON 96a3e5fb75628744e4835d9ce2f7cbf8dbca8ec4 (TAGGED MISMATCH 96a3e5fb75628744e4835d9ce2f7cbf8dbca8ec4 c0737a9252506872ce5ce6cd14207f7c375741da )
46324 NOT FOUND (empty files) (TAGGED MISMATCH 44e3d7341c5680250d65091b2aff6ed051720a11 a988903bbe27985e0dd94228e07079e91e98be4d c54b9b07158d4a904aabddae264290854ecb250c 03af105da8dd3cf85a29cc7f056513cc8e79d751 )
46322 NOT FOUND (empty files) (TAGGED MISMATCH e56ca9ed3e746b0f0531a5a90a50706dc4486786 cbd805930e92e0174548d245eee8a50f79f4be6a 8db928ed630bba609e98e97045dc91377539353e f64cf35a3a10e027863a68a07f6d4dda041d0fb4 3e54caeb944540d809a8c123289f9fb3624b7509 f199b69dd1f620bfa299a9e8fd520c37cc9b3c26 33b42ffbd7c4aa5e42e5854d020a8d66dd69ccfc a6225b252463bcb48ce3376200227c1e783c77d5 )
46328 NOT FOUND (empty files) (TAGGED MISMATCH ccb829355adc0829b9a5a7a3f0b5ac72dc13ea45 a442ff82f39e00b14ef139fb8f62b18c0ec32046 e4fba5a7cdbc525184d64170eec22e7eeedbd1f2 4396a0b1cd513d4ce9945589c4aadb17eda9a6d0 5452bab5c382ff6f2d0af42c2d4b367a0fdc13aa 9884b41b3aa6f790c80dbb4a55cf5cea4844fc8b 46d4a300710516c5547fa1b6f64ba29ec64ab3b4 )
46335 NOT FOUND (empty files) (TAGGED MISMATCH dd2bb79965568f5aab4f7458606d875d22b74b40 f5e6fc5ebfaaedceb7538a1f2ba1a3fc1589c399 )
62127 LOG+FILE MATCH ON 797d0138514136e2e95b0dfa1cc7d2e774fef2ab (TAGGED MISMATCH bae6fd511505e5e4f12f16b1cd73b5381f4f47f6 797d0138514136e2e95b0dfa1cc7d2e774fef2ab )
62975 NOT FOUND (empty files) (TAGGED MISMATCH dbaf54ff6b25ad2f576f82f26086101bc5015dec e047bc1116cc3199bbdbf58101ef281c153c2b74 )
69921 NOT FOUND (TAGGED MISMATCH cca216f058fe5791dbbd082ad7293911b6aae9f6 f828d1c0b1f6e68879a1bdecb2c58d1dc9a9207b )
can ignore the NOT FOUND / empty files -- that's just me not tracking branches. What's interesting is those are revs tagged to multiple git commits. Some of course may be intentional, but that's all of them (on conv12).
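The "revs tagged to multiple git commits" scan can be reproduced with a short awk pass. A sketch, assuming the existing tags have already been dumped as one `<sha1> <svn-rev>` pair per line (how that dump is produced depends on where the conversion stores the revision, so that step is not shown); the function name is hypothetical:

```shell
# Read "<sha1> <svn-rev>" pairs on stdin and print every SVN revision that
# is attached to more than one distinct Git commit, followed by those shas.
revs_on_multiple_commits() {
  # sort -u drops exact duplicate pairs so a repeated line is not
  # mistaken for a second commit; the final sort orders output by revision.
  sort -u |
    awk '{ shas[$2] = shas[$2] " " $1; n[$2]++ }
         END { for (r in n) if (n[r] > 1) print r shas[r] }' |
    sort -n
}
```

As the walk-through of the list shows, a hit is not necessarily an error: one SVN commit that deleted several branches legitimately maps to several Git commits.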
OK. Going through the list...
r30688 is on two commits because it eliminated both the branch with the Mac Hack commit and other "unlabeled" branches which mapped (collapsed) to "master-UNNAMED-BRANCH" in the cvs-fast-export conversion. So master-UNNAMED-BRANCH has two branch delete commits assigned to it. Might as well delete 68aeb784b3ee698c, since it doesn't add anything.
r30690 is deliberate - multiple branches removed in a single commit.
r36471 is deliberate - multiple branches removed in a single SVN commit
r36472 is the result of a branch naming consolidation - c0737a92525 can be removed.
r46324 is deliberate - multiple deletions in a single SVN commit
ditto r46322 - multiple tag deletions, single SVN commit
Same with r46328 - multiple tag deletions, single SVN commit
Same with r46335 - deliberate
r62127 looks like a branch delete that registered an empty commit on trunk for some reason - 797d0138514136e can be removed.
r62975 is deliberate - multiple branch delete
r69921 - looks like a branch rebase got recorded somehow as a branch delete plus re-creation - f828d1c0b1f6e68879a1bdecb2c58d1dc9a9207b can be removed.
@Sean Note that I went and manually tagged a lot of Git commits as mapping to multiple SVN revisions in the post-conv12 update logic...
I did notice that... it "should" just mean a lot more multiple matches, no? If so, I think we can just do a post-process check later to make sure there wasn't a typo or other blatant mistake in the manual tagging, but it shouldn't affect the upload V&V.
Yes, more multiple matches from the CVS era commits.
I wasn't sure how "deep" you wanted to go checking those manual tags - the majority are based on context (unmapped commit that is immediately before a mapped commit, with a file missing from the "mapped" commit compared to the SVN file list) but I'm not set up to actually try and validate all the diffs as being part of the SVN commits.
It may not be possible in all cases anyway, if one git commit ended up getting deltas from two SVN commits - in that case the best that can be done is an "approximate" assignment.
ankle deep, just blatant sanity check to make sure they are deliberate or mistakes since they were outliers.
Okay, @starseeker here's a batch for you to check out, myriad issues. These are all the commits that do not map uniquely. Most are probably correct as-is and simply aren't unique because they were a common log message applied to the same files or similar or were branch commits (keep in mind that I'm ignoring branch-only diff data so they show up as "not found"), BUT the rest are all multiple candidate diffs. Could be entirely benign or correct, but could use your eyes on at least some of them.
svn.to.git5.multiple_matches.sorted
Here's one you may have already captured with changes you made a couple days ago: all the commits that match svn revs in LOG+FILES+DIFF but aren't tagged revs in git (or at least weren't as of brlcad_conv12). That's not to say that they should all be mapped -- it's entirely possible for a commit to have gotten split and just happen to map to another with the same files and log message. I'm not sure how to rule that out, but maybe you can verify them easily. There are 167 in this category:
svn.to.git5.matching_not_tagged
Feel like these two might be swapped:
+9016 LOG+FILE+DIFF MATCH ON 24c9f6ebb84eba2bb53211c3012b7dfb68672a2b (TAGGED MISMATCH 68b56645d7a689a3af445bb5dfef16c78a4a4270)
+9015 LOG+FILE+DIFF MATCH ON 68b56645d7a689a3af445bb5dfef16c78a4a4270 (TAGGED MISMATCH 24c9f6ebb84eba2bb53211c3012b7dfb68672a2b)
or they're splits because of cvs screwery and they're right because of other adjacent commits?
another set similar to matching_not_tagged is this batch that aren't/weren't in git but match a log+file pairing, possible candidates. Note some are non-unique.
svn.to.git5.matching_lf_not_tagged
here's a much smaller but similar set of untagged commits where a matching log message was found, possible candidates for manual tagging. affects 37 commits:
svn.to.git5.matching_l_not_tagged
In theory, I think those 5 data sets, fully reconciled, should result in nearly full coverage... the only ones missing should be ambiguous cases. I can run a final trunk pass on any changes you make (conv16?) and we can see if there are any left! This might be it.
Kind of exciting!
Here's where the tallies stand:
 78233 total unique commits
-10544 PERFECT MATCH
- 9356 NOT FOUND (branch changes)
-  939 EMPTY (prop changes)
-50807 LOG+FILE+DIFF (matching)
          167 LOG+FILE+DIFF MATCH but not tagged (all UNIQUE)
          141 MISMATCH or duplicated candidates
- 5001 LOG+FILE (matching)
           38 LOG+FILE MATCH but not tagged (13 UNIQUE)
          776 MISMATCH or duplicated candidates
-  180 LOG+DIFF (matching)
-   29 FILE+DIFF (matching)
-   30 DIFF (matching)
-    0 FILE (matching)
- 1117 LOG (matching)
           90 MATCH but not tagged (all UNIQUE)
          350 MISMATCH or duplicated candidates
------
   229 unaccounted for in mismatches not tagged
-   90 LOG not tagged
-   13 LOG+FILE not tagged
-  167 LOG+FILE+DIFF not tagged
------
   -41 dupes not excluded properly (oops)
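The subtraction chain in the tally can be re-run mechanically. A minimal sketch using only the bucket counts posted above (the variable names are just for illustration):

```shell
# Re-check the tally arithmetic with the posted counts.
total=78233

# Top-level buckets subtracted from the total. The indented sub-counts
# are breakdowns of these buckets, so they are not subtracted again.
accounted=$(( 10544 + 9356 + 939 + 50807 + 5001 + 180 + 29 + 30 + 0 + 1117 ))
remaining=$(( total - accounted ))

# Untagged matches counted against the remainder.
untagged=$(( 90 + 13 + 167 ))
leftover=$(( remaining - untagged ))

echo "remaining=$remaining leftover=$leftover"
```

With the counts as posted this gives `remaining=230 leftover=-40`, one more than the 229 and -41 in the tally.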
Sean said:
Feel like these two might be swapped:
+9016 LOG+FILE+DIFF MATCH ON 24c9f6ebb84eba2bb53211c3012b7dfb68672a2b (TAGGED MISMATCH 68b56645d7a689a3af445bb5dfef16c78a4a4270)
+9015 LOG+FILE+DIFF MATCH ON 68b56645d7a689a3af445bb5dfef16c78a4a4270 (TAGGED MISMATCH 24c9f6ebb84eba2bb53211c3012b7dfb68672a2b)
Looks like they are swapped based on the diffs, yes.
Going through the multiple_matches file, it's looking like the SVN era commits are mostly "checkpoint" or similarly ambiguous commit messages on similar file sets (which is what I would expect for the SVN era - given how the commits were generated for that portion of the history I'm not sure how we'd get an SVN revision number mis-assignment, since the commits were generated on a per-SVN commit basis to begin with...)
r62027 and r62708 are a bit more interesting - they are branch creation and deletion commits that represent me adding and deleting a branch in the wrong place. They are candidates for removal, unless you want to keep them to preserve the history of what happened at those particular SVN commits.
@Sean I've scanned the logs for the SVN era multiple_matches commits, and r62027 is the only one that jumped out - the rest appear to be either checkpoints, branch syncs, throwaway test commits, or applying identical changes to different branches.
or a few that are different changes to the same file with the same commit message.
One I'm not following - how come da4ace8194f81d0f92565b428dfa309143b37914 and ae970a06e7d02f63e7c77ff927af5ca90721a111 are getting flagged as a 75110 match?
I see they're "rename" commit message commits, but I'm not seeing any file matching...
r799 is an example where the commit groupings ended up different - match.c isn't in SVN 799, but the vdeck.c changes in that commit do appear to align with the r799 changes.
I'm trying to go through the CVS era a bit more carefully, but so far none of the multiple_matches seem to indicate mis-mapped files. A couple of untagged commits matched entries on my list, and there was one minor correction to a git rev assignment.
The matching_not_tagged I confirmed as being part of svn_rev_updates.txt, and the lf_not_tagged had a few that appear to be valid matches as well (some aren't). I think I've accounted for the l_not_tagged commits as well.
@Sean It will take me a bit more time to manually confirm that none of the "TAGGED MISMATCH" cvs era git commits are actually incorrectly identified, but I'm hopeful they're good. I've uploaded the current state at https://github.com/starseeker/brlcad_conv17 - if neither of us finds anything else, I'll do a final update from SVN and we'll be ready to roll.
(The most likely source of any remaining issues is if you spot something in my manually assigned commits - they're more extensive than the ones from the svn.to.git lists, since I was making a stab at mapping all the commits I could back to SVN.)
starseeker said:
r62027 and r62708 are a bit more interesting - they are branch creation and deletion commits that represent me adding and deleting a branch in the wrong place. They are candidates for removal, unless you want to keep them to preserve the history of what happened at those particular SVN commits.
I can go either way. I would probably preserve, but not a big deal either way.
starseeker said:
One I'm not following - how come da4ace8194f81d0f92565b428dfa309143b37914 and ae970a06e7d02f63e7c77ff927af5ca90721a111 are getting flagged as 75110 match?
False positive, can ignore them. They're an artifact of how branches were handled from svn. They have empty file lists, so it erroneously thinks it has a better match than it really does. I didn't get around to detecting and handling that case differently. If you come across a rev that is branch activity, you can just skip it.
So.... based on an assumption that svn_rev_updates.txt is correct enough, we're down to just 48 to resolve...HOME STRETCH!
...
and now down to 8!
and now 0.
O.o
@starseeker Is it possible to tag a merged commit with two revs?
Maybe... if we replace the commit with a new commit having a custom message. A bit tricky, but doable if it's only one or two
there are a handful of svn_rev_updates.txt entries that didn't apply because the commit was already tagged as something else
when I looked, it's because it's both
Hmm. How many?
I don't know, can run a script to find out -- but basically it's all the entries in svn_rev_updates that are on a commit that has something else
example f9fd3ad956d23e854df73294083cb37ef3c2f341
that's 9011 and 9012 iirc
Oh, OK - we're talking CVS era commits. Yeah, I'm not surprised.
yeah, oldest I found was r14320 which is in 76e74e9e9ce955bf6602171e67cbcd9539bfbec9
My original thought was that as long as we had one rev number assigned in the right general range, that would provide timeline and history context. Is it worth trying to tease out the multiple commit mappings?
Seeing as they won't be 1-1 regardless in that case...
nah, I think it's fine -- just didn't know if it was possible/easy
let me do a quick check to see if we're talking about a few dozen or hundreds
Not trivially - to really do that "Right" I would have had to generate independent per-file diffs for all the git and SVN revisions, then find all the corresponding changes and do all the multi-mappings.
If it's just a couple we can fake it by doing hand-assembled replacement commits, but anything more intensive would be really tough.
Need to leave something for the next generation of historians to figure out ;-)
I'm not looking to find them beyond what's already in svn_rev_mappings.txt ... there's some unknown number of them in there already
In hunting down the last few missing trunk commits, it turned out they were already identified in the mappings file
Ah.
they just weren't tagged because that commit got tagged again as something else
only writing out the latter
Whoops. I thought I had checked for those - must have re-introduced a couple. Hang on - awk + sort + uniq to the rescue...
/me blinks - all the sha1 keys in svn_rev_updates.txt appear to only be in there once...
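The "awk + sort + uniq" check for repeated keys fits in one line. Wrapped as a function here, assuming `svn_rev_updates.txt` carries the sha1 as its first whitespace-separated field (the exact file layout is an assumption; the function name is hypothetical):

```shell
# Print each sha1 (first field) that appears on more than one line of the
# given "<sha1> <rev>" file; no output means every key is unique.
dup_keys() {
  awk '{ print $1 }' "$1" | sort | uniq -d
}
```

Note that this only catches a sha repeated within the one file; a commit that already carries a different rev from an earlier pass will not show up here.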
Must have happened earlier in the process.
If we need to do it for the missing ones, as long as it's not too many, I can do what I did for the "Mac Hack" commit to fix its data and make replacement commits to apply.
it's not that they're listed more than once
it's that whatever association processing normally happens did happen (or it's because I'm on 12, and if I were testing 17 I wouldn't find 9011 instead of 9012 and vice versa)
/me nods - I get it, and we actually want both so we don't have missing svn rev mappings.
or we ignore it because it's just 3-4 of them
i.e. something for grep to match for both 9011 and 9012, even if it goes to the same commit.
or edit the log message at the last minute
If it's just 3-4 I can deal with it pretty quickly (knock on wood)
I'm running a check now
If it's a dozen or more I'd be more inclined to punt
looks like 859 and 858 is the first it found
/me goes fishing...
it's up to 1500 so I suspect it'll take about 10 min to get through them all
You said the highest one was in the 14k range?
well just that I ran into looking for missing trunks, but that won't have all the assignments you did
I'd have to check 17 for that
Ah, right.
/me tries test with 858...
I think this scan over svn_rev_mappings will be a good enough check... if they're really rare, then we can just punt
or can amend the log and let the shas shift prior to upload
so far it's only found 4 and it's up to 10k
Here's the list:
MISMATCH on 859 and 858 in c7da20384024574fddc07c59dcdfcc2879560e31
MISMATCH on 9011 and 9012 in f9fd3ad956d23e854df73294083cb37ef3c2f341
MISMATCH on 11406 and 11407 in eb0179c08aefd8ea90697c42eba31244e4904eed
MISMATCH on 12424 and 12425 in fc9e5a26cba18a926c644a4e2bb4b321855f2a88
MISMATCH on 14320 and 14321 in 76e74e9e9ce955bf6602171e67cbcd9539bfbec9
MISMATCH on 18892 and 18993 in 67c46ada661fdab789632885c34bf77a277962db
MISMATCH on 21564 and 21565 in 11077485329842c81213eab68006fe5d58b5925f
MISMATCH on 22525 and 22521 in edf3df35c8c44492fa25cb3999788338b1f2570b
MISMATCH on 19756 and 19757 in a7c85f280677d70b8eef9aadf79302736ed26ffc
MISMATCH on 19758 and 19759 in f5a1b0037fec2927cba073d118db24cdbd681975
MISMATCH on 19760 and 19761 in a098425430db227021617976961e6b51ce5569cb
MISMATCH on 19762 and 19763 in e6417be98f27d570d863744f566f5aaf738abbe6
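Each entry in that list is a commit where a requested update rev collides with a rev the commit already carries. A sketch of that comparison, assuming two hypothetical inputs with one `<sha1> <svn-rev>` pair per line: the revs already tagged in a conversion, and the requested updates (the function name is also hypothetical):

```shell
# Print "MISMATCH on <existing> and <update> in <sha1>" for every commit
# whose requested update rev differs from the rev it is already tagged with.
#   $1: existing "<sha1> <rev>" pairs
#   $2: update   "<sha1> <rev>" pairs
# The NR == FNR idiom assumes the first file is non-empty.
find_double_tags() {
  awk '
    NR == FNR { have[$1] = $2; next }        # first file: existing tags
    ($1 in have) && have[$1] != $2 {
      print "MISMATCH on " have[$1] " and " $2 " in " $1
    }' "$1" "$2"
}
```

Keeping both numbers on such a commit (as in the 9011/9012 case) is what lets a search for either revision find it.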
9015 and 9016 look suspicious...
Weren't those the ones you thought were flipped earlier?
I think I looked and concurred...
yes, they were. okay, so probably showing up just because I'm comparing them to conv12
OK, that's not too bad - my 858 test seems to be going smoothly, so I can probably get the others.
Any other TODOs before switch flipping?
were you still sorting through the other potential branch taggings or done with them?
You mean the multiple_matches file?
well that one, but more importantly the three not_tagged files to see which if any don't have a tagging
since those were content-based lookups, some were unique untagged matches
I think I checked the non-tagged and all of those commits were listed in svn_rev_updates.txt
multiple_matches has a pretty high false-positive rate - I'm basically checking the TAGGED MISMATCH commits against the trunk diff visually to make sure they look like they're correctly lined up. LOG+FILE is apparently not a terribly unique key in the revision set (mostly my fault, too, from what I've seen so far... I should go back in time and tell myself to use more unique commit messages.)
yep
the thing about multiple matches is those are revs for tags that were not tagged in conv12
you mean sha1 keys that didn't have a matching SVN rev?
probably would make sense to only check the multiple_match revs to see which if any are NOT listed in svn_rev_mappings , since they're potential new info
svn revs that aren't referenced in the git log
it should be exclusively branch commits since I half ignored them
no worries, it was just a thought.. I think it's good to go
Did you want me to do the multi-svn labeling? That'll probably take about an hour
would it be faster to just edit the commit?
Maybe, but I'd have to figure out how to do that...
I mean just edit the log message
git commit --amend or whatever it is
Um. Let me try that once...
I think it's actually "git rebase" for older commit messages.
@Sean 22521 and 22525 should both still be there after svn_rev_update - 22521 was moved to afd806bf472d0ac4b2685be406966a5a6eb28e5c
Ah, I'm supposed to be correcting 18992, not 18892 - that's why
@Sean OK, re-running - I'll upload to brlcad_conv18 once it's done.
@Sean ec2350e47ab0a7a6a2e4f798aaf3a348775077ef is tagged as both 2103 and 2185
Ah, I see - 2103 is right
https://github.com/starseeker/brlcad_conv18
@Sean I think that's got everything.
Cool, checking it now.
@Sean How we looking?
It's still chugging through it all; should know how it looks here in a bit. Per the checklist, we're done with the repo itself if there are no problems on this final pass! so exciting!
Then on to the dang trackers and such...
This is taking a little longer because I had to re-extract the diffs that ran last night. I forgot to set diff.renameLimit on the new conv18 which caused slews of false differences. The re-extraction is running.
I should probably figure out how to set that in my personal config, instead of having to set it every cloning.
I pulled all the latest SVN commits in - brlcad_conv18 should now be up-to-the-minute (i.e. r78389)
Unless you see an issue or someone commits before validation completes, brlcad_conv18 should be ready to upload.
I'm only checking through 78233 just so numbers can be compared with 12, but sounds good!
You had indicated you wanted to do the final upload to the BRL-CAD github site - after setting the origin, this is what I use to push to upload everything:
git push --all -u origin && git push --follow-tags
I'm not sure if a basic clone from github will get all the branches, so I'd recommend pulling a mirror clone:
git clone --mirror https://github.com/starseeker/brlcad_conv18.git
cd brlcad_conv18.git
git remote set-url origin git@github.com:BRL-CAD/brlcad.git
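From that mirror clone, a single `git push --mirror` carries every ref under `refs/`, including lightweight tags (`git push --follow-tags` only pushes annotated tags reachable from the pushed branches). A sketch wrapping the steps above, assuming Git is installed and the destination is an empty repository: `--mirror` also deletes destination refs that the source lacks, so it should only be aimed at a fresh target. The function name is hypothetical:

```shell
# Copy every ref (all branches and tags, lightweight ones included) from
# $1 to $2 by way of a temporary bare mirror clone.
mirror_repo() {
  local src=$1 dst=$2 work rc
  work=$(mktemp -d)
  git clone --quiet --mirror "$src" "$work/repo.git" &&
    git --git-dir="$work/repo.git" remote set-url origin "$dst" &&
    git --git-dir="$work/repo.git" push --quiet --mirror origin
  rc=$?
  rm -rf "$work"
  return "$rc"
}
```

Usage would look like `mirror_repo https://github.com/starseeker/brlcad_conv18.git git@github.com:BRL-CAD/brlcad.git`. If the source repository has pull requests, its mirror clone also contains `refs/pull/*` refs, which GitHub typically rejects on push.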
I find myself using --git-dir a lot in scripts (cuz pwd is so passe)
@Erik that's helpful, thanks!
where is that documented?
@Sean did the updated run succeed? (by the way, I think that limit can be set with: git config diff.renameLimit 999999)
git config merge.renamelimit 999999
may also be relevant
@starseeker: man page? :D a lot of cmds have similar (ninja -C, make -C, cmake -B <dir> -S <dir> ..)
@Erik Oh, I see where I went wrong - it's a top level option supplied before the subcommands, so it's not in their --help statements.
yeah, it's a strange beast, git args/cmds are applied in order with side effects.
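`--git-dir` points at the repository itself (the `.git` directory or a bare repository), while `-C <dir>` makes Git behave as if it had been started in `<dir>`. Both go before the subcommand. A small sketch in that style for comparing ref counts across clones such as conv12 and conv18 without changing directory; the function name is hypothetical:

```shell
# Print "<branch count> <tag count>" for the repository whose git dir is $1.
ref_counts() {
  local gd=$1
  printf '%s %s\n' \
    "$(git --git-dir="$gd" for-each-ref --format='%(refname)' refs/heads | wc -l | tr -d ' ')" \
    "$(git --git-dir="$gd" for-each-ref --format='%(refname)' refs/tags | wc -l | tr -d ' ')"
}
```

For a mirror clone the git dir is the `.git`-suffixed directory itself, e.g. `ref_counts brlcad_conv18.git`; for a normal clone it is `<clone>/.git`.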
starseeker said:
Sean did the updated run succeed? (by the way, I think that limit can be set with: git config diff.renameLimit 999999)
That's what I set, and it's needed to get the right diffs/logs for our history. The problem is that the config is per clone, so I have to remember to set it every time.
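The per-clone step can be avoided by putting the limits in the per-user configuration instead: `git config --global` writes to `~/.gitconfig`, which every repository for that user reads. A minimal sketch using the 999999 values from above (the function name is hypothetical):

```shell
# Persist rename-detection limits in the per-user config (~/.gitconfig)
# so that every clone picks them up without per-repository setup.
set_rename_limits() {
  git config --global diff.renameLimit 999999
  git config --global merge.renameLimit 999999
}
```

Alternatively, `git clone -c diff.renameLimit=999999 <url>` writes the value into the new clone's own config at clone time.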
Working on the upload, I think we're golden...
Few changes in the numbers I've been looking at, but nothing turning the train around.
Could use a test, but the svn repo "should" be read only now for everyone.
I'll poke it quick
Bingo:
svn: E000013: Commit failed (details follow):
svn: E000013: Can't open file '/svn/p/brlcad/code/db/txn-current-lock': Permission denied
Excellent.
The migration is almost complete then I guess
Yes it is, @Sumagna Das ... coming super soon. Just tallying some final stats.
Awesome.🤩🤩
The migration of one of the world's oldest continuously developed source code repositories should be complete later today... ;)
Noice
how many more hours will it take from now to see the repository on GitHub?
will it be up by tomorrow morning (for me)? it's already past midnight here.
@Sumagna Das I'm hopeful, but I'm still trying to figure out something that changed.
Can I assist?
@Sean I'll be back on a bit later - please post anything I can help with. I'll be glad to re-run the final fixup pass again if necessary...
I need to make sure it's not something I did differently. I should know here in a bit. I need to pull conv12 again to confirm.
Still might not be enough to stop the gravy train, but it was a surprise. I'm hoping I just fat-fingered something.
svn commit counts changed?
numbers are off. I don't want to speculate too much until I rule out a couple things.
K. The good news is that as long as I don't need to do significant rework in the repowork C++, the post-brlcad_conv12 portion of the conversion runs pretty quickly.
/me will be chewing off his fingernails elsewhere for a few hours...
@Sean How's it going?
It is done. https://github.com/BRL-CAD/brlcad
@starseeker Please check and let me know if you see any mistakes.
I was ultimately able to reconcile most of the big differences; many looked like branch commit improvements (e.g., it looks like you categorically eliminated the "initially added on branch" commits, 118 of them).
Can I clone the repo right now or are you guys still checking if there's any issues?
Awesome - thank you @Sean for grinding through it!
@Sumagna Das Give me a couple hours to check - this is almost certainly it, but I've got a couple things I need to do before I can focus properly on it.
ohk
(starseeker can focus properly now? :astonished: ) :D
<snort> Only on one thing at a time. There are days when that's a significant handicap...
OK, pull request and direct commit both worked, branches are present, tags are present, logs match, Contributors is populated. Looks good!
Looks like I should have made that a rebase for the pull request... generated a merge commit too. Oh well.
@Sumagna Das Looks like it's good to go.
I'm not as concerned about having a "clean history", let it be what it be. ;)
The thought that worried me is that checking out (say) prior release tags will produce checkouts that won't build.
@starseeker I hadn't looked at permissions yet, can do that today. There's a lot to do.
Unfortunately, the fix requires a full (multi-week) re-run of the full process...
what do you mean??
If we want (say) 7.30.0 to check out with tkhtml in a build-able state, I need to re-generate the history leaving the tkhtml RCS tags in place. That's an adjustment to the filters, which means a full re-run.
Hm, I'm still not following. Why would a checkout of the rel-7-30-0 tag be any different than what it was? Is it not right? Or not right for src/other because of how history was spliced?
It's not right because I made a point of stripping out the RCS tags to make the git history following cleaner. So, for example,
static const char rcsid[] = "$Id: cssparser.c,v 1.8 2008/01/19 06:08:13 danielk1977 Exp $";
became
static const char rcsid[] = "$Id$";
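To make the "stripping" concrete, here is a hypothetical sed equivalent of that transformation. It is only an illustration, not the filter the conversion actually used: any expanded $Id: ... $ keyword collapses back to the bare $Id$.

```sh
#!/bin/sh
# Illustration only: collapse an expanded $Id: ... $ keyword to a bare $Id$.
line='static const char rcsid[] = "$Id: cssparser.c,v 1.8 2008/01/19 06:08:13 danielk1977 Exp $";'
# \$ is a literal dollar sign; [^$]* eats everything up to the closing $.
collapsed=$(printf '%s\n' "$line" | sed 's/\$Id:[^$]*\$/$Id$/')
printf '%s\n' "$collapsed"
# prints: static const char rcsid[] = "$Id$";
```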
I exempted a few specific directories early on that were problematic (mostly the step related stuff) but tkhtml was one of the ones that got stripped.
It never crossed my mind that those headers might be a compilation necessity, and apparently for all the scrutiny I put on the conversion (diffing, log messages, etc.) I never actually tried a full compile of the generated checkout.
It looks like tkhtml does some cute trick where it generates a list of source files that go into a generated file, and the script that generates that source file is matching on those rcsid lines.
Actually, I should probably confirm that they were originally populated in the raw SVN data - since SVN will do RCS keyword expansion, it's theoretically possible that they were stored unevaluated internally. If that's the case, even exempting src/other/tkhtml won't fix it because Git doesn't populate RCS tags.
I'm not sure what to do in that case, actually, short of using something like https://github.com/turon/git-rcs-keywords
@Sean It looks like git-rcs-keywords can populate the RCS tags.
a force push to fix something like that would be pretty traumatic once this is "the way"
I know. I think the answer is going to be git-rcs-keywords
I'm testing now, and I think it can work.
OK. Got it.
I'm working with the https://github.com/kimmormh/git-rcs-keywords fork - there may be a better way to go at it, but that's functional in testing.
Here's how it works:
Install the two scripts (rcs-keywords.clean and rcs-keywords.smudge) to /usr/local/share/git_filters
Add the following section to ~/.gitconfig
[filter "rcs-keywords"]
clean = /usr/local/share/git_filters/rcs-keywords.clean
smudge = /usr/local/share/git_filters/rcs-keywords.smudge %f
Check out the git repository:
git clone https://github.com/BRL-CAD/brlcad.git
Copy the attached file to brlcad/.git/info/attributes: tkhtml_attributes
Note that attributes is the file name, not a directory - the file should be renamed.
What this will do is match the particular tkhtml files in question, and populate the tags. Placing it in .git/info (rather than .gitattributes) means it will be active for all checkout activities (a .gitattributes file wouldn't exist in older checkouts, defeating the purpose.)
I've adjusted main's copy of Tkhtml to not use an RCS tag for what it is doing, so it will not be altered by this filter. The older checkouts still using the $Id: tag, which are the ones that need to be populated, will match and be populated (thus being viable for compilation.)
The attributes file specifically calls out only the tkhtml files in question, to minimize processing time overall.
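The setup steps above could be scripted roughly like this. The two filter-script paths come from the description; the attributes line is a hypothetical stand-in for the attached tkhtml_attributes file (its real contents aren't reproduced here), and a scratch repository stands in for the brlcad clone. git check-attr then shows which filter git would apply to a given path.

```sh
#!/bin/sh
# Demo only: a temporary HOME keeps this away from your real ~/.gitconfig.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_ATTR_NOSYSTEM=1

# Equivalent of the [filter "rcs-keywords"] section in ~/.gitconfig.
git config --global filter.rcs-keywords.clean /usr/local/share/git_filters/rcs-keywords.clean
git config --global filter.rcs-keywords.smudge '/usr/local/share/git_filters/rcs-keywords.smudge %f'

# Scratch repository standing in for the brlcad clone.
git init -q "$tmp/brlcad"
mkdir -p "$tmp/brlcad/.git/info"
# "attributes" is a plain file inside .git/info, not a directory.
# Hypothetical pattern: the real tkhtml_attributes file lists specific files.
echo "src/other/tkhtml/** filter=rcs-keywords" > "$tmp/brlcad/.git/info/attributes"

attr=$(git -C "$tmp/brlcad" check-attr filter -- src/other/tkhtml/src/cssparser.c)
echo "$attr"
```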
@Sean If that looks workable to you, I can write it up for inclusion in the src tree
and then a multi-hour regeneration and multi-day re-review?
@Erik pardon? the rcs-keywords works with the existing repository
oh, awesome, i thought it'd be a history rewrite to add the appropriate bits
Going that route means we don't need to worry about re-inserting any RCS expansions into the history - they're just populated on checkout
If we want it to work without any RCS expansion, yes - that's a multi-day rewrite, not multi-hour. If, however, we use the filters to do the expansion just where we need to, all a user has to do is set up the .gitconfig and attributes file.
That has the advantage, once set up, of giving us expanded RCS keywords anywhere we need them. I'm 90% sure the expanded tkhtml tags were originally in the commit history, but if I'm wrong about that even a full regeneration of the history wouldn't be enough - I'd actually have to inject the expanded tags into the commit history.
cool, please leave breadcrumbs for the next poor fool who tries to compile something old :grinning_face_with_smiling_eyes: (i had a 43bsd compiling an old old version in a simh vax11, crazy people will do crazy unexpected things...)
The drawback of this is it's not a "working out of the box" solution - because Git has no way to expand RCS tags by default, it requires some work by the user to prepare the solution.
The best we can do is pre-bake everything and tell folks exactly how to set it up.
@Erik And that's the primary drawback - someone coming into things cold and not knowing they need to set up RCS expansion for older checkouts.
Ooo! There might be an even better way...
/me tests
Sweet!!!!
We don't need to mess with rcs-keywords at all.
All we need to do is put this file at .git/info/attributes : attributes
Do tell, but I have a couple thoughts on this. First off, I'm not terribly concerned about tkhtml working but would be concerned if there's not a simple workaround that can be discovered when the failure is encountered.
what's that attributes file do? looks like it'll match on any files with those names??
Yes, it's a filename match. I haven't figured out yet how to do a full-path match successfully.
It uses the ident property as documented here: https://git-scm.com/docs/gitattributes
looks like src/other/tkhtml/** is what you want
As it happens $Id$ is the RCS keyword at issue, and although the Git expansion is totally different from RCS/CVS/SVN it satisfies the compilation requirement.
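A self-contained sketch of the ident approach in a scratch repository (only the file path mirrors tkhtml): the committed blob keeps a bare $Id$, and the ident attribute in .git/info/attributes expands it to $Id: followed by the blob's object name when the file is written to the working tree. The /src/other/tkhtml/** pattern is the one from this discussion.

```sh
#!/bin/sh
# Demo only: scratch repository; the file path mirrors the tkhtml case.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_ATTR_NOSYSTEM=1
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$tmp/repo"
f=src/other/tkhtml/src/cssparser.c
git init -q "$repo"
mkdir -p "$repo/src/other/tkhtml/src"
printf '%s\n' 'static const char rcsid[] = "$Id$";' > "$repo/$f"
git -C "$repo" add "$f"
git -C "$repo" commit -q -m "add tkhtml source"

# Turn on ident for the tkhtml tree only; the leading / anchors at the repo root.
mkdir -p "$repo/.git/info"
echo "/src/other/tkhtml/** ident" > "$repo/.git/info/attributes"

# Re-checkout so the attribute takes effect in the working copy.
rm "$repo/$f"
git -C "$repo" checkout -- "$f"

blob=$(git -C "$repo" rev-parse "HEAD:$f")
# The working copy now carries "$Id: <blob id> $"; the stored blob is unchanged.
cat "$repo/$f"
```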
/me tries
I'm still more concerned about what the error looks like... is there a non-git workaround possible?
like "install system tkhtml" if that happened to work (which I bet it doesn't)
Um. It might, actually, if anyone packages it...
or comment something in cmake
we test and detect tkhtml??
I thought that was one we didn't check
At one point I did try, but I may have removed it - it's impossible on headless build nodes and problematic otherwise (almost no systems install Tkhtml)
It's set up as a 3rd party Tcl package build in CMake, so maybe.
/me tries
if there is a simple way to turn it off (even if it disables the man viewer), that'd be an acceptable workaround
Looks like the src/other/tkhtml/** line works.
I'm fairly sure I didn't set up to disable just the Tkhtml dependent components.
I could do it now of course, but that wouldn't help older builds.
Let me see if I can get a system tkhtml install.
what happens if you -DENABLE_TKHTML=OFF ?
or if you just comment out the THIRD_PARTY_TCL_PACKAGE line in src/other/CMakeLists.txt
those would be generic workarounds we could live with
Configure fails. (unnoticed dependency in tktable build on tkhtml build logic being loaded first.)
damn
Actually, wait... let me double check my SVN state.
Yeah, OK - it actually did find the system tkhtml, but then tktable wasn't happy. However...
If we install BOTH tkhtml and tktable, that worked.
/me builds
Urmf. It builds successfully, but at least on Ubuntu those packages don't seem to work - Archer can't load and while MGED will load, the man viewer won't come up.
rtwizard will load
/me facepalms sadly
so what's the minimum to turn it all off?
comment package line for both tkhtml and tktable?
it's so absurd that they did that with the $Id tag... ugh.
That would produce the same result. Archer requires them and so wouldn't work, and MGED would work but without the man viewer...
Installing the system tkhtml and tktable produced as much of a working config as we could expect without tkhtml/tktable built.
because of require lines in archer?
I believe so - Archer's right panel uses tktable, and both the help viewer and man viewer use tkhtml. The way the Itcl class system works, if I'm remembering correctly, it all gets defined and loaded up front.
** will do recursive globbing into subdirs, and you can prefix / to anchor at the repo root, too (so /db/**/*.g)
okay, so archer's got to have it
what about surgery?
aieee, not bold, double asterisk will do recursive globbing :D
that's a finite list -- can it be reduced to a one-line edit in tkhtml's build logic with the list of file names that it would have extracted (or a glob)?
The simplest possible source code fix is probably a one-line sed command of some sort that replaces all the $Id$ lines with $Id: tkhtml$
That'll be way simpler than trying to do anything with the Archer codebase.
The issue is already fixed in the Git main - the only problem is the older checkouts that we can't change.
right, which I think is a problem unless we can figure out a trivial workaround :(
otherwise, there'd be almost no point to all the old tags since all the ones since tkhtml was introduced won't/can't work at all
echo "/src/other/tkhtml/** ident" > .git/info/attributes
doesn't count as trivial?
find . -path \*tkhtml\* -type f -exec sed -i 's/\$Id\$/\$Id: tkhtml\$/g' {} \;
should also do the trick.
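A sketch of the same find/sed fix that should also work with BSD/macOS sed, in case portability matters for whoever runs it: GNU sed accepts a bare -i, while BSD sed requires a backup-extension argument after -i, and -i.bak is accepted by both. It runs on a throwaway tree, and the .bak backups are removed afterward.

```sh
#!/bin/sh
# Demo only: a throwaway tree with one tkhtml-like source file.
set -e
tree=$(mktemp -d)
mkdir -p "$tree/src/other/tkhtml/src"
printf '%s\n' 'static const char rcsid[] = "$Id$";' > "$tree/src/other/tkhtml/src/cssparser.c"

# Skip *.bak so backups created mid-traversal are never edited themselves.
find "$tree" -path '*tkhtml*' -type f ! -name '*.bak' \
  -exec sed -i.bak 's/\$Id\$/$Id: tkhtml$/g' {} \;
# Drop the backups once the edit is done.
find "$tree" -path '*tkhtml*' -type f -name '*.bak' -exec rm -f {} \;

cat "$tree/src/other/tkhtml/src/cssparser.c"
# prints: static const char rcsid[] = "$Id: tkhtml$";
```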
it's a solution, but then it assumes git and is completely non-intuitive... that doesn't feel right
I'm not seeing where Id is being used...
in either tkhtml or tktable
In main it's not, any longer. It was in mkdefaultstyle.tcl
src/other/tkhtml/src/mkdefaultstyle.tcl:21
ah, ${DOLLAR} is why...
so looks like that writes out four files...
Hm, I'm not following this logic ... I think I need to get a branch and see how it was
You mean what the tags originally looked like?
no, I'm looking at the logic in mkdefaultstyle.tcl
it's writing out #define lines (to whoever is calling mkdefaultstyle.tcl), reading from just four files (html.css, tkhtml.css, quirks.css, and *.c) - okay, so not four files, but four categories
https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/src/other/tkhtml/src/mkdefaultstyle.tcl
yes, that's what I'm looking at...
That file is very old... I don't know that I ever changed it.
I'm saying I don't see yet what that file is actually doing.
because the logic says it's writing out #define lines, but none of them exist (yet)
They go into build/src/other/tkhtml/htmldefaultstyle.c
during compilation
Target in CMakeLists.txt line 34
Yes - custom build step, prior to building Tkhtml lib
hm, so maybe ... can you generate it on trunk and see what happens on branch if you just drop htmldefaultstyle.c into the build dir?
So, check out an older branch in git, take the trunk copy of that file, and drop it in the build dir?
That seemed to work.
@Sean You're thinking to stash the older htmldefaultstyle.c somewhere and just have folks drop it into the build directory?
something like that, it's agnostic
does it work if it's dropped into the source tree? could commit changes to the tags...
course, could just expand the Ids in the tags too..
one time patching
Yeah, that'd be the way to go. That file is entirely build dir generated.
what about just checking out all tags and committing the Id expansion?
does that screw with anything?
I think we'd need to make new branches for those tags that don't already have one - a tag in git is just a pointer to a commit, so we'd be making a new branch from that change for the new tag.
If we did that I'd want to use the same fix I made to main, so we can still keep the attributes option without breaking anything. The attributes solution allows arbitrary older commits to work, tag fixes will only address the tags.
Actually though, with an ident based solution there, it probably won't matter anyway.
right, I was thinking tags==branches
going to take a while to adjust my mental model
/me got it pounded into him for a year writing converter logic ;-)
they're just dirs in svn, so you can keep committing to the tag/branch
Right. The ones where we did that I believe have branches already (or they should) since we can't commit to tags. However, any tags that weren't edited will need new branches introduced.
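For the tags == branches mental model shift: a git tag only names a commit, so "committing to a tag" means starting a branch at the tag, committing there, and tagging the result if desired. A minimal sketch in a scratch repository, with hypothetical tag and branch names (rel-demo, rel-demo-fix, rel-demo-fixed):

```sh
#!/bin/sh
# Demo only: scratch repository, hypothetical tag/branch names.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
echo v1 > file.txt
git add file.txt
git commit -q -m "release"
git tag -a rel-demo -m "release tag"

# Branch from the tag and commit the fix there; the original tag doesn't move.
git checkout -q -b rel-demo-fix rel-demo
echo fix >> file.txt
git commit -q -am "post-release fix"

# Optionally name the fixed state with a new tag.
git tag -a rel-demo-fixed -m "fixed release"
git log --oneline --decorate -2
```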
/me is game to do that if you think it's the best solution. I'm also willing (gulp) to redo the process with the correct tkhtml contents, if you decide that's best - it was my mistake, and I'll do what it takes to fix it.
let me do a checkout to see how confusing it actually looks when it fails
I've been using git checkout rel-7-32-2
fwiw
it's not just on you... checking out a couple tags was on my verification list and I got lazy and skipped that one ... :(
looks like after turning off STRICT, adding include(CheckSymbolExists), all that was required to at least compile was commenting out the line it failed on -- Tcl_SetResult(interp, HTML_SOURCE_FILES, TCL_STATIC); in htmltcl.c:2787
and then both archer and mged work
I think that's an acceptable hardship since there's always going to be tweaking of old builds required.
@Sean Actually, I may have just coaxed repowork into redoing just the tkhtml sources: https://github.com/starseeker/brlcad_tkhtml_fix
Testing now, stand by...
Ah hah! rel-7-32-2 builds clean.
@Daniel Rossberg , @Sumagna Das - don't do anything yet with your forks. I may have achieved a deeper fix for the tkhtml issue, and if so it will require swapping out the current git repository.
@Sean This is not a brlcad_conv12 derivative - it is the current repository up on BRL-CAD/brlcad with a targeted set of blob sha1 replacements centered on just the relevant Tkhtml files.
As such it has the commits we had already made to that repository. Indeed, the commit where I had restored the rcsid strings is now in this new repository almost a no-op (there was one .txt file that got its revision restored in that commit but not in my remapping work.)
Once I contemplated re-running the whole conversion again to avoid filtering the tkhtml sources, I realized it was far easier to take the existing conversion (which already has a multitude of corrections applied that would be tricky to re-apply again) and target just the necessary sources. So I scripted a git log --follow of all the relevant .c files in git, and for each log entry for each file extracted the svn revision. Then I checked out both the git and the SVN tkhtml sources for the corresponding revisions, calculated the git hashes for the git and svn versions of the file, and made a git fast-import blob out of each revision of the svn version of the file. That gave me a way to map the git internal state to what it should have been had the SVN files been properly included, and also gave me the raw blob inputs to feed to fast import so the blobs would be available to reference.
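A much-reduced sketch of the blob step described above, not the actual repowork tooling: hash one revision's SVN content with git hash-object, write the same bytes as a fast-import blob, and confirm the object is then available under that precomputed hash. The file content and the mark number are illustrative.

```sh
#!/bin/sh
# Demo only: one file, one blob, scratch repository.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
git init -q "$tmp/repo"

# Stand-in for one revision of a tkhtml file as exported from SVN.
printf '%s\n' 'static const char rcsid[] = "$Id: cssparser.c,v 1.8 2008/01/19 06:08:13 danielk1977 Exp $";' \
  > "$tmp/svn_version.c"

# 1. The git object ID this content would have (nothing is written yet).
svn_hash=$(git -C "$tmp/repo" hash-object --stdin < "$tmp/svn_version.c")

# 2. A fast-import blob command: "data <byte count>", then the raw bytes.
size=$(wc -c < "$tmp/svn_version.c" | tr -d ' ')
{
  echo "blob"
  echo "mark :1"
  echo "data $size"
  cat "$tmp/svn_version.c"
  echo
} > "$tmp/blobs.fi"
git -C "$tmp/repo" fast-import --quiet < "$tmp/blobs.fi"

# 3. The blob can now be referenced by the precomputed ID.
git -C "$tmp/repo" cat-file -t "$svn_hash"
```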
It looks as if it worked. To switch this repository in I would recommend deleting BRL-CAD/brlcad and re-creating it, since all the revisions after somewhere in the 32k range will have different SHA1 values.
Here's the set of commits where those files changed:
32868
32899
40243
40405
47234
49599
49602
49607
49608
49609
49611
49613
57890
Sigh. Another oddity. Now that I can try distcheck, run.sh isn't set to executable and benchmark isn't cleaning up properly.
https://github.com/starseeker/brlcad_bench_fix supersedes the tkhtml_fix repository - it has those fixes and the 100755 setting for the SVN executable files that git didn't have listed.
For the latter it's a brute force approach - I generated a list of all the file paths rel-7-32-2 had set executable that git did not, and set those paths' modes to 100755 throughout the git history. I didn't attempt an analysis of when SVN did or didn't set that property on those files.
make distcheck now passes on a git checkout rel-7-32-2
/me fires distcheck-full on rel-7-32-2, notes that brain has reached "E", and heads off to the charging station...
Thanks for the warning. I removed my fork for now.
Should I also remove the fork?
@Sumagna Das I would suggest doing so - @Sean will need to review what I've done and see what he wants to do.
@Sean distcheck-full succeeded on rel-7-32-2 from the brlcad_bench_fix repo
One additional quirk I just hit, but something that's not specific to our setup as far as I can tell - git checked out the .3dm file in text mode by default on Windows. When I added .3dm binary to the .gitattributes file that seems to address it, but since older checkouts won't have a .gitattributes file on Windows they'll probably get the wrong checkout by default.
Workarounds would either be the .git/info/attributes approach discussed earlier for $Id$, or setting .3dm in a global attributes file: https://stackoverflow.com/a/28027656
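A sketch of the global-attributes workaround, assuming core.attributesFile isn't already set: git then falls back to $XDG_CONFIG_HOME/git/attributes (or ~/.config/git/attributes when XDG_CONFIG_HOME is unset), which applies to every repository and every checkout, old tags included. A temporary HOME isolates the demo, and the model path is made up.

```sh
#!/bin/sh
# Demo only: temporary HOME, so the real ~/.config/git/attributes is untouched.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_ATTR_NOSYSTEM=1
unset XDG_CONFIG_HOME

# The per-user attributes file git reads when core.attributesFile is unset.
mkdir -p "$HOME/.config/git"
echo "*.3dm binary" >> "$HOME/.config/git/attributes"

# Any repository (e.g. an old tag checkout with no .gitattributes) now sees it.
git init -q "$tmp/old_checkout"
git -C "$tmp/old_checkout" check-attr binary text -- models/example.3dm
```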
With that caveat, brlcad_bench_fix rel-7-32-2 built successfully on Windows with MSVC
rel-7-32-0 distcheck-full passed on Ubuntu
rel-7-32-2 enable-all build succeeded on OSX
rel-7-30-8 is too old to distcheck vanilla on this box without modding the system (system proj_api.h interferes)
@Sean If you have other tests you'd like me to do I'm game...
what about 7.0? :D I think that was the first release I contributed to (fbsd support and autoconf)
"here's a cd with fbsd, here's a cd with our source code, here's a computer. We'll try to get you on the network in the next couple of weeks." haha, the good old days :D
/me chuckles. I'd almost certainly need a VM to try building something that old.
speaking of! A lot of my life lately is building singularity and occasionally docker images. I know we had a raw disk image a while back for loading into vmware or bochs or whatever, do we/should we(you) provide container images? :D
@Erik Checking the diffs, it looks like whitespace changes (line endings) and expanded vs. unexpanded RCS tags. Plus a couple files like .cvsignore and .gitignore
he, hehe, he, yeh... `find . -type f | xargs sed -i.bak 's/[ ^t]*//'` ... I think I did an indent back then, too...
I mean differences between the SVN and git checkouts. Although if git's history following breaks in there I'll know who to blame ;-)
c'mon, I was new, had to spraypaint my name all over the place and get established, ch'know :D
/me tries a crazy idea to handle the .3dm checkout issue...
@Sean I've figured out how to insert a .gitattributes file at strategic points in the git history so the .3dm file gets flagged by git as a binary checkout. Ironically, this means the rel-7-32-2 distcheck will break on the default repo_verify step, since by that point: 1. the CMake logic had been taught how to use git for bookkeeping and 2. the .gitattributes file is present and unaccounted for in the CMake logic. However, I think it's still better to insert it - the error message tells the user what flag to supply to avoid the problem (or they can just delete the .gitattributes file, since it has done its job by that point.) A corrupted 3dm file, on the other hand, has no easy fix.
https://github.com/starseeker/brlcad_added_gitattributes
@starseeker I found a good way to quickly extract all the files ever marked executable. It's a lot more than a handful. Quick scan of just a few dozen revisions found 2635 files. As expected, some are bogus but most look good. I'll do a manual pass over the list in the morning to weed out the ones that clearly shouldn't have exec set. The rest should be harmless.
Think it's better that the build work and the files be valid/usable on checkout. Distcheck failing isn't critical, so it's a reasonable trade. I would be cautious making more changes like that though. Surgery on the history to inject and edit files is risky in a manner that might not be realized for months and need to be re-uploaded to fix if there's some obscure but important bug.
Couldn't wait till morning. I went over the list manually, eliminated all the ones that looked like the exec bit was wrong/unnecessary, and here they are: executables.txt
I only looked at every 250th commit for expediency, but did look through the entire history up through r77000. I also only looked at trunk, so anything only existing on a branch isn't covered. I intentionally delisted all the itcl/itk files, makefile logic, and other outright errors (many of which are still wrong on trunk, albeit harmlessly). I identified/kept 1770 files, but the list could use another pass from fresh eyes.
Sean said:
Think it's better that the build work and the files be valid/usable on checkout. Distcheck failing isn't critical, so it's a reasonable trade. I would be cautious making more changes like that though. Surgery on the history to inject and edit files is risky in a manner that might not be realized for months and need to be re-uploaded to fix if there's some obscure but important bug.
Agreed. It might be better in that sense not to change it, even, since the .git/info/attributes answer would also address the issue and does not require history editing.
If we do opt for adding .gitattributes, there is one final question - the repo I posted last night puts a minimal .gitattributes in at two places - once when terra.dsp is introduced, and the second time when the .3dm file is introduced. The .gitattributes contents are focused tightly on those two file extensions. However, if we're going to more closely mimic the SVN checkout behavior, it would actually make more sense to inject a more comprehensive .gitattributes at the beginning of the history that covers more file types. My initial impulse was to go minimal to avoid surprises, but since SVN did have those mime types set there is an argument that it is more surprising for git not to have them. Thoughts?
@Sean on the executable files - I set up some checks as well, using a brute force approach (SSD speeds are nice). I've checked all commits for trunk/, and branches/ is finishing up now.
Here is the full, unedited trunk list: trunk.txt
(That's all trunk commits from the beginning.)
@starseeker That's basically the list I started with. I edited it down to the executables.txt list as there are many subtly and blatantly wrong entries in there.
We shouldn't set all those. There are entire folders that were checked in with executable bit set, including source files, header files, build files, images, ...
Agreed.
I spent an hour whittling it down to what looked like should be the set to set.
What's the alternative to .gitattributes?
I'll give it a quick check to see if a full rev check caught any that the 250-per jumping skipped over, but I don't expect to find much.
I'm not much of a fan of littering folders with vcs files.
After doing a main checkout, the user can add "*.3dm binary" to the .git/info/attributes file.
That has to be a manual step, but because it's not a file in the repo history and it has highest precedence, once it's there any checkouts of other branches or tags will use it.
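A quick sketch of that precedence claim in a scratch repository: even when the checked-out tree carries a .gitattributes that marks *.3dm as text, the *.3dm binary line in .git/info/attributes wins. The in-tree rule here is contrived just to show the override, and the model path is made up.

```sh
#!/bin/sh
# Demo only: scratch repository with a contrived in-tree rule.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_ATTR_NOSYSTEM=1
repo="$tmp/repo"
git init -q "$repo"

# In-tree rule, as some checked-out commit might carry it.
echo "*.3dm text" > "$repo/.gitattributes"
# Local override: never part of history, so it survives every checkout.
mkdir -p "$repo/.git/info"
echo "*.3dm binary" >> "$repo/.git/info/attributes"

result=$(git -C "$repo" check-attr text -- models/example.3dm)
echo "$result"
# prints: models/example.3dm: text: unset
```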
starseeker said:
I'll give it a quick check to see if a full rev check caught any that the 250-per jumping skipped over, but I don't expect to find much.
You're comparing apples to oranges there a bit, because what's missing will be any files that lived ephemerally between the 250 jumps, and 99% of those will be intentional removals. I can give you the full list I started with.
Either way - I'd be good to go with your list if you're comfortable with it.
Here's the list I started with -- can compare with this to see what got skipped: executables_250.txt
Here's what pulls the list: for i in $(seq 0 250 77000) ; do echo $i ; svn -R -r $i propget svn:executable file:///Users/morrison/brlcad.github/svn.sfmirror/code/brlcad/trunk ; done | tee executables.log
Then just sort | uniq -c | sort -nr | awk | sed | sort ;)
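A sketch of what that counting step might look like. It assumes each property line in executables.log has the form <url>/trunk/<path> - *, which is how svn propget -R usually prints one value per path; the actual awk/sed stages weren't given, and the sample log below is made up.

```sh
#!/bin/sh
# Made-up stand-in for executables.log: revision numbers from "echo $i",
# interleaved with "<url> - *" lines from svn propget -R (format assumed).
log=$(mktemp)
cat > "$log" <<'EOF'
0
file:///repo/brlcad/trunk/sh/enumerate.sh - *
250
file:///repo/brlcad/trunk/sh/enumerate.sh - *
file:///repo/brlcad/trunk/misc/tool.pl - *
EOF

# Keep property lines, trim the URL prefix and " - *" suffix, then count
# how many sampled revisions had each path marked executable.
result=$(grep ' - \*$' "$log" \
  | sed -e 's|^.*/trunk/||' -e 's| - \*$||' \
  | sort | uniq -c | sort -nr)
printf '%s\n' "$result"
```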
Here's what got added by all revs: 250_vs_trunk_all.diff
Checking all tags, there was only one path that got added compared to trunk - "misc/archlinux/brlcad.sh"
that's diff'd against executables_250.txt ??
250_vs_trunk_all.diff? Yes.
because that looks more like the list of what I deleted
/me redoes it to be sure...
oh, I posted the wrong file dammits
gimmie a sec
This is the full list!
executables_redux250.txt
FYI, I deleted the brlcad repo :(
Mostly culls - couple configure scripts and the like that may pass...
I'm three quarters of the way through the remaining branch checks - so far it looks like a little over 1200 files set exec that are unique to branches.
I'll comb through that diff list. There are a few in there that should be preserved.
Here's the set that are unique to branch commits: branches_uniq.txt
Here's the set with some of the obvious culls removed: branches_uniq_reduced.txt
Line used: cat branches_uniq.txt |grep -v \\.h|grep -v \\.msg |grep -v \\.itk |grep -v \\.cpp |grep -v tzdata > branches_uniq_reduced.txt
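A sketch of an anchored variant of that grep -v chain, assuming the goal is to drop only files with those extensions: the unanchored \.h also matches names such as .html, .hpp, or .hh. Whether that broader match is wanted is a judgment call, and the sample paths below are made up.

```sh
#!/bin/sh
# Made-up sample of branch-only paths.
list=$(mktemp)
printf '%s\n' include/foo.h src/tool.cpp doc/page.html sh/run.sh \
  src/other/tcl/library/msgs/de.msg src/tclscripts/lib/x.itk \
  src/other/tzdata/europe > "$list"

# Match the extension only at the end of the line; tzdata stays a substring match.
reduced=$(grep -vE '\.(h|msg|itk|cpp)$' "$list" | grep -v tzdata)
printf '%s\n' "$reduced"
# prints:
# doc/page.html
# sh/run.sh
```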
still some inappropriates in there
I should be done going through the diff list here in a jiffy after I grab a bite
@starseeker thoughts on the creo3plugin snafu? inclined to ignore it from an exec bit perspective
Agreed. Not significant.
@Sean As far as the .gitattributes thing, which option do you want to go with? We've got:
a) Insert minimal .gitattributes files at strategic points (the brlcad_added_gitattributes repo)
b) Insert more fully populated .gitattributes file for overall repo (closer match to SVN mime types in many cases, but problematic if we get unanticipated matches - personally I'm inclined not to do this)
c) No .gitattributes insertions, require user to set either per-checkout attributes or some form of global git attribute.
I'd be OK with a) or c) - if we go with c) however, we'll need to prominently document what to do to get "proper" older checkout behavior on Windows. The .dsp file isn't particularly noticeable if it gets munged up by the checkout, but the 3dm file is...
Here's the merged reduced set:
executables2.txt
how does git normally handle the exec bit?
I think it stashes it as part of the commit internally.
yeah, it stashes the mode on commit
so ... I think it looks like that's a global property that's just set, not something tracked per commit?
am I reading that right?
the fact that one can do git update-index --chmod=+x foo.sh
the "index" being some sort of permissions ledger
I don't... think so? I think the index update is going to alter the tree entries git uses to track the checkout states?
I know in the fast-export file that's how it's represented... hang on, let me generate something quick.
I guess to answer your question, I'm looking for an option #4 where it's just set in the repo transparently instead of explicitly as a bandaid
since we're going through the rigor to fix it, might as well .. fix it
Oh, you're talking about the binary vs. text mode checkout?
That's different from the exec mode.
what about checking out each rev and doing a git update-index on each of the files in our ledger?
starseeker said:
Oh, you're talking about the binary vs. text mode checkout?
No, I'm not
I've not even considered the binary property...
Here's the fast-export stream from the repository without the blobs (i.e. small enough to be viewed): https://brlcad.org/~starseeker/no_blobs.fi.gz
I find that helpful for understanding how things are stored by git.
So every time it's checked in, its mode is potentially changed
Yes.
That's my understanding - I can do an experiment quick to confirm that.
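A sketch of that quick experiment in a scratch repository: the mode (100644 vs. 100755) is stored in each commit's tree entry, and git update-index --chmod=+x only stages the change for the next commit. An existing commit can't be edited in place (its ID is a hash of its contents), so fixing modes across the whole history means rewriting it.

```sh
#!/bin/sh
# Demo only: scratch repository.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"

printf '#!/bin/sh\necho hi\n' > run.sh   # created without the exec bit
git add run.sh
git commit -q -m "add run.sh"

# Stage a mode change in the index, then record it in a new commit.
git update-index --chmod=+x run.sh
git commit -q -m "mark run.sh executable"

# Each commit's tree carries its own mode for the path.
git ls-tree HEAD~1 run.sh   # mode 100644
git ls-tree HEAD run.sh     # mode 100755
```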
So a script that walks every commit and scans for all the executables2.txt files?
Can you do an update-index on a committed sha?
I was planning to do what I did for the previous case - take executables2.txt, reformat it for repowork, and operate on the fast import stream.
I'm not familiar with what "operate on the fast import stream" means... :)
how's it doing surgery on the repo commits?
or is it not?
Heh. Sorry. repowork takes the output of "git fast-export", reads it into C++ data structures, manipulates it, and dumps out a new fast-import stream that is in turn fed to "git fast-import"
so re-running the conversion
and fixing it then
or dumping the conversion, fixing, and re-importing
I think you mean you are dumping 12, doing all those corrections+fixes+etc, and then ending up with a new repo (call it 19 or 20 or whatever)?
The latter. So the full sequence is:
cd old_repo && git fast-export --all --show-original-ids > ~/old.fi
./repowork --mode-map exec_update.txt ~/old.fi new.fi
mkdir new_repo && cd new_repo && git init
cat ../new.fi | git fast-import
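A sketch of that pipeline's shape, with cat standing in for repowork (so nothing is actually changed) and a scratch repository standing in for the real one. With no edits in the middle, the rebuilt repository should end up with the same commit IDs as the original, which can serve as a sanity check of the export/import plumbing before any transformations are layered on.

```sh
#!/bin/sh
# Demo only: a tiny stand-in for old_repo, rebuilt via fast-export/fast-import.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/old_repo"
cd "$tmp/old_repo"
echo one > a.txt && git add a.txt && git commit -q -m "first"
echo two >> a.txt && git commit -q -am "second"
git tag -a v1 -m "a tag"

# 1. Export every ref; original-oid lines are accepted and ignored by fast-import.
git fast-export --all --show-original-ids > "$tmp/old.fi"
# 2. Transform. Placeholder for: ./repowork --mode-map exec_update.txt old.fi new.fi
cat "$tmp/old.fi" > "$tmp/new.fi"
# 3. Import into a fresh repository.
mkdir "$tmp/new_repo" && cd "$tmp/new_repo" && git init -q
git fast-import --quiet < "$tmp/new.fi"

old_ids=$(git -C "$tmp/old_repo" rev-list --all | sort)
new_ids=$(git -C "$tmp/new_repo" rev-list --all | sort)
```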
No, dumping (in this case) brlcad_added_gitattributes.
That way I don't have to redo all the 12->18 corrections - they're already there.
okay, I think I get it
That's why I was asking about the .gitattributes solution - I can also dump brlcad_tkhtml_fix and not incorporate the .gitattributes changes.
sounds good. Okay, so then ... is there anything to be done about the binary files? we could audit them similarly
I'd like to avoid .gitattributes if we can.
Hrm. I hadn't considered that beyond the known breakages...
Well, we can't avoid something like that, unless you know about a Git feature I don't.
I can pull the list of known correct and incorrect binaries the same way. even doing every rev would probably take an hour or so
Whether the dsp or 3dm files get checked out as text or binary I don't think is governed by anything stored internally in the repo.
What do you mean?
That's why I posted the no_blobs.fi file. If you check pretty much any commit, you'll see that only the mode and the blob sha1 are associated with the path. There doesn't seem to be an equivalent to the svn:mime-type
https://git-scm.com/docs/gitattributes
right, my understanding is git doesn't actually store mixed encodings like svn
that everything is essentially just stored (binary) and whether it displays them or treats them as binary depends on it detecting non-ascii bytes
Right, which means if you want it to treat a file (say) as binary anyway (or keep Windows line endings on Linux, for that matter) you need some form of gitattributes override. The dsp and 3dm files are getting detected as text, as far as I can tell.
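A sketch of why a NUL-free binary such as a .3dm can get mangled, in a scratch repository with stand-in file content (not a real 3dm): git has no per-path MIME type and guesses text vs. binary from the content (a NUL byte in the first 8000 bytes means binary). A file that looks like text is subject to line-ending conversion, here forced with core.autocrlf=true as is common on Windows, and the binary attribute switches that off.

```sh
#!/bin/sh
# Demo only: scratch repository; model.3dm content is a NUL-free stand-in.
set -e
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_ATTR_NOSYSTEM=1
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
printf 'line1\nline2\n' > model.3dm
git add model.3dm
git commit -q -m "add model"

# Auto-detected as text: checkout converts LF to CRLF under autocrlf=true.
git config core.autocrlf true
rm model.3dm && git checkout -- model.3dm
crs_text=$(tr -cd '\r' < model.3dm | wc -c | tr -d ' ')

# With the binary attribute the bytes are written back untouched.
mkdir -p .git/info
echo "*.3dm binary" >> .git/info/attributes
rm model.3dm && git checkout -- model.3dm
crs_binary=$(tr -cd '\r' < model.3dm | wc -c | tr -d ' ')
echo "CRs as text: $crs_text, with binary attribute: $crs_binary"
```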
We could look for other files in the repository that should be binary but will match a text detection, although I'm not 100% sure how to set that up. But even once we know that, there's no per-path property we can set in git (that I know of) that doesn't involve the .gitattributes file.
gotcha
in that case, I'm inclined to see-no-evil
too much potential to screw up something. e.g., .dsp files .. those were msvc6 project files iirc, so they usually are/were text files
Right - that's where terra.dsp got so messed up historically - when people auto-set all the mime types for .dsp files.
i mean, we can fix our little terra.dsp and 3dm, but probably not worth seeking out more
potential for error would probably be the few .g's that have been committed, but those are almost certainly correctly detected as binary
i'll do a spot check just to see if it looks like there were any important binaries in the history
similar 250 jumping to see what was binary
that just takes a couple min
oh that'll be handy -- this will also tell which files we changed the mime-type on, which might be an indicator that it was important
half done
Sounds good. This repo has the exec updates from your new file: https://github.com/starseeker/brlcad_exec2
4868 unique binary file paths
man... there's a lot of mime-type mistakes in there
/me fires distcheck-full on rel-7-32-2 from brlcad_exec2
(or more specifically, cmake .. -DFORCE_DISTCHECK=ON && make distcheck-full)
@Sean If you do find more important binary paths that test as text files, what did you want to do about them - make similar insertions of .gitattributes to protect them?
still going through the list
@starseeker do you have an existing .gitconfig or other file specifying file extensions being binary or not somewhere?
My current heuristics are here: https://github.com/starseeker/brlcad_exec2/blob/main/.gitattributes
but you just added that, no?
that's not what I'm getting at
Oh, you mean do I have something else on my system setting attributes?
Not to my knowledge... let me see if there's a system file...
It doesn't look like it, no.
The fact that it got terra.dsp wrong is a little surprising, as it clearly has non-printable characters. The only reason I can think of for it being committed as text is something somewhere saying '.dsp files are text'. I'm not finding that, so it's a little concerning where that came from.
by any automatic measure, terra.dsp would have come up binary
Oh. That may be my error.
terra.dsp may have gotten the svn mime-type set by pattern match.
It may be that left to itself it would be fine in git...
that file's been set both ways in svn, a source of issues over the years
That would simplify matters, actually - we'd only have to add .gitattributes for the 3dm file.
As long as git doesn't have any built-in file extension awareness for *.dsp... I doubt it...
I can't imagine it does either.
is NIST_MBE_PMI_7-10.3dm the only 3dm file or was something else triggering it?
that's another that should get detected as binary... I mean unless the detection method is onerously too simple.
I believe it's the only uncompressed 3dm file
src/libbrep/tests/ayam_hyperbolid.3dm I guess would be another one.
but I don't think I've got that hooked into any build tests right now...
is that in the history or something?
oh strange
/me blinks - it should be in trunk...
so that file is in svn trunk, but it's not in conv18
oh right, my bad -- that was my 7.30 test
there it is
/me blinks - terra.dsp is coming out different in SVN and git checkouts according to diff, even though I fixed the SVN mime type in 70882
/me growls and heads for a CVS checkout... what's the right file here??
OK, I guess that makes sense, kind of. Both r18847 and latest trunk SVN checkout of terra.dsp diff with the CVS checkout, but the git checkout matches the CVS checkout.
/me doesn't know why NONE of the SVN checkouts match CVS, but I guess it doesn't really matter at this point...
interesting. so the difference is really subtle.
trunk's terra.dsp doesn't appear to have any 0x0D bytes (carriage returns)
it does have 0x0A bytes (newlines)
conv18's terra.dsp has both 0x0D and 0x0A bytes, which at a glance is probably correct
it's also worth mentioning that both are perfectly valid dsp data files for the same dimensional specification. the difference is going to be a 1/32768 difference in elevation at those points.
I just tested brlcad_tkhtml_fix on Windows, which for older checkouts doesn't have .gitattributes. terra.dsp checkout matches the CVS version according to diff, so you're correct - we don't need .dsp flagged as binary explicitly. We just need to make sure we don't flag it as Windows line ending in git.
It's probably worth leaving the entry in the new .gitattributes to avoid that, but we don't need to insert it in the old history for that purpose. I'll adjust my logic to only add the .3dm version.
can we get rid of the top-level .gitattributes altogether? I'd rather we stick to defaults if we can manage.
We still need it for .3dm
can't that be a single-line .gitattributes in that lone 3dm's folder?
Let me see if git supports that...
there's so much override specified in that file, I can see that coming to bite down the road or at least being a debugging discovery journey
but that still raises the question of how that 3dm is getting treated as text... it's full of binary
'file' sees it as binary
Yeah, I don't know why Windows treated it differently.
OK, if I'm reading this right, ".gitattributes file in the same directory as the path in question" is in the precedence list, so you should be correct we can target locally.
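Here is a minimal sketch of that targeted approach. It assumes the directory to protect is `db/nist`, where the lone problem 3dm file lives, and uses `git check-attr` to confirm the effect. The built-in `binary` macro expands to `-diff -merge -text`, so matching files get no end-of-line conversion. The helper names are hypothetical.

```shell
# Mark only .3dm files in one directory as binary, leaving repository
# defaults untouched everywhere else. "binary" is a built-in macro
# for "-diff -merge -text", which disables end-of-line conversion.
protect_dir_3dm() {
    dir=$1
    printf '*.3dm binary\n' > "$dir/.gitattributes"
    git add "$dir/.gitattributes"
}

# Report the effective "text" attribute for a path:
# "unset" means Git will not touch its line endings,
# "unspecified" means no rule applies and auto-detection decides.
text_attr() {
    git check-attr text -- "$1" | awk -F': ' '{print $3}'
}

# Example:
#   protect_dir_3dm db/nist
#   text_attr db/nist/NIST_MBE_PMI_7-10.3dm   # expect: unset
```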
I would target as specific and minimally as possible for this initial thrust
@Sean I'm game to ditch the top level .gitattributes in main - I added it mostly trying to match the subversion default rules you had set up...
Give me a few minutes to rework the repowork inputs...
okay, so apparently their method is essentially ..."check for any occurrence of a zero/nul byte in the first 8000 bytes"
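That heuristic is easy to reproduce outside Git when auditing a tree. This minimal sketch mirrors only the check described above. It ignores .gitattributes, which overrides auto-detection in Git itself. `looks_binary` is a hypothetical name.

```shell
# Approximate Git's binary detection: a file counts as binary when a NUL
# byte appears within its first 8000 bytes. Attributes are not consulted.
looks_binary() {
    total=$(head -c 8000 "$1" | wc -c | tr -d ' ')
    without_nul=$(head -c 8000 "$1" | tr -d '\000' | wc -c | tr -d ' ')
    [ "$total" -ne "$without_nul" ]
}

# Example: print tracked files the heuristic would call binary.
#   git ls-files | while IFS= read -r f; do
#       looks_binary "$f" && printf '%s\n' "$f"
#   done
```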
/me checks that file
so terra.dsp qualifies. it's got a zero around byte 280
okay, so NIST_MBE_PMI_7-10.3dm has a zero byte at the 34th byte in the file...
34th, 35th, 36th, 39th, 40th are all zero...
did you maybe use some git checkout tool that had a built-in config such that it was an individual issue?
/me shrugs. Maybe - I'll try again.
according to git on mac, it thinks they're binary...
this tells which it thinks are binary: git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep '^-'
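`4b825dc642cb6eb9a060e54bf8d69288fbee4904` is the id of the empty tree. Diffing it against HEAD therefore treats every tracked file as added, and `--numstat` prints `-` in both count columns for files it considers binary. The sketch below wraps that and computes the empty-tree id with `git hash-object` instead of hard-coding it, which also keeps it correct in a SHA-256 repository. `list_binary_paths` is a hypothetical name.

```shell
# List paths at a revision that Git's diff machinery considers binary.
# Diffing against the empty tree makes every file an "addition";
# --numstat reports "-<TAB>-" instead of line counts for binary files.
list_binary_paths() {
    rev=${1:-HEAD}
    empty_tree=$(git hash-object -t tree /dev/null)
    git diff --numstat "$empty_tree" "$rev" -- |
        awk -F'\t' '$1 == "-" && $2 == "-" { print $3 }'
}

# Example:
#   list_binary_paths HEAD | grep 3dm
```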
I just tried again cloning with Git on Windows - the SVN checkout and the git checkout differ.
those would seem to imply we don't need to do anything for them and terra.dsp is getting historically and contemporarily fixed by the migration
maybe the svn checkout is wrong
terra.dsp agreed. The 3dm files are the issue - ayam_hyperbolid.3dm also differs between git and SVN checkouts.
3dm-g works on the SVN checkouts, but not the git versions.
/me double checks that...
Confirmed. Git checkouts of both 3dm files on Windows fail to convert with 3dm-g
what's the checkout tool?
this: https://git-scm.com/download/win ?
Just the standard Git windows install, from the bash command line
I believe so, yes.
what does this report on windows: git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep '^-' | grep 3dm
I get:
morrison@agua brlcad_conv18 % git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep '^-' | grep 3dm
- - db/nist/NIST_MBE_PMI_7-10.3dm
- - regress/nurbs/brep-3dm.tar.bz2
- - src/libbrep/tests/ayam_hyperbolid.3dm
or git diff --stat ... it reports Bin too
Yes, that matches
Are the bytes the same?
% git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep 3dm
db/nist/NIST_MBE_PMI_7-10.3dm | Bin 0 -> 4232626 bytes
regress/nurbs/brep-3dm.tar.bz2 | Bin 0 -> 103242 bytes
src/conv/3dm/3dm-g.c | 137 +
src/conv/3dm/CMakeLists.txt | 16 +
src/libbrep/tests/ayam_hyperbolid.3dm | Bin 0 -> 4189 bytes
src/other/openNURBS/opennurbs_3dm.h | 528 +
src/other/openNURBS/opennurbs_3dm_attributes.cpp | 1528 +
src/other/openNURBS/opennurbs_3dm_attributes.h | 573 +
src/other/openNURBS/opennurbs_3dm_properties.cpp | 598 +
src/other/openNURBS/opennurbs_3dm_properties.h | 142 +
src/other/openNURBS/opennurbs_3dm_settings.cpp | 4036 +
src/other/openNURBS/opennurbs_3dm_settings.h | 891 +
https://brlcad.org/~starseeker/ayam_hyperbolid.3dm.gz
there's one of the windows checkouts...
https://brlcad.org/~starseeker/NIST_MBE_PMI_7-10.3dm.gz
yeah, it's bigger: 4199 bytes instead of 4189
and confirmed in hex mode, it's converted 0a's to 0d0a's
WHY? Does --stat say they're Bin for you??
Interestingly, I get the same report when I ask git:
$ git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep 3dm
db/nist/NIST_MBE_PMI_7-10.3dm | Bin 0 -> 4232626 bytes
regress/nurbs/brep-3dm.tar.bz2 | Bin 0 -> 103242 bytes
src/conv/3dm/3dm-g.c | 137 +
src/conv/3dm/CMakeLists.txt | 16 +
src/libbrep/tests/ayam_hyperbolid.3dm | Bin 0 -> 4189 bytes
src/other/openNURBS/opennurbs_3dm.h | 528 +
src/other/openNURBS/opennurbs_3dm_attributes.cpp | 1528 +
src/other/openNURBS/opennurbs_3dm_attributes.h | 573 +
src/other/openNURBS/opennurbs_3dm_properties.cpp | 598 +
src/other/openNURBS/opennurbs_3dm_properties.h | 142 +
src/other/openNURBS/opennurbs_3dm_settings.cpp | 4036 +
src/other/openNURBS/opennurbs_3dm_settings.h | 891 +
yeah, see that makes no f'ing sense...
it thinks they're binary ... yet it's been translated by something
"something"
Here's what stat says:
$ stat db/nist/NIST_MBE_PMI_7-10.3dm
File: db/nist/NIST_MBE_PMI_7-10.3dm
Size: 4232626 Blocks: 4136 IO Block: 65536 regular file
Device: e02c1581h/3760985473d Inode: 3096224743825018 Links: 1
Access: (0644/-rw-r--r--) Uid: (197612/ cliff) Gid: (197612/ UNKNOWN)
Access: 2021-03-12 15:55:21.920148200 -0500
Modify: 2021-03-12 15:53:06.716503800 -0500
Change: 2021-03-12 15:53:06.716503800 -0500
Birth: 2021-03-12 15:53:06.716503800 -0500
woah, that's odd too
4232626 bytes ... yet the file you sent me is 4243206 bytes
how's that possible?
stat is saying it has no carriage returns, yet the file you sent has carriage returns
that's stat on windows?
Yes. All those were run from the git bash command prompt
what's ls say?
$ ls -l db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4232626 Mar 12 15:53 db/nist/NIST_MBE_PMI_7-10.3dm
!
and you're saying 3dm-g in that same shell dies on it?
Well, I first got this:
/c/brlcad-build/Debug/bin/3dm-g.exe -o test.g brlcad_tkhtml_fix/src/libbrep/tests/ayam_hyperbolid.3dm
invalid input file ('ONX_Model::Read() failed.
Note: if this file was saved from Rhino3D, make sure it was saved using
Rhino's v5 format or lower - newer versions of the 3dm format are not
currently supported by BRL-CAD.')
failed to load input file
but I just tried it again and it seemed to work???
And now the diff matches????????!!!!!
that's what's confusing because all the numbers are pointing at it being correct (now)
What in the world?
maybe make sure you don't have a git diff or checkout again or something
It's almost as if it wrote an intermediate version of the file and then went back and changed it
not a .gitattributes or .gitconfig or something...
/me starts over...
check that size after every step
4232626 is correct... 423 good 424 bad
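One way to run that check at each step without eyeballing numbers is to compare the working-tree byte count with the size of the blob Git stored. This is a minimal sketch. `check_size` is hypothetical, and it assumes the path is committed at HEAD and that you run it from the top of the clone. For a file meant to be binary, a larger working copy (4243206 vs 4232626 here) points at LF to CRLF conversion on checkout.

```shell
# Compare a working-tree file's size with the size of its blob at HEAD.
# A larger working copy usually means LF bytes were expanded to CRLF.
check_size() {
    path=$1
    stored=$(git cat-file -s "HEAD:$path")
    actual=$(wc -c < "$path" | tr -d ' ')
    if [ "$stored" -eq "$actual" ]; then
        echo "ok $path ($actual bytes)"
    else
        echo "MISMATCH $path: blob $stored bytes, working tree $actual bytes"
        return 1
    fi
}

# Example:
#   check_size db/nist/NIST_MBE_PMI_7-10.3dm
```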
OK, fresh checkout, back to bad file:
MINGW64 /c
$ ls -l brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4243206 Mar 12 16:08 brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
MINGW64 /c
$ diff brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm brlcad/db/nist/NIST_MBE_PMI_7-10.3dm
Binary files brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm and brlcad/db/nist/NIST_MBE_PMI_7-10.3dm differ
MINGW64 /c
$ ls -l brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4243206 Mar 12 16:08 brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
MINGW64 /c
$ /c/brlcad-build/Debug/bin/3dm-g.exe -o /c/brlcad_tkhtml_fix/test.g brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
invalid input file ('ONX_Model::Read() failed.
Note: if this file was saved from Rhino3D, make sure it was saved using
Rhino's v5 format or lower - newer versions of the 3dm format are not
currently supported by BRL-CAD.')
failed to load input file
MINGW64 /c
$ ls -l brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4243206 Mar 12 16:08 brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
MINGW64 /c
$ date
Fri Mar 12 16:11:59 EST 2021
So far it hasn't fixed itself again...
stat at least agrees with the wrong version this time:
$ stat db/nist/NIST_MBE_PMI_7-10.3dm
File: db/nist/NIST_MBE_PMI_7-10.3dm
Size: 4243206 Blocks: 4144 IO Block: 65536 regular file
Device: e02c1581h/3760985473d Inode: 48132221017637258 Links: 1
Access: (0644/-rw-r--r--) Uid: (197612/ cliff) Gid: (197612/ UNKNOWN)
Access: 2021-03-12 16:17:43.513220300 -0500
Modify: 2021-03-12 16:16:43.532223700 -0500
Change: 2021-03-12 16:16:43.532223700 -0500
Birth: 2021-03-12 16:08:53.242552400 -0500
MINGW64 /c/brlcad_tkhtml_fix (main)
$ git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep 3dm
db/nist/NIST_MBE_PMI_7-10.3dm | Bin 0 -> 4232626 bytes
regress/nurbs/brep-3dm.tar.bz2 | Bin 0 -> 103242 bytes
src/conv/3dm/3dm-g.c | 137 +
src/conv/3dm/CMakeLists.txt | 16 +
src/libbrep/tests/ayam_hyperbolid.3dm | Bin 0 -> 4189 bytes
src/other/openNURBS/opennurbs_3dm.h | 528 +
src/other/openNURBS/opennurbs_3dm_attributes.cpp | 1528 +
src/other/openNURBS/opennurbs_3dm_attributes.h | 573 +
src/other/openNURBS/opennurbs_3dm_properties.cpp | 598 +
src/other/openNURBS/opennurbs_3dm_properties.h | 142 +
src/other/openNURBS/opennurbs_3dm_settings.cpp | 4036 +
src/other/openNURBS/opennurbs_3dm_settings.h | 891 +
MINGW64 /c/brlcad_tkhtml_fix (main)
$ stat db/nist/NIST_MBE_PMI_7-10.3dm
File: db/nist/NIST_MBE_PMI_7-10.3dm
Size: 4243206 Blocks: 4144 IO Block: 65536 regular file
Device: e02c1581h/3760985473d Inode: 48132221017637258 Links: 1
Access: (0644/-rw-r--r--) Uid: (197612/ cliff) Gid: (197612/ UNKNOWN)
Access: 2021-03-12 16:17:43.852005600 -0500
Modify: 2021-03-12 16:16:43.532223700 -0500
Change: 2021-03-12 16:16:43.532223700 -0500
Birth: 2021-03-12 16:08:53.242552400 -0500
@Sean I don't suppose you have access to a similar environment?
what about a git up or git stat .. wondering if that fixed it
I've tried deleting it and re-checking it out - to no avail
it has to have been a git tool that fixed it... did you try git diff --stat or git diff --numstat on it?
I'm thinking the GUI checkout tool has a bug
I didn't use the GUI though - just the command line
well torpedo'd that thought good
and your .gitconfig is empty? and no .gitattributes?
cause now it's a double mystery. why it's cloning wrong and how it got fixed.
Only thing I can think of is that .gitattributes file I added in main may actually be the problem.
Yep. I'll bet that's it.
It checked out wrong in main because some rule I stuck in there must have matched the 3dm file, then when the branch checked out it kept the file that had been "modified" by the .gitattributes file.
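For next time, Git can report which attribute rules apply to a path and how it sees the line endings, which would point at a stray rule directly. This is a sketch. `why_converted` is a hypothetical helper, and `git ls-files --eol` needs Git 2.8 or later.

```shell
# Show every attribute that applies to a path, plus Git's view of its
# line endings: "i/" describes the index (blob), "w/" the working tree,
# and "attr/" the effective text/eol attributes. "-text" means binary.
why_converted() {
    path=$1
    git check-attr -a -- "$path"
    git ls-files --eol -- "$path"
}

# Example:
#   why_converted db/nist/NIST_MBE_PMI_7-10.3dm
```

A binary blob with `attr/text` set (rather than unspecified) is the tell-tale sign of a rule forcing conversion.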
OK, I'm convinced - no gitattributes file.
:thumbs_up: That's what I suspected would eventually happen... just not so soon.
When I switched to the branch that didn't have the file, blew away the modded 3dm file, and restored from the branch rather than main that's where the good file came from.
ahh! yep, that'd explain it. whew.
OK. I'll back up to conv18 (or the one with your readme update if I've got it) and re-apply the tkhtml fix and the exec settings. We should then be Good To Go - minimalism wins again.
Ima need a hard drink tonight after that
README update wasn't important. that was just testing commit
I plan to completely overhaul the readme soon once all the other docs and tickets are in place.
/me has been rather bad for your stress levels this week. OK, give me a few minutes to do a final pass and I'll upload the final candidate.
@Sean remind me after we're done here to check the fast4 regression test - one of those files is deliberately Windows line endings and one is deliberately Linux - we may have to switch the in repo copies of those files to be .bz2 or something so they don't get autoupdated as text files on checkout.
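Here is a quick way to confirm that those fixtures still have their intended endings after checkout (their paths aren't given here). This minimal sketch counts CR and LF bytes. `eol_style` is hypothetical, and it assumes every CR is part of a CRLF pair, so stray lone CRs would be misreported.

```shell
# Classify a file's line endings by counting CR and LF bytes.
# Assumes every CR is part of a CRLF pair (no old-Mac lone CRs).
eol_style() {
    cr=$(tr -cd '\015' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\012' < "$1" | wc -c | tr -d ' ')
    if [ "$lf" -eq 0 ]; then
        echo none
    elif [ "$cr" -eq 0 ]; then
        echo lf
    elif [ "$cr" -eq "$lf" ]; then
        echo crlf
    else
        echo mixed
    fi
}

# Example:
#   eol_style <fixture file>
```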
@Sean While we're thinking about it, did you also want to eliminate .gitignore? It's in there now because it gave us some non-empty SVN commits for id mapping, but maybe we want to eliminate it now.
https://github.com/starseeker/brlcad_conv19
Windows build and distcheck-full running on rel-7-32-2 tag from that repo now. Will confirm if successful in a few hours
I'll start testing 19 now too unless there's a reason to wait.
Go for it
Windows build passed, distcheck full Ubuntu passed for rel-7-32-2
@Sean Any other checks you'd like me to run?
maybe just make sure the few .g's that are in the repo open and seem valid?
I'm running out of checks on my end, I think I'll upload the update this afternoon unless you found anything
find ../ -name \*.g -exec ./bin/mged {} ls \;
came through clean, as far as I can tell.
Ditto on Windows (git bash shell)
Hello everyone, could someone point me to the getting started or quick start docs for opencax?
@starseeker something that exercises external-to-internal form, like doing a get * or draw *. ls doesn't crack them iirc.
@Sean draw console output matches that from SVN build.
g2asc output for all of them matches as well, except for a few chars in the openNURBS serializations of the breps.
OK, yeah - the openNURBS serializations differ even building from the same sources, when different build dirs are used.
@Sean no known blockers on my side...
Anything I can do to help?
I just finished up, didn't find anything else. Will upload in the morning.
@Sean If it's the vanilla brlcad_conv19 repo, shall I go ahead and upload it?
No, not yet
So I think a "soft opening" is probably in order. It's uploaded and live, but perhaps we could give it a few days to "simmer" .. not announce it publicly just yet.
Sounds good. So to be sure - I'm clear to commit?
@starseeker haha, that sounds terrifying when you say it like that
but yeah, I don't see a reason why not. if anything, we should exercise it to make sure it's correct.
Woo hoo! https://github.com/BRL-CAD/brlcad/actions
/me has been working towards that for quite a while now
@Sean Do we want to bother setting up an email mailing list for folks to get commit emails without having a Github account? If so this may be useful... https://docs.github.com/en/github/administering-a-repository/about-email-notifications-for-pushes-to-your-repository
@Sean Right now I've got the "check" target building on the runners, but that's not going to succeed reliably due to the threading issues - looks like regress-gqa is failing some of the time on the OSX runner. Should I disable the check portion of the test until we have an expectation it can reliably run?
I took a run at updating HACKING, but without doing an all-up release I'm sure I've missed something.
One thing that is clear - if we want to keep providing the GNU style ChangeLog files, we'll have to put some effort into it.
My thought, since now each git clone has the whole history locally, would be to either dispense with the ChangeLog all together or simply use the git log output. The only real utility to the ChangeLog would be for folks looking at tarballs without any access to either a local or github version of the history - I would expect that to be a rare case, and even in that scenario I would expect git log (or maybe git log --stat) output to be as useful as the current ChangeLog.
Started populating the releases - that's going to be a job if we want to get all the binaries, source tarballs and notes moved. I got the majority of the release notes set up, back to 7.0, except for a couple without obvious corresponding tags. However, I've only gotten a few of the uploads.
@Sean Seems to be working pretty well so far.
/me needs to add a tag for rel-7-10-4
rel-7-6-2 as well
and rel-7-4-2
ehhh, if github is central to development, people who want to see that far into development should come watch...
Phew! OK, missing tags added, source and binary tarballs uploaded. Needs someone to double check to make sure I didn't miss any. OVA image (just barely) uploaded to Release on OVA repository.
Only binaries I don't have up yet are the old ProE plugins - not sure where to put them.
Options would be either to set up a separate project for the creo plugins, or add tags for the plugins (something like proe-plugin-0-2-0 maybe?) and upload the plugins to those tags. If we do want to add older tags for the plugins, we'll need to be careful about setting tag dates once we identify the corresponding commits. (Just got bit by that - it's fixable https://stackoverflow.com/a/21741848/2037687 but we may as well get it right up front...)
Do we want to use https://pages.github.com/ for the site?
@Erik Do you know anything about the Github "packages" feature? Is that anything that might be useful for BRL-CAD?
https://docs.github.com/en/packages/learn-github-packages/core-concepts-for-github-packages looks like the starting point but I'm not clear yet on what a "BRL-CAD package" would be or mean... Is that where we'd stick (say) a docker image?
no clue, never heard of it... my daily is on bitbucket :/
this repo looks like a good one for testing out the workflow files for github
@Sumagna Das Interesting! Have you tried that with the BRL-CAD actions?
i usually can't keep my laptop open for more than 2-3 hr, mainly because of classes
but i can try it right now if you want and keep the laptop open for the night....
@Sumagna Das Up to you - I'd actually be surprised if it can do anything much with our action files, since they call for Windows and OSX vms as well as Linux...
as per the readme, it seems like it can't work with windows and macos
only works with linux
so might not be a good one for testing out workflow files except for linux ones
So the question would be whether it knows to skip the non-Linux entries automatically or would we need to edit the files down before running it.
might skip them
let me try a dry run
Sumagna Das said:
How is this offtopic?? :D
i didn't know that it could actually help with BRL-CAD's github actions, so i thought it was off topic :smile:
@Sumagna Das fits with the Github topic, if you want to shift it over there.
done :smile:
I've lost count of the number of things I've done on this conversion that I've considered where I didn't know whether or not it would help...
starseeker said:
Sean Do we want to bother setting up an email mailing list for folks to get commit emails without having a Github account? If so this may be useful... https://docs.github.com/en/github/administering-a-repository/about-email-notifications-for-pushes-to-your-repository
Yes, though I'm not fond of Github's default that merely links to the diff. It should really be in the e-mail (up to some kb limit) since the entire point of commit notification is quick review of the code change.
Looks like the way to handle it will be to set up a clone on .bz that pulls periodically with a receive hook
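One caveat: `git fetch` does not run `post-receive`, which only fires in the repository being pushed to. A periodically pulling clone would therefore have to compare ref values itself. This minimal sketch builds the mail body with the full patch, capped at a byte limit as suggested above. `commit_digest` and its 64 KiB default are assumptions, and sending the mail is left out.

```shell
# Build a commit-notification body for the range old..new: per-commit
# message and stats followed by the full patch, truncated to a byte
# limit so huge commits don't produce huge mails.
commit_digest() {
    old=$1
    new=$2
    limit=${3:-65536}
    git log --stat -p "$old..$new" | head -c "$limit"
}

# Example in a periodic mirror (ref name is illustrative):
#   old=$(git rev-parse refs/remotes/origin/main)
#   git fetch -q origin
#   new=$(git rev-parse refs/remotes/origin/main)
#   [ "$old" != "$new" ] && commit_digest "$old" "$new"
#   # ...then hand the output to whatever mailer the list uses.
```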
starseeker said:
Sean Right now I've got the "check" target building on the runners, but that's not going to succeed reliably due to the threading issues - looks like regress-gqa is failing some of the time on the OSX runner. Should I disable the check portion of the test until we have an expectation it can reliably run?
Yes, advisory in the meantime.
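One way "advisory" could look inside a shell step of a GitHub Actions job: the check still runs, but a failure becomes a warning annotation instead of failing the job. The YAML-level alternative is `continue-on-error: true` on the step. `::warning::` is the Actions workflow-command syntax, and `run_advisory` is a hypothetical helper.

```shell
# Run a command but never fail the job because of it; on failure, emit a
# GitHub Actions warning annotation so the result is still visible.
run_advisory() {
    if "$@"; then
        echo "advisory step passed: $*"
    else
        status=$?
        echo "::warning::advisory step failed (exit $status): $*"
    fi
    return 0
}

# Example CI step:
#   run_advisory make check
```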
For the ChangeLog, we can start without it. I think it will be good to include one in future source tarballs, though I don't think it matters so much what tool generates it. Including more than the git 1-liner would be essential, but a git log of all changes since last release would probably be adequate.
At a glance, looks like there are a couple that wrap git log, and looks like emacs can do it, or we can just sort out the magic needed to automatically extract commits since the previous release (a little tricky, but not terribly hard).
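Here is a minimal sketch of that extraction. It assumes release tags keep the `rel-X-Y-Z` naming used above and that the previous release tag is reachable from the new one. A release cut from a side branch would need a different way to pick the base. `changelog_since_last_release` is hypothetical.

```shell
# Print the full log (messages plus per-file stats) of everything since
# the most recent earlier release tag, e.g. for a tarball ChangeLog.
changelog_since_last_release() {
    head_rev=${1:-HEAD}
    # Nearest rel-* tag strictly before head_rev (hence the ^).
    prev=$(git describe --tags --abbrev=0 --match 'rel-*' "$head_rev^") || return 1
    git log --stat "$prev..$head_rev"
}

# Example:
#   changelog_since_last_release rel-7-32-2 > ChangeLog
```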
starseeker said:
OVA image (just barely) uploaded to Release on OVA repository.
How close was it to the file limit?
The github file size limit is, IIRC, 2 gigs. Compressed, it was on the order of 1.8
I thought we determined that was a soft limit, not a hard one...
Maybe... I don't recall for sure
/me bemusedly wonders if @Sean is planning to announce the migration on April 1st...
Okay, I've sent out 16 invitations to add people to our list of members (i.e., people that have commit access to any repo). It's only a fraction of what we had on SourceForge, but it should be a good start.
@starseeker you also apparently lacked the admin bit on the brlcad repo and weren't a member of the dev team, which looks like is why you couldn't add anyone.
/me nods - that'll do it.
That's fixed and I've added the new repos to the existing teams
right now, being in a team pretty much gives full administrative control, so we may want to change that later, but that's essentially how it was on sourceforge
Does github offer finer granularity?
oh heck yes, it's quite granular and with two separate layers
permissions are set on repos themselves or they're set on teams (which then have permissions attached to them) or they're set on members (which have permissions attached to them)
three layers I guess
Wow - nifty!
so for example you were a member, which lets you create repos, but you weren't on the dev team, so you couldn't add people to brlcad
It looks like it's set up this way so you can have teams with admin access, teams without, all accessing some or not having access to other repos. It's not a strict hierarchy of permissions, it's more of a matrix.
A bit complex to manage, but also potentially quite useful for preventing accidents and the like.
right now I just have two teams set up, devs and webdevs, with devs having all repos but only admin on the compiled-code repos, and webdevs having admin over the web-related repos including the website and web projects
@starseeker :( ... git log --follow src/conv/iges/g-iges.c
Looks like "nearly" everything in there has no traceability after the 25XXX converter movements. Looking at iges.h for example, it stops at 25521. Git log appears to have the other changes. For example, if I git log --follow src/conv/iges/iges.h, it looks corrupted to me.
the last commit is shown as 0fe9bf30dc0f7980df6486014bb29567bec09a84 (r4502) which was a change to sig/i-a.c ... similarly 1cdf453b9d355b1a7fb10bea445ab18b262a0252 (r5920) was sig/u-a.c
the two commits before that seem to have nothing to do with sig and are other random commits
looks like it's not until 3408f5ba1220271623a90b3740eb43abe06a857a, a dozen or so commits prior, that it starts to get back on track
If I trace back commits in subversion, the last five on iges.h are r13453, r10561, r9487, r8144, r7715. Commits r13453 is 994dcc97ee6d9f60e670aa9a2ed110273920294c for example and r7715 is split across three commits: 317460fce22e6ba835a08bef126e2b75a123ee78
b9f6d30bd15f4c66ed5e7506877b6ae35c80ea06
eb458e30c765b2758097abc1cb5909422e050e90
so the commits are somewhere in the full history, I'm just not sure where. :(
/me hopes this is limited to conv/ or conv/iges and not all directory renames besides r22798... because there were a dozen or so others
looks like it got vfont correct, looks like it got src/external/Creo wrong ..
For whatever reason, the --follow algorithm isn't finding the src/iges/iges.h file starting from src/conv/iges/iges.h. Looking at the gitk history, following the parent commits does get to the rename commit, so my initial guess is that it's not data corruption per se but a limitation of the implementation of --follow (which apparently has some issues...)
That doesn't add up though -- it lists some older commits on some files, commits that have absolutely nothing to do with that directory entirely.
like the sig/ files
If I'm reading this right, git's interpretation (or cvs-fast-export's, at any rate) was that r25518 removed the iges files rather than moving them, 25519 and 25520 were then committed, and 25521 added the iges files back in.
that may be what breaks the --follow chain
I don't get any --follow output past 25521
Your --follow is giving you bogus commits prior to 25521 with follow on iges.h?
git log --full-history -- "**/iges.h"
may be useful here
git log --follow src/conv/iges/iges.h
The last dozen or two commits have nothing to do with iges.h
Are they empty commits?
starseeker said:
git log --full-history -- "**/iges.h"
may be useful here
Doesn't that just mean that the history is attached somewhere? That much is already confirmed: the commits exist in the history, just seemingly not where they should be. Like, where is r13453? What file can I do a log on to find it? (I'm inclined to see if it's attached to some other random file, like the u-a.c commit.)
starseeker said:
Are they empty commits?
Definitely not, they're genuine changes to other files not even related to src/conv in any way.
git show 0fe9bf30dc0f7980df6486014bb29567bec09a84 ... it says that was the first commit to iges.h in that location (sans follow)
The parent commit of 25521 is 3408f5ba1220 (25520), which is an empty commit as far as iges.h is concerned (and iges.h doesn't exist in the tree at that point). That may break the --follow chain, but I'm not clear yet on why --follow is reporting anything else before src/conv/iges/iges.h in that case.
Wonder if this is related somehow? https://blog.plover.com/prog/git-log-follow.html
Commits back through 22798 in the follow history do have changes that pertain to iges.h, from the looks of things.
It goes off the rails from 22798 to 22606.
some do in a general sense, like license header updates, others not so much
I haven't been able to find the iges/iges.h history which had several dozen commits prior to the move around 25520
I'm not seeing several dozen? Here's what I can find for iges.h: iges_h_svn.txt
@Sean I agree git log --follow is going off the rails in a bizarre way, but if I diff the svn commits and those found by git log -- "**/iges.h" the delta is pretty small:
--- svnrevs.txt 2021-03-31 17:19:14.593937412 -0400
+++ gitrevs.txt 2021-03-31 17:19:50.609358451 -0400
@@ -18,12 +18,13 @@
27341
26074
25521
+25518
23807
23633
23577
+22839
22798
13453
-10561
9487
8144
7715
gitk "**/iges.h"
is also useful...
bbl
Discussion of how --follow works: https://stackoverflow.com/a/43960010/2037687
starseeker said:
I'm not seeing several dozen? Here's what I can find for iges.h: iges_h_svn.txt
Sorry, I meant iges.c for that one -- I was trying to find its full history the same way and can't get it to report the 30 commits prior to it getting moved around, even with git log --full-history -- **/iges.c
Comparing against: svn log svn+ssh://brlcad@svn.code.sf.net/p/brlcad/code/brlcad/trunk/iges/iges.c@22500 | grep '^r'
How can I traverse the actual history manually on the git side? In svn, one would see that a log stops at r12345, then pull the log on a path mentioned in the comment a few revs prior (e.g., r12340), and repeat as needed. If it wasn't mentioned in a comment, one can still pull the tree at r12340, find the file, then continue the log on it.
in general, that system works even if the file was renamed.
I mean, I can think of a really brute force way, checking out the sha prior (-1), but what's the right way?
Relying on "git log -- **/file" feels inadequate in the general case because it 1) only works if the file wasn't renamed, 2) can erroneously catch other same-named files (good luck tracking a subdir README that moved..), and 3) doesn't seem to help figure out where the commit exists..only that it exists.
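Here is a less brute-force option than checking out each prior commit, roughly the git analogue of "pull the tree at an earlier rev". Find the oldest commit that touches the current path without `--follow`, then walk back from there to the most recent commit that deleted a file with the same basename. `--no-renames` makes a move show up as a delete plus an add. This is a sketch. `previous_locations` is hypothetical, it still has the same-basename caveat (the README problem), and on non-linear history you may want `--full-history` as well.

```shell
# Look for where a file lived before its current path first appeared:
# 1) find the oldest commit touching the path (no rename following);
# 2) from there, walk back to the most recent commit that deleted a file
#    with the same basename anywhere in the tree, and print
#    "<commit> <old path>" for each such deletion in that commit.
previous_locations() {
    path=$1
    name=$(basename "$path")
    first=$(git log --format=%H --reverse -- "$path" | head -n 1)
    [ -n "$first" ] || return 1
    git log --no-renames --diff-filter=D --name-only --format='commit %H' \
        "$first" -- "*/$name" "$name" |
        awk '/^commit / { if (found) exit; c = $2; next }
             NF        { print c " " $0; found = 1 }'
}

# Example:
#   previous_locations src/conv/iges/iges.h
```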
Any idea what happened with ProEngineer? It seems to similarly have lost track. I didn't check the others.
Sean said:
Comparing against: svn log svn+ssh://brlcad@svn.code.sf.net/p/brlcad/code/brlcad/trunk/iges/iges.c@22500 | grep '^r'
So if I do the following: git log -- "**/iges.c"|grep svn:revision|awk -F':' '{print $3}'
the last few returns are:
22367
21028
20508
19942
19550
19335
19139
19131
18043
17500
16912
13453
12989
11582
9951
9831
9693
9487
9283
9227
9221
9133
9080
8573
8144
8129
7790
7716
7715
With SVN, running svn log https://svn.code.sf.net/p/brlcad/code/brlcad/trunk/iges/iges.c@22500 | grep '^r'|awk '{print $1}'|sed 's/r//', I get:
22367
21028
20508
19942
19550
19335
19139
19131
18043
17500
16912
13453
12989
11582
10561
9951
9831
9693
9487
9283
9227
9221
9133
9080
8573
8144
8129
7790
7716
7715
r10561 is the only one missing from Git, and that's expected as it was an SVN property change.
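The two extraction pipelines above can be packaged as reusable filters. This is a sketch, not the conversion's own tooling; the function names are hypothetical. Two small hardening tweaks are assumptions on my part. On the git side, `grep -o` keeps only the `svn:revision:NNNNN` token. On the svn side, anchoring on the `rNNNNN |` header avoids a bare `grep '^r'` also catching commit-message lines that start with "r". The demo input is synthetic text in the real log formats.

```shell
#!/bin/sh
# svn_revs_from_git_log: read `git log` output on stdin and print the SVN
# revision recorded in each converted commit's "svn:revision:NNNNN" note.
svn_revs_from_git_log() {
  grep -o 'svn:revision:[0-9][0-9]*' | cut -d: -f3
}

# svn_revs_from_svn_log: read `svn log` output on stdin and print revision
# numbers from entry header lines ("r12345 | author | date | N lines").
# Requiring "r<digits> |" skips message lines that merely begin with "r".
svn_revs_from_svn_log() {
  grep -E '^r[0-9]+ [|]' | cut -d' ' -f1 | sed 's/^r//'
}

# Demo on small synthetic excerpts (made-up text, real layouts).
git_revs=$(printf 'commit a1e49c5edb\n\n    Vast reorganization begins.\n\n    svn:revision:22798\n    cvs:account:morrison\n' | svn_revs_from_git_log)
svn_revs=$(printf 'r22367 | user | date | 1 line\n\nrenamed a file\nr21028 | user | date | 1 line\n' | svn_revs_from_svn_log)
echo "git: $git_revs"
echo "svn: $svn_revs"
```

On the real repositories, the usage would be `git log -- "**/iges.c" | svn_revs_from_git_log > gitrevs.txt` and `svn log https://svn.code.sf.net/p/brlcad/code/brlcad/trunk/iges/iges.c@22500 | svn_revs_from_svn_log > svnrevs.txt`.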
Sean said:
How can I traverse the actual history manually on the git side? In svn, one would see that a log stops at r12345, then pull the log on a path mentioned in the comment a few revs prior (e.g., r12340), and repeat as needed. If it wasn't mentioned in a comment, one can still pull the tree at r12340, find the file, then continue the log on it.
In that situation what I would usually do is bring up gitk (or maybe gitk --all) and go to the last known relevant commit, then browse my way back up the history.
Following file history in Git is a known weak point (https://stackoverflow.com/questions/5743739/how-to-really-show-logs-of-renamed-files-with-git).
IMHO not tracking file moves was a mistake, since it fundamentally limits what you can successfully pull out of the history in cases like this.
git log --follow and variations on git log -- "**/filename" are the best answers I'm currently aware of, but I'll keep my eyes peeled for better ones.
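Here is what `--follow` buys over a plain path-limited log, shown in a throwaway repository rather than the BRL-CAD one. It is a sketch that assumes `git` is on PATH. It will not reproduce the odd results discussed in this thread, only the basic mechanism. `--follow` accepts a single path and finds renames by content similarity, so a move combined with heavy edits can still break the chain.

```shell
#!/bin/sh
# Throwaway repository: add a file, move its directory, then compare
# plain path-limited history with --follow.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

mkdir iges
printf 'int main(void) { return 0; }\n' > iges/iges.c
git add -A
git commit -q -m "add iges converter"

mkdir -p src/conv
git mv iges src/conv/iges
git commit -q -m "move converters to src/conv"

# A path-limited log stops at the commit that created the new path.
plain=$(git log --format=%s -- src/conv/iges/iges.c | wc -l)
# --follow (one path only) applies rename detection and continues past the move.
followed=$(git log --follow --format=%s -- src/conv/iges/iges.c | wc -l)
echo "plain: $plain  followed: $followed"

# --name-status shows the rename (an R100 line with old and new paths) at the move.
git log --follow --name-status --format='%h %s' -- src/conv/iges/iges.c
```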
Sean said:
Any idea what happened with ProEngineer? It seems to similarly have lost track. I didn't check the others.
If I'm interpreting 69329 correctly, the CREO directory was added while the ProEngineer directory was still present.
That may be why it's not following Creo back into ProEngineer - it wasn't a folder rename.
gitk's blame feature might be slightly better in some cases at following changes back, since some of the comments I've seen seem to suggest it's using a more powerful search mechanism than the --follow option...
Woah, okay, hah... huge difference between:
git log -- **/iges.c
and
git log -- "**/iges.c"
... I'd missed quoting the glob, so it was only matching src/conv/iges/iges.c history.
With or without --full-history/--all/etc., that was what was causing me grief.
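This small demonstration shows why the quotes matter. It assumes a POSIX-style shell. Unquoted, the shell expands the glob against the files that exist in the working tree right now, and git only receives the resulting concrete paths. With a recursive glob (zsh, or bash with `shopt -s globstar`), `**/iges.c` becomes `src/conv/iges/iges.c` in a current checkout. In plain sh or bash without globstar, `**` acts like `*`, and a pattern that matches nothing is passed through unchanged. Quoted, the pattern always reaches git intact and is matched as a pathspec against every commit's tree.

```shell
#!/bin/sh
# Scratch directory standing in for a checkout where iges.c lives one level down.
demo=$(mktemp -d)
mkdir -p "$demo/iges"
touch "$demo/iges/iges.c"
cd "$demo" || exit 1

# Unquoted: expanded by the shell against today's files, so git would
# only ever see the concrete current path.
unquoted=$(printf '%s\n' **/iges.c)
# Quoted: the literal pattern is what git receives.
quoted=$(printf '%s\n' "**/iges.c")
echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"
```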
starseeker said:
Sean said:
In that situation what I would usually do is bring up gitk (or maybe gitk --all) and go to the last known relevant commit, then browse my way back up the history.
Er, that's rather error prone I'd think, trying to follow a text line potentially next to a half dozen other | lines, scrolling up for pages, maybe 10k commits back. Still, that's also only good in GUI mode -- I'm looking for something lower-level that will work even when I'm remote in a console. I mean, is "git log -1 sha" where the gitk line connects up to? Or is it sha^! or something else?
Maybe I'm not quite following what you're after... Do you mean something like the following?:
For commit 22798:
$ git log -1 a1e49c
commit a1e49c5edbb4df8eb10f7ae014ae6efeb12fc966
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Thu May 20 15:22:02 2004 +0000
Vast reorganization begins. Sources moved from top-level directories into src/.
svn:revision:22798
cvs:account:morrison
cvs:branch:trunk
If I want info about the immediately preceding commit:
$ git log -1 a1e49c~1
commit be1f3137808b681347a7665a05049911c55166a1
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Thu May 20 14:54:22 2004 +0000
Sources that are external to BRL-CAD are moved from the top level to src/other/.
svn:revision:22797
cvs:account:morrison
cvs:branch:trunk
I can then get (for example) a top level view of the tree at that previous revision:
$ git ls-tree a1e49c~1
100644 blob cf056985dbd9086d3db465d486471e1e4ec5427f .gitignore
100644 blob 20214282c2426fcf91b0cd7635598aedb1ae06a7 AUTHORS
100644 blob c750a69b34d9c6cd4a914966f104176d00edf5f4 BUGS
100644 blob 1df557054723c07a38dc07014966a80bf024fdbc COPYING
...
If I want to see into a subdirectory:
$ git ls-tree a1e49c~1 iges/
100644 blob 2ce043e0fec731921623a30b66a61350a6ca8f28 iges/Makefile.am
100644 blob 77a5bc69e89aae54230f3594a29345b4a6210c43 iges/add_face.c
100644 blob de7c126c87da7202a0fff25b39915c1605b6624e iges/add_inner_shell.c
...
Or look recursively for a specific path:
$ git ls-tree -r a1e49c~1 |grep /iges\\.c
100644 blob 3cc309a9a5cc94b19ac1ffcda9f4a1204f889bbc iges/iges.c
To follow back up the parent-child chain starting from that commit, I can just pull a local log:
$ git log --oneline -10 a1e49c
a1e49c5edb Vast reorganization begins. Sources moved from top-level directories into src/.
be1f313780 Sources that are external to BRL-CAD are moved from the top level to src/other/.
4440f1c095 Sources that are external to BRL-CAD are moved from the top level to src/other/.
fa32f6950a The old regression test scripts are being replaced by something else. Likely it'll be Corredor with some unit test framework. The old scripts are so far out of sync and so inadequate that it's simply not worth it any more.
074785b939 moved from html/ to doc/html/
4e5eaaaa87 s/.doc/.tr/
b51a0ee5e9 renamed .doc files to .tr since they are [tng]roff files
40e36bc94e old nmake visual studio file no longer exists
679e068d94 cake is no more and theres no incentive to maintain it any more so .. buh bye.
29ba93efce rename the text files from .doc to a .txt extension. reserve .doc extension for groff files
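The `~1` / `ls-tree` steps above can be chained into the svn-style walk Sean described. The helper finds the commit that introduced the current path, then lists same-named files in that commit's parent tree as the likely pre-move location to continue from. It is a sketch under these assumptions: `find_predecessor` is a hypothetical name; matching is by basename only, so a file that was renamed as well as moved won't be found; and only the first parent is followed. The demo runs in a throwaway repo and needs `git` on PATH.

```shell
#!/bin/sh
# find_predecessor START PATH  (hypothetical helper)
find_predecessor() {
  start=$1
  target=$2
  # --no-renames keeps a move reported as an add (A) of the new path.
  added=$(git log --no-renames --diff-filter=A -1 --format=%H "$start" -- "$target")
  if [ -z "$added" ]; then
    echo "no commit adds $target" >&2
    return 1
  fi
  echo "added in: $(git log -1 --format='%h %s' "$added")"
  if ! git rev-parse -q --verify "$added^" >/dev/null; then
    echo "root commit: no earlier history"
    return 0
  fi
  base=$(basename "$target")
  # Exact basename match against every path in the first parent's tree.
  git ls-tree -r --name-only "$added^" |
    awk -v b="$base" '{ n = split($0, part, "/"); if (part[n] == b) print "candidate: " $0 }'
}

# Demo: iges/iges.c is moved to src/conv/iges/iges.c, then edited.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
mkdir iges && echo 'int x;' > iges/iges.c
git add -A && git commit -q -m "add iges"
mkdir -p src/conv && git mv iges src/conv/iges
git commit -q -m "move to src/conv"
echo 'int y;' >> src/conv/iges/iges.c
git commit -q -am "edit iges.c"
out=$(find_predecessor HEAD src/conv/iges/iges.c)
printf '%s\n' "$out"
```

To keep walking, rerun it with the reported commit's parent (`<sha>^`) as START and the candidate path as PATH, repeating as needed.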
Other cute tricks... this finds all the file paths that had the file name TODO:
$ git log --all --name-only --pretty=format:"" "**/TODO" |sort|uniq
doc/docbook/resources/other/standard/xsl/TODO
doc/docbook/resources/standard/xsl/TODO
doc/docbook/system/man3/en/TODO
doc/docbook/system/man3/TODO
libitcl3.2/TODO
libitcl/TODO
libpng/TODO
misc/d-bindings/TODO
misc/tools/astyle/TODO
misc/tools/svn2cl/TODO
src/archer/TODO
src/libdm/TODO
src/libged/TODO
src/libicv/TODO
src/libpc/TODO
src/other/blt/src/TODO
src/other/ext/stepcode/TODO
src/other/ext/tcl/compat/zlib/contrib/iostream3/TODO
src/other/ext/tcl/pkgs/itcl4.2.0/TODO
src/other/ext/tcl/pkgs/tdbcpostgres1.1.1/TODO
src/other/flex/TODO
src/other/freetype/docs/TODO
src/other/incrTcl/itcl/TODO
src/other/incrTcl/itk/TODO
src/other/incrTcl/TODO
src/other/libitcl/TODO
src/other/libnetpbm/TODO
src/other/libpng/TODO
src/other/libz/contrib/iostream3/TODO
src/other/openscenegraph/TODO
src/other/stepcode/TODO
src/other/step/TODO
src/other/tcl/compat/zlib/contrib/iostream3/TODO
src/other/tcl/pkgs/itcl4.0.4/TODO
src/other/tcl/pkgs/itcl4.2.0/TODO
src/other/tcl/pkgs/tdbcpostgres1.1.1/TODO
src/other/uuid/TODO
src/qbrlcad/TODO
src/qged/TODO
src/superbuild/stepcode/TODO
src/superbuild/tcl/compat/zlib/contrib/iostream3/TODO
src/superbuild/tcl/pkgs/itcl4.2.0/TODO
src/superbuild/tcl/pkgs/tdbcpostgres1.1.1/TODO
src/tclscripts/checker/TODO
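The TODO listing above shows which paths existed, but not when. A small extension prints, for each matching path, the oldest and newest commits that touched it. That addresses the "where does the commit exist" complaint earlier in the thread. It is a sketch; `path_history_ranges` is a hypothetical name, and "newest" may be the commit that deleted the file. One git detail drives the design: `--reverse` only reorders output that has already been limited, so `git log --reverse -1` still returns the newest commit, and the helper takes the first line instead.

```shell
#!/bin/sh
# path_history_ranges PATHSPEC  (hypothetical helper)
# Prints "path oldest..newest" for every distinct path matching PATHSPEC.
path_history_ranges() {
  git log --all --name-only --pretty=format: -- "$1" | sed '/^$/d' | sort -u |
    while IFS= read -r p; do
      newest=$(git log --all -1 --format=%h -- "$p")
      oldest=$(git log --all --reverse --format=%h -- "$p" | head -n 1)
      printf '%s %s..%s\n' "$p" "$oldest" "$newest"
    done
}

# Demo: two TODO files; one is edited later, the other deleted.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
mkdir a b
echo one > a/TODO
echo two > b/TODO
git add -A && git commit -q -m "add TODO files"
echo more >> a/TODO && git commit -q -am "update a/TODO"
git rm -q b/TODO && git commit -q -m "drop b/TODO"
ranges=$(path_history_ranges "**/TODO")
printf '%s\n' "$ranges"
```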
If you know something about the contents, you can use git grep - for example, if I think the historical version of "iges.c" that I'm looking for has the string "Code to support the g-iges converter" in it but I don't know if the file name changed, I can do the following to grep for it back 5 commits:
$ git grep "Code to support the g-iges converter" $(git log -5 --pretty=format:"%H" 3408f5ba122027)
3408f5ba1220271623a90b3740eb43abe06a857a:src/conv/iges/iges.c: * Code to support the g-iges converter
90f783ca790a5a2f7d176c1b9c0a5eba4c880927:src/iges/iges.c: * Code to support the g-iges converter
f89fb406daf8348bf215ed96f115bdcf9bbd072c:src/iges/iges.c: * Code to support the g-iges converter
b6414214c3cdd7e883be1d5f3cd19f9102deb9ec:src/iges/iges.c: * Code to support the g-iges converter
Notice only 4 commits reported matching that content. If we look at the straight log for 5 commits from that point:
$ git log --oneline -5 3408f5ba122
3408f5ba12 (HEAD) moved all the geometry converter directories from src/. to src/conv/.
48a6bed946 a single iges file didn't make it for some bizzare reason, manually move from src/iges to src/conv/iges
90f783ca79 iges converter moved
f89fb406da moved all the geometry converter directories from src/. to src/conv/.
b6414214c3 formatting, spelling, reference the tasker too
Commit 48a6bed946's tree does not have a file (by any name) matching that string.
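The `git grep` trick above reports only the commits that match, so a gap like 48a6bed946 shows up by omission. The loop below prints a line for every commit in the range, including "absent" ones. It is a sketch; `content_presence` is a hypothetical name. It uses a fixed-string search (`-F`) over each commit's full tree, which may be slow on a repository of BRL-CAD's size for large N. The demo fakes a move that lost the file for one commit.

```shell
#!/bin/sh
# content_presence START N STRING  (hypothetical helper)
# For each of N commits walking back from START, list files containing STRING.
content_presence() {
  for c in $(git rev-list -n "$2" "$1"); do
    short=$(git rev-parse --short "$c")
    # git grep on a tree prints "<commit>:<path>"; strip the commit prefix.
    hits=$(git grep -F -l -e "$3" "$c" | sed "s/^$c://" | tr '\n' ' ')
    if [ -n "$hits" ]; then
      echo "$short present: $hits"
    else
      echo "$short absent"
    fi
  done
}

# Demo: the file disappears for one commit during a two-step "move".
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
echo readme > README
mkdir -p src/iges
echo '/* Code to support the g-iges converter */' > src/iges/iges.c
git add -A && git commit -q -m "add iges.c"
git rm -q src/iges/iges.c && git commit -q -m "move step 1: old file removed"
mkdir -p src/conv/iges
echo '/* Code to support the g-iges converter */' > src/conv/iges/iges.c
git add -A && git commit -q -m "move step 2: new file added"
report=$(content_presence HEAD 3 "Code to support the g-iges converter")
printf '%s\n' "$report"
```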
iges.h has similar results, missing from 3 commits (this is from a checkout of 3408f5ba12):
$ git grep "I G E S . H" $(git log -5 --pretty=format:"%H")
3408f5ba1220271623a90b3740eb43abe06a857a:src/conv/iges/iges.h:/* I G E S . H
b6414214c3cdd7e883be1d5f3cd19f9102deb9ec:src/iges/iges.h:/* I G E S . H
$ git log --oneline -5
3408f5ba12 (HEAD) moved all the geometry converter directories from src/. to src/conv/.
48a6bed946 a single iges file didn't make it for some bizzare reason, manually move from src/iges to src/conv/iges
90f783ca79 iges converter moved
f89fb406da moved all the geometry converter directories from src/. to src/conv/.
b6414214c3 formatting, spelling, reference the tasker too
I'm still not sure why git log --follow pulls in 23649 for iges.h when doing the src/conv/iges/iges.h path search - it's clearly wrong. However, if I check out the first commit that has the iges.h contents again (b6414214c3), git log --follow looks like it can go the rest of the way successfully.
@Sean My thinking is it's more likely we found a bug in git log --follow than in the repo data...
@Sean did we want to set the BRL-CAD github org's icon to the BRL-CAD logo? Right now it's just one of the generic Github images...
That's helpful. Looks like ls-tree in combo with a couple other commands can help me walk it back.
starseeker said:
Sean did we want to set the BRL-CAD github org's icon to the BRL-CAD logo? Right now it's just one of the generic Github images...
Yep, good idea. Hadn't gotten to cosmetic yet.
Really needing commits with diffs... but apparently that's going to require some customization. Will have to live with links to the changes for now.
I just saw some other organizations on GitHub and they have a verified tag. We can have it too, right?
Later, sure. Not a priority right now.
The verified tag is the committer's responsibility.
Commits made online through the GitHub web interface are verified automatically with a GitHub key.
Committers with push access have to add a GPG key to their GitHub account and sign their local commits with that key.
Those commits are then verified by the developer's key; clicking the verified tag shows the key owner.
See https://docs.github.com/en/github/authenticating-to-github/managing-commit-signature-verification
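For local commits, the setup described above comes down to a few standard gpg and git settings. This is a sketch that assumes GnuPG is installed and a key already exists (`gpg --full-generate-key` creates one). `YOUR_KEY_ID` is a placeholder. For the badge to appear, the email on the key also has to match a verified email on the GitHub account.

```shell
# Find the long key ID: the hex string after the algorithm on the "sec" line.
gpg --list-secret-keys --keyid-format=long
# Print the public key; paste it into GitHub under Settings > SSH and GPG keys.
gpg --armor --export YOUR_KEY_ID
# Tell git which key to use and sign every commit by default.
git config --global user.signingkey YOUR_KEY_ID
git config --global commit.gpgsign true
# (Or sign one commit at a time with: git commit -S -m "message")
# Check the signature on the latest commit before pushing.
git log --show-signature -1
```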
I added the new repository location to OpenHub and removed the older ones: https://www.openhub.net/p/brlcad/enlistments
So far the new repo hasn't "taken" for analysis - it tried to pull it down last night, but towards the end of the processing this morning something must have gone wrong. I added some paths to the ignore files (src/other, etc.) - that may help it complete successfully. Fingers crossed...
There we go - 15 hour delay updating. BRL-CAD's OpenHub page is back!
That's really great to have our stats back online fully. Awesome!
is it, like, really really official, or still moving bits into place? :D
It's official from my perspective - the SVN repo is frozen and all dev activity is now on github
There's still a lot of polishing to do on the site - get our logo up, see if we can migrate the sf metadata (patches, bug reports, etc.) somehow, etc. But Github is now the active development center.
rock on, now we're all moving to ... :D
grats on accomplishing such a mega-effort
thanks :-). It's satisfying to have it complete, although I'm still finding myself in the "confound it, why doesn't git record file moves" camp
The CI testing has been Really Useful though - it's already caught me a number of times.
I tried turning on CodeQL to see what happens - early signs suggest we may be too big a bite for that setup to handle.
blehhhh, ci/cd stacks, that's my life lately. On software that takes 40 minutes on a 64 core (128 hyperthread) machine to compile and, uh, a test sys that is heavy enough that it'd cost ~$500 on aws to run once and has a minimum 10 hour turnaround... I hear ya on the pain of bein' too big :D
I had already evolved a script to target the clang static analyzer at our core libs selectively, but I figured that would be a local machine only affair. However, I found some examples recently which suggested it might actually be possible to install the necessary pieces on the runner to set that up as a github action. I'm letting CodeQL run a bit to see what happens, but I wouldn't be surprised to see it time out without finishing.
The static analyzer script looks like it may be able to complete in on the order of an hour, which isn't too bad.
We're deliberately building serially in order to minimize stress on the I/O subsystem - I pushed it harder in some early tests and had a few cases where file writes didn't complete properly.
ya'll should get a lil nvme raid with one of them melly-nox connectx5's :D beastly i/o pair
(if the file writes didn't complete properly, either there're kernel bugs or your writer doesn't check return values and bitbuckets data when the buffers are full)
I'm not sure what sort of backend system the Actions setup is using for its runners, so I can't say for sure.
So far at least none of the issues we've hit is anything like that sourceforge failure that led to the duplicate SVN commit id crisis (knock on wood)
Usually when that sort of thing happens I suspect another parallel compilation bug, but in this case it was a single .c file that failed to build - not much opportunity there for parallel issues...
https://github.com/actions/runner/issues/718
@Sean @starseeker I am thinking about trying to migrate the bugs to start getting back to work... and while migrating, look at the bugs I can try to fix, as a starting point for getting to know the code
Where can I start?
@Sumagna Das You'll want to check with @Sean on that one - I know he has some thoughts about migrating SF data
@starseeker meanwhile... can I transfer the bugs and to-dos from the 2 files?
@Sumagna Das getting started with bug migration sounds great!
were you thinking the BUGS file? I wouldn't migrate those to github issues without first confirming that they are still issues. The BUGS file is intended to be for devs to leave notes on issues that may or may not be user visible, may or may not be fixed, may or may not be opinions on design, etc. It's great for finding things to work on, but I wouldn't necessarily think we want to elevate all of them to a github "issue".
Sean said:
Sumagna Das getting started with bug migration sounds great!
Well, right now my target is the already present BUGS and TODO files... after that I will try the online issues
A better starting point would be to look at the bugs reported at http://sourceforge.net/p/brlcad/bugs/ ... those could all be migrated automatically or manually
Sean said:
were you thinking the BUGS file? I wouldn't migrate those to github issues without first confirming that they are still issues. The BUGS file is intended to be for devs to leave notes on issues that may or may not be user visible, may or may not be fixed, may or may not be opinions on design, etc. It's great for finding things to work on, but I wouldn't necessarily think we want to elevate all of them to a github "issue".
Should I try the TODO file?
Sean said:
A better starting point would be to look at the bugs reported at http://sourceforge.net/p/brlcad/bugs/ ... those could all be migrated automatically or manually
Well, I was going to try the sf2github script, but it needs the bugs.json file to start, which I don't have
there are 126 bugs listed on sf.net, 67 feature requests on sf.net, 51 support requests, 4 geometry, and 214 patches. there's about 166 entries in the BUGS file and 492 ideas in the TODO file. :)
Sean said:
there are 126 bugs listed on sf.net, 67 feature requests on sf.net, 51 support requests, 4 geometry, and 214 patches. there's about 166 entries in the BUGS file and 492 ideas in the TODO file. :smile:
that was fast
I mean it all depends on what interests you. working on any of those will be helpful!
Anyway, I saw that the sf2github script isn't up to date, but I think I can fix it to work for our needs
personally, I'd probably start with the smallest (geometry) and next smallest (support requests), etc just because I like to shorten lists.
Sean said:
personally, I'd probably start with the smallest (geometry) and next smallest (support requests), etc just because I like to shorten lists.
That's not a bad idea, actually
Right now I was trying to parse the TODO file... should I continue with it or start on the SF requests?
Sean said:
personally, I'd probably start with the smallest (geometry) and next smallest (support requests), etc just because I like to shorten lists.
To start on this, I think I need the bugs.json file or something like that
well, I meant actually address the item, not really migrate it -- or migrate it manually (copy-paste and link to the sf item)
I can look into generating the .json file -- there's a script I have to run as admin, I believe
alternatively, could just look through the list of bugs in BUGS like you'd said and find one you think you understand -- then add it to issues, then work on it ;)
Just as an observation - the BUGS and TODO files, by virtue of being part of the repo, are already preserved on Github. The data in the Sourceforge systems isn't migrated at all, so from a data preservation standpoint that's the data we don't have anywhere else, in any form.
For the SF data, my thinking (again for what it's worth) is that it's probably worth migrating them by hand, and doing some checking to see if the original issue is still valid for the current codebase. The end result would be a better set of issues than just a mechanical migration.
Yeah definitely would be most valuable to have some manually migrate and validate sf tracker items.
That’s where I’d probably start, with the geometry, because there are just four of them and they could easily turn into four pull requests for new sample geom. IIRC they just needed docs and some minor cleanup, like making sure the top level object name made sense, minimal overlaps, make sure title is set, etc.
So I tried pulling all of the tickets through the SF API... one thing I need to know: there are a few tickets with attachments, right?
starseeker said:
For the SF data, my thinking (again for what it's worth) is that it's probably worth migrating them by hand, and doing some checking to see if the original issue is still valid for the current codebase. The end result would be a better set of issues than just a mechanical migration.
If manual checking is needed, then I can try putting all of the tickets I got through the API into a text file and then manually checking the needed ones?
There are a lot of tickets with attachments (especially the patches and geometry trackers), but not so much for the feature and support request trackers.
Sumagna Das said:
starseeker said:
If manual checking is needed, then I can try putting all of the tickets I got through the API into a text file and then manually checking the needed ones?
Yes, that would definitely work and be helpful! Any trackers that are still relevant could be manually submitted as a gh issue or pr (in the case of the patches and geometry).
Sean said:
There are a lot of tickets with attachments (especially the patches and geometry trackers), but not so much for the feature and support request trackers.
I'm only giving the URLs of the attachments, because nothing else can be gotten from the API
Sean said:
Sumagna Das said:
starseeker said:
If manual checking is needed, then I can try putting all of the tickets I got through the API into a text file and then manually checking the needed ones?
Yes, that would definitely work and be helpful! Any trackers that are still relevant could be manually submitted as a gh issue or pr (in the case of the patches and geometry).
I will make a text file as an intermediate place for the tickets, then... after the manual checking, the text file can be parsed again and then put onto GitHub, if that works
bugs
feature-requests
support-requests
geometry
These files contain the tickets, with their information, that I got from the SourceForge API... if this works, then I can make a parser that will parse the checked tickets and get them into GitHub as issues
@Sumagna Das that sounds good, but I don't want to cause you work if there's a tool I can run as admin to migrate everything -- what about this: https://github.com/cmungall/gosf2github ?
Sean said:
Sumagna Das that sounds good, but I don't want to cause you work if there's a tool I can run as admin to migrate everything -- what about this: https://github.com/cmungall/gosf2github ?
Wait... there was an updated tool... the last time I checked, there were no updated tools for this
gotta look out for stuff
@Sumagna Das there's no mention whether that tool does anything with file uploads, but I was going to test it out on the geometry tracker since it's so small.. If it goes bad, probably won't be hard to clean up after it.
Sean said:
Sumagna Das there's no mention whether that tool does anything with file uploads, but I was going to test it out on the geometry tracker since it's so small.. If it goes bad, probably won't be hard to clean up after it.
The geometry tracker doesn't have any attachments and it's small, so not a problem, I guess
@Sumagna Das the geometry tracker does have attachments... they're in the comments
that's its whole point, they're people submitting geometry models (.g files)
Sean said:
that's its whole point, they're people submitting geometry models (.g files)
I think I can do something about that
The SF API supports providing the discussion (posts) as well as its uploads/attachments via requests, I think
Profanity aside, this is actually a really useful reference for common git issues: https://ohshitgit.com
This looks fun… https://github.com/AmrDeveloper/GQL
Huh, interesting. Certainly feels like it should be useful for some sort of repo report generation
Hey Sean, it has been years, how's everything? I have received the "ok" from Mark to work fewer hours to do that point cloud thing. I'm planning the algorithm on paper before I get started; there are some details I'm undecided on how to handle
And that post of mine was off-topic. I'm not used to this Zulip topic-based chat
I have a couple of questions... Cliff said you already had Screened Poisson reconstruction, and the wording suggested that it was satisfactory but very slow. Could it be just a matter of beating the hell out of that code with threads, SSE/AVX/AVX-512, atomics, NUMA awareness? I briefly looked at the code but was a bit lost backtracking beyond SPSR.cpp
And do you have some kind of deadline or desired date for the mesh reconstruction algorithm? Just to have an idea how I'll weight the couple different things that need to be done
hey @Alexis Naveros very delayed reply!... everything has been going really great, and glad to hear they're going well for you too. short answer is "I dunno" on the screened poisson, at least to say for sure. I'm fairly certain it's typical unstable non-performant academic code, so yeah, probably tons of room for optimizations and improvement.
On that point, I listened to a talk just last week by someone who was comparing screened poisson with other methods, outlining the general deficiencies of the algorithm. I believe they were approaching it from a completely different perspective, incorporating ML into the pipeline to make more dynamic decisions, with good results.
if it wasn't obvious, we don't have deadlines here. or better still, there's many many many desired deadlines to choose from and they often go wooshing by, but we make progress steadily still.
I coincidentally just finished implementing a Monte Carlo approach to external surface area estimation that samples the hell out of the exterior surfaces, and would love a robust point-cloud to solid mesh routine. My current tactic is going to be to sample it very densely, make thin cylinders at each surface hit point, mesh and union them all together, and (if sampled densely enough) I should be able to eliminate all the interior faces/points. It's stupid, but it just might work well.
If you came up with a better way, I'd gladly use it!
Last updated: Jan 09 2025 at 00:46 UTC