GitHub · brlcad · Zulip Chat Archive

As some of you already know, we're planning on moving our main repository and operations from SourceForge to GitHub real soon now. It's taken approximately two years (yes years, but worked predominantly on weekends and evenings) to get the entirety of BRL-CAD's repository converted from Subversion to Git. This work, by Cliff Yapp, has included fairly extensive complicated mappings to preserve as much data as possible, to fix old corruption, to track changes across major disruptions, to verify and validate that everything is preserved.

Sean (Sep 10 2019 at 18:42):

As this is a big change to our development operations, this is an intentional "open comments" period for folks to talk, to adjust, ask questions, give feedback, get prepared, explore tutorials, etc. The intention is to flip the switch in a few weeks.

Sean (Sep 10 2019 at 18:44):

One question that's already been a point of discussion (and some of you have already shared your views privately, thank you) is how to handle the commit e-mail associated with past commits. If people want them associated with their current GitHub profile/e-mail, then we'll need to set those before migration is complete. As it currently stands, everyone's commits are associated with a fictitious "USER@sf" e-mail.

Sean (Sep 10 2019 at 18:47):

If you'd like your commits associated with a specific name and/or address, please contact me in private or make the change yourself in misc/repoconv/account-map

scorp08 (Sep 11 2019 at 10:24):

Sean (Sep 11 2019 at 18:40):

@scorp08 Yes, of course it will be possible. Technically it's not hard to fork the Svn repo now, but it will become even easier.

Sean (Sep 11 2019 at 18:40):

We'll still be maintaining a central repository structure to encourage collaboration and accelerated development, but it's all good. If people feel more empowered to work on the code in a fork than they do in a clone, I'll still be happy to see their development. Hopefully it won't get too messy and we can actually improve coordination and make it even easier for new developers to get involved with improving the code base.

Erik (Feb 29 2020 at 13:30):

Sean (Feb 29 2020 at 14:51):

I need a couple more days to contact the last remaining committers to get their e-mails, create aliases for the handful that aren't reachable, then assume another 2 weeks for @starseeker to run the reconstruction, followed by maybe 1 more week of validation and testing while uploading to GitHub, and if all goes well, we should be up and running by the end of the month!

starseeker (Mar 14 2020 at 14:09):

Erik (Mar 17 2020 at 12:14):

@Sean status? anything anyone can do to help? do I need to swing by the farm supply store for a salt lick and a cattle prod to do the "carrot and stick" thing? :D

starseeker (Mar 20 2020 at 02:51):

Erik (Mar 22 2020 at 22:37):

less hearts, more answers, boy. What's the holdup? my git-fu is pretty strong these days and my drives tend to be more than 8 gigs (the drive my home server used when we did cvs->svn) these days, so I'm not complaining about repo size :D

starseeker (Mar 23 2020 at 00:04):

@Erik If you want a preview, you can take a look at https://github.com/starseeker/git_conv_test - it's about 5 months out of date now and the non-email committer names mess with github's stat calculators, but it should be a pretty fair representation of what the conversion will end up looking like otherwise. If you want to, check it out to see how it behaves for you (and see if you spot anything wrong). If you want the git notes that have the SVN numbers, you'll need to explicitly grab the notes as well with: git fetch origin refs/notes/commits:refs/notes/commits

starseeker (Mar 23 2020 at 00:12):

The hideous conversion process is laid out in misc/repoconv/CONVERT.sh - it's about as ugly as it gets: C++ mixed with shell scripts mixed with sed and stream of consciousness quick and dirty hackery, but it seems to (slowly) get the job done. I'm still not very skilled with using git day-to-day, but I now know quite a bit more than I wanted to about fast import and export and friends.

Sean (Mar 23 2020 at 01:57):

@Erik he's been patiently waiting on me. I'm the holdup. I've been confirming with every past committer since I have contacts for nearly everyone and they've responded with a plethora of e-mails to use. I just had a few remaining to contact which got delayed with a tasker at the office and GCI and GSoC prep and server issue and ... delays. Now with all this ample time on hand (hah), at least time at keyboard, I've been getting through mad backloggage so we should be able to wrap this up with the final pass this week I think.

Sean (Mar 23 2020 at 01:58):

It's not a space issue, it's about having a complete history that doesn't lose anything, which @starseeker has gone to exceptional lengths to preserve. The rest is limitations of github that require real contact info if we want to have real stat preservation.

Erik (Mar 23 2020 at 11:07):

git history is rewritable. mistakes at this stage can be fixed. (rewriting git history is dangerous and expert friendly, but we're not ... committed.)

Erik (Mar 23 2020 at 11:08):

good luck wrapping up the last few, if'n ya'll need git or shell help, lemme know, I've been using git almost exclusively since... hm, was it '13 that I left arl to check out the modern world? :D

starseeker (Mar 23 2020 at 12:14):

We may not be committed, but once we go live with the new github repo and people start forking, rewriting the full history would be highly disruptive. On the order of what we nearly had to do with the Great SVN Duplicate Commit ID crisis a number of years back.

starseeker (Mar 23 2020 at 12:22):

The chaining of SHA1 hashes is neat for repository integrity, but it means there's no such thing as a local history change. I spent a lot of time thrashing trying to figure out if I could splice the newer SVN conversion onto the older CVS git conversion, and it took me longer than it should have to realize that it's actually structurally impossible to do that with anything other than a full commit replay of the post-CVS commits on top of the CVS conversion (hello, rabbit hole).

So since I REALLY don't want to have to wade through all of that any more times than I need to (there are some finicky manual steps that have to get updated each time the committer emails change, not to mention the delightful experience of mucking around in the swamp mud of my conversion logic) I'm willing to wait for @Sean to get it right the first time :-)
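
A minimal sketch of why a chained-hash history has no purely local edits, assuming a deliberately simplified commit model: here an id is the SHA1 of the parent id plus the message, whereas real git also hashes the tree, author, committer and timestamps. The names commit_id and build_history are made up for illustration.

```python
import hashlib


def commit_id(parent_id, message):
    """Hash a commit's message together with its parent's id (simplified model)."""
    payload = f"parent {parent_id}\n\n{message}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_history(messages):
    """Return the commit ids for a linear history of messages."""
    ids = []
    parent = ""  # the root commit has no parent
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


original = build_history(["import from CVS", "fix typo", "add feature"])

# Edit only the first commit: every later id changes as well.
edited_root = build_history(["import from CVS (edited)", "fix typo", "add feature"])
changed_from_root = [a != b for a, b in zip(original, edited_root)]

# Edit only the middle commit: the earlier id survives, everything after changes.
edited_middle = build_history(["import from CVS", "fix typo (edited)", "add feature"])
changed_from_middle = [a != b for a, b in zip(original, edited_middle)]

print(changed_from_root)    # [True, True, True]
print(changed_from_middle)  # [False, True, True]
```

In this model, splicing new commits onto an older base amounts to recomputing every id after the splice point, which is the "full commit replay" described above.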

starseeker (Mar 23 2020 at 12:25):

Whadya mean "modern"? We use CMake and everything these days! We're even embracing this newfangled C++11 thing! Now get off my lawn! :-P

Erik (Mar 23 2020 at 12:55):

Erik (Mar 23 2020 at 12:56):

(still c++, though... swift and go are way nicer... I hear good things about rust, too)

starseeker (Mar 23 2020 at 16:36):

@Daniel Rossberg The plan right now is to also convert all the smaller project histories (including rt^3) to their own individual git repos. Does git+github work for you for rt^3? (We have to change the name to rt_3 - the ^ character causes some problems for git..)

Daniel Rossberg (Mar 23 2020 at 16:55):

starseeker (Mar 23 2020 at 17:55):

That's fine too, assuming it works for the conversions - rt_3 was just what I had put in the original svn-fast-export mapping files when I found out ^ wouldn't work.

Sean (Mar 24 2020 at 06:32):

The "rt-cubed" name doesn't need to be preserved. I would suggest renaming the repo to "moose" since that's the name we decided on.

Sean (Mar 24 2020 at 06:32):

Sean (Mar 24 2020 at 06:39):

Sean (Mar 24 2020 at 06:41):

@Erik git history is typically rewritable, but we're (thus far) using a feature of git (notes) that precludes rewriting history without rebuilding hashes. that's because the way git notes are currently implemented, they attach to specific hashes and are not updated on history edits. they get orphaned. it's lame, but it's the best solution so far for attaching svn's metadata to specific commits. open to other solutions.
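
A rough model of the orphaning described above, assuming notes behave like a lookup table keyed by commit id. The hashing is simplified (real git hashes more fields), and the note text, the sample e-mail and the helper names are hypothetical.

```python
import hashlib


def commit_id(parent_id, author, message):
    """Simplified commit id: SHA1 over parent id, author and message."""
    payload = f"parent {parent_id}\nauthor {author}\n\n{message}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_history(commits):
    """Return ids for a linear history given (author, message) pairs."""
    ids, parent = [], ""
    for author, message in commits:
        parent = commit_id(parent, author, message)
        ids.append(parent)
    return ids


history = [("USER@sf", "initial import"), ("USER@sf", "fix build")]
ids = build_history(history)

# Notes live outside the commits and are keyed by commit id.
# The note text is a made-up stand-in for the SVN metadata.
notes = {ids[0]: "svn revision 1", ids[1]: "svn revision 2"}

# Rewrite history: same changes, corrected author e-mail (hypothetical address).
new_ids = build_history([("user@example.org", msg) for _, msg in history])

orphaned = [cid for cid in notes if cid not in new_ids]
still_attached = [cid for cid in new_ids if cid in notes]
print(len(orphaned), len(still_attached))  # 2 0
```

In this model the notes have to be rebuilt against the new ids after any rewrite, which is why the committer e-mails need to be final before the conversion run.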

Daniel Rossberg (Mar 24 2020 at 07:45):

I recommend staying with the "rt-cubed" name for the conversion, because this branch is more of a sandbox for experimental extensions than C++ interface specific.

However, I agree with you that we should aim for a separate moose repository for the C++ interface and its belongings in the future.

Daniel Rossberg (Mar 24 2020 at 07:51):

I would officially name it BRL-CAD MOOSE for "BRL-CAD Modular Object Oriented Software Extension". I.e., the MOOSE acronym only makes sense with the BRL-CAD prefix.

Sean (Mar 24 2020 at 07:57):

Sean (Mar 24 2020 at 07:58):

Daniel Rossberg (Mar 24 2020 at 08:02):

Whatever :grinning_face_with_smiling_eyes:
Important is the name MOOSE with its wonderful logo.

Sean (Mar 24 2020 at 08:03):

starseeker (Mar 24 2020 at 11:59):

@Erik about the git-notes usage - I did that so the git commit messages could exactly match their SVN counterparts, which allows for a fairly straightforward analysis to map SVN ids to older commits.

For the CVS portion of the conversion (i.e. the commits put in Git straight from the cvs repo) the ordering and specifics of the generated commits varies a bit from the cvs->svn results (which is one of the reasons I went to all this trouble - cvs-git produced better results with the very early commits). That means the commit messages (when unique) are the best available way to find SVN id mappings to older git commits, hence I needed to keep them the same in both conversions. (Even that isn't enough to reliably peg all cvs->git commits with SVN ids, but an upside of using notes is that if someone someday wants to do a better job of SVN id mapping than I managed they can do so without disturbing the main Git history.)
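
A sketch of the message-matching idea, under the assumption that only messages unique on both sides are trusted. This is not the actual analysis script in misc/repoconv; the function name, ids and sample data are invented for illustration.

```python
from collections import Counter


def map_svn_ids(svn_commits, git_commits):
    """Assign SVN revisions to git commits whose message is unique on both sides.

    svn_commits: list of (revision, message)
    git_commits: list of (git_id, message)
    Returns (mapping of git_id -> revision, list of unmatched git ids).
    """
    svn_counts = Counter(msg for _, msg in svn_commits)
    git_counts = Counter(msg for _, msg in git_commits)
    svn_by_msg = {msg: rev for rev, msg in svn_commits if svn_counts[msg] == 1}

    mapping, unmatched = {}, []
    for git_id, msg in git_commits:
        if git_counts[msg] == 1 and msg in svn_by_msg:
            mapping[git_id] = svn_by_msg[msg]
        else:
            unmatched.append(git_id)
    return mapping, unmatched


svn = [(1, "initial import"), (2, "fix typo"), (3, "fix typo"), (4, "add ell.c")]
git = [("a1", "initial import"), ("b2", "fix typo"),
       ("c3", "add ell.c"), ("d4", "split change")]

mapping, unmatched = map_svn_ids(svn, git)
print(mapping)    # {'a1': 1, 'c3': 4}
print(unmatched)  # ['b2', 'd4']
```

Because the result is kept separate from the commits themselves, a better matcher could replace the mapping later without touching the main history.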

starseeker (Mar 31 2020 at 11:48):

starseeker (Apr 07 2020 at 18:56):

Erik (Apr 08 2020 at 13:52):

starseeker (Apr 08 2020 at 19:18):

starseeker (Apr 13 2020 at 12:43):

Erik (Apr 17 2020 at 23:39):

starseeker (Apr 19 2020 at 18:44):

Erik (Apr 22 2020 at 17:00):

starseeker (Apr 22 2020 at 19:34):

Sean (Apr 22 2020 at 19:37):

starseeker (Apr 22 2020 at 19:39):

starseeker (Apr 29 2020 at 16:03):

starseeker (Apr 29 2020 at 16:05):

@Sean would it help if you sent me the updated info and I integrated it into updated author maps? As long as those are finalized we can start the conversion without actually requiring the aliases be present on bz...

starseeker (May 02 2020 at 17:21):

Erik (May 03 2020 at 14:44):

Sean (May 03 2020 at 14:44):

Erik (May 03 2020 at 14:46):

Sean (May 03 2020 at 14:48):

Like whether everyone got their invite -- I re-sent it again for a third time to all, you get yours?

Erik (May 03 2020 at 14:48):

(burndown list? blockers we can help with? rough eta? have you sourced adequate caffeine?)

Sean (May 03 2020 at 14:49):

Erik (May 03 2020 at 14:49):

Erik (May 03 2020 at 14:50):

Erik (May 03 2020 at 14:51):

Erik (May 03 2020 at 15:00):

starseeker (May 03 2020 at 18:30):

Sean (May 03 2020 at 18:45):

starseeker (May 03 2020 at 22:17):

<snort> from the looks of that contraption you're lucky breakage didn't involve an explosion

starseeker (May 03 2020 at 22:18):

Sean (May 03 2020 at 22:46):

I have a similar espresso machine I've used for 15+ years. Sounds like it could explode any minute, but that's just how they work. They build up steam pressure to heat and force liquid through the grounds. Which also means you want really fine grounds, not the same grinding used in drip coffee machines.

Erik (May 05 2020 at 23:20):

the work one, has two cafe grade grinders to the left of it, it's a beast (and was some of my first on the job training at this place). I'm stuck with keurig's, a moka and an old target espresso maker that gathers dust :/

Erik (May 05 2020 at 23:20):

Sean (May 05 2020 at 23:21):

spending all time fixing builds and debugging, want to help -- can figure out why mysqld is using so much memory, see if it can be cut in half

Sean (May 05 2020 at 23:21):

Erik (May 05 2020 at 23:22):

sure, people are using databases. remove the db's and terminate access, problem solved

Erik (May 05 2020 at 23:22):

I mean, uh, O:-) for jenkins, you might be able to tune max vm size, but java historically has a habit of not releasing much memory, it likes to hold onto it for its own allocator

Sean (May 05 2020 at 23:22):

heh, well I'm almost certain the largest offender is the wiki .. but that's a hypothesis and still doesn't mean there's not some configuration options that might reduce usage too

Sean (May 05 2020 at 23:23):

yeah, I know java is a notorious pig, but almost certainly can get it to use less than 6GB

Erik (May 05 2020 at 23:23):

Sean (May 05 2020 at 23:23):

Sean (May 05 2020 at 23:24):

Erik (May 05 2020 at 23:26):

probably... could try just restarting those services and see what happens, certainly something we could tune to fix, but might be a quick bandaid^Wadhesive bandage

Sean (May 05 2020 at 23:27):

Erik (May 05 2020 at 23:30):

I tuned down a couple of its buffers, looks like it's at 1/2 vm and 1/3 res, we'll see how far it drifts up.

Erik (May 05 2020 at 23:31):

Sean (May 06 2020 at 00:06):

starseeker (May 06 2020 at 01:55):

@Sean If I'm still breaking the OSX build, I can shift entirely to working in branches until we finish the github migration...

starseeker (May 06 2020 at 01:58):

Alternately, I can put a snapshot of trunk up on my own github and see if I can figure out how to hook up the OSX CI system

starseeker (May 18 2020 at 12:13):

starseeker (May 28 2020 at 16:45):

Erik (May 31 2020 at 13:18):

starseeker (Jun 13 2020 at 03:15):

Sumagna Das (Jun 13 2020 at 03:16):

starseeker (Jun 13 2020 at 03:16):

discussion of an eventual transition of the BRL-CAD source repository to using the Git version control system

Sumagna Das (Jun 13 2020 at 03:17):

starseeker (Jun 13 2020 at 03:17):

Sumagna Das (Jun 13 2020 at 03:17):

starseeker (Jun 22 2020 at 21:22):

Erik (Jun 27 2020 at 00:44):

what resource is missing to put a bow on this? lack of Seans? we can do the star trek thing and split him into the saucer Sean section and the other Chris section, right? "Make it so, number :poop: "

Sean (Jul 02 2020 at 08:35):

@starseeker Probably missed your window, but ... it'll be there for whenever you get back.
It's done!

Sean (Jul 02 2020 at 08:38):

Aliases have been added and lots of confirmations and updates for others. Apparently took some 20+ hours to finish it up. Lots of proper e-mails in there, though, so worth it. Lots of simple awareness too.

Sean (Jul 02 2020 at 08:39):

a bunch of folks link through an alias to a noreply@ address in cases where I didn't have and couldn't find any contact information or if they were unreachable.

starseeker (Jul 02 2020 at 12:48):

Awesome - thanks! Will have to wait to kick off the main run, but I should be able to start on some of the preliminaries (in particular, updating the bridging commits between cvs and svn, which require manual adjustment.)

starseeker (Jul 02 2020 at 12:51):

Sean (Jul 02 2020 at 22:10):

Oh, good catch. Yes, significant, as it means I hadn't reconciled his yet. Yay for book-keeping that served its purpose! No response, so he's replaced with an alias.

Sean (Jul 02 2020 at 22:12):

At this point, any remaining uncertainty or issues I'm just replacing with brlcad.org aliases. Frankly, they could have all been replaced with brlcad.org aliases and captured inside GitHub, but this way I don't have to be in the loop (as they are DNS MX records).

starseeker (Jul 03 2020 at 01:36):

@Sean The only other thing I noticed is the cvs_authormap has a "jebbly" entry for Jeffrey Liu, which wasn't in the svn map (probably my fault.) Should that just be jebbly@brlcad.org ?

starseeker (Jul 03 2020 at 01:52):

Presumably the same Jeffrey Liu in the chat now... I got misled by seeing the name only in the CVS authormap.

Sean (Jul 03 2020 at 02:29):

starseeker (Jul 03 2020 at 02:46):

I pulled a list of committers from the svn log and compared it to the ones in the map - I think we're good now. I'll run a basic conversion of the CVS history and upload it to github to see what happens with the new email addresses

Sean (Jul 03 2020 at 02:56):

If there's any problem, we should either just switch EVERYTHING to brlcad.org aliases for the historic commits (to preserve the username as-is, even duplicates), or we should check out what GitLab does with the same info.

starseeker (Jul 03 2020 at 03:01):

starseeker (Jul 03 2020 at 03:04):

starseeker (Jul 03 2020 at 03:20):

starseeker (Jul 03 2020 at 03:23):

starseeker (Jul 03 2020 at 03:29):

starseeker (Jul 03 2020 at 03:32):

Not sure if individuals' profiles will pick up retroactively on commits made before they joined github...

starseeker (Jul 03 2020 at 03:34):

Also, this being in my own personal grouping (as opposed to an org) might have some impact...

Sean (Jul 03 2020 at 03:35):

Sean (Jul 03 2020 at 03:37):

We could create github accounts for all of the brlcad.org aliased accounts, that way they'd at least show up.

Sean (Jul 03 2020 at 03:37):

Sean (Jul 03 2020 at 03:44):

so... let's see. there are 102 entries of which 88 are unique. minus the 19 accounts it found. minus 26 aliased.
that leaves 43 unaccounted and unaccountable.

Sean (Jul 03 2020 at 03:52):

284281400@qq.com
abhijit.nandy@gmail.com
agkphysics@gmail.com
andrecastelo@gmail.com
anuragmurty@gmail.com
ben.e.saunders@gmail.com
bhinesley@gmail.com
bilmer1@comcast.net
brlcad@mail.lordofbikes.de
carl.nuzman@nokia-bell-labs.com
carlm0404@gmail.com
cdueck93@gmail.com
cezar.elnazli2@gmail.com
cprecup@cisco.com
dgodbey@yahoo.com
dloman77@gmail.com
doug@survice.com
ebautu@gmail.com
g.sayol@gmail.com
indianlarry@verizon.net
jdoliner@gmail.com
kunigami@gmail.com
manuel.montezelo@gmail.com
marcodomingues20@gmail.com
maths22@gmail.com
michael.j.gillich@gmail.com
mireastefangabriel@gmail.com
mohitdaga.lnmiit@gmail.com
nreed1@umbc.edu
popescu.andrei1991@gmail.com
robert.reschly@gmail.com
sam@hocevar.net
sharan.nyn@gmail.com
shubhamrathore1947@gmail.com
thedawnthomas@gmail.com
tim@jvsw.com
tom.browder@gmail.com
u2isaac@gmail.com
vladbogolin@gmail.com
zaqcloud@hotmail.com

Sean (Jul 03 2020 at 03:54):

indianlarry has an account, so could check with him to see what's up, what he set it to

Sean (Jul 03 2020 at 03:54):

Sean (Jul 03 2020 at 04:07):

question for you @starseeker , looking at the docs it looks like both username and email get recorded. are the old usernames being preserved or collapsed? just wondering.

starseeker (Jul 03 2020 at 12:23):

@Sean Right now they're collapsed - you would need to pull the map file to associate a sourceforge name with the github id, and for individuals with multiple svn ids I didn't preserve which commit was made with which id. Could probably do so using the notes mechanism, now that I think about it, if that's of interest.

starseeker (Jul 03 2020 at 12:25):

Would github allow the creation of accounts by someone other than the individual in question? My thought was to inquire if there was any way to have the contributors page report non-github contributors in some fashion...

starseeker (Jul 03 2020 at 12:32):

btw, did you switch Erik's email as a test of the alias mechanism? He had given us another email earlier, if I'm remembering correctly.

starseeker (Jul 03 2020 at 12:36):

When I look at (say) your individual page, it's only reporting your contribution activity back to when you joined github in 2011 - it doesn't look like it's picking up on the older commits and associating them with your account

starseeker (Jul 03 2020 at 12:37):

might be another question for the github folks, if anyone has good contacts there...

Erik (Jul 03 2020 at 13:12):

starseeker (Jul 04 2020 at 12:42):

starseeker (Jul 04 2020 at 12:43):

@Sean What do you think? Want to go ahead with the conversion with the email addresses as-is? Or try to contact github to find out more?

starseeker (Jul 04 2020 at 13:12):

I'm thinking it's probably not worth it to tweak too much more, unless you want to track down the root cause of the "surprising" accounts that ought to show up even by github's current contributor criteria but aren't... We can provide something like https://brlcad.org/~starseeker/git_stats/general.html on our own project site to document contributions with more control.

Not ideal certainly - it would be nice if we could get the github site to more accurately reflect the full history - but so far I'm not having much luck trying to research whether that is doable...

starseeker (Jul 04 2020 at 13:20):

(one thought - did indianlarry commit to the repository prior to our conversion to SVN? If he doesn't have any CVS commits he wouldn't show up in this test...)

Erik (Jul 04 2020 at 13:46):

starseeker (Jul 04 2020 at 14:38):

2009 was indianlarry's earliest commit, according to the previous git conversion.

starseeker (Jul 04 2020 at 14:39):

starseeker (Jul 04 2020 at 14:40):

Better test will be once I've got the SVN history spliced on, but that's the hard part. Looks like it's time to update the bridge commits...

starseeker (Jul 04 2020 at 14:43):

(or more precisely, I'm out of excuses to avoid updating the bridging commits... ick.)

Sean (Jul 08 2020 at 18:24):

@starseeker I think it's worth doing both - asking github if there's a way to show/list contributors that don't have a github account, just to make sure we're not doing something wrong, and proceeding ahead.

The surprising accounts are probably worth looking into just to double-check that they're not a typo or dead e-mail. There were only a couple, one that I just fixed yesterday. Some people we had gmail accounts for have switched to different gmail accounts.

I think if we can account for everyone and the vast majority - say 95% - show up under the contributor list, we're good. We may get to that percentage just by ensuring one or two accounts.

starseeker (Jul 08 2020 at 19:51):

starseeker (Jul 08 2020 at 19:52):

Sean (Jul 08 2020 at 19:54):

I created an account for mike, so that should get another ten thousand commits. I'm looking through the list and going to see if any heavy contributors are missing.

starseeker (Jul 08 2020 at 19:56):

Sean (Jul 08 2020 at 19:57):

parker has a curious commit count.. that page is showing 4431 but I'm seeing 5105 in svn, that because it's only played through 2012?

Sean (Jul 08 2020 at 19:58):

starseeker (Jul 08 2020 at 19:58):

Sean (Jul 08 2020 at 19:59):

starseeker (Jul 08 2020 at 19:59):

Sean (Jul 08 2020 at 20:00):

Sean (Jul 08 2020 at 20:01):

starseeker (Jul 08 2020 at 20:02):

If we're missing any from the cvs era that's a problem - newer SVN committers (post 2011) won't show yet.

starseeker (Jul 08 2020 at 20:02):

Sean (Jul 08 2020 at 20:03):

Sean (Jul 08 2020 at 20:04):

don't know if you have an easy way to compare, I just did an svn log > log and counted them up

starseeker (Jul 08 2020 at 20:07):

git log --author="Glenn Durfee" --pretty=oneline - gives a quick overview of the git commits

starseeker (Jul 08 2020 at 20:08):

I can already see early commits in the git history that have the same git message - that's a probable source

Sean (Jul 08 2020 at 20:09):

starseeker (Jul 08 2020 at 20:33):

My understanding is that cvs-fast-export had to deduce which cvs operations in different files denote "commits" for git, when those timestamps don't exactly line up. The tolerance on how much the time span is allowed to vary before a new commit is declared is one of the settings that can be altered on the tool.
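
To make the tolerance idea concrete, here is a toy grouping pass. It is an assumption-laden illustration, not cvs-fast-export's actual algorithm: per-file changes with the same author and message are merged into one commit while each change falls within window_seconds of the previous one. All names and data are hypothetical.

```python
def group_into_commits(file_changes, window_seconds):
    """Group per-file changes into commits.

    file_changes: list of (timestamp, author, message, path), timestamps in seconds.
    A change joins the current commit when author and message match and its
    timestamp is within window_seconds of that commit's latest change.
    """
    commits = []
    for ts, author, message, path in sorted(file_changes):
        last = commits[-1] if commits else None
        if (last is not None
                and last["author"] == author
                and last["message"] == message
                and ts - last["end"] <= window_seconds):
            last["paths"].append(path)
            last["end"] = ts
        else:
            commits.append({"author": author, "message": message,
                            "start": ts, "end": ts, "paths": [path]})
    return commits


changes = [
    (0, "dev", "fix shading", "a.c"),
    (3, "dev", "fix shading", "b.c"),
    (400, "dev", "fix shading", "c.c"),
]

narrow = group_into_commits(changes, window_seconds=300)
wide = group_into_commits(changes, window_seconds=600)
print(len(narrow), len(wide))  # 2 1
```

With the narrower window the same three file changes become two commits carrying an identical message, one way duplicate-looking commits could arise.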

starseeker (Jul 08 2020 at 20:36):

I haven't tried to adjust that too much - it only impacts the CVS portion of the history. The CONVERT.sh script has the setup used for the initial cvs conversion, and that's quite fast if you want to do some experimentation.

Sean (Jul 08 2020 at 20:38):

I don't want to experiment, but I would like to confirm that is exactly what's happening here, as opposed to some other unexpected behavior or a bug or bad data or ...

Sean (Jul 08 2020 at 20:40):

if you look at a couple of the duplicates, do they differ in files, timestamps separated by a few seconds, or something else? might be concerning if they're different changes to the same files.

starseeker (Jul 08 2020 at 20:45):

I see a commit 1996-03-25 16:42:45 that doesn't have a corresponding svn commit id

starseeker (Jul 08 2020 at 20:47):

That means the analysis scripts couldn't find a commit message with a close timestamp

Sean (Jul 08 2020 at 20:48):

what's the actual content of the commit? does the other commit with the matching log message match an svn commit id? does its change match the svn change? (should it)

starseeker (Jul 08 2020 at 20:51):

That one doesn't have a matching message. I'm seeing more that don't (at least 10 so far) which is a bit surprising. There is one at 1994-12-16 15:33:48 that does have a subsequent commit with an SVN id assigned - 1994-12-16 15:38:40

starseeker (Jul 08 2020 at 21:16):

r10215 looks like it got split up into a couple commits in git, and the time ordering is slightly different.

Sean (Jul 08 2020 at 21:24):

okay, cool ... does that fully explain it? how much time are we talking about? couple seconds?

starseeker (Jul 08 2020 at 22:22):

starseeker (Jul 08 2020 at 22:48):

If there's a way to get git and svn to generate identically formatted diffs, we could identify when we have actually differing commits - that would be the best/only way to get true assignment of SVN commits to exactly corresponding commits. My estimate was that the maximal utility was to go ahead and assign the numbers based on the commit ids, since it would localize the commit to the general portion of the git history containing the corresponding changes.

starseeker (Jul 08 2020 at 22:48):

The next best thing would be to generate a list of all commits that don't have svn ids assigned and inspect what's happening around them.

starseeker (Jul 08 2020 at 23:32):

starseeker (Jul 08 2020 at 23:34):

starseeker (Jul 09 2020 at 00:03):

Bah - github's web history doesn't have --follow enabled, from the looks of it - ell.c history stops at the restructure.

starseeker (Jul 09 2020 at 00:13):

starseeker (Jul 09 2020 at 16:24):

Looks like github doesn't retroactively add contributors when they create accounts - yesterday's upload still doesn't show him.

Sean (Jul 09 2020 at 22:00):

So I'll need to make sure others are created before final upload. Good to know. Was going to crunch the numbers, but I think carl is the only >1k committer missing. He's not likely to create a github account, so he can be switched to a brlcad.org alias.

starseeker (Jul 13 2020 at 00:45):

@Sean looks like it will be a few more days at least to finish up the test run (in the mid 60000s range now) - I'll upload it as before once it's done so we can inspect the github integration.

Seeing as we now appear to be getting very close, what's the procedure for flipping the switch from sf to github? Lock the SVN repo as read-only, upload the github repo, and update the web page links are the obvious first steps - will we keep using the existing email lists for the time being?

starseeker (Jul 13 2020 at 00:55):

Sean (Jul 13 2020 at 05:48):

yeah, I'd definitely like to import the feature request, support, and bug report trackers to issues, so that's more than of interest. I've come across a couple similar efforts to import sf data. the one you link looks pretty good. suggests we utilize a non-dev account, which is probably a good idea. the patches tracker is a separate beast and will need to be dealt with differently.

Sean (Jul 13 2020 at 05:50):

would help to have a checklist on the wiki so we don't miss an action. willing to write down what you know so we can look into ordering and making sure we got everything? I can add my notes as well.

starseeker (Jul 13 2020 at 12:52):

I've got notes scattered around (most in misc/repoconv/NOTES) with more details.

Sean (Jul 13 2020 at 19:17):

starseeker (Jul 14 2020 at 18:33):

@Erik We're still a few weeks out (at a minimum I need to re-run the conversion with the final account-map in place) but we can see the finish line now

starseeker (Jul 14 2020 at 18:37):

@Sean Is it worth putting out an email to the brlcad-devel list with a "last chance" call for any account info updates? Or is that not needed?

Sean (Jul 14 2020 at 20:38):

starseeker (Jul 16 2020 at 18:15):

Unless there are more email changes needed, we should now be ready to begin the final conversion run.

Sean (Jul 16 2020 at 18:17):

Sean (Jul 16 2020 at 18:19):

Nice! Looks like it's recognizing 55 contributors now. Not too shabby. Did you start that before I changed Carl's address?

starseeker (Jul 16 2020 at 18:21):

Unfortunately, yes - doesn't have that change nor Ben's. That's why I'll have to run one more time.

starseeker (Jul 16 2020 at 18:21):

Also why I was suggesting sending out the "last chance" email - once I kick off this time, we're locked in.

Sean (Jul 16 2020 at 18:21):

Sean (Jul 16 2020 at 18:22):

Sean (Jul 16 2020 at 18:23):

I can send the email now then, unless you wanted to send it? appreciated seeing the draft.

starseeker (Jul 16 2020 at 18:23):

If that looks good I can send it - you're the better wordsmith, so I wanted you to have a crack at it

starseeker (Jul 16 2020 at 18:25):

Assuming a repeat performance, it looks like a bit shy of two weeks for the run - so around the beginning of August we should plan to lock the SVN repository and open up the github repo.

Sean (Jul 16 2020 at 18:25):

I'd reduce it down a bit and put the main point first, but it looks good enough as is too.

starseeker (Jul 16 2020 at 18:26):

OK, go ahead and send it - makes more sense really for you to do so since you've been POC for the emails all along

Sean (Jul 16 2020 at 18:27):

starseeker (Jul 16 2020 at 18:27):

I'm planning to start the run on the 19th, if you want a fixed deadline for the email

starseeker (Jul 16 2020 at 18:30):

We may want to adjust our tag names going forward, so the tar.gz file github generates from the tag will be more meaningful name wise - right now we get "rel-7-30-8.tar.gz"

Sean (Jul 16 2020 at 18:31):

Sean (Jul 16 2020 at 18:40):

I think we're good. If anything, we could adopt Semantic Versioning (which is simply v1.2.3), but we don't fully comply so that would be a bit misleading.

starseeker (Jul 16 2020 at 18:41):

OK, cool. As long as nobody complains that our Github tar.gz download links aren't compliant with HACKING ;-)

Sean (Jul 16 2020 at 18:41):

starseeker (Jul 16 2020 at 18:42):

Ah, so Releases are the fancier version. OK, that's a corner of Github I've not delved into yet

Sean (Jul 16 2020 at 18:43):

Yeah, like I said, those won't necessarily be our source tarballs. They'll be wherever we host our binary platform releases, since we still need those too.

Sean (Jul 16 2020 at 18:45):

Binary downloads aren't something Github supports directly -- some use GitHub's LFS, some self-host, others still continue to use SourceForge for downloads since that's the one thing they actually are still good at.

starseeker (Jul 16 2020 at 18:45):

starseeker (Jul 16 2020 at 18:46):

Sean (Jul 16 2020 at 18:46):

Sean (Jul 16 2020 at 18:48):

heh, looks like they added it 7 years ago. shows how closely I've been paying attention to it.

starseeker (Jul 16 2020 at 18:50):

<grin> I've mostly been trying to figure out the CI bits - I don't have any projects that do releases, so I hadn't noticed either

starseeker (Jul 16 2020 at 18:53):

Sean (Jul 16 2020 at 19:06):

okay, so looks like we're at 89% commit coverage across those 55 accounts
63715 commits out of 71289

Carl's will get us to 93%. I think a few more will probably get us into the 95-98% ballpark.
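
The coverage arithmetic can be checked with a short sketch. The two commit counts come from the message above; the helper names are made up, and the 95% figure is only the lower end of the ballpark mentioned.

```python
def coverage_percent(linked_commits, total_commits):
    """Share of commits attributed to a recognized account, in percent."""
    return 100.0 * linked_commits / total_commits


def commits_needed(linked_commits, total_commits, target_percent):
    """Additional linked commits required to reach target_percent coverage."""
    required = -(-target_percent * total_commits // 100)  # integer ceiling division
    return max(0, required - linked_commits)


total = 71289
linked = 63715

print(f"{coverage_percent(linked, total):.1f}%")  # 89.4%
print(commits_needed(linked, total, 95))          # 4010
```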

starseeker (Jul 16 2020 at 19:06):

starseeker (Jul 16 2020 at 19:07):

I just wanted to be sure we didn't miss anyone currently active on github who wanted to be correctly linked into the history

Sean (Jul 16 2020 at 19:07):

curiously, tom browder's commits are actually the first i've noticed lower than his Svn count... should there be any reason for that?

starseeker (Jul 16 2020 at 19:08):

Sean (Jul 16 2020 at 19:08):

starseeker (Jul 16 2020 at 19:08):

starseeker (Jul 16 2020 at 19:09):

Sean (Jul 16 2020 at 19:09):

can check on your end: svn log > log && grep -E '^r[[:digit:]]+[[:space:]]\|' log | awk '{print $3}' | sort | uniq -c | sort -n
tom's showing 1637 commits. git count is 1630.
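
For reference, the counting pipeline above can be wrapped in a small function so the same tally is easy to rerun on a saved log. This is only a minimal sketch: the function name `svn_author_counts` is made up for illustration, and it assumes the usual `svn log` header lines of the form `rNNN | author | date | N lines`.

```shell
# Tally commits per author from `svn log` output read on stdin.
# Prints "count author" lines, smallest count first.
svn_author_counts() {
    grep -E '^r[0-9]+[[:space:]]\|' |
        awk -F'|' '{
            author = $2
            gsub(/^[ \t]+|[ \t]+$/, "", author)   # trim padding around the name
            count[author]++
        }
        END { for (a in count) print count[a], a }' |
        sort -n
}
```

Usage would be something like `svn log > log && svn_author_counts < log`.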

Sean (Jul 16 2020 at 19:12):

Sean (Jul 16 2020 at 19:13):

starseeker (Jul 16 2020 at 19:13):

git log --pretty=oneline --author="Thomas Browder" --branches="*" |wc -l gives me a count of 1634

starseeker (Jul 16 2020 at 19:13):

Sean (Jul 16 2020 at 19:15):

starseeker (Jul 16 2020 at 19:15):

Sean (Jul 16 2020 at 19:16):

starseeker (Jul 16 2020 at 19:17):

I had something in the notes for doing a deeper dive into the git history... one sec...

starseeker (Jul 16 2020 at 19:18):

Sean (Jul 16 2020 at 19:18):

Sean (Jul 16 2020 at 19:19):

starseeker (Jul 16 2020 at 19:21):

Sean (Jul 16 2020 at 19:22):

starseeker (Jul 16 2020 at 19:23):

starseeker (Jul 16 2020 at 19:31):

If I do the following script I end up with 73882 commit messages, where github (and git log itself) give only 71289 log.sh

Sean (Jul 16 2020 at 19:33):

so he did make a lot of commits into the ova repository, so that's one difference albeit to be expected

Sean (Jul 16 2020 at 19:37):

and he did make a branch for working on binary attributes, so that's explaining the 297 additional commits. if you ran off trunk, you would have seen 1637.

Sean (Jul 16 2020 at 19:38):

starseeker (Jul 16 2020 at 19:39):

Sean (Jul 16 2020 at 19:41):

So only the original question remains -- why his commit count is 7 commits short.

starseeker (Jul 16 2020 at 19:41):

Sean (Jul 16 2020 at 19:43):

starseeker (Jul 16 2020 at 19:44):

I have some C++ code in misc/repoconv (I think it's in the svn_map_commit_revs.cxx file) which could probably be repurposed to actually diff the logs, with some work - svn makes those types of comparisons very annoying...

Sean (Jul 16 2020 at 19:44):

wouldn't be unusual except that everyone else is slightly higher in git with the fake/duplicate commits

starseeker (Jul 16 2020 at 19:46):

Those commits didn't translate, so we may have lost a few there if he did do that and didn't do any move+change commits

Sean (Jul 16 2020 at 19:48):

Should be able to figure this out easily by process of elimination.
I just got the repo cloned -- how do we access the svn rev?

starseeker (Jul 16 2020 at 19:49):

Sean (Jul 16 2020 at 19:49):

if we can get a list of svn-to-sha, they can be eliminated from the svn list or sha list and vice versa .. should be just a handful remaining in both
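
The elimination step described here could be sketched roughly as follows. It assumes two plain text files with one SVN revision number per line (the full list for an author, and the revisions that already have a git sha); the function name and file layout are hypothetical.

```shell
# Print the revision numbers listed in file $1 that are absent from file $2.
# Both files hold one SVN revision number per line, in any order.
unmapped_revs() {
    tmpdir=$(mktemp -d) || return 1
    sort -u "$1" > "$tmpdir/all"
    sort -u "$2" > "$tmpdir/mapped"
    # comm needs both inputs sorted the same way; -23 keeps lines unique to the first.
    comm -23 "$tmpdir/all" "$tmpdir/mapped" | sort -n
    rm -rf "$tmpdir"
}
```

Swapping the two arguments gives the opposite direction, so whatever remains in both outputs is the handful of revisions to look at by hand.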

starseeker (Jul 16 2020 at 19:50):

Sean (Jul 16 2020 at 19:51):

got it: git log --all --pretty=format:"%H %N" --grep svn:revision:29886|awk '{system("git checkout "$1)}'

starseeker (Jul 16 2020 at 19:51):

It's probably pretty slow for scripting - I wasn't trying to performance optimize, thinking it was just for checking out one svn rev...

starseeker (Jul 16 2020 at 19:53):

Make sure you cloned the notes too, by the way, or that won't work: git fetch origin refs/notes/commits:refs/notes/commits

Sean (Jul 16 2020 at 19:54):

starseeker (Jul 16 2020 at 19:55):

Sean (Jul 16 2020 at 19:55):

starseeker (Jul 16 2020 at 19:56):

it drives me nuts that git won't pull the notes by default... probably another one of those decisions like not tracking file moves.
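
One way to soften this, as a sketch: a clone can be told once to fetch notes refs along with everything else by adding a fetch refspec to the remote. The function name is made up here, and it still has to be run per clone, so it does not remove the need to tell people that notes exist.

```shell
# Add a fetch refspec so that `git fetch <remote>` also brings down all notes refs.
# Run inside an existing clone; the remote name defaults to "origin".
enable_notes_fetch() {
    remote=${1:-origin}
    git config --add "remote.$remote.fetch" '+refs/notes/*:refs/notes/*'
}
```

After `enable_notes_fetch`, a plain `git fetch origin` should update the notes too.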

Sean (Jul 16 2020 at 19:56):

starseeker (Jul 16 2020 at 19:59):

Was Tom an SVN era only committer or did he have CVS commits? Things get a lot more wonky when we cross the CVS threshold...

starseeker (Jul 16 2020 at 20:02):

I wish bob would put at least his name on his github account - his account name by itself looks rather bleak

Sean (Jul 16 2020 at 20:07):

this is going to take a lil while, but getting closer .. lots of curious little discrepancies to chase down

Sean (Jul 16 2020 at 20:08):

just looking at svn revisions, some clearly didn't map, so it'll be easy to find those -- I suspect they're something categoric like adding directories or moving files

Sean (Jul 16 2020 at 20:08):

starseeker (Jul 16 2020 at 20:10):

@Sean Are you checking just Tom's commits, or doing a whole-history analysis? If the latter you'll probably see on the order of a couple thousand commits that won't line up, at a guess.

starseeker (Jul 16 2020 at 20:14):

Any git commit without a note doesn't have a matching SVN commit (or at least, an identified one) although the "preliminary move commit" commits arguably do map to specific revisions (I just didn't bother assigning the rev number, since the subsequent change commit is the one that should actually restore the tree to the state that matches the SVN commit.)

Sean (Jul 16 2020 at 20:15):

heh, why would I expand scope on a specific discrepancy? that'd be terrible way to go about v&v :)

Sean (Jul 16 2020 at 20:16):

just checking tom's to understand this delta. it's 7 commits, should be easy to isolate and understand.

starseeker (Jul 16 2020 at 20:16):

Wasn't sure what you were up to - "lots of curious little discrepancies" sounded ominous

starseeker (Jul 16 2020 at 20:18):

At various points when debugging the conversion, I generated lists of sets of unmapped commits. Can't say I'd look forward to it, but if you need me to I can prepare a complete list of SVN brlcad commits and the corresponding git log and produce the sets of commit deltas.

Sean (Jul 16 2020 at 20:18):

well, in trying to pin it down, a couple more numbers aren't adding up. If I do a git log on browder and pull all commits that have an svn ID, I get 1626 commits on trunk and 1919 on all .. which is slightly off the 1630 on the public site and the 1930 you reported via some script

starseeker (Jul 16 2020 at 20:20):

If I remember correctly commits that are only locatable by tags won't show up in the default git log listings, which was the reason for that crazy script to introspect everything.

Sean (Jul 16 2020 at 20:22):

No need, it's easy to pull the mapping with: git log --pretty=format:"%H %N" | grep revision | sed 's/svn:revision://g'
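
A slightly more structured take on the same idea, as a sketch: instead of grep and sed, parse the `%H %N` output and emit one `revision sha` pair per line. It assumes the notes have been fetched and that each note carries a `svn:revision:N` entry as in the commands above; the function name is hypothetical.

```shell
# Read `git log --pretty=format:"%H %N"` output on stdin and print
# "svn-revision sha" pairs for commits whose note carries svn:revision:N.
svn_sha_map() {
    awk '
        length($1) == 40 && $1 ~ /^[0-9a-f]+$/ { sha = $1 }
        match($0, /svn:revision:[0-9]+/) {
            # Skip the 13-character "svn:revision:" prefix to get the number.
            print substr($0, RSTART + 13, RLENGTH - 13), sha
        }
    '
}
```

For example: `git log --all --pretty=format:"%H %N" | svn_sha_map | sort -n`.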

Sean (Jul 16 2020 at 20:22):

Sean (Jul 16 2020 at 20:27):

Just FYI, now that we're really close, I'm planning on doing actual v&v on the repo to sanity check everything. I don't expect to find anything cause I know you poured heart and soul into the previous revisions, but still better to find any problems now rather than later.

Sean (Jul 16 2020 at 20:29):

Basically just looking for any actual data loss, like commits missing that shouldn't be missing or something that's off by one or some other bug.
Nothing exhaustive, nothing to hold anything up either. Just basic comparative testing to see if we understand and expect all the differences.

Sean (Jul 16 2020 at 20:30):

Sean (Jul 16 2020 at 20:43):

@starseeker so is the account name actually preserved anywhere? It's okay if it's not, but I'm not seeing it and thought it was getting collapsed and preserved somewhere.

starseeker (Jul 16 2020 at 20:44):

It's not, except in the account-map file. Only other approach I can think of is to add another note line to the commits with the cvs/svn commit name, and that'd be a bit of a job to do right.

starseeker (Jul 16 2020 at 20:45):

I'm afraid to touch the main logic at this point if I don't absolutely have to, which would mean appending another note line with a post-conversion analysis. Doable, but not trivial.

starseeker (Jul 16 2020 at 20:49):

If it's helpful, here's the list my logic generates of SVN commits that have no identifiable corresponding git commit (at least, without analyzing the contents of the diffs themselves, which I have not attempted): svn_list.txt

Sean (Jul 16 2020 at 20:50):

Sean (Jul 16 2020 at 21:01):

Sean (Jul 16 2020 at 21:03):

obviously lots of categoric ones to not worry about that I'd distill to, like all the generated ones and tag commits are non-issues.

Sean (Jul 16 2020 at 21:07):

I have mixed feelings on this. On one hand, it would be nice to preserve the actual user name recorded on that specific commit, but the historic merit is questionable (beyond provenance, which is already lost) and can't think of an actual use case unless mappings are wrong (which reminds me, should check the first and last author in the mapping file specifically).

Sean (Jul 16 2020 at 21:08):

starseeker (Jul 16 2020 at 21:17):

Not exhaustively, and that particular list is CVS era only - I'm working on a more comprehensive one.

starseeker (Jul 16 2020 at 21:19):

I'm inclined to skip it for now - since git notes can be added without impacting the main sha1 repo history, we can always go back and generate the mappings later if we discover it's worthwhile. ( I plan to put the original CVS and SVN repos up in a single archived git repository on the project, to preserve them for potential use when something comes along to dethrone git and some poor sucker gets to do this again.)

starseeker (Jul 16 2020 at 21:30):

These might be a bit more interesting - I disabled the limiter and ran the check for all svn commits, as well as printing out unmapped (or at least, not uniquely mapped by commit message) git commits.
svn_list.txt
git_list.txt

starseeker (Jul 16 2020 at 21:32):

The git version is less sophisticated - duplicate commit messages on different commits will show up - but it's a start.

starseeker (Jul 16 2020 at 21:37):

Update - better version of the git_list.txt file that also removes unique timestamp + message matches. < 2k as opposed to almost 6k, and visually most of them look like cvs-fast-export breaking down commits differently:
git_list.txt

starseeker (Jul 16 2020 at 21:39):

The branch delete commits are needed to preserve when a branch was removed in SVN, since we can't actually delete the branches in git without unreachable commits being garbage collected.

starseeker (Jul 16 2020 at 21:55):

First number in both lists is the timestamp, so they're sorted chronologically. SVN has commit ids, and git has sha1 hashes. Then for both commit message is shown, which usually gives a hint as to why there's no mapping in the other system.

Sean (Jul 16 2020 at 22:05):

starseeker (Jul 17 2020 at 00:08):

starseeker (Jul 17 2020 at 13:27):

/me pushes his luck by feeding the full 70k+ commit history through the git->fossil converter... curious to see if fossil can handle this.

Sean (Jul 17 2020 at 19:06):

of course it can, no reason it shouldn't. he's pretty consistent in making things robust to scale.

Sean (Jul 17 2020 at 19:18):

I know this is coming late and maybe we hashed it out earlier(??), but given the tooling issues, what about just stashing the cvs/svn rev info as the last line in the commit log?

Sean (Jul 17 2020 at 19:40):

@Daniel Rossberg per your e-mail, you're also welcome to use your brlcad.org alias (rossberg).. which can be pointed to anything, and can be claimed in your github account as an additional address.

starseeker (Jul 17 2020 at 20:12):

Took most of a day to run, but it did work - cool! I present BRL-CAD, in fossil:
brlcad-fossil.jpg

starseeker (Jul 17 2020 at 20:15):

Theoretically possible to stash it there, but once we do we lose the trivial 1-1 commit message correspondence with the earlier repositories. The latter is what let me generate the svn_list.txt and git_list.txt files above - I know we could work around adding the extra info, but the git notes appealed to me semantically (metadata on the commit, rather than part of the core message/data/parent relationship)

starseeker (Jul 17 2020 at 20:16):

Also, I can't incorporate it into the CVS portion of the history without trying to hack the cvs-git tool in some weird way - I'm taking their git output and assigning the notes with our ID numbers post-conversion, rather than during.

Sean (Jul 17 2020 at 20:19):

Once they're in git, the log messages can be edited, so CVS could still be annotated too.

starseeker (Jul 17 2020 at 20:20):

Editing the log messages is (I think) like editing the commit names - it will propagate invalidating the SHA1 hashes all the way up the chain.

Sean (Jul 17 2020 at 20:20):

I get the appeal, but the downsides are starting to dominate the more I work with it.

Sean (Jul 17 2020 at 20:21):

starseeker (Jul 17 2020 at 20:21):

Actually, when you asked the git list they gave us a theoretical way around that.

starseeker (Jul 17 2020 at 20:22):

I never tested it, but it's not a huge issue - in principle I could do a complete regeneration of all the notes information given timestamps and commit messages that match the CVS/SVN messages.

starseeker (Jul 17 2020 at 20:23):

Sean (Jul 17 2020 at 20:23):

right but that's the whole point -- it's really a half-baked feature that isn't working well. the log message is part of the commit and the only reliable place to stash it.

starseeker (Jul 17 2020 at 20:24):

I could do it for the SVN portion of the history, although there is a risk I'll break something - CVS is much harder.

starseeker (Jul 17 2020 at 20:28):

How much do you envision using that information? I was figuring the "svnrev" alias for the gitconfig file would cover the most common use case - check out an svn revision - and those ids would grow steadily less relevant with time... Is the part you're not liking that you don't get the notes in a default git clone?

starseeker (Jul 17 2020 at 20:36):

Actually, doing it even with the SVN history would be a substantial effort as I look at it - over 300 commits would have to be manually updated, plus the correct surgery on the C++ commit header generation code.

Sean (Jul 17 2020 at 20:36):

well let's see.. there's:
1) people have to be told that notes exist and use a command they've probably never used before to pull them
2) additional options that must be learned to work with them (e.g., --pretty=format: %N)
3) 72354 commits to add them that show up in log, have to be ignored or scripted around
4) the restriction that if we change any historic commit, we'll need to do surgery to reattach the note
5) the general feeling that notes are half-baked and they're not prioritized to change anytime soon
6) needing to have additional customizations/macros that have to be remembered, maintained, explained
7) the fact that it presents to users simply as the last line of the log, so it didn't really buy us more than logical separation
8) logical separation isn't compelling by itself as it could just as easily be stripped from logs (with less machinery than adding it)
9) the svn revs are not visible to an observer without it being explained...

Sean (Jul 17 2020 at 20:37):

I suppose #1 and #9 are related, but separate points on needing to know they exist, and needing to actively take steps to do something about it

starseeker (Jul 17 2020 at 20:37):

@Sean Another possibility is to write a utility to take the completed conversion and construct a new repository from that, incorporating the notes as commits.

starseeker (Jul 17 2020 at 20:38):

(essentially, "replay" the history again, but this time from git->git rather than through all the custom insanity.)

starseeker (Jul 17 2020 at 20:39):

That's probably the most practical option by a long shot, actually, now that I think about it.

starseeker (Jul 17 2020 at 20:40):

Sean (Jul 17 2020 at 20:40):

I actually envision using it on the regular for at least a while until references in trackers and notes and other places become less frequent.. but again, I don't need machinery to do that. I just need it somewhere. A file in the repo would work if the revs didn't change. Since they do, the log becomes the next best place I think. One can grep a log and grab a sha.

starseeker (Jul 17 2020 at 20:41):

So you're wanting something robust even to a full history rewrite, if it comes to that?

Sean (Jul 17 2020 at 20:41):

starseeker (Jul 17 2020 at 20:42):

Not really, git doesn't have the same notion of branch specific histories that svn does

starseeker (Jul 17 2020 at 20:42):

If you want the ability to find the commits made to a branch, and only those commits, you need the branch notes

starseeker (Jul 17 2020 at 20:43):

I think I've got a link somewhere that explains how that works - it's a low level consequence of Git's world view

Sean (Jul 17 2020 at 20:43):

I do recall the conversation a while back
I guess I've just not needed to know that specifically

Sean (Jul 17 2020 at 20:45):

and can't it be derived? I mean I can pull a git tree view and see all the commits on that branch

Sean (Jul 17 2020 at 20:46):

it's of course squirrelly when commits are cherry picked over, but from svn's perspective, they would have presented as being made on the branch too,
unless one peeks at the mergeinfo

starseeker (Jul 17 2020 at 20:46):

You'll see the commits, but git doesn't retain the origin branch for the commit. Once the commit is referenced by multiple branches, they're equal - there's nothing that remembers what the "first" branch was. It will work up to a point, but once you start merging multiple directions between branches you lose the origin information

Sean (Jul 17 2020 at 20:47):

starseeker (Jul 17 2020 at 20:48):

Sean (Jul 17 2020 at 20:48):

and even then, I'm not sure what knowing the branch is going to help with. knowing the committer, sure. knowing when or a commit message saying why, sure.

starseeker (Jul 17 2020 at 20:49):

I sometimes want it to know if a particular change took place while I was working in a topic branch, or whether the change took place in trunk.

Sean (Jul 17 2020 at 20:49):

starseeker (Jul 17 2020 at 20:50):

@Sean If I remember correctly, you can see the issue by trying to look at the history of the bullet branch - use git's own tools, and then the method I have in the NOTES file using the branch notes.

Sean (Jul 17 2020 at 20:51):

but that's my point, if you annotate the line and find the hash, and look at the first instance on a git tree, won't you know that?

starseeker (Jul 17 2020 at 20:51):

If I'm trying to review what was done in the branch, but I've merged in trunk/master, it gets hard because suddenly a whole bunch of "master" commits are now part of that branch's history, interwoven with the commits made on the branch

starseeker (Jul 17 2020 at 20:52):

I'd have to try that for an individual commit, but if both branches that reference the individual commit are older than the commit itself I don't think you can distinguish which one created it.

starseeker (Jul 17 2020 at 20:55):

Another concrete case - if I want to look at the original development of the CMake build system in the cmake branch, in SVN I can log just in that branch and not see any trunk commits that happened while that branch was live. In Git, once I merged the cmake branch back into master, suddenly all the master commits that took place while the cmake branch was live are effectively part of the history of both branches.
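
For comparison, one common approximation with stock git is to follow only first parents, which treats each merge from another branch as a single commit instead of pulling in everything from the merged side. This is a sketch, not the `logb` or `logsvnb` aliases mentioned in this thread (their definitions are not shown here), and the function name is made up. It only works when merges were made with the branch as the first parent, and it cannot tell where the branch started, so it does not replace the branch notes.

```shell
# List commits reachable from a branch by following only first parents.
# Extra arguments (for example `--not master`) are passed through to git log.
branch_first_parent_log() {
    branch=$1
    shift
    git log --first-parent --pretty=format:'%h %ct %s' "$branch" "$@"
}
```

Something like `branch_first_parent_log bullet --not master` would then show the bullet-side commits that master does not contain, which stops being useful once the branch has been merged back.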

Sean (Jul 17 2020 at 20:55):

I'm still not seeing how that's a problem that needs to be solved. So commits are interwoven... that means cherry picking might be hard. It probably means I should merge more frequently or will make me merge less frequently or, better yet, not be working on a branch for a long time.

starseeker (Jul 17 2020 at 20:56):

It makes it hard for me to follow the commit history of a particular feature's development, without interference from commits in other branches. If I'm the only one that has the problem it doesn't matter particularly, but that was my motivation since it is something that can be done now in SVN (and I have done on occasion).

Sean (Jul 17 2020 at 20:59):

I may use it more than I realize, but I'm still struggling to come up with a case where knowing the branch is going to change my behavior or awareness on something. I'm usually wondering "who wrote this chunk of code, why was it written". I suppose knowing a branch might help indicate that but to date the info's either not existed or come from log messages because branch use has historically been big isolated things.

starseeker (Jul 17 2020 at 21:00):

Right - that's the point though, in Git we lose that isolation. Hang on, let me see if I can give you a concrete example with bullet...

Sean (Jul 17 2020 at 21:00):

like I might consider the binary attributes or opencl branches, they both have lots of changes, so it might be nice to know what changes aren't on trunk

starseeker (Jul 17 2020 at 21:01):

@Sean do you want me to start trying to figure out how to replay the history and consolidate the notes into the commit message?

Sean (Jul 17 2020 at 21:01):

but then maybe I should check out those histories, because I expect the tree view to clearly show what was done on the branch

starseeker (Jul 17 2020 at 21:02):

I may be missing something - see if you can use (say) gitk to visualize the history of the bullet branch

starseeker (Jul 17 2020 at 21:03):

(by the way, for general history browsing I generally use gitk --branches="*" to avoid seeing the notes commits)

Sean (Jul 17 2020 at 21:04):

okay, so then I'm convinced it's worth keeping for now -- the branch info -- if only because we have a dozen branches with work worth isolating and if it helps isolate them, fair enough

starseeker (Jul 17 2020 at 21:06):

OK, so in the NOTES file I have two aliases defined - logb and logsvnb. The former tries to use git's "standard" information to follow the branch history, and the logsvnb alias uses the notes.

starseeker (Jul 17 2020 at 21:07):

starseeker (Jul 17 2020 at 21:08):

(you can also do what the aliases are doing in scripts, that was just an easy way for me to achieve the result)

Sean (Jul 17 2020 at 21:09):

Sean (Jul 17 2020 at 21:10):

starseeker (Jul 17 2020 at 21:13):

starseeker (Jul 17 2020 at 21:14):

Sean (Jul 17 2020 at 21:15):

starseeker (Jul 17 2020 at 21:16):

Oh, sorry - I figured for the docs to put in a fully populated .gitconfig file as an example, but I haven't assembled it yet (if we decide not to keep the notes in this form it's moot anyway).

Sean (Jul 17 2020 at 21:16):

I know this is all one-time setup, but it really does feel clunky -- I think if we can make it work as the last two lines of the log message, we should, and most if not all of this custom setup can go away

starseeker (Jul 17 2020 at 21:16):

Sean (Jul 17 2020 at 21:17):

well there simply won't be 72k commits that sometimes appear and have to be explained/ignored/parsed over/etc

Sean (Jul 17 2020 at 21:17):

Sean (Jul 17 2020 at 21:18):

starseeker (Jul 17 2020 at 21:19):

Sean (Jul 17 2020 at 21:20):

Sean (Jul 17 2020 at 21:21):

starseeker (Jul 17 2020 at 21:21):

Sean (Jul 17 2020 at 21:21):

and are predominantly at the end of the git log --all listing, but then are partially interwoven ...odd ordering

Sean (Jul 17 2020 at 21:26):

I wish git embraced a feature like svn attributes. I think mercurial supports arbitrary key/value attributes on their objects. sigh

starseeker (Jul 17 2020 at 21:35):

Sean (Jul 17 2020 at 21:38):

you sure it's not easier to update the tooling? seems like it should be easier to not write notes and simply append to the log messages as they are committed.

Sean (Jul 17 2020 at 21:41):

I think I could also probably write a script that adds them to the existing log if that'd help

starseeker (Jul 17 2020 at 21:42):

I've got over 300 manually adjusted commits which would have to be updated by hand (and being off by one character length in any of them will halt the commit) - plus it's now been close to a year since I've mucked in the code that generates the commit headers. And that's still just the SVN portion of the history - I'd need something like git-filter-repo anyway to get the CVS version.

Sean (Jul 17 2020 at 21:43):

starseeker (Jul 17 2020 at 21:44):

Correct - cvs-git generates that, I then post-process it to match SVN commits to CVS->GIT commits

starseeker (Jul 17 2020 at 21:44):

Sean (Jul 17 2020 at 21:44):

Sean (Jul 17 2020 at 21:45):

starseeker (Jul 17 2020 at 21:46):

/me nods - I could have put the svn numbers in the commit messages when I was originally writing that code - in fact I considered it - but it wouldn't have been a universal solution and it complicated the commit message mappings, which had to happen for CVS anyway.

Sean (Jul 17 2020 at 21:46):

it wasn't really apparent the burden or full implications until working with it more

starseeker (Jul 17 2020 at 21:47):

If you want to help, you could take a look at https://github.com/newren/git-filter-repo/ and see if that provides enough power to rewrite the history by pulling the note (if any) from each commit and appending it to the commit message.

starseeker (Jul 17 2020 at 21:50):

The notes associate the information with the commit, so the problem becomes to (for each commit) retrieve the information and assemble the new commit message. Then, it needs to be applied and the history above it rewritten to accommodate the new sha1.
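
The "assemble the new commit message" part of that could look roughly like the sketch below. It only covers the text handling: the function name is hypothetical, and how it gets driven per commit (git-filter-repo was the tool suggested above) and how the history on top is rewritten are left open.

```shell
# Read a commit message on stdin and print it with the note text ($1)
# appended after a blank line. An empty note leaves the message unchanged.
append_note() {
    note=$1
    msg=$(cat)
    if [ -z "$note" ]; then
        printf '%s\n' "$msg"
    else
        printf '%s\n\n%s\n' "$msg" "$note"
    fi
}
```

The note text for a given commit would presumably come from `git notes show <sha>`, with commits that have no note passed an empty string.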

starseeker (Jul 17 2020 at 21:51):

Even with a well tuned process that'll be quite slow, especially for the older commits...

starseeker (Jul 17 2020 at 21:53):

@Sean if you're OK with a mapping file, what about a mapping file for timestamp plus commit message to SVN id? That should be robust if we can supply a way to look up a given commit using those inputs, even if we skip the notes

starseeker (Jul 17 2020 at 21:53):

(by the way, since a default git clone from github doesn't pull the notes, they're not going to be an issue for people unless they go looking for them...)

Sean (Jul 17 2020 at 21:54):

I would just shell script it myself, something like:
oldmessage="$(git log ...)"
git commit --amend -m "$(printf '%s\nsvn:revision:%s' "$oldmessage" "$revision")"

starseeker (Jul 17 2020 at 21:55):

Sean (Jul 17 2020 at 21:56):

I can, but it might delay things for monday -- still working through yesterday's validation check and need to create a few more accounts for the final upload

starseeker (Jul 17 2020 at 21:57):

If we need to figure out another solution that involves the conversion process, Monday is shot anyway...

Sean (Jul 17 2020 at 21:58):

a couple other things I wanted to test too, like what happens if we garbage collect -- are there any orphans now?

starseeker (Jul 17 2020 at 21:58):

Sean (Jul 17 2020 at 21:58):

also, what happens after deleting all the note commits. . and then garbage collecting. is there more to clean up.

Sean (Jul 17 2020 at 21:59):

right, I know -- that's just one of a couple dozen validation things to check on my list

Sean (Jul 17 2020 at 21:59):

starseeker (Jul 17 2020 at 22:00):

/me shakes his head - I think we'd better not plan on Monday. You may find more issues, so let's just wait until you're either confident or have identified specifically where we need to end up to be ready.

Sean (Jul 17 2020 at 22:03):

We did originally plan for there being about 2 weeks of validation. I was going to try and cram as much as possible in 4 days :smile:

Sean (Jul 17 2020 at 22:04):

okay, time to stretch legs.. oof. giving myself nerve issues with so much sitting for months now.

starseeker (Jul 17 2020 at 22:05):

FWIW, this might generate a SHA1 independent map:
git log --all --pretty=format:"%ct%nGITMSG%n%B%nGITMSGEND%n%N%n"

Sean (Jul 17 2020 at 22:06):

oh nice, that eliminates the indentation too.... I was just going to sed that out, but this is better.

starseeker (Jul 17 2020 at 22:06):

We could just commit that, and then the notes wouldn't matter much... most clones wouldn't have them.

starseeker (Jul 17 2020 at 22:07):

That's one thing about git I unreservedly approve of over SVN - it is way way better about programmatic extraction of information.

starseeker (Jul 17 2020 at 22:10):

If you re-clone from github, without pulling the notes, your git log (and gitk) won't show the notes commits even with the --all option.

starseeker (Jul 17 2020 at 22:28):

Here's a demonstration of a command pair that can use a timestamp and message to checkout a specific commit:
sha1=$(git log -F --after=1047583133 --before=1047583133 --grep="* empty log message *" --pretty=format:"%H") && git checkout $sha1

Sean (Jul 18 2020 at 00:04):

Sean (Jul 18 2020 at 00:08):

good to know about the notes, so the extra commits wouldn't be ongoing nuisance unless someone pulls them. but if they're on the commit log, would there be a reason for keeping both? or is that not what you meant?

starseeker (Jul 18 2020 at 00:15):

If we generate a script that is capable of checking out the matching git commit without requiring the sha1, based on the timestamp and some or all of the commit message, then the git notes won't be needed anymore.

starseeker (Jul 18 2020 at 00:16):

I suppose we could strip them, but I'd rather leave them (at least in the primary github repo, even if we don't tell people to grab them by default) in case whatever script we come up with proves to have some sort of problem - then they'd be available as a fallback.

Sean (Jul 18 2020 at 00:20):

Won't they get disassociated when the commits get amended? guess we can find out..

starseeker (Jul 18 2020 at 00:22):

starseeker (Jul 18 2020 at 00:23):

If we eventually have to change the repo for some reason we'd have to either try the solution the git folks gave us or re-generate the notes, I suppose

Sean (Jul 18 2020 at 00:23):

starseeker (Jul 18 2020 at 00:24):

Sean (Jul 18 2020 at 00:24):

starseeker (Jul 18 2020 at 00:25):

To generate a shell script that can accept a SVN revision number as an input, and do the appropriate checkout based on timestamp and commit message matching to check out the corresponding git commit.

starseeker (Jul 18 2020 at 00:25):

starseeker (Jul 18 2020 at 00:26):

Then we won't need to worry particularly about notes, updating log messages, etc.

Sean (Jul 18 2020 at 00:26):

starseeker (Jul 18 2020 at 00:27):

It would hard code the timestamp and message associations into a case statement, which would use the SVN rev as the lookup key
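
As a rough sketch of what such a script might look like: a case statement keyed on the SVN revision returns the timestamp and message, and a second function feeds those to the same kind of `git log` lookup shown earlier. The function names and the two table entries are placeholders for illustration, not real BRL-CAD revisions, and `--all` is an addition here so commits on any branch can be found. For multi-line commit messages the table would need to hold a single distinctive line.

```shell
# Hypothetical lookup table: SVN revision -> "timestamp<TAB>commit message".
# The two entries are placeholders for illustration, not real BRL-CAD data.
svnrev_lookup() {
    case "$1" in
        101) printf '%s\t%s\n' 1000000001 'placeholder message one' ;;
        102) printf '%s\t%s\n' 1000000002 'placeholder message two' ;;
        *)   return 1 ;;
    esac
}

# Resolve an SVN revision to a git commit by timestamp and message, then check it out.
svnrev_checkout() {
    entry=$(svnrev_lookup "$1") || { echo "unknown SVN revision: $1" >&2; return 1; }
    ts=$(printf '%s\n' "$entry" | cut -f1)
    msg=$(printf '%s\n' "$entry" | cut -f2-)
    # --all is an addition here so commits on any branch can be found.
    sha1=$(git log --all -F --after="$ts" --before="$ts" --grep="$msg" --pretty=format:"%H" | head -n 1)
    [ -n "$sha1" ] || { echo "no git commit matches r$1" >&2; return 1; }
    git checkout "$sha1"
}
```

With a generated table in place, `svnrev_checkout 101` would check out the matching commit without needing the notes or the sha1.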

Sean (Jul 18 2020 at 00:28):

part of the issue was also one of simplicity and obviousness, not having to know some special knowledge to discover the svn rev or have it explained or documented

starseeker (Jul 18 2020 at 00:29):

Changing the commit messages is the most disruptive of all the options - are you sure it's worth it?

starseeker (Jul 18 2020 at 00:30):

You, Nick and I are probably the most likely to need SVN revs, and we're the most able to handle something less obvious...

Sean (Jul 18 2020 at 00:30):

Sean (Jul 18 2020 at 00:31):

Sean (Jul 18 2020 at 00:32):

Sean (Jul 18 2020 at 00:35):

with all the churn and back and forth, an e-mail change seems inevitable ... like if github suddenly becomes persona non grata and we move to gitlab. we might want/need to rewrite all those stupid github privacy aliases .. talk about f'ing vendor lock in.

Sean (Jul 18 2020 at 00:35):

starseeker (Jul 18 2020 at 00:35):

Sean (Jul 18 2020 at 00:36):

starseeker (Jul 18 2020 at 00:36):

Sean (Jul 18 2020 at 00:37):

unlikely to get the old devs like gary to associate an alias he's never used to his github account that he probably never uses.

Sean (Jul 18 2020 at 00:37):

starseeker (Jul 18 2020 at 00:39):

OK, I'll see if I can figure out the amending thing, since there are multiple potential applications/use cases. Just be aware I'm running out of steam, to a degree.

Sean (Jul 18 2020 at 00:40):

Sean (Jul 18 2020 at 00:43):

The usability implications have been somewhat jarring/unexpected, and simpler may be better. We're not losing anything.
And we probably could revert back to notes or attributes if some other feature ends up getting developed. I have to imagine something eventually will..

starseeker (Jul 18 2020 at 00:43):

/me winces. Once the repo goes live, a change of that sort will be disruptive for all forks even if we figure out how to do it.

Sean (Jul 18 2020 at 00:44):

it took us so long to get off sourceforge that github is bound to be obsolete soon.

Sean (Jul 18 2020 at 00:44):

starseeker (Jul 18 2020 at 00:45):

Um. Even then, in principle we could migrate the git repo without breaking forks, if I understand correctly - it would just be a change in origins. The breakage would be if we needed to change emails on old commits (as opposed to associating them with the new accounts, say...)

Sean (Jul 18 2020 at 00:46):

well, yeah -- I think that'd be implicit because of all the github-specific aliases. that only works on github.

starseeker (Jul 18 2020 at 00:46):

Heh - how many times did sourceforge get sold before they started having trouble? That might be a decent yardstick...

Sean (Jul 18 2020 at 00:46):

if github were shuttered, there'd be no way to authenticate/claim those addresses

starseeker (Jul 18 2020 at 00:47):

Sean (Jul 18 2020 at 00:47):

I think people just assume they'd rewrite their author names. I would if I were using one.

Sean (Jul 18 2020 at 00:48):

I find the idea of using a content provider's e-mail alias a bit wonky personally. Unless it's something "too big to fail" like gmail.com ... tech is notoriously unreliable, even fickle ... looking at you yahoo.com

starseeker (Jul 18 2020 at 00:50):

Erik (Jul 18 2020 at 00:58):

starseeker (Jul 18 2020 at 01:00):

To be honest, I'm not all that worried (on a personal level) about my commits showing up anywhere - they haven't for a decade, and I'll live if they don't... as long as the project's stats behave reasonably, whether it ties to my account is secondary.

starseeker (Jul 18 2020 at 01:00):

@Erik you mentioned your git fu being strong - we have a case where help would be appreciated, if you have any ideas

Erik (Jul 18 2020 at 01:01):

a git repo is a git repo, they can be rewritten, there is no "exporting", it just is

starseeker (Jul 18 2020 at 01:01):

We need to rewrite a git history to take those commits that have notes, and append them to the end of the commit message instead.

Erik (Jul 18 2020 at 01:02):

starseeker (Jul 18 2020 at 01:02):

words like append, rebase, filter-branch, and such hover around this question, but there are additional challenges - such as preserving the original timestamps while doing all this.
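
The rewrite being asked for can be sketched on plain data, independent of which git tool ends up doing the work. This is only an illustration under stated assumptions: a commit is modeled as a dictionary with hypothetical `author_date`, `committer_date`, and `message` keys, and `fold_note_into_message` is a made-up helper name. The point is that the note text moves to the end of the message while both dates are carried over untouched.

```python
def fold_note_into_message(commit, note):
    """Return a copy of commit with note appended to the end of its message.

    The copy keeps every other field (including both dates) unchanged;
    the input dictionary is not modified.
    """
    new_commit = dict(commit)
    if note:
        body = commit["message"].rstrip("\n")
        # Separate the original message from the note with one blank line.
        new_commit["message"] = body + "\n\n" + note.strip() + "\n"
    return new_commit
```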

starseeker (Jul 18 2020 at 01:04):

we are striving for a degree of fidelity in history preservation that I conclude is somewhat unusual among git users...

Erik (Jul 18 2020 at 01:04):

there are several types of dates kept in git.. author date, commit date, merge date... um, read all the formatting options in the pretty printing section of man git-log

Erik (Jul 18 2020 at 01:05):

starseeker (Jul 18 2020 at 01:07):

Is there some sort of standard "advanced" script for a situation like this, needing extensive (and non-unique) commit msg updates?

starseeker (Jul 18 2020 at 01:08):

If push comes to shove I can manipulate the data at whatever level is required, but it would be nice if there's a pre-packaged answer...

starseeker (Jul 18 2020 at 01:08):

Erik (Jul 18 2020 at 01:10):

there's a hooks directory that can be used for crap like this, the script just has to do one commit, then ask git to clone using it, or filter if you want to try to do it in place, or whatever. Or just write a script to iterate the commits and --amend them. Or ...

Erik (Jul 18 2020 at 01:11):

starseeker (Jul 18 2020 at 01:26):

It's not the breaking it, it's the 200 iterations of breaking it before I manage not to break it...

Daniel Rossberg (Jul 19 2020 at 15:11):

Another issue is that my brlcad.org address is dead. It points to a sourceforge address, which doesn't accept mail from outside. ~/.forward doesn't seem to work.

Sean (Jul 20 2020 at 23:02):

@Daniel Rossberg your alias no longer points to any sourceforge addresses -- they were all updated recently for everyone for that very reason.

Sean (Jul 20 2020 at 23:03):

starseeker (Jul 21 2020 at 02:38):

@Sean As long as I'm doing this anyway, would you prefer a different format for the SVN revision and branch info than what I was using? It wouldn't be too much more work to change the formatting once I get the initial logic working, if you would prefer something different.

Sean (Jul 21 2020 at 02:55):

Another issue is that my brlcad.org address is dead. It points to a sourceforge address, which doesn't accept mail from outside. ~/.forward doesn't seem to work.

And of course, it's not a problem to keep it on your github address either, it can be whatever you want. Just was letting you know it was an option. The aliases are DNS MX records, so they are aliased before they even hit a mail server.

Sean (Jul 21 2020 at 04:17):

starseeker (Jul 23 2020 at 04:05):

Sean (Jul 23 2020 at 04:07):

starseeker (Jul 23 2020 at 04:38):

starseeker (Jul 23 2020 at 04:39):

starseeker (Jul 23 2020 at 04:40):

Sean (Jul 23 2020 at 04:44):

Sean (Jul 23 2020 at 04:45):

starseeker (Jul 23 2020 at 05:17):

Sean (Jul 23 2020 at 05:18):

still got a bit more validation too, but yeah, the last couple names I got were huge wins

Sean (Jul 23 2020 at 06:16):

this is looking good. we're up to 74% of authors - 64 of 86 - and will be up to at least 95% commits after the next run. that should do it!

Sean (Jul 23 2020 at 06:16):

Sean (Jul 23 2020 at 06:21):

there are flags on just three accounts with anomalies that I'll need to investigate. one with too few, one with way too many, and one not linking to their github

starseeker (Jul 23 2020 at 14:14):

Here's an upload with all the bells and whistles - converted all the emails as of account-maps earlier this morning, notes consolidated into commit messages, and just for grins I also wrapped single line commit messages to 72 chars:

starseeker (Jul 23 2020 at 14:18):

Needs a validation check to make sure I didn't accidentally mess something up in the processing, still...

starseeker (Jul 23 2020 at 15:14):

starseeker (Jul 23 2020 at 16:01):

starseeker (Jul 23 2020 at 22:51):

starseeker (Jul 23 2020 at 23:00):

I think I'm pretty much out of stuff I know still has to be done (aside from pulling newer commits of course) - let me know if you spot anything else.

starseeker (Jul 24 2020 at 00:35):

starseeker (Jul 24 2020 at 00:36):

starseeker (Jul 24 2020 at 00:40):

Sean (Jul 24 2020 at 01:22):

yeah, I've got very strongly mixed feelings about inserting newlines where they didn't exist. I feel that's just bad git presentation defaults. Apparently they can be overcome (e.g., default interactive pager is LESS=-S even though it can auto-wrap to screen correctly).

Sean (Jul 24 2020 at 01:24):

I have commits for three accounts to investigate, which I hope to finish up with tomorrow. I'm done with accounts -- we nearly got everyone that made at least 100 commits (woot!). We're definitely getting super close.

Sean (Jul 24 2020 at 01:24):

The log additions for branch and revision look like they were flawless. Trying to find one with no log message to see what it did...

starseeker (Jul 24 2020 at 10:09):

@Sean That was my experience with wrapping - gitk I know can deal with it, but doesn't by default??? (I can only conclude that it's a deliberate design decision, given the feature does exist and works...)

I've got notes somewhere on which option to set (at least for gitk), which we'll probably still want to advise people to do regardless because there are some cases I don't detect as wrappable.

My motivation for wrapping was twofold - 1) if we wrap lines, we'll get better behavior for new users with default tool settings and 2) interfaces/websites/tools that assume "standard" git commit message settings may behave better.

It's quite literally an option in the post-processing tool, so trivial to disable if you decide we shouldn't wrap them.
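
A rough sketch of the wrapping option described above, restricted to single-line messages. It uses Python's standard `textwrap` module as a stand-in, which is an assumption: it is not the algorithm the post-processing tool uses. `wrap_single_line` is a hypothetical name, and long tokens such as URLs are deliberately left unbroken.

```python
import textwrap


def wrap_single_line(message, width=72):
    """Wrap a one-line commit message at width columns.

    Multi-line messages and messages that already fit are returned unchanged.
    """
    stripped = message.rstrip("\n")
    if "\n" in stripped or len(stripped) <= width:
        return message
    wrapped = textwrap.fill(
        stripped,
        width=width,
        break_long_words=False,   # keep URLs and long identifiers intact
        break_on_hyphens=False,   # do not split hyphenated tokens
    )
    return wrapped + "\n"
```

Making the width a parameter keeps the choice of column (72 here) easy to revisit.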

starseeker (Jul 24 2020 at 10:12):

starseeker (Jul 24 2020 at 10:16):

git log --invert-grep --grep="svn:revision" will list the ones without an svn tag

starseeker (Jul 24 2020 at 10:20):

starseeker (Jul 24 2020 at 10:21):

starseeker (Jul 24 2020 at 10:23):

starseeker (Jul 24 2020 at 11:24):

Ah, whoops - sorry Sean, just messed up that repo with an experiment. Hang on, creating a new one.

starseeker (Jul 24 2020 at 11:33):

Sean (Jul 24 2020 at 21:08):

Sean (Jul 24 2020 at 21:09):

Sean (Jul 24 2020 at 21:10):

starseeker (Jul 24 2020 at 21:39):

starseeker (Jul 24 2020 at 21:41):

starseeker (Jul 24 2020 at 21:42):

Sean (Jul 25 2020 at 01:09):

starseeker (Jul 25 2020 at 01:45):

starseeker (Jul 25 2020 at 01:49):

starseeker (Jul 25 2020 at 01:51):

/me should confirm the svn-fast-export method is working for the other repos, actually - been a while since I tested that.

starseeker (Jul 25 2020 at 14:27):

starseeker (Jul 25 2020 at 14:28):

starseeker (Jul 25 2020 at 14:29):

I've not run it myself, beyond a few commits to see if it looks like it's working - it will be very slow, and there may be more optimal ways to go about checking - this is very much a brute force approach.

starseeker (Jul 25 2020 at 18:38):

We can compare the CVS portion of the history as well if you want to, but I'm not sure what we'd do about any discrepancies - I'm just using the output from cvs-fast-export, so any changes would be quite difficult.

starseeker (Jul 25 2020 at 18:40):

And in that case the true "ground truth" would actually be the equivalent CVS checkout, if we can map the svn revisions back to CVS in some fashion.

starseeker (Jul 25 2020 at 23:02):

starseeker (Jul 26 2020 at 01:43):

@Sean when you said there were authors you need to check, was that in the conversion or the Github integration?

Sean (Jul 27 2020 at 13:59):

Sean (Jul 27 2020 at 14:03):

I hope to have them all inspected today, will ask if there are questions. It's more a matter of tracing down all their master commit shas and seeing what the delta is against their trunk commits, to make sure all the differences can be explained.

Sean (Jul 27 2020 at 14:05):

I'll just note that is your assumption, and not one I would make. Yet you're using it to justify a subsequent decision that all have to live with.

We will also because I have little intention of manually injecting newlines once we're on github for command-line commits. I don't do it for other git repos and don't plan to on ours either except when the commit warrants a longer description and I'm in an editor.

That said, these are sound reasons, save for the caveat I just stated -- that it just means the historic commits might be pretty but not the more recent ones.

It's quite literally an option in the post-processing tool, so trivial to disable if you decide we shouldn't wrap them.

Sean (Jul 27 2020 at 14:11):

I'll just note that is your assumption, and not one I would make. Yet you're using it to justify a subsequent decision that all have to live with.

From my perspective, this is a feature that git and github have wrong. Line wrapping is a presentation issue that is trivially handled by apps. Other distributed vcs didn't make the same decisions, and if we'd picked another we wouldn't even be having this consideration. Which is to say that it's possibly something we'll regret in the future when we migrate to git's successor. Unfortunately, it's trivial to add newlines but it's not trivial to remove them.

That said, these are sound reasons, save for the caveat I just stated -- that it just means the historic commits might be pretty but not the more recent ones.

I don't feel that strongly to oppose it. I do like it neat and tidy though it begs a couple questions (like what column did you wrap on? what about things like URLs? what about punctuation? ..).

It's a little concerning that it's not preserving what actually was written. It's slightly complicating the review process because they don't match and I have to do additional scripting (but I'll deal, just slows things down). Those are not strong enough to argue against though. I think you said you limited it to commits that had only 1-line comments? That's probably a good balance.

Sean (Jul 27 2020 at 14:27):

Github actually appears to handle the long lines just fine (ellipses on presentation). This really is just a git tooling convention / defaults issue. I think I even read how one can make git log behave for a different format line.

But again, not enough to fight against it, just sharing my perspective. I probably wouldn't, but if you want to inject them on the single line commits, I won't fuss too much. :)

Sean (Jul 27 2020 at 16:26):

Hm... one possibility comes to mind. Putting svn:log:wrapped could be used to denote the ones wrapped, which would then make them invertible and an encoding of the original data.
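
One way to picture the marker idea is a wrap/unwrap pair. This is a sketch under assumptions: the marker sits on its own last line after a blank line, the original message was a single line with words separated by single spaces, and `textwrap` stands in for whatever wrapping the tool performs. `wrap_with_marker` and `unwrap` are hypothetical names.

```python
import textwrap

MARKER = "svn:log:wrapped"


def wrap_with_marker(message, width=72):
    """Wrap a long one-line message and record that fact with MARKER."""
    line = message.strip()
    if "\n" in line or len(line) <= width:
        return message
    wrapped = textwrap.fill(
        line, width=width, break_long_words=False, break_on_hyphens=False
    )
    return wrapped + "\n\n" + MARKER + "\n"


def unwrap(message):
    """Recover the original single line from a message tagged with MARKER."""
    lines = message.rstrip("\n").split("\n")
    if lines[-1] != MARKER:
        return message
    # Drop the marker and blank separator, then rejoin the wrapped lines.
    body = [text for text in lines[:-1] if text]
    return " ".join(body)
```

Only messages carrying the marker are touched by `unwrap`, so commits that were never wrapped pass through as written.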

starseeker (Jul 27 2020 at 16:43):

Didn't intend it to be a justification - more an assessment of likelihood of it being changed.

Point. I wasn't strongly advocating for it - I just put it in the test conversion as a demonstration of what I could achieve if it was of interest.

Which actually argues against doing it - don't want newer stuff to look "worse" in some sense.

Column 72 - used the "TextFlow" algorithm, which I gather is similar to what editors do for word wrapping.

I wish you'd said something, I could have generated another version without the wrapping. (Still can, for that matter...)

The posted version is not the final version anyway... Over the weekend I think I figured out how to actually audit and fix the CVS era commits so the git checkout for each commit will match what cvs would produce (still testing, and will take a while to run, but initial results are promising.)

Sean (Jul 27 2020 at 17:07):

Column 74 is I think the minimum only because Git defaults to presenting 4 char indents on log output.

Sean (Jul 27 2020 at 17:08):

Only slightly. But each upload has meant I need to regenerate my list of comparison hashes... ;)

Sean (Jul 27 2020 at 17:09):

On the latter point, what do you think about having the log tags actually denote cvs:revision:### (in addition to the svn revision) for the cvs portion?

Sean (Jul 27 2020 at 17:10):

if there's a way to record the actual account name used (not just the mapped account name) for both cvs and svn, that would be a nice-to-have preservation. if not, no biggie.

Sean (Jul 27 2020 at 17:13):

Sean (Jul 27 2020 at 17:15):

by the way, I updated https://brlcad.org/wiki/Github_Migration with all the migration steps as I'd envisioned them. I may have forgotten a step or two, but I think most of it is there. I did try to make sure it incorporated all the points you mentioned in your (more elaborate) discussion.

Sean (Jul 27 2020 at 17:16):

Of course, some of the verification steps may cause more verification steps, but it's got the gist of what's needed.

starseeker (Jul 27 2020 at 20:05):

Growl... well, I can change CVS era commits but auditing them is proving trickier than I'd hoped in some ways... specifically, what do I check out from CVS when Git says a particular commit is on a dozen branches?

starseeker (Jul 27 2020 at 20:17):

About cvs:revision - correct me if I'm wrong, but did the CVS tool actually have revision numbers? I thought all we had was the numbers SVN assigned various commits when the cvs2svn migration occurred.

starseeker (Jul 27 2020 at 20:18):

I'm checking out by date and -r tag (when trunk/master isn't available) - is there another option?

starseeker (Jul 27 2020 at 20:19):

Are the CVS commit names different from the SVN authors? I'd been assuming a 1-1 mapping there, but perhaps I'm wrong?

starseeker (Jul 27 2020 at 20:40):

I suppose one possibility might be to add the cvs checkout lines corresponding to each commit...

starseeker (Jul 27 2020 at 20:42):

Sean (Jul 27 2020 at 21:00):

revisions in cvs are per file -- akin to git. there is no global number like svn.

Sean (Jul 27 2020 at 21:02):

At least, there's a swath of names that only exist in cvs, a swath that exist in cvs and svn, and a swath that are only in svn

Sean (Jul 27 2020 at 21:04):

Sean (Jul 27 2020 at 21:06):

so for example, Markowski had commits as 'mmark' under rcs and 'mm' under cvs (or vice versa). I had commits as 'morrison' under cvs, never as that via svn though.

Sean (Jul 27 2020 at 21:08):

not a terrible loss, but it'd be really cool if we could preserve that original commit account name per commit. there's some semantic repo history that would be preserved just by knowing the name.

starseeker (Jul 27 2020 at 21:56):

So if we can ID which account names are unique to CVS, we could flag them. A quick check shows svn:account:mmark and svn:account:mm both present in the conversion, so the names made it. A cvs prefix could probably be added based on which commits originally came from the CVS conversion - I'd have to think about that, but it's probably possible.

starseeker (Jul 27 2020 at 21:56):

Actually, that might be best - just prefix with cvs:account or svn:account based on which VCS the commits came from.

starseeker (Jul 27 2020 at 21:57):

revisions will always be svn:revision (those commits that have it) since the numbers came from SVN.
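
The tagging rule in the last few messages can be sketched as follows. The tag spellings follow the ones quoted in this thread (`svn:revision`, `svn:account`, `cvs:account`); the function name, the tag order, and the revision number used in the usage note are assumptions for illustration only.

```python
def provenance_tags(origin, account, svn_revision=None):
    """Build the trailer lines for one commit.

    origin is "cvs" or "svn" and selects the account prefix; the revision
    line, when present, always uses the svn prefix.
    """
    if origin not in ("cvs", "svn"):
        raise ValueError("origin must be 'cvs' or 'svn'")
    tags = []
    if svn_revision is not None:
        tags.append("svn:revision:%d" % svn_revision)
    tags.append("%s:account:%s" % (origin, account))
    return tags
```

For example, `provenance_tags("cvs", "mm", 123)` yields a revision line with the svn prefix and an account line with the cvs prefix.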

starseeker (Jul 27 2020 at 21:58):

branches are trickier, but based on my experiences so far I'd rather just leave the svn branches alone - my brain hurts trying to sort out the various mappings, and I doubt it's terribly critical as long as git blame can walk back through the history successfully.

Sean (Jul 28 2020 at 01:46):

Actually since you mentioned it about mmark and mm, it looks like you already have it doing the right thing -- it's using the account username as originally committed for both svn and cvs. That's great!

Sean (Jul 28 2020 at 01:48):

That would be a pretty slick detail. We'd actually be able to distinguish three "generations" of commits, hah.

starseeker (Jul 28 2020 at 11:35):

I think I've found a way to associate the author ids (and cvs-fast-export's branch analysis) with the comments in the final conversion. I'll need to actually test applying the data in repowork, but I've got a script now that looks like it is successfully extracting the information. (misc/repoconv/cvs_info.sh)

starseeker (Jul 28 2020 at 20:50):

starseeker (Jul 29 2020 at 13:04):

@Sean Is there anything more I can do? I'm not sure it makes sense to have me do the check steps, since I'd basically be re-using the same logic I put together to do the conversion in the first place, but if there's anything that will move the process forward I'd like to help...

Sean (Jul 29 2020 at 14:26):

best you can do is probably just have a bit of patience, however frustrating.. :) you're right -- you can't / shouldn't verify since you may unintentionally dismiss or overlook something whereas someone else won't know to. Sumagna and I don't know your conversion logic at all, so this is nice indep validation. :)

starseeker (Jul 29 2020 at 14:33):

starseeker (Jul 29 2020 at 14:47):

I had to make one adjustment post brlcad_conv9 to get the spacing right for the CVS-only comments - should I upload that version of the repo?

starseeker (Jul 29 2020 at 14:49):

(I know you mentioned the changing sha1 values in the various versions was a pain, so I wanted to check...)

Sumagna Das (Jul 29 2020 at 19:17):

Sumagna Das (Jul 29 2020 at 19:19):

starseeker (Jul 29 2020 at 19:57):

Sumagna Das (Jul 29 2020 at 19:57):

starseeker (Jul 29 2020 at 19:59):

I'll let you know if one appears that would motivate a restart in the check, but unless someone finds an actual error I doubt it will be necessary at this point...

starseeker (Jul 31 2020 at 19:49):

Sumagna Das (Jul 31 2020 at 19:50):

somehow the local copy of the github repo got wiped and all i know was that the last revision being checked was 75007

Sumagna Das (Jul 31 2020 at 19:51):

starseeker (Jul 31 2020 at 19:51):

Sumagna Das (Jul 31 2020 at 19:52):

it starts first with the github repo commits, check them and then checkout the svn revision

Sumagna Das (Jul 31 2020 at 19:52):

starseeker (Jul 31 2020 at 19:56):

@Sumagna Das The github checkout has the svn revisions in the comments - could you just filter out any commits that have a number higher than 75007?
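
A small sketch of that filter, assuming each commit message carries its tag on a line of the form `svn:revision:NNNN`. The `(sha, message)` pairs and the function names are hypothetical; commits with no tag are simply skipped here, which may or may not be what the check script wants.

```python
import re

# Matches a tag line such as "svn:revision:75007" anywhere in the message.
SVN_REV = re.compile(r"^svn:revision:(\d+)\s*$", re.MULTILINE)


def svn_revision(message):
    """Return the SVN revision recorded in message, or None if absent."""
    match = SVN_REV.search(message)
    return int(match.group(1)) if match else None


def up_to(commits, last_rev):
    """Return the shas of commits whose SVN revision is <= last_rev."""
    kept = []
    for sha, message in commits:
        rev = svn_revision(message)
        if rev is not None and rev <= last_rev:
            kept.append(sha)
    return kept
```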

Sumagna Das (Jul 31 2020 at 19:56):

starseeker (Jul 31 2020 at 19:57):

Sumagna Das (Jul 31 2020 at 19:57):

starseeker (Jul 31 2020 at 19:58):

starseeker (Jul 31 2020 at 20:01):

Sumagna Das (Jul 31 2020 at 20:01):

Sumagna Das (Jul 31 2020 at 20:29):

Sumagna Das (Jul 31 2020 at 20:30):

my script starts checking from the commit which is checked out at the moment on the git repo

starseeker (Aug 09 2020 at 16:45):

Sean (Aug 09 2020 at 17:24):

Got through two checks the past week, looking good so far. Few more to go. Hoping we will be able to go live soon, maybe next weekend if these checks go well.

Sean (Aug 09 2020 at 17:24):

starseeker (Aug 09 2020 at 17:25):

starseeker (Aug 09 2020 at 17:26):

When I checked, all the SVN era differences I saw were when his script tried to compare brep-debug commits with trunk. CVS era is messier, as expected.

Sumagna Das (Aug 13 2020 at 07:01):

starseeker (Aug 15 2020 at 15:10):

starseeker (Aug 15 2020 at 15:11):

Sumagna Das (Aug 15 2020 at 16:42):

Sean (Aug 16 2020 at 18:14):

starseeker (Sep 02 2020 at 22:54):

Sean (Sep 03 2020 at 05:38):

not really a competition, it's been a full-stop shift to eye-bleeding commit reading for hours on end...

starseeker (Sep 03 2020 at 11:45):

/me winces. Well, hopefully the commit storm will be letting up after this for a while.

starseeker (Sep 03 2020 at 16:30):

starseeker (Sep 15 2020 at 17:26):

@Sean Just FYI, realized my updates were missing the svn commit ids for newer commits, in case you were using my github test conversion. New version, current as of last night with all commits, up at https://github.com/starseeker/brlcad_conv11

Sumagna Das (Sep 19 2020 at 15:44):

starseeker (Sep 19 2020 at 18:54):

Sumagna Das (Sep 19 2020 at 18:55):

starseeker (Sep 19 2020 at 18:56):

From a technical standpoint the main SVN->Git conversion is essentially complete (barring discovery of some significant, heretofore unnoticed problem).

The migration of the secondary data hasn't been as thoroughly explored - that'll probably be tricky, and hasn't (yet) been tested.

starseeker (Sep 19 2020 at 19:02):

starseeker (Sep 19 2020 at 19:05):

Ah, right, now I remember. Unfortunately I don't have admin privileges on BRL-CAD necessary to do the export...

Sumagna Das (Sep 19 2020 at 19:21):

or at least someone who has admin privileges who can give the exported stuff needed

starseeker (Sep 19 2020 at 19:22):

I don't think any of my old projects have anything to export in this department, certainly not on a scale like BRL-CAD's...

starseeker (Sep 19 2020 at 19:24):

@Sumagna Das One question I don't know the answer to yet is what the best way to handle unmerged patches is. On github they're pull requests, but on sourceforge they're patch files... I don't know off hand how we're going to handle patch file submissions to github. Have you seen anything about how people address that problem?

starseeker (Sep 19 2020 at 19:26):

Maybe the gosf2github script migrates them somehow, since it looks like sourceforge categorizes bugs, patches and feature requests as tickets...

Sumagna Das (Sep 19 2020 at 19:28):

starseeker (Sep 19 2020 at 20:13):

/me is not quite sure what gosf2github is talking about with setting up oauth... never done that before

starseeker (Sep 19 2020 at 20:37):

OK, I think the "Personal access tokens" will work, but the perl script is a bit cranky...

starseeker (Sep 19 2020 at 20:40):

Blegh. This begs for a detailed, step-by-step guide for folks unfamiliar with any of this...

starseeker (Sep 19 2020 at 20:46):

starseeker (Sep 19 2020 at 20:47):

Yeesh. I guess someone needs to experiment with this stuff on a test import of BRL-CAD in the org project...

starseeker (Sep 19 2020 at 20:48):

starseeker (Sep 19 2020 at 20:52):

Sean (Sep 19 2020 at 20:56):

starseeker (Sep 19 2020 at 20:59):

starseeker (Sep 19 2020 at 21:09):

starseeker (Sep 19 2020 at 21:20):

Essentially painless if we do it before pushing to github - I'll just note it in the CONVERT.sh script.

starseeker (Sep 19 2020 at 21:29):

starseeker (Sep 19 2020 at 21:31):

OK, looks like our contributions stats are still there after the default branch rename too.

Phew. Had a few bad moments wondering if we were going to have to re-run the whole thing again to get commits reassigned...

starseeker (Sep 25 2020 at 23:43):

Sumagna Das (Oct 13 2020 at 06:54):

Erik (Oct 15 2020 at 22:53):

starseeker (Oct 16 2020 at 02:48):

"main" appears to be the new convention. I like it - it's shorter and still starts with the same letters.

starseeker (Oct 20 2020 at 01:30):

Sean (Oct 20 2020 at 01:32):

I worked on it some this past weekend. Will update the checklist with things done tomorrow to see where we're at.

starseeker (Oct 20 2020 at 02:50):

starseeker (Oct 20 2020 at 02:51):

starseeker (Oct 20 2020 at 02:57):

starseeker (Oct 20 2020 at 03:12):

starseeker (Oct 20 2020 at 03:14):

starseeker (Oct 20 2020 at 12:42):

starseeker (Oct 27 2020 at 01:40):

starseeker (Oct 30 2020 at 15:19):

starseeker (Nov 09 2020 at 23:30):

Erik (Nov 11 2020 at 22:49):

is he doing that thing where he defers working on it a week every time someone bugs him about it? :D

Sean (Nov 11 2020 at 22:50):

starseeker (Dec 02 2020 at 18:34):

Sumagna Das (Dec 03 2020 at 13:49):

starseeker (Dec 03 2020 at 22:22):

starseeker (Dec 03 2020 at 22:26):

Sean (Dec 04 2020 at 02:01):

excellent, I'm going to try and push on it this friday now that a particular render task is finishing up.

Sean (Dec 04 2020 at 02:02):

starseeker (Dec 04 2020 at 13:07):

starseeker (Dec 05 2020 at 18:48):

@Sean Just wanted to check if you were/are able to push on the Git conversion - I can make more of an effort to keep the github repo in sync with SVN if it is helpful, but otherwise it's a little simpler to only do it every few hundred commits...

starseeker (Dec 06 2020 at 18:21):

starseeker (Dec 10 2020 at 19:05):

starseeker (Dec 17 2020 at 15:07):

starseeker (Dec 17 2020 at 15:08):

starseeker (Dec 17 2020 at 16:08):

Sean (Dec 17 2020 at 19:02):

Sean (Dec 17 2020 at 19:03):

starseeker (Dec 17 2020 at 20:32):

Sean (Dec 17 2020 at 20:32):

starseeker (Dec 17 2020 at 20:32):

Gotta say, I like the dark github theme - previously their website was the brightest thing on my desktop

starseeker (Dec 20 2020 at 20:08):

/me blinks - rsyncing the SVN repo from sf.net didn't complete. That's a new one...

starseeker (Dec 20 2020 at 21:00):

starseeker (Dec 20 2020 at 21:41):

@Sean barring something unforeseen, that's probably my last update of both SVN and Github for the year.

starseeker (Dec 20 2020 at 21:49):

I've not done the full cross platform distcheck-full hammering for release testing since the gqa multithreaded test will currently fail, but otherwise things are generally looking like they're in fairly good shape...

Aniket Khandagale (Dec 21 2020 at 12:22):

Hey, can I get the link for the github repo where I can look for beginner-level, easy-to-fix problems and try to fix them?

Thusal Ranawaka (Dec 21 2020 at 13:29):

Hey @Aniket Khandagale welcome to BRL-CAD Community, I think you can have a look at BRL-CAD Wiki www.brlcad.org/wiki and start with compiling BRL-CAD to your PC. You can find build instructions from here, https://brlcad.org/wiki/Building_from_SVN

Thusal Ranawaka (Dec 21 2020 at 13:31):

If you want assistance, ask from Community and also you can ask from Sean and starseeker.

Thusal Ranawaka (Dec 21 2020 at 13:32):

Sumagna Das (Dec 21 2020 at 17:43):

@Aniket Khandagale BRL-CAD is available on sourceforge (SVN). It is being migrated from sourceforge (svn) to github (git), so there is no (official) github repo. (There is one where the migration is happening, but it is not up to date and is behind the main repo by a couple of commits, or revisions, as per SVN terminology.)

Aniket Khandagale (Dec 21 2020 at 18:01):

Sumagna Das (Dec 21 2020 at 18:03):

Aniket Khandagale (Dec 21 2020 at 18:04):

scorp08 (Dec 26 2020 at 05:38):

@starseeker Is it possible to fork from the brlcad github and push stuff, or should we wait for the migration to finish?

starseeker (Dec 26 2020 at 14:55):

It's technically possible to fork, but the repository of record is still SVN at the moment. I'd recommend waiting for us to complete the migration.

Sean (Dec 26 2020 at 14:57):

Yes, best to wait or things could get messy when it comes time to switch it. I've been going through the repo so it hopefully won't be a long additional wait for folks.

starseeker (Jan 03 2021 at 20:39):

starseeker (Jan 16 2021 at 21:48):

starseeker (Jan 26 2021 at 19:02):

I'll post an announcement to the email list as well, but my plan is to lock the SVN repository sometime on Friday, Jan. 29th to finalize the repository contents for the Git conversion.

Sumagna Das (Jan 27 2021 at 02:35):

starseeker (Jan 27 2021 at 13:28):

@Sean is doing final review - I'm going to start uploading the secondary repositories while he finishes looking at the main repository.

Sumagna Das (Jan 27 2021 at 14:39):

starseeker (Jan 27 2021 at 15:03):

@Daniel Rossberg I've uploaded the svn-all-fast-export conversions of all the projects except BRL-CAD itself to https://github.com/BRL-CAD - can you take a look at rt-cubed and make sure it looks OK to you before anyone starts committing to it?

Daniel Rossberg (Jan 27 2021 at 16:31):

Git doesn't know empty directories, that's why they got lost from src/other/ogre. But, as far as I know, Ogre isn't used anywhere.

starseeker (Jan 30 2021 at 23:58):

Sean (Feb 01 2021 at 18:18):

I spent most of the weekend validating and reviewing. It's looking really fantastic to me. I have questions, but no show-stoppers. I actually got through the laundry checklist I'd written up to identify all the deltas as document discrepancies. Filed support request for --follow and doing one more pass through the log of missing commits now. Planning to upload repo myself today so I know the process, unless there's some reason not to.

starseeker (Feb 01 2021 at 22:57):

starseeker (Feb 01 2021 at 22:58):

Erik (Feb 02 2021 at 12:55):

if an empty directory is needed by git, typically a .do_not_delete file is touched
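
A minimal sketch of that convention: walk a source tree and drop a placeholder file into every directory that is completely empty, so git has something to track. The file name comes from the message above; the function name is hypothetical and the `.git` directory is skipped as a precaution.

```python
import os


def add_placeholders(root, name=".do_not_delete"):
    """Create an empty placeholder file in each empty directory under root.

    Returns the list of placeholder paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # never descend into git's own metadata
        if not dirnames and not filenames:
            path = os.path.join(dirpath, name)
            open(path, "w").close()
            created.append(path)
    return created
```

Running it a second time creates nothing, since the directories are no longer empty.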

starseeker (Feb 02 2021 at 13:10):

I'm not aware of any situation where we actually need an empty directory in the raw source repo - if nothing else, it's simple to have the build system or the code create such directories on the fly...

Erik (Feb 02 2021 at 13:11):

starseeker (Feb 02 2021 at 13:12):

@Erik It looked like your git isst repo had everything from SVN's isst as well - is that correct?

Erik (Feb 02 2021 at 13:14):

I'm sure it was just an import, I hope I tried to make the introduction commit as basic as possible and did the "tidy" as a next commit... it's been a few, yo :)

Erik (Feb 02 2021 at 13:15):

if not, I hope someone is archiving the svn repo and the latest snapshot, y'know, "just in case". I'm sure the DoD can still afford the bits :)

starseeker (Feb 02 2021 at 13:15):

Erik (Feb 02 2021 at 13:16):

starseeker (Feb 02 2021 at 13:17):

I may have to "top off" the SVN portion depending on whether I need to make more SVN commits before the final switch, but if some poor soul has to repeat the VCS conversions for whatever reason they should have the necessary inputs to work with.

starseeker (Feb 02 2021 at 13:18):

Sean (Feb 02 2021 at 18:08):

@starseeker still no show-stoppers but found a couple oddities. there are about 100 "*** empty log message ***" cvs commits that exist in git and svn but are missing the corresponding svn:revision:#### line. would that be because of timestamps or something else?

Sean (Feb 02 2021 at 18:09):

Sean (Feb 02 2021 at 18:13):

there are also 139 empty log message cvs commits in addition to the 100 that don't seem to be in git, but I'm writing them off as different cvs2git vs cvs2svn translation until I see evidence otherwise.

starseeker (Feb 02 2021 at 18:49):

@Sean I was probably somewhat hesitant about mapping SVN numbers to those commits - with such ambiguous messages, all I had to go on for those was the timestamps, and the git and svn conversions of CVS didn't always end up exactly mapping those.

starseeker (Feb 02 2021 at 18:51):

Looking at the logic, I don't know that I did a whole lot with the empty log message commits.

starseeker (Feb 02 2021 at 18:52):

starseeker (Feb 02 2021 at 18:54):

I see that I categorized some commits as "non-unique, has exact timestamp match" but I think without manual inspection of the diffs I wouldn't have had the confidence to assign them SVN ids

starseeker (Feb 02 2021 at 18:54):

Even the "Initial revision" commit assignments, which I did make, are a bit dubious

starseeker (Feb 02 2021 at 18:57):

starseeker (Feb 02 2021 at 19:02):

starseeker (Feb 02 2021 at 19:03):

starseeker (Feb 02 2021 at 19:12):

Looks like when I ran that it was against a git repo that didn't have newer 76300+ commits. The following is cleaned up and sorted:

starseeker (Feb 02 2021 at 19:12):

Sean (Feb 02 2021 at 19:48):

So 735 there is an example -- it's the first in my list. It's not got a timestamp match, so you didn't know which svn:revision that was (which is curious in itself)

Sean (Feb 02 2021 at 19:49):

Sean (Feb 02 2021 at 19:51):

under what conditions would cvs commits get or not get the svn:revision:### note?

starseeker (Feb 02 2021 at 19:56):

I know about SVN commit 735, but in the git repository I don't have a commit with an exact timestamp match with the same commit message

starseeker (Feb 02 2021 at 19:57):

What you're seeing in that log is a processing of a detailed log from SVN, combined with a log of available git commits.

starseeker (Feb 02 2021 at 19:58):

For that printout, all SVN commits were checked against what was/is available in git

starseeker (Feb 02 2021 at 20:04):

1) there exists an exact, unique commit message that is shared by an SVN commit and a Git commit
2) There exists an SVN commit with a non-unique commit message match that also shares an exact timestamp with a Git commit having the same commit message
3) The special case commit message "Initial revision" when there exists a Git commit with an exact timestamp match, and the timestamp match is outside the known "bad" range of early commits with unreliable timestamps.
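For anyone retracing this later, the three conditions above can be sketched roughly as follows. This is an illustration only, not the actual conversion logic in misc/repoconv: the function name, the tuple layout, the `bad_range` parameter, and the requirement that a timestamp match be unambiguous are all assumptions.

```python
from collections import Counter

def assign_svn_revisions(svn_commits, git_commits, bad_range=None):
    """Return {git_sha: svn_rev} for commits meeting one of the three conditions.

    svn_commits: list of (rev, message, timestamp)
    git_commits: list of (sha, message, timestamp)
    bad_range:   optional (start, end) of timestamps considered unreliable
                 (hypothetical parameter, used only for "Initial revision")
    """
    svn_msg_count = Counter(msg for _, msg, _ in svn_commits)
    git_msg_count = Counter(msg for _, msg, _ in git_commits)
    git_by_msg = {}
    git_by_msg_ts = {}
    for sha, msg, ts in git_commits:
        git_by_msg.setdefault(msg, []).append(sha)
        git_by_msg_ts.setdefault((msg, ts), []).append(sha)

    mapping = {}
    for rev, msg, ts in svn_commits:
        exact = git_by_msg_ts.get((msg, ts), [])
        if msg == "Initial revision":
            # Condition 3: exact timestamp match outside the unreliable range.
            in_bad = bad_range is not None and bad_range[0] <= ts <= bad_range[1]
            if not in_bad and len(exact) == 1:
                mapping[exact[0]] = rev
        elif svn_msg_count[msg] == 1 and git_msg_count[msg] == 1:
            # Condition 1: message is unique on both sides.
            mapping[git_by_msg[msg][0]] = rev
        elif len(exact) == 1:
            # Condition 2: non-unique message, but exact timestamp match.
            mapping[exact[0]] = rev
    return mapping
```

Anything that falls through all three branches stays unmapped, which is the situation described for r735.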

Sean (Feb 02 2021 at 20:05):

Yeah, that's odd. The timestamp you have for 735 is different in cvs2git from what svn had...

Sean (Feb 02 2021 at 20:05):

Sean (Feb 02 2021 at 20:06):

Sean (Feb 02 2021 at 20:07):

starseeker (Feb 02 2021 at 20:10):

I will readily admit I didn't delve into the details of how cvs2git and cvs2svn differed in their processing, so I can't say which one is right or better. For myself I wasn't worried about it - in some circumstance where precision for a given commit's timing mattered, I'd want to query CVS directly...

A word of caution - now that we're no longer using git notes for svn revision information, any updates to add more mappings are going to be difficult (not impossible, but it will be another custom processing implementation in repowork.)

Sean (Feb 02 2021 at 20:12):

No worries, not seeing any reason to reprocess anything -- just accounting to make sure nothing is missing. And simply trying to understand.

Sean (Feb 02 2021 at 20:14):

I was able to rule out all 81 "Initial revision" commits for example, as they're clearly all categorically present, just not labeled (likely your #3 above).

Sean (Feb 02 2021 at 20:16):

I think #1 may also be accounting for a lot of the 400 remaining. Many have a repeat commit message but that was made (sometimes seconds) later. If there's some discrepancy between the clock being used, that would also potentially account for more. I should know more definitively here in a bit.

starseeker (Feb 02 2021 at 20:17):

@Sean If you're doing the grunt work to go identify mappings manually, you may as well make a note of the mappings. If you're going to that degree of trouble, I might as well do the extra work to capture it in the commit messages...

starseeker (Feb 02 2021 at 20:21):

starseeker (Feb 02 2021 at 20:25):

One of the problems though is I don't expect some of them to have exact 1-1 mappings at all, since cvs2git may have grouped things differently.

starseeker (Feb 02 2021 at 20:33):

Pardon, my terminology was loose - the tool we're using is cvs-fast-export, not cvs2git.

Sean (Feb 02 2021 at 22:04):

Hm, yeah I have that info. I basically wrote two 1-liners to pull a diff of the missing svn revs and of all git revs, then a 1-liner to make sure they're all accounted for. I could make it print which commit is actually which missing rev.

starseeker (Feb 02 2021 at 22:33):

Sean (Feb 02 2021 at 22:37):

technically I'm comparing the md5 sum of just the changed/added/removed lines, but yeah. it was also needed to figure out which missing commits were because they were just propset changes.
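A rough sketch of that idea, assuming unified diffs from both `svn diff` and `git diff` as input. The function name and the header handling are guesses, not the actual 1-liners: only the +/- lines inside hunks are hashed, so file headers that differ between the two tools don't affect the result.

```python
import hashlib

# Lines that start a new per-file header section in git or svn diff output.
_HEADER_PREFIXES = ("diff ", "Index: ", "Property changes on: ")

def changed_lines_digest(diff_text):
    """MD5 over only the added/removed lines of a unified diff."""
    digest = hashlib.md5()
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith(_HEADER_PREFIXES):
            in_hunk = False          # back in header territory
        elif line.startswith("@@"):
            in_hunk = True           # hunk body follows
        elif in_hunk and line.startswith(("+", "-")):
            digest.update(line.encode("utf-8", "replace") + b"\n")
    return digest.hexdigest()
```

With this approach a property-only svn change has no hunk lines at all, so it hashes to the digest of empty input, which is one way such commits could be spotted.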

Sean (Feb 02 2021 at 22:37):

starseeker (Feb 02 2021 at 22:39):

starseeker (Feb 03 2021 at 02:22):

@Sean upon reflection, I'm second guessing myself - if I update the older commit messages, it changes all the sha1s again and arguably we would need to do more verification to make sure the new step didn't mess with anything. Maybe it's not worth it for the stray svn:revision tags?

starseeker (Feb 03 2021 at 03:17):

With a diff based approach, you might in principle be able to spot if any of the Initial Revision commits ended up mapped wrong despite exact timestamp matches...

starseeker (Feb 03 2021 at 03:18):

If I ended up assigning demonstrably incorrect numbers, that's probably worth fixing...

Sean (Feb 03 2021 at 03:21):

Sure, can revisit the decision -- my priority has been on finding / validating they're there somewhere. If they're all there and just not tagged, I agree that's less of a concern. I mean it'd be cool to have them all tagged, but that can happen at a later date even and we make everyone re-clone.

Sean (Feb 03 2021 at 03:22):

Hey, can you take a look at a couple commits and tell me what I'm seeing...
4850989e3a2f9624127ae043c6094076a60bc472 and 97d02527843ffb84f8bb3da0e64ef5f7db6df28c

starseeker (Feb 03 2021 at 13:40):

I'm not entirely sure what those are - some artifact of the cvs-to-git conversion process, obviously, but I'm not entirely clear on what they're trying to represent.

starseeker (Feb 03 2021 at 14:42):

I haven't considered trying to "clean up" any of the cvs era artifacts of the conversion, since I don't know which of them might be added to preserve content that would otherwise be garbage collected out.

Sean (Feb 03 2021 at 16:08):

at a quick glance, they look like the entire repository was deleted. they're the two largest commits in the git repo. they're fortunately in branches, but would be good to understand what's going on there because it smells like something went wrong

Sean (Feb 03 2021 at 16:09):

I recall the 7.0 branch and don't remember any sort of merge event like that happening ...

starseeker (Feb 03 2021 at 17:49):

starseeker (Feb 03 2021 at 17:51):

starseeker (Feb 03 2021 at 20:37):

@Sean FWIW, I think I've gotten the necessary piece in place to do the sha1;rev# updating successfully. I'd still want to run your diff check on the final results and probably inspect the updated commits to be sure, but a quick test with your 735 example succeeded.

starseeker (Feb 03 2021 at 20:38):

Sean (Feb 03 2021 at 20:40):

It would be nice to understand why either of those branches appear to wipe out everything (if that's indeed what happened), even if we're not going to do anything about it. I think that'd entail checking out one of those branches and looking at the commits before/after to see if there's an explanation. Not a show-stopper since they're on branches, but concerning from a data anomaly perspective.

Sean (Feb 03 2021 at 20:43):

Good to know about the sha/rev updating. At a quick glance, lookup succeeded on about 1/2 to 2/3rds of the commits missing. I'm looking at the ones that didn't match to see if they're actually missing or if there's something in the diffing method pooching things. There are a few dozen that map 1:many that we can either ignore or map manually by their date, but I wasn't going to worry about them.

starseeker (Feb 04 2021 at 01:13):

@Sean The immediate question is whether they did do that or cvs-fast-export is misinterpreting some aspect of the CVS data.

starseeker (Feb 04 2021 at 01:16):

FWIW, I think a685e85ff730450f669a0d853c69ef545c30b46f may be related to the 97d02527843ffb84f8bb3da0e64ef5f7db6df28c commit

starseeker (Feb 04 2021 at 01:17):

"remove the cvs tag relic" may be why the prior incomplete tag commit removed everything?

starseeker (Feb 04 2021 at 01:20):

Ah, wait a minute - I wasn't looking closely enough. "merge-to-head" incomplete tag (4850989e3a2f9624127ae043c6094076a60bc472) is an SVN era commit, and also seems to have an associated commit (dd2bb79965568f5aab4f7458606d875d22b74b40)

starseeker (Feb 04 2021 at 01:21):

starseeker (Feb 04 2021 at 01:28):

97d02527843ffb84f8bb3da0e64ef5f7db6df28c - Synthetic commit for incomplete tag release-7-0 - CVS era commit
a685e85ff730450f669a0d853c69ef545c30b46f - child of 97d02, SVN era commit. Message:
clearly not actually release 7.0 .. remove the cvs tag relic that was made on a few files just before the project was converted to open source. (svn branch delete)

4850989e3a2f9624127ae043c6094076a60bc472 - Synthetic commit for incomplete tag merge-to-head-20051223 - CVS era commit
dd2bb79965568f5aab4f7458606d875d22b74b40 - child of 485098, SVN era commit. Message:
move cvs branch tagging artifact removal (svn branch delete)

starseeker (Feb 04 2021 at 01:33):

So, my guess is that the CVS conversions (evidently both of them, cvs2svn and cvs-fast-export) found something in the data that prompted tagging. Based on the 7.0 message, it looks like a stray tag was on a few files, the converter interpreted that as a tag in Git that preserved only those files and removed everything else (hence the massive diff.)

Back in 2011, you did some cleanup on the SVN branches and spotted those as spurious. So, we've got the cvs-fast-export generated tags and associated branches, and then the 2011 SVN cleanup of the cvs2svn versions of the same thing.

starseeker (Feb 04 2021 at 01:38):

@Sean Your call how you want to handle the 1-many - I'm pretty sure I can handle that in the svn:revision assignment, as long as each sha1 maps to only one SVN rev.

starseeker (Feb 04 2021 at 03:12):

(I think if we delete the two branches in question from Git we can probably garbage collect them out, by the way - do we want to preserve that, or would it be better to remove?)

Sean (Feb 04 2021 at 03:20):

Sean (Feb 04 2021 at 03:21):

Yeah, okay I can see that happening and how it might have gotten interpreted -- a branch was tagged, the branch was removed, but a few stray files from that branch ended up remaining tagged/referenced, so it generated delete commits to preserve their lineage.

Sean (Feb 04 2021 at 03:23):

There were actually a dozen or two commits very similar to those, which is also why it was concerning (they were just the biggest two), but I think that fully explains them.

Sean (Feb 04 2021 at 03:24):

I think we can just ignore them for now. They're not the only ones, they just stood out during validation as potential processing corruption.

Sean (Feb 04 2021 at 03:24):

Erik (Feb 04 2021 at 14:30):

starseeker (Feb 04 2021 at 14:33):

@Erik Did folks manually edit CVS files to tag releases or some such? I know from what Sean said the history was edited at least once to deal with some Tcl/Tk issues (which can be seen comparing CVS checkouts vs git checkouts, actually)...

Erik (Feb 04 2021 at 14:35):

I have no recollection of manually tweaking CVS files for a release O.o I was slid off to muves3 around that time I think, I think 7 happened without me

Erik (Feb 04 2021 at 14:36):

I mostly just did fbsd support and autoconf before reassignment (plus a few side projects, uh, some parser for matrex federations, uh, something else for Geoff, too... )

starseeker (Feb 04 2021 at 14:37):

Fair enough. The more I see of all this the more grateful I am that I got to come on board just as SVN was introduced.

starseeker (Feb 04 2021 at 14:39):

CVS is... weird. At one point I even considered https://github.com/rcls/crap as an alternative to cvs-fast-export, since it seems to reproduce in Git what CVS checks out, but after discussions with Sean (and I think I noticed this myself at one point) I learned even CVS itself won't accurately check out some parts of our history (accurately in the sense of reproducing the tree that the users would have seen at the time) due to the edits made to work around the libtcl/libtk problems.

Erik (Feb 04 2021 at 14:41):

starseeker (Feb 04 2021 at 14:42):

<snort> I guess as a young whippersnapper I joined the software community too late to properly appreciate them. "RCS" to me mostly means annoying tags at the beginning of files that complicate diffing :-P

starseeker (Feb 04 2021 at 14:44):

Which is not to say I could have designed anything better than RCS back in the day, of course - I get the sense that VCS is one of those problems where only experience with the day-to-day requirements of the problems at scale can really result in good designs.

Sean (Feb 04 2021 at 15:29):

I don't know of CVS files being edited for releases. They were mostly edited to "fix" things CVS couldn't do, like renaming a directory or eliminating a bad commit.

Sean (Feb 04 2021 at 15:29):

@starseeker another something to investigate... do you know why this doesn't work?
git diff 2686445fedcfeadcbc8a2960fd8690f2d0ccbf47~1 2686445fedcfeadcbc8a2960fd8690f2d0ccbf47

Sean (Feb 04 2021 at 15:30):

show works, but can't diff it... somehow it doesn't have an ancestor or has multiple or ... ?

starseeker (Feb 04 2021 at 15:32):

Author: Douglas Kingston <dpk@randomnotes.org>
Date:   Fri Dec 16 00:10:31 1983 +0000

    Original 4.2 Distribution Source

    svn:revision:2
    cvs:account:dpk
    cvs:branch:trunk

Sean (Feb 04 2021 at 15:33):

starseeker (Feb 04 2021 at 15:33):

Sean (Feb 04 2021 at 15:33):

I was able to identify most of the missing revisions, but there are about 160 that didn't match, and when I investigated it was because git's show syntax doesn't match diff syntax for merge commits. manually looking at one of the ones that didn't match, it was indeed a merge commit that didn't match because of the format. so I regen'd the diffs but it barfed on that one.

starseeker (Feb 04 2021 at 15:34):

Sean (Feb 04 2021 at 15:34):

Sean (Feb 04 2021 at 15:35):

starseeker (Feb 04 2021 at 15:35):

Sean (Feb 04 2021 at 15:35):

starseeker (Feb 04 2021 at 15:35):

Sean (Feb 04 2021 at 15:35):

Sean (Feb 04 2021 at 15:36):

starseeker (Feb 04 2021 at 15:36):

git diff 4b825dc642cb6eb9a060e54bf8d69288fbee4904 2686445fedcfeadcbc8a2960fd8690f2d0ccbf47

Sean (Feb 04 2021 at 15:36):

starseeker (Feb 04 2021 at 15:36):

Sean (Feb 04 2021 at 15:37):

starseeker (Feb 04 2021 at 15:38):

Maybe ^! is a shorthand for that? Dunno, haven't encountered that syntax before - @Erik ?

Sean (Feb 04 2021 at 15:39):

I found that on some other SO but can't find it in the docs to know what it means.

starseeker (Feb 04 2021 at 15:39):

Erik (Feb 04 2021 at 15:40):

Sean (Feb 04 2021 at 15:40):

here we go: The r1^! notation includes commit r1 but excludes all of its parents. By itself, this notation denotes the single commit r1.
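For scripting the "just this commit" diff, one option is to fall back to the empty tree when a commit has no parent, as in the root commit above. This is a sketch with hypothetical helper names; it assumes the parent list comes from a line of `git rev-list --parents -n 1 <sha>` output (the commit id followed by its parents), and it always diffs a merge against its first parent.

```python
# Git's well-known empty tree object, as used in the diff command above.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def parent_count(rev_list_line):
    """Count parents from one line of `git rev-list --parents` output."""
    fields = rev_list_line.split()
    return max(len(fields) - 1, 0)

def single_commit_diff_cmd(sha, parents):
    """Build the argv for diffing one commit against its first parent.

    A root commit has no parent, so `<sha>~1` cannot resolve; diff against
    the empty tree instead.
    """
    if parents == 0:
        return ["git", "diff", EMPTY_TREE, sha]
    return ["git", "diff", f"{sha}~1", sha]
```

The resulting list can be handed to `subprocess.run` from within the repository.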

Sean (Feb 04 2021 at 15:41):

Erik (Feb 04 2021 at 15:41):

Sean (Feb 04 2021 at 15:41):

me too, but its syntax is wrong for merge commits (at least for patch and diffing purposes)

Sean (Feb 04 2021 at 15:41):

Sean (Feb 04 2021 at 15:42):

there's undoubtedly other options to change the format, but diff is the command that does it in the right format by default, so it was a hunt to find the right syntax for "just this commit"

Sean (Feb 04 2021 at 15:42):

Sean (Feb 04 2021 at 15:44):

Erik (Feb 04 2021 at 15:44):

Sean (Feb 04 2021 at 15:44):

Sean (Feb 04 2021 at 15:45):

starseeker (Feb 04 2021 at 15:48):

@Sean If you think it's worthwhile, I'd stick your verification script or at least notes about the key gotchas associated with creating it in misc/repoconv once this is all over - it can't be any worse than my conversion logic, and it might be useful someday if we ever have to dive back into this swamp...

Sean (Feb 04 2021 at 15:48):

Sean (Feb 04 2021 at 15:49):

I've been stashing notes just in case I need to reference one of the 1-liners later, and notes on missing revs as they've been explained

starseeker (Feb 04 2021 at 15:49):

/me is embarrassed that he didn't think of cutting down the diff into +/- lines - should have considered that when the commit messages didn't resolve things unambiguously

Sean (Feb 04 2021 at 15:50):

well still remains to be seen -- they may need to be sorted too, but wasn't going to do that until there's evidence it's needed

Sean (Feb 04 2021 at 15:51):

e.g., if there are multiple file changes and svn shows A, B, C but then git displays C, B, A or similar ... shouldn't but might be possible. so far I'm thinking not just because so many are matching.

Sean (Feb 04 2021 at 15:52):

but the bigger set was empty merges next so should see how many of the 160 this eliminates

starseeker (Feb 04 2021 at 16:03):

/me will be curious to see if any of the commit message + timestamp based mappings prove to be incorrect.

Sean (Feb 04 2021 at 16:16):

I can re-run on everything next but immediate priority was just identifying potentially missing commits

starseeker (Feb 04 2021 at 16:16):

starseeker (Feb 04 2021 at 16:17):

Whatever you think best - just want to do whatever I can to put a bow on this sucker.

Sean (Feb 04 2021 at 16:17):

that will require pulling all the svn diffs, which takes a while. took longer to pull 720 svn diffs from sf than it took to pull 70000 git diffs locally ... not much longer but still was a while

Sean (Feb 04 2021 at 16:17):

I'm fine just making sure we're not missing data. if a commit is mis-tagged, that could be fixed later.

starseeker (Feb 04 2021 at 16:18):

Might be faster to rsync the SVN repo and pull it locally - that's how I've worked with it

Sean (Feb 04 2021 at 16:18):

starseeker (Feb 04 2021 at 16:19):

K. If you've got the data to hand though, now that I've got what should be a means to correct them implemented, I'd kinda like to go ahead and fix them. Remember, if we have to ask everyone to re-clone, it's also going to wipe out any pull requests, etc. on github folks may have open.

starseeker (Feb 04 2021 at 16:20):

Sean (Feb 04 2021 at 16:21):

starseeker (Feb 04 2021 at 16:37):

One trick will be the known cases where cvs-fast-export split things more finely than cvs2svn with those desc tags from CVS - any commit with that in play won't match in diff - for those cases (most of them, anyway) the commit message would actually be more reliable.

starseeker (Feb 04 2021 at 16:40):

starseeker (Feb 04 2021 at 16:44):

Starts getting a bit more iffy if we have non-unique commit message, matching timestamp, but non-matching diff (with no other exact matching diff) - if the diff is a subset of the SVN diff a case could be made for assigning the number, but that'd probably take some laborious manual inspection...

starseeker (Feb 04 2021 at 16:44):

starseeker (Feb 04 2021 at 16:56):

Also, fair warning - I'd expect some differences (due to line endings especially) in the CVS era commits.

starseeker (Feb 04 2021 at 16:57):

If you want to focus on just the commits that are currently unmapped, and ignore trying to validate all of them based on diffs, I'd personally be fine with that given the difficulties of the latter.

Sean (Feb 05 2021 at 07:33):

I haven't run into any of those yet, but split commits could be handled pretty easily I think. If they're already tagged, I'd just ignore them and rely on all the unsplit matching as sufficient validation.

Sean (Feb 05 2021 at 07:36):

Update -- my reprocessing using diff format indeed improved things significantly. Found more than half the remaining missing commits. Down to just 64 commits unidentified.

Sean (Feb 05 2021 at 07:38):

Digging in, turns out at least a portion of them are due to changed lines with internal whitespace differences. It's some sort of expanded tabs issue, possibly where cvs-fast-export preserved tabs correctly whereas cvs2svn did not preserve them. That's unconfirmed, but matches the commit I was checking. Rerunning it now with internal space stripped and should know in the morning what's left.
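A sketch of what stripping internal whitespace before comparing could look like. The function name and flags are made up for illustration, and the input is assumed to be the already-extracted +/- lines of a diff. The optional sort covers the file-ordering concern raised earlier.

```python
import hashlib

def normalized_digest(changed_lines, strip_whitespace=True, sort_lines=False):
    """MD5 over changed lines, optionally ignoring whitespace and ordering.

    changed_lines: iterable of diff lines that each start with '+' or '-'.
    """
    lines = []
    for line in changed_lines:
        marker, body = line[:1], line[1:]
        if strip_whitespace:
            # Drop every whitespace character, so expanded tabs compare equal.
            body = "".join(body.split())
        lines.append(marker + body)
    if sort_lines:
        lines.sort()  # tolerate svn and git listing files in different order
    return hashlib.md5("\n".join(lines).encode("utf-8", "replace")).hexdigest()
```

Stripping all whitespace trades precision for recall: two genuinely different whitespace-only edits would collide, so matches found this way deserve a second look.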

starseeker (Feb 05 2021 at 17:46):

Huh - I expected some line ending oddities, but I'm surprised there's actual internal whitespace diffs.

starseeker (Feb 06 2021 at 04:13):

Erik (Feb 06 2021 at 23:54):

Erik (Feb 06 2021 at 23:55):

starseeker (Feb 07 2021 at 02:13):

starseeker (Feb 07 2021 at 03:00):

@Erik Unless @Sean spots something, the only planned remaining changes (other than updates until SVN closes) are the application of some additional SVN commit -> Git commit mappings @Sean has identified during his validation.

Sean (Feb 08 2021 at 15:19):

I'm down to reviewing the last few remaining missing commits -- it's down to about 50 missing, so I should hopefully figure out what happened without too much trouble (e.g., if they're categoric processing artifacts or actually missing data). It's a manual process for the few remaining until I find a categoric pattern.

So far, I'm genuinely having trouble finding one of them, but not done hunting for it (I found a fragment but then couldn't find its commit, so have to re-find the fragment to see if that was combined/merged with something else or just a coincidental edit to the same line in an unrelated commit.)

starseeker (Feb 08 2021 at 17:21):

@Sean you're much deeper in than I at this point, but is there anything I might be able to help with?

starseeker (Feb 08 2021 at 23:53):

starseeker (Feb 08 2021 at 23:54):

I'm going to see if I can arrange a partial re-run, but there's a glitch in one of my processing filters

starseeker (Feb 09 2021 at 13:25):

Sean (Feb 09 2021 at 17:28):

@starseeker can you take a look at 4d401a8617869d3594b5948de12a374a5bd292fe and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942 and r19440

Sean (Feb 09 2021 at 17:32):

Sean (Feb 09 2021 at 17:40):

starseeker (Feb 09 2021 at 18:31):

If I'm interpreting this correctly, 4d401a8617869d3594b5948de12a374a5bd292fe matches the r19440 change on trunk, and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942 is the same change applied to the rel-5-3 branch. However, the r19440 label was applied to the branch commit rather than the trunk commit.

starseeker (Feb 09 2021 at 18:33):

Which, since SVN reports a diff in trunk for r19440, means the timestamp must have matched for the branch application, but it should instead have been applied to the trunk version

starseeker (Feb 09 2021 at 18:35):

So the "correct" fix there would be to apply the SVN revision to the trunk commit and strip it from the branch commit. I can do the former, but I'll have to tweak things to support the latter.

starseeker (Feb 09 2021 at 18:36):

If you want, we can establish a line with the convention SHA1; to denote commits that I should clear an SVN revision assignment from.
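To make the proposed convention concrete, here is a small sketch of reading `SHA1;rev` lines (an empty rev meaning "clear the assignment") and applying the result to a commit message. It is not the repowork implementation; the function names are hypothetical, and appending the `svn:revision:` line at the end of the message is an arbitrary choice.

```python
import re

# Matches an existing "svn:revision:N" line in a commit message.
_REV_LINE = re.compile(r"^svn:revision:\d+\n?", re.MULTILINE)

def parse_update_line(line):
    """Parse 'SHA1;rev' into (sha, rev); 'SHA1;' yields (sha, None)."""
    sha, _, rev = line.strip().partition(";")
    return sha, (int(rev) if rev else None)

def apply_update(message, rev):
    """Set the svn:revision line of a commit message, or remove it if rev is None."""
    stripped = _REV_LINE.sub("", message)
    if rev is None:
        return stripped
    if not stripped.endswith("\n"):
        stripped += "\n"
    return stripped + f"svn:revision:{rev}\n"
```

For the r19440 case, that would mean one line assigning the revision to the trunk commit and one `SHA1;` line clearing it from the rel-5-3 branch commit.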

Sean (Feb 09 2021 at 18:36):

Sean (Feb 09 2021 at 18:37):

I don't think it's a huge deal, I'm not sure how many of those there are. possibly quite unlikely if it was just because those commits were within a few seconds of each other?

Sean (Feb 09 2021 at 18:38):

I'm getting a count now -- there's some number of trunk commits that are tagged on branches

starseeker (Feb 09 2021 at 18:39):

If cvs2svn consolidated the timestamps on those two commits as "identical" and picked the newer timestamp, then it will happen every time cvs-fast-export resolved those cases into individual commits and the branch commit was the newer of the two.

starseeker (Feb 09 2021 at 18:43):

Might as well fix them if it's easy to pull the data set - it won't be appreciably more work than adding the missing mappings in the first place.

Sean (Feb 09 2021 at 18:45):

I think if I've counted correctly, that there are at least 121 commit revisions that were on trunk, but are tagged in git on a branch.

starseeker (Feb 09 2021 at 18:45):

starseeker (Feb 09 2021 at 18:46):

Sean (Feb 09 2021 at 18:47):

that one was an anomaly... wasn't even looking, I was collapsing the multiple-match revs manually for the 30 or so that match multiple diffs and that matched two... and noticed it seemed flipped

starseeker (Feb 09 2021 at 18:48):

Sean (Feb 09 2021 at 18:48):

Sean (Feb 09 2021 at 19:28):

Is the branch/trunk label reliable? I've been assuming it was generated based off commit location with no guessing involved, but realize I should double-check that assumption.

starseeker (Feb 09 2021 at 19:52):

For SVN it should be reliable. CVS identifications were up to cvs-fast-export/cvs2svn and I'm not as certain there

starseeker (Feb 09 2021 at 20:05):

The cvs:branch labels were based off of a fairly low-level analysis of the git conversion data - misc/repoconv/cvs_info.sh IIRC

starseeker (Feb 09 2021 at 20:07):

The root was the git rev-list --first-parent reporting, which depends on cvs-fast-export correctly assigning the first parent based on CVS branch data.

starseeker (Feb 09 2021 at 20:09):

starseeker (Feb 11 2021 at 03:44):

Sean (Feb 11 2021 at 07:32):

There was a categoric anomaly so I cleaned up and changed some things to check, and am re-running the comparison to make sure.

Sean (Feb 11 2021 at 07:38):

The 40 count was wrong (it was higher). On the plus side, scripting is cleaned up (had to rewrite everything) to the point that it can check all revs easily now. Got svn cloned too so it can do that quickly. Got it matching files and log messages cleanly now too. It's running through re-processing the missing batch now and should have an update in the morning.

Sean (Feb 11 2021 at 09:19):

Can you see if you can find c1644? There's a number of initial rev commits like that that I can't find. I'd hope it simply got merged with something else, but trying to verify that on one of them like 1644.

starseeker (Feb 11 2021 at 13:31):

git log --diff-filter=A -- util/pl-X.c
commit 86a7fcc40057934832f61255b606c0bd6f7fc12b
Author: Phillip Dykstra <phil@pdykstra.com>
Date:   Thu Apr 28 17:40:50 1988 +0000

    Unix-plot to X Window System display (X11)

    cvs:branch:trunk
    cvs:account:phil

git log --diff-filter=A -- util/pl-X10.c
commit a6feb76ce1551b09222463514f15e65db0343b55
Author: Phillip Dykstra <phil@pdykstra.com>
Date:   Thu Apr 28 17:43:26 1988 +0000

    Unix-plot to X Window System Display (X10R4)

    cvs:branch:trunk
    cvs:account:phil

I didn't do a detailed diff analysis, but it looks like the difference is splitting up the commit to get the distinct commit messages?

Sean (Feb 12 2021 at 08:13):

Cool, that was super helpful. I'm not sure about the general case but I'm guessing it's split them up because they were far apart enough in time (couple min), so cvs2git decided to handle them differently. Checking down through, that rules out a bunch but I have to figure out how to automate the check across all 135 missing. I have checks for matching diffs vs logs vs changed files but obviously doesn't catch split/merge changes unless all that changed was the log message (did verify a slew with that lil trick).

Sean (Feb 12 2021 at 08:21):

Initial revisions seem to be a large portion of the bulk missing. Took some work to figure out they're not just on branches.

Three commits you could check on for me are r51428, r54352, and r64428. They're fairly modern commits, so they stick out like a sore thumb for not matching. Haven't dove in to figure out what's up with them.

Set up the check across all svn commits and that's chugging along now. When that finishes up, should have a list of commits that are mistagged on branch vs trunk.

starseeker (Feb 12 2021 at 13:12):

@Sean Starting with r51428... The checkouts of the files are identical, so I pulled the diffs:

git format-patch -1 be5072cb90113d7c0d75839cc4f183d8cde1646b
svn diff -c51428 > r51428.patch

The patch formatting is different, so I brought them up in meld and applied all the SVN style headers to the git patch. Doing that, I was left with:
diff.png

starseeker (Feb 12 2021 at 13:12):

It looks like git and svn made very slightly different decisions on where to start and end their patch blocks.

starseeker (Feb 12 2021 at 13:23):

r54352 is similar, but less subtle - identical files in checkouts, but different ordering on the subtraction line instructions in the diff: diff_r54352.png

starseeker (Feb 12 2021 at 13:35):

r64428 is the most spectacularly different of the diffs, but checking the Git and SVN checkouts of r64427 and 64428 all files appear to agree, so the two different diffs appear to end up doing the same job.

starseeker (Feb 13 2021 at 18:59):

@Sean was that what you were looking for, or is there something else about those commits that is concerning?

Sean (Feb 13 2021 at 19:28):

No that was great, helpful. I hypothesized that'd happen but hadn't actually seen it (or at least hadn't noticed). Those stuck out because they were new. I've been going through the list ruling out others like those.

starseeker (Feb 13 2021 at 19:46):

starseeker (Feb 15 2021 at 13:28):

Sean (Feb 17 2021 at 08:58):

Went well! Took a while to process, but went really well. I'm double-checking a couple lists, but here's the list of trunk commits that are misattributed to branches in git. It's not as many as originally seemed fortunately, but it's a few:
mistagged_trunk_commits.log

starseeker (Feb 17 2021 at 12:24):

starseeker (Feb 17 2021 at 12:26):

r66607 is surprising - I wouldn't have expected any issues like that in the SVN era

starseeker (Feb 17 2021 at 14:07):

OK. It looks like r66607 was a multi-branch commit, making changes to both the branch and trunk in the same commit. I didn't realize we had any of those in the modern era - all the instances I had spotted were much earlier.

starseeker (Feb 17 2021 at 14:09):

What the conversion ended up doing was to apply the changes from r66607 to trunk in commit r66672.

starseeker (Feb 17 2021 at 14:11):

Which will also mean that the r66672 diff won't match that from SVN, since the SVN change was just the HAVE_ANALYZER_NORETURN test.

starseeker (Feb 17 2021 at 14:13):

starseeker (Feb 17 2021 at 14:17):

Going through the rest of the list, I haven't identified obvious reassignment candidates yet for the following:

starseeker (Feb 17 2021 at 21:30):

starseeker (Feb 17 2021 at 21:38):

starseeker (Feb 17 2021 at 21:41):

Ah, I see. The other four are cvs2svn artifacts - so rather than reassigning, they simply don't have direct analog commits at all and we just remove the assignments.

starseeker (Feb 17 2021 at 21:41):

Sean (Feb 18 2021 at 06:56):

Cool, glad you could deduce them. I wasn't 100% sure if you have it tagging revs separate from branches. I didn't check whether the :branch: tag was correct or not, only that that rev definitely didn't happen on a branch.

Sean (Feb 18 2021 at 07:02):

Next set is the inverse -- looking a lot better (half done and it has found just one :trunk mis-assignment) but taking longer to process for some reason. Should be done here soon.

Sean (Feb 18 2021 at 07:03):

Sean (Feb 18 2021 at 07:15):

starseeker (Feb 18 2021 at 13:18):

starseeker (Feb 18 2021 at 13:23):

I used the date view to illustrate it's not just an isolated commit in the repo, but part of the main history

starseeker (Feb 18 2021 at 13:28):

@Sean Where are we with the list of previously unidentified SVN id matches found by your diffing method? I'd be glad to help if you have a set of commits for manual review.

Also, just conceptually, what is your preference for cases like the one identified earlier where a single cvs2svn commit got split up into multiple git commits? Did you want to assign the SVN id to each "portion" commit in Git, if they can be identified?

Sean (Feb 18 2021 at 20:00):

That would be totally awesome to tag both commits, and similarly, tag merged commits with multiple revision tags. I know some of them but haven't been fully tracking. I do think there are probably 100-200 in that category.

starseeker (Feb 18 2021 at 21:04):

Tagging multiple svn revs onto a single Git commit would require some rework of the assignment code - let me know if that's something you definitely want to do.

Sean (Feb 19 2021 at 06:47):

If you want to, go for it, but I don't think it's strictly necessary. So long as the commit is tagged somewhere on one of the rev parts, that should be sufficient for tracing.

Sean (Feb 19 2021 at 06:49):

I finished checking the inverse and the only anomaly was 30804. It's tagged as "svn:branch:trunk-UNNAMED-BRANCH" but was branch "unlabeled-2.5.1" in svn.

Sean (Feb 19 2021 at 07:22):

Am seeing some other anomalies on these tagged revisions, what's going on with r30687 ? The tags don't appear to match svn at all.

Sean (Feb 19 2021 at 07:28):

Another curious one is 46324 -- it's tagged as being on four branches but it was a tag, never committed to branches. Saw some others like that.

Sean (Feb 19 2021 at 07:33):

    svn:branch:ansi-20040316-freeze
    svn:branch:bobWinPort-20051223-freeze
    svn:branch:ctj-4-5-post
    svn:branch:ctj-4-5-pre
    svn:branch:hartley-6-0-post
    svn:branch:offsite-5-3-pre
    svn:branch:opensource-pre
    svn:branch:windows-20040315-freeze

ansi-20040316-freeze
ansi-20040405-merged
autoconf-freeze
bobWinPort-20051223-freeze
ctj-4-5-post
ctj-4-5-pre
hartley-6-0-post
hartley-6-0-pre
offsite-5-3-pre
opensource-post
opensource-pre
windows-20040315-freeze

Sean (Feb 19 2021 at 07:40):

Looks like 30688 is also tagged as trunk-UNNAMED-BRANCH but also cjohnson-mac-hack, but I don't see that in svn. Svn only lists it affecting:

unlabeled-1.1.1
unlabeled-1.1.2
unlabeled-1.2.1
unlabeled-11.1.1
unlabeled-2.12.1
unlabeled-2.6.1
unlabeled-9.1.1
unlabeled-9.10.1
unlabeled-9.12.1
unlabeled-9.2.1
unlabeled-9.3.1
unlabeled-9.7.1
unlabeled-9.9.1

starseeker (Feb 19 2021 at 14:39):

@Sean I've added the ability to correct the r30804 and r30688 branch assignments.

starseeker (Feb 19 2021 at 15:08):

commit 44e3d7341c5680250d65091b2aff6ed051720a11 (HEAD, origin/itcl3-2, itcl3-2)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Tue Aug 23 12:19:43 2011 +0000

    revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)

    svn:revision:46324
    svn:branch:itcl3-2
    svn:account:brlcad

commit a988903bbe27985e0dd94228e07079e91e98be4d (origin/libpng_1_0_2, libpng_1_0_2)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Tue Aug 23 12:19:43 2011 +0000

    revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)

    svn:revision:46324
    svn:branch:libpng_1_0_2
    svn:account:brlcad

commit c54b9b07158d4a904aabddae264290854ecb250c (origin/tcl8-3, tcl8-3)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Tue Aug 23 12:19:43 2011 +0000

    revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)

    svn:revision:46324
    svn:branch:tcl8-3
    svn:account:brlcad

commit 03af105da8dd3cf85a29cc7f056513cc8e79d751 (origin/tk8-3, tk8-3)
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Tue Aug 23 12:19:43 2011 +0000

    revmoed additional 3rd party dependencies that don't really belong amongst our other tags (svn branch delete)

    svn:revision:46324
    svn:branch:tk8-3
    svn:account:brlcad

When I look at what r46324 did in SVN, it eliminated branches/tags/itcl3-2, branches/tags/tcl8-3, branches/tags/tk8-3, and branches/tags/libpng_1_0_2 - this seems to correspond to what is recorded in those Git commits (which can't actually delete the branches without any commits uniquely referenced by them getting garbage collected).

starseeker (Feb 19 2021 at 15:11):

$ git log --all --grep 30687
commit 004ec0ae439f0ca3c814d22a46957012cd8fb239
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Wed Apr 16 14:40:20 2008 +0000

    remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)

    svn:revision:30687
    svn:branch:Original
    svn:account:brlcad

commit 720f9b9b75588e35d3cce0f9f5b802abea2259ab
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Wed Apr 16 14:40:20 2008 +0000

    remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)

    svn:revision:30687
    svn:branch:itcl3-2
    svn:account:brlcad

commit f206b315ca475d3a3e55e98ec42d772c6b05baee
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Wed Apr 16 14:40:20 2008 +0000

    remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)

    svn:revision:30687
    svn:branch:libpng_1_0_2
    svn:account:brlcad

commit cbff64617866cc3fc2b25db15cd610e651561958
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Wed Apr 16 14:40:20 2008 +0000

    remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)

    svn:revision:30687
    svn:branch:tcl8-3
    svn:account:brlcad

commit 87bc784daf7f15cc8d9c9fa980a934a98a17de95
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Wed Apr 16 14:40:20 2008 +0000

    remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)

    svn:revision:30687
    svn:branch:tk8-3
    svn:account:brlcad

commit d748c2ea214b699008563e18f5a7105de39faba9
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Wed Apr 16 14:40:20 2008 +0000

    remove branches that have no meaning and are for 3rd-party dependencies (svn branch delete)

    svn:revision:30687
    svn:branch:zlib_1_0_4
    svn:account:brlcad

starseeker (Feb 19 2021 at 15:14):

I think you're right that SVN tags are getting treated as branches - that's the only way to handle SVN tags with edits - and I doubt I attempted to distinguish when assigning the svn:branch labels.

starseeker (Feb 19 2021 at 15:15):

@Sean I'm not following how you're getting an association between (say) svn:branch:ansi-20040316-freeze and r46324 ?

starseeker (Feb 19 2021 at 15:17):

I guess I could try to take a list of commits made to tags instead of branches and update the svn:branch: labels to be svn:tag: labels instead?

starseeker (Feb 19 2021 at 15:37):

svn log file:///home/user/brlcad_repo/brlcad/tags|grep \^r|awk '{print $1}' > tags.log

That gives us (more or less) the set of tag commits. If we then look for any of them that match commit messages, we get a set of commits. tag_commits.txt

starseeker (Feb 19 2021 at 15:39):

So those commit labels could then be switched from svn:branch:* to svn:tag:*
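A rough sketch of that relabeling step, assuming the tag revisions have already been collected into a set (for example from tags.log) and that each commit message carries svn:revision: and svn:branch: lines like the ones pasted above. The function name and the sample message are hypothetical, and this is not the code repowork actually uses:

```python
import re

def relabel_tag_commit(message, tag_revs):
    """Return message with svn:branch: switched to svn:tag: if its revision is a tag commit."""
    m = re.search(r"^[ \t]*svn:revision:(\d+)[ \t]*$", message, flags=re.MULTILINE)
    if m is None or int(m.group(1)) not in tag_revs:
        return message  # not a tag commit (or no revision label), leave untouched
    # Only rewrite the label at the start of a line, keeping its indentation.
    return re.sub(r"^([ \t]*)svn:branch:", r"\1svn:tag:", message, flags=re.MULTILINE)

# Hypothetical message in the same layout as the git log output above.
sample = (
    "remove 3rd party dependency tags (svn branch delete)\n"
    "\n"
    "    svn:revision:46324\n"
    "    svn:branch:itcl3-2\n"
    "    svn:account:brlcad\n"
)
relabeled = relabel_tag_commit(sample, {46324})
untouched = relabel_tag_commit(sample, {30687})
```

Commits whose revision is not in the set come back unchanged, so the same pass can be run over every commit message.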

Sean (Feb 19 2021 at 16:45):

I can generate a list of all branches/tags associated with each commit easily enough -- that's what I was doing to validate specific sets, just not systematically on all commits.

starseeker (Feb 19 2021 at 16:47):

Sean (Feb 19 2021 at 16:48):

starseeker (Feb 19 2021 at 16:49):

I wasn't (am not) seeing how you associated that commit with that branch, either in SVN or git?

Sean (Feb 19 2021 at 16:50):

let me check where I got it from because I agree, I'm only seeing it on four git commits now... maybe misprocessed on a subsequent validation

Sean (Feb 19 2021 at 16:52):

ah, yeah, looks like i wrote the wrong rev here in the chat.. 46324 is good...
that list was for 46322 ... which looks like it matches so I just got those two crossed when I was checking them manually

Sean (Feb 19 2021 at 16:54):

cool, that's great -- could be more thorough but that's good enough for non-branch commits -- means all non-branch commits that are tagged look like they're mostly tagged correctly besides the two trunk-UNNAMED-BRANCH commits.

starseeker (Feb 19 2021 at 16:56):

Those are partially my fault - a regex match was too loose and turned master-UNNAMED-BRANCH into trunk-UNNAMED-BRANCH. Either way though they had the wrong branch somehow, so I added corrections
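To illustrate the kind of slip described here (the actual regex from the conversion scripts isn't shown in this chat, so both patterns below are hypothetical), an unanchored rename of master to trunk also rewrites any longer label that merely contains the word, while an anchored pattern leaves it alone:

```python
import re

def loose_rename(label):
    # Too loose: matches "master" anywhere in the label.
    return re.sub(r"master", "trunk", label)

def strict_rename(label):
    # Anchored: only rewrites a label that is exactly svn:branch:master.
    return re.sub(r"^(svn:branch:)master$", r"\1trunk", label)

label = "svn:branch:master-UNNAMED-BRANCH"
loose_result = loose_rename(label)    # svn:branch:trunk-UNNAMED-BRANCH
strict_result = strict_rename(label)  # unchanged
```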

Sean (Feb 19 2021 at 16:56):

Sean (Feb 19 2021 at 17:03):

can you check on something unusual... commits 21570 through 21634 in svn
I got nothing but a log message.

Sean (Feb 19 2021 at 17:04):

perhaps cvs2svn garbage of some sort? did cvs2git fix/import any of those better?

starseeker (Feb 19 2021 at 17:05):

starseeker (Feb 19 2021 at 17:06):

starseeker (Feb 19 2021 at 17:09):

Here's the portion of brlcad/h/Attic/tclIntPlatDecls.h,v from CVS that seems to have generated that commit:

1.1
log
@file tclIntPlatDecls.h was initially added on branch windows-6-0-branch.
@
text
@d1 585
@


1.1.2.1
log

starseeker (Feb 19 2021 at 17:09):

At a guess, cvs2svn put in an empty commit and cvs-fast-export ignored it as an empty commit...

Sean (Feb 19 2021 at 17:13):

Sean (Feb 19 2021 at 17:14):

that's a huge range of commits, all with detailed log messages indicating activity

Sean (Feb 19 2021 at 17:15):

I mean, I guess it's garbage or old cvs issue of some sort, so not a problem, but odd

Sean (Feb 19 2021 at 17:16):

also, how'd you manage to catch/fix r62027 ? looks like it was added alongside trunk and you somehow fixed it (or at least tagged it better) as being a branch

starseeker (Feb 19 2021 at 17:16):

1.1
log
@file libpkg.dsp was initially added on branch windows-6-0-branch.
@
text
@d1 115
@


1.1.2.1

starseeker (Feb 19 2021 at 17:18):

I was the one who messed that up, so I knew it was coming and did some manual work in the initial conversion to special case that.

Sean (Feb 19 2021 at 17:18):

Sean (Feb 19 2021 at 17:34):

Okay! Finally... here's the list of commits that appear to have applied to multiple branches at the same time: commits_to_multiple_branches.txt

Sean (Feb 19 2021 at 17:36):

Might want to double-check me there, but that's only looking at the svn side. You may already be handling some of them differently like the branches AUTOCONF vs autoconf-branch ?

starseeker (Feb 19 2021 at 17:36):

Maybe. I recognize 19033 - it's one of the ones you flagged as being missing on trunk. I had removed its commit id from the branch, but if that's right it actually needs to be on both

starseeker (Feb 19 2021 at 17:38):

Blegh. Well, I uploaded the latest state at brlcad_conv14 to demonstrate the switch to svn:tag: labeling for those commits made to tags, but don't use that for SHA1 lists of any sort - stick to brlcad_conv12. Clearly the post-processing isn't done yet...

Sean (Feb 19 2021 at 17:38):

That transcript is derived by pulling a diff of all commits and extracting all the filepaths that changed.
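A minimal sketch of that idea, assuming the diff comes from svn diff -c REV against the repository root so that every changed file shows up on an "Index: brlcad/..." line, and assuming the usual trunk / branches/NAME / tags/NAME layout. The helper name is made up, and this is not the script that produced the list:

```python
import re

# Captures "trunk", "branches/<name>" or "tags/<name>" from an Index: line.
INDEX_RE = re.compile(r"^Index: brlcad/(trunk|branches/[^/]+|tags/[^/]+)/")

def locations_touched(diff_text):
    """Return the sorted set of trunk/branch/tag locations a revision's diff touches."""
    found = set()
    for line in diff_text.splitlines():
        m = INDEX_RE.match(line)
        if m:
            found.add(m.group(1))
    return sorted(found)

sample_diff = (
    "Index: brlcad/branches/rel-5-1-branch/tclscripts/mged/grid.tcl\n"
    "===================================================================\n"
    "Index: brlcad/trunk/tclscripts/mged/grid.tcl\n"
    "===================================================================\n"
)
touched = locations_touched(sample_diff)
is_multi_branch = len(touched) > 1
```

A revision whose result has more than one entry is a candidate multi-branch commit. Revisions that only change properties or add empty directories may produce no Index: lines at all and would need separate handling.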

starseeker (Feb 19 2021 at 17:39):

I'll take a run through - probably it's just going to mean an adjustment/expansion of the branch and/or trunk commits I need to manually specify revisions for

starseeker (Feb 19 2021 at 17:40):

The other list I know we still need is the svn commit IDs you were able to identify that I had never mapped, like 735 - did that prove practical or were there roadblocks?

Sean (Feb 19 2021 at 17:44):

starseeker (Feb 19 2021 at 17:44):

starseeker (Feb 19 2021 at 17:45):

I'm not sure what to make of 18999 - I'm not seeing two commits associated with that in Git

Sean (Feb 19 2021 at 17:45):

starseeker (Feb 19 2021 at 17:46):

/me nods - it's surprising how much difference that makes over long stretches of time.

starseeker (Feb 19 2021 at 17:47):

/me goes through the list to see if he can quickly spot any candidates for svn revision labels...

starseeker (Feb 19 2021 at 18:10):

@Sean How authoritative was the cvs2svn branch identification for commits? A lot of these in git are tagged as rel-5-2 rather than rel-5-1-branch - given the process I used to try and determine which branch was the "origin" branch in CVS relied on the git conversion itself, it's possible I've not correctly identified the original branches...

starseeker (Feb 19 2021 at 18:30):

A sizable chunk of these are proving to be the mirror image of the other case - instead of the branch getting the svn id and trunk not getting it, it's trunk that got the id and the branch didn't.

Sean (Feb 19 2021 at 19:32):

How are the rev updates committed in the repo still valid? Doesn't assigning a different tag on earlier commits affect the future commit shas?

Sean (Feb 19 2021 at 19:33):

Can you give me an example of the rel-52 vs? Could be a bug, but the processing was pretty straightforward to have it report what actually changed.

starseeker (Feb 19 2021 at 19:33):

Yes. Every time I have to do that, I have to upload a new repository. That's why the brlcad_conv13 and brlcad_conv14 repos were up briefly

starseeker (Feb 19 2021 at 19:34):

starseeker (Feb 19 2021 at 19:36):

The corresponding commits in the Git conversion report:
cvs:branch:rel-5-3 cvs:branch:trunk

starseeker (Feb 19 2021 at 19:36):

4d401a8617869d3594b5948de12a374a5bd292fe and ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942

Sean (Feb 19 2021 at 19:58):

morrison@agua brlcad_conv11 % svn diff -c 19440  svn+ssh://brlcad@svn.code.sf.net/p/brlcad/code|grep "^Index: brlcad"
Index: brlcad/branches/rel-5-1-branch/tclscripts/mged/grid.tcl
Index: brlcad/trunk/tclscripts/mged/grid.tcl

Sean (Feb 19 2021 at 20:03):

Looks like the git commit log and diff are correct, just incorrectly associated with rel-5-3

Sean (Feb 19 2021 at 20:04):

How is the branch/tag figured out? I sort of assumed it was coming from the processing. If they're suspect, that might explain some of the missing trunk tags.

starseeker (Feb 19 2021 at 20:29):

The SVN era branch assignments should come directly from repository information. The CVS era branch assignments were done using the script in misc/repoconv/cvs_info.sh

starseeker (Feb 19 2021 at 20:30):

Fundamentally, it uses git rev-list --first-parent to follow commit chains back up the branches.

starseeker (Feb 19 2021 at 20:32):

starseeker (Feb 19 2021 at 20:40):

So my question was whether cvs2svn was more likely to correctly assign a correct commit branch of origin. If that's the case, then I'll have to reassign the CVS era branches somehow.

starseeker (Feb 19 2021 at 20:48):

I'm not conversant enough with CVS to know how to try and directly coax the information out of the original repo, so my reasoning was that since the cvs-fast-export conversion was the one we were using from the CVS era the branch assignments were the ones to use for that part of the history.

starseeker (Feb 19 2021 at 21:04):

FWIW, SVN commit r19990 "Release 5.3" was right in amongst the latter of the multibranch commits SVN reported as being on rel-5-1-branch. It seems a bit suspect that all the multibranch commits would be originating on rel-5-1-branch when they were about to release 5.3...

Sean (Feb 19 2021 at 22:16):

I can't say for sure, but I do recall that branches in cvs are recorded explicitly so there's no guessing. Any tool converting has perfect branch knowledge so I would expect cvs2svn (and cvs-to-git) to correctly reflect what was in cvs in svn.

Sean (Feb 19 2021 at 22:18):

Perhaps --first-parent isn't appropriate? What if something is a branch of a branch or similar? Git could be tracking through to a grandparent branch.

Sean (Feb 19 2021 at 22:19):

the branch names are in the ,v files, if you want to see if/when rel-5-1-branch vs 5-3 branch are associated with a particular commit. They're in a "symbolic names:" block near the top.
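For reading those numbers: in CVS a revision such as 1.1.2.1 lives on branch 1.1.2, and a branch symbol is normally stored as a "magic" number with a 0 inserted before the last field (1.1.0.2 for branch 1.1.2). A small sketch of matching a revision to its branch symbols under that convention; the symbol table below is a made-up example, not copied from any of our ,v files:

```python
def branch_number(revision):
    """Branch a revision sits on, or None for a trunk revision like 1.5."""
    parts = revision.split(".")
    if len(parts) == 2:
        return None
    return ".".join(parts[:-1])

def normalize_symbol(number):
    """Turn a magic branch number such as 1.1.0.2 into the real branch number 1.1.2."""
    parts = number.split(".")
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0":
        return ".".join(parts[:-2] + parts[-1:])
    return number

def branch_names(symbols, revision):
    """Names of all symbols that refer to the branch the revision is on."""
    target = branch_number(revision)
    if target is None:
        return []  # trunk revision, no branch symbol applies
    return sorted(name for name, num in symbols.items()
                  if normalize_symbol(num) == target)

# Hypothetical symbol table for one file.
symbols = {"windows-6-0-branch": "1.1.0.2", "some-release-tag": "1.1"}
names = branch_names(symbols, "1.1.2.1")
```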

Sean (Feb 19 2021 at 22:22):

starseeker (Feb 19 2021 at 23:23):

I'm beginning to think git just literally doesn't track this properly at all, at ANY level. If I'm interpreting these numbers correctly per the Princeton site, it looks like SVN has it correct.

starseeker (Feb 19 2021 at 23:24):

starseeker (Feb 19 2021 at 23:26):

@Sean I don't suppose in that pile of scripts you've got one that will generate the set of branches for all SVN commits?

starseeker (Feb 20 2021 at 00:21):

starseeker (Feb 20 2021 at 02:31):

OK, there we go. Can now scrub out the existing cvs:branch labels and replace them with SVN data.

Sean (Feb 20 2021 at 06:13):

Yeah, that finished processing. Careful if you used the previous script, had a bug.

Sean (Feb 20 2021 at 06:14):

Sean (Feb 20 2021 at 06:24):

Er, rather that's all multiple branchpoint commits. This is all commits in the repo: all_branches2.log

Sean (Feb 20 2021 at 06:25):

note the multiple branches list did update, if that changes anything on the processing

starseeker (Feb 22 2021 at 15:02):

OK, I think I've got the branch assignments working using SVN data now. Here's the diff that shows the changes to the commit messages in brlcad_conv12 diff.txt

starseeker (Feb 22 2021 at 16:34):

The sha1s won't match, but I can upload that version of the repository if it is useful.

starseeker (Feb 22 2021 at 18:43):

starseeker (Feb 23 2021 at 19:14):

@Sean Any luck with generating the mappings? As an alternative if you want you can post the brlcad_conv11 repo you were using (I haven't kept that iteration so I'd need a copy of what you're using) and your existing SHA1 sets - I think I've hammered out an update script now.

Sean (Feb 23 2021 at 19:48):

Should be done soon. Taking a while to recompute all the hashes. Looks like the first few hundred ended up unmodified (same sha) but once a commit message changed, everything after had to be re-associated with the new shas and that process takes a couple hours (and it's a couple hours in, so almost done).
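A toy model of why that happens, assuming only that a commit's id is a hash over its parent's id plus its own content. Real git commit objects also hash the tree, author, and dates, so this is an illustration rather than git's actual object format:

```python
import hashlib

def commit_id(parent_id, message):
    # Toy stand-in for a commit object: the parent id is part of the hashed payload.
    payload = "parent {}\n\n{}".format(parent_id, message).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

def build_chain(messages):
    """Hash a linear history, oldest first, and return the list of ids."""
    ids, parent = [], ""
    for msg in messages:
        parent = commit_id(parent, msg)
        ids.append(parent)
    return ids

before = build_chain(["first", "second\n\ncvs:branch:rel-5-3", "third"])
after = build_chain(["first", "second\n\ncvs:branch:rel-5-1-branch", "third"])
# Ids match up to the edited commit; that commit and everything after it differ.
unchanged_prefix = [a == b for a, b in zip(before, after)]
```

Here unchanged_prefix comes out as [True, False, False]: the third commit's message is identical in both histories, but its id still changes because its parent's id did.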

Sean (Feb 23 2021 at 19:49):

One curiosity that you can maybe help explain / educate me on ... do you know why a commit like 944 would be in git log --all but not in git log --follow . ?

Sean (Feb 23 2021 at 19:50):

maybe a bad example -- I didn't check if it was a commit to a different repo or something, just the first I noticed

starseeker (Feb 23 2021 at 20:26):

Hmm. If I save the log output of git log --follow . to a file and then search for c037a5e3a6eb97d2f9455225bbafeffec5b79be4 (which I think is the commit corresponding to 944 in brlcad_conv12) it is there.

starseeker (Feb 23 2021 at 20:27):

I do know in general that git log --all will incorporate the history from all branches, not just the currently checked out branch.

starseeker (Feb 23 2021 at 20:28):

starseeker (Feb 23 2021 at 20:33):

starseeker (Feb 23 2021 at 20:36):

Checking SVN, that's a property change - so the bug is the SVN revision getting assigned at all.

starseeker (Feb 23 2021 at 20:37):

starseeker (Feb 23 2021 at 20:53):

OK. Here are my thoughts so far: 944 looks like a timestamp match with 52036a8b4569b8ffe90e2e8fb0b43f5ed36ba040. It's got one of the generic log messages, so my revision assignment code went ahead and assigned it that revision.

Based on the diff report from SVN, that's an incorrect assignment and needs to be changed/cleared. Hopefully the diff based checking will catch that.

starseeker (Feb 23 2021 at 20:55):

That probably explains why it doesn't show in git log --follow . - that search is based on all the files in the currently checked out branch, working backwards. Since the incorrectly identified "944" has no files associated with it, there's no way for git to associate it with the history walking backwards from the tree as a starting point.

starseeker (Feb 23 2021 at 20:57):

Or, another possibility - even if it can associate it following the commit chains, an empty commit won't match the "." specifier.

starseeker (Feb 23 2021 at 21:03):

OK - I see "Added fb_close", which is the parent of 52036a8b4569b8ffe90e2e8fb0b43f5ed36ba040, does make it into the git log --follow . output. That suggests it's following the chain through that commit, but not matching "." and skipping reporting it.

starseeker (Feb 23 2021 at 21:04):

(Sorry, that's probably a little more stream of consciousness than you were looking for...)

starseeker (Feb 23 2021 at 21:07):

In some ways it's tempting to try to scrub empty commits like that with generic commit messages out, but at this juncture I'd be worried about inadvertently breaking something else...

starseeker (Feb 23 2021 at 21:34):

Hmm. Actually, repowork already has the info to detect empty commits, in principle, and even categorize them...

starseeker (Feb 23 2021 at 21:35):

Some I know we need (branch creation/deletion), some are marginal (commits removing empty directories, which are no-ops in git) and some of them are useless (empty generic message, empty contents).

starseeker (Feb 23 2021 at 21:45):

starseeker (Feb 23 2021 at 21:49):

@Sean What do you think - should I scrub out the empty commits with "* empty log message*" and maybe some of the other obvious ones?

starseeker (Feb 23 2021 at 21:51):

"BRL CAD Distribution Release 1.10" has a couple non-empties in addition to the 4 empties, for example...

starseeker (Feb 23 2021 at 23:05):

Yeah, it's a variation on the splicing problem. Have the ability to remove specified commits now.

scorp08 (Feb 24 2021 at 20:18):

starseeker (Feb 25 2021 at 01:01):

I don't recall at this point - they were iterative refinements to the process of correcting the output from the main svnfexport conversion (merging git notes into comments, correcting emails, etc.)

starseeker (Feb 25 2021 at 01:02):

conv12 is the "target" for a third series of refinements at this point, mainly because I need stable SHA1s to target for processing. (In principle I've prepared a script to translate between old and new repositories if necessary, but I'd rather not have to use it... this is already complicated enough.)

Sean (Feb 25 2021 at 17:31):

I see you're getting ahead of my own validation pace... Sorry it's taking so long, I'm just chasing down issues in the multiassignment, a couple bugs in the scripting, wanted the list I give to be more certain than a blanket wash as I'm seeing lots of little discrepancies and ways to mis-associate.

Sean (Feb 25 2021 at 17:34):

starseeker (Feb 25 2021 at 19:47):

I don't see any collisions - you've got about a dozen that I haven't got yet, but I suspect that's probably because I forgot to use the version of the SVN repository that had the RCS tags scrubbed down.

starseeker (Feb 25 2021 at 19:48):

I'll have to look more closely at the ones that popped up on mine as matching that you don't have... may be an issue I haven't found yet.

starseeker (Feb 25 2021 at 19:56):

420b6c86aebaab8d233b9124aac2dfcaab390158;2253
3626fd67e335d89391ce624b8a3246bd99adffec;2470
231fd989a63e842f6ed485d8ac49caec4eee3660;2471
bbd7e8166d10d1f8c1c3355f87814fd9c4e652df;2489
1a79a71444aa3900b25c61c321c270d3f83d7065;2657
68245f26449e72b3fa8362bfdaa8ec4b458566bc;2841
0857eeb72eb573cad76f86b770e765349a85a671;2875
b026f5c0fe0aa8e2d9ca34051a20fb9afb92162a;2884
3732cf651af0b526eb3ec6bdf5893892f22afef4;2886
84c054fee5394109f52dfaf15add46d671ede196;2890
de144fde847a9fe45cac391c97bd7abaeacc3b0b;2900
d0f9348a8847c22a1f5cb4846f9c7414c7c1081b;3578

starseeker (Feb 25 2021 at 20:06):

I can pick up 68245f26449e72b3fa8362bfdaa8ec4b458566bc;2841 if I sort the diff contents ahead of doing the md5sum.
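Roughly what sorting ahead of the checksum buys, as a sketch: keep only the added/removed lines, sort them, and hash the result, so two diffs that list the same changes in a different order get the same fingerprint. The function name and sample diffs are made up, and the header filter is an assumption that could also drop a changed line whose content itself starts with "--" or "++":

```python
import hashlib

def diff_fingerprint(diff_text):
    """md5 over the sorted +/- lines of a diff, ignoring file headers and context."""
    changed = [
        line for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    return hashlib.md5("\n".join(sorted(changed)).encode("utf-8")).hexdigest()

diff_a = (
    "--- a/one.c\n+++ b/one.c\n-int x;\n+long x;\n"
    "--- a/two.c\n+++ b/two.c\n-int y;\n+long y;\n"
)
# Same changes, but the files are reported in the opposite order.
diff_b = (
    "--- a/two.c\n+++ b/two.c\n-int y;\n+long y;\n"
    "--- a/one.c\n+++ b/one.c\n-int x;\n+long x;\n"
)
same = diff_fingerprint(diff_a) == diff_fingerprint(diff_b)
```

This only helps with ordering. If the two tools pick different lines to mark as added and removed for the same change, the fingerprints will still differ, and identical changes made in two different revisions will still collide.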

Sean (Feb 25 2021 at 20:11):

I have others that partially match, I just haven't validated them so didn't share them yet.

starseeker (Feb 25 2021 at 20:14):

Ah, it looks like the rest categorized in my processing as having non-unique content matching.

starseeker (Feb 25 2021 at 20:15):

starseeker (Feb 25 2021 at 20:18):

OK. Diff content wasn't unique for r2253 - matches with r22190 - so it takes the path and/or date to resolve. 420b6c86aebaa is correct

starseeker (Feb 25 2021 at 20:31):

starseeker (Feb 26 2021 at 01:07):

@Sean Sorry, just read back up through chat history - not trying to replace your work (defeats the point of independent V&V) - goal was/is to get representative inputs to make sure my repo updating logic can handle something similar to what the final pass will look like. Just stashed the various bits and pieces (and notes) in case they prove to be useful.

Sean (Feb 26 2021 at 18:07):

@starseeker e6417be98f27d570d863744f566f5aaf738abbe6 .. I'm seeing listed as branch commit, but it was a trunk commit 19763

starseeker (Feb 26 2021 at 18:09):

Sean (Feb 26 2021 at 18:09):

Sean (Feb 26 2021 at 18:10):

starseeker (Feb 26 2021 at 18:10):

Sean (Feb 26 2021 at 18:18):

19033 LOG+FILE MATCH ON c365a032935f99d5cbcc5e0b7316253e918183f5
19211 LOG+FILE MATCH ON 2ec20a87d6e216cc3af62da933a2917e96459ce2
19282 LOG+FILE MATCH ON 4d5fe4e8afa57a275c04f0a11cbf20c1378ce600
19283 LOG+FILE MATCH ON 4af5f01acc93a65ba8e158c1e407e6fa30f0a867
19288 LOG+FILE MATCH ON 9f4472b6c4a9d77005a25bac0e6ea9d0b45c6829
19289 LOG+FILE MATCH ON 3312597ec11da607ad8cdecb8e86ecd6cd43a21c
19440 LOG+FILE MATCH ON ea6d4c16bae6ecf30d4439d92c8dd72f56b3e942
19449 LOG+FILE MATCH ON af33297408e4ec0b38fa37d211104ae8e3f4b850
19558 LOG+FILE MATCH ON 3a6fdd142e59c7fee7dfb06fdaecc3b30f28d633
19587 LOG+FILE MATCH ON a53d24a82016e59e54ad3fa0750238b077313a33
19720 LOG+FILE MATCH ON f1c200f10e9d5c0f896508b2967f644abafad234
19723 LOG+FILE MATCH ON 45a67834524348e32e2c1d34071b59dbb1360d9e
19763 FILE+DIFF MATCH ON e6417be98f27d570d863744f566f5aaf738abbe6
19772 LOG+FILE MATCH ON 6f4104bd83cf4a930bda9cbaa1b811d3e0d236b3
19783 LOG+FILE MATCH ON 6af6602bcdb5227c51a6b467226d5fc70d321855
19797 LOG+FILE MATCH ON 4b51763bd75123f81f069bba1b873c4538776530
19798 LOG+FILE MATCH ON d82708b47d89c008a20ce23ba23ce4aca80cf232
19839 LOG+FILE MATCH ON 8dcb60d4529dc5e0cf99729338e05869cf270c06

starseeker (Feb 26 2021 at 18:20):

Sean (Feb 26 2021 at 18:20):

This one is an outlier I'm not sure about, 11077485329842c81213eab68006fe5d58b5925f ...

Sean (Feb 26 2021 at 18:21):

it says it was 21565 but that was a trunk cvs2svn conversion commit. Commit message on 11077.. is that of 21564

Sean (Feb 26 2021 at 18:22):

starseeker (Feb 26 2021 at 18:22):

Sean (Feb 26 2021 at 18:25):

I need to investigate why 21564 isn't in my list of missing commits... should have caught that but didn't

starseeker (Feb 26 2021 at 18:26):

starseeker (Feb 26 2021 at 18:30):

I think I've got 19033 set up as follows: aec4367dafd37a7b0657c4b27414caa21ac4c1be is the trunk portion of that commit, and c365a032935f99d5cbcc5e0b7316253e918183f5 is the rel-5-1-branch portion

Sean (Feb 26 2021 at 18:42):

I'll have to confirm that myself, as I've been toggling between processing all commits and only those on trunk.

Sean (Feb 26 2021 at 18:47):

aha! yes, that explains it. that's why 21564 wasn't in my list. thought I was going crazy. that was a branch commit.

Sean (Feb 26 2021 at 18:49):

so in svn, 21564 was committed to branch, then 21565 committed to trunk to compensate?? I'm not sure what cvs2svn did there.
regardless, in git .. 21564's diff turned into 11077485.. and perhaps properly tagged as branch, despite being tagged as trunk commit 21565. do I have that right?

Sean (Feb 26 2021 at 18:50):

starseeker (Feb 26 2021 at 18:53):

In git, if I'm interpreting gitk's display properly, 11077485329842c81213eab68006fe5d58b5925f is a branch commit. If 21564 was the branch commit in SVN, that's probably what it should be in Git. Not 100% sure why it got the 21565 assignment instead.

starseeker (Feb 26 2021 at 18:54):

Best guess is something funky happened because the timestamps of those two commits are identical in SVN, as far as I can tell.

Sean (Feb 26 2021 at 18:56):

Okay, yeah, that's what I thought I was seeing as well. Don't see how it got 21565 either. Is there a way to check, see if that happened anywhere else? Not too worried but if it's scannable, we can do a quick check.

starseeker (Feb 26 2021 at 18:58):

Only thing I can think of would be to look for identical timestamp commits in SVN and double check the Git assignments, but not sure how script-able that is (especially since we're accumulating a fair set of revision number assignments/updates.)

starseeker (Feb 26 2021 at 18:59):

f5a1b0037fec2927cba073d118db24cdbd681975
a098425430db227021617976961e6b51ce5569cb
e6417be98f27d570d863744f566f5aaf738abbe6

Those might be worth checking - I think they also had incorrect revision numbers

starseeker (Feb 26 2021 at 19:03):

It might get to the point where I should run the updates we've accumulated and establish a new baseline for additional comparisons, so we can focus without re-discovering what we've already fixed, but I know that would require regenerating the sha1/md5 mappings again. Let me know if you think things reach the point where that would be worthwhile.

Sean (Feb 26 2021 at 19:42):

Yeah, I'm ignoring timestamps because it'd be a fair bit of work to parse the date string into something that could be fuzzy compared in script land
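If it ever becomes worth scripting, a sketch of the scan might look like the following. It assumes the svn log header layout "rNNN | author | YYYY-MM-DD HH:MM:SS +ZZZZ (weekday, date) | N lines"; the revision numbers, authors, and dates in the sample are invented for illustration, not taken from our repository:

```python
from collections import defaultdict
from datetime import datetime

def parse_header(line):
    """Split an svn log header line into (revision number, aware datetime)."""
    fields = [f.strip() for f in line.split("|")]
    rev = int(fields[0].lstrip("r"))
    stamp = fields[2].split(" (")[0]  # drop the "(Tue, 23 Aug 2011)" suffix
    return rev, datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")

def same_timestamp_groups(header_lines):
    """Groups of revisions that share an identical commit timestamp."""
    by_time = defaultdict(list)
    for line in header_lines:
        rev, when = parse_header(line)
        by_time[when].append(rev)
    return [sorted(revs) for revs in by_time.values() if len(revs) > 1]

# Invented header lines in the assumed layout.
headers = [
    "r101 | someone | 2005-01-02 03:04:05 +0000 (Sun, 02 Jan 2005) | 1 line",
    "r102 | someone | 2005-01-02 03:04:05 +0000 (Sun, 02 Jan 2005) | 1 line",
    "r103 | someone | 2005-01-02 03:09:00 +0000 (Sun, 02 Jan 2005) | 2 lines",
]
groups = same_timestamp_groups(headers)
```

Once the dates are datetime objects, a fuzzy comparison is just a check on abs((a - b).total_seconds()) against some tolerance.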

Sean (Feb 26 2021 at 23:01):

You may already have these, but here are a couple of outliers that are partial matches and appear to be split commits?:

2125 LOG+DIFF MATCH ON 0c1f4a88c5c960bd7de51ef8a05e7f53f00fb1a2 (NOT TAGGED)
3102 LOG+DIFF MATCH ON 402419dac49d3abe9bd6036f76696b43a70a66f5 (NOT TAGGED)

Sean (Feb 27 2021 at 20:21):

awesome! got it doing the comparisons in parallel now... that should speed things up a bit!

starseeker (Feb 28 2021 at 00:38):

starseeker (Feb 28 2021 at 01:42):

Simple way to compare the brlcad_conv12 and the brlcad_conv15 logs to see changes seems to be:

starseeker (Feb 28 2021 at 02:12):

That filters out the sha1s so the message and other changes can be seen easily in a diff.
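One way to get that effect, sketched here rather than quoted from the command actually used: replace every 40-character hex string in both logs with a placeholder, then diff what is left. The two sample logs are hypothetical:

```python
import difflib
import re

SHA1_RE = re.compile(r"\b[0-9a-f]{40}\b")

def scrub(log_text):
    """Replace full sha1 ids with a placeholder so they never show up as differences."""
    return [SHA1_RE.sub("<sha1>", line) for line in log_text.splitlines()]

def log_diff(old_log, new_log):
    """Unified diff of two git logs with the sha1s masked out."""
    return list(difflib.unified_diff(scrub(old_log), scrub(new_log),
                                     "old", "new", lineterm=""))

# Hypothetical log excerpts: the sha1 and one label differ between conversions.
old_log = (
    "commit 4d401a8617869d3594b5948de12a374a5bd292fe\n"
    "    update grid script\n"
    "    cvs:branch:rel-5-3\n"
)
new_log = (
    "commit 0123456789abcdef0123456789abcdef01234567\n"
    "    update grid script\n"
    "    cvs:branch:rel-5-1-branch\n"
)
changes = log_diff(old_log, new_log)
```

Only the label change shows up in changes; the commit lines compare equal once the ids are masked.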

starseeker (Mar 01 2021 at 14:45):

@Sean It's looking like SVN and git use subtly different diffing algorithms, so the diff file changes don't always map up.

starseeker (Mar 01 2021 at 15:15):

starseeker (Mar 01 2021 at 15:27):

Sean (Mar 01 2021 at 16:26):

starseeker (Mar 01 2021 at 16:27):

Sean (Mar 01 2021 at 16:27):

Finished over the weekend pretty quickly actually, but I was too exhausted to verify+upload it.. sorry.

Sean (Mar 01 2021 at 16:27):

Sean (Mar 01 2021 at 16:28):

I have a laundry list now.. will post it in the categoric sets here in a few min.

starseeker (Mar 01 2021 at 16:28):

Np, happens. I ended up manually hunting up a bunch of Git commits in SVN - hopefully that'll be helpful.

Sean (Mar 01 2021 at 16:29):

Sean (Mar 01 2021 at 16:30):

I've not done anything with 15 or 16. I can kick off a final pass on 17 assuming there are a few updates, but still working on 12 to keep shas in sync.

starseeker (Mar 01 2021 at 16:30):

starseeker (Mar 01 2021 at 16:31):

starseeker (Mar 01 2021 at 16:32):

FWIW, I'm not convinced all the CVS era commits will be diff free, even if the revisions line up.

Sean (Mar 01 2021 at 16:53):

Yeah, I think we already found a few differences where commits were split differently. They seem to be very few overall.

starseeker (Mar 01 2021 at 17:44):

I think cvs2svn and cvs-fast-export might have picked different contents for their "synthetic commit to represent incomplete tag" commits... I suppose a case can be made either way for assigning the corresponding SVN revs if that's what happened. I went ahead and did so, but I could go either way.

starseeker (Mar 01 2021 at 18:05):

r4778 actually is a nice compact illustration of different diff picks - at least with the svn and git versions I have, git produces:

diff --git a/librt/db_io.c b/librt/db_io.c
index 3645cea1dc..7faa9be6ba 100644
--- a/librt/db_io.c
+++ b/librt/db_io.c
@@ -32,8 +32,8 @@ static char RCSid[] = "@(#)$Header$ (BRL)";

 #include "machine.h"
 #include "vmath.h"
-#include "raytrace.h"
 #include "db.h"
+#include "raytrace.h"

 #include "./debug.h"

while svn produces:

Index: brlcad/trunk/librt/db_io.c
===================================================================
--- brlcad/trunk/librt/db_io.c  (revision 4777)
+++ brlcad/trunk/librt/db_io.c  (revision 4778)
@@ -32,8 +32,8 @@

 #include "machine.h"
 #include "vmath.h"
+#include "db.h"
 #include "raytrace.h"
-#include "db.h"

 #include "./debug.h"

starseeker (Mar 01 2021 at 18:06):

Shouldn't impact a full-up revision check of course, but does illustrate the limits of diff comparisons nicely.
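To make the point concrete, here is a small sketch (hypothetical helper names, include lines taken from the r4778 hunk above) showing that the two edit scripts describe the change differently but land on the same file content. That is why comparing checkouts is more reliable than comparing diff text.

```python
RAYTRACE = '#include "raytrace.h"'
DB = '#include "db.h"'

# The four include lines as they stood before r4778.
before = ['#include "machine.h"', '#include "vmath.h"', RAYTRACE, DB]


def git_style(lines):
    """Remove raytrace.h above db.h, then re-add it below db.h."""
    out = list(lines)
    del out[out.index(RAYTRACE)]
    out.insert(out.index(DB) + 1, RAYTRACE)
    return out


def svn_style(lines):
    """Add db.h above raytrace.h, then remove the original db.h below it."""
    out = list(lines)
    out.insert(out.index(RAYTRACE), DB)
    last_db = len(out) - 1 - out[::-1].index(DB)
    del out[last_db]
    return out


# Different hunks, identical resulting content.
print(git_style(before) == svn_style(before))
```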

Sean (Mar 01 2021 at 18:09):

I should have one of the lists cleaned up here soon now. Trying to make sure I don't feed you bad data... so much scripting...

Sean (Mar 01 2021 at 18:09):

starseeker (Mar 01 2021 at 18:11):

/me can imagine - once this is done I'm going to have to scrub my home dir to clean out a truly amazing pile of intermediate scripting files, checkouts, test dirs, etc.

Sean (Mar 01 2021 at 18:11):

Yeah, I noticed some of the different diffs like that. Pretty interesting. I found a couple more complex cases where an entire function appeared to be added/removed when in reality all that happened was the end parenthesis on one function was moved and the signature on the next function had an edit. Somehow git's diff engine decided it would represent that as some mangled movement.

starseeker (Mar 01 2021 at 21:28):

@Sean I'm seeing a big swath of differences between r702 and r3735 - given the timing I'd guess that's tied up with that timestamp business in the SVN conversion?

starseeker (Mar 01 2021 at 21:30):

Sean (Mar 01 2021 at 22:08):

yeah, I noticed them a while back. found many/most of them (or ruled them out as splits/inconsequential).

starseeker (Mar 02 2021 at 03:00):

@Sean if we hit a situation where a commit message matches to one revision but the change matches a different revision, which mapping do you prefer to use?

Sean (Mar 02 2021 at 06:17):

Sean (Mar 02 2021 at 06:18):

regardless, I think it's more important the rev match the diff since we're notionally using these numbers to trace back changes in a file

Sean (Mar 02 2021 at 06:21):

unrelated, here's a neat little find in the commits. there appear to be exactly 7 commits that were perfectly duplicated on branches and trunk:

10 19514 LOG+FILE+DIFF PERFECT MATCH ON c9cc663089d441f8a7d40f63757b0080dec5af10 f5419dcbab0e9edc78c90af24b5318b04686a7b2 (TAGGED MISMATCH f5419dcbab0e9edc78c90af24b5318b04686a7b2)
10 19595 LOG+FILE+DIFF PERFECT MATCH ON 0c2cb0cf51b8f543cd740e758ea3ebe2be964336 afbcb106f05606065ae3ce11b602fa566efb0031 (TAGGED MISMATCH afbcb106f05606065ae3ce11b602fa566efb0031)
10 19605 LOG+FILE+DIFF PERFECT MATCH ON 9bacc2b9ac94977113d3d68617ac4c896a37da60 c614ed067a631ba7d56fee51d1fc289359efb64b (TAGGED MISMATCH 9bacc2b9ac94977113d3d68617ac4c896a37da60)
10 19697 LOG+FILE+DIFF PERFECT MATCH ON e49447b2d924385b7272c6ba8d78e490590f1778 f363b6cbec7bdd415f20e77a9d3734ecfa6cbf98 (TAGGED MISMATCH f363b6cbec7bdd415f20e77a9d3734ecfa6cbf98)
10 19892 LOG+FILE+DIFF PERFECT MATCH ON 200ca9ba685b57dbc4bd0dcd9600649a7bec8117 f5787013aff6a38adc807bcc5a8db617510818a3 (TAGGED MISMATCH 200ca9ba685b57dbc4bd0dcd9600649a7bec8117)
10 19992 LOG+FILE+DIFF PERFECT MATCH ON 1b8fd04c74f8b99551e35ec87d4980bb27735a62 ae67110218bc3d71c5f3301707b5d86a60564cf7 (TAGGED MISMATCH 1b8fd04c74f8b99551e35ec87d4980bb27735a62)
10 64506 LOG+FILE+DIFF PERFECT MATCH ON eb5c98bf8799083d4d946f1f63f9e1edd8e61631 2ca450a34b29f37d58b4ed8288c3f41a4b155a78 (TAGGED MISMATCH eb5c98bf8799083d4d946f1f63f9e1edd8e61631)

Ignore the mismatch, I manually verified and they're all correct in git. It was just interesting because there appear to be so few of those. I kind of expected more, but they were apparently pretty rare to be exactly the same message, the same files, same diff.
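For readers following the labels in these lists, a rough sketch of how such a label could be built from three independent comparisons. The function is hypothetical; the real scripts were not posted, and what separates a "PERFECT MATCH" from a plain match is not stated in the thread, so it is left out here.

```python
def match_label(log: bool, files: bool, diff: bool) -> str:
    """Join the names of the comparisons that matched.

    log   -- commit messages are equal
    files -- the sets of touched files are equal
    diff  -- the diff contents are equal
    """
    parts = [name for name, hit in (("LOG", log), ("FILE", files), ("DIFF", diff)) if hit]
    return "+".join(parts) if parts else "NOT FOUND"


print(match_label(True, True, True))
print(match_label(True, False, False))
```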

starseeker (Mar 02 2021 at 20:54):

starseeker (Mar 03 2021 at 04:05):

So, here's a question - 7496c761e580e1935607fc336ff85bf06c524caf was initially unassigned. It got assigned r10209 based on commit message and history position, but diffing it with the SVN checkout indicates some of the changes for r10209 in SVN got grouped into the git commit labeled r10210 instead.

So we can assign r10209 to 7496c761e5 and be "approximately" correct - presumably the best match available in the git history to that SVN commit, but with a checkout that won't match - or skip assigning r10209 to any commit (losing some mapping info, but skipping a mapping that can't produce a matching output.)

Sean (Mar 03 2021 at 04:38):

So I've been down a rabbit hole trying to sort out how git handles encoding, but it's looking like it's not just that -- I think there's a couple categoric issues potentially. check these out:

b17a2836c85b43422c15faf7b111088bc4e445e3
a9daa166161d57ee6ed486cc9488880ffc5da843
ed4c28dcc1f17520d6596192e2ccae808d44ba4f
bc320ea12852890495809d142600a97eb241bd6f
d1e7455ffff304d2b8f25aba0cf144c6dc0fb4b4
9594f3ce737b98e902379066be02337eabc8db53
18ea6afa636886ee2ba5fb7d7807a920db3ee35e
8a97709dae7e86479bc04ab8d52dcaa65c2b4beb
9ae7c9024838f140c1cb20d0ddaf0606e2e486ef
c13ba71962660bcd2bb471671a08d61c94827e30
9ef20d544982d92f0b1d9183477c42543c4d45c4
4d6d7aad28eed5f23e31aa3f3fc37576de05b6dc
03eab0819b8a74d2a046273443ff14122f2d7e98
92cc90f7397cf45802a70f70260cfa2f57b1fc3b
106637f9c2913d3cc43d8a02a0f955c9709f67d4
6c20c610b10b3c098ad8c8bd53fc111791bca7e6
9f1e2c92eb250b39ac64b981c6246236f0cdb2c5
4c103440e2947d6990386e2767b9778266dd1517
b7a0eb56822e52c1a18ca30f312abea93ead6867
c90bfc8e507ea27863d82ea9ff514d2c79253b98
cb8ebedb7da7eb7981d0038fc826b61f4315e699
b0f3314a23e067051d520b11da483d068b73ebe6

Sean (Mar 03 2021 at 04:39):

there's clearly some utf-8 going on there that wasn't preserved, but then there's also some utf-8 getting added where it previously did not exist. I didn't scan all commits for the condition -- these are the ones that came up as matching DIFF+FILES but not matching the log message.

Sean (Mar 03 2021 at 04:40):

looks like about half of them have messed up log messages where there was an apostrophe or a double quote. I checked svn and they were indeed just simple single/double quotes, so I'm thinking something in the scripting

Sean (Mar 03 2021 at 04:43):

As to your question (sorry, had to offload before I lost the context) ... I have that commit matching 10209 and 10210 as well because of the log message match. They match these git commits:

5222348e9f8c57c3a7623700413d0f37a1d74122
7496c761e580e1935607fc336ff85bf06c524caf
46472340020700642675b9613c7ddce85c391bea
becf17cb8e73ddbef7a0e840090712714ef4cff0
5846eaff72182de5f744baf4ef8c757b1e44b615

Sean (Mar 03 2021 at 04:44):

So you could tag them all or just the first, shrug, all valid enough choices I think

Sean (Mar 03 2021 at 04:44):

Sean (Mar 03 2021 at 04:46):

when I gave them a prelim scan, it looked like most are commits split up differently than they were in svn

starseeker (Mar 03 2021 at 05:31):

So looking at the first one on that list (b17a2836c85b43422c15faf7b111088bc4e445e3) I'm seeing the following:

add Roßberg to list of contributors

add Roßberg to list of contributors

add Roßberg to list of contributors

You're saying your scripts indicate the SVN and Git messages don't match? All three lines appear to have the same utf8 character, at least here...

starseeker (Mar 03 2021 at 05:32):

Sean (Mar 03 2021 at 05:35):

yeah, I don't get the utf chars here when I query git. I could have done something that caused them, but if I just run git show, I get encoded mess

Sean (Mar 03 2021 at 05:36):

appears to be tagged across a variety of branches (which maybe happened, I hadn't checked that yet)

Sean (Mar 03 2021 at 05:40):

svn diff -c30687 file:///Users/morrison/brlcad.github/svn.sfmirror/code | grep ^Index | cut -f3 -d/ | sort | uniq
VendorARL
libpng
scriptics
zlib

for i in `echo "004ec0ae439f0ca3c814d22a46957012cd8fb239
720f9b9b75588e35d3cce0f9f5b802abea2259ab
f206b315ca475d3a3e55e98ec42d772c6b05baee
cbff64617866cc3fc2b25db15cd610e651561958
87bc784daf7f15cc8d9c9fa980a934a98a17de95
d748c2ea214b699008563e18f5a7105de39faba9
004ec0ae439f0ca3c814d22a46957012cd8fb239"` ; do git show $i | grep svn:branch ; done | sort | uniq
    svn:branch:Original
    svn:branch:itcl3-2
    svn:branch:libpng_1_0_2
    svn:branch:tcl8-3
    svn:branch:tk8-3
    svn:branch:zlib_1_0_4

starseeker (Mar 03 2021 at 13:01):

starseeker (Mar 03 2021 at 13:05):

starseeker (Mar 03 2021 at 14:24):

OK, so it looks like the git commits are spurious - I may have messed up a correction or some such. Of the 4 branches from r30687, only VendorARL is present and it looks like that's because I custom-added it.

starseeker (Mar 03 2021 at 15:06):

r15365 created the libpng branch in SVN, if I'm not mistaken. In Git, that revision got assigned to f8fa716f5077cdde438f676c1b24244a09eb3fcd

starseeker (Mar 03 2021 at 15:17):

r15338 created the zlib branch in SVN. In Git, that looks like e85b06be0fa6632e097d8c728506ab5251a2b635

starseeker (Mar 03 2021 at 15:24):

The scriptics branch has 4 commits - r19756, r19758, r19760 and r19762. Those don't have assignments right now, but it looks like the corresponding commits have r19757, r19759, r19761 and r19763. Looking at them, I'd say the four earlier commits are probably the better content choices for assignment (not to mention having the mapping commit messages.)

starseeker (Mar 03 2021 at 15:40):

@Sean OK, I think I've got the corrective files in place for r30687. Basically, since cvs-fast-export put the commits on other branches, we don't have png, zlib or scriptics branch deletes. I added the proper VendorARL delete, and removed the spurious itcl3-2, etc. deletes incorrectly associated with r30687 in Git.

starseeker (Mar 03 2021 at 15:41):

Sean (Mar 03 2021 at 15:42):

I don't think the encoding was a git version issue, I think it's just encoding. I think I have it sorted out.

Sean (Mar 03 2021 at 15:42):

Looks like the git command I used to dump the log and the svn command used to dump the log ended up dumping differently is all.
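One way two dumps of the same message can disagree is when the same UTF-8 bytes get decoded with different encodings. The sketch below only illustrates that general effect with the "Roßberg" message quoted earlier; it is an assumption that something like this happened here, and the helper names are hypothetical.

```python
MESSAGE = "add Roßberg to list of contributors"


def as_mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo as_mojibake by reversing the wrong decode."""
    return text.encode("latin-1").decode("utf-8")


garbled = as_mojibake(MESSAGE)

# The garbled text no longer compares equal to the original message...
print(garbled == MESSAGE)
# ...but decoding both sides the same way restores the match.
print(repair(garbled) == MESSAGE)
```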

Sean (Mar 03 2021 at 15:43):

So that's pretty much the entirety of commits that had UTF-8 characters in them. The suspicious quote-related ones look like they're actually smart single quotes, probably copy-pasted from some output.

starseeker (Mar 03 2021 at 15:44):

Sean (Mar 03 2021 at 15:44):

starseeker (Mar 04 2021 at 01:01):

Well, that was mostly a blind alley I should have known better than to chase, but it did result in characterizing some of the commit diffs... looks like cvs-fast-export and cvs2svn sometimes picked different commit ordering for commits with the same timestamps.

@Sean unless you feel really strongly about that I'd rather not try to switch them around - it'll take some effort on the repowork code to support doing so.

starseeker (Mar 04 2021 at 01:14):

We need some kind of "good enough" criteria... my sense is that chasing down all the CVS vs SVN vs Git differences has the potential to be nearly endless...

Sean (Mar 04 2021 at 07:17):

Yeah, I'm not worried about commit ordering. The oddity was the multitude of seemingly unrelated branches. Working on pulling that list still, had some diffs that had to get recomputed and worked on tallying where we're at.

Sean (Mar 04 2021 at 07:23):

My criteria has been to identify or explain all the non-empty trunk commits. That all are tagged or otherwise accounted for correctly (i.e., with something matching or it's a split commit). We're definitely closing that gap.

Sean (Mar 04 2021 at 07:55):

30687 NOT FOUND (empty files) (TAGGED MISMATCH 004ec0ae439f0ca3c814d22a46957012cd8fb239 720f9b9b75588e35d3cce0f9f5b802abea2259ab f206b315ca475d3a3e55e98ec42d772c6b05baee cbff64617866cc3fc2b25db15cd610e651561958 87bc784daf7f15cc8d9c9fa980a934a98a17de95 d748c2ea214b699008563e18f5a7105de39faba9 )
30688 NOT FOUND (empty files) (TAGGED MISMATCH 3882bb89a329277499b8b6c2246be115544740a0 68aeb784b3ee698c854878c190eb4b229b88e1fe )
30690 NOT FOUND (empty files) (TAGGED MISMATCH bab9cb74c7cf403e3c6ffb862367e7e921d5de5e 1f1d7a7f607d5b4c673d1d73cc7bcb126b0da82b ebaea28c7f234f5af88bbe8f60e8cae1026d7f08 9a4972e8d397e2fe1457987252531bbb08aae2b5 2000a7fd53ba7f017eadb55168ed737a0e6d2906 47ca01661701d59ee6aa948cd914b42e9ae9e36e )
36471 NOT FOUND (empty files) (TAGGED MISMATCH 5d5a16ac1af3bef7ea3acd9df913a882ecb2c450 cf54441bbb9da781638c782f0330e2399b114ba2 f3402be29c09993717319df0a8045087c3c1efcc 29fb00141b4040de08c9319404bfe44946ef43f2 2c43fbad65f4bc373dfa80a6254077b5913623d0 e19308e9b43771204ad04daa015bb646ffda7077 )
36472 LOG+FILE MATCH ON 96a3e5fb75628744e4835d9ce2f7cbf8dbca8ec4 (TAGGED MISMATCH 96a3e5fb75628744e4835d9ce2f7cbf8dbca8ec4 c0737a9252506872ce5ce6cd14207f7c375741da )
46324 NOT FOUND (empty files) (TAGGED MISMATCH 44e3d7341c5680250d65091b2aff6ed051720a11 a988903bbe27985e0dd94228e07079e91e98be4d c54b9b07158d4a904aabddae264290854ecb250c 03af105da8dd3cf85a29cc7f056513cc8e79d751 )
46322 NOT FOUND (empty files) (TAGGED MISMATCH e56ca9ed3e746b0f0531a5a90a50706dc4486786 cbd805930e92e0174548d245eee8a50f79f4be6a 8db928ed630bba609e98e97045dc91377539353e f64cf35a3a10e027863a68a07f6d4dda041d0fb4 3e54caeb944540d809a8c123289f9fb3624b7509 f199b69dd1f620bfa299a9e8fd520c37cc9b3c26 33b42ffbd7c4aa5e42e5854d020a8d66dd69ccfc a6225b252463bcb48ce3376200227c1e783c77d5 )
46328 NOT FOUND (empty files) (TAGGED MISMATCH ccb829355adc0829b9a5a7a3f0b5ac72dc13ea45 a442ff82f39e00b14ef139fb8f62b18c0ec32046 e4fba5a7cdbc525184d64170eec22e7eeedbd1f2 4396a0b1cd513d4ce9945589c4aadb17eda9a6d0 5452bab5c382ff6f2d0af42c2d4b367a0fdc13aa 9884b41b3aa6f790c80dbb4a55cf5cea4844fc8b 46d4a300710516c5547fa1b6f64ba29ec64ab3b4 )
46335 NOT FOUND (empty files) (TAGGED MISMATCH dd2bb79965568f5aab4f7458606d875d22b74b40 f5e6fc5ebfaaedceb7538a1f2ba1a3fc1589c399 )
62127 LOG+FILE MATCH ON 797d0138514136e2e95b0dfa1cc7d2e774fef2ab (TAGGED MISMATCH bae6fd511505e5e4f12f16b1cd73b5381f4f47f6 797d0138514136e2e95b0dfa1cc7d2e774fef2ab )
62975 NOT FOUND (empty files) (TAGGED MISMATCH dbaf54ff6b25ad2f576f82f26086101bc5015dec e047bc1116cc3199bbdbf58101ef281c153c2b74 )
69921 NOT FOUND (TAGGED MISMATCH cca216f058fe5791dbbd082ad7293911b6aae9f6 f828d1c0b1f6e68879a1bdecb2c58d1dc9a9207b )

Sean (Mar 04 2021 at 07:57):

can ignore the NOT FOUND / empty files -- that's just me not tracking branches. What's interesting is those are revs tagged to multiple git commits. Some of course may be intentional, but that's all of them (on conv12).
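A minimal sketch of how revs tagged to more than one git commit could be pulled out of a list of (rev, sha1) pairs. The input shape and function name are assumptions; the sample pairs reuse values from the list above.

```python
from collections import defaultdict


def revs_with_multiple_commits(pairs):
    """Return {rev: sorted sha1 list} for revs tagged on more than one commit."""
    by_rev = defaultdict(set)
    for rev, sha in pairs:
        by_rev[rev].add(sha)
    return {rev: sorted(shas) for rev, shas in by_rev.items() if len(shas) > 1}


pairs = [
    (36472, "96a3e5fb75628744e4835d9ce2f7cbf8dbca8ec4"),
    (36472, "c0737a9252506872ce5ce6cd14207f7c375741da"),
    (15365, "f8fa716f5077cdde438f676c1b24244a09eb3fcd"),
]
print(revs_with_multiple_commits(pairs))
```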

starseeker (Mar 04 2021 at 14:20):

starseeker (Mar 04 2021 at 14:22):

r30688 is on two commits because it eliminated both the branch with the Mac Hack commit and other "unlabeled" branches which mapped (collapsed) to "master-UNNAMED-BRANCH" in the cvs-fast-export conversion. So master-UNNAMED-BRANCH has two branch delete commits assigned to it. Might as well delete 68aeb784b3ee698c, since it doesn't add anything.

starseeker (Mar 04 2021 at 14:25):

starseeker (Mar 04 2021 at 14:27):

starseeker (Mar 04 2021 at 14:30):

r36472 is the result of a branch naming consolidation - c0737a92525 can be removed.

starseeker (Mar 04 2021 at 14:31):

starseeker (Mar 04 2021 at 14:32):

starseeker (Mar 04 2021 at 14:36):

r62127 looks like a branch delete that registered an empty commit on trunk for some reason - 797d0138514136e can be removed.

starseeker (Mar 04 2021 at 14:36):

starseeker (Mar 04 2021 at 14:40):

r69921 - looks like a branch rebase got recorded somehow as a branch delete plus re-creation - f828d1c0b1f6e68879a1bdecb2c58d1dc9a9207b can be removed.

starseeker (Mar 04 2021 at 14:47):

@Sean Note that I went and manually tagged a lot of Git commits as mapping to multiple SVN revisions in the post-conv12 update logic...

Sean (Mar 04 2021 at 16:02):

I did notice that... it "should" just mean a lot more multiple matches no? If so, I think we can just do a post-process check later to make sure there wasn't a typo or other blatant mistake in the manual tagging, but shouldn't affect the upload V&V.

starseeker (Mar 04 2021 at 16:05):

starseeker (Mar 04 2021 at 16:07):

I wasn't sure how "deep" you wanted to go checking those manual tags - the majority are based on context (unmapped commit that is immediately before a mapped commit, with a file missing from the "mapped" commit compared to the SVN file list) but I'm not set up to actually try and validate all the diffs as being part of the SVN commits.

starseeker (Mar 04 2021 at 16:08):

It may not be possible in all cases anyway, if one git commit ended up getting deltas from two SVN commits - in that case the best that can be done is an "approximate" assignment.

Sean (Mar 04 2021 at 16:11):

ankle deep, just a blatant sanity check to see whether they are deliberate or mistakes, since they were outliers.

Sean (Mar 05 2021 at 05:45):

Okay, @starseeker here's a batch for you to check out, myriad issues. These are all the commits that do not map uniquely. Most are probably correct as-is and simply aren't unique because they were a common log message applied to the same files or similar or were branch commits (keep in mind that I'm ignoring branch-only diff data so they show up as "not found"), BUT the rest are all multiple candidate diffs. Could be entirely benign or correct, but could use your eyes on at least some of them.
svn.to.git5.multiple_matches.sorted

Sean (Mar 05 2021 at 06:02):

Here's one you may have already captured with changes you made a couple days ago, but here are all the commits that match svn revs in LOG+FILES+DIFF, but aren't tagged revs in git (or at least weren't as of brlcad_conv12). That's not to say that they should be all mapped -- it's entirely possible for a commit to have gotten split and just happens to map to another with the same files and log message. I'm not sure how to rule that out, but maybe you can verify them easily. There's 167 in this category:
svn.to.git5.matching_not_tagged

Sean (Mar 05 2021 at 06:12):

+9016 LOG+FILE+DIFF MATCH ON 24c9f6ebb84eba2bb53211c3012b7dfb68672a2b (TAGGED MISMATCH 68b56645d7a689a3af445bb5dfef16c78a4a4270)
+9015 LOG+FILE+DIFF MATCH ON 68b56645d7a689a3af445bb5dfef16c78a4a4270 (TAGGED MISMATCH 24c9f6ebb84eba2bb53211c3012b7dfb68672a2b)

Sean (Mar 05 2021 at 06:12):

or they're splits because of cvs screwery and they're right because of other adjacent commits?

Sean (Mar 05 2021 at 06:30):

another set similar to matching_not_tagged is this batch that aren't/weren't in git but match a log+file pairing, possible candidates. Note some are non-unique.
svn.to.git5.matching_lf_not_tagged

Sean (Mar 05 2021 at 06:47):

here's a much smaller but similar set of untagged commits where a matching log file was found, possible candidates for manual tagging. affects 37 commits:
svn.to.git5.matching_l_not_tagged

Sean (Mar 05 2021 at 07:30):

In theory, I think those 5 data sets reconciled fully should nearly result in full coverage... the only ones missing should be ambiguous cases. I can run a final trunk pass on any changes you make (conv16?) and we can see if there are any left! This might be it.

Sean (Mar 05 2021 at 07:30):

Sean (Mar 05 2021 at 07:32):

 78233 total unique commits
-10544 PERFECT MATCH
- 9356 NOT FOUND (branch changes)
-  939 EMPTY (prop changes)
-50807 LOG+FILE+DIFF (matching)
       167 LOG+FILE+DIFF MATCH but not tagged (all UNIQUE)
       141 MISMATCH or duplicated candidates
- 5001 LOG+FILE (matching)
       38 LOG+FILE MATCH but not tagged (13 UNIQUE)
       776 MISMATCH or duplicated candidates
-  180 LOG+DIFF (matching)
-   29 FILE+DIFF (matching)
-   30 DIFF (matching)
-    0 FILE (matching)
- 1117 LOG (matching)
       90 MATCH but not tagged (all UNIQUE)
       350 MISMATCH or duplicated candidates
------
   229 unaccounted for in mismatches not tagged
  - 90 LOG not tagged
  - 13 LOG+FILE not tagged
  -167 LOG+FILE+DIFF not tagged
------
   -41 dupes not excluded properly (oops)
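The last block of that tally is plain subtraction, sketched here with the numbers copied from the table. The negative remainder means the three "not tagged" groups add up to 41 more than the 229 figure, which fits the "dupes not excluded" note.

```python
# Figures copied from the closing block of the tally above.
unaccounted = 229
not_tagged = {"LOG": 90, "LOG+FILE": 13, "LOG+FILE+DIFF": 167}

remainder = unaccounted - sum(not_tagged.values())
print(remainder)  # -41
```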

starseeker (Mar 05 2021 at 14:27):

starseeker (Mar 05 2021 at 14:34):

Going through the multiple_matches file, it's looking like the SVN era commits are mostly "checkpoint" or similarly ambiguous commit messages on similar file sets (which is what I would expect for the SVN era - given how the commits were generated for that portion of the history I'm not sure how we'd get an SVN revision number mis-assignment, since the commits were generated on a per-SVN commit basis to begin with...)

starseeker (Mar 05 2021 at 14:44):

r62027 and r62708 are a bit more interesting - they are branch creation and deletion commits that represent me adding and deleting a branch in the wrong place. They are candidates for removal, unless you want to keep them to preserve the history of what happened at those particular SVN commits.

starseeker (Mar 05 2021 at 15:01):

@Sean I've scanned the logs for the SVN era multiple_matches commits, and r62027 is the only one that jumped out - the rest appear to be either checkpoints, branch syncs, throwaway test commits, or applying identical changes to different branches.

starseeker (Mar 05 2021 at 15:02):

or a few that are different changes to the same file with the same commit message.

starseeker (Mar 05 2021 at 15:05):

One I'm not following - how come da4ace8194f81d0f92565b428dfa309143b37914 and ae970a06e7d02f63e7c77ff927af5ca90721a111 are getting flagged as 75110 match?

starseeker (Mar 05 2021 at 15:07):

I see they're "rename" commit message commits, but I'm not seeing any file matching...

starseeker (Mar 05 2021 at 15:30):

r799 is an example where the commit groupings ended up different - match.c isn't in SVN 799, but the vdeck.c changes in that commit do appear to align with the r799 changes.

starseeker (Mar 05 2021 at 16:10):

I'm trying to go through the CVS era a bit more carefully, but so far none of the multiple_matches seem to indicate mis-mapped files. A couple untagged commits that matched entries on my list, and one minor correction to a git rev assignment.

starseeker (Mar 05 2021 at 16:28):

The matching_not_tagged I confirmed as being part of svn_rev_updates.txt, and the lf_not_tagged had a few that appear to be valid matches as well (some aren't). I think I've accounted for the l_not_tagged commits as well.

starseeker (Mar 05 2021 at 16:45):

@Sean It will take me a bit more time to manually confirm that none of the "TAGGED MISMATCH" cvs era git commits are actually incorrectly identified, but I'm hopeful they're good. I've uploaded the current state at https://github.com/starseeker/brlcad_conv17 - if neither of us finds anything else, I'll do a final update from SVN and we'll be ready to roll.

(The most likely source of any remaining issues is if you spot something in my manually assigned commits - they're more extensive than the ones from the svn.to.git lists, since I was making a stab at mapping all the commits I could back to SVN.)

Sean (Mar 05 2021 at 21:07):

Sean (Mar 05 2021 at 21:22):

False positive, can ignore them. They're an artifact of how branches were handled from svn. They have empty file lists, so it erroneously thinks it has a better match than it really does. I didn't get around to detecting and handling that case differently. If you come across a rev that is branch activity, you can just skip it.
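A sketch of one way to handle that case differently: treat an empty file list as "no evidence" rather than as a match. This is a hypothetical helper, not the comparison code used for these lists.

```python
def files_match(svn_files, git_files) -> bool:
    """Compare touched-file sets, refusing to match when either side is empty.

    Branch-only revisions can end up with empty file lists, and two
    empty lists would otherwise compare equal and look like a match.
    """
    svn_set, git_set = set(svn_files), set(git_files)
    if not svn_set or not git_set:
        return False
    return svn_set == git_set


print(files_match([], []))
print(files_match(["librt/db_io.c"], ["librt/db_io.c"]))
```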

Sean (Mar 05 2021 at 22:20):

So.... based on an assumption that svn_rev_updates.txt is correct enough, we're down to just 48 to resolve...HOME STRETCH!
...

Sean (Mar 05 2021 at 22:21):

Sean (Mar 05 2021 at 22:43):

starseeker (Mar 05 2021 at 22:44):

Sean (Mar 05 2021 at 22:44):

starseeker (Mar 05 2021 at 22:45):

Maybe... if we replace the commit with a new commit having a custom message. A bit tricky, but doable if it's only one or two

Sean (Mar 05 2021 at 22:45):

there are a handful of svn_rev_updates.txt that didn't apply because the commit was already tagged as something else

Sean (Mar 05 2021 at 22:45):

starseeker (Mar 05 2021 at 22:46):

Sean (Mar 05 2021 at 22:48):

I don't know, can run a script to find out -- but basically it's all the entries in svn_rev_updates that are on a commit that has something else

Sean (Mar 05 2021 at 22:48):

starseeker (Mar 05 2021 at 22:49):

Sean (Mar 05 2021 at 22:49):

yeah, oldest I found was r14320 which is in 76e74e9e9ce955bf6602171e67cbcd9539bfbec9

starseeker (Mar 05 2021 at 22:50):

My original thought was that as long as we had one rev number assigned in the right general range, that would provide timeline and history context. Is it worth trying to tease out the multiple commit mappings?

starseeker (Mar 05 2021 at 22:50):

Sean (Mar 05 2021 at 22:50):

Sean (Mar 05 2021 at 22:51):

starseeker (Mar 05 2021 at 22:51):

Not trivially - to really do that "Right" I would have had to generate independent per-file diffs for all the git and SVN revisions, then find all the corresponding changes and do all the multi-mappings.

starseeker (Mar 05 2021 at 22:52):

If it's just a couple we can fake it by doing hand-assembled replacement commits, but anything more intensive would be really tough.

starseeker (Mar 05 2021 at 22:53):

Sean (Mar 05 2021 at 22:53):

I'm not looking to find them beyond what's already in svn_rev_mappings.txt ... there's some unknown number of them in there already

Sean (Mar 05 2021 at 22:54):

In hunting down the last few missing trunk commits, it turned out they were already identified in the mappings file

starseeker (Mar 05 2021 at 22:54):

Sean (Mar 05 2021 at 22:54):

starseeker (Mar 05 2021 at 22:55):

Whoops. I thought I had checked for those - must have re-introduced a couple. Hang on - awk + sort + uniq to the rescue...

starseeker (Mar 05 2021 at 22:57):

/me blinks - all the sha1 keys in svn_rev_updates.txt appear to only be in there once...
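The same duplicate-key check can be sketched in Python. The layout of svn_rev_updates.txt is not shown in the thread, so this assumes whitespace-separated fields with the sha1 key in the first column; the sample lines are made up.

```python
from collections import Counter


def duplicate_keys(lines, key_field=0):
    """Return the keys that appear on more than one non-blank line."""
    keys = [ln.split()[key_field] for ln in lines if ln.strip()]
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# Hypothetical sample content, not real entries from the file.
sample = [
    "aaaa1111 9011",
    "bbbb2222 9012",
    "aaaa1111 9013",
    "",
]
print(duplicate_keys(sample))
```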

starseeker (Mar 05 2021 at 23:00):

If we need to do it for the missing ones, as long as it's not too many, I can do what I did for the "Mac Hack" commit to fix its data and make replacement commits to apply.

Sean (Mar 05 2021 at 23:01):

Sean (Mar 05 2021 at 23:03):

it's that whatever processing association that normally happens happened (or it's because I'm on 12, and if I were testing 17 then I wouldn't find 9011 instead of 9012 and vice versa)

starseeker (Mar 05 2021 at 23:03):

/me nods - I get it, and we actually want both so we don't have missing svn rev mappings.

Sean (Mar 05 2021 at 23:04):

starseeker (Mar 05 2021 at 23:04):

i.e. something for grep to match for both 9011 and 9012, even if it goes to the same commit.
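A sketch of what "something for grep to match" could look like when one commit carries several rev labels. The `svn:revision:N` line format is an assumption modeled on the `svn:branch:` lines shown earlier, and both helper names are hypothetical.

```python
import re


def add_rev_labels(message: str, revs) -> str:
    """Append one label line per SVN revision to a commit message."""
    labels = "\n".join(f"svn:revision:{rev}" for rev in revs)
    return f"{message.rstrip()}\n\n{labels}\n"


def has_rev(message: str, rev: int) -> bool:
    """True if the message carries an exact label line for this revision."""
    return re.search(rf"^svn:revision:{rev}$", message, flags=re.MULTILINE) is not None


msg = add_rev_labels("fix typo", [9011, 9012])
print(has_rev(msg, 9011), has_rev(msg, 9012))
```

Anchoring the pattern to the whole line keeps a search for r901 from matching the label for r9011.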

Sean (Mar 05 2021 at 23:04):

starseeker (Mar 05 2021 at 23:04):

Sean (Mar 05 2021 at 23:04):

starseeker (Mar 05 2021 at 23:05):

Sean (Mar 05 2021 at 23:05):

starseeker (Mar 05 2021 at 23:05):

Sean (Mar 05 2021 at 23:05):

starseeker (Mar 05 2021 at 23:06):

Sean (Mar 05 2021 at 23:06):

well just that I ran into looking for missing trunks, but that won't have all the assignments you did

Sean (Mar 05 2021 at 23:06):

starseeker (Mar 05 2021 at 23:06):

starseeker (Mar 05 2021 at 23:07):

Sean (Mar 05 2021 at 23:07):

I think this scan over svn_rev_mappings will be a good enough check... if they're really rare, then we can just punt

Sean (Mar 05 2021 at 23:07):

Sean (Mar 05 2021 at 23:09):

Sean (Mar 05 2021 at 23:12):

MISMATCH on 859 and 858 in c7da20384024574fddc07c59dcdfcc2879560e31
MISMATCH on 9011 and 9012 in f9fd3ad956d23e854df73294083cb37ef3c2f341
MISMATCH on 11406 and 11407 in eb0179c08aefd8ea90697c42eba31244e4904eed
MISMATCH on 12424 and 12425 in fc9e5a26cba18a926c644a4e2bb4b321855f2a88
MISMATCH on 14320 and 14321 in 76e74e9e9ce955bf6602171e67cbcd9539bfbec9
MISMATCH on 18892 and 18993 in 67c46ada661fdab789632885c34bf77a277962db
MISMATCH on 21564 and 21565 in 11077485329842c81213eab68006fe5d58b5925f
MISMATCH on 22525 and 22521 in edf3df35c8c44492fa25cb3999788338b1f2570b
MISMATCH on 19756 and 19757 in a7c85f280677d70b8eef9aadf79302736ed26ffc
MISMATCH on 19758 and 19759 in f5a1b0037fec2927cba073d118db24cdbd681975
MISMATCH on 19760 and 19761 in a098425430db227021617976961e6b51ce5569cb
MISMATCH on 19762 and 19763 in e6417be98f27d570d863744f566f5aaf738abbe6

Sean (Mar 05 2021 at 23:13):

starseeker (Mar 05 2021 at 23:13):

Sean (Mar 05 2021 at 23:14):

yes, they were. okay, so probably showing up just because I'm comparing them to conv12

starseeker (Mar 05 2021 at 23:15):

OK, that's not too bad - my 858 test seems to be going smoothly, so I can probably get the others.

starseeker (Mar 05 2021 at 23:15):

Sean (Mar 05 2021 at 23:16):

were you still sorting through the other potential branch taggings or done with them?

starseeker (Mar 05 2021 at 23:16):

Sean (Mar 05 2021 at 23:17):

well that one, but more importantly the three not_tagged files to see which if any don't have a tagging

Sean (Mar 05 2021 at 23:18):

starseeker (Mar 05 2021 at 23:18):

I think I checked the non-tagged and all of those commits were listed in svn_rev_updates.txt

starseeker (Mar 05 2021 at 23:20):

multiple_matches has a pretty high false-positive rate - I'm basically checking the TAGGED MISMATCH commits against the trunk diff visually to make sure they look like they're correctly lined up. LOG+FILE is apparently not a terribly unique key in the revision set (mostly my fault, too, from what I've seen so far... I should go back in time and tell myself to use more unique commit messages.)

Sean (Mar 05 2021 at 23:20):

Sean (Mar 05 2021 at 23:21):

the thing about multiple matches is those are revs for tags that were not tagged in conv12

starseeker (Mar 05 2021 at 23:22):

Sean (Mar 05 2021 at 23:22):

probably would make sense to only check the multiple_match revs to see which if any are NOT listed in svn_rev_mappings , since they're potential new info
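That check amounts to a set difference. A minimal sketch, assuming the rev numbers have already been pulled out of both files (the parsing is omitted because the file layouts are not shown in the thread); the sample numbers are illustrative only.

```python
def revs_not_in_mapping(candidate_revs, mapped_revs):
    """Return candidate revs that do not appear in the mapping, sorted."""
    return sorted(set(candidate_revs) - set(mapped_revs))


# Illustrative inputs only.
multiple_match_revs = [799, 62027, 62708, 75110]
mapped_revs = [799, 62027]
print(revs_not_in_mapping(multiple_match_revs, mapped_revs))
```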

Sean (Mar 05 2021 at 23:24):

Sean (Mar 05 2021 at 23:25):

starseeker (Mar 05 2021 at 23:26):

Did you want me to do the multi-svn labeling? That'll probably take about an hour

Sean (Mar 05 2021 at 23:26):

starseeker (Mar 05 2021 at 23:27):

Sean (Mar 05 2021 at 23:27):

starseeker (Mar 05 2021 at 23:28):

starseeker (Mar 05 2021 at 23:29):

starseeker (Mar 05 2021 at 23:49):

@Sean 22521 and 22525 should both still be there after svn_rev_update - 22521 was moved to afd806bf472d0ac4b2685be406966a5a6eb28e5c

starseeker (Mar 06 2021 at 00:28):

starseeker (Mar 06 2021 at 00:29):

starseeker (Mar 06 2021 at 00:54):

starseeker (Mar 06 2021 at 00:56):

starseeker (Mar 06 2021 at 01:20):

Sean (Mar 06 2021 at 01:36):

starseeker (Mar 06 2021 at 13:55):

Sean (Mar 06 2021 at 18:13):

It's still chugging through it all; should know how it looks here in a bit. Per the checklist, we're done with the repo itself if there are no problems on this final pass! so exciting!

Sean (Mar 06 2021 at 18:13):

Sean (Mar 06 2021 at 18:15):

This is taking a little longer because I had to re-extract the diffs that ran last night. I forgot to set diff.renameLimit on the new conv18 which caused slews of false differences. The re-extraction is running.

Sean (Mar 06 2021 at 18:17):

I should probably figure out how to set that in my personal config, instead of having to set it every cloning.

starseeker (Mar 06 2021 at 19:36):

I pulled all the latest SVN commits in - brlcad_conv18 should now be up-to-the-minute (i.e. r78389)

starseeker (Mar 06 2021 at 19:37):

Unless you see an issue or someone commits before validation completes, brlcad_conv18 should be ready to upload.

Sean (Mar 06 2021 at 20:30):

I'm only checking through 78233 just so numbers can be compared with 12, but sounds good!

starseeker (Mar 06 2021 at 23:23):

You had indicated you wanted to do the final upload to the BRL-CAD github site - after setting the origin, this is what I use to push to upload everything:

starseeker (Mar 06 2021 at 23:25):

I'm not sure if a basic clone from github will get all the branches, so I'd recommend pulling a mirror clone:

git clone --mirror https://github.com/starseeker/brlcad_conv18.git
cd brlcad_conv18.git
git remote set-url origin git@github.com:BRL-CAD/brlcad.git

Erik (Mar 07 2021 at 01:00):

starseeker (Mar 07 2021 at 14:40):

starseeker (Mar 07 2021 at 14:41):

starseeker (Mar 07 2021 at 14:58):

@Sean did the updated run succeed? (by the way, I think that limit can be set with: git config diff.renameLimit 999999 )

starseeker (Mar 07 2021 at 14:59):

Erik (Mar 07 2021 at 15:11):

@starseeker: man page? :D a lot of cmds have similar (ninja -C, make -C, cmake -B <dir> -S <dir> ..)

starseeker (Mar 07 2021 at 16:04):

@Erik Oh, I see where I went wrong - it's a top level option supplied before the subcommands, so it's not in their --help statements.

Erik (Mar 07 2021 at 16:12):

yeah, it's a strange beast, git args/cmds are applied in order with side effects.

Sean (Mar 08 2021 at 14:30):

That's what I set, and it's needed to get the right diffs/logs for our history. The problem is that config is per cloning, so have to remember to do it every time.
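
A small sketch of setting the limit once per user rather than once per clone, assuming a git new enough to support `git config --global`. HOME is pointed at a scratch directory here only so the example does not touch a real ~/.gitconfig; in normal use that override would be dropped:

```shell
set -eu
# Scratch HOME so the demo leaves the real personal config alone.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME
unset GIT_CONFIG_GLOBAL

# --global writes to the per-user config, so every clone picks it up.
git config --global diff.renameLimit 999999

global_limit=$(git config --global --get diff.renameLimit)
echo "global diff.renameLimit = $global_limit"
```

For a one-off run, `git -c diff.renameLimit=999999 log ...` applies the setting to a single command without writing any config.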

Sean (Mar 08 2021 at 14:31):

Few changes in the numbers I've been looking at, but nothing turning the train around.

Sean (Mar 08 2021 at 14:33):

starseeker (Mar 08 2021 at 14:34):

starseeker (Mar 08 2021 at 14:36):

svn: E000013: Commit failed (details follow):
svn: E000013: Can't open file '/svn/p/brlcad/code/db/txn-current-lock': Permission denied

Sean (Mar 08 2021 at 14:40):

Sumagna Das (Mar 08 2021 at 15:43):

Sean (Mar 08 2021 at 15:44):

Sumagna Das (Mar 08 2021 at 15:45):

Sean (Mar 08 2021 at 15:46):

The migration of one of the world's oldest continuously developed source code repositories should be complete later today... ;)

Sumagna Das (Mar 08 2021 at 15:48):

Sumagna Das (Mar 08 2021 at 19:17):

Sumagna Das (Mar 08 2021 at 19:18):

Sean (Mar 08 2021 at 21:26):

@Sumagna Das I'm hopeful, but I'm still trying to figure out something that changed.

starseeker (Mar 08 2021 at 21:31):

starseeker (Mar 08 2021 at 21:39):

@Sean I'll be back on a bit later - please post anything I can help with. I'll be glad to re-run the final fixup pass again if necessary...

Sean (Mar 08 2021 at 21:41):

I need to make sure it's not something I did differently. I should know here in a bit. I need to pull conv12 again to confirm.

Sean (Mar 08 2021 at 21:42):

Still might not be enough to stop the gravy train, but it was a surprise. I'm hoping I just fat-fingered something.

starseeker (Mar 08 2021 at 21:46):

Sean (Mar 08 2021 at 21:50):

numbers are off. I don't want to speculate too much until I rule out a couple things.

starseeker (Mar 08 2021 at 21:51):

K. The good news is that as long as I don't need to do significant rework in the repowork C++, the post-brlcad_conv12 portion of the conversion runs pretty quickly.

starseeker (Mar 08 2021 at 21:52):

starseeker (Mar 09 2021 at 17:50):

Sean (Mar 10 2021 at 05:58):

Sean (Mar 10 2021 at 06:21):

Sean (Mar 10 2021 at 06:22):

I was ultimately able to reconcile most of the big differences, many looked like branch commit improvements (e.g., looks like you categorically eliminated the "initially added on branch" commits, that was 118 of them).

Sumagna Das (Mar 10 2021 at 08:13):

Can I clone the repo right now or are you guys still checking if there's any issues?

starseeker (Mar 10 2021 at 11:05):

starseeker (Mar 10 2021 at 11:06):

@Sumagna Das Give me a couple hours to check - this is almost certainly it, but I've got a couple things I need to do before I can focus properly on it.

Sumagna Das (Mar 10 2021 at 11:07):

Erik (Mar 10 2021 at 11:56):

starseeker (Mar 10 2021 at 12:14):

<snort> Only on one thing at a time. There are days when that's a significant handicap...

starseeker (Mar 10 2021 at 12:35):

OK, pull request and direct commit both worked, branches are present, tags are present, logs match, Contributors is populated. Looks good!

starseeker (Mar 10 2021 at 12:37):

Looks like I should have made that a rebase for the pull request... generated a merge commit too. Oh well.

starseeker (Mar 10 2021 at 13:18):

Sean (Mar 10 2021 at 14:43):

starseeker (Mar 10 2021 at 14:44):

The thought that worried me is that checking out (say) prior release tags will produce checkouts that won't build.

Sean (Mar 10 2021 at 14:44):

@starseeker I hadn't looked at permissions yet, can do that today. There's a lot to do.

starseeker (Mar 10 2021 at 14:45):

Unfortunately, the fix requires a full (multi-week) re-run of the full process...

Sean (Mar 10 2021 at 14:45):

starseeker (Mar 10 2021 at 14:46):

If we want (say) 7.30.0 to check out with tkhtml in a build-able state, I need to re-generate the history leaving the tkhtml RCS tags in place. That's an adjustment to the filters, which means a full re-run.

Sean (Mar 10 2021 at 15:04):

Hm, I'm still not following. Why would a checkout of the rel-7-30-0 tag be any different than what it was? Is it not right? Or not right for src/other because of how history was spliced?

starseeker (Mar 10 2021 at 15:06):

It's not right because I made a point of stripping out the RCS tags to make following the git history cleaner. So, for example,

static const char rcsid[] = "$Id: cssparser.c,v 1.8 2008/01/19 06:08:13 danielk1977 Exp $";

became

static const char rcsid[] = "$Id$";

I exempted a few specific directories early on that were problematic (mostly the step related stuff) but tkhtml was one of the ones that got stripped.

starseeker (Mar 10 2021 at 15:07):

It never crossed my mind that those headers might be a compilation necessity, and apparently for all the scrutiny I put on the conversion (diffing, log messages, etc.) I never actually tried a full compile of the generated checkout.

starseeker (Mar 10 2021 at 15:10):

It looks like tkhtml does some cute trick where it generates a list of source files that go into a generated file, and the script that generates that source file is matching on those rcsid lines.

starseeker (Mar 10 2021 at 15:28):

Actually, I should probably confirm that they were originally populated in the raw SVN data - since SVN will do RCS keyword expansion, it's theoretically possible that they were stored unevaluated internally. If that's the case, even exempting src/other/tkhtml won't fix it because Git doesn't populate RCS tags.

starseeker (Mar 10 2021 at 15:31):

starseeker (Mar 10 2021 at 16:03):

Erik (Mar 10 2021 at 16:23):

a force push to fix something like that would be pretty traumatic once this is "the way"

starseeker (Mar 10 2021 at 16:27):

starseeker (Mar 10 2021 at 16:44):

starseeker (Mar 10 2021 at 16:46):

starseeker (Mar 10 2021 at 16:50):

Install the two scripts (rcs-keywords.clean and rcs-keywords.smudge) to /usr/local/share/git_filters

[filter "rcs-keywords"]
        clean  = /usr/local/share/git_filters/rcs-keywords.clean
        smudge = /usr/local/share/git_filters/rcs-keywords.smudge %f

Note that attributes is the file name, not a directory - the file should be renamed.

starseeker (Mar 10 2021 at 16:54):

What this will do is match the particular tkhtml files in question, and populate the tags. Placing it in .git/info (rather than .gitattributes) means it will be active for all checkout activities (a .gitattributes file wouldn't exist in older checkouts, defeating the purpose.)

I've adjusted main's copy of Tkhtml to not use an RCS tag for what it is doing, so it will not be altered by this filter. The older checkouts still using the $Id: tag, which are the ones that need to be populated, will match and be populated (thus being viable for compilation.)

starseeker (Mar 10 2021 at 16:56):

The attributes file specifically calls out only the tkhtml files in question, to minimize processing time overall.

starseeker (Mar 10 2021 at 16:59):

@Sean If that looks workable to you, I can write it up for inclusion in the src tree

Erik (Mar 10 2021 at 17:33):

starseeker (Mar 10 2021 at 17:55):

Erik (Mar 10 2021 at 17:56):

starseeker (Mar 10 2021 at 17:56):

Going that route means we don't need to worry about re-inserting any RCS expansions into the history - they're just populated on checkout

starseeker (Mar 10 2021 at 17:57):

If we want it to work without any RCS expansion, yes - that's a multi-day rewrite, not multi-hour. If, however, we use the filters to do the expansion just where we need to, all a user has to do is set up the .gitconfig and attributes file.

starseeker (Mar 10 2021 at 18:00):

That has the advantage, once set up, of giving us expanded RCS keywords anywhere we need them. I'm 90% sure the expanded tkhtml tags were originally in the commit history, but if I'm wrong about that even a full regeneration of the history wouldn't be enough - I'd actually have to inject the expanded tags into the commit history.

Erik (Mar 10 2021 at 18:01):

cool, please leave breadcrumbs for the next poor fool who tries to compile something old :grinning_face_with_smiling_eyes: (i had a 43bsd compiling an old old version in a simh vax11, crazy people will do crazy unexpected things...)

starseeker (Mar 10 2021 at 18:02):

The drawback of this is it's not a "working out of the box" solution - because Git has no way to expand RCS tags by default, it requires some work by the user to prepare the solution.

The best we can do is pre-bake everything and tell folks exactly how to set it up.

starseeker (Mar 10 2021 at 18:04):

@Erik And that's the primary drawback - someone coming into things cold and not knowing they need to set up RCS expansion for older checkouts.

starseeker (Mar 10 2021 at 18:20):

starseeker (Mar 10 2021 at 18:21):

starseeker (Mar 10 2021 at 18:22):

starseeker (Mar 10 2021 at 18:23):

Sean (Mar 10 2021 at 18:23):

Do tell, but I have a couple thoughts on this. First off, I'm not terribly concerned about tkhtml working but would be concerned if there's not a simple workaround that can be discovered when the failure is encountered.

Sean (Mar 10 2021 at 18:24):

what's that attributes file do? looks like it'll match on any files with those names??

starseeker (Mar 10 2021 at 18:25):

Yes, it's a filename match. I haven't figured out yet how to do a full-path match successfully.

starseeker (Mar 10 2021 at 18:25):

Sean (Mar 10 2021 at 18:26):

starseeker (Mar 10 2021 at 18:26):

As it happens $Id$ is the RCS keyword at issue, and although the Git expansion is totally different from RCS/CVS/SVN it satisfies the compilation requirement.

starseeker (Mar 10 2021 at 18:26):

Sean (Mar 10 2021 at 18:27):

I'm still more concerned about what the error looks like... is there a non-git workaround possible?

Sean (Mar 10 2021 at 18:27):

starseeker (Mar 10 2021 at 18:28):

Sean (Mar 10 2021 at 18:28):

starseeker (Mar 10 2021 at 18:29):

At one point I did try, but I may have removed it - it's impossible on headless build nodes and problematic otherwise (almost no systems install Tkhtml)

starseeker (Mar 10 2021 at 18:29):

Sean (Mar 10 2021 at 18:30):

if there is a simple way to turn it off (even if that also disabled the man viewer), that'd be an acceptable workaround

starseeker (Mar 10 2021 at 18:31):

starseeker (Mar 10 2021 at 18:32):

I'm fairly sure I didn't set up to disable just the Tkhtml dependent components.

starseeker (Mar 10 2021 at 18:33):

Sean (Mar 10 2021 at 18:33):

Sean (Mar 10 2021 at 18:34):

or if you just comment out the THIRD_PARTY_TCL_PACKAGE line in src/other/CMakeLists.txt

Sean (Mar 10 2021 at 18:35):

starseeker (Mar 10 2021 at 18:35):

Configure fails. (unnoticed dependency in tktable build on tkhtml build logic being loaded first.)

Sean (Mar 10 2021 at 18:36):

starseeker (Mar 10 2021 at 18:36):

starseeker (Mar 10 2021 at 18:37):

Yeah, OK - it actually did find the system tkhtml, but then tktable wasn't happy. However...

starseeker (Mar 10 2021 at 18:38):

starseeker (Mar 10 2021 at 18:41):

Urmf. It builds successfully, but at least on Ubuntu those packages don't seem to work - Archer can't load and while MGED will load, the man viewer won't come up.

starseeker (Mar 10 2021 at 18:47):

Sean (Mar 10 2021 at 18:48):

Sean (Mar 10 2021 at 18:49):

starseeker (Mar 10 2021 at 18:49):

That would produce the same result. Archer requires them and so wouldn't work, and MGED would work but without the man viewer...

starseeker (Mar 10 2021 at 18:50):

Installing the system tkhtml and tktable produced as much of a working config as we could expect without tkhtml/tktable built.

Sean (Mar 10 2021 at 18:50):

starseeker (Mar 10 2021 at 18:50):

I believe so - Archer's right panel uses tktable, and both the help viewer and man viewer use tkhtml. The way the Itcl class system works, if I'm remembering correctly, it all gets defined and loaded up front.

Erik (Mar 10 2021 at 18:51):

** will do recursive globbing into subdirs, you can prefix / as the repo root, too (so /db/**/*.g)

Sean (Mar 10 2021 at 18:51):

Erik (Mar 10 2021 at 18:51):

Sean (Mar 10 2021 at 18:52):

that's a finite list -- can it be reduced to a one-line edit in tkhtml's build logic with the list of file names that it would have extracted (or a glob)?

starseeker (Mar 10 2021 at 18:53):

The simplest possible source code fix is probably a one-line sed command of some sort that replaces all the $Id$ lines with $Id: tkhtml$

starseeker (Mar 10 2021 at 18:53):

starseeker (Mar 10 2021 at 18:54):

The issue is already fixed in the Git main - the only problem is the older checkouts that we can't change.

Sean (Mar 10 2021 at 18:57):

right, which I think is a problem unless we can figure out a trivial workaround :(

Sean (Mar 10 2021 at 18:58):

otherwise, there'd be almost no point to all the old tags since all the ones since tkhtml was introduced won't/can't work at all

starseeker (Mar 10 2021 at 18:58):

echo "/src/other/tkhtml/** ident" > .git/info/attributes doesn't count as trivial?
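
A rough sketch of what that one-liner does, run against a scratch repository rather than the BRL-CAD history (demo.c is a hypothetical file name). With the `ident` attribute set, git expands `$Id$` to `$Id: <blob sha1> $` when it writes the file into the working tree, which is a different expansion from RCS/CVS/SVN but still a populated tag:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
mkdir -p src/other/tkhtml
# Commit a file with the unexpanded keyword, as stored in the converted history.
printf 'static const char rcsid[] = "$Id$";\n' > src/other/tkhtml/demo.c
git add src/other/tkhtml/demo.c
git commit -q -m "add file with unexpanded Id keyword"

# Per-clone attributes: not part of any commit, so it applies to old checkouts too.
mkdir -p .git/info
echo "/src/other/tkhtml/** ident" > .git/info/attributes

# Force git to write the file out again so the attribute takes effect.
rm src/other/tkhtml/demo.c
git checkout -q -- src/other/tkhtml/demo.c

expanded=$(cat src/other/tkhtml/demo.c)
echo "$expanded"
```

The committed blob still contains the bare `$Id$`; only the working-tree copy is expanded, so nothing in the history changes.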

starseeker (Mar 10 2021 at 19:00):

find . -path \*tkhtml\* -type f -exec sed -i 's/\$Id\$/\$Id: tkhtml\$/g' {} \; should also do the trick.

Sean (Mar 10 2021 at 19:00):

it's a solution, but then it assumes git and is completely non-intuitive... that doesn't feel right

Sean (Mar 10 2021 at 19:01):

starseeker (Mar 10 2021 at 19:01):

starseeker (Mar 10 2021 at 19:02):

Sean (Mar 10 2021 at 19:04):

Sean (Mar 10 2021 at 19:05):

Sean (Mar 10 2021 at 19:11):

Hm, I'm not following this logic ... I think I need to get a branch and see how it was

starseeker (Mar 10 2021 at 19:12):

Sean (Mar 10 2021 at 19:12):

Sean (Mar 10 2021 at 19:13):

it's writing out #define lines (to whomever is calling mkdefaultstyle.tcl), reading from just four files (html.css, tkhtml.css, quirks.css,

Sean (Mar 10 2021 at 19:13):

starseeker (Mar 10 2021 at 19:13):

Sean (Mar 10 2021 at 19:14):

starseeker (Mar 10 2021 at 19:14):

Sean (Mar 10 2021 at 19:15):

because the logic says it's writing out #define lines, but none of them exist (yet)

starseeker (Mar 10 2021 at 19:16):

Sean (Mar 10 2021 at 19:16):

starseeker (Mar 10 2021 at 19:16):

starseeker (Mar 10 2021 at 19:17):

Sean (Mar 10 2021 at 19:18):

hm, so maybe ... can you generate it on trunk and see what happens on branch if you just drop htmldefaultstyle.c into the build dir?

starseeker (Mar 10 2021 at 19:19):

So, check out an older branch in git, take the trunk copy of that file, and drop it in the build dir?

starseeker (Mar 10 2021 at 19:21):

starseeker (Mar 10 2021 at 19:24):

@Sean You're thinking to stash the older htmldefaultstyle.c somewhere and just have folks drop it into the build directory?

Sean (Mar 10 2021 at 19:26):

Sean (Mar 10 2021 at 19:27):

does it work if it's dropped into the source tree? could commit changes to the tags...

Sean (Mar 10 2021 at 19:27):

starseeker (Mar 10 2021 at 19:27):

Sean (Mar 10 2021 at 19:28):

starseeker (Mar 10 2021 at 19:29):

I think we'd need to make new branches for those tags that don't already have one - a tag in git is just a pointer to a commit, so we'd be making a new branch from that change for the new tag.

starseeker (Mar 10 2021 at 19:30):

If we did that I'd want to use the same fix I made to main, so we can still keep the attributes option without breaking anything. The attributes solution allows arbitrary older commits to work, tag fixes will only address the tags.

starseeker (Mar 10 2021 at 19:30):

Actually though, with an ident based solution there, it probably won't matter anyway.

Sean (Mar 10 2021 at 19:31):

starseeker (Mar 10 2021 at 19:31):

Sean (Mar 10 2021 at 19:31):

starseeker (Mar 10 2021 at 19:32):

Right. The ones where we did that I believe have branches already (or they should) since we can't commit to tags. However, if any tags weren't edited that'll be new branches to introduce.

starseeker (Mar 10 2021 at 19:35):

/me is game to do that if you think it's the best solution. I'm also willing (gulp) to redo the process with the correct tkhtml contents, if you decide that's best - it was my mistake, and I'll do what it takes to fix it.

Sean (Mar 10 2021 at 19:35):

starseeker (Mar 10 2021 at 19:36):

Sean (Mar 10 2021 at 19:36):

it's not just on you... checking out a couple tags was on my verification list and I got lazy and skipped that one ... :(

Sean (Mar 10 2021 at 22:58):

looks like after turning off STRICT, adding include(CheckSymbolExists), all that was required to at least compile was commenting out the line it failed on -- Tcl_SetResult(interp, HTML_SOURCE_FILES, TCL_STATIC); in htmltcl.c:2787

Sean (Mar 10 2021 at 22:58):

Sean (Mar 10 2021 at 22:59):

I think that's an acceptable hardship since there's always going to be tweaking of old builds required.

starseeker (Mar 10 2021 at 23:23):

starseeker (Mar 10 2021 at 23:27):

starseeker (Mar 10 2021 at 23:30):

@Daniel Rossberg , @Sumagna Das - don't do anything yet with your forks. I may have achieved a deeper fix for the tkhtml issue, and if so it will require swapping out the current git repository.

starseeker (Mar 10 2021 at 23:31):

@Sean This is not a brlcad_conv12 derivative - it is the current repository up on BRL-CAD/brlcad with a targeted set of blob sha1 replacements centered on just the relevant Tkhtml files.

starseeker (Mar 10 2021 at 23:32):

As such it has the commits we had already made to that repository. Indeed, the commit where I had restored the rcsid strings is now in this new repository almost a no-op (there was one .txt file that got its revision restored in that commit but not in my remapping work.)

starseeker (Mar 10 2021 at 23:37):

Once I contemplated re-running the whole conversion again to avoid filtering the tkhtml sources, I realized it was far easier to take the existing conversion (which already has a multitude of corrections applied that would be tricky to re-apply again) and target just the necessary sources. So I scripted a git log --follow of all the relevant .c files in git, and for each log entry for each file extracted the svn revision. Then I checked out both the git and the SVN tkhtml sources for the corresponding revisions, calculated the git hashes for the git and svn versions of the file, and made a git fast-import blob out of each revision of the svn version of the file. That gave me a way to map the git internal state to what it should have been had the SVN files been properly included, and also gave me the raw blob inputs to feed to fast-import so the blobs would be available to reference.

starseeker (Mar 10 2021 at 23:39):

It looks as if it worked. To switch this repository in I would recommend deleting BRL-CAD/brlcad and re-creating it, since all the revisions after somewhere in the 32k range will have different SHA1 values.

starseeker (Mar 11 2021 at 00:10):

starseeker (Mar 11 2021 at 01:44):

Sigh. Another oddity. Now that I can try distcheck, run.sh isn't set to executable and benchmark isn't cleaning up properly.

starseeker (Mar 11 2021 at 02:49):

For the latter it's a brute force approach - I generated a list of all the file paths rel-7-32-2 had set executable that git did not, and set those paths' modes to 100755 throughout the git history. I didn't attempt an analysis of when SVN did or didn't set that property on those files.

starseeker (Mar 11 2021 at 04:10):

/me fires distcheck-full on rel-7-32-2, notes that brain has reached "E", and heads off to the charging station...

Daniel Rossberg (Mar 11 2021 at 07:26):

Sumagna Das (Mar 11 2021 at 08:19):

starseeker (Mar 11 2021 at 12:53):

@Sumagna Das I would suggest doing so - @Sean will need to review what I've done and see what he wants to do.

starseeker (Mar 11 2021 at 12:55):

starseeker (Mar 11 2021 at 14:53):

One additional quirk I just hit, but something that's not specific to our setup as far as I can tell - git checked out the .3dm file in text mode by default on Windows. Adding "*.3dm binary" to the .gitattributes file seems to address it, but since older checkouts won't have a .gitattributes file, on Windows they'll probably get the wrong checkout by default.

starseeker (Mar 11 2021 at 15:03):

Workarounds would either be the .git/info/attributes approach discussed earlier for $Id$, or setting .3dm in a global attributes file: https://stackoverflow.com/a/28027656

starseeker (Mar 11 2021 at 15:10):

With that caveat, brlcad_bench_fix rel-7-32-2 built successfully on Windows with MSVC

starseeker (Mar 11 2021 at 15:35):

starseeker (Mar 11 2021 at 16:11):

starseeker (Mar 11 2021 at 16:24):

rel-7-30-8 is too old to distcheck vanilla on this box without modding the system (system proj_api.h interferes )

starseeker (Mar 11 2021 at 16:25):

Erik (Mar 11 2021 at 16:28):

what about 7.0? :D I think that was the first release I contributed to (fbsd support and autoconf)

Erik (Mar 11 2021 at 16:29):

"here's a cd with fbsd, here's a cd with our source code, here's a computer. We'll try to get you on the network in the next couple of weeks." haha, the good old days :D

starseeker (Mar 11 2021 at 16:30):

/me chuckles. I'd almost certainly need a VM to try building something that old.

Erik (Mar 11 2021 at 16:35):

speaking of! A lot of my life lately is building singularity and occasionally docker images. I know we had a raw disk image a while back for loading into vmware or bochs or whatever, do we/should we(you) provide container images? :D

starseeker (Mar 11 2021 at 16:36):

@Erik Checking the diffs, it looks like whitespace changes (line endings) and expanded vs. unexpanded RCS tags. Plus a couple files like .cvsignore and .gitignore

Erik (Mar 11 2021 at 16:38):

he, hehe, he, yeh... 'find . -type f | xargs sed -i.bak 's/[ ^t]*//'` ... I think I did an indent back then, too...

starseeker (Mar 11 2021 at 16:39):

I mean differences between the SVN and git checkouts. Although if git's history following breaks in there I'll know who to blame ;-)

Erik (Mar 11 2021 at 17:09):

c'mon, I was new, had to spraypaint my name all over the place and get established, ch'know :D

starseeker (Mar 12 2021 at 01:10):

starseeker (Mar 12 2021 at 02:59):

@Sean I've figured out how to insert a .gitattributes file at strategic points in the git history so the .3dm file gets flagged by git as a binary checkout. Ironically, this means the rel-7-32-2 distcheck will break on the default repo_verify step, since by that point: 1. the CMake logic had been taught how to use git for bookkeeping and 2. the .gitattributes file is present and unaccounted for in the CMake logic. However, I think it's still better to insert it - the error message tells the user what flag to supply to avoid the problem (or they can just delete the .gitattributes file, since it has done its job by that point.) A corrupted 3dm file, on the other hand, has no easy fix.

Sean (Mar 12 2021 at 05:27):

@starseeker I found a good way to quickly extract all the files ever marked executable. It's a lot more than a handful. Quick scan of just a few dozen revisions found 2635 files. As expected, some are bogus but most look good. I'll do a manual pass over the list in the morning to weed out the ones that clearly shouldn't have exec set. The rest should be harmless.

Sean (Mar 12 2021 at 06:01):

Think it's better that the build work and the files be valid/usable on checkout. Distcheck failing isn't critical, so it's a reasonable trade. I would be cautious making more changes like that though. Surgery on the history to inject and edit files is risky in a manner that might not be realized for months and need to be re-uploaded to fix if there's some obscure but important bug.

Sean (Mar 12 2021 at 08:12):

Couldn't wait till morning. I went over the list manually, eliminated all the ones that looked like the exec bit was wrong/unnecessary, and here they are: executables.txt

Sean (Mar 12 2021 at 08:19):

I only looked at every 250th commit for expediency, but did look through the entire history up through r77000. I also only looked at trunk, so anything only existing on a branch isn't covered. Intentionally delisted all the itcl/itk files, makefile logic, and other outright errors (many of which are still wrong on trunk albeit harmlessly). Identified/kept 1770 files but could use another pass from fresh eyes.

starseeker (Mar 12 2021 at 12:38):

Agreed. It might be better in that sense not to change it, even, since the .git/info/attributes answer would also address the issue and does not require history editing.

If we do opt for adding .gitattributes, there is one final question - the repo I posted last night puts a minimal .gitattributes in at two places - once when terra.dsp is introduced, and the second time when the .3dm file is introduced. The .gitattributes contents are focused tightly on those two file extensions. However, if we're going to more closely mimic the SVN checkout behavior, it would actually make more sense to inject a more comprehensive .gitattributes at the beginning of the history that covers more file types. My initial impulse was to go minimal to avoid surprises, but since SVN did have those mime types set there is an argument that it is more surprising for git not to have them. Thoughts?

starseeker (Mar 12 2021 at 14:32):

@Sean on the executable files - I set up some checks as well, using a brute force approach. (SSD speeds are nice) I checked all commits for trunk/ and branches/ is finishing up now.

starseeker (Mar 12 2021 at 14:33):

Sean (Mar 12 2021 at 14:42):

@starseeker That's basically the list I started with. I edited it down to the executables.txt list as there are many subtly and blatantly wrong entries in there.

Sean (Mar 12 2021 at 14:43):

We shouldn't set all those. There are entire folders that were checked in with executable bit set, including source files, header files, build files, images, ...

starseeker (Mar 12 2021 at 14:43):

Sean (Mar 12 2021 at 14:43):

Sean (Mar 12 2021 at 14:44):

starseeker (Mar 12 2021 at 14:44):

I'll give it a quick check to see if a full rev check caught any that the 250-per jumping skipped over, but I don't expect to find much.

Sean (Mar 12 2021 at 14:44):

starseeker (Mar 12 2021 at 14:45):

After doing a main checkout, the user can add "*.3dm binary" to the .git/info/attributes file.

starseeker (Mar 12 2021 at 14:45):

That has to be a manual step, but because it's not a file in the repo history and it has highest precedence, once it's there any checkouts of other branches or tags will use it.
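
A quick way to confirm the manual step took effect, sketched on an empty scratch repository (db/model.3dm is a hypothetical path). `git check-attr` reports the attributes git would apply to a path, whether or not the file exists, and `binary` is a built-in macro that also unsets `text`:

```shell
set -eu
cd "$(mktemp -d)"
git init -q

# Per-clone attributes file: highest precedence, never part of a commit.
mkdir -p .git/info
echo "*.3dm binary" >> .git/info/attributes

attr_binary=$(git check-attr binary -- db/model.3dm)
attr_text=$(git check-attr text -- db/model.3dm)
echo "$attr_binary"
echo "$attr_text"
```

Because the file lives under .git/info, it stays in place when switching to older branches or tags that have no .gitattributes of their own.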

Sean (Mar 12 2021 at 14:46):

You're comparing apples to oranges there a bit, because anything that lived ephemerally between the 250-revision jumps will be missing, but 99% of the differences will be intentional removals. I can give you the full list I started with.

starseeker (Mar 12 2021 at 14:47):

Sean (Mar 12 2021 at 14:48):

Here's the list I started with -- can compare with this to see what got skipped: executables_250.txt

Sean (Mar 12 2021 at 14:50):

starseeker (Mar 12 2021 at 14:52):

starseeker (Mar 12 2021 at 14:56):

Checking all tags, there was only one path that got added compared to trunk - "misc/archlinux/brlcad.sh"

Sean (Mar 12 2021 at 15:01):

starseeker (Mar 12 2021 at 15:01):

Sean (Mar 12 2021 at 15:02):

starseeker (Mar 12 2021 at 15:02):

Sean (Mar 12 2021 at 15:03):

Sean (Mar 12 2021 at 15:04):

Sean (Mar 12 2021 at 15:06):

Sean (Mar 12 2021 at 15:07):

starseeker (Mar 12 2021 at 15:07):

starseeker (Mar 12 2021 at 15:08):

starseeker (Mar 12 2021 at 15:11):

I'm three quarters of the way through the remaining branch checks - so far it looks like a little over 1200 files set exec that are unique to branches.

Sean (Mar 12 2021 at 15:40):

I'll comb through that diff list. There are a few in there that should be preserved.

starseeker (Mar 12 2021 at 15:49):

starseeker (Mar 12 2021 at 15:50):

Line used:

cat branches_uniq.txt |grep -v \\.h|grep -v \\.msg |grep -v \\.itk |grep -v \\.cpp |grep -v tzdata > branches_uniq_reduced.txt

Sean (Mar 12 2021 at 16:55):

still some inappropriates in there
I should be done going through the diff list here in a jiffy after I grab a bite

Sean (Mar 12 2021 at 18:02):

@starseeker thoughts on the creo3plugin snafu? inclined to ignore it from an exec bit perspective

starseeker (Mar 12 2021 at 18:03):

starseeker (Mar 12 2021 at 18:11):

@Sean As far as the .gitattributes thing, which option do you want to go with? We've got:

a) Insert minimal .gitattributes files at strategic points (the brlcad_added_gitattributes repo)

b) Insert more fully populated .gitattributes file for overall repo (closer match to SVN mime types in many cases, but problematic if we get unanticipated matches - personally I'm inclined not to do this)

c) No .gitattributes insertions, require user to set either per-checkout attributes or some form of global git attribute.

I'd be OK with a) or c) - if we go with c) however, we'll need to prominently document what to do to get "proper" older checkout behavior on Windows. The .dsp file isn't particularly noticeable if it gets munged up by the checkout, but the 3dm file is...

Sean (Mar 12 2021 at 18:14):

Sean (Mar 12 2021 at 18:15):

starseeker (Mar 12 2021 at 18:16):

Sean (Mar 12 2021 at 18:16):

Sean (Mar 12 2021 at 18:17):

so ... I think it looks like that's a global property that's just set, not something tracked per commit?

Sean (Mar 12 2021 at 18:17):

starseeker (Mar 12 2021 at 18:18):

I don't... think so? I think the index update is going to alter the tree entries git uses to track the checkout states?

starseeker (Mar 12 2021 at 18:18):

I know in the fast-export file that's how it's represented... hang on, let me generate something quick.

Sean (Mar 12 2021 at 18:19):

I guess to answer your question, I'm looking for an option #4 where it's just set in the repo transparently instead of explicitly as a bandaid

Sean (Mar 12 2021 at 18:19):

starseeker (Mar 12 2021 at 18:20):

Sean (Mar 12 2021 at 18:20):

what about checking out each rev and doing a git update-index on each of the files in our ledger?
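
For reference, a sketch of what `git update-index --chmod=+x` does on a single checkout, using a scratch repository and a hypothetical run.sh. It changes the mode recorded in the index, which then has to be committed, so it adds a new commit on top of the history and does not alter the commits already there:

```shell
set -eu
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
printf '#!/bin/sh\necho hi\n' > run.sh
git add run.sh
git commit -q -m "add run.sh without the exec bit"

# ls-files -s prints "<mode> <object> <stage><TAB><path>" for index entries.
before=$(git ls-files -s run.sh | cut -d' ' -f1)
git update-index --chmod=+x run.sh
after=$(git ls-files -s run.sh | cut -d' ' -f1)
echo "index mode before: $before, after: $after"

git commit -q -m "set exec bit on run.sh"
```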

Sean (Mar 12 2021 at 18:20):

Sean (Mar 12 2021 at 18:21):

starseeker (Mar 12 2021 at 18:21):

starseeker (Mar 12 2021 at 18:22):

Sean (Mar 12 2021 at 18:24):

starseeker (Mar 12 2021 at 18:24):

starseeker (Mar 12 2021 at 18:25):

Sean (Mar 12 2021 at 18:25):

So a script that walks every commit and scans for all the executables2.txt files?

Sean (Mar 12 2021 at 18:26):

starseeker (Mar 12 2021 at 18:26):

I was planning to do what I did for the previous case - take executables2.txt, reformat it for repowork, and operate on the fast import stream.

Sean (Mar 12 2021 at 18:30):

starseeker (Mar 12 2021 at 18:31):

Heh. Sorry. repowork takes the output of "git fast-export", reads it into C++ data structures, manipulates it, and dumps out a new fast-import stream that is in turn fed to "git fast-import"

Sean (Mar 12 2021 at 18:32):

Sean (Mar 12 2021 at 18:33):

Sean (Mar 12 2021 at 18:34):

I think you mean you are dumping 12, doing all those corrections+fixes+etc, and then ending up with a new repo (call it 19 or 20 or whatever)?

starseeker (Mar 12 2021 at 18:34):

cd old_repo && git fast-export --all --show-original-ids > ~/old.fi
./repowork --mode-map exec_update.txt ~/old.fi new.fi
mkdir new_repo && cd new_repo && git init
cat ../new.fi | git fast-import
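
The same export, rewrite, import shape can be tried on a toy repository. In this sketch a one-line sed stands in for repowork and its --mode-map option, which are not reproduced here; the sed expression, the run.sh file, and the repository names are assumptions for illustration only. A bare sed over the stream would also touch blob contents that happen to look like a filemodify line, so it is not a substitute for a real stream parser:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Old repository: run.sh is committed without the executable bit.
git init -q old_repo
cd old_repo
git symbolic-ref HEAD refs/heads/main
printf '#!/bin/sh\necho hi\n' > run.sh
git add run.sh
git commit -q -m "add run.sh without the exec bit"
git fast-export --all > ../old.fi
cd ..

# Stand-in for repowork: rewrite "M 100644 :<mark> run.sh" to mode 100755.
sed 's/^M 100644 \(:[0-9]*\) run\.sh$/M 100755 \1 run.sh/' old.fi > new.fi

# New repository built from the modified stream.
git init -q new_repo
cd new_repo
git fast-import --quiet < ../new.fi

new_mode=$(git ls-tree -r refs/heads/main -- run.sh | cut -d' ' -f1)
echo "mode of run.sh in rewritten history: $new_mode"
```

Every commit whose tree changes gets a new SHA1, along with all of its descendants, which is why swapping in a rewritten repository means replacing the published one.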

starseeker (Mar 12 2021 at 18:34):

starseeker (Mar 12 2021 at 18:35):

That way I don't have to redo all the 12->18 corrections - they're already there.

Sean (Mar 12 2021 at 18:35):

starseeker (Mar 12 2021 at 18:36):

That's why I was asking about the .gitattributes solution - I can also dump brlcad_tkhtml_fix and not incorporate the .gitattributes changes.

Sean (Mar 12 2021 at 18:36):

sounds good. Okay, so then ... is there anything to be done about the binary files? we could audit them similarly

Sean (Mar 12 2021 at 18:36):

starseeker (Mar 12 2021 at 18:36):

starseeker (Mar 12 2021 at 18:37):

Well, we can't avoid something like that, unless you know about a Git feature I don't.

Sean (Mar 12 2021 at 18:37):

I can pull the list of known correct and incorrect binaries the same way. even doing every rev would probably take an hour or so

starseeker (Mar 12 2021 at 18:38):

Whether the dsp or 3dm files get checked out as text or binary I don't think is governed by anything stored internally in the repo.

Sean (Mar 12 2021 at 18:38):

starseeker (Mar 12 2021 at 18:39):

That's why I posted the no_blob.fi file. If you check pretty much any commit, you'll see that only the mode and the blob sha1 are associated with the path. There doesn't seem to be an equivalent to the svn:mime-type
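
As a rough illustration of how little is stored per path, here is a Python sketch that parses one line in the format printed by `git ls-tree` (`<mode> <type> <object>` followed by a tab and the path). The object id in the example is made up, and the helper names are hypothetical.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str       # e.g. 100644 (regular file) or 100755 (executable)
    type: str       # blob, tree, or commit
    object_id: str  # hash of the stored content
    path: str


def parse_ls_tree_line(line: str) -> TreeEntry:
    """Split one 'git ls-tree' output line into its four fields."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, obj_type, object_id = meta.split(" ")
    return TreeEntry(mode, obj_type, object_id, path)


# Made-up object id, real path from this discussion.
entry = parse_ls_tree_line(
    "100644 blob 0123456789abcdef0123456789abcdef01234567"
    "\tsrc/libbrep/tests/ayam_hyperbolid.3dm"
)
```

There is no field in that record where something like svn:mime-type could live, which is why any text/binary override has to come from outside the tree entry.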

starseeker (Mar 12 2021 at 18:40):

Sean (Mar 12 2021 at 18:41):

that everything is essentially just stored (binary) and whether it displays them or treats them as binary depends on it detecting non-ascii bytes

starseeker (Mar 12 2021 at 18:42):

Right, which means if you want it to treat a file (say) as binary anyway (or keep Windows line endings on Linux, for that matter) you need some form of gitattributes override. The dsp and 3dm files are getting detected as text, as far as I can tell.

starseeker (Mar 12 2021 at 18:44):

We could look for other files in the repository that should be binary but will match a text detection, although I'm not 100% sure how to set that up. Even once we know that, though, there's no per-path property we can set in git (that I know of) that doesn't involve the .gitattributes file

Sean (Mar 12 2021 at 18:45):

Sean (Mar 12 2021 at 18:46):

too much potential to screw up something. e.g., .dsp files .. those were msvc6 project files iirc, so they usually are/were text files

starseeker (Mar 12 2021 at 18:47):

Right - that's where terra.dsp got so messed up historically - when people auto-set all the mime types for .dsp files.

Sean (Mar 12 2021 at 18:47):

i mean, we can fix our little terra.dsp and 3dm, but probably not worth seeking out more

Sean (Mar 12 2021 at 18:48):

potential for error would probably be the few .g's that have been committed, but those are almost certainly correctly detected as binary

Sean (Mar 12 2021 at 18:49):

i'll do a spot check just to see if it looks like there were any important binaries in the history

Sean (Mar 12 2021 at 18:49):

Sean (Mar 12 2021 at 18:53):

oh that'll be handy -- this will also tell which files we changed the mime-type on, which might be an indicator that it was important

Sean (Mar 12 2021 at 18:54):

starseeker (Mar 12 2021 at 18:54):

Sean (Mar 12 2021 at 19:00):

Sean (Mar 12 2021 at 19:02):

starseeker (Mar 12 2021 at 19:10):

starseeker (Mar 12 2021 at 19:11):

starseeker (Mar 12 2021 at 19:41):

@Sean If you do find more important binary paths that test as text files, what did you want to do about them - make similar insertions of .gitattributes to protect them?

Sean (Mar 12 2021 at 19:48):

Sean (Mar 12 2021 at 19:51):

@starseeker do you have an existing .gitconfig or other file specifying file extensions being binary or not somewhere?

starseeker (Mar 12 2021 at 19:52):

Sean (Mar 12 2021 at 19:53):

Sean (Mar 12 2021 at 19:54):

starseeker (Mar 12 2021 at 19:54):

starseeker (Mar 12 2021 at 19:55):

Sean (Mar 12 2021 at 19:57):

The fact that it got terra.dsp wrong is a little surprising as it clearly has non-printable characters. The only reason I can think of where it would have committed that as text is somewhere something saying '.dsp files are text'. I'm not finding that so it's a little concerning where that came from.

Sean (Mar 12 2021 at 19:57):

starseeker (Mar 12 2021 at 19:58):

Sean (Mar 12 2021 at 19:59):

starseeker (Mar 12 2021 at 19:59):

That would simplify matters, actually - we'd only have to add .gitattributes for the 3dm file.

starseeker (Mar 12 2021 at 20:00):

As long as git doesn't have any built-in file extension awareness for *.dsp... I doubt it...

Sean (Mar 12 2021 at 20:00):

Sean (Mar 12 2021 at 20:01):

Sean (Mar 12 2021 at 20:02):

that's another that should get detected as binary... I mean unless the detection method is onerously too simple.

starseeker (Mar 12 2021 at 20:02):

starseeker (Mar 12 2021 at 20:03):

Sean (Mar 12 2021 at 20:03):

starseeker (Mar 12 2021 at 20:03):

Sean (Mar 12 2021 at 20:03):

Sean (Mar 12 2021 at 20:04):

Sean (Mar 12 2021 at 20:05):

starseeker (Mar 12 2021 at 20:05):

/me blinks - terra.dsp is coming out different in SVN and git checkouts according to diff, even though I fixed the SVN mime type in 70882

starseeker (Mar 12 2021 at 20:05):

starseeker (Mar 12 2021 at 20:09):

OK, I guess that makes sense, kind of. Both r18847 and latest trunk SVN checkout of terra.dsp diff with the CVS checkout, but the git checkout matches the CVS checkout.

starseeker (Mar 12 2021 at 20:09):

/me doesn't know why NONE of the SVN checkouts match CVS, but I guess it doesn't really matter at this point...

Sean (Mar 12 2021 at 20:14):

Sean (Mar 12 2021 at 20:16):

Sean (Mar 12 2021 at 20:17):

conv18's terra.dsp has both 0x13 and 0x11 bytes which at a glance is probably correct

Sean (Mar 12 2021 at 20:18):

it's also worth mentioning that both are perfectly valid dsp data files for the same dimensional specification. the difference is going to be a 1/32768 difference in elevation at those points.

starseeker (Mar 12 2021 at 20:19):

I just tested brlcad_tkhtml_fix on Windows, which for older checkouts doesn't have .gitattributes. terra.dsp checkout matches the CVS version according to diff, so you're correct - we don't need .dsp flagged as binary explicitly. We just need to make sure we don't flag it as Windows line ending in git.

starseeker (Mar 12 2021 at 20:21):

It's probably worth leaving the entry in the new .gitattributes to avoid that, but we don't need to insert it in the old history for that purpose. I'll adjust my logic to only add the .3dm version.

Sean (Mar 12 2021 at 20:21):

can we get rid of the top-level .gitattributes altogether? I'd rather we stick to defaults if we can manage.

starseeker (Mar 12 2021 at 20:21):

Sean (Mar 12 2021 at 20:23):

starseeker (Mar 12 2021 at 20:23):

Sean (Mar 12 2021 at 20:23):

there's so much override specified in that file, I can see that coming to bite down the road or at least being a debugging discovery journey

Sean (Mar 12 2021 at 20:24):

but that also still begs the question how that 3dm is getting treated as text... it's full of binary

Sean (Mar 12 2021 at 20:24):

starseeker (Mar 12 2021 at 20:25):

starseeker (Mar 12 2021 at 20:26):

OK, if I'm reading this right, ".gitattributes file in the same directory as the path in question" is in the precedence list, so you should be correct we can target locally.
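
A sketch of what "targeting locally" could look like, assuming we only want to mark specific files with git's built-in `binary` attribute macro in a .gitattributes next to each file. The helper name is hypothetical, and file names containing spaces or glob characters would need extra quoting that this does not handle.

```python
import posixpath
from collections import defaultdict


def local_gitattributes(paths):
    """Map each directory's .gitattributes path to its proposed contents."""
    by_dir = defaultdict(list)
    for path in sorted(paths):
        directory, name = posixpath.split(path)
        # 'binary' is a macro attribute that turns off text conversion
        # and textual diff/merge for the matching file.
        by_dir[directory].append(f"{name} binary\n")
    return {
        posixpath.join(directory, ".gitattributes"): "".join(lines)
        for directory, lines in by_dir.items()
    }


proposed = local_gitattributes([
    "db/nist/NIST_MBE_PMI_7-10.3dm",
    "src/libbrep/tests/ayam_hyperbolid.3dm",
])
```

This only computes the contents; whether to insert such files into history at all is a separate decision.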

Sean (Mar 12 2021 at 20:26):

starseeker (Mar 12 2021 at 20:27):

@Sean I'm game to ditch the top level .gitattributes in main - I added it mostly trying to match the subversion default rules you had set up...

starseeker (Mar 12 2021 at 20:27):

Sean (Mar 12 2021 at 20:28):

okay, so apparently their method is essentially ..."check for any occurrence of a zero/nul byte in the first 8000 bytes"
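
In Python terms the heuristic as described would be roughly the following. This is a sketch of the rule as stated here, not git's source, and the end-of-line conversion code may apply additional checks.

```python
FIRST_FEW_BYTES = 8000  # window size quoted above


def looks_binary(data: bytes) -> bool:
    """True if a NUL byte appears within the first 8000 bytes."""
    return b"\0" in data[:FIRST_FEW_BYTES]
```

So a file with control characters like 0x13 or 0x11 but no NUL in that window would still be treated as text under this rule.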

Sean (Mar 12 2021 at 20:28):

Sean (Mar 12 2021 at 20:29):

Sean (Mar 12 2021 at 20:31):

Sean (Mar 12 2021 at 20:32):

Sean (Mar 12 2021 at 20:33):

did you maybe use some git checkout tool that had a built-in config such that it was an individual issue?

starseeker (Mar 12 2021 at 20:33):

Sean (Mar 12 2021 at 20:38):

according to git on mac, it thinks they're binary...
this tells which it thinks are binary: git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep '^-'
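
The same filter can be done in a script. A small Python sketch, assuming the raw `--numstat` output is tab-separated (added, deleted, path) with `-` in both count columns for files git considers binary; the function name is made up.

```python
def binary_paths(numstat_output: str):
    """Return the paths that 'git diff --numstat' reported as binary."""
    paths = []
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added == "-" and deleted == "-":
            paths.append(path)
    return paths


sample = (
    "-\t-\tdb/nist/NIST_MBE_PMI_7-10.3dm\n"
    "137\t0\tsrc/conv/3dm/3dm-g.c\n"
    "-\t-\tsrc/libbrep/tests/ayam_hyperbolid.3dm\n"
)
found = binary_paths(sample)
```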

starseeker (Mar 12 2021 at 20:41):

I just tried again cloning with Git on Windows - the SVN checkout and the git checkout differ.

Sean (Mar 12 2021 at 20:41):

those would seem to imply we don't need to do anything for them and terra.dsp is getting historically and contemporarily fixed by the migration

Sean (Mar 12 2021 at 20:41):

starseeker (Mar 12 2021 at 20:42):

terra.dsp agreed. The 3dm files are the issue - ayam_hyperbolid.3dm also differs between git and SVN checkouts.

starseeker (Mar 12 2021 at 20:42):

starseeker (Mar 12 2021 at 20:43):

starseeker (Mar 12 2021 at 20:44):

Confirmed. Git checkouts of both 3dm files on Windows fail to convert with 3dm-g

Sean (Mar 12 2021 at 20:45):

starseeker (Mar 12 2021 at 20:45):

starseeker (Mar 12 2021 at 20:46):

Sean (Mar 12 2021 at 20:46):

what does this report on windows: git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep '^-' | grep 3dm

Sean (Mar 12 2021 at 20:47):

morrison@agua brlcad_conv18 % git diff --numstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep '^-' | grep 3dm
-   -   db/nist/NIST_MBE_PMI_7-10.3dm
-   -   regress/nurbs/brep-3dm.tar.bz2
-   -   src/libbrep/tests/ayam_hyperbolid.3dm

Sean (Mar 12 2021 at 20:49):

starseeker (Mar 12 2021 at 20:49):

Sean (Mar 12 2021 at 20:50):

% git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep 3dm
 db/nist/NIST_MBE_PMI_7-10.3dm                      |    Bin 0 -> 4232626 bytes
 regress/nurbs/brep-3dm.tar.bz2                     |    Bin 0 -> 103242 bytes
 src/conv/3dm/3dm-g.c                               |    137 +
 src/conv/3dm/CMakeLists.txt                        |     16 +
 src/libbrep/tests/ayam_hyperbolid.3dm              |    Bin 0 -> 4189 bytes
 src/other/openNURBS/opennurbs_3dm.h                |    528 +
 src/other/openNURBS/opennurbs_3dm_attributes.cpp   |   1528 +
 src/other/openNURBS/opennurbs_3dm_attributes.h     |    573 +
 src/other/openNURBS/opennurbs_3dm_properties.cpp   |    598 +
 src/other/openNURBS/opennurbs_3dm_properties.h     |    142 +
 src/other/openNURBS/opennurbs_3dm_settings.cpp     |   4036 +
 src/other/openNURBS/opennurbs_3dm_settings.h       |    891 +

starseeker (Mar 12 2021 at 20:51):

starseeker (Mar 12 2021 at 20:52):

Sean (Mar 12 2021 at 20:53):

Sean (Mar 12 2021 at 20:54):

starseeker (Mar 12 2021 at 20:55):

$ git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep 3dm
 db/nist/NIST_MBE_PMI_7-10.3dm                      |    Bin 0 -> 4232626 bytes
 regress/nurbs/brep-3dm.tar.bz2                     |    Bin 0 -> 103242 bytes
 src/conv/3dm/3dm-g.c                               |    137 +
 src/conv/3dm/CMakeLists.txt                        |     16 +
 src/libbrep/tests/ayam_hyperbolid.3dm              |    Bin 0 -> 4189 bytes
 src/other/openNURBS/opennurbs_3dm.h                |    528 +
 src/other/openNURBS/opennurbs_3dm_attributes.cpp   |   1528 +
 src/other/openNURBS/opennurbs_3dm_attributes.h     |    573 +
 src/other/openNURBS/opennurbs_3dm_properties.cpp   |    598 +
 src/other/openNURBS/opennurbs_3dm_properties.h     |    142 +
 src/other/openNURBS/opennurbs_3dm_settings.cpp     |   4036 +
 src/other/openNURBS/opennurbs_3dm_settings.h       |    891 +

Sean (Mar 12 2021 at 20:55):

Sean (Mar 12 2021 at 20:56):

starseeker (Mar 12 2021 at 20:56):

$ stat db/nist/NIST_MBE_PMI_7-10.3dm
  File: db/nist/NIST_MBE_PMI_7-10.3dm
  Size: 4232626         Blocks: 4136       IO Block: 65536  regular file
Device: e02c1581h/3760985473d   Inode: 3096224743825018  Links: 1
Access: (0644/-rw-r--r--)  Uid: (197612/   cliff)   Gid: (197612/ UNKNOWN)
Access: 2021-03-12 15:55:21.920148200 -0500
Modify: 2021-03-12 15:53:06.716503800 -0500
Change: 2021-03-12 15:53:06.716503800 -0500
 Birth: 2021-03-12 15:53:06.716503800 -0500

Sean (Mar 12 2021 at 20:57):

Sean (Mar 12 2021 at 20:59):

Sean (Mar 12 2021 at 21:00):

stat is saying it has no carriage returns, yet the file you sent has carriage returns

Sean (Mar 12 2021 at 21:00):

starseeker (Mar 12 2021 at 21:00):

Sean (Mar 12 2021 at 21:01):

starseeker (Mar 12 2021 at 21:01):

$ ls -l db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4232626 Mar 12 15:53 db/nist/NIST_MBE_PMI_7-10.3dm

Sean (Mar 12 2021 at 21:01):

Sean (Mar 12 2021 at 21:02):

starseeker (Mar 12 2021 at 21:05):

 /c/brlcad-build/Debug/bin/3dm-g.exe -o test.g brlcad_tkhtml_fix/src/libbrep/tests/ayam_hyperbolid.3dm
invalid input file ('ONX_Model::Read() failed.

Note:  if this file was saved from Rhino3D, make sure it was saved using
Rhino's v5 format or lower - newer versions of the 3dm format are not
currently supported by BRL-CAD.')

failed to load input file

starseeker (Mar 12 2021 at 21:06):

Sean (Mar 12 2021 at 21:06):

that's what's confusing because all the numbers are pointing at it being correct (now)

starseeker (Mar 12 2021 at 21:06):

Sean (Mar 12 2021 at 21:07):

starseeker (Mar 12 2021 at 21:07):

It's almost as if it wrote an intermediate version of the file and then went back and changed it

Sean (Mar 12 2021 at 21:07):

starseeker (Mar 12 2021 at 21:07):

Sean (Mar 12 2021 at 21:08):

starseeker (Mar 12 2021 at 21:12):

MINGW64 /c
$ ls -l brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4243206 Mar 12 16:08 brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm

MINGW64 /c
$ diff brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm brlcad/db/nist/NIST_MBE_PMI_7-10.3dm
Binary files brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm and brlcad/db/nist/NIST_MBE_PMI_7-10.3dm differ

MINGW64 /c
$ ls -l brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4243206 Mar 12 16:08 brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm

MINGW64 /c
$ /c/brlcad-build/Debug/bin/3dm-g.exe -o /c/brlcad_tkhtml_fix/test.g brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
invalid input file ('ONX_Model::Read() failed.

Note:  if this file was saved from Rhino3D, make sure it was saved using
Rhino's v5 format or lower - newer versions of the 3dm format are not
currently supported by BRL-CAD.')

failed to load input file

MINGW64 /c
$ ls -l brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm
-rw-r--r-- 1 cliff 197612 4243206 Mar 12 16:08 brlcad_tkhtml_fix/db/nist/NIST_MBE_PMI_7-10.3dm

MINGW64 /c
$ date
Fri Mar 12 16:11:59 EST 2021
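
For reference, the two sizes seen so far differ by a fixed amount, which is what you would expect if line-ending conversion inserted a CR before every bare LF. A Python sketch of that arithmetic follows; whether the good file really contains that many bare LF bytes is an assumption that has not been checked here.

```python
def crlf_growth(data: bytes) -> int:
    """Bytes that would be added if every bare LF became CRLF."""
    return data.count(b"\n") - data.count(b"\r\n")


good_size = 4232626  # size reported by git diff --stat and the earlier stat
bad_size = 4243206   # size reported by ls -l on the later checkout
extra_bytes = bad_size - good_size
print(extra_bytes)
```

Running `crlf_growth` on the bytes of a known-good copy and comparing against `extra_bytes` would confirm or rule out that explanation.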

starseeker (Mar 12 2021 at 21:18):

starseeker (Mar 12 2021 at 21:19):

$ stat db/nist/NIST_MBE_PMI_7-10.3dm
  File: db/nist/NIST_MBE_PMI_7-10.3dm
  Size: 4243206         Blocks: 4144       IO Block: 65536  regular file
Device: e02c1581h/3760985473d   Inode: 48132221017637258  Links: 1
Access: (0644/-rw-r--r--)  Uid: (197612/   cliff)   Gid: (197612/ UNKNOWN)
Access: 2021-03-12 16:17:43.513220300 -0500
Modify: 2021-03-12 16:16:43.532223700 -0500
Change: 2021-03-12 16:16:43.532223700 -0500
 Birth: 2021-03-12 16:08:53.242552400 -0500

starseeker (Mar 12 2021 at 21:20):

MINGW64 /c/brlcad_tkhtml_fix (main)
$ git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904 HEAD -- | grep 3dm
 db/nist/NIST_MBE_PMI_7-10.3dm                      |    Bin 0 -> 4232626 bytes
 regress/nurbs/brep-3dm.tar.bz2                     |    Bin 0 -> 103242 bytes
 src/conv/3dm/3dm-g.c                               |    137 +
 src/conv/3dm/CMakeLists.txt                        |     16 +
 src/libbrep/tests/ayam_hyperbolid.3dm              |    Bin 0 -> 4189 bytes
 src/other/openNURBS/opennurbs_3dm.h                |    528 +
 src/other/openNURBS/opennurbs_3dm_attributes.cpp   |   1528 +
 src/other/openNURBS/opennurbs_3dm_attributes.h     |    573 +
 src/other/openNURBS/opennurbs_3dm_properties.cpp   |    598 +
 src/other/openNURBS/opennurbs_3dm_properties.h     |    142 +
 src/other/openNURBS/opennurbs_3dm_settings.cpp     |   4036 +
 src/other/openNURBS/opennurbs_3dm_settings.h       |    891 +

MINGW64 /c/brlcad_tkhtml_fix (main)
$ stat db/nist/NIST_MBE_PMI_7-10.3dm
  File: db/nist/NIST_MBE_PMI_7-10.3dm
  Size: 4243206         Blocks: 4144       IO Block: 65536  regular file
Device: e02c1581h/3760985473d   Inode: 48132221017637258  Links: 1
Access: (0644/-rw-r--r--)  Uid: (197612/   cliff)   Gid: (197612/ UNKNOWN)
Access: 2021-03-12 16:17:43.852005600 -0500
Modify: 2021-03-12 16:16:43.532223700 -0500
Change: 2021-03-12 16:16:43.532223700 -0500
 Birth: 2021-03-12 16:08:53.242552400 -0500

starseeker (Mar 12 2021 at 21:23):

Sean (Mar 12 2021 at 21:23):

starseeker (Mar 12 2021 at 21:24):

Sean (Mar 12 2021 at 21:26):

it has to have been a git tool that fixed it.. you tried git diff --stat or git diff --numstat on it?

Sean (Mar 12 2021 at 21:26):

starseeker (Mar 12 2021 at 21:27):

Sean (Mar 12 2021 at 21:27):

Sean (Mar 12 2021 at 21:28):

starseeker (Mar 12 2021 at 21:28):

Only thing I can think of is that .gitattributes file I added in main may actually be the problem.

starseeker (Mar 12 2021 at 21:30):

starseeker (Mar 12 2021 at 21:31):

It checked out wrong in main because some rule I stuck in there must have matched the 3dm file, then when the branch checked out it kept the file that had been "modified" by the .gitattributes file.

starseeker (Mar 12 2021 at 21:31):

Sean (Mar 12 2021 at 21:32):

:thumbs_up: That's what I suspected would eventually happen... just not so soon.

starseeker (Mar 12 2021 at 21:32):

When I switched to the branch that didn't have the file, blew away the modded 3dm file, and restored from the branch rather than main that's where the good file came from.

Sean (Mar 12 2021 at 21:33):

starseeker (Mar 12 2021 at 21:33):

OK. I'll back up to conv18 (or the one with your readme update if I've got it) and re-apply the tkhtml fix and the exec settings. We should then be Good To Go - minimalism wins again.

Sean (Mar 12 2021 at 21:33):

Sean (Mar 12 2021 at 21:34):

I plan to completely overhaul the readme soon once all the other docs and tickets are in place.

starseeker (Mar 12 2021 at 21:35):

/me has been rather bad for your stress levels this week. OK, give me a few minutes to do a final pass and I'll upload the final candidate.

starseeker (Mar 12 2021 at 21:53):

@Sean remind me after we're done here to check the fast4 regression test - one of those files is deliberately Windows line endings and one is deliberately Linux - we may have to switch the in repo copies of those files to be .bz2 or something so they don't get autoupdated as text files on checkout.

starseeker (Mar 12 2021 at 21:55):

@Sean While we're thinking about it, did you also want to eliminate .gitignore? It's in there now because it gave us some non-empty SVN commits for id mapping, but maybe we want to eliminate it now.

starseeker (Mar 12 2021 at 21:57):

starseeker (Mar 12 2021 at 22:19):

Windows build and distcheck-full running on rel-7-32-2 tag from that repo now. Will confirm if successful in a few hours

Sean (Mar 12 2021 at 22:46):

starseeker (Mar 12 2021 at 22:46):

starseeker (Mar 13 2021 at 01:23):

starseeker (Mar 13 2021 at 17:20):

Sean (Mar 15 2021 at 16:55):

Sean (Mar 15 2021 at 16:56):

I'm running out of checks on my end, I think I'll upload the update this afternoon unless you found anything

starseeker (Mar 15 2021 at 17:43):

find ../ -name \*.g -exec ./bin/mged {} ls \; came through clean, as far as I can tell.

starseeker (Mar 15 2021 at 18:31):

Prabhat Singh (Mar 15 2021 at 19:31):

Hello everyone, could someone point me to the getting started or quick start docs for opencax?

Sean (Mar 15 2021 at 19:58):

@starseeker something that exercises external-to-internal form, like doing a get * or draw *. ls doesn't crack them iirc.

starseeker (Mar 15 2021 at 22:28):

starseeker (Mar 15 2021 at 22:33):

g2asc output for all of them matches as well, except for a few chars in the openNURBS serializations of the breps.

starseeker (Mar 16 2021 at 01:32):

OK, yeah - the openNURBS serializations differ even building from the same sources, when different build dirs are used.

starseeker (Mar 16 2021 at 01:33):

starseeker (Mar 16 2021 at 22:10):

Sean (Mar 17 2021 at 06:49):

starseeker (Mar 17 2021 at 23:42):

Sean (Mar 17 2021 at 23:49):

Sean (Mar 19 2021 at 03:01):

So I think a "soft opening" is probably in order. It's uploaded and live, but perhaps we could give it a few days to "simmer" .. not announce it publicly just yet.

starseeker (Mar 19 2021 at 03:35):

Sean (Mar 19 2021 at 03:45):

Sean (Mar 19 2021 at 03:52):

but yeah, I don't see a reason why not. if anything, we should exercise it to make sure it's correct.

starseeker (Mar 19 2021 at 12:07):

starseeker (Mar 19 2021 at 12:24):

starseeker (Mar 19 2021 at 12:38):

starseeker (Mar 19 2021 at 13:26):

@Sean Right now I've got the "check" target building on the runners, but that's not going to succeed reliably due to the threading issues - looks like regress-gqa is failing some of the time on the OSX runner. Should I disable the check portion of the test until we have an expectation it can reliably run?

starseeker (Mar 19 2021 at 15:50):

I took a run at updating HACKING, but without doing an all-up release I'm sure I've missed something.

starseeker (Mar 19 2021 at 15:54):

One thing that is clear - if we want to keep providing the GNU style ChangeLog files, we'll have to put some effort into it.

starseeker (Mar 19 2021 at 15:58):

My thought, since now each git clone has the whole history locally, would be to either dispense with the ChangeLog altogether or simply use the git log output. The only real utility to the ChangeLog would be for folks looking at tarballs without any access to either a local or github version of the history - I would expect that to be a rare case, and even in that scenario I would expect git log (or maybe git log --stat) output to be as useful as the current ChangeLog.

starseeker (Mar 19 2021 at 19:51):

Started populating the releases - that's going to be a job if we want to get all the binaries, source tarballs and notes moved. I got the majority of the release notes set up - all but a couple back to 7.0, except for a couple without obvious corresponding tags. However, I've only gotten a few of the uploads.

starseeker (Mar 20 2021 at 01:09):

starseeker (Mar 20 2021 at 03:12):

starseeker (Mar 20 2021 at 03:15):

starseeker (Mar 20 2021 at 03:16):

Erik (Mar 20 2021 at 13:00):

ehhh, if github is central to development, people who want to see that far into development should come watch...

starseeker (Mar 20 2021 at 15:07):

Phew! OK, missing tags added, source and binary tarballs uploaded. Needs someone to double check to make sure I didn't miss any. OVA image (just barely) uploaded to Release on OVA repository.

Only binaries I don't have up yet are the old ProE plugins - not sure where to put them.

starseeker (Mar 20 2021 at 15:16):

Options would be either to set up a separate project for the creo plugins, or add tags for the plugins (something like proe-plugin-0-2-0 maybe?) and upload the plugins to those tags. If we do want to add older tags for the plugins, we'll need to be careful about setting tag dates once we identify the corresponding commits. (Just got bit by that - it's fixable https://stackoverflow.com/a/21741848/2037687 but we may as well get it right up front...)

starseeker (Mar 20 2021 at 15:26):

starseeker (Mar 21 2021 at 13:25):

@Erik Do you know anything about the Github "packages" feature? Is that anything that might be useful for BRL-CAD?

starseeker (Mar 21 2021 at 13:26):

Erik (Mar 22 2021 at 12:47):

Sumagna Das (Mar 23 2021 at 19:56):

Sumagna Das (Mar 23 2021 at 19:57):

starseeker (Mar 23 2021 at 20:16):

Sumagna Das (Mar 23 2021 at 20:17):

Sumagna Das (Mar 23 2021 at 20:18):

but i can try it right now if you want and keep the laptop open for the night....

starseeker (Mar 23 2021 at 20:19):

@Sumagna Das Up to you - I'd actually be surprised if it can do anything much with our action files, since they call for Windows and OSX vms as well as Linux...

Sumagna Das (Mar 23 2021 at 20:20):

Sumagna Das (Mar 23 2021 at 20:21):

starseeker (Mar 23 2021 at 20:22):

So the question would be whether it knows to skip the non-Linux entries automatically or would we need to edit the files down before running it.

Sumagna Das (Mar 23 2021 at 20:24):

Sean (Mar 23 2021 at 20:24):

Sumagna Das (Mar 23 2021 at 20:25):

i didnt know that it can actually help with BRL-CAD's github actions so thought it was off topic :smile:

starseeker (Mar 23 2021 at 20:27):

Sumagna Das (Mar 23 2021 at 20:27):

starseeker (Mar 23 2021 at 20:27):

I've lost count of the number of things I've done on this conversion that I've considered where I didn't know whether or not it would help...

Sean (Mar 24 2021 at 03:09):

Yes, though I'm not fond of Github's default that merely links to the diff. It should really be in the e-mail (up to some kb limit) since the entire point of commit notification is quick review of the code change.

Sean (Mar 24 2021 at 03:10):

Looks like the way to handle it will be to set up a clone on .bz that pulls periodically with a receive hook

Sean (Mar 24 2021 at 03:11):

Sean (Mar 24 2021 at 03:52):

For the ChangeLog, we can start without it. I think it will be good to include one in future source tarballs, though I don't think it matters so much what tool generates it. Including more than the git 1-liner would be essential, but a git log of all changes since last release would probably be adequate.

At a glance, looks like there are a couple that wrap git log, and looks like emacs can do it, or we can just sort out the magic needed to automatically extract commits since the previous release (a little tricky, but not terribly hard).
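
One possible shape for the "commits since the previous release" step, as a Python sketch that only builds the command line. It assumes the previous release tag is already known (something like `git describe --tags --abbrev=0` might find it, but that depends on how the tags are laid out), and the function name is hypothetical.

```python
def changelog_command(previous_tag: str, with_stat: bool = True):
    """Build a 'git log' invocation covering commits after previous_tag."""
    cmd = ["git", "log"]
    if with_stat:
        # --stat adds the per-file change summary under each commit.
        cmd.append("--stat")
    cmd.append(f"{previous_tag}..HEAD")
    return cmd


# Example using a tag name mentioned earlier in this thread.
cmd = changelog_command("rel-7-32-2")
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` in a checkout and the output written into the tarball as ChangeLog.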

Sean (Mar 24 2021 at 03:53):

starseeker (Mar 24 2021 at 12:45):

The github file size limit is, IIRC, 2 gigs. Compressed, it was on the order of 1.8

Sean (Mar 25 2021 at 04:29):

starseeker (Mar 25 2021 at 10:59):

starseeker (Mar 30 2021 at 02:01):

/me bemusedly wonders if @Sean is planning to announce the migration on April 1st...

Sean (Mar 30 2021 at 15:42):

Okay, I've sent out 16 invitations to add people to our list of members (i.e., people that have commit access to any repo). It's only a fraction of what we had on SourceForge, but it should be a good start.

Sean (Mar 30 2021 at 15:54):

@starseeker you also apparently lacked the admin bit on the brlcad repo and weren't a member of the dev team, which looks like is why you couldn't add anyone.

starseeker (Mar 30 2021 at 15:54):

Sean (Mar 30 2021 at 15:55):

right now, being in a team pretty much gives full administrative control, so we may want to change that later, but that's essentially how it was on sourceforge

starseeker (Mar 30 2021 at 15:56):

Sean (Mar 30 2021 at 15:56):

Sean (Mar 30 2021 at 15:57):

permissions are set on repos themselves or they're set on teams (which then have permissions attached to them) or they're set on members (which have permissions attached to them)

Sean (Mar 30 2021 at 15:57):

starseeker (Mar 30 2021 at 15:58):

Sean (Mar 30 2021 at 15:58):

so for example you were a member, which lets you create repos, but you weren't on the dev team, so you couldn't add people to brlcad

Sean (Mar 30 2021 at 15:59):

It looks like it's set up this way so you can have teams with admin access, teams without, all accessing some or not having access to other repos. It's not a strict hierarchy of permissions, it's more of a matrix.

starseeker (Mar 30 2021 at 16:00):

A bit complex to manage, but also potentially quite useful for preventing accidents and the like.

Sean (Mar 30 2021 at 16:53):

right now I just have two teams set up, devs and webdevs, with devs having all repos but only admin on the compiled-code repos, and webdevs having admin over the web-related repos including the website and web projects

Sean (Mar 31 2021 at 19:17):

Sean (Mar 31 2021 at 19:40):

Looks like "nearly" everything in there has no traceability after the 25XXX converter movements. Looking at iges.h for example, it stops at 25521. Git log appears to have the other changes, for example if I git log --follow src/conv/iges/iges.h, it looks corrupted to me.

Sean (Mar 31 2021 at 19:41):

the last commit is shown as 0fe9bf30dc0f7980df6486014bb29567bec09a84 (r4502) which was a change to sig/i-a.c ... similarly 1cdf453b9d355b1a7fb10bea445ab18b262a0252 (r5920) was sig/u-a.c

Sean (Mar 31 2021 at 19:42):

the two commits before that seem to have nothing to do with sig and are other random commits

Sean (Mar 31 2021 at 19:46):

looks like it's not until 3408f5ba1220271623a90b3740eb43abe06a857a a dozen or so commits prior that it starts to get back on track

Sean (Mar 31 2021 at 19:50):

If I trace back commits in subversion, the last five on iges.h are r13453, r10561, r9487, r8144, r7715. Commit r13453 is 994dcc97ee6d9f60e670aa9a2ed110273920294c for example and r7715 is split across three commits:
317460fce22e6ba835a08bef126e2b75a123ee78
b9f6d30bd15f4c66ed5e7506877b6ae35c80ea06
eb458e30c765b2758097abc1cb5909422e050e90
so the commits are somewhere in the full history, I'm just not sure where. :(

Sean (Mar 31 2021 at 19:54):

/me hopes this is limited to conv/ or conv/iges and not all directory renames besides r22798... because there were a dozen or so others

Sean (Mar 31 2021 at 20:11):

starseeker (Mar 31 2021 at 20:31):

For whatever reason, the --follow algorithm isn't finding the src/iges/iges.h file starting from src/conv/iges/iges.h. Looking at the gitk history, following the parent commits does get to the rename commit, so my initial guess is that it's not data corruption per se but a limitation of the implementation of --follow (which apparently has some issues...)

Sean (Mar 31 2021 at 20:34):

That doesn't add up though -- it lists some older commits on some files, commits that have absolutely nothing to do with that directory entirely.

Sean (Mar 31 2021 at 20:34):

starseeker (Mar 31 2021 at 20:34):

If I'm reading this right, git's interpretation (or cvs-fast-export's, at any rate) was that r25518 removed the iges files rather than moving them, 25519 and 25520 were then committed, and 25521 added the iges files back in.

starseeker (Mar 31 2021 at 20:35):

starseeker (Mar 31 2021 at 20:36):

starseeker (Mar 31 2021 at 20:37):

starseeker (Mar 31 2021 at 20:40):

Sean (Mar 31 2021 at 20:40):

starseeker (Mar 31 2021 at 20:43):

Sean (Mar 31 2021 at 20:44):

Doesn't that just mean that the history is attached somewhere? That much is already confirmed, the commits exist in the history, just seemingly not where they should be. Like, where is r13453? What file can I do a log on to find it? (inclined to see if it's attached to some other random file like the u-a.c commit.)

Sean (Mar 31 2021 at 20:45):

Definitely not, they're genuine changes to other files not even related to src/conv in any way.

Sean (Mar 31 2021 at 20:46):

git show 0fe9bf30dc0f7980df6486014bb29567bec09a84 ... it says that was the first commit to iges.h in that location (sans follow)

starseeker (Mar 31 2021 at 20:48):

The parent commit of 25521 is 3408f5ba1220 (25520) which is an empty commit as far as iges.h is concerned (and iges.h doesn't exist in the tree at that point.) That may break the --follow chain, but I'm not clear yet on why --follow is reporting anything else before src/conv/iges/iges.h in that case.

starseeker (Mar 31 2021 at 20:48):

starseeker (Mar 31 2021 at 21:01):

Commits back through 22798 in the follow history do have changes that pertain to iges.h, from the looks of things.

starseeker (Mar 31 2021 at 21:02):

Sean (Mar 31 2021 at 21:02):

Sean (Mar 31 2021 at 21:04):

I haven't been able to find the iges/iges.h history which had several dozen commits prior to the move around 25520

starseeker (Mar 31 2021 at 21:22):

starseeker (Mar 31 2021 at 21:29):

@Sean I agree git log --follow is going off the rails in a bizarre way, but if I diff the svn commits and those found by git log -- "**/iges.h" the delta is pretty small:

--- svnrevs.txt 2021-03-31 17:19:14.593937412 -0400
+++ gitrevs.txt 2021-03-31 17:19:50.609358451 -0400
@@ -18,12 +18,13 @@
 27341
 26074
 25521
+25518
 23807
 23633
 23577
+22839
 22798
 13453
-10561
 9487
 8144
 7715
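
A rough sketch of how a comparison like the one above can be produced. The revision numbers are only the ones visible in the hunk (partial lists, for illustration), and the file names and variable names are just placeholders. On a real checkout the git list would come from something like git log -- "**/iges.h" piped through the same awk filter.

```shell
# Partial revision lists copied from the diff hunk above (illustration only).
export LC_ALL=C
work=$(mktemp -d)
printf '%s\n' 27341 26074 25521 23807 23633 23577 22798 13453 10561 9487 8144 7715 \
    | sort > "$work/svnrevs.txt"
printf '%s\n' 27341 26074 25521 25518 23807 23633 23577 22839 22798 13453 9487 8144 7715 \
    | sort > "$work/gitrevs.txt"

# Pulling the revision number out of a converted commit message line:
sample_rev=$(printf '    svn:revision:22798\n' | awk -F: '/svn:revision/ {print $3}')

# comm needs both inputs sorted the same way (hence LC_ALL=C and sort above).
only_in_svn=$(comm -23 "$work/svnrevs.txt" "$work/gitrevs.txt")
only_in_git=$(comm -13 "$work/svnrevs.txt" "$work/gitrevs.txt")

echo "sample revision: $sample_rev"
echo "only in svn: $only_in_svn"
echo "only in git:" $only_in_git
```

Using comm instead of diff gives the two one-sided lists directly, which is handy when the lists get long.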

starseeker (Mar 31 2021 at 21:36):

starseeker (Mar 31 2021 at 21:37):

starseeker (Mar 31 2021 at 22:11):

Sean (Apr 01 2021 at 04:25):

Sorry, I meant iges.c for that one -- I was trying to find its full history the same way and can't get it to report the 30 commits prior to it getting moved around even with git log --full-history -- **/iges.c

Sean (Apr 01 2021 at 04:26):

Comparing against: svn log svn+ssh://brlcad@svn.code.sf.net/p/brlcad/code/brlcad/trunk/iges/iges.c@22500 | grep '^r'

Sean (Apr 01 2021 at 04:31):

How can I manually traverse the actual history on the git side? In svn, one would see a log stops at r12345, then one pulls log on a path mentioned in the comment at a few revs prior (e.g., r12340), and repeat as needed. If it wasn't mentioned in a comment, one can still pull the tree at r12340, find the file, then continue the log on it.

Sean (Apr 01 2021 at 04:32):

Sean (Apr 01 2021 at 04:35):

I mean, I can think of a really brute force way, checking out the sha prior (-1), but what's the right way?

Sean (Apr 01 2021 at 05:00):

Relying on "git log -- **/file" feels inadequate in the general case because it 1) only works if the file wasn't renamed, 2) can erroneously catch other same-named files (good luck tracking a subdir README that moved..), and 3) doesn't seem to help figure out where the commit exists..only that it exists.

Sean (Apr 01 2021 at 05:01):

Any idea what happened with ProEngineer? It seems to similarly have lost track. I didn't check the others.

starseeker (Apr 01 2021 at 11:28):

So if I do the following: git log -- "**/iges.c"|grep svn:revision|awk -F':' '{print $3}' the last few returns are:

starseeker (Apr 01 2021 at 11:30):

With SVN svn log https://svn.code.sf.net/p/brlcad/code/brlcad/trunk/iges/iges.c@22500 | grep '^r'|awk '{print $1}'|sed 's/r//' I get:

starseeker (Apr 01 2021 at 11:31):

r10561 is the only one missing from Git, and that's expected as it was an SVN property change.

starseeker (Apr 01 2021 at 11:42):

In that situation what I would usually do is bring up gitk (or maybe gitk --all) and go to the last known relevant commit, then browse my way back up the history.

IMHO not tracking file moves was a mistake, since it fundamentally limits what you can successfully pull out of the history in cases like this.

starseeker (Apr 01 2021 at 11:51):

git log --follow and variations on git log -- "**/filename" are the best answers I'm currently aware of, but I'll keep my eyes peeled for better ones.

starseeker (Apr 01 2021 at 12:06):

If I'm interpreting 69329 correctly, the CREO directory was added while the ProEngineer directory was still present.

starseeker (Apr 01 2021 at 12:07):

That may be why it's not following Creo back into ProEngineer - it wasn't a folder rename.

starseeker (Apr 01 2021 at 12:09):

gitk's blame feature might be slightly better in some cases at following changes back, since some of the comments I've seen seem to suggest it's using a more powerful search mechanism than the --follow option...

Sean (Apr 01 2021 at 21:30):

git log -- **/iges.c

git log -- "**/iges.c"

... I'd missed quoting the glob, so it was only matching src/conv/iges/iges.c history.
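
A minimal sketch of why the quoting matters, assuming a throwaway directory with a single iges/iges.c in it (the layout and variable names are made up for the demo). Unquoted, the shell expands the pattern against whatever exists on disk right now, so git only ever sees the current path. Quoted, the pattern reaches git intact and git matches it against paths in history.

```shell
# Throwaway layout: one file, one directory deep (hypothetical, for the demo only).
demo=$(mktemp -d)
mkdir -p "$demo/iges"
: > "$demo/iges/iges.c"
cd "$demo" || exit 1

set -- **/iges.c        # unquoted: the shell expands the pattern before the command runs
unquoted=$1
set -- "**/iges.c"      # quoted: the pattern is passed through untouched
quoted=$1

printf 'unquoted -> %s\nquoted   -> %s\n' "$unquoted" "$quoted"
```

How deep the unquoted form reaches depends on the shell (plain sh treats ** like *, while zsh or bash with globstar recurse), which is one more reason to always quote it for git.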

Sean (Apr 01 2021 at 21:31):

Sean (Apr 01 2021 at 21:35):

Er, that's rather error prone I'd think, trying to follow a text line potentially next to a half dozen other | lines, scrolling up for pages, maybe 10k commits back. Still that's also only good in GUI mode -- I'm looking for something lower-level that will work even when I'm remote in a console. I mean is "git log -1 sha" where the gitk line connects up to? Or is it sha^! or something else?

starseeker (Apr 01 2021 at 23:12):

Maybe I'm not quite following what you're after... Do you mean something like the following?:

$ git log -1 a1e49c
commit a1e49c5edbb4df8eb10f7ae014ae6efeb12fc966
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Thu May 20 15:22:02 2004 +0000

    Vast reorganization begins.  Sources moved from top-level directories into src/.

    svn:revision:22798
    cvs:account:morrison
    cvs:branch:trunk

$ git log -1 a1e49c~1
commit be1f3137808b681347a7665a05049911c55166a1
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date:   Thu May 20 14:54:22 2004 +0000

    Sources that are external to BRL-CAD are moved from the top level to src/other/.

    svn:revision:22797
    cvs:account:morrison
    cvs:branch:trunk

I can then get (for example) a top level view of the tree at that previous revision:

$ git ls-tree a1e49c~1
100644 blob cf056985dbd9086d3db465d486471e1e4ec5427f    .gitignore
100644 blob 20214282c2426fcf91b0cd7635598aedb1ae06a7    AUTHORS
100644 blob c750a69b34d9c6cd4a914966f104176d00edf5f4    BUGS
100644 blob 1df557054723c07a38dc07014966a80bf024fdbc    COPYING
...

git ls-tree a1e49c~1 iges/
100644 blob 2ce043e0fec731921623a30b66a61350a6ca8f28    iges/Makefile.am
100644 blob 77a5bc69e89aae54230f3594a29345b4a6210c43    iges/add_face.c
100644 blob de7c126c87da7202a0fff25b39915c1605b6624e    iges/add_inner_shell.c
...

starseeker (Apr 01 2021 at 23:14):

$ git ls-tree -r a1e49c~1 |grep /iges\\.c
100644 blob 3cc309a9a5cc94b19ac1ffcda9f4a1204f889bbc    iges/iges.c

starseeker (Apr 01 2021 at 23:17):

To follow back up the parent-child chain starting from that commit, I can just pull a local log:

$ git log --oneline -10 a1e49c
a1e49c5edb Vast reorganization begins.  Sources moved from top-level directories into src/.
be1f313780 Sources that are external to BRL-CAD are moved from the top level to src/other/.
4440f1c095 Sources that are external to BRL-CAD are moved from the top level to src/other/.
fa32f6950a The old regression test scripts are being replaced by something else.  Likely it'll be Corredor with some unit test framework.  The old scripts are so far out of sync and so inadequate that it's simply not worth it any more.
074785b939 moved from html/ to doc/html/
4e5eaaaa87 s/.doc/.tr/
b51a0ee5e9 renamed .doc files to .tr since they are [tng]roff files
40e36bc94e old nmake visual studio file no longer exists
679e068d94 cake is no more and theres no incentive to maintain it any more so .. buh bye.
29ba93efce rename the text files from .doc to a .txt extension.  reserve .doc extension for groff files

starseeker (Apr 01 2021 at 23:33):

$ git log --all --name-only --pretty=format:"" "**/TODO" |sort|uniq

doc/docbook/resources/other/standard/xsl/TODO
doc/docbook/resources/standard/xsl/TODO
doc/docbook/system/man3/en/TODO
doc/docbook/system/man3/TODO
libitcl3.2/TODO
libitcl/TODO
libpng/TODO
misc/d-bindings/TODO
misc/tools/astyle/TODO
misc/tools/svn2cl/TODO
src/archer/TODO
src/libdm/TODO
src/libged/TODO
src/libicv/TODO
src/libpc/TODO
src/other/blt/src/TODO
src/other/ext/stepcode/TODO
src/other/ext/tcl/compat/zlib/contrib/iostream3/TODO
src/other/ext/tcl/pkgs/itcl4.2.0/TODO
src/other/ext/tcl/pkgs/tdbcpostgres1.1.1/TODO
src/other/flex/TODO
src/other/freetype/docs/TODO
src/other/incrTcl/itcl/TODO
src/other/incrTcl/itk/TODO
src/other/incrTcl/TODO
src/other/libitcl/TODO
src/other/libnetpbm/TODO
src/other/libpng/TODO
src/other/libz/contrib/iostream3/TODO
src/other/openscenegraph/TODO
src/other/stepcode/TODO
src/other/step/TODO
src/other/tcl/compat/zlib/contrib/iostream3/TODO
src/other/tcl/pkgs/itcl4.0.4/TODO
src/other/tcl/pkgs/itcl4.2.0/TODO
src/other/tcl/pkgs/tdbcpostgres1.1.1/TODO
src/other/uuid/TODO
src/qbrlcad/TODO
src/qged/TODO
src/superbuild/stepcode/TODO
src/superbuild/tcl/compat/zlib/contrib/iostream3/TODO
src/superbuild/tcl/pkgs/itcl4.2.0/TODO
src/superbuild/tcl/pkgs/tdbcpostgres1.1.1/TODO
src/tclscripts/checker/TODO

starseeker (Apr 01 2021 at 23:52):

If you know something about the contents, you can use git grep - for example, if I think the historical version of "iges.c" that I'm looking for has the string "Code to support the g-iges converter" in it but I don't know if the file name changed, I can do the following to grep for it back 5 commits:

$ git grep "Code to support the g-iges converter" $(git log -5 --pretty=format:"%H" 3408f5ba122027)
3408f5ba1220271623a90b3740eb43abe06a857a:src/conv/iges/iges.c: *  Code to support the g-iges converter
90f783ca790a5a2f7d176c1b9c0a5eba4c880927:src/iges/iges.c: *  Code to support the g-iges converter
f89fb406daf8348bf215ed96f115bdcf9bbd072c:src/iges/iges.c: *  Code to support the g-iges converter
b6414214c3cdd7e883be1d5f3cd19f9102deb9ec:src/iges/iges.c: *  Code to support the g-iges converter

Notice only 4 commits reported matching that content. If we look at the straight log for 5 commits from that point:

$ git log --oneline -5 3408f5ba122
3408f5ba12 (HEAD) moved all the geometry converter directories from src/. to src/conv/.
48a6bed946 a single iges file didn't make it for some bizzare reason, manually move from src/iges to src/conv/iges
90f783ca79 iges converter moved
f89fb406da moved all the geometry converter directories from src/. to src/conv/.
b6414214c3 formatting, spelling, reference the tasker too

Commit 48a6bed946's tree does not have a file (by any name) matching that string.

starseeker (Apr 02 2021 at 00:24):

iges.h has similar results, being missing from 3 commits (this is from a checkout of 3408f5ba12):

$ git grep "I G E S . H" $(git log -5 --pretty=format:"%H")
3408f5ba1220271623a90b3740eb43abe06a857a:src/conv/iges/iges.h:/*                          I G E S . H
b6414214c3cdd7e883be1d5f3cd19f9102deb9ec:src/iges/iges.h:/*                          I G E S . H
$ git log --oneline -5
3408f5ba12 (HEAD) moved all the geometry converter directories from src/. to src/conv/.
48a6bed946 a single iges file didn't make it for some bizzare reason, manually move from src/iges to src/conv/iges
90f783ca79 iges converter moved
f89fb406da moved all the geometry converter directories from src/. to src/conv/.
b6414214c3 formatting, spelling, reference the tasker too

starseeker (Apr 02 2021 at 00:34):

I'm still not sure why git log --follow pulls in 23649 for iges.h when doing the src/conv/iges/iges.h path search - it's clearly wrong. However, if I check out the first commit that does have the iges.h contents again (b6414214c3) git log --follow looks like it can go the rest of the way successfully.

starseeker (Apr 02 2021 at 00:36):

@Sean My thinking is it's more likely we found a bug in git log --follow than in the repo data...

starseeker (Apr 04 2021 at 16:44):

@Sean did we want to set the BRL-CAD github org's icon to the BRL-CAD logo? Right now it's just one of the generic Github images...

Sean (Apr 05 2021 at 19:26):

That's helpful. Looks like ls-tree in combo with a couple other commands can help me walk it back.
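
One way that walk could be scripted, as a sketch only: trace_basename is a hypothetical helper (not something in the repo) that steps back through the first-parent chain with git rev-list and, for each commit, lists every path in that commit's tree whose basename matches, using git ls-tree -r.

```shell
# Hypothetical helper: for each of the last N first-parent commits from START,
# print "<short sha> <path>" for every path whose basename is NAME,
# or "<short sha> (absent)" when the tree has no such file.
trace_basename() {
    tb_start=$1
    tb_name=$2
    tb_count=${3:-10}
    git rev-list --first-parent -n "$tb_count" "$tb_start" |
    while read -r tb_sha; do
        git ls-tree -r --name-only "$tb_sha" |
        awk -v sha="$(git rev-parse --short "$tb_sha")" -v name="$tb_name" '
            { n = split($0, parts, "/") }
            parts[n] == name { print sha, $0; found = 1 }
            END { if (!found) print sha, "(absent)" }'
    done
}
```

Something like trace_basename 3408f5ba12 iges.c 5 should then show, commit by commit, which trees around the move have the file and under what path. It matches on name only, so it has the same same-named-file caveat as the glob approach.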

Sean (Apr 05 2021 at 19:28):

Really needing commits with diffs... but apparently that's going to require some customization. Will have to live with links to the changes for now.

Himanshu (Jun 14 2021 at 14:46):

I just saw some other organizations in GitHub and they have verified tag. We can have it too right?

Sean (Jun 14 2021 at 15:56):

Armin (LordOfBikes) (Jun 15 2021 at 21:40):

The verified tag is the committer's responsibility.
Commits made online through the GitHub web interface are verified with a GitHub key automatically.
Committers with push access have to add a GPG key to their GitHub account and sign local commits using this key.
Those commits are then verified by the developer's key. A click on the verified tag shows the key owner.
See https://docs.github.com/en/github/authenticating-to-github/managing-commit-signature-verification

starseeker (Jun 24 2021 at 15:41):

So far the new repo hasn't "taken" for analysis - it tried to pull it down last night, but towards the end of the processing this morning something must have gone wrong. I added some paths to the ignore files (src/other, etc.) - that may help it complete successfully. Fingers crossed...

starseeker (Jun 24 2021 at 17:19):

Sean (Jun 25 2021 at 04:58):

Erik (Jul 16 2021 at 14:41):

starseeker (Jul 16 2021 at 22:44):

It's official from my perspective - the SVN repo is frozen and all dev activity is now on github

starseeker (Jul 16 2021 at 22:45):

There's still a lot of polishing to do on the site - get our logo up, see if we can migrate the sf metadata (patches, bug reports, etc.) somehow, etc. But Github is now the active development center.

Erik (Jul 17 2021 at 00:19):

Erik (Jul 17 2021 at 00:21):

starseeker (Jul 17 2021 at 00:23):

thanks :-). It's satisfying to have it complete, although I'm still finding myself in the "confound it, why doesn't git record file moves" camp

starseeker (Jul 17 2021 at 00:24):

The CI testing has been Really Useful though - it's already caught me a number of times.

starseeker (Jul 17 2021 at 00:25):

I tried turning on CodeQL to see what happens - early signs suggest we may be too big a bite for that setup to handle.

Erik (Jul 17 2021 at 00:37):

blehhhh, ci/cd stacks, that's my life lately. On software that takes 40 minutes on a 64 core (128 hyperthread) machine to compile and, uh, a test sys that is heavy enough that it'd cost ~$500 on aws to run once and has a minimum 10 hour turnaround... I hear ya on the pain of bein' too big :D

starseeker (Jul 17 2021 at 00:46):

I had already evolved a script to target the clang static analyzer at our core libs selectively, but I figured that would be a local machine only affair. However, I found some examples recently which suggested it might actually be possible to install the necessary pieces on the runner to set that up as a github action. I'm letting CodeQL run a bit to see what happens, but I wouldn't be surprised to see it time out without finishing.

starseeker (Jul 17 2021 at 00:48):

The static analyzer script looks like it may be able to complete in about an hour, which isn't too bad.

starseeker (Jul 17 2021 at 00:48):

We're deliberately building serially in order to minimize stress on the I/O subsystem - I pushed it harder in some early tests and had a few cases where file writes didn't complete properly.

Erik (Jul 17 2021 at 00:52):

ya'll should get a lil nvme raid with one of them melly-nox connectx5's :D beastly i/o pair

Erik (Jul 17 2021 at 00:53):

(if the file writes didn't complete properly, either there're kernel bugs or your writer doesn't check return values and bitbuckets data when the buffers are full)

starseeker (Jul 17 2021 at 00:56):

I'm not sure what sort of backend system the Actions setup is using for its runners, so I can't say for sure.

starseeker (Jul 17 2021 at 00:57):

So far at least none of the issues we've hit is anything like that sourceforge failure that led to the duplicate SVN commit id crisis (knock on wood)

starseeker (Jul 17 2021 at 00:58):

Usually when that sort of thing happens I suspect another parallel compilation bug, but in this case it was a single .c file that failed to build - not much opportunity there for parallel issues...

starseeker (Jul 17 2021 at 00:59):

Sumagna Das (Apr 02 2022 at 07:47):

@Sean @starseeker i am thinking about trying to migrate the bugs to start getting back to work......and while migrating look at the bugs i can try to fix as starters to getting to know the code

Sumagna Das (Apr 02 2022 at 07:47):

starseeker (Apr 02 2022 at 22:16):

@Sumagna Das You'll want to check with @Sean on that one - I know he has some thoughts about migrating SF data

Sumagna Das (Apr 03 2022 at 05:30):

Sean (Apr 03 2022 at 19:16):

Sean (Apr 03 2022 at 19:17):

were you thinking the BUGS file? I wouldn't migrate those to github issues without first confirming that they are still issues. The BUGS file is intended to be for devs to leave notes on issues that may or may not be user visible, may or may not be fixed, may or may not be opinions on design, etc. It's great for finding things to work on, but I wouldn't necessarily think we want to elevate all of them to a github "issue".

Sumagna Das (Apr 03 2022 at 19:17):

well right now my target is the already present BUGS and TODO files....after that i will try the online issues

Sean (Apr 03 2022 at 19:17):

A better starting point would be to look at the bugs reported at http://sourceforge.net/p/brlcad/bugs/ ... those could all be migrated automatically or manually

Sumagna Das (Apr 03 2022 at 19:19):

Sumagna Das (Apr 03 2022 at 19:20):

well i was going to try the sf2github script but it needs the bugs.json file to start which i dont have

Sean (Apr 03 2022 at 19:20):

there are 126 bugs listed on sf.net, 67 feature requests on sf.net, 51 support requests, 4 geometry, and 214 patches. there's about 166 entries in the BUGS file and 492 ideas in the TODO file. :)

Sumagna Das (Apr 03 2022 at 19:22):

Sean (Apr 03 2022 at 19:22):

I mean it all depends on what interests you. working on any of those will be helpful!

Sumagna Das (Apr 03 2022 at 19:23):

anyways i saw that the sf2github script is not updated but i can fix it to work as per our need i think

Sean (Apr 03 2022 at 19:23):

personally, I'd probably start with the smallest (geometry) and next smallest (support requests), etc just because I like to shorten lists.

Sumagna Das (Apr 03 2022 at 19:23):

Sumagna Das (Apr 03 2022 at 19:24):

right now i was trying to parse the TODO file....should i continue with it or start doing the sf requests?

Sumagna Das (Apr 03 2022 at 19:26):

Sean (Apr 03 2022 at 19:30):

well, I meant actually address the item, not really migrate it -- or migrate it manually (copy-paste and link to the sf item)

Sean (Apr 03 2022 at 19:30):

I can look into generating the .json file -- there's a script I have to run as admin, I believe

Sean (Apr 03 2022 at 19:31):

alternatively, could just look through the list of bugs in BUGS like you'd said and find one you think you understand -- then add it to issues, then work on it ;)

starseeker (Apr 03 2022 at 20:04):

Just as an observation - the BUGS and TODO files, by virtue of being part of the repo, are already preserved on Github. The data in the Sourceforge trackers isn't migrated at all, so from a data preservation standpoint that's the data we don't yet have in any form.

starseeker (Apr 03 2022 at 20:06):

For the SF data, my thinking (again for what it's worth) is that it's probably worth migrating them by hand, and doing some checking to see if the original issue is still valid for the current codebase. The end result would be a better set of issues than just a mechanical migration.

Sean (Apr 03 2022 at 20:36):

Yeah definitely would be most valuable to have someone manually migrate and validate sf tracker items.

Sean (Apr 03 2022 at 20:39):

That’s where I’d probably start with the geometry because there’s just four of them and they could easily turn into four pull requests for new sample geom. iirc they just needed docs and some minor cleanup like making sure top level object name made sense, minimal overlaps, make sure title is set, etc

Sumagna Das (Apr 03 2022 at 21:22):

so i tried pulling all of the tickets through the SF api.....one thing i have to know is that there are a few tickets with attachments, right?

Sumagna Das (Apr 03 2022 at 21:25):

if manual checking is needed then i can try putting all of the tickets i got through the API into a text file and then manually checking the needed ones?

Sumagna Das (Apr 03 2022 at 21:53):

Sean (Apr 03 2022 at 22:50):

There are a lot of tickets with attachments (especially the patches and geometry trackers), but not so much for the feature and support request trackers.

Sean (Apr 03 2022 at 22:51):

Yes, that would definitely work and be helpful! Any trackers that are still relevant could be manually submitted as a gh issue or pr (in the case of the patches and geometry).

Sumagna Das (Apr 04 2022 at 06:52):

i am giving only the urls of the attachments because nothing else can be gotten from the API

Sumagna Das (Apr 04 2022 at 06:55):

i will make a text file for an intermediate place for the tickets then......after the manual checking, the text file can again be parsed and then put onto github if that works

Sumagna Das (Apr 04 2022 at 08:06):

these files contain tickets with the information i got from the sourceforge API.....if this works, then i can make a parser which will parse the checked tickets and get them into github as issues

Sean (Apr 04 2022 at 20:05):

@Sumagna Das that sounds good, but I don't want to cause you work if there's a tool I can run as admin to migrate everything -- what about this: https://github.com/cmungall/gosf2github ?

Sumagna Das (Apr 05 2022 at 05:54):

wait.....there was an updated tool....the last time i checked there were no updated tools for this

Sumagna Das (Apr 05 2022 at 05:54):

Sean (Apr 05 2022 at 15:10):

@Sumagna Das there's no mention whether that tool does anything with file uploads, but I was going to test it out on the geometry tracker since it's so small.. If it goes bad, probably won't be hard to clean up after it.

Sumagna Das (Apr 06 2022 at 05:21):

geometry tracker doesnt have any attachments and its small so not a problem i guess

Sean (Apr 06 2022 at 06:37):

@Sumagna Das the geometry tracker does have attachments... they're in the comments

Sean (Apr 06 2022 at 06:37):

Sumagna Das (Apr 06 2022 at 14:44):

Sumagna Das (Apr 06 2022 at 14:45):

the SF API supports providing the discussion (posts) as well as its uploads/attachments via requests i think

Sumagna Das (Apr 06 2022 at 14:46):

Sean (May 02 2022 at 13:27):

Profanity aside, this is actually a really useful reference for common git issues: https://ohshitgit.com

Sean (Dec 02 2023 at 21:40):

starseeker (Dec 02 2023 at 22:20):

Huh, interesting. Certainly feels like it should be useful for some sort of repo report generation

Alexis Naveros (Dec 02 2023 at 23:58):

Hey Sean, it has been years, how's everything? I have received the "ok" from Mark to work fewer hours to do that point cloud thing. I'm planning the algorithm on paper before I get started, there are some details I'm undecided how to handle

Alexis Naveros (Dec 02 2023 at 23:58):

And that post of mine was off-topic. I'm not used to this Zulip topic-based chat

Alexis Naveros (Dec 03 2023 at 02:47):

I would have a couple questions... Cliff said you already had Screened Poisson reconstruction, the wording suggested that it was satisfactory but very slow. Could it be just a matter of beating the hell out of that code with threads, SSE/AVX/AVX-512, atomics, NUMA awareness? I briefly looked at the code but was a bit lost backtracking beyond SPSR.cpp

Alexis Naveros (Dec 03 2023 at 02:49):

And do you have some kind of deadline or desired date for the mesh reconstruction algorithm? Just to have an idea how I'll weight the couple different things that need to be done

Sean (Mar 13 2024 at 05:35):

hey @Alexis Naveros very delayed reply!... everything has been going really great, and glad to hear they're going well for you too. short answer is "I dunno" on the screened poisson, at least to say for sure. I'm fairly certain it's typical unstable non-performant academic code, so yeah, probably tons of room for optimizations and improvement.

On that point, I listened to a talk just last week by someone that was comparing screened poisson with other methods, outlining the general deficiencies of the algorithm. I believe they were approaching it from a completely different perspective, incorporating ML into the pipeline to make more dynamic decisions, with good results.

Sean (Mar 13 2024 at 05:38):

if it wasn't obvious, we don't have deadlines here. or better still, there's many many many desired deadlines to choose from and they often go wooshing by, but we make progress steadily still.

I coincidentally just finished implementing a montecarlo approach to external surface area estimation that samples the hell out of the exterior surfaces and would love a robust point-cloud to solid mesh routine. My current tactic is going to be to sample it very densely, make thin cylinders at each surface hit point, mesh and union them all together, and (if sampled densely enough) I should be able to eliminate all the interior faces/points. It's stupid, but it just might work well.

Stream: brlcad

Topic: GitHub
