hey @starseeker , Sean said that i can help the migration out by checking if theres anything missing from the github nonotes repo that is present in the svn repo
so should i still try that?
and on the recent migrated repo brlcad_conv5?
Sure. The diff line that I was using can be found in this file:
https://sourceforge.net/p/brlcad/code/HEAD/tree/brlcad/trunk/misc/repoconv/common_structs.h#l152
That filters out known differences to look for anything unexpected.
And yes, brlcad_conv5 is the latest
i was thinking about using a script
Sure, whatever works for you - the diff line just tells you what differences you should expect to see.
ok
i will use that
If you want to iterate through revisions, you'll need to parse the svn revision and branch out of the git commit message, construct the correct svn checkout line, and then diff the trees.
i was thinking about using python for this. will it be possible in python?
Don't see why not - anything that can invoke the correct commands will do fine. The trick is to construct them in the first place - that's what I've got encoded in the C++ logic I linked to.
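A minimal Python sketch of the "diff the trees" step, for illustration only. It uses the standard-library `filecmp` module rather than the actual diff line from common_structs.h, so it does not filter the known differences that line filters. The function name `tree_differences` and the ignore list are assumptions.

```python
import filecmp
import os


def tree_differences(left, right, ignore=(".svn", ".git")):
    """Return sorted lines describing how two directory trees differ."""
    lines = []

    def walk(cmp):
        # Entries present on only one side, reported in diff's "Only in" style.
        for name in cmp.left_only:
            lines.append(f"Only in {cmp.left}: {name}")
        for name in cmp.right_only:
            lines.append(f"Only in {cmp.right}: {name}")
        # Compare file contents byte for byte (shallow=False) instead of
        # trusting size and modification time.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        for name in mismatch + errors:
            lines.append(f"Files differ: {os.path.join(cmp.left, name)}")
        # Recurse into directories that exist on both sides.
        for sub in cmp.subdirs.values():
            walk(sub)

    walk(filecmp.dircmp(left, right, ignore=list(ignore)))
    return sorted(lines)
```

An empty list means the two checkouts match apart from the ignored version-control directories.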
ok
the conv5 repo is going to take some time with my internet speed
so bye
good luck!
do you have any idea why the git clone is taking so much time?
it didnt take so much time for the svn pull
It's a full copy of the entire history - that's how Git works.
Locally here it's about 340 megabytes.
no not that
Oh, you mean a local checkout?
i am cloning the thing and it hasnt completed yet
its at 14%
Yeah, that's what I was talking about. With git there is a "clone" - which pulls a copy of the entire history - and "checkout", which retrieves a version of that history from the local copy.
"git clone" is downloading more data than an "svn checkout". The equivalent (more or less) for svn is rsync copying the entire history.
oh i got confused with the terms
The upside (especially for repository verification) is that you'll find a local "git checkout" is much faster, since you won't have to go over the network.
so the whole repo might be how much big?
My local copy here is on the order of 340 megs.
the total repo?
yes.
mine is at 125.4 MB
It's actually quite a bit more compact than the SVN rsync copy. Of course it's just the BRL-CAD history, without the secondary repositories.
Probably about 1/3 of the way there then.
but the number of files received is 90560 out of 596546
@Sumagna Das if you want to do a lot of checkouts of SVN, you may want to consider doing the rsync copy of the full SVN history so you can do local checkouts with SVN as well - otherwise, every time you want a new revision from SVN you'll have to pull data over the network.
That's quite a bit larger than the Git checkout though... closer to 2 gigabytes, if I recall correctly.
which one is faster if i have 3MBps speed - pull or rsync?
for git, "git clone" is the correct way to get a copy. rsync is what you will need to use if you want a local copy of the full SVN repository
i might use pull every time and delete the revision i had
Um - once you have the clone, "git checkout" should work fine.
that might be faster?
i am talking about the svn repo
"git pull" means bring changes into your own repository from upstream - that's probably not what you're going to need to do here, unless I push updates to brlcad_conv5
The fastest way I know of to do a lot of SVN checkouts is to first rsync the full repository to your local disk, and then interact with that. It will take more space locally, but will be much faster for each checkout from SVN.
then let me start it
I'd wait for the git clone to finish
will that slow down the speed of the git clone?
If your connection is saturated, yes.
saturated?
Already downloading as much data as possible
i dont think so
Well, you can try it - but it sounds like there is a bottleneck somewhere between you and github.
rsync -av svn.code.sf.net::p/brlcad/code .
it once downloaded 4gb in 1 hour without me knowing it. (i started Epic games launcher and didnt know that i had set fortnite to auto update, and it updated in the background until i started watching youtube videos and they were buffering)
That will copy the SVN repository to the directory "code" locally.
thanks
To checkout a particular revision from SVN using that local copy, it will be something like:
svn co -q -r29886 file:///your/local/directory/code brlcad_svn-r29886
starseeker said:
Well, you can try it - but it sounds like there is a bottleneck somewhere between you and github.
i dont know but i am having problems with the github home page. My feed is not loading. is your home page feed loading well?
seems to be
starseeker said:
rsync -av svn.code.sf.net::p/brlcad/code .
this is the correct command?
That's the one I use - is it causing problems?
i forgot to include the . at the end of it
thats why it caused some problems
ah, yeah - that would do it. Remember, that's going to be quite a bit of data - more than the git clone
ok
i dont have much problem with that
how is the git clone coming - still slow?
it is at 147 mb right now
ouch. Yeah, that'll take a while then.
hmm
On the plus side, I really like the view gitk offers into the repository history - makes it way easier to go hunting for something.
i reported the issue and waiting for the reply. Until then, it is still trying to download 340 mb and out of that it has downloaded 214 mb so it might take some time
How about the rsync of SVN?
nearly r50000
and the speed is back for github
github repo done
only rsync is left
both the repos are done
i will need your help during the checkout part
of the svn repo
It's the line I posted earlier:
svn co -q -r29886 file:///your/local/directory/code brlcad_svn-r29886
i tried absolute and relative paths to the code directory but it doesnt work
gives me something like this
svn: E170013: Unable to connect to a repository at URL 'file://mnt/sda2/brlcad_migr/brlcad_svn/code'
svn: E170000: Local URL 'file://mnt/sda2/brlcad_migr/brlcad_svn/code' contains unsupported hostname
fixed the problem
was missing a slash
Right - that syntax drives me nuts, can't count how many times I've missed that slash.
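One way to avoid the missing-slash problem in a script is to let Python build the file URL. This is a small sketch; the function names are made up for illustration, and the checkout arguments simply mirror the `svn co -q -r29886 ...` line above.

```python
from pathlib import Path


def local_svn_url(repo_dir):
    """Return the file:/// URL for an absolute local repository path."""
    # as_uri() raises ValueError for a relative path, so a bad path fails
    # here instead of producing a URL that svn reads as a hostname.
    return Path(repo_dir).as_uri()


def svn_checkout_command(repo_dir, revision, dest):
    """Build the argument list for a quiet checkout of one revision."""
    return ["svn", "co", "-q", f"-r{revision}", local_svn_url(repo_dir), dest]
```

The returned list can be handed to `subprocess.run` as is, which also avoids shell quoting issues.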
it took some time to do the checkout
went to eat something while the checkout
going to let the laptop rest for a while. its hot as the sun right now due to the things i was running in parallel
Heh. Yeah, this kind of testing is a stress test, especially for hard disks. The main conversion really hammers my machine for over a week when I run it.
the checking will take some hours or more for my machine
Make sure you only check out (say) trunk or a given branch - the way the SVN repository is set up by default, a root checkout will create every tag and branch of every repository in the entire repo
That's a LOT of data, and not particularly useful
I saw that
Will try to checkout only trunk but I dont know how to checkout a specific branch from local repo
It's the same as checking out from subversion on sourceforge
just uses the file:// location instead of the http location
Oh ok
Then it will not be a problem if it's the same mechanics
svn checkout https://svn.code.sf.net/p/brlcad/code/brlcad/trunk
becomes
svn checkout file:///home/user/code/brlcad/trunk
where "/home/user/code" is replaced by your local path
(rather different from Git)
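A sketch of picking the checkout URL for trunk or for a branch. The `brlcad/trunk` path comes from the lines above; the `brlcad/branches/<name>` layout is an assumption based on the usual SVN convention and should be checked against the repository. The function name is hypothetical.

```python
def svn_branch_url(repo_url, branch):
    """Return the URL of trunk or of a named branch inside the repository."""
    base = repo_url.rstrip("/")
    if branch == "trunk":
        return f"{base}/brlcad/trunk"
    # Assumed layout: branches live under brlcad/branches/<name>.
    return f"{base}/brlcad/branches/{branch}"
```

For example, `svn_branch_url("file:///home/user/code", "trunk")` gives `file:///home/user/code/brlcad/trunk`.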
Hmmm
Hey @Sean do u know of a way to scroll through svn log
just like we can do in case of git log
?
I pipe it to a file and then use an editor or less on it...
i was piping to less directly :yum:
That works too.
creating a script right now to compare every commit in the github repo with its svn counterpart
/me suspects you'll be able to cook with your laptop before that's done...
hmmm
only on the trunk branch
Slightly easier - that's still most of the commits though.
so what will be much easier?
checking only trunk instead of all the branches + trunk
i was going to do only on trunk
/me nods.
My computer crashed :disappointed:
Probably due to a bug in the script which caused the ram usage to go up
Sumagna Das said:
Hey Sean do u know of a way to scroll through svn log just like we can do in case of git log?
"git log" is nearly equivalent to "svn log | less -S"
@Sean just set up the basic part of the script
so what should it check - files, commit description or something else?
hey @starseeker, should i also check for the commit descriptions?
along with the files?
the commit messages will be different, since we have merged the notes - you'll have to script in the removal of the extra information or the commits won't match. We also line-wrapped most of them, so you'd have to mash both messages down into newline free strings.
Although your copy might not have the line wrapping, actually... don't remember
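A sketch of mashing both messages down before comparing them. It assumes the extra information merged into the git message is the set of lines beginning with `svn:`; the function name is hypothetical.

```python
def normalize_message(message):
    """Drop svn: metadata lines and collapse all whitespace to single spaces."""
    kept = [line for line in message.splitlines() if not line.startswith("svn:")]
    # split() with no arguments treats newlines, tabs and repeated spaces
    # alike, so line-wrapped and unwrapped messages end up identical.
    return " ".join(" ".join(kept).split())
```

Two messages are then treated as matching when `normalize_message(git_msg) == normalize_message(svn_msg)`.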
i have scripted in the removal of the extra info
line wrapping?
Most of the time, Sean and I will just type out a message on the command prompt rather than opening an editor, so we don't add newlines to the commit messages. That's rather contrary to how the Git world at large works.
ok
i will have to see to that then
https://github.com/starseeker/brlcad_conv8 should have it
the line wrapping?
yes
then i have to clone that one
No big deal - your copy should be fine for an initial check
the diff line you provided - if it returns something, then something is off?
Possibly - what is the result?
@Sumagna Das Actually, a difference is not guaranteed to be an issue - the SVN repository was pre-processed prior to conversion to remove expanded RCS tags from its internal content, so differences might manifest between a non-preprocessed checkout and what was used for the conversion. the misc/repoconv/CONVERT.sh script gives you an idea of what was done to produce the initial git repo.
@starseeker the final script has been set up with the diff line you provided and a commit message checker (whether they are the same in every one of the commits)
should i check all of the commits(all of them from the github repo to svn)?
I would start with some of them - there will be differences, so you'll want to make sure you can understand them
commit c97468e924 /rev 76457:
Only in svn/r76457/misc/repoconv: cvs2git
Only in svn/r76457/src/other/gdal/ogr/ogrsf_frmts/dgn: web
commit adb54fde73 /rev 76456:
Only in svn/r76456/misc/repoconv: cvs2git
Only in svn/r76456/src/other/gdal/ogr/ogrsf_frmts/dgn: web
these are the differences from two of the commits and their svn counterpart
Hmm. OK - so if you look at the two checkouts, can you confirm those differences?
i think so
OK. Now, the next question - why would they not be in git?
thats something i dont know
i was going to ask you that
btw i am storing them in a file
The first one is because cvs2git is an empty directory - SVN allows the checking in of empty directories, git does not
The second is more puzzling
https://github.com/starseeker/brlcad_conv8/tree/master/src/other/gdal/ogr/ogrsf_frmts/dgn is there...
so...
what is your git checkout line?
git checkout <sha1 of the commit>
is that suspicious?
No, that's correct
ok
is that on another branch or something or is it not that much required?
https://sourceforge.net/p/brlcad/code/76456/tree/brlcad/trunk/src/other/gdal/ogr/ogrsf_frmts/ also doesn't show a "web" directory
Ah! I'm wrong - missed one level
And what's in that directory?
empty
thats why
bingo.
So my recommendation would be to script the removal of all empty directories from the SVN checkout before doing the comparison.
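A sketch of that empty-directory removal in Python. The function name is hypothetical. It walks bottom-up so that a directory holding only empty directories is removed as well, which matches what git would show.

```python
import os


def remove_empty_dirs(root):
    """Delete every empty directory below root and return the removed paths."""
    removed = []
    # topdown=False visits children before parents, so a parent that only
    # held empty directories is itself empty by the time it is checked.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # keep the checkout root itself
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Run it on the SVN checkout only, before the comparison; the returned list can be logged to see which directories were dropped.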
hmm
starseeker said:
So my recommendation would be to script the removal of all empty directories from the SVN checkout before doing the comparison.
completed that part
should i add any more things to check or anything?
@starseeker i am going to let the script run the whole night and i will tell if it reported anything (after working correctly)
Not offhand, but there are a lot of potential issues (that's why I've got all this scripting and C++ logic in place...)
sounds good
starseeker said:
Not offhand, but there are a lot of potential issues (that's why I've got all this scripting and C++ logic in place...)
potential issues?
reasons for differences in files. RCS tag expansion is a classic - I've tried to filter it with the diff script, but it could use more testing to make sure I've got the expressions right
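To illustrate the RCS tag issue: an expanded keyword such as `$Id: file.c,v 1.2 ... $` can be collapsed back to `$Id$` on both sides before comparing. This is a rough sketch, not the expressions used in the diff script; the keyword list is the standard RCS set and the names are hypothetical.

```python
import re

# Standard RCS/CVS keywords that get expanded on checkout.
RCS_KEYWORDS = (
    "Author", "Date", "Header", "Id", "Locker", "Log",
    "Name", "RCSfile", "Revision", "Source", "State",
)
_EXPANDED = re.compile(r"\$(" + "|".join(RCS_KEYWORDS) + r"):[^$\n]*\$")


def collapse_rcs_keywords(text):
    """Replace expanded keywords like '$Id: ... $' with the bare '$Id$'."""
    return _EXPANDED.sub(r"$\1$", text)
```

Applying the same function to both versions of a file means a difference in expansion alone no longer shows up as a mismatch.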
commit c97468e924 /rev 76457(commit messages match? True):
no differences
commit adb54fde73 /rev 76456(commit messages match? True):
no differences
commit 992d64fadf /rev 76454(commit messages match? True):
no differences
this is the text file which is being outputted to
Looks good. My expectation is that the SVN era commits will be in pretty good shape - it's the CVS era where I know there are still issues.
if there are differences, then it will be outputted in there
i am going to ignore the commit checker because as you said earlier, there might be some differences in commit messages.
i also added the empty directory remover part
its working till now and i expect it to work all night
so good night
why is it sometimes jumping revisions?
sometimes upto 4 revisions at a time?
Not all SVN revisions map into git commits.
Then, in older commits, some commits will be in other repositories rather than brlcad - the SVN repo holds multiple projects.
@starseeker the differences it found until it aborted due to error during parsing this commit's (2bb78b9d04781650bf9226393281d738e7622824) description
difference.txt
the commits were in the format
<desc>
svn:revision:<rev>
svn:author:<author>
svn:branch:<branch>
until the aforesaid commit
commit desc -> (add view subdirectory (preliminary file move commit))
will fix that tomorrow
Ah, right - preliminary file move commits are not SVN commits - they're autogenerated to make it easier for git log --follow to track back along file moves.
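A sketch of parsing the `svn:` lines out of a git commit message in the format shown above, returning `None` for commits such as the preliminary file move ones that carry no SVN revision. The function name and the returned dictionary shape are assumptions.

```python
import re

_SVN_FIELD = re.compile(r"^svn:(revision|author|branch):(.*)$", re.MULTILINE)


def parse_svn_info(message):
    """Return revision, author and branch from a commit message, or None."""
    fields = {key: value.strip() for key, value in _SVN_FIELD.findall(message)}
    revision = fields.get("revision", "")
    if not revision.isdigit():
        # No usable svn:revision line, e.g. an autogenerated file move commit.
        return None
    return {
        "revision": int(revision),
        "author": fields.get("author"),
        "branch": fields.get("branch"),
    }
```

Returning `None` instead of raising lets the calling loop skip such commits and keep going.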
I know
It was parsing them until it encountered the mentioned commit (I can't quote because quoting is not available on the mobile app)
starseeker said:
The first one is because cvs2git is an empty directory - SVN allows the checking in of empty directories, git does not
What about dropping a .gitignore into empty folders? Slight concern that leaving them out might cause breakage that did not exist before (e.g., something references the dir). Speaking of which, we should add compiling a couple of the tagged releases to the validation list... maybe 7.12 and 7.22 or similar, just to make sure nothing was introduced that's not a result of tool modernization.
Sumagna Das said:
starseeker the differences it found until it aborted due to error during parsing this commit's (2bb78b9d04781650bf9226393281d738e7622824) description
difference.txt
This is cool @Sumagna Das and helpful. This is one of the validation tasks I had on the to-do list.
I added the part to skip these type of commits
@Sean I think avoiding them is the best available option - adding them would be tricky, and even if we could get them in any such files would be guaranteed to break distcheck for those checkouts...
I was thinking them not being there might break the build for the same reason -- some file somewhere in the checkout listing that dir, and it not existing
there's a couple manual compilation verifications on our list, so that should present itself then if it's an issue
there's only a concern if their absence adds to build breakage
Let's wait and see if we have to then - new files in empty dirs would be difficult to add and after the CMake based file tracking was added, virtually guaranteed to break any distcheck tests.
sounds good. if it comes to a choice, breaking build should override breaking distcheck but I'll be a little surprised if it does break the build.
I don't think we ever made empty dirs (other than as process of creating the dir in svn) that stayed empty, so it's mainly our 3rd party codes
@Sean difference.txt this file contains all the differences it found in any of the commits
i changed the part where it prints "no differences" to the file for every commit to just print the differences for the commits it finds them in
so my script had a bug in it because of which it found those so called differences
That's my life in a nutshell these last weeks...
hmmm
it didnt change branches according to the commit desc
because of which some changes were not found on trunk
on the github repo, master == trunk or is it the branch which contains all the branches' stuff?
master == trunk
because in some of the commits, there were svn:branch:bioh
or something like that which stated that it was from other branch and this created some differences with the trunk
so should i ignore them or let it checkout the correct branch in the svn part only and find any differences?
that notation indicates the commit in svn was made to the bioh branch, not to trunk.
so should i ignore commits from other svn branches?
The best thing to do would be to check out the branch and check it against the branch...
ok done
@Sumagna Das that's a question, actually - if you check out bioh and master from git at that point, what do you get?
Git's branching model is quite different from SVN and I'm still working my way through all the implications.
mismatching files
I mean if you check a git checkout of bioh at that point against SVN's bioh, and git's master vs SVN's trunk at the same point, does everything line up? The bioh I would expect to, since there was a commit to the branch at that point, but post-merge the bioh commit is also on master's history so I'm wondering what Git does vs SVN in trunk at the same point.
so should i ignore reference to other svn branches in master for now?
Probably, yes.
The commits where svn:branch:trunk is noted are where master and trunk should line up - I'm less certain what to expect if you pull master at the merged bioh commit...
later i will check for other branches
i am going to make it ignore commits for other branches onwards because checking about 360+ revisions again is going to take some time for now atleast
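A sketch of that skip decision, giving the reason so it can be logged. It assumes the `svn:revision:` and `svn:branch:` lines shown earlier; the function name and the reason strings are made up for illustration.

```python
import re


def skip_reason(message, wanted_branch="trunk"):
    """Return why a commit should be skipped, or None if it should be checked."""
    if not re.search(r"^svn:revision:\d+\s*$", message, re.MULTILINE):
        return "no svn revision"
    match = re.search(r"^svn:branch:(.*)$", message, re.MULTILINE)
    branch = match.group(1).strip() if match else None
    if branch != wanted_branch:
        # Commits made on another SVN branch are part of master's history
        # after a merge, but master need not match trunk at that point.
        return f"svn branch is {branch}"
    return None
```

Only commits for which this returns `None` are compared against an SVN trunk checkout.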
So that's a curious discussion -- so what does it mean to have a commit on trunk that says svn:branch:bioh ? or at least, what's that supposed to mean?
I wouldn't have expected to find any/many branch commits on master unless there was a branch-to-trunk merge that it's tracking and it just pulled the commits from the branch
meaning the work originated on the branch, but got merged to trunk
Right - work originating in branches, but merged to trunk. The way git history works, the branch commits then become part of master's history (if I'm understanding this correctly.)
That's in fact the main reason I added the svn:branch labels, and what's making my life complicated with CVS right now - in Git, once that merge happens, the commit that happened on bioh shows up as part of the history of both bioh and master. I guess a checkout would reference the rev-list relationships, if I'm understanding the proposal for a "date based" checkout here: https://stackoverflow.com/a/6990682
git rev-list --first-parent bullet --pretty=format:"%B" actually looks like it might do what I would expect for following an SVN branch's history (although it doesn't stop when the branch was created) - I'll have to compare that output to what the svn:branch based output produces. It certainly looks closer than anything I've found yet... Still doesn't tell me when the branch was originally created though.
@Sean Checked a little over 4700 CVS commits thus far, looks like about a dozen in there that have the appearance of legit differences between the CVS checkout and the git checkout. (getting some empty CVS checkouts as well trying to generate checkout lines from git info, but not 100% sure why - could be my fault, and I'm thinking this is starting to hit diminishing returns...)
Based on that rate of processing, it'll take at least a couple days to work through all the checks, maybe a bit more, when the final conversion run rolls around - is it worth spending more time on to try and make the fixes, or should we just accept the CVS conversion as-is?
o.O
Not much info to work with there other than ... "a dozen legitimate differences" sounds like an outright verification failure. We should inspect them; what matters is whether the commit really is there or orphaned or elsewhere or what. Can you point me at a couple of the commits in question?
r29322 I think is one of them
29839
CVS checkout line for r29839:
cvs -d /home/user/brlcad_cvs -Q co -ko -D "2007-12-20 20:34:31 +0000" -P brlcad
For r29322:
cvs -d /home/user/brlcad_cvs -Q co -ko -D "2007-11-13 17:32:36 +0000" -P brlcad
The logic I'm using is in misc/repoconv/verify - although for this part I'm not using an SVN repository, just the CVS.
I'm not sure I'm right to be comparing against master in some of these cases, but I'm also having a heck of a time figuring out which branch I should be using instead...
29011, for example, looks like it was a merge onto a branch, but I can't seem to tell from git which one it was merged onto
/me is beginning to think he can only reasonably use the -D option on cvs against trunk/master... can't find anything about applying it to branches... grr.
i have found some commits which dont have any information for svn or are for another branch
Unexpectedly so?
i will give you the list i found until now
@Sumagna Das it's getting pretty late here - go ahead and post it, and I'll take a look tomorrow.
ok
i am in class right now
so......
i had to force shutdown my computer and i lost the commit sha1 list
i can only say that i had encountered 4-6 of them in total
i found this one just now
de92ecbe1a
it is also skipped
Gah. @Sean r19102 is representative of a general issue I'm seeing starting somewhere in the 19k range - the git checkout has libitcl, libtcl and libtk present in the checkout but CVS apparently does not.
I'll have to use the better cvs branch mappings I generated this morning to try and create better checkout commands - see if I can rule that out as a potential trouble source...
these are the commits i found after the previous wipe out
Any surprising/unexpected?
i changed the script to print to a file the list of commits which were skipped for one of two reasons
starseeker said:
Any surprising/unexpected?
i cant check while the script is running(i run it whenever i come online on my laptop else i dont have time)
it is checking r75780 right now
Which github checkout are you using?
Repo you mean?
If yes, then github.com/starseeker/brlcad_conv8
OK, good (that tells us how to decode the sha1 hashes). Unfortunately, they change each time I change the history, so it matters which iteration is being checked.
ok
@Sean I'm really not sure what to make of this. If I check out in an early date range, for example:
cvs -d /home/user/brlcad_cvs -Q co -ko -D "1999-04-21 17:23:51 +0000" -P brlcad
I'm getting a CVS checkout that does not contain libtcl or libtk. (The equivalent git clone does contain them.) However, if I look at doc/install.doc it does seem to refer to libtcl (as does gen.sh). Is it possible modern CVS is misinterpreting the old CVS repository somehow? I'm more surprised those directories are absent in the CVS checkout than that they are present in the git clone...
I do admit, though, that the conversion produced by https://github.com/rcls/crap does seem to skip including those directories when I check the equivalent commit...
Back on the bandwagon at r15263 from the looks of things...
So the SVN checkout at r15264 seems to agree better with the cvs-fast-export result, in that it has libtcl and libtk in the checkout.
CVS checkout for r15624:
cvs -d /home/user/brlcad_cvs -Q co -ko -D "1998-06-02 19:54:25 +0000" -P brlcad
@Sean If it's of interest, here were my CVS comparison results:
verify_cvs.log
Just shy of 500 checkout attempts generated an empty CVS tree, a lot of which I think are due to my not having a correct CVS checkout command for historical commits on branches. Since an empty CVS tree most likely isn't the goal there, I'm not generating 'eliminate the tree' diffs.
4948 commits generated a non-empty difference file, of which the majority (though certainly not all) are in that series in the 15k-19k range where CVS doesn't check out libtcl/libtk and friends.
At this point I see 3 options:
1) Just accept the cvs-fast-export conversion for what it is. Spot checking r15624, CVS, SVN and Git come to three different conclusions about what's there, so we never had perfect agreement with CVS to begin with, and we do seem to be able to trace back file history successfully (I've checked src/libnmg/bool.c and src/librt/primitives/ell/ell.c.)
2) Cherrypick those commit diffs that aren't part of the libtcl/libtk/etc. wrangle and apply just those, to match the CVS checkout when we don't appear to be caught up in that tangle. (That one's the most work.)
3) Try to force all the commits for which we have the non-empty delta to match the CVS checkout. Defensible as it gets us closest of the available options to what a CVS checkout would yield, but we may very well end up losing some of the libtcl/etc. folder contents in the garbage collect if CVS would never check them out.
Thoughts?
@Sumagna Das Of those commits, the two interesting ones are ecae1b1ed1 and 29b15aa2c4 - what differences are you seeing? Or is it reporting skipping those because they are on a branch?
@Sumagna Das oh, nevermind, I see you're printing skipped commits not commits where a difference was found.
starseeker said:
Sean I'm really not sure what to make of this. If I check out in an early date range, for example:
cvs -d /home/user/brlcad_cvs -Q co -ko -D "1999-04-21 17:23:51 +0000" -P brlcad
I'm getting a CVS checkout that does not contain libtcl or libtk. (The equivalent git clone does contain them.) However, if I look at doc/install.doc it does seem to refer to libtcl (as does gen.sh). Is it possible modern CVS is misinterpreting the old CVS repository somehow? I'm more surprised those directories are absent in the CVS checkout than that they are present in the git clone...
No, that's not likely. We did mess with those directories a lot back then. CVS doesn't track directories in any capacity, so if a directory is renamed for example, CVS does not know about it. It's just a tree of directories with RCS files.
What probably happened is those directories were manually renamed in the repo, so you probably have a libtcl8.3 or something checked out, whatever the last name was before we moved to SVN and they started getting tracked.
With the dir moved in the repo, the moved name applies back through time, so even if you check out an older rev, you'd get the new dir name.
I think that renaming nonsense was isolated to the various tcl dirs.
Seems to be - I haven't scanned all of the diffs manually, but that's the only place I noticed it
if you check the checkout, you almost certainly have the dirs that are "missing", they just have a version number on the dir name
the fix is as simple as renaming them back to whatever the build is looking for, so I wouldn't be terribly worried about it so long as they're there
I've not tested it at anything like this scale, but in principle I can actually force the conversion to match the checked out CVS tree.
in this case, it's debatable which is right
basically, a backwards-incompatible change was made to the CVS repo at various points in time
Ah. Hmm.
such that checking out an old version as it actually existed is no longer possible without knowing what directories were renamed from-to
/me is inclined to go with the cvs-fast-export results, in that case... don't see much benefit to fighting with this, and on the whole the diff results were actually pretty good.
CVS repo had src/libtcl in r123, it was renamed to src/libtcl8 in r234 by renaming the backend directory -- thus checking out r123 now will also be src/libtcl8
ew.
Thank you for moving us to SVN just as I came onboard - much appreciated!
like I said, CVS does not know about or track directories in any capacity
and I think there was a reason for changing the backend directory.. i think if one created a src/libtcl8 and added all the files from src/libtcl and then deleted all of the files in libtcl, that has some undesirable effect like giving everyone an empty src/libtcl on future checkouts, so you had to scan and prune empty dirs
/me nods
and not just an empty src/libtcl, but a hierarchy of all the src/libtcl dirs .. all empty. I could be wrong, but it was something fugly
Sumagna Das said:
Sean difference.txt this file contains all the differences it found in any of the commits
This is awesome @Sumagna Das thank you for all your help with this! It's great to get some independent verification and validation.
oh i have the new updated skipped_commits.txt
and due to some bug in my code, there are commits which have been printed twice or multiple times.
If you can, it would be helpful to translate those sha1 values into log messages when you generate the skipped_commits.txt file:
git log -n1 <sha1> --pretty=format:"%h:%n%B"
to see why it has been skipped?
yes
ok
will do that the next time i run the script
Skipped is less critical than finding a difference though - the latter indicates at least a possibility of a problem in the conversion.
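A sketch of calling that `git log` line from Python so the log message can be written next to each skipped sha1. The function names and the `repo_dir` parameter are assumptions; the git arguments are the ones given above.

```python
import subprocess


def git_log_command(sha1):
    """Build the argument list for the one-commit log lookup."""
    # Passed as a list, so the format string needs no shell quoting.
    return ["git", "log", "-n1", sha1, "--pretty=format:%h:%n%B"]


def describe_commit(sha1, repo_dir="."):
    """Return the short hash and full message of one commit as text."""
    result = subprocess.run(
        git_log_command(sha1),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git reports an error such as an unknown sha1
    )
    return result.stdout
```

The text returned by `describe_commit` can be appended to skipped_commits.txt in place of the bare sha1.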
ok
until now, no new difference has been found
that's good :-)
hey @starseeker, i used git svn to clone the full history of the svn repo
because pulling it everytime was taking much time
if thats going to change anything please inform me
i am not going to run the script before you tell me anything about any issues with this method if present
as if i run the script, its going to go very fast through checking out commits, diff'ing them and repeat the process
I've never tried that, but in principle it should work - go ahead and give it a shot.
it will give the script a speed boost, a major speed boost
that'll be faster even than an svn checkout from a local rsync copy of the SVN repo?
yea
OK, sure - give it a try. If you find a difference we'll need to double check it with the regular checkout, but see what it can do
ok
Remember, once you get below 29887 there are likely to be some differences, as you'll be into the CVS portion of the history. (I posted my results of a direct CVS/git comparison a couple days back...)
ok
Anything that shows up 29887 or later as a difference is definitely a concern.
ok
Before that... it depends, but my inclination at this point barring some sort of catastrophic issue is to accept the cvs-fast-export results - from what Sean was saying there was at least one change made in the CVS days that would royally complicate the reconstruction challenge for converters and probably explains some odd results in the current CVS checkout in the 15k-19k range.
However, a final decision must wait on the results - that's why we need the testing in the first place. :wink:
ok
and it will not be similar to "import repo" in github because the email doesnt have to be mapped to any github account
so each revision/commit will be like the svn one
i asked someone in git's irc channel and he said that it wont change anything about the svn repo
now the overall time taken is the time for the diff command
nice!
earlier it took about 10mins to check a revision and now it takes about (maximum) 2-3 minutes if the diff command doesnt take too long
else about 1 minute
/me winces. Yeah, that's pretty rough.
it took about 3-4 nights (not days because i could turn it on for the whole night only as my father worked during the day in the windows part) which was the time i had to compromise
and it paid off
@Sean Do you think it might be better (given those performance constraints) to do a random scattering of revision checks instead of marching through all of them? I know that's less thorough, but at 2-3min per commit that'd be more than a month to run just back to the beginning of the SVN history...
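A sketch of the random scattering idea together with the run-time arithmetic. The revision bounds in the usage note (29887 as the first SVN-era revision, 76457 as a recent one) are taken from elsewhere in this discussion; the function names and the fixed seed are assumptions.

```python
import random


def sample_revisions(first, last, count, seed=0):
    """Pick a reproducible random subset of revisions in [first, last]."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    population = range(first, last + 1)
    return sorted(rng.sample(population, min(count, len(population))))


def estimated_days(n_revisions, minutes_each):
    """Rough wall-clock days needed to check n_revisions one after another."""
    return n_revisions * minutes_each / (60 * 24)
```

For example, `estimated_days(76457 - 29887 + 1, 2.5)` comes to roughly 80 days as an upper bound (not every revision maps to a git commit), while `sample_revisions(29887, 76457, 500)` at the same rate would take under a day.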
until now, there are no changes in the difference.txt file but because of a porting bug (i forgot to change the rmdir empty directories part according to the recent change) it is filled with "web folder only in svn repo" lines
@Sumagna Das that happens - as long as we know to ignore them, should be OK.
starseeker said:
Sumagna Das that happens - as long as we know to ignore them, should be OK.
i found out the bug and fixed it
now it is deleting them
@Sumagna Das can you print the log msgs?
log?
git log -n1 <sha1> --pretty=format:"%h:%n%B"
ohh
instead of just the sha1 (so we know which commits we're seeing)
for each of them?
yes
in the difference.txt?
or a new file?
in skipped_commits.txt was what I was thinking
ok
i will do that
But essentially, whenever you're reporting a commit it will be more useful/informative to have the log message. The sha1 itself is useful only as a lookup key - if the log has already done the lookup, we can understand what we are seeing.
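A minimal Python sketch of that suggestion, wrapping the `git log` line quoted above. The helper names `log_command` and `describe_commit` are hypothetical and not taken from the actual script; the sketch only assumes `git` is on the PATH.

```python
import subprocess


def log_command(sha1):
    """Build the git command that prints the short hash and the full message."""
    return ["git", "log", "-n1", sha1, "--pretty=format:%h:%n%B"]


def describe_commit(sha1, repo_dir="."):
    """Return the short hash plus log message for one commit.

    Needs a real git repository at repo_dir; raises CalledProcessError
    if git rejects the sha1.
    """
    result = subprocess.run(
        log_command(sha1),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Writing the output of `describe_commit` into skipped_commits.txt instead of the bare sha1 would give each entry its svn:revision and branch labels for free.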
i have to clear the skipped_commits.txt
sure
it still has some commits from the earlier conv8 repo
Ah. I think we already checked and cleared those, so don't worry about them.
thanks for the git command
at the speed the script is doing work....
as per calculations, it would take 56-60 hours at max for my script to check all the commits (i did not take into consideration the fact that revisions might be skipped)
wow, that is quite a speed-up. I hadn't realized SVN checkout was that slow.
:grinning_face_with_smiling_eyes:
couldnt sleep last night..... fixing the script time to time :sleepy:
heh - don't need to go sleep-deprived, time pressure isn't that critical. Just didn't want you to have to have the computer thrashing on commits for a month.
and take this
the script was one of the reasons. the other reason was my sleep cycle because of which i am awake at night and asleep at day
excellent!
So, you can see the pattern - skipping file move commits and branches, as expected.
yea
all of them are either file move commits or branch merge ones
right now it is at 61k
which is good. Have any differences been found yet?
and when i started it 6 hours ago it was around 73k
lemme check
actually yea
i dont know if its because of my script or not
heres the file
Hmm. So you'll want to pick a couple of those files from one of the checkouts and inspect the differences to see what it is finding.
see if the differences are actually present or just a bug in my script
will do that later
i was telling you to check because the script is constantly checking out a new commit from the github repo
One sec... I'll try 70527
r70527 looks like it was a commit to the brep-debug branch
so if you're checking a commit to brep-debug against trunk at that revision, that's not going to match (nor would it be expected to)
that was probably present in the trunk/master branch in github repo
so it was checked
If you mean in git log, it is, but that doesn't mean there will be a trunk checkout that corresponds to it - that's one of the reasons for the svn:branch: labels. The way git reports its history is rather different than SVN, and git log in any given branch won't line up with what svn log would produce without a fair bit of fiddling.
If you don't want to check out the branches, you'll want to filter out any commit that doesn't have svn:branch:trunk in its message
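One way to do that filtering in Python, as a rough sketch. The label format (`svn:revision:<n>`, `svn:branch:<name>`, `cvs:branch:<name>`, one per line) is assumed from the commit messages quoted in this log, and `parse_labels` / `is_svn_trunk` are hypothetical names.

```python
import re

# One label per line, e.g. "svn:revision:70527" or "svn:branch:trunk".
LABEL_RE = re.compile(r"^(svn|cvs):([a-z]+):(\S+)[ \t]*$", re.MULTILINE)


def parse_labels(message):
    """Collect the conversion labels found in a git commit message."""
    labels = {}
    for vcs, key, value in LABEL_RE.findall(message):
        labels[f"{vcs}:{key}"] = value
    return labels


def is_svn_trunk(message):
    """True only when the commit is labelled as an SVN trunk commit."""
    return parse_labels(message).get("svn:branch") == "trunk"
```

A commit whose message carries `svn:branch:brep-debug` would be skipped by `is_svn_trunk`, so it is never diffed against a trunk checkout.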
starseeker said:
Sean Do you think it might be better (given those performance constraints) to do a random scattering of revision checks instead of marching through all of them? I know that's less thorough, but at 2-3min per commit that'd be more than a month to run just back to the beginning of the SVN history...
4-way parallel, they're done in a week
Well, if his machine can handle the I/O load...
starseeker said:
Well, if his machine can handle the I/O load...
what am i supposed to do?
take this
Looks good
Sumagna Das said:
starseeker said:
Well, if his machine can handle the I/O load...
what am i supposed to do?
what were you talking about?
Was concerned about the stress on your computer when you were doing the checkouts from SVN directly - it sounds like you solved the problem with git-svn though, so no longer a concern.
oh ok
the script is at 30k-40k range right now
no differences found after r69289 until now
Sumagna Das said:
no differences found after r69289 until now
which range should i expect differences?
You'll probably see some differences appearing below r29886
At that point, you're comparing how cvs2svn and cvs-fast-export (the git conversion) interpreted the CVS history - ours was/is convoluted enough that the interpretation is likely to have ambiguities.
my script is right now at 35859
currently at 30646.... woohoo !!!! :celebration: :tada:
it will probably check up to 27k or 28k if left tonight
29k reached
so some differences may start appearing now?
I wouldn't be surprised.
So no differences found?
(no differences found so far, rather?)
its actually skipping all the cvs commits with the "other branch" skipping functionality
so should i remove the "if not trunk then skip" skipping functionality?
hey @Sean what should i do? should i comment the skipping part?
i have to clone the repo again because my local copy of the repo somehow broke and i cannot fix it
I would adjust the skipping to try cvs:branch:trunk commits as well
Sumagna Das said:
so should i remove the "if not trunk then skip" skipping functionality?
yeah, I'm not as concerned with branch commits unless they have some impact on trunk. they're already at risk of being orphaned in git, which sucks but is the git way.
that said, cvs wasn't "on a branch" so that's a separate beast to tangle with
Sumagna Das said:
i have to clone the repo again because my local copy of the repo somehow broke and i cannot fix it
that's super concerning ... do you know how? is all you were doing checkouts? did you manually modify files in .git dirs?
there is always the possibility of a sleeper in the repo that corrupts when accessed, so it would be good to know how you corrupted a checkout. if you cause it by editing .git files, for example, then there's no concern whatsoever. if git did it simply by reading the repo, that would be concerning and worth investigating more deeply.
I've done a few odd things to repos, but it's usually when I try something fairly dangerous - I agree checkouts shouldn't do anything weird.
What were the symptoms of the corruption? What error messages did you see?
they were like
please stash or commit the following changes:
misc/......
..........
Ah - that means something changed the contents of the git archive between one checkout and the next.
Sean said:
Sumagna Das said:
i have to clone the repo again because my local copy of the repo somehow broke and i cannot fix it
that's super concerning ... do you know how? is all you were doing checkouts? did you manually modify files in .git dirs?
didnt do anything. i just do a checkout in svn repo, use diff, checkout the next commit in git repo and repeat
Git will try to prevent you wiping out changes you have been making - for example, if I'm editing file A.txt on branch A and want to checkout branch B, Git wants to know what to do about the modified A.txt.
Right - so the question is why something was seen as being modified.
i just worked around this error using git checkout -q -f <sha1>
One possibility might be if it's assigning line endings on checkout - that could change line endings and produce locally modified files - but I'd have expected something like that to show up earlier in the process...
its showing up more often recently
My suggestion is to do a git checkout somewhere of one of the sha1 commits that is showing this problem, then see what git diff thinks after it is checked out.
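That check could be automated along these lines. This is only a sketch: `parse_porcelain` and `dirty_paths_after_checkout` are hypothetical names, and the parsing assumes the simple `XY path` lines of `git status --porcelain` (renames and quoted paths are not handled).

```python
import subprocess


def parse_porcelain(output):
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in output.splitlines():
        # Each entry is two status characters, a space, then the path.
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def dirty_paths_after_checkout(repo_dir, sha1):
    """Check out sha1 (without -f) and report anything git sees as modified."""
    subprocess.run(["git", "checkout", "-q", sha1], cwd=repo_dir, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(status.stdout)
```

Leaving out `-f` is deliberate here: a checkout that fails loudly, or a non-empty list right after a checkout, is the symptom worth recording, whereas forcing the checkout would hide it.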
now difference actually
The converted history doesn't have a .gitattributes file in it (we'll need to add that once we transition, I've got a template staged in misc/repoconv) so that's one possibility.
error: unable to unlink old 'src/mged/dm_old/dm-hp.c': Permission denied
error: unable to unlink old 'src/mged/dm_old/dm-oglX.c': Permission denied
error: unable to unlink old 'src/mged/dm_old/dm-tek4109.c': Permission denied
this is one of the things that showed up
Um. That's interesting, actually. What sha1 is that?
i tried to checkout this -> b74655fa52
from this -> 78553fdee7
Are you on Linux or Windows doing this?
linux
What's your umask?
umask?
Run the command "umask" - it will report some numbers
they relate to default file permissions on your filesystem
0002
OK, that looks right - what version of git are you using?
2.25.1
OK, that matches. Are you still checked out at commit 78553fdee7?
if so, what does git status report?
that was my previous head position
if u want me to checkout, then thats no problem
I'm wondering if you can reproduce that particular failure.
Checkout 78553fdee7, see what git status says, then try to checkout b74655fa52 and see what happens
actually the modified files wouldnt let me checkout the 78553fdee7 commit
HEAD detached at b74655fa52
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: src/mged/dm_old/dm-hp.c
modified: src/mged/dm_old/dm-oglX.c
modified: src/mged/dm_old/dm-tek4109.c
no changes added to commit (use "git add" and/or "git commit -a")
this is the git status report right now, before switching to the 78553fdee7 commit
what does git log say about src/mged/dm_old/dm-tek4109.c
and git diff, for that matter
git log -1?
commit f81cd07759396ef3706069575bb415ef4739eca9
Author: Christopher Sean Morrison <brlcad@gmail.com>
Date: Tue Feb 20 08:19:51 2007 +0000
update all usages of fgets() to instead use john's swanktastic bu_fgets() that behaves as one would generally want regardless of the line ending type of the compilation platform or of the input files. bu_fgets() responds to input files that use CR (usually old mac), LF (usually unix, new mac), or CR/LF (usually windows) for the line ending so now these file do too effectivley squashing buggish/bad behavior.
svn:revision:27628
cvs:account:brlcad
cvs:branch:trunk
git log -1 src/mged/dm_old/dm-tek4109.c :up_button:
git diff src/mged/dm_old/dm-tek4109.c :down_button:
diff --git a/src/mged/dm_old/dm-tek4109.c b/src/mged/dm_old/dm-tek4109.c
index 53106cc4ac..d8bb9d1078 100644
--- a/src/mged/dm_old/dm-tek4109.c
+++ b/src/mged/dm_old/dm-tek4109.c
@@ -161,7 +161,7 @@ T49_open()
char line[64], line2[64];
bu_log("Output tty [stdout]? ");
- (void)bu_fgets( line, sizeof(line), stdin ); /* \n, null terminated */
+ (void)fgets( line, sizeof(line), stdin ); /* \n, null terminated */
line[strlen(line)-1] = '\0'; /* remove newline */
if( feof(stdin) )
quit();
So if you do git checkout f81cd07759396ef3706069575bb415ef4739eca9 what happens?
it reports no issue
what does git status say from that checkout?
but the changes are still there
git status still says the files (which were stated before) were changed
What does git diff report?
same change
Sumagna Das said:
git diff src/mged/dm_old/dm-tek4109.c :down_button:
diff --git a/src/mged/dm_old/dm-tek4109.c b/src/mged/dm_old/dm-tek4109.c
index 53106cc4ac..d8bb9d1078 100644
--- a/src/mged/dm_old/dm-tek4109.c
+++ b/src/mged/dm_old/dm-tek4109.c
@@ -161,7 +161,7 @@ T49_open()
char line[64], line2[64];
bu_log("Output tty [stdout]? ");
- (void)bu_fgets( line, sizeof(line), stdin ); /* \n, null terminated */
+ (void)fgets( line, sizeof(line), stdin ); /* \n, null terminated */
line[strlen(line)-1] = '\0'; /* remove newline */
if( feof(stdin) )
quit();
this change :up:
So it's checking out that revision, but it has the older files somehow
think so
I can't seem to reproduce it here - do you have any steps that will? (A series of checkouts?)
it just happened suddenly
and this is a new clone of the repo
should i try cloning the repo again and see if it happens there?
yeah
my internet speed is down again (for github) :sad:
What if you clone from your local checkout to another local checkout?
??
i will try to clone after 20 minutes ( at 12:00 am midnight). the speed will be back at that time
starseeker said:
What if you clone from your local checkout to another local checkout?
is that possible?
if you have a repo in:
/home/user/brlcad_conv10
try
mkdir /home/user/checkout2 && cd /home/user/checkout2 && git clone /home/user/brlcad_conv10
started it already
and its fast (19 MB/s)
You should end up with a checkout at /home/user/checkout2/brlcad_conv10
starseeker said:
Checkout 78553fdee7, see what git status says, then try to checkout b74655fa52 and see what happens
i did this in the local checkout and it worked properly without any issues or errors
ohhhhhhhhhhhh
now i remember why this permission issue might be happening. can you tell me if this is the reason?
i set up the "empty directory remover" but for some reason it did not work, giving permission issues. i added a part in the script which elevates the script to superuser (which is pretty much dangerous)
so it uses the sudo part even during git checkouts i think
can that be the reason?
Sean said:
that said, cvs wasn't "on a branch" so that's a separate beast to tangle with
but in the commit description it shows cvs:branch:trunk
?
@Sumagna Das that certainly could be related, if you're mixing (or trying to mix) sudo and non-sudo operations.
In any case, you shouldn't need sudo for any of this - I'm not sure why you were getting permission errors on the empty dir removal, but sudo almost certainly wasn't what you wanted to do. (Also, make sure none of your script activities are operating inside the .git directory.)
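For illustration, an empty-directory pass that needs no sudo and never touches version-control metadata might look like the sketch below. It is not the script discussed here: `remove_empty_dirs` is a hypothetical name, and the cleanup is presumably needed at all only because Git does not track empty directories while SVN does.

```python
import os

# Directories that belong to the version control tools themselves.
VCS_DIRS = {".git", ".svn"}


def remove_empty_dirs(root):
    """Remove empty directories under root, bottom-up, skipping VCS metadata.

    Returns the removed paths relative to root. Runs with the caller's own
    permissions; a PermissionError is left to propagate instead of being
    worked around with elevated rights.
    """
    removed = []
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        rel = os.path.relpath(dirpath, root)
        if VCS_DIRS.intersection(rel.split(os.sep)):
            continue
        # Re-list the directory: its children may just have been removed.
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(rel)
    return removed
```

Walking bottom-up means a directory that contains only empty directories is removed as well, which matches how it would look in a Git checkout.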
ok
The cvs:branch:trunk label indicates a CVS commit that was not in one of the branch chains in the CVS conversion. "cvs" denotes a version control system usage, rather than a branch - so "trunk" in this case is the conceptual branch used by CVS for a given commit, according to the cvs-fast-export analysis of the repo.
so there might not be another cvs:branch:<branch_name>
where branch_name is something other than trunk?
yes
in fact, there are
thats done
starseeker said:
in fact, there are
so i might wanna skip commits which are not for trunk?
yes
i will review the script tomorrow and rerun the script again
the laptop will get some rest for tonight
/me chuckles
will most of the commits after 29886 differ from the github ones?
because until 27637, no differences have been found
Most of them should be the same, based on my testing - I saw differences regularly starting in the 19k range with a CVS checkout, but it's not actually certain that will happen with GIT vs SVN - the SVN history is a cvs2svn reconstruction, so it may be closer to what the git history says than the vanilla CVS checkout.
i dont know if this is weird but ....
check the last few lines of the skipped_commits.txt
what about it?
why doesnt it have any svn:revision:<rev>
?
Because it doesn't have an equivalent SVN commit. cvs2svn and cvs-fast-export broke down the CVS history in slightly different ways. That means we don't have an exact match for every commit in the new Git conversion when comparing to the old SVN conversion.
oh ok
that was weird for me because i didnt know about that
btw no differences found after 69289
right now at 26k
/me nods - nice!
oh actually it found some differences :mischievous:
Do you know why it's reporting differences?
I'll give you a hint - check the svn:branch: labels of the commits newer than r29886
For the older ones, you'll want to take a look and see what the difference is in those two text files.
@starseeker
sumagnadas@hp-laptop:/mnt/sda2/brlcad_migr$ diff github/libtcl/library/reg1.0/pkgIndex.tcl brlcad/libtcl/library/reg1.0/pkgIndex.tcl --color=auto
1c1,2
< package ifneeded registry 1.0 "load [list [file join $dir tclreg82.dll]] registry"
---
> package ifneeded registry 1.0 "load [list [file join $dir tclreg82.dll]] registry"
>
why is this even a difference?
finally at 2k range :tada: :celebration:
the check is complete
here are the two files
the difference.txt is a bit big and long because it found many differences after 19k just as you stated @starseeker
That's probably a difference due to line endings, or possibly trailing whitespace - you can use a graphical diff tool like meld or kompare to take a look...
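Besides a graphical tool, a small Python check can sort such reports into "line endings", "trailing whitespace" and "real content" differences. This is a sketch under the assumption that both files fit in memory; `classify_difference` and its return strings are made up for illustration.

```python
def _normalize_newlines(data):
    """Map CR/LF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def _strip_trailing_whitespace(data):
    """Drop trailing whitespace on each line and trailing blank lines."""
    lines = [line.rstrip() for line in data.split(b"\n")]
    while lines and lines[-1] == b"":
        lines.pop()
    return lines


def classify_difference(a, b):
    """Classify how two byte strings (file contents) differ."""
    if a == b:
        return "identical"
    norm_a, norm_b = _normalize_newlines(a), _normalize_newlines(b)
    if norm_a == norm_b:
        return "line endings only"
    if _strip_trailing_whitespace(norm_a) == _strip_trailing_whitespace(norm_b):
        return "trailing whitespace only"
    return "content"
```

Reading both files with `open(path, "rb").read()` and passing the bytes in keeps the line endings intact, which a text-mode read would silently normalize.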
@Sumagna Das nice job!
@Sean what are the next steps?
mention me if i can help in the next steps
but i will not be much online from 14th august-5th september due to exams
starseeker said:
Sumagna Das nice job!
thanks by the way
Last updated: Jan 09 2025 at 00:46 UTC