-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Use msysGit's git-wrapper
instead of the builtins
#34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reminder: We probably need to add a manifest similar to #33 to the wrapper, too. |
Actually, we should include the git-wrapper in our Git repository. That way, we will not have to determine the builtins after the fact, but can use the I am therefore switching the label from |
On Windows, Git is faced by the challenge that it has to set up certain environment variables before running Git under special circumstances such as when Git is called directly from cmd.exe (i.e. outside any Bash environment). This source code was taken from msysGit's commit 74a198d: https://github.com/msysgit/msysgit/blob/74a198d/src/git-wrapper/git-wrapper.c Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Oy vey, I forgot about that one. @sschuberth good idea to put this reminder here! |
Sadly, we now save 164 kB less (out of 814 MB, I think we can live with that) 😛 By the way, I would like to keep those |
@sschuberth @t-b I would really appreciate your review about this one... The test suite is running with this and so far, things look real groovy (but that says nothing: it takes roughly an hour to run here, and we're only a couple minutes into the test run). |
It finished in 2 hours 39 minutes and 44 seconds. All tests are fine, including my manual ones. |
Actually, there is one more thing I just thought of: I need to make sure that the git wrapper works in its original role, i.e. when called from I should be able to make it work, still, in the msysGit context as well as in the MSys2 context by introducing conditional code depending on the presence of the |
Oh, and I cannot thank @patthoyts enough for his work on the Git wrapper. It is really an awesome source code to work on, and it gave me a big head start on this PR. |
Fix whitespace. This trick was performed by git rebase --whitespace=fix HEAD^ Signed-off-by: Johannes Schindelin <[email protected]>
This change simply makes the Git wrapper's source code conform to Git's coding guide lines. Signed-off-by: Johannes Schindelin <[email protected]>
@sschuberth do you want me to squash 74f22ffc5c09f366141c9b32343770a7fde13c77 before merging? Otherwise I would be content with the Pull Request as it is right now... |
This prepares the wrapper for modifications to serve as a drop-in replacement for the builtins. This commit's diff is best viewed with the `-w` flag. Signed-off-by: Johannes Schindelin <[email protected]>
Git started out as a bunch of separate commands, in the true Unix spirit. Over time, more and more functionality was shared between the different Git commands, though, so it made sense to introduce the notion of "builtins": programs that are actually integrated into the main Git executable. These builtins can be called in two ways: either by specifying a subcommand as the first command-line argument, or -- for backwards compatibility -- by calling the Git executable hardlinked to a filename of the form "git-<subcommand>". Example: the "log" command can be called via "git log <parameters>" or via "git-log <parameters>". The latter form is actually deprecated and only supported for scripts; calling "git-log" interactively will not even work by default because the libexec/git-core/ directory is not in the PATH. All of this is well and groovy as long as hard links are supported. Sadly, this is not the case in general on Windows. So it actually hurts quite a bit when you have to fall back to copying all of git.exe's currently 7.5MB 109 times, just for backwards compatibility. The simple solution would be to install really trivial shell script wrappers in place of the builtins: for builtin in $BUILTINS do rm git-$builtin.exe printf '#!/bin/sh\nexec git %s "$@"\n' $builtin > git-builtin chmod a+x git-builtin done This method would work -- even on Windows because Git for Windows ships a full-fledged Bash. However, the Windows Bash comes at a price: it needs to spin up a full-fledged POSIX emulation layer everytime it starts. Therefore, the shell script solution would incur a significant performance penalty. The best solution the Git for Windows team could come up with is to extend the Git wrapper -- that is needed to call Git from cmd.exe anyway, and that weighs in with a scant 19KB -- to also serve as a drop-in replacement for the builtins so that the following workaround is satisfactory: for builtin in $BUILTINS do cp git-wrapper.exe git-$builtin.exe done This commit allows for this, by extending the module file parsing to turn builtin command names like `git-log.exe ...` into calls to the main Git executable: `git.exe log ...`. Signed-off-by: Johannes Schindelin <[email protected]>
This reduces the disk footprint of a full Git for Windows setup dramatically because on Windows, one cannot assume that hard links are supported. The net savings are calculated easily: the 32-bit `git.exe` file weighs in with 7662 kB while the `git-wrapper.exe` file (modified to serve as a drop-in replacement for builtins) weighs a scant 21 kB. At this point, there are 109 builtins which results in a total of 813 MB disk space being freed up by this commit. Yes, that is really more than half a gigabyte. Signed-off-by: Johannes Schindelin <[email protected]>
Embed the manifest in the Git wrapper, too This is needed for builtins such as 'patch-id' to avoid triggering Windows' User Access Control. Signed-off-by: Johannes Schindelin <[email protected]>
@sschuberth I just adjusted the |
Since you were accommodating me, I'll accommodate you and not argue about the remaining things which mostly are opinion-based anyway ;-) So feel free to merge when you're ready. |
Use msysGit's `git-wrapper` instead of the builtins
This PR makes the MSYS1-based CI fail:
The fix is in #39. |
Use msysGit's `git-wrapper` instead of the builtins
Use msysGit's `git-wrapper` instead of the builtins
I don't quite get it. Hardlinks are supported as of Windows 2000. Why would we want to exchange 109 hardlinks (that require 0 MB of disk space) with 109 copies of the git-wrapper (i.e. additional 109 * 19k ~= 2 MB)? IIRC, we only need libexec/git-core/git-* to support legacy scripted commands that still use the 'git-command' syntax. Scripted commands are slow enough already on Windows, due to the performance penalty imposed by CreateProcess. Using copies of the git-wrapper rather than hardlinks to the real git.exe will likely double that penalty. IMO this is only useful for portable installations on FAT-formatted volumes (if that works at all). And even in this special case, users may not want to trade space for speed in this way... With the recent maint-1.9 patch to calculate HOME in git.exe, I thought that we might actually get rid of git-wrapper altogether. AFAIK there's nothing in there that cannot be done in git.exe's startup code, and IMO there's no point in duplicating environment setup code in so many places (including even completely separate projects such as GitExtensions and TortoiseGit). |
@kblees thanks for joining the discussion. It is unfortunate a bit late in the discussion, i.e. after review and merge. Even more sadly, it seems a couple of issues seemed to not be clear:
|
@dscho Even if this has been merged already I think it is important to clarify any outstanding issues. I don't see this as beating a dead horse as merged stuff can be still improved (or even reverted) later on.
Very true. Still, I think it is very reasonable to assume that any non-portable installation of Git for Windows will be running on NTFS. Thus, for that case, we can assume hardlink support.
Thinking about all the past discussions we had recently, not only as part of this PR, I think this indeed is the main point: You're trying to please the needs of GitHub for obvious reasons. There's nothing wrong with that as long as you accept that GitHub's needs are not necessarily our needs. And by "our" I mean the majority of Git for Windows users. I fully agree with @kblees that portable installations on FAT-formatted volumes are a special case, not the rule. It just happens that GitHub is interested in that special case. That said, I still think this PR is good as it allows us to have a common release process for both portable and "real" installations, in the sense that we share the least common denominator to create aliases for built-ins, which is file copies (of the wrapper, not the main git executable). While hardlinks are technically superior, even if they worked for portable installations (i.e. everybody was using NTFS formatted USB sticks), we'd still run into the ever occurring reports about the Git installation being so huge, because programs like the Windows Explorer erroneously accumulate hardlink sizes. |
That is of course appropriate. By opening a new ticket.
As stated before, my current focus is the portable Git for Windows. That does not prevent later changes to support an installer that knows how to make hard links but falls back to using copies -- of the Git wrapper.
I think you misunderstand. Without GitHub, there would not be a Git for Windows 2.x based on MSys2, portable or not. As a consequence, my first focus is on the portable application, keeping in mind that I must not throw the door shut to the needs of the InnoSetup installer.
That sounds similar to some of your earlier comments, which made me use the Git wrapper always in place of the builtins. |
By "clarifying any outstanding issues" I did not mean issues in the "bug" sense, but open questions. These should be answered in the same place where the question arose, i.e. in the PR discussion.
I think you got confused now with the different discussions currently going on in different PRs, or? This PR is titled "Use msysGit's
See above. This PR has already been merged 17 days ago. |
You are correct, I got confused. My bad. FWIW I had noticed that mistake just after hitting the "Comment" button and updated the comment before you responded, but sadly not before you had read and quoted the erroneous parts. |
If performance doesn't matter, a 2-line bash script as outlined in the commit message would have been just as good.
The point is that the commit message suggests that using the git-wrapper saves a lot of disk space, while in fact the opposite is true. So we pay the disk space penalty (even though a small one) in addition to the performance penalty.
mingw_startup() already initializes TERM=cygwin, so simply removing these lines from the git-wrapper would have had the same effect. MSYSTEM is only relevant in the development environment, it is neither needed by git nor set by the git-wrapper (at least not until now). If MSys2-based installations require MSYSTEM to be set, that's another reason to not add this to the git-wrapper but to git's own startup code. Otherwise all direct callers of git.exe would have to duplicate this logic.
/cmd/git.exe should be the real git executable (or hardlink to it) and 'de-pollute' PATH in its own startup code. |
It's still true that the git-wrapper saves a lot of disk spaces compared to the alternative of copying the main git executable. That's what the commit message refers to.
Good point. |
If that were true, I would not have needed to make that patch now, would I.
Incorrect.
That would require invasive changes by requiring the |
It sure is true, otherwise git + pager (without git-wrapper) wouldn't work in cmd.exe.
What do you mean by this? Judging from the MSys2 sources, there are no code path decisions based on the
What do you mean by |
Okay, I am going to lay it out clearly. There was a bug that made |
Well, you said it yourself:
There are other decisions made, based on this, so that you end up with a different setup depending on the
I made a mistake: I meant |
I said: "simply removing these lines from the git-wrapper would have had the same effect" If TERM is already (wrongly) set to "msys" by the git-wrapper, git's own startup code will not touch it. The problem is that we have similar (but not quite identical) environment setup code in too many places. The original git-wrapper was based on a very old version of git-cmd.bat, without questioning which settings were still necessary. E.g. it also sets HOME, but without checking if %HOMEDRIVE%%HOMEPATH% exists (i.e. it won't work if you're offline). |
…ects' We want to make `git push` faster, but we need to know where the time is going! There are likely four places where the time is going: 1. The info/refs call and force-update checking at the beginning. 2. The `git pack-objects` call that creates a pack-file to send to the server. 3. Sending the data to the server. 4. Waiting for the server to verify the pack-file. This PR adds `trace2_region_` calls inside `git pack-objects` so we can track the time in item (2). The rest could be interpreted from the start and end time of the entire command after we know this region. The server-side verification is something we can track using server telemetry.
…ects' We want to make `git push` faster, but we need to know where the time is going! There are likely four places where the time is going: 1. The info/refs call and force-update checking at the beginning. 2. The `git pack-objects` call that creates a pack-file to send to the server. 3. Sending the data to the server. 4. Waiting for the server to verify the pack-file. This PR adds `trace2_region_` calls inside `git pack-objects` so we can track the time in item (2). The rest could be interpreted from the start and end time of the entire command after we know this region. The server-side verification is something we can track using server telemetry.
We cannot rely on hardlinks to keep disk usage low when installing the builtins into
libexec/
. Therefore, we need something very small we can use instead for backwards-compatibility.The Git wrapper established in msysGit is ideal for that: it is already an executable users can call instead of Git (its original purpose is to call Git from
cmd.exe
without the need to call a script setting up the environment correctly). We can easily extend it to serve another purpose, too.