Skip to content

Use msysGit's git-wrapper instead of the builtins #34

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 18, 2015

Conversation

dscho
Copy link
Member

@dscho dscho commented Mar 13, 2015

We cannot rely on hardlinks to keep disk usage low when installing the builtins into libexec/. Therefore, we need something very small we can use instead for backwards-compatibility.

The Git wrapper established in msysGit is ideal for that: it is already an executable users can call instead of Git (its original purpose is to call Git from cmd.exe without the need to call a script setting up the environment correctly). We can easily extend it to serve another purpose, too.

@sschuberth
Copy link

Reminder: We probably need to add a manifest similar to #33 to the wrapper, too.

@dscho
Copy link
Member Author

dscho commented Mar 10, 2015

Actually, we should include the git-wrapper in our Git repository. That way, we will not have to determine the builtins after the fact, but can use the Makefile machinery to install the wrapper into the appropriate place right away.

I am therefore switching the label from sdk to git.

@dscho dscho added git and removed sdk labels Mar 10, 2015
dscho added 2 commits March 13, 2015 14:16
On Windows, Git is faced by the challenge that it has to set up certain
environment variables before running Git under special circumstances
such as when Git is called directly from cmd.exe (i.e. outside any
Bash environment).

This source code was taken from msysGit's commit 74a198d:

https://github.com/msysgit/msysgit/blob/74a198d/src/git-wrapper/git-wrapper.c

Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Mar 13, 2015

Reminder: We probably need to add a manifest similar to #33 to the wrapper, too.

Oy vey, I forgot about that one. @sschuberth good idea to put this reminder here!

@dscho
Copy link
Member Author

dscho commented Mar 13, 2015

Sadly, we now save 164 kB less (out of 814 MB, I think we can live with that) 😛

By the way, I would like to keep those fixup! commits for the record. The next merging rebase should squash all of these commits, though.

@dscho
Copy link
Member Author

dscho commented Mar 13, 2015

@sschuberth @t-b I would really appreciate your review about this one... The test suite is running with this and so far, things look real groovy (but that says nothing: it takes roughly an hour to run here, and we're only a couple minutes into the test run).

@dscho
Copy link
Member Author

dscho commented Mar 14, 2015

The test suite is running with this

It finished in 2 hours 39 minutes and 44 seconds. All tests are fine, including my manual ones.

@dscho
Copy link
Member Author

dscho commented Mar 14, 2015

Actually, there is one more thing I just thought of: I need to make sure that the git wrapper works in its original role, i.e. when called from /cmd/ (which should now probably be /mingw32/cmd/ or /mingw64/cmd/).

I should be able to make it work, still, in the msysGit context as well as in the MSys2 context by introducing conditional code depending on the presence of the mingw/ directory next to the parent directory or the presence of the usr/ directory next to the "grandparent directory".

@dscho
Copy link
Member Author

dscho commented Mar 14, 2015

Oh, and I cannot thank @patthoyts enough for his work on the Git wrapper. It is really an awesome source code to work on, and it gave me a big head start on this PR.

dscho added 2 commits March 17, 2015 15:40
Fix whitespace. This trick was performed by

	git rebase --whitespace=fix HEAD^

Signed-off-by: Johannes Schindelin <[email protected]>
This change simply makes the Git wrapper's source code conform to Git's
coding guide lines.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Mar 18, 2015

@sschuberth do you want me to squash 74f22ffc5c09f366141c9b32343770a7fde13c77 before merging? Otherwise I would be content with the Pull Request as it is right now...

dscho added 4 commits March 18, 2015 08:50
This prepares the wrapper for modifications to serve as a drop-in
replacement for the builtins.

This commit's diff is best viewed with the `-w` flag.

Signed-off-by: Johannes Schindelin <[email protected]>
Git started out as a bunch of separate commands, in the true Unix spirit.
Over time, more and more functionality was shared between the different
Git commands, though, so it made sense to introduce the notion of
"builtins": programs that are actually integrated into the main Git
executable.

These builtins can be called in two ways: either by specifying a
subcommand as the first command-line argument, or -- for backwards
compatibility -- by calling the Git executable hardlinked to a filename
of the form "git-<subcommand>". Example: the "log" command can be called
via "git log <parameters>" or via "git-log <parameters>". The latter
form is actually deprecated and only supported for scripts; calling
"git-log" interactively will not even work by default because the
libexec/git-core/ directory is not in the PATH.

All of this is well and groovy as long as hard links are supported.

Sadly, this is not the case in general on Windows. So it actually hurts
quite a bit when you have to fall back to copying all of git.exe's
currently 7.5MB 109 times, just for backwards compatibility.

The simple solution would be to install really trivial shell script
wrappers in place of the builtins:

	for builtin in $BUILTINS
	do
		rm git-$builtin.exe
		printf '#!/bin/sh\nexec git %s "$@"\n' $builtin > git-builtin
		chmod a+x git-builtin
	done

This method would work -- even on Windows because Git for Windows ships a
full-fledged Bash. However, the Windows Bash comes at a price: it needs to
spin up a full-fledged POSIX emulation layer everytime it starts.
Therefore, the shell script solution would incur a significant performance
penalty.

The best solution the Git for Windows team could come up with is to extend
the Git wrapper -- that is needed to call Git from cmd.exe anyway, and
that weighs in with a scant 19KB -- to also serve as a drop-in replacement
for the builtins so that the following workaround is satisfactory:

	for builtin in $BUILTINS
	do
		cp git-wrapper.exe git-$builtin.exe
	done

This commit allows for this, by extending the module file parsing to
turn builtin command names like `git-log.exe ...` into calls to the main
Git executable: `git.exe log ...`.

Signed-off-by: Johannes Schindelin <[email protected]>
This reduces the disk footprint of a full Git for Windows setup
dramatically because on Windows, one cannot assume that hard links are
supported.

The net savings are calculated easily: the 32-bit `git.exe` file weighs
in with 7662 kB while the `git-wrapper.exe` file (modified to serve as a
drop-in replacement for builtins) weighs a scant 21 kB. At this point,
there are 109 builtins which results in a total of 813 MB disk space
being freed up by this commit.

Yes, that is really more than half a gigabyte.

Signed-off-by: Johannes Schindelin <[email protected]>
Embed the manifest in the Git wrapper, too

This is needed for builtins such as 'patch-id' to avoid triggering
Windows' User Access Control.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Mar 18, 2015

@sschuberth I just adjusted the Refactor commit message for your pleasure. Now I think this is ready for merging.

@sschuberth
Copy link

Since you were accommodating me, I'll accommodate you and not argue about the remaining things which mostly are opinion-based anyway ;-) So feel free to merge when you're ready.

dscho added a commit that referenced this pull request Mar 18, 2015
Use msysGit's `git-wrapper` instead of the builtins
@dscho dscho merged commit 5fb2a13 into git-for-windows:master Mar 18, 2015
@dscho dscho deleted the git-wrapper branch March 18, 2015 09:17
@sschuberth
Copy link

This PR makes the MSYS1-based CI fail:

c:/jenkins/sdk-build-git/workspace/git-sdk/mingw/bin/../lib/gcc/mingw32/4.8.1/../../../libmingw32.a(main.o):(.text.startup+0xa7): undefined reference to `WinMain@16'
collect2.exe: error: ld returned 1 exit status

The fix is in #39.

dscho added a commit that referenced this pull request Mar 21, 2015
Use msysGit's `git-wrapper` instead of the builtins
dscho pushed a commit that referenced this pull request Mar 24, 2015
Use msysGit's `git-wrapper` instead of the builtins
@kblees
Copy link

kblees commented Mar 25, 2015

I don't quite get it. Hardlinks are supported as of Windows 2000. Why would we want to exchange 109 hardlinks (that require 0 MB of disk space) with 109 copies of the git-wrapper (i.e. additional 109 * 19k ~= 2 MB)?

IIRC, we only need libexec/git-core/git-* to support legacy scripted commands that still use the 'git-command' syntax. Scripted commands are slow enough already on Windows, due to the performance penalty imposed by CreateProcess. Using copies of the git-wrapper rather than hardlinks to the real git.exe will likely double that penalty.

IMO this is only useful for portable installations on FAT-formatted volumes (if that works at all). And even in this special case, users may not want to trade space for speed in this way...

With the recent maint-1.9 patch to calculate HOME in git.exe, I thought that we might actually get rid of git-wrapper altogether. AFAIK there's nothing in there that cannot be done in git.exe's startup code, and IMO there's no point in duplicating environment setup code in so many places (including even completely separate projects such as GitExtensions and TortoiseGit).

@dscho
Copy link
Member Author

dscho commented Mar 26, 2015

@kblees thanks for joining the discussion. It is unfortunate a bit late in the discussion, i.e. after review and merge. Even more sadly, it seems a couple of issues seemed to not be clear:

  • hardlinks are not guaranteed to be available. This is a question of the file system, not the OS version
  • using copies rather than hardlinks might punish old-style scripts, but that does not matter: this is legacy support
  • the portable application is what e.g. GitHub needs; as GitHub made this push for MSys2-based Git for Windows possible, so they get the best support I can provide
  • Two megabyte? Really? 👀
  • HOME is not the only environment variable that needs to be set. With TERM=msys, a git log will result in a crash. That is just one example. Another one is MSYSTEM.
  • there is more to putting the Git wrapper into /cmd/ than just setting the environment variables to begin with: it is de-polluting the PATH. Just give it a little thought: there are currently 53 .dll files in /mingw??/bin/. See the problem?

@sschuberth
Copy link

@dscho Even if this has been merged already I think it is important to clarify any outstanding issues. I don't see this as beating a dead horse as merged stuff can be still improved (or even reverted) later on.

hardlinks are not guaranteed to be available. This is a question of the file system, not the OS version

Very true. Still, I think it is very reasonable to assume that any non-portable installation of Git for Windows will be running on NTFS. Thus, for that case, we can assume hardlink support.

as GitHub made this push for MSys2-based Git for Windows possible, so they get the best support I can provide

Thinking about all the past discussions we had recently, not only as part of this PR, I think this indeed is the main point: You're trying to please the needs of GitHub for obvious reasons. There's nothing wrong with that as long as you accept that GitHub's needs are not necessarily our needs. And by "our" I mean the majority of Git for Windows users. I fully agree with @kblees that portable installations on FAT-formatted volumes are a special case, not the rule. It just happens that GitHub is interested in that special case.

That said, I still think this PR is good as it allows us to have a common release process for both portable and "real" installations, in the sense that we share the least common denominator to create aliases for built-ins, which is file copies (of the wrapper, not the main git executable). While hardlinks are technically superior, even if they worked for portable installations (i.e. everybody was using NTFS formatted USB sticks), we'd still run into the ever occurring reports about the Git installation being so huge, because programs like the Windows Explorer erroneously accumulate hardlink sizes.

@dscho
Copy link
Member Author

dscho commented Mar 26, 2015

@dscho Even if this has been merged already I think it is important to clarify any outstanding issues.

That is of course appropriate. By opening a new ticket.

I think it is very reasonable to assume that any non-portable installation of Git for Windows will be running on NTFS.

As stated before, my current focus is the portable Git for Windows. That does not prevent later changes to support an installer that knows how to make hard links but falls back to using copies -- of the Git wrapper.

You're trying to please the needs of GitHub for obvious reasons.

I think you misunderstand. Without GitHub, there would not be a Git for Windows 2.x based on MSys2, portable or not. As a consequence, my first focus is on the portable application, keeping in mind that I must not throw the door shut to the needs of the InnoSetup installer.

While hardlinks are technically superior, even if they worked for portable installations (i.e. everybody was using NTFS formatted USB sticks), we'd still run into the ever occurring reports about the Git installation being so huge, because programs like the Windows Explorer erroneously accumulate hardlink sizes.

That sounds similar to some of your earlier comments, which made me use the Git wrapper always in place of the builtins.

@sschuberth
Copy link

That is of course appropriate. By opening a new ticket.

By "clarifying any outstanding issues" I did not mean issues in the "bug" sense, but open questions. These should be answered in the same place where the question arose, i.e. in the PR discussion.

This PR is not about builtins at all.

I think you got confused now with the different discussions currently going on in different PRs, or? This PR is titled "Use msysGit's git-wrapper instead of the builtins", so it is about builtins.

It seems that we have exhausted all good comments about this PR, so I am going to merge it right now.

See above. This PR has already been merged 17 days ago.

@dscho
Copy link
Member Author

dscho commented Mar 26, 2015

This PR is not about builtins at all.

I think you got confused now with the different discussions currently going on in different PRs, or? This PR is titled "Use msysGit's git-wrapper instead of the builtins", so it is about builtins.

You are correct, I got confused. My bad.

FWIW I had noticed that mistake just after hitting the "Comment" button and updated the comment before you responded, but sadly not before you had read and quoted the erroneous parts.

@kblees
Copy link

kblees commented Mar 26, 2015

using copies rather than hardlinks might punish old-style scripts, but that does not matter: this is legacy support

If performance doesn't matter, a 2-line bash script as outlined in the commit message would have been just as good.

Two megabyte? Really? 👀

The point is that the commit message suggests that using the git-wrapper saves a lot of disk space, while in fact the opposite is true. So we pay the disk space penalty (even though a small one) in addition to the performance penalty.

HOME is not the only environment variable that needs to be set. With TERM=msys, a git log will result in a crash. That is just one example. Another one is MSYSTEM.

mingw_startup() already initializes TERM=cygwin, so simply removing these lines from the git-wrapper would have had the same effect.

MSYSTEM is only relevant in the development environment, it is neither needed by git nor set by the git-wrapper (at least not until now). If MSys2-based installations require MSYSTEM to be set, that's another reason to not add this to the git-wrapper but to git's own startup code. Otherwise all direct callers of git.exe would have to duplicate this logic.

there is more to putting the Git wrapper into /cmd/ than just setting the environment variables to begin with: it is de-polluting the PATH. Just give it a little thought: there are currently 53 .dll files in /mingw??/bin/. See the problem?

/cmd/git.exe should be the real git executable (or hardlink to it) and 'de-pollute' PATH in its own startup code.

@sschuberth
Copy link

The point is that the commit message suggests that using the git-wrapper saves a lot of disk space, while in fact the opposite is true.

It's still true that the git-wrapper saves a lot of disk spaces compared to the alternative of copying the main git executable. That's what the commit message refers to.

MSYSTEM is only relevant in the development environment, it is neither needed by git nor set by the git-wrapper (at least not until now).

Good point.

@dscho
Copy link
Member Author

dscho commented Mar 26, 2015

mingw_startup() already initializes TERM=cygwin

If that were true, I would not have needed to make that patch now, would I.

MSYSTEM is only relevant in the development environment

Incorrect. msys-2.0.dll uses different code paths depending on the MSYSTEM value.

/cmd/git.exe should be the real git executable (or hardlink to it) and 'de-pollute' PATH in its own startup code.

That would require invasive changes by requiring the runtime_prefix() to also know about /cmd/. And about the MSys2-specific file system layout.

@kblees
Copy link

kblees commented Mar 30, 2015

mingw_startup() already initializes TERM=cygwin

If that were true, I would not have needed to make that patch now, would I.

It sure is true, otherwise git + pager (without git-wrapper) wouldn't work in cmd.exe.

MSYSTEM is only relevant in the development environment

Incorrect. msys-2.0.dll uses different code paths depending on the MSYSTEM value.

What do you mean by this? Judging from the MSys2 sources, there are no code path decisions based on the MSYSTEM value. It is just passed through to forked applications (in environ.cc), or used in the system name (in uname.cc).

/cmd/git.exe should be the real git executable (or hardlink to it) and 'de-pollute' PATH in its own startup code.

That would require invasive changes by requiring the runtime_prefix() to also know about /cmd/.

What do you mean by runtime_prefix()? Is this a function in git, or git-wrapper, or MSys2 ? I cannot find it anywhere.

@dscho
Copy link
Member Author

dscho commented Mar 31, 2015

mingw_startup() already initializes TERM=cygwin

If that were true, I would not have needed to make that patch now, would I.

It sure is true, otherwise git + pager (without git-wrapper) wouldn't work in cmd.exe.

Okay, I am going to lay it out clearly. There was a bug that made less.exe crash with an access violation before my change. After my change, it no longer crashed. After I merged this fix, you said that it was not needed. Then I said that it was needed, otherwise I would not have come up with that fix in the first place. And even then you argue that it is not needed?

@dscho
Copy link
Member Author

dscho commented Mar 31, 2015

msys-2.0.dll uses different code paths depending on the MSYSTEM value.

What do you mean by this? Judging from the MSys2 sources, there are no code path decisions based on the MSYSTEM value. It is just passed through to forked applications (in environ.cc), or used in the system name (in uname.cc).

Well, you said it yourself:

It is just passed through to forked applications (in environ.cc), or used in the system name (in uname.cc).

There are other decisions made, based on this, so that you end up with a different setup depending on the MSYSTEM variable. That has not changed relative to msysGit, by the way.

/cmd/git.exe should be the real git executable (or hardlink to it) and 'de-pollute' PATH in its own startup code.

That would require invasive changes by requiring the runtime_prefix() to also know about /cmd/.

What do you mean by runtime_prefix()? Is this a function in git, or git-wrapper, or MSys2 ? I cannot find it anywhere.

I made a mistake: I meant RUNTIME_PREFIX, not runtime_prefix(). To my defense, it sure feels that in addition to all the work and all the bug fixes, I also have to explain a lot, therefore a little distress has seeped through into my comments. I apologize for that.

@kblees
Copy link

kblees commented Mar 31, 2015

mingw_startup() already initializes TERM=cygwin

If that were true, I would not have needed to make that patch now, would I.

It sure is true, otherwise git + pager (without git-wrapper) wouldn't work in cmd.exe.

Okay, I am going to lay it out clearly. There was a bug that made less.exe crash with an access violation before my change. After my change, it no longer crashed. After I merged this fix, you said that it was not needed. Then I said that it was needed, otherwise I would not have come up with that fix in the first place. And even then you argue that it is not needed?

I said: "simply removing these lines from the git-wrapper would have had the same effect"

If TERM is already (wrongly) set to "msys" by the git-wrapper, git's own startup code will not touch it.

The problem is that we have similar (but not quite identical) environment setup code in too many places.

The original git-wrapper was based on a very old version of git-cmd.bat, without questioning which settings were still necessary. E.g. it also sets HOME, but without checking if %HOMEDRIVE%%HOMEPATH% exists (i.e. it won't work if you're offline).

jeffhostetler pushed a commit to jeffhostetler/git that referenced this pull request Oct 11, 2018
…ects'

We want to make `git push` faster, but we need to know where the time is going!

There are likely four places where the time is going:

1. The info/refs call and force-update checking at the beginning.
2. The `git pack-objects` call that creates a pack-file to send to the server.
3. Sending the data to the server.
4. Waiting for the server to verify the pack-file.

This PR adds `trace2_region_` calls inside `git pack-objects` so we can track the time in item (2). The rest could be interpreted from the start and end time of the entire command after we know this region. The server-side verification is something we can track using server telemetry.
jamill pushed a commit to jamill/git that referenced this pull request Nov 20, 2018
…ects'

We want to make `git push` faster, but we need to know where the time is going!

There are likely four places where the time is going:

1. The info/refs call and force-update checking at the beginning.
2. The `git pack-objects` call that creates a pack-file to send to the server.
3. Sending the data to the server.
4. Waiting for the server to verify the pack-file.

This PR adds `trace2_region_` calls inside `git pack-objects` so we can track the time in item (2). The rest could be interpreted from the start and end time of the entire command after we know this region. The server-side verification is something we can track using server telemetry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants