-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix access violations when cloning/fetching via HTTPS #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I just realized that the most important place to call |
That damn thing drove me crazy the last two days... |
Me, too, believe me! ;-) Thanks for helping me debug this! |
It is unlikely that we have an empty environment, ever, but *if* we do, when `environ_size - 1` is passed to `bsearchenv()` it is misinterpreted as a real large integer. To make the code truly defensive, refuse to do anything at all if the size is negative (which should not happen, of course). Signed-off-by: Johannes Schindelin <[email protected]>
I wonder if t5539-fetch-http-shallow.sh ........................ skipped: no web server found at '/usr/sbin/apache2'
t5540-http-push-webdav.sh .......................... skipped: no web server found at '/usr/sbin/apache2'
t5541-http-push-smart.sh ........................... skipped: no web server found at '/usr/sbin/apache2'
t5542-push-http-shallow.sh ......................... skipped: no web server found at '/usr/sbin/apache2'
t5550-http-fetch-dumb.sh ........................... skipped: no web server found at '/usr/sbin/apache2'
t5551-http-fetch-smart.sh .......................... skipped: no web server found at '/usr/sbin/apache2'
t5561-http-backend.sh .............................. skipped: no web server found at '/usr/sbin/apache2' would have caught that error in the first place. If not, maybe we could add some general |
ea60bc0
to
b690df8
Compare
Possibly. But frankly, I do not want to install a full-fledged Apache Web Server just for that... ;-) If we are serious to catch that bug if it reoccurs, we need to make a special |
Fixed here. |
Hold off! I am going to add one more commit. |
Yep, that 15d371c was the commit I still wanted to have included. Now this PR is good to go for review and merge. |
Actually, for the benefit of interested parties, I would like to share the strategy that eventually let me diagnose this problem properly and even fix it. The first thing was to find out where this exit code 128 came from. Unfortunately, I found out about this problem after hours of hunting when I saw that the diff --git a/usage.c b/usage.c
index ed14645..d0e1fcd 100644
--- a/usage.c
+++ b/usage.c
@@ -146,6 +146,7 @@ int error(const char *err, ...)
va_start(params, err);
error_routine(err, params);
va_end(params);
+fflush(stderr);
return -1;
}
At this point I could at least rely on my debug statements (for which I use Now another hours-long hunt started, where I put in debug statements into the likely code paths, starting with To reduce the turn-around time, I had two windows open, one with make git-remote-https.exe && cp git-remote-https.exe /mingw32/libexec/git-core/ && git clone https://github.com/dscho/git dscho-git; echo $? The Every once in a while, I would stage only the most recent debug statement with Eventually, I arrived at this diff (in addition to the diff --git a/compat/mingw.c b/compat/mingw.c
index 38a79d0..5730141 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -990,6 +990,8 @@ static wchar_t *make_environment_block(char **deltaenv)
/* create environment block from temporary environment */
for (i = 0; tmpenv[i]; i++) {
+error("i: %d", i);
+error("strlen: %d", strlen(tmpenv[i]));
size = 2 * strlen(tmpenv[i]) + 2; /* +2 for final \0 */
ALLOC_GROW(wenvblk, (wenvpos + size) * sizeof(wchar_t), wenvsz);
wenvpos += xutftowcs(&wenvblk[wenvpos], tmpenv[i], size) + 1; Note that I had two debug statements because the second would not even be executed because the Further debug statements revealed that Another debug statement later, it was clear that some code other than This started another hunt – several hours, debugging is a patient persons' game – for the code that had modified that pointer outside of static char *dscho_zero = NULL;
static char **dscho_pointer = &dscho_zero;
void dscho_assert(const char *file, int lineno)
{
if (*dscho_pointer)
die("%s:%d NONONO!", file, lineno);
error("%s:%d STILL OK", file, lineno);
} In if (environ_size == 69) dscho_pointer = environ + (environ_size - 1); Then I would litter the likely code paths with copies of the following snippet: { extern void dscho_assert(const char*, int); dscho_assert(__FILE__, __LINE__); } This eventually led to the realization that the environment changes in the call to From that point it was all smooth sailing: it turned out that the difference between the environment before and after that function call was the addition of The assumption of Git's MinGW-specific environment code is that we have full control over it and can ensure that it is sorted and terminated by a The way I handled this is to detect when outside changes occur (it is not a totally fool-proof way, of course, it would have to be an extensive test that the environment is still sorted and terminated by a Phew! |
@sschuberth would you have time to review this PR? (I know that @t-b is busy...) |
@dscho I'd have some time tomorrow morning again. And I probably need some time to understand the underlying issue. (Is it something about a memory region being reallocated and thus potentially moved, leaving our pointer to the original region dandling?) |
No, it is about extending the environment and then terminating it by a pointer to a single The issue would be easily papered over simply by this fix: 15d371c363f5062cbd88104602f89291a0d83a58. No more running beyond the end of the environment, case closed, right? Actually, no. The entire idea of initializing our own environment is so that we can ensure that it is UTF-8 encoded. For performance, @kblees wrote the code in such a way that it sorts the entries after conversion, to be able to use a binary search in Except that The idea of b690df82565f900d2d91e0936e354a229ab8c5e0 is to detect that situation quickly and remedy it. I agree that it is not a total solution, but it is definitely necessary to fix the bug. I could imagine, for example, that the external code re-allocates the environment in such a way that There is also the possibility of a situation where a couple of environment variables get removed and then others get appended. If that happens between calls to Lastly, I have a hunch that we will eventually need to use critical sections to guard this code against multi-threaded usage. But those are really all bridges I would like to cross later, when – and if! – needed. |
I guess I should enhance the commit message. Will do so after running some errands, i.e. in a couple of hours. |
Outside of our Windows-specific code, the end of the environment can be marked also by a pointer to a NUL character, not only by a NULL pointer as our code assumed so far. That led to a buffer overrun in `make_environment_block()` when running `git-remote-https` in `mintty` (because `curl_global_init()` added the `CHARSET` environment variable *outside* of `mingw_putenv()`, ending the environment in a pointer to an empty string). Side note for future debugging on Windows: when running programs in `mintty`, the standard input/output/error is not connected to a Win32 Console, but instead is pipe()d. That means that even stderr may not be written completely before a crash, but has to be fflush()ed explicitly. For example, when debugging crashes, the developer should insert an `fflush(stderr);` at the end of the `error()` function defined in usage.c. Signed-off-by: Johannes Schindelin <[email protected]>
15d371c
to
ac5283c
Compare
Okay, I think the commit messages are better now (and the order of the commits is more intuitive now, albeit no longer chronological). @sschuberth please have a look. |
The environment is modified in most surprising circumstances, and not all of them are under Git's control. For example, calling curl_global_init() on Windows will ensure that the CHARSET variable is set, adding one if necessary. While the previous commit worked around crashes triggered by such outside changes of the environment by relaxing the requirement that the environment be terminated by a NULL pointer, the other assumption made by `mingw_getenv()` and `mingw_putenv()` is that the environment is sorted, for efficient lookup via binary search. Let's make real sure that our environment is intact before querying or modifying it, and reinitialize our idea of the environment if necessary. With this commit, before working on the environment we look briefly for indicators that the environment was modified outside of our control, and to ensure that it is terminated with a NULL pointer and sorted again in that case. Note: the indicators are maybe not sufficient. For example, when a variable is removed, it will not be noticed. It might also be a problem if outside changes to the environment result in a modified `environ` pointer: it is unclear whether such a modification could result in a problem when `mingw_putenv()` needs to `realloc()` the environment buffer. For the moment, however, the current fix works well enough, so let's only face the potential problems when (and if!) they occur. Signed-off-by: Johannes Schindelin <[email protected]>
ac5283c
to
22cc9c9
Compare
Ok @dscho, feel free to merge. |
Fix access violations when cloning/fetching via HTTPS
@sschuberth please note that I would really prefer strongly if you merged PRs after you reviewed them and are fine with merging. It feels like quite a bit of really superfluous back and forth to say "feel free to merge" instead of just merging it, and it also feels wrong to have the original author merge her own work after others reviewed it – just imagine that you send a patch to the Git mailing list, Junio and others review it, say it is fine, and then you push it to the official repository? That just sounds wrong, does it not? |
Not necessarily. It depends, on many things. For example: In a project with "dictatorship" (I do not mean to make this sound negative), i.e. with a single person responsible for final approval, and very few persons having the permission to push, you have no choice but to make the reviewer (i.e. maintainer / owner) merge. But in a project where almost all contributors have equal push rights (like ours), the past years in my day job have proven that merges by the owner make more sense, both to have the change to make additional changes that none of the reviewers missed (like you wanted to do here), and to have control over when the PR is integrated. Edit: But as GitHub, as opposed to e.g. Gerrit, has no nice way of recording / documenting review ratings / acknowledgements, I'll following your wish and merge PRs in the future if I'm fine with them instead of leaving that to the author. |
Fix access violations when cloning/fetching via HTTPS
Fix access violations when cloning/fetching via HTTPS
Wow, kudos for tracking down this issue. I believe this was caused by disabling USE_NED_ALLOCATOR in the Makefile in the MSys2 case. With nedmalloc enabled, MSVCRT's putenv simply fails with EINVAL. While this is not nice for 3rd party libraries that try to modify the environment, at least it doesn't destroy our internal data structures. This raises a few questions:
Especially the last point leads me to the conclusion that the issue can only be properly fixed by separating our environment from the MSVCRT environment. Something like this:
This means that environment changes by git will not be seen by 3rd party libs and vice versa, but I think for the problem at hand this is acceptable. Regarding this patch series, I believe the first two patches don't fix anything, and the commit messages are rather misleading than helpful. For example, environ_size is always at least 1 after being initialized in mingw_startup, even if the environment is empty. And if some developer happens to call mingw_getenv in mingw_startup before initializing the environment, I feel it is safer to just segfault than being 'defensive' and return NULL. |
That was almost certainly an oversight.
I understand that it is tempting to build on top of the original idea, but the fundamental issue that was unveiled by my bug hunt is that we play games with the environment. This PR is just a hot-fix so that we can keep going forward to a Git for Windows 2.x release, but the clean solution is to stop playing games with the environment, not to enhance the games.
You are correct. That is why they clearly state to introduce defensive programming, i.e. fending off future bugs.
Well, please understand that this is unnecessarily offensive a statement. The commit message states exactly what is going on:
Yes, it implies that there is a significant error in having While it is true that the current code as of This is not a theoretic issue, but a very practical one: The lack of this defensive programming cost me a sizable chunk of the two days I had to spend hunting this bug. To recapitulate (because you have not followed this issue during the time it was current): there was a Heisenbug which prevented fetching via HTTPS. In the beginning, I could only reproduce by running the entire net installer (and not always; sometimes it would succeed), and the worst part: the symptom was simply a failed clone. No error message, no nothing. In hindsight it is very easy to see that the debugging was made especially hard due to the fact that it was an unreported segmentation fault in Please also understand that after having to spend such an amount of time on fixing this bug, the comment "And if some developer happens to call |
Make sure that the nedmalloc patch is applied before.
Wow. This has really taken quite a lot of time to debug. This developer alone spent 15 hours tracking it down and squashing it, and @nalla as well as @sschuberth helped as well.