Skip to content

Fix t5570 flakiness on macOS #2111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dscho
Copy link
Member

@dscho dscho commented Mar 7, 2019

This pair of patches fixes the problem where our CI tests would fail randomly in t5570.9 or t5570.10, and a re-run would "fix" it.

It was integrated into git.git's next already, via git@59fe6aa5ac38, and will most likely make it into master soon.

Let's take this early, to lessen the burden on the person who has to re-run all those CI runs when t5570 fails again (which would be me).

peff added 2 commits March 5, 2019 15:02
The write_or_die() function has one quirk that a caller might not
expect: when it sees EPIPE from the write() call, it translates that
into a death by SIGPIPE. This doesn't change the overall behavior (the
program exits either way), but it does potentially confuse test scripts
looking for a non-signal exit code.

Let's switch away from using write_or_die() in a few code paths, which
will give us more consistent exit codes. It also gives us the
opportunity to write more descriptive error messages, since we have
context that write_or_die() does not.

Note that this won't do much by itself, since we'd typically be killed
by SIGPIPE before write_or_die() even gets a chance to do its thing.
That will be addressed in the next patch.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The default SIGPIPE behavior can be useful for a command that generates
a lot of output: if the receiver of our output goes away, we'll be
notified asynchronously to stop generating it (typically by killing the
program).

But for a command like fetch, which is primarily concerned with
receiving data and writing it to disk, an unexpected SIGPIPE can be
awkward. We're already checking the return value of all of our write()
calls, and dying due to the signal takes away our chance to gracefully
handle the error.

On Linux, we wouldn't generally see SIGPIPE at all during fetch. If the
other side of the network connection hangs up, we'll see ECONNRESET. But
on OS X, we get a SIGPIPE, and the process is killed. This causes t5570
to racily fail, as we sometimes die by signal (instead of the expected
die() call) when the server side hangs up.

Let's ignore SIGPIPE during the network portion of the fetch, which will
cause our write() to return EPIPE, giving us consistent behavior across
platforms.

This fixes the test flakiness, but note that it stops short of fixing
the larger problem. The server side hit a fatal error, sent us an "ERR"
packet, and then hung up. We notice the failure because we're trying to
write to a closed socket. But by dying immediately, we never actually
read the ERR packet and report its content to the user. This is a (racy)
problem on all platforms. So this patch lays the groundwork from which
that problem might be fixed consistently, but it doesn't actually fix
it.

Note the placement of the SIGPIPE handling. The absolute minimal change
would be to ignore SIGPIPE only when we're writing. But twiddling the
signal handler for each write call is inefficient and maintenance
burden. On the opposite end of the spectrum, we could simply declare
that fetch does not need SIGPIPE handling, since it doesn't generate a
lot of output, and we could just ignore it at the start of cmd_fetch().

This patch takes a middle ground. It ignores SIGPIPE during the network
operation (which is admittedly most of the program, since the actual
network operations are all done under the hood by the transport code).
So it's still pretty coarse.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
@dscho
Copy link
Member Author

dscho commented Mar 7, 2019

Thanks, @jamill!

@dscho dscho merged commit 1a02835 into git-for-windows:master Mar 7, 2019
@dscho dscho added this to the v2.21.0(2) milestone Mar 7, 2019
@dscho dscho mentioned this pull request Mar 8, 2019
dscho added a commit that referenced this pull request Mar 11, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request Mar 11, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request Apr 5, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request Apr 5, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request Apr 11, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request Apr 16, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request Apr 27, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request May 8, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request May 9, 2019
…rk-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request May 10, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit to dscho/git that referenced this pull request May 13, 2019
…pe-during-network-transport

Fix t5570 flakiness on macOS
git-for-windows-ci pushed a commit that referenced this pull request May 14, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request May 21, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request May 22, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request May 25, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request Jun 4, 2019
…rk-transport

Fix t5570 flakiness on macOS
dscho added a commit that referenced this pull request Jun 8, 2019
…rk-transport

Fix t5570 flakiness on macOS
@gitgitgadget gitgitgadget bot deleted the jk/no-sigpipe-during-network-transport branch August 18, 2023 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants