Skip to content

[master] RabbitMQ-server does not quit with Ctrl-C #1192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

[master] RabbitMQ-server does not quit with Ctrl-C #1192

wants to merge 1 commit into from

Conversation

sodre
Copy link
Contributor

@sodre sodre commented Apr 25, 2017

The is something in rabbitmq-server that prevents dash from honoring the trap calls in lines 290-291.
In that cases "control-c" will kill the script but leave epmd and beam still running in the background.

Changing from dash to bash fixes this problem.

@dumbbell dumbbell self-requested a review April 25, 2017 10:07
@dumbbell dumbbell added the bug label Apr 25, 2017
@dumbbell dumbbell added this to the 3.6.10 milestone Apr 25, 2017
@dumbbell dumbbell self-assigned this Apr 25, 2017
Copy link
Collaborator

@dumbbell dumbbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sodre!

This is known issue with Dash unfortunately. Do you know other shells where the signal handler is broken?

I didn't have time yet to work on this issue. Depending on Bash as a workaround is ok.

Thank you for your patch!

@@ -1,4 +1,4 @@
#!/bin/sh -e
#!/bin/bash -e
Copy link
Collaborator

@dumbbell dumbbell Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't hardcode the path to Bash because it could be installed somewhere else. For instance, on various *BSD, Bash is installed as /usr/local/bin/bash. Please use /usr/bin/env bash instead.

In the process, you'll have to remove the -e and add a set -e (which is also cleaner) after the license.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. these are all good points. I will take care of them soon.

@gerhard
Copy link
Contributor

gerhard commented Apr 25, 2017

In the interest of keeping all scripts consistent, I would suggest that we change all /bin/sh/ to /usr/bin/env bash . This also means that we need to specify bash as a package dependency for all distros that we maintain.

@michaelklishin
Copy link
Collaborator

@sodre another thing we have to ask: if you expect this change to go into a future 3.6.x release, please rebase this against stable (and edit this or submit a new PR). Or force-push your changes, if it's a single commit we can cherry-pick.

@sodre
Copy link
Contributor Author

sodre commented Apr 25, 2017

@gerhard I can change all of them as well but it is probably not needed. The reason is the other runnable scripts, are not using "trap".

Let me know if you still want the others changed and I will make it happen.

@sodre sodre changed the title RabbitMQ-server does not quit with Ctrl-C [WIP] RabbitMQ-server does not quit with Ctrl-C Apr 25, 2017
@sodre
Copy link
Contributor Author

sodre commented Apr 25, 2017

Changed it WIP, I am trying to replicate it in another system.

@sodre sodre changed the title [WIP] RabbitMQ-server does not quit with Ctrl-C RabbitMQ-server does not quit with Ctrl-C Apr 26, 2017
@sodre
Copy link
Contributor Author

sodre commented Apr 26, 2017

Folks,
Here are the steps to replicate the bug on your own using a blank Docker container.

docker run -it --rm ubuntu bash
apt-get update
apt-get install -y wget  xz-utils erlang-asn1 erlang-base erlang-corba erlang-diameter erlang-edoc erlang-eldap erlang-erl-docgen erlang-eunit erlang-ic erlang-inets erlang-mnesia erlang-nox erlang-odbc erlang-os-mon erlang-parsetools erlang-percept erlang-public-key erlang-runtime-tools erlang-snmp erlang-ssh erlang-ssl erlang-tools erlang-webtool erlang-xmerl libltdl7 libodbc1 libpopt0

wget http://www.rabbitmq.com/releases/rabbitmq-server/v3.6.9/rabbitmq-server-generic-unix-3.6.9.tar.xz
tar -xf rabbitmq-server-generic-unix-3.6.9.tar.xz 
./rabbitmq_server-3.6.9/sbin/rabbitmq-server

Wait until all the plugins are loaded and hit ctrl-c. Your terminal should look like this..

root@3cf4b2669034:/# ./rabbitmq_server-3.6.9/sbin/rabbitmq-server

              RabbitMQ 3.6.9. Copyright (C) 2007-2016 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /rabbitmq_server-3.6.9/var/log/rabbitmq/[email protected]
  ######  ##        /rabbitmq_server-3.6.9/var/log/rabbitmq/[email protected]
  ##########
              Starting broker...
 completed with 0 plugins.
^Croot@3cf4b2669034:/# 

The shell-script quit but epmd and beam are still running in the background. You can type ps aux to verify. After that run

pkill epmd && pkill beam.smp

If we "apply the patch", i.e. use sed, and do the same process this is what we get:

root@3cf4b2669034:/# sed -i 's|/bin/sh -e|/usr/bin/env bash|' ./rabbitmq_server-3.6.9/sbin/rabbitmq-server 
root@3cf4b2669034:/# ./rabbitmq_server-3.6.9/sbin/rabbitmq-server

              RabbitMQ 3.6.9. Copyright (C) 2007-2016 Pivotal Software, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /rabbitmq_server-3.6.9/var/log/rabbitmq/[email protected]
  ######  ##        /rabbitmq_server-3.6.9/var/log/rabbitmq/[email protected]
  ##########
              Starting broker...
 completed with 0 plugins.
^CStopping and halting node rabbit@3cf4b2669034 ...
Gracefully halting Erlang VM
root@3cf4b2669034:/# 

The reason I am reproducing the bug is because the dash man page says that it supports traps, and a simple example code for dash works out of the box. Unfortunately, there is something with the rabbitmq-server script that is making the code fail. Using bash is one possible fix.

@dumbbell
Copy link
Collaborator

dumbbell commented Apr 26, 2017

Thank you for the updated patch!

The following script is enough to demonstrate the issue:

#!/bin/sh

set -e

do_things () {
  sleep 20
}

trap "echo 'Got SIGINT'" INT

do_things &

wait $!

If we hit Ctrl+C, the script is supposed to display ^CGot SIGINT. With Bash or FreeBSD's sh, this works as expected. However with Dash, it shows ^C only, meaning the signal handler was not executed.

I just noticed that even with Dash, the wait command still exits with 130, which is 128 + signal number, in this case SIGINT which is signal number 2 (kill -l lists signal names and numbers). So a possible workaround would be to test the exit status of wait and call the signal handler manually.

@dumbbell
Copy link
Collaborator

Here is the same small script modified with a homebrew signal handler for Dash:

#!/bin/sh

set -e

do_things () {
  sleep 20
}

signal_handler() {
  echo "Running signal handler"
}

trap "signal_handler; exit 0" HUP TERM TSTP
trap "signal_handler; exit 130" INT

do_things &

if wait $!; then
  :
else
  status=$?
  case "$(kill -l $status)" in
    HUP|TERM|TSTP) signal_handler ;;
    INT)           signal_handler; exit $status ;;
  esac
fi

It behaves the same with Dash, Bash and FreeBSD sh.

What do you think?

@sodre
Copy link
Contributor Author

sodre commented Apr 26, 2017

I like that! I will test it on my system and change the patch later today.

@dumbbell
Copy link
Collaborator

dumbbell commented Apr 26, 2017

@sodre: Note that I changed the trap line for SIGINT to exit with 130. Otherwise, on working shells, both the regular handler and the fake handler are executed.

Also, before we convert the exit status to a signal name with kill -l, we must verify that the exit status is greater than or equal to 128. Because kill -l will happily convert eg. 2 to INT and we don't want to convert regular exit codes from wait to signal names.

@gerhard
Copy link
Contributor

gerhard commented Apr 26, 2017

The wait workaround for Dash sounds most reasonable since we cannot introduce a dependency on bash in 3.6. Long-term we really want to simplify all scripts, including rabbitmq-server.

@sodre sodre changed the title RabbitMQ-server does not quit with Ctrl-C [master] RabbitMQ-server does not quit with Ctrl-C Apr 27, 2017
@sodre
Copy link
Contributor Author

sodre commented Apr 27, 2017

We will be going with the "wait" trick. However, I think the answer is simpler than the second proposed patch. Using the first patch, I added a silly If statement to look at the result of wait, surprisingly it works!

When you guys have time please verify it also works for you and that I am not getting things messed up. In particular testing it for FreeBSD would be great.

#!/bin/sh

set -e

do_things () {
  sleep 20
}

trap "echo 'Got SIGINT'" INT

do_things &

if wait $!; then
  :
fi

@michaelklishin
Copy link
Collaborator

michaelklishin commented Apr 27, 2017 via email

@sodre
Copy link
Contributor Author

sodre commented Apr 27, 2017

Michael,

#1200 is the answer to #1192 (comment) when you asked to rebase the patch into "stable" if I wanted the fix in 3.6.

#1192 is the same exact change but based in master.

What I was asking for you guys to look at is if the script in #1192 (comment) also work for you. It was mostly out of disbelief since the only change required is this "NoOp" with the wait statement.

I tested the script in my Ubuntu box that was failing before, and with the "NoOp" it works well!

@gerhard
Copy link
Contributor

gerhard commented Apr 27, 2017

We have determined that set -e in Dash will prevent traps from executing. Therefore, the problem is a bit more complicated and requires us to review the use-case for set -e.

Given a happy path scenario, we are thinking of forcing wait to always succeed. We are leaning towards wait $! || true which is a variation of your suggestion.

The unhappy paths will be a bit more complicated since we cannot use set -e in Dash as it prevents any traps from executing.

On SIGINT rabbitmq-server exits with status code 130 (128 + 2). We depend on this behaviour in makefile recipes. Wrapping wait in an if statement intercepts wait's exit code, so we can't use the PR as it stands. We always have to explicitly exit in all traps:

#!/bin/sh

set -ex

do_things() {
  sleep 10
}

trap "echo SIGINTing; exit 130" INT
trap "echo SIGTERMing; exit 0" TERM
trap "echo SIGHUPing; exit 0" HUP

do_things &
do_things_pid="$!"

wait $do_things_pid || true

What do you think?

/bin/sh may point to dash which has a bug with the trap calls in
lines 290-291. In that case "control-c" will kill the wait
command and leave rabbitmq-server still running in the
background.

The fix is to add a noop around the wait command.

Exit with errcode 130 for a SIGINT trap.

Make set -e consistent with the other scripts.

Signed-off-by: Patrick Sodré <[email protected]>
@sodre
Copy link
Contributor Author

sodre commented Apr 27, 2017

@gerhard,
I think I got it now. Please review at your earliest convenience.

I also fixed the beginning of the code so that the "set -e" call is consistent with the other scripts in master. This was different for the "stable" branch PR #1200 , the scripts there were called with #!/bin/sh -e.

I hope that covers all you guys need. It was awesome working on this PR, thanks for giving me the chance to contribute to it.

dumbbell pushed a commit that referenced this pull request Apr 28, 2017
On Debian-like distributions, `/bin/sh` defaults to `/bin/dash` which
has a bug with signal handlers.

In the case of Dash, it looks like `set -e` (set at the beginning of
this script) gets precedence over signal handling. Therefore, when
`wait` is interrupted, its exit code is non-zero and because of `set
-e`, the script terminates immediately without running the signal
handler. To work around this issue, we use `|| true` to force that
statement to succeed and the signal handler to properly execute.

Replace the use of `-e` on the shebang line by a standalone `set -e`,
like other scripts. This way, the script behavior remains the same if
the script is started as an argument to a shell. For instance:
    bash ./rabbitmq-server

Bump the copyright year to 2017.

Signed-off-by: Patrick Sodré <[email protected]>

Fixes #1192.
@dumbbell
Copy link
Collaborator

Thank you @sodre for the patch!

We added more comments and cherry-picked it to our stable branch. It will soon be merged to the master branch as well.

@dumbbell dumbbell closed this Apr 28, 2017
@sodre sodre deleted the patch-1 branch August 7, 2017 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants