Increase classic queue shutdown timeout #3409
A value that is too low will prevent the index from shutting down in time when there are many queues. This leads to the process being killed and on the next RabbitMQ restart a (potentially very long) dirty recovery is needed. The value of 10 minutes was chosen to mirror the shutdown timeout of the message store. Since both queues and message store need to have shut down gracefully in order to have a clean restart it makes sense to use the same value. Related: c40c262
A bit more context: the natural case where this is an issue is 3 nodes, 10k+ classic queues, 100k+ consumers, and other objects such as quorum queues. To reproduce this naturally, the VM simply needs to be doing enough work that there aren't enough resources for the classic queues to shut down within 30 seconds.
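To make the timeout argument concrete, here is a rough toy model of a shutdown deadline. It is not RabbitMQ code: the function name `simulate_shutdown`, the per-queue flush cost of 50 ms, and the 8 schedulers are all assumptions chosen for illustration. Only the 10k queue count and the 30 second and 10 minute timeouts come from this PR. The model assumes queues flush in batches of one per scheduler, and that any queue still busy at the deadline is killed, which is what forces the dirty recovery.

```python
from dataclasses import dataclass


@dataclass
class ShutdownResult:
    flushed: int  # queues that finished shutting down before the deadline
    killed: int   # queues still busy at the deadline (killed, so recovery is dirty)

    @property
    def clean(self) -> bool:
        # A clean restart needs every queue to have shut down gracefully.
        return self.killed == 0


def simulate_shutdown(num_queues: int, flush_ms: int,
                      schedulers: int, timeout_ms: int) -> ShutdownResult:
    """Toy model: `schedulers` queues flush concurrently, each taking `flush_ms`.

    All queues share one deadline measured from the start of the shutdown.
    """
    if num_queues < 0 or flush_ms <= 0 or schedulers <= 0 or timeout_ms < 0:
        raise ValueError("arguments must be positive (num_queues, timeout_ms may be 0)")
    # Number of whole batches that complete before the deadline.
    batches_in_time = timeout_ms // flush_ms
    flushed = min(num_queues, batches_in_time * schedulers)
    return ShutdownResult(flushed=flushed, killed=num_queues - flushed)


# Hypothetical load: 10,000 queues, 50 ms each, 8 schedulers.
old = simulate_shutdown(10_000, flush_ms=50, schedulers=8, timeout_ms=30_000)
new = simulate_shutdown(10_000, flush_ms=50, schedulers=8, timeout_ms=600_000)
print(old, old.clean)  # 4800 flushed, 5200 killed, clean is False
print(new, new.clean)  # 10000 flushed, 0 killed, clean is True
```

Under these made-up numbers the full shutdown needs about 62.5 seconds, so a 30 second limit kills roughly half the queues while a 10 minute limit leaves ample headroom. Real shutdown cost depends on the workload and is not linear like this.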
Without this change the test could take a very long time to clean up the queues and finish, because of a race condition between the queue deletion and the federation link being restarted and declaring the queue again. (The test bidirectional was renamed to message_flow to better represent what it is doing.)
The most recent commit fixes the issue with the |
Consensus is to ship 3.9.6 without this, so this should only be merged after that is released. Also, the release notes will need to be updated.
Backported to |
…eout Increase classic queue shutdown timeout (cherry picked from commit 5fb118e)
Backported to |
To reproduce
You can get a rough reproduction of the issue (before this PR) by using the following patch:
Then start the node with `make run-broker`, shut it down using `ERL_LIBS=deps ./escript/rabbitmqctl stop`, then start it again with `make run-broker`. Check the logs to confirm that RabbitMQ went into dirty recovery.
You will also likely see this in the logs during the shutdown: