-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Per-vhost supervisors. #1158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Per-vhost supervisors. #1158
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Mar 23, 2017
Upgrade from |
I think that's reasonable. |
dcorbacho
reviewed
Apr 12, 2017
src/rabbit_vhost.erl
Outdated
VHostStubFile = filename:join(VHostDir, ".vhost"), | ||
ok = rabbit_file:ensure_dir(VHostStubFile), | ||
ok = file:write_file(VHostStubFile, VHost), | ||
rabbit_log:info("Starting vhost ~p~n", [VHost]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation, also other logs below
Per-vhost message stores can be restarted, but queues contain references for old message stores in message store client data, also queues rely on message store process to report confirms for messages on disk. Because after message store restart queues will not get any confirms and will fail with badarg error trying to access message store with an old client, queue processes should be restarted together with message stores. Queue process cannot monitor message store because of backing_queue mechanism, so they should be controlled by a supervision tree. One tree will contain queues supervisor and message store proecesses. Per-vhost supervisor will restart if any of it's children dies. Per-vhost supervisor restart process will do queue and message store data recovery the same way as pre-3.7 global message store did, just with VHost as an argument and in a vhost data directory.
Support reading/saving recovery terms from global storage to per-vhost storages.
Wrapper supervisor makes it possible to make vhosts restartable exactly N times without interfering with each other. Because vhost should call recovery every time it's restarted, and recovery includes dynamically adding message stores, it's impossible to restart it using one_for_all. So vhost supervisor will just fail if it's child fails and vhost supervisor wrapper will restart it with recovery.
499fcbd
to
bedbb03
Compare
queue shutdown. When the msg store has crashed and the supervisor is trying to stop all its sibilings, the variable queue termination might crash as the store is not available. This would prevent the supervisor to restart the vhost, as it stops at the second crash.
bringing down the whole supervision tree and rabbit app
Only for when the top supervisor reaches its max restart intensity, as stopping applications from application:stop/1 will block
Conflicts: Makefile
michaelklishin
approved these changes
May 9, 2017
hairyhum
pushed a commit
that referenced
this pull request
Jul 18, 2017
If a vhost supervision tree is not active, which can be a result of an error in message store, refuse connections to this vhost on the node. This is a follow-up to [#140841611] [#1158] [#145106713]
hairyhum
pushed a commit
that referenced
this pull request
Jul 19, 2017
If a vhost supervision tree is not active, which can be a result of an error in message store, refuse connections to this vhost on the node. This is a follow-up to [#140841611] [#1158] [#145106713]
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #1146
Message stores and queues for a specific vhost are moved to per-vhost supervision trees.
rabbit_vhost_sup_sup
is asimple_one_for_one
supervisor, containing a supervision tree per vhost.Recovery mechanism was changed to recover a specific vhost when it's restarting. So message store will be started with actual durable queues and the queues will be recovered the same way as on rabbit start.
Recovery terms are moved to per-vhost directories.
Supports migration from 3.6 and 3.5 was tested.
Because restarting logic is per-vhost, a vhost supervisor cannot restart it's children (even using
one_for_all
), so every vhost supervisor is a child of another "wrapper" supervisor. THe wrapper supervisor defines how many times a vhost supervisor can be restarted (currently 1 in 1000 sec).When vhost fails to restart, it will either crash the
rabbit
application, or gives up and stops. This behaviour can be configured usingvhost_restart_strategy
environment variable. The variable can be eitherstop_rabbit
orgive_up
(stop_rabbit
by default).