Per-vhost supervisors. #1158

Merged. 19 commits were merged on May 9, 2017.

hairyhum (Contributor)

Fixes #1146

Message stores and queues for a specific vhost are moved to per-vhost supervision trees.
rabbit_vhost_sup_sup is a simple_one_for_one supervisor that contains one supervision tree per vhost.
The recovery mechanism was changed to recover a specific vhost whenever it restarts, so the message store is started with the vhost's current durable queues, and those queues are recovered the same way as on rabbit startup.
Recovery terms are moved to per-vhost directories.
Migration from 3.6 and 3.5 is supported and was tested.
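A minimal sketch of what a simple_one_for_one "supervisor of vhost trees" can look like, assuming OTP's map-based child specs. The module names vhost_sup_sup_sketch and vhost_sup_wrapper_sketch are hypothetical and this is not the actual rabbit_vhost_sup_sup code; it only illustrates how one template child spec yields one independent tree per vhost, started on demand.

```erlang
%% Hypothetical sketch, not RabbitMQ code: a simple_one_for_one supervisor
%% that starts one per-vhost wrapper supervisor on demand.
-module(vhost_sup_sup_sketch).
-behaviour(supervisor).

-export([start_link/0, start_vhost/1]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% With simple_one_for_one, the list passed to supervisor:start_child/2 is
%% appended to the template's start args, so this calls
%% vhost_sup_wrapper_sketch:start_link(VHost).
start_vhost(VHost) ->
    supervisor:start_child(?MODULE, [VHost]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 0,
                 period => 1},
    %% One template child spec; every vhost gets its own instance.
    %% Assumption: children are temporary, so this supervisor never restarts
    %% a wrapper and all restart decisions stay with the per-vhost wrapper.
    ChildSpec = #{id => vhost_sup_wrapper,
                  start => {vhost_sup_wrapper_sketch, start_link, []},
                  restart => temporary,
                  shutdown => infinity,
                  type => supervisor},
    {ok, {SupFlags, [ChildSpec]}}.
```

Calling vhost_sup_sup_sketch:start_vhost(<<"/">>) would start one wrapper for the default vhost; each further vhost gets a sibling tree that fails and restarts independently.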

Because the restart logic is per-vhost (every restart must rerun vhost recovery, which adds message stores dynamically), a vhost supervisor cannot restart its children itself (even using one_for_all), so every vhost supervisor is a child of another "wrapper" supervisor. The wrapper supervisor defines how many times a vhost supervisor can be restarted (currently 1 restart in 1000 seconds).
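A minimal sketch of the wrapper's supervisor flags, assuming OTP's map-based supervisor API. The module vhost_sup_wrapper_sketch and its child module vhost_sup_sketch are hypothetical; only the "1 restart in 1000 seconds" budget comes from the description. In OTP, period is measured in seconds, and once more than intensity restarts happen within it, the supervisor terminates its children and then itself.

```erlang
%% Hypothetical sketch, not RabbitMQ code: a wrapper that owns the restart
%% budget of exactly one vhost supervisor.
-module(vhost_sup_wrapper_sketch).
-behaviour(supervisor).

-export([start_link/1]).
-export([init/1]).

%% Values mirror "1 restart in 1000 sec" from the PR description.
-define(MAX_RESTARTS, 1).
-define(RESTART_PERIOD_SECONDS, 1000).

start_link(VHost) ->
    supervisor:start_link(?MODULE, [VHost]).

init([VHost]) ->
    %% At most ?MAX_RESTARTS restarts within ?RESTART_PERIOD_SECONDS; a
    %% further failure inside that window terminates this wrapper instead.
    SupFlags = #{strategy => one_for_one,
                 intensity => ?MAX_RESTARTS,
                 period => ?RESTART_PERIOD_SECONDS},
    %% Every (re)start goes through vhost_sup_sketch:start_link/1, which is
    %% where per-vhost recovery would run (hypothetical module).
    ChildSpec = #{id => vhost_sup,
                  start => {vhost_sup_sketch, start_link, [VHost]},
                  restart => permanent,
                  shutdown => infinity,
                  type => supervisor},
    {ok, {SupFlags, [ChildSpec]}}.
```

Keeping the budget in a dedicated wrapper per vhost means one vhost exhausting its restarts does not consume the restart intensity of any other vhost.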

When a vhost fails to restart, it either crashes the rabbit application or gives up and stops. This behaviour can be configured using the vhost_restart_strategy environment variable, which can be either stop_rabbit or give_up (stop_rabbit by default).
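One way the setting could be read, as a hedged sketch: the key vhost_restart_strategy, its values and its default come from the description above, while the module vhost_restart_strategy_sketch and the choice to treat unknown values as the default are assumptions for illustration.

```erlang
%% Hypothetical sketch: reading the vhost restart strategy from the rabbit
%% application environment.
%%
%% Classic rabbitmq.config example selecting the non-default strategy:
%%   [{rabbit, [{vhost_restart_strategy, give_up}]}].
-module(vhost_restart_strategy_sketch).

-export([strategy/0]).

-spec strategy() -> stop_rabbit | give_up.
strategy() ->
    case application:get_env(rabbit, vhost_restart_strategy, stop_rabbit) of
        give_up -> give_up;
        stop_rabbit -> stop_rabbit;
        %% Assumption: unknown values fall back to the default.
        _Other -> stop_rabbit
    end.
```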

@hairyhum (Contributor, Author)

Upgrading from 3.4 requires old versions of some functions. We should require upgrading to 3.6 first and then to 3.7.

@michaelklishin (Collaborator)

I think that's reasonable.

VHostStubFile = filename:join(VHostDir, ".vhost"),
ok = rabbit_file:ensure_dir(VHostStubFile),
ok = file:write_file(VHostStubFile, VHost),
rabbit_log:info("Starting vhost ~p~n", [VHost]),
Contributor

Indentation is off here, and also in the other log calls below.

Daniil Fedotov added 9 commits April 12, 2017 12:13
Per-vhost message stores can be restarted, but queues hold references to the
old message stores in their message store client data, and queues also rely
on the message store process to report confirms for messages on disk.
Because queues get no confirms after a message store restart, and fail with
a badarg error when they try to access the message store with an old client,
queue processes should be restarted together with message stores.
A queue process cannot monitor the message store because of the backing_queue mechanism,
so queues should be controlled by a supervision tree. One tree will contain the
queue supervisor and the message store processes.
The per-vhost supervisor will restart if any of its children dies.
Restarting the per-vhost supervisor performs queue and message store data recovery
the same way the pre-3.7 global message store did, just with VHost as an argument and
in the vhost data directory.
Support reading/saving recovery terms from global storage to per-vhost storage.
A wrapper supervisor makes it possible to make vhosts restartable
exactly N times without interfering with each other.
Because a vhost should run recovery every time it is restarted,
and recovery includes dynamically adding message stores,
it is impossible to restart it using one_for_all.
So the vhost supervisor just fails if its child fails,
and the vhost supervisor wrapper restarts it with recovery.
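The "fail instead of restart" behaviour can be sketched as a supervisor with intensity 0 whose start function runs recovery on every start. The module vhost_sup_sketch and its recover/2 placeholder are hypothetical; the real recovery (starting message stores with the vhost's durable queues) is not shown.

```erlang
%% Hypothetical sketch, not RabbitMQ code: a vhost supervisor that never
%% restarts its own children and runs recovery every time it is started.
-module(vhost_sup_sketch).
-behaviour(supervisor).

-export([start_link/1]).
-export([init/1]).

start_link(VHost) ->
    case supervisor:start_link(?MODULE, []) of
        {ok, Pid} ->
            %% Recovery runs on every start, including restarts triggered
            %% by the wrapper, and may add children dynamically.
            ok = recover(Pid, VHost),
            {ok, Pid};
        Error ->
            Error
    end.

init([]) ->
    %% intensity 0: the first child failure terminates this supervisor
    %% (it never restarts children itself); the wrapper restarts it.
    SupFlags = #{strategy => one_for_all, intensity => 0, period => 1},
    {ok, {SupFlags, []}}.

%% Hypothetical placeholder for per-vhost recovery, for example starting
%% message stores for the vhost's durable queues via supervisor:start_child/2.
recover(_Pid, _VHost) ->
    ok.
```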
hairyhum force-pushed the rabbitmq-server-1146-full branch from 499fcbd to bedbb03 April 12, 2017 13:10
dcorbacho and others added 10 commits April 21, 2017 11:29
queue shutdown.

When the msg store has crashed and the supervisor is trying to stop
all its siblings, the variable queue termination might crash because the store
is not available. This would prevent the supervisor from restarting the vhost,
as it stops at the second crash.
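One possible mitigation, sketched under assumptions: guard the termination cleanup so that a missing message store is logged instead of raising during the supervisor's shutdown of siblings. The module safe_terminate_sketch and its arguments are hypothetical, and this is not necessarily how the merged commits address it. It uses logger:warning/2, which requires OTP 21 or later.

```erlang
%% Hypothetical sketch: never let a crash in queue termination cleanup
%% propagate during supervisor shutdown.
-module(safe_terminate_sketch).

-export([safe_terminate/2]).

%% Returns CleanupFun's result on success, or ok if it raised.
safe_terminate(QueueName, CleanupFun) ->
    try
        CleanupFun()
    catch
        Class:Reason ->
            %% For example exit:noproc when the message store is gone.
            logger:warning("Queue ~p: termination failed with ~p:~p; ignoring",
                           [QueueName, Class, Reason]),
            ok
    end.
```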
bringing down the whole supervision tree and rabbit app
Only when the top supervisor reaches its max restart intensity,
as stopping applications from application:stop/1 will block
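A hedged sketch of a non-blocking stop: run application:stop/1 in a separate process, so a caller that sits inside the application's supervision tree does not wait on its own shutdown. The module async_stop_sketch is hypothetical; only application:stop/1 and spawn/1 are real OTP calls.

```erlang
%% Hypothetical sketch: stop an application from a fresh, unlinked process.
%% application:stop/1 waits for the application's supervision tree to shut
%% down, so calling it synchronously from a process inside that tree can
%% block; spawning lets the caller return immediately.
-module(async_stop_sketch).

-export([stop_app_async/1]).

stop_app_async(App) ->
    spawn(fun() -> application:stop(App) end).
```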
michaelklishin merged commit d1eecf2 into master May 9, 2017
hairyhum pushed a commit that referenced this pull request Jul 18, 2017
If a vhost supervision tree is not active, which can be
a result of an error in message store, refuse connections
to this vhost on the node.
This is a follow-up to [#140841611] [#1158]

[#145106713]
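The connection-time check described in this follow-up could look roughly like this. The module vhost_check_sketch and the LookupFun argument (standing in for however a node maps a vhost to its local supervisor pid) are hypothetical.

```erlang
%% Hypothetical sketch: refuse a connection to VHost when its supervision
%% tree is not running on this node.
-module(vhost_check_sketch).

-export([check_vhost/2]).

-spec check_vhost(binary(), fun((binary()) -> pid() | undefined)) ->
          ok | {error, {vhost_down, binary()}}.
check_vhost(VHost, LookupFun) ->
    case LookupFun(VHost) of
        Pid when is_pid(Pid) ->
            %% is_process_alive/1 only accepts local pids, which fits a
            %% per-node check.
            case is_process_alive(Pid) of
                true -> ok;
                false -> {error, {vhost_down, VHost}}
            end;
        _ ->
            {error, {vhost_down, VHost}}
    end.
```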
hairyhum pushed a commit that referenced this pull request Jul 19, 2017
dumbbell deleted the rabbitmq-server-1146-full branch January 2, 2018 15:15

3 participants