Per-vhost supervisors. #1158

hairyhum · 2017-03-23T10:43:28Z

Message stores and queues for a specific vhost are moved to per-vhost supervision trees.
rabbit_vhost_sup_sup is a simple_one_for_one supervisor, containing a supervision tree per vhost.
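For readers unfamiliar with simple_one_for_one, here is a minimal sketch of the shape described above: one supervisor with a single child template, and one dynamically started child per vhost. The module name `vhost_sup_sup_sketch`, the `start_vhost/1` helper, and the placeholder child are hypothetical and only illustrate the pattern; this is not the code from the PR.

```erlang
-module(vhost_sup_sup_sketch).
-behaviour(supervisor).

-export([start_link/0, start_vhost/1, init/1, vhost_placeholder/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one child for the given vhost. With simple_one_for_one the
%% list passed here is appended to the arguments in the child spec.
start_vhost(VHost) when is_binary(VHost) ->
    supervisor:start_child(?MODULE, [VHost]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 0, period => 1},
    ChildSpec = #{id => vhost,
                  start => {?MODULE, vhost_placeholder, []},
                  restart => temporary},
    {ok, {SupFlags, [ChildSpec]}}.

%% Hypothetical stand-in for a vhost's own supervision tree.
vhost_placeholder(_VHost) ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.
```

Calling `vhost_sup_sup_sketch:start_vhost(<<"/">>)` after `start_link/0` adds one child for that vhost; each further call adds an independent child.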
The recovery mechanism was changed to recover a specific vhost when it restarts, so the message store is started with the actual durable queues and the queues are recovered the same way as on rabbit start.
Recovery terms are moved to per-vhost directories.
Migration from 3.5 and 3.6 is supported and was tested.

Because restart logic is per-vhost, a vhost supervisor cannot restart its own children (even using one_for_all), so every vhost supervisor is a child of another "wrapper" supervisor. The wrapper supervisor defines how many times a vhost supervisor can be restarted (currently 1 in 1000 sec).
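A rough sketch of the wrapper idea, under stated assumptions: the inner vhost supervisor uses intensity 0, so it terminates as soon as any child dies, and the wrapper restarts it through a start function that reruns recovery. The module name, `recover/2`, and the placeholder message store child are all hypothetical; only the 1-in-1000-sec limit comes from the description above.

```erlang
-module(vhost_sup_wrapper_sketch).
-behaviour(supervisor).

-export([start_link/1, start_vhost_sup/1, init/1, placeholder/0]).

start_link(VHost) ->
    supervisor:start_link(?MODULE, {wrapper, VHost}).

%% Used by the wrapper for the first start and for every restart,
%% so recovery runs each time the vhost supervisor comes up.
start_vhost_sup(VHost) ->
    {ok, Pid} = supervisor:start_link(?MODULE, {vhost, VHost}),
    ok = recover(VHost, Pid),
    {ok, Pid}.

init({wrapper, VHost}) ->
    %% At most 1 restart of the vhost supervisor in 1000 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 1, period => 1000},
    Child = #{id => vhost_sup,
              start => {?MODULE, start_vhost_sup, [VHost]},
              restart => permanent,
              shutdown => infinity,
              type => supervisor},
    {ok, {SupFlags, [Child]}};
init({vhost, _VHost}) ->
    %% Intensity 0: the first child crash terminates this supervisor,
    %% handing the restart decision to the wrapper.
    SupFlags = #{strategy => one_for_all, intensity => 0, period => 1},
    {ok, {SupFlags, []}}.

%% Hypothetical recovery step: dynamically adds a message store child.
recover(_VHost, VHostSup) ->
    Spec = #{id => msg_store, start => {?MODULE, placeholder, []}},
    {ok, _Pid} = supervisor:start_child(VHostSup, Spec),
    ok.

%% Hypothetical stand-in for a message store process.
placeholder() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

Children added dynamically during recovery are not part of the inner supervisor's static child list, which is why the restart is routed through the wrapper's start function instead of relying on one_for_all.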

When a vhost fails to restart, it either crashes the rabbit application or gives up and stops. This behaviour can be configured using the vhost_restart_strategy environment variable, which can be either stop_rabbit or give_up (stop_rabbit by default).
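As an illustration, the setting might look like this in a classic Erlang-terms config file. This is a sketch that assumes the variable lives in the rabbit application environment and uses the value names from this description; a later commit in this PR ("Rename rabbit.vhost_restart_strategy options") renames the options, so the names in the merged code may differ.

```erlang
%% rabbitmq.config (classic Erlang-terms format)
%% Value names taken from the PR description; they may have been renamed later.
[
  {rabbit, [
    {vhost_restart_strategy, give_up}
  ]}
].
```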

hairyhum · 2017-03-23T12:22:29Z

Upgrade from 3.4 requires old versions of some functions. We should require an upgrade to 3.6 first and then to 3.7.

michaelklishin · 2017-03-23T12:44:52Z

I think that's reasonable.

dcorbacho · 2017-04-12T09:08:47Z

src/rabbit_vhost.erl

+    VHostStubFile = filename:join(VHostDir, ".vhost"),
+    ok = rabbit_file:ensure_dir(VHostStubFile),
+    ok = file:write_file(VHostStubFile, VHost),
+rabbit_log:info("Starting vhost ~p~n", [VHost]),


Indentation is off here, and in the other log calls below.

Per-vhost message stores can be restarted, but queues hold references to the old message stores in their message store client data, and queues also rely on the message store process to report confirms for messages on disk. After a message store restart, queues would not get any confirms and would fail with a badarg error when trying to access the message store with an old client, so queue processes should be restarted together with message stores. Queue processes cannot monitor the message store because of the backing_queue mechanism, so they should be controlled by a supervision tree. One tree will contain the queues supervisor and the message store processes. The per-vhost supervisor will restart if any of its children dies. The per-vhost supervisor restart process will do queue and message store data recovery the same way as the pre-3.7 global message store did, just with VHost as an argument and in a vhost data directory.

The wrapper supervisor makes it possible to make vhosts restartable exactly N times without interfering with each other. Because a vhost should call recovery every time it's restarted, and recovery includes dynamically adding message stores, it's impossible to restart it using one_for_all. So the vhost supervisor will just fail if its child fails, and the vhost supervisor wrapper will restart it with recovery.

queue shutdown. When the msg store has crashed and the supervisor is trying to stop all its siblings, the variable queue termination might crash as the store is not available. This would prevent the supervisor from restarting the vhost, as it stops at the second crash.

If a vhost supervision tree is not active, which can be the result of an error in the message store, refuse connections to this vhost on the node. This is a follow-up to [#140841611] [#1158] [#145106713].

This was referenced Mar 23, 2017

Support per-vhost stop/start API for backing queue behaviour. rabbitmq/rabbitmq-common#188

Merged

Queue supervisor location for per-vhost supervisor rabbitmq/rabbitmq-ct-helpers#5

Merged

hairyhum mentioned this pull request Mar 23, 2017

DO NOT MERGE Pluggable message stores experiment #1142

Closed

dcorbacho reviewed Apr 12, 2017

Daniil Fedotov added 9 commits April 12, 2017 12:13

Queue and msg store supervisor locations

6a95535

Migrating to per-vhost supervisor message store.

256caae

Support reading/saving recovery terms from global storage to per-vhost storages.

Create default vhost after rabbit recovery, so vhosts supervisor will…

dd2a79d

… be started

Move test fixes from unit_inbroker to backing_queue

5c0cf89

Configurable vhost restart strategy

baef781

Ignore pid from supervisor:start_child

68a40af

Remove debug logs

bedbb03

hairyhum force-pushed the rabbitmq-server-1146-full branch from 499fcbd to bedbb03 Compare April 12, 2017 13:10

dcorbacho and others added 10 commits April 21, 2017 11:29

Make rabbit_vhost_sup permanent so it can be restarted once without

d8c66d1

bringing down the whole supervision tree and rabbit app

Use new supervisor2:prep_stop to stop rabbit dependencies on shutdown

dcf15e7

Only for when the top supervisor reaches its max restart intensity, as stopping applications from application:stop/1 will block

Typo

585e43b

Allow 2 restarts per hour for vhost msg stores

69bb7b8

Add vhost_restart_strategy to default app config.

d6fa209

Merge branch 'master' into rabbitmq-server-1146-full

7132664

Conflicts: Makefile

Rename rabbit.vhost_restart_strategy options

f8adbe9

Schema mapping for vhost_restart_strategy

d89a50a

Merge branch 'master' into rabbitmq-server-1146-full

4cdad4b

michaelklishin approved these changes May 9, 2017

michaelklishin merged commit d1eecf2 into master May 9, 2017

michaelklishin mentioned this pull request Jun 30, 2017

Multiple processes tripped up by a non-existent vhost leads to node termination #1280

Closed

hairyhum mentioned this pull request Jul 18, 2017

Close connections when vhost is unavailable (supervision tree is down) #1293

Merged

dumbbell deleted the rabbitmq-server-1146-full branch January 2, 2018 15:15
