Skip to content

Commit 08b5aca

Browse files
committed
Remove randomized startup delays
On initial cluster formation, only one node in a multi node cluster should initialize the Mnesia database schema (i.e. form the cluster). To ensure that for nodes starting up in parallel, RabbitMQ peer discovery backends have used either locks or randomized startup delays. Locks work great: When a node holds the lock, it either starts a new blank node (if there is no other node in the cluster), or it joins an existing node. This makes it impossible to have two nodes forming the cluster at the same time. Consul and etcd peer discovery backends use locks. The lock is acquired in the consul and etcd infrastructure, respectively. For other peer discovery backends (classic, DNS, AWS), randomized startup delays were used. They work good enough in most cases. However, in rabbitmq/cluster-operator#662 we observed that in 1% - 10% of the cases (the more nodes or the smaller the randomized startup delay range, the higher the chances), two nodes decide to form the cluster. That's bad since it will end up in a single Erlang cluster, but in two RabbitMQ clusters. Even worse, no obvious alert got triggered or error message logged. To solve this issue, one could increase the randomized startup delay range from e.g. 0m - 1m to 0m - 3m. However, this makes initial cluster formation very slow since it will take up to 3 minutes until every node is ready. In rare cases, we still end up with two nodes forming the cluster. Another way to solve the problem is to name a dedicated node to be the seed node (forming the cluster). This was explored in rabbitmq/cluster-operator#689 and works well. Two minor downsides to this approach are: 1. If the seed node never becomes available, the whole cluster won't be formed (which is okay), and 2. it doesn't integrate with existing dynamic peer discovery backends (e.g. K8s, AWS) since nodes are not yet known at deploy time. In this commit, we take a better approach: We remove randomized startup delays altogether. We replace them with locks. However, instead of implementing our own lock implementation in an external system (e.g. in K8s), we re-use Erlang's locking mechanism global:set_lock/3. global:set_lock/3 has some convenient properties: 1. It accepts a list of nodes to set the lock on. 2. The nodes in that list connect to each other (i.e. create an Erlang cluster). 3. The method is synchronous with a timeout (number of retries). It blocks until the lock becomes available. 4. If a process that holds a lock dies, or the node goes down, the lock held by the process is deleted. The list of nodes passed to global:set_lock/3 corresponds to the nodes the peer discovery backend discovers (lists). Two special cases worth mentioning: 1. That list can be all desired nodes in the cluster (e.g. in classic peer discovery where nodes are known at deploy time) while only a subset of nodes is available. In that case, global:set_lock/3 still sets the lock not blocking until all nodes can be connected to. This is good since nodes might start sequentially (non-parallel). 2. In dynamic peer discovery backends (e.g. K8s, AWS), this list can be just a subset of desired nodes since nodes might not startup in parallel. That's also not a problem as long as the following requirement is met: "The peer disovery backend does not list two disjoint sets of nodes (on different nodes) at the same time." For example, in a 2-node cluster, the peer discovery backend must not list only node 1 on node 1 and only node 2 on node 2. Existing peer discovery backends fullfil that requirement because the resource the nodes are discovered from is global. For example, in K8s, once node 1 is part of the Endpoints object, it will be returned on both node 1 and node 2. Likewise, in AWS, once node 1 started, the described list of instances with a specific tag will include node 1 when the AWS peer discovery backend runs on node 1 or node 2. Removing randomized startup delays also makes cluster formation considerably faster (up to 1 minute faster if that was the upper bound in the range).
1 parent 2a217f0 commit 08b5aca

File tree

17 files changed

+327
-249
lines changed

17 files changed

+327
-249
lines changed

deps/rabbit/priv/schema/rabbit.schema

Lines changed: 21 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -945,31 +945,37 @@ fun(Conf) ->
945945
end}.
946946

947947
%% Cluster formation: Randomized startup delay
948+
%%
949+
%% DEPRECATED: This is a no-op. Old configs are still allowed, but a warning will be printed.
948950

949-
{mapping, "cluster_formation.randomized_startup_delay_range.min", "rabbit.cluster_formation.randomized_startup_delay_range",
950-
[{datatype, integer}]}.
951-
{mapping, "cluster_formation.randomized_startup_delay_range.max", "rabbit.cluster_formation.randomized_startup_delay_range",
952-
[{datatype, integer}]}.
951+
{mapping, "cluster_formation.randomized_startup_delay_range.min", "rabbit.cluster_formation.randomized_startup_delay_range", []}.
952+
{mapping, "cluster_formation.randomized_startup_delay_range.max", "rabbit.cluster_formation.randomized_startup_delay_range", []}.
953953

954954
{translation, "rabbit.cluster_formation.randomized_startup_delay_range",
955955
fun(Conf) ->
956956
Min = cuttlefish:conf_get("cluster_formation.randomized_startup_delay_range.min", Conf, undefined),
957957
Max = cuttlefish:conf_get("cluster_formation.randomized_startup_delay_range.max", Conf, undefined),
958958

959959
case {Min, Max} of
960-
{undefined, undefined} ->
961-
cuttlefish:unset();
962-
{undefined, Max} ->
963-
%% fallback default
964-
{5, Max};
965-
{Min, undefined} ->
966-
%% fallback default
967-
{Min, 60};
968-
{Min, Max} ->
969-
{Min, Max}
970-
end
960+
{undefined, undefined} ->
961+
ok;
962+
_ ->
963+
cuttlefish:warn("cluster_formation.randomized_startup_delay_range.min and "
964+
"cluster_formation.randomized_startup_delay_range.max are deprecated")
965+
end,
966+
cuttlefish:unset()
971967
end}.
972968

969+
%% Cluster formation: lock acquisition retries as passed to https://erlang.org/doc/man/global.html#set_lock-3
970+
%%
971+
%% Currently used in classic, k8s, and aws peer discovery backends.
972+
973+
{mapping, "cluster_formation.erlang_lock_retries", "rabbit.cluster_formation.erlang_lock_retries",
974+
[
975+
{datatype, integer},
976+
{validators, ["non_zero_positive_integer"]}
977+
]}.
978+
973979
%% Cluster formation: discovery failure retries
974980

975981
{mapping, "cluster_formation.lock_retry_limit", "rabbit.cluster_formation.lock_retry_limit",

deps/rabbit/src/rabbit_mnesia.erl

Lines changed: 6 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -105,21 +105,18 @@ init_with_lock(Retries, Timeout, RunPeerDiscovery) ->
105105
rabbit_log:debug("rabbit_peer_discovery:lock returned ~p", [LockResult]),
106106
case LockResult of
107107
not_supported ->
108-
rabbit_log:info("Peer discovery backend does not support locking, falling back to randomized delay"),
109-
%% See rabbitmq/rabbitmq-server#1202 for details.
110-
rabbit_peer_discovery:maybe_inject_randomized_delay(),
111108
RunPeerDiscovery(),
112109
rabbit_peer_discovery:maybe_register();
113-
{error, _Reason} ->
114-
timer:sleep(Timeout),
115-
init_with_lock(Retries - 1, Timeout, RunPeerDiscovery);
116110
{ok, Data} ->
117111
try
118112
RunPeerDiscovery(),
119113
rabbit_peer_discovery:maybe_register()
120114
after
121115
rabbit_peer_discovery:unlock(Data)
122-
end
116+
end;
117+
{error, _Reason} ->
118+
timer:sleep(Timeout),
119+
init_with_lock(Retries - 1, Timeout, RunPeerDiscovery)
123120
end.
124121

125122
-spec run_peer_discovery() -> ok | {[node()], node_type()}.
@@ -178,7 +175,7 @@ join_discovered_peers(TryNodes, NodeType) ->
178175
join_discovered_peers_with_retries(TryNodes, NodeType, RetriesLeft, DelayInterval).
179176

180177
join_discovered_peers_with_retries(TryNodes, _NodeType, 0, _DelayInterval) ->
181-
rabbit_log:warning(
178+
rabbit_log:info(
182179
"Could not successfully contact any node of: ~s (as in Erlang distribution). "
183180
"Starting as a blank standalone node...",
184181
[string:join(lists:map(fun atom_to_list/1, TryNodes), ",")]),
@@ -193,7 +190,7 @@ join_discovered_peers_with_retries(TryNodes, NodeType, RetriesLeft, DelayInterva
193190
rabbit_node_monitor:notify_joined_cluster();
194191
none ->
195192
RetriesLeft1 = RetriesLeft - 1,
196-
rabbit_log:error("Trying to join discovered peers failed. Will retry after a delay of ~b ms, ~b retries left...",
193+
rabbit_log:info("Trying to join discovered peers failed. Will retry after a delay of ~b ms, ~b retries left...",
197194
[DelayInterval, RetriesLeft1]),
198195
timer:sleep(DelayInterval),
199196
join_discovered_peers_with_retries(TryNodes, NodeType, RetriesLeft1, DelayInterval)

deps/rabbit/src/rabbit_peer_discovery.erl

Lines changed: 1 addition & 60 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,7 @@
1414
-export([maybe_init/0, discover_cluster_nodes/0, backend/0, node_type/0,
1515
normalize/1, format_discovered_nodes/1, log_configured_backend/0,
1616
register/0, unregister/0, maybe_register/0, maybe_unregister/0,
17-
maybe_inject_randomized_delay/0, lock/0, unlock/1,
18-
discovery_retries/0]).
17+
lock/0, unlock/1, discovery_retries/0]).
1918
-export([append_node_prefix/1, node_prefix/0, locking_retry_timeout/0,
2019
lock_acquisition_failure_mode/0]).
2120

@@ -28,9 +27,6 @@
2827
%% default node prefix to attach to discovered hostnames
2928
-define(DEFAULT_PREFIX, "rabbit").
3029

31-
%% default randomized delay range, in seconds
32-
-define(DEFAULT_STARTUP_RANDOMIZED_DELAY, {5, 60}).
33-
3430
%% default discovery retries and interval.
3531
-define(DEFAULT_DISCOVERY_RETRY_COUNT, 10).
3632
-define(DEFAULT_DISCOVERY_RETRY_INTERVAL_MS, 500).
@@ -159,61 +155,6 @@ discovery_retries() ->
159155
{?DEFAULT_DISCOVERY_RETRY_COUNT, ?DEFAULT_DISCOVERY_RETRY_INTERVAL_MS}
160156
end.
161157

162-
163-
-spec maybe_inject_randomized_delay() -> ok.
164-
maybe_inject_randomized_delay() ->
165-
Backend = backend(),
166-
case Backend:supports_registration() of
167-
true ->
168-
rabbit_log:info("Peer discovery backend ~s supports registration.", [Backend]),
169-
inject_randomized_delay();
170-
false ->
171-
rabbit_log:info("Peer discovery backend ~s does not support registration, skipping randomized startup delay.", [Backend]),
172-
ok
173-
end.
174-
175-
-spec inject_randomized_delay() -> ok.
176-
177-
inject_randomized_delay() ->
178-
{Min, Max} = randomized_delay_range_in_ms(),
179-
case {Min, Max} of
180-
%% When the max value is set to 0, consider the delay to be disabled.
181-
%% In addition, `rand:uniform/1` will fail with a "no function clause"
182-
%% when the argument is 0.
183-
{_, 0} ->
184-
rabbit_log:info("Randomized delay range's upper bound is set to 0. Considering it disabled."),
185-
ok;
186-
{_, N} when is_number(N) ->
187-
rand:seed(exsplus),
188-
RandomVal = rand:uniform(round(N)),
189-
rabbit_log:debug("Randomized startup delay: configured range is from ~p to ~p milliseconds, PRNG pick: ~p...",
190-
[Min, Max, RandomVal]),
191-
Effective = case RandomVal < Min of
192-
true -> Min;
193-
false -> RandomVal
194-
end,
195-
rabbit_log:info("Will wait for ~p milliseconds before proceeding with registration...", [Effective]),
196-
timer:sleep(Effective),
197-
ok
198-
end.
199-
200-
-spec randomized_delay_range_in_ms() -> {integer(), integer()}.
201-
202-
randomized_delay_range_in_ms() ->
203-
Backend = backend(),
204-
Default = case erlang:function_exported(Backend, randomized_startup_delay_range, 0) of
205-
true -> Backend:randomized_startup_delay_range();
206-
false -> ?DEFAULT_STARTUP_RANDOMIZED_DELAY
207-
end,
208-
{Min, Max} = case application:get_env(rabbit, cluster_formation) of
209-
{ok, Proplist} ->
210-
proplists:get_value(randomized_startup_delay_range, Proplist, Default);
211-
undefined ->
212-
Default
213-
end,
214-
{Min * 1000, Max * 1000}.
215-
216-
217158
-spec register() -> ok.
218159

219160
register() ->

deps/rabbit/src/rabbit_peer_discovery_classic_config.erl

Lines changed: 42 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,8 @@
1313
-export([list_nodes/0, supports_registration/0, register/0, unregister/0,
1414
post_registration/0, lock/1, unlock/1]).
1515

16+
%% TODO ansd: see todo below
17+
-define(DEFAULT_LOCK_RETRIES, 80).
1618
%%
1719
%% API
1820
%%
@@ -26,12 +28,46 @@ list_nodes() ->
2628
Nodes when is_list(Nodes) -> {ok, {Nodes, disc}}
2729
end.
2830

31+
-spec lock(Node :: atom()) -> {ok, {ResourceId :: term(), LockRequesterId :: term()}} | {error, Reason :: string()}.
32+
33+
lock(Node) ->
34+
%% TODO ansd: instead of below lines, ideally we reuse functions rabbit_peer_discovery_util:lock_id/1 and
35+
%% rabbit_peer_discovery_util:lock_retries/0
36+
%% When using these two functions, executing tests failed since node couldn't boot: undef rabbit_peer_discovery_util:lock_id/1
37+
%% Can the rabbit app depend on the rabbit_peer_discovery_common_app?
38+
%% If not, should we move these two functions to some other place (e.g. rabbit_common)?
39+
LockId = {rabbit_nodes:cookie_hash(), Node},
40+
Retries = case application:get_env(rabbit, cluster_formation) of
41+
{ok, PropList} ->
42+
proplists:get_value(erlang_lock_retries, PropList, ?DEFAULT_LOCK_RETRIES);
43+
undefined ->
44+
?DEFAULT_LOCK_RETRIES
45+
end,
46+
47+
Nodes = node_list(),
48+
case lists:member(Node, Nodes) of
49+
false when Nodes =/= [] ->
50+
rabbit_log:warning("Local node ~s is not part of configured nodes ~p. "
51+
"This might lead to incorrect cluster formation.", [Node, Nodes]);
52+
_ -> ok
53+
end,
54+
case global:set_lock(LockId, Nodes, Retries) of
55+
true ->
56+
{ok, LockId};
57+
false ->
58+
{error, io_lib:format("Acquiring lock taking too long, bailing out after ~b retries", [Retries])}
59+
end.
60+
61+
-spec unlock(Data :: term()) -> ok.
62+
63+
unlock(LockId) ->
64+
global:del_lock(LockId, node_list()),
65+
ok.
66+
2967
-spec supports_registration() -> boolean().
3068

3169
supports_registration() ->
32-
%% If we don't have any nodes configured, skip randomized delay and similar operations
33-
%% as we don't want to delay startup for no reason. MK.
34-
has_any_peer_nodes_configured().
70+
false.
3571

3672
-spec register() -> ok.
3773

@@ -48,28 +84,14 @@ unregister() ->
4884
post_registration() ->
4985
ok.
5086

51-
-spec lock(Node :: atom()) -> not_supported.
52-
53-
lock(_Node) ->
54-
not_supported.
55-
56-
-spec unlock(Data :: term()) -> ok.
57-
58-
unlock(_Data) ->
59-
ok.
60-
6187
%%
6288
%% Helpers
6389
%%
6490

65-
has_any_peer_nodes_configured() ->
91+
node_list() ->
6692
case application:get_env(rabbit, cluster_nodes, []) of
67-
{[], _NodeType} ->
68-
false;
6993
{Nodes, _NodeType} when is_list(Nodes) ->
70-
true;
71-
[] ->
72-
false;
94+
Nodes;
7395
Nodes when is_list(Nodes) ->
74-
true
96+
Nodes
7597
end.

deps/rabbit/test/config_schema_SUITE_data/rabbit.snippets

Lines changed: 8 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -540,23 +540,22 @@ tcp_listen_options.exit_on_close = false",
540540
{cluster_formation_randomized_startup_delay_both_values,
541541
"cluster_formation.randomized_startup_delay_range.min = 10
542542
cluster_formation.randomized_startup_delay_range.max = 30",
543-
[{rabbit, [{cluster_formation, [
544-
{randomized_startup_delay_range, {10, 30}}
545-
]}]}],
543+
[],
546544
[]},
547545

548546
{cluster_formation_randomized_startup_delay_min_only,
549547
"cluster_formation.randomized_startup_delay_range.min = 10",
550-
[{rabbit, [{cluster_formation, [
551-
{randomized_startup_delay_range, {10, 60}}
552-
]}]}],
548+
[],
553549
[]},
554550

555551
{cluster_formation_randomized_startup_delay_max_only,
556552
"cluster_formation.randomized_startup_delay_range.max = 30",
557-
[{rabbit, [{cluster_formation, [
558-
{randomized_startup_delay_range, {5, 30}}
559-
]}]}],
553+
[],
554+
[]},
555+
556+
{cluster_formation_erlang_lock_retries,
557+
"cluster_formation.erlang_lock_retries = 10",
558+
[{rabbit,[{cluster_formation,[{erlang_lock_retries,10}]}]}],
560559
[]},
561560

562561
{cluster_formation_dns,

0 commit comments

Comments
 (0)