Use erlang:system_info(creation) as GUID #3631

mkuratczyk · 2021-11-02T14:33:03Z

The goal of these changes is to prevent a false-positive network partition:
in some cases, in a multi-node cluster, a single node restart could cause
the node monitor to declare a partition and stop the remaining nodes.

The node GUID makes it possible to distinguish between different incarnations of a node.
However, since rabbit may take some time to start (many queues/bindings, etc.),
there could be a significant delay between the Erlang VM being up and
responding to RPC calls and the new GUID being announced. During that
time, the node monitor could incorrectly assume there was a network
partition, while in fact a node was simply restarted. With this change,
as soon as the Erlang VM is up, we can tell whether it was restarted and
avoid false positives.
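
As a rough illustration of the idea (not the code from this PR), the sketch below reads a node's creation value over RPC and compares it with a previously recorded one. The module name `creation_sketch`, the helper names, and the 5000 ms timeout are assumptions made for this example; only `erlang:system_info(creation)` and `rpc:call/5` are standard OTP calls.

```erlang
%% Hypothetical helper module, for illustration only.
-module(creation_sketch).
-export([local_creation/0, remote_creation/1, was_restarted/2]).

%% Creation of the local node. The value changes when the node restarts.
local_creation() ->
    erlang:system_info(creation).

%% Ask Node for its creation over RPC (timeout value is an assumption).
%% Returns {ok, Creation} or {error, Reason}.
remote_creation(Node) ->
    case rpc:call(Node, erlang, system_info, [creation], 5000) of
        {badrpc, Reason} -> {error, Reason};
        Creation when is_integer(Creation) -> {ok, Creation}
    end.

%% Compare a previously recorded creation with the current one.
%% Returns false if unchanged, true if it differs, unknown if the
%% node could not be reached.
was_restarted(Node, KnownCreation) ->
    case remote_creation(Node) of
        {ok, KnownCreation} -> false;
        {ok, _Other} -> true;
        {error, _Reason} -> unknown
    end.
```

Because the creation value is available as soon as the Erlang VM answers RPC calls, a check like this does not have to wait for the application to finish booting.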

Additionally, we now log if any queues were deleted on behalf of the
restarted node. This can take quite a long time if there are many transient
queues (e.g. auto-delete queues). The longer this takes, the higher the
odds of a restarted node being up again by the time
check_partial_partition is called. We may need to reconsider this logic
as well, but for now we just log this activity.

Co-authored-by: Loïc Hoguin [email protected]

michaelklishin · 2021-11-03T09:20:51Z

@Mergifyio rebase

mergify · 2021-11-03T09:21:01Z

rebase

✅ Branch has been successfully rebased

Use erlang:system_info(creation) as GUID (backport #3631)

Use erlang:system_info(creation) as GUID
(cherry picked from commit 6318a7e)
Conflicts: deps/rabbit/src/rabbit_node_monitor.erl

michaelklishin · 2021-11-03T10:37:25Z

Backported to v3.8.x for 3.8.24 manually.

lukebakken

👍

mkuratczyk added backport-v3.9.x labels Nov 2, 2021

HoloRin force-pushed the creation-as-guid branch from 0d9d3d7 to a92a532 November 3, 2021 09:21

michaelklishin merged commit 6318a7e into master Nov 3, 2021

mergify bot mentioned this pull request Nov 3, 2021

Use erlang:system_info(creation) as GUID (backport #3631) #3641

Merged

michaelklishin deleted the creation-as-guid branch November 3, 2021 10:21

michaelklishin added a commit that referenced this pull request Nov 3, 2021

Merge pull request #3641 from rabbitmq/mergify/bp/v3.9.x/pr-3631

4845e3a

Use erlang:system_info(creation) as GUID (backport #3631)

michaelklishin added a commit that referenced this pull request Nov 3, 2021

Merge pull request #3631 from rabbitmq/creation-as-guid

de0d3b1

Use erlang:system_info(creation) as GUID (cherry picked from commit 6318a7e) Conflicts: deps/rabbit/src/rabbit_node_monitor.erl

lukebakken reviewed Nov 3, 2021

michaelklishin added this to the 3.9.9 milestone Nov 9, 2021
