Revert "rabbit_feature_flags: Retry after erpc:call() fails with `noconnection`" #11507

dumbbell · 2024-06-20T09:44:12Z

This reverts commit 8749c60.

Why

The patch was supposed to solve an issue that we didn't understand and that was likely a network/DNS problem outside of RabbitMQ. We know it didn't solve that issue because it was reported again 6 months after the initial pull request (#8411).

What we are sure of, however, is that it significantly increased the time needed to test RabbitMQ, because the code loops for 10+ minutes if the remote node is not running.
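To make the cost concrete, here is a rough sketch of how a retry loop adds up when the remote node is simply not running. It is a Python simulation, not the reverted Erlang code: the attempt count, connection timeout, and delay are hypothetical values chosen only to land on the "10+ minutes" order of magnitude, and `NoConnection`, `call_remote`, and `call_with_retry` are made-up names.

```python
# Illustrative only: the attempt count, timeout, and delay below are
# hypothetical and are NOT taken from the reverted RabbitMQ commit.


class NoConnection(Exception):
    """Stands in for Erlang's {erpc, noconnection} error."""


def call_remote(node_is_up):
    """Pretend remote call: fails whenever the remote node is not running."""
    if not node_is_up:
        raise NoConnection()
    return "ok"


def call_with_retry(node_is_up, attempts, connect_timeout_s, delay_s):
    """Retry on NoConnection; return (result, simulated seconds spent)."""
    elapsed = 0.0
    for _ in range(attempts):
        try:
            return call_remote(node_is_up), elapsed
        except NoConnection:
            # Each failed attempt costs the connection timeout plus the
            # pause before the next try (simulated, no real sleeping).
            elapsed += connect_timeout_s + delay_s
    # Every attempt failed: the caller waited the whole time for nothing.
    return None, elapsed


result, waited = call_with_retry(
    node_is_up=False, attempts=60, connect_timeout_s=7.0, delay_s=3.0
)
print(result, waited)  # None 600.0
```

With these assumed numbers, a node that never comes up costs the caller 600 simulated seconds before the loop gives up, and retrying does nothing to make the node reachable.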

The Feature flags subsystem was not the right place for the retry either. The `noconnection` error is visible there because it runs earlier during RabbitMQ startup. But retrying there won't magically solve a network issue.

There are two ways to create a cluster:

1. peer discovery, in which case that subsystem takes care of retries if necessary and appropriate
2. manually using the CLI, in which case the user is responsible for starting RabbitMQ nodes and clustering them

Let's revert it until the root cause is really understood.

kjnilsson · 2024-06-20T09:55:42Z

Additional observations:

- A clearly wrong command like `rabbitmqctl join_node this-is-not-a-node@argh` takes over a minute to return. This is bad UX.
- It makes an already slow test suite, `clustering_management_SUITE`, take longer than it needs to. Two tests exercise this functionality and each takes over a minute to complete.

dumbbell requested a review from kjnilsson June 20, 2024 09:44

dumbbell self-assigned this Jun 20, 2024

dumbbell force-pushed the revert-retry-after-noconnection-in-feature-flags-subsystem branch from ec41f06 to 61346ee Compare June 20, 2024 10:09

dumbbell force-pushed the revert-retry-after-noconnection-in-feature-flags-subsystem branch from 61346ee to d0c13b4 Compare June 20, 2024 12:30

dumbbell marked this pull request as ready for review June 20, 2024 14:29

kjnilsson approved these changes Jun 20, 2024

dumbbell merged commit 8f1219a into main Jun 20, 2024
251 checks passed

dumbbell deleted the revert-retry-after-noconnection-in-feature-flags-subsystem branch June 20, 2024 14:31

dumbbell added the backport-v3.13.x label Jul 9, 2024

mergify bot mentioned this pull request Jul 9, 2024

Revert "rabbit_feature_flags: Retry after erpc:call() fails with noconnection" (backport #11507) #11646

Merged

dumbbell added a commit that referenced this pull request Jul 10, 2024

Merge pull request #11646 from rabbitmq/mergify/bp/v3.13.x/pr-11507

2af9b09

Revert "rabbit_feature_flags: Retry after erpc:call() fails with `noconnection`" (backport #11507)

michaelklishin mentioned this pull request Sep 12, 2024

Feature flags detection sometimes triggers erpc,noconnection #8346

Closed
