You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
rabbit_feature_flags: Fix possible deadlock when calling the Code server (take 2)
[Why]
The background reason for this fix is about the same as the one
explained in the previous version of this fix; see commit
e0a2f10.
This time, the order of events that led to a similar deadlock is the
following:
0. No `rabbit_ff_registry` is loaded yet.
1. Process A, B and C call `rabbit_ff_registry:something()` indirectly
which triggers two initializations in parallel.
* Process A did it from an explicit call to
`rabbit_ff_registry_factory:initialize_factory()` during RabbitMQ
boot.
* Process B and C indirectly called it because they checked if a
feature flag was enabled.
2. Process B acquires the lock first and finishes the initialization. A
new registry is loaded and the old `rabbit_ff_registry` module copy
is marked as "old". At this point, process A and C still reference
that old copy because `rabbit_ff_registry:something()` is up above in
its call stack.
3. Process A acquires the lock, prepares the new registry and tries to
soft-purge the old `rabbit_ff_registry` copy before loading the new
one.
This is where the deadlock happens: process A requests the Code server
to purge the old copy, but the Code server waits for process C to stop
using it.
The difference between the steps described in the first bug fix
attempt's commit and these ones is that the process which lingers on the
deleted `rabbit_ff_registry` (process C above) isn't the one who
acquired the lock; process A has it.
That's why the first bug fix isn't effective in this case: it relied on
the fact that the process which lingers on the deleted
`rabbit_ff_registry` is the process which attempts to purge the module.
[How]
In this commit, we go with a more drastic change. This time, we put a
wrapper in front of `rabbit_ff_registry` called
`rabbit_ff_registry_wrapper`. This wrapper is responsible for doing the
automatic initialization if the loaded registry is the stub module. The
`rabbit_ff_registry` stub now always returns `init_required` instead of
performing the initialization and calling itself recursively.
This way, processes linger on `rabbit_ff_registry_wrapper`, not on
`rabbit_ff_registry`. Thanks to this, the Code server can proceed with
the purge.
See #8112.
0 commit comments