Recover bindings for all durable queues, including those that failed to recover. #1878

Merged
merged 6 commits into from
Feb 17, 2019

Conversation

hairyhum
Contributor

If a queue fails to recover it may still be restarted by the supervisor
and eventually start. After that some bindings may be in rabbit_durable_route
but not rabbit_route. This can cause binding not found errors.

If bindings are recovered for failed queues, the behaviour will be
the same as for crashed queues (which is currently broken
and needs to be fixed separately).
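
The inconsistency described above can be illustrated with a toy model. This is only a sketch: plain Python sets stand in for the `rabbit_durable_route` and `rabbit_route` tables, and `recover_bindings`, its parameters, and the exchange and queue names are hypothetical, not RabbitMQ code.

```python
# Toy model only: Python sets stand in for the rabbit_durable_route and
# rabbit_route tables. None of these names are RabbitMQ APIs.

def recover_bindings(durable_route, recovered_queues, failed_queues, include_failed):
    """Rebuild the in-memory route set from the durable one."""
    queues = set(recovered_queues)
    if include_failed:
        # Behaviour this PR aims for: failed queues get their bindings too.
        queues |= set(failed_queues)
    return {(exchange, queue) for (exchange, queue) in durable_route if queue in queues}


# Hypothetical topology: q1 recovered, q2 failed to recover.
durable_route = {("logs", "q1"), ("logs", "q2")}
recovered, failed = ["q1"], ["q2"]

route_skipping_failed = recover_bindings(durable_route, recovered, failed, include_failed=False)
route_including_failed = recover_bindings(durable_route, recovered, failed, include_failed=True)

# Bindings that exist durably but are missing from the route table.
missing_when_skipping = durable_route - route_skipping_failed
missing_when_including = durable_route - route_including_failed
print(missing_when_skipping)
print(missing_when_including)
```

In this model, skipping the failed queue leaves its binding only in the durable set, which is the state that would surface as a binding not found error once the supervisor restarts the queue.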

Addresses #1873
[#163919158]

This will require a separate PR for 3.7

hairyhum and others added 2 commits February 12, 2019 17:26
@hairyhum
Contributor Author

Separate PR for 3.7

@Ayanda-D
Contributor

ah, solving this weird case https://groups.google.com/forum/#!topic/rabbitmq-users/-X19dq8cYaY thank you! 👍

@michaelklishin
Collaborator

michaelklishin commented Feb 13, 2019

@Ayanda-D we don't yet know and would not claim it is sufficient because there was no reliable way to reproduce any of those reports :(

@Ayanda-D
Contributor

@michaelklishin ok, this was reported to us as well, but we haven't been able to reproduce it either :/ It seems the error handling could be improved as well? When a binding lookup in rabbit_route finds nothing but the binding exists in rabbit_durable_route, then before immediately returning not_found to the channel, maybe attempt a re-sync from rabbit_durable_route to update rabbit_route and rabbit_semi_durable_route? And only return not_found if the binding still doesn't exist after a failed re-sync attempt (if such a case can ever occur)? Because if it's in rabbit_durable_route then surely the binding exists, and returning not_found here doesn't sound right. Just some thoughts. I guess if this PR resolves the issue on recovery, this case may never be hit at runtime. We'll see 😅
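
A minimal sketch of the fallback suggested in this comment, under the assumption that the three tables can be modelled as Python sets. The function names `resync_binding` and `lookup_binding` and the string return values are hypothetical. This is not how RabbitMQ implements binding lookup.

```python
# Toy model: sets stand in for rabbit_route, rabbit_semi_durable_route and
# rabbit_durable_route. All function names here are hypothetical.

def resync_binding(binding, route, semi_durable_route):
    """Copy a durable binding back into the non-durable tables."""
    route.add(binding)
    semi_durable_route.add(binding)


def lookup_binding(binding, route, semi_durable_route, durable_route):
    """Return "ok" if the binding is usable, otherwise "not_found"."""
    if binding in route:
        return "ok"
    if binding in durable_route:
        # Inconsistent state: attempt a re-sync before reporting an error.
        resync_binding(binding, route, semi_durable_route)
    # Only report not_found if the binding is still absent after the attempt.
    return "ok" if binding in route else "not_found"


route, semi_durable_route = set(), set()
durable_route = {("logs", "q2")}

result_durable = lookup_binding(("logs", "q2"), route, semi_durable_route, durable_route)
result_unknown = lookup_binding(("logs", "q3"), route, semi_durable_route, durable_route)
print(result_durable, result_unknown)
```

The design choice sketched here is that `not_found` is returned only when the binding is absent from the durable set as well, so a binding that exists durably is repaired instead of rejected.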

@hairyhum
Contributor Author

@Ayanda-D this PR will fix binding not found errors only if there were failed queues. You can check for that by searching the logs for Queue <...> failed to initialise.
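
One way to run that log search is sketched below. The regular expression is built only from the message quoted above, with the queue name left open. The sample lines are invented placeholders, not real broker output, and the helper name `find_failed_queue_lines` is hypothetical.

```python
import re

# Built from the message quoted above; whatever appears between "Queue" and
# "failed to initialise" is left open because the source elides it.
FAILED_INIT = re.compile(r"Queue .* failed to initialise")


def find_failed_queue_lines(lines):
    """Return the log lines that report a queue failing to initialise."""
    return [line.rstrip("\n") for line in lines if FAILED_INIT.search(line)]


# Invented placeholder lines, not real RabbitMQ log output.
sample = [
    "some unrelated log line\n",
    "Queue <...> failed to initialise\n",
]

matches = find_failed_queue_lines(sample)
print(matches)
```

To scan a real log, pass an open file object instead of `sample`. The log file location depends on the installation and is not stated in this thread.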

The option to allow recovery of bindings also makes sense. I was thinking of letting queue.bind create bindings even if there is an inconsistency, because that's what clients use to recover their topologies.

@Ayanda-D
Contributor

Ok, sounds good 👍


4 participants