Fix a potential race condition between channel recovery and acknowledgement offset #343

Merged: michaelklishin merged 1 commit into rabbitmq:4.4.x-stable on Jan 17, 2018

vikinghawk
Contributor

@vikinghawk vikinghawk commented Jan 12, 2018

Proposed Changes

We encountered recovery errors during a brief network outage with a slow consumer that was under heavy load. When the consumer attempted to acknowledge a delivery, the channel was closed with
(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 0, class-id=60, method-id=80), and messages stopped being delivered.

I was able to write a test app that easily reproduces the issue. With these changes applied, I could no longer reproduce it.

Changes are:

  1. Don't allow sending an ack with tag=0 unless multiple=true. According to the AMQP spec, a delivery tag of 0 is only valid when multiple is set.
  2. Fix a potential timing issue by having the new channel inherit the offset before setting delegate=newChannel.
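
The two changes can be illustrated with a minimal, self-contained sketch. This is not the client's actual code: the class names (SketchChannel, SketchRecoveringChannel), the fields, and the boolean return value of basicAck are all hypothetical and exist only to show the idea. It assumes the recovering wrapper exposes ever-increasing delivery tags to the application by adding an offset to the tags of each new underlying channel.

```java
/** Hypothetical stand-in for a channel that rewrites delivery tags across recoveries. */
class SketchChannel {
    private long maxSeenBrokerTag = 0; // highest tag the broker sent on this channel
    private long offset = 0;           // tags already handed out by earlier channels
    long acksSent = 0;                 // acks that would actually reach the broker
    long lastSentTag = -1;             // broker-side tag of the last ack sent

    /** Must run before this channel becomes visible to application threads. */
    void inheritOffsetFrom(SketchChannel old) {
        offset = old.offset + old.maxSeenBrokerTag;
    }

    /** Maps a broker tag to the ever-increasing tag the application sees. */
    long onDelivery(long brokerTag) {
        maxSeenBrokerTag = Math.max(maxSeenBrokerTag, brokerTag);
        return brokerTag + offset;
    }

    /** Returns true if an ack was sent, false if it was dropped as stale. */
    boolean basicAck(long appTag, boolean multiple) {
        long realTag = appTag - offset;
        // Change 1: tag 0 is only meaningful together with multiple=true.
        if (realTag > 0 || (multiple && realTag == 0)) {
            acksSent++;
            lastSentTag = realTag;
            return true;
        }
        return false; // the tag belongs to a pre-recovery channel
    }
}

/** Hypothetical wrapper that swaps its delegate when the connection recovers. */
class SketchRecoveringChannel {
    private volatile SketchChannel delegate = new SketchChannel();

    void recover() {
        SketchChannel defunct = delegate;
        SketchChannel fresh = new SketchChannel();
        fresh.inheritOffsetFrom(defunct); // Change 2: set the offset first...
        delegate = fresh;                 // ...then publish the new channel.
    }

    long onDelivery(long brokerTag) { return delegate.onDelivery(brokerTag); }

    boolean basicAck(long appTag, boolean multiple) {
        return delegate.basicAck(appTag, multiple);
    }

    SketchChannel current() { return delegate; }
}
```

In this sketch, after deliveries 1 and 2 followed by a recovery, acking application tag 2 with multiple=false maps to tag 0 on the new channel and is dropped, while the first post-recovery delivery is exposed as tag 3 and is acknowledged to the broker as tag 1. If the delegate were published before the offset was inherited, a concurrent ack could briefly be translated with an offset of 0 and reach the broker with a tag it does not know.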

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation (correction or otherwise)
  • Cosmetics (whitespace, appearance)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

If this is a relatively large or complex change, kick off the discussion by
explaining why you chose the solution you did and what alternatives you
considered, etc.


@acogoluegnes acogoluegnes added this to the 4.4.3 milestone Jan 12, 2018
@acogoluegnes acogoluegnes self-assigned this Jan 12, 2018
@michaelklishin
Contributor

Thank you. The latter is likely the root cause here but I have no problem with the former. @acogoluegnes WDYT?

@michaelklishin michaelklishin changed the title from "fix PRECONDITION_FAILED error after channel recovery" to "Fix a potential race condition between channel recovery and acknowledgement offset" Jan 12, 2018
@acogoluegnes
Contributor

LGTM, will merge it in 4.4.x-stable once 4.4.2 is released.

@michaelklishin michaelklishin merged commit 5fcd92e into rabbitmq:4.4.x-stable Jan 17, 2018
@michaelklishin
Contributor

@vikinghawk thanks again for your excellent contributions!

@vikinghawk vikinghawk mentioned this pull request Jan 18, 2018