Skip to content

combine queue deletion and binding deletion in one transaction #2190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jshen28
Copy link

@jshen28 jshen28 commented Dec 23, 2019

Proposed Changes

previous, when node down, rabbit_binding:process_deletes and
delete_queues_on_node_down are performed in separate transactions.
this patch tries to merge them into one transaction.

compared with running rabbit_binding:process_deletions in one
big transaction, splitting it up and running with queue deletion
should be more ideal.

Signed-off-by: ushen [email protected]

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

previous, when node down, rabbit_binding:process_deletes and
delete_queues_on_node_down are performed in separate transactions.
this patch tries to merge them into one transaction.

compared with running rabbit_binding:process_deletions in one
big transaction, splitting it up and running with queue deletion
should be more ideal.

Signed-off-by: ushen <[email protected]>
@michaelklishin
Copy link
Collaborator

Thank you but we won't accept this change.

We have previously explained in detail (on the mailing list) why we moved away from a single giant transaction design (#1513) and why we believe that it is a net positive, even though #1721/#1832 potentially makes this less relevant. We also explained that publishers must be ready to recover from unsuccessful publishes, which would address the originally reported issue.

If you can provide evidence (data from benchmarks) that the original contention problem is no longer present with #1721, we would reconsider this. However, a more likely timeframe for that is 4.0 when we'd have a new schema data store.

Other issues related to bulk binding removal: #1675, #1691, #1715.

@jshen28
Copy link
Author

jshen28 commented Dec 24, 2019

Hi, thank you for answering. Could you explain more about "evidence"? what benchmark should I provide? Is it enough to run perftest on this patch and latest release, and compare the result? Do you have any more specific requirements for test setup?

@lukebakken
Copy link
Collaborator

@jshen28 - please read the issues that @michaelklishin linked to. In the discussion, you can see how we analyzed the issue. Basically, it involved deleting very large numbers of queues and bindings at once.

@lukebakken
Copy link
Collaborator

lukebakken commented Dec 27, 2019

@jshen28 here is an example of the kind of test we are looking for.

  • I used this code to create 65536 queues that each have a binding to the gh-2190 exchange. Note that the queues are "auto delete", so each queue and its binding will be deleted when RabbitMQ stops.
  • Then, run a command like this to time how long shutting RabbitMQ down takes:
    /usr/bin/time -o /home/lbakken/issues/rabbitmq/rabbitmq-server/gh-2190/master-shutdown-timed-4.txt ./sbin/rabbitmqctl shutdown --wait --timeout 120000
    

Here is the result of four shutdowns using the master branch of code, and your PR code. You can see that the timing is similar, which is encouraging:

$ cat ~/issues/rabbitmq/rabbitmq-server/gh-2190/master-shutdown-timed-*
       16.43 real         0.50 user         0.05 sys
       21.65 real         0.45 user         0.07 sys
       17.09 real         0.43 user         0.06 sys
       13.20 real         0.45 user         0.06 sys

$ cat ~/issues/rabbitmq/rabbitmq-server/gh-2190/jshen28-clean-in-one-transaction-shutdown-timed-*
       19.29 real         0.41 user         0.07 sys
       16.18 real         0.45 user         0.08 sys
       16.04 real         0.47 user         0.05 sys
       20.04 real         0.40 user         0.10 sys

I will see if I can figure out the additional performance tests that were done. @michaelklishin do you have any test scripts or details available?

@jshen28
Copy link
Author

jshen28 commented Dec 30, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants