Do not set table-wide and partial locks when deleting bindings. #1589
Conversation
Instead of locking the entire table we can use a custom global lock on the affected resource (source or destination). This can improve performance when multiple records are deleted at the same time, for example when a connection with exclusive queues closes. The resource lock is also acquired when adding or removing a binding, so it won't conflict with bulk removal. Addresses #1566 [#156352963]
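A minimal sketch of the idea, with illustrative names (not the exact code in this change): instead of taking a write lock on the whole routing table, take a global lock keyed by the affected resource, so bulk removals for different resources do not serialize on each other.

```erlang
%% Both calls must run inside an mnesia activity (e.g. a transaction).

%% Before: every bulk removal serializes on the whole routing table.
lock_whole_table() ->
    mnesia:lock({table, rabbit_route}, write).

%% After: lock only the resource (queue or exchange) whose bindings are
%% being removed. The table is only used to obtain the replica node list.
lock_resource(Name) ->
    Nodes = mnesia:table_info(rabbit_route, where_to_write),
    mnesia:lock({global, Name, Nodes}, write).
```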
Force-pushed from 2800d08 to 5cdee15
Dirty deletes are faster and idempotent, which means they can be run in a transaction as long as the resource is locked at the beginning of the transaction, which is done in `lock_resource`. The speed improvement comes from not setting record locks for each record, since we already hold the resource lock. Addresses #1566 [#156352963]
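A hedged sketch of the pattern described above (the table name is real; `route_pattern_for/1` is a hypothetical helper): the resource lock is taken at the start of the transaction, after which per-route removals can be done with dirty operations, which are idempotent and do not set per-record locks.

```erlang
remove_bindings_for(Resource) ->
    mnesia:transaction(
      fun () ->
              %% Serialize with other operations on this resource only.
              lock_resource(Resource),
              %% route_pattern_for/1 is a hypothetical match-pattern builder.
              Routes = mnesia:dirty_match_object(
                         rabbit_route, route_pattern_for(Resource)),
              %% Dirty deletes: removing an already-removed route is a no-op,
              %% so re-running this is safe, and no record locks are taken.
              [mnesia:dirty_delete_object(rabbit_route, R) || R <- Routes],
              ok
      end).
```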
Benchmark data for the change. Collected in 10 manual runs.
I can reproduce the gains, mostly with our primary use case that involves a lot of exclusive queues.
Nice work @hairyhum
%% This works better when there are multiple resources deleted at once, for
%% example when exclusive queues are deleted.
lock_resource(Name) ->
    mnesia:lock({global, Name, mnesia:table_info(rabbit_route, where_to_write)},
                write).
Note that this relies on the resource name being unique; since resource names (e.g. for queues) include the vhost name, that should be good enough. The table name here is only used to find the list of replicas, which will be the same for all tables.
That's awesome! Good job :-)
@michaelklishin Considering locks in mixed-version clusters: in the current code there is no update operation on bindings. Deletes should be idempotent, which leaves the add operation as the only one that can conflict with dirty deletes. In the current code the add operation is performed in a transaction with a resource check (a read from mnesia on both resources), and cleanup is in a transaction with the delete operation on the resource being deleted. This means that the "add" transaction is supposed to be blocked on the deleted resource. In theory the global lock is not needed here, but it is there to avoid leaking abstractions if we change the code in the future.
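For illustration only (function and table names are assumed, not the actual rabbit_binding code): the add path runs in a transaction that reads both resource records as an existence check, so it conflicts with the transaction that deletes either resource record.

```erlang
add_binding(SrcExchangeName, DstQueueName, Route) ->
    mnesia:transaction(
      fun () ->
              %% Reading both resources verifies they still exist and takes
              %% read locks that conflict with the write locks held by the
              %% transaction deleting either resource record.
              [_] = mnesia:read({rabbit_exchange, SrcExchangeName}),
              [_] = mnesia:read({rabbit_queue, DstQueueName}),
              ok = mnesia:write(rabbit_route, Route, write)
      end).
```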
Backported to 3.7.x and will be in 3.7.6.
Follow-up to 5cdee15: dirty_match_object does not provide much performance improvement while it breaks auto-delete exchange cleanup. A transaction with a binding deletion will call auto-delete exchange removal, which will call a cleanup. On this cleanup the deleted binding should not be dirty-deleted again. Follow-up to #1589 [#156352963]
Follow-up to 5cdee15: dirty_match_object does not provide much performance improvement while it breaks auto-delete exchange cleanup. A transaction with a binding deletion will call auto-delete exchange removal, which will call a cleanup. On this cleanup the deleted binding should not be dirty-deleted again. Follow-up to #1589 [#156352963] (cherry picked from commit d0423f9)
Awesome work!
@antoine-galataud there will be no more 3.6.x releases except for security patches, sorry.
As part of working on rabbitmq/rabbitmq-consistent-hash-exchange#37 (rabbitmq/rabbitmq-consistent-hash-exchange#38, rabbitmq/rabbitmq-consistent-hash-exchange#39) we have identified scenarios in which this grouped binding deletion results in non-idempotent transaction retries, which means stateful exchange types can leave some of their state behind without knowing it. We are not yet sure we can make them idempotent without changing the binding table schema (which can only be done for 3.8.0 and beyond), so we will partially or completely revert this optimization 😞 . A lot has been learnt in this PR and rabbitmq/rabbitmq-consistent-hash-exchange#37, though, so it will influence our future approach to binding storage and removal.
Unfortunately we will have to undo the changes that make most of the difference in this PR. They can lead to inconsistent views of the binding tables: transactions that perform removal are not idempotent, and this can lead to situations where a rolled back transaction "restores" all bindings deleted by the exchange type module but its retry doesn't find any bindings to delete, so there are bindings left around. This was discovered during QA of rabbitmq/rabbitmq-consistent-hash-exchange#39 and is catastrophic for that particular exchange type, since it has state that depends on the state of bindings and any inconsistency will break routing.

The least efficient part of queue deletion is binding deletion, because there is currently no way to load the bindings of a queue (or exchange) without a full scan of one of the binding tables. This unfortunate design can be mitigated with secondary indices, but that would be a breaking schema change and therefore can only go into 3.8.0. Per discussion with @dcorbacho @hairyhum @kjnilsson @dumbbell; kudos to @acogoluegnes for reproducing the issue in rabbitmq/rabbitmq-consistent-hash-exchange#39.
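A hedged illustration of the failure mode described above (helper names are hypothetical, not the actual code): dirty writes issued inside a transaction bypass it, so when the transaction aborts and is retried, the transactional work is rolled back but the dirty deletes are not, and the retry no longer sees the bindings the first attempt removed.

```erlang
delete_queue_bindings(QueueName) ->
    mnesia:transaction(
      fun () ->
              %% routes_pattern_for/1 is a hypothetical match-pattern builder.
              Routes = mnesia:dirty_match_object(
                         rabbit_route, routes_pattern_for(QueueName)),
              %% These dirty deletes are NOT rolled back if the surrounding
              %% transaction aborts.
              [mnesia:dirty_delete_object(rabbit_route, R) || R <- Routes],
              %% If anything below aborts and the transaction is retried,
              %% the retry finds no routes, so a stateful exchange type
              %% (hypothetical callback below) sees a different binding set
              %% than the first attempt did.
              notify_exchange_type_of_deletions(Routes)
      end).
```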
Optimizations introduced in #1589 caused the removal of bindings to be non-idempotent, as the removal of routing information with dirty deletes did not allow for full transaction rollback. This commit reverts that change, losing some of the performance improvements in favour of data correctness.
Optimizations introduced in #1589 caused the removal of bindings to be non-idempotent, as the removal of routing information with dirty deletes did not allow for full transaction rollback. This commit reverts that change, losing some of the performance improvements in favour of data correctness. [#160100569]
Optimizations introduced in #1589 caused the removal of bindings to be non-idempotent, as the removal of routing information with dirty deletes did not allow for full transaction rollback. This commit reverts that change, losing some of the performance improvements in favour of data correctness. [#160100569] (cherry picked from commit 495ead3)
Table-wide locks are supposed to improve performance for bulk operations
on a single entity, but they block operations on multiple entities, for example
exclusive queues.
Partial locks have been there since forever without any clear justification,
and I don't see any good reason to keep them. The delete operation will acquire
a lock on the actual record, avoiding partial table locks.
More performance tests are required, in particular for deletion of exchanges/queues with multiple bindings.
Addresses #1566
[#156352963]
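A minimal sketch of what acquiring "a lock for an actual record" could look like, under the assumption that the transactional delete itself takes only a per-record write lock (illustrative, not necessarily the exact code in this change):

```erlang
remove_route(Route) ->
    mnesia:transaction(
      fun () ->
              %% delete_object/3 takes a write lock on this record only;
              %% no {table, rabbit_route} or partial table lock is requested.
              ok = mnesia:delete_object(rabbit_route, Route, write)
      end).
```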