Skip to content

Do not set table-wide and partial locks when deleting bindings. #1589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 15, 2018

Conversation

hairyhum
Copy link
Contributor

@hairyhum hairyhum commented May 4, 2018

Table-wide locks are supposed to improve performance for bulk oprations
for a single entity, but block operations for multole entities, for example
exclusive queues.
Partial locks were there since forever without any clear justification,
and I don't see any good reasons to keep them. Delete operation will aquire
a lock for an actual record avoiding partial table locks.

More performance tests required. In particular deletion of exchanges/queues with multiple bindings

Addresses #1566
[#156352963]

Instead of locking entire table we can use a custom global lock on
the affected resource (source or destination).
This can improve performance when multiple records deleted at the
same time, for example when connection with exclusive queues closes.
Resource lock also aquired when adding or removing a binding, so it
won't conflict with bulk removal.

Addresses #1566
[#156352963]
@hairyhum hairyhum force-pushed the exclusive-queues-cleanup-optimisation branch from 2800d08 to 5cdee15 Compare May 14, 2018 15:06
hairyhum added 2 commits May 14, 2018 17:49
Dirty deletes are faster and idempotent, which means
that it can be run in transaction as long as it's locked
in the begining of transaction, which is done in
`lock_resource`.
Speed improvement is aquired by not setting record locks
for each record, since we already have a record lock

Addresses #1566
[#156352963]
@hairyhum hairyhum changed the title WIP. Do not set table-wide and partial locks when deleting bindings. Do not set table-wide and partial locks when deleting bindings. May 15, 2018
@hairyhum
Copy link
Contributor Author

hairyhum commented May 15, 2018

Benchmark data for the change. Collected in 10 manual runs.

Topology Action with table locks & non-ditry ops with record locks and dirty ops
100000 exclusive queues Connection close ~53 500 millisec ~22 200 millisec
10000 exchanges, 1 queue Queue delete ~830 millisec ~260 millisec
10000 queues, 1 exchange Exchange delete ~620 millisec ~350 millisec
1 queue, 1 exchange, 10000 bindings Queue delete ~610 millisec ~250 millisec
1 queue, 1000 exchanges, 100 bindings Queue delete ~6 450 millisec ~2 670 millisec
1000 queues, 1 exchange, 100 bindings Exchange delete ~6 670 millisec ~2 450 millisec

@michaelklishin
Copy link
Collaborator

I can reproduce the gains mostly with our primary use case with a lot of exclusive queues.

@lukebakken
Copy link
Collaborator

Nice work @hairyhum

%% This works better when there are multiple resources deleted at once, for
%% example when exclusive queues are deleted.
lock_resource(Name) ->
mnesia:lock({global, Name, mnesia:table_info(rabbit_route, where_to_write)},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this relies on resource name to be unique, which with resource names for e.g. queues (which include vhost name) should be good enough. The table name here is only used to find the list of replicas, which will be the same for all tables.

@dumbbell
Copy link
Collaborator

That's awesome! Good job :-)

@hairyhum
Copy link
Contributor Author

@michaelklishin Considering locks in mixed-version clusters. In the current code there is no update operation on bindings. Deletes should be idempotent which leaves add operation as the only one which can conflict with dirty deletes. In the current code add operation is performed in transaction with a resource check - read from mnesia on both resources, and cleanup is in transaction with the delete operation on one of the resources which is being deleted. This means that the "add" transaction is supposed to be locked on the deleted resource. In theory the global lock is not needed here, but it is there to avoid leaking abstractions if we change the code in future.

@michaelklishin
Copy link
Collaborator

michaelklishin commented May 16, 2018

Backported to 3.7.x and will be in 3.7.6.

@michaelklishin michaelklishin deleted the exclusive-queues-cleanup-optimisation branch May 16, 2018 20:12
hairyhum added a commit that referenced this pull request May 18, 2018
Follow up to 5cdee15
dirty_match_object does not provide much performance improvement
while it's breaking auto-delete exchanges cleanup.
A transaction with a binding deletion will call auto-delete
exchange removal, which will call a cleanup. On this cleanup
the deleted binding should not be dirty-deleted again.

Follow-up to #1589
[#156352963]
michaelklishin pushed a commit that referenced this pull request May 19, 2018
Follow up to 5cdee15
dirty_match_object does not provide much performance improvement
while it's breaking auto-delete exchanges cleanup.
A transaction with a binding deletion will call auto-delete
exchange removal, which will call a cleanup. On this cleanup
the deleted binding should not be dirty-deleted again.

Follow-up to #1589
[#156352963]

(cherry picked from commit d0423f9)
@antoine-galataud
Copy link

Awesome work!
@michaelklishin is backporting in 3.6 still in the roadmap?

@michaelklishin
Copy link
Collaborator

@antoine-galataud there will be no more 3.6.x releases except for security patches, sorry.

@michaelklishin
Copy link
Collaborator

michaelklishin commented Aug 29, 2018

As part of working on rabbitmq/rabbitmq-consistent-hash-exchange#37 (rabbitmq/rabbitmq-consistent-hash-exchange#38, rabbitmq/rabbitmq-consistent-hash-exchange#39) we have identified scenarios in which this grouped binding deletion results in non-idempotent transaction retries, which means stateful exchange types can leave some of their state behind without knowing it.

We are not yet sure we can make them idempotent without changing binding table schema (which can only be done for 3.8.0 and beyond), so we will partially or completely revert this optimization 😞 . A lot has been learnt in this PR and rabbitmq/rabbitmq-consistent-hash-exchange#37, though, so it will influence our future approach to binding storage and removal.

@michaelklishin
Copy link
Collaborator

Unfrotunately we will have to undo the changes that make most of the difference in this PR. They can lead to inconsistent views of the binding tables: transactions that perform removal are not idempotent and this can lead to situations where a rolled back transaction "restores" all bindings deleted by the exchange type module but its retry doesn't find any bindings to delete and so there are bindings left around.

This was discovered during QA of rabbitmq/rabbitmq-consistent-hash-exchange#39 and is catastrophic for that particular exchange type since it has state that depends on the state of bindings and any consistency will break routing.

The least efficient part of queue deletion is binding deletion because there is currently no way to load bindings of a queue (or exchange) without a full scan on one of the binding tables. This unfortunate design can be mitigated with secondary indices but this would be a breaking schema change and therefore can only go into 3.8.0.

Per discussion with @dcorbacho @hairyhum @kjnilsson @dumbbell, kudos to @acogoluegnes for reproducing the issue in rabbitmq/rabbitmq-consistent-hash-exchange#39.

dcorbacho added a commit that referenced this pull request Aug 30, 2018
Optimizations introduced in #1589 caused the removal of bindings to
be non-idempotent, as the removal of routing information with dirty
deletes did not allow for full transaction rollback. This commit
reverts that change, losing some of the performance improvements in
favour of data correctness.
dcorbacho added a commit that referenced this pull request Aug 30, 2018
Optimizations introduced in #1589 caused the removal of bindings to
be non-idempotent, as the removal of routing information with dirty
deletes did not allow for full transaction rollback. This commit
reverts that change, losing some of the performance improvements in
favour of data correctness.

[#160100569]
@dcorbacho dcorbacho mentioned this pull request Aug 30, 2018
11 tasks
michaelklishin pushed a commit that referenced this pull request Aug 31, 2018
Optimizations introduced in #1589 caused the removal of bindings to
be non-idempotent, as the removal of routing information with dirty
deletes did not allow for full transaction rollback. This commit
reverts that change, losing some of the performance improvements in
favour of data correctness.

[#160100569]

(cherry picked from commit 495ead3)
@hairyhum hairyhum mentioned this pull request Sep 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants