Support configuring max sync throughput in CMQs #3925
Merged
Proposed Changes
We would like to propose a backward compatible change in order to reduce memory utilization during classic mirrored queue sync.
Specifically, in addition to the existing `mirroring_sync_batch_size` configuration key, which controls the maximum number of messages per sync batch, we propose to add a new configuration key, `mirroring_sync_max_throughput`, in bytes per second, to control the maximum synchronization throughput. The syncer on the primary node will ensure that the throughput of messages being synchronized does not exceed the configured value. This gives mirroring nodes enough time to page the messages being synchronized to disk and free up memory quickly.
The motivation for this is that, in our test with a 3-node cluster on AWS with data volumes on EBS gp2 (maximum disk write throughput of 250MiB/s), we observed high memory utilization during queue sync, leading to a memory alarm. We suspect that when disk write throughput is low, mirroring nodes take time to page messages to disk while the syncer keeps broadcasting messages at a higher rate. As a result, most messages are kept in memory on the mirror(s) while waiting to be paged to disk, which leads to high memory usage.
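As a rough illustration of the throttling idea, here is a minimal Python sketch. It is not the Erlang code in this PR: the class `SyncThrottle`, its `record` method and the injectable clock are hypothetical, and the sketch assumes the syncer can report the byte size of each batch right after broadcasting it. It caps the *average* rate by sleeping until the cumulative bytes sent fit under the limit, and treats a limit of 0 as "disabled", like the proposed default.

```python
import time


class SyncThrottle:
    """Hypothetical throttle for a sync loop (not RabbitMQ's actual code).

    Keeps the average rate of bytes sent at or below max_bytes_per_sec by
    sleeping after a batch that was sent "too early". A limit of 0 disables
    throttling, like the proposed default mirroring_sync_max_throughput = 0.
    """

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.max_rate = max_bytes_per_sec
        self.clock = clock      # injectable so the sketch can be tested
        self.sleep = sleep
        self.start = None       # time at which the first batch was recorded
        self.sent = 0           # total bytes recorded so far

    def record(self, nbytes):
        """Account for a batch of nbytes just broadcast; return seconds slept."""
        if self.max_rate <= 0:
            return 0.0          # throttling disabled
        now = self.clock()
        if self.start is None:
            self.start = now
        self.sent += nbytes
        # Earliest moment at which having sent self.sent bytes respects the cap.
        earliest = self.start + self.sent / self.max_rate
        delay = earliest - now
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0


MIB = 1024 * 1024


class FakeClock:
    """Simulated time that only advances when sleep() is called."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Three batches of 128 x 640KiB messages (80MiB each) under a 150MiB/s cap.
clock = FakeClock()
throttle = SyncThrottle(150 * MIB, clock=clock.now, sleep=clock.sleep)
for _ in range(3):
    throttle.record(80 * MIB)
elapsed = clock.t
print(round(elapsed, 3))
```

In this simulation the 240MiB take 1.6 simulated seconds, i.e. 150MiB/s on average. Note that a limiter like this only bounds the average rate: each batch still goes out as a burst, so the batch size continues to matter.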
With this change, if the max sync throughput is set below the write throughput of the data volumes (i.e., the primary node never broadcasts data faster than the disks can write it), it helps reduce memory utilization on mirroring nodes significantly. The new configuration is disabled by default (`mirroring_sync_max_throughput = 0`) to ensure backward compatibility, i.e., nothing changes for existing/running RabbitMQ clusters. Users have to set the sync throughput explicitly to enable the feature.
Here are samples of memory utilization on the mirroring node and the primary node with different message sizes, from tests in a cluster with EBS gp2 volumes, comparing RabbitMQ 3.8.22 with the sync throughput limit feature:
Message size of 640KiB
RabbitMQ 3.8.22
A test with RabbitMQ 3.8.22 using a batch size of 128 shows that memory consumption on the mirroring node is very high (around the memory alarm level), while memory utilization on the primary node is fine.
Limit sync throughput
We tested this change with `mirroring_sync_max_throughput = 150MiB` and otherwise similar settings; memory consumption is low on both the mirroring node and the primary node.
Message size of 10MiB
RabbitMQ 3.8.22
A test with RabbitMQ 3.8.22 using a batch size of 128 shows that memory utilization is very high on both the mirroring node and the primary node.
Limit sync throughput
We tested with `mirroring_sync_max_throughput = 150MiB`; memory consumption drops significantly on both the mirroring node and the primary node.
We also ran tests with various other settings, such as the default batch size of 4096 and smaller message sizes; they all show that when sync throughput is limited to a level below the disk write throughput of the mirroring nodes, the memory consumed by queue sync processes is significantly lower.
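The test parameters above can be turned into a rough per-batch budget. The following Python sketch (the helper names are hypothetical) computes how many payload bytes one batch of 128 messages carries and the minimum time a 150MiB/s cap spreads that batch over, assuming the cap applies to message payload bytes only and ignoring protocol and framing overhead.

```python
KIB = 1024
MIB = 1024 * KIB


def batch_bytes(batch_size, message_size):
    """Payload bytes in one sync batch (ignores headers and framing overhead)."""
    return batch_size * message_size


def min_seconds_per_batch(batch_size, message_size, max_throughput):
    """Lower bound on time per batch when sync throughput is capped (bytes/s)."""
    return batch_bytes(batch_size, message_size) / max_throughput


limit = 150 * MIB   # mirroring_sync_max_throughput value used in the tests
disk = 250 * MIB    # EBS gp2 maximum write throughput cited above

for label, size in [("640KiB", 640 * KIB), ("10MiB", 10 * MIB)]:
    per_batch = batch_bytes(128, size)
    seconds = min_seconds_per_batch(128, size, limit)
    print(f"{label}: {per_batch / MIB:.0f} MiB per batch, >= {seconds:.2f} s at the cap")

print(f"headroom below disk write throughput: {(disk - limit) / MIB:.0f} MiB/s")
```

With 640KiB messages a batch is 80MiB (at least about 0.53s at the cap); with 10MiB messages it is 1280MiB (at least about 8.53s). The cap limits the rate at which batches go out, not the size of an individual batch, which is still governed by `mirroring_sync_batch_size`.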
Types of Changes
What types of changes does your code introduce to this project?
Put an `x` in the boxes that apply.
Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md document