
Support configure max sync throughput in CMQs #3925


Merged
6 commits merged on Dec 27, 2021

Conversation

@thuandb (Contributor) commented Dec 21, 2021

Proposed Changes

We would like to propose a backward-compatible change to reduce memory utilization during classic mirrored queue sync.

Specifically, in addition to the existing mirroring_sync_batch_size configuration key, which controls the maximum number of messages per sync batch, we propose adding a new configuration key, mirroring_sync_max_throughput, expressed in bytes per second, to control the maximum synchronization throughput. The syncer on the primary node will ensure that the throughput of messages being synchronized does not exceed the configured value. This gives mirroring nodes sufficient time to page the messages being synchronized to disk and quickly free up memory.
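For illustration only, here is a minimal sketch (not the code in this PR) of how such pacing could work: after broadcasting a batch, the syncer sleeps long enough that the average rate stays at or below the configured limit, with 0 meaning "no limit". The module and function names are made up for this example.

    -module(sync_throttle_sketch).
    -export([maybe_throttle/2]).

    %% Illustrative only: pause after broadcasting a batch of BatchBytes bytes
    %% so that the average sync rate stays at or below MaxBytesPerSec.
    %% A limit of 0 means "disabled" (no pause), matching the proposed default.
    maybe_throttle(_BatchBytes, 0) ->
        ok;
    maybe_throttle(BatchBytes, MaxBytesPerSec) when MaxBytesPerSec > 0 ->
        %% time (in milliseconds) this batch is "allowed" to take at the cap
        DelayMs = (BatchBytes * 1000) div MaxBytesPerSec,
        timer:sleep(DelayMs).

A real limiter would also account for the time the broadcast itself takes, but the idea is the same: the primary paces itself instead of pushing batches as fast as it can.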

The motivation for this is that, in our tests with a 3-node cluster on AWS with data volumes using EBS gp2 (maximum disk write throughput of 250 MiB/s), we observed that memory utilization during queue sync is high, leading to a memory alarm. We suspect that when disk write throughput is low, mirroring nodes take time to page messages to disk while the syncer keeps broadcasting messages at a higher rate. As a result, most messages are kept in memory on the mirror(s) waiting to be paged to disk, leading to high memory usage.

With this change, if the max sync throughput is set to a value lower than the write throughput of the data volumes (i.e., the primary node never broadcasts data faster than the disk write speed), memory utilization at the mirroring nodes is reduced significantly. The new configuration will be disabled by default (mirroring_sync_max_throughput = 0) to ensure backward compatibility, i.e., nothing changes for existing/running RabbitMQ clusters. Users will have to set the sync throughput explicitly to enable the feature.
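For example, the limit might be set in rabbitmq.conf roughly as follows (the exact value syntax and units are an assumption based on the values used in our tests, not final documentation):

    # cap classic mirrored queue sync traffic at roughly 150 MiB/s
    mirroring_sync_max_throughput = 150MiB

    # the existing per-batch message cap, unchanged by this proposal
    mirroring_sync_batch_size = 128

    # leaving the key unset, or setting it to 0, keeps the limit disabled
    # mirroring_sync_max_throughput = 0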

Below are samples of memory utilization at the mirroring node and the primary node for different message sizes, comparing tests on RabbitMQ 3.8.22 in a cluster with EBS gp2 volumes against the same setup with the sync throughput limit in place:

Message size of 640KiB

RabbitMQ 3.8.22

A test with RabbitMQ 3.8.22 using a batch size of 128 shows that memory consumption at the mirroring node is very high (around the memory alarm level); memory utilization at the primary node, however, is good.

  • Memory consumption during queue sync at the mirroring node is high (screenshot).

  • Memory consumption during queue sync at the primary node looks good (screenshot).

Limit sync throughput

We tested this change with mirroring_sync_max_throughput = 150MiB in a similar setup; memory consumption is low at both the mirroring node and the primary node.

  • Memory consumption during queue sync at the mirroring node (screenshot).

  • Memory consumption during queue sync at the primary node is low too (screenshot).

Message size of 10MiB

RabbitMQ 3.8.22

A test with RabbitMQ 3.8.22 using a batch size of 128 shows that memory utilization is very high at both the mirroring node and the primary node.

  • Memory consumption during queue sync at the mirroring node is high (screenshot).

  • Memory consumption during queue sync at the primary node is high too (screenshot).

Limit sync throughput

We tested with mirroring_sync_max_throughput = 150MiB; memory consumption drops significantly at both the mirroring node and the primary node.

  • Memory consumption during queue sync at the mirroring node dropped significantly (screenshot).

  • Memory consumption during queue sync at the primary node is low as well (screenshot).

We also conducted tests with various settings, such as the default batch size of 4096 and smaller message sizes; they all show that when sync throughput is limited to below the disk write throughput of the mirroring nodes, the memory consumed by queue sync processes improves significantly.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

  • We will submit a separate PR to update the docs once this change is approved/merged.
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

@michaelklishin (Collaborator) left a comment:


I like how small the solution ended up being.
