
Set segment_entry_count per vhost and use a better default #2954


Merged
michaelklishin merged 1 commit into master from new-segment-entry-count-default on May 26, 2021

Conversation


@lhoguin lhoguin commented Apr 7, 2021

Proposed Changes

Lower values of queue_index_segment_entry_count have been shown to
potentially improve queue performance. This PR allows changing the
default to a better value without breaking existing installations.
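
For reference, assuming the value is exposed through the rabbit application environment under the key named above (as the remote-shell testing later in this thread suggests), an advanced.config override might look like the sketch below; this is an illustration, not code taken from the PR.

    [
      {rabbit, [
        %% assumed key name, taken from the PR description; 2048 is the
        %% value this PR eventually settles on as the default
        {queue_index_segment_entry_count, 2048}
      ]}
    ].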

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Further Comments

At this point I have not tested it nor have I confirmed it works when upgrading.

The new default needs to be figured out by testing with the values 512, 1024, 2048, 4096, 8192 and 16384. Since @gerhard already has a setup comparing 1024 and 16384, I think we should just extend the testing by adding more clusters for comparison.

@lhoguin lhoguin force-pushed the new-segment-entry-count-default branch from 66047f2 to 9ed851d on April 8, 2021 12:52
@@ -377,6 +410,10 @@ msg_store_dir_base() ->
    Dir = rabbit_mnesia:dir(),
    filename:join([Dir, "msg_stores", "vhosts"]).

config_file_path(VHost) ->
    VHostDir = msg_store_dir_path(VHost),
    filename:join(VHostDir, ".config").
Collaborator

We should avoid .config as it might generate confusion with advanced.config and such. How about .meta since all we store here is metadata?

Or did you intentionally want to emphasize that this is a file:consult/1-compatible file?

Contributor Author

I do want to show it's a consult file, but I agree that .config is not good; it's not something we want the user to ever modify. On that note, I think I will add a comment inside the file saying it's auto-generated.

Contributor Author

I have left .config and added a comment at the top of the file instead. I have noticed that we already have some .config files, for example https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_node_monitor.erl#L73-L74

The .config for the queue is deep within the mnesia directory, so it is unlikely that users will stumble upon it. But if that's not good enough we can still change it; I'm not sure what to change it to, though.
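
Below is a minimal sketch (not the PR's actual code) of how such a file:consult/1-compatible per-vhost file could be written with an auto-generated notice at the top and read back. The names store_vhost_config/2 and read_vhost_config/1 are hypothetical; config_file_path/1 is the function from the diff above.

    %% Write the per-vhost metadata as Erlang terms, preceded by a comment
    %% warning that the file is auto-generated and should not be edited.
    store_vhost_config(VHost, Terms) ->
        Path = config_file_path(VHost),
        Data = ["%% This file is auto-generated. Do not edit.\n" |
                [io_lib:format("~p.~n", [Term]) || Term <- Terms]],
        ok = file:write_file(Path, Data).

    %% Read the terms back; a missing file simply means no per-vhost
    %% value has been stored yet.
    read_vhost_config(VHost) ->
        Path = config_file_path(VHost),
        case file:consult(Path) of
            {ok, Terms}     -> Terms;
            {error, enoent} -> []
        end.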


lhoguin commented Apr 29, 2021

Thanks to @gerhard, I have been able to run scenarios that produce interesting results for different values, based on a scenario reported by customers earlier this year. The scenario is 50k queues spread over 3 nodes, where published messages fan out across those 50k queues. PerfTest uses the following arguments:

             --predeclared
             --exchange 1001
             --routing-key 1001-1
             --flag persistent
             --producers 1
             --rate 1
             --size 4096
             --confirm 1
             --confirm-timeout 60
             --pmessages 10000
             --consumers 0 

Messages are NOT embedded in the index.

I have tested with queue_index_segment_entry_count values of 16384 (what RabbitMQ master uses), 8192, 4096, 2048, 1024 and 512. Between runs, I stopped the filling job, did a stop_app, deleted the queue data, made the node exit normally with q() so that the pod got recreated (I couldn't get start_app to work; it gave me a timeout), set the queue_index_segment_entry_count value in a remote shell and then, once the nodes recovered, started the filling job again.
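
The exact remote-shell commands are not recorded in this PR; a minimal sketch of setting the value on a node between runs, assuming it is read from the rabbit application environment, could be (2048 being just one of the tested values):

    %% run in a shell attached to the node, e.g. via
    %% `rabbitmq-diagnostics remote_shell` or wrapped in `rabbitmqctl eval`
    application:set_env(rabbit, queue_index_segment_entry_count, 2048).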

I have captured 3 interesting graphs. In all cases the queues get filled in about 2h10m (likely as a result of --rate 1). In all cases, once the queues are full, the effective publishing rate decreases by the same amount. The only difference in behavior is in the second graph, which shows memory before publishers are blocked.

queue_index_segment_entry_count 16384 (what RabbitMQ master uses)

Things go bad. One of the nodes behaves quite differently while filling up, using a lot more memory. Once the queues are full, the nodes all use significantly more memory and block the publishers at regular intervals. We can see a lot of large spikes in memory usage, up and down.

[graph: qisec_16384]

queue_index_segment_entry_count 8192

The memory stabilizes around 9.3GiB before publishers are blocked. We can still see medium-sized spikes, indicating that a lot of memory allocation and deallocation is likely going on due to how the index works.

[graph: qisec_8192]

queue_index_segment_entry_count 4096

Still around 9.3GiB. Similar scenario but with smaller spikes.

[graph: qisec_4096]

queue_index_segment_entry_count 2048

Memory ends up around 11.5GiB for two nodes, 10.5GiB for the other (the one that receives messages). There are very few spikes.

This scenario looks best, but it was also stopped perhaps a bit early. Over time I would expect the memory graphs to go down a little (as usage increases), just like in the 1024 and 512 scenarios.

[graph: qisec_2048]

queue_index_segment_entry_count 1024

Very similar to the 2048 scenario, perhaps using just a little more memory: I'd say around 11.4GiB and 10.4GiB respectively, where the previous scenario was at 11.5GiB and 10.5GiB. But it's hard to be very confident about this.

On the other hand, because messages are not embedded in the index this time, perhaps we could see a bigger difference if we did embed the messages.

[graph: qisec_1024]

queue_index_segment_entry_count 512

Very similar to scenarios 2048 and 1024. Around the same memory usage. Again, hard to be super confident.

[graph: qisec_512]

Conclusion

We should probably do test runs of 2048, 1024 and 512 with messages embedded in the index, and see what results we get. The winner will be one of those three.


lhoguin commented Apr 30, 2021

With messages embedded in the index, the results are inconclusive. As you can see, in all cases the memory slowly goes down until there is no more memory available. Then publishers are blocked, then publishers are started again, then memory goes down faster until there is none left. This repeats until the end of the experiment.

The memory usage after publishers are stopped seems to be directly related to the number of messages. Because in this scenario we do not end up with many segments (2 at most, even for the 512 scenario), we can easily see the impact that segments have on memory even at rest. Perhaps this is also why it is difficult to notice any difference between the three scenarios.

This is with message sizes of 4000 bytes so that the messages get embedded into the index. Because messages end up being duplicated into each queue index, disk alarms are triggered at around 45.1 million messages, which is where the experiment ends for 1024 and 512.
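
As background (this setting is not changed by the PR), whether a payload is embedded in the queue index is governed by queue_index_embed_msgs_below, which defaults to 4096 bytes, so the 4000-byte payloads above fall under that threshold. Expressed as an advanced.config sketch:

    [
      {rabbit, [
        %% messages smaller than this many bytes are embedded in the
        %% queue index rather than stored in the message store (default: 4096)
        {queue_index_embed_msgs_below, 4096}
      ]}
    ].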

There are two places where the publishers stopped that are not related to memory; I will comment below the relevant graphs. I suspect this is some network issue or similar. There are also times where we do not get metrics even though the nodes are running and nothing is in the logs.

queue_index_segment_entry_count 2048

[screenshot: 2021-04-30 13:23]

queue_index_segment_entry_count 1024

[screenshot: 2021-04-30 13:24]

Between around 15h50 to 17h05 the publishers were not running.

queue_index_segment_entry_count 512

[screenshot: 2021-04-30 13:25]

Between around 11h45 to 12h10 the publishers were not running.

Conclusion

Inconclusive. I am thinking of running a benchmark to see the messages/s against a single queue and see if there's any difference there. If there isn't, then I will consider all three values to be roughly equivalent and suggest that we use 2048 as the new default, because it will create fewer index files than the other two.


lhoguin commented May 5, 2021

I have tried 1 queue with 1 publisher and 1 consumer, with no delay on the consuming side, and didn't see anything useful. Messages rarely make it into segment files in that scenario.

I have also tried a similar scenario but with a consumer delay of 1ms between messages. In that scenario we quickly get hundreds of thousands of messages, so I have made the messages expire after 60s. The message payload is not embedded in the index. I ran 512, 1024, 2048 and 16384. This is the result:

[screenshot: 2021-05-05 16:59]

It's hard to notice a difference in some of the graphs but there is a tiny one:

SEC            | 512      | 1024     | 2048     | 16384
Messages ready | 715-720K | 735-740K | 745-748K | 750-753K
Published /s   | 12.0K    | 12.3K    | 12.4K    | 12.5K

Memory and disk before publishers are blocked show a clearer difference. Memory is not important in this scenario though, because there's only one queue. The difference in disk usage is probably unrelated, based on repeated testing.

There is a definite advantage to 16384 in that scenario. But it is a small advantage over 2048 (less than 1%) and it could be related to other workloads on the nodes or some other factor.

Conclusion

Based on these observations I am recommending 2048 as the new default. 2048 has a much better memory footprint and almost identical performance to the current value of 16384 (less than 1% difference).

I will make the changes tomorrow and mark the PR as ready for merging.

@lhoguin lhoguin force-pushed the new-segment-entry-count-default branch 2 times, most recently from 7ece9c4 to c6a0df7 on May 7, 2021 09:33
@lhoguin lhoguin changed the title WIP: Set segment_entry_count per vhost and use a better default Set segment_entry_count per vhost and use a better default May 7, 2021
@lhoguin lhoguin marked this pull request as ready for review May 7, 2021 09:35
@lhoguin lhoguin requested a review from michaelklishin May 7, 2021 09:36
@lhoguin lhoguin force-pushed the new-segment-entry-count-default branch from c6a0df7 to 8d342b3 on May 11, 2021 08:14
The new default of 2048 was chosen based on various scenarios.
It provides much better memory usage when many queues are used
(allowing one host to go from 500 queues to 800+ queues) and
there seems to be no or negligible performance cost (< 1%)
for single queues.
@lhoguin lhoguin force-pushed the new-segment-entry-count-default branch from 8d342b3 to d9344b2 on May 11, 2021 08:45

lhoguin commented May 11, 2021

I have fixed the couple of tests that failed. Please review and provide guidance on the next steps (if any).

I have marked the PR as a breaking change because, while going from before the PR to after the PR is a seamless change, rolling back to a version before the PR would not work.

@michaelklishin michaelklishin merged commit 12253d2 into master May 26, 2021
@michaelklishin michaelklishin deleted the new-segment-entry-count-default branch May 26, 2021 22:56
michaelklishin added a commit that referenced this pull request May 27, 2021
Set segment_entry_count per vhost and use a better default

(cherry picked from commit 12253d2)

Conflicts:
	deps/rabbit/BUILD.bazel
	deps/rabbit/Makefile
	deps/rabbit/src/rabbit_vhost.erl
@michaelklishin
Collaborator

Backported to v3.8.x.

@michaelklishin michaelklishin added this to the 3.8.17 milestone May 27, 2021
michaelklishin added a commit that referenced this pull request May 27, 2021
This happens during queue migration from a pre-3.7 version.

References #2954.