Set segment_entry_count per vhost and use a better default #2954
Conversation
Force-pushed 66047f2 to 9ed851d
@@ -377,6 +410,10 @@ msg_store_dir_base() ->
    Dir = rabbit_mnesia:dir(),
    filename:join([Dir, "msg_stores", "vhosts"]).

config_file_path(VHost) ->
    VHostDir = msg_store_dir_path(VHost),
    filename:join(VHostDir, ".config").
We should avoid .config as it might generate confusion with advanced.config and such. How about .meta since all we store here is metadata? Or did you intentionally want to emphasize that this is a file:consult/1-compatible file?
I do want to show it's a consult file, but I agree that .config is not good; it's not something we want the user to ever modify. On that note, I think I will add a comment inside the file saying it's auto-generated.
I have left .config and added a comment at the top of the file instead. I have noticed that we already use some .config files, for example https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_node_monitor.erl#L73-L74. The .config for the queue is deep within the Mnesia directory, so it is unlikely that users will stumble upon it. But if that's not good enough we can still change it; I'm not sure what to change it to, though.
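To make the "consult file plus auto-generated notice" idea concrete, here is a minimal sketch, not the PR's actual code, of writing and reading that metadata using the config_file_path/1 helper from the diff above. The write_vhost_metadata/2 and read_vhost_metadata/1 names are illustrative.

```erlang
%% Minimal sketch, illustrative names only: persist per-vhost metadata as
%% a file:consult/1-compatible file with an auto-generated notice on top.
write_vhost_metadata(VHost, SegmentEntryCount) ->
    Path = config_file_path(VHost),
    Data = ["%% This file is auto-generated. Do not edit.\n",
            io_lib:format("~p.~n", [{segment_entry_count, SegmentEntryCount}])],
    ok = file:write_file(Path, Data).

read_vhost_metadata(VHost) ->
    %% file:consult/1 skips Erlang comments, so the notice is ignored here.
    case file:consult(config_file_path(VHost)) of
        {ok, Terms}     -> {ok, Terms};
        {error, enoent} -> {ok, []};   %% no metadata written yet
        {error, Reason} -> {error, Reason}
    end.
```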
Thanks to @gerhard I have been able to run scenarios that produce interesting results for different values, based on a scenario reported by customers earlier this year. The scenario is 50k queues spread over 3 nodes where published messages fan out over those 50k queues. PerfTest uses the following arguments:

Messages are NOT embedded in the index. I have tested with the values listed in the sections below and captured 3 graphs that are interesting. In all cases the queues get filled in about 2h10m.

queue_index_segment_entry_count 16384 (what RabbitMQ master uses)

Things go bad. One of the nodes behaves quite differently while filling up, using a lot more memory. Once the queues are full, the nodes all use significantly more memory and block the publishers at regular intervals. We can see a lot of large spikes in memory usage, up and down.

queue_index_segment_entry_count 8192

The memory stabilizes around 9.3GiB before publishers are blocked. We can still see medium-sized spikes, indicating that a lot of memory allocation and deallocation is likely going on, due to how the index works.

queue_index_segment_entry_count 4096

Still around 9.3GiB. Similar scenario but with smaller spikes.

queue_index_segment_entry_count 2048

Memory ends up around 11.5GiB for two nodes and 10.5GiB for the other (the one that receives messages). There are very few spikes. This scenario looks best, but it also was stopped perhaps a bit early. Over time I would expect the memory graphs to go down (usage increases) a little, just like scenarios 1024 and 512.

queue_index_segment_entry_count 1024

Very similar to scenario 2048, perhaps using just a little more memory. I'd say around 11.4GiB and 10.4GiB respectively, compared to the previous scenario, but it's hard to be super confident about this. On the other hand, because messages are not embedded in the index this time, perhaps we could see a bigger difference if we did embed the messages.

queue_index_segment_entry_count 512

Very similar to scenarios 2048 and 1024. Around the same memory usage. Again, hard to be super confident.

Conclusion

We should probably do test runs of 2048, 1024 and 512 with messages embedded in the index, and see what results we get. The winner will be one of those three.
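As a side note on what these values mean: the segment entry count is the number of index entries each queue index segment file holds, so smaller values mean more (but smaller) segment files for the same backlog. A rough, illustrative sketch of that relationship; the message count below is hypothetical and not taken from these runs:

```erlang
%% Illustrative only: approximate number of queue index segment files
%% needed for a given backlog, i.e. ceil(Messages / SegmentEntryCount).
%% The message count used below is hypothetical.
segment_files(Messages, SegmentEntryCount) ->
    (Messages + SegmentEntryCount - 1) div SegmentEntryCount.

%% In an Erlang shell:
%% 1> segment_files(1000000, 16384).
%% 62
%% 2> segment_files(1000000, 2048).
%% 489
```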
With messages embedded in the index, the results are inconclusive. As you can see, in all cases the memory slowly goes down until there is no more memory available. Then publishers are blocked, then publishers are started again, then memory goes down faster until there is no more. Repeat until the end of the experiment.

The memory usage after publishers are stopped seems to be directly related to the number of messages. Because in this scenario we do not end up with too many segments (2 at most, even for the 512 scenario) we can easily see the impact that segments have on the memory even at rest. Perhaps this is also why it is difficult to notice any difference between the three scenarios.

This is with message sizes of 4000 bytes to get them into the index. Because messages end up being duplicated into each queue index, we end up with disk alarms being triggered at around 45.1 million messages, which is where the experiment ends for 1024 and 512.

There are two places where the publishers stopped that are not related to memory; I will comment below the graph where appropriate. I suspect this is some network issue or similar. There are also times where we do not get metrics despite the fact that the nodes are running and nothing is in the logs.

queue_index_segment_entry_count 2048

queue_index_segment_entry_count 1024

Between around 15h50 and 17h05 the publishers were not running.

queue_index_segment_entry_count 512

Between around 11h45 and 12h10 the publishers were not running.

Conclusion

Inconclusive. I am thinking of running a benchmark to see the messages/s against a single queue and see if there's any difference there. If there isn't, then I will consider all three values to be roughly equivalent and suggest that we use 2048 as the new default, because in that case it will create fewer index files than the other two.
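For context on why the 4000-byte payloads land in the index: payloads below the embedding threshold are stored in the queue index rather than the message store, and that threshold defaults to 4096 bytes via the queue_index_embed_msgs_below setting. An illustrative advanced.config snippet, shown with the default value (this is not something the PR changes):

```erlang
%% advanced.config -- illustrative only. Messages smaller than
%% queue_index_embed_msgs_below (default 4096 bytes) are embedded in the
%% queue index instead of the message store, which is why the 4000-byte
%% payloads in the scenario above end up in the index.
[
  {rabbit, [
    {queue_index_embed_msgs_below, 4096}
  ]}
].
```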
I have tried 1 queue with 1 publisher and 1 consumer with no delay for the consuming, and didn't see anything useful: messages rarely make it into segment files in that scenario. I have also tried a similar scenario but with a consumer delay of 1ms between messages. In that scenario we quickly get hundreds of thousands of messages, so I have made the messages expire after 60s. The message payload is not embedded in the index. I ran 512, 1024, 2048 and 16384. This is the result: it's hard to notice a difference in some of the graphs, but there is a tiny one.

Memory and disk usage before publishers are blocked is clearer. Memory is not important in this scenario though, because there's only one queue, and the difference in disk usage is probably unrelated, based on repeated testing. There is a definite advantage to 16384 in that scenario, but it is a small advantage over 2048 (less than 1%) and it could be related to other workloads on the nodes or some other factor.

Conclusion

Based on these observations I am recommending 2048 as the new default. 2048 has a much better memory footprint and almost identical performance to the current value 16384 (less than 1% difference). I will do the changes tomorrow and make the PR ready for merging.
Force-pushed 7ece9c4 to c6a0df7
Force-pushed c6a0df7 to 8d342b3
The new default of 2048 was chosen based on various scenarios. It provides much better memory usage when many queues are used (allowing one host to go from 500 queues to 800+ queues), and there seems to be no or only negligible performance cost (< 1%) for single queues.
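For completeness, a hedged sketch of how an operator could override the default, assuming the queue_index_segment_entry_count key is read from the rabbit application environment as in the benchmark scenarios above; with this PR the value is then captured per vhost when the vhost is created:

```erlang
%% advanced.config -- illustrative override, not a recommendation.
%% Assumes queue_index_segment_entry_count is read from the rabbit
%% application environment; vhosts created afterwards would record
%% this value instead of the new 2048 default.
[
  {rabbit, [
    {queue_index_segment_entry_count, 16384}
  ]}
].
```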
Force-pushed 8d342b3 to d9344b2
I have fixed the couple of tests that failed. Please review and provide guidance on the next steps (if any). I have marked the PR as a breaking change because, while going from before the PR to after the PR is a seamless change, rolling back to a version before the PR would not work.
Set segment_entry_count per vhost and use a better default

(cherry picked from commit 12253d2)

Conflicts:
    deps/rabbit/BUILD.bazel
    deps/rabbit/Makefile
    deps/rabbit/src/rabbit_vhost.erl
Backported to
This happens during queue migration from a pre-3.7 version. References #2954.
Proposed Changes
Lower values of queue_index_segment_entry_count have been shown to potentially improve the performance of the queue. This PR allows changing the default to a better value without breaking existing installations.
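A rough sketch, not the PR's actual code, of how per-vhost resolution could avoid breaking existing installations: each vhost keeps the segment entry count it was created with, and only the fallback default changes. It reuses the hypothetical read_vhost_metadata/1 helper sketched earlier.

```erlang
%% Rough sketch, illustrative names only: resolve the segment entry count
%% for a vhost, falling back to the application environment default when
%% the vhost has no recorded value.
segment_entry_count(VHost) ->
    case read_vhost_metadata(VHost) of
        {ok, Terms} ->
            proplists:get_value(segment_entry_count, Terms,
                                default_segment_entry_count());
        {error, _} ->
            default_segment_entry_count()
    end.

default_segment_entry_count() ->
    %% 2048 is the new default proposed in this PR.
    application:get_env(rabbit, queue_index_segment_entry_count, 2048).
```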
Types of Changes

Checklist

CONTRIBUTING.md document

Further Comments
At this point I have not tested it, nor have I confirmed that it works when upgrading.
The new default needs to be figured out by testing with the values 512, 1024, 2048, 4096, 8192 and 16384. Since @gerhard already has a setup comparing 1024 and 16384, I think we should just extend the testing by adding more clusters for comparison.