Make consumer-level prefetch-like limit configurable for cases with QoS prefetch > 200 #11822
Conversation
This constant is used on the hot delivery path.

What are the effects of this PR with all defaults?
3.12.x is out of community support, so I'd not expect any more OSS RabbitMQ 3.12.x releases.
```
%%
%% {classic_queue_consumer_unsent_message_limit, 200},
```

```
{mapping, "classic_queue_consumer_unsent_message_limit", "rabbit.classic_queue_consumer_unsent_message_limit", [
```
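Assuming the Cuttlefish mapping above, the setting would presumably be exposed in `rabbitmq.conf` along these lines (the value 1000 is an arbitrary illustration, not a recommendation):

```
# rabbitmq.conf sketch -- key name taken from the mapping above;
# 1000 is an arbitrary example value
classic_queue_consumer_unsent_message_limit = 1000
```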
Please use the established prefix for CQ-specific features.
```diff
@@ -22,7 +22,8 @@

 -define(QUEUE, lqueue).

--define(UNSENT_MESSAGE_LIMIT, 200).
+-define(UNSENT_MESSAGE_LIMIT,
+        application:get_env(rabbit, classic_queue_consumer_unsent_message_limit, 200)).
```
Can this be initialized in the `cr` record state, or at least a persistent term? Calling `application:get_env/3` for every batch of deliveries sounds substantially more expensive.
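For context, a persistent term read on the delivery path would look roughly like this; the key name `unsent_message_limit` matches the `persistent_term:put/2` call discussed later in the thread, and the fallback 200 is the historical default:

```erlang
%% Sketch: hot-path read of the limit from a persistent term,
%% falling back to the historical default of 200.
Limit = persistent_term:get(unsent_message_limit, 200)
```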
It sure can. I actually tried initialising in `cr` state, but fetching the limit by pattern match per message delivery yielded slower throughput numbers vs. lookups in ets via `application:get_env/3`. I'll do more samples. Didn't try persistent term. I'll test all 3 and we'll opt for the fastest approach 👍
Ran some tests locally for unbounded prefetch (at least 3 samples for each approach), using the following approaches (including the process dictionary).

From the results below: all approaches perform reasonably well. In this case, we can go for persistent term, mainly due to generally acceptable high throughput (despite not giving the highest samples), and higher throughput at the lower/default UNSENT-MESSAGE-LIMIT = 200
UNSENT-MESSAGE-LIMIT = 200
1. APPLICATION CONFIG
2. PERSISTENT TERM **
3. CHANNEL RECORD / #cr{}
4. PROCESS DICTIONARY
UNSENT-MESSAGE-LIMIT = 10000
1. APPLICATION CONFIG
2. PERSISTENT TERM
3. CHANNEL RECORD / #cr{}
4. PROCESS DICTIONARY
UNSENT-MESSAGE-LIMIT = 20000
1. APPLICATION CONFIG
2. PERSISTENT TERM
3. CHANNEL RECORD / #cr{}
4. PROCESS DICTIONARY
UNSENT-MESSAGE-LIMIT = 50000
1. APPLICATION CONFIG
2. PERSISTENT TERM
3. CHANNEL RECORD / #cr{}
4. PROCESS DICTIONARY
UNSENT-MESSAGE-LIMIT = 100000
1. APPLICATION CONFIG
2. PERSISTENT TERM
3. CHANNEL RECORD / #cr{}
4. PROCESS DICTIONARY
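For readers following along, the four approaches benchmarked above map roughly to the following Erlang lookups (a sketch; the record field name is illustrative, not the exact code under test):

```erlang
%% 1. APPLICATION CONFIG: an ets lookup (ac_tab) on every call
L1 = application:get_env(rabbit, classic_queue_consumer_unsent_message_limit, 200),

%% 2. PERSISTENT TERM: a constant-time global read
L2 = persistent_term:get(unsent_message_limit, 200),

%% 3. CHANNEL RECORD / #cr{}: fetched once into state, then pattern-matched
%% #cr{unsent_message_limit = L3} = CR,

%% 4. PROCESS DICTIONARY: a per-process get/1
L4 = get(unsent_message_limit)
```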
So persistent term and the
I'm happy with this implementation.
It does not regress on the default path, which 99.99% of users will start with.
With the limit of 500, I observe no average publishing rate degradation and very comparable variability, and the consumer rate goes up ≈ 7.6%.
With the limit of 1000, the publisher metrics are identical and the consumer throughput is 16.2% higher.
With higher limits (including a higher consumer/PerfTest QoS limit, of course) I do not observe any meaningful improvements, but the variability of results rises significantly.
But I guess it is a useful configurable value to have.
Making this value configurable is a good idea. See my inline comments.
```erlang
persistent_term:put(unsent_message_limit,
    application:get_env(rabbit, classic_queue_consumer_unsent_message_limit, 200)),
```
Not very important, but would it be better to place this into `rabbit_queue_consumers:new/0` instead, because that's easier to reason about and called less often?
Can we define the default of 200 in only one place? Either here or in the Makefile.

I know that other RabbitMQ settings also define a default value twice (once in the Makefile and once in the code as done here), but I never understood why that's done.

FWIW, I prefer this setting to not be exposed in `rabbitmq.conf` and only to be set in `advanced.config`. This way we can get rid of the Makefile changes entirely, including the Makefile default setting.
I think that commonly set configuration goes into `rabbitmq.conf`, which requires writing Cuttlefish translations. Rare or advanced configuration goes, as the name suggests, into the `advanced.config` file.

This setting, given it has never been configurable, is for me a rare / advanced configuration. Other internal credit flow settings such as `msg_store_credit_disc_bound` or `credit_flow_default_credit` are not exposed in `rabbitmq.conf` either.
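An `advanced.config` entry for this setting (the route eventually taken, per the commit messages quoted later in this thread) would presumably look like the following; the value 1000 is an arbitrary example:

```erlang
%% advanced.config sketch -- the key lives under the rabbit application;
%% 1000 is an arbitrary example value
[
 {rabbit, [
   {classic_queue_consumer_unsent_message_limit, 1000}
 ]}
].
```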
@ansd using `rabbit_queue_consumers:new/0` won't work here. We want this to be dynamic and configurable in real time, when new consumers attach to a queue, which is why we're adding it to `rabbit_queue_consumers:add/9`. This suggestion breaks the purpose of the change: we want the capability to "accelerate" consumers on existing queues.
@ansd credit-flow configs use `advanced.config` solely because they're configured as native Erlang terms/tuples. This config is just a simple integer, hence it should be set in `rabbitmq.conf`, which is much easier for operators to update/manage. I'm not sure I see the value/advantage of making configuring this unnecessarily more complex for users?
> We want this to be dynamic and configurable in real time, when new consumers attach to a queue.

When exactly would you increase or decrease this value dynamically at runtime, globally for all consumers running on this node?
Similar to other RabbitMQ internal credit flow configurations such as `credit_flow_default_credit` and `msg_store_credit_disc_bound`, this commit makes the `classic_queue_consumer_unsent_message_limit` configurable via `advanced.config`. See #11822 for the original motivation to make this setting configurable.
#11857 implements an alternative to this PR making this value configurable via
@ansd this #11857 adds unnecessary complexity. Let's keep this config very simple please. Managing
@Ayanda-D 99.99% of operators will not use this setting in the first few months after it ships, and a high 90-something % of operators will never use this setting because their applications do not set prefetch to values in the thousands, let alone to something like 30K or 50K. This is not a common user setting. We know well why
I am comparing this PR without the

#11857 was accepted and will be backported to
Similar to other RabbitMQ internal credit flow configurations such as `credit_flow_default_credit` and `msg_store_credit_disc_bound`, this commit makes the `classic_queue_consumer_unsent_message_limit` configurable via `advanced.config`. See #11822 for the original motivation to make this setting configurable. (cherry picked from commit c771b24)
Similar to other RabbitMQ internal credit flow configurations such as `credit_flow_default_credit` and `msg_store_credit_disc_bound`, this commit makes the `classic_queue_consumer_unsent_message_limit` configurable via `advanced.config`. See #11822 for the original motivation to make this setting configurable. (cherry picked from commit c771b24) (cherry picked from commit eee851b) # Conflicts: # deps/rabbit/src/rabbit_queue_consumers.erl
A prefetch of

And just because a config is configurable in

What you're now doing in #11857 is taking a consumer optimization config and making it tightly coupled to the queue. So queues have to be terminated and restarted for a consumer config/optimization to take effect? That completely doesn't make sense. Queues can be, and usually are, long-running entities; having to restart queues to apply this configuration just doesn't make sense and breaks the way consumers/queues are meant to be optimized. Like QoS (changeable anytime at runtime), you only need to restart a consumer channel to apply a new QoS value, and similarly for this config. Restarting a queue for a consumer-level config to take effect is probably one of the most incorrect things I've seen done in Rabbit. Restarting a queue to improve consumer QoS? A very wrong thing to do. That's what #11857 introduces.

FYI, a follow-up to this work is a CLI command to

Luckily, we maintain our own releases - we don't have to conform to some of these wrong code changes that break normal operation flow. We'll do the right thing on our side 👍
@Ayanda-D you didn't answer #11822 (comment). You weren't able to explain when exactly you would increase or decrease this value at runtime globally for all consumers running on the given node. We will never add features to RabbitMQ without a clear and well understood use case.
You can easily update the value via

It is not tied to the queue at all. There's no point in adding a CLI command, but feel free to do so in your fork. Note that some time after 4.1 I will go through this module to make it work better for CQv2, potentially making this configuration option obsolete.
@ansd I've just answered that in my last comment; I thought you'd see the answer without me needing to repeat myself. Please go into the CLI repo and check the set_* commands for updating all sorts of config and limits during runtime.

Even by simply running:

This is a very normal thing to do.
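The command elided above presumably updates the value on a live node; a hypothetical sketch using `rabbitmqctl eval` with the `persistent_term` key shown earlier in this thread (the key name and value are assumptions for illustration):

```shell
# Hypothetical runtime update on a running node (key name assumed to be
# the unsent_message_limit persistent term set by this PR):
rabbitmqctl eval 'persistent_term:put(unsent_message_limit, 1000).'

# Read the value back to confirm:
rabbitmqctl eval 'persistent_term:get(unsent_message_limit, 200).'
```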
A CLI command and a
@lhoguin yes, but that's too many moving parts - if we update

If a queue restarts, or a new queue is added, then it overrides whatever you had updated directly with
@Ayanda-D you completely ignore the fact that 99% of RabbitMQ users likely won't do things like "set QoS to 30K so that we must override an internal constant related to classic queues, and then update it at runtime." RabbitMQ remains open source software in a world where other projects have gone Business Software License or ask to pay for their cloud option. That does not mean that we will accept every single change, but it surely is worth something.
Yes and that's fine, you shouldn't update this value in production. It's a band-aid that does some good now but is likely to be removed in the future after the next set of refactorings of classic queues. |
And anyone who needs a substantial throughput boost should switch from classic queues to streams and/or superstreams. That'd be an order of magnitude change or more, with an easier path to multiplying that throughput by growing cluster size. Which, by the way, were developed and made available for free to everyone by the RabbitMQ Core team.
Proposed Changes
The `rabbit_queue_consumers` implementation has a hard-coded `UNSENT_MESSAGE_LIMIT = 200`, which can block the consumer channel (see the `is_ch_blocked/1` predicate) as part of consumer credit-based flow control.

When a consumer channel is operating at e.g. prefetch count >= 200, outbound deliveries that push `unsent_message_count` >= 200 before acks are received or consumer credit is processed can cause the channel to block further deliveries more frequently. This can also break the semantics of the expected AMQP QoS for such high-end or unlimited prefetch settings.

This PR introduces a `rabbit.classic_queue_consumer_unsent_message_limit` config which gives us flexibility on this hard limit, allowing us to increase it and delay blocking on classic queue consumers. We're keeping the default limit of 200 (happy to bump it up if necessary, e.g. to 1K). An extremely high setting can be risky for slow consumers; however, very fast consumers can do with a higher limit than the default 200.
The `UNSENT_MESSAGE_LIMIT` limitation dates back to some really early releases of RabbitMQ (v1.4.0 and v2.8.0). Having control over this consumer flow-control limit, to increase it, yields some interesting throughput improvements.

A couple of PerfTests (on some very high `UNSENT_MESSAGE_LIMIT` settings):

FIXED PREFETCH (up to ~250% throughput improvement)
1. UNSENT_MESSAGE_LIMIT = 200, PREFETCH = 1000
2. UNSENT_MESSAGE_LIMIT = 10000, PREFETCH = 1000
3. UNSENT_MESSAGE_LIMIT = 20000, PREFETCH = 1000
4. UNSENT_MESSAGE_LIMIT = 50000, PREFETCH = 1000
UNBOUNDED PREFETCH (up to ~530% throughput improvement)
1. UNSENT_MESSAGE_LIMIT = 200, PREFETCH = UNLIMITED
2. UNSENT_MESSAGE_LIMIT = 10000, PREFETCH = UNLIMITED
3. UNSENT_MESSAGE_LIMIT = 20000, PREFETCH = UNLIMITED
4. UNSENT_MESSAGE_LIMIT = 50000, PREFETCH = UNLIMITED
From 20K to 50K settings, we notice consumer receive rate/throughput reaching saturation (could be of use for more high performance use-cases).
We're keen to have this feature in 3.12.x as well. Please take a look 👍
Types of Changes
What types of changes does your code introduce to this project?
Put an `x` in the boxes that apply.

Checklist

Put an `x` in the boxes that apply. You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md document

Further Comments