Stable GM memory usage during constant redelivery #1302
Conversation
We don't want to use the backoff/hibernate feature because we have observed that the GM process is suspended half of the time.

We really wanted to replace gen_server2 with gen_server, but it was more important to keep changes in 3.6 to a minimum. GM will eventually be replaced, so switching it from gen_server2 to gen_server would soon be redundant. We simply do not understand some of the gen_server2 trade-offs well enough to feel strongly about this change.

[#148892851]

Signed-off-by: Gerhard Lazu <[email protected]>
👍 nice way to integrate garbage_collect()
src/gm.erl (outdated)
```
flush_timeout(_) -> 0.

ensure_force_gc_timer(State = #state { force_gc_timer = TRef })
  when TRef =/= undefined ->
```
`TRef =/= undefined` could be `is_reference(TRef)`, but it doesn't really matter 😄
You're right, we'll change that. Thanks!
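For readers following along, here is a minimal, self-contained sketch of the pattern being discussed: a gen_server that arms a timer and garbage-collects itself when the timer fires, with the `is_reference/1` guard suggested above. The module name and the `force_gc` message are illustrative assumptions; the 250ms interval and the `force_gc_timer` field mirror the change under review, but this is not the actual gm.erl code.

```erlang
%% force_gc_sketch.erl -- illustrative sketch only, not the actual
%% gm.erl implementation: a gen_server that periodically forces its
%% own garbage collection.
-module(force_gc_sketch).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(FORCE_GC_INTERVAL, 250). %% milliseconds

-record(state, {force_gc_timer = undefined :: undefined | reference()}).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, ensure_force_gc_timer(#state{})}.

%% Arm the timer only when one is not already running; the guard uses
%% is_reference/1 as suggested in the review above.
ensure_force_gc_timer(State = #state{force_gc_timer = TRef})
  when is_reference(TRef) ->
    State;
ensure_force_gc_timer(State = #state{force_gc_timer = undefined}) ->
    TRef = erlang:send_after(?FORCE_GC_INTERVAL, self(), force_gc),
    State#state{force_gc_timer = TRef}.

handle_call(_Msg, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State)        -> {noreply, State}.

%% When the timer fires, collect garbage and re-arm the timer.
handle_info(force_gc, State) ->
    garbage_collect(),
    {noreply, ensure_force_gc_timer(State#state{force_gc_timer = undefined})};
handle_info(_Msg, State) ->
    {noreply, State}.
```

`erlang:garbage_collect/0` performs a major collection of the calling process, so binaries the process no longer references are released promptly instead of lingering until the VM is under memory pressure.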
Force-pushed from 1ac9ea3 to 7f84083
In high throughput scenarios, e.g. `basic.reject` or `basic.nack`, messages which belong to a mirrored queue and are replicated within a GM group are quickly promoted to the old heap. This means that garbage collection happens only when the Erlang VM is under memory pressure, which might be too late. When a process is under memory pressure, garbage collection slows it down even further, to the point of RabbitMQ nodes running out of memory and crashing. To avoid this scenario, we want the GM process to garbage collect binaries regularly, i.e. every 250ms. The variable queue does the same for a similar reason: #289

Initially, we wanted to use the number of messages as the trigger for garbage collection, but we soon discovered that different workloads (e.g. small vs large messages) would result in unpredictable and sub-optimal GC schedules.

Before setting `fullsweep_after` to 0, memory usage was 2x higher (400MB vs 200MB) and throughput was 10% lower (18k vs 20k). With this `spawn_opt` setting, the generational collection algorithm is disabled, meaning that all live data is copied at every garbage collection: http://erlang.org/doc/man/erlang.html#spawn_opt-3

The RabbitMQ deployment used for testing this change:

* AWS, c4.2xlarge, bosh-aws-xen-hvm-ubuntu-trusty-go_agent 3421.11
* 3 RabbitMQ nodes running OTP 20.0.1
* 3 durable & auto-delete queues with 3 replicas each
* each queue master was defined on a different RabbitMQ node
* every RabbitMQ node was running 1 queue master & 2 queue slaves
* 1 consumer per queue with QOS 100
* 100 durable messages @ 1KiB each
* `basic.reject` operations

```
| Node | Message throughput | Memory usage  |
| ---- | ------------------ | ------------- |
| rmq0 | 12K - 20K msg/s    | 400 - 900 MB  |
| rmq1 | 12K - 20K msg/s    | 500 - 1000 MB |
| rmq2 | 12K - 20K msg/s    | 500 - 800 MB  |
```

[#148892851]

Signed-off-by: Gerhard Lazu <[email protected]>
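As a side note on the `spawn_opt` part of the message above, this is roughly what starting a gen_server with `fullsweep_after` set to 0 looks like; the module below is a hypothetical sketch for illustration, not the gm.erl code.

```erlang
%% fullsweep_sketch.erl -- illustrative only, not the actual gm.erl
%% code: start a gen_server whose process is spawned with
%% {fullsweep_after, 0}, so every garbage collection is a full sweep.
-module(fullsweep_sketch).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    %% The spawn_opt list is passed through to erlang:spawn_opt,
    %% see http://erlang.org/doc/man/erlang.html#spawn_opt-3
    gen_server:start_link(?MODULE, [],
                          [{spawn_opt, [{fullsweep_after, 0}]}]).

init([]) ->
    {ok, #{}}.

handle_call(_Msg, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State)        -> {noreply, State}.
```

With `{fullsweep_after, 0}` there is effectively no old heap: every collection copies all live data, which trades some CPU for keeping binary references from accumulating, in line with the 400MB vs 200MB difference reported above.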
Force-pushed from 7f84083 to 7d0e49c
Having run the benchmark for 18h, we are confident that high message redelivery rates for mirrored queues no longer affect cluster stability.

Even though there is a noticeable memory fluctuation after 11h, it is not significant: memory usage grows from 400MB to 1000MB, remains stable for 4h, then drops back to 400MB before another short increase. Eventually, memory returns to 400MB and remains stable for the rest of the 18h. We observe the same behaviour on all nodes. Since memory returns to normal and the cluster is stable throughout, we are happy to just take note of this and not investigate further.
@michaelklishin @dcorbacho ready to review & merge
We've left rmq-148892851 around, in case you need to use it for further benchmarks.
In a large OpenStack deployment, the ovs-agents on 1000 compute nodes will create 10K HA queues. https://groups.google.com/forum/?nomobile=true#!topic/rabbitmq-users/6jGtaHINmNM
@langyxxl please keep discussions to the mailing list. Thank you.
The Erlang VM spends 44.80% in