
Stable GM memory usage during constant redelivery #1302

Merged
2 commits merged from stable-gm-mem-usage-during-constant-redelivery into stable on Jul 25, 2017

Conversation

@gerhard (Contributor) commented Jul 24, 2017

In high throughput scenarios, e.g. `basic.reject` or `basic.nack`,
messages which belong to a mirrored queue and are replicated within a GM
group are quickly promoted to the old heap. This means that garbage
collection happens only when the Erlang VM is under memory pressure,
which might be too late. When a process is under pressure, garbage
collection slows it down even further, to the point of RabbitMQ nodes
running out of memory and crashing. To avoid this scenario, we want the
GM process to garbage collect binaries regularly, i.e. every 250ms. The
variable queue does the same for a similar reason:
#289
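
As a rough illustration of the timer-driven approach (a minimal sketch under assumed names, not the actual gm.erl change), a gen_server can re-arm a 250ms timer and call `garbage_collect/0` on itself whenever the timer fires:

```erlang
%% Minimal sketch (hypothetical module, not the actual gm.erl change):
%% arm a 250ms timer and force a GC of this process whenever it fires,
%% so referenced binaries are released promptly instead of waiting for
%% the VM to collect under memory pressure.
-module(periodic_gc_sketch).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(FORCE_GC_INTERVAL, 250). %% milliseconds

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, schedule_force_gc()}.

handle_call(_Request, _From, TRef) ->
    {reply, ok, TRef}.

handle_cast(_Msg, TRef) ->
    {noreply, TRef}.

handle_info(force_gc, _OldTRef) ->
    garbage_collect(),                    %% collect this process right now
    {noreply, schedule_force_gc()};
handle_info(_Info, TRef) ->
    {noreply, TRef}.

%% Here the whole server state is just the timer reference; the real
%% gm.erl keeps it in the force_gc_timer field of its #state{} record.
schedule_force_gc() ->
    erlang:send_after(?FORCE_GC_INTERVAL, self(), force_gc).
```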

Initially, we wanted to use the number of messages as the trigger for
garbage collection, but we soon discovered that different workloads
(e.g. small vs large messages) would result in unpredictable and
sub-optimal GC schedules.

Before setting `fullsweep_after` to 0, memory usage was 2x higher (400MB
vs 200MB) and throughput was about 10% lower (18k vs 20k msg/s). This
`spawn_opt` setting disables generational collection, meaning that all
live data is copied at every garbage collection:
http://erlang.org/doc/man/erlang.html#spawn_opt-3
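
For illustration (a sketch, not the gm.erl code itself), `fullsweep_after` can be applied to a gen_server process through the `spawn_opt` start option:

```erlang
%% Sketch only: start the server process with fullsweep_after set to 0 so
%% that every garbage collection is a full sweep and no live data is
%% tenured to the old heap.
start_link() ->
    gen_server:start_link(?MODULE, [],
                          [{spawn_opt, [{fullsweep_after, 0}]}]).
```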

The RabbitMQ deployment used for testing this change:

  • AWS, c4.2xlarge, bosh-aws-xen-hvm-ubuntu-trusty-go_agent 3421.11
  • 3 RabbitMQ nodes running OTP 20.0.1
  • 3 durable & auto-delete queues with 3 replicas each
  • each queue master was defined on a different RabbitMQ node
  • every RabbitMQ node was running 1 queue master & 2 queue slaves
  • 1 consumer per queue with QOS 100
  • 100 durable messages @ 1KiB each
  • basic.reject operations
| Node   | Message throughput   | Memory usage   |
| ------ | -------------------- | -------------- |
| rmq0   | 12K - 20K msg/s      | 400 - 900 MB   |
| rmq1   | 12K - 20K msg/s      | 500 - 1000 MB  |
| rmq2   | 12K - 20K msg/s      | 500 - 800 MB   |

We don't want to use the backoff/hibernate feature because we have
observed that the GM process is suspended half of the time.

We really wanted to replace gen_server2 with gen_server, but it was more
important to keep changes in 3.6 to a minimum. GM will eventually be
replaced, so switching it from gen_server2 to gen_server would soon be
redundant. We simply do not understand some of the gen_server2
trade-offs well enough to feel strongly about this change.

[#148892851]

Signed-off-by: Gerhard Lazu <[email protected]>
@gerhard (Contributor, Author) commented Jul 24, 2017

Letting the benchmark run for 12h before it's ready to merge.

[benchmark screenshots]

@lukebakken (Collaborator) commented Jul 24, 2017

See #289, #290 and #339 for garbage_collect() in rabbit_variable_queue

@lukebakken (Collaborator) left a comment

👍 nice way to integrate garbage_collect()

src/gm.erl Outdated
flush_timeout(_) -> 0.

ensure_force_gc_timer(State = #state { force_gc_timer = TRef })
when TRef =/= undefined ->
Collaborator:

`TRef =/= undefined` could be `is_reference(TRef)` but it doesn't really matter 😄

Collaborator:

You're right, we'll change that. Thanks!
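
For reference, a sketch of what the suggested guard might look like (the clause bodies and the force_gc message name are assumptions for illustration, not the final gm.erl code):

```erlang
%% Sketch of the reviewer's suggestion: guard on is_reference/1 instead of
%% comparing the timer field against 'undefined'. Clause bodies and the
%% force_gc message name are assumed for illustration.
ensure_force_gc_timer(State = #state { force_gc_timer = TRef })
  when is_reference(TRef) ->
    State;                                       %% timer already armed
ensure_force_gc_timer(State = #state { force_gc_timer = undefined }) ->
    TRef = erlang:send_after(250, self(), force_gc),
    State #state { force_gc_timer = TRef }.
```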

@gerhard gerhard force-pushed the stable-gm-mem-usage-during-constant-redelivery branch from 1ac9ea3 to 7f84083 Compare July 25, 2017 08:25
@gerhard gerhard changed the title Stable GM mem usage during constant redelivery Stable GM memory usage during constant redelivery Jul 25, 2017
@gerhard gerhard force-pushed the stable-gm-mem-usage-during-constant-redelivery branch from 7f84083 to 7d0e49c Compare July 25, 2017 10:33
@gerhard (Contributor, Author) commented Jul 25, 2017

Having run the benchmark for 18h, we are confident that high message redelivery rates for mirrored queues no longer affect cluster stability:

[18h benchmark graphs]

Even though there is a noticeable memory fluctuation after 11h, it's not significant. Memory usage grows from 400MB to 1000MB, remains stable for 4h, then drops back to 400MB before another short increase. Eventually, memory returns to 400MB and stays stable for the rest of the 18h. We observe the same behaviour on all nodes. Since memory returns to normal and the cluster is stable throughout, we are happy to just take note of this and not investigate further.

@gerhard (Contributor, Author) commented Jul 25, 2017

@michaelklishin @dcorbacho ready to review & merge

@gerhard (Contributor, Author) commented Jul 25, 2017

We've left rmq-148892851 around, in case you need to use it for further benchmarks.

@michaelklishin michaelklishin merged commit 82fb30b into stable Jul 25, 2017
@gerhard gerhard deleted the stable-gm-mem-usage-during-constant-redelivery branch July 26, 2017 17:09
@gerhard gerhard added this to the 3.6.11 milestone Jul 26, 2017
@langyxxl commented:

In a large OpenStack deployment, the ovs-agents on 1000 compute nodes will create 10K HA queues. Running GC every 250ms causes very high CPU usage, even when these queues are empty.

https://groups.google.com/forum/?nomobile=true#!topic/rabbitmq-users/6jGtaHINmNM

@michaelklishin (Collaborator) commented:

@langyxxl please keep discussions to the mailing list. Thank you.

@gerhard (Contributor, Author) commented Sep 4, 2018

The Erlang VM spends 44.80% of its time in `busy_wait`. Setting `+sbwt none` via `RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS` significantly reduces CPU utilisation - a few % for both user & system spaces.
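
For example (a sketch; exact placement depends on how the node is deployed), the flag can be passed via rabbitmq-env.conf:

```
# Sketch: pass +sbwt none to the Erlang VM so schedulers stop busy-waiting.
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none"
```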

Read more and discuss on the mailing list thread

cc @michaelklishin @dcorbacho
