
Optimize publication path when messages are expiring #51


Closed

Conversation

richardlarocque
Contributor

We saw some unusual CPU spikes when we migrated our direct exchanges to a delayed exchange in a heavy-traffic environment. The spikes did not correlate well with the amount of traffic, which made the behavior very suspicious.

After some investigation [1], we found that both publication to the exchange and message timeout expiry were quite fast. However, when publication and message expiry were occurring at the same time, the publication rate plummeted and CPU usage skyrocketed.

The root cause of this issue appears to be the branch of code intended to start a timer when the first delayed message is published to an empty exchange. That branch is also triggered, unintentionally, when new messages are published while the exchange is falling behind in its handling of expired messages, and it happens to be much slower than the regular case.

This PR avoids the issue by moving the responsibility for starting the initial timer from the publication code path to the exchange itself. The exchange now maintains a far-future timer as a placeholder even when there are no delayed messages sitting in the queue.

[1] https://groups.google.com/forum/#!topic/rabbitmq-users/XgjY7UtLkfs
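
A minimal sketch of the placeholder-timer idea described above, with illustrative details not taken from the actual diff (only the `delay_first/0` name comes from the change; the one-hour constant and the `deliver` timeout message are assumptions):

```erlang
%% Sketch only: the exchange process always owns exactly one timer,
%% seeded far in the future when the delay queue is empty.
-define(FAR_FUTURE_MS, 60 * 60 * 1000).  %% 1 hour

%% Returns a timer reference that the exchange keeps in its state.
%% The 'deliver' atom stands in for whatever message the exchange
%% sends itself when a timer fires.
delay_first() ->
    erlang:start_timer(?FAR_FUTURE_MS, self(), deliver).
```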

Run `make rabbitmq-components-mk` and commit the result.
Replaces `maybe_delay_first` with `delay_first`.  Modifies the behavior
so that there is always one timer active, even when there are no
delayed messages in the exchange.  This timer is not used for anything
except bookkeeping, so we set it to expire far in the future (1 hour)
so it doesn't consume resources for no reason.

Previously the `erlang:read_timer(CurrTimer) == false` code path could
be hit in two scenarios:
- No timer had been set.  The message currently being added is the
  only one in the delay queue.
- The timer has already expired.  The delay queue is full of messages
  ready for publication and the exchange has not yet caught up.
In the former case, the exchange should start a new timer.  In the
latter, the exchange is far behind and starting a new timer will only
waste time.  The new invariant makes the first of those scenarios
impossible.

Updates the publication code path to take advantage of this change.
Since the first of those two scenarios is no longer possible, the
publication code path is no longer responsible for starting timers.
(It can still *replace* a timer if the current timer is set too far
in the future, but it can't create a new one from nothing.)
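
A hedged sketch of the publication-path behaviour described above; the helper name `maybe_replace_timer/2` and the `deliver` timeout message are hypothetical, not taken from the diff:

```erlang
%% Replace the current timer only if the incoming message is due
%% sooner than the timer would fire; never create a timer from
%% nothing on the publish path.
maybe_replace_timer(Delay, CurrTimer) ->
    case erlang:read_timer(CurrTimer) of
        Remaining when is_integer(Remaining), Remaining > Delay ->
            %% The new message expires before the current timer fires:
            %% cancel it and start a shorter one.
            erlang:cancel_timer(CurrTimer),
            erlang:start_timer(Delay, self(), deliver);
        _ ->
            %% Timer already expired (the exchange is catching up) or
            %% it fires soon enough; leave it as-is.
            CurrTimer
    end.
```

With the new invariant, `CurrTimer` is always a valid timer reference, so the publish path never has to handle a "no timer yet" case.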
@richardlarocque
Contributor Author

I know it may seem weird to keep a timer around even when the exchange is idle. However, I think the alternatives are worse.

We could instead track whether or not a timer has been set with a flag in the State record. That's a little cleaner, but it risks getting out of sync.

Maintaining the extra timer eliminates some code paths that would be needed to handle the state flag. It's not more expensive, either: we already have to inspect the timer on every publish to decide whether it needs to be restarted with a shorter delay.

@michaelklishin self-assigned this May 24, 2016
@michaelklishin
Contributor

Thank you, we will consider this.

    {noreply, State};
handle_cast(go, State = #state{}) ->
    delay_first(),
    {noreply, State#state{timer = delay_first()}};

There are two delay_first in a row. Is that on purpose?

@richardlarocque
Contributor Author

Thanks a lot for the review.

The extra delay_first() call is indeed a bug. The second comment probably indicates a real bug, too, but the solution to it is not as obvious. Let me know what you think of it and I'll update the PR accordingly.

@michaelklishin
Contributor

Let's close this in favour of #53.
