-
Notifications
You must be signed in to change notification settings - Fork 154
Optimize publication path when messages are expiring #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize publication path when messages are expiring #51
Conversation
Run `make rabbitmq-components-mk` and commit the result.
Replaces `maybe_delay_first` with `delay_first`. Modfies the behavior so it ensures that there is always one timer active, even when there are no delayed messages in the exchange. This timer is not used for anything except book-keeping, so we set it to expire far in the future (1 hour) so it doesn't gobble up resources for no reason. Previously the `erlang:read_timer(CurrTimer) == false` code path could be hit in two scenarios: - No timer had been set. The message currently being added is the only one in the delay queue. - The timer has already expired. The delay queue is full of messages ready for publication and the exchange has not yet caught up. In the former case, the exchange should start a new timer. In the latter, the exchange is far behind and starting a new timer will only waste time. The new invariant makes the first of those scenarios impossible. Updates the publication code path to take advantage of this change. Since the first of those two scenarios is no longer possible, the publication code path is no longer responsible for starting timers. (It can still *replace* a timer if the current timer is set too far in the future, but it can't create a new one from nothing.)
I know it may seem weird to keep a timer around even when the exchange is idle. However, I think the other options are less good. We could instead track whether or not a timer has been set with a flag in the State record. That's a little cleaner, but it risks getting out of sync. Maintaining the extra timer eliminates some code paths that would be needed to handle the state flag. It's not more expensive, since we need to check the timer every time we publish anyway, in order to check if we need to restart the timer with a shorter delay. |
Thank you, we will consider this. |
{noreply, State}; | ||
handle_cast(go, State = #state{}) -> | ||
delay_first(), | ||
{noreply, State#state{timer = delay_first()}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two delay_first
in a row. Is that on purpose?
Thanks a lot for the review. The extra |
Let close this in favour of #53. |
We saw some unusual CPU spikes when we migrated our direct exchanges to a delayed exchange in a heavy traffic environment. The CPU spikes did not correlate well with the amount of traffic, making this behavior very suspicious.
After some investigation [1], we found that both publication to the exchange and message timeout expiry were quite fast. However, when publication and message expiry were occurring at the same time, the publication rate plummeted and CPU usage skyrocketed.
The root cause of this issue appears to be the branch of code intended to start a timer when we publish the first delayed message to an empty exchange. That code is triggered by accident when we publish new messages while the exchange is falling behind in its handling of expired messages. That code path happens to be much slower than the regular case.
This PR avoids the issue by moving the responsibility for starting the initial timer from the publication code path to the exchange itself. Now it maintains a timer in the distant future as a placeholder even when there are no delayed messages sitting in the queue.
[1] https://groups.google.com/forum/#!topic/rabbitmq-users/XgjY7UtLkfs