-
Notifications
You must be signed in to change notification settings - Fork 3.9k
4.x: Optimise msg_store recovery in case of large message file #13250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
4.x: Optimise msg_store recovery in case of large message file #13250
Conversation
deps/rabbit/src/rabbit_msg_store.erl
Outdated
@@ -1537,6 +1537,39 @@ scan_data(<<Size:64, MsgIdAndMsg:Size/binary, 255, Rest/bits>> = Data, | |||
scan_data(Rest2, Fd, Fun, Offset + 1, FileSize, MsgIdsFound, Acc) | |||
end | |||
end; | |||
%% Large message alone in its own file | |||
scan_data(<<Size:64, MsgIdInt:128, _Rest/bits>> = Data, Fd, Fun, Offset, FileSize, _MsgIdsFound, _Acc) | |||
when Size >= ?LARGE_MESSAGE_THRESHOLD + 16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(maybe the check for LARGE_MESSAGE_THRESHOLD
is not even necessary, if Offset == 0
and FileSize == Size + 9
this means that this message is alone in this file and can be optimised. And the fact that the previous close did not match means this cannot be a very small message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's drop the unnecessary check.
deps/rabbit/src/rabbit_msg_store.erl
Outdated
case file:pread(Fd, FileSize - 1, 1) of | ||
{ok, <<255>>} -> | ||
%% Avoid sub-binary construction. | ||
MsgId = <<MsgIdInt:128>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code is admittedly quite a copy-paste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make a new function.
Does this really make a big difference? Did you measure it? |
This came up in a real world case. A small server had 2 ~100MB rdq files (a misuse obviously, if one sends abusively large messages at least do it on a larger server). RabbitMQ couldn't start and went OOM with rss:1271380kB. With this patch RabbitMQ allocates minimal memory (the initial 4MB read block). I "tested" just calling During startup/recovery there is always a smaller or larger memory spike. As operators we would like to have that recovery as smooth as possible so we don't have to manually intervene. So even an improvement in an edge case matters for us. After answering I re-read the previous discussion #11072 I came up with this little test program, which demonstrates that 3-4x more binary memory is allocated than the actual message size. (OTP 27.2.1) 11> f(Read), Read = fun R(Data, Fd) -> case file:read(Fd, 4*1024*1024) of {ok, Bin} -> Bin2 = <<Data/binary, Bin/binary>>, io:format("~p~n~p~n~n", [byte_size(Bin2), process_info(self(), [binary])]), R(Bin2, Fd); eof -> io:format("~p~n~ntotal size ~p", [process_info(self(), [binary]), byte_size(Data)]) end end.
#Fun<erl_eval.18.18682967>
12> f(Pid), Pid = spawn(fun() -> {ok, Fd} = file:open("364.rdq", [read, raw, binary]), Read(<<>>, Fd) end).
<0.102.0>
4194304
[{binary,[{5244131056,256,1},{4591910952,8388608,1},{4583489576,4194304,1}]}]
8388608
[{binary,[{4591910952,4194304,1},
{4604526632,8388608,1},
{4600315944,4194304,1},
{4587700264,4194304,1}]}]
12582912
[{binary,[{4604526632,8388608,1},
{4621336616,16777216,1},
{4583489576,4194304,1},
{4591910952,4194304,1}]}]
16777216
[{binary,[{4621336616,15099494,1},
{4650729512,25165824,1},
{4600315944,4194304,1},
{4638130216,12582912,1},
{4591910952,4194304,1}]}]
20971520
[{binary,[{4650729512,16777216,1},
{4831838248,33554432,1},
{4583489576,4194304,1},
{4591910952,4194304,1}]}]
25165824
[{binary,[{4831838248,20971520,1},
{4891607080,25165824,1},
{4600315944,4194304,1},
{4591910952,4194304,1}]}]
29360128
[{binary,[{4891607080,25165824,1},
{5100273704,30198988,1},
{4583489576,4194304,1},
{4916789288,25165824,1},
{4591910952,4194304,1}]}]
33554432
[{binary,[{5100273704,30198988,1},
{5502926888,35232153,1},
{4600315944,4194304,1},
{5175771176,29360128,1},
{4591910952,4194304,1}]}]
37748736
[{binary,[{5502926888,35232153,1},
{5571739688,40265318,1},
{4583489576,4194304,1},
{5538168872,33554432,1},
{4591910952,4194304,1}]}]
41943040
[{binary,[{5571739688,37748736,1},
{5612011560,45298483,1},
{4600315944,4194304,1},
{4591910952,4194304,1}]}]
46137344
[{binary,[{5612011560,45298483,1},
{5657313320,50331648,1},
{4583489576,4194304,1},
{4642291752,41943040,1},
{4591910952,4194304,1}]}]
50331648
[{binary,[{5657313320,46137344,1},
{5707661352,55364812,1},
{4600315944,4194304,1},
{4591910952,4194304,1}]}]
54525952
[{binary,[{5707661352,55364812,1},
{5763039272,60397977,1},
{4583489576,4194304,1},
{5175771176,50331648,1},
{4591910952,4194304,1}]}]
58720256
[{binary,[{5763039272,60397977,1},
{5823447080,65431142,1},
{4600315944,4194304,1},
{5707661352,54525952,1},
{4591910952,4194304,1}]}]
62914560
[{binary,[{5823447080,58720256,1},
{5947621416,70464307,1},
{4583489576,4194304,1}]}]
67108864
[{binary,[{5947621416,62914560,1},
{6010552360,75497472,1},
{4600315944,4194304,1}]}]
71303168
[{binary,[{6010552360,67108864,1},
{5565857832,80530636,1},
{4583489576,4194304,1}]}]
75497472
[{binary,[{6010552360,67108864,1},
{5565857832,71303168,1},
{6176636968,85563801,1},
{4583489576,4194304,1},
{4600315944,4194304,1},
{4591910952,4194304,1}]}]
79691776
[{binary,[{6176636968,75497472,1},
{6337724456,90596966,1},
{4587700264,4194304,1}]}]
83886080
[{binary,[{6337724456,79691776,1},
{13958643752,95630131,1},
{4583489576,4194304,1},
{4591910952,4194304,1}]}]
88080384
[{binary,[{6337724456,79691776,1},
{13958643752,95630131,1},
{5800788008,100663296,1},
{4583489576,4194304,1},
{4591910952,4194304,1},
{4587700264,4194304,1},
{5716885544,83886080,1},
{5947621416,4194304,1}]}]
92274688
[{binary,[{6337724456,79691776,1},
{5800788008,88080384,1},
{14054277160,105696460,1},
{4600315944,4194304,1}]}]
96468992
[{binary,[{14054277160,92274688,1},
{5502926888,110729625,1},
{4583489576,4194304,1}]}]
100663296
[{binary,[{5502926888,96468992,1},
{5951832104,115762790,1},
{4600315944,4194304,1}]}]
104857600
[{binary,[{5951832104,100663296,1},
{14146568232,120795955,1},
{4583489576,4194304,1},
{4591910952,4194304,1}]}]
109051904
[{binary,[{14146568232,120795955,1},
{14267367464,125829120,1},
{4600315944,4194304,1},
{4591910952,4194304,1}]}]
113246208
[{binary,[{14146568232,120795955,1},
{14267367464,125829120,1},
{14502281256,130862284,1},
{4600315944,4194304,1},
{4591910952,4194304,1},
{4583489576,4194304,1},
{14393212968,109051904,1},
{6337724456,4194304,1}]}]
116882124
[{binary,[{14502281256,130862284,1},
{5800771624,135895449,1},
{4587700264,3635916,1},
{14146568232,113246208,1},
{4591910952,3635916,1}]}]
[{binary,[{14502281256,130862284,1},
{5800771624,135895449,1},
{4587700264,3635916,1},
{14146568232,113246208,1},
{4591910952,3635916,1}]}]
total size 116882124 |
deps/rabbit/src/rabbit_msg_store.erl
Outdated
%% It is quite unexpected that FileSize matches the | ||
%% expected Size but the end marker is wrong. | ||
%% There is no compaction in case of large message files | ||
%% so it can only happen with byte corruption? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. But best to try recovering messages in the file anyway.
Thank you. The reduced memory usage in these cases is good enough to want this in. |
thanks Loic, will address your feedback |
Since 4.0.0 (commit d45fbc3) the shared message store writes large messages into their own rdq files. This information can be utilised when scanning rdq files during recovery to avoid reading in the whole message body into memory unnecessarily. This commit addresses the same issue that was addressed in 3.13.x by commit baeefbe (ie. appending a large binary together from 4MB chunks leaves a lot of garbage and memory fragmentation behind) but even more efficiently. Large messages which were written before 4.0.0, which don't fully fill the rdq file, are still handled as before.
154fa6f
to
fb21a19
Compare
addressed the comments and rebased to latest main |
4.x: Optimise msg_store recovery in case of large message file (backport #13250)
Proposed Changes
Since 4.0.0 (commit d45fbc3) the shared message store writes large messages into their own rdq files. This information can be utilised when scanning rdq files during recovery to avoid reading in the whole message body into memory unnecessarily.
This commit addresses the same issue that was addressed in 3.13.x by commit baeefbe (ie. appending a large binary together from 4MB chunks leaves a lot of garbage and memory fragmentation behind) but even more efficiently.
Large messages which were written before 4.0.0, which don't fully fill the rdq file, are still handled as before.
Types of Changes
What types of changes does your code introduce to this project?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply.You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md
documentFurther Comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution
you did and what alternatives you considered, etc.