4.x: Optimise msg_store recovery in case of large message file #13250


Merged (1 commit, Feb 19, 2025)

Conversation

gomoripeti (Contributor)

Proposed Changes

Since 4.0.0 (commit d45fbc3) the shared message store writes large messages into their own rdq files. This information can be utilised when scanning rdq files during recovery, to avoid unnecessarily reading the whole message body into memory.

This commit addresses the same issue that was addressed in 3.13.x by commit baeefbe (i.e., appending a large binary together from 4MB chunks leaves a lot of garbage and memory fragmentation behind), but even more efficiently.

Large messages which were written before 4.0.0, and which don't fully fill the rdq file, are still handled as before.
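The fast path can be sketched as follows. This is a minimal illustration, not the actual patch: the module and function names are hypothetical, and it assumes the rdq record layout `<<Size:64, MsgIdAndMsg:Size/binary, 255>>` visible in the diff in this PR, where a file holding a single large message has `FileSize == Size + 9` (8-byte size prefix plus 1-byte end marker). Only the 24-byte header and the final marker byte are read; the message body stays on disk.

```erlang
%% Hedged sketch of the recovery fast path for an rdq file that holds a
%% single large message: validate the file without reading the body.
-module(scan_sketch).
-export([scan_large_file/2]).

scan_large_file(Filename, FileSize) ->
    {ok, Fd} = file:open(Filename, [read, raw, binary]),
    %% Read only the header: 8-byte size prefix + 16-byte message id.
    {ok, <<Size:64, MsgIdInt:128>>} = file:pread(Fd, 0, 24),
    Result =
        case FileSize =:= Size + 9 of
            true ->
                %% Check the trailing end marker (255) with a 1-byte pread,
                %% instead of reading the whole body into memory.
                case file:pread(Fd, FileSize - 1, 1) of
                    {ok, <<255>>} -> {valid, <<MsgIdInt:128>>, Size};
                    _             -> invalid
                end;
            false ->
                invalid
        end,
    ok = file:close(Fd),
    Result.
```

A message whose size prefix or end marker does not line up with the file size falls back to the existing (body-reading) scan path in the real code.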

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally with my changes
  • If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution
you did and what alternatives you considered, etc.

@@ -1537,6 +1537,39 @@ scan_data(<<Size:64, MsgIdAndMsg:Size/binary, 255, Rest/bits>> = Data,
scan_data(Rest2, Fd, Fun, Offset + 1, FileSize, MsgIdsFound, Acc)
end
end;
%% Large message alone in its own file
scan_data(<<Size:64, MsgIdInt:128, _Rest/bits>> = Data, Fd, Fun, Offset, FileSize, _MsgIdsFound, _Acc)
when Size >= ?LARGE_MESSAGE_THRESHOLD + 16,
gomoripeti (author):

(Maybe the check for LARGE_MESSAGE_THRESHOLD is not even necessary: if Offset == 0 and FileSize == Size + 9, this message is alone in this file and can be optimised. And the fact that the previous clause did not match means this cannot be a very small message.)

Reviewer:

Let's drop the unnecessary check.

case file:pread(Fd, FileSize - 1, 1) of
{ok, <<255>>} ->
%% Avoid sub-binary construction.
MsgId = <<MsgIdInt:128>>,
gomoripeti (author):

the code is admittedly quite a copy-paste

Reviewer:

Let's make a new function.


lhoguin commented Feb 13, 2025

Does this really make a big difference? Did you measure it?

@gomoripeti

This came up in a real-world case. A small server had two ~100MB rdq files (obviously a misuse; if one sends abusively large messages, at least do it on a larger server). RabbitMQ couldn't start and went OOM with rss:1271380kB. With this patch RabbitMQ allocates minimal memory (the initial 4MB read block). I "tested" by just calling rabbit_msg_store:scan_file_for_valid_messages("100MB.rdq") before and after the patch and looking at the memory graph in observer.

During startup/recovery there is always a smaller or larger memory spike. As operators we would like recovery to be as smooth as possible so we don't have to intervene manually. So even an improvement in an edge case matters to us.

After answering, I re-read the previous discussion in #11072.
The main issue is the appending by chunks, which was solved in 3.13 by reading large messages in one chunk (not forward-ported to 4.x); this PR proposes not reading them at all in 4.x.

I came up with this little test program, which demonstrates that 3-4x more binary memory is allocated than the actual message size (OTP 27.2.1).

11> f(Read), Read = fun R(Data, Fd) ->
        case file:read(Fd, 4*1024*1024) of
            {ok, Bin} ->
                Bin2 = <<Data/binary, Bin/binary>>,
                io:format("~p~n~p~n~n", [byte_size(Bin2), process_info(self(), [binary])]),
                R(Bin2, Fd);
            eof ->
                io:format("~p~n~ntotal size ~p", [process_info(self(), [binary]), byte_size(Data)])
        end
    end.
#Fun<erl_eval.18.18682967>
12> f(Pid), Pid = spawn(fun() -> {ok, Fd} = file:open("364.rdq", [read, raw, binary]), Read(<<>>, Fd) end).
<0.102.0>

4194304
[{binary,[{5244131056,256,1},{4591910952,8388608,1},{4583489576,4194304,1}]}]

8388608
[{binary,[{4591910952,4194304,1},
          {4604526632,8388608,1},
          {4600315944,4194304,1},
          {4587700264,4194304,1}]}]

12582912
[{binary,[{4604526632,8388608,1},
          {4621336616,16777216,1},
          {4583489576,4194304,1},
          {4591910952,4194304,1}]}]

16777216
[{binary,[{4621336616,15099494,1},
          {4650729512,25165824,1},
          {4600315944,4194304,1},
          {4638130216,12582912,1},
          {4591910952,4194304,1}]}]

20971520
[{binary,[{4650729512,16777216,1},
          {4831838248,33554432,1},
          {4583489576,4194304,1},
          {4591910952,4194304,1}]}]

25165824
[{binary,[{4831838248,20971520,1},
          {4891607080,25165824,1},
          {4600315944,4194304,1},
          {4591910952,4194304,1}]}]

29360128
[{binary,[{4891607080,25165824,1},
          {5100273704,30198988,1},
          {4583489576,4194304,1},
          {4916789288,25165824,1},
          {4591910952,4194304,1}]}]

33554432
[{binary,[{5100273704,30198988,1},
          {5502926888,35232153,1},
          {4600315944,4194304,1},
          {5175771176,29360128,1},
          {4591910952,4194304,1}]}]

37748736
[{binary,[{5502926888,35232153,1},
          {5571739688,40265318,1},
          {4583489576,4194304,1},
          {5538168872,33554432,1},
          {4591910952,4194304,1}]}]

41943040
[{binary,[{5571739688,37748736,1},
          {5612011560,45298483,1},
          {4600315944,4194304,1},
          {4591910952,4194304,1}]}]

46137344
[{binary,[{5612011560,45298483,1},
          {5657313320,50331648,1},
          {4583489576,4194304,1},
          {4642291752,41943040,1},
          {4591910952,4194304,1}]}]

50331648
[{binary,[{5657313320,46137344,1},
          {5707661352,55364812,1},
          {4600315944,4194304,1},
          {4591910952,4194304,1}]}]

54525952
[{binary,[{5707661352,55364812,1},
          {5763039272,60397977,1},
          {4583489576,4194304,1},
          {5175771176,50331648,1},
          {4591910952,4194304,1}]}]

58720256
[{binary,[{5763039272,60397977,1},
          {5823447080,65431142,1},
          {4600315944,4194304,1},
          {5707661352,54525952,1},
          {4591910952,4194304,1}]}]

62914560
[{binary,[{5823447080,58720256,1},
          {5947621416,70464307,1},
          {4583489576,4194304,1}]}]

67108864
[{binary,[{5947621416,62914560,1},
          {6010552360,75497472,1},
          {4600315944,4194304,1}]}]

71303168
[{binary,[{6010552360,67108864,1},
          {5565857832,80530636,1},
          {4583489576,4194304,1}]}]

75497472
[{binary,[{6010552360,67108864,1},
          {5565857832,71303168,1},
          {6176636968,85563801,1},
          {4583489576,4194304,1},
          {4600315944,4194304,1},
          {4591910952,4194304,1}]}]

79691776
[{binary,[{6176636968,75497472,1},
          {6337724456,90596966,1},
          {4587700264,4194304,1}]}]

83886080
[{binary,[{6337724456,79691776,1},
          {13958643752,95630131,1},
          {4583489576,4194304,1},
          {4591910952,4194304,1}]}]

88080384
[{binary,[{6337724456,79691776,1},
          {13958643752,95630131,1},
          {5800788008,100663296,1},
          {4583489576,4194304,1},
          {4591910952,4194304,1},
          {4587700264,4194304,1},
          {5716885544,83886080,1},
          {5947621416,4194304,1}]}]

92274688
[{binary,[{6337724456,79691776,1},
          {5800788008,88080384,1},
          {14054277160,105696460,1},
          {4600315944,4194304,1}]}]

96468992
[{binary,[{14054277160,92274688,1},
          {5502926888,110729625,1},
          {4583489576,4194304,1}]}]

100663296
[{binary,[{5502926888,96468992,1},
          {5951832104,115762790,1},
          {4600315944,4194304,1}]}]

104857600
[{binary,[{5951832104,100663296,1},
          {14146568232,120795955,1},
          {4583489576,4194304,1},
          {4591910952,4194304,1}]}]

109051904
[{binary,[{14146568232,120795955,1},
          {14267367464,125829120,1},
          {4600315944,4194304,1},
          {4591910952,4194304,1}]}]

113246208
[{binary,[{14146568232,120795955,1},
          {14267367464,125829120,1},
          {14502281256,130862284,1},
          {4600315944,4194304,1},
          {4591910952,4194304,1},
          {4583489576,4194304,1},
          {14393212968,109051904,1},
          {6337724456,4194304,1}]}]

116882124
[{binary,[{14502281256,130862284,1},
          {5800771624,135895449,1},
          {4587700264,3635916,1},
          {14146568232,113246208,1},
          {4591910952,3635916,1}]}]

[{binary,[{14502281256,130862284,1},
          {5800771624,135895449,1},
          {4587700264,3635916,1},
          {14146568232,113246208,1},
          {4591910952,3635916,1}]}]

total size 116882124
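For contrast, the 3.13.x fix (commit baeefbe) avoided this growth pattern by reading a large message in one call, so the target binary is allocated once at its final size rather than repeatedly reallocated and copied while appending 4MB chunks. A minimal sketch of that single-read idea (the module and function names are hypothetical):

```erlang
%% Hedged sketch of the single-read alternative: one pread/3 allocates
%% the binary at its final size, with no intermediate append copies.
-module(read_sketch).
-export([read_large_message/3]).

read_large_message(Fd, Offset, Size) ->
    %% A single allocation of exactly Size bytes.
    {ok, Bin} = file:pread(Fd, Offset, Size),
    Bin.
```

This PR goes one step further for 4.x: since a large message sits alone in its rdq file, recovery does not need to read the body at all.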

%% It is quite unexpected that FileSize matches the
%% expected Size but the end marker is wrong.
%% There is no compaction in case of large message files
%% so it can only happen with byte corruption?
Reviewer:

Yeah. But best to try recovering messages in the file anyway.


lhoguin commented Feb 13, 2025

Thank you. The reduced memory usage in these cases is good enough to want this in.

@gomoripeti

thanks Loic, will address your feedback

@gomoripeti gomoripeti force-pushed the large_message_rdq_scan branch from 154fa6f to fb21a19 Compare February 14, 2025 16:17
@gomoripeti
Copy link
Contributor Author

addressed the comments and rebased to latest main

@michaelklishin michaelklishin changed the title Optimise msg_store recovery in case of large message file 4.x: Optimise msg_store recovery in case of large message file Feb 14, 2025
@michaelklishin michaelklishin added this to the 4.0.7 milestone Feb 14, 2025
@michaelklishin michaelklishin merged commit 0eb65c2 into rabbitmq:main Feb 19, 2025
265 of 268 checks passed
michaelklishin added a commit that referenced this pull request Feb 20, 2025
4.x: Optimise msg_store recovery in case of large message file (backport #13250)
michaelklishin added a commit that referenced this pull request Feb 20, 2025
4.x: Optimise msg_store recovery in case of large message file (backport #13250) (backport #13373)
@gomoripeti gomoripeti deleted the large_message_rdq_scan branch May 6, 2025 09:58