Skip to content

Use system process rss memory when reporting vm memory #1259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 19, 2017

Conversation

hairyhum
Copy link
Contributor

@hairyhum hairyhum commented Jun 12, 2017

Memory reported by erlang:memory(total) is not supposed to
be equal to the total size of all pages mapped to the emulator,
according to http://erlang.org/doc/man/erlang.html#memory-0
erlang:memory(total) under-reports memory usage by around 20%

Closes #1223.
[#145451399]

Memory reported by erlang:memory(total) is not supposed to
be equal to the total size of all pages mapped to the emulator,
according to http://erlang.org/doc/man/erlang.html#memory-0
erlang:memory(total) under-reports memory usage by around 20%

[#1223]
[#145451399]
get_ps_memory();

get_system_process_resident_memory({unix, linux}) ->
get_ps_memory();
Copy link
Collaborator

@lukebakken lukebakken Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading from /proc/PID/smaps is an option, I'll code up an example.

Copy link
Collaborator

@lukebakken lukebakken Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/lukebakken/03d92dcdaecce8bd48d1639f9545295d

The only reason I'm suggesting this is that reading from /proc should be faster than starting up a sub-process.

Output:

$ ./get-rss.escript $$
RSS: 8588 KiB

$ ps -h -o rss -p $$
 8588

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @lukebakken that this is a better solution for Linux. However, note that this is specific to Linux.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we went with ps mainly because it works the same across both Linux & BSD.

I would prefer to go with ps to begin with, then improve the approach as/when we learn about its limitations.

Would it be worth doing some benchmarks?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK this code is not executed on the hot path. Compatibility with more OSes seems like a bigger gain to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I didn't know how often this stat is collected 😄

get_ps_memory();

get_system_process_resident_memory({unix,freebsd}) ->
{error, not_implemented_for_os};
Copy link
Collaborator

@dumbbell dumbbell Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of get_ps_memory() will work on FreeBSD and probably other *BSD as well.

The command line you used is portable to POSIX systems by the way. The only non-portable part is the field name (rss).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at various systems' manpages: all common *BSD support the rss field (DragonFlyBSD, FreeBSD, NetBSD, OpenBSD). Solaris should support it too.

Plan9 and HP-UX do not provide a POSIX ps(1), and I couldn't find manpages for AIX.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/me wants to add Plan9 VMs in CI!

{error, not_implemented_for_os};

get_system_process_resident_memory({win32,_OSname}) ->
{error, not_implemented_for_os};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm the difference on Windows too: for a simple Erlang shell, Windows reports 19 MiB but erlang:memory(total). reports 14 MiB.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bad example, please disregard this comment.

@dumbbell
Copy link
Collaborator

@gerhard and @hairyhum, do you have examples where erlang:memory(total) underestimates the memory footprint?

A freshly started RabbitMQ without messages (or an Erlang shell) is a bad example: it doesn't reflect a production situation and is the worst case scenario because the memory allocated by the kernel to start the process (so outside of the Erlang VM scope) is significant compared to the memory allocated by the Erlang VM.

According to AIX, SunOS and BSD docs, ps command can have rss format,
so get_ps_memory should work.
We already requiring tasklist for wait command, so we can use it here.
@hairyhum
Copy link
Contributor Author

@dumbbell
Log in to long-running-environment

ps -p `pgrep beam` -o rss
#   RSS
# 450784
rabbitmqctl eval 'erlang:memory(total).'
# 170767520

@gerhard
Copy link
Contributor

gerhard commented Jun 13, 2017

It's actually worse than we thought:

Nodename ps (bytes) erlang (bytes) Off by (%)
rabbit@rmq0-rmq-36-20170608 441290752 175692152 61%
rabbit@rmq1-rmq-36-20170608 663388160 327849080 51%
rabbit@rmq2-rmq-36-20170608 335593472 165004064 51%
ps_bytes_mem() {
  KiB=$(ps -p $(pgrep beam) -o rss=)
  echo $((KiB * 1024))
}

erlang_bytes_mem() {
  rabbitmqctl eval 'erlang:memory(total).'
}

diff() {
  ps=$(ps_bytes_mem)
  erl=$(erlang_bytes_mem)
  echo $((100 - erl * 100 / ps))
}

echo "
| Nodename | \`ps\` (bytes) | \`erlang\` (bytes) | Off by (%) |
|-|-|-|-|
| $RABBITMQ_NODENAME | $(ps_bytes_mem) | $(erlang_bytes_mem) | $(diff)% |
"

@dumbbell can you sanity check this? The difference seems too big to me.

Account for extra memory that was reported in other_ets

[#145451399]

Signed-off-by: Gerhard Lazu <[email protected]>
gerhard pushed a commit to rabbitmq/rabbitmq-management-agent that referenced this pull request Jun 14, 2017
@gerhard
Copy link
Contributor

gerhard commented Jun 14, 2017

@michaelklishin feel free to use this in the release notes:

As of this release, RabbitMQ will display the memory usage as reported by the operating system, not by the Erlang VM. According to the official Erlang documentation, total [memory] value is not supposed to be equal to the total size of all pages mapped to the emulator. Also, because of fragmentation and pre[-]reservation of memory areas, the size of the memory segments containing the dynamically allocated memory blocks can be much larger than the total size of the dynamically allocated memory blocks. This means that in some cases, RabbitMQ would significantly under-report the used memory. For example, Erlang VM would report 167MB used memory while the OS would report 420MB used by the Erlang VM process.

While it may appear that as of this version RabbitMQ is using more memory, it's just a result of switching from Erlang VM memory reporting to OS memory reporting.

Since memory usage is used to trigger memory alarms, publishers will be blocked more often. This change will improve stability since RabbitMQ is less likely to be killed by the OS due to out-of-memory errors.

OS memory reporting is now enabled by default. To disable, set {vm_memory_use_process_rss, false} in the rabbit config.

@gerhard
Copy link
Contributor

gerhard commented Jun 14, 2017

@michaelklishin @dumbbell ready for QA

gerhard pushed a commit to rabbitmq/rabbitmq-website that referenced this pull request Jun 14, 2017
%% erlang:memory(total) under-reports memory usage by around 20%
-spec get_used_memory() -> Bytes :: integer().
get_used_memory() ->
case application:get_env(rabbit, vm_memory_use_process_rss, false) of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this match the value in PROJECT_ENV in the Makefile (true)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because this module is moved to rabbitmq-common in the resolve-deps-circles branch to solve interdependencies between the broker, the Erlang client and the rabbitmq-common. But the configuration key will remain in the rabbit application.

Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #1264 to add the new setting to the cuttlefish schema.

@dumbbell
Copy link
Collaborator

I have one concern with this approach: RSS reports the physical memory footprint, unlike erlang:memory(total) which reports the used memory.

We see that RSS is often larger compared to erlang:memory(total), probably because of pre-allocations done by the Erlang VM in order to improve performance. However, if memory pages used by the Erlang VM are paged out (by the kernel; I'm not talking about our internal paging of queue messages) for whatever reason, the RSS value will remain low while erlang:memory(total) could grow beyond the actual physical memory size.

It's possible that today, RabbitMQ doesn't behave nicely w.r.t. queue messages paging and blocking producers when the host starts to swap. But if we use RSS, the behavior will change for sure: I'm wondering if it will be better or worse.

@michaelklishin
Copy link
Collaborator

@dumbbell that's a valid concern. With this change it is more likely that publishers will be blocked earlier, so kernel swapping is less likely with this change. Would you agree?

If RabbitMQ node process is being swapped out, I'm afraid our throttling mechanisms won't be very effective either way.

@dumbbell
Copy link
Collaborator

Yes, I agree because in this patch, we are comparing comparable numbers (physical memory footprint vs. physical memory size), unlike before (memory consumption from Erlang applications PoV vs. physical memory size). Thus the new calculation is not just more accurate: it starts to make sense :-)

After thinking a bit more about this, the only situations I see where this could be worse would be:

  • Another process is eating all available memory.
  • Memory ballooning is being used and the "physical" memory size was reduced.

We do not support the second situation anyway because we only read the physical memory size during startup. So if it changes later, we don't know and calculations will be off anyway.

Perhaps in the future, RabbitMQ should decide about queue messages paging and blocking producers by looking at the free memory (+ cache), i.e. something close to what we do with disk space. I don't know, just a random thought.

@michaelklishin
Copy link
Collaborator

@dumbbell @lukebakken @hairyhum @gerhard @kjnilsson @dcorbacho @acogoluegnes we already have a feature flag for going back to the old calculation method.

I suggest that we rename rabbit.vm_memory_use_process_rss (a boolean) to rabbit.vm_memory_calculation_strategy (which can be rss, erlang, and who knows what else in the future, defaulting to rss). This will leave the door open for N more options in the future without having to add N more feature flags.

I'm also curious what @jerrykuch and @carlhoerberg think.

@dumbbell
Copy link
Collaborator

Good idea about the parameter name!

@gerhard
Copy link
Contributor

gerhard commented Jun 14, 2017

Good point, happy to change the flag.

What should we do when a user sets an unsupported flag? I would prefer defaulting to rss & logging a warning over crashing.

@michaelklishin
Copy link
Collaborator

@gerhard default to rss and log a warning.

@michaelklishin
Copy link
Collaborator

@gerhard note that with Cuttlefish unsupported values for enum types will terminate boot. So will unreadable certificate and key files and other config validation failures.

@gerhard
Copy link
Contributor

gerhard commented Jun 14, 2017

That's good to know, glad that cuttlefish guards against incorrect configuration in 3.7

Will default to rss & log a warning in 3.6

gerhard pushed a commit to rabbitmq/rabbitmq-website that referenced this pull request Jun 14, 2017
@gerhard
Copy link
Contributor

gerhard commented Jun 14, 2017

@michaelklishin how about now?

@michaelklishin
Copy link
Collaborator

I'm reliably getting 7 failures in unit_inbroker_non_parallel_SUITE:

unit_inbroker_non_parallel_SUITE > non_parallel_tests > app_management
    #1. {error,
            {{badmatch,
                 {badrpc,
                     {'EXIT',
                         {noproc,
                             {gen_server,call,
                                 [vm_memory_monitor,get_memory_limit,
                                  infinity]}}}}},
             [{unit_inbroker_non_parallel_SUITE,app_management1,1,
                  [{file,"test/unit_inbroker_non_parallel_SUITE.erl"},
                   {line,109}]},
              {rpc,'-handle_call_call/6-fun-0-',5,
                  [{file,"rpc.erl"},{line,197}]}]}}

unit_inbroker_non_parallel_SUITE > non_parallel_tests > channel_statistics
    #1. {error,
            {badarg,
                [{ets,lookup,
                     [rabbit_exchange,
                      {resource,<<"/">>,exchange,<<"amq.rabbitmq.trace">>}],
                     []},
                 {rabbit_misc,dirty_read,1,
                     [{file,"src/rabbit_misc.erl"},{line,394}]},
                 {rabbit_trace,init,1,
                     [{file,"src/rabbit_trace.erl"},{line,48}]},
                 {rabbit_channel,init,1,
                     [{file,"src/rabbit_channel.erl"},{line,386}]},
                 {gen_server2,init_it,6,
                     [{file,"src/gen_server2.erl"},{line,554}]},
                 {proc_lib,init_p_do_apply,3,
                     [{file,"proc_lib.erl"},{line,247}]}]}}

unit_inbroker_non_parallel_SUITE > non_parallel_tests > disk_monitor
    #1. {error,{noproc,{gen_server,call,
                                   [rabbit_sup,
                                    {terminate_child,rabbit_disk_monitor_sup},
                                    infinity]}}}

unit_inbroker_non_parallel_SUITE > non_parallel_tests > disk_monitor_enable
    #1. {error,{noproc,{gen_server,call,
                                   [rabbit_sup,
                                    {terminate_child,rabbit_disk_monitor_sup},
                                    infinity]}}}

unit_inbroker_non_parallel_SUITE > non_parallel_tests > file_handle_cache
    #1. {error,{noproc,{gen_server2,call,
                                    [file_handle_cache,get_limit,infinity]}}}

unit_inbroker_non_parallel_SUITE > non_parallel_tests > head_message_timestamp_statistics
    #1. {error,
            {{badmatch,
                 {error,
                     {badarg,
                         [{ets,lookup,
                              [rabbit_exchange,
                               {resource,<<"/">>,exchange,
                                   <<"amq.rabbitmq.trace">>}],
                              []},
                          {rabbit_misc,dirty_read,1,
                              [{file,"src/rabbit_misc.erl"},{line,394}]},
                          {rabbit_trace,init,1,
                              [{file,"src/rabbit_trace.erl"},{line,48}]},
                          {rabbit_channel,init,1,
                              [{file,"src/rabbit_channel.erl"},{line,386}]},
                          {gen_server2,init_it,6,
                              [{file,"src/gen_server2.erl"},{line,554}]},
                          {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,247}]}]}}},
             [{rabbit_ct_broker_helpers,test_channel,0,
                  [{file,"src/rabbit_ct_broker_helpers.erl"},{line,1191}]},
              {unit_inbroker_non_parallel_SUITE,test_spawn,0,
                  [{file,"test/unit_inbroker_non_parallel_SUITE.erl"},
                   {line,709}]},
              {unit_inbroker_non_parallel_SUITE,head_message_timestamp1,1,
                  [{file,"test/unit_inbroker_non_parallel_SUITE.erl"},
                   {line,651}]},
              {rpc,'-handle_call_call/6-fun-0-',5,
                  [{file,"rpc.erl"},{line,197}]}]}}

unit_inbroker_non_parallel_SUITE > non_parallel_tests > log_management
    #1. {error,
            {badarg,
                [{erlang,process_info,[undefined,group_leader],[]},
                 {unit_inbroker_non_parallel_SUITE,override_group_leader,0,
                     [{file,"test/unit_inbroker_non_parallel_SUITE.erl"},
                      {line,386}]},
                 {unit_inbroker_non_parallel_SUITE,log_management1,1,
                     [{file,"test/unit_inbroker_non_parallel_SUITE.erl"},
                      {line,187}]},
                 {rpc,'-handle_call_call/6-fun-0-',5,
                     [{file,"rpc.erl"},{line,197}]}]}}

unit_inbroker_parallel_SUITE > parallel_tests > confirms
    #1. {error,killed}

Here's a CT log directory for that suite:

deps.rabbit.unit_inbroker_non_parallel_SUITE.logs.1.zip

@michaelklishin
Copy link
Collaborator

I can reproduce this on two machines reliably even after gmake distclean.

@michaelklishin
Copy link
Collaborator

I cannot reproduce the failures in stable.

@gerhard
Copy link
Contributor

gerhard commented Jun 15, 2017

You're too good (sounds better than we're terrible). Thanks for spotting this, will go over it more thoroughly.

Tests were failing because they required vm_memory_monitor process to be running

re #1259

[#145451399]

Signed-off-by: Gerhard Lazu <[email protected]>
@gerhard
Copy link
Contributor

gerhard commented Jun 15, 2017

All failing tests are now passing. Running all broker tests, ok to QA again.

@michaelklishin
Copy link
Collaborator

The tests now pass, we are doing more observatory testing on more OS'es. There's also at least one place in management-agent that needs updating.

Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows 10 looks good:

C:\Users\lbakken>tasklist /fi "pid eq 8248" /fo LIST 2>&1

Image Name:   erl.exe
PID:          8248
Session Name: Console
Session#:     1
Mem Usage:    65,860 K
(rabbit@DSCH-WIN)2> os:getpid().
"8248"
(rabbit@DSCH-WIN)3> rabbit_vm:total_memory().
67440640

@gerhard
Copy link
Contributor

gerhard commented Jun 19, 2017

@michaelklishin just PR-ed management-agent, anything else that you can think of?

@michaelklishin michaelklishin merged commit d817366 into stable Jun 19, 2017
@lukebakken lukebakken deleted the rabbitmq-server-1223 branch June 20, 2017 12:49
CmdOutput = os:cmd(Cmd),
%% Memory usage is displayed in kilobytes
%% with comma-separated thousands
case re:run(CmdOutput, "Mem Usage:\\s+([0-9,]+)\\s+K", [{capture, all_but_first, list}]) of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output of tasklist is localized, thus this function can only work on an English Windows.

Here is a sample output from a French Windows:

C:\Users\Jean-Sébastien>tasklist /fi "pid eq 7116" /fo LIST

Nom de l’image:      explorer.exe
PID:                 7116
Nom de la session:   Console
Numéro de session:   1
Utilisation mémoire: 72 008 Ko

C:\Users\Jean-Sébastien>

Thus:

  • the label is translated ("Utilisation mémoire")
  • the number is localized (a space to separate thousands).

Perhaps there is a way to set the locale for a particular command, but I don't know how.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable to fallback to Erlang memory calculation for RabbitMQ installations on French Windows?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dumbbell good find. I'll look into this today.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to find this but no details about how that can be adapted for one-off commands.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently chcp does not change local-specific formatting of numbers/dates/etc. So it's not an option here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants