Skip to content

Fix time unit in credit_flow trace events which has potential to crash all messaging components #9433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 14, 2023
Merged

Fix time unit in credit_flow trace events which has potential to crash all messaging components #9433

merged 1 commit into from
Sep 14, 2023

Conversation

Ayanda-D
Copy link
Contributor

@Ayanda-D Ayanda-D commented Sep 14, 2023

Proposed Changes

When CREDIT_FLOW_TRACING is enabled, all messaging components which use flow
control when calling credit_flow:block/1, credit_flow:unblock/1 and/or via the
credit_flow:peer_down/1 and handle_bump_msg/1 calls can all crash, i.e. all
connections, channels, queues, msg-store when they engage in different flow states
(amqp-1.0 included), due to the unsupported os:system_time/1 time unit in use.

The correct time unit (milli_seconds) is/was part of OTP-18 new time API: erlang/otp@6487aac

Example, message store crashing from credit_flow:peer_down/1 and vhost persistently failed to restart.

2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0> ** Generic server <0.1416.0> terminating
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0> ** Last message in was {'DOWN',#Ref<0.4179644850.2686189571.244506>,process,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                <0.1419.0>,normal}
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0> ** When Server state == {msstate,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             "/var/folders/2m/178y2y915430cstbh621lr240000gq/T/rabbitmq-test-instances/rabbit@ayandas-mac/mnesia/rabbit@ayandas-mac/msg_stores/vhosts/1LV4ZE5DHYKL6VS1PMGDQC3F4/msg_store_transient",
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             rabbit_msg_store_ets_index,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             {state,#Ref<0.4179644850.2686320643.244493>,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                 <<"/var/folders/2m/178y2y915430cstbh621lr240000gq/T/rabbitmq-test-instances/rabbit@ayandas-mac/mnesia/rabbit@ayandas-mac/msg_stores/vhosts/1LV4ZE5DHYKL6VS1PMGDQC3F4/msg_store_transient">>},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             0,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             {writer,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                 {file_descriptor,prim_file,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                     #{handle =>
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                           #Ref<0.4179644850.2686320673.241723>,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                       owner => <0.1416.0>,r_ahead_size => 0,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                       r_buffer =>
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                           #Ref<0.4179644850.2686320643.244521>}},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                                 #Ref<0.4179644850.2686320643.244522>},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             0,undefined,#{},undefined,<0.1417.0>,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             #Ref<0.4179644850.2686320643.244494>,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             #Ref<0.4179644850.2686320643.244491>,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             #Ref<0.4179644850.2686320643.244495>,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             #Ref<0.4179644850.2686320643.244496>,#{},#{},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             false,16777216,#{},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                             {4000,800}}
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0> ** Reason for termination ==
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0> ** {badarg,[{os,system_time,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                 [milliseconds],
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                 [{error_info,#{module => erl_kernel_errors}}]},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>             {credit_flow,unblock,1,[{file,"credit_flow.erl"},{line,205}]},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>             {credit_flow,peer_down,1,[{file,"credit_flow.erl"},{line,182}]},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>             {rabbit_msg_store,handle_info,2,
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>                               [{file,"rabbit_msg_store.erl"},{line,959}]},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>             {gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]},
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0>             {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
2023-09-14 11:39:57.286651+01:00 [error] <0.1416.0> 
2023-09-14 11:39:57.287280+01:00 [debug] <0.1405.0> rabbit_binding:recover/2 for vhost v2 completed in 0.001303s
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>   crasher:
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     initial call: rabbit_msg_store:init/1
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     pid: <0.1416.0>
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     registered_name: []
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     exception exit: {badarg,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                         [{os,system_time,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [milliseconds],
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [{error_info,#{module => erl_kernel_errors}}]},
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                          {credit_flow,unblock,1,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [{file,"credit_flow.erl"},{line,205}]},
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                          {credit_flow,peer_down,1,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [{file,"credit_flow.erl"},{line,182}]},
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                          {rabbit_msg_store,handle_info,2,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [{file,"rabbit_msg_store.erl"},{line,959}]},
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                          {gen_server2,handle_msg,2,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [{file,"gen_server2.erl"},{line,1056}]},
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                          {proc_lib,init_p_do_apply,3,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                              [{file,"proc_lib.erl"},{line,240}]}]}
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>       in function  gen_server2:terminate/3 (gen_server2.erl, line 1172)
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     ancestors: [<0.1404.0>,<0.1298.0>,rabbit_vhost_sup_sup,rabbit_sup,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                   <0.238.0>]
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     message_queue_len: 1
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     messages: [{'EXIT',<0.1417.0>,normal}]
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     links: [<0.1404.0>]
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     dictionary: [{rand_seed,{#{jump => #Fun<rand.3.34006561>,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                                 max => 288230376151711743,
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                                 next => #Fun<rand.5.34006561>,type => exsplus},
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>                               [117221360623263333|243156253367872442]}}]
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     trap_exit: true
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     status: running
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     heap_size: 10958
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     stack_size: 28
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>     reductions: 58216
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0>   neighbours:
2023-09-14 11:39:57.287040+01:00 [error] <0.1416.0> 
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>     supervisor: {<0.1404.0>,rabbit_vhost_sup}
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>     errorContext: child_terminated
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>     reason: {badarg,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                 [{os,system_time,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                      [milliseconds],
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                      [{error_info,#{module => erl_kernel_errors}}]},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                  {credit_flow,unblock,1,[{file,"credit_flow.erl"},{line,205}]},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                  {credit_flow,peer_down,1,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                      [{file,"credit_flow.erl"},{line,182}]},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                  {rabbit_msg_store,handle_info,2,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                      [{file,"rabbit_msg_store.erl"},{line,959}]},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                  {gen_server2,handle_msg,2,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                      [{file,"gen_server2.erl"},{line,1056}]},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                  {proc_lib,init_p_do_apply,3,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                      [{file,"proc_lib.erl"},{line,240}]}]}
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>     offender: [{pid,<0.1416.0>},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                {id,msg_store_transient},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                {mfargs,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                    {rabbit_msg_store,start_link,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                        [<<"v2">>,msg_store_transient,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                         "/var/folders/2m/178y2y915430cstbh621lr240000gq/T/rabbitmq-test-instances/rabbit@ayandas-mac/mnesia/rabbit@ayandas-mac/msg_stores/vhosts/1LV4ZE5DHYKL6VS1PMGDQC3F4",
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                         undefined,
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                         {#Fun<rabbit_variable_queue.0.30378138>,ok}]}},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                {restart_type,transient},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                {significant,false},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                {shutdown,600000},
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0>                {child_type,worker}]
2023-09-14 11:39:57.287409+01:00 [error] <0.1404.0> 
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>     supervisor: {<0.1404.0>,rabbit_vhost_sup}
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>     errorContext: shutdown
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>     reason: reached_max_restart_intensity
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>     offender: [{pid,<0.1416.0>},
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                {id,msg_store_transient},
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                {mfargs,
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                    {rabbit_msg_store,start_link,
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                        [<<"v2">>,msg_store_transient,
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                         "/var/folders/2m/178y2y915430cstbh621lr240000gq/T/rabbitmq-test-instances/rabbit@ayandas-mac/mnesia/rabbit@ayandas-mac/msg_stores/vhosts/1LV4ZE5DHYKL6VS1PMGDQC3F4",
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                         undefined,
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                         {#Fun<rabbit_variable_queue.0.30378138>,ok}]}},
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                {restart_type,transient},
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                {significant,false},
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                {shutdown,600000},
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0>                {child_type,worker}]
2023-09-14 11:39:57.287700+01:00 [error] <0.1404.0> 
2023-09-14 11:39:57.287882+01:00 [info] <0.1420.0> Stopping message store for directory '/var/folders/2m/178y2y915430cstbh621lr240000gq/T/rabbitmq-test-instances/rabbit@ayandas-mac/mnesia/rabbit@ayandas-mac/msg_stores/vhosts/1LV4ZE5DHYKL6VS1PMGDQC3F4/msg_store_persistent'
2023-09-14 11:39:57.289926+01:00 [info] <0.1420.0> Message store for directory '/var/folders/2m/178y2y915430cstbh621lr240000gq/T/rabbitmq-test-instances/rabbit@ayandas-mac/mnesia/rabbit@ayandas-mac/msg_stores/vhosts/1LV4ZE5DHYKL6VS1PMGDQC3F4/msg_store_persistent' is stopped
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>     supervisor: {<0.1298.0>,rabbit_vhost_sup_wrapper}
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>     errorContext: child_terminated
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>     reason: shutdown
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>     offender: [{pid,<0.1404.0>},
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>                {id,rabbit_vhost_sup},
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>                {mfargs,{rabbit_vhost_sup_wrapper,start_vhost_sup,[<<"v2">>]}},
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>                {restart_type,permanent},
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>                {significant,false},
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>                {shutdown,infinity},
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0>                {child_type,supervisor}]
2023-09-14 11:39:57.290174+01:00 [error] <0.1298.0> 
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>     supervisor: {<0.1298.0>,rabbit_vhost_sup_wrapper}
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>     errorContext: shutdown
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>     reason: reached_max_restart_intensity
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>     offender: [{pid,<0.1404.0>},
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>                {id,rabbit_vhost_sup},
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>                {mfargs,{rabbit_vhost_sup_wrapper,start_vhost_sup,[<<"v2">>]}},
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>                {restart_type,permanent},
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>                {significant,false},
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>                {shutdown,infinity},
2023-09-14 11:39:57.290281+01:00 [error] <0.1298.0>                {child_type,supervisor}]

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally with my changes
  • If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.

…crashing

messaging components which use flow control when calling credit_flow:block/1,
credit_flow:unblock/1 and/or via credit_flow:peer_down/1 and handle_bump_msg/1.
This can crash all messaging components (connections, channels, queues, msg-store)
when trace is enabled and flow is set or cleared (amqp-1.0 included).
@Ayanda-D
Copy link
Contributor Author

OTP-18 milli_seconds works on OTP-25 & 26, but listed as deprecated in OTP-25 docs https://www.erlang.org/docs/25/man/erlang#type-deprecated_time_unit - switched to preferred millisecond

@michaelklishin michaelklishin merged commit d00d03c into rabbitmq:main Sep 14, 2023
@michaelklishin michaelklishin added this to the 3.12.5 milestone Sep 14, 2023
@Ayanda-D Ayanda-D deleted the fix-credit-flow-trace-events-time-unit branch September 14, 2023 13:30
@michaelklishin
Copy link
Collaborator

@Mergifyio backport v3.12.x v3.11.x

@mergify
Copy link

mergify bot commented Sep 14, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants