Skip to content

Take ha-mode into account in choosing queue master #1372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 23, 2017

Conversation

lukebakken
Copy link
Collaborator

Fixes #1371

Copy link
Collaborator

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct but breaks queue_master_locator_SUITE which can be executed with gmake; gmake ct-queue_master_location.

Please add a couple of tests that limit queue replication to 2 nodes out of 3 and e.g. retrieve
next suggested node 200 times, making sure the 3rd node is never returned.

@lukebakken
Copy link
Collaborator Author

@michaelklishin on it

@lukebakken lukebakken changed the title Take ha-mode into account in choosing queue master WIP: Take ha-mode into account in choosing queue master Sep 21, 2017
@lukebakken lukebakken changed the title WIP: Take ha-mode into account in choosing queue master Take ha-mode into account in choosing queue master Sep 21, 2017
@lukebakken
Copy link
Collaborator Author

@michaelklishin ready for round two

Copy link
Collaborator

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still getting one cluster_size_3 > declare_policy_exactly failure out of 5 runs:

gmake distclean; gmake
for i in {1..5}; do gmake ct-queue_master_location; done
queue_master_location_SUITE > cluster_size_3 > declare_policy_exactly
    #1. {error,
            {{assertEqual,
                 [{module,queue_master_location_SUITE},
                  {line,309},
                  {expression,"Rpc"},
                  {expected,{ok,'declare_policy_exactly-2@localhost'}},
                  {value,{ok,'declare_policy_exactly-3@localhost'}}]},
             [{queue_master_location_SUITE,'-verify_min_master/3-fun-0-',2,
                  [{file,"test/queue_master_location_SUITE.erl"},{line,309}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1529}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1045}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,977}]}]}}

@lukebakken
Copy link
Collaborator Author

@michaelklishin OK that's good to know ... exactly is working differently for you than me 😄 I'll take that into account.

@lukebakken lukebakken removed the request for review from gerhard September 21, 2017 21:15
@lukebakken
Copy link
Collaborator Author

@michaelklishin all set for round three. Thanks!

Copy link
Collaborator

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node index assertion fails from time to time:

for i in {1..20}; do gmake ct-queue_master_location t=cluster_size_3:declare_policy_exactly; done
queue_master_location_SUITE > cluster_size_3 > declare_policy_exactly
    #1. {error,
            {{assert,
                 [{module,queue_master_location_SUITE},
                  {line,195},
                  {expression,"MIdx > SIdx"},
                  {expected,true},
                  {value,false}]},
             [{queue_master_location_SUITE,
                  '-declare_policy_exactly/1-fun-2-',2,
                  [{file,"test/queue_master_location_SUITE.erl"},{line,195}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1529}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1045}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,977}]}]}}

@lukebakken
Copy link
Collaborator Author

lukebakken commented Sep 22, 2017

@michaelklishin - yeah, on my run I thought of one other thing ... ha-mode exactly needs to take the min-masters strategy into account when picking the nodes to use, which is what you're seeing. For whatever reason I didn't run into that locally.

@lukebakken lukebakken changed the title Take ha-mode into account in choosing queue master WIP: Take ha-mode into account in choosing queue master Sep 22, 2017
@lukebakken lukebakken changed the title WIP: Take ha-mode into account in choosing queue master Take ha-mode into account in choosing queue master Sep 22, 2017
@lukebakken
Copy link
Collaborator Author

lukebakken commented Sep 22, 2017

@michaelklishin alright, turns out some smarter behavior around the initial queue master node was needed. I was able to reproduce the failure you saw and can confirm that I can run the queue_master_location test suite 30 times in a row without error now.

@michaelklishin michaelklishin merged commit 1c81095 into stable Sep 23, 2017
@lukebakken lukebakken deleted the rabbitmq-server-1371 branch September 24, 2017 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants