PHPLIB-467: Connections survive primary step-down #667

alcaeus · 2019-08-13T12:10:59Z

https://jira.mongodb.org/browse/PHPLIB-467

tests/SpecTests/PrimaryStepDownSpecTest.php

jmikola · 2019-08-20T21:46:51Z

FYI: I opened mongodb/specifications#634 to clarify topology requirements in the spec test.

jmikola

Suggestion for tweaking waitForMasterReelection(), but LGTM either way.

tests/SpecTests/PrimaryStepDownSpecTest.php

jmikola · 2019-08-22T05:20:01Z

tests/SpecTests/PrimaryStepDownSpecTest.php

+    /** @var Collection */
+    private $collection;
+
+    private function doSetUp()


I was going to suggest dropping the collection under test on tearDown, since the tests write to it; however, I realized that we don't consistently do that across the test suite. For example: MongoDB\Tests\Collection\FunctionalTestCase::tearDown() drops collections after a successful test and leaves behind data after a failure in case we need to inspect it.

I don't think this is worth doing here, but I did want to share the thought. Long term, I'd be more interested in auditing/profiling the test suite to see where we can improve execution time by reducing things like drops. I know database drops can be fairly expensive, so those would be worth looking at to see if they serve a useful purpose.

I'll leave it for now. As you mentioned, leaving data behind to check out after test failures can make sense, and the spec tests drop collections during setup anyways. At a previous job I've played around with only dropping collections that were written to during tests (using APM to figure out which collections were written to), but the performance gain from that was negligible IIRC.

alcaeus · 2019-08-23T09:49:56Z

@jmikola it looks like the getMoreIteration test causes failures down the line, which unfortunately only appear when running the entire test suite. These errors disappear when skipping that test. Looking at the replica set configuration, it seems likely that there is in fact another step down sometime after running that initial step down: the first replica set member has a significantly higher priority, and in my local tests I was able to confirm that the first server becomes primary a short while after stepping down.

I'm not sure how to best fix this: giving the replica set members the same priority would definitely prevent another step down. The alternative is making the server step down for significantly longer than 5 seconds while hoping that there is no other step down during the tests. What do you think?

jmikola · 2019-08-24T19:17:13Z

the first replica set member has a significantly higher priority

IIRC, the priorities were configured as such for PHPC since it had some tests that select primary or secondary nodes and assert their tags. Those are inherently fragile tests and I'm dubious of how much value they provide, but beyond that I don't think they have any relevance to PHPLIB. It's only coming up now because we evidently based the PHPLIB cluster configs on what we already had for PHPC.

I'd suggest removing the priority definitions and confirming that it has no adverse affects on the rest of the PHPLIB test suite (AFAIK, it should not). Bumping the replSetStepDown value higher may work today, but IMO it would just be kicking the can further down the road and invite a failure if our test suite runs beyond that delay.

jmikola · 2019-08-24T19:24:19Z

I just realized that this was originally a PHPC ticket but you chose to implement in PHPLIB. If so, let's move the JIRA issue over to PHPLIB and revise the commit message.

It bears mentioning that we do have some existing tests in PHPC that start their own servers with MO, but those are mainly creating standalones to test fail points. I think you made the right call to use our existing cluster for these tests. This could have probably been implemented as phpt tests in PHPC, but your decision to use PHPLIB does avoid a possible conflict with fragile replica set tag tests I mentioned above, so all is well there :)

jmikola

A few small suggestions but LGTM if all is passing.

jmikola · 2019-08-27T04:43:19Z

mongo-orchestration/replica_sets/replicaset-old.json

@@ -15,7 +15,6 @@
                "smallfiles": true
            },
            "rsParams": {
-                "priority": 99,


Did these removals actually break some PHPLIB tests, or was that resolved? IIRC, you mentioned something about it on Monday.

They pass on travis-ci, but sometimes fail on my machine. I'm currently investigating the cause of these sporadic failures.

I wasn't able to figure out build failures, but at least they're not confined to my machine anymore. Re-adding the priority fixes the current failures but reintroduces others.

I've taken a look at the libmongoc implementation of these tests, and they work exactly the same: grab the connection count from the current primary, perform an operation that fails, check connection count again. This works fine, unless the first test that triggers a stepdown ran. If it ran, connection counts are sometimes off, without knowing exactly why. Moving the stepdown test to the end solves this.

I feel bad moving the test, but I'm not sure what could be the cause of this. We don't manage connections at all, and the corresponding tests pass for libmongoc: https://evergreen.mongodb.com/task_log_raw/mongo_c_driver_gcc54_test_latest_replica_set_noauth_sasl_openssl_0acb4a539a62248c2a7f1e0b4a866340c463359c_19_08_27_00_33_53/0?type=T#L3694.

tests/SpecTests/PrimaryStepDownSpecTest.php

jmikola

Point about test ordering acknowledged. I'd suggest creating a follow-up ticket to address the fragile ordering. You can cross-reference it with this issue and have it depend on PHPC-1150 (SDAM monitoring).

alcaeus requested a review from kvwalker August 13, 2019 12:10

alcaeus self-assigned this Aug 13, 2019

alcaeus changed the title ~~PHPC-1294: Connections survive primary step-down~~ [WIP] PHPC-1294: Connections survive primary step-down Aug 13, 2019

alcaeus removed the request for review from kvwalker August 13, 2019 12:13

jmikola reviewed Aug 19, 2019

View reviewed changes

tests/SpecTests/PrimaryStepDownSpecTest.php Outdated Show resolved Hide resolved

tests/SpecTests/PrimaryStepDownSpecTest.php Outdated Show resolved Hide resolved

tests/SpecTests/PrimaryStepDownSpecTest.php Show resolved Hide resolved

alcaeus force-pushed the phpc-1294 branch from 403b4d4 to ab94868 Compare August 20, 2019 09:08

jmikola reviewed Aug 20, 2019

View reviewed changes

tests/SpecTests/PrimaryStepDownSpecTest.php Outdated Show resolved Hide resolved

alcaeus force-pushed the phpc-1294 branch from ab94868 to 1a9b11b Compare August 21, 2019 10:33

jmikola approved these changes Aug 22, 2019

View reviewed changes

alcaeus force-pushed the phpc-1294 branch from 1a9b11b to 5d465ca Compare August 22, 2019 13:17

alcaeus changed the title ~~[WIP] PHPC-1294: Connections survive primary step-down~~ PHPC-1294: Connections survive primary step-down Aug 22, 2019

alcaeus force-pushed the phpc-1294 branch 2 times, most recently from b2996d4 to 91f63f1 Compare August 26, 2019 14:31

alcaeus changed the title ~~PHPC-1294: Connections survive primary step-down~~ PHPLIB-467: Connections survive primary step-down Aug 26, 2019

jmikola approved these changes Aug 27, 2019

View reviewed changes

alcaeus force-pushed the phpc-1294 branch 2 times, most recently from d531beb to 3e50254 Compare August 28, 2019 10:39

alcaeus added 2 commits August 28, 2019 13:55

PHPLIB-467: Connections survive primary stepdown

0570864

PHPLIB-467: Remove priorities from MO replica sets to fix test failures

51081f4

alcaeus force-pushed the phpc-1294 branch from 3e50254 to 51081f4 Compare August 28, 2019 11:55

jmikola self-requested a review August 28, 2019 17:31

jmikola approved these changes Aug 28, 2019

View reviewed changes

alcaeus added a commit to alcaeus/mongo-php-library that referenced this pull request Aug 28, 2019

Merge pull request mongodb#667

97540ea

alcaeus merged commit 51081f4 into mongodb:master Aug 28, 2019

alcaeus deleted the phpc-1294 branch August 28, 2019 19:25

PHPLIB-467: Connections survive primary step-down #667

PHPLIB-467: Connections survive primary step-down #667

Uh oh!

Conversation

alcaeus commented Aug 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmikola commented Aug 20, 2019

Uh oh!

jmikola left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jmikola Aug 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alcaeus Aug 22, 2019

Choose a reason for hiding this comment

Uh oh!

alcaeus commented Aug 23, 2019

Uh oh!

jmikola commented Aug 24, 2019

Uh oh!

jmikola commented Aug 24, 2019

Uh oh!

jmikola left a comment

Choose a reason for hiding this comment

Uh oh!

jmikola Aug 27, 2019

Choose a reason for hiding this comment

Uh oh!

alcaeus Aug 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alcaeus Aug 27, 2019

Choose a reason for hiding this comment

Uh oh!

alcaeus Aug 28, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmikola left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alcaeus commented Aug 13, 2019 •

edited

Loading

jmikola Aug 22, 2019 •

edited

Loading

alcaeus Aug 27, 2019 •

edited

Loading