-
Notifications
You must be signed in to change notification settings - Fork 266
PHPLIB-467: Connections survive primary step-down #667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FYI: I opened mongodb/specifications#634 to clarify topology requirements in the spec test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for tweaking waitForMasterReelection()
, but LGTM either way.
/** @var Collection */ | ||
private $collection; | ||
|
||
private function doSetUp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest dropping the collection under test on tearDown, since the tests write to it; however, I realized that we don't consistently do that across the test suite. For example: MongoDB\Tests\Collection\FunctionalTestCase::tearDown()
drops collections after a successful test and leaves behind data after a failure in case we need to inspect it.
I don't think this is worth doing here, but I did want to share the thought. Long term, I'd be more interested in auditing/profiling the test suite to see where we can improve execution time by reducing things like drops. I know database drops can be fairly expensive, so those would be worth looking at to see if they serve a useful purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it for now. As you mentioned, leaving data behind to check out after test failures can make sense, and the spec tests drop collections during setup anyways. At a previous job I've played around with only dropping collections that were written to during tests (using APM to figure out which collections were written to), but the performance gain from that was negligible IIRC.
@jmikola it looks like the I'm not sure how to best fix this: giving the replica set members the same priority would definitely prevent another step down. The alternative is making the server step down for significantly longer than 5 seconds while hoping that there is no other step down during the tests. What do you think? |
IIRC, the priorities were configured as such for PHPC since it had some tests that select primary or secondary nodes and assert their tags. Those are inherently fragile tests and I'm dubious of how much value they provide, but beyond that I don't think they have any relevance to PHPLIB. It's only coming up now because we evidently based the PHPLIB cluster configs on what we already had for PHPC. I'd suggest removing the priority definitions and confirming that it has no adverse affects on the rest of the PHPLIB test suite (AFAIK, it should not). Bumping the |
I just realized that this was originally a PHPC ticket but you chose to implement in PHPLIB. If so, let's move the JIRA issue over to PHPLIB and revise the commit message. It bears mentioning that we do have some existing tests in PHPC that start their own servers with MO, but those are mainly creating standalones to test fail points. I think you made the right call to use our existing cluster for these tests. This could have probably been implemented as phpt tests in PHPC, but your decision to use PHPLIB does avoid a possible conflict with fragile replica set tag tests I mentioned above, so all is well there :) |
b2996d4
to
91f63f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small suggestions but LGTM if all is passing.
@@ -15,7 +15,6 @@ | |||
"smallfiles": true | |||
}, | |||
"rsParams": { | |||
"priority": 99, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did these removals actually break some PHPLIB tests, or was that resolved? IIRC, you mentioned something about it on Monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They pass on travis-ci, but sometimes fail on my machine. I'm currently investigating the cause of these sporadic failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to figure out build failures, but at least they're not confined to my machine anymore. Re-adding the priority fixes the current failures but reintroduces others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a look at the libmongoc implementation of these tests, and they work exactly the same: grab the connection count from the current primary, perform an operation that fails, check connection count again. This works fine, unless the first test that triggers a stepdown ran. If it ran, connection counts are sometimes off, without knowing exactly why. Moving the stepdown test to the end solves this.
I feel bad moving the test, but I'm not sure what could be the cause of this. We don't manage connections at all, and the corresponding tests pass for libmongoc: https://evergreen.mongodb.com/task_log_raw/mongo_c_driver_gcc54_test_latest_replica_set_noauth_sasl_openssl_0acb4a539a62248c2a7f1e0b4a866340c463359c_19_08_27_00_33_53/0?type=T#L3694.
d531beb
to
3e50254
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point about test ordering acknowledged. I'd suggest creating a follow-up ticket to address the fragile ordering. You can cross-reference it with this issue and have it depend on PHPC-1150 (SDAM monitoring).
https://jira.mongodb.org/browse/PHPLIB-467