-
Notifications
You must be signed in to change notification settings - Fork 266
PHPLIB-466: Support mongos pinning for sharded transactions #677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -29,15 +28,13 @@ public function testRetryableWrites(stdClass $test, array $runOn = null, array $ | |||
$this->markTestSkipped('Transaction numbers are only allowed on a replica set member or mongos (PHPC-1415)'); | |||
} | |||
|
|||
if (isset($test->useMultipleMongoses) && $test->useMultipleMongoses && $this->isShardedCluster()) { | |||
$this->manager = new Manager(static::getUri(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realised way too late that we don't use $this->manager
in these tests. This was exposed when testing session pinning, as running with a single mongos won't trigger pinning logic in libmongoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had we used $this->manager
, which is set from MongoDB\Tests\FunctionalTestCase::setUp()
, I don't think that would have worked either. That manager is constructed purely from the URI environment variable and wouldn't have considered the spec tests' useMultipleMongoses
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I originally added this conditional to recreate the manager with multiple mongooses if specified in the spec test, not realising that it still didn't have any effect.
@@ -36,6 +37,9 @@ class TransactionsSpecTest extends FunctionalTestCase | |||
private static $incompleteTests = [ | |||
'error-labels: add unknown commit label to MaxTimeMSExpired' => 'PHPC-1382', | |||
'error-labels: add unknown commit label to writeConcernError MaxTimeMSExpired' => 'PHPC-1382', | |||
'mongos-recovery-token: commitTransaction retry fails on new mongos' => 'isMaster failpoints cannot be disabled', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not able to reliably clear the failCommand
fail point if it's configured to fail a certain amounts of times. It seems that we don't trigger the fail point the expected number of times, which causes issues when removing the fail point (as isMaster
will cause an error). However, even trying to remove the fail point in a loop did not always clear the fail point, which causes failures in following tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the script for this test and the comments therein, I wonder if it was written with multi-threaded SDAM in mind. I wasn't aware we had any tests failing isMaster
commands before reading this. It looks like this test dates back to the original spec change for recovery tokens in SPEC-1168.
Perhaps you can reach out to @bazile-clyde to see whether the C driver is able to test this with a multi- or single-threaded client. Multi-threaded clients would offload all isMaster
activity to a separate background thread and socket, which means these isMaster
failures might not affect the client under test at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the comments aren't present in the JSON files, which is why I missed the detailed explanation. The test is implemented in the C driver, but I couldn't find out whether it uses a single- or multi-threaded client. I'll check with Clyde on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of our fail points are activated using single-threaded clients. Also, to disable the fail point we didn't use any loops but instead just sent the command {'configureFailPoint': 'failCommand' 'mode': 'off'}
. Both activating and deactivating fail points were done via a new client, pinned to the same server ID as the one we used for testing.
{ | ||
parent::setUpBeforeClass(); | ||
parent::setUp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes the test more compliant with the spec, which requires that all sessions be killed before running a test. We still kill all sessions after a failing test.
ebd3e95
to
5a9872a
Compare
@@ -376,7 +378,7 @@ private function resume($resumeToken = null, $hasAdvanced = false) | |||
$this->hasResumed = true; | |||
|
|||
// Select a new server using the original read preference | |||
$server = $this->manager->selectServer($this->aggregateOptions['readPreference']); | |||
$server = select_server($this->manager, $this->aggregateOptions['readPreference'], extract_session_from_options($this->aggregateOptions)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is necessary. Quoting myself from PHPLIB-466:
Watch also invokes
selectServer()
when resuming, but I think that should be left as-is. AFAIK, change streams and transactions are mutually exclusive, but you may want to double-check that just to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A watch operation is currently not allowed in a transaction:
MongoDB\Driver\Exception\CommandException : Operation not permitted in transaction :: caused by :: Operation not permitted in transaction :: caused by :: $changeStream cannot run with a readConcern other than 'majority', or in a multi-document transaction. Current readConcern: { readConcern: { level: "snapshot" } }
I originally changed all instances of server selection within the library with the new logic so that it's consistent. In this case, changing server selection back would mean that running the watch operation on a pinned session will error because we're running the command on the wrong server, instead of informing the user that the operation can't be done in a transaction. Even in this technically impossible case, I'd still pick the pinned server to have the correct exception shown to the user.
This will also avoid problems down the line if watch ever becomes available in a transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur that this would ensure the user gets more accurate error message. Would you mind adding a comment to document this reasoning?
@@ -29,15 +28,13 @@ public function testRetryableWrites(stdClass $test, array $runOn = null, array $ | |||
$this->markTestSkipped('Transaction numbers are only allowed on a replica set member or mongos (PHPC-1415)'); | |||
} | |||
|
|||
if (isset($test->useMultipleMongoses) && $test->useMultipleMongoses && $this->isShardedCluster()) { | |||
$this->manager = new Manager(static::getUri(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had we used $this->manager
, which is set from MongoDB\Tests\FunctionalTestCase::setUp()
, I don't think that would have worked either. That manager is constructed purely from the URI environment variable and wouldn't have considered the spec tests' useMultipleMongoses
option.
tests/FunctionalTestCase.php
Outdated
if ($this->isShardedCluster()) { | ||
$this->markTestSkipped('Transactions are not supported on sharded clusters'); | ||
if (!$this->isShardedClusterUsingReplicasets()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't easily figure out what storage engine is in use on sharded clusters, so we're skipping that check when running a sharded setup. I don't think we even run any tests on the MMAPv1 storage engine at all in PHPLIB. This may change once we move library tests to Evergreen, but I'd revisit this then and figure out a more robust solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MMAPv1 was completely removed in 4.2, so this shouldn't be a concern.
@jmikola I've added the prose tests for mongos pinning. I'm not sure if they provide additional value to the spec tests (especially since they cover a range of pin/unpin scenarios), but I've added them nonetheless 👍 |
*/ | ||
public function testStartingNewTransactionOnPinnedSessionUnpinsSession() | ||
{ | ||
$this->skipIfTransactionsAreNotSupported(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should just move this to setUp()
.
* ClientSession unpins the session and normal server selection is performed | ||
* for the next operation. | ||
*/ | ||
public function testStartingNewTransactionOnPinnedSessionUnpinsSession() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I opened SPEC-1433 to determine if these prose tests are still relevant. I see no harm in adding them, though.
The isMaster failpoint can't be reliably disabled again in these tests, causing subsequent test failures.
https://jira.mongodb.org/browse/PHPLIB-466