Skip to content

PHPLIB-466: Support mongos pinning for sharded transactions #677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 4, 2019

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented Sep 2, 2019

@alcaeus alcaeus requested a review from jmikola September 2, 2019 18:10
@alcaeus alcaeus self-assigned this Sep 2, 2019
@@ -29,15 +28,13 @@ public function testRetryableWrites(stdClass $test, array $runOn = null, array $
$this->markTestSkipped('Transaction numbers are only allowed on a replica set member or mongos (PHPC-1415)');
}

if (isset($test->useMultipleMongoses) && $test->useMultipleMongoses && $this->isShardedCluster()) {
$this->manager = new Manager(static::getUri(true));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised way too late that we don't use $this->manager in these tests. This was exposed when testing session pinning, as running with a single mongos won't trigger pinning logic in libmongoc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had we used $this->manager, which is set from MongoDB\Tests\FunctionalTestCase::setUp(), I don't think that would have worked either. That manager is constructed purely from the URI environment variable and wouldn't have considered the spec tests' useMultipleMongoses option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I originally added this conditional to recreate the manager with multiple mongooses if specified in the spec test, not realising that it still didn't have any effect.

@@ -36,6 +37,9 @@ class TransactionsSpecTest extends FunctionalTestCase
private static $incompleteTests = [
'error-labels: add unknown commit label to MaxTimeMSExpired' => 'PHPC-1382',
'error-labels: add unknown commit label to writeConcernError MaxTimeMSExpired' => 'PHPC-1382',
'mongos-recovery-token: commitTransaction retry fails on new mongos' => 'isMaster failpoints cannot be disabled',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not able to reliably clear the failCommand fail point if it's configured to fail a certain amounts of times. It seems that we don't trigger the fail point the expected number of times, which causes issues when removing the fail point (as isMaster will cause an error). However, even trying to remove the fail point in a loop did not always clear the fail point, which causes failures in following tests.

Copy link
Member

@jmikola jmikola Sep 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the script for this test and the comments therein, I wonder if it was written with multi-threaded SDAM in mind. I wasn't aware we had any tests failing isMaster commands before reading this. It looks like this test dates back to the original spec change for recovery tokens in SPEC-1168.

Perhaps you can reach out to @bazile-clyde to see whether the C driver is able to test this with a multi- or single-threaded client. Multi-threaded clients would offload all isMaster activity to a separate background thread and socket, which means these isMaster failures might not affect the client under test at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the comments aren't present in the JSON files, which is why I missed the detailed explanation. The test is implemented in the C driver, but I couldn't find out whether it uses a single- or multi-threaded client. I'll check with Clyde on this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of our fail points are activated using single-threaded clients. Also, to disable the fail point we didn't use any loops but instead just sent the command {'configureFailPoint': 'failCommand' 'mode': 'off'}. Both activating and deactivating fail points were done via a new client, pinned to the same server ID as the one we used for testing.

{
parent::setUpBeforeClass();
parent::setUp();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes the test more compliant with the spec, which requires that all sessions be killed before running a test. We still kill all sessions after a failing test.

@alcaeus alcaeus force-pushed the phplib-466 branch 2 times, most recently from ebd3e95 to 5a9872a Compare September 3, 2019 05:21
@@ -376,7 +378,7 @@ private function resume($resumeToken = null, $hasAdvanced = false)
$this->hasResumed = true;

// Select a new server using the original read preference
$server = $this->manager->selectServer($this->aggregateOptions['readPreference']);
$server = select_server($this->manager, $this->aggregateOptions['readPreference'], extract_session_from_options($this->aggregateOptions));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is necessary. Quoting myself from PHPLIB-466:

Watch also invokes selectServer() when resuming, but I think that should be left as-is. AFAIK, change streams and transactions are mutually exclusive, but you may want to double-check that just to be sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A watch operation is currently not allowed in a transaction:

MongoDB\Driver\Exception\CommandException : Operation not permitted in transaction :: caused by :: Operation not permitted in transaction :: caused by :: $changeStream cannot run with a readConcern other than 'majority', or in a multi-document transaction. Current readConcern: { readConcern: { level: "snapshot" } }

I originally changed all instances of server selection within the library with the new logic so that it's consistent. In this case, changing server selection back would mean that running the watch operation on a pinned session will error because we're running the command on the wrong server, instead of informing the user that the operation can't be done in a transaction. Even in this technically impossible case, I'd still pick the pinned server to have the correct exception shown to the user.

This will also avoid problems down the line if watch ever becomes available in a transaction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur that this would ensure the user gets more accurate error message. Would you mind adding a comment to document this reasoning?

@@ -29,15 +28,13 @@ public function testRetryableWrites(stdClass $test, array $runOn = null, array $
$this->markTestSkipped('Transaction numbers are only allowed on a replica set member or mongos (PHPC-1415)');
}

if (isset($test->useMultipleMongoses) && $test->useMultipleMongoses && $this->isShardedCluster()) {
$this->manager = new Manager(static::getUri(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had we used $this->manager, which is set from MongoDB\Tests\FunctionalTestCase::setUp(), I don't think that would have worked either. That manager is constructed purely from the URI environment variable and wouldn't have considered the spec tests' useMultipleMongoses option.

if ($this->isShardedCluster()) {
$this->markTestSkipped('Transactions are not supported on sharded clusters');
if (!$this->isShardedClusterUsingReplicasets()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't easily figure out what storage engine is in use on sharded clusters, so we're skipping that check when running a sharded setup. I don't think we even run any tests on the MMAPv1 storage engine at all in PHPLIB. This may change once we move library tests to Evergreen, but I'd revisit this then and figure out a more robust solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MMAPv1 was completely removed in 4.2, so this shouldn't be a concern.

@alcaeus
Copy link
Member Author

alcaeus commented Sep 4, 2019

@jmikola I've added the prose tests for mongos pinning. I'm not sure if they provide additional value to the spec tests (especially since they cover a range of pin/unpin scenarios), but I've added them nonetheless 👍

*/
public function testStartingNewTransactionOnPinnedSessionUnpinsSession()
{
$this->skipIfTransactionsAreNotSupported();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should just move this to setUp().

* ClientSession unpins the session and normal server selection is performed
* for the next operation.
*/
public function testStartingNewTransactionOnPinnedSessionUnpinsSession()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I opened SPEC-1433 to determine if these prose tests are still relevant. I see no harm in adding them, though.

alcaeus added a commit that referenced this pull request Sep 4, 2019
@alcaeus alcaeus merged commit 9281873 into mongodb:master Sep 4, 2019
@alcaeus alcaeus deleted the phplib-466 branch September 4, 2019 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants