Skip to content

PHPLIB-418: Add the ability to specify a pipeline to an update command #670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 27, 2019

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented Aug 22, 2019

@alcaeus alcaeus requested a review from jmikola August 22, 2019 12:16
@alcaeus alcaeus self-assigned this Aug 22, 2019
@@ -1,4 +1,9 @@
{
"runOn": [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -76,6 +76,10 @@ public function __construct($databaseName, $collectionName, $filter, $replacemen
throw new InvalidArgumentException('First key in $replacement argument is an update operator');
}

if (\MongoDB\is_pipeline($replacement)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense, although I didn't find this discussed in the CRUD spec. In hindsight, I think mongodb/specifications@7246066#diff-5efd2e4ce61f4ccda0ce031099303996 for SPEC-1267 should have addressed this in the CRUD spec's Update vs. Replace Validation section. It could also have added a failing test for replaceOne and replace via BulkWrite. Would you mind opening a spec ticket? Hopefully we can get to it during SPEC triage Monday afternoon (NYC time).

Although documents cannot contain keys prefixed with $, I was wondering if the following replacement document might be ambiguous (at least for our is_pipeline() function):

[
    [ '$ref' => '<any string>' ],
    [ '$id' => '<any non-array value>' ],
]

This could be interpreted as a replacement document assigning a DBRef to the document's top-level "0" field, or a pipeline with $ref and $id stages. I decided to test this in the mongo shell:

rs0:PRIMARY> db.foo.drop()
false

rs0:PRIMARY> db.foo.insert({"0":1})
WriteResult({ "nInserted" : 1 })

rs0:PRIMARY> db.foo.findOne()
{ "_id" : ObjectId("5d61d496637adc23cd5c4215"), "0" : 1 }

rs0:PRIMARY> db.foo.replaceOne({},[{"$ref":"foo","$id":"bar"}])
2019-08-24T20:26:51.491-0400 E  QUERY    [js] uncaught exception: Error: Cannot use pipeline-style updates in a replacement operation :
DBCollection.prototype.replaceOne@src/mongo/shell/crud_api.js:461:15
@(shell):1:1

That supports the validation you have here. Still, it's a client-side error and I was wondering what the server-side error might be if we actually passed this along to the update command. Does MongoDB 4.2 accept a BSON array as a valid replacement document in the update command's updates[].u option? Recent server versions added stricter validation, but I believe in some older servers (perhaps in the 3.x range) a BSON array/document were accepted interchangeably. Let's find out:

rs0:PRIMARY> db.runCommand({update:"foo",updates:[{q:{},u:[{"$ref":"foo","$id":"bar"}]}]});
{
	"n" : 0,
	"nModified" : 0,
	"writeErrors" : [
		{
			"index" : 0,
			"code" : 40323,
			"errmsg" : "A pipeline stage specification object must contain exactly one field."
		}
	],
        ...
}

My DBRef example is a total edge case, but let's suppose an application actually did want to perform that replacement for some reason. Passing an actual BSON document (i.e. object cast on the PHP array) does the trick:

rs0:PRIMARY> db.runCommand({update:"foo",updates:[{q:{},u:{"0":{"$ref":"foo","$id":"bar"}}}]}); 
{
	"n" : 1,
	"nModified" : 1,
        ...
}

I also traced the code path for constructing the update command and we can see in _mongoc_write_command_update_append() that libmongoc always appends the u option as a document unless it's a pipeline. So even if a PHP array was used as a replacement document and managed to sneak past our update/replace validation, we could trust libmongoc to actually send a BSON document to the server.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for that detailed analysis. I've created a spec ticket to ask for clarification in the spec. From the result of your tests, I deduce that our handling is correct in this case with users having to explicitly pass an object for a replacement that you suggested above.

I agree that it's an edge case, especially considering that DBRef objects normally contain both $ref and $id keys. With that in mind, we could add logic to MongoDB\is_pipeline to assert that each stage in the potential pipeline only contains a single element; this would at least remove the client-side assertion when replacing a document with a single DBRef object, as we would no longer accept that as a valid aggregation pipeline. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked this through early Monday, but the current logic in is_pipeline() seems fine. If the user actually needs to specify { "0": DBRef(...) } as a replacement document they can cast it to an object.

Checking that each element within the pipeline array is a document with only a single key seems like overkill for what could be a common code path for update-heavy code.

return false;
}

if ($pipeline === []) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing from my previous shell testing, it looks like MongoDB 4.2 accepts an empty pipeline and simply NOPs:

rs0:PRIMARY> db.runCommand({update:"foo",updates:[{q:{},u:[]}]});
{
	"n" : 1,
	"nModified" : 0,
        ...
}

Contrast this with an empty replacement document:

rs0:PRIMARY> db.runCommand({update:"foo",updates:[{q:{},u:{}}]});
{
	"n" : 1,
	"nModified" : 1,
	...
}

Is this function based on libmongoc's _mongoc_document_is_pipeline()? I see that also returns false if the argument has no elements.

I'm not arguing that this is incorrect. I actually think it's the most intuitive behavior, especially for PHP where users would most commonly pass [] as an empty replacement document; however, I'm curious if this was dictated by the spec. If not, perhaps we should mention it in the SPEC ticket I proposed opening in my earlier comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function based on libmongoc's _mongoc_document_is_pipeline()? I see that also returns false if the argument has no elements.

That is correct. I assumed that it'd be wrong to treat an empty array as an aggregation pipeline, but I also specifically remember one test failing because of that, which is why I added the check after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a note of this in SPEC-1430.

@alcaeus alcaeus force-pushed the phplib-418 branch 3 times, most recently from 2fa9984 to 4991997 Compare August 26, 2019 11:14
Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion for extra tests, but LGTM.

alcaeus added a commit that referenced this pull request Aug 27, 2019
@alcaeus alcaeus merged commit fc87768 into mongodb:master Aug 27, 2019
@alcaeus alcaeus deleted the phplib-418 branch August 27, 2019 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants