-
Notifications
You must be signed in to change notification settings - Fork 266
PHPLIB-53 Add documentation for mapReduce #419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
type: boolean | ||
description: | | ||
Specifies whether to convert intermediate data into BSON format between the execution | ||
of the map and reduce functions. The default is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with other doc pages, let's wrap boolean literals true
and false
with two back-ticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: if it's not too much trouble to setup in your editor, most of the docs pages wrap at 80 characters. I noticed that some of these lines ran a bit long, and paragraphs in MongoDBCollection-mapReduce.txt
wrapped well before 80.
name: scope | ||
type: document | ||
description: | | ||
Specifies global variables that are accessible in the map, reduce and finalize functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oxford comma after "reduce, "
type: boolean | ||
description: | | ||
Specifies whether to include the timing information in the result information. | ||
The default is true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true
|
||
The :manual:`mapReduce </core/map-reduce>` command | ||
allows you to run map-reduce aggregation operations | ||
over a collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting that this is based on the intro paragraph on db.collection.mapReduce()
, I think we can link to both the command reference and "core" manual section here.
If we're referring to the mapReduce
command, let's use:
:manual:`mapReduce </reference/command/mapReduce>`
Later in the sentence "map-reduce" can then link to the page in "/core".
|
||
.. code-block:: php | ||
|
||
function mapReduce($map, $reduce, $out, array $options = []) MongoDB\MapReduceResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a ": " before the return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that $map
and $reduce
may get type hints after other PR changes.
result documents must be within the :limit:`BSON Document Size` limit, | ||
which is currently 16 megabytes. For additional information on limits | ||
and restrictions on map-reduce operations, see the | ||
:manual:`mapReduce </reference/command/mapReduce>` reference page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this last sentence comes straight from https://docs.mongodb.com/manual/core/map-reduce/#map-reduce-behavior, but the mapReduce
command page doesn't really have any additional info. The Output Inline section merely repeats the BSON document limit, so I propose we just drop the last sentence here.
or return the results inline. If you write map-reduce output to a | ||
collection, you can perform subsequent map-reduce operations on the | ||
same input collection that merge replace, merge, or reduce new results | ||
with previous results. See :manual:`mapReduce </core/map-reduce>` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As done for the intro section, the link to "core" can be "Map-Reduce" since we're referring to the title of the linked page and not the command reference.
|
||
Example | ||
------- | ||
This example will get all users with at least one "sale" event and count the number of sales for each of these users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RST doesn't seem to mind, but a link break between the heading and following paragraph would be consistent with other doc sources.
$out = array("merge" => "eventCounts"); | ||
|
||
$sales = new MapReduce($this->getDatabaseName(), $this->getCollectionName(), $map, $reduce, $out); | ||
$sales->execute($this->getPrimaryServer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that #392 never added a Collection::mapReduce()
method for the operation. Would you mind taking a crack at adding that?
I'd like to steer clear of talking about the operation classes in our documentation, so this example would ideally just use the collection method. $this->getPrimaryServer()
refers to a method in the test suite, so that will naturally get removed.
Existing Collection methods should provide examples of how to merge in special options such as readConcern
and writeConcern
.
Lastly, I would propose we change the operation class to check map, reduce, and finalize as an instance of MongoDB\BSON\JavascriptInterface
instead of MongoDB\BSON\Javascript
. That type hint can also be enforced on the Collection method for $map
and $reduce
. Since $out
allows multiple types, that won't need a type hint (we can reply on the operation's constructor to validate).
When the time comes to rewrite the example using the Collection method, the examples for find()
and the various findOneAnd___()
methods might be helpful. Those are all based on existing data from one of the example data collections, so if you can cook up a mapReduce
command for one of those data sets, I might suggest running it in inline
mode and then printing output as is done for find()
(iterating on the MapReduceResult and dumping each document).
$sales = new MapReduce($this->getDatabaseName(), $this->getCollectionName(), $map, $reduce, $out); | ||
$sales->execute($this->getPrimaryServer()); | ||
|
||
This will create a MapReduceResult. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the example ends up printing output and using The output would then resemble::
boilerplate, we can drop this line.
Otherwise, we'll want to use the :phpclass:
notation so we can link to a forthcoming documentation page for the result class.
------------- | ||
|
||
A :phpclass:`MongoDB\\MapReduceResult` object, which allows for iteration of mapReduce results | ||
irrespective of the output method (e.g. inline, collection) via the IteratorAggregate interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use the following to link to IteratorAggregate:
:php:`IteratorAggregate <iteratoraggregate>`
Return Values | ||
------------- | ||
|
||
A :phpclass:`MongoDB\\MapReduceResult` object, which allows for iteration of mapReduce results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also realize that we don't have an existing page for MapReduceResult. We can take care of that in a later PR (a usage example on this page is going to be more helpful for users anyway). Once a stub for the class exists, this syntax should start linking to it automatically.
The library reference has sections for both write result and enumeration classes, and MapReduceResult overlaps with both. Down the line, I imagine we can consolidate both of those pages into a single, large, "result classes" page.
$collection = (new MongoDB\Client)->test->zips; | ||
|
||
$map = new Javascript('function() { emit(this.state, this.pop); }'); | ||
$reduce = new Javascript('function(key, values) { return Array.sum(values) }'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MongoDB\BSON\Javascript
for the class names. Alternatively, you can add a use
statement at the top of the example.
|
||
$map = new Javascript('function() { emit(this.state, this.pop); }'); | ||
$reduce = new Javascript('function(key, values) { return Array.sum(values) }'); | ||
$out = '{ inline: 1 }'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant ['inline' => 1]
here. If a string is provided for $out
, it's assumed to be the collection name (see: out options).
$reduce = new Javascript('function(key, values) { return Array.sum(values) }'); | ||
$out = '{ inline: 1 }'; | ||
|
||
$populations = new MapReduce($map, $reduce, $out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$collection->mapReduce(...)
?
src/Collection.php
Outdated
*/ | ||
public function mapReduce(JavascriptInterface $map, JavascriptInterface $reduce, $out, array $options = []) | ||
{ | ||
$operation = new MapReduce($this->databaseName, $this->collectionName, $map, $reduce, $out, $options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a use statement for MongoDB\Operation\MapReduce
.
string(2) "AZ" | ||
["value"]=> | ||
float(3665228) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent example!
* @param array $options Command options | ||
* @return MapReduceResult | ||
* @throws UnsupportedException if options are not supported by the selected server | ||
* @throws InvalidArgumentException for parameter/option parsing errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@throws UnexpectedValueException if the command response was malformed
can be added here. Similar to what we have in aggregate()
.
src/Collection.php
Outdated
public function mapReduce(JavascriptInterface $map, JavascriptInterface $reduce, $out, array $options = []) | ||
{ | ||
$operation = new MapReduce($this->databaseName, $this->collectionName, $map, $reduce, $out, $options); | ||
$server = $this->manager->selectServer(new ReadPreference(ReadPreference::RP_PRIMARY)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The read preference is configurable, but we will want to coerce a primary read preference if the out option is not inline. That will be similar to how aggregate()
checks if the $out
pipeline operator is being used and, if so, forces a primary read preference. After we establish the read preference, we then use that to select a server upon which the operation class will be executed.
Since the out option can be a document (i.e. PHP array or object), I'd suggest looking at is_last_pipeline_operator_out()
in functions.php
for a resilient way to check. Something like the following should work:
// In this case, $out is probably a string (i.e. collection name)
if ( ! is_array($out) && ! is_object($out)) {
return false;
}
// Cast a possible object to an array so we can conveniently check the first field name
$out = (array) $out;
return key($out) === 'inline';
Additionally, the type map, read concern, and write concern can all be inherited from the collection (if unspecified in the options). See aggregate()
for an example of how we merge those in. Notably, read concern depends on the server's wire version, so we only inject a default value if we know it's supported.
src/Collection.php
Outdated
@@ -810,6 +811,28 @@ public function listIndexes(array $options = []) | |||
} | |||
|
|||
/** | |||
* Runs map-reduce aggregation operations as defined by the $map and $reduce parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest "Executes a map-reduce aggregation on the collection." based on aggregate()
.
@@ -78,7 +79,7 @@ name: verbose | |||
type: boolean | |||
description: | | |||
Specifies whether to include the timing information in the result information. | |||
The default is true. | |||
The default is `true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here re: two back-ticks.
Specifies whether to convert intermediate data into BSON format between the execution | ||
of the map and reduce functions. The default is false. | ||
Specifies whether to convert intermediate data into BSON format between the | ||
execution of the map and reduce functions. The default is `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think RST requires two back-ticks, unlike Markdown, which allows one.
src/Collection.php
Outdated
|
||
// Check if the out option is inline because we will want to coerce a primary read preference if not | ||
if ( ! is_array($out) && ! is_object($out)) { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole logic for determining if $out
is inline would be better relegated to a function. A private method at the bottom of the Collection class can work (it doesn't need to be in functions.php
if we won't need to use it outside of this class).
As-is, this logic might return false and abort the mapReduce()
method if $out
is a collection name string (a valid use case).
src/Collection.php
Outdated
|
||
$server = $this->manager->selectServer($options['readPreference']); | ||
|
||
// Set options if not set already |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest copying the same comment from aggregate()
, as the explanation for readConcern is the same (see: SERVER-20214).
/* A "majority" read concern is not compatible with the $out stage, so
* avoid providing the Collection's read concern if it would conflict.
*/
Also, a majority read concern is only a problem if $out
is not inline (i.e. mapReduce will write results to an output collection). To that end, we probably want to track $isOutInline
(or its inverse, $hasOutputCollection
) and implement similar logic as aggregate()
does with the $hasOutStage
variable. For aggregate()
, $hasOutStage
means the command will write to a collection.
src/Collection.php
Outdated
$options['readConcern'] = $this->readConcern; | ||
} | ||
|
||
if ( ! isset($options['typeMap']) && ( ! isset($options['useCursor']) || $options['useCursor'])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MapReduce doesn't have a useCursor
option, so this extra logic isn't relevant.
Aggregate happens to be limited because we never implemented support for type maps when it returns results inline without using a cursor. MapReduce::createGetIteratorCallable()
actually gets around that limitation, which I didn't think of at the time when implementing Aggregate. That may be something to fix down the line for Aggregate, depending on how long we'll keep supporting non-cursor replies (that support may end when we remove support for MongoDB 2.6).
|
||
$options = ["readPreference" => ReadPreference::RP_SECONDARY_PREFERRED]; | ||
|
||
$operation = new MapReduce($this->getDatabaseName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be testing the Operation class directly, as the test doesn't interact with the Collection object at all.
I think we can just remove the testMapReduce()
test, as the read preference option isn't testing much. In most environments, this is just going to select the primary server.
$this->createFixtures(3); | ||
|
||
$map = new Javascript('function() { emit(this._id, this.x); }'); | ||
$reduce = new Javascript('function(key, values) { return Array.sum(values); }'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest emitting on 1
instead of this._id
, so that we collect all documents into a single bucket. Then, the reduce function can compute the sum of all x
values. We can later assert that the result of map reduce is a single document with [ '_id' => 1, 'x' => 66 ]
. Something like:
$expected = [
[ '_id' => 1, 'x' => 66 ],
];
$this->assertSameDocuments($expected, $mapReduceResult);
src/Collection.php
Outdated
$options['readPreference'] = new ReadPreference(ReadPreference::RP_PRIMARY); | ||
} | ||
|
||
$server = $this->manager->selectServer($options['readPreference']); | ||
|
||
// Set options if not set already | ||
if ( ! isset($options['readConcern']) && ! ($this->readConcern->getLevel() === ReadConcern::MAJORITY) && \MongoDB\server_supports_feature($server, self::$wireVersionForReadConcern)) { | ||
/* A "majority" read concern is not compatible with $out inline, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"$out" was a reference to the aggregation pipeline operator. "with inline output, so" may read better.
src/Collection.php
Outdated
/* A "majority" read concern is not compatible with $out inline, so | ||
* avoid providing the Collection's read concern if it would conflict. | ||
*/ | ||
if ( ! isset($options['readConcern']) && ! ($isOutInline && $this->readConcern->getLevel() === ReadConcern::MAJORITY) && \MongoDB\server_supports_feature($server, self::$wireVersionForReadConcern)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a logic error in the middle condition:
! ($isOutInline && $this->readConcern->getLevel() === ReadConcern::MAJORITY)
If we apply De Morgan's Law to distribute the negation, we get:
( ! $isOutInline || $this->readConcern->getLevel() !== ReadConcern::MAJORITY )
We can write this in prose as:
mapReduce is writing to a collection OR read concern is not "majority"
And back in the original condition:
- the user hasn't specified a read concern AND
- (mapReduce is writing to a collection OR read concern is not "majority") AND
- the server supports read concerns
Given this, it's possible that the Collection could have a majority read concern and we'd still inherit it as a default when mapReduce is writing to a collection. Bad news bears.
While we can fix this by negating $isOutInline
within the middle conditional, it's already hard to read as it is. I'd suggest changing the variable to track the inverse (e.g. $hasOutputCollection
as mentioned in #419 (comment)). Then, this condition basically copies the aggregate()
snippet with one variable rename:
/* A "majority" read concern is not compatible with writing output to a
* collection, so avoid providing the Collection's read concern if it
* would conflict.
*/
if ( ! isset($options['readConcern']) &&
! ($hasOutputCollection && $this->readConcern->getLevel() === ReadConcern::MAJORITY) &&
\MongoDB\server_supports_feature($server, self::$wireVersionForReadConcern)) {
$options['readConcern'] = $this->readConcern;
}
src/Collection.php
Outdated
|
||
if (key($out) !== 'inline') { | ||
if ($this->isOutInline($out)) { | ||
$isOutInline = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two issues here:
isOutInline
is used later in a conditional, but its assignment here is nested within a condition. As such, it's possible we'd get an undefined variable error later on.- This condition forces a primary read preference is output is inline (i.e. not writing to a collection), which is backwards.
If we make the swap to $hasOutputCollection
, I'd suggest assigning $hasOutputCollection = ! $this->isOutInline($out);
in the method's top scope and then replacing this condition with if ($hasOutputCollection)
to force a primary read preference.
LGTM. I noticed there were a few documentation changes mixed in with the last four commits related to |
https://jira.mongodb.org/browse/PHPLIB-53