@@ -12,11 +12,11 @@ Overview
The MongoDB aggregation framework provides a means to calculate
aggregate values without having to use :doc:`map/reduce
</core/map-reduce>`. While map/reduce is powerful, using map/reduce is
- more difficult than necessary for simple aggregation tasks, such as
+ more difficult than necessary for many simple aggregation tasks, such as
totaling or averaging field values.

If you're familiar with :term:`SQL`, the aggregation framework
- provides similar functionality as "``GROUPBY``" and related SQL
+ provides similar functionality to "``GROUP BY``" and related SQL
operators as well as simple forms of "self joins." Additionally, the
aggregation framework provides projection capabilities to reshape the
returned data. Using projections and aggregation, you can add computed
@@ -38,23 +38,22 @@ underpin the aggregation framework: :term:`pipelines <pipeline>` and
Pipelines
~~~~~~~~~

- A pipeline is process that applies a sequence of documents when using
- the aggregation framework. For those familiar with UNIX-like shells
- (e.g. bash,) the concept is analogous to the pipe (i.e. "``|``") used
- to string operations together.
+ Conceptually, documents from a collection are passed through an
+ aggregation pipeline, and are transformed as they pass through it.
+ For those familiar with UNIX-like shells (e.g. bash), the concept is
+ analogous to the pipe (i.e. "``|``") used to string text filters together.

In a shell environment the pipe redirects a stream of characters from
the output of one process to the input of the next. The MongoDB
aggregation pipeline streams MongoDB documents from one :doc:`pipeline
operator </reference/aggregation>` to the next to process the
documents.

- All pipeline operators processes a stream of documents, and the
+ All pipeline operators process a stream of documents, and the
pipeline behaves as if the operation scans a :term:`collection` and
- passes all matching documents into the "top" of the pipeline. Then,
- each operator in the pipleine transforms each document as it passes
- through the pipeline. At the end of the pipeline, the aggregation
- framework returns documents in the same manner as all other queries.
+ passes all matching documents into the "top" of the pipeline.
+ Each operator in the pipeline transforms each document as it passes
+ through the pipeline.
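+
+ To make the flow concrete, here is a small, hypothetical pipeline in
+ the :program:`mongo` shell (the ``article`` collection and its
+ ``tags`` field are illustrative). Each document is reshaped by
+ :agg:pipeline:`$project`, split into one document per tag by
+ :agg:pipeline:`$unwind`, and then counted by :agg:pipeline:`$group`:
+
+ .. code-block:: javascript
+
+    db.article.aggregate(
+        { $project : { author : 1, tags : 1 } },
+        { $unwind : "$tags" },
+        { $group : { _id : "$tags", count : { $sum : 1 } } }
+    );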

.. note::
@@ -72,24 +71,26 @@ framework returns documents in the same manner as all other queries.
- :agg:pipeline:`$unwind`
- :agg:pipeline:`$group`
- :agg:pipeline:`$sort`
+ TODO I'd remove references to $out, since we don't have it yet
- :agg:pipeline:`$out`

.. _aggregation-expressions:

Expressions
~~~~~~~~~~~

- Expressions calculate values based on inputs from the pipeline, and
- return their results to the pipeline. The aggregation framework
- defines expressions in :term:`JSON` using a prefix format.
+ Expressions calculate values based on documents passing through the pipeline,
+ and contribute their results to documents flowing through the pipeline.
+ The aggregation framework defines expressions in :term:`JSON` using a prefix
+ format.

Often, expressions are stateless and are only evaluated when seen by
the aggregation process. Stateless expressions perform operations such
- as: adding the values of two fields together, or extracting the year
+ as adding the values of two fields together or extracting the year
from a date.
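+
+ For example (a sketch; the field names are hypothetical), stateless
+ expressions in prefix format might add two fields and extract the
+ year from a date inside a :agg:pipeline:`$project` operator:
+
+ .. code-block:: javascript
+
+    { $project : {
+        // add the values of two fields together
+        total : { $add : [ "$subtotal", "$tax" ] },
+        // extract the year from a date field
+        year : { $year : "$date" }
+    } }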

The :term:`accumulator` expressions *do* retain state, and the
- :agg:pipeline:`$group` operator uses maintains state (e.g. counts,
+ :agg:pipeline:`$group` operator maintains that state (e.g.
totals, maximums, minimums, and related data) as documents progress
through the :term:`pipeline`.
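+
+ As a sketch (the field names are hypothetical), a
+ :agg:pipeline:`$group` operator might use the ``$sum`` and ``$max``
+ accumulators to maintain a running total and maximum for each
+ distinct author as documents stream through the pipeline:
+
+ .. code-block:: javascript
+
+    { $group : {
+        _id : "$author",
+        // accumulators retain state across all documents in a group
+        totalViews : { $sum : "$pageViews" },
+        maxViews : { $max : "$pageViews" }
+    } }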
@@ -104,17 +105,17 @@ Invocation
~~~~~~~~~~

Invoke an :term:`aggregation` operation with the :func:`aggregate`
- wrapper in the :program:`mongo` shell for the :dbcommand:`aggregate`
+ wrapper in the :program:`mongo` shell or the :dbcommand:`aggregate`
:term:`database command`. Always call :func:`aggregate` on a
collection object, which will determine the documents that contribute
to the beginning of the aggregation :term:`pipeline`. The arguments to
- the :func:`aggregate` function specify a sequence :ref:`pipeline
+ the :func:`aggregate` function specify a sequence of :ref:`pipeline
operators <aggregation-pipeline-operator-reference>`, where each
:ref:`pipeline operator <aggregation-pipeline-operator-reference>` may
have a number of operands.
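+
+ As a minimal sketch (the collection name and pipeline stage are
+ illustrative), the shell wrapper and the equivalent database command
+ look like this:
+
+ .. code-block:: javascript
+
+    // the aggregate() helper on a collection object
+    db.article.aggregate(
+        { $group : { _id : "$author", docsPerAuthor : { $sum : 1 } } }
+    );
+
+    // the equivalent aggregate database command
+    db.runCommand( {
+        aggregate : "article",
+        pipeline : [ { $group : { _id : "$author", docsPerAuthor : { $sum : 1 } } } ]
+    } );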

First, consider a :term:`collection` of documents named "``article``"
- using the following schema or and format:
+ using the following format:

.. code-block:: javascript
@@ -169,7 +170,10 @@ The aggregation operation in the previous section returns a
if there was an error

As a document, the result is subject to the current :ref:`BSON
- Document size <limit-maximum-bson-document-size>`. If you expect the
+ Document size <limit-maximum-bson-document-size>`.
+
+ TODO $out is not going to be available in 2.2, so I'd eliminate this reference
+ If you expect the
aggregation framework to return a larger result, consider using the
:agg:pipeline:`$out` pipeline operator to write the output to a
collection.
@@ -181,22 +185,21 @@ Early Filtering
~~~~~~~~~~~~~~~

Because you will always call :func:`aggregate` on a
- :term:`collection` object, which inserts the *entire* collection into
- the aggregation pipeline, you may want to increase efficiency in some
- situations by avoiding scanning an entire collection.
+ :term:`collection` object, which logically inserts the *entire* collection into
+ the aggregation pipeline, you may want to optimize the operation
+ by avoiding scanning the entire collection whenever possible.

If your aggregation operation requires only a subset of the data in a
- collection, use the :agg:pipeline:`$match` to limit the items in the
- pipeline, as in a query. These :agg:pipeline:`$match` operations will use
- suitable indexes to access the matching element or elements in a
- collection.
-
- When :agg:pipeline:`$match` appears first in the :term:`pipeline`, the
- :dbcommand:`pipeline` begins with results of a :term:`query` rather than
- the entire contents of a collection.
-
+ collection, use the :agg:pipeline:`$match` operator to restrict which
+ items go into the top of the pipeline, as in a query. When placed
+ early in a pipeline, these :agg:pipeline:`$match` operations will use
+ suitable indexes to scan only the matching documents in a collection.
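+
+ For illustration (the collection, field, and index are hypothetical),
+ an initial :agg:pipeline:`$match` lets an index on ``author`` limit
+ the documents that ever enter the pipeline:
+
+ .. code-block:: javascript
+
+    // only documents where author is "dave" enter the pipeline;
+    // an index on { author : 1 } can satisfy this initial $match
+    db.article.aggregate(
+        { $match : { author : "dave" } },
+        { $group : { _id : "$author", views : { $sum : "$pageViews" } } }
+    );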
+
+ TODO we don't do the following yet, but there's a ticket for it. Should we
+ leave it out for now?

:term:`Aggregation` operations have an optimization phase, before
- execution, attempts to re-arrange the pipeline by moving
+ execution, which attempts to re-arrange the pipeline by moving
:agg:pipeline:`$match` operators towards the beginning to the greatest
extent possible. For example, if a :term:`pipeline` begins with a
:agg:pipeline:`$project` that renames fields, followed by a
@@ -221,7 +224,7 @@ must fit in memory.

:agg:pipeline:`$group` has similar characteristics: Before any
:agg:pipeline:`$group` passes its output along the pipeline, it must
- receive the entity of its input. For the case of :agg:pipeline:`$group`
+ receive the entirety of its input. For the case of :agg:pipeline:`$group`
this frequently does not require as much memory as
:agg:pipeline:`$sort`, because it only needs to retain one record for
each unique key in the grouping specification.
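+
+ For example (hypothetical field names), this :agg:pipeline:`$group`
+ holds only one accumulator document per distinct ``state`` value,
+ regardless of how many input documents stream through it:
+
+ .. code-block:: javascript
+
+    { $group : { _id : "$state", population : { $sum : "$pop" } } }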
@@ -236,14 +239,14 @@ Sharded Operation

The aggregation framework is compatible with sharded collections.

- When the operating on a sharded collection, the aggregation pipeline
- splits into two parts. The aggregation framework pushes all of the
+ When operating on a sharded collection, the aggregation pipeline
+ splits into two parts. The aggregation framework pushes all of the
operators up to and including the first :agg:pipeline:`$group` or
- :agg:pipeline:`$sort` to each shard using the results received from the
- shards. [#match-sharding]_ Then, a second pipeline on the
+ :agg:pipeline:`$sort` to each shard.
+ [#match-sharding]_ Then, a second pipeline on the
:program:`mongos` runs. This pipeline consists of the first
:agg:pipeline:`$group` or :agg:pipeline:`$sort` and any remaining pipeline
- operators
+ operators; this is run on the results received from the shards.

The :program:`mongos` pipeline merges :agg:pipeline:`$sort` operations
from the shards. The :agg:pipeline:`$group` brings any “sub-totals”