Commit 124c882

Merge pull request #4 from cwestin/master
DOCS-134 review
2 parents cd3839d + e8a0d02

2 files changed: +165 / -146 lines

2 files changed

+165
-146
lines changed

source/applications/aggregation.rst

Lines changed: 42 additions & 39 deletions
@@ -12,11 +12,11 @@ Overview
 The MongoDB aggregation framework provides a means to calculate
 aggregate values without having to use :doc:`map/reduce
 </core/map-reduce>`. While map/reduce is powerful, using map/reduce is
-more difficult than necessary for simple aggregation tasks, such as
+more difficult than necessary for many simple aggregation tasks, such as
 totaling or averaging field values.

 If you're familiar with :term:`SQL`, the aggregation framework
-provides similar functionality as "``GROUPBY``" and related SQL
+provides similar functionality to "``GROUP BY``" and related SQL
 operators as well as simple forms of "self joins." Additionally, the
 aggregation framework provides projection capabilities to reshape the
 returned data. Using projections and aggregation, you can add computed
@@ -38,23 +38,22 @@ underpin the aggregation framework: :term:`pipelines <pipeline>` and
 Pipelines
 ~~~~~~~~~

-A pipeline is process that applies a sequence of documents when using
-the aggregation framework. For those familiar with UNIX-like shells
-(e.g. bash,) the concept is analogous to the pipe (i.e. "``|``") used
-to string operations together.
+Conceptually, documents from a collection are passed through an
+aggregation pipeline, and are transformed as they pass through it.
+For those familiar with UNIX-like shells (e.g. bash), the concept is
+analogous to the pipe (i.e. "``|``") used to string text filters together.

 In a shell environment the pipe redirects a stream of characters from
 the output of one process to the input of the next. The MongoDB
 aggregation pipeline streams MongoDB documents from one :doc:`pipeline
 operator </reference/aggregation>` to the next to process the
 documents.

-All pipeline operators processes a stream of documents, and the
+All pipeline operators process a stream of documents, and the
 pipeline behaves as if the operation scans a :term:`collection` and
-passes all matching documents into the "top" of the pipeline. Then,
-each operator in the pipleine transforms each document as it passes
-through the pipeline. At the end of the pipeline, the aggregation
-framework returns documents in the same manner as all other queries.
+passes all matching documents into the "top" of the pipeline.
+Each operator in the pipeline transforms each document as it passes
+through the pipeline.

 .. note::

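A minimal sketch of such a pipeline in the :program:`mongo` shell, assuming ``article`` documents with an illustrative ``author`` field; each document flows through the two operators in order, much as text flows through ``grep | sort`` in a shell:

.. code-block:: javascript

   db.article.aggregate(
       { $project : { author : 1 } },   // first operator reshapes each document
       { $sort : { author : 1 } }       // second operator orders the stream
   );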
@@ -72,24 +71,26 @@ framework returns documents in the same manner as all other queries.
 - :agg:pipeline:`$unwind`
 - :agg:pipeline:`$group`
 - :agg:pipeline:`$sort`
+TODO I'd remove references to $out, since we don't have it yet
 - :agg:pipeline:`$out`

 .. _aggregation-expressions:

 Expressions
 ~~~~~~~~~~~

-Expressions calculate values based on inputs from the pipeline, and
-return their results to the pipeline. The aggregation framework
-defines expressions in :term:`JSON` using a prefix format.
+Expressions calculate values based on documents passing through the pipeline,
+and contribute their results to documents flowing through the pipeline.
+The aggregation framework defines expressions in :term:`JSON` using a prefix
+format.

 Often, expressions are stateless and are only evaluated when seen by
 the aggregation process. Stateless expressions perform operations such
-as: adding the values of two fields together, or extracting the year
+as adding the values of two fields together or extracting the year
 from a date.

 The :term:`accumulator` expressions *do* retain state, and the
-:agg:pipeline:`$group` operator uses maintains state (e.g. counts,
+:agg:pipeline:`$group` operator maintains that state (e.g.
 totals, maximums, minimums, and related data.) as documents progress
 through the :term:`pipeline`.

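As a sketch of the two kinds of expression, assuming illustrative ``pageViews``, ``other.votes``, and ``posted`` fields: the :agg:pipeline:`$project` stage below uses stateless expressions, while the :agg:pipeline:`$group` stage uses an accumulator that keeps a running total as documents pass through.

.. code-block:: javascript

   db.article.aggregate(
       // Stateless expressions in prefix format: evaluated per document.
       { $project : { score : { $add : [ "$pageViews", "$other.votes" ] },
                      year  : { $year : "$posted" } } },
       // Accumulator expression: retains a running total per group key.
       { $group : { _id : "$year", totalScore : { $sum : "$score" } } }
   );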
@@ -104,17 +105,17 @@ Invocation
 ~~~~~~~~~~

 Invoke an :term:`aggregation` operation with the :func:`aggregate`
-wrapper in the :program:`mongo` shell for the :dbcommand:`aggregate`
+wrapper in the :program:`mongo` shell or the :dbcommand:`aggregate`
 :term:`database command`. Always call :func:`aggregate` on a
 collection object, which will determine the documents that contribute
 to the beginning of the aggregation :term:`pipeline`. The arguments to
-the :func:`aggregate` function specify a sequence :ref:`pipeline
+the :func:`aggregate` function specify a sequence of :ref:`pipeline
 operators <aggregation-pipeline-operator-reference>`, where each
 :ref:`pipeline operator <aggregation-pipeline-operator-reference>` may
 have a number of operands.

 First, consider a :term:`collection` of documents named "``article``"
-using the following schema or and format:
+using the following format:

 .. code-block:: javascript

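A minimal sketch of such an invocation, assuming ``article`` documents that carry illustrative ``author`` and ``tags`` fields; each argument to :func:`aggregate` is one pipeline operator with its operands:

.. code-block:: javascript

   db.article.aggregate(
       { $project : { author : 1, tags : 1 } },              // pass only the needed fields
       { $unwind : "$tags" },                                // one document per tag
       { $group : { _id : "$tags", count : { $sum : 1 } } }  // count documents per tag
   );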
@@ -169,7 +170,10 @@ The aggregation operation in the previous section returns a
 if there was an error

 As a document, the result is subject to the current :ref:`BSON
-Document size <limit-maximum-bson-document-size>`. If you expect the
+Document size <limit-maximum-bson-document-size>`.
+
+TODO $out is not going to be available in 2.2, so I'd eliminate this reference
+If you expect the
 aggregation framework to return a larger result, consider using the
 use the :agg:pipeline:`$out` pipeline operator to write the output to a
 collection.
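The returned document looks roughly like the following sketch; the values are illustrative, and ``ok`` is ``0`` with an ``errmsg`` field if there was an error:

.. code-block:: javascript

   {
       "result" : [
           { "_id" : "fun",  "count" : 3 },
           { "_id" : "good", "count" : 5 }
       ],
       "ok" : 1
   }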
@@ -181,22 +185,21 @@ Early Filtering
 ~~~~~~~~~~~~~~~

 Because you will always call :func:`aggregate` on a
-:term:`collection` object, which inserts the *entire* collection into
-the aggregation pipeline, you may want to increase efficiency in some
-situations by avoiding scanning an entire collection.
+:term:`collection` object, which logically inserts the *entire* collection into
+the aggregation pipeline, you may want to optimize the operation
+by avoiding scanning the entire collection whenever possible.

 If your aggregation operation requires only a subset of the data in a
-collection, use the :agg:pipeline:`$match` to limit the items in the
-pipeline, as in a query. These :agg:pipeline:`$match` operations will use
-suitable indexes to access the matching element or elements in a
-collection.
-
-When :agg:pipeline:`$match` appears first in the :term:`pipeline`, the
-:dbcommand:`pipeline` begins with results of a :term:`query` rather than
-the entire contents of a collection.
-
+collection, use the :agg:pipeline:`$match` operator to restrict which items
+go into the top of the
+pipeline, as in a query. When placed early in a pipeline, these
+:agg:pipeline:`$match` operations will use
+suitable indexes to scan only the matching documents in a collection.
+
+TODO we don't do the following yet, but there's a ticket for it. Should we
+leave it out for now?
 :term:`Aggregation` operations have an optimization phase, before
-execution, attempts to re-arrange the pipeline by moving
+execution, which attempts to re-arrange the pipeline by moving
 :agg:pipeline:`$match` operators towards the beginning to the greatest
 extent possible. For example, if a :term:`pipeline` begins with a
 :agg:pipeline:`$project` that renames fields, followed by a
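A sketch of this early filtering, assuming an index on an illustrative ``author`` field:

.. code-block:: javascript

   // With an index on "author", the leading $match scans only the
   // matching documents rather than the whole collection.
   db.article.ensureIndex( { author : 1 } );

   db.article.aggregate(
       { $match : { author : "dave" } },                // filter first
       { $group : { _id : "$author",
                    articles : { $sum : 1 } } }         // aggregate only the subset
   );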
@@ -221,7 +224,7 @@ must fit in memory.
 
 :agg:pipeline:`$group` has similar characteristics: Before any
 :agg:pipeline:`$group` passes its output along the pipeline, it must
-receive the entity of its input. For the case of :agg:pipeline:`$group`
+receive the entirety of its input. For the case of :agg:pipeline:`$group`
 this frequently does not require as much memory as
 :agg:pipeline:`$sort`, because it only needs to retain one record for
 each unique key in the grouping specification.
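For example, a grouping like the following sketch (illustrative ``author`` and ``pageViews`` fields) retains only one accumulator document per distinct ``author`` value while it consumes its input:

.. code-block:: javascript

   db.article.aggregate(
       { $group : { _id : "$author",                        // one record per unique key
                    totalViews : { $sum : "$pageViews" } } }
   );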
@@ -236,14 +239,14 @@ Sharded Operation
 
 The aggregation framework is compatible with sharded collections.
 
-When the operating on a sharded collection, the aggregation pipeline
-splits into two parts. The aggregation framework pushes all of the
+When operating on a sharded collection, the aggregation framework
+splits the pipeline into two parts. It pushes all of the
 operators up to and including the first :agg:pipeline:`$group` or
-:agg:pipeline:`$sort` to each shard using the results received from the
-shards. [#match-sharding]_ Then, a second pipeline on the
+:agg:pipeline:`$sort` to each shard.
+[#match-sharding]_ Then, a second pipeline on the
 :program:`mongos` runs. This pipeline consists of the first
 :agg:pipeline:`$group` or :agg:pipeline:`$sort` and any remaining pipeline
-operators
+operators; this is run on the results received from the shards.
 
 The :program:`mongos` pipeline merges :agg:pipeline:`$sort` operations
 from the shards. The :agg:pipeline:`$group`, brings any “sub-totals”
