Commit c55598a

jason-price-mongodb and jason-price-mongodb authored
DOCS-14901 aggregation refactor (#339) (#426)
* DOCS-14901 aggregation refactor
* DOCS-14901-aggregation-refactor

Co-authored-by: jason-price-mongodb <[email protected]>
1 parent 65726f2 commit c55598a

8 files changed: +319 -223 lines changed

source/aggregation.txt

Lines changed: 42 additions & 48 deletions
Original file line number | Diff line number | Diff line change
@@ -21,12 +21,16 @@ results. You can use aggregation operations to:
2121

2222
To perform aggregation operations, you can use:
2323

24-
- :ref:`Aggregation pipelines <aggregation-framework>`
24+
- :ref:`Aggregation pipelines <aggregation-framework>`, which are the
25+
preferred method for performing aggregations.
2526

2627
- :ref:`Single purpose aggregation methods
27-
<single-purpose-agg-operations>`
28+
<single-purpose-agg-methods>`, which are simple but lack the
29+
capabilities of an aggregation pipeline.
2830

29-
- :ref:`Map-reduce functions <aggregation-map-reduce>`
31+
- :ref:`Map-reduce operations <aggregation-map-reduce>`, which are
32+
deprecated starting in MongoDB 5.0. Instead, use an aggregation
33+
pipeline.
3034

3135
.. _aggregation-framework:
3236

@@ -40,50 +44,39 @@ Aggregation Pipeline Example
4044

4145
.. include:: /includes/aggregation-pipeline-example.rst
4246

43-
For a runnable example, see :ref:`Complete Aggregation Pipeline
44-
Example <aggregation-pipeline-example>`.
47+
For runnable examples containing sample input documents, see
48+
:ref:`Complete Aggregation Pipeline Examples
49+
<aggregation-pipeline-examples>`.
4550

46-
Aggregation Pipeline Stages and Operations
47-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
51+
.. _single-purpose-agg-methods:
4852

49-
The most basic pipeline stages provide *filters* that operate like
50-
queries and *document transformations* that modify the form
51-
of the output document.
53+
Single Purpose Aggregation Methods
54+
----------------------------------
5255

53-
Other pipeline operations provide tools for grouping and sorting
54-
documents by specific field or fields as well as tools for aggregating
55-
the contents of arrays, including arrays of documents. In addition,
56-
pipeline stages can use :ref:`operators
57-
<aggregation-expression-operators>` for tasks such as calculating the
58-
average or concatenating a string.
56+
You can use the following single purpose aggregation methods to
57+
aggregate documents from a single collection:
5958

60-
The pipeline provides efficient data aggregation using native
61-
operations within MongoDB, and is the preferred method for data
62-
aggregation in MongoDB.
59+
.. list-table::
60+
:header-rows: 1
61+
:widths: 50 50
62+
63+
* - Method
64+
- Description
6365

64-
The aggregation pipeline can operate on a
65-
:doc:`sharded collection </sharding>`.
66+
* - :method:`db.collection.estimatedDocumentCount()`
67+
- Returns an approximate count of the documents in a collection or
68+
a view.
6669

67-
The aggregation pipeline can use indexes to improve its performance
68-
during some of its stages. In addition, the aggregation pipeline has an
69-
internal optimization phase. See
70-
:ref:`aggregation-pipeline-operators-and-performance` and
71-
:doc:`/core/aggregation-pipeline-optimization` for details.
70+
* - :method:`db.collection.count()`
71+
- Returns a count of the number of documents in a collection or a
72+
view.
7273

73-
.. _single-purpose-agg-operations:
74+
* - :method:`db.collection.distinct()`
75+
- Returns an array of the distinct values found for the
76+
specified field.
7477

75-
Single Purpose Aggregation Operations
76-
-------------------------------------
77-
78-
MongoDB also provides :method:`db.collection.estimatedDocumentCount()`,
79-
:method:`db.collection.count()` and :method:`db.collection.distinct()`.
80-
81-
All of these operations aggregate documents from a single collection.
82-
While these operations provide simple access to common aggregation
83-
processes, they lack the flexibility and capabilities of an aggregation
84-
pipeline.
85-
86-
.. include:: /images/distinct.rst
78+
The single purpose aggregation methods are simple but lack the
79+
capabilities of an :ref:`aggregation pipeline <aggregation-framework>`.
8780
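
As a rough illustration of the trade-off described above, the following sketch shows aggregation pipelines that approximate what ``db.collection.distinct()`` and a filtered count return. The ``orders`` collection and the ``status`` field are hypothetical names, not part of this commit, and the ``$group`` form only matches ``distinct()`` for non-array fields. The server calls run only when a ``mongosh`` ``db`` object is available.

```javascript
// Hypothetical collection name; adjust to your own data.
const singlePurposeCollection = "orders";

// Approximates db.orders.distinct("status") for non-array fields:
// each distinct value becomes the _id of one output document.
const distinctStatusPipeline = [
  { $group: { _id: "$status" } }
];

// Counts the documents that match a filter.
const countUrgentPipeline = [
  { $match: { status: "urgent" } },
  { $count: "total" }
];

// Run against a server only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  const coll = db.getCollection(singlePurposeCollection);
  printjson(coll.distinct("status"));
  printjson(coll.aggregate(distinctStatusPipeline).toArray());
  printjson(coll.aggregate(countUrgentPipeline).toArray());
}
```

The pipeline versions are longer, but further stages such as ``$sort`` or ``$project`` can be appended to them, which the single purpose methods do not allow.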

8881
.. _aggregation-map-reduce:
8982

@@ -92,21 +85,22 @@ Map-Reduce
9285

9386
.. include:: /includes/fact-use-aggregation-not-map-reduce.rst
9487

95-
Additional Features and Behaviors
96-
---------------------------------
97-
98-
For a feature comparison of the aggregation pipeline,
99-
map-reduce, and the special group functionality, see
88+
For a feature comparison of aggregation pipelines and map-reduce, see
10089
:doc:`/reference/aggregation-commands-comparison`.
10190

10291
Learn More
10392
----------
10493

105-
Practical MongoDB Aggregations E-Book
106-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
94+
To learn more about aggregations, see:
95+
96+
- :ref:`aggregation-pipeline`
97+
98+
- :ref:`aggregation-expression-operators`
99+
100+
- :ref:`aggregation-pipeline-operator-reference`
107101

108-
For more information on aggregations, read the `Practical MongoDB
109-
Aggregations <https://www.practical-mongodb-aggregations.com>`__ e-book.
102+
- `Practical MongoDB Aggregations
103+
<https://www.practical-mongodb-aggregations.com>`_
110104

111105

112106
.. toctree::

source/core/aggregation-pipeline-optimization.txt

Lines changed: 78 additions & 6 deletions
@@ -19,6 +19,12 @@ include the :method:`explain <db.collection.aggregate()>` option in the
1919

2020
.. include:: /includes/fact-optimizations-subject-to-change.rst
2121

22+
In addition to learning about the aggregation pipeline optimizations
23+
performed during the optimization phase, you will also see how to
24+
improve aggregation pipeline performance using indexes and document
25+
filters. See
26+
:ref:`aggregation-pipeline-optimization-indexes-and-filters`.
27+
2228
.. _aggregation-pipeline-projection-optimization:
2329

2430
Projection Optimization
@@ -112,11 +118,12 @@ use any values computed in either the :pipeline:`$project` or
112118
:pipeline:`$match` stage before both of the projection stages.
113119

114120
.. note::
115-
After optimization, the filter ``{ name: "Joe Schmoe" }`` is in
116-
a :pipeline:`$match` stage at the beginning of the pipeline. This has
121+
122+
After optimization, the filter ``{ name: "Joe Schmoe" }`` is in a
123+
:pipeline:`$match` stage at the beginning of the pipeline. This has
117124
the added benefit of allowing the aggregation to use an index on the
118-
``name`` field when initially querying the collection.
119-
See :ref:`aggregation-pipeline-operators-and-performance` for more
125+
``name`` field when initially querying the collection. See
126+
:ref:`aggregation-pipeline-optimization-indexes-and-filters` for more
120127
information.
121128

122129
.. _agg-sort-match-optimization:
@@ -151,8 +158,9 @@ can sometimes add a portion of the :pipeline:`$match` stage before the
151158
:pipeline:`$redact` stage. If the added :pipeline:`$match` stage is at
152159
the start of a pipeline, the aggregation can use an index as well as
153160
query the collection to limit the number of documents that enter the
154-
pipeline. See :ref:`aggregation-pipeline-operators-and-performance` for
155-
more information.
161+
pipeline. See
162+
:ref:`aggregation-pipeline-optimization-indexes-and-filters` for more
163+
information.
156164

157165
For example, if the pipeline consists of the following stages:
158166

@@ -383,6 +391,70 @@ option, the ``explain`` output shows the coalesced stage:
383391
}
384392
}
385393

394+
.. _aggregation-pipeline-optimization-indexes-and-filters:
395+
396+
Improve Performance with Indexes and Document Filters
397+
-----------------------------------------------------
398+
399+
The following sections show how you can improve aggregation performance
400+
using indexes and document filters.
401+
402+
Indexes
403+
~~~~~~~
404+
405+
The :ref:`query planner <query-plans-query-optimization>` analyzes
406+
an aggregation pipeline to determine if :ref:`indexes <indexes>`
407+
can be used to improve pipeline performance.
408+
409+
The following list shows some pipeline stages that can use indexes:
410+
411+
``$match`` stage
412+
:pipeline:`$match` can use an index to filter documents if
413+
:pipeline:`$match` is the first stage in a pipeline.
414+
415+
``$sort`` stage
416+
:pipeline:`$sort` can use an index if :pipeline:`$sort` is not
417+
preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
418+
:pipeline:`$group` stage.
419+
420+
``$group`` stage
421+
:pipeline:`$group` can potentially use an index to find the first
422+
document in each group if:
423+
424+
- :pipeline:`$group` is preceded by :pipeline:`$sort` that sorts the
425+
field to group by, and
426+
427+
- there is an index on the grouped field that matches the sort order,
428+
and
429+
430+
- :group:`$first` is the only accumulator in :pipeline:`$group`.
431+
432+
See :ref:`group-pipeline-optimization` for an example.
433+
434+
``$geoNear`` stage
435+
:pipeline:`$geoNear` can use a geospatial index. :pipeline:`$geoNear`
436+
must be the first stage in an aggregation pipeline.
437+
438+
Indexes can :ref:`cover <read-operations-covered-query>` queries in an
439+
aggregation pipeline. A covered query returns its results using only the
440+
index, without examining any documents, and has high performance.
441+
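
The ``$group`` conditions listed above can be sketched as follows. This is a minimal illustration, assuming a hypothetical ``orders`` collection with ``customerId`` and ``orderDate`` fields; none of these names come from this commit. Whether the planner actually uses the index depends on the server version and the data, so the sketch only prints the ``explain`` output for inspection.

```javascript
// Hypothetical compound index: the grouped field first, then the field read by $first.
const groupIndexSpec = { customerId: 1, orderDate: 1 };

const firstOrderPerCustomer = [
  // $sort on the grouped field, in the same order as the index.
  { $sort: { customerId: 1, orderDate: 1 } },
  // $first is the only accumulator in the $group stage.
  { $group: { _id: "$customerId", firstOrderDate: { $first: "$orderDate" } } }
];

// Run against a server only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  const coll = db.getCollection("orders");
  coll.createIndex(groupIndexSpec);
  // Print the query plan so you can check which index, if any, was chosen.
  printjson(coll.explain("queryPlanner").aggregate(firstOrderPerCustomer));
}
```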
442+
Document Filters
443+
~~~~~~~~~~~~~~~~
444+
445+
If your aggregation operation requires only a subset of the documents in
446+
a collection, filter the documents first:
447+
448+
- Use the :pipeline:`$match`, :pipeline:`$limit`, and :pipeline:`$skip`
449+
stages to restrict the documents that enter the pipeline.
450+
451+
- When possible, put :pipeline:`$match` at the beginning of the pipeline
452+
so that it can use indexes to scan only the matching documents in a collection.
453+
454+
- :pipeline:`$match` followed by :pipeline:`$sort` at the start of the
455+
pipeline is equivalent to a single query with a sort, and can use an
456+
index.
457+
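
The filtering guidance above can be sketched as a pipeline that starts with ``$match`` followed by ``$sort``, then limits the output. The ``orders`` collection, its fields, and the index are hypothetical names chosen for illustration, not part of this commit.

```javascript
// Hypothetical index that supports both the filter and the sort.
const filterIndexSpec = { status: 1, orderDate: -1 };

const recentUrgentOrders = [
  // Filter first so fewer documents enter the rest of the pipeline.
  { $match: { status: "urgent" } },
  // $match followed by $sort at the start behaves like a query with a sort.
  { $sort: { orderDate: -1 } },
  // Keep only the ten most recent matches.
  { $limit: 10 }
];

// Run against a server only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  const coll = db.getCollection("orders");
  coll.createIndex(filterIndexSpec);
  printjson(coll.aggregate(recentUrgentOrders).toArray());
}
```

Placing ``$limit`` after ``$sort`` keeps the result deterministic. Moving it before the sort would return ten arbitrary matches instead of the ten most recent.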
386458
Example
387459
-------
388460
.. _agg-sort-skip-limit-sequence:
