Commit 7b72372

DOCS-15221 updated with initial copy and tech review input (#1974) (#2022)
1 parent 0b11590 commit 7b72372

1 file changed

+57
-33
lines changed

source/core/aggregation-pipeline-optimization.txt

Lines changed: 57 additions & 33 deletions
@@ -475,52 +475,76 @@ using indexes and document filters.
475475
Indexes
476476
~~~~~~~
477477

478-
The :ref:`query planner <query-plans-query-optimization>` analyzes
479-
an aggregation pipeline to determine if :ref:`indexes <indexes>`
480-
can be used to improve pipeline performance.
478+
An aggregation pipeline can use :ref:`indexes <indexes>` from the input
479+
collection to improve performance. Using an index limits the number of
480+
documents a stage processes. Ideally, an index can :ref:`cover
481+
<read-operations-covered-query>` the stage query. A covered query has
482+
especially high performance, since the index returns all matching
483+
documents.
481484

482-
The following list shows some pipeline stages that can use indexes:
485+
For example, a pipeline that consists of :pipeline:`$match`,
486+
:pipeline:`$sort`, and :pipeline:`$group` can benefit from indexes at
487+
every stage:
488+
489+
- An index on the :pipeline:`$match` query field can efficiently
490+
identify the relevant data
491+
492+
- An index on the sorting field can return data in sorted order for the
493+
:pipeline:`$sort` stage
494+
495+
- An index on the grouping field that matches the :pipeline:`$sort`
496+
order can return all of the field values needed to execute the
497+
:pipeline:`$group` stage (a covered query)
498+
499+
To determine whether a pipeline uses indexes, review the query plan and
500+
look for ``IXSCAN`` or ``DISTINCT_SCAN`` plans.
501+
502+
.. note::
503+
In some cases, the query planner uses a ``DISTINCT_SCAN`` index plan
504+
that returns one document per index key value. ``DISTINCT_SCAN``
505+
executes faster than ``IXSCAN`` if there are multiple documents per
506+
key value. However, index scan parameters might affect the time
507+
comparison of ``DISTINCT_SCAN`` and ``IXSCAN``.
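
As a rough illustration of the plan review described above, the following Python sketch walks an explain document and lists the stage names found under each ``winningPlan``. The helper names (``winning_plan_stages``, ``uses_index``) and the sample documents are hypothetical: the samples are hand-written fragments shaped like explain output, not output captured from a server, and real output contains many more fields.

```python
def winning_plan_stages(explain_doc):
    """Return the "stage" names found under every "winningPlan" key.

    Rejected plans are skipped on purpose: only the winning plan tells
    you what the pipeline actually ran.
    """
    names = []

    def collect(node):
        # Gather "stage" values from a winning-plan subtree, depth first.
        if isinstance(node, dict):
            stage = node.get("stage")
            if isinstance(stage, str):
                names.append(stage)
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for item in node:
                collect(item)

    def find(node):
        # Locate "winningPlan" keys anywhere in the explain document.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "winningPlan":
                    collect(value)
                else:
                    find(value)
        elif isinstance(node, list):
            for item in node:
                find(item)

    find(explain_doc)
    return names


def uses_index(explain_doc):
    """True if the winning plan contains an IXSCAN or DISTINCT_SCAN stage."""
    stages = winning_plan_stages(explain_doc)
    return any(name in ("IXSCAN", "DISTINCT_SCAN") for name in stages)


# Hand-written sample (assumption): an aggregation whose first stage
# was answered by an index scan followed by a document fetch.
sample_indexed = {
    "stages": [
        {"$cursor": {"queryPlanner": {
            "winningPlan": {
                "stage": "FETCH",
                "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
            },
            "rejectedPlans": [],
        }}},
        {"$group": {"_id": "$status"}},
    ]
}

# Hand-written sample (assumption): the winning plan is a collection
# scan, and an index scan appears only among the rejected plans.
sample_collscan = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN"},
        "rejectedPlans": [{"stage": "IXSCAN"}],
    }
}
```

With these samples, ``uses_index(sample_indexed)`` is true and ``uses_index(sample_collscan)`` is false, because the ``IXSCAN`` in the second document appears only in a rejected plan.
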
508+
509+
For early stages in your aggregation pipeline, consider indexing the
510+
query fields. Stages that can benefit from indexes are:
483511

484512
``$match`` stage
485-
:pipeline:`$match` can use an index to filter documents if
486-
:pipeline:`$match` is the first stage in a pipeline.
513+
:pipeline:`$match` can use an index to filter documents if it is the
514+
first stage in the pipeline, after any optimizations from the
515+
:ref:`query planner <query-plans-query-optimization>`.
487516

488517
``$sort`` stage
489-
:pipeline:`$sort` can use an index if :pipeline:`$sort` is not
490-
preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
491-
:pipeline:`$group` stage.
518+
:pipeline:`$sort` can benefit from an index as long as it is not
519+
preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
520+
:pipeline:`$group` stage.
492521

493522
``$group`` stage
494-
:pipeline:`$group` can potentially use an index to find the first
495-
document in each group if:
523+
:pipeline:`$group` can use an index to find the first document in
524+
each group if it meets all of the following conditions:
496525

497-
- :pipeline:`$group` is preceded by :pipeline:`$sort` that sorts the
498-
field to group by, and
526+
- a :pipeline:`$sort` stage sorts the grouping field before
527+
:pipeline:`$group`
499528

500-
- there is an index on the grouped field that matches the sort order,
501-
and
529+
- an index exists that matches the sort order on the grouped field
502530

503-
- :group:`$first` is the only accumulator in :pipeline:`$group`.
531+
- :group:`$first` is the only accumulator in the :pipeline:`$group`
532+
stage
504533

505-
See :ref:`group-pipeline-optimization` for an example.
534+
See :ref:`$group Performance Optimizations <group-pipeline-optimization>`
535+
for an example.
506536

507-
``$geoNear`` stage
508-
:pipeline:`$geoNear` can use a geospatial index. :pipeline:`$geoNear`
509-
must be the first stage in an aggregation pipeline.
537+
``$geoNear`` stage
538+
:pipeline:`$geoNear` always uses an index, since it must be the first
539+
stage in a pipeline and requires a :ref:`geospatial index <index-feature-geospatial>`.
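
The three ``$group`` conditions listed above can be sketched as a conservative check over a pipeline written as a list of Python dictionaries. The function ``group_may_use_index`` and its ``index_keys`` argument (a list of ``(field, direction)`` pairs describing one index) are hypothetical, and the check makes simplifying assumptions: ``$sort`` must immediately precede ``$group``, the grouping key must be a single field path, and the index must start with exactly the sorted fields and directions. It does not reproduce the query planner's actual decision.

```python
def group_may_use_index(pipeline, index_keys):
    """Conservatively check the three conditions for the first $group stage.

    pipeline:   list of stage dicts, for example [{"$sort": {...}}, {"$group": {...}}]
    index_keys: list of (field, direction) pairs for one index (hypothetical input)
    """
    for position, stage in enumerate(pipeline):
        if "$group" not in stage:
            continue
        group = stage["$group"]

        # Assumption: only a single field path such as "$x" is handled.
        group_id = group.get("_id")
        if not (isinstance(group_id, str) and group_id.startswith("$")):
            return False
        field = group_id[1:]

        # Condition 1: a $sort stage sorts the grouping field before $group.
        if position == 0 or "$sort" not in pipeline[position - 1]:
            return False
        sort_items = list(pipeline[position - 1]["$sort"].items())
        if not sort_items or sort_items[0][0] != field:
            return False

        # Condition 2: an index matches the sort order on the grouped field.
        if list(index_keys[:len(sort_items)]) != sort_items:
            return False

        # Condition 3: $first is the only accumulator in the $group stage.
        accumulators = [spec for name, spec in group.items() if name != "_id"]
        return all(
            isinstance(spec, dict) and list(spec) == ["$first"]
            for spec in accumulators
        )
    return False


# Illustrative pipelines (field names x and y are placeholders).
first_only = [
    {"$sort": {"x": 1, "y": 1}},
    {"$group": {"_id": "$x", "y": {"$first": "$y"}}},
]
with_sum = [
    {"$sort": {"x": 1, "y": 1}},
    {"$group": {"_id": "$x", "total": {"$sum": "$y"}}},
]
```

For an index described as ``[("x", 1), ("y", 1)]``, the check passes for ``first_only`` and fails for ``with_sum``, which uses an accumulator other than ``$first``.
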
510540

511-
Starting in MongoDB 4.2, in some cases, an aggregation pipeline can use
512-
a ``DISTINCT_SCAN`` index plan that returns one document per index key
513-
value.
514-
515-
.. note::
516-
``DISTINCT_SCAN`` executes faster than ``IXSCAN`` if multiple
517-
documents per index value exist. However, index scan parameters
518-
might affect the time comparison of ``DISTINCT_SCAN`` and
519-
``IXSCAN``.
541+
Additionally, stages later in the pipeline that retrieve data from
542+
other, unmodified collections can use indexes on those collections
543+
for optimization. These stages include:
520544

521-
Indexes can :ref:`cover <read-operations-covered-query>` queries in an
522-
aggregation pipeline. A covered query uses an index to return all of the
523-
documents and has high performance.
545+
- :pipeline:`$lookup`
546+
- :pipeline:`$graphLookup`
547+
- :pipeline:`$unionWith`
524548

525549
Document Filters
526550
~~~~~~~~~~~~~~~~
@@ -576,4 +600,4 @@ MongoDB increases the :pipeline:`$limit` amount with the reordering.
576600
.. seealso::
577601

578602
:method:`explain <db.collection.aggregate()>` option in the
579-
:method:`db.collection.aggregate()`
603+
:method:`db.collection.aggregate()`
