Commit db01fd5

DOCS-15221 updated with initial copy and tech review input (#1974) (#2023)
1 parent 2891614 commit db01fd5

1 file changed: +57 -33 lines changed

source/core/aggregation-pipeline-optimization.txt

Lines changed: 57 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -402,52 +402,76 @@ using indexes and document filters.
402402
Indexes
403403
~~~~~~~
404404

405-
The :ref:`query planner <query-plans-query-optimization>` analyzes
406-
an aggregation pipeline to determine if :ref:`indexes <indexes>`
407-
can be used to improve pipeline performance.
405+
An aggregation pipeline can use :ref:`indexes <indexes>` from the input
406+
collection to improve performance. Using an index limits the number of
407+
documents a stage processes. Ideally, an index can :ref:`cover
408+
<read-operations-covered-query>` the stage query. A covered query has
409+
especially high performance, since the index returns all matching
410+
documents.
408411

409-
The following list shows some pipeline stages that can use indexes:
412+
For example, a pipeline that consists of :pipeline:`$match`,
413+
:pipeline:`$sort`, and :pipeline:`$group` can benefit from indexes at
414+
every stage:
415+
416+
- An index on the :pipeline:`$match` query field can efficiently
417+
identify the relevant data
418+
419+
- An index on the sorting field can return data in sorted order for the
420+
:pipeline:`$sort` stage
421+
422+
- An index on the grouping field that matches the :pipeline:`$sort`
423+
order can return all of the field values needed to execute the
424+
:pipeline:`$group` stage (a covered query)
425+
426+
To determine whether a pipeline uses indexes, review the query plan and
427+
look for ``IXSCAN`` or ``DISTINCT_SCAN`` plans.
428+
429+
.. note::
430+
In some cases, the query planner uses a ``DISTINCT_SCAN`` index plan
431+
that returns one document per index key value. ``DISTINCT_SCAN``
432+
executes faster than ``IXSCAN`` if there are multiple documents per
433+
key value. However, index scan parameters might affect the time
434+
comparison of ``DISTINCT_SCAN`` and ``IXSCAN``.
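
The plan check described above can be sketched in Python. This is a minimal illustration, not part of the commit: the helper names ``collect_plan_stages`` and ``uses_index`` are hypothetical, and the sample explain document is hand-written and heavily trimmed. The sketch assumes only that plan nodes carry a ``stage`` string and nest under keys such as ``inputStage``; real explain output varies by server version.

```python
from typing import Any, Set

# Plan stage names that indicate index use, as named in the text above.
INDEX_STAGES = {"IXSCAN", "DISTINCT_SCAN"}


def collect_plan_stages(node: Any) -> Set[str]:
    """Recursively gather every "stage" name found in an explain document.

    Rejected plans are skipped so that only the winning plan is reported.
    """
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "rejectedPlans":
                continue
            if key == "stage" and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_plan_stages(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_plan_stages(item)
    return found


def uses_index(explain_output: dict) -> bool:
    """Return True if the plan contains an IXSCAN or DISTINCT_SCAN stage."""
    return bool(collect_plan_stages(explain_output) & INDEX_STAGES)


# Hand-written, trimmed sample (an assumption, not captured server output).
sample_explain = {
    "stages": [
        {
            "$cursor": {
                "queryPlanner": {
                    "winningPlan": {
                        "stage": "FETCH",
                        "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
                    },
                    "rejectedPlans": [{"stage": "COLLSCAN"}],
                }
            }
        },
        {"$group": {"_id": "$status"}},
    ]
}

print(uses_index(sample_explain))
```

Walking the whole document rather than a fixed path keeps the sketch independent of where a given server version places the winning plan.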
435+
436+
For early stages in your aggregation pipeline, consider indexing the
437+
query fields. Stages that can benefit from indexes are:
410438

411439
``$match`` stage
412-
:pipeline:`$match` can use an index to filter documents if
413-
:pipeline:`$match` is the first stage in a pipeline.
440+
:pipeline:`$match` can use an index to filter documents if it is the
441+
first stage in the pipeline, after any optimizations from the
442+
:ref:`query planner <query-plans-query-optimization>`.
414443

415444
``$sort`` stage
416-
:pipeline:`$sort` can use an index if :pipeline:`$sort` is not
417-
preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
418-
:pipeline:`$group` stage.
445+
:pipeline:`$sort` can benefit from an index as long as it is not
446+
preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
447+
:pipeline:`$group` stage.
419448

420449
``$group`` stage
421-
:pipeline:`$group` can potentially use an index to find the first
422-
document in each group if:
450+
:pipeline:`$group` can use an index to find the first document in
451+
each group if it meets all of the following conditions:
423452

424-
- :pipeline:`$group` is preceded by :pipeline:`$sort` that sorts the
425-
field to group by, and
453+
- a :pipeline:`$sort` stage sorts the grouping field before
454+
:pipeline:`$group`
426455

427-
- there is an index on the grouped field that matches the sort order,
428-
and
456+
- an index exists that matches the sort order on the grouped field
429457

430-
- :group:`$first` is the only accumulator in :pipeline:`$group`.
458+
- :group:`$first` is the only accumulator in the :pipeline:`$group`
459+
stage
431460

432-
See :ref:`group-pipeline-optimization` for an example.
461+
See :ref:`$group Performance Optimizations <group-pipeline-optimization>`
462+
for an example.
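
The three ``$group`` conditions listed above can be expressed as a static check over a pipeline definition. The following Python sketch is only an approximation for illustration: ``group_may_use_index`` is a hypothetical helper, the field names ``x`` and ``y`` are made up, and "matches the sort order" is simplified to "the index keys equal the sort keys". It does not reproduce the query planner's actual logic.

```python
from typing import Dict, List, Tuple


def group_may_use_index(pipeline: List[Dict], index_keys: List[Tuple[str, int]]) -> bool:
    """Approximate the three conditions for $group to use an index.

    index_keys is an ordered list of (field, direction) pairs,
    for example [("x", 1), ("y", 1)].
    """
    for pos in range(1, len(pipeline)):
        stage = pipeline[pos]
        if "$group" not in stage:
            continue
        previous = pipeline[pos - 1]
        # Condition 1: a $sort stage sorts the grouping field before $group.
        if "$sort" not in previous:
            return False
        group_spec = stage["$group"]
        group_id = group_spec.get("_id")
        if not (isinstance(group_id, str) and group_id.startswith("$")):
            return False
        sort_keys = list(previous["$sort"].items())
        if not sort_keys or sort_keys[0][0] != group_id[1:]:
            return False
        # Condition 2 (simplified): the index matches the sort order.
        if list(index_keys) != sort_keys:
            return False
        # Condition 3: $first is the only accumulator in the $group stage.
        accumulators = [spec for name, spec in group_spec.items() if name != "_id"]
        return all(
            isinstance(spec, dict) and list(spec) == ["$first"]
            for spec in accumulators
        )
    return False


# Illustrative pipeline: sort on x then y, keep the first y for each x.
pipeline = [
    {"$sort": {"x": 1, "y": 1}},
    {"$group": {"_id": "$x", "y": {"$first": "$y"}}},
]

print(group_may_use_index(pipeline, [("x", 1), ("y", 1)]))
```

Whether the server actually chooses an index plan for such a pipeline still has to be confirmed from the query plan.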
433463

434-
``$geoNear`` stage
435-
:pipeline:`$geoNear` can use a geospatial index. :pipeline:`$geoNear`
436-
must be the first stage in an aggregation pipeline.
464+
``$geoNear`` stage
465+
:pipeline:`$geoNear` always uses an index, since it must be the first
466+
stage in a pipeline and requires a :ref:`geospatial index <index-feature-geospatial>`.
437467

438-
Starting in MongoDB 4.2, in some cases, an aggregation pipeline can use
439-
a ``DISTINCT_SCAN`` index plan that returns one document per index key
440-
value.
441-
442-
.. note::
443-
``DISTINCT_SCAN`` executes faster than ``IXSCAN`` if multiple
444-
documents per index value exist. However, index scan parameters
445-
might affect the time comparison of ``DISTINCT_SCAN`` and
446-
``IXSCAN``.
468+
Additionally, stages later in the pipeline that retrieve data from
469+
other, unmodified collections can use indexes on those collections
470+
for optimization. These stages include:
447471

448-
Indexes can :ref:`cover <read-operations-covered-query>` queries in an
449-
aggregation pipeline. A covered query uses an index to return all of the
450-
documents and has high performance.
472+
- :pipeline:`$lookup`
473+
- :pipeline:`$graphLookup`
474+
- :pipeline:`$unionWith`
451475

452476
Document Filters
453477
~~~~~~~~~~~~~~~~
@@ -503,4 +527,4 @@ MongoDB increases the :pipeline:`$limit` amount with the reordering.
503527
.. seealso::
504528

505529
:method:`explain <db.collection.aggregate()>` option in the
506-
:method:`db.collection.aggregate()`
530+
:method:`db.collection.aggregate()`
