@@ -402,52 +402,76 @@ using indexes and document filters.
402
402
Indexes
403
403
~~~~~~~
404
404
405
- The :ref:`query planner <query-plans-query-optimization>` analyzes
406
- an aggregation pipeline to determine if :ref:`indexes <indexes>`
407
- can be used to improve pipeline performance.
405
+ An aggregation pipeline can use :ref:`indexes <indexes>` from the input
406
+ collection to improve performance. Using an index limits the number of
407
+ documents a stage processes. Ideally, an index can :ref:`cover
408
+ <read-operations-covered-query>` the stage query. A covered query has
409
+ especially high performance, since the index returns all matching
410
+ documents.
408
411
409
- The following list shows some pipeline stages that can use indexes:
412
+ For example, a pipeline that consists of :pipeline:`$match`,
413
+ :pipeline:`$sort`, and :pipeline:`$group` can benefit from indexes at
414
+ every stage:
415
+
416
+ - An index on the :pipeline:`$match` query field can efficiently
417
+ identify the relevant data
418
+
419
+ - An index on the sorting field can return data in sorted order for the
420
+ :pipeline:`$sort` stage
421
+
422
+ - An index on the grouping field that matches the :pipeline:`$sort`
423
+ order can return all of the field values needed to execute the
424
+ :pipeline:`$group` stage (a covered query)
425
+
426
+ To determine whether a pipeline uses indexes, review the query plan and
427
+ look for ``IXSCAN`` or ``DISTINCT_SCAN`` plans.
428
+
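As a rough illustration of the check described above, the sketch below walks an explain document and reports whether any plan stage is ``IXSCAN`` or ``DISTINCT_SCAN``. The helper names ``collectStages`` and ``usesIndex``, the ``orders`` collection, and the abbreviated ``sampleExplain`` document are hypothetical and exist only for this example. Real explain output contains many more fields and its layout can vary by server version, which is why the walker searches the whole document rather than assuming one shape.

```javascript
// Recursively collect every "stage" name found anywhere in an explain document.
function collectStages(node, found = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStages(child, found));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") {
      found.add(node.stage);
    }
    Object.values(node).forEach((child) => collectStages(child, found));
  }
  return found;
}

// True if the plan contains an index scan of either kind.
function usesIndex(explainOutput) {
  const stages = collectStages(explainOutput);
  return stages.has("IXSCAN") || stages.has("DISTINCT_SCAN");
}

// Hand-written, abbreviated sample (an assumption for illustration only).
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "status_1" },
    },
  },
};

console.log(usesIndex(sampleExplain));

// In mongosh, against a hypothetical "orders" collection, the same helper
// could be applied to real output:
//   usesIndex(db.orders.explain().aggregate([{ $match: { status: "A" } }]))
```

A plan that reports only ``COLLSCAN`` would make ``usesIndex`` return ``false``, which suggests that the pipeline is scanning the whole collection.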
429
+ .. note::
430
+ In some cases, the query planner uses a ``DISTINCT_SCAN`` index plan
431
+ that returns one document per index key value. ``DISTINCT_SCAN``
432
+ executes faster than ``IXSCAN`` if there are multiple documents per
433
+ key value. However, index scan parameters might affect the time
434
+ comparison of ``DISTINCT_SCAN`` and ``IXSCAN``.
435
+
436
+ For early stages in your aggregation pipeline, consider indexing the
437
+ query fields. Stages that can benefit from indexes are:
410
438
411
439
``$match`` stage
412
- :pipeline:`$match` can use an index to filter documents if
413
- :pipeline:`$match` is the first stage in a pipeline.
440
+ :pipeline:`$match` can use an index to filter documents if it is the
441
+ first stage in the pipeline, after any optimizations from the
442
+ :ref:`query planner <query-plans-query-optimization>`.
414
443
415
444
``$sort`` stage
416
- :pipeline:`$sort` can use an index if :pipeline:`$sort` is not
417
- preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
418
- :pipeline:`$group` stage.
445
+ :pipeline:`$sort` can benefit from an index as long as it is not
446
+ preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
447
+ :pipeline:`$group` stage.
419
448
420
449
``$group`` stage
421
- :pipeline:`$group` can potentially use an index to find the first
422
- document in each group if:
450
+ :pipeline:`$group` can use an index to find the first document in
451
+ each group if it meets all of the following conditions:
423
452
424
- - :pipeline:`$group` is preceded by :pipeline:`$sort` that sorts the
425
- field to group by, and
453
+ - a :pipeline:`$sort` stage sorts the grouping field before
454
+ :pipeline:`$group`
426
455
427
- - there is an index on the grouped field that matches the sort order,
428
- and
456
+ - an index exists that matches the sort order on the grouped field
429
457
430
- - :group:`$first` is the only accumulator in :pipeline:`$group`.
458
+ - :group:`$first` is the only accumulator in the :pipeline:`$group`
459
+ stage
431
460
432
- See :ref:`group-pipeline-optimization` for an example.
461
+ See :ref:`$group Performance Optimizations <group-pipeline-optimization>`
462
+ for an example.
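The three conditions above can be sketched as a pipeline and index pair. The ``orders`` collection, its ``customerId`` and ``orderDate`` fields, and the helper functions are hypothetical. The helpers only restate the listed conditions as simple checks on the pipeline definition. They do not reproduce the query planner's logic.

```javascript
// Hypothetical compound index on the grouping field and the sorted field.
const indexSpec = { customerId: 1, orderDate: 1 };

// $sort orders by the grouping field first, then $group takes the first
// document of each group with $first as the only accumulator.
const pipeline = [
  { $sort: { customerId: 1, orderDate: 1 } },
  { $group: { _id: "$customerId", firstOrderDate: { $first: "$orderDate" } } },
];

// Condition 1: a $sort stage comes immediately before $group and sorts on
// the grouping field first.
function sortPrecedesGroup(stages) {
  const groupIdx = stages.findIndex((s) => "$group" in s);
  if (groupIdx < 1 || !("$sort" in stages[groupIdx - 1])) return false;
  const groupField = String(stages[groupIdx].$group._id).replace(/^\$/, "");
  return Object.keys(stages[groupIdx - 1].$sort)[0] === groupField;
}

// Condition 2: the index has the same keys and directions as the sort.
function indexMatchesSort(index, stages) {
  const sortStage = stages.find((s) => "$sort" in s);
  return sortStage !== undefined &&
    JSON.stringify(index) === JSON.stringify(sortStage.$sort);
}

// Condition 3: every accumulator in $group is $first.
function onlyUsesFirst(stages) {
  const groupStage = stages.find((s) => "$group" in s);
  return groupStage !== undefined &&
    Object.entries(groupStage.$group)
      .filter(([field]) => field !== "_id")
      .every(([, acc]) => Object.keys(acc).length === 1 && "$first" in acc);
}

console.log(
  sortPrecedesGroup(pipeline) &&
  indexMatchesSort(indexSpec, pipeline) &&
  onlyUsesFirst(pipeline)
);

// In mongosh, the pair could be tried with:
//   db.orders.createIndex(indexSpec)
//   db.orders.aggregate(pipeline)
```

Whether the planner actually uses the index for a given pipeline is best confirmed from the explain output rather than from checks like these.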
433
463
434
- ``$geoNear`` stage
435
- :pipeline:`$geoNear` can use a geospatial index. :pipeline:`$geoNear`
436
- must be the first stage in an aggregation pipeline .
464
+ ``$geoNear`` stage
465
+ :pipeline:`$geoNear` always uses an index, since it must be the first
466
+ stage in a pipeline and requires a :ref:`geospatial index <index-feature-geospatial>`.
437
467
438
- Starting in MongoDB 4.2, in some cases, an aggregation pipeline can use
439
- a ``DISTINCT_SCAN`` index plan that returns one document per index key
440
- value.
441
-
442
- .. note::
443
- ``DISTINCT_SCAN`` executes faster than ``IXSCAN`` if multiple
444
- documents per index value exist. However, index scan parameters
445
- might affect the time comparison of ``DISTINCT_SCAN`` and
446
- ``IXSCAN``.
468
+ Additionally, stages later in the pipeline that retrieve data from
469
+ other, unmodified collections can use indexes on those collections
470
+ for optimization. These stages include:
447
471
448
- Indexes can :ref:`cover <read-operations-covered-query>` queries in an
449
- aggregation pipeline. A covered query uses an index to return all of the
450
- documents and has high performance.
472
+ - :pipeline:`$lookup`
473
+ - :pipeline:`$graphLookup`
474
+ - :pipeline:`$unionWith`
451
475
452
476
Document Filters
453
477
~~~~~~~~~~~~~~~~
@@ -503,4 +527,4 @@ MongoDB increases the :pipeline:`$limit` amount with the reordering.
503
527
.. seealso::
504
528
505
529
:method:`explain <db.collection.aggregate()>` option in the
506
- :method:`db.collection.aggregate()`
530
+ :method:`db.collection.aggregate()`