Index comments #72

Merged
merged 1 commit into from
Jul 16, 2012
78 changes: 72 additions & 6 deletions draft/administration/indexes.txt
@@ -40,6 +40,8 @@ of the ``people`` collection:

db.people.ensureIndex( { phone-number: 1 } )

TODO: you need ""s around phone-number, otherwise it's invalid JS (phone minus number).
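
A minimal sketch of the quoting problem flagged above, written as plain JavaScript so it runs without a MongoDB connection. The helper name ``parses`` is hypothetical and exists only for this illustration; the point is that the unquoted key is rejected by the parser, while the quoted key yields the intended one-field index specification.

```javascript
// Plain JavaScript; no MongoDB connection is needed for this check.
// parses() is a hypothetical helper: it reports whether a source string
// is accepted by the JavaScript parser as an expression.
function parses(source) {
  try {
    new Function("return (" + source + ");");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

// The hyphen ends the key name early, so the literal is rejected.
const unquotedOk = parses("{ phone-number: 1 }");

// Quoting the key makes the literal valid.
const quotedOk = parses('{ "phone-number": 1 }');

// The quoted form produces a specification with exactly one key.
const spec = { "phone-number": 1 };
const specKeys = Object.keys(spec);

console.log(unquotedOk, quotedOk, specKeys);
```

The same quoting applies to every hyphenated field name in this file, including ``tax-id`` in the ``ensureIndex()`` and ``dropIndex()`` examples.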

To create a :ref:`compound index <index-type-compound>`, use an
operation that resembles the following prototype:

@@ -60,13 +62,17 @@ collection:
To build indexes for a :term:`replica set`, before version 2.2,
see :ref:`index-building-replica-sets`.

TODO: I don't think anything changed about replica set index builds for 2.2...

.. [#ensure] As the name suggests, :func:`ensureIndex() <db.collection.ensureIndex()>`
only creates an index if an index of the same specification does
not already exist.

Sparse
``````

TODO: Sparse? Maybe "Types of Indexes->Sparse"?

To create a :ref:`sparse index <index-type-sparse>` on a field, use an
operation that resembles the following prototype:

@@ -87,6 +93,12 @@ without the ``twitter_name`` field.

MongoDB cannot create sparse compound indexes.

TODO: is this true? I thought that it could.

TODO: Is there more doc on sparse indexes somewhere? Seems like this is missing
some info like getting different results back when the index is used, null
counts as existing, etc.
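
To make the "null counts as existing" point concrete, here is a small model in plain JavaScript. It is not MongoDB code: ``sparseIndexEntries`` is a hypothetical function and the sample documents are invented. It only mimics the rule that a sparse index holds an entry for every document that has the field, including documents where the field is ``null``, and skips documents that lack the field entirely.

```javascript
// Invented sample documents for illustration only.
const people = [
  { _id: 1, twitter_name: "alice" },
  { _id: 2, twitter_name: null },   // field present, value null
  { _id: 3 }                        // field missing
];

// Hypothetical model of sparse index membership:
// a document gets an entry only if it has the field at all.
function sparseIndexEntries(collection, field) {
  return collection
    .filter(function (doc) {
      return Object.prototype.hasOwnProperty.call(doc, field);
    })
    .map(function (doc) {
      return { key: doc[field], id: doc._id };
    });
}

const entries = sparseIndexEntries(people, "twitter_name");
const indexedIds = entries.map(function (e) { return e.id; });
const skippedIds = people
  .map(function (doc) { return doc._id; })
  .filter(function (id) { return indexedIds.indexOf(id) === -1; });

console.log(indexedIds, skippedIds);
```

In this model, any result produced by walking the index alone omits document 3, which is the kind of "different results" the comment above asks the docs to spell out.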

Unique
``````

@@ -105,10 +117,14 @@ records for the same legal entity:

db.accounts.ensureIndex( { tax-id: 1 }, { unique: true } )

TODO: tax-id should be in ""s.

The :ref:`_id index <index-type-primary>` is a unique index. In some
situations you may want to use the ``_id`` field for these primary
data rather than using a unique index on another field.

TODO: "for these primary data"?

In many situations you will want to combine the ``unique`` constraint
with the ``sparse`` option. When MongoDB indexes a field, if a
document does not have a value for a field, the index entry for that
@@ -141,6 +157,8 @@ as in the following example:

db.accounts.dropIndex( { tax-id: 1 } )

TODO: ""s!

This will remove the index on the ``tax-id`` field in the ``accounts``
collection. The shell provides the following document after completing
the operation:
@@ -203,6 +221,12 @@ for this operation.
To rebuild indexes for a :term:`replica set`, before version 2.2,
see :ref:`index-rebuilding-replica-sets`.

TODO: again, this probably isn't different in 2.2

TODO: one thing that I would appreciate you mentioning is that some drivers may
create indexes like {a : NumberLong(1)} _which is fine_ and doesn't break
anything so stop complaining about it.

Special Creation Options
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -211,6 +235,8 @@ Special Creation Options
TTL collections use a special ``expire`` index option. See
:doc:`/tutorial/expire-data` for more information.

TODO: Are 2d indexes getting a mention?

Background
``````````

@@ -222,11 +248,25 @@ prototype invocation of :func:`db.collection.ensureIndex()`:

db.collection.ensureIndex( { a: 1 }, { background: true } )

TODO: what does it mean to build an index in the background? You might want to
mention:

* performance implications
* that this type of index build can be killed
* that this blocks the connection you sent the ensureIndex on, but ops from
  other connections can proceed
* that indexes are created in the foreground on secondaries in 2.0,
  which blocks replication & slave reads. In 2.2, it does not block reads (but
  still blocks repl).

Drop Duplicates
```````````````

To force the creation of a :ref:`unique index <index-type-unique>`
index, you can use the ``dropDups`` option. This will force MongoDB to
index

TODO: " on a collection with duplicate values in the field to be indexed "

you can use the ``dropDups`` option. This will force MongoDB to
create a *unique* index by deleting documents with duplicate values
when building the index. Consider the following prototype invocation
of :func:`db.collection.ensureIndex()`:
@@ -243,12 +283,15 @@ See the full documentation of :ref:`duplicate dropping
Specifying ``{ dropDups: true }`` will delete data from your
database. Use with extreme caution.

TODO: I'd say it "may" delete data from your DB, not like it's going to go all
Shermanesque on your data.
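
The "will" versus "may" distinction can be shown with a toy model in plain JavaScript. Everything here is illustrative: ``buildUniqueWithDropDups`` is a hypothetical function, the documents are invented, and the model assumes the build keeps the first document it encounters for each key and drops later ones, with array order standing in for scan order.

```javascript
// Hypothetical model of a unique index build with duplicate dropping.
// Assumption: the first document seen for a key is kept, later ones are dropped.
function buildUniqueWithDropDups(collection, field) {
  const seen = new Set();
  const kept = [];
  const dropped = [];
  for (const doc of collection) {
    const key = doc[field];
    if (seen.has(key)) {
      dropped.push(doc);
    } else {
      seen.add(key);
      kept.push(doc);
    }
  }
  return { kept: kept, dropped: dropped };
}

// Invented data: one collection with a duplicate tax-id, one without.
const withDuplicates = [
  { _id: 1, "tax-id": "A" },
  { _id: 2, "tax-id": "B" },
  { _id: 3, "tax-id": "A" }
];
const withoutDuplicates = [
  { _id: 1, "tax-id": "A" },
  { _id: 2, "tax-id": "B" }
];

const lossy = buildUniqueWithDropDups(withDuplicates, "tax-id");
const lossless = buildUniqueWithDropDups(withoutDuplicates, "tax-id");

console.log(lossy.dropped.length, lossless.dropped.length);
```

Data is removed only in the first case, which supports wording the warning as "may delete data".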

.. _index-building-replica-sets:

Building Indexes on Replica Sets
--------------------------------

.. versionchanged:: 2.2
Index rebuilding operations on :term:`secondary` members of
:term:`replica sets <replica set>` now run as normal background
index operations. Run :func:`ensureIndex()
@@ -257,20 +300,30 @@ Building Indexes on Replica Sets
the following operation to isolate and control the impact of
indexing building operations on a set as a whole.

TODO: I think there needs to be a huge mention that this still blocks
replication, so the procedure below is recommended.

.. admonition:: For Version 1.8 and 2.0

:ref:`Background index creation operations
<index-creation-background>` became *foreground* indexing
operations on :term:`secondary` members of replica sets. These
foreground operations will block all replication on the
secondaries, and can impact performance of the entire set. To build
secondaries,

TODO: and don't allow any reads to go through.

and can impact performance of the entire set. To build
indexes with minimal impact on a replica set, use the following
procedure for all non-trivial index builds:

#. Stop the :program:`mongod` process on one secondary. Restart the
:program:`mongod` process *without* the :option:`--replSet <mongod --replSet>`
option. This instance is now in "standalone" mode.

TODO: generally we recommend running it on a different port, too, so that apps
& other servers in the set don't try to contact it.

#. Create the new index or rebuild the index on this :program:`mongod`
instance.

@@ -287,7 +340,7 @@ Building Indexes on Replica Sets

Ensure that your :ref:`oplog` is large enough to permit the
indexing or re-indexing operation to complete without falling
too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`"
documentation for additional information.

.. note::
@@ -301,6 +354,9 @@ Building Indexes on Replica Sets
For the best results, always create indexes *before* you begin
inserting data into a collection.

TODO: well, sort of. That'll build the indexes fast, but make the inserts
slower. Overall, it's faster to insert data, then build indexes.

Measuring Index Use
-------------------

@@ -318,7 +374,12 @@ following tools:
- :func:`cursor.hint()`

Append the :func:`hint() <cursor.hint()>` to any cursor (e.g.
query) with the name of an index as the argument to *force* MongoDB
query) with the name

TODO: this isn't "the name of an index." I'd say just "with the index." The
name of an index is a string like "zipcode_1".

of an index as the argument to *force* MongoDB
to use a specific index to fulfill the query. Consider the following
example:

@@ -331,8 +392,13 @@ following tools:
<cursor.explain()>` in conjunction with each other to compare the
effectiveness of a specific index.

TODO: mention $natural to force no index usage?

- :status:`indexCounters`

Use the :status:`indexCounters` data in the output of
:dbcommand:`serverStatus` for insight into database-wise index
utilization.

TODO: I'd like to see this also cover how to track how far an index build has
gotten and how to kill an index build.
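
On the earlier ``hint()`` comment about the difference between an index and the name of an index: the default name is a string built from the key specification. The sketch below, in plain JavaScript, shows that derivation. ``defaultIndexName`` is a hypothetical helper written for this note, not a shell function, and the second specification is an invented example.

```javascript
// Hypothetical helper: derive the default index name from a key specification
// by joining each field name and its direction with underscores.
function defaultIndexName(keySpec) {
  return Object.keys(keySpec)
    .map(function (field) {
      return field + "_" + keySpec[field];
    })
    .join("_");
}

// The index itself is described by a document such as { zipcode: 1 };
// its name is the derived string.
const singleFieldName = defaultIndexName({ zipcode: 1 });

// Invented compound specification with a descending second field.
const compoundName = defaultIndexName({ a: 1, b: -1 });

console.log(singleFieldName, compoundName);
```

This is why "with the index" reads more accurately than "with the name of an index" when the argument shown is a key document.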
41 changes: 39 additions & 2 deletions draft/applications/indexes.txt
@@ -38,6 +38,8 @@ database. To use a covered index you must:
- in the :term:`projection`, explicitly exclude the ``_id`` field from
the result set, unless the index includes ``_id``.

TODO: the third point seems like part of the first point.

Use the :func:`explain() <cursor.explain()>` to test the query. If
MongoDB was able to use a covered index, then the value of the
``indexOnly`` field will be ``true``.
@@ -49,7 +51,12 @@ disk, and indexes are smaller than the documents they catalog.
Sort Using Indexes
~~~~~~~~~~~~~~~~~~

While the :dbcommand:`sort` database command and the :func:`sort()
While the :dbcommand:`sort` database command

TODO: sort database command? Is "database command" being used in a different
sense here?

and the :func:`sort()
<cursor.sort()>` helper support in-memory sort operations without the
use of an index, these operations are:

@@ -77,6 +84,9 @@ results. For example:
When using compound indexes to support sort operations, the sorted
field must be the *last* field in the index.

TODO: not true! In 2.2, you can use, say, the index above for a query on
username, sort by status, too.
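
A rough illustration of why that works, in plain JavaScript rather than against a server. It assumes a compound index on ``username`` then ``status`` (the actual index from the elided example is not shown in this hunk, so the field order is an assumption) and uses invented documents. Entries in such an index are ordered by the first field and then by the second, so all entries for one ``username`` are already in ``status`` order.

```javascript
// Invented documents for illustration only.
const users = [
  { username: "bob", status: "C" },
  { username: "alice", status: "B" },
  { username: "bob", status: "A" },
  { username: "alice", status: "A" },
  { username: "bob", status: "B" }
];

// Simple three-way comparison for strings.
function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Model the entry order of an assumed { username: 1, status: 1 } index:
// order by username first, then by status.
const indexOrder = users.slice().sort(function (x, y) {
  return compare(x.username, y.username) || compare(x.status, y.status);
});

// Equality match on the first field selects a contiguous run of entries
// whose second field is already sorted, so no extra sort step is needed.
const bobStatuses = indexOrder
  .filter(function (doc) { return doc.username === "bob"; })
  .map(function (doc) { return doc.status; });

console.log(bobStatuses);
```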

Store Indexes in Memory
~~~~~~~~~~~~~~~~~~~~~~~

@@ -124,6 +134,8 @@ deep understanding of:

MongoDB can only use *one* index to support any given operation.

TODO: trickily put. I hope you mention $or elsewhere?

Selectivity
~~~~~~~~~~~

@@ -145,9 +157,22 @@ with fulfilling the query.
these values using the index, MongoDB will only need to scan a very
small number of documents to fulfill the rest of the query.

TODO: It'd be clearer to use "real" numbers in the second example, too, but I
think you'd have to re-jigger the example to do so.

To ensure optimal performance, use indexes that are maximally
selective relative to your queries.

TODO: the example makes selectivity sound like the uniqueness of the index,
which isn't the whole story. Having something like {x:{$gt:3}} that matches 60%
of the collection isn't very selective, even if x has a unique index on it.

I think it's important to emphasize that selectivity is whittling down possible
results to as small a % as possible.

TODO: Also, might be worth mentioning that, if you cannot get selectivity low
enough, indexes will actually be slower than table scans.
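
The distinction between uniqueness and selectivity can be put in numbers with a small plain JavaScript model. The ten-document collection and the helper ``matchedFraction`` are invented for this note. Every value of ``x`` is distinct, so a unique index on ``x`` would be possible, yet a range predicate still matches most of the collection.

```javascript
// Invented collection: x takes the distinct values 1 through 10.
const collection = [];
for (let x = 1; x <= 10; x++) {
  collection.push({ x: x });
}

// Hypothetical helper: fraction of documents that satisfy a predicate.
function matchedFraction(docs, predicate) {
  return docs.filter(predicate).length / docs.length;
}

// Range predicate, comparable to { x: { $gt: 4 } }: matches 6 of 10 documents.
const rangeFraction = matchedFraction(collection, function (doc) {
  return doc.x > 4;
});

// Equality predicate, comparable to { x: 7 }: matches 1 of 10 documents.
const equalityFraction = matchedFraction(collection, function (doc) {
  return doc.x === 7;
});

console.log(rangeFraction, equalityFraction);
```

Here the same field supports both a highly selective query and a poorly selective one, so selectivity is a property of the query and index together, measured by how small a share of the collection remains.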

Insert Throughput
~~~~~~~~~~~~~~~~~

@@ -156,20 +181,28 @@ Insert Throughput
.. TODO fact check

MongoDB must update all indexes associated with a collection following
every insert or update operation. Every index on a collection adds
every insert or update operation.

TODO: or delete, too

Every index on a collection adds
some amount of overhead to these operations. In almost every case, the
performance gains that indexes realize for read operations are worth
the insertion penalty; however:

- in some cases, an index to support an infrequent query may incur
more insert-related costs, than saved read-time.

TODO: rm comma: "insert-related costs than saved read-time"

- in some situations, if you have many indexes on a collection with a
high insert throughput and a number of very similar indexes, you may
find better overall results by using a slightly less effective index
on some queries if it means consolidating the total number of
indexes.

TODO: do you cover what indexes overlap?

Index Size
~~~~~~~~~~

@@ -182,9 +215,13 @@ index to locate those documents, MongoDB can maintain a much smaller
- all of your indexes use less space than the documents in the
collection.

TODO: individually or all together?

- the indexes and a reasonable working set can fit RAM at the same
time.

TODO: a reasonable working set?

.. _indexing-right-handed:

Indexes do not have to fit *entirely* into RAM in all cases. If the