Skip to content

DOCS-1197 document hashed shard keys #757

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions source/administration/indexes.txt
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,30 @@ without the ``twitter_name`` field.
index. See the :ref:`sparse index <index-type-sparse>` section for
more information.


.. index:: index; hashed
.. _index-hashed-index:

Hashed Indexes
~~~~~~~~~~~~~~

.. versionadded:: 2.4

To create a :ref:`index-type-hashed`, specify `"hashed"` as
the value of the index key:

.. example::

.. code-block:: javascript

db.collection.ensureIndex({a:"hashed"})

A hashed index can be created on any single field.
The hashing function collapses compound documents together
and hashes the resulting data.

A hashed index cannot be combined with other index specifications.

.. index:: index; unique
.. _index-unique-index:

Expand Down
2 changes: 2 additions & 0 deletions source/administration/tag-aware-sharding.txt
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ sharding in MongoDB deployments.
Shard key range tags are entirely distinct from :ref:`replica set member
tags <replica-set-read-preference-tag-sets>`.

Tag-aware sharding cannot be used with :term:`hashed shard keys <hashed shard key>`.

Behavior and Operations
-----------------------

Expand Down
59 changes: 55 additions & 4 deletions source/core/indexes.txt
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,9 @@ MongoDB indexes have the following core features:
requirements as you create indexes in your MongoDB environment.

- All MongoDB indexes use a B-tree data structure. MongoDB can use
these representation of the data to optimize query responses.
this representation of the data to optimize query responses.

- Every query, including update operations, use one and only one
- Every query, including update operations, uses one and only one
index. The :ref:`query optimizer <read-operations-query-optimization>`
selects the index empirically by
occasionally running alternate query plans and by selecting the plan
Expand Down Expand Up @@ -82,7 +82,7 @@ Index Types
This section enumerates the types of indexes available in MongoDB.
For all collections, MongoDB creates the default :ref:`_id index
<index-type-id>`. You can create additional indexes with the
:method:`ensureIndex() <db.collection.ensureIndex()>` method on any
:method:`~db.collection.ensureIndex()` method on any
single field or :ref:`sequence of fields <index-type-compound>` within
any document or :ref:`sub-document <index-sub-document>`. MongoDB also
supports indexes of arrays, called :ref:`multi-key indexes
Expand Down Expand Up @@ -274,6 +274,13 @@ index, however, would not support queries that select the following:
- only the ``location`` and ``stock`` fields
- only the ``item`` and ``stock`` fields


.. note::

:ref:`Hashed indexes <index-hashed-index>` are incompatible with compound indexes. You will receive
an error if you attempt to create a compound index including a hashed
index.

When creating an index, the number associated with a key specifies the
direction of the index. The options are ``1`` (ascending) and ``-1``
(descending). Direction doesn't matter for single key indexes or for
Expand Down Expand Up @@ -360,6 +367,14 @@ value in the array separately, in a "multikey index."
Queries could use the multikey index to return queries for any of
the above values.

.. note::

MongoDB computes values for a hashed index on the entire content of a field,
including fields that hold arrays or sub-documents.

For fields that hold arrays and sub-documents, you cannot use the index to
support any query that introspects the value of an array or sub-document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't in the hashed index section which means it's probably a bit out of place, and also I thought re: our conversation yesterday, you couldn't use "hashed" index on multi-key fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot create a hashed index on an array.
You can create a hashed index on apparently any other BSON type (so a: {b: {c: {d: 1, e: 2}}} is a valid document for a hashed index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "for fields that hold arrays" (which would be a multi-key index) is misleading or confusing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed references to arrays, which I thought I'd done already.


You can use multikey indexes to index fields within objects embedded in
arrays, as in the following example:

Expand Down Expand Up @@ -509,6 +524,42 @@ By default, ``sparse`` is ``false`` on MongoDB indexes.
have the indexed field *are* indexed in a sparse index, even if
that field stores a null value in some documents.

.. index:: index; hashed
.. _index-type-hashed:

Hashed Index
~~~~~~~~~~~~

.. versionadded:: 2.4

Hashed indexes contain entries consisting of a hash of the indexed field.
Hashed indexes cannot be compound indexes.
Hashed indexes can be created on only one field which may not contain
an array as a value.
Hashed indexes cannot have a ``unique`` constraint.

MongoDB can use the hashed index to support equality queries, but cannot
use these indexes for range queries.

It is possible to create a hashed and non-hashed index on the same field:
MongoDB will use the scalar index for range queries.

.. _hashed-index-warning:

.. include:: /includes/warning-hashed-index-floating-point.rst

Create a hashed index using an operation that resembles the
following:

.. code-block:: javascript

db.active.ensureIndex( { a: "hashed" } )

This operation creates a hashed index for the ``active`` collection on
the ``a`` field.

.. [#hash-size] The hash stored in the hashed index is 64 bits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash is 64 bits, but we only store the first 32 bits? This assertion misleads given the truncation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The md5 hash is 128 bits, we store 64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only notable if you know that MD5s are 128 bits, so maybe we should say that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes with the caveat that the hash function may change to sha1 or something else.


.. index:: index; options
.. _index-creation-operations:
.. _index-operations:
Expand Down Expand Up @@ -563,7 +614,7 @@ construction:
.. versionchanged:: 2.4
Before 2.4, a :program:`mongod` instance could only build one
background index per database at a time.

.. versionchanged:: 2.2
Before 2.2, a single :program:`mongod` instance could only build
one index at a time.
Expand Down
39 changes: 38 additions & 1 deletion source/core/sharded-cluster-internals.txt
Original file line number Diff line number Diff line change
Expand Up @@ -232,7 +232,7 @@ Choosing a Shard Key

It is unlikely that any single, naturally occurring key in your
collection will satisfy all requirements of a good shard key. There
are three options:
are four options:

#. Compute a more ideal shard key in your application layer,
and store this in all of your documents, potentially in the
Expand All @@ -249,6 +249,13 @@ are three options:
- expected data size, or
- query patterns and demands.

#. .. versionadded:: 2.4
Utilize a :term:`hashed shard key`.
With a hashed shard key you choose a field that has high variability
and create a hashed index on that field.
MongoDB then uses the values of this hashed index as the shard key
values, thus ensuring an even distribution across the shards.

From a decision making stand point, begin by finding the field
that will provide the required :ref:`query isolation
<sharding-shard-key-query-isolation>`, ensure that :ref:`writes will
Expand Down Expand Up @@ -308,6 +315,36 @@ and you want to replace this with an index on the field ``{ zipcode:
If you drop the last appropriate index for the shard key, recover
by recreating a index on just the shard key.

.. _sharding-hashed-shard-key-internals:

Hashed Shard Keys
~~~~~~~~~~~~~~~~~

.. versionadded:: 2.4

Hashed shard keys
use a special :ref:`hashed index type <index-type-hashed>` that stores
hashes of the shard key field to partition data in a cluster.

Use hashed shard keys when the desired shard key has high
cardinality but uneven distribution, or increases monotonically.

.. example::

The :term:`ObjectId` value of the default ``_id`` field in MongoDB
documents has good cardinality but can lead to a hot shard as new
documents are always inserted on the last shard.

A hashed index on an :term:`ObjectId` will lead to an even distribution
of documents across all shards as the hash of two sequential documents
will be consistently different.

Do not use tag aware sharding with hashed shard keys.
Tags are applied to the hashed field value in the index, and not
the underlying field value used to compute the hash.

.. include:: /includes/warning-hashed-index-floating-point.rst

.. index:: balancing; internals
.. _sharding-balancing-internals:

Expand Down
46 changes: 46 additions & 0 deletions source/core/sharded-clusters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,52 @@ the optimal key. In those situations, computing a special purpose
shard key into an additional field or using a compound shard key may
help produce one that is more ideal.

.. _sharding-hashed-sharding:

Hashed Sharding
---------------

.. versionadded:: 2.4

:ref:`Hashed shard keys <sharding-hashed-shard-key-internals>` use a :ref:`hashed index <index-hashed-index>` of
the chosen field as the value in the index used to partition data
across your sharded cluster.

.. example::

To shard a collection using a hashed shard key, issue an operation in
the :program:`mongo` shell that resembles the following:

.. code-block:: javascript

sh.shardCollection( "records.active", { a: "hashed" } )

This operation shards the ``active`` collection in the ``records``
database, using a hash of the ``a`` field as the shard key.

The field you choose as your hashed shard key should have a distribution
of values.

A field with a high degree of cardinality, or with ever increasing values
would be an ideal choice as a hashed shard key.

If you shard an empty collection using a hashed shard key,
MongoDB will automatically create and migrate chunks so that
each shard has two chunks.
You can control how many chunks MongoDB will create with the
``numInitialChunks`` parameter to :dbcommand:`shardCollection`.

See :ref:`index-hashed-index` for limitations on hashed indexes.

.. include:: /includes/warning-hashed-index-floating-point.rst

.. warning::

Hashed shard keys are only supported by MongoDB 2.4 and greater
versions of the :program:`mongos` program. After sharding a
collection with a hashed shard key you must use MongoDB 2.4 or
greater mongos instances in your sharded cluster.

.. index:: balancing
.. _sharding-balancing:

Expand Down
11 changes: 7 additions & 4 deletions source/faq/sharding.txt
Original file line number Diff line number Diff line change
Expand Up @@ -369,10 +369,13 @@ performance. However, if you have a high insert volume, a
monotonically increasing shard key may be a limitation.

To address this issue, you can use a field with a value that stores
the hash of a key with an ascending value. While you can compute a
hashed value in your application and include this value in your
documents for use as a shard key, the :issue:`SERVER-2001` issue will
implement this capability within MongoDB.
the hash of a key with an ascending value.

.. versionchanged:: 2.4
You can use a :ref:`hashed index <index-type-hashed>` and
:term:`hashed shard key`
or you can compute and maintain this hashed value in your
application.

What do ``moveChunk commit failed`` errors mean?
------------------------------------------------
Expand Down
8 changes: 8 additions & 0 deletions source/includes/warning-cannot-unshard-collection.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
.. warning::

There is no supported means to un-shard a collection after running
:dbcommand:`shardCollection`.
Additionally, once you have sharded a collection you cannot
change shard keys, or update the value of any field used in
your shard key index.

9 changes: 9 additions & 0 deletions source/includes/warning-hashed-index-floating-point.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. warning::

Hashed indexes truncate floating point numbers to 64-bit integers
before hashing. For example, a hashed index would store the same
value for a field that held a value of ``2.3``, ``2.2`` and ``2.9``.
To prevent collisions do not use a hashed index for floating point
numbers that cannot be consistently converted to 64-bit integers (and
then back to floating point.) Hashed indexes do not support floating
point values larger than 2\ :sup:`53`.
41 changes: 31 additions & 10 deletions source/reference/command/shardCollection.txt
Original file line number Diff line number Diff line change
Expand Up @@ -21,17 +21,38 @@ shardCollection
shard. ``<shardkey>`` is a document, and takes the same form as an
:ref:`index specification document <document-index-specification>`.

:param string shardCollection:

Specify the namespace of a collection to be sharded in the form
``<db>.<collection>``.

:param document key:

Specify the index specification to use as the shard key. The
index must exist prior to the :dbcommand:`shardCollection` command
unless the collection is empty. If the collection is empty, then
MongoDB will create the index prior to sharding the collection.

.. versionadded:: 2.4
The key may be in the form ``{ field : "hashed" }`` which will
use the specified field as a hashed shard key.

:param integer numInitialChunks:

.. versionadded:: 2.4
Specify the number of chunks to create upon sharding the
collection. The collection will then be pre-split and balanced
across the specified number of chunks.

You can create at most ``8192`` chunks using ``numInitialChunks``.

Choosing the right shard key to effectively distribute load among
your shards requires some planning.
your shards requires some planning. Also review
:ref:`sharding-shard-key` regarding choosing a shard key.

.. seealso:: :doc:`/sharding` for more information related to
sharding. Also consider the section on :ref:`sharding-shard-key`
for documentation regarding shard keys.
.. include:: /includes/warning-cannot-unshard-collection.rst

.. warning::
.. seealso::

There's no easy way to disable sharding after running :dbcommand:`shardCollection`. In addition,
you cannot change shard keys once set. If you must convert a sharded cluster to a :term:`standalone`
node or :term:`replica set`, you must make a single backup of the entire cluster
and then restore the backup to the standalone :program:`mongod`
or the replica set..
:doc:`/sharding`, :doc:`/core/sharded-clusters`, and
:doc:`/tutorial/deploy-shard-cluster`.
6 changes: 6 additions & 0 deletions source/reference/glossary.txt
Original file line number Diff line number Diff line change
Expand Up @@ -512,6 +512,12 @@ Glossary
uses to distribute documents among members of the
:term:`sharded cluster`.

hashed shard key
A :ref:`hashed shard key <index-type-hashed>`
is a special type of :term:`shard key` where
a hash of the shard key field is uses to distribute
documents among members of the :term:`sharded cluster`.

query
A read request. MongoDB queries use a :term:`JSON`-like query
language that includes a variety of :term:`query operators <operator>`
Expand Down
12 changes: 9 additions & 3 deletions source/reference/method/db.collection.ensureIndex.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,16 @@ db.collection.ensureIndex()
fields to index and order of the index. A
``1`` specifies ascending and a ``-1``
specifies descending.
A value of ``"hashed"`` can be used to create an
index on hashed values of the specified field.
Hashed indexes are primarily used to support
:term:`hashed shard keys`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should remove this addition, and refactor the page to say "To create other kinds of indexes, see " and link people out to another index type page, because we'll end up being too redundant in this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer handling the refactoring as a separate ticket.


:param document options: A :term:`document` that controls the creation
of the index. This argument is optional.

.. warning:: Index names, including their full namespace
(i.e. ``database.collection``) can be no longer than 128
(i.e. ``database.collection``) cannot be longer than 128
characters. See the :method:`db.collection.getIndexes()` field
":data:`~system.indexes.name`" for the names of existing indexes.

Expand All @@ -33,7 +37,7 @@ db.collection.ensureIndex()
``[key]``.

If the ``keys`` document specifies more than one field, than
:method:`db.collection.ensureIndex()` creates a :term:`compound
:method:`~db.collection.ensureIndex()` creates a :term:`compound
index`. To specify a compound index use the following form:

.. code-block:: javascript
Expand All @@ -43,6 +47,8 @@ db.collection.ensureIndex()
This command creates a compound index on the ``key`` field
(in ascending order) and ``key1`` field (in descending order.)

A compound index cannot include a :ref:`hashed index <index-type-hashed>`.

.. note::

The order of an index is important for supporting
Expand Down Expand Up @@ -125,7 +131,7 @@ db.collection.ensureIndex()
and faster index format.

Please be aware of the following behaviors of
:method:`ensureIndex() <db.collection.ensureIndex()>`:
:method:`~db.collection.ensureIndex()`:

- To add or change index options you must drop the index using the
:method:`db.collection.dropIndex()` and issue another
Expand Down
Loading