Migrate splitting chunks - DOCS-304 #140

Merged
merged 9 commits into from
Aug 30, 2012
Changes from 6 commits
92 changes: 63 additions & 29 deletions source/administration/sharding.txt
@@ -404,41 +404,51 @@ stop the processes comprising the ``mongodb0`` shard.
Chunk Management
----------------

This section describes various operations on
:term:`chunks <chunk>` in :term:`shard clusters <shard cluster>`. In
most cases MongoDB automates these processes; however, in some cases,
particularly when you're setting up a shard cluster, you may
need to create and manipulate chunks directly.
This section describes various operations on :term:`chunks <chunk>` in
:term:`shard clusters <shard cluster>`. MongoDB automates these
processes; however, in some cases, particularly when you're setting up
a shard cluster, you may need to create and manipulate chunks
directly.
Contributor

re-flowing these paragraphs makes reviewing more difficult.

avoid doing this unless you're making a significant substantive change; otherwise it wastes time.

Author

the 2nd line was edited.

Contributor

Sorry, I see now.

I think asserting:

  1. That this document addresses one topic.
  2. MongoDB automates these tasks.

Is rhetorically confusing. The original isn't perfect by any means, but let's revert unless there's another good solution given that this is orthogonal to the project of getting the page redirected.


.. _sharding-procedure-create-split:

Splitting Chunks
~~~~~~~~~~~~~~~~

Normally, MongoDB splits a :term:`chunk` following inserts or updates
when a chunk exceeds the :ref:`chunk size <sharding-chunk-size>`.
Normally, MongoDB splits a :term:`chunk` when a chunk exceeds the
:ref:`chunk size <sharding-chunk-size>`.
Recently split chunks may be moved immediately to a new shard
if :program:`mongos` predicts future insertions will benefit from the
move.

MongoDB treats all chunks the same, whether split manually or
automatically by the system.

.. warning::

You cannot merge or combine chunks once you have split them.

You may want to split chunks manually if:

- you have a large amount of data in your cluster that is *not* split,
as is the case after creating a shard cluster with existing data.
- you have a large amount of data in your cluster and very few
:term:`chunks <chunk>`,
as is the case after creating a shard cluster from existing data.

- you expect to add a large amount of data that would
initially reside in a single chunk or shard.

.. example::

You plan to insert a large amount of data as the result of an
import process with :term:`shard key` values between ``300`` and
``400``, *but* all values of your shard key between ``250`` and
``500`` are within a single chunk.
You plan to insert a large amount of data with :term:`shard key`
values between ``300`` and ``400``, *but* all values of your shard
key between ``250`` and ``500`` are in a single chunk.

Use :func:`sh.status()` to determine the current chunks ranges across
the cluster.
To determine the current chunk ranges across the cluster, use
:func:`sh.status()` or :func:`db.printShardingStatus()`.

To split chunks manually, use either the :func:`sh.splitAt()` or
:func:`sh.splitFind()` helpers in the :program:`mongo` shell.
These helpers wrap the :dbcommand:`split` command.
Split chunks in a collection using the :dbcommand:`split` command with
operators: ``middle`` and ``find``. The equivalent shell helpers are
:func:`sh.splitAt()` and :func:`sh.splitFind()`.
Contributor

We want to encourage people to use the helpers over the dbcommand for administrative tasks like this. It's fine to introduce the helpers first and then say "these helpers wrap this database command."


.. example::

@@ -450,28 +460,52 @@ These helpers wrap the :dbcommand:`split` command.
sh.splitFind( { "zipcode": 63109 } )

:func:`sh.splitFind()` will split the chunk that contains the *first* document returned
that matches this query into two equal components. MongoDB will split
the chunk so that documents that have half of the shard keys in will
be in one chunk and the documents that have other half of the shard
keys will be a second chunk. The query in :func:`sh.splitFind()` need
not contain the shard key, though it almost always makes sense to
that matches this query into two equal sized chunks.
The query in :func:`sh.splitFind()` may
not be based on the shard key, though it almost always makes sense to
query for the shard key in this case, and including the shard key will
expedite the operation.
Contributor

This paragraph no longer makes sense (the second sentence does not reflect the operation of MongoDB). While the initial version is a bit verbose, I would revert and then delete/rewrite the redundant second sentence and leave the rest of the paragraph as is, which would resolve the issue without introducing a regression.


However, the location of the document that this query finds with
respect to the other documents in the chunk does not affect how the
chunk splits.

Use :func:`sh.splitAt()` to split a chunk in two using the queried
document as the partition point:

.. code-block:: javascript

sh.splitAt( { "zipcode": 63109 } )
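
The difference between the two helpers can be illustrated with a toy model. The sketch below is plain JavaScript and does not touch a database: it assumes a chunk can be pictured as a sorted array of shard key values, and the function names ``splitAtMedian`` and ``splitAtKey`` are hypothetical, chosen only for this illustration. It is not how MongoDB stores or splits chunks internally.

```javascript
// Toy model only: a chunk is pictured as a sorted array of shard key values.

// Models sh.splitFind(): the query only selects which chunk to split;
// the split point is the median key of that chunk.
function splitAtMedian(chunkKeys) {
  const mid = Math.floor(chunkKeys.length / 2);
  return [chunkKeys.slice(0, mid), chunkKeys.slice(mid)];
}

// Models sh.splitAt(): the queried key itself is the split point and
// becomes the lower bound of the second chunk.
function splitAtKey(chunkKeys, key) {
  return [
    chunkKeys.filter((k) => k < key),
    chunkKeys.filter((k) => k >= key),
  ];
}

const zipcodes = [10001, 20002, 30303, 60601, 63109, 94105];

const byMedian = splitAtMedian(zipcodes);
const byKey = splitAtKey(zipcodes, 63109);

console.log(byMedian); // two halves of three keys each
console.log(byKey);    // four keys below 63109, then 63109 and above
```

In this model, querying for ``63109`` with the median split gives the same result as querying for any other key in the chunk, which matches the point that the position of the matched document does not affect how the chunk splits.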
Contributor

we can change the form of this admonition, but I think it's a crucial point, and it needs to remain in the document.

Author

it's still in the document. this line has been moved up in the document to line 427.

Contributor

right. sorry.


.. warning::
However, the location of the document that this query finds with
respect to the other documents in the chunk does not affect how the
chunk splits.

You cannot merge or combine chunks once you have split them.
Pre-splitting Chunks
Contributor

We should come up with a better term for that.

I think Jenna's doc has the right idea.

Contributor

disregard

~~~~~~~~~~~~~~~~~~~~

For large imports, pre-splitting and pre-migrating many chunks
will dramatically improve performance because the system does not need
to split and migrate new chunks during import.
Contributor

pre-splitting is jargony and not introduced anywhere, so comes out of the blue in this document, and I don't think this introductory paragraph provides enough context or justification.

how large is too large?

what about conversions of replica sets/standalones into shard clusters?

etc.

Contributor

Use cases:

  • Lots of data in a mongod, which becomes shard one.
  • importing/ingesting lots of data at once. (batch workloads)
  • full system restore. (subset of previous.)
  • growing a cluster rapidly (two shards to five shards, when combined with one of the above.)

and small number of total chunks relative to data size.


#. Make many chunks by splitting empty chunks in your
collection.

.. example::

To pre-split chunks for 100 million user profiles sharded by
email address for 5 shards, run the following commands in the
mongo shell:

.. code-block:: javascript

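   // Run against the admin database; <collection> is a placeholder
   // for the full "<database>.<collection>" namespace string.
   for ( var x=97; x<97+26; x++ ){
     for( var y=97; y<97+26; y+=6 ) {
       var prefix = String.fromCharCode(x) + String.fromCharCode(y);
       db.runCommand( { split : <collection> , middle : { email : prefix } } );
     }
   }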

#. Move chunks to different shards by using the balancer or manually
moving chunks.

#. Insert data into the shard cluster using a custom script for your data.
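
As a rough way to check a pre-split plan before running it, the sketch below reproduces the two-letter prefixes from the loop above in plain JavaScript, without a database connection, and assigns them to shards in round-robin order. The shard names and the helper names ``prefixSplitPoints`` and ``assignRoundRobin`` are assumptions made for this illustration; real shard names would come from :func:`sh.status()`, and round-robin is only one possible placement strategy.

```javascript
// Generate the same split points as the pre-split loop, without a database.
function prefixSplitPoints() {
  const points = [];
  for (let x = 97; x < 97 + 26; x++) {        // first letter: 'a' through 'z'
    for (let y = 97; y < 97 + 26; y += 6) {   // second letter: 'a', 'g', 'm', 's', 'y'
      points.push(String.fromCharCode(x) + String.fromCharCode(y));
    }
  }
  return points;
}

// Pair each split point with a shard, cycling through the shard list.
// The shard names passed in are hypothetical placeholders.
function assignRoundRobin(points, shardNames) {
  return points.map((prefix, i) => ({
    email: prefix,
    shard: shardNames[i % shardNames.length],
  }));
}

const shards = ["shard0000", "shard0001", "shard0002", "shard0003", "shard0004"];
const plan = assignRoundRobin(prefixSplitPoints(), shards);

console.log(plan.length);            // 130
console.log(plan[0]);                // first entry: prefix "aa"
console.log(plan[plan.length - 1]);  // last entry: prefix "zy"
```

Each entry in ``plan`` could then drive a manual move, for example with :func:`sh.moveChunk()` and the collection's full namespace, if you choose not to rely on the balancer.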

.. _sharding-balancing-modify-chunk-size:
