Commit 81a81c8

author
Bob Grabar
committed
DOCS-293 migrate Bulk Inserts page
1 parent 30e9a07 commit 81a81c8

1 file changed

+66
-0
lines changed

source/administration/sharding.txt

Lines changed: 66 additions & 0 deletions
@@ -767,6 +767,72 @@ to pre-splitting.

.. todo:: insert link to killing a cursor.

.. index:: bulk insert
.. _sharding-bulk-inserts:

Bulk Inserts and Sharding
~~~~~~~~~~~~~~~~~~~~~~~~~

.. todo:: link the words "bulk insert" to the bulk insert topic when it's
   published

When performing a bulk insert into a :term:`sharded collection`, consider
the following:

- If the collection is not yet populated, MongoDB must take time to
  "learn" what the key distribution is and how to distribute the data.
  To avoid this performance cost, you can pre-split the collection, as
  described in :ref:`sharding-administration-pre-splitting`.

- You can parallelize the import by sending inserts to multiple
  :program:`mongos` instances. If the collection is empty, pre-split
  first, as described in :ref:`sharding-administration-pre-splitting`.

Monotonically Increasing Shard Key Values
`````````````````````````````````````````

If your shard key increases monotonically during an insert, all the
inserts go to the last chunk in the collection. The system adjusts the
metadata to keep the cluster balanced, but at any given time ``t`` all
writes go to a single shard. This is undesirable if the insert rate is
very high, that is, if the insert volume exceeds what a single shard can
process at a given point in time. Increasing shard key values are fine
if the insert volume stays within what a single shard can process.

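As a rough illustration of why increasing keys concentrate writes, the
following standalone C++ sketch models chunks as a sorted list of lower
bounds and routes each key to the chunk whose range contains it. The
``chunk_for`` and ``count_per_chunk`` helpers and the boundary values are
hypothetical. They only model range-based routing and are not part of any
MongoDB API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: chunk i covers [lower_bounds[i], lower_bounds[i + 1]),
// and the last chunk extends to the maximum key value.
// lower_bounds must be sorted and must start at the minimum key.
std::size_t chunk_for(const std::vector<std::uint64_t>& lower_bounds,
                      std::uint64_t key) {
    assert(!lower_bounds.empty() && key >= lower_bounds.front());
    // upper_bound finds the first bound greater than key; the chunk that
    // contains key is the one just before it.
    auto it = std::upper_bound(lower_bounds.begin(), lower_bounds.end(), key);
    return static_cast<std::size_t>(it - lower_bounds.begin()) - 1;
}

// Counts how many of the keys start, start + 1, ..., start + n - 1
// land in each chunk.
std::vector<std::size_t> count_per_chunk(
        const std::vector<std::uint64_t>& lower_bounds,
        std::uint64_t start, std::size_t n) {
    std::vector<std::size_t> counts(lower_bounds.size(), 0);
    for (std::size_t i = 0; i < n; ++i) {
        ++counts[chunk_for(lower_bounds, start + i)];
    }
    return counts;
}
```

With bounds such as ``{0, 100, 200, 300}`` and keys that start above the
highest bound, every key in this model lands in the last chunk, which
mirrors the behavior described above.
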
To avoid sending more writes than a shard can process, use a shard key
that does not increase in value. For example, in some cases you can
reverse all the bits of the shard key, which preserves the information
while avoiding the increasing sequence of values.

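A minimal sketch of the bit-reversal idea, assuming the shard key is a
64-bit unsigned integer such as a counter. The ``reverse_bits`` name is
hypothetical and the function is plain C++, not a driver call.

```cpp
#include <cstdint>

// Reverses the bit order of a 64-bit value: bit 0 becomes bit 63,
// bit 1 becomes bit 62, and so on.
std::uint64_t reverse_bits(std::uint64_t v) {
    std::uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        r = (r << 1) | (v & 1);  // append the lowest bit of v to r
        v >>= 1;                 // move on to the next bit of v
    }
    return r;
}
```

Because the lowest bits of a counter change fastest, reversing them into
the highest positions spreads consecutive values across the key range.
Applying ``reverse_bits`` twice returns the original value, so no
information is lost.
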
:term:`BSON` :term:`ObjectIds <ObjectId>` increase in value with each
insert. To distribute inserts more evenly, you can reverse the bits of
each ObjectId at generation time, or swap its first and last 16-bit
words, to "shuffle" the inserts. Alternatively, you can use UUIDs
instead, but check that your UUID generator does not produce
consistently increasing UUIDs, which would cause the same behavior.

.. example:: The following example, in C++, swaps the leading and
   trailing 16-bit words of generated ObjectIds so that they are no
   longer monotonically increasing.

   .. code-block:: cpp

      #include <algorithm>  // std::swap
      // Also requires the MongoDB C++ driver headers that declare OID,
      // BSONObj, and the BSON macro.

      using namespace mongo;

      OID make_an_id() {
        OID x = OID::gen();
        // getData() returns a pointer to const; cast the const away so
        // that the bytes of this local copy can be rearranged.
        unsigned char *p = const_cast<unsigned char *>( x.getData() );
        // swap the first 16-bit word (bytes 0-1) with the last (bytes 10-11)
        std::swap( p[0], p[10] );
        std::swap( p[1], p[11] );
        return x;
      }

      void foo() {
        // create an object
        BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
        // now we might insert o into a sharded collection...
      }

.. index:: balancing; operations
.. _sharding-balancing-operations:
