DOCS-293 migrate Bulk Inserts page

Bob Grabar · Bob Grabar · commit 81a81c8fad68 · 2012-10-29T09:36:09.000-04:00
diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt
@@ -767,6 +767,72 @@ to pre-splitting.
 
    .. todo:: insert link to killing a cursor.
 
+.. index:: bulk insert
+.. _sharding-bulk-inserts:
+
+Bulk Inserts and Sharding
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. todo link the words "bulk insert" to the bulk insert topic when it's
+   published
+
+When performing a bulk insert into a :term:`sharded collection`, consider
+the following:
+
+- If the collection is empty, it has only one initial chunk, which
+  resides on a single shard. MongoDB must then take time to receive
+  data, create splits, and distribute the split chunks to the available
+  shards. To avoid this performance cost, you can pre-split the
+  collection, as described in
+  :ref:`sharding-administration-pre-splitting`.
+
+- You can parallelize the import by sending inserts to multiple
+  :program:`mongos` instances. If the collection is empty, pre-split it
+  first, as described in :ref:`sharding-administration-pre-splitting`.
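+
+As an illustration of a parallel import, the following sketch splits a
+batch of documents into one sub-batch per :program:`mongos` instance in
+round-robin order. The function ``partition_round_robin`` is
+hypothetical, and the sketch assumes that each sub-batch is then inserted
+through its own connection to the corresponding :program:`mongos` (for
+example, from its own thread) using your driver's insert call, which is
+not shown.
+
+.. code-block:: cpp
+
+   #include <cassert>
+   #include <cstddef>
+   #include <vector>
+
+   // Hypothetical helper: assign documents to mongos instances in
+   // round-robin order. batches[i] holds the documents that the client
+   // connected to the i-th mongos should insert.
+   template <typename Doc>
+   std::vector<std::vector<Doc>> partition_round_robin(
+       const std::vector<Doc>& docs, std::size_t num_mongos) {
+     assert(num_mongos > 0);  // need at least one mongos to send to
+     std::vector<std::vector<Doc>> batches(num_mongos);
+     for (std::size_t i = 0; i < docs.size(); ++i) {
+       batches[i % num_mongos].push_back(docs[i]);
+     }
+     return batches;
+   }
+
+Round-robin assignment keeps the sub-batches within one document of each
+other in size, so no single :program:`mongos` receives a
+disproportionate share of the import.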
+
+Monotonically Increasing Shard Key Values
+`````````````````````````````````````````
+
+If the shard key increases monotonically during an insert, all inserted
+data goes to the last chunk in the collection, and therefore to a single
+shard. MongoDB will split and migrate chunks to keep the data balanced,
+but at any given time all writes still go to one shard, so the insert
+capacity of the cluster cannot exceed the insert capacity of that shard.
+This matters only when the insert rate is very large, that is, when the
+insert volume exceeds what a single shard can process at a given point
+in time. If the insert volume is within the range that one shard can
+process, increasing values are fine.
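+
+To see why all such inserts reach one chunk, the following sketch models
+chunks as contiguous ranges of a 64-bit integer shard key, identified by
+their sorted inclusive lower bounds, and counts the chunk each key falls
+into. This is a simplified model under stated assumptions, not MongoDB's
+routing code; the functions ``chunk_for_key`` and ``inserts_per_chunk``
+are hypothetical. Because the last chunk has no upper bound, every key
+larger than the existing data, such as a timestamp or a counter, lands
+in that chunk.
+
+.. code-block:: cpp
+
+   #include <algorithm>
+   #include <cassert>
+   #include <cstddef>
+   #include <cstdint>
+   #include <vector>
+
+   // Simplified model: return the index of the chunk that owns `key`.
+   // `lower_bounds` holds each chunk's inclusive lower bound in ascending
+   // order; the last chunk has no upper bound.
+   std::size_t chunk_for_key(const std::vector<std::int64_t>& lower_bounds,
+                             std::int64_t key) {
+     assert(!lower_bounds.empty() && key >= lower_bounds.front());
+     // upper_bound finds the first lower bound greater than `key`; the
+     // owning chunk is the one just before it.
+     auto it = std::upper_bound(lower_bounds.begin(), lower_bounds.end(), key);
+     return static_cast<std::size_t>(it - lower_bounds.begin()) - 1;
+   }
+
+   // Count how many of `keys` fall into each chunk.
+   std::vector<std::size_t> inserts_per_chunk(
+       const std::vector<std::int64_t>& lower_bounds,
+       const std::vector<std::int64_t>& keys) {
+     std::vector<std::size_t> counts(lower_bounds.size(), 0);
+     for (std::int64_t k : keys) {
+       ++counts[chunk_for_key(lower_bounds, k)];
+     }
+     return counts;
+   }
+
+For example, with lower bounds ``{-1000000, 0, 100, 200}``, the
+increasing keys 1000 through 1999 all fall into the last chunk, so
+``inserts_per_chunk`` returns ``{0, 0, 0, 1000}``.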
+
+To avoid sending more writes than a single shard can process, use a
+shard key whose values do not increase monotonically. For example, in
+some cases you could reverse all the bits of the shard key, which
+preserves the information in the key while avoiding an increasing
+sequence of values.
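+
+The following sketch reverses the bits of a shard key value, assuming
+the key is an unsigned 64-bit integer; the function ``reverse_bits`` is
+hypothetical. Reversal is its own inverse, so the original value can
+always be recovered. The low-order bits, which change on every
+increment, become the high-order bits, so consecutive inputs map to
+widely separated outputs.
+
+.. code-block:: cpp
+
+   #include <cstdint>
+
+   // Reverse the order of the 64 bits in `x`: bit 0 becomes bit 63,
+   // bit 1 becomes bit 62, and so on.
+   std::uint64_t reverse_bits(std::uint64_t x) {
+     std::uint64_t r = 0;
+     for (int i = 0; i < 64; ++i) {
+       r = (r << 1) | (x & 1u);  // append the current lowest bit of x
+       x >>= 1;
+     }
+     return r;
+   }
+
+For example, ``reverse_bits(1)`` is ``0x8000000000000000`` and
+``reverse_bits(2)`` is ``0x4000000000000000``, so the sequence 1, 2 no
+longer increases. Keep in mind that reversed keys do not preserve the
+order of the original values, so a range query on the original values no
+longer maps to a single contiguous range of shard key values.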
+
+:term:`BSON` :term:`ObjectIds <ObjectId>` increase in value with each
+insert, because their leading bytes are a timestamp. To distribute
+inserts more evenly, you might reverse the bits of each ObjectId, or
+swap its first and last 16-bit words, when you generate it, which
+"shuffles" the inserts. Alternatively, you might use UUIDs instead, but
+check that your UUID generator does not produce monotonically increasing
+UUIDs, which would cause the same behavior.
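+
+One way to check a UUID generator is to generate a sample of ids in
+sequence and test whether each one sorts after the one before it. The
+following sketch assumes the UUIDs are stored as 16-byte binary values
+that sort byte by byte; the type alias ``Uuid`` and the function
+``strictly_increasing`` are hypothetical. A result of ``true`` for a
+large sample suggests that the generator produces increasing values.
+
+.. code-block:: cpp
+
+   #include <array>
+   #include <cstddef>
+   #include <cstdint>
+   #include <vector>
+
+   // Assumed representation: a UUID as 16 raw bytes. std::array's
+   // operator< compares lexicographically, byte by byte.
+   using Uuid = std::array<std::uint8_t, 16>;
+
+   // Return true if every id in `sample` is strictly greater than the
+   // id generated before it.
+   bool strictly_increasing(const std::vector<Uuid>& sample) {
+     for (std::size_t i = 1; i < sample.size(); ++i) {
+       if (!(sample[i - 1] < sample[i])) {
+         return false;
+       }
+     }
+     return true;
+   }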
+
+.. example:: The following example, in C++, swaps the leading and
+   trailing 16-bit words of each generated ObjectId so that the ids are
+   no longer monotonically increasing.
+
+   .. code-block:: cpp
+
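+      #include <algorithm>  // std::swap_ranges
+      #include "mongo/client/dbclient.h"  // legacy C++ driver; path may vary by version
+
+      using namespace mongo;
+
+      // Generate an ObjectId, then swap its first and last 16-bit words
+      // (bytes 0-1 and 10-11 of the 12-byte id).
+      OID make_an_id() {
+        OID x = OID::gen();
+        // getData() returns a const pointer into x's own buffer; x itself
+        // is not const, so casting away const to modify the bytes is safe.
+        unsigned char *p = const_cast<unsigned char *>( x.getData() );
+        // Swap byte by byte to avoid reinterpreting the bytes as shorts.
+        std::swap_ranges( p, p + 2, p + 10 );
+        return x;
+      }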
+
+      void foo() {
+        // create an object
+        BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
+        // now we might insert o into a sharded collection...
+      }
+
 .. index:: balancing; operations
 .. _sharding-balancing-operations: