.. todo:: insert link to killing a cursor.

.. index:: bulk insert
.. _sharding-bulk-inserts:

Bulk Inserts and Sharding
~~~~~~~~~~~~~~~~~~~~~~~~~

.. todo:: link the words "bulk insert" to the bulk insert topic when it
   is published.

When performing a bulk insert into a :term:`sharded collection`, consider
the following:

- If the collection is empty, MongoDB has no information about the key
  distribution: the collection starts as a single chunk on one shard,
  and MongoDB must split and migrate chunks as the data arrives before
  the inserts are spread across the cluster. To avoid this performance
  cost, pre-split the collection, as described in
  :ref:`sharding-administration-pre-splitting`.

- To parallelize the import, send inserts to multiple :program:`mongos`
  instances. If the collection is empty, pre-split it first, as described
  in :ref:`sharding-administration-pre-splitting`.
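
Pre-splitting requires choosing split points in advance. As a minimal
sketch, assuming an integer shard key whose values are roughly uniform
over a range known ahead of time, the hypothetical helper
``even_split_points`` computes evenly spaced split points. You would then
create the splits as described in
:ref:`sharding-administration-pre-splitting`.

.. code-block:: cpp

   #include <cstdint>
   #include <stdexcept>
   #include <vector>

   // Hypothetical helper: returns num_chunks - 1 evenly spaced split points
   // for integer shard key values in [min_key, max_key). Assumes that
   // max_key - min_key fits in an int64_t.
   std::vector<std::int64_t> even_split_points(std::int64_t min_key,
                                               std::int64_t max_key,
                                               int num_chunks) {
       if (num_chunks < 1 || max_key <= min_key) {
           throw std::invalid_argument("need num_chunks >= 1 and max_key > min_key");
       }
       const std::int64_t width = max_key - min_key;
       // Split width into quotient and remainder so that the arithmetic
       // below cannot overflow, while matching floor(width * i / num_chunks).
       const std::int64_t q = width / num_chunks;
       const std::int64_t r = width % num_chunks;
       std::vector<std::int64_t> points;
       for (int i = 1; i < num_chunks; ++i) {
           points.push_back(min_key + q * i + r * i / num_chunks);
       }
       return points;
   }

For example, ``even_split_points(0, 100, 4)`` returns ``{25, 50, 75}``,
which divides the range into four chunks of equal width. If the real key
distribution is skewed, evenly spaced points will produce uneven chunks.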
790
+
791
+ Monotonically Increasing Shard Key Values
792
+ `````````````````````````````````````````
793
+
794
+ If your shard key monotonically increases during an insert then all the
795
+ inserts will go to the last chunk in the collection. The system will
796
+ adjust the metadata to keep balance, but at a given time ``t`` all
797
+ writes will be going to a single shard, which is undesirable if insert
798
+ rate is extremely large. A large insert is one in which the insert
799
+ volume is beyond the range that a single shard can process at a given
800
+ point in time. Increasing values are fine if the insert volume is within
801
+ the range the shard can process.
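
The following toy model illustrates why. It is a simplified sketch, not
MongoDB's implementation: it assumes a fixed, sorted list of split
points, where each chunk covers the keys from one split point
(inclusive) up to the next (exclusive), and the last chunk has no upper
bound. The hypothetical ``chunk_for`` and ``chunk_histogram`` functions
route each key to a chunk and count the inserts per chunk.

.. code-block:: cpp

   #include <algorithm>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Toy model, not MongoDB's implementation. With sorted split points
   // s[0] < s[1] < ... < s[n-1], chunk 0 covers keys below s[0], chunk i
   // covers [s[i-1], s[i]), and chunk n covers every key >= s[n-1].
   std::size_t chunk_for(const std::vector<std::int64_t>& splits,
                         std::int64_t key) {
       // upper_bound finds the first split point greater than key; its index
       // equals the number of split points <= key, which is the chunk index.
       return static_cast<std::size_t>(
           std::upper_bound(splits.begin(), splits.end(), key) - splits.begin());
   }

   // Count how many of the given keys are routed to each chunk.
   std::vector<int> chunk_histogram(const std::vector<std::int64_t>& splits,
                                    const std::vector<std::int64_t>& keys) {
       std::vector<int> counts(splits.size() + 1, 0);
       for (std::int64_t key : keys) {
           ++counts[chunk_for(splits, key)];
       }
       return counts;
   }

With split points ``{250, 500, 750}`` and new keys ``1000`` through
``1099``, ``chunk_histogram`` returns ``{0, 0, 0, 100}``: every insert
lands in the last chunk, and therefore on whichever shard holds it.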

To avoid sending more writes than a single shard can process, use a
shard key whose values do not increase monotonically. For example, in
some cases you can reverse all the bits of the shard key value, which
preserves all of its information while avoiding an increasing sequence
of values.
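
As a minimal sketch, assuming the shard key is a 64-bit unsigned integer,
the hypothetical ``reverse_bits`` function maps bit 0 to bit 63, bit 1
to bit 62, and so on. Applying it twice returns the original value, so
no information is lost.

.. code-block:: cpp

   #include <cstdint>

   // Reverse the order of the 64 bits in v: bit 0 becomes bit 63, bit 1
   // becomes bit 62, and so on. The mapping is its own inverse.
   std::uint64_t reverse_bits(std::uint64_t v) {
       std::uint64_t r = 0;
       for (int i = 0; i < 64; ++i) {
           // Shift the result left and append the lowest remaining bit of v.
           r = (r << 1) | (v & 1u);
           v >>= 1;
       }
       return r;
   }

Consecutive keys ``1``, ``2``, and ``3`` map to ``0x8000000000000000``,
``0x4000000000000000``, and ``0xC000000000000000``, so successive inserts
no longer cluster at the high end of the key range. If you store the
result as a signed 64-bit BSON integer, values with the top bit set
become negative; the values are still spread across the key range.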

:term:`BSON` :term:`ObjectIds <ObjectId>` increase in value with each
insert, so a shard key on ObjectId values has the same problem. To
distribute inserts more evenly, you can modify each ObjectId at
generation time, for example by reversing its bits or by swapping its
first and last 16-bit words, to "shuffle" the inserts. Alternatively,
you can use UUIDs instead, but check that your UUID generator does not
produce monotonically increasing UUIDs, which would cause the same
behavior.
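
One way to perform that check is to draw a sample of UUIDs from your
generator and test whether they arrive in strictly increasing order. The
sketch below assumes UUIDs stored as 16-byte BSON binary values, which
MongoDB compares byte by byte when their length and subtype match; the
``Uuid`` alias and ``strictly_increasing`` function are hypothetical. A
generator that produces random UUIDs will almost never yield a long
strictly increasing run, so a ``true`` result on a large sample suggests
that the generator is time-ordered.

.. code-block:: cpp

   #include <array>
   #include <cstddef>
   #include <vector>

   // Hypothetical alias: a UUID as its 16 raw bytes.
   using Uuid = std::array<unsigned char, 16>;

   // Returns true if each UUID in the sample is strictly greater than the
   // previous one when compared byte by byte.
   bool strictly_increasing(const std::vector<Uuid>& sample) {
       for (std::size_t i = 1; i < sample.size(); ++i) {
           // std::array's operator< performs a lexicographic comparison,
           // and unsigned char elements compare as unsigned bytes.
           if (!(sample[i - 1] < sample[i])) {
               return false;
           }
       }
       return true;
   }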
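
.. example:: The following example, in C++, swaps the leading and
   trailing 16-bit words of each generated ObjectId so that the IDs are
   no longer monotonically increasing.

   .. code-block:: cpp

      #include <utility>                  // std::swap
      #include "mongo/client/dbclient.h"  // legacy C++ driver; path may vary

      using namespace mongo;

      // Generate an ObjectId, then swap its first and last 16-bit words
      // (bytes 0-1 and bytes 10-11).
      OID make_an_id() {
          OID x = OID::gen();
          // x is not const, so casting away the const of getData() to
          // modify its bytes in place is well defined.
          unsigned char *p = const_cast<unsigned char *>(x.getData());
          // Swap byte by byte rather than through an unsigned short&,
          // which would violate strict aliasing rules.
          std::swap(p[0], p[10]);
          std::swap(p[1], p[11]);
          return x;
      }

      void foo() {
          // Create a document whose _id is the shuffled ObjectId.
          BSONObj o = BSON("_id" << make_an_id() << "x" << 3 << "name" << "jane");
          // Now we might insert o into a sharded collection...
      }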

.. index:: balancing; operations
.. _sharding-balancing-operations: