| 1 | +.. _clustered-collections: |
| 2 | + |
| 3 | +===================== |
| 4 | +Clustered Collections |
| 5 | +===================== |
| 6 | + |
| 7 | +.. default-domain:: mongodb |
| 8 | + |
| 9 | +.. contents:: On this page |
| 10 | + :local: |
| 11 | + :backlinks: none |
| 12 | + :depth: 1 |
| 13 | + :class: singlecol |
| 14 | + |
| 15 | +.. versionadded:: 5.3 |
| 16 | + |
| 17 | +Overview |
| 18 | +-------- |
| 19 | + |
| 20 | +.. include:: /includes/clustered-collections-introduction.rst |
| 21 | + |
| 22 | +Benefits |
| 23 | +-------- |
| 24 | + |
| 25 | +Because clustered collections store documents ordered by the |
| 26 | +:ref:`clustered index <db.createCollection.clusteredIndex>` key value, |
| 27 | +clustered collections have the following benefits compared to |
| 28 | +non-clustered collections: |
| 29 | + |
| 30 | +- Faster queries on clustered collections without needing a secondary |
| 31 | + index, such as queries with range scans and equality comparisons on |
| 32 | + the clustered index key. |
| 33 | + |
| 34 | +- Clustered collections have a smaller storage size, which improves |
| 35 | + performance for queries and bulk inserts. |
| 36 | + |
| 37 | +- Clustered collections can eliminate the need for a secondary :ref:`TTL |
| 38 | + (Time To Live) index <ttl-index>`. |
| 39 | + |
| 40 | + - A clustered index is also a TTL index if you specify the |
| 41 | + :ref:`expireAfterSeconds <db.createCollection.expireAfterSeconds>` |
| 42 | + field. |
| 43 | + |
| 44 | + - To be used as a TTL index, the ``_id`` field must be a supported |
| 45 | + date type. See :ref:`index-feature-ttl`. |
| 46 | + |
| 47 | + - If you use a clustered index as a TTL index, it improves document |
| 48 | + delete performance and reduces the clustered collection storage |
| 49 | + size. |
| 50 | + |
| 51 | +- Clustered collections have additional performance improvements for |
| 52 | + inserts, updates, deletes, and queries. |
| 53 | + |
| 54 | + - All collections have an :ref:`_id index <index-type-id>`. |
| 55 | + |
| 56 | + - A non-clustered collection stores the ``_id`` index separately from |
| 57 | + the documents. This requires two writes for inserts, updates, and |
| 58 | + deletes, and two reads for queries. |
| 59 | + |
| 60 | + - A clustered collection stores the index and the documents together |
| 61 | + in ``_id`` value order. This requires one write for inserts, |
| 62 | + updates, and deletes, and one read for queries. |
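The one-write/one-read comparison above can be made concrete with a small in-memory model. This plain JavaScript sketch models both layouts with maps. It is a simplified illustration and not how the WiredTiger storage engine works. The ``NonClusteredStore`` and ``ClusteredStore`` classes and their counters are hypothetical, and each map access stands in for one storage read or write.

```javascript
// Simplified model of the two layouts described above. Each Map access
// stands in for one storage-engine read or write; real engines differ.

class NonClusteredStore {
  constructor() {
    this.records = new Map(); // recordId -> document
    this.idIndex = new Map(); // _id -> recordId (a separate structure)
    this.nextRecordId = 1;
    this.writes = 0;
    this.reads = 0;
  }

  insert(doc) {
    const recordId = this.nextRecordId++;
    this.records.set(recordId, doc); // write 1: the document
    this.idIndex.set(doc._id, recordId); // write 2: the _id index entry
    this.writes += 2;
  }

  findById(id) {
    this.reads += 1; // read 1: look up the record ID in the _id index
    const recordId = this.idIndex.get(id);
    if (recordId === undefined) return null;
    this.reads += 1; // read 2: fetch the document itself
    return this.records.get(recordId);
  }
}

class ClusteredStore {
  constructor() {
    this.records = new Map(); // _id -> document (key and data together)
    this.writes = 0;
    this.reads = 0;
  }

  insert(doc) {
    this.records.set(doc._id, doc); // one write stores key and document
    this.writes += 1;
  }

  findById(id) {
    this.reads += 1; // one read returns the document
    return this.records.get(id) ?? null;
  }
}

// Insert the same documents into both layouts, then look one up.
function compareLayouts(docs, lookupId) {
  const plain = new NonClusteredStore();
  const clustered = new ClusteredStore();
  for (const doc of docs) {
    plain.insert(doc);
    clustered.insert(doc);
  }
  plain.findById(lookupId);
  clustered.findById(lookupId);
  return {
    nonClustered: { writes: plain.writes, reads: plain.reads },
    clustered: { writes: clustered.writes, reads: clustered.reads },
  };
}

console.log(compareLayouts([{ _id: 1 }, { _id: 2 }, { _id: 3 }], 2));
```

For three inserts and one lookup, the model counts six writes and two reads for the non-clustered layout. It counts three writes and one read for the clustered layout.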
| 63 | + |
| 64 | +Behavior |
| 65 | +-------- |
| 66 | + |
| 67 | +Clustered collections store documents ordered by the :ref:`clustered |
| 68 | +index <db.createCollection.clusteredIndex>` key value. |
| 69 | + |
| 70 | +You can only have one clustered index in a collection because the |
| 71 | +documents can be stored in only one order. Only collections with a |
| 72 | +clustered index store the data in sorted order. |
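A clustered collection keeps a single physical order, so every insert must find its place by ``_id``. The following sketch models that idea with a sorted JavaScript array and binary search. The ``SortedCollection`` class is hypothetical and ignores B-tree pages. It only illustrates why there is exactly one sort order and why an increasing key always lands at the end.

```javascript
// Hypothetical model: documents kept in one array sorted by _id.
// Keys are numbers here; Date keys could be compared with getTime().

class SortedCollection {
  constructor() {
    this.docs = [];
  }

  // Binary search for the first position whose _id is >= key.
  lowerBound(key) {
    let lo = 0;
    let hi = this.docs.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.docs[mid]._id < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Insert a document at its _id position and return that position.
  insert(doc) {
    const pos = this.lowerBound(doc._id);
    if (pos < this.docs.length && this.docs[pos]._id === doc._id) {
      throw new Error(`duplicate _id: ${doc._id}`);
    }
    this.docs.splice(pos, 0, doc);
    return pos;
  }

  ids() {
    return this.docs.map((d) => d._id);
  }
}

// Out-of-order inserts still produce _id order, and an increasing key
// is appended at the end of the array.
const coll = new SortedCollection();
for (const id of [30, 10, 20]) coll.insert({ _id: id });
console.log(coll.ids()); // [ 10, 20, 30 ]
console.log(coll.insert({ _id: 40 })); // 3
```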
| 73 | + |
| 74 | +A clustered collection can also have :term:`secondary indexes |
| 75 | +<secondary index>`. Clustered indexes differ from secondary indexes |
| 76 | +in the following ways: |
| 77 | + |
| 78 | +- A clustered index can only be created when you create the collection. |
| 79 | + |
| 80 | +- The clustered index keys are stored with the collection. The |
| 81 | + collection size returned by the :dbcommand:`collStats` command |
| 82 | + includes the clustered index size. |
| 83 | + |
| 84 | +Limitations |
| 85 | +----------- |
| 86 | + |
| 87 | +Clustered collection limitations: |
| 88 | + |
| 89 | +- You cannot transform a non-clustered collection to a clustered |
| 90 | + collection, or the reverse. Instead, you can: |
| 91 | + |
| 92 | + - Read documents from one collection and write them to another |
| 93 | + collection using an :ref:`aggregation pipeline |
| 94 | + <aggregation-pipeline-intro>` with an :pipeline:`$out` stage or |
| 95 | + a :pipeline:`$merge` stage. |
| 96 | + |
| 97 | + - Export collection data with :binary:`~bin.mongodump` and import the |
| 98 | + data into another collection with :binary:`~bin.mongorestore`. |
| 99 | + |
| 100 | +- By default, if a usable :term:`secondary index <secondary index>` |
| 101 | + exists on a clustered collection, the :doc:`query optimizer |
| 102 | + </core/query-plans>` selects the secondary index instead of the |
| 103 | + :ref:`clustered index <db.createCollection.clusteredIndex>`. |
| 104 | + |
| 105 | + - To use the clustered index when a usable secondary index exists, |
| 106 | + you must provide a hint. |
| 107 | + |
| 108 | + - When a query uses the clustered index, it performs a |
| 109 | + :term:`bounded collection scan`. |
| 115 | + |
| 116 | +- The clustered index key must be on the ``_id`` field. |
| 117 | + |
| 118 | +- You cannot hide a clustered index. See :doc:`Hidden indexes |
| 119 | + </core/index-hidden>`. |
| 120 | + |
| 121 | +- If a clustered collection has secondary indexes and a large |
| 122 | + clustered index key, those secondary indexes may use more storage |
| 123 | + than secondary indexes on a non-clustered collection, which |
| 124 | + increases the total storage size of the collection. |
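Rough arithmetic makes the storage limitation easier to see. This sketch rests on simplifying assumptions. Each secondary index entry stores the indexed value plus a reference to its document. That reference is an 8-byte record ID in a non-clustered collection and the clustered index key itself in a clustered collection. Compression and page overhead are ignored, so read the numbers as relative sizes, not on-disk sizes. The function name and byte counts are hypothetical.

```javascript
// Back-of-envelope model of secondary index size. Assumptions (not
// documented MongoDB internals): entry = indexed value + document
// reference; the reference is 8 bytes in a non-clustered collection and
// the full clustered key in a clustered collection; no compression.

const RECORD_ID_BYTES = 8; // assumed size of a non-clustered record ID

function estimateSecondaryIndexBytes(docCount, indexedValueBytes, clusteredKeyBytes) {
  return {
    nonClustered: docCount * (indexedValueBytes + RECORD_ID_BYTES),
    clustered: docCount * (indexedValueBytes + clusteredKeyBytes),
  };
}

// One million documents, a 10-byte secondary key, and either a small
// (8-byte) or a large (200-byte) clustered index key.
const small = estimateSecondaryIndexBytes(1_000_000, 10, 8);
const large = estimateSecondaryIndexBytes(1_000_000, 10, 200);
console.log(small); // { nonClustered: 18000000, clustered: 18000000 }
console.log(large); // { nonClustered: 18000000, clustered: 210000000 }
```

Under these assumptions, a small clustered key keeps each secondary index close to its non-clustered size. A 200-byte key makes the same index more than ten times larger.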
| 126 | + |
| 127 | +.. _clustered-collections-clustered-index-key-values: |
| 128 | + |
| 129 | +Set Your Own Clustered Index Key Values |
| 130 | +--------------------------------------- |
| 131 | + |
| 132 | +By default, the :ref:`clustered index |
| 133 | +<db.createCollection.clusteredIndex>` key values are the unique document |
| 134 | +:ref:`object identifiers <objectid>`. |
| 135 | + |
| 136 | +You can set your own clustered index key values. Your key: |
| 137 | + |
| 138 | +- Must contain unique values. |
| 139 | + |
| 140 | +- Must be immutable. |
| 141 | + |
| 142 | +- Should contain sequentially increasing values. This is not a |
| 143 | + requirement but improves insert performance. |
| 144 | + |
| 145 | +- Should be as small in size as possible. |
| 146 | + |
| 147 | + - A clustered index supports keys up to 8 MB in size, but a much |
| 148 | + smaller clustered index key is best. |
| 149 | + |
| 150 | + - A large clustered index key increases the size of the clustered |
| 151 | + collection and of its secondary indexes. This reduces the |
| 152 | + performance and storage benefits of the clustered collection. |
| 153 | + |
| 154 | + - Secondary indexes on clustered collections with large clustered |
| 155 | + index keys may use more space compared to secondary indexes on |
| 156 | + non-clustered collections. |
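Before you choose your own clustered index key, you can check a sample of candidate values against the guidelines above. This sketch defines a hypothetical helper, ``checkClusteredKeyCandidates``. It reports duplicate values, which are not allowed, and out-of-order values, which are allowed but slow down inserts. The helper assumes each batch uses a single key type: numbers, strings, or ``Date`` objects. It does not check immutability or the 8 MB limit, because those depend on your application and on BSON encoding.

```javascript
// Hypothetical helper: checks a batch of candidate clustered index keys
// for duplicates (not allowed) and for values that break ascending
// order (allowed, but slower to insert). Assumes one key type per batch.

function toComparable(key) {
  // Compare Date keys by their millisecond timestamp.
  return key instanceof Date ? key.getTime() : key;
}

function checkClusteredKeyCandidates(keys) {
  const seen = new Set();
  const duplicates = [];
  const outOfOrder = [];

  keys.forEach((key, i) => {
    const value = toComparable(key);
    if (seen.has(value)) duplicates.push(key);
    seen.add(value);
    if (i > 0 && value < toComparable(keys[i - 1])) outOfOrder.push(key);
  });

  return {
    unique: duplicates.length === 0, // required
    ascending: outOfOrder.length === 0, // recommended, not required
    duplicates,
    outOfOrder,
  };
}

// Order dates similar to the examples on this page, one out of order.
const report = checkClusteredKeyCandidates([
  new Date("2022-03-18T12:45:20Z"),
  new Date("2022-03-18T12:50:00Z"),
  new Date("2022-03-18T12:47:00Z"),
]);
console.log(report.unique, report.ascending); // true false
```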
| 157 | + |
| 158 | +Examples |
| 159 | +-------- |
| 160 | + |
| 161 | +This section shows clustered collection examples. |
| 162 | + |
| 163 | +``Create`` Example |
| 164 | +~~~~~~~~~~~~~~~~~~ |
| 165 | + |
| 166 | +.. include:: /includes/create-clustered-collection-example.rst |
| 167 | + |
| 168 | +``db.createCollection`` Example |
| 169 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 170 | + |
| 171 | +.. include:: /includes/db-create-clustered-collection-example.rst |
| 172 | + |
| 173 | +Date Clustered Index Key Example |
| 174 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 175 | + |
| 176 | +The following :method:`db.createCollection()` example creates a |
| 177 | +clustered collection named ``orders``: |
| 178 | + |
| 179 | +.. code-block:: javascript |
| 180 | + |
| 181 | + db.createCollection( |
| 182 | + "orders", |
| 183 | + { clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "orders clustered key" } } |
| 184 | + ) |
| 185 | + |
| 186 | +In the example, :ref:`clusteredIndex |
| 187 | +<db.createCollection.clusteredIndex>` specifies: |
| 188 | + |
| 189 | +.. |clustered-index-name| replace:: ``"name": "orders clustered key"`` |
| 190 | + |
| 191 | +.. include:: /includes/clustered-index-example-fields.rst |
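If your application creates clustered collections from code, you can build the ``clusteredIndex`` options in one place. This sketch uses a hypothetical helper, ``buildClusteredCollectionOptions``, that returns the same options object as the ``db.createCollection()`` call above. It always sets the key to ``{ _id: 1 }`` and ``unique`` to ``true``. It can also add the top-level ``expireAfterSeconds`` option, which makes the clustered index a TTL index. The non-negative integer check on ``expireAfterSeconds`` is the helper's own assumption. The helper builds a plain object only and does not contact a server.

```javascript
// Hypothetical helper: builds the options document for
// db.createCollection() on a clustered collection. The key and unique
// values are fixed because the clustered index key must be on _id.

function buildClusteredCollectionOptions({ name, expireAfterSeconds } = {}) {
  const clusteredIndex = { key: { _id: 1 }, unique: true };
  if (name !== undefined) clusteredIndex.name = name;

  const options = { clusteredIndex };
  if (expireAfterSeconds !== undefined) {
    // Assumed validation rule for this sketch: a non-negative integer.
    if (!Number.isInteger(expireAfterSeconds) || expireAfterSeconds < 0) {
      throw new RangeError("expireAfterSeconds must be a non-negative integer");
    }
    // Top-level option that turns the clustered index into a TTL index.
    options.expireAfterSeconds = expireAfterSeconds;
  }
  return options;
}

const opts = buildClusteredCollectionOptions({ name: "orders clustered key" });
console.log(JSON.stringify(opts));
// {"clusteredIndex":{"key":{"_id":1},"unique":true,"name":"orders clustered key"}}
```

In ``mongosh``, you could pass the result as the second argument, for example ``db.createCollection( "orders", opts )``.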
| 192 | + |
| 193 | +The following example adds documents to the ``orders`` collection: |
| 194 | + |
| 195 | +.. code-block:: javascript |
| 196 | + |
| 197 | + db.orders.insertMany( [ |
| 198 | + { _id: ISODate( "2022-03-18T12:45:20Z" ), "quantity": 50, "totalOrderPrice": 500 }, |
| 199 | + { _id: ISODate( "2022-03-18T12:47:00Z" ), "quantity": 5, "totalOrderPrice": 50 }, |
| 200 | + { _id: ISODate( "2022-03-18T12:50:00Z" ), "quantity": 1, "totalOrderPrice": 10 } |
| 201 | + ] ) |
| 202 | + |
| 203 | +The ``_id`` :ref:`clusteredIndex <create.clusteredIndex>` key stores the |
| 204 | +order date. |
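The ``orders`` collection above has no TTL. If it were created with the ``expireAfterSeconds`` option, the clustered index would also act as a TTL index on the ``_id`` date. This sketch shows which of the three example orders would be eligible for deletion at a given time. It assumes a document becomes eligible once its ``_id`` date plus ``expireAfterSeconds`` is in the past. The ``expiredOrders`` function, the one-hour TTL, and the check time are hypothetical. The server's TTL monitor runs periodically, so actual deletion can happen later than the time this model reports.

```javascript
// Hypothetical model of TTL eligibility for a clustered collection whose
// _id values are dates. This is not the server's TTL monitor.

const orders = [
  { _id: new Date("2022-03-18T12:45:20Z"), quantity: 50, totalOrderPrice: 500 },
  { _id: new Date("2022-03-18T12:47:00Z"), quantity: 5, totalOrderPrice: 50 },
  { _id: new Date("2022-03-18T12:50:00Z"), quantity: 1, totalOrderPrice: 10 },
];

// A document is eligible once _id + expireAfterSeconds is before `now`.
function expiredOrders(docs, expireAfterSeconds, now) {
  const cutoffMs = now.getTime() - expireAfterSeconds * 1000;
  return docs.filter((doc) => doc._id.getTime() < cutoffMs);
}

// With a one-hour TTL, check at 13:48:00 UTC on the same day.
const expired = expiredOrders(orders, 3600, new Date("2022-03-18T13:48:00Z"));
console.log(expired.map((d) => d._id.toISOString()));
// [ '2022-03-18T12:45:20.000Z', '2022-03-18T12:47:00.000Z' ]
```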
| 205 | + |
| 206 | +Range queries on the ``_id`` field are faster because the documents |
| 207 | +are stored in ``_id`` order. For example, the following query uses |
| 208 | +``_id`` and :query:`$gt` to return the orders where the order date is |
| 209 | +greater than the supplied date: |
| 210 | + |
| 211 | +.. code-block:: javascript |
| 212 | + |
| 213 | + db.orders.find( { _id: { $gt: ISODate( "2022-03-18T12:47:00.000Z" ) } } ) |
| 214 | + |
| 215 | +Example output: |
| 216 | + |
| 217 | +.. code-block:: javascript |
| 218 | + :copyable: false |
| 219 | + |
| 220 | + [ |
| 221 | + { |
| 222 | + _id: ISODate( "2022-03-18T12:50:00.000Z" ), |
| 223 | + quantity: 1, |
| 224 | + totalOrderPrice: 10 |
| 225 | + } |
| 226 | + ] |
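Conceptually, a bounded collection scan on a clustered collection starts at the first document inside the ``_id`` range and reads forward only while documents stay in range. It does not read every document. This sketch models the idea over an in-memory array already sorted by ``_id``, using binary search to find the lower bound. The ``boundedScanGt`` function is hypothetical and mirrors only the ``$gt`` query above. It does not reflect the query engine's implementation. With only a lower bound, the scan continues to the end of the collection.

```javascript
// Hypothetical model of a bounded scan for { _id: { $gt: lower } } over
// documents stored in ascending _id order, as in a clustered collection.

const ordersById = [
  { _id: new Date("2022-03-18T12:45:20Z"), quantity: 50, totalOrderPrice: 500 },
  { _id: new Date("2022-03-18T12:47:00Z"), quantity: 5, totalOrderPrice: 50 },
  { _id: new Date("2022-03-18T12:50:00Z"), quantity: 1, totalOrderPrice: 10 },
];

function boundedScanGt(sortedDocs, lower) {
  const bound = lower.getTime();
  // Binary search for the first document whose _id is strictly greater.
  let lo = 0;
  let hi = sortedDocs.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedDocs[mid]._id.getTime() <= bound) lo = mid + 1;
    else hi = mid;
  }
  // Every document from `lo` onward matches; earlier ones are skipped.
  return { start: lo, docs: sortedDocs.slice(lo) };
}

const result = boundedScanGt(ordersById, new Date("2022-03-18T12:47:00.000Z"));
console.log(result.start, result.docs.length); // 2 1
```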
| 227 | + |
| 228 | +Determine if a Collection is Clustered |
| 229 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 230 | + |
| 231 | +To determine if a collection is clustered, use the |
| 232 | +:dbcommand:`listCollections` command: |
| 233 | + |
| 234 | +.. code-block:: javascript |
| 235 | + |
| 236 | + db.runCommand( { listCollections: 1 } ) |
| 237 | + |
| 238 | +For clustered collections, the output includes the :ref:`clusteredIndex |
| 239 | +<create.clusteredIndex>` details. For example, the following output |
| 240 | +shows the details for the ``orders`` clustered collection: |
| 242 | + |
| 243 | +.. code-block:: javascript |
| 244 | + :copyable: false |
| 245 | + |
| 246 | + ... |
| 247 | + name: 'orders', |
| 248 | + type: 'collection', |
| 249 | + options: { |
| 250 | + clusteredIndex: { |
| 251 | + v: 2, |
| 252 | + key: { _id: 1 }, |
| 253 | + name: 'orders clustered key', |
| 254 | + unique: true |
| 255 | + } |
| 256 | + }, |
| 257 | + ... |
| 258 | + |
| 259 | +``v`` is the index version. |
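To check many collections at once, you can filter the ``listCollections`` result for entries that have ``options.clusteredIndex``. This sketch runs in plain JavaScript against a sample object shaped like the command's response, where ``cursor.firstBatch`` holds one entry per collection. The sample values and the ``clusteredCollectionNames`` function are hypothetical. In ``mongosh``, you would pass the object returned by ``db.runCommand( { listCollections: 1 } )`` instead.

```javascript
// Hypothetical helper: returns the names of clustered collections from a
// listCollections command result.

function clusteredCollectionNames(listCollectionsResult) {
  const entries = listCollectionsResult?.cursor?.firstBatch ?? [];
  return entries
    .filter((entry) => Boolean(entry.options?.clusteredIndex))
    .map((entry) => entry.name);
}

// Sample result modeled on the output above; "customers" is a
// hypothetical non-clustered collection added for contrast.
const sampleResult = {
  cursor: {
    firstBatch: [
      {
        name: "orders",
        type: "collection",
        options: {
          clusteredIndex: { v: 2, key: { _id: 1 }, name: "orders clustered key", unique: true },
        },
      },
      { name: "customers", type: "collection", options: {} },
    ],
  },
  ok: 1,
};

console.log(clusteredCollectionNames(sampleResult)); // [ 'orders' ]
```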