Commit 4714c7c

DOCS-15340 Balancer Policy redux (#1990) (#2278)
* DOCS-15340 redux
* updates
* prep
* build errors
* tech feedback
* tech review feedback
* restores original SVG
* tech feedback
* internal review updates
1 parent 7ce0d1b · commit 4714c7c

32 files changed: +322 −450 lines changed

source/core/hashed-sharding.txt

Lines changed: 4 additions & 6 deletions

@@ -130,13 +130,11 @@ Shard a Populated Collection
 
 If you shard a populated collection using a hashed shard key:
 
-- The sharding operation creates the initial chunk(s) to cover the
-  entire range of the shard key values. The number of chunks created
-  depends on the :ref:`configured chunk size <sharding-chunk-size>`.
+- The sharding operation creates an initial chunk to cover all of the
+  shard key values.
 
-- After the initial chunk creation, the balancer migrates these initial
-  chunks across the shards as appropriate as well as manages the chunk
-  distribution going forward.
+- After the initial chunk creation, the balancer moves ranges of the
+  initial chunk when it needs to balance data.
 
 Shard an Empty Collection
 ~~~~~~~~~~~~~~~~~~~~~~~~~
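
For context, the rewritten bullets describe what happens when you shard an
existing collection with a hashed key. A minimal mongosh sketch (the
``test.events`` namespace and ``userId`` field are hypothetical):

    // Run against a mongos. Sharding a populated collection on a hashed
    // key creates a single initial chunk covering all shard key values;
    // the balancer later moves ranges of that chunk to balance data.
    sh.enableSharding("test")
    sh.shardCollection("test.events", { userId: "hashed" })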

source/core/ranged-sharding.txt

Lines changed: 3 additions & 9 deletions

@@ -64,15 +64,9 @@ to use as the :term:`shard key`.
 Shard a Populated Collection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-If you shard a populated collection:
-
-- The sharding operation creates the initial chunk(s) to cover the
-  entire range of the shard key values. The number of chunks created
-  depends on the :ref:`configured chunk size <sharding-chunk-size>`.
-
-- After the initial chunk creation, the balancer migrates these initial
-  chunks across the shards as appropriate as well as manages the chunk
-  distribution going forward.
+If you shard a populated collection, only one chunk is created
+initially. The balancer then migrates ranges from that chunk if
+necessary according to the configured range size.
 
 Shard an Empty Collection
 ~~~~~~~~~~~~~~~~~~~~~~~~~
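
The ranged case looks similar from mongosh. A minimal sketch (the
``test.orders`` namespace and ``orderId`` field are hypothetical), together
with the documented ``chunksize`` settings document that controls the
configured range size:

    // One chunk is created initially; the balancer migrates ranges
    // from it as needed, based on the configured range size.
    sh.shardCollection("test.orders", { orderId: 1 })

    // The configured range size is stored in MB in the config database.
    use config
    db.settings.updateOne(
       { _id: "chunksize" },
       { $set: { value: 128 } },
       { upsert: true }
    )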

source/core/sharded-cluster-config-servers.txt

Lines changed: 2 additions & 3 deletions

@@ -27,9 +27,8 @@ the ranges that define the chunks.
 The :binary:`~bin.mongos` instances cache this data and use it to route
 read and write operations to the correct shards. :binary:`~bin.mongos`
 updates the cache when there are metadata changes for the cluster, such
-as :ref:`sharding-chunk-splits` or :doc:`adding a
-shard</tutorial/add-shards-to-shard-cluster>`. Shards also read chunk
-metadata from the config servers.
+as :ref:`adding a shard <sharding-procedure-add-shard>`. Shards also read
+chunk metadata from the config servers.
 
 The config servers also store :doc:`authentication` configuration
 information such as :doc:`Role-Based Access

source/core/sharding-balancer-administration.txt

Lines changed: 86 additions & 127 deletions

@@ -13,15 +13,17 @@ Sharded Cluster Balancer
    :depth: 1
    :class: singlecol
 
-The MongoDB balancer is a background process that monitors the number of
-:term:`chunks <chunk>` on each :term:`shard`. When the number of chunks on a
-given shard reaches specific :ref:`migration thresholds
-<sharding-migration-thresholds>`, the balancer attempts to automatically
-migrate chunks between shards and reach an equal number of chunks per shard.
-
-The balancing procedure for :term:`sharded clusters <sharded cluster>` is
-entirely transparent to the user and application layer, though there may be
-some performance impact while the procedure takes place.
+The MongoDB balancer is a background process that monitors the amount of
+data on each :term:`shard` for each sharded collection. When the amount
+of data for a sharded collection on a given shard reaches specific
+:ref:`migration thresholds <sharding-migration-thresholds>`, the balancer
+attempts to automatically migrate data between shards and reach an even
+amount of data per shard while respecting the :ref:`zones
+<zone-sharding>`. By default, the balancer process is always enabled.
+
+The balancing procedure for :term:`sharded clusters <sharded cluster>`
+is entirely transparent to the user and application layer, though there
+may be some performance impact while the procedure takes place.
 
 .. include:: /images/sharding-migrating.rst
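
Since the new paragraph notes the balancer is always enabled by default,
its state can be confirmed from mongosh with the standard shell helpers:

    // true if the balancer is enabled for the cluster
    sh.getBalancerState()

    // reports whether a balancing round is currently in progress
    sh.isBalancerRunning()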
@@ -39,43 +41,28 @@ The balancer runs on the primary of the config server replica set
 .. _sharding-balancing-internals:
 .. _sharding-internals-balancing:
 
-Cluster Balancer
-----------------
-
-The :term:`balancer` process is responsible for redistributing the
-chunks of a sharded collection evenly among the shards for every
-sharded collection. By default, the balancer process is always enabled.
-
-To address uneven chunk distribution for a sharded collection, the
-balancer :doc:`migrates chunks </core/sharding-balancer-administration>` from
-shards with more chunks to shards with a fewer number of chunks. The
-balancer migrates the chunks until there is an even
-distribution of chunks for the collection across the shards. For details
-about chunk migration, see :ref:`chunk-migration-procedure`.
+Balancer Internals
+------------------
 
-.. include:: /includes/fact-archiveMovedChunks.rst
-
-Chunk migrations carry some overhead in terms of bandwidth and
-workload, both of which can impact database performance. [#auto-distribute]_ The
-:term:`balancer` attempts to minimize the impact by:
+Range migrations carry some overhead in terms of bandwidth and
+workload, both of which can impact database performance.
+The :term:`balancer` attempts to minimize the impact by:
 
 - Restricting a shard to at most one migration at any given time.
-  Specifically, a shard cannot participate in multiple chunk migrations
-  at the same time. To migrate multiple chunks from a shard, the
-  balancer migrates the chunks one at a time.
+  Specifically, a shard cannot participate in multiple data migrations
+  at the same time. The balancer migrates ranges one at a time.
 
-  MongoDB can perform parallel chunk migrations, but a shard can
+  MongoDB can perform parallel data migrations, but a shard can
   participate in at most one migration at a time. For a sharded cluster
   with *n* shards, MongoDB can perform at most *n/2* (rounded down)
-  simultaneous chunk migrations.
+  simultaneous migrations.
 
-  See also :ref:`chunk-migration-queuing`.
+  See also :ref:`range-migration-queuing`.
 
-- Starting a balancing round **only** when the difference in the
-  number of chunks between the shard with the greatest number of chunks
-  for a sharded collection and the shard with the lowest number of
-  chunks for that collection reaches the :ref:`migration threshold
-  <sharding-migration-thresholds>`.
+- Starting a balancing round **only** when the difference in the amount
+  of data between the shard with the most data for a sharded collection
+  and the shard with the least data for that collection reaches the
+  :ref:`migration threshold <sharding-migration-thresholds>`.
 
 You may disable the balancer temporarily for maintenance. See
 :ref:`sharding-balancing-disable-temporally` for details.
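
The temporary-disable workflow referenced at the end of this hunk is a
stop/start pair in mongosh; a minimal sketch:

    // Disable the balancer; waits for any in-progress round to finish.
    sh.stopBalancer()

    // ...perform maintenance...

    // Re-enable the balancer afterwards.
    sh.startBalancer()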
@@ -91,76 +78,69 @@ Window <sharding-schedule-balancing-window>` for details.
 
 .. seealso::
 
-   :doc:`/tutorial/manage-sharded-cluster-balancer`
-
-
-.. [#auto-distribute]
-
-   .. include:: /includes/extracts/zoned-sharding-shard-operation-chunk-distribution.rst
-
-   .. include:: /includes/extracts/zoned-sharding-shard-operation-chunk-distribution-hashed-short.rst
-
-   See :ref:`pre-define-zone-range-hashed-example` for an example.
+   :ref:`<sharded-cluster-balancer>`
 
 
 Adding and Removing Shards from the Cluster
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Adding a shard to a cluster creates an imbalance, since the new
-shard has no chunks. While MongoDB begins migrating data to the new
-shard immediately, it can take some time before the cluster balances. See the
-:doc:`/tutorial/add-shards-to-shard-cluster` tutorial for instructions on
-adding a shard to a cluster.
+shard has no data. While MongoDB begins migrating data to the new
+shard immediately, it can take some time before the cluster balances.
+See the :ref:`Add Shards to a Cluster <sharding-procedure-add-shard>`
+tutorial for instructions on adding a shard to a cluster.
 
-Removing a shard from a cluster creates a similar imbalance, since chunks
-residing on that shard must be redistributed throughout the cluster. While
-MongoDB begins draining a removed shard immediately, it can take some time
-before the cluster balances. *Do not* shutdown the servers associated
-to the removed shard during this process.
+Removing a shard from a cluster creates a similar imbalance, since data
+residing on that shard must be redistributed throughout the cluster.
+While MongoDB begins draining a removed shard immediately, it can take
+some time before the cluster balances. *Do not* shutdown the servers
+associated to the removed shard during this process.
 
 .. include:: /includes/fact-remove-shard-balance-order.rst
 
-See the :doc:`/tutorial/remove-shards-from-cluster` tutorial for
-instructions on safely removing a shard from a cluster.
+See the :ref:`Remove Shards from a Cluster
+<remove-shards-from-cluster-tutorial>` tutorial for instructions on
+safely removing a shard from a cluster.
 
 .. seealso::
 
    :method:`sh.balancerCollectionStatus()`
 
 
 .. _chunk-migration-procedure:
+.. _range-migration-procedure:
 
-Chunk Migration Procedure
+Range Migration Procedure
 -------------------------
 
-All chunk migrations use the following procedure:
+All range migrations use the following procedure:
 
-#. The balancer process sends the :dbcommand:`moveChunk` command to
+#. The balancer process sends the :dbcommand:`moveRange` command to
    the source shard.
 
 #. The source starts the move when it receives an internal
    :dbcommand:`moveRange` command. During the migration process,
-   operations to the chunk are sent to the source shard. The source
-   shard is responsible for incoming write operations for the chunk.
+   operations to the range are sent to the source shard. The source
+   shard is responsible for incoming write operations for the range.
 
 #. The destination shard builds any indexes required by the source
    that do not exist on the destination.
 
-#. The destination shard begins requesting documents in the chunk and
+#. The destination shard begins requesting documents in the range and
    starts receiving copies of the data. See also
-   :ref:`chunk-migration-replication`.
+   :ref:`range-migration-replication`.
 
-#. After receiving the final document in the chunk, the
+#. After receiving the final document in the range, the
    destination shard starts a synchronization process to ensure that it
    has the changes to the migrated documents that occurred during the
    migration.
 
 #. When fully synchronized, the source shard connects to the
    :term:`config database` and updates the cluster metadata with the new
-   location for the chunk.
+   location for the range.
 
 #. After the source shard completes the update of the metadata,
-   and once there are no open cursors on the chunk, the source shard
+   and once there are no open cursors on the range, the source shard
    deletes its copy of the documents.
 
 .. note::
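
The same procedure runs when an administrator issues :dbcommand:`moveRange`
manually. A minimal sketch (the namespace, bound, and shard name are
hypothetical):

    // Ask the cluster to migrate the range starting at `min` from its
    // current owner to the named recipient shard.
    db.adminCommand({
       moveRange: "test.events",
       min: { userId: MinKey },   // lower bound of the range to move
       toShard: "shard0001"       // recipient shard name
    })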
@@ -174,9 +154,6 @@ All chunk migrations use the following procedure:
 
    :ref:`moveChunk-directory`
 
-The migration process ensures consistency and maximizes the availability of
-chunks during balancing.
-
 .. seealso::
 
    :serverstatus:`shardingStatistics.countDonorMoveChunkLockTimeout`
@@ -189,87 +166,69 @@ Migration Thresholds
 
 To minimize the impact of balancing on the cluster, the
 :term:`balancer` only begins balancing after the distribution of
-chunks for a sharded collection has reached certain thresholds. The
-thresholds apply to the difference in number of :term:`chunks <chunk>`
-between the shard with the most chunks for the collection and the shard
-with the fewest chunks for that collection. The balancer has the
-following thresholds:
-
-.. list-table::
-   :header-rows: 1
-
-   * - Number of Chunks
-     - Migration Threshold
-
-   * - Fewer than 20
-     - 2
+data for a sharded collection has reached certain thresholds.
 
-   * - 20-79
-     - 4
-
-   * - 80 and greater
-     - 8
-
-The balancer stops running on the target collection when the difference
-between the number of chunks on any two shards for that collection is *less
-than two*, or a chunk migration fails.
+A collection is considered balanced if the difference in data between
+shards (for that collection) is less than :ref:`migration thresholds
+<sharding-migration-thresholds>`.
 
 .. seealso::
 
    :method:`sh.balancerCollectionStatus()`
 
 
 .. _chunk-migration-queuing:
+.. _range-migration-queuing:
 .. _asynchronous-chunk-migration-cleanup:
 
-Asynchronous Chunk Migration Cleanup
+Asynchronous Range Migration Cleanup
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-To migrate multiple chunks from a shard, the balancer migrates the
-chunks one at a time. However, the balancer does not wait for the
-current migration's delete phase to complete before starting the next
-chunk migration. See :ref:`sharding-chunk-migration` for the chunk
+To migrate data from a shard, the balancer migrates the
+data one range at a time. However, the balancer does not wait for the
+current migration's delete phase to complete before starting the next
+range migration. See :ref:`sharding-range-migration` for the range
 migration process and the delete phase.
 
-This queuing behavior allows shards to unload chunks more quickly in
+This queuing behavior allows shards to unload data more quickly in
 cases of heavily imbalanced cluster, such as when performing initial
 data loads without pre-splitting and when adding new shards.
 
 This behavior also affects the :dbcommand:`moveRange` command, and
 migration scripts that use the :dbcommand:`moveRange` command may
 proceed more quickly.
 
-In some cases, the delete phases may persist longer. Starting in MongoDB
-4.4, chunk migrations are enhanced to be more resilient in the event of
-a failover during the delete phase. Orphaned documents are cleaned up
-even if a replica set's primary crashes or restarts during this phase.
+In some cases, the delete phases may persist longer. Range migrations
+are enhanced to be more resilient in the event of a failover during the
+delete phase. Orphaned documents are cleaned up even if a replica set's
+primary crashes or restarts during this phase.
 
-The ``_waitForDelete``, available as a setting for the balancer as well
-as the :dbcommand:`moveRange` command, can alter the behavior so that
+The ``_waitForDelete`` balancer setting can alter the behavior so that
 the delete phase of the current migration blocks the start of the next
-chunk migration. The ``_waitForDelete`` is generally for internal
-testing purposes. For more information, see
+chunk migration. The ``_waitForDelete`` is generally for internal
+testing purposes. For more information, see
 :ref:`wait-for-delete-setting`.
 
 .. _chunk-migration-replication:
+.. _range-migration-replication:
 
-Chunk Migration and Replication
+Range Migration and Replication
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-During chunk migration, the ``_secondaryThrottle`` value determines
-when the migration proceeds with next document in the chunk.
+During range migration, the ``_secondaryThrottle`` value determines
+when the migration proceeds with the next document in the range.
 
 In the :data:`config.settings` collection:
 
 - If the ``_secondaryThrottle`` setting for the balancer is set to a
-  write concern, each document move during chunk migration must receive
+  write concern, each document move during range migration must receive
   the requested acknowledgement before proceeding with the next
   document.
 
 - If the ``_secondaryThrottle`` setting for the balancer is set to
-  ``true``, each document move during chunk migration must receive
+  ``true``, each document move during range migration must receive
   acknowledgement from at least one secondary before the migration
-  proceeds with the next document in the chunk. This is equivalent to a
+  proceeds with the next document in the range. This is equivalent to a
   write concern of :writeconcern:`{ w: 2 } <\<number\>>`.
 
 - If the ``_secondaryThrottle`` setting is unset, the migration process
@@ -280,17 +239,16 @@ To update the ``_secondaryThrottle`` parameter for the balancer, see
 :ref:`sharded-cluster-config-secondary-throttle` for an example.
 
 Independent of any ``_secondaryThrottle`` setting, certain phases of
-the chunk migration have the following replication policy:
-
-- MongoDB briefly pauses all application reads and writes to the
-  collection being migrated, on the source shard, before updating the
-  config servers with the new location for the chunk, and resumes the
-  application reads and writes after the update. The chunk move requires
-  all writes to be acknowledged by majority of the members of the
-  replica set both before and after committing the chunk move to config
-  servers.
-
-- When an outgoing chunk migration finishes and cleanup occurs, all
+the range migration have the following replication policy:
+
+- MongoDB briefly pauses all application reads and writes to the
+  collection being migrated, on the source shard, before updating the
+  config servers with the range location. MongoDB resumes application
+  reads and writes after the update. The range move requires all writes
+  to be acknowledged by majority of the members of the replica set both
+  before and after committing the range move to config servers.
+
+- When an outgoing migration finishes and cleanup occurs, all
   writes must be replicated to a majority of servers before further
   cleanup (from other outgoing migrations) or new incoming migrations
   can proceed.
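
As the hunk's context line notes, ``_secondaryThrottle`` is updated through
the :data:`config.settings` collection. A minimal sketch of setting it to a
write concern from mongosh (the ``w: "majority"`` value is illustrative):

    // Run against the config database from a mongos connection.
    use config
    db.settings.updateOne(
       { _id: "balancer" },
       { $set: { "_secondaryThrottle": { w: "majority" } } },
       { upsert: true }
    )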
@@ -300,11 +258,12 @@ To update the ``_secondaryThrottle`` setting in the
 :ref:`sharded-cluster-config-secondary-throttle` for an example.
 
 .. _migration-chunk-size-limit:
+.. _migration-range-size-limit:
 
-Maximum Number of Documents Per Chunk to Migrate
+Maximum Number of Documents Per Range to Migrate
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. include:: /includes/limits-sharding-maximum-documents-chunk.rst
+.. include:: /includes/limits-sharding-maximum-documents-range.rst
 
 .. _range-deletion-performance-tuning: