Commit ec6b2d3

Bob Grabar committed
DOCS-561 & DOCS-551 reconfig repl set when members down: draft 3
1 parent: bd42f6a

4 files changed: +101, -54 lines

source/administration/replica-sets.txt

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ suggestions for administers of replica sets.
 - :doc:`/tutorial/force-member-to-be-primary`
 - :doc:`/tutorial/change-hostnames-in-a-replica-set`
 - :doc:`/tutorial/convert-secondary-into-arbiter`
-- :doc:`/tutorial/reconfigure-replica-set-when-members-are-down.txt`
+- :doc:`/tutorial/reconfigure-replica-set-when-members-are-down`
 
 .. _replica-set-node-configurations:
 .. _replica-set-member-configurations:

source/includes/list-administration-tutorials.rst

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 - :doc:`/tutorial/force-member-to-be-primary`
 - :doc:`/tutorial/change-hostnames-in-a-replica-set`
 - :doc:`/tutorial/convert-secondary-into-arbiter`
-- :doc:`/tutorial/reconfigure-replica-set-when-members-are-down.txt`
+- :doc:`/tutorial/reconfigure-replica-set-when-members-are-down`
 - :doc:`tutorial/recover-data-following-unexpected-shutdown`
 - :doc:`tutorial/deploy-shard-cluster`
 - :doc:`tutorial/convert-replica-set-to-replicated-shard-cluster`

source/release-notes/2.0.txt

Lines changed: 2 additions & 2 deletions
@@ -219,7 +219,7 @@ Reconfiguration with a Minority Up
 If the majority of servers in a set has been permanently lost, you can
 now force a reconfiguration of the set to bring it back online.
 
-See more information see :wiki:`Reconfiguring a replica set when members are down <Reconfiguring+a+replica+set+when+members+are+down>`.
+For more information see :doc:`/tutorial/reconfigure-replica-set-when-members-are-down`.
 
 Primary Checks for a Caught up Secondary before Stepping Down
 `````````````````````````````````````````````````````````````
@@ -229,7 +229,7 @@ method will now fail if the primary does not see a :term:`secondary`
 within 10 seconds of its latest optime. You can force the primary to
 step down anyway, but by default it will return an error message.
 
-See also :wiki:`Forcing a Member to be Primary <Forcing+a+Member+to+be+Primary>`.
+See also :doc:`/tutorial/force-member-to-be-primary`.
 
 Extended Shutdown on the Primary to Minimize Interruption
 `````````````````````````````````````````````````````````
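The release note above describes the new step-down check: unless forced, the primary refuses to step down when no secondary is within 10 seconds of its latest optime. A minimal JavaScript sketch of that decision follows, assuming optimes are simplified to plain seconds and treating "within 10 seconds" as inclusive; the function and constant names are hypothetical and this is not MongoDB's implementation (on a real server, forcing is requested through the `force` field of the `replSetStepDown` command).

```javascript
// Hypothetical model of the MongoDB 2.0 step-down check; not server code.
// Optimes are simplified to seconds since some fixed point in time.
const CATCH_UP_WINDOW_SECS = 10;

// Returns true if the primary would agree to step down.
function canStepDown(primaryOptimeSecs, secondaryOptimesSecs, force = false) {
  if (force) {
    // A forced step-down skips the caught-up check entirely.
    return true;
  }
  // Otherwise at least one secondary must be within the 10-second window.
  return secondaryOptimesSecs.some(
    (optime) => primaryOptimeSecs - optime <= CATCH_UP_WINDOW_SECS
  );
}

console.log(canStepDown(100, [85, 95])); // true: one secondary is 5 s behind
console.log(canStepDown(100, [80])); // false: the only secondary is 20 s behind
console.log(canStepDown(100, [80], true)); // true: forced
```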

source/tutorial/reconfigure-replica-set-when-members-are-down.txt

Lines changed: 97 additions & 50 deletions
@@ -4,43 +4,43 @@ Reconfigure a Replica Set when Members are Down
 
 .. default-domain:: mongodb
 
-To reconfigure a :term:`replica set` when a majority of the members are
-down or unreachable, you must manually change the set configuration.
-
-This includes situations where you have a network partition and where
-neither side of the partition has a majority. In such cases the two
-sides of the partition cannot see each other when determining whether a
-majority exists (see
-:ref:`replica-set-elections-and-network-partitions`). Therefore, do not
-use scripts to reconfigure but do so manually.
-
-This section gives several procedures for reconfiguring when a majority
-is down. Use the procedure appropriate to your version and situation.
-
-.. note:: To reconfigure a replica set when a majority of members are
-running, run the :method:`rs.reconfig()` command on the current
-:term:`primary`. For examples of using :method:`rs.reconfig()`, see
-:ref:`replica-set-reconfiguration-usage`.
-
-Force a Reconfiguration when the Primary is Down
--------------------------------------------------
+To reconfigure a :term:`replica set` when only a *minority* of members
+are down or unreachable, run the :method:`rs.reconfig()` command on the
+current :term:`primary`. For examples of how to reconfigure a replica
+set using :method:`rs.reconfig()`, see
+:ref:`replica-set-reconfiguration-usage`.
+
+To reconfigure a replica set when a *majority* of the members are down
+or unreachable, you must manually change the set configuration as
+described in the procedures in this tutorial. Use the procedure
+appropriate to your version and situation.
+
+Reconfiguring when a *majority* of members are down can include
+situations where you have a network partition and where neither side of
+the partition has a majority. In such cases the two sides of the
+partition cannot see each other when determining whether a majority
+exists (see :ref:`replica-set-elections-and-network-partitions`). In
+these situations, never use scripts to reconfigure but instead
+reconfigure manually, as described in the procedures here.
+
+.. index:: replica set; reconfiguration
+.. _replica-set-force-reconfiguration:
+
+Reconfigure by Forcing the Reconfiguration
+------------------------------------------
 
 .. versionchanged:: 2.0
 
 This procedure lets you recover while a majority of :term:`replica set`
-members are down or unreachable. A member might be unreachable, for
-example, if it is on the wrong side of a network partition.
-
-.. TODO Question: must the primary be down for you to use this procedure?
-
-You connect to any surviving member and use the the ``force : true``
-option to force a reconfiguration of the replica set.
+members are down or unreachable. You connect to any surviving member and
+use :method:`rs.reconfig()`'s ``force`` option to force a
+reconfiguration of the replica set.
 
-The ``force : true`` option manually reconfigures the set. The option is
+The ``force`` option manually reconfigures the set. The option is
 intended only for serious problems, such as a disaster recovery
-failover. Do not use ``force : true`` every time you reconfigure. Also,
-do not put ``force : true`` into any automatic scripts and do not use
-``force : true`` when there is still a primary.
+failover. Do not use ``force`` for routine reconfigurations. Also, do not
+include ``force`` in any automatic scripts and do not use ``force``
+when there is still a primary.
 
 To force reconfiguration:
 
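The distinction the revised introduction draws (a minority down: plain `rs.reconfig()` on the primary; a majority down: a manual procedure chosen by version) can be sketched as a small decision helper. This is a hypothetical JavaScript sketch, not part of MongoDB; it assumes each member carries exactly one vote and that whether the deployment runs 2.0 or later is known in advance. The 4-member case illustrates the partition situation from the text, where a 2/2 split leaves neither side with a majority.

```javascript
// Hypothetical helper, not a MongoDB API. Assumes one vote per member.

// True if `reachable` members form a strict majority of `total` members.
function hasMajority(reachable, total) {
  return reachable > total / 2;
}

// Picks the procedure from this tutorial that matches the situation.
function chooseProcedure(reachable, total, isVersion20OrLater) {
  if (hasMajority(reachable, total)) {
    return "run rs.reconfig() on the current primary";
  }
  return isVersion20OrLater
    ? "force the reconfiguration from a surviving member"
    : "replace the replica set";
}

// A 4-member set split 2/2 by a network partition: neither side has a
// majority, so neither side can elect a primary to run rs.reconfig() on.
console.log(hasMajority(2, 4)); // false
console.log(chooseProcedure(2, 4, true)); // "force the reconfiguration from a surviving member"
console.log(chooseProcedure(3, 5, true)); // "run rs.reconfig() on the current primary"
```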
@@ -69,7 +69,8 @@ To force reconfiguration:
 cfg.members = [cfg.members[0] , cfg.members[4] , cfg.members[7]]
 
 #. On the same member, reconfigure the set by using the
-:method:`rs.reconfig()` command with the ``force : true`` option:
+:method:`rs.reconfig()` command with the ``force`` option set to
+``true``:
 
 .. code-block:: javascript
 
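The steps above trim `cfg.members` by array position and then call `rs.reconfig()` with `force` set to `true`. The sketch below performs only the trimming step, in plain JavaScript, on an object shaped like the document `rs.conf()` returns; it selects survivors by host name instead of by index so that a miscounted index cannot drop the wrong member. The helper name, version number, and host names are hypothetical. In the mongo shell you would pass the result to `rs.reconfig(cfg, {force : true})` on the same member.

```javascript
// Hypothetical helper, not a MongoDB API. `cfg` mimics the document that
// rs.conf() returns: { _id, version, members: [{ _id, host, ... }] }.
function keepSurvivors(cfg, survivingHosts) {
  const wanted = new Set(survivingHosts);
  // Keep each surviving member object as-is, so its _id and settings carry over.
  const members = cfg.members.filter((member) => wanted.has(member.host));
  if (members.length !== wanted.size) {
    throw new Error("a surviving host is missing from the configuration");
  }
  // Return a new object; the caller's cfg is not modified.
  return { ...cfg, members };
}

const cfg = {
  _id: "rs0",
  version: 7,
  members: [
    { _id: 0, host: "example.com:30001" },
    { _id: 1, host: "example.com:30002" },
    { _id: 2, host: "example.com:30003" },
  ],
};
const reduced = keepSurvivors(cfg, ["example.com:30001", "example.com:30003"]);
console.log(reduced.members.map((m) => m._id)); // [ 0, 2 ]
```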
@@ -79,29 +80,75 @@ To force reconfiguration:
 connected to.
 
 .. note:: When you use ``force : true``, the version number in the
-replica set configuration increases significantly, by tens or
-hundreds of thousands. This is normal and designed to prevent set
-version collisions if network partitioning ends.
+replica set configuration increases significantly, by tens or
+hundreds of thousands. This is normal and designed to prevent set
+version collisions if network partitioning ends.
 
-#. If the failure or partition was only temporary, then as members
-recover they detect that they have been removed from the set and
-enter a special state where they are up but refuse to answer
-requests, as they are no longer syncing changes. You can now re-add
-them to the configuration object:
+#. If any of the removed members come back online, there is a chance
+they will elect a new primary, resulting in two primaries. To ensure
+that the removed members do not elect a new primary, shut down or
+decommission the removed members as soon as possible.
 
-Be sure that each member has the same _id it had before.
+Reconfigure by Replacing the Replica Set
+----------------------------------------
 
-.. TODO why does each member have to have the same ID as before?
-.. The wiki said they'd have to run rs.reconfig(). Why?
+The procedures here are intended mainly for MongoDB versions *prior to*
+version 2.0. For version 2.0 and later, the above procedure,
+:ref:`replica-set-force-reconfiguration`, is recommended.
 
-Consider the following example:
+These procedures are for situations where a *majority* of the
+:term:`replica set` members are down or unreachable. If a majority is
+*running*, then skip these procedures and instead use the
+:method:`rs.reconfig()` command according to the examples in
+:ref:`replica-set-reconfiguration-usage`.
 
-.. code-block:: javascript
+If you run a pre-2.0 version and a majority of your replica set is down,
+you have the two options described here. Both involve replacing the
+replica set.
+
+Reconfigure by Turning Off Replication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This option replaces the :term:`replica set` with a :term:`standalone` server.
+
+1. Stop the surviving :program:`mongod` instances.
+
+#. Perform a backup.
+
+#. Move each surviving member's data directory to an archive folder. For example:
+
+.. code-block:: sh
+
+mv /data/db /data/db-old
+
+.. optional:: You may remove the data instead.
+
+#. Restart one of the :program:`mongod` instances *without* the
+``--replSet`` parameter.
+
+You are back online with a single server that is not a replica set
+member. Clients can use this server for both reads and writes.
+
+Reconfigure by "Breaking the Mirror"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This option selects a surviving :term:`replica set` member to be the new
+:term:`primary` and to be the "seed" for a new replica set. All other
+members must resync from this new primary.
+
+1. Stop the surviving :program:`mongod` instances.
+
+#. Perform a backup.
+
+#. Move each surviving member's data directory to an archive folder. For example:
+
+.. code-block:: sh
+
+mv /data/db /data/db-old
 
-rs.add("example.com:30003")
+.. optional:: You may remove the data instead.
 
-Once you add the removed members back into the set, they detect they
-have been added and synchronize to the current state of the set.
+#. Restart all :program:`mongod` instances with the new replica set name.
 
-Be aware that if the original primary was one of the removed members,
-these members may need to rollback.
+#. On the new primary, add the other instances as members of the replica
+set. For more information, see :doc:`/tutorial/expand-replica-set`.
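The "breaking the mirror" steps can be summarized as a small planning helper that builds the values an operator would type. It is hypothetical and assumes, beyond what the procedure states, that the seed is initiated with `rs.initiate()` using a one-member configuration before the other instances are added with `rs.add()`; the set name `rs1` and the host names are placeholders. The helper only builds values; nothing here connects to a server.

```javascript
// Hypothetical planning helper, not a MongoDB API. It only builds values;
// the actual commands run in the mongo shell.
function planNewSet(newSetName, seedHost, otherHosts) {
  // One-member configuration for rs.initiate() on the seed (the new primary).
  const initiateConfig = {
    _id: newSetName,
    members: [{ _id: 0, host: seedHost }],
  };
  // Shell commands to run on the new primary; each added member resyncs from it.
  const addCommands = otherHosts.map((host) => `rs.add("${host}")`);
  return { initiateConfig, addCommands };
}

const plan = planNewSet("rs1", "example.com:30001", [
  "example.com:30002",
  "example.com:30003",
]);
console.log(JSON.stringify(plan.initiateConfig));
// {"_id":"rs1","members":[{"_id":0,"host":"example.com:30001"}]}
console.log(plan.addCommands.join("\n"));
// rs.add("example.com:30002")
// rs.add("example.com:30003")
```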
