Commit b1f48fd

author
Bob Grabar
committed
DOCS-449: resync a stale member
1 parent 6d7008b commit b1f48fd

6 files changed

+120
-31
lines changed

source/administration/replica-sets.txt

Lines changed: 87 additions & 1 deletion
@@ -33,6 +33,7 @@ suggestions for administers of replica sets.
3333
- :doc:`/tutorial/change-hostnames-in-a-replica-set`
3434
- :doc:`/tutorial/convert-secondary-into-arbiter`
3535
- :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members`
36+
- :doc:`/tutorial/recover-data-following-unexpected-shutdown`
3637

3738
.. _replica-set-node-configurations:
3839
.. _replica-set-member-configurations:
@@ -365,7 +366,8 @@ the following to prepare the new member's :term:`data directory <dbpath>`:
365366
difference in the amount of time between the most recent operation and
366367
the most recent operation to the database exceeds the length of the
367368
:term:`oplog` on the existing members, then the new instance will have
368-
to completely re-synchronize.
369+
to completely resynchronize, as described in
370+
:ref:`replica-set-resync-stale-member`.
369371

370372
Use :method:`db.printReplicationInfo()` to check the current state of
371373
replica set members with regards to the oplog.
@@ -558,6 +560,90 @@ the oplog. For a detailed procedure, see
558560

559561
.. include:: /includes/procedure-change-oplog-size.rst
560562

563+
.. _replica-set-resync-stale-member:
564+
565+
Resyncing a Member of a Replica Set
566+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
567+
568+
When a member's data falls too far behind the :term:`oplog` to catch up,
569+
the member and its data are considered "stale". A member's data is too
570+
far behind when the oplog on the :term:`primary` has overwritten its
571+
entries before the member has copied them. When that occurs, you must
572+
resync the member by removing its data and replacing it with up-to-date
573+
data.
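
The staleness condition above comes down to a single comparison. The following is a minimal sketch, not MongoDB's implementation: the function name ``is_stale`` and both parameters are hypothetical, and the timestamps are assumed to be values you have already read from the member and from the primary's oplog.

```python
from datetime import datetime, timezone


def is_stale(member_last_applied: datetime, primary_oplog_oldest: datetime) -> bool:
    """Return True when the member can no longer catch up from the oplog.

    Hypothetical helper for illustration only. The member is stale when
    the oldest entry still held in the primary's oplog is newer than the
    last operation the member applied, because the entries the member
    needs next have already been overwritten.
    """
    return member_last_applied < primary_oplog_oldest


# Example values (made up): the primary's oplog now starts at 12:00 UTC.
oplog_oldest = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc)
behind = datetime(2012, 1, 1, 11, 0, tzinfo=timezone.utc)
current = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc)

stale_result = is_stale(behind, oplog_oldest)     # member stopped before the oplog starts
fresh_result = is_stale(current, oplog_oldest)    # member's last operation is still covered
```

A member whose last applied operation still falls inside the oplog can resume replication without a resync.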
574+
575+
To do so, use one of the following approaches:
576+
577+
- Restart the machine with an empty data directory and let MongoDB's
578+
automatic syncing feature restore the data. This approach requires
579+
fewer steps but can take longer to replace the data.
580+
581+
See :ref:`replica-set-auto-resync-stale-member`.
582+
583+
- Restart the machine with a copy of a recent data directory from
584+
another member in the :term:`replica set`. This procedure can replace
585+
the data more quickly but requires more manual steps.
586+
587+
See :ref:`replica-set-resync-by-copying`.
588+
589+
.. index:: replica set; resync
590+
.. _replica-set-auto-resync-stale-member:
591+
592+
Automatically Resync a Stale Member
593+
```````````````````````````````````
594+
595+
This procedure relies on MongoDB's automatic syncing feature to restore
596+
the data on the stale member. For an overview of how MongoDB syncs
597+
replica sets, see :ref:`replica-set-syncing`.
598+
599+
To resync the stale member:
600+
601+
1. Stop the member's :program:`mongod` instance using the
602+
:option:`mongod --shutdown` option. Make sure to set
603+
:option:`--dbpath <mongod --dbpath>` to the member's data directory.
604+
605+
.. code-block:: sh
606+
607+
mongod --dbpath /data/db/ --shutdown
608+
609+
#. Delete all data and subdirectories from the member's data directory
610+
such that the directory is empty.
611+
612+
#. Restart the :program:`mongod` instance on the member. Consider the
613+
following example:
614+
615+
.. code-block:: sh
616+
617+
mongod --dbpath /data/db/ --replSet rsProduction
618+
619+
MongoDB resyncs the member. Resyncing may take a long time, depending on
620+
the size of the database and speed of the network.
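
The second step, emptying the data directory while keeping the directory itself, can be sketched as follows. This is an illustrative sketch only: ``empty_directory`` is a hypothetical helper, the file names are placeholders, and the demonstration runs against a temporary directory rather than a real ``dbpath``. Run anything like this only after the :program:`mongod` instance has stopped.

```python
import shutil
import tempfile
from pathlib import Path


def empty_directory(dbpath: Path) -> None:
    """Delete every file and subdirectory inside dbpath, keeping dbpath itself."""
    for entry in dbpath.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subdirectory and its contents
        else:
            entry.unlink()         # remove a regular file or a symlink


# Demonstration against a throwaway directory (placeholder contents).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "journal").mkdir()
    (demo / "journal" / "placeholder.dat").write_text("x")
    (demo / "mongod.lock").write_text("")
    empty_directory(demo)
    remaining = list(demo.iterdir())       # entries left after emptying
    directory_still_exists = demo.is_dir()
```

Keeping the directory in place means the restart command in the last step can reuse the same ``--dbpath`` value.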
621+
622+
.. index:: replica set; resync
623+
.. _replica-set-resync-by-copying:
624+
625+
Resync by Copying Data from Another Member
626+
``````````````````````````````````````````
627+
628+
This approach uses the data directory of an existing member to "seed"
629+
the stale member. The data must be recent enough to allow the new member
630+
to catch up with the :term:`primary` member's :term:`oplog`.
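
One way to reason about "recent enough" is to compare the age of the seed data at restart time with the time range the oplog covers. The sketch below is a simplification under stated assumptions: ``seed_can_catch_up`` is a hypothetical helper, and the oplog's time range is assumed to stay roughly constant while the data is copied.

```python
from datetime import datetime, timedelta, timezone


def seed_can_catch_up(seed_taken_at: datetime,
                      restart_at: datetime,
                      oplog_window: timedelta) -> bool:
    """Return True if the seed data should still be covered by the oplog.

    Hypothetical helper. Assumes the primary's oplog spans a roughly
    constant time range (oplog_window), so its oldest entry at restart
    is approximately restart_at - oplog_window.
    """
    oldest_entry_at_restart = restart_at - oplog_window
    return seed_taken_at >= oldest_entry_at_restart


# Example values (made up): a seed taken at midnight UTC, a 24-hour oplog.
taken = datetime(2012, 1, 1, 0, 0, tzinfo=timezone.utc)
window = timedelta(hours=24)

quick_copy = seed_can_catch_up(taken, taken + timedelta(hours=6), window)
slow_copy = seed_can_catch_up(taken, taken + timedelta(hours=30), window)
```

The time spent copying counts against the window, so a large data set on a slow network can make an otherwise fresh seed too old by the time the member restarts.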
631+
632+
To resync by copying data from another member, use one of the following
633+
approaches:
634+
635+
- Create a snapshot of another member's data and then restore that
636+
snapshot to the stale member. Use the snapshot procedures in
637+
:doc:`/administration/backups`.
638+
639+
- Lock another member's database with the :method:`db.fsyncLock()`
640+
method, copy that data, and then restore the data to the stale
641+
member. Use the procedures for backup storage in
642+
:doc:`/administration/backups`.
643+
644+
- Use the :dbcommand:`copydb` and :dbcommand:`clone` commands, as
645+
described in :doc:`/tutorial/copy-databases-between-instances`.
646+
561647
.. _replica-set-security:
562648

563649
Replica Set Security

source/core/replication-internals.txt

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -4,9 +4,6 @@ Replication Internals
44

55
.. default-domain:: mongodb
66

7-
Synopsis
8-
--------
9-
107
This document provides a more in-depth explanation of the internals and
118
operation of :term:`replica set` features. This material is not necessary for
129
normal operation or application development but may be useful for
@@ -77,11 +74,10 @@ the following collections:
7774
.. _replica-set-oplog:
7875
.. _replica-set-internals-oplog:
7976

80-
Oplog
81-
-----
77+
Oplog Internals
78+
---------------
8279

83-
For an explanation of the oplog, see the :ref:`replica-set-oplog-sizing`
84-
topic in the :doc:`/core/replication` document.
80+
For an explanation of the oplog, see :ref:`replica-set-oplog-sizing`.
8581

8682
Under various exceptional
8783
situations, updates to a :term:`secondary's <secondary>` oplog might
@@ -113,8 +109,8 @@ Data Integrity
113109

114110
.. index:: replica set; read preferences
115111

116-
Read Preferences
117-
~~~~~~~~~~~~~~~~
112+
Read Preference Internals
113+
~~~~~~~~~~~~~~~~~~~~~~~~~
118114

119115
MongoDB uses :term:`single-master replication` to ensure that the
120116
database remains consistent. However, clients may modify the
@@ -172,8 +168,8 @@ for your data set is crucial.
172168

173169
.. index:: replica set; security
174170

175-
Security
176-
--------
171+
Security Internals
172+
------------------
177173

178174
Administrators of replica sets also have unique :ref:`monitoring
179175
<replica-set-monitoring>` and :ref:`security <replica-set-security>`
@@ -188,8 +184,8 @@ modify the configuration of an existing replica set.
188184
.. index:: replica set; failover
189185
.. _replica-set-election-internals:
190186

191-
Elections
192-
---------
187+
Election Internals
188+
------------------
193189

194190
Elections are the process :term:`replica set` members use to select which member should
195191
become :term:`primary`. A primary is the only member in the replica
@@ -297,6 +293,8 @@ and a majority of servers in one data center and one server in another.
297293

298294
.. index:: replica set; sync
299295

296+
.. _replica-set-syncing:
297+
300298
Syncing
301299
-------
302300

@@ -327,3 +325,5 @@ For example:
327325
alternate facility, and if you add another secondary to the alternate
328326
facility, the new secondary will likely sync from the existing
329327
secondary because it is closer than the primary.
328+
329+
.. seealso:: :ref:`replica-set-resync-stale-member`

source/core/replication.txt

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -353,9 +353,13 @@ activity of your MongoDB-based application are reads and you are
353353
writing a small amount of data, you may find that you need a much
354354
smaller oplog.
355355

356-
For a further understanding of oplog behavior, see the
357-
:ref:`replica-set-oplog` topic in the :doc:`/core/replication-internals`
358-
document.
356+
To view oplog status, including the size and the time range of
357+
operations, issue the :method:`db.printReplicationInfo()` method. For
358+
more information on oplog status, see
359+
:ref:`replica-set-troubleshooting-check-oplog-size`.
360+
361+
For an advanced understanding of oplog behavior, see
362+
:ref:`replica-set-oplog` and :ref:`replica-set-syncing`.
359363

360364
Replica Set Deployment
361365
~~~~~~~~~~~~~~~~~~~~~~

source/replication.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@ operations in detail:
5656
tutorial/change-hostnames-in-a-replica-set
5757
tutorial/convert-secondary-into-arbiter
5858
tutorial/reconfigure-replica-set-with-unavailable-members
59+
tutorial/recover-data-following-unexpected-shutdown
5960

6061
.. _replication-reference:
6162

source/tutorial/reconfigure-replica-set-with-unavailable-members.txt

Lines changed: 3 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
==================================================
2-
Reconfigure a Replica Set with Unavailable Members
3-
==================================================
1+
===============================================
2+
Reconfigure a Replica Set when Members are Down
3+
===============================================
44

55
.. default-domain:: mongodb
66

@@ -23,9 +23,6 @@ members can reach a majority. See
2323
:ref:`replica-set-elections-and-network-partitions` for more
2424
information on this situation.
2525

26-
This document provides the following options for reconfiguring a replica
27-
set when a **majority** of members are accessible:
28-
2926
.. index:: replica set; reconfiguration
3027
.. _replica-set-force-reconfiguration:
3128

source/tutorial/recover-data-following-unexpected-shutdown.txt

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -9,21 +9,22 @@ representation of the data files will likely reflect an inconsistent
99
state which could lead to data corruption.
1010

1111
To prevent data inconsistency and corruption, always shut down the
12-
database cleanly, and use the :ref:`durability journaling
12+
database cleanly and use the :ref:`durability journaling
1313
<setting-journal>`. The journal writes data to disk every 100
14-
milliseconds by default, and ensures that MongoDB will be able to
14+
milliseconds by default and ensures that MongoDB can
1515
recover to a consistent state even in the case of an unclean shutdown due to
1616
power loss or other system failure.
1717

1818
If you are *not* running as part of a :term:`replica set` **and** do
19-
*not* have journaling enabled use the following procedure to recover
19+
*not* have journaling enabled, use the following procedure to recover
2020
data that may be in an inconsistent state. If you are running as part
2121
of a replica set, you should *always* restore from a backup or restart
2222
the :program:`mongod` instance with an empty :setting:`dbpath` and
2323
allow MongoDB to resync the data.
2424

25-
.. seealso:: The ":doc:`/administration`" documents and the
26-
documentation of the :setting:`repair`, :setting:`repairpath`, and
25+
.. seealso:: The :doc:`/administration` documents, including
26+
:ref:`replica-set-syncing`, and the
27+
documentation on the :setting:`repair`, :setting:`repairpath`, and
2728
:setting:`journal` settings.
2829

2930
.. [#clean-shutdown] To ensure a clean shut down, use the
@@ -41,7 +42,7 @@ When you are aware of a :program:`mongod` instance running without
4142
journaling that stops unexpectedly **and** you're not running with
4243
replication, you should always run the repair operation before
4344
starting MongoDB again. If you're using replication, then restore from
44-
a backup and allow replication to synchronize your data.
45+
a backup and allow replication to :ref:`synchronize <replica-set-syncing>` your data.
4546

4647
If the ``mongod.lock`` file in the data directory specified by
4748
:setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file,
@@ -72,7 +73,7 @@ Overview
7273

7374
Do not use this procedure to recover a member of a :term:`replica set`.
7475
Instead you should either restore from a :doc:`backup </administration/backups>`
75-
or re-sync from an intact member of the set.
76+
or resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.
7677

7778
There are two processes to repair data files that result from an
7879
unexpected shutdown:
@@ -171,4 +172,4 @@ If you are not running with journaling, and your database shuts down
171172
unexpectedly for *any* reason, you should always proceed *as if* your database
172173
is in an inconsistent and likely corrupt state. If at all possible restore
173174
from :doc:`backup </administration/backups>` or if running as a :term:`replica
174-
set` re-sync from an intact member of the set.
175+
set` resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.
