DOCS-449: resync a stale member

Bob Grabar · Bob Grabar · commit b1f48fd3c201 · 2012-10-22T12:46:27.000-04:00
diff --git a/source/administration/replica-sets.txt b/source/administration/replica-sets.txt
@@ -33,6 +33,7 @@ suggestions for administers of replica sets.
    - :doc:`/tutorial/change-hostnames-in-a-replica-set`
    - :doc:`/tutorial/convert-secondary-into-arbiter`
    - :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members`
+   - :doc:`/tutorial/recover-data-following-unexpected-shutdown`
 
 .. _replica-set-node-configurations:
 .. _replica-set-member-configurations:
@@ -365,7 +366,8 @@ the following to prepare the new member's :term:`data directory <dbpath>`:
   difference in the amount of time between the most recent operation and
   the most recent operation to the database exceeds the length of the
   :term:`oplog` on the existing members, then the new instance will have
-  to completely re-synchronize.
+  to completely resynchronize, as described in
+  :ref:`replica-set-resync-stale-member`.
 
    Use :method:`db.printReplicationInfo()` to check the current state of
    replica set members with regards to the oplog.
@@ -558,6 +560,90 @@ the oplog. For a detailed procedure, see
 
 .. include:: /includes/procedure-change-oplog-size.rst
 
+.. _replica-set-resync-stale-member:
+
+Resyncing a Member of a Replica Set
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a member's data falls too far behind the :term:`oplog` to catch up,
+the member and its data are considered "stale". A member's data is too
+far behind when the oplog on the :term:`primary` has overwritten its
+entries before the member has copied them. When that occurs, you must
+resync the member by removing its data and replacing it with up-to-date
+data.
+
+To do so, use one of the following approaches:
+
+- Restart the :program:`mongod` with an empty data directory and let
+  MongoDB's automatic syncing feature restore the data. This approach
+  requires fewer steps but can take longer to replace the data.
+
+  See :ref:`replica-set-auto-resync-stale-member`.
+
+- Restart the :program:`mongod` with a copy of a recent data directory
+  from another member in the :term:`replica set`. This procedure can
+  replace the data more quickly but requires more manual steps.
+
+  See :ref:`replica-set-resync-by-copying`.
+
+.. index:: replica set; resync
+.. _replica-set-auto-resync-stale-member:
+
+Automatically Resync a Stale Member
+```````````````````````````````````
+
+This procedure relies on MongoDB's automatic syncing feature to restore
+the data on the stale member. For an overview of how MongoDB syncs
+replica sets, see :ref:`replica-set-syncing`.
+
+To resync the stale member:
+
+1. Stop the member's :program:`mongod` instance using the
+   :option:`mongod --shutdown` option. Make sure to set
+   :option:`--dbpath <mongod --dbpath>` to the member's data directory.
+
+   .. code-block:: sh
+
+      mongod --dbpath /data/db/ --shutdown
+
+#. Delete all data and subdirectories from the member's data directory
+   so that the directory is empty.
+
+#. Restart the :program:`mongod` instance on the member. Consider the
+   following example:
+
+   .. code-block:: sh
+
+      mongod --dbpath /data/db/ --replSet rsProduction
+
+   MongoDB resyncs the member. Resyncing may take a long time, depending on
+   the size of the database and speed of the network.
+
+.. index:: replica set; resync
+.. _replica-set-resync-by-copying:
+
+Resync by Copying Data from Another Member
+``````````````````````````````````````````
+
+This approach uses the data directory of an existing member to "seed"
+the stale member. The data must be recent enough to allow that member
+to catch up with the :term:`primary` member's :term:`oplog`.
+
+To resync by copying data from another member, use one of the following
+approaches:
+
+- Create a snapshot of another member's data and then restore that
+  snapshot to the stale member. Use the snapshot procedures in
+  :doc:`/administration/backups`.
+
+- Lock another member's database with the :method:`db.fsyncLock()`
+  method, copy that data, and then restore the data to the stale
+  member. Use the procedures for backup storage in
+  :doc:`/administration/backups`.
+
+- Use the :dbcommand:`copydb` and :dbcommand:`clone` commands, as
+  described in :doc:`/tutorial/copy-databases-between-instances`.
+
 .. _replica-set-security:
 
 Replica Set Security
diff --git a/source/core/replication-internals.txt b/source/core/replication-internals.txt
@@ -4,9 +4,6 @@ Replication Internals
 
 .. default-domain:: mongodb
 
-Synopsis
---------
-
 This document provides a more in-depth explanation of the internals and
 operation of :term:`replica set` features. This material is not necessary for
 normal operation or application development but may be useful for
@@ -77,11 +74,10 @@ the following collections:
 .. _replica-set-oplog:
 .. _replica-set-internals-oplog:
 
-Oplog
------
+Oplog Internals
+---------------
 
-For an explanation of the oplog, see the :ref:`replica-set-oplog-sizing`
-topic in the :doc:`/core/replication` document.
+For an explanation of the oplog, see :ref:`replica-set-oplog-sizing`.
 
 Under various exceptional
 situations, updates to a :term:`secondary's <secondary>` oplog might
@@ -113,8 +109,8 @@ Data Integrity
 
 .. index:: replica set; read preferences
 
-Read Preferences
-~~~~~~~~~~~~~~~~
+Read Preference Internals
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
 MongoDB uses :term:`single-master replication` to ensure that the
 database remains consistent. However, clients may modify the
@@ -172,8 +168,8 @@ for your data set is crucial.
 
 .. index:: replica set; security
 
-Security
---------
+Security Internals
+------------------
 
 Administrators of replica sets also have unique :ref:`monitoring
 <replica-set-monitoring>` and :ref:`security <replica-set-security>`
@@ -188,8 +184,8 @@ modify the configuration of an existing replica set.
 .. index:: replica set; failover
 .. _replica-set-election-internals:
 
-Elections
----------
+Election Internals
+------------------
 
 Elections are the process :term:`replica set` members use to select which member should
 become :term:`primary`. A primary is the only member in the replica
@@ -297,6 +293,8 @@ and a majority of servers in one data center and one server in another.
 
 .. index:: replica set; sync
 
+.. _replica-set-syncing:
+
 Syncing
 -------
 
@@ -327,3 +325,5 @@ For example:
    alternate facility, and if you add another secondary to the alternate
    facility, the new secondary will likely sync from the existing
    secondary because it is closer than the primary.
+
+.. seealso:: :ref:`replica-set-resync-stale-member`
diff --git a/source/core/replication.txt b/source/core/replication.txt
@@ -353,9 +353,13 @@ activity of your MongoDB-based application are reads and you are
 writing a small amount of data, you may find that you need a much
 smaller oplog.
 
-For a further understanding of oplog behavior, see the
-:ref:`replica-set-oplog` topic in the :doc:`/core/replication-internals`
-document.
+To view oplog status, including the size and the time range of
+operations, use the :method:`db.printReplicationInfo()` method. For
+more information on oplog status, see
+:ref:`replica-set-troubleshooting-check-oplog-size`.
+
+For an advanced understanding of oplog behavior, see
+:ref:`replica-set-oplog` and :ref:`replica-set-syncing`.
 
 Replica Set Deployment
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/source/replication.txt b/source/replication.txt
@@ -56,6 +56,7 @@ operations in detail:
    tutorial/change-hostnames-in-a-replica-set
    tutorial/convert-secondary-into-arbiter
    tutorial/reconfigure-replica-set-with-unavailable-members
+   tutorial/recover-data-following-unexpected-shutdown
 
 .. _replication-reference:
 
diff --git a/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt b/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt
@@ -1,6 +1,6 @@
-==================================================
-Reconfigure a Replica Set with Unavailable Members
-==================================================
+===============================================
+Reconfigure a Replica Set when Members are Down
+===============================================
 
 .. default-domain:: mongodb
 
@@ -23,9 +23,6 @@ members can reach a majority. See
 :ref:`replica-set-elections-and-network-partitions` for more
 information on this situation.
 
-This document provides the following options for reconfiguring a replica
-set when a **majority** of members are accessible:
-
 .. index:: replica set; reconfiguration
 .. _replica-set-force-reconfiguration:
 
diff --git a/source/tutorial/recover-data-following-unexpected-shutdown.txt b/source/tutorial/recover-data-following-unexpected-shutdown.txt
@@ -9,21 +9,22 @@ representation of the data files will likely reflect an inconsistent
 state which could lead to data corruption.
 
 To prevent data inconsistency and corruption, always shut down the
-database cleanly, and use the :ref:`durability journaling
+database cleanly and use the :ref:`durability journaling
 <setting-journal>`. The journal writes data to disk every 100
-milliseconds by default, and ensures that MongoDB will be able to
+milliseconds by default and ensures that MongoDB can
 recover to a consistent state even in the case of an unclean shutdown due to
 power loss or other system failure.
 
 If you are *not* running as part of a :term:`replica set` **and** do
-*not* have journaling enabled use the following procedure to recover
+*not* have journaling enabled, use the following procedure to recover
 data that may be in an inconsistent state. If you are running as part
 of a replica set, you should *always* restore from a backup or restart
 the :program:`mongod` instance with an empty :setting:`dbpath` and
 allow MongoDB to resync the data.
 
-.. seealso:: The ":doc:`/administration`" documents and the
-   documentation of the :setting:`repair`, :setting:`repairpath`, and
+.. seealso:: The :doc:`/administration` documents, including
+   :ref:`replica-set-syncing`, and the
+   documentation on the :setting:`repair`, :setting:`repairpath`, and
    :setting:`journal` settings.
 
 .. [#clean-shutdown] To ensure a clean shut down, use the
@@ -41,7 +42,7 @@ When you are aware of a :program:`mongod` instance running without
 journaling that stops unexpectedly **and** you're not running with
 replication, you should always run the repair operation before
 starting MongoDB again. If you're using replication, then restore from
-a backup and allow replication to synchronize your data.
+a backup and allow replication to :ref:`synchronize <replica-set-syncing>` your data.
 
 If the ``mongod.lock`` file in the data directory specified by
 :setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file,
@@ -72,7 +73,7 @@ Overview
 
    Do not use this procedure to recover a member of a :term:`replica set`.
    Instead you should either restore from a :doc:`backup </administration/backups>` 
-   or re-sync from an intact member of the set.
+   or resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.
 
 There are two processes to repair data files that result from an
 unexpected shutdown:
@@ -171,4 +172,4 @@ If you are not running with journaling, and your database shuts down
 unexpectedly for *any* reason, you should always proceed *as if* your database
 is in an inconsistent and likely corrupt state. If at all possible restore
 from :doc:`backup </administration/backups>` or if running as a :term:`replica
-set` re-sync from an intact member of the set.
+set` resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.