DOCS-449 resync stale replica set member #338

Merged
merged 2 commits into from
Nov 6, 2012
87 changes: 86 additions & 1 deletion source/administration/replica-sets.txt
@@ -33,6 +33,7 @@ suggestions for administers of replica sets.
- :doc:`/tutorial/change-hostnames-in-a-replica-set`
- :doc:`/tutorial/convert-secondary-into-arbiter`
- :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members`
- :doc:`/tutorial/recover-data-following-unexpected-shutdown`

.. _replica-set-node-configurations:
.. _replica-set-member-configurations:
@@ -365,7 +366,8 @@ the following to prepare the new member's :term:`data directory <dbpath>`:
difference in the amount of time between the most recent operation and
the most recent operation to the database exceeds the length of the
:term:`oplog` on the existing members, then the new instance will have
to completely re-synchronize.
to completely resynchronize, as described in
:ref:`replica-set-resync-stale-member`.

Use :method:`db.printReplicationInfo()` to check the current state of
replica set members with regards to the oplog.
@@ -558,6 +560,89 @@ the oplog. For a detailed procedure, see

.. include:: /includes/procedure-change-oplog-size.rst

.. _replica-set-resync-stale-member:

Resyncing a Member of a Replica Set
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a member's data falls too far behind the :term:`oplog` to catch up,
the member and its data are considered "stale". A member is too far
behind when the :term:`primary` has overwritten oplog entries that the
member has not yet copied. When that occurs, you must resync the member
by removing its data and replacing it with up-to-date data.

To do so, use one of the following approaches:

- Restart the :program:`mongod` with an empty data directory and let MongoDB's
  automatic syncing feature restore the data. This approach requires
  fewer steps but can take longer to replace the data.

  See :ref:`replica-set-auto-resync-stale-member`.

- Restart the :program:`mongod` with a copy of a recent data directory
  from another member in the :term:`replica set`. This procedure can
  replace the data more quickly but requires more manual steps.

  See :ref:`replica-set-resync-by-copying`.

.. index:: replica set; resync
.. _replica-set-auto-resync-stale-member:

Automatically Resync a Stale Member
```````````````````````````````````

This procedure relies on MongoDB's automatic syncing feature to restore
the data on the stale member. For an overview of how MongoDB syncs
:term:`replica sets <replica set>`, see :ref:`replica-set-syncing`.

To resync the stale member:

1. Stop the member's :program:`mongod` instance using the
   :option:`mongod --shutdown` option. Make sure to set
   :option:`--dbpath <mongod --dbpath>` to the member's data directory.

   .. code-block:: sh

      mongod --dbpath /data/db/ --shutdown

#. Delete all data and subdirectories from the member's data directory
   such that the directory is empty.

#. Restart the :program:`mongod` instance on the member. Consider the
   following example:

   .. code-block:: sh

      mongod --dbpath /data/db/ --replSet rsProduction

MongoDB resyncs the member. Resyncing may take a long time, depending on
the size of the database and the speed of the network. Resyncing also
puts load on the member being synced from, which might then be unable
to keep its working set in memory.
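
The three steps above can be strung together in a short script. The
following is a minimal sketch under stated assumptions, not a supported
tool: the function names ``empty_data_dir`` and ``resync_member`` and the
``DBPATH`` and ``REPLSET`` variables are illustrative, the defaults only
repeat the example values used in the procedure, and
``find -mindepth 1 -delete`` assumes GNU or BSD ``find``.

.. code-block:: sh

   #!/bin/sh
   set -eu

   # Assumed defaults; they repeat the example values from the procedure.
   DBPATH="${DBPATH:-/data/db}"
   REPLSET="${REPLSET:-rsProduction}"

   # Delete everything inside the data directory (including dotfiles and
   # subdirectories) while keeping the directory itself.
   empty_data_dir() {
       dir="$1"
       case "$dir" in
           ""|"/") echo "refusing to empty '$dir'" >&2; return 1 ;;
       esac
       [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
       find "$dir" -mindepth 1 -delete
   }

   # Stop the member, empty its data directory, and start it again so
   # that MongoDB performs the automatic resync.
   resync_member() {
       mongod --dbpath "$DBPATH" --shutdown
       empty_data_dir "$DBPATH"
       mongod --dbpath "$DBPATH" --replSet "$REPLSET"
   }

The script only defines functions. To run the procedure, call
``resync_member`` on the stale member after checking that ``DBPATH``
points at that member's data directory and nothing else.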

.. index:: replica set; resync
.. _replica-set-resync-by-copying:

Resync by Copying Data from Another Member
``````````````````````````````````````````

This approach uses the data directory of an existing member to "seed"
the stale member. The data must be recent enough to allow the new member
to catch up with the :term:`oplog`.

To resync by copying data from another member, use one of the following
approaches:

- Create a snapshot of another member's data and then restore that
  snapshot to the stale member. Use the snapshot procedures in
  :doc:`/administration/backups`.

- Lock another member's data with the :method:`db.fsyncLock()`
  method, copy all of the data in the data directory, and then restore
  the data to the stale member. Use the procedures for backup storage in
  :doc:`/administration/backups`.
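
The lock-and-copy approach can be sketched as follows. This is an
illustration only, and it rests on several assumptions that the text
above does not state: the stale member's :program:`mongod` is already
stopped and its data directory is empty, the source member's data
directory is reachable as a local path (for example through a network
mount), and the ``mongo`` shell is on the ``PATH``. The names
``copy_data_dir``, ``seed_stale_member``, ``SOURCE_HOST``,
``SOURCE_DBPATH``, and ``STALE_DBPATH`` are hypothetical.

.. code-block:: sh

   #!/bin/sh
   set -eu

   # Hypothetical settings; adjust for the deployment.
   SOURCE_HOST="${SOURCE_HOST:-localhost}"
   SOURCE_DBPATH="${SOURCE_DBPATH:-/mnt/source/data/db}"
   STALE_DBPATH="${STALE_DBPATH:-/data/db}"

   # Copy the contents of one data directory into another, preserving
   # permissions and timestamps.
   copy_data_dir() {
       src="$1"
       dst="$2"
       [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
       mkdir -p "$dst"
       cp -Rp "$src"/. "$dst"/
   }

   # Lock the source member, copy its files, and unlock it afterwards,
   # even if the copy fails.
   seed_stale_member() {
       mongo --host "$SOURCE_HOST" --eval 'printjson(db.fsyncLock())'
       status=0
       copy_data_dir "$SOURCE_DBPATH" "$STALE_DBPATH" || status=$?
       mongo --host "$SOURCE_HOST" --eval 'printjson(db.fsyncUnlock())'
       return "$status"
   }

Keeping the unlock step outside the success path of the copy matters
because a locked member blocks writes until :method:`db.fsyncUnlock()`
runs.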

.. _replica-set-security:

Replica Set Security
26 changes: 13 additions & 13 deletions source/core/replication-internals.txt
@@ -4,9 +4,6 @@ Replication Internals

.. default-domain:: mongodb

Synopsis
--------

This document provides a more in-depth explanation of the internals and
operation of :term:`replica set` features. This material is not necessary for
normal operation or application development but may be useful for
@@ -77,11 +74,10 @@ the following collections:
.. _replica-set-oplog:
.. _replica-set-internals-oplog:

Oplog
-----
Oplog Internals
---------------

For an explanation of the oplog, see the :ref:`replica-set-oplog-sizing`
topic in the :doc:`/core/replication` document.
For an explanation of the oplog, see :ref:`replica-set-oplog-sizing`.

Under various exceptional
situations, updates to a :term:`secondary's <secondary>` oplog might
@@ -113,8 +109,8 @@ Data Integrity

.. index:: replica set; read preferences

Read Preferences
~~~~~~~~~~~~~~~~
Read Preference Internals
~~~~~~~~~~~~~~~~~~~~~~~~~

MongoDB uses :term:`single-master replication` to ensure that the
database remains consistent. However, clients may modify the
@@ -172,8 +168,8 @@ for your data set is crucial.

.. index:: replica set; security

Security
--------
Security Internals
------------------

Administrators of replica sets also have unique :ref:`monitoring
<replica-set-monitoring>` and :ref:`security <replica-set-security>`
@@ -188,8 +184,8 @@ modify the configuration of an existing replica set.
.. index:: replica set; failover
.. _replica-set-election-internals:

Elections
---------
Election Internals
------------------

Elections are the process :term:`replica set` members use to select which member should
become :term:`primary`. A primary is the only member in the replica
@@ -297,6 +293,8 @@ and a majority of servers in one data center and one server in another.

.. index:: replica set; sync

.. _replica-set-syncing:

Syncing
-------

@@ -327,3 +325,5 @@ For example:
alternate facility, and if you add another secondary to the alternate
facility, the new secondary will likely sync from the existing
secondary because it is closer than the primary.

.. seealso:: :ref:`replica-set-resync-stale-member`
10 changes: 7 additions & 3 deletions source/core/replication.txt
@@ -353,9 +353,13 @@ activity of your MongoDB-based application are reads and you are
writing a small amount of data, you may find that you need a much
smaller oplog.

For a further understanding of oplog behavior, see the
:ref:`replica-set-oplog` topic in the :doc:`/core/replication-internals`
document.
To view oplog status, including the size and the time range of
operations, issue the :method:`db.printReplicationInfo()` method. For
more information on oplog status, see
:ref:`replica-set-troubleshooting-check-oplog-size`.

For an advanced understanding of oplog behavior, see
:ref:`replica-set-oplog` and :ref:`replica-set-syncing`.
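
To check oplog status from the command line rather than from an
interactive session, the method can be passed to the ``mongo`` shell with
``--eval``. The wrapper below is a small sketch: the function name
``oplog_status``, the ``MONGO_SHELL`` variable, and the host name in the
usage note are hypothetical.

.. code-block:: sh

   #!/bin/sh
   set -eu

   # Shell binary to use; overridable so the function is easy to stub.
   MONGO_SHELL="${MONGO_SHELL:-mongo}"

   # Print the oplog size and time range reported by one member.
   oplog_status() {
       host="$1"
       "$MONGO_SHELL" --quiet --host "$host" --eval 'db.printReplicationInfo()'
   }

For example, ``oplog_status db1.example.net`` prints the report for that
member. Comparing the reported time range with how far a secondary lags
indicates whether the secondary can still catch up from the oplog.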

Replica Set Deployment
~~~~~~~~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions source/replication.txt
@@ -56,6 +56,7 @@ operations in detail:
tutorial/change-hostnames-in-a-replica-set
tutorial/convert-secondary-into-arbiter
tutorial/reconfigure-replica-set-with-unavailable-members
tutorial/recover-data-following-unexpected-shutdown

.. _replication-reference:

@@ -1,6 +1,6 @@
==================================================
Reconfigure a Replica Set with Unavailable Members
==================================================
===============================================
Reconfigure a Replica Set when Members are Down
===============================================

.. default-domain:: mongodb

@@ -23,9 +23,6 @@ members can reach a majority. See
:ref:`replica-set-elections-and-network-partitions` for more
information on this situation.

This document provides the following options for reconfiguring a replica
set when a **majority** of members are accessible:

.. index:: replica set; reconfiguration
.. _replica-set-force-reconfiguration:

17 changes: 9 additions & 8 deletions source/tutorial/recover-data-following-unexpected-shutdown.txt
@@ -9,21 +9,22 @@ representation of the data files will likely reflect an inconsistent
state which could lead to data corruption.

To prevent data inconsistency and corruption, always shut down the
database cleanly, and use the :ref:`durability journaling
database cleanly and use :ref:`durability journaling
<setting-journal>`. The journal writes data to disk every 100
milliseconds by default, and ensures that MongoDB will be able to
milliseconds by default and ensures that MongoDB can
recover to a consistent state even in the case of an unclean shutdown due to
power loss or other system failure.

If you are *not* running as part of a :term:`replica set` **and** do
*not* have journaling enabled use the following procedure to recover
*not* have journaling enabled, use the following procedure to recover
data that may be in an inconsistent state. If you are running as part
of a replica set, you should *always* restore from a backup or restart
the :program:`mongod` instance with an empty :setting:`dbpath` and
allow MongoDB to resync the data.

.. seealso:: The ":doc:`/administration`" documents and the
documentation of the :setting:`repair`, :setting:`repairpath`, and
.. seealso:: The :doc:`/administration` documents, including
:ref:`replica-set-syncing`, and the
documentation on the :setting:`repair`, :setting:`repairpath`, and
:setting:`journal` settings.

.. [#clean-shutdown] To ensure a clean shut down, use the
@@ -41,7 +42,7 @@ When you are aware of a :program:`mongod` instance running without
journaling that stops unexpectedly **and** you're not running with
replication, you should always run the repair operation before
starting MongoDB again. If you're using replication, then restore from
a backup and allow replication to synchronize your data.
a backup and allow replication to :ref:`synchronize <replica-set-syncing>` your data.

If the ``mongod.lock`` file in the data directory specified by
:setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file,
@@ -72,7 +73,7 @@ Overview

Do not use this procedure to recover a member of a :term:`replica set`.
Instead you should either restore from a :doc:`backup </administration/backups>`
or re-sync from an intact member of the set.
or resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.

There are two processes to repair data files that result from an
unexpected shutdown:
@@ -171,4 +172,4 @@ If you are not running with journaling, and your database shuts down
unexpectedly for *any* reason, you should always proceed *as if* your database
is in an inconsistent and likely corrupt state. If at all possible restore
from :doc:`backup </administration/backups>` or if running as a :term:`replica
set` re-sync from an intact member of the set.
set` resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.