1862 refactor data recovery tutorial take 2 #1520

Closed
wants to merge 9 commits into from
98 changes: 98 additions & 0 deletions source/includes/steps-recover-data-files.yaml
@@ -0,0 +1,98 @@
ref: recover-data-files-run-mongodump-once-per-db
stepnum: 1
title: Run ``mongodump`` once for each database to recover
action:
- pre: |
If the database used the :option:`--directoryperdb` option, run the
following command from the system shell prompt:
language: sh
code: |
mongodump --journal --dbpath /data/db --directoryperdb --repair -d users -o /data/recovery > /data/recovery/users.log
- pre: |
Otherwise omit the :option:`--directoryperdb` option:
language: sh
code: |
mongodump --journal --dbpath /data/db --repair -d users -o /data/recovery > /data/recovery/users.log
---
ref: recover-data-files-verify-recovered-files-exist
stepnum: 2
title: Verify the new files contain recovered documents
pre: |
Examine ``/data/recovery/users.log`` to determine how many documents
:program:`mongodump` recovered.
---
ref: recover-data-files-create-new-mongodb-node
stepnum: 3
title: Create MongoDB node with ``mongorestore``
pre: |
In this example the new data directory is ``/data/db2``.
action:
language: sh
code: |
mongorestore --dbpath /data/db2 /data/recovery
---
ref: recover-data-files-test-data-files
stepnum: 4
title: Test the data files on a standalone ``mongod``
action:
- pre: |
Start a standalone :program:`mongod` instance on the restored data directory:
language: sh
code: |
mongod --dbpath /data/db2
- pre: |
If the repair has removed data, the number of documents in the
collection will be lower than it had been previously. From the
:program:`mongo` shell, verify the number of documents in each collection:
language: javascript
code: |
use users
db.collection.count()
post: |
Perform other application-specific tests in a staging environment as
needed. If the data files are correct, delete or archive the
``/data/recovery`` directory, and *do not proceed with any further
recovery efforts*.
---
ref: recover-data-files-use-repair-option-and-repairpath
stepnum: 5
title: Use ``--repair`` and ``--repairpath``
pre: |
If :program:`mongodump` failed to recover the data files, use
:program:`mongod` with the :option:`--repair <mongod --repair>` and
:option:`--repairpath <mongod --repairpath>` options to create a new
data directory with a repaired set of data files. Specify a new
directory to receive the repaired data files:
action:
language: sh
code: |
mongod --dbpath /data/db --repair --repairpath /data/recovery
post: |
When the :option:`--repair <mongod --repair>` operation completes
successfully, the newly-repaired data files are in the new directory.

.. warning::

:option:`--repair <mongod --repair>` removes the invalid parts of
data files. *You can lose data as part of the recovery process.*
Under some circumstances, :option:`--repair <mongod --repair>`
may remove the majority of data in the data file. Without the
:option:`--repairpath <mongod --repairpath>` option, the new
data files permanently overwrite the old.
---
ref: recover-data-files-test-repaired-data-files
stepnum: 6
title: Test the data files
pre: |
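Test the repaired data files using the procedure outlined in step 4,
starting :program:`mongod` with the directory specified to
:option:`--repairpath <mongod --repairpath>`.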
---
ref: recover-data-files-use-files-normally
stepnum: 7
title: Use the recovered files normally
pre: |
Start :program:`mongod` with :setting:`dbpath` pointing to the new directory:
action:
language: sh
code: |
mongod --dbpath <newpath>
...
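
The per-database invocation in step 1 can be sketched as a small shell helper. This is a minimal sketch and not part of the PR: the function name `print_dump_commands` and its argument order are hypothetical, and it only prints the commands (a dry run) so they can be reviewed before anything touches the data files. The flags are the ones used in step 1.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one "mongodump --repair" command
# per database, mirroring the two variants shown in step 1.
# Arguments: <dbpath> <output dir> <yes|no: --directoryperdb> <db name>...
print_dump_commands() {
    dbpath=$1
    outdir=$2
    per_db=$3
    shift 3
    for db in "$@"; do
        if [ "$per_db" = "yes" ]; then
            extra=" --directoryperdb"
        else
            extra=""
        fi
        # Each database gets its own log file under the output directory.
        printf 'mongodump --journal --dbpath %s%s --repair -d %s -o %s > %s/%s.log\n' \
            "$dbpath" "$extra" "$db" "$outdir" "$outdir" "$db"
    done
}

# Example (prints two commands; "orders" is an illustrative database name):
#   print_dump_commands /data/db /data/recovery no users orders
```

Printing rather than executing keeps the sketch safe to try; the output directory must already exist before the printed commands are run, because the shell opens the log file before `mongodump` starts.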

4 changes: 2 additions & 2 deletions source/includes/toc-administration-backup-and-recovery.yaml
@@ -25,8 +25,8 @@ description: |
 Copy databases between :program:`mongod` instances or
 within a single :program:`mongod` instance or deployment.
 ---
-file: /tutorial/recover-data-following-unexpected-shutdown
+file: /tutorial/detect-invalid-data-files
 description: |
 Recover data from MongoDB data files that were not properly closed
-or are in an inconsistent state.
+or have an invalid state.
 ...
@@ -16,7 +16,7 @@ files:
 level: 2
 - file: /administration/backup-sharded-clusters
 level: 2
-- file: /tutorial/recover-data-following-unexpected-shutdown
+- file: /tutorial/detect-invalid-data-files
 level: 2
 - file: /administration/scripting
 level: 1
5 changes: 2 additions & 3 deletions source/tutorial.txt
@@ -57,7 +57,7 @@ Replica Sets
 - :doc:`/tutorial/configure-replica-set-tag-sets`
 - :doc:`/tutorial/manage-chained-replication`
 - :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members`
-- :doc:`/tutorial/recover-data-following-unexpected-shutdown`
+- :doc:`/tutorial/detect-invalid-data-files`
 - :doc:`/tutorial/troubleshoot-replica-sets`

 Sharding
@@ -85,7 +85,7 @@ Basic Operations
 ~~~~~~~~~~~~~~~~

 - :doc:`/tutorial/use-database-commands`
-- :doc:`/tutorial/recover-data-following-unexpected-shutdown`
+- :doc:`/tutorial/detect-invalid-data-files`
 - :doc:`/tutorial/copy-databases-between-instances`
 - :doc:`/tutorial/expire-data`
 - :doc:`/tutorial/manage-the-database-profiler`
@@ -104,7 +104,6 @@ Security
 - :doc:`/tutorial/add-user-administrator`
 - :doc:`/tutorial/add-user-to-database`
 - :doc:`/tutorial/define-roles`
-- :doc:`/tutorial/change-user-privileges`
 - :doc:`/tutorial/view-roles`
 - :doc:`/tutorial/generate-key-file`
 - :doc:`/tutorial/control-access-to-mongodb-with-kerberos-authentication`
113 changes: 113 additions & 0 deletions source/tutorial/detect-invalid-data-files.txt
@@ -0,0 +1,113 @@
==========================
Detect Invalid Data Files
==========================

.. default-domain:: mongodb

.. contents::
   :backlinks: none
   :local:

Overview
--------

Any deployment may suffer hardware failure, power failure, networking
failure, or some other interruption that may damage data files. MongoDB
provides a range of features, including :term:`replica sets <replica
set>` and :term:`journaling <journal>`, to make recovery from those events
quick and complete.

If you are *not* running a replica set, it may not be possible to
recover all the data stored in damaged data files. Even so, you can
remove the damaged portions of the data files so that the remaining
data can support application queries.

If you are *not* running a replica set and suspect that some of your
data files are invalid, use the procedures described here and
in :doc:`/tutorial/recover-data` to recover as much data as possible.

The best way to avoid data loss and ensure the most
robust deployments is to follow the recommendations in
:doc:`maintain-valid-data-files`.

See also
:doc:`/core/backups` and
:doc:`/administration/backup` for more information on preventing data loss. Also see
:doc:`/core/replication`,
:doc:`/core/journaling`, and
:doc:`/tutorial/manage-journaling`.

Considerations
--------------

Data recovery on a single unjournaled :program:`mongod` instance
is more difficult than data recovery on a journaled replica set,
and may recover less data.

Procedure
---------

Select the procedure that matches the :program:`mongod` configuration
that used the data files you want to recover:

With no Journal Enabled
~~~~~~~~~~~~~~~~~~~~~~~

When a :program:`mongod` instance does not run with journaling enabled
and shuts down uncleanly, you must assume the data files are in an
invalid state.

To confirm that a :program:`mongod` instance shut down uncleanly, look for the
following indicators:

- a non-zero-length ``mongod.lock`` file in the data directory.

- the following line in the :program:`mongod` log output:

  .. code-block:: none

     Unclean shutdown detected.
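
Both indicators can be checked from the system shell. The following is a minimal sketch under stated assumptions: the function name `unclean_shutdown_suspected` is hypothetical, and the data directory and log file locations are passed in as arguments because they depend on how :program:`mongod` was started.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if either indicator of an unclean shutdown
# is present, 1 otherwise.
# Arguments: <data directory> <mongod log file>
unclean_shutdown_suspected() {
    dbpath=$1
    logfile=$2
    # "-s" is true only when the file exists and its size is greater than zero.
    if [ -s "$dbpath/mongod.lock" ]; then
        return 0
    fi
    # Look for the log line quoted above.
    if [ -f "$logfile" ] && grep -q 'Unclean shutdown detected' "$logfile"; then
        return 0
    fi
    return 1
}

# Example (paths are illustrative):
#   if unclean_shutdown_suspected /data/db /var/log/mongodb/mongod.log; then
#       echo "assume the data files are invalid"
#   fi
```

Run the check only while :program:`mongod` is stopped: a running instance also holds a non-empty ``mongod.lock`` file.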

With a Journal Enabled
~~~~~~~~~~~~~~~~~~~~~~

When a :program:`mongod` instance runs with journaling enabled
and shuts down uncleanly, or if you suspect invalid data
files, test the integrity of any single collection with the
:method:`db.collection.validate()` method.

Test the integrity of the ``test`` collection using the following
command from the :program:`mongo` shell:

.. code-block:: javascript

   db.test.validate(true)

A portion of the output shows that the ``test`` collection is valid:

.. code-block:: javascript

   {
     ...
     "valid" : true,
     "errors" : [ ],
     "ok" : 1
   }

If the collection is invalid, the output of
:method:`db.collection.validate()` shows that as well:

.. code-block:: javascript

   {
     ...
     "valid" : false,
     "errors" : [
       "invalid bson object detected (see logs for more info)",
       "exception during validate"
     ],
     "advice" : "ns corrupt, requires repair",
     "ok" : 1
   }
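
When checking many collections, it can help to save the validation output and test the ``valid`` field from a script. This is a sketch, assuming the output was written to a file first; the function name `validate_output_is_valid` and the file paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when saved validate() output reports
# "valid" : true, and non-zero otherwise.
# Argument: <file containing the validate() output>
validate_output_is_valid() {
    # Tolerate any amount of whitespace around the colon.
    grep -Eq '"valid"[[:space:]]*:[[:space:]]*true' "$1"
}

# Example (database name and output path are illustrative):
#   mongo <database> --quiet --eval 'printjson(db.test.validate(true))' > /tmp/validate.json
#   validate_output_is_valid /tmp/validate.json || echo "test collection needs repair"
```

The check is a plain text match rather than a JSON parse, so treat it as a quick triage step and read the ``errors`` and ``advice`` fields before deciding how to proceed.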

68 changes: 68 additions & 0 deletions source/tutorial/maintain-valid-data-files.txt
@@ -0,0 +1,68 @@
=========================
Maintain Valid Data Files
=========================

.. default-domain:: mongodb

.. contents::
   :backlinks: none
   :local:

Overview
--------

Any deployment may suffer hardware failure, power failure, networking
failure, or some other interruption that may damage data files. MongoDB
provides a range of features, including :term:`replica sets <replica
set>` and :term:`journaling <journal>`, to make recovery from those events
quick and complete.

Use the following recommendations to ensure that data is routinely
copied to multiple servers and that damaged servers can recover
quickly. Protecting your data in this way makes recovery possible
after any unforeseen event.

See also
:doc:`/core/backups` and
:doc:`/administration/backup` for more information on preventing data loss. Also see
:doc:`/core/replication`,
:doc:`/core/journaling`, and
:doc:`/tutorial/manage-journaling` for information on how to set up
a robust deployment.

Recommendations
---------------

Use Journaling
~~~~~~~~~~~~~~

Always use :ref:`durability journaling <setting-journal>`. The journal
stores recent data changes so that :program:`mongod` can recover after
an unclean shutdown. By default, MongoDB updates its journal ten
times per second, so in the worst case, with journaling enabled, only
about ``1/10`` of a second of writes may be lost.

If a :program:`mongod` instance without journaling shuts down
unexpectedly for *any* reason, always assume that your database is
in an invalid state.

Run all Deployments as Replica Sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Recovery is much simpler if the :program:`mongod`
instance runs as a member of a replica set. The primary goal of
replica sets in MongoDB is to prevent data loss and ensure availability. If
data files become invalid, recovery may be as simple as syncing
from another replica set member.

Shut down Cleanly
~~~~~~~~~~~~~~~~~

A clean shutdown means that all ongoing MongoDB operations are
complete, and :program:`mongod` has flushed and closed all data files.

An unclean shutdown can leave the database in an invalid state.

To ensure a clean shutdown, use one of the shutdown procedures
described in :doc:`/tutorial/manage-mongodb-processes`.
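
As one illustration of a clean shutdown from the system shell, the sketch below sends ``SIGTERM`` to a process and waits for it to exit. It is a minimal sketch, not the procedure from the referenced tutorial: the function name `stop_cleanly` and the default timeout are assumptions. It never falls back to ``SIGKILL`` (``kill -9``), because a killed :program:`mongod` cannot flush and close its data files.

```shell
#!/bin/sh
# Hypothetical helper: ask a process to exit with SIGTERM and wait for it.
# Arguments: <pid> [timeout in seconds, default 60]
# Returns 0 once the process is gone, 1 if it is still running at timeout.
stop_cleanly() {
    pid=$1
    timeout=${2:-60}
    # If the signal cannot be delivered, the process has already exited.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # "kill -0" sends no signal; it only tests whether the pid exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# Example (the pid file location is illustrative):
#   stop_cleanly "$(cat /var/run/mongodb/mongod.pid)" 120 || echo "mongod is still shutting down"
```

If the function returns 1, keep waiting rather than escalating to a stronger signal.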

23 changes: 9 additions & 14 deletions source/tutorial/manage-journaling.txt
@@ -10,20 +10,15 @@ and to provide crash resiliency. Before applying a change to the data
 files, MongoDB writes the change operation to the journal. If MongoDB
 should terminate or encounter an error before it can write the changes
 from the journal to the data files, MongoDB can re-apply the write
-operation and maintain a consistent state.
+operation and maintain a valid state.

-*Without* a journal, if :program:`mongod` exits unexpectedly, you must
-assume your data is in an inconsistent state, and you must run either
-:doc:`repair </tutorial/recover-data-following-unexpected-shutdown>`
-or, preferably, :doc:`resync </tutorial/resync-replica-set-member>`
-from a clean member of the replica set.
+Without a journal, if :program:`mongod` exits unexpectedly, you must
+assume your data is in an invalid state, and follow the recommendations
+in :doc:`/tutorial/detect-invalid-data-files`.

-With journaling enabled, if :program:`mongod` stops unexpectedly,
-the program can recover everything written to the journal, and the
-data remains in a consistent state. By default, the greatest extent of lost
-writes, i.e., those not made to the journal, are those made in the last
-100 milliseconds. See :setting:`journalCommitInterval` for more
-information on the default.
+By default, the greatest extent of lost writes, i.e., those not made
+to the journal, are those made in the last 100 milliseconds. See
+:setting:`journalCommitInterval` for more information on the default.

 With journaling, if you want a data set to reside entirely in RAM, you
 need enough RAM to hold the data set plus the "write working set." The
@@ -63,10 +58,10 @@ Disable Journaling

 Do not disable journaling on production systems. If your
 :program:`mongod` instance stops without shutting down cleanly
-unexpectedly for any reason, (e.g. power failure) and you are
+for any reason, (e.g. power failure) and you are
 not running with journaling, then you must recover from an
 unaffected :term:`replica set` member or backup, as described in
-:doc:`repair </tutorial/recover-data-following-unexpected-shutdown>`.
+:doc:`/tutorial/resync-replica-set-member`.

 To disable journaling, start :program:`mongod` with the
 :option:`--nojournal <mongod --nojournal>` command line option.