DOCS-814 journaling #470

Merged
merged 8 commits into from
Dec 19, 2012
292 changes: 292 additions & 0 deletions source/administration/journaling.txt
@@ -0,0 +1,292 @@
==========
Journaling
==========

.. default-domain:: mongodb

:term:`Journaling <journal>` ensures durability of data by storing
:doc:`write operations </core/write-operations>` in an on-disk
journal prior to applying them to the data files. The journal
ensures write operations can be re-applied in the event of a crash.
Journaling is also referred to as "write ahead logging."

Journaling ensures that :program:`mongod` is crash resilient. *Without*
a journal, if :program:`mongod` exits unexpectedly, you must assume
your data is in an inconsistent state, and you must either run
:doc:`repair </tutorial/recover-data-following-unexpected-shutdown>`
or :ref:`resync <replica-set-resync-stale-member>` from a clean member
of the replica set.

With journaling, if :program:`mongod` stops unexpectedly, the program
can recover everything written to the journal, and the data is in a
consistent state. By default, the only writes that can be lost are
those not yet written to the journal, which are at most the writes made
in the last 100 milliseconds.

With journaling, if you want a data set to reside entirely in RAM, you
need enough RAM to hold the dataset plus the "write working set." The
"write working set" is the amount of unique data you expect to see
written between re-mappings of the private view. For information on
views, see :ref:`journaling-storage-views`.
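
As a rough illustration of the sizing rule above, the following Python
sketch adds the size of a data set to an estimated write working set.
The function name and the example figures are hypothetical; measure
your own workload before relying on any number.

.. code-block:: python

   def ram_needed_bytes(dataset_bytes, write_working_set_bytes):
       """Return the RAM needed to keep a journaled data set fully in memory."""
       if dataset_bytes < 0 or write_working_set_bytes < 0:
           raise ValueError("sizes must be non-negative")
       # Rule from the text: data set plus the write working set.
       return dataset_bytes + write_working_set_bytes

   GIB = 1024 ** 3
   MIB = 1024 ** 2

   # Hypothetical figures: a 10 GiB data set and 512 MiB of unique data
   # written between re-mappings of the private view.
   required = ram_needed_bytes(10 * GIB, 512 * MIB)
   print(required / GIB)  # 10.5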

.. versionchanged:: 2.0
   Journaling is enabled by default for 64-bit platforms.
   For other platforms, see :setting:`journal`.

Contributor

journal need not be literalized.

journaling also ensures that mongodb is crash resistent: without a journal, if mongodb exits unexpectedly, then operators must assume that the data are in an inconsistent state and should resync from a clean secondary.

If we don't make this clear, it's possible that people won't respect or value the importance of journaling.

Author

Crash resistent or crash resilient?

What operators?

Are you saying that if journaling is enabled and the primary in a replica set crashes, that the secondaries don't need to resync from a clean secondary?

Contributor

resilient.

operators = administrators/users (this is, admittedly, a somewhat arcane use of the term, sorry for the confusion)

the longer story is:

  • without journaling, if you shut down uncleanly (i.e. by sending kill -9 mongod, or if it encounters an error and bails out, or there's power loss) then the data is almost certainly corrupt in some way. So you either have to run repair (which just throws away invalid BSON in the database), or you have to resync from a clean member of the set (copy the data or just use initial sync) to ensure that the data is coherent.
  • with journaling, if mongod stops, it can recover everything that it wrote to the journal (which, by default, is everything except at most the last 100ms of data), and the data files will be in a consistent state after it finishes playing back the journal, without need for resync (unless, of course, the secondary has fallen off the back edge of the oplog, which is an unrelated issue that doesn't need to be documented here).

Configuration and Setup
-----------------------

Enable Journaling
~~~~~~~~~~~~~~~~~

.. versionchanged:: 2.0
   Journaling is enabled by default for 64-bit platforms.

To enable journaling, start :program:`mongod` with the
:option:`--journal` command line option.

If the :program:`mongod` process preallocates the journal files, the
process delays listening on port 27017 until preallocation completes,
which can take a few minutes. Your applications and the shell will not
be able to connect to the database until preallocation completes.

Disable Journaling
~~~~~~~~~~~~~~~~~~

.. warning::

   Do not disable journaling on production systems. If your MongoDB
   system stops unexpectedly from a power failure or other condition,
   and if you are not running with journaling, then you must recover
   from an unaffected :term:`replica set` member or backup, as described
   in :doc:`repair
   </tutorial/recover-data-following-unexpected-shutdown>`.

To disable journaling, shut down :program:`mongod` cleanly and restart
it with the :option:`--nojournal <mongod --nojournal>` command line
option.

Get Commit Acknowledgement
~~~~~~~~~~~~~~~~~~~~~~~~~~

You can get commit acknowledgement with the
:dbcommand:`getLastError` command and the ``j`` option. For details, see
:ref:`write-concern-operation`.

.. _journaling-avoid-preallocation-lag:

Avoid Preallocation Lag
~~~~~~~~~~~~~~~~~~~~~~~

To avoid :ref:`preallocation lag <journaling-journal-files>`, you can
preallocate files in the journal directory by copying them from another
instance of :program:`mongod`.

Preallocated files do not contain data. It is safe to later remove them.
But if you restart :program:`mongod` with journaling, :program:`mongod`
will create them again.

.. example:: The following sequence preallocates journal files for an
   instance of :program:`mongod` running on port ``27017`` with a database
   path of ``/data/db``.

   For demonstration purposes, the sequence starts by creating a set of
   journal files in the usual way.

   1. Create a temporary directory into which to create a set of journal
      files:

      .. code-block:: sh

         mkdir ~/tmpDbpath

   #. Create a set of journal files by starting a :program:`mongod`
      instance that uses the temporary directory:

      .. code-block:: sh

         mongod --port 10000 --dbpath ~/tmpDbpath --journal

   #. When you see the following log output, indicating :program:`mongod`
      has the files, press CONTROL+C to stop the :program:`mongod`
      instance:

      .. code-block:: sh

         web admin interface listening on port 11000

   #. Preallocate journal files for the new instance of
      :program:`mongod` by moving the journal files from the data directory
      of the existing instance to the data directory of the new instance:

      .. code-block:: sh

         mv ~/tmpDbpath/journal /data/db/

   #. Start the new :program:`mongod` instance:

      .. code-block:: sh

         mongod --port 27017 --dbpath /data/db --journal

Contributor

need appropriate warning.

Author

Sam, What warning is needed here?

Contributor

On Monday, December 17 2012, 15:57:50, Bob Grabar wrote:

+Beginning with version 2.0, journaling is enabled by default for 64-bit
+platforms.
+
+To enable journaling, start :program:mongod with the
+:option:--journal command line option.
+
+If :program:mongod decides to preallocate the files, it will not start
+listening on port 27017 until this process completes, which can take a
+few minutes. This means that your applications and the shell will not be
+able to connect to the database immediately on initial startup. Check
+the logs to see if MongoDB is busy preallocating.
+
+Disable Journaling
+~~~~~~~~~~~~~~~~~~
+

Sam, What warning is needed here?

"Do not disable journaling on production systems. If your MongoDB system
stops unexpectedly, as the result of a system error, power failure, or
other condition and you are not running with journaling; you must
recover from backups or re-sync from an unaffected replica set member."
(Link: recovering from unexpected shutdown.)

Author

Done.

Monitor Journal Status
~~~~~~~~~~~~~~~~~~~~~~

Use the following commands and methods to monitor journal status:

- :dbcommand:`serverStatus`

  The :dbcommand:`serverStatus` command returns database status
  information that is useful for assessing performance.

- :dbcommand:`journalLatencyTest`

  Use :dbcommand:`journalLatencyTest` to measure how long it takes on
  your volume to write to the disk in an append-only fashion. You can
  run this command on an idle system to get a baseline sync time for
  journaling. You can also run this command on a busy system to see the
  sync time under load, which may be higher if the journal directory is
  on the same volume as the data files.

  The :dbcommand:`journalLatencyTest` command also provides a way to
  check if your disk drive is buffering writes in its local cache. If
  the number is very low (i.e., less than 2 milliseconds) and the drive
  is non-SSD, the drive is probably buffering writes. In that case,
  enable cache write-through for the device in your operating system,
  unless you have a disk controller card with battery backed RAM.
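
The rule of thumb above for spotting a buffering drive can be written
as a small check. This is only a sketch: the function and constant
names are hypothetical, and you must read the sync time from the
:dbcommand:`journalLatencyTest` output yourself, because the sketch
does not parse that output.

.. code-block:: python

   # Threshold taken from the text above: under 2 milliseconds is "very low".
   BUFFERING_THRESHOLD_MS = 2.0

   def probably_buffering(sync_ms, is_ssd):
       """Return True if a sync time suggests the drive caches writes.

       sync_ms is a sync time in milliseconds that you read from the
       journalLatencyTest output; is_ssd says whether the drive is an SSD.
       """
       return (not is_ssd) and sync_ms < BUFFERING_THRESHOLD_MS

   print(probably_buffering(0.5, is_ssd=False))  # True
   print(probably_buffering(0.5, is_ssd=True))   # False
   print(probably_buffering(8.0, is_ssd=False))  # False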

.. _journaling-journal-commit-interval:

Change the Group Commit Interval
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. versionchanged:: 2.0

You can set the group commit interval using the
:option:`--journalCommitInterval <mongod --journalCommitInterval>`
command line option. The allowed range is ``2`` to ``300`` milliseconds.

Lower values increase the durability of the journal at the expense of
disk performance.
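
If you generate the :program:`mongod` command line from a script, you
can validate the interval before starting the process. The following
Python sketch assumes a hypothetical helper named ``mongod_args``. It
only builds the argument list from options named in this document and
does not start :program:`mongod`.

.. code-block:: python

   def mongod_args(dbpath, commit_interval_ms):
       """Build a mongod argument list with a validated group commit interval."""
       # Allowed range from the text above: 2 to 300 milliseconds.
       if not 2 <= commit_interval_ms <= 300:
           raise ValueError("journalCommitInterval must be between 2 and 300 ms")
       return [
           "mongod",
           "--dbpath", dbpath,
           "--journal",
           "--journalCommitInterval", str(commit_interval_ms),
       ]

   print(" ".join(mongod_args("/data/db", 50)))
   # mongod --dbpath /data/db --journal --journalCommitInterval 50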

Recover Data After Unexpected Shutdown
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On a restart after a crash, MongoDB replays the journal files in the
journal directory before the server goes online. The log output shows
when the replay occurs. You do not need to run a repair.

Journaling Internals
--------------------

When running with journaling, MongoDB stores and applies :doc:`write
operations </core/write-operations>` in memory and in the journal before
the changes are in the data files.

.. _journaling-journal-files:

Journal Files
~~~~~~~~~~~~~

With journaling enabled, MongoDB creates a journal directory within
your database directory. The journal directory holds journal files,
which contain write-ahead redo logs. The directory also holds a
last-sequence-number file. A clean shutdown removes all the files in the
journal directory.

Journal files are append-only files and are named with the ``j._``
prefix. When a journal file reaches 1 gigabyte, MongoDB creates a new
file. MongoDB automatically deletes files that are no longer needed.
Unless you write many bytes of data per second, the journal directory
should contain only two or three journal files.

To limit the size of journal files to 128 megabytes per file, use the
``--smallfiles`` command line option when starting :program:`mongod`.
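
To check how many journal files exist and how large they are, you can
list the journal directory. The following Python sketch assumes a
hypothetical helper named ``journal_files``. It relies only on the
directory layout and the ``j._`` prefix described above.

.. code-block:: python

   import os

   def journal_files(dbpath):
       """Return (name, size_in_bytes) for each journal file under dbpath."""
       journal_dir = os.path.join(dbpath, "journal")
       entries = []
       for name in sorted(os.listdir(journal_dir)):
           # Skip other files, such as the last-sequence-number file.
           if name.startswith("j._"):
               path = os.path.join(journal_dir, name)
               entries.append((name, os.path.getsize(path)))
       return entries

For example, calling ``journal_files("/data/db")`` on a running
instance should normally return two or three entries.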

To speed the frequent sequential writes that occur to the current
journal file, you can ensure that the journal directory is on a
different filesystem from the data files. However, doing so prevents
use of a snapshotting filesystem to take backups.

Depending on your file system, you might experience a preallocation lag
the first time you start a :program:`mongod` instance with journaling
enabled. MongoDB preallocates journal files if it is faster on your file
system to predefine file size. Preallocation lag might last several
minutes, during which you will not be able to connect to the database.
This is a one-time preallocation and does not occur with future
invocations.

To avoid preallocation lag, see
:ref:`journaling-avoid-preallocation-lag`.

.. _journaling-storage-views:

Storage Views used in Journaling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Journaling adds three storage views to MongoDB.

The ``shared view`` stores modified data for upload to the MongoDB
data files. The ``shared view`` is the only view with direct access
to the MongoDB data files. When running with journaling, :program:`mongod`
asks the operating system to map your existing on-disk data files to the
``shared view`` memory view. The operating system maps the files but
does not load them. MongoDB later loads data files to ``shared view`` as
needed.

The ``private view`` stores data for use in :doc:`read operations
</core/read-operations>`. The ``private view`` is mapped to the ``shared view``
and is the first place MongoDB applies new :doc:`write operations
</core/write-operations>`.

The journal is an on-disk view that stores new write operations
after they have been applied to the ``private view`` but before they
have been applied to the data files. The journal provides durability.
If the :program:`mongod` instance were to crash without having applied
the writes to the data files, the journal could replay the writes to
the ``shared view`` for eventual upload to the data files.

.. _journaling-record-write-operation:

How Journaling Records Write Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MongoDB copies the write operations to the journal in batches called
group commits. By default, MongoDB performs a group commit every 100
milliseconds, which means a series of operations over 100 milliseconds
are committed as a single batch. This is done to improve performance.

Journaling stores raw operations that allow MongoDB to reconstruct the
following:

- document insertion/updates
- index modifications
- changes to the namespace files

As :doc:`write operations </core/write-operations>` occur, MongoDB
writes the data to the ``private view`` in RAM and then copies the write
operations in batches to the journal. The journal stores the operations
on disk to ensure durability. MongoDB adds the operations as entries on
the journal's forward pointer. Each entry describes which bytes the
write operation changed in the data files.

MongoDB next applies the journal's write operations to the ``shared
view``. At this point, the ``shared view`` becomes inconsistent with the
data files.

At default intervals of 60 seconds, MongoDB asks the operating system to
flush the ``shared view`` to disk. This brings the data files up-to-date
with the latest write operations.

When write operations are flushed to the data files, MongoDB removes the
write operations from the journal's behind pointer. The behind pointer
always trails the forward pointer.

As part of journaling, MongoDB routinely asks the operating system to
remap the ``shared view`` to the ``private view``, for consistency.

.. note:: The interaction between the ``shared view`` and the on-disk
   data files is similar to how MongoDB works *without* journaling:
   MongoDB asks the operating system to flush in-memory changes back to
   the data files every 60 seconds.
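
The write path described in this section can be sketched as a toy
model. The following Python class is an illustration under simplifying
assumptions, not MongoDB's implementation: the class and method names
are hypothetical, dictionaries stand in for the memory-mapped views and
data files, and you call the group commit and the flush by hand instead
of running them on timers.

.. code-block:: python

   class ToyJournaledStore:
       """Toy model of journaled writes; not MongoDB's implementation."""

       def __init__(self):
           self.data_files = {}    # stands in for the on-disk data files
           self.journal = []       # stands in for the on-disk journal
           self.shared_view = {}   # in-memory view flushed to the data files
           self.private_view = {}  # in-memory view where writes land first
           self.pending = []       # writes not yet group-committed

       def write(self, key, value):
           # A write is applied to the private view first.
           self.private_view[key] = value
           self.pending.append((key, value))

       def group_commit(self):
           # Copy the batch to the journal, then apply it to the shared view.
           self.journal.extend(self.pending)
           for key, value in self.pending:
               self.shared_view[key] = value
           self.pending = []

       def flush(self):
           # Flush the shared view to the data files and drop the journal
           # entries that the data files now contain.
           self.data_files = dict(self.shared_view)
           self.journal = []

       def crash_and_recover(self):
           # A crash loses everything in memory; only disk state survives.
           self.pending = []
           self.shared_view = dict(self.data_files)
           # Replay the journal on top of the data files.
           for key, value in self.journal:
               self.shared_view[key] = value
           self.private_view = dict(self.shared_view)

   store = ToyJournaledStore()
   store.write("a", 1)
   store.group_commit()      # "a" is now in the journal
   store.write("b", 2)       # "b" is not yet in the journal
   store.crash_and_recover()
   print(store.shared_view)  # {'a': 1}

In this model, only the write made after the last group commit is lost,
which mirrors the default 100 millisecond window described above.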