DOCS-814 journaling #470
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,292 @@ | ||
========== | ||
Journaling | ||
========== | ||
|
||
.. default-domain:: mongodb | ||
|
||
:term:`Journaling <journal>` ensures durability of data by storing | ||
:doc:`write operations </core/write-operations>` in an on-disk | ||
journal prior to applying them to the data files. The journal | ||
ensures write operations can be re-applied in the event of a crash. | ||
Journaling is also referred to as "write ahead logging." | ||
|
||
Journaling ensures that :program:`mongod` is crash resilient. *Without* | ||
a journal, if :program:`mongod` exits unexpectedly, you must assume | ||
your data is in an inconsistent state and must either run | ||
:doc:`repair </tutorial/recover-data-following-unexpected-shutdown>` | ||
or :ref:`resync <replica-set-resync-stale-member>` from a clean member | ||
of the replica set. | ||
|
||
With journaling, if :program:`mongod` stops unexpectedly, the program | ||
can recover everything written to the journal, and the data is in a | ||
consistent state. By default, the most you can lose, i.e., writes not | ||
yet made to the journal, is the last 100 milliseconds of write | ||
operations. | ||
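
As a rough illustration of that bound, the following Python sketch models group commits at a fixed interval and lists the writes a crash would lose. It is a toy model, not MongoDB code: the function name ``lost_writes`` and the timestamps are hypothetical, and it assumes commits land exactly on multiples of the interval.

```python
def lost_writes(write_times_ms, crash_time_ms, commit_interval_ms=100):
    """Return the write timestamps that never reached the journal.

    Assumption (toy model): a group commit happens at every multiple of
    commit_interval_ms and covers all writes up to and including that time.
    """
    # Time of the last group commit that completed before the crash.
    last_commit_ms = (crash_time_ms // commit_interval_ms) * commit_interval_ms
    # Writes after the last commit, but before the crash, are lost.
    return [t for t in write_times_ms if last_commit_ms < t <= crash_time_ms]


writes = [10, 95, 100, 150, 230, 260]   # hypothetical write times in ms
lost = lost_writes(writes, crash_time_ms=250)
print(lost)
```

In this model every lost write is younger than one commit interval, which is the sense in which the default exposure is at most 100 milliseconds.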
|
||
With journaling, if you want a data set to reside entirely in RAM, you | ||
need enough RAM to hold the dataset plus the "write working set." The | ||
"write working set" is the amount of unique data you expect to see | ||
written between re-mappings of the private view. For information on | ||
views, see :ref:`journaling-storage-views`. | ||
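
The sizing rule above is simple addition. The sketch below works one example; the sizes are made-up values chosen only for illustration, and ``ram_needed_bytes`` is a hypothetical helper, not a MongoDB tool.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def ram_needed_bytes(dataset_bytes, write_working_set_bytes):
    """RAM needed to keep the data set resident when journaling is on."""
    return dataset_bytes + write_working_set_bytes


# Hypothetical figures: a 20 GiB data set and a 2 GiB write working set.
needed = ram_needed_bytes(20 * GIB, 2 * GIB)
print(needed / GIB, "GiB")
```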
|
||
.. versionchanged:: 2.0 | ||
Journaling is enabled by default for 64-bit platforms. | ||
For other platforms, see :setting:`journal`. | ||
|
||
Configuration and Setup | ||
----------------------- | ||
|
||
Enable Journaling | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
.. versionchanged:: 2.0 | ||
Journaling is enabled by default for 64-bit platforms. | ||
|
||
To enable journaling, start :program:`mongod` with the | ||
:option:`--journal <mongod --journal>` command line option. | ||
|
||
If the :program:`mongod` process preallocates the files, the process | ||
delays listening on port 27017 until preallocation completes, which can | ||
take a few minutes. Your applications and the shell will not be able to | ||
connect to the database until the process completes. | ||
|
||
Disable Journaling | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
.. warning:: | ||
|
||
Do not disable journaling on production systems. If your MongoDB | ||
system stops unexpectedly from a power failure or other condition, | ||
and if you are not running with journaling, then you must recover | ||
from an unaffected :term:`replica set` member or backup, as described | ||
in :doc:`repair | ||
</tutorial/recover-data-following-unexpected-shutdown>`. | ||
|
||
To disable journaling, shut down :program:`mongod` cleanly and restart | ||
with the :option:`--nojournal <mongod --nojournal>` command line option. | ||
|
||
Get Commit Acknowledgement | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
You can get commit acknowledgement with the | ||
:dbcommand:`getLastError` command and the ``j`` option. For details, see | ||
:ref:`write-concern-operation`. | ||
|
||
.. _journaling-avoid-preallocation-lag: | ||
|
||
Avoid Preallocation Lag | ||
~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To avoid :ref:`preallocation lag <journaling-journal-files>`, you can | ||
preallocate files in the journal directory by copying them from another | ||
instance of :program:`mongod`. | ||
|
||
Preallocated files do not contain data, so you can safely remove them | ||
later. However, if you restart :program:`mongod` with journaling, | ||
:program:`mongod` creates them again. | ||
|
||
.. example:: The following sequence preallocates journal files for an | ||
instance of :program:`mongod` running on port ``27017`` with a database | ||
path of ``/data/db``. | ||
|
||
For demonstration purposes, the sequence starts by creating a set of | ||
journal files in the usual way. | ||
|
||
1. Create a temporary directory into which to create a set of journal | ||
files: | ||
|
||
.. code-block:: sh | ||
|
||
mkdir ~/tmpDbpath | ||
|
||
#. Create a set of journal files by starting a :program:`mongod` | ||
instance that uses the temporary directory: | ||
|
||
.. code-block:: sh | ||
|
||
mongod --port 10000 --dbpath ~/tmpDbpath --journal | ||
|
||
#. When you see the following log output, indicating :program:`mongod` has created the files, | ||
press CONTROL+C to stop the :program:`mongod` instance: | ||
|
||
.. code-block:: sh | ||
|
||
web admin interface listening on port 11000 | ||
|
||
#. Preallocate journal files for the new instance of | ||
:program:`mongod` by moving the journal files from the data directory | ||
of the existing instance to the data directory of the new instance: | ||
|
||
.. code-block:: sh | ||
|
||
mv ~/tmpDbpath/journal /data/db/ | ||
|
||
#. Start the new :program:`mongod` instance: | ||
|
||
.. code-block:: sh | ||
|
||
mongod --port 27017 --dbpath /data/db --journal | ||
|
||
need appropriate warning.
Sam, What warning is needed here?
On Monday, December 17 2012, 15:57:50, Bob Grabar wrote:
"Do not disable journaling on production systems. If your MongoDB system
Done.
|
||
Monitor Journal Status | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Use the following commands and methods to monitor journal status: | ||
|
||
- :dbcommand:`serverStatus` | ||
|
||
The :dbcommand:`serverStatus` command returns database status | ||
information that is useful for assessing performance. | ||
|
||
- :dbcommand:`journalLatencyTest` | ||
|
||
Use :dbcommand:`journalLatencyTest` to measure how long it takes on | ||
your volume to write to the disk in an append-only fashion. Run this | ||
command on an idle system to get a baseline sync time for journaling. | ||
Run it on a busy system to see the sync time under load, which may be | ||
higher if the journal directory is on the same volume as the data | ||
files. | ||
|
||
The :dbcommand:`journalLatencyTest` command also provides a way to | ||
check if your disk drive is buffering writes in its local cache. If | ||
the number is very low (i.e., less than 2 milliseconds) and the drive | ||
is non-SSD, the drive is probably buffering writes. In that case, | ||
enable cache write-through for the device in your operating system, | ||
unless you have a disk controller card with battery-backed RAM. | ||
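
The buffering check described above can be written as a small rule of thumb. This Python sketch only encodes the heuristic from the text (under 2 milliseconds on a non-SSD drive); the function name and the sample latencies are hypothetical, and it does not call :dbcommand:`journalLatencyTest` itself.

```python
def drive_probably_buffering(latency_ms, is_ssd, threshold_ms=2.0):
    """Apply the rule of thumb: a very low sync time on a non-SSD drive
    suggests the drive is buffering writes in its local cache."""
    return (not is_ssd) and latency_ms < threshold_ms


# Hypothetical measurements, in milliseconds.
print(drive_probably_buffering(0.4, is_ssd=False))   # low latency, spinning disk
print(drive_probably_buffering(0.4, is_ssd=True))    # low latency is expected on SSD
print(drive_probably_buffering(9.0, is_ssd=False))   # typical rotational sync time
```

A ``True`` result is a hint to check the device's write cache settings, not proof of buffering.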
|
||
.. _journaling-journal-commit-interval: | ||
|
||
Change the Group Commit Interval | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. versionchanged:: 2.0 | ||
|
||
You can set the group commit interval using the | ||
:option:`--journalCommitInterval <mongod --journalCommitInterval>` | ||
command line option. The allowed range is ``2`` to ``300`` milliseconds. | ||
|
||
Lower values increase the durability of the journal at the expense of | ||
disk performance. | ||
|
||
Recover Data After Unexpected Shutdown | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
On a restart after a crash, MongoDB replays the journal files in the | ||
journal directory before the server goes online. The log output | ||
indicates the replay. You do not need to run a repair. | ||
|
||
Journaling Internals | ||
-------------------- | ||
|
||
When running with journaling, MongoDB stores and applies :doc:`write | ||
operations </core/write-operations>` in memory and in the journal before | ||
the changes are in the data files. | ||
|
||
.. _journaling-journal-files: | ||
|
||
Journal Files | ||
~~~~~~~~~~~~~ | ||
|
||
With journaling enabled, MongoDB creates a journal directory within | ||
your database directory. The journal directory holds journal files, | ||
which contain write-ahead redo logs. The directory also holds a | ||
last-sequence-number file. A clean shutdown removes all the files in the | ||
journal directory. | ||
|
||
Journal files are append-only files and are named with the ``j._`` | ||
prefix. When a journal file reaches 1 gigabyte, MongoDB creates a new | ||
file and automatically deletes files that are no longer needed. Unless | ||
you write many bytes of data per second, the journal directory should | ||
contain only two or three journal files. | ||
|
||
To limit the size of journal files to 128 megabytes per file, use the | ||
``--smallfiles`` command line option when starting :program:`mongod`. | ||
|
||
To speed the frequent sequential writes that occur to the current | ||
journal file, you can place the journal directory on a different file | ||
system from the data files. However, doing so prevents use of a | ||
snapshotting file system to take backups. | ||
|
||
Depending on your file system, you might experience a preallocation lag | ||
the first time you start a :program:`mongod` instance with journaling | ||
enabled. MongoDB preallocates journal files if it is faster on your file | ||
system to predefine file size. Preallocation lag might last several | ||
minutes, during which you will not be able to connect to the database. | ||
This is a one-time preallocation and does not occur with future | ||
invocations. | ||
|
||
To avoid preallocation lag, see | ||
:ref:`journaling-avoid-preallocation-lag`. | ||
|
||
.. _journaling-storage-views: | ||
|
||
Storage Views used in Journaling | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Journaling adds three storage views to MongoDB. | ||
|
||
The ``shared view`` stores modified data for upload to the MongoDB | ||
data files. The ``shared view`` is the only view with direct access | ||
to the MongoDB data files. When running with journaling, :program:`mongod` | ||
asks the operating system to map your existing on-disk data files to the | ||
``shared view`` memory view. The operating system maps the files but | ||
does not load them. MongoDB later loads data files to ``shared view`` as | ||
needed. | ||
|
||
The ``private view`` stores data for use in :doc:`read operations | ||
</core/read-operations>`. The ``private view`` is mapped to the ``shared view`` | ||
and is the first place MongoDB applies new :doc:`write operations | ||
</core/write-operations>`. | ||
|
||
The journal is an on-disk view that stores new write operations | ||
after they have been applied to the ``private view`` but before they | ||
have been applied to the data files. The journal provides durability. | ||
If the :program:`mongod` instance were to crash without having applied | ||
the writes to the data files, MongoDB could replay the writes from the | ||
journal to the ``shared view`` for eventual upload to the data files. | ||
|
||
.. _journaling-record-write-operation: | ||
|
||
How Journaling Records Write Operations | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
MongoDB copies the write operations to the journal in batches called | ||
group commits. By default, MongoDB performs a group commit every 100 | ||
milliseconds, which means a series of operations over 100 milliseconds | ||
are committed as a single batch. This is done to improve performance. | ||
|
||
Journaling stores raw operations that allow MongoDB to reconstruct the | ||
following: | ||
|
||
- document insertions and updates | ||
- index modifications | ||
- changes to the namespace files | ||
|
||
As :doc:`write operations </core/write-operations>` occur, MongoDB | ||
writes the data to the ``private view`` in RAM and then copies the write | ||
operations in batches to the journal. The journal stores the operations | ||
on disk to ensure durability. MongoDB adds the operations as entries on | ||
the journal's forward pointer. Each entry describes which bytes the | ||
write operation changed in the data files. | ||
|
||
MongoDB next applies the journal's write operations to the ``shared | ||
view``. At this point, the ``shared view`` becomes inconsistent with the | ||
data files. | ||
|
||
By default, every 60 seconds MongoDB asks the operating system to | ||
flush the ``shared view`` to disk. This brings the data files up-to-date | ||
with the latest write operations. | ||
|
||
When write operations are flushed to the data files, MongoDB removes the | ||
write operations from the journal's behind pointer. The behind pointer | ||
always trails the forward pointer. | ||
|
||
As part of journaling, MongoDB routinely asks the operating system to | ||
remap the ``shared view`` to the ``private view``, for consistency. | ||
|
||
.. note:: The interaction between the ``shared view`` and the on-disk | ||
data files is similar to how MongoDB works *without* | ||
journaling, which is that MongoDB asks the operating system to flush | ||
in-memory changes back to the data files every 60 seconds. |
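
The sequence described in this section can be pictured with a small Python toy model. This is only a sketch for intuition, not how :program:`mongod` is implemented: the class and method names are hypothetical, dictionaries stand in for memory-mapped views and data files, and a list stands in for the on-disk journal.

```python
class ToyJournaledStore:
    """Toy model of: private view -> journal -> shared view -> data files."""

    def __init__(self):
        self.private_view = {}  # in RAM; first place a write is applied
        self.pending = []       # writes waiting for the next group commit
        self.journal = []       # stands in for the on-disk journal
        self.shared_view = {}   # in RAM; mapped to the data files
        self.data_files = {}    # stands in for the on-disk data files

    def write(self, key, value):
        # A write lands in the private view and waits for a group commit.
        self.private_view[key] = value
        self.pending.append((key, value))

    def group_commit(self):
        # Copy the batch to the journal, then apply it to the shared view.
        self.journal.extend(self.pending)
        for key, value in self.pending:
            self.shared_view[key] = value
        self.pending.clear()

    def flush(self):
        # Flush the shared view to the data files; the flushed journal
        # entries are no longer needed.
        self.data_files.update(self.shared_view)
        self.journal.clear()

    def recover_after_crash(self):
        # RAM contents are gone; replay the journal over the data files.
        recovered = dict(self.data_files)
        for key, value in self.journal:
            recovered[key] = value
        return recovered


store = ToyJournaledStore()
store.write("a", 1)
store.group_commit()          # "a" is now durable in the journal
store.write("b", 2)           # "b" has not been group committed yet
print(store.recover_after_crash())
```

In this model a crash keeps the write that reached the journal and drops the one still waiting for its group commit, which matches the durability window described earlier.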

journal need not be literalized.
journaling also ensures that mongodb is crash resistent: without a journal, if mongodb exits unexpectedly, then operators must assume that the data are in an inconsistent state and should resync from a clean secondary.
If we don't make this clear, it's possible that people won't respect or value the importance of journaling.

Crash resistent or crash resilient?
What operators?
Are you saying that if journaling is enabled and the primary in a replica set crashes, that the secondaries don't need to resync from a clean secondary?

resilient.
operators = administrators/users (this is, admittedly, a somewhat arcane use of the term, sorry for the confusion)
the longer story is: