
Commit 9572df2

Sam Kleinman authored and kay-kim committed
DOCS-4667: wiredtiger storage faq
Signed-off-by: kay <[email protected]>
1 parent c123304 commit 9572df2

2 files changed (+168, -146 lines)

source/faq/storage.txt

Lines changed: 163 additions & 141 deletions
@@ -11,106 +11,102 @@ If you don't find the answer you're looking for, check
 the :doc:`complete list of FAQs </faq>` or post your question to the
 `MongoDB User Mailing List <https://groups.google.com/forum/?fromgroups#!forum/mongodb-user>`_.

-.. _faq-storage-memory-mapped-files:
+Storage Engine Fundamentals
+---------------------------

-What are memory mapped files?
------------------------------
+What is a storage engine?
+~~~~~~~~~~~~~~~~~~~~~~~~~

-A memory-mapped file is a file with data that the operating system
-places in memory by way of the ``mmap()`` system call. ``mmap()`` thus
-*maps* the file to a region of virtual memory. Memory-mapped files are
-the critical piece of the storage engine in MongoDB. By using memory
-mapped files MongoDB can treat the contents of its data files as if
-they were in memory. This provides MongoDB with an extremely fast and
-simple method for accessing and manipulating data.
+A storage engine is the part of a database that is responsible for
+managing how data is stored on disk. Many databases support multiple
+storage engines, where different engines perform better for specific
+workloads. For example, one storage engine might offer better
+performance for read-heavy workloads, and another might support
+higher throughput for write operations.

-How do memory mapped files work?
---------------------------------
+What will be the default storage engine going forward?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Memory mapping assigns files to a block of virtual memory with a
-direct byte-for-byte correlation. Once mapped, the relationship
-between file and memory allows MongoDB to interact with the data in
-the file as if it were memory.
-
-How does MongoDB work with memory mapped files?
------------------------------------------------
-
-MongoDB uses memory mapped files for managing and interacting with all
-data. MongoDB memory maps data files to memory as it accesses
-documents. Data that isn't accessed is *not* mapped to memory.
+MMAPv1 will be the default storage engine in 3.0. WiredTiger will
+become the default storage engine in a future version of
+MongoDB. You will be able to decide which storage engine is best for
+your application.

-.. _faq-storage-page-faults:
+Can you mix storage engines in a replica set?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-What are page faults?
----------------------
+Yes. You can have replica set members that use different storage
+engines.

-.. include:: /includes/fact-page-fault.rst
+When designing these mixed storage engine deployments, consider the
+following:

-If there is free memory, then the operating system can find the page
-on disk and load it to memory directly. However, if there is no free
-memory, the operating system must:
+- the oplog on each member may need to be sized differently to account
+  for differences in throughput between different storage engines.

-- find a page in memory that is stale or no longer needed, and write
-  the page to disk.
+- recovery from backups may become more complex if your backup
+  captures data files from MongoDB: you may need to maintain backups
+  for each storage engine.

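Purely as an illustration (this block is not part of the patch): one way to confirm which engine a given replica set member is running, assuming a ``mongod`` at 3.0 or later where ``db.serverStatus()`` reports a ``storageEngine`` field, is a quick check from the :program:`mongo` shell:

.. code-block:: javascript

   // Connect to each member in turn and report its storage engine.
   // serverStatus() includes a storageEngine document in MongoDB 3.0+.
   var engine = db.serverStatus().storageEngine;
   print("this member is running: " + engine.name);
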
-- read the requested page from disk and load it into memory.
+WiredTiger Storage Engine
+-------------------------

-This process, particularly on an active system can take a long time,
-particularly in comparison to reading a page that is already in
-memory.
+Can I upgrade an existing deployment to WiredTiger?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-See :ref:`administration-monitoring-page-faults` for more information.
+Yes. You can upgrade an existing deployment to WiredTiger while the
+deployment remains continuously available, by adding replica set
+members with the new storage engine and then removing members with the
+legacy storage engine. See the following sections of the
+:doc:`/release-notes/3.0-upgrade` for the complete procedure that you
+can use to upgrade an existing deployment:

-What is the difference between soft and hard page faults?
----------------------------------------------------------
+- :ref:`3.0-upgrade-repl-set-wiredtiger`

-:term:`Page faults <page fault>` occur when MongoDB needs access to
-data that isn't currently in active memory. A "hard" page fault
-refers to situations when MongoDB must access a disk to access the
-data. A "soft" page fault, by contrast, merely moves memory pages from
-one list to another, such as from an operating system file
-cache. In production, MongoDB will rarely encounter soft page faults.
+- :ref:`3.0-upgrade-cluster-wiredtiger`

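As an aside, a heavily compressed sketch of the rolling replacement described above, using hypothetical hostnames; the linked release notes give the full, ordered procedure (including starting the new member with the WiredTiger engine and waiting for its initial sync):

.. code-block:: javascript

   // On the primary: add a member that was started with the
   // WiredTiger storage engine (hostname is hypothetical).
   rs.add("mongodb4.example.net:27017")

   // After the new member completes its initial sync, retire one
   // of the members still running the legacy engine.
   rs.remove("mongodb1.example.net:27017")
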
-See :ref:`administration-monitoring-page-faults` for more information.
+How much compression does WiredTiger provide?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. _faq-tools-for-measuring-storage-use:
+As much as 50% to 80%. Collection data in WiredTiger uses Snappy
+:term:`block compression` by default, and index data uses :term:`prefix
+compression` by default.

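For illustration only (not something this patch adds): MongoDB 3.0 also exposes a ``storageEngine`` option on ``db.createCollection()`` for engine-specific settings, so a single collection can opt into a different block compressor. The collection name and the exact ``configString`` below are assumptions to verify against the WiredTiger documentation:

.. code-block:: javascript

   // Create a hypothetical "logs" collection that uses zlib block
   // compression instead of the Snappy default.
   db.createCollection("logs", {
     storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
   });
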
-What tools can I use to investigate storage use in MongoDB?
------------------------------------------------------------
+MMAP Storage Engine
+-------------------

-The :method:`db.stats()` method in the :program:`mongo` shell,
-returns the current state of the "active" database. The
-:doc:`dbStats command </reference/command/dbStats>` document describes
-the fields in the :method:`db.stats()` output.
+.. _faq-storage-memory-mapped-files:

-.. _faq-working-set:
+What are memory mapped files?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-What is the working set?
-------------------------
+A memory-mapped file is a file with data that the operating system
+places in memory by way of the ``mmap()`` system call. ``mmap()`` thus
+*maps* the file to a region of virtual memory. Memory-mapped files are
+the critical piece of the storage engine in MongoDB. By using memory
+mapped files MongoDB can treat the contents of its data files as if
+they were in memory. This provides MongoDB with an extremely fast and
+simple method for accessing and manipulating data.

-Working set represents the total body of data that the application
-uses in the course of normal operation. Often this is a subset of the
-total data size, but the specific size of the working set depends on
-actual moment-to-moment use of the database.
+How do memory mapped files work?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-If you run a query that requires MongoDB to scan every document in a
-collection, the working set will expand to include every
-document. Depending on physical memory size, this may cause documents
-in the working set to "page out," or to be removed from physical memory by
-the operating system. The next time MongoDB needs to access these
-documents, MongoDB may incur a hard page fault.
+Memory mapping assigns files to a block of virtual memory with a
+direct byte-for-byte correlation. Once mapped, the relationship
+between file and memory allows MongoDB to interact with the data in
+the file as if it were memory.

-If you run a query that requires MongoDB to scan every
-:term:`document` in a collection, the working set includes every
-active document in memory.
+How does MongoDB work with memory mapped files?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-For best performance, the majority of your *active* set should fit in
-RAM.
+MongoDB uses memory mapped files for managing and interacting with all
+data. MongoDB memory maps data files to memory as it accesses
+documents. Data that isn't accessed is *not* mapped to memory.

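For illustration, with the MMAPv1 engine you can see how much of the data files is currently mapped by inspecting the ``mem`` section of ``serverStatus``; the ``mapped`` field shown here is reported for memory-mapped storage on 64-bit builds:

.. code-block:: javascript

   // Report resident, virtual, and memory-mapped sizes (in MB).
   var mem = db.serverStatus().mem;
   printjson({ resident: mem.resident, virtual: mem.virtual, mapped: mem.mapped });
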
 .. _faq-disk-size:

 Why are the files in my data directory larger than the data in my database?
----------------------------------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The data files in your data directory, which is the :file:`/data/db`
 directory in default configurations, might be larger than the data set
@@ -174,8 +170,52 @@ inserted into the database. Consider the following possible causes:
 running. Be aware that :dbcommand:`repairDatabase` will block
 all other operations and may take a long time to complete.

+How do I know when the server runs out of disk space?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your server runs out of disk space for data files, you will see
+something like this in the log:
+
+.. code-block:: none
+
+   Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
+   Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
+   Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds
+   Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
+   Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
+   Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds
+
+The server remains in this state forever, blocking all writes including
+deletes. However, reads still work. To delete some data and compact,
+using the :dbcommand:`compact` command, you must restart the server
+first.
+
+If your server runs out of disk space for journal files, the server
+process will exit. By default, :program:`mongod` creates journal files
+in a sub-directory of :setting:`~storage.dbPath` named ``journal``. You may
+elect to put the journal files on another storage device using a
+filesystem mount or a symlink.
+
+.. note::
+
+   If you place the journal files on a separate storage device you
+   will not be able to use a file system snapshot tool to capture a
+   valid snapshot of your data files and journal files.
+
+Data Storage Diagnostics
+------------------------
+
+How can I check the size of indexes?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To view the size of the data allocated for an index, use one of the
+following procedures in the :program:`mongo` shell:
+
+Check the value of :data:`~collStats.indexSizes` in the output of the
+:method:`db.collection.stats()` method.
+
How can I check the size of a collection?
178-
-----------------------------------------
218+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179219

180220
To view the size of a collection and other information, use the
181221
:method:`db.collection.stats()` method from the :program:`mongo` shell.
@@ -204,91 +244,73 @@ collection:
204244

205245
db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function(c) {s = mdb[c].stats(); printjson(s)})})
206246

207-
How can I check the size of indexes?
208-
------------------------------------
209-
210-
To view the size of the data allocated for an index, use one of the
211-
following procedures in the :program:`mongo` shell:
212-
213-
Check the value of :data:`~collStats.indexSizes` in the output of the
214-
:method:`db.collection.stats()` method.
215-
216-
How do I know when the server runs out of disk space?
217-
-----------------------------------------------------
218-
219-
If your server runs out of disk space for data files, you will see
220-
something like this in the log:
247+
.. _faq-tools-for-measuring-storage-use:
221248

222-
.. code-block:: none
249+
What tools can I use to investigate storage use in MongoDB?
250+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
223251

224-
Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
225-
Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
226-
Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds
227-
Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
228-
Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
229-
Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds
252+
The :method:`db.stats()` method in the :program:`mongo` shell,
253+
returns the current state of the "active" database. The
254+
:doc:`dbStats command </reference/command/dbStats>` document describes
255+
the fields in the :method:`db.stats()` output.
230256

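As an illustration of the ``db.stats()`` check mentioned above (the scale argument is optional, and the loop over ``listDatabases`` mirrors the script shown earlier in this file):

.. code-block:: javascript

   // Report storage statistics for the current database, scaled to MB.
   db.stats(1024 * 1024)

   // Repeat for every database on the server.
   db.adminCommand("listDatabases").databases.forEach(function (d) {
     printjson(db.getSiblingDB(d.name).stats(1024 * 1024));
   });
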
-The server remains in this state forever, blocking all writes including
-deletes. However, reads still work. To delete some data and compact,
-using the :dbcommand:`compact` command, you must restart the server
-first.
+Page Faults
+-----------

-If your server runs out of disk space for journal files, the server
-process will exit. By default, :program:`mongod` creates journal files
-in a sub-directory of :setting:`~storage.dbPath` named ``journal``. You may
-elect to put the journal files on another storage device using a
-filesystem mount or a symlink.
+.. _faq-working-set:

-.. note::
+What is the working set?
+~~~~~~~~~~~~~~~~~~~~~~~~

-   If you place the journal files on a separate storage device you
-   will not be able to use a file system snapshot tool to capture a
-   valid snapshot of your data files and journal files.
+Working set represents the total body of data that the application
+uses in the course of normal operation. Often this is a subset of the
+total data size, but the specific size of the working set depends on
+actual moment-to-moment use of the database.

-.. todo the following "journal FAQ" content is from the wiki. Must add
-   this content to the manual, perhaps on this page.
+If you run a query that requires MongoDB to scan every document in a
+collection, the working set will expand to include every
+document. Depending on physical memory size, this may cause documents
+in the working set to "page out," or to be removed from physical memory by
+the operating system. The next time MongoDB needs to access these
+documents, MongoDB may incur a hard page fault.

-If I am using replication, can some members use journaling and others not?
---------------------------------------------------------------------------
+If you run a query that requires MongoDB to scan every
+:term:`document` in a collection, the working set includes every
+active document in memory.

-Yes. It is OK to use journaling on some replica set members and not
-others.
+For best performance, the majority of your *active* set should fit in
+RAM.

-Can I use the journaling feature to perform safe hot backups?
--------------------------------------------------------------
+.. _faq-storage-page-faults:

-Yes, see :doc:`/administration/backups`.
+What are page faults?
+~~~~~~~~~~~~~~~~~~~~~

-32 bit nuances?
----------------
+.. include:: /includes/fact-page-fault.rst

-There is extra memory mapped file activity with journaling. This will
-further constrain the limited db size of 32 bit builds. Thus, for now
-journaling by default is disabled on 32 bit systems.
+If there is free memory, then the operating system can find the page
+on disk and load it to memory directly. However, if there is no free
+memory, the operating system must:

-When did the --journal option change from --dur?
-------------------------------------------------
+- find a page in memory that is stale or no longer needed, and write
+  the page to disk.

-In 1.8 the option was renamed to --journal, but the old name is still
-accepted for backwards compatibility; please change to --journal if
-you are using the old option.
+- read the requested page from disk and load it into memory.

-Will the journal replay have problems if entries are incomplete (like the failure happened in the middle of one)?
------------------------------------------------------------------------------------------------------------------
+This process, particularly on an active system can take a long time,
+particularly in comparison to reading a page that is already in
+memory.

-Each journal (group) write is consistent and won't be replayed during
-recovery unless it is complete.
+See :ref:`administration-monitoring-page-faults` for more information.

-How many times is data written to disk when replication and journaling are both on?
-------------------------------------------------------------------------------------
+What is the difference between soft and hard page faults?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In v1.8, for an insert, four times. The object is written to the main
-collection and also the oplog collection. Both of those writes are
-also journaled as a single mini-transaction in the journal files in
-/data/db/journal.
+:term:`Page faults <page fault>` occur when MongoDB needs access to
+data that isn't currently in active memory. A "hard" page fault
+refers to situations when MongoDB must access a disk to access the
+data. A "soft" page fault, by contrast, merely moves memory pages from
+one list to another, such as from an operating system file
+cache. In production, MongoDB will rarely encounter soft page faults.

-The above applies to collection data and inserts which is the worst
-case scenario. Index updates are written to the index and the
-journal, but not the oplog, so they should be 2X today not 4X.
-Likewise updates with things like $set, $addToSet, $inc, etc. are
-compactly logged all around so those are generally small.
+See :ref:`administration-monitoring-page-faults` for more information.
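For illustration, beyond the monitoring reference above, a cumulative page-fault counter is also visible directly in ``serverStatus``; the ``extra_info.page_faults`` field sketched below is the counter reported on Linux systems:

.. code-block:: javascript

   // Total hard page faults incurred by the mongod process since startup.
   var status = db.serverStatus();
   print("page faults: " + status.extra_info.page_faults);
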

source/release-notes/3.0-upgrade.txt

Lines changed: 5 additions & 5 deletions
@@ -128,8 +128,8 @@ upgrade procedure during a scheduled maintenance window.
 
 .. _3.0-upgrade-repl-set-wiredtiger:

-Change Storage Engine to WiredTiger
-```````````````````````````````````
+Change Replica Set Storage Engine to WiredTiger
+```````````````````````````````````````````````

 In MongoDB 3.0, replica sets can have members with different storage
 engines. As such, you can update members to use the WiredTiger storage
@@ -198,10 +198,10 @@ Upgrade Sharded Clusters
 
 .. include:: /includes/steps/3.0-upgrade-sharded-cluster.rst

-.. _3.0-upgrade-wiredtiger-sharded-clusters:
+.. _3.0-upgrade-cluster-wiredtiger:

-Change Storage Engine to WiredTiger
-```````````````````````````````````
+Change Sharded Cluster Storage Engine to WiredTiger
+```````````````````````````````````````````````````

 For a sharded cluster in MongoDB 3.0, you can choose to update the
 shards to use WiredTiger storage engine and have the config servers use
