
Commit 9572df2

Sam Kleinman authored and kay-kim committed
DOCS-4667: wiredtiger storage faq
Signed-off-by: kay <[email protected]>
1 parent c123304 commit 9572df2

2 files changed (+168, -146 lines)

source/faq/storage.txt

Lines changed: 163 additions & 141 deletions
@@ -11,106 +11,102 @@ If you don't find the answer you're looking for, check
 the :doc:`complete list of FAQs </faq>` or post your question to the
 `MongoDB User Mailing List <https://groups.google.com/forum/?fromgroups#!forum/mongodb-user>`_.

-.. _faq-storage-memory-mapped-files:
+Storage Engine Fundamentals
+---------------------------

-What are memory mapped files?
------------------------------
+What is a storage engine?
+~~~~~~~~~~~~~~~~~~~~~~~~~

-A memory-mapped file is a file with data that the operating system
-places in memory by way of the ``mmap()`` system call. ``mmap()`` thus
-*maps* the file to a region of virtual memory. Memory-mapped files are
-the critical piece of the storage engine in MongoDB. By using memory
-mapped files MongoDB can treat the contents of its data files as if
-they were in memory. This provides MongoDB with an extremely fast and
-simple method for accessing and manipulating data.
+A storage engine is the part of a database that is responsible for
+managing how data is stored on disk. Many databases support multiple
+storage engines, where different engines perform better for specific
+workloads. For example, one storage engine might offer better
+performance for read-heavy workloads, and another might support
+higher throughput for write operations.

-How do memory mapped files work?
---------------------------------
+What will be the default storage engine going forward?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Memory mapping assigns files to a block of virtual memory with a
-direct byte-for-byte correlation. Once mapped, the relationship
-between file and memory allows MongoDB to interact with the data in
-the file as if it were memory.
-
-How does MongoDB work with memory mapped files?
------------------------------------------------
-
-MongoDB uses memory mapped files for managing and interacting with all
-data. MongoDB memory maps data files to memory as it accesses
-documents. Data that isn't accessed is *not* mapped to memory.
+MMAPv1 will be the default storage engine in 3.0. WiredTiger will
+become the default storage engine in a future version of
+MongoDB. You will be able to decide which storage engine is best for
+your application.

-.. _faq-storage-page-faults:
+Can you mix storage engines in a replica set?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-What are page faults?
----------------------
+Yes. You can have replica set members that use different storage
+engines.

-.. include:: /includes/fact-page-fault.rst
+When designing these mixed storage engine deployments, consider the
+following:

-If there is free memory, then the operating system can find the page
-on disk and load it to memory directly. However, if there is no free
-memory, the operating system must:
+- the oplog on each member may need to be sized differently to account
+  for differences in throughput between different storage engines.

-- find a page in memory that is stale or no longer needed, and write
-  the page to disk.
+- recovery from backups may become more complex if your backup
+  captures data files from MongoDB: you may need to maintain backups
+  for each storage engine.

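Purely as an illustration (this block is not part of the patch): one way to confirm which engine a given replica set member is running, assuming a ``mongod`` at 3.0 or later where ``db.serverStatus()`` reports a ``storageEngine`` field, is a quick check from the :program:`mongo` shell:

.. code-block:: javascript

   // Connect to each member in turn and report its storage engine.
   // serverStatus() includes a storageEngine document in MongoDB 3.0+.
   var engine = db.serverStatus().storageEngine;
   print("this member is running: " + engine.name);
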
-- read the requested page from disk and load it into memory.
+WiredTiger Storage Engine
+-------------------------

-This process, particularly on an active system can take a long time,
-particularly in comparison to reading a page that is already in
-memory.
+Can I upgrade an existing deployment to WiredTiger?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-See :ref:`administration-monitoring-page-faults` for more information.
+Yes. You can upgrade an existing deployment to WiredTiger while the
+deployment remains continuously available, by adding replica set
+members with the new storage engine and then removing members with the
+legacy storage engine. See the following sections of the
+:doc:`/release-notes/3.0-upgrade` for the complete procedure that you
+can use to upgrade an existing deployment:

-What is the difference between soft and hard page faults?
----------------------------------------------------------
+- :ref:`3.0-upgrade-repl-set-wiredtiger`

-:term:`Page faults <page fault>` occur when MongoDB needs access to
-data that isn't currently in active memory. A "hard" page fault
-refers to situations when MongoDB must access a disk to access the
-data. A "soft" page fault, by contrast, merely moves memory pages from
-one list to another, such as from an operating system file
-cache. In production, MongoDB will rarely encounter soft page faults.
+- :ref:`3.0-upgrade-cluster-wiredtiger`

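As an aside, a heavily compressed sketch of the rolling replacement described above, using hypothetical hostnames; the linked release notes give the full, ordered procedure (including starting the new member with the WiredTiger engine and waiting for its initial sync):

.. code-block:: javascript

   // On the primary: add a member that was started with the
   // WiredTiger storage engine (hostname is hypothetical).
   rs.add("mongodb4.example.net:27017")

   // After the new member completes its initial sync, retire one
   // of the members still running the legacy engine.
   rs.remove("mongodb1.example.net:27017")
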
-See :ref:`administration-monitoring-page-faults` for more information.
+How much compression does WiredTiger provide?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. _faq-tools-for-measuring-storage-use:
+As much as 50% to 80%. Collection data in WiredTiger uses Snappy
+:term:`block compression` by default, and index data uses :term:`prefix
+compression` by default.

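For illustration only (not something this patch adds): MongoDB 3.0 also exposes a ``storageEngine`` option on ``db.createCollection()`` for engine-specific settings, so a single collection can opt into a different block compressor. The collection name and the exact ``configString`` below are assumptions to verify against the WiredTiger documentation:

.. code-block:: javascript

   // Create a hypothetical "logs" collection that uses zlib block
   // compression instead of the Snappy default.
   db.createCollection("logs", {
     storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
   });
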
-What tools can I use to investigate storage use in MongoDB?
------------------------------------------------------------
+MMAP Storage Engine
+-------------------

-The :method:`db.stats()` method in the :program:`mongo` shell,
-returns the current state of the "active" database. The
-:doc:`dbStats command </reference/command/dbStats>` document describes
-the fields in the :method:`db.stats()` output.
+.. _faq-storage-memory-mapped-files:

-.. _faq-working-set:
+What are memory mapped files?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-What is the working set?
-------------------------
+A memory-mapped file is a file with data that the operating system
+places in memory by way of the ``mmap()`` system call. ``mmap()`` thus
+*maps* the file to a region of virtual memory. Memory-mapped files are
+the critical piece of the storage engine in MongoDB. By using memory
+mapped files MongoDB can treat the contents of its data files as if
+they were in memory. This provides MongoDB with an extremely fast and
+simple method for accessing and manipulating data.

-Working set represents the total body of data that the application
-uses in the course of normal operation. Often this is a subset of the
-total data size, but the specific size of the working set depends on
-actual moment-to-moment use of the database.
+How do memory mapped files work?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-If you run a query that requires MongoDB to scan every document in a
-collection, the working set will expand to include every
-document. Depending on physical memory size, this may cause documents
-in the working set to "page out," or to be removed from physical memory by
-the operating system. The next time MongoDB needs to access these
-documents, MongoDB may incur a hard page fault.
+Memory mapping assigns files to a block of virtual memory with a
+direct byte-for-byte correlation. Once mapped, the relationship
+between file and memory allows MongoDB to interact with the data in
+the file as if it were memory.

-If you run a query that requires MongoDB to scan every
-:term:`document` in a collection, the working set includes every
-active document in memory.
+How does MongoDB work with memory mapped files?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-For best performance, the majority of your *active* set should fit in
-RAM.
+MongoDB uses memory mapped files for managing and interacting with all
+data. MongoDB memory maps data files to memory as it accesses
+documents. Data that isn't accessed is *not* mapped to memory.

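For illustration, with the MMAPv1 engine you can see how much of the data files is currently mapped by inspecting the ``mem`` section of ``serverStatus``; the ``mapped`` field shown here is reported for memory-mapped storage on 64-bit builds:

.. code-block:: javascript

   // Report resident, virtual, and memory-mapped sizes (in MB).
   var mem = db.serverStatus().mem;
   printjson({ resident: mem.resident, virtual: mem.virtual, mapped: mem.mapped });
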
 .. _faq-disk-size:

 Why are the files in my data directory larger than the data in my database?
----------------------------------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The data files in your data directory, which is the :file:`/data/db`
 directory in default configurations, might be larger than the data set
@@ -174,8 +170,52 @@ inserted into the database. Consider the following possible causes:
 running. Be aware that :dbcommand:`repairDatabase` will block
 all other operations and may take a long time to complete.

+How do I know when the server runs out of disk space?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your server runs out of disk space for data files, you will see
+something like this in the log:
+
+.. code-block:: none
+
+   Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
+   Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
+   Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds
+   Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
+   Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
+   Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds
+
+The server remains in this state forever, blocking all writes including
+deletes. However, reads still work. To delete some data and compact,
+using the :dbcommand:`compact` command, you must restart the server
+first.
+
+If your server runs out of disk space for journal files, the server
+process will exit. By default, :program:`mongod` creates journal files
+in a sub-directory of :setting:`~storage.dbPath` named ``journal``. You may
+elect to put the journal files on another storage device using a
+filesystem mount or a symlink.
+
+.. note::
+
+   If you place the journal files on a separate storage device you
+   will not be able to use a file system snapshot tool to capture a
+   valid snapshot of your data files and journal files.
+
+Data Storage Diagnostics
+------------------------
+
+How can I check the size of indexes?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To view the size of the data allocated for an index, use one of the
+following procedures in the :program:`mongo` shell:
+
+Check the value of :data:`~collStats.indexSizes` in the output of the
+:method:`db.collection.stats()` method.
+
How can I check the size of a collection?
178-
-----------------------------------------
218+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179219

180220
To view the size of a collection and other information, use the
181221
:method:`db.collection.stats()` method from the :program:`mongo` shell.
@@ -204,91 +244,73 @@ collection:
204244

205245
db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function(c) {s = mdb[c].stats(); printjson(s)})})
206246

207-
How can I check the size of indexes?
208-
------------------------------------
209-
210-
To view the size of the data allocated for an index, use one of the
211-
following procedures in the :program:`mongo` shell:
212-
213-
Check the value of :data:`~collStats.indexSizes` in the output of the
214-
:method:`db.collection.stats()` method.
215-
216-
How do I know when the server runs out of disk space?
217-
-----------------------------------------------------
218-
219-
If your server runs out of disk space for data files, you will see
220-
something like this in the log:
247+
.. _faq-tools-for-measuring-storage-use:
221248

222-
.. code-block:: none
249+
What tools can I use to investigate storage use in MongoDB?
250+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
223251

224-
Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
225-
Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
226-
Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds
227-
Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes...
228-
Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device
229-
Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds
252+
The :method:`db.stats()` method in the :program:`mongo` shell,
253+
returns the current state of the "active" database. The
254+
:doc:`dbStats command </reference/command/dbStats>` document describes
255+
the fields in the :method:`db.stats()` output.
230256

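As an illustration of the ``db.stats()`` check mentioned above (the scale argument is optional, and the loop over ``listDatabases`` mirrors the script shown earlier in this file):

.. code-block:: javascript

   // Report storage statistics for the current database, scaled to MB.
   db.stats(1024 * 1024)

   // Repeat for every database on the server.
   db.adminCommand("listDatabases").databases.forEach(function (d) {
     printjson(db.getSiblingDB(d.name).stats(1024 * 1024));
   });
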
-The server remains in this state forever, blocking all writes including
-deletes. However, reads still work. To delete some data and compact,
-using the :dbcommand:`compact` command, you must restart the server
-first.
+Page Faults
+-----------

-If your server runs out of disk space for journal files, the server
-process will exit. By default, :program:`mongod` creates journal files
-in a sub-directory of :setting:`~storage.dbPath` named ``journal``. You may
-elect to put the journal files on another storage device using a
-filesystem mount or a symlink.
+.. _faq-working-set:

-.. note::
+What is the working set?
+~~~~~~~~~~~~~~~~~~~~~~~~

-   If you place the journal files on a separate storage device you
-   will not be able to use a file system snapshot tool to capture a
-   valid snapshot of your data files and journal files.
+Working set represents the total body of data that the application
+uses in the course of normal operation. Often this is a subset of the
+total data size, but the specific size of the working set depends on
+actual moment-to-moment use of the database.

-.. todo the following "journal FAQ" content is from the wiki. Must add
-   this content to the manual, perhaps on this page.
+If you run a query that requires MongoDB to scan every document in a
+collection, the working set will expand to include every
+document. Depending on physical memory size, this may cause documents
+in the working set to "page out," or to be removed from physical memory by
+the operating system. The next time MongoDB needs to access these
+documents, MongoDB may incur a hard page fault.

-If I am using replication, can some members use journaling and others not?
---------------------------------------------------------------------------
+If you run a query that requires MongoDB to scan every
+:term:`document` in a collection, the working set includes every
+active document in memory.

-Yes. It is OK to use journaling on some replica set members and not
-others.
+For best performance, the majority of your *active* set should fit in
+RAM.

-Can I use the journaling feature to perform safe hot backups?
--------------------------------------------------------------
+.. _faq-storage-page-faults:

-Yes, see :doc:`/administration/backups`.
+What are page faults?
+~~~~~~~~~~~~~~~~~~~~~

-32 bit nuances?
----------------
+.. include:: /includes/fact-page-fault.rst

-There is extra memory mapped file activity with journaling. This will
-further constrain the limited db size of 32 bit builds. Thus, for now
-journaling by default is disabled on 32 bit systems.
+If there is free memory, then the operating system can find the page
+on disk and load it to memory directly. However, if there is no free
+memory, the operating system must:

-When did the --journal option change from --dur?
-------------------------------------------------
+- find a page in memory that is stale or no longer needed, and write
+  the page to disk.

-In 1.8 the option was renamed to --journal, but the old name is still
-accepted for backwards compatibility; please change to --journal if
-you are using the old option.
+- read the requested page from disk and load it into memory.

-Will the journal replay have problems if entries are incomplete (like the failure happened in the middle of one)?
------------------------------------------------------------------------------------------------------------------
+This process, particularly on an active system can take a long time,
+particularly in comparison to reading a page that is already in
+memory.

-Each journal (group) write is consistent and won't be replayed during
-recovery unless it is complete.
+See :ref:`administration-monitoring-page-faults` for more information.

-How many times is data written to disk when replication and journaling are both on?
-------------------------------------------------------------------------------------
+What is the difference between soft and hard page faults?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In v1.8, for an insert, four times. The object is written to the main
-collection and also the oplog collection. Both of those writes are
-also journaled as a single mini-transaction in the journal files in
-/data/db/journal.
+:term:`Page faults <page fault>` occur when MongoDB needs access to
+data that isn't currently in active memory. A "hard" page fault
+refers to situations when MongoDB must access a disk to access the
+data. A "soft" page fault, by contrast, merely moves memory pages from
+one list to another, such as from an operating system file
+cache. In production, MongoDB will rarely encounter soft page faults.

-The above applies to collection data and inserts which is the worst
-case scenario. Index updates are written to the index and the
-journal, but not the oplog, so they should be 2X today not 4X.
-Likewise updates with things like $set, $addToSet, $inc, etc. are
-compactly logged all around so those are generally small.
+See :ref:`administration-monitoring-page-faults` for more information.
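For illustration, beyond the monitoring reference above, a cumulative page-fault counter is also visible directly in ``serverStatus``; the ``extra_info.page_faults`` field sketched below is the counter reported on Linux systems:

.. code-block:: javascript

   // Total hard page faults incurred by the mongod process since startup.
   var status = db.serverStatus();
   print("page faults: " + status.extra_info.page_faults);
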

source/release-notes/3.0-upgrade.txt

Lines changed: 5 additions & 5 deletions
@@ -128,8 +128,8 @@ upgrade procedure during a scheduled maintenance window.
 
 .. _3.0-upgrade-repl-set-wiredtiger:

-Change Storage Engine to WiredTiger
-```````````````````````````````````
+Change Replica Set Storage Engine to WiredTiger
+```````````````````````````````````````````````

 In MongoDB 3.0, replica sets can have members with different storage
 engines. As such, you can update members to use the WiredTiger storage
@@ -198,10 +198,10 @@ Upgrade Sharded Clusters
 
 .. include:: /includes/steps/3.0-upgrade-sharded-cluster.rst

-.. _3.0-upgrade-wiredtiger-sharded-clusters:
+.. _3.0-upgrade-cluster-wiredtiger:

-Change Storage Engine to WiredTiger
-```````````````````````````````````
+Change Sharded Cluster Storage Engine to WiredTiger
+```````````````````````````````````````````````````

 For a sharded cluster in MongoDB 3.0, you can choose to update the
 shards to use WiredTiger storage engine and have the config servers use
