
Commit cd3839d

Author: Sam Kleinman (committed)

DOCS-136: revisions and editing to replication documentation based on kchodorow

1 parent d3d299b · commit cd3839d

File tree: 8 files changed (+104, -147 lines)


source/administration/replica-sets.rst

Lines changed: 37 additions & 33 deletions
@@ -36,16 +36,15 @@ Adding Members
 From time to time, you may need to add an additional member to an existing
 :term:`replica set`. The data directory for the new member can:
 
-- have no data. In this case, you must copy all data as part of the
-  replication process before the member can exit ":term:`recovering`"
-  status, and become a :term:`secondary` member.
+- have no data. In this case, MongoDB must copy all data as part of
+  the replication process before the member can exit
+  ":term:`recovering`" status, and become a :term:`secondary`
+  member. This process can be time intensive, but does not require
+  administrator intervention.
 
-TODO: might be worth mentioning that "you" don't have to copy this data, it's done automatically.
-
-- copy the data directory from an existing member to limit the amount
-  of time that the recovery process takes.
-
-TODO: if you copy from an existing member, the new member will immediately be a secondary (not recovering).
+- copy the data directory from an existing member. The new member
+  becomes a secondary, and will catch up to the current state of the
+  replica set after a short interval.
 
 If the difference in the amount of time between the most recent
 operation and the most recent operation to the database exceeds the
@@ -54,22 +53,20 @@ TODO: if you copy from an existing member, the new member will immediately be a
 copy the data to the new system and begin replication within the
 window allowed by the :term:`oplog`.
 
-TODO: maybe mention you can do this with the db.printReplicationInfo() function.
+Use :func:`db.printReplicationInfo()` to check the current state of
+replica set members with regard to the oplog.
 
 To add a member to an existing :term:`replica set`, deploy a new
 :program:`mongod` instance, specifying the name of the replica set
 (i.e. "setname" or ``replSet``) on the command line with the
 :option:`--replSet <mongod --replSet>` option or in the configuration
-with the :setting:`replSet`. Take note of the host name and
-port information for the new :program:`mongod` instance.
-
-TODO: "the configuration
-with" -> the configuration file with
+file with the :setting:`replSet`. Take note of the host name and port
+information for the new :program:`mongod` instance.
 
 Then, log in to the current primary using the :program:`mongo`
 shell. Issue the :func:`db.isMaster()` command when connected to *any*
 member of the set to determine the current :term:`primary`. Issue the
-following command to add the new member to the set.
+following command to add the new member to the set:
 
 .. code-block:: javascript
 
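The hunk above points readers at ``db.printReplicationInfo()`` for checking the oplog window before adding a member. As an illustration of the arithmetic involved, here is a minimal sketch in plain JavaScript (runnable outside the mongo shell; the function name and the safety factor are assumptions for illustration, not a MongoDB API):

```javascript
// Given the oplog window reported by db.printReplicationInfo()
// ("log length start to end", in seconds) and an estimate of how long
// the initial data copy will take, decide whether a new member can
// catch up before the oldest needed oplog entry is overwritten.
function fitsOplogWindow(oplogWindowSecs, estimatedCopySecs, safetyFactor) {
  // Leave headroom: the copy estimate is rarely exact, and the oplog
  // window shrinks under heavy write load.
  var factor = safetyFactor || 2;
  return estimatedCopySecs * factor <= oplogWindowSecs;
}

// Example: a 24-hour oplog window comfortably covers a 2-hour copy.
fitsOplogWindow(24 * 3600, 2 * 3600);  // true
// A 1-hour window cannot safely cover a 45-minute copy.
fitsOplogWindow(3600, 2700);           // false
```

If the window is too small, either copy the data directory from an existing member instead, or resize the oplog before adding the new member.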
@@ -80,14 +77,15 @@ of the fields in a :data:`members` document, for example:
 
 .. code-block:: javascript
 
-   rs.add({host: "mongo2.example.net:27017", priority: 0, hidden: true})
-
-TODO: is the _id field automatically populated?
+   rs.add({_id: 1, host: "mongo2.example.net:27017", priority: 0, hidden: true})
 
 This configures a :term:`hidden member` that is accessible at
 ``mongo2.example.net:27018``. See ":data:`host <members[n].host>`,"
 ":data:`priority <members[n].priority>`," and ":data:`hidden
-<members[n].hidden>`" for more information about these settings.
+<members[n].hidden>`" for more information about these settings. When
+you specify a full configuration object with :func:`rs.add()`, you must
+declare the ``_id`` field, which is not automatically populated in
+this case.
 
 .. seealso:: :doc:`/tutorial/expand-replica-set`
 
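The new text above notes that ``_id`` is not auto-populated when you pass a full configuration object to ``rs.add()``. A small pre-flight check that mirrors this rule, as plain JavaScript (the helper is hypothetical and illustrative, not part of the mongo shell):

```javascript
// Hypothetical validator for documents passed to rs.add(). A bare
// "host:port" string is fine as-is; a full configuration object must
// carry an explicit _id, because rs.add() does not populate it.
function checkAddDocument(doc) {
  if (typeof doc === "string") {
    return { ok: true };  // e.g. "mongo2.example.net:27017"
  }
  if (!doc.host) {
    return { ok: false, reason: "missing host" };
  }
  if (doc._id === undefined) {
    return { ok: false, reason: "full config objects require an explicit _id" };
  }
  return { ok: true };
}

checkAddDocument("mongo2.example.net:27017").ok;                          // true
checkAddDocument({ host: "mongo2.example.net:27017", hidden: true }).ok;  // false
checkAddDocument({ _id: 1, host: "mongo2.example.net:27017" }).ok;        // true
```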

@@ -134,7 +132,14 @@ in the :program:`mongo` shell:
    rs.remove("mongo2.example.net:27018")
    rs.add({host: "mongo2.example.net:27019", priority: 0, hidden: true})
 
-TODO: prior to 2.2, this will almost never work because the _id will change.
+.. note::
+
+   Because each set member tracks its own replica set member ``_id``,
+   conflicts can occur when you try to re-add a previous member.
+
+   To resolve this issue, you can either restart the :program:`mongod`
+   process on the host that you're re-adding, or make sure that you
+   specify an "``_id``" in the :func:`rs.add()` document.
 
 Second, you may consider using the following procedure to use
 :func:`rs.reconfig()` to change the value of the
@@ -284,9 +289,9 @@ configurations and also describes the arbiter node type.
 Secondary-Only
 ~~~~~~~~~~~~~~
 
-Given a three node replica set, with member "``_id``" values of:
-``0``, ``1``, and ``2``, use the following sequence of operations in
-the :program:`mongo` shell to modify node priorities:
+Given a four-member replica set, with member "``_id``" values of:
+``0``, ``1``, ``2``, and ``3``, use the following sequence of
+operations in the :program:`mongo` shell to modify node priorities:
 
 .. code-block:: javascript
 
@@ -297,8 +302,6 @@ the :program:`mongo` shell to modify node priorities:
    cfg.members[3].priority = 2
    rs.reconfig(cfg)
 
-TODO: this is actually 4 nodes...
-
 This operation sets the priority of member ``0`` to ``0``, so it cannot become
 primary. Member ``3`` has a priority of ``2`` and will become primary,
 if eligible, under most circumstances. Member ``2`` has a priority of
@@ -335,15 +338,18 @@ operations in the :program:`mongo` shell:
    cfg.members[0].hidden = true
    rs.reconfig(cfg)
 
-TODO: it might be worth noting that, currently, you must send the reconfig command to
-a member that can become primary in the new configuration. So, if members[0] is the
-current primary, this reconfig won't work.
-
 After re-configuring the set, the node with the "``_id``" of ``0``,
 has a priority of ``0`` so that it cannot become master, while the
 other nodes in the set will not advertise the hidden node in the
 :dbcommand:`isMaster` or :func:`db.isMaster()` output.
 
+.. note::
+
+   You must send the :func:`rs.reconfig()` command to a set member
+   that *can* become :term:`primary`. In the above example, if you issue
+   the :func:`rs.reconfig()` operation to the member with the ``_id``
+   of ``0``, the operation will fail.
+
 .. seealso:: ":ref:`Replica Set Read Preference <replica-set-read-preference>`."
    ":data:`members[n].hidden`," ":data:`members[n].priority`,"
    and ":ref:`Replica Set Reconfiguration <replica-set-reconfiguration-usage>`."
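The note added above warns that ``rs.reconfig()`` must be sent to a member that can become primary. The eligibility rule involved can be sketched in plain JavaScript (illustrative only; the member documents mirror fields such as ``priority``, ``hidden``, and ``arbiterOnly`` from the replica set configuration):

```javascript
// A member with priority 0 (or marked hidden, which requires priority 0)
// can never become primary, so it is a poor target for rs.reconfig().
// Arbiters hold no data and likewise can never become primary.
function canBecomePrimary(member) {
  return (member.priority === undefined || member.priority > 0) &&
         !member.hidden &&
         !member.arbiterOnly;
}

var cfgMembers = [
  { _id: 0, host: "mongo0.example.net:27017", priority: 0, hidden: true },
  { _id: 1, host: "mongo1.example.net:27017" },  // default priority of 1
];

canBecomePrimary(cfgMembers[0]);  // false: hidden, priority 0
canBecomePrimary(cfgMembers[1]);  // true
```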
@@ -535,9 +541,7 @@ were never replicated to the set so that the data set is in a
 consistent state. The :program:`mongod` program writes rolled back
 data to a :term:`BSON` file.
 
-You can prevent Rollbacks prevented by ensuring safe writes by using
+You can prevent rollbacks by ensuring safe writes, using
 the appropriate :term:`write concern`.
 
-TODO: "rollback" is not a proper noun.
-
 .. seealso:: ":ref:`Replica Set Elections <replica-set-elections>`"

source/administration/replication-architectures.rst

Lines changed: 10 additions & 20 deletions
@@ -6,12 +6,10 @@ Replication Architectures
 
 There is no single :term:`replica set` architecture that is compatible
 or ideal for every deployment or environment. Indeed the flexibility
-of replica sets may be its greatest strenght. This document outlines
+of replica sets may be its greatest strength. This document outlines
 and describes the most prevalent deployment patterns for replica set
 administrators.
 
-TODO: strenght?
-
 .. seealso:: ":doc:`/administration/replica-sets`" and
    ":doc:`/reference/replica-configuration`."
 

@@ -65,9 +63,7 @@ architectural conditions are true:
 situation. If a member does not have this capability (i.e. resource
 constraints), set its ``priority`` value to ``0``.
 
-- A majority *of the set's* members exist in the main data center.
-
-TODO: I don't understand why "of the set's" is emphasized.
+- A majority of the set's members exist in the main data center.
 
 .. seealso:: ":doc:`/tutorial/expand-replica-set`."
 

@@ -138,25 +134,19 @@ settings relevant for these kinds of nodes:
 - **Voting**: This changes the number of votes that a member of the
   set node has in elections for primary. In general use priority to
   control the outcome of elections, as weighting votes introduces
-  operational complexities and the potential. Only modify the number
-  of votes, if you need to have more than 7 members of a replica
-  set. (:ref:`see also <replica-set-non-voting-members>`.)
-
-TODO: and the potential... for royally screwing yourself.
+  operational complexities and risks set failure. Only modify the
+  number of votes if you need to have more than 7 members of a
+  replica set. (:ref:`see also <replica-set-non-voting-members>`.)
 
 Backups
 ~~~~~~~
 
 For some deployments, keeping a replica set member for dedicated
-backup for dedicated backup purposes is operationally
-advantageous. Ensure this system is close, from a networking
-perspective, to the primary node or likely primary, and that the
-:term:`replication lag` is minimal or non-existent. You may wish to
-create a dedicated :ref:`hidden node <replica-set-hidden-members>` for
-the purpose of creating backups.
-
-TODO: Glitch in the matrix: "a replica set member for dedicated
-backup for dedicated backup purposes"
+backup purposes is operationally advantageous. Ensure this system is
+close, from a networking perspective, to the primary node or likely
+primary, and that the :term:`replication lag` is minimal or
+non-existent. You may wish to create a dedicated :ref:`hidden node
+<replica-set-hidden-members>` for the purpose of creating backups.
 
 If this node has journaling enabled, you can safely use standard
 :ref:`block level backup methods <block-level-backup>` to create a

source/applications/replication.rst

Lines changed: 6 additions & 11 deletions
@@ -22,12 +22,11 @@ Write Concern
 When a :term:`client` sends a write operation to a database server,
 the operation will return without waiting for the operation to succeed
 or return. To verify that the operation is successful, use the
-:dbcommand:`getLastError`
-command. :dbcommand:`getLastError` is configurable and can wait
-to return for journal writes or full disk flush. For replica sets,
-:dbcommand:`getLastError` can return only when the write
-operation has propagated to more than one node, or a majority of nodes
-in the cluster.
+:dbcommand:`getLastError` command. :dbcommand:`getLastError` is
+configurable and can wait to return for journal writes or full disk
+flush. For replica sets, :dbcommand:`getLastError` can return only
+when the write operation has propagated to more than one node, or a
+majority of nodes in the cluster.
 
 Many drivers have a "safe" or "write concern" mode that automatically
 issues a :dbcommand:`getLastError` command following write
@@ -65,12 +64,8 @@ replica set configuration. For instance:
 .. code-block:: javascript
 
    cfg = rs.conf()
-   cfg.settings.getLastErrorDefaults = "w: majority, fsync: false, j: true"
-   rs.reconfig(cfg)
-
-TODO: Incorrect getLastErrorDefaults setting:
    cfg.settings.getLastErrorDefaults = {w: "majority", fsync: false, j: true}
-
+   rs.reconfig(cfg)
 
 When the new configuration is active, the effect of the
 :dbcommand:`getLastError` operation will wait until the write
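The corrected hunk above replaces a string value for ``getLastErrorDefaults`` with a document. A small check that catches the removed string form, written as plain JavaScript (the validator is an illustrative sketch, not a mongo shell helper, and it covers only the ``w`` values shown in this document):

```javascript
// getLastErrorDefaults must be a document such as
// { w: "majority", fsync: false, j: true }, not a string.
function checkGetLastErrorDefaults(value) {
  if (typeof value !== "object" || value === null) {
    return { ok: false, reason: "must be a document, not a " + typeof value };
  }
  // w may be a number of nodes or the string "majority".
  var wOk = value.w === undefined ||
            typeof value.w === "number" ||
            value.w === "majority";
  return wOk ? { ok: true } : { ok: false, reason: "unsupported w value" };
}

checkGetLastErrorDefaults("w: majority, fsync: false, j: true").ok;      // false
checkGetLastErrorDefaults({ w: "majority", fsync: false, j: true }).ok;  // true
```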

source/core/replication-internals.rst

Lines changed: 7 additions & 9 deletions
@@ -27,18 +27,20 @@ will reflect writes within one second of the primary. However, various
 exceptional situations may cause secondaries to lag behind further. See
 :term:`replication lag` for details.
 
-All nodes send heartbeats to all other nodes, and will import
-operations into its oplog from the node with the lowest "ping" time.
+All nodes send heartbeats to all other nodes, and can import
+operations into their oplogs from any other node in the
+cluster.
 
-TODO: the lowest "ping" time thing is very specific to 2.0, it wasn't true in 1.8 and probably will be different again in 2.2. Not sure if you care or not.
+.. In 2.0, replicas would import entries from the member with the lowest
+.. "ping" time. This wasn't true in 1.8 and will likely change in 2.2.
 
 .. _replica-set-implementation:
 
 Implementation
 --------------
 
 MongoDB uses :term:`single master replication` to ensure that the
-database remains consistent. However, clients possible to modify the
+database remains consistent. However, clients may modify the
 :ref:`read preferences <replica-set-read-preference>` on a
 per-connection basis in order to distribute read operations to the
 secondary members of a replica set. Read-heavy deployments may achieve
@@ -50,8 +52,6 @@ section for more about ":ref:`read preference
 <replica-set-read-preference>`" and ":ref:`write concern
 <replica-set-write-concern>`."
 
-TODO: "However, clients possible..."
-
 .. note::
 
    Use :func:`db.getReplicationInfo()` from a secondary node
@@ -115,12 +115,10 @@ member should become primary.
 Elections are the process that the members of a replica set use to
 select the primary node in a cluster. Elections follow two events: a
 primary node that "steps down" or a :term:`secondary` member that
-looses contact with a :term:`primary` node. All members have one vote
+loses contact with a :term:`primary` node. All members have one vote
 in an election, and every :program:`mongod` can veto an election. A
 single member's veto will invalidate the election.
 
-TODO: sp: looses
-
 An existing primary will step down in response to the
 :dbcommand:`replSetStepDown` command, or if it sees that one of
 the current secondaries is eligible for election *and* has a higher

source/core/replication.rst

Lines changed: 18 additions & 39 deletions
@@ -16,9 +16,7 @@ the secondary nodes replicate from the primary asynchronously.
 Database replication with MongoDB, as with other systems, adds redundancy, helps to
 ensure high availability, simplifies certain administrative tasks
 such as backups, and may increase read capacity. Most production
-deployments are or should use replication.
-
-TODO: "are or should use replication." are use replication?
+deployments use, or should use, replication.
 
 If you're familiar with other database systems, you may think about
 replica sets as a more sophisticated form of traditional master-slave replication. [#master-slave]_
@@ -194,11 +192,11 @@ remain a secondary.
 
 .. note::
 
-   When an election occurs, the :program:`mongod` instances will close
-   all client connections. This ensures that the clients maintain an accurate
-   view of the :term:`replica set` and helps prevent :term:`rollbacks <rollback>`.
-
-TODO: it's actually just when a primary steps down that connections are closed.
+   When the current :term:`primary` steps down and triggers an
+   election, the :program:`mongod` instances will close all client
+   connections. This ensures that the clients maintain an accurate
+   view of the :term:`replica set` and helps prevent :term:`rollbacks
+   <rollback>`.
 
 .. seealso:: ":ref:`Replica Set Election Internals <replica-set-election-internals>`"
 
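Because connections close on step-down, a single operation in flight during an election can fail once and then succeed against the newly elected primary. A hypothetical retry wrapper in plain JavaScript sketches how application code might absorb this (illustrative only; real drivers typically provide their own reconnection handling):

```javascript
// Retry an operation a bounded number of times. During an election,
// the first attempt may fail with a closed connection; a later attempt
// reaches the new primary.
function withRetry(operation, maxAttempts) {
  var lastError;
  for (var attempt = 1; attempt <= (maxAttempts || 3); attempt++) {
    try {
      return operation(attempt);
    } catch (err) {
      lastError = err;  // e.g. "connection closed" during a step-down
    }
  }
  throw lastError;  // give up after exhausting the attempts
}

// Simulated: the first attempt hits a closed connection, the second succeeds.
var calls = 0;
var result = withRetry(function () {
  calls += 1;
  if (calls === 1) { throw new Error("connection closed"); }
  return "ok";
});
```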

@@ -256,35 +254,24 @@ Rollbacks
 ~~~~~~~~~
 
 In some :term:`failover` situations :term:`primary` nodes will have
-accepted write operations that have replicated to the
+accepted write operations that have *not* replicated to the
 :term:`secondaries <secondary>` after a failover occurs. This case is
 rare and typically occurs as a result of a network partition with
 replication lag. When this node (the former primary) rejoins the
 :term:`replica set` and attempts to continue replication as a
-secondary those operations the former primary must revert these
-operations or "rolled back" these operations to maintain database
+secondary, the former primary must revert these
+operations or "roll back" these operations to maintain database
 consistency across the replica set.
 
-TODO: ":term:`primary` nodes will have
-accepted write operations that have replicated to the
-:term:`secondaries <secondary>`" -> have not replicated to the secondary
-
-TODO: "as a
-secondary those operations the former primary must revert these
-operations or" those operations...these operations
-
 MongoDB writes the rollback data to a :term:`BSON` file in the
 database's :setting:`dbpath` directory. Use :doc:`bsondump
 </reference/bsondump>` to read the contents of these rollback files
 and then manually apply the changes to the new primary. There is no
 way for MongoDB to appropriately and fairly handle rollback situations
-without manual intervention. Since rollback situations require an
-administrator's direct intervention, users should strive to avoid
-rollbacks as much as possible. Until an administrator applies this
-rollback data, the former primary remains in a "rollback" status.
-
-TODO: "Until an administrator applies this
-rollback data, the former primary remains in a "rollback" status." Untrue! ROLLBACK state should automatically correct itself and the server will end up in SECONDARY state.
+without manual intervention. Even after the node completes the
+rollback and returns to secondary status, administrators will need to
+apply or decide to ignore the rollback data. MongoDB users should
+strive to avoid rollbacks as much as possible.
 
 The best strategy for avoiding all rollbacks is to ensure :ref:`write
 propagation <replica-set-write-concern>` to all or some of the
@@ -297,15 +284,6 @@ that might create rollbacks.
 megabytes of data. If your system needs to rollback more than 300
 MB, you will need to manually intervene to recover this data.
 
-.. note::
-
-   After a rollback occurs, the former primary will remain in a
-   "rollback" mode until the administrator deals with the rolled back
-   data and restarts the :program:`mongod` instance. Only then can the
-   node becomes a normal :term:`secondary` terms.
-
-TODO: not true...
-
 Application Concerns
 ~~~~~~~~~~~~~~~~~~~~

@@ -455,11 +433,12 @@ the existing members.
 :term:`Journaling <journal>` provides single-instance
 write durability. The journaling greatly improves the reliability
 and durability of a database. Unless MongoDB runs with journaling, when a
-MongoDB instance terminates ungracefully, the database can loose up to 60 seconds of data,
-and the database may remain in an inconsistent state and
-unrecoverable state.
+MongoDB instance terminates ungracefully, the database can end in a
+corrupt and unrecoverable state.
 
-TODO: this isn't true. If you are running w/out journaling and mongod terminates "ungracefully" you can lose _all_ data. Also, you should assume, after a crash w/out journaling, that the db is in an inconsistent (i.e., corrupt) state.
+You should assume that a database running without journaling that
+suffers a crash or unclean shutdown is in a corrupt or inconsistent
+state.
 
 **Use journaling**; however, do not forego proper replication
 because of journaling.
