Commit 853c191

Author: Bob Grabar
DOCS-242: Updated for style and content.
1 parent a505705 commit 853c191

1 file changed

+77
-84
lines changed

source/core/replication-internals.txt

Lines changed: 77 additions & 84 deletions
@@ -7,11 +7,10 @@ Replication Internals
 Synopsis
 --------

-This document provides a more in-depth explanation of the internals
-and operation of replica set features. This material is not necessary
-for normal operation or application development, but may be
-useful for troubleshooting, helpful understanding MongoDB's behavior,
-or interesting for learning about MongoDB's approach.
+This document provides a more in-depth explanation of the internals and
+operation of :term:`replica set` features. This material is not necessary for
+normal operation or application development but may be useful for
+troubleshooting and for further understanding MongoDB's behavior and approach.

 .. index:: replica set; oplog
 .. _replica-set-oplog:
@@ -21,19 +20,17 @@ Oplog

 Replication itself works by way of a special :term:`capped collection`
 called the :term:`oplog`. This collection keeps a rolling record of
-all operations applied to the primary node. Secondary nodes then
+all operations applied to the :term:`primary`. Secondary members then
 replicate this log by applying the operations to themselves in an
-asynchronous process. Under normal operation, secondary nodes will
+asynchronous process. Under normal operation, :term:`secondary` members
 reflect writes within one second of the primary. However, various
-exceptional situations may cause secondaries lag behind further. See
+exceptional situations may cause secondaries to lag behind further. See
 :term:`replication lag` for details.

-Also consider :ref:`oplog sizing <replica-set-oplog-sizing>` for more
-information about the oplog.
+All members send heartbeats (pings) to all other members in the set and can
+import operations to the local oplog from any other member in the set.

-All members send heartbeats to all other members, and can import
-operations to into its oplog from any other member in the
-set.
+For more information about the oplog, see :ref:`oplog sizing <replica-set-oplog-sizing>`.

 .. In 2.0, replicas would import entries from the member lowest
 .. "ping," This wasn't true in 1.8 and will likely change in 2.2.
@@ -47,65 +44,64 @@ MongoDB uses :term:`single-master replication` to ensure that the
 database remains consistent. However, clients may modify the
 :ref:`read preferences <replica-set-read-preference>` on a
 per-connection basis in order to distribute read operations to the
-secondary members of a replica set. Read-heavy deployments may achieve
-greater query volumes by distributing reads to secondary nodes. But
+:term:`secondary` members of a :term:`replica set`. Read-heavy deployments may achieve
+greater query volumes by distributing reads to secondary members. But
 keep in mind that replication is asynchronous; therefore, reads from
 secondaries may not always reflect the latest writes to the
-primary. See the ":ref:`consistency <replica-set-consistency>`"
-section for more about ":ref:`read preference
-<replica-set-read-preference>`" and ":ref:`write concern
-<replica-set-write-concern>`."
+:term:`primary`. See the :ref:`consistency <replica-set-consistency>`
+section for more about :ref:`read preference
+<replica-set-read-preference>` and :ref:`write concern
+<replica-set-write-concern>`.

 .. note::

-Use :func:`db.getReplicationInfo()` from a secondary node
-and the ":doc:`replication status </reference/replication-info>`
-output to asses the current state of replication, and determine if
+Use :func:`db.getReplicationInfo()` from a secondary member
+and the :doc:`replication status </reference/replication-info>`
+output to asses the current state of replication and determine if
 there is any unintended replication delay.

-In the default configuration, all have nodes an equal chance of
-becoming primary; however, it's possible to set "priorities" that
+In the default configuration, all members have an equal chance of
+becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
 weight the election. In some architectures, there may be operational
 reasons for increasing the likelihood of a specific replica set member
-becoming primary. For instance, a node located in a remote data
-center should become primary . See: ":ref:`node
-priority <replica-set-node-priority>`" for more background on this
+becoming primary. For instance, a member located in a remote data
+center should *not* become primary. See: :ref:`node
+priority <replica-set-node-priority>` for more background on this
 concept.

-Replica sets can also include nodes with four special
-configurations which affect membership behavior in a replica
-set. Consider the following node types:
+Replica sets can also include members with the following four special
+configurations that affect membership behavior:

-- :ref:`Secondary-only <replica-set-secondary-only-members>` members
-have their "priority" set to 0 and thus not eligible for election as
-primary nodes.
+- :ref:`Secondary-only <replica-set-secondary-only-members>` members have
+their :data:`priority <members[n].priority>` values set to ``0`` and thus
+are not eligible for election as primaries.

 - :ref:`Hidden <replica-set-hidden-members>` members do not appear in the
-output of :func:`db.isMaster()`. This setting prevents clients
-from discovering, and thus potentially queries, the node in question.
+output of :func:`db.isMaster()`. This prevents clients
+from discovering and potentially querying the member in question.

 - :ref:`Delayed <replica-set-delayed-members>` members lag a fixed period
-of time behind the the primary node. These nodes are typically used
+of time behind the primary. These members are typically used
 for disaster recovery scenarios. For example, if an administrator
 mistakenly truncates a collection, and you discover the mistake within
-the lag window, then you can manually fail over to the delayed node.
+the lag window, then you can manually fail over to the delayed member.

 - :ref:`Arbiters <replica-set-arbiters>` exist solely to participate
 in elections. They do not replicate data from the primary.

 In almost every case, replica sets simplify the process of
-administering database replication; however, replica sets still have a
+administering database replication. However, replica sets still have a
 unique set of administrative requirements and concerns. Choosing the
 right :doc:`system architecture </administration/replication-architectures>`
 for your data set is crucial.

 Administrators of replica sets also have unique :ref:`monitoring
-<replica-set-monitoring>`, and :ref:`security <replica-set-security>`
+<replica-set-monitoring>` and :ref:`security <replica-set-security>`
 concerns. The :ref:`replica set functions <replica-set-functions>` in
 the :program:`mongo` shell, provide the tools necessary for replica set
 administration. In particular use the :func:`rs.conf()` to return a
 :term:`document` that holds the :doc:`replica set configuration
-</reference/replica-configuration>`, and :func:`rs.reconfig()` to
+</reference/replica-configuration>` and use :func:`rs.reconfig()` to
 modify the configuration of an existing replica set.

 .. index:: replica set; elections
@@ -118,30 +114,30 @@ Elections
 When you initialize a :term:`replica set` for the first time, or when any
 failover occurs, an election takes place to decide which member should
 become :term:`primary`. A primary is the only member in the replica
-set that can accept write operations including :func:`insert()
+set that can accept write operations, including :func:`insert()
 <db.collection.insert()>`, :func:`update() <db.collection.update()>`,
 and :func:`remove() <db.collection.remove()>`.

 Elections are the process replica set members use to
-select the primary in a set. Elections follow after one of two events:
-a primary that "steps down" or a :term:`secondary` that
-loses contact with a :term:`primary` member. All members have one vote
+select the primary in a set. Two types of events can trigger an election:
+a primary steps down or a :term:`secondary` member
+loses contact with a primary. All members have one vote
 in an election, and any :program:`mongod` can veto an election. A
-single veto will invalidate the election.
+single veto invalidates the election.

 An existing primary will step down in response to the
-:dbcommand:`replSetStepDown` command, or if it sees that one of
+:dbcommand:`replSetStepDown` command or if it sees that one of
 the current secondaries is eligible for election *and* has a higher
 priority. A secondary will call for an election if it cannot
-establish a connection to a primary member. A primary will also step
-down when they cannot contact a majority of the members of the replica
-set. When the current primary member steps down, it closes all open client
+establish a connection to a primary. A primary will also step
+down when it cannot contact a majority of the members of the replica
+set. When the current primary steps down, it closes all open client
 connections to prevent clients from unknowingly writing data to a
 non-primary member.

 In an election, every member, including :ref:`hidden
 <replica-set-hidden-members>` members, :ref:`arbiters
-<replica-set-arbiters>`, and even recovering members get a single
+<replica-set-arbiters>`, and even recovering members, get a single
 vote. Members will give votes to every eligible member that calls an
 election.

@@ -150,24 +146,24 @@ conditions:

 - If the member seeking an election is not a member of the voter's set.

-- If the member seeking an election is not up to date with the most
+- If the member seeking an election is not up-to_date with the most
 recent operation accessible in the replica set.

 - If the member seeking an election has a lower priority than another member
 in the set that is also eligible for election.

-- If the current :term:`primary` member has more recent operations
+- If the current primary member has more recent operations
 (i.e. a higher "optime") than the member seeking election, from the
 perspective of the voting member.

 - The current primary will also veto an election if it has the same or
 more recent operations (i.e. a "higher or equal optime") than the
-node seeking election.
+member seeking election.

 .. note::

-Any member of a replica set *can* veto an election, even if they
-are :ref:`non-voting members <replica-set-non-voting-members>`.
+Any member of a replica set *can* veto an election, even if the
+member is a :ref:`non-voting member <replica-set-non-voting-members>`.

 The first member to receive votes from a majority of members in a set
 becomes the next primary until the next election. Be
@@ -177,61 +173,58 @@ aware of the following conditions and possible situations:
 seconds. If a heartbeat does not return for more than 10 seconds,
 the other members mark the delinquent member as inaccessible.

-- Replica set members only compare priorities with other members of
+- Replica set members compare priorities only with other members of
 the set. The absolute value of priorities does not have any impact on
 the outcome of replica set elections.

 .. note::

-The only exception is that members with a priority of ``0``
-cannot become :term:`primary` and will not seek election. See
+The only exception is that members with :data:`priority <members[n].priority>` values of ``0``
+cannot become primary and will not seek election. See
 :ref:`replica-set-node-priority-configuration` for more
 information.

-- Replica set members cannot become primary *unless* they have the
-highest "optime" of any visible members in the set.
+- A replica set member cannot become primary *unless* it has the
+highest "optime" of any visible member in the set.

 - If the member of the set with the highest priority is within 10
-seconds of the latest oplog entry, then the set will *not* elect a
-:term:`primary` until the member with the highest priority catches up
+seconds of the latest :term:`oplog` entry, then the set will *not* elect a
+primary until the member with the highest priority catches up
 to the latest operation.


-.. seealso:: ":ref:`Non-voting members in a replica
-set<replica-set-non-voting-members>`",
-":ref:`replica-set-node-priority-configuration`", and
-":data:`replica configuration <members[n].votes>`
+.. seealso:: :ref:`Non-voting members in a replica
+set<replica-set-non-voting-members>`,
+:ref:`replica-set-node-priority-configuration`, and
+:data:`replica configuration <members[n].votes>`

 Syncing
 -------

-Replica set members sync, or copy :term:`oplog` entries, from the
-:term:`primary` or another :term:`secondary` member of the set in
-order to remain up to date with the current state of the set.
+In order to remain up-to-date with the current state of the :term:`replica set`,
+set members sync, or copy, :term:`oplog` entries from other members.

-When a new member joins a set, or an existing member restarts, the new
-member waits to receive heartbeats from other members. Then, by
-default, a new member will sync from the *the closest* member of the
-set that is either: the primary or another secondary with more recent
-oplog entries. This prevents two secondaries from syncing from each
-other.
+When a new member joins a set or when an existing member restarts, the
+member waits to receive heartbeats from other members. By
+default, the member syncs from the *the closest* member of the
+set that is either the primary or another secondary with more recent
+oplog entries. This prevents two secondaries from syncing from each other.

 In version 2.0, secondaries only change sync targets if the connection
 between secondaries drops or produces an error.

 For example:

-#. If you have two secondary members in one data center, a primary in
-a second facility, *and* you start all three instances at roughly
+#. If you have two secondary members in one data center and a primary in
+a second facility, and if you start all three instances at roughly
 the same time (i.e. with no existing data sets or oplog,) both
 secondaries will likely sync from the primary, as neither secondary
 has more recent oplog entries.

-If you restart one of the secondaries, when it rejoins the set it
-will likely begin syncing from the other secondary.
+If you restart one of the secondaries, then when it rejoins the set it
+will likely begin syncing from the other secondary, because of proximity.

-#. If, you have a primary in one facility, and a secondary in an
-alternate facility, and you add another secondary to the alternate
-facility, the new secondary will likely sync from the older
-secondary because this member is closer and has more recent oplog
-entries.
+#. If you have a primary in one facility and a secondary in an
+alternate facility, and if you add another secondary to the alternate
+facility, the new secondary will likely sync from the existing
+secondary because it is closer than the primary.
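
Reviewer's illustration (not part of the commit): the sync-target rule described in the Syncing section of this change can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not MongoDB's implementation. The ``Member`` class, its ``optime`` and ``ping_ms`` fields, and the ``choose_sync_target`` function are hypothetical names invented for this sketch. "Closest" is modeled here as the lowest heartbeat round-trip time, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Member:
    """Hypothetical view of a replica set member as seen by a peer."""
    name: str
    is_primary: bool
    optime: int      # assumption: larger value means a more recent oplog entry
    ping_ms: float   # assumption: heartbeat round-trip time stands in for "closeness"


def choose_sync_target(me: Member, others: Sequence[Member]) -> Optional[Member]:
    """Pick the closest member that is the primary or has newer oplog entries."""
    candidates = [
        m for m in others
        if m.name != me.name and (m.is_primary or m.optime > me.optime)
    ]
    if not candidates:
        return None
    # Among eligible members, prefer the one with the lowest ping.
    return min(candidates, key=lambda m: m.ping_ms)
```

In the second example from the text (a primary in one facility, an existing secondary and a new secondary in another), this model picks the existing secondary, because it is eligible and has the lower ping. Two secondaries with identical optimes never select each other, which matches the statement that the rule prevents two secondaries from syncing from each other.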
