@@ -7,11 +7,10 @@ Replication Internals
Synopsis
--------

- This document provides a more in-depth explanation of the internals
- and operation of replica set features. This material is not necessary
- for normal operation or application development, but may be
- useful for troubleshooting, helpful understanding MongoDB's behavior,
- or interesting for learning about MongoDB's approach.
+ This document provides a more in-depth explanation of the internals and
+ operation of :term:`replica set` features. This material is not necessary for
+ normal operation or application development but may be useful for
+ troubleshooting and for further understanding MongoDB's behavior and approach.

.. index:: replica set; oplog

.. _replica-set-oplog:
@@ -21,19 +20,17 @@ Oplog

Replication itself works by way of a special :term:`capped collection`
called the :term:`oplog`. This collection keeps a rolling record of
- all operations applied to the primary node. Secondary nodes then
+ all operations applied to the :term:`primary`. Secondary members then
replicate this log by applying the operations to themselves in an
- asynchronous process. Under normal operation, secondary nodes will
+ asynchronous process. Under normal operation, :term:`secondary` members
reflect writes within one second of the primary. However, various
- exceptional situations may cause secondaries lag behind further. See
+ exceptional situations may cause secondaries to lag behind further. See
:term:`replication lag` for details.

- Also consider :ref:`oplog sizing <replica-set-oplog-sizing>` for more
- information about the oplog.
+ All members send heartbeats (pings) to all other members in the set and can
+ import operations to the local oplog from any other member in the set.

- All members send heartbeats to all other members, and can import
- operations to into its oplog from any other member in the
- set.
+ For more information about the oplog, see :ref:`oplog sizing <replica-set-oplog-sizing>`.

.. In 2.0, replicas would import entries from the member lowest
.. "ping," This wasn't true in 1.8 and will likely change in 2.2.
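The mechanism this hunk describes — a rolling, fixed-size log on the primary that secondaries replay asynchronously — can be sketched as a toy model. This is illustrative JavaScript only, not MongoDB's implementation; `writeToPrimary`, `syncSecondary`, and the field names are invented:

```javascript
// Toy model of an oplog: a fixed-size rolling log of operations on the
// primary, replayed by a secondary that tracks the last entry it applied.
const OPLOG_CAPACITY = 4; // tiny cap so old entries roll off, like a capped collection

const primary = { oplog: [], ts: 0, data: {} };
const secondary = { appliedTs: 0, data: {} };

function writeToPrimary(key, value) {
  primary.data[key] = value;
  primary.oplog.push({ ts: ++primary.ts, op: "set", key, value });
  if (primary.oplog.length > OPLOG_CAPACITY) primary.oplog.shift(); // oldest entry rolls off
}

// The secondary copies and applies any oplog entries newer than what it
// has already applied, in order.
function syncSecondary() {
  for (const entry of primary.oplog) {
    if (entry.ts > secondary.appliedTs) {
      secondary.data[entry.key] = entry.value;
      secondary.appliedTs = entry.ts;
    }
  }
}

writeToPrimary("a", 1);
writeToPrimary("b", 2);
syncSecondary();
// secondary.data is now { a: 1, b: 2 }
```

Because the log is capped, a secondary that falls further behind than the log retains can no longer catch up this way — which is the motivation for sizing the oplog appropriately.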
@@ -47,65 +44,64 @@ MongoDB uses :term:`single-master replication` to ensure that the
database remains consistent. However, clients may modify the
:ref:`read preferences <replica-set-read-preference>` on a
per-connection basis in order to distribute read operations to the
- secondary members of a replica set. Read-heavy deployments may achieve
- greater query volumes by distributing reads to secondary nodes. But
+ :term:`secondary` members of a :term:`replica set`. Read-heavy deployments may achieve
+ greater query volumes by distributing reads to secondary members. But
keep in mind that replication is asynchronous; therefore, reads from
secondaries may not always reflect the latest writes to the
- primary. See the ":ref:`consistency <replica-set-consistency>`"
- section for more about ":ref:`read preference
- <replica-set-read-preference>`" and ":ref:`write concern
- <replica-set-write-concern>`."
+ :term:`primary`. See the :ref:`consistency <replica-set-consistency>`
+ section for more about :ref:`read preference
+ <replica-set-read-preference>` and :ref:`write concern
+ <replica-set-write-concern>`.

.. note::

- Use :func:`db.getReplicationInfo()` from a secondary node
- and the ":doc:`replication status </reference/replication-info>`
- output to asses the current state of replication, and determine if
+ Use :func:`db.getReplicationInfo()` from a secondary member
+ and the :doc:`replication status </reference/replication-info>`
+ output to assess the current state of replication and determine if
there is any unintended replication delay.

- In the default configuration, all have nodes an equal chance of
- becoming primary; however, it's possible to set "priorities" that
+ In the default configuration, all members have an equal chance of
+ becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
weight the election. In some architectures, there may be operational
reasons for increasing the likelihood of a specific replica set member
- becoming primary. For instance, a node located in a remote data
- center should become primary. See: ":ref:`node
- priority <replica-set-node-priority>`" for more background on this
+ becoming primary. For instance, a member located in a remote data
+ center should *not* become primary. See :ref:`node
+ priority <replica-set-node-priority>` for more background on this
concept.

- Replica sets can also include nodes with four special
- configurations which affect membership behavior in a replica
- set. Consider the following node types:
+ Replica sets can also include members with the following four special
+ configurations that affect membership behavior:

- - :ref:`Secondary-only <replica-set-secondary-only-members>` members
- have their "priority" set to 0 and thus not eligible for election as
- primary nodes.
+ - :ref:`Secondary-only <replica-set-secondary-only-members>` members have
+ their :data:`priority <members[n].priority>` values set to ``0`` and thus
+ are not eligible for election as primaries.

- :ref:`Hidden <replica-set-hidden-members>` members do not appear in the
- output of :func:`db.isMaster()`. This setting prevents clients
- from discovering, and thus potentially queries, the node in question.
+ output of :func:`db.isMaster()`. This prevents clients
+ from discovering and potentially querying the member in question.

- :ref:`Delayed <replica-set-delayed-members>` members lag a fixed period
- of time behind the the primary node. These nodes are typically used
+ of time behind the primary. These members are typically used
for disaster recovery scenarios. For example, if an administrator
mistakenly truncates a collection, and you discover the mistake within
- the lag window, then you can manually fail over to the delayed node.
+ the lag window, then you can manually fail over to the delayed member.

- :ref:`Arbiters <replica-set-arbiters>` exist solely to participate
in elections. They do not replicate data from the primary.

In almost every case, replica sets simplify the process of
- administering database replication; however, replica sets still have a
+ administering database replication. However, replica sets still have a
unique set of administrative requirements and concerns. Choosing the
right :doc:`system architecture </administration/replication-architectures>`
for your data set is crucial.

Administrators of replica sets also have unique :ref:`monitoring
- <replica-set-monitoring>`, and :ref:`security <replica-set-security>`
+ <replica-set-monitoring>` and :ref:`security <replica-set-security>`
concerns. The :ref:`replica set functions <replica-set-functions>` in
the :program:`mongo` shell, provide the tools necessary for replica set
administration. In particular use the :func:`rs.conf()` to return a
:term:`document` that holds the :doc:`replica set configuration
- </reference/replica-configuration>`, and :func:`rs.reconfig()` to
+ </reference/replica-configuration>` and use :func:`rs.reconfig()` to
modify the configuration of an existing replica set.

.. index:: replica set; elections
@@ -118,30 +114,30 @@ Elections

When you initialize a :term:`replica set` for the first time, or when any
failover occurs, an election takes place to decide which member should
become :term:`primary`. A primary is the only member in the replica
- set that can accept write operations including :func:`insert()
+ set that can accept write operations, including :func:`insert()
<db.collection.insert()>`, :func:`update() <db.collection.update()>`,
and :func:`remove() <db.collection.remove()>`.

Elections are the process replica set members use to
- select the primary in a set. Elections follow after one of two events:
- a primary that "steps down" or a :term:`secondary` that
- loses contact with a :term:`primary` member. All members have one vote
+ select the primary in a set. Two types of events can trigger an election:
+ a primary steps down or a :term:`secondary` member
+ loses contact with a primary. All members have one vote
in an election, and any :program:`mongod` can veto an election. A
- single veto will invalidate the election.
+ single veto invalidates the election.

An existing primary will step down in response to the
- :dbcommand:`replSetStepDown` command, or if it sees that one of
+ :dbcommand:`replSetStepDown` command or if it sees that one of
the current secondaries is eligible for election *and* has a higher
priority. A secondary will call for an election if it cannot
- establish a connection to a primary member. A primary will also step
- down when they cannot contact a majority of the members of the replica
- set. When the current primary member steps down, it closes all open client
+ establish a connection to a primary. A primary will also step
+ down when it cannot contact a majority of the members of the replica
+ set. When the current primary steps down, it closes all open client
connections to prevent clients from unknowingly writing data to a
non-primary member.

In an election, every member, including :ref:`hidden
<replica-set-hidden-members>` members, :ref:`arbiters
- <replica-set-arbiters>`, and even recovering members get a single
+ <replica-set-arbiters>`, and even recovering members, get a single
vote. Members will give votes to every eligible member that calls an
election.
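The mechanics above — one vote per member, with any single veto invalidating the election — can be illustrated with a small sketch. This is a simplified, hypothetical JavaScript model of the veto rules this section describes; `vetoesElection` and its fields are invented, and a real election weighs more state than this:

```javascript
// A voter vetoes a candidate that is unknown, behind on operations,
// outranked by another eligible member, or no more current than the
// voter itself when the voter is the primary.
function vetoesElection(voter, candidate, members) {
  const maxOptime = Math.max(...members.map((m) => m.optime));
  if (!members.includes(candidate)) return true;            // not a member of the voter's set
  if (candidate.optime < maxOptime) return true;            // not up-to-date with the set
  if (members.some((m) => m.eligible && m.priority > candidate.priority))
    return true;                                            // a higher-priority member is eligible
  if (voter.isPrimary && voter.optime >= candidate.optime)
    return true;                                            // primary has equal or newer operations
  return false;
}

const members = [
  { name: "a", optime: 100, priority: 1, eligible: true, isPrimary: false },
  { name: "b", optime: 100, priority: 2, eligible: true, isPrimary: false },
  { name: "c", optime: 90,  priority: 1, eligible: true, isPrimary: false },
];

// "c" lags behind the newest operation, so any voter vetoes it:
vetoesElection(members[0], members[2], members); // → true
// "b" has the highest priority and the latest optime, so it is not vetoed:
vetoesElection(members[0], members[1], members); // → false
```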
@@ -150,24 +146,24 @@ conditions:

- If the member seeking an election is not a member of the voter's set.

- - If the member seeking an election is not up to date with the most
+ - If the member seeking an election is not up-to-date with the most
recent operation accessible in the replica set.

- If the member seeking an election has a lower priority than another member
in the set that is also eligible for election.

- - If the current :term:`primary` member has more recent operations
+ - If the current primary member has more recent operations
(i.e. a higher "optime") than the member seeking election, from the
perspective of the voting member.

- The current primary will also veto an election if it has the same or
more recent operations (i.e. a "higher or equal optime") than the
- node seeking election.
+ member seeking election.

.. note::

- Any member of a replica set *can* veto an election, even if they
- are :ref:`non-voting members <replica-set-non-voting-members>`.
+ Any member of a replica set *can* veto an election, even if the
+ member is a :ref:`non-voting member <replica-set-non-voting-members>`.

The first member to receive votes from a majority of members in a set
becomes the next primary until the next election. Be
@@ -177,61 +173,58 @@ aware of the following conditions and possible situations:

seconds. If a heartbeat does not return for more than 10 seconds,
the other members mark the delinquent member as inaccessible.

- - Replica set members only compare priorities with other members of
+ - Replica set members compare priorities only with other members of
the set. The absolute value of priorities does not have any impact on
the outcome of replica set elections.

.. note::

- The only exception is that members with a priority of ``0``
- cannot become :term:`primary` and will not seek election. See
+ The only exception is that members with :data:`priority <members[n].priority>` values of ``0``
+ cannot become primary and will not seek election. See
:ref:`replica-set-node-priority-configuration` for more
information.

- - Replica set members cannot become primary *unless* they have the
- highest "optime" of any visible members in the set.
+ - A replica set member cannot become primary *unless* it has the
+ highest "optime" of any visible member in the set.

- If the member of the set with the highest priority is within 10
- seconds of the latest oplog entry, then the set will *not* elect a
- :term:`primary` until the member with the highest priority catches up
+ seconds of the latest :term:`oplog` entry, then the set will *not* elect a
+ primary until the member with the highest priority catches up
to the latest operation.

- .. seealso:: ":ref:`Non-voting members in a replica
- set<replica-set-non-voting-members>`",
- ":ref:`replica-set-node-priority-configuration`", and
- ":data:`replica configuration <members[n].votes>`
+ .. seealso:: :ref:`Non-voting members in a replica
+ set <replica-set-non-voting-members>`,
+ :ref:`replica-set-node-priority-configuration`, and
+ :data:`replica configuration <members[n].votes>`

Syncing
-------

- Replica set members sync, or copy :term:`oplog` entries, from the
- :term:`primary` or another :term:`secondary` member of the set in
- order to remain up to date with the current state of the set.
+ In order to remain up-to-date with the current state of the :term:`replica set`,
+ set members sync, or copy, :term:`oplog` entries from other members.

- When a new member joins a set, or an existing member restarts, the new
- member waits to receive heartbeats from other members. Then, by
- default, a new member will sync from the *the closest* member of the
- set that is either: the primary or another secondary with more recent
- oplog entries. This prevents two secondaries from syncing from each
- other.
+ When a new member joins a set or when an existing member restarts, the
+ member waits to receive heartbeats from other members. By
+ default, the member syncs from the *closest* member of the
+ set that is either the primary or another secondary with more recent
+ oplog entries. This prevents two secondaries from syncing from each other.

In version 2.0, secondaries only change sync targets if the connection
between secondaries drops or produces an error.

For example:

- #. If you have two secondary members in one data center, a primary in
- a second facility, *and* you start all three instances at roughly
+ #. If you have two secondary members in one data center and a primary in
+ a second facility, and if you start all three instances at roughly
the same time (i.e. with no existing data sets or oplog,) both
secondaries will likely sync from the primary, as neither secondary
has more recent oplog entries.

- If you restart one of the secondaries, when it rejoins the set it
- will likely begin syncing from the other secondary.
+ If you restart one of the secondaries, then when it rejoins the set it
+ will likely begin syncing from the other secondary, because of proximity.

- #. If, you have a primary in one facility, and a secondary in an
- alternate facility, and you add another secondary to the alternate
- facility, the new secondary will likely sync from the older
- secondary because this member is closer and has more recent oplog
- entries.
+ #. If you have a primary in one facility and a secondary in an
+ alternate facility, and if you add another secondary to the alternate
+ facility, the new secondary will likely sync from the existing
+ secondary because it is closer than the primary.
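The sync-target choice described above — the closest member that is either the primary or a secondary with more recent oplog entries — can be sketched as a small function. This JavaScript is a hypothetical illustration, not MongoDB's internal selection code; `chooseSyncTarget` and its fields (`optime`, `pingMs`) are invented:

```javascript
// Pick a sync target: among members that are the primary or are ahead of
// us in the oplog, choose the one with the lowest measured ping time.
function chooseSyncTarget(self, members) {
  const candidates = members.filter(
    (m) => m !== self && (m.isPrimary || m.optime > self.optime)
  );
  // "Closest" here means the lowest round-trip ping time.
  return candidates.reduce(
    (best, m) => (best === null || m.pingMs < best.pingMs ? m : best),
    null
  );
}

const primary = { name: "p",  isPrimary: true,  optime: 100, pingMs: 40 };
const nearSec = { name: "s1", isPrimary: false, optime: 100, pingMs: 2 };
const self    = { name: "s2", isPrimary: false, optime: 80,  pingMs: 0 };

// The nearby secondary has more recent oplog entries, so it wins on proximity:
chooseSyncTarget(self, [primary, nearSec, self]); // → nearSec
```

This mirrors the restarted-secondary example above: once another secondary is both closer and at least as current, it becomes the preferred sync source instead of the remote primary.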