Commit fd8b6a1

DOCSP-2133: Clarify time required for failover/election
1 parent efa833e commit fd8b6a1

11 files changed

+113
-82
lines changed

source/core/replica-set-elections.txt

Lines changed: 21 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -17,28 +17,31 @@ Replica Set Elections
1717
:class: singlecol
1818

1919
:term:`Replica sets <replica set>` use elections to determine which
20-
set member will become :term:`primary`. Elections occur after
21-
initiating a replica set, and also any time the primary becomes
22-
unavailable. The primary is the only member in the set that can accept
23-
write operations. If a primary becomes unavailable, elections allow
24-
the set to recover normal operations without manual
25-
intervention.
26-
In the following three-member replica set, the primary is unavailable.
27-
One of the remaining secondaries holds an election to elect itself as a
28-
new primary.
20+
set member will become :term:`primary`. Replica sets can trigger an
21+
election in response to a variety of events, such as:
22+
23+
- adding a new node to the replica set,
24+
- :method:`initiating a replica set <rs.initiate()>`,
25+
- performing replica set maintenance using methods such as :method:`rs.stepDown()` or :method:`rs.reconfig()`, and
26+
- the :term:`secondary` members losing connectivity to the primary for more than the configured :rsconf:`timeout <settings.electionTimeoutMillis>` (10 seconds by default).
27+
28+
In the following diagram, the primary node was unavailable for longer
29+
than the :rsconf:`configured timeout <settings.electionTimeoutMillis>`
30+
which triggers the :ref:`automatic failover <replication-auto-failover>`
31+
process. One of the remaining secondaries calls for an election to
32+
select a new primary and automatically resume normal operations.
2933

3034
.. include:: /images/replica-set-trigger-election.rst
3135

32-
Elections are essential for independent operation of a
33-
replica set; however, elections take time to complete. While an
34-
election is in process, the replica set has no primary and cannot
35-
accept writes and all remaining members become read-only.
36+
The replica set cannot process write operations until the
37+
election completes successfully. The replica set can continue to serve
38+
read queries if such queries are configured to
39+
:ref:`run on secondaries <replica-set-read-preference>`.
40+
41+
.. include:: /includes/fact-election-latency.rst
42+
43+
.. include:: /includes/fact-retryable-writes-failover-election.rst
3644

37-
If a majority of the replica set is inaccessible or unavailable to the
38-
current primary, the primary will step down and become a secondary. The
39-
replica set cannot accept writes after this occurs, but remaining
40-
members can continue to serve read queries if such queries are
41-
configured to run on secondaries.
4245

4346
Factors and Conditions that Affect Elections
4447
--------------------------------------------

source/core/replica-set-high-availability.txt

Lines changed: 3 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -15,29 +15,8 @@ Replica Set High Availability
1515
:depth: 1
1616
:class: singlecol
1717

18-
:term:`Replica sets <replica set>` provide high availability using
19-
automatic :term:`failover`. Failover allows a :term:`secondary` member
20-
to become :term:`primary` if the current primary becomes unavailable.
18+
Replica sets use elections to support high availability.
2119

22-
.. versionchanged:: 3.2
20+
.. include:: /includes/toc/dfn-list-replica-set-high-availability.rst
2321

24-
MongoDB introduces a version 1 of the replication protocol
25-
(:rsconf:`protocolVersion: 1 <protocolVersion>`) to reduce replica set
26-
failover time and accelerates the detection of multiple simultaneous
27-
primaries. New replica sets will, by default, use
28-
:rsconf:`protocolVersion: 1 <protocolVersion>`. Previous versions of
29-
MongoDB use version 0 of the protocol.
30-
31-
Replica set members keep the same data set but are otherwise
32-
independent. If the primary becomes unavailable, an eligible secondary
33-
holds an :doc:`election </core/replica-set-elections>` to elect itself
34-
as a new primary. In some situations, the failover process may undertake
35-
a :doc:`rollback </core/replica-set-rollbacks>`. [#rollback-automatic]_
36-
37-
.. class:: hidden
38-
39-
.. include:: /includes/toc/replica-set-high-availability.rst
40-
41-
.. [#rollback-automatic] Replica sets remove "rollback" data when
42-
needed without intervention. Administrators must apply or discard
43-
rollback data manually.
22+
.. include:: /includes/toc/replica-set-high-availability.rst
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
The median time before a cluster elects a new primary should not
2+
typically exceed 12 seconds, assuming default :rsconf:`replica
3+
configuration settings <settings>`. This includes time required to
4+
mark the primary as :ref:`unavailable <replication-auto-failover>` and
5+
call and complete an :ref:`election <replica-set-elections>`.
6+
You can tune this time period by modifying the
7+
:rsconf:`settings.electionTimeoutMillis` replication configuration
8+
option. Factors such as network latency may extend the time required
9+
for replica set elections to complete, which in turn affects the amount
10+
of time your cluster may operate without a primary. These factors are
11+
dependent on your particular cluster architecture.
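
As a hedged illustration of the tuning step described above, the sketch below builds a modified copy of a replica set configuration document with a different ``settings.electionTimeoutMillis``. The helper name ``withElectionTimeout`` and the sample values are assumptions made for illustration and are not part of this commit. In the mongo shell, the result would typically be passed to ``rs.reconfig()`` on the primary.

```javascript
// Hypothetical helper (not part of MongoDB): returns a copy of a replica set
// configuration document with settings.electionTimeoutMillis replaced.
function withElectionTimeout(config, timeoutMillis) {
  if (!Number.isInteger(timeoutMillis) || timeoutMillis <= 0) {
    throw new RangeError("timeoutMillis must be a positive integer");
  }
  // Copy the top-level document and its settings so the input is not mutated.
  const settings = Object.assign({}, config.settings, {
    electionTimeoutMillis: timeoutMillis,
  });
  return Object.assign({}, config, { settings: settings });
}

// Sample document shaped like the output of rs.conf(); the values are made up.
const sample = {
  _id: "rs0",
  version: 1,
  settings: { electionTimeoutMillis: 10000 },
};
const tuned = withElectionTimeout(sample, 5000);
console.log(tuned.settings.electionTimeoutMillis); // prints 5000

// In the mongo shell, connected to the primary (not executed here):
//   rs.reconfig(withElectionTimeout(rs.conf(), 5000));
```

Building a copy rather than editing the document in place keeps the original configuration available for comparison before the change is applied.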

source/includes/fact-replica-set-protocolVersion1.rst

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,8 @@
22

33
MongoDB introduces a version 1 of the replication protocol
44
(:rsconf:`protocolVersion: 1 <protocolVersion>`) to reduce replica
5-
set failover time and accelerates the detection of multiple
6-
simultaneous primaries. New replica sets will, by default, use
5+
set failover time and accelerate the detection of multiple
6+
simultaneous primaries. New replica sets, by default, use
77
:rsconf:`protocolVersion: 1 <protocolVersion>`. Previous versions of
8-
MongoDB use version 0 of the protocol.
8+
MongoDB use version 0 of the protocol. See :ref:`replication election
9+
enhancements <3.2-rel-notes-rs-enhancements>` for details.
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
Your application connection logic should include tolerance for
2+
automatic failovers and the subsequent elections.
3+
4+
.. versionadded:: 3.6
5+
6+
MongoDB 3.6+ drivers can detect the loss of the primary and
7+
automatically :ref:`retry certain write operations
8+
<retryable-writes>` a single time, providing additional built-in
9+
handling of automatic failovers and elections.
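
To make the idea of failover tolerance concrete, here is a minimal sketch of an application-level wrapper that retries an operation a single time when it fails with an error the caller classifies as transient. This is an illustration only and is not how MongoDB drivers implement retryable writes. The names ``retryOnce``, ``isTransient``, and the simulated ``NotPrimary`` marker are hypothetical.

```javascript
// Hypothetical wrapper: run an async operation, retrying exactly once if the
// first attempt fails with an error that isTransient(err) accepts.
async function retryOnce(operation, isTransient) {
  try {
    return await operation();
  } catch (err) {
    if (!isTransient(err)) throw err;
    return await operation(); // a second failure propagates to the caller
  }
}

// Simulated write that fails on its first call, as if the primary stepped down.
let attempts = 0;
async function simulatedWrite() {
  attempts += 1;
  if (attempts === 1) {
    const err = new Error("simulated: no primary available");
    err.code = "NotPrimary"; // made-up marker for this sketch
    throw err;
  }
  return "ok";
}

retryOnce(simulatedWrite, (err) => err.code === "NotPrimary")
  .then((result) => console.log(result, "after", attempts, "attempts"));
// prints: ok after 2 attempts
```

An application-level retry like this is only safe for operations that are idempotent, since the first attempt may have been applied before the error was reported.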

source/includes/steps-perform-maintenance-task-on-replica-set-members.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -90,8 +90,8 @@ action:
9090
language: javascript
9191
code: rs.stepDown(300)
9292
post: |
93-
After the primary steps down, the replica set elects a new
94-
primary. See :doc:`/core/replica-set-elections` for more
93+
After the primary steps down, the replica set will elect a new
94+
primary. See :ref:`replica-set-elections` for more
9595
information about replica set elections.
9696
9797
Restart :binary:`~bin.mongod` as a standalone instance, making

source/reference/command/replSetGetStatus.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -343,14 +343,14 @@ following fields:
343343

344344
For the current primary, information regarding the election
345345
:ref:`Timestamp <document-bson-type-timestamp>` from the
346-
operation log. See :doc:`/core/replica-set-elections` for more
346+
operation log. See :doc:`/core/replica-set-high-availability` for more
347347
information about elections.
348348

349349
.. data:: replSetGetStatus.members[n].electionDate
350350

351351
For the current primary, an :term:`ISODate` formatted date string
352352
that reflects the election date. See
353-
:doc:`/core/replica-set-elections` for more information about
353+
:doc:`/core/replica-set-high-availability` for more information about
354354
elections.
355355

356356
.. data:: replSetGetStatus.members[n].self

source/reference/command/replSetReconfig.txt

Lines changed: 15 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -53,15 +53,21 @@ considerations:
5353
- A majority of the set's members must be operational for the
5454
changes to propagate properly.
5555

56-
- This command can cause downtime as the set renegotiates
57-
primary-status. Typically this is 10-20 seconds, but could
58-
be as long as a minute or more. Therefore, you should attempt
59-
to reconfigure only during scheduled maintenance periods.
60-
61-
- In some cases, :dbcommand:`replSetReconfig` forces the current
62-
primary to step down, initiating an election for primary among the
63-
members of the replica set. When this happens, the primary node will drop
64-
all current connections.
56+
- :dbcommand:`replSetReconfig` can trigger the current
57+
primary to step down in some situations. When the primary steps down,
58+
it forcibly closes all client connections. Primary step-down triggers
59+
an :ref:`election <replica-set-elections>` to select a new
60+
:term:`primary`.
61+
62+
.. include:: /includes/fact-election-latency.rst
63+
64+
During the election process, the cluster cannot
65+
accept write operations until it elects the new primary.
66+
67+
.. include:: /includes/fact-retryable-writes-failover-election.rst
68+
69+
To further reduce potential impact to a production cluster,
70+
reconfigure only during scheduled maintenance periods.
6571

6672
.. versionchanged:: 3.2
6773

source/reference/glossary.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -304,7 +304,7 @@ Glossary
304304
failover
305305
The process that allows a :term:`secondary` member of a
306306
:term:`replica set` to become :term:`primary` in the event of a
307-
failure. See :ref:`replica-set-failover`.
307+
failure. See :ref:`replication-auto-failover`.
308308

309309
field
310310
A name-value pair in a :term:`document <document>`. A document has

source/reference/method/rs.reconfig.txt

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -46,10 +46,19 @@ Availability
4646
~~~~~~~~~~~~
4747

4848
The :method:`rs.reconfig()` shell method can trigger the current primary
49-
to step down in some situations. When the
50-
primary steps down, it forcibly closes all client connections. This is
51-
by design. Since it may take a period of time to elect a new primary,
52-
schedule reconfiguration changes during maintenance periods to minimize loss of write availability.
49+
to step down in some situations. When the primary steps down, it
50+
forcibly closes all client connections. Primary step-down triggers an
51+
:ref:`election <replica-set-elections>` to select a new :term:`primary`.
52+
53+
.. include:: /includes/fact-election-latency.rst
54+
55+
During the election process, the cluster cannot
56+
accept write operations until it elects the new primary.
57+
58+
.. include:: /includes/fact-retryable-writes-failover-election.rst
59+
60+
To further reduce potential impact to a production cluster,
61+
reconfigure only during scheduled maintenance periods.
5362

5463
``{ force: true }``
5564
~~~~~~~~~~~~~~~~~~~

source/replication.txt

Lines changed: 32 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -104,30 +104,43 @@ Automatic Failover
104104
------------------
105105

106106
When a primary does not communicate with the other members of the set
107-
for more than 10 seconds, an eligible secondary will hold an election
108-
to elect itself the new primary. The first secondary to hold an
109-
election and receive a majority of the members' votes becomes primary.
110-
111-
.. include:: /includes/fact-replica-set-protocolVersion1.rst
107+
for more than the configured :rsconf:`electionTimeoutMillis` period
108+
(10 seconds by default), an eligible secondary calls for an election
109+
to nominate itself as the new primary. The cluster attempts to
110+
complete the election of a new primary and resume normal operations.
112111

113112
.. include:: /images/replica-set-trigger-election.rst
114113

115-
Although the timing varies, the failover process generally completes
116-
within a minute. For instance, it may take 10-30 seconds for the
117-
members of a :term:`replica set` to declare a :term:`primary`
118-
inaccessible (see :rsconf:`~settings.electionTimeoutMillis`). One of
119-
the remaining secondaries holds an :term:`election` to elect itself as
120-
a new primary. The election itself may take another 10-30 seconds.
114+
The replica set cannot process write operations
115+
until the election completes successfully. The replica set can continue
116+
to serve read queries if such queries are configured to
117+
:ref:`run on secondaries <replica-set-read-preference>` while the
118+
primary is offline.
119+
120+
.. include:: /includes/fact-election-latency.rst
121+
122+
Lowering the :rsconf:`~settings.electionTimeoutMillis`
123+
replication configuration option from the default ``10000`` (10 seconds)
124+
can result in faster detection of primary failure. However,
125+
the cluster may call elections more frequently due to factors such as
126+
temporary network latency even if the primary is otherwise healthy.
127+
This can result in increased :ref:`rollbacks <replica-set-rollback>` for
128+
:ref:`w : 1 <wc-w>` write operations.
129+
130+
.. include:: /includes/fact-retryable-writes-failover-election.rst
131+
132+
See :ref:`replica-set-elections` for complete documentation on
133+
replica set elections.
121134

122-
.. versionchanged:: 3.2
135+
To learn more about MongoDB's failover process, see:
123136

124-
Starting in MongoDB 3.2, with the :ref:`replication election
125-
enhancements <3.2-rel-notes-rs-enhancements>`, MongoDB reduces
126-
replica set failover time. See :ref:`replication election
127-
enhancements <3.2-rel-notes-rs-enhancements>` for details.
137+
- :ref:`replica-set-elections`
138+
- :ref:`retryable-writes`
139+
- :ref:`replica-set-rollback`
128140

129-
See :ref:`replica-set-elections` and
130-
:ref:`replica-set-rollbacks` for more information.
141+
.. seealso::
142+
143+
:ref:`write-concern`
131144

132145
Read Operations
133146
---------------
@@ -174,7 +187,7 @@ See :ref:`replica-set-secondary-only-members`,
174187
/core/replica-set-oplog
175188
/core/replica-set-sync
176189
/core/replica-set-architectures
177-
/core/replica-set-high-availability
190+
Replica Set High Availability </core/replica-set-high-availability>
178191
/applications/replication
179192
/administration/replica-set-deployment
180193
/administration/replica-set-member-configuration
