Commit ac8fb68

Author: Bob Grabar
Message: DOCS-260 migrating rs design concepts, draft 3
1 parent 9b75247 commit ac8fb68

4 files changed: +54 -88 lines changed

source/administration/replica-sets.txt

Lines changed: 5 additions & 10 deletions
@@ -51,10 +51,7 @@ configurations.
 connections. While, this typically takes 10-20 seconds, attempt to
 make these changes during scheduled maintenance periods.
 
-.. seealso::
-
-- The :ref:`replica-set-elections` topic in the :doc:`/core/replication` document
-- The :ref:`replica-set-election-internals` topic in the :doc:`/core/replication-internals` document
+.. include:: /includes/seealso-elections.rst
 
 .. index:: replica set members; secondary only
 .. _replica-set-secondary-only-members:
@@ -116,7 +113,7 @@ This sets the following:
 election for primary.
 
 .. seealso:: :data:`members[n].priority` and :ref:`Replica Set
-Reconfiguration <replica-set-reconfiguration-usage>`
+Reconfiguration <replica-set-reconfiguration-usage>`.
 
 .. index:: replica set members; hidden
 .. _replica-set-hidden-members:
@@ -161,7 +158,7 @@ other members in the set will not advertise the hidden member in the
 of ``0``, the operation fails.
 
 .. seealso:: :ref:`Replica Set Read Preference <replica-set-read-preference>`
-and :ref:`Replica Set Reconfiguration <replica-set-reconfiguration-usage>`
+and :ref:`Replica Set Reconfiguration <replica-set-reconfiguration-usage>`.
 
 .. index:: replica set members; delayed
 .. _replica-set-delayed-members:
@@ -399,7 +396,7 @@ you specify a full configuration object with :method:`rs.add()`, you must
 declare the ``_id`` field, which is not automatically populated in
 this case.
 
-.. seealso:: :doc:`/tutorial/expand-replica-set`
+.. seealso:: :doc:`/tutorial/expand-replica-set`.
 
 .. _replica-set-admin-procedure-remove-members:
 
@@ -746,7 +743,5 @@ data to a :term:`BSON` file that you can view using
 You can prevent rollbacks by ensuring safe writes by using
 the appropriate :term:`write concern`.
 
-.. seealso::
+.. include:: /includes/seealso-elections.rst
 
-- The :ref:`replica-set-elections` topic in the :doc:`/core/replication` document
-- The :ref:`replica-set-election-internals` topic in the :doc:`/core/replication-internals` document

source/applications/replication.txt

Lines changed: 30 additions & 0 deletions
@@ -17,6 +17,36 @@ This document describes those options and their implications.
 shards are also replica sets provide the same configuration options
 with regards to write and read operations.
 
+.. TODO Is any of the following missing from this document:
+
+.. Writes committed at the primary may be visible before the
+cluster-wide commit completes. The read uncommitted semantics (an
+option on many databases) are more relaxed and make theoretically
+achievable performance and availability higher (for example we never
+have an object locked in the server where the locking is dependent on
+network performance).
+
+.. On a failover, if there are writes which have not replicated from the
+primary, the writes are rolled back. To confirm replica-set-wide
+commits, use the getLastError command. On a failover, data is backed
+up to files in the rollback directory. To recover this data use the
+mongorestore.
+
+.. Merging back old operations later, after another member has accepted
+writes, is a hard problem. One then has multi-master replication,
+with potential for conflicting writes. Typically that is handled in
+other products by manual version reconciliation code by developers.
+That is too much work. Multi-master also can make atomic operation
+semantics problematic. It is possible (as mentioned above) to
+manually recover these events, via manual DBA effort, but in large
+system with many, many members that such efforts become impractical.
+
+.. Calling getLastError causes the client to wait for a response from
+the server. This can slow the client's throughput on writes if large
+numbers are made because of the client/server network turnaround
+times. Thus for "non-critical" writes it often makes sense to make no
+getLastError check at all, or only a single check after many writes.
+
 .. _write-concern:
 .. _replica-set-write-concern:
 
source/core/replication-internals.txt

Lines changed: 15 additions & 78 deletions
@@ -18,10 +18,12 @@ troubleshooting and for further understanding MongoDB's behavior and approach.
 Oplog
 -----
 
-Under normal operation, MongoDB updates the :ref:`oplog <replica-set-oplog-sizing>`
-on a :term:`secondary` within one second of
-applying an operation to a :term:`primary`. However, various exceptional
-situations may cause a secondary to lag further behind. See
+For an explanation of the oplog, see the :ref:`oplog <replica-set-oplog-sizing>`
+topic in the :doc:`/core/replication` document.
+
+Under various exceptional
+situations, updates to a :term:`secondary's <secondary>` oplog might
+lag behind the desired performance time. See
 :ref:`Replication Lag <replica-set-replication-lag>` for details.
 
 All members of a :term:`replica set` send heartbeats (pings) to all
@@ -69,65 +71,6 @@ secondaries may not always reflect the latest writes to the
 output to asses the current state of replication and determine if
 there is any unintended replication delay.
 
-Write Concern and getLastError
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-A write is committed once it has replicated to a majority of members of
-the set. For important writes, the client should request acknowledgement
-of this with :dbcommand:`getLastError` set to ``w`` to get confirmation
-the commit has finished. For more information on
-:dbcommand:`getLastError`, see :doc:`/applications/replication`.
-
-.. TODO Verify if the following info is needed. -BG
-
-Queries in MongoDB and replica sets have "READ UNCOMMITTED"
-semantics. Writes which are committed at the primary of the set may
-be visible before the cluster-wide commit completes.
-
-The read uncommitted semantics (an option on many databases) are more
-relaxed and make theoretically achievable performance and
-availability higher (for example we never have an object locked in
-the server where the locking is dependent on network performance).
-
-Write Concern and Failover
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-On a failover, if there are writes which have not replicated from the
-:term:`primary`, the writes are rolled back. Therefore, to confirm replica-set-wide commits,
-use the :dbcommand:`getLastError` command.
-
-On a failover, data is backed up to files in the rollback directory. To
-recover this data use the :program:`mongorestore`.
-
-.. TODO Verify whether to include the following. -BG
-
-Merging back old operations later, after another member has accepted
-writes, is a hard problem. One then has multi-master replication,
-with potential for conflicting writes. Typically that is handled in
-other products by manual version reconciliation code by developers.
-We think that is too much work : we want MongoDB usage to be less
-developer work, not more. Multi-master also can make atomic operation
-semantics problematic.
-
-It is possible (as mentioned above) to manually recover these events,
-via manual DBA effort, but we believe in large system with many, many
-members that such efforts become impractical.
-
-Some drivers support 'safe' write modes for critical writes. For
-example via setWriteConcern in the Java driver.
-
-Additionally, defaults for { w : ... } parameter to getLastError can
-be set in the replica set's configuration.
-
-..note::
-
-Calling :dbcommand:`getLastError` causes the client to wait for a
-response from the server. This can slow the client's throughput on
-writes if large numbers are made because of the client/server network
-turnaround times. Thus for "non-critical" writes it often makes sense
-to make no :dbcommand:`getLastError` check at all, or only a single
-check after many writes.
-
 .. _replica-set-member-configurations-internals:
 
 Member Configurations
@@ -271,21 +214,15 @@ aware of the following conditions and possible situations:
 Elections and Network Partitions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-A replica set has at most one primary at a given time. If a majority of
-the set is up, the most up-to-date secondary will be elected primary. If
-a majority of the set is not up or reachable, no member will be elected
-primary.
-
-There is no way to tell (from the set's point of view) the difference
-between a network partition and members going down, so members left in a
-minority will not attempt to become primary (to prevent a set from
-ending up with primaries on either side of a partition).
-
-This means that, if there is no majority on either side of a network
-partition, the set will be read only. Thus, we suggest an odd number of
-servers: e.g., two servers in one data center and one in another. The
-upshot of this strategy is that data is consistent: there are no
-multi-primary conflicts to resolve.
+.. TODO The following two paragraphs needs review -BG
+
+Members on either side of a network partition cannot see each other when
+determining whether a majority is available to hold an election.
+
+That means that if a primary steps down and neither side of the
+partition has a majority on its own, the set will not elect a new
+primary and the set will become read only. The best practice is to have
+and a majority of servers in one data center and one server in another.
 
 Syncing
 -------

source/includes/seealso-elections.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+.. seealso:: The :ref:`replica-set-elections` topic in the
+:doc:`/core/replication` document, and the
+:ref:`replica-set-election-internals` topic in the
+:doc:`/core/replication-internals` document.
