Commit 1cb94bc

author
Andrew Leung
committed
initial draft of sharding and failover
1 parent 0ef3780 commit 1cb94bc

1 file changed

+28
-20
lines changed

source/administration/sharding-architectures.txt

Lines changed: 28 additions & 20 deletions
@@ -104,7 +104,7 @@ created subsequently, may reside on any shard in the cluster.
104104
Failover scenarios within MongoDB
105105
---------------------------------
106106

107-
A properly deployed MongoDB shard cluster will not have a single point
107+
A properly deployed MongoDB shard cluster will have no single point
108108
of failure. This section describes potential points of failure within
109109
a shard cluster and its recovery method.
110110

@@ -116,42 +116,50 @@ For reference, a properly deployed MongoDB shard cluster consists of:
116116

117117
- :program:`mongos` running on each application server.
118118

119-
Scenarios:
119+
Potential failure scenarios:
120120

121121
- A :term:`mongos` or the application server failing.
122122

123123
As each application server is running its own :program:`mongos`
124124
instance, the database is still accessible for other application
125-
servers. :program:`mongos` is stateless, so if it fails, no critical
126-
information is lost. When :program:`mongos` restarts, it will retrieve a copy
127-
of the configuration from the :term:`config database` and resume
128-
working.
125+
servers and the data is intact. :program:`mongos` is stateless, so
126+
if it fails, no critical information is lost. When :program:`mongos`
127+
restarts, it will retrieve a copy of the configuration from the
128+
:term:`config database` and resume working.
129129

130130
Suggested user intervention: restart application servers and/or
131131
:program:`mongos`.
132132

133133
- A single :term:`mongod` suffers a failure in a shard.
134134

135-
A single :term:`mongod` instance failing will be recovered by a
136-
:term:`secondary` member of the shard replica set. As each shard
137-
will have a single :term:`primary` and two :term:`secondary` members
138-
with the exact same copy of the information, any member will be able
139-
to replace the failed member.
135+
A single :term:`mongod` instance failing within a shard will be
136+
recovered by a :term:`secondary` member of the :term:`replica
137+
set`. As each shard will have two :term:`secondary` members with the
138+
exact same copy of the information, :term:`secondary` members will
139+
be able to replace the failed :term:`primary` member.
140140

141-
Suggested course of action: investigate failure and replace member
142-
as soon as possible. Additional loss of members on same shard will
143-
reduce availablility.
141+
Suggested course of action: investigate failure and replace
142+
:term:`primary` member as soon as possible. Additional loss of
143+
members on same shard will reduce availability and the shard
144+
cluster's data set reliability.
144145

145146
- All three replica set members of a shard fail.
146147

147148
All data within that shard will be unavailable, but the shard
148-
cluster will still be operational for applications. Data on other
149-
shards will be accessible and new data can be written to other shard
150-
members.
149+
cluster's other data will still be operational for applications and
150+
new data can be written to other shard members.
151151

152-
- A :term:`config database` suffers a failure.
152+
Suggested course of action: investigate situation immediately.
153+
154+
- A :term:`config database` server suffers a failure.
153155

154156
As the :term:`config database` is deployed in a 3 member
155157
configuration with two-phase commits to maintain synchronization
156-
between all members. Any single member failing will not result in a
157-
loss of operation
158+
between all members. Shard cluster operation will continue as normal
159+
but :ref:`chunk migration` will not occur.
160+
161+
Suggested course of action: replace :term:`config database` server
162+
as soon as possible. Shards will become unbalanced without chunk
163+
migration capability. Additional loss of :term:`config database`
164+
servers will put the shard cluster metadata in jeopardy.
165+
