@@ -104,7 +104,7 @@ created subsequently, may reside on any shard in the cluster.
Failover scenarios within MongoDB
---------------------------------

- A properly deployed MongoDB shard cluster will not have a single point
+ A properly deployed MongoDB shard cluster will have no single point
of failure. This section describes potential points of failure within
a shard cluster and its recovery method.
@@ -116,42 +116,50 @@ For reference, a properly deployed MongoDB shard cluster consists of:

- :program:`mongos` running on each application server.

- Scenarios :
+ Potential failure scenarios:

- A :term:`mongos` or the application server failing.

As each application server is running its own :program:`mongos`
instance, the database is still accessible for other application
- servers. :program:`mongos` is stateless, so if it fails, no critical
- information is lost. When :program:`mongos` restarts, it will retrieve a copy
- of the configuration from the :term:`config database` and resume
- working.
+ servers and the data is intact. :program:`mongos` is stateless, so
+ if it fails, no critical information is lost. When :program:`mongos`
+ restarts, it will retrieve a copy of the configuration from the
+ :term:`config database` and resume working.

Suggested user intervention: restart application servers and/or
:program:`mongos`.
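Because :program:`mongos` holds no state of its own, recovery amounts to restarting it pointed at the config database servers. A minimal sketch, assuming hypothetical host names and the conventional config server port:

```sh
# Restart mongos on the application server; it is stateless and will
# re-read the cluster metadata from the config database on startup.
# Host names below are hypothetical.
mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019
```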

- A single :term:`mongod` suffers a failure in a shard.

- A single :term:`mongod` instance failing will be recovered by a
- :term:`secondary` member of the shard replica set. As each shard
- will have a single :term:`primary` and two :term:`secondary` members
- with the exact same copy of the information, any member will be able
- to replace the failed member.
+ A single :term:`mongod` instance failing within a shard will be
+ recovered by a :term:`secondary` member of the :term:`replica set`.
+ As each shard will have two :term:`secondary` members with the
+ exact same copy of the information, a :term:`secondary` member will
+ be able to replace the failed :term:`primary` member.

- Suggested course of action: investigate failure and replace member
- as soon as possible. Additional loss of members on same shard will
- reduce availablility.
+ Suggested course of action: investigate the failure and replace the
+ :term:`primary` member as soon as possible. Additional loss of
+ members on the same shard will reduce availability and the shard
+ cluster's data set reliability.
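To confirm that a :term:`secondary` has taken over, the shard's replica set status can be inspected from the shell. A sketch, assuming a hypothetical surviving member's host name:

```sh
# Connect to a surviving member of the shard's replica set and list
# each member's state; after the election, one member should report
# PRIMARY. The host name is hypothetical.
mongo --host shard0-b.example.net --eval "printjson(rs.status().members.map(function (m) { return m.name + ': ' + m.stateStr; }))"
```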

- All three replica set members of a shard fail.

All data within that shard will be unavailable, but the shard
- cluster will still be operational for applications. Data on other
- shards will be accessible and new data can be written to other shard
- members.
+ cluster will remain operational for applications; data on other
+ shards stays accessible and new data can be written to them.

- - A :term:`config database` suffers a failure.
+ Suggested course of action: investigate the situation immediately.
+
+ - A :term:`config database` server suffers a failure.

As the :term:`config database` is deployed in a 3 member
configuration with two-phase commits to maintain synchronization
- between all members. Any single member failing will not result in a
- loss of operation
+ between all members, the shard cluster will continue to operate as
+ normal after a single member failure, but :ref:`chunk migration`
+ will not occur.
+
+ Suggested course of action: replace the failed
+ :term:`config database` server as soon as possible. Shards will
+ become unbalanced without chunk migration capability. Additional
+ loss of :term:`config database` servers will put the shard cluster
+ metadata in jeopardy.
+
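One way to replace a failed config database server is to restore its data files from a surviving config server and restart :program:`mongod` with the config server flag. A sketch under those assumptions; host names and paths are hypothetical:

```sh
# Copy the config database files from a healthy config server (taken
# from a consistent backup or while it is briefly stopped), then start
# the replacement member on the same host name as the failed server.
rsync -az cfg1.example.net:/data/configdb/ /data/configdb/
mongod --configsvr --dbpath /data/configdb --port 27019
```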