# Retryable Write Tests

## Introduction

The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of

The following tests ensure that retryable writes work properly with replica sets and sharded clusters.

### 1. Test that retryable writes raise an exception when using the MMAPv1 storage engine.

For this test, execute a write operation, such as `insertOne`, which should generate an exception. Assert that the error
message is the replacement error message:

```
This MongoDB deployment does not support retryable writes. Please add
retryWrites=false to your connection string.
```

and the error code is 20.

> [!NOTE]
> Drivers that rely on `serverStatus` to determine the storage engine in use MAY skip this test for sharded clusters,
> since `mongos` does not report this information in its `serverStatus` response.
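
The replacement logic exercised above can be sketched as follows. This is a minimal illustration, not driver code: `maybeReplaceRetryableWriteError` is a hypothetical helper name, and it assumes the specification's rule that a server error with code 20 whose message starts with "Transaction numbers" is the signal to substitute the replacement message.

```javascript
// Hypothetical sketch: rewrite the MMAPv1 "Transaction numbers" error
// (code 20) into the mandated replacement message; pass other errors
// through unchanged.
const REPLACEMENT_MESSAGE =
    "This MongoDB deployment does not support retryable writes. Please add " +
    "retryWrites=false to your connection string.";

function maybeReplaceRetryableWriteError(error) {
    if (error.code === 20 && error.message.startsWith("Transaction numbers")) {
        return { code: 20, message: REPLACEMENT_MESSAGE };
    }
    return error;
}

// An MMAPv1-style error is rewritten; an unrelated error is untouched.
const mmapErr = maybeReplaceRetryableWriteError({
    code: 20,
    message: "Transaction numbers are only allowed on a replica set member or mongos"
});
const otherErr = maybeReplaceRetryableWriteError({ code: 11000, message: "duplicate key" });
```

Note that the error code is left at 20; only the message is replaced.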

### 2. Test that drivers properly retry after encountering PoolClearedErrors.

This test MUST be implemented by any driver that implements the CMAP specification.

This test requires MongoDB 4.3.4+ for both the `errorLabels` and `blockConnection` fail point options.

1. Create a client with `maxPoolSize=1` and `retryWrites=true`. If testing against a sharded deployment, be sure to
   connect to only a single mongos.

2. Enable the following failpoint:

   ```javascript
   {
       configureFailPoint: "failCommand",
       mode: { times: 1 },
       data: {
           failCommands: ["insert"],
           errorCode: 91,
           blockConnection: true,
           blockTimeMS: 1000,
           errorLabels: ["RetryableWriteError"]
       }
   }
   ```

3. Start two threads and attempt to perform an `insertOne` simultaneously on both.

4. Verify that both `insertOne` attempts succeed.

5. Via CMAP monitoring, assert that the first check out succeeds.

6. Via CMAP monitoring, assert that a PoolClearedEvent is then emitted.

7. Via CMAP monitoring, assert that the second check out then fails due to a connection error.

8. Via Command Monitoring, assert that exactly three `insert` CommandStartedEvents were observed in total.

9. Disable the failpoint.
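
The retry behavior at the heart of this test can be sketched as a simplified, synchronous model (illustrative names, not driver API): an operation is retried exactly once when the first attempt fails with an error carrying the `RetryableWriteError` label, which a PoolClearedError does.

```javascript
// Simplified model of retryable-write dispatch. Names are illustrative,
// not a real driver API.
class PoolClearedError extends Error {
    constructor() {
        super("connection pool was cleared");
        this.errorLabels = ["RetryableWriteError"];
    }
}

function withRetry(operation) {
    try {
        return operation();
    } catch (err) {
        // Retry exactly once when the error is labeled retryable.
        if ((err.errorLabels || []).includes("RetryableWriteError")) {
            return operation();
        }
        throw err;
    }
}

// Mirrors the second thread above: the first attempt hits a cleared pool,
// the single retry succeeds.
let attempts = 0;
const result = withRetry(() => {
    attempts += 1;
    if (attempts === 1) throw new PoolClearedError();
    return { acknowledged: true };
});
```

In a real driver the retry also re-runs server selection and connection check out, which is why the prose test observes a second (failing) check out and three `insert` command started events across the two threads.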

### 3. Test that drivers return the original error after encountering a WriteConcernError with a RetryableWriteError label.

This test MUST:

- be implemented by any driver that implements the Command Monitoring specification,
- only run against replica sets as mongos does not propagate the NoWritesPerformed label to the drivers, and
- be run against server versions 6.0 and above.

Additionally, this test requires drivers to set a fail point after an `insertOne` operation but before the subsequent
retry. Drivers that are unable to set a failCommand after the CommandSucceededEvent SHOULD use mocking or write a unit
test to cover the same sequence of events.

1. Create a client with `retryWrites=true`.

2. Configure a fail point with error code `91` (ShutdownInProgress):

   ```javascript
   {
       configureFailPoint: "failCommand",
       mode: {times: 1},
       data: {
           failCommands: ["insert"],
           errorLabels: ["RetryableWriteError"],
           writeConcernError: { code: 91 }
       }
   }
   ```

3. Via the command monitoring CommandSucceededEvent, configure a fail point with error code `10107` (NotWritablePrimary)
   and a NoWritesPerformed label:

   ```javascript
   {
       configureFailPoint: "failCommand",
       mode: {times: 1},
       data: {
           failCommands: ["insert"],
           errorCode: 10107,
           errorLabels: ["RetryableWriteError", "NoWritesPerformed"]
       }
   }
   ```

   Drivers SHOULD only configure the `10107` fail point command if the succeeded event is for the `91` error configured
   in step 2.

4. Attempt an `insertOne` operation on any record for any database and collection. For the resulting error, assert that
   the associated error code is `91`.

5. Disable the fail point:

   ```javascript
   {
       configureFailPoint: "failCommand",
       mode: "off"
   }
   ```
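
The error-selection rule this test verifies can be sketched as follows (`chooseReportedError` is an illustrative name, not a driver API): when the retry attempt fails with an error labeled `NoWritesPerformed`, the driver reports the error from the original attempt instead.

```javascript
// Sketch of the rule under test: a NoWritesPerformed retry failure means
// the retry changed nothing, so the original error is the one to surface.
function chooseReportedError(originalError, retryError) {
    if ((retryError.errorLabels || []).includes("NoWritesPerformed")) {
        return originalError;
    }
    return retryError;
}

// Mirrors the prose test: the first attempt fails with a WriteConcernError
// (code 91); the retry fails with 10107 + NoWritesPerformed.
const original = { code: 91, errorLabels: ["RetryableWriteError"] };
const retry = { code: 10107, errorLabels: ["RetryableWriteError", "NoWritesPerformed"] };
const reported = chooseReportedError(original, retry);
```

The assertion in step 4 above corresponds to `reported` carrying code `91`, not `10107`.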

### 4. Test that in a sharded cluster writes are retried on a different mongos when one is available.

This test MUST be executed against a sharded cluster that has at least two mongos instances, supports
`retryWrites=true`, has enabled the `configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+).

> [!NOTE]
> This test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior
> intended to be tested) from "retry on a different mongos due to normal SDAM randomized suitable server selection".
> Verify that the relevant code paths are correctly executed by the tests using external means such as logging, a
> debugger, or a code coverage tool.

1. Create two clients `s0` and `s1` that each connect to a single mongos from the sharded cluster. They must not connect
   to the same mongos.

2. Configure the following fail point for both `s0` and `s1`:

   ```javascript
   {
       configureFailPoint: "failCommand",
       mode: { times: 1 },
       data: {
           failCommands: ["insert"],
           errorCode: 6,
           errorLabels: ["RetryableWriteError"]
       }
   }
   ```

3. Create a client `client` with `retryWrites=true` that connects to the cluster using the same two mongoses as `s0` and
   `s1`.

4. Enable failed command event monitoring for `client`.

5. Execute an `insert` command with `client`. Assert that the command failed.

6. Assert that two failed command events occurred. Assert that the failed command events occurred on different mongoses.

7. Disable the fail points on both `s0` and `s1`.
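
The deprioritization behavior this test targets can be sketched with a simplified server-selection model (illustrative only, not driver code): a mongos that just failed the write is deprioritized, so the retry prefers any other eligible mongos.

```javascript
// Simplified model of mongos deprioritization on retry: prefer servers
// outside the deprioritized set, falling back to the full list when no
// alternative exists.
function selectServer(servers, deprioritized) {
    const preferred = servers.filter(s => !deprioritized.includes(s));
    const pool = preferred.length > 0 ? preferred : servers;
    return pool[Math.floor(Math.random() * pool.length)];
}

// With two mongoses, a retry after a failure on "s0:27017" lands on the
// other mongos.
const servers = ["s0:27017", "s1:27017"];
const retryTarget = selectServer(servers, ["s0:27017"]);
```

As the note above points out, plain randomized selection between two suitable mongoses can produce the same observable outcome, which is why the prose test alone cannot prove the deprioritization path ran.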

### 5. Test that in a sharded cluster writes are retried on the same mongos when no others are available.

This test MUST be executed against a sharded cluster that supports `retryWrites=true`, has enabled the
`configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+).

> [!NOTE]
> This test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior
> intended to be tested) from "retry on a different mongos due to normal SDAM behavior of randomized suitable server
> selection". Verify that the relevant code paths are correctly executed by the tests using external means such as
> logging, a debugger, or a code coverage tool.

1. Create a client `s0` that connects to a single mongos from the cluster.

2. Configure the following fail point for `s0`:

   ```javascript
   {
       configureFailPoint: "failCommand",
       mode: { times: 1 },
       data: {
           failCommands: ["insert"],
           errorCode: 6,
           errorLabels: ["RetryableWriteError"],
           closeConnection: true
       }
   }
   ```

3. Create a client `client` with `directConnection=false` (when not set by default) and `retryWrites=true` that connects
   to the cluster using the same single mongos as `s0`.

4. Enable succeeded and failed command event monitoring for `client`.

5. Execute an `insert` command with `client`. Assert that the command succeeded.

6. Assert that exactly one failed command event and one succeeded command event occurred. Assert that both events
   occurred on the same mongos.

7. Disable the fail point on `s0`.
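
The fallback this test exercises can be sketched with a simplified server-selection model (illustrative names, not driver code): deprioritized mongoses are avoided only when an alternative exists, so with a single known mongos the retry must reuse it.

```javascript
// Simplified model: skip deprioritized mongoses only if another server is
// available; with one mongos, selection falls back to the full list.
function selectServer(servers, deprioritized) {
    const preferred = servers.filter(s => !deprioritized.includes(s));
    const pool = preferred.length > 0 ? preferred : servers;
    return pool[Math.floor(Math.random() * pool.length)];
}

// Single-mongos topology: even though "s0:27017" just failed, it is the
// only option, so the retry targets it again.
const retryTarget = selectServer(["s0:27017"], ["s0:27017"]);
```

This is why step 6 expects the failed and succeeded events on the same mongos: the fallback, not deprioritization, decides the retry target here.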

## Changelog

- 2024-05-30: Migrated from reStructuredText to Markdown.

- 2024-02-27: Convert legacy retryable writes tests to unified format.

- 2024-02-21: Update prose tests 4 and 5 to work around SDAM behavior preventing execution of deprioritization code
  paths.

- 2024-01-05: Fix typo in prose test title.

- 2024-01-03: Note server version requirements for fail point options and revise tests to specify the `errorLabels`
  option at the top level instead of within `writeConcernError`.

- 2023-08-26: Add prose tests for retrying in a sharded cluster.

- 2022-08-30: Add prose test verifying correct error handling for errors with the NoWritesPerformed label, which is to
  return the original error.

- 2022-04-22: Clarifications to `serverless` and `useMultipleMongoses`.

- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of `useMultipleMongoses` for `LoadBalanced` topologies.

- 2021-04-23: Add `load-balanced` to test topology requirements.

- 2021-03-24: Add prose test verifying `PoolClearedErrors` are retried.

- 2019-10-21: Add `errorLabelsContain` and `errorLabelsOmit` fields to `result`.

- 2019-08-07: Add Prose Tests section.

- 2019-06-07: Mention $merge stage for aggregate alongside $out.

- 2019-03-01: Add top-level `runOn` field to denote server version and/or topology requirements for the test file.
  Removes the `minServerVersion` and `maxServerVersion` top-level fields, which are now expressed within `runOn`
  elements. Add test-level `useMultipleMongoses` field.