Commit 68059f9

migrate to new tests
1 parent 2b72b9b commit 68059f9

125 files changed (+15171, -9188 lines changed)

test/spec/retryable-writes/README.md

Lines changed: 341 additions & 0 deletions

# Retryable Write Tests

## Introduction

The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of
retryable writes. These tests utilize the [Unified Test Format](../../unified-test-format/unified-test-format.md).

Several prose tests, which are not easily expressed in YAML, are also presented in this file. Those tests will need to
be manually implemented by each driver.

Tests will require a MongoClient created with options defined in the tests. Integration tests will require a running
MongoDB cluster with server versions 3.6.0 or later. The `{setFeatureCompatibilityVersion: 3.6}` admin command will also
need to have been executed to enable support for retryable writes on the cluster. Some tests may have more stringent
version requirements depending on the fail points used.
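For reference, a minimal mongosh sketch for preparing such a cluster might look like the following (note that the
server expects the version as a string):

```javascript
// Run against the cluster under test from a mongosh session.
// Enables server support for retryable writes on a 3.6 cluster.
db.adminCommand({ setFeatureCompatibilityVersion: "3.6" });

// Optional sanity check: confirm the reported featureCompatibilityVersion.
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 });
```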

## Use as Integration Tests

Integration tests are expressed in YAML and can be run against a replica set or sharded cluster as denoted by the
top-level `runOn` field. Tests that rely on the `onPrimaryTransactionalWrite` fail point cannot be run against a sharded
cluster because the fail point is not supported by mongos.

The tests exercise the following scenarios:

- Single-statement write operations
  - Each test expecting a write result will encounter at-most one network error for the write command. Retry attempts
    should return without error and allow the operation to succeed. Observation of the collection state will assert
    that the write occurred at-most once.
  - Each test expecting an error will encounter successive network errors for the write command. Observation of the
    collection state will assert that the write was never committed on the server.
- Multi-statement write operations
  - Each test expecting a write result will encounter at-most one network error for some write command(s) in the batch.
    Retry attempts should return without error and allow the batch to ultimately succeed. Observation of the collection
    state will assert that each write occurred at-most once.
  - Each test expecting an error will encounter successive network errors for some write command in the batch. The
    batch will ultimately fail with an error, but observation of the collection state will assert that the failing
    write was never committed on the server. We may observe that earlier writes in the batch occurred at-most once.

We cannot test a scenario where the first and second attempts both encounter network errors but the write does actually
commit during one of those attempts. This is because (1) the fail point only triggers when a write would be committed
and (2) the `skip` and `times` options are mutually exclusive. That said, such a test would mainly assert the server's
correctness for at-most once semantics and is not essential to assert driver correctness.
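
As an illustration of how such a network error can be injected outside the YAML files themselves, a mongosh sketch
might look like the following (the server must be started with test commands enabled; the fail point's default
behavior of closing the connection when it triggers is an assumption here):

```javascript
// Arm the fail point so it triggers on the next write the primary would commit.
// Requires a server started with --setParameter enableTestCommands=1.
db.adminCommand({
  configureFailPoint: "onPrimaryTransactionalWrite",
  mode: { times: 1 }
});

// ... execute the single-statement write under test; the driver's retry should succeed ...

// Clean up.
db.adminCommand({ configureFailPoint: "onPrimaryTransactionalWrite", mode: "off" });
```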

## Split Batch Tests

The YAML tests specify bulk write operations that are split by command type (e.g. sequence of insert, update, and
delete commands). Multi-statement write operations may also be split due to `maxWriteBatchSize`, `maxBsonObjectSize`,
or `maxMessageSizeBytes`.

For instance, an insertMany operation with five 10 MiB documents executed using OP_MSG payload type 0 (i.e. entire
command in one document) would be split into five insert commands in order to respect the 16 MiB `maxBsonObjectSize`
limit. The same insertMany operation executed using OP_MSG payload type 1 (i.e. command arguments pulled out into a
separate payload vector) would be split into two insert commands in order to respect the 48 MB `maxMessageSizeBytes`
limit.

Noting when a driver might split operations, the `onPrimaryTransactionalWrite` fail point's `skip` option may be used to
control when the fail point first triggers. Once triggered, the fail point will transition to the `alwaysOn` state until
disabled. Driver authors should also note that the server attempts to process all documents in a single insert command
within a single commit (i.e. one insert command with five documents may only trigger the fail point once). This behavior
is unique to insert commands (each statement in an update and delete command is processed independently).

If testing an insert that is split into two commands, a `skip` of one will allow the fail point to trigger on the second
insert command (because all documents in the first command will be processed in the same commit). When testing an update
or delete that is split into two commands, the `skip` should be set to the number of statements in the first command to
allow the fail point to trigger on the second command.
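
A mongosh sketch of the skip-based setup described above (illustrative only, not part of the YAML tests) might look
like:

```javascript
// Arm onPrimaryTransactionalWrite so it skips the first committed write and then
// triggers on the second of two split insert commands.
db.adminCommand({
  configureFailPoint: "onPrimaryTransactionalWrite",
  mode: { skip: 1 }
});

// ... run the split insertMany (or update/delete) under test ...

// Disable the fail point once the scenario has been observed.
db.adminCommand({ configureFailPoint: "onPrimaryTransactionalWrite", mode: "off" });
```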

## Command Construction Tests

Drivers should also assert that command documents are properly constructed with or without a transaction ID, depending
on whether the write operation is supported.
[Command Logging and Monitoring](../../command-logging-and-monitoring/command-logging-and-monitoring.rst) may be used to
check for the presence of a `txnNumber` field in the command document. Note that command documents may always include an
`lsid` field per the [Driver Session](../../sessions/driver-sessions.md) specification.

These tests may be run against both a replica set and a sharded cluster.
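
For example, a sketch of such a check using the Node.js driver's command monitoring events (the connection string,
database, and collection names are placeholders) might look like:

```javascript
const assert = require("node:assert");
const { MongoClient } = require("mongodb");

async function main() {
  // monitorCommands enables CommandStartedEvent emission on the client.
  const client = new MongoClient("mongodb://localhost:27017/?retryWrites=true", {
    monitorCommands: true,
  });
  const started = [];
  client.on("commandStarted", (event) => {
    if (event.commandName === "insert") started.push(event.command);
  });

  await client.connect();
  await client.db("test").collection("coll").insertOne({ x: 1 });

  // insertOne is a supported operation, so against a retryable-writes-capable
  // deployment the command should carry a txnNumber (an lsid may always be present).
  assert.ok(started.length >= 1);
  assert.ok(started[0].txnNumber !== undefined);

  await client.close();
}

main().catch(console.error);
```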

Drivers should test that transaction IDs are never included in commands for unsupported write operations:

- Write commands with unacknowledged write concerns (e.g. `{w: 0}`)
- Unsupported single-statement write operations
  - `updateMany()`
  - `deleteMany()`
- Unsupported multi-statement write operations
  - `bulkWrite()` that includes `UpdateMany` or `DeleteMany`
- Unsupported write commands
  - `aggregate` with write stage (e.g. `$out`, `$merge`)

Drivers should test that transaction IDs are always included in commands for supported write operations:

- Supported single-statement write operations
  - `insertOne()`
  - `updateOne()`
  - `replaceOne()`
  - `deleteOne()`
  - `findOneAndDelete()`
  - `findOneAndReplace()`
  - `findOneAndUpdate()`
- Supported multi-statement write operations
  - `insertMany()` with `ordered=true`
  - `insertMany()` with `ordered=false`
  - `bulkWrite()` with `ordered=true` (no `UpdateMany` or `DeleteMany`)
  - `bulkWrite()` with `ordered=false` (no `UpdateMany` or `DeleteMany`)

## Prose Tests

The following tests ensure that retryable writes work properly with replica sets and sharded clusters.

### 1. Test that retryable writes raise an exception when using the MMAPv1 storage engine.

For this test, execute a write operation, such as `insertOne`, which should generate an exception. Assert that the error
message is the replacement error message:

```
This MongoDB deployment does not support retryable writes. Please add
retryWrites=false to your connection string.
```

and the error code is 20.

> [!NOTE]
> Drivers that rely on `serverStatus` to determine the storage engine in use MAY skip this test for sharded clusters,
> since `mongos` does not report this information in its `serverStatus` response.
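
A sketch of such an assertion, assuming the Node.js driver and a `client` already connected to an MMAPv1 deployment
(the database and collection names are placeholders), might look like:

```javascript
const assert = require("node:assert");

async function testMmapv1RaisesActionableError(client) {
  try {
    await client.db("test").collection("coll").insertOne({ x: 1 });
    assert.fail("expected insertOne to raise an error on MMAPv1");
  } catch (err) {
    // The driver should have replaced the server error with the actionable message.
    assert.strictEqual(err.code, 20);
    assert.ok(err.message.includes("This MongoDB deployment does not support retryable writes"));
  }
}
```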
122+
123+
### 2. Test that drivers properly retry after encountering PoolClearedErrors.
124+
125+
This test MUST be implemented by any driver that implements the CMAP specification.
126+
127+
This test requires MongoDB 4.3.4+ for both the `errorLabels` and `blockConnection` fail point options.
128+
129+
1. Create a client with maxPoolSize=1 and retryWrites=true. If testing against a sharded deployment, be sure to connect
130+
to only a single mongos.
131+
132+
2. Enable the following failpoint:
133+
134+
```javascript
135+
{
136+
configureFailPoint: "failCommand",
137+
mode: { times: 1 },
138+
data: {
139+
failCommands: ["insert"],
140+
errorCode: 91,
141+
blockConnection: true,
142+
blockTimeMS: 1000,
143+
errorLabels: ["RetryableWriteError"]
144+
}
145+
}
146+
```
147+
148+
3. Start two threads and attempt to perform an `insertOne` simultaneously on both.
149+
150+
4. Verify that both `insertOne` attempts succeed.
151+
152+
5. Via CMAP monitoring, assert that the first check out succeeds.
153+
154+
6. Via CMAP monitoring, assert that a PoolClearedEvent is then emitted.
155+
156+
7. Via CMAP monitoring, assert that the second check out then fails due to a connection error.
157+
158+
8. Via Command Monitoring, assert that exactly three `insert` CommandStartedEvents were observed in total.
159+
160+
9. Disable the failpoint.
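
A sketch of steps 3 and 4, assuming the Node.js driver (concurrent promises stand in for the two threads; `client` and
the namespace are placeholders), might look like:

```javascript
const assert = require("node:assert");

// Steps 3-4: two concurrent insertOne calls race for the single pooled connection.
// The first triggers the fail point and a pool clear, the second fails check out,
// and both are retried and should ultimately succeed.
async function runConcurrentInserts(client) {
  const coll = client.db("test").collection("coll");
  const results = await Promise.all([
    coll.insertOne({ x: 1 }),
    coll.insertOne({ x: 2 }),
  ]);
  results.forEach((result) => assert.ok(result.acknowledged));
}
```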

### 3. Test that drivers return the original error after encountering a WriteConcernError with a RetryableWriteError label.

This test MUST:

- be implemented by any driver that implements the Command Monitoring specification;
- only run against replica sets, as mongos does not propagate the NoWritesPerformed label to the drivers;
- be run against server versions 6.0 and above.

Additionally, this test requires drivers to set a fail point after an `insertOne` operation but before the subsequent
retry. Drivers that are unable to set a failCommand after the CommandSucceededEvent SHOULD use mocking or write a unit
test to cover the same sequence of events.

1. Create a client with `retryWrites=true`.

2. Configure a fail point with error code `91` (ShutdownInProgress):

   ```javascript
   {
     configureFailPoint: "failCommand",
     mode: {times: 1},
     data: {
       failCommands: ["insert"],
       errorLabels: ["RetryableWriteError"],
       writeConcernError: { code: 91 }
     }
   }
   ```

3. Via the command monitoring CommandSucceededEvent, configure a fail point with error code `10107`
   (NotWritablePrimary) and a NoWritesPerformed label (see the sketch following these steps):

   ```javascript
   {
     configureFailPoint: "failCommand",
     mode: {times: 1},
     data: {
       failCommands: ["insert"],
       errorCode: 10107,
       errorLabels: ["RetryableWriteError", "NoWritesPerformed"]
     }
   }
   ```

   Drivers SHOULD only configure the `10107` fail point command if the succeeded event is for the `91` error configured
   in step 2.

4. Attempt an `insertOne` operation on any record for any database and collection. For the resulting error, assert that
   the associated error code is `91`.

5. Disable the fail point:

   ```javascript
   {
     configureFailPoint: "failCommand",
     mode: "off"
   }
   ```
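
An illustrative sketch of step 3, assuming the Node.js driver (the separate `setupClient` used to configure the fail
point is an assumption, and the timing concern of arming the fail point before the retry is glossed over):

```javascript
// Arm the second fail point from the CommandSucceededEvent handler for the first insert.
function armSecondFailPointOnFirstSuccess(client, setupClient) {
  client.on("commandSucceeded", (event) => {
    const wce = event.reply && event.reply.writeConcernError;
    // Only react to the succeeded event carrying the code 91 write concern error from step 2.
    if (event.commandName === "insert" && wce && wce.code === 91) {
      setupClient.db("admin").command({
        configureFailPoint: "failCommand",
        mode: { times: 1 },
        data: {
          failCommands: ["insert"],
          errorCode: 10107,
          errorLabels: ["RetryableWriteError", "NoWritesPerformed"]
        }
      }).catch(() => {});
    }
  });
}
```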

### 4. Test that in a sharded cluster writes are retried on a different mongos when one is available.

This test MUST be executed against a sharded cluster that has at least two mongos instances, supports
`retryWrites=true`, has enabled the `configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+).

> [!NOTE]
> This test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior
> intended to be tested) from "retry on a different mongos due to normal SDAM randomized suitable server selection".
> Verify that the relevant code paths are correctly executed by the tests using external means such as logging, a
> debugger, a code coverage tool, etc.

1. Create two clients `s0` and `s1` that each connect to a single mongos from the sharded cluster. They must not
   connect to the same mongos.

2. Configure the following fail point for both `s0` and `s1`:

   ```javascript
   {
     configureFailPoint: "failCommand",
     mode: { times: 1 },
     data: {
       failCommands: ["insert"],
       errorCode: 6,
       errorLabels: ["RetryableWriteError"]
     }
   }
   ```

3. Create a client `client` with `retryWrites=true` that connects to the cluster using the same two mongoses as `s0`
   and `s1` (see the sketch following these steps).

4. Enable failed command event monitoring for `client`.

5. Execute an `insert` command with `client`. Assert that the command failed.

6. Assert that two failed command events occurred. Assert that the failed command events occurred on different
   mongoses.

7. Disable the fail points on both `s0` and `s1`.
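
A sketch of steps 3 through 6, assuming the Node.js driver (the mongos addresses and the `address` field on the failed
command events are assumptions; adjust to the driver's actual event shape):

```javascript
const assert = require("node:assert");
const { MongoClient } = require("mongodb");

async function retryOnDifferentMongos() {
  // Connect to the same two mongoses used by s0 and s1 (placeholder hosts).
  const client = new MongoClient(
    "mongodb://localhost:27017,localhost:27018/?retryWrites=true",
    { monitorCommands: true }
  );
  const failedOn = [];
  client.on("commandFailed", (event) => {
    if (event.commandName === "insert") failedOn.push(String(event.address));
  });

  await client.connect();
  // Both the initial attempt and the retry hit the fail point, so the insert fails.
  await assert.rejects(client.db("test").collection("coll").insertOne({ x: 1 }));

  assert.strictEqual(failedOn.length, 2);
  assert.notStrictEqual(failedOn[0], failedOn[1]);

  await client.close();
}
```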
258+
259+
### 5. Test that in a sharded cluster writes are retried on the same mongos when no others are available.
260+
261+
This test MUST be executed against a sharded cluster that supports `retryWrites=true`, has enabled the
262+
`configureFailPoint` command, and supports the `errorLabels` field (MongoDB 4.3.1+).
263+
264+
Note: this test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior
265+
intended to be tested) from "retry on a different mongos due to normal SDAM behavior of randomized suitable server
266+
selection". Verify relevant code paths are correctly executed by the tests using external means such as a logging,
267+
debugger, code coverage tool, etc.
268+
269+
1. Create a client `s0` that connects to a single mongos from the cluster.
270+
271+
2. Configure the following fail point for `s0`:
272+
273+
```javascript
274+
{
275+
configureFailPoint: "failCommand",
276+
mode: { times: 1 },
277+
data: {
278+
failCommands: ["insert"],
279+
errorCode: 6,
280+
errorLabels: ["RetryableWriteError"],
281+
closeConnection: true
282+
}
283+
}
284+
```
285+
286+
3. Create a client `client` with `directConnection=false` (when not set by default) and `retryWrites=true` that connects
287+
to the cluster using the same single mongos as `s0`.
288+
289+
4. Enable succeeded and failed command event monitoring for `client`.
290+
291+
5. Execute an `insert` command with `client`. Assert that the command succeeded.
292+
293+
6. Assert that exactly one failed command event and one succeeded command event occurred. Assert that both events
294+
occurred on the same mongos.
295+
296+
7. Disable the fail point on `s0`.

## Changelog

- 2024-05-30: Migrated from reStructuredText to Markdown.

- 2024-02-27: Convert legacy retryable writes tests to unified format.

- 2024-02-21: Update prose tests 4 and 5 to work around SDAM behavior preventing execution of deprioritization code
  paths.

- 2024-01-05: Fix typo in prose test title.

- 2024-01-03: Note server version requirements for fail point options and revise tests to specify the `errorLabels`
  option at the top level instead of within `writeConcernError`.

- 2023-08-26: Add prose tests for retrying in a sharded cluster.

- 2022-08-30: Add prose test verifying correct error handling for errors with the NoWritesPerformed label, which is to
  return the original error.

- 2022-04-22: Clarifications to `serverless` and `useMultipleMongoses`.

- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of `useMultipleMongoses` for `LoadBalanced` topologies.

- 2021-04-23: Add `load-balanced` to test topology requirements.

- 2021-03-24: Add prose test verifying `PoolClearedErrors` are retried.

- 2019-10-21: Add `errorLabelsContain` and `errorLabelsOmit` fields to `result`.

- 2019-08-07: Add Prose Tests section.

- 2019-06-07: Mention `$merge` stage for aggregate alongside `$out`.

- 2019-03-01: Add top-level `runOn` field to denote server version and/or topology requirements for the test file.
  Removes the `minServerVersion` and `maxServerVersion` top-level fields, which are now expressed within `runOn`
  elements. Add test-level `useMultipleMongoses` field.
