Commit d184525

vector-of-bool, kevinAlbs, and eramongodb authored
[CDRIVER-6017] BSON Validation Refactor (#2026)
* New BSON validation routine rewrite

  The new `bson_validate` implementation does not make use of the error-prone `bson_visit` APIs. Instead, it is written as a simple recursive validator. The new validator respects requests for UTF-8 validation properly.

* Stop validating at 1000 depth, preventing stack overflow
* Replace most BSON validation tests with generated ones

  The existing test cases used BSON files, and didn't have any commentary on what they were actually testing. New test cases are generated from a Python shorthand and contain the tested bytes inline, with a distinct test case for each actual validation scenario.

* Disable UTF-8 validation by default on CRUD APIs
* Document and tweak the value of BSON_VALIDATE_CORRUPT
* Add test cases related to the overlong null encoding
* Tweak JS scope validation to permit more obj keys
* Add a NEWS entry for validation changes.
* Allow `-private.h` headers to not include the prelude header

---------

Co-authored-by: Kevin Albertson <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
1 parent bd745bb commit d184525
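
The commit message describes the new validator as a simple recursive routine that stops at a fixed nesting depth. The sketch below illustrates that general shape only and is not the libbson implementation. `validate_document`, `FIXED_SIZES`, and `MAX_DEPTH` are hypothetical names, only a handful of element types are modelled, and the default limit is 100 rather than 1000 because CPython's own default recursion limit is about 1000.

```python
import struct

# Assumption for this sketch: the commit uses 1000, which is too deep for
# CPython's default recursion limit, so a smaller value is used here.
MAX_DEPTH = 100

# Fixed payload sizes for a few scalar element types (double, bool, null,
# int32, int64). A real validator has to cover every BSON type.
FIXED_SIZES = {0x01: 8, 0x08: 1, 0x0A: 0, 0x10: 4, 0x12: 8}


def validate_document(data: bytes, depth: int = 0, max_depth: int = MAX_DEPTH) -> bool:
    """Return True if `data` is exactly one well-formed document (sketch only)."""
    if depth > max_depth:
        return False  # refuse to recurse further instead of exhausting the stack
    # Smallest document: 4-byte length prefix plus one terminating zero byte.
    if len(data) < 5 or data[-1] != 0:
        return False
    (declared,) = struct.unpack_from("<i", data, 0)
    if declared != len(data):
        return False
    pos, end = 4, len(data) - 1  # `end` is the index of the terminator
    while pos < end:
        elem_type = data[pos]
        # The key is a NUL-terminated string that must end before the terminator.
        key_end = data.find(b"\x00", pos + 1, end)
        if key_end < 0:
            return False
        pos = key_end + 1
        if elem_type in FIXED_SIZES:
            size = FIXED_SIZES[elem_type]
        elif elem_type in (0x03, 0x04):  # embedded document or array
            if end - pos < 5:
                return False
            (size,) = struct.unpack_from("<i", data, pos)
            if size < 5 or size > end - pos:
                return False
            if not validate_document(data[pos : pos + size], depth + 1, max_depth):
                return False
        else:
            return False  # element type not modelled in this sketch
        if size > end - pos:
            return False
        pos += size
    return True
```

Passing `max_depth` explicitly makes the limit easy to exercise with small inputs: a document that nests one level deeper than the limit is rejected without the recursion ever going further.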

16 files changed: +4759 -628 lines changed

.evergreen/scripts/check-preludes.py

Lines changed: 3 additions & 3 deletions
@@ -35,7 +35,7 @@
             MONGOC_PREFIX / "mongoc-prelude.h",
             MONGOC_PREFIX / "mongoc.h",
         ],
-        "include": '#include <mongoc/mongoc-prelude.h>',
+        "include": "#include <mongoc/mongoc-prelude.h>",
     },
     {
         "name": "libbson",
@@ -50,7 +50,7 @@
         "name": "common",
         "headers": list(COMMON_PREFIX.glob("*.h")),
         "exclusions": [COMMON_PREFIX / "common-prelude.h"],
-        "include": '#include <common-prelude.h>',
+        "include": "#include <common-prelude.h>",
     },
 ]
 
@@ -59,7 +59,7 @@
     print(f"Checking headers for {NAME}")
     assert len(check["headers"]) > 0
     for header in check["headers"]:
-        if header in check["exclusions"]:
+        if header in check["exclusions"] or header.name.endswith("-private.h"):
             continue
         lines = Path(header).read_text(encoding="utf-8").splitlines()
         if check["include"] not in lines:
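
The last hunk above exempts any header whose file name ends in `-private.h` from the prelude check, in addition to the explicit exclusions. A minimal standalone sketch of that loop follows. The function name `headers_missing_prelude` and its parameters are hypothetical, and the real script reports failures in its own way rather than returning a list.

```python
from pathlib import Path


def headers_missing_prelude(headers, exclusions, include_line):
    """Return the headers that lack `include_line` (hypothetical helper)."""
    missing = []
    for header in headers:
        # Explicit exclusions and every "-private.h" header are exempt.
        if header in exclusions or header.name.endswith("-private.h"):
            continue
        lines = Path(header).read_text(encoding="utf-8").splitlines()
        # The include must appear as a whole line, exactly as written.
        if include_line not in lines:
            missing.append(header)
    return missing
```

Because the test is on `header.name`, only the file name matters; a directory that happens to contain `-private` in its path does not exempt the headers inside it.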

src/libbson/NEWS

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
+Unreleased
+==========
+
+Fixes:
+
+  * Various fixes have been applied to the `bson_validate` family of functions,
+    with some minor behavioral changes.
+    * Previously accepted invalid UTF-8 will be rejected when `BSON_VALIDATE_UTF8`
+      is specified.
+    * The scope document in a deprecated "code with scope" element is now
+      validated with a fixed set of rules and is treated as an opaque JavaScript
+      object.
+    * A document nesting limit is now enforced during validation.
+
 libbson 2.0.1
 =============
 

src/libbson/doc/bson_validate_flags_t.rst

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,7 @@ Synopsis
     BSON_VALIDATE_DOT_KEYS = (1 << 2),
     BSON_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
     BSON_VALIDATE_EMPTY_KEYS = (1 << 4),
+    BSON_VALIDATE_CORRUPT = (1 << 5),
   } bson_validate_flags_t;
 
 Description
@@ -40,6 +41,8 @@ Each defined flag aside from ``BSON_VALIDATE_NONE`` describes an optional valida
 * ``BSON_VALIDATE_DOLLAR_KEYS`` Prohibit keys that start with ``$`` outside of a "DBRef" subdocument.
 * ``BSON_VALIDATE_DOT_KEYS`` Prohibit keys that contain ``.`` anywhere in the string.
 * ``BSON_VALIDATE_EMPTY_KEYS`` Prohibit zero-length keys.
+* ``BSON_VALIDATE_CORRUPT`` is not a control flag, but is used as an error code
+  when a validation routine encounters corrupt BSON data.
 
 .. seealso::
 

src/libbson/src/bson/bson-types.h

Lines changed: 38 additions & 9 deletions
@@ -185,25 +185,54 @@ typedef struct {
 
 
 /**
- * bson_validate_flags_t:
+ * @brief Flags and error codes for BSON validation functions.
  *
- * This enumeration is used for validation of BSON documents. It allows
- * selective control on what you wish to validate.
+ * Pass these flags bits to control the behavior of the `bson_validate` family
+ * of functions.
  *
- * %BSON_VALIDATE_NONE: No additional validation occurs.
- * %BSON_VALIDATE_UTF8: Check that strings are valid UTF-8.
- * %BSON_VALIDATE_DOLLAR_KEYS: Check that keys do not start with $.
- * %BSON_VALIDATE_DOT_KEYS: Check that keys do not contain a period.
- * %BSON_VALIDATE_UTF8_ALLOW_NULL: Allow NUL bytes in UTF-8 text.
- * %BSON_VALIDATE_EMPTY_KEYS: Prohibit zero-length field names
+ * Additionally, if validation fails, then the error code set on a `bson_error_t`
+ * will have the value corresponding to the reason that validation failed.
  */
 typedef enum {
+   /**
+    * @brief No special validation behavior specified.
+    */
    BSON_VALIDATE_NONE = 0,
+   /**
+    * @brief Check that all text components of the BSON data are valid UTF-8.
+    *
+    * Note that this will also cause validation to reject valid text that contains
+    * a null character. This can be changed by also passing
+    * `BSON_VALIDATE_UTF8_ALLOW_NULL`
+    */
    BSON_VALIDATE_UTF8 = (1 << 0),
+   /**
+    * @brief Check that element keys do not begin with an ASCII dollar `$`
+    */
    BSON_VALIDATE_DOLLAR_KEYS = (1 << 1),
+   /**
+    * @brief Check that element keys do not contain an ASCII period `.`
+    */
    BSON_VALIDATE_DOT_KEYS = (1 << 2),
+   /**
+    * @brief If set then it is *not* an error for a UTF-8 string to contain
+    * embedded null characters.
+    *
+    * This has no effect unless `BSON_VALIDATE_UTF8` is also passed.
+    */
    BSON_VALIDATE_UTF8_ALLOW_NULL = (1 << 3),
+   /**
+    * @brief Check that no element key is a zero-length empty string.
+    */
    BSON_VALIDATE_EMPTY_KEYS = (1 << 4),
+   /**
+    * @brief This is not a flag that controls behavior, but is instead used to indicate
+    * that a BSON document is corrupted in some way. This is the value that will
+    * appear as an error code.
+    *
+    * Passing this as a flag has no effect.
+    */
+   BSON_VALIDATE_CORRUPT = (1 << 5),
 } bson_validate_flags_t;
 
 
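
The new doc comments state three interactions between the flag bits: `BSON_VALIDATE_UTF8` on its own rejects embedded null characters, `BSON_VALIDATE_UTF8_ALLOW_NULL` has no effect unless `BSON_VALIDATE_UTF8` is also passed, and `BSON_VALIDATE_CORRUPT` does nothing as a control flag. The sketch below models only those three statements. `ValidateFlags`, `control_bits`, and `rejects_embedded_null` are hypothetical Python names that mirror the enum values in the header; they are not part of libbson, and the sketch does not model UTF-8 decoding itself.

```python
import enum


class ValidateFlags(enum.IntFlag):
    """Hypothetical mirror of the values listed in bson_validate_flags_t."""

    NONE = 0
    UTF8 = 1 << 0
    DOLLAR_KEYS = 1 << 1
    DOT_KEYS = 1 << 2
    UTF8_ALLOW_NULL = 1 << 3
    EMPTY_KEYS = 1 << 4
    CORRUPT = 1 << 5  # error code only; has no effect as a control flag


def control_bits(flags):
    """Drop CORRUPT, which the header says has no effect when passed as a flag."""
    return ValidateFlags(int(flags) & ~int(ValidateFlags.CORRUPT))


def rejects_embedded_null(flags):
    """True when text containing a null character would fail validation."""
    flags = control_bits(flags)
    wants_utf8 = bool(flags & ValidateFlags.UTF8)
    allows_null = bool(flags & ValidateFlags.UTF8_ALLOW_NULL)
    # ALLOW_NULL only matters when UTF-8 checking was requested at all.
    return wants_utf8 and not allows_null
```

Since the same enumeration doubles as the set of error codes, a caller inspecting a failed validation could compare the reported code against these values, for example treating `ValidateFlags.CORRUPT` as "the bytes are malformed" and the other bits as "an optional rule was violated".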
