CDRIVER-5641: BSON Binary Vector Subtype Support #1868

ghost · 2025-02-15T06:51:21Z

Resolves CDRIVER-5641 by documenting and implementing new BSON Binary Vector APIs.

- Synced bson_binary_vector and bson_corpus spec tests from specifications commit 585b797c110b6709f81def6200b946b94c8d9c55
- tests: Full support for the existing version of the spec test suite.
- tests: TODOs mark spec test improvements that depend on DRIVERS-3095 and DRIVERS-3097.
- tests: Additional API usage and fuzzing.
- New BSON Binary APIs: bson_append_binary_uninit, bson_iter_binary_equal, bson_iter_binary_subtype, bson_iter_overwrite_binary
- Extend existing big-endian byte swapping API for 32-bit float
- Compiler support for restricted pointer aliasing using BSON_RESTRICT.
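The last two items can be illustrated together with a small sketch. This is not the libbson implementation: SKETCH_RESTRICT and sketch_decode_float32_le are hypothetical names. Instead of a byte-swap macro, it assembles each little-endian float32 from individual bytes, which gives the same result on little-endian and big-endian hosts. It assumes float is IEEE-754 binary32. The restrict qualifiers promise the compiler that the source bytes and destination floats never overlap, which gives it more freedom to optimize the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a BSON_RESTRICT-style macro: use the compiler
   extension in C++ and MSVC, the C99 keyword in C99+, otherwise nothing. */
#if defined(__cplusplus) || defined(_MSC_VER)
#define SKETCH_RESTRICT __restrict
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define SKETCH_RESTRICT restrict
#else
#define SKETCH_RESTRICT
#endif

/* Decode `count` little-endian float32 values from `src` into `dst`.
   Building each 32-bit pattern from bytes makes the result independent of
   host byte order. Assumes sizeof (float) == 4 and IEEE-754 floats. */
static void
sketch_decode_float32_le (float *SKETCH_RESTRICT dst,
                          const uint8_t *SKETCH_RESTRICT src,
                          size_t count)
{
   for (size_t i = 0; i < count; i++) {
      const uint8_t *p = src + 4u * i;
      uint32_t bits = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                      ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
      /* memcpy reinterprets the bits without violating aliasing rules */
      memcpy (&dst[i], &bits, sizeof dst[i]);
   }
}
```

For example, the bytes 00 00 80 3F decode to 1.0f on any host.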

The concept: no new owned data types are defined. Instead, we encourage access to vector data that's already allocated inside a bson_t. When you append a vector to a bson_t, the result is a view. Views are effectively pointers decorated with type and length information, conferring a validity guarantee about the memory they reference.
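The view idea can be made concrete with a minimal sketch of a read-only int8 view. It is not the libbson API: the type and function names are hypothetical. The byte layout (a dtype byte, 0x03 for int8, then a padding byte that must be zero for int8, then the elements) is assumed from the BSON Binary Vector spec. The view only borrows bytes that already live in the enclosing document, so it stays valid only as long as those bytes do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, assumed from the BSON Binary Vector spec. */
enum { SKETCH_VECTOR_HEADER_LEN = 2, SKETCH_DTYPE_INT8 = 0x03 };

/* A view: a borrowed pointer plus the type and length needed to use it.
   It owns nothing; the bytes stay inside the enclosing document. */
typedef struct {
   const int8_t *elements;
   size_t element_count;
} sketch_int8_view_t;

/* Validate a vector payload (dtype byte, padding byte, elements) and, on
   success, fill in a view over its int8 elements. */
static bool
sketch_int8_view_init (sketch_int8_view_t *view,
                       const uint8_t *binary,
                       size_t binary_len)
{
   if (binary_len < SKETCH_VECTOR_HEADER_LEN) {
      return false; /* too short to hold the header */
   }
   if (binary[0] != SKETCH_DTYPE_INT8 || binary[1] != 0) {
      return false; /* wrong element type, or nonzero padding */
   }
   view->elements = (const int8_t *) (binary + SKETCH_VECTOR_HEADER_LEN);
   view->element_count = binary_len - SKETCH_VECTOR_HEADER_LEN;
   return true;
}
```

Validation happens once, when the view is created; afterwards, element access is plain indexing with a known bound.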

The view data types are included in the ABI, and several performance-critical accessors are implemented as inline functions.

For review I'd recommend starting with the new binary_vector.rst documentation page and following the links from there. The documentation page for each view type includes a short example. These examples are also included in the test suite, in an adapted form.

It's tempting to automatically generate the repetitive portions of this patch, but so far I have opted instead to avoid the extra complexity.

Commits are separated into new files, synced spec files, and modified files.

Verified with:

- evergreen builds: https://spruce.mongodb.com/version/67b023355d3da6000759923e/tasks
- libbson tests compiled and run on PowerPC Debian under QEMU, to cover 32-bit and big-endian platforms


The earlier fix for a bug in allocating a bson_t near the maximum size revealed a flawed test that was only succeeding because of the allocation bug. This fixes the test to check both sides of the boundary: allocating a bson_t that's one byte too large fails, and on 64-bit platforms allocating a max-sized bson_t succeeds.


The main reason for this change is to avoid an unhelpful implementation of -Warray-bounds in gcc 11 which notices that the memcpy in these tests can be out-of-range, but doesn't notice that the out-of-range memcpy will never be executed. The easiest fix was to dynamically allocate the value buffers, to prevent gcc from associating range info with these pointers. This replaces some arbitrary nonzero test values with zeroes, out of convenience.


ghost · 2025-02-27T22:47:26Z

This PR has commits to fix allocating max sized bson_t, to facilitate binary vector tests. I copied them out into #1891 for separate review.

eramongodb

Some minor suggestions remaining; otherwise, LGTM.

src/libbson/src/bson/bson-macros.h

src/libbson/tests/test-bson-vector.c

eramongodb · 2025-02-28T21:28:57Z

src/libbson/tests/test-bson-vector.c

+   int8_t *expected_elements = bson_malloc (MAX_TESTED_VECTOR_LENGTH * sizeof *expected_elements);
+   int8_t *actual_elements = bson_malloc (MAX_TESTED_VECTOR_LENGTH * sizeof *actual_elements);
+   for (int fuzz_iter = 0; fuzz_iter < FUZZ_TEST_ITERS; fuzz_iter++) {
+      int r = rand ();


The rand functions definitely leave a lot to be desired w.r.t. performance. Their addition in #898 was very much a "correctness first, performance later" decision. The suggestion to use them here was only for rand()-avoidance and reuse of existing API to easily achieve size_t type consistency without requiring explicit casts (i.e. in modulo expressions). IMO, given that these tests still execute in under a second in total, I do not think the performance tradeoff is especially significant relative to the overall execution time of the test suite. If you still prefer the explicit rand(), I am fine with that decision.

src/libbson/tests/test-bson-vector.c


ghost · 2025-02-28T23:25:32Z

Not sure why the GitHub UI is not giving me an option to reply directly to this above.


Switching to the wrapper just for consistency here would not really be an improvement. It's not just a waste of CPU time; it's a waste of the (extremely limited!) state space of the 32-bit PRNG on Windows. Each call to the wrapper consumes 5 rand() calls, so the result is a 64-bit wide number with a period of only (2^32)/5.
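The "5 rand() calls" figure follows from a short calculation, sketched below under the assumption that the wrapper builds its 64-bit result from 15-bit chunks (RAND_MAX is 32767 on MSVC, the minimum the C standard allows). The helper names are hypothetical and this is not the test suite's actual wrapper: 4 * 15 = 60 bits falls short of 64, so a fifth call is needed, whereas a 31-bit rand() such as glibc's would need only 3.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Calls needed to fill target_bits when each call yields bits_per_call
   random bits (ceiling division). */
static int
sketch_rand_calls_needed (int bits_per_call, int target_bits)
{
   return (target_bits + bits_per_call - 1) / bits_per_call;
}

/* Hypothetical 64-bit wrapper over rand(), keeping only the 15 low bits that
   every conforming rand() provides (RAND_MAX >= 32767). Each result consumes
   5 outputs of the underlying generator, so it moves through that
   generator's state five times faster than a single rand() call. */
static uint64_t
sketch_rand_u64 (void)
{
   uint64_t result = 0;
   const int calls = sketch_rand_calls_needed (15, 64);
   for (int i = 0; i < calls; i++) {
      result = (result << 15) | ((uint64_t) rand () & 0x7FFFu);
   }
   return result;
}
```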


vector-of-bool

LGTM

vector-of-bool · 2025-03-06T18:43:48Z

src/libbson/src/bson/bson-vector.c

My only concern is the large amount of code duplication between vector types and the const/non-const variants. I can't think of a pretty solution that doesn't involve macro trickery, though.

Yeah, I feel similarly. It's been tempting to macro-ize it more, but I feel like I'd be setting a trap for folks who are trying to debug this code later. The lowest-hanging fruit for de-duplication at this point might be some kind of template engine for the documentation pages?

Micah Scott added 5 commits February 14, 2025 15:25

Sync bson_corpus spec tests

2b0af51

From specifications repository commit 585b797c110b6709f81def6200b946b94c8d9c55

Sync bson binary vector spec tests

c22ad4e

From specifications repository commit 585b797c110b6709f81def6200b946b94c8d9c55

New documentation pages

9287971

New source files for the implementation of BSON Binary Vector

b01c769

Changes to integrate bson-binary-vector with libbson

c1ab6e9

ghost requested review from eramongodb and vector-of-bool February 15, 2025 06:51

Micah Scott added 5 commits February 18, 2025 08:31

Note about the differing public and private API of bson_iter_binary and friends

6fe0d0a

Public APIs for direct pointer access to int8 vectors

9263e89

Trivial, fix inline placement

ae91e0d

Trivial, avoid temporarily viewing header bytes as int8_t

75a0984

Fix typo in man_page definition

6168cb2

ghost mentioned this pull request Feb 21, 2025

DRIVERS-3031, BSON Binary Vector clarifications: goals, non-goals, terms, and scope mongodb/specifications#1753

Closed


eramongodb requested changes Feb 25, 2025


mdbmes and others added 16 commits February 25, 2025 12:45

Update src/libbson/src/bson/bson.c

1a68f6a

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson.c

6270bc5

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson.c

21c5054

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson.c

edc9b20

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson-vector.c

fe491be

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson-vector-private.h

217562c

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson-vector.h

f6b58c0

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/src/bson/bson-vector.h

3fa7c05

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/tests/test-bson-vector.c

12e6edf

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/tests/test-bson-vector.c

41a7ebb

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/tests/test-bson-vector.c

149d441

Co-authored-by: Ezra Chung <[email protected]>

Rename packed_bits to packed_bit

4b2d6f3

* replace s/packed_bits/packed_bit/
* in filenames also
* and fixup documentation headings and indents

Merge branch 'master' into CDRIVER-5641

fb3967c

Represent array keys with uint32_t not size_t

a17e716
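One plausible rationale, sketched below with hypothetical names: BSON arrays are documents keyed "0", "1", "2", and so on. Since a document cannot exceed INT32_MAX bytes, an element index always fits in uint32_t, which also bounds the decimal key to at most 10 digits, so a fixed 11-byte buffer is enough.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key buffer: UINT32_MAX is 4294967295 (10 digits) plus NUL. */
typedef struct {
   char str[11];
} sketch_array_key_t;

/* Format an array index as its BSON key ("0", "1", ...) and return it. */
static const char *
sketch_array_key (sketch_array_key_t *key, uint32_t index)
{
   snprintf (key->str, sizeof key->str, "%" PRIu32, index);
   return key->str;
}
```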

In public headers, disable pointer arithmetic warnings from -Weverything

f3527ec

Style change, prefer to write element address calculation as plain addition rather than address of subscript

dc9d22f
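The two commits above can be sketched together in a header-style inline accessor. This assumes the warning in question is clang's -Wunsafe-buffer-usage (enabled by -Weverything in clang 16 and later), which flags raw pointer arithmetic. The commit title does not name the flag, and the accessor and macro names are hypothetical. The suppression is scoped with push/pop so code that includes the header keeps its own settings, and `__has_warning` keeps older clang versions from complaining about an unknown option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Suppress only on clang versions that know the (assumed) warning. */
#if defined(__clang__) && defined(__has_warning)
#if __has_warning("-Wunsafe-buffer-usage")
#define SKETCH_SUPPRESS_PTR_ARITH 1
#endif
#endif

#ifdef SKETCH_SUPPRESS_PTR_ARITH
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunsafe-buffer-usage"
#endif

/* Hypothetical accessor: element address written as plain addition
   (base + index) rather than as the address of a subscript (&base[index]). */
static inline const int8_t *
sketch_int8_element_ptr (const int8_t *base, size_t index)
{
   return base + index;
}

#ifdef SKETCH_SUPPRESS_PTR_ARITH
#pragma clang diagnostic pop
#endif
```

Both spellings compute the same address; the choice is purely about readability.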

Micah Scott added 13 commits February 26, 2025 21:29

Edge case tests for sizing, allocation, and accessors

3969826

Move read/write edge case tests to macros

57a2a0d

Merge branch 'master' into CDRIVER-5641

abbe3cc

Revert "libbson bugfix, allow allocating max size documents"

d1b35d0

This reverts commit 6f732fc. (Moved to CDRIVER-5915)

Revert "bugfix for test_bson_reserve_buffer_errors"

91adb2a

This reverts commit 05bd56a. (Moved to CDRIVER-5915)

CDRIVER-5915: Fix for allocation of bson_t larger than half max size

f2bb318

Revert "Use INFINITY and NAN for float but not double"

c1363b0

This reverts commit 758856a.

clang-format

277dda0

Explicitly cast INFINITY to double

f8a3497

Fix missing underline from earlier _uninit change

fc7772d

Reconsider function name (_bson_round_up_alloc_size)

35b0b31

The new name doesn't misleadingly imply that the result is always a power of two.
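A hypothetical illustration of why a power-of-two name would mislead: an allocator that doubles capacity for amortized growth must still clamp at the largest size a bson_t may have. Assuming that limit is INT32_MAX (2^31 - 1, because the document length prefix is a signed 32-bit integer), the clamped result is not a power of two. The function below is a sketch, not the code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: assumed largest size a BSON document may declare. */
#define SKETCH_MAX_DOC_SIZE ((size_t) INT32_MAX)

/* Round `requested` up to the next power of two, then clamp to the document
   maximum. Returns 0 if the request itself exceeds the maximum. Because the
   maximum is 2^31 - 1, a clamped result is not a power of two. */
static size_t
sketch_round_up_alloc_size (size_t requested)
{
   if (requested > SKETCH_MAX_DOC_SIZE) {
      return 0;
   }
   size_t size = 1;
   while (size < requested) {
      size <<= 1; /* stops at 2^31 at most, which fits in a 32-bit size_t */
   }
   return size > SKETCH_MAX_DOC_SIZE ? SKETCH_MAX_DOC_SIZE : size;
}
```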

ghost requested a review from eramongodb February 28, 2025 01:44

eramongodb approved these changes Feb 28, 2025


mdbmes and others added 2 commits February 28, 2025 15:08

Update src/libbson/tests/test-bson-vector.c

7b2c324

Co-authored-by: Ezra Chung <[email protected]>

Update src/libbson/tests/test-bson-vector.c

4337a3d

Co-authored-by: Ezra Chung <[email protected]>

Micah Scott added 8 commits February 28, 2025 15:31

End statement-like macros with a statement that requires a semicolon

f89c255
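One common way to satisfy this rule is the do { ... } while (0) idiom, sketched below with a hypothetical macro. The expansion is a single statement that still needs its trailing semicolon, so it nests safely under an unbraced if/else, and a forgotten ';' becomes a compile error instead of a silent change in meaning. With a bare { ... } body, the if/else below would not compile, because the ';' after the macro call would end the if statement and orphan the else.

```c
#include <assert.h>

/* Hypothetical statement-like macro; the caller must supply the ';'. */
#define SKETCH_SWAP_INT(a, b) \
   do {                       \
      int sketch_tmp_ = (a);  \
      (a) = (b);              \
      (b) = sketch_tmp_;      \
   } while (0)

/* Order a pair in place. Returns 1 if a swap happened, 0 if already ordered. */
static int
sketch_order_pair (int *lo, int *hi)
{
   if (*lo > *hi)
      SKETCH_SWAP_INT (*lo, *hi); /* one statement, so the else still binds */
   else
      return 0;
   return 1;
}
```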

Replace BSON_ASSERT with ASSERT in test-bson-vector

66104f7

Explicit casts for all rand() calls in test-bson-vector

36f544d

Improve ASSERT_MEMCMP somewhat

ca5171d

* evaluate inputs once, inside parentheses
* show errors with a hex dump instead of assuming the data is printable
* include error location in the output
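A minimal sketch of an assertion with those three properties, using hypothetical names rather than the test suite's actual ASSERT_MEMCMP: each argument is evaluated exactly once into a local, a mismatch prints both buffers as hex, and the message carries the __FILE__ and __LINE__ of the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print `len` bytes as space-separated hex, prefixed by a label. */
static void
sketch_hex_dump (FILE *out, const char *label, const uint8_t *bytes, size_t len)
{
   fprintf (out, "  %s:", label);
   for (size_t i = 0; i < len; i++) {
      fprintf (out, " %02x", (unsigned) bytes[i]);
   }
   fputc ('\n', out);
}

/* Hypothetical memcmp assertion: each argument is evaluated once, inside
   parentheses; on mismatch it reports the call site, hex-dumps both
   buffers, and aborts. */
#define SKETCH_ASSERT_MEMCMP(a, b, n)                                    \
   do {                                                                  \
      const uint8_t *sketch_a_ = (const uint8_t *) (a);                  \
      const uint8_t *sketch_b_ = (const uint8_t *) (b);                  \
      const size_t sketch_n_ = (size_t) (n);                             \
      if (memcmp (sketch_a_, sketch_b_, sketch_n_) != 0) {               \
         fprintf (stderr, "%s:%d: ASSERT_MEMCMP (%s, %s, %s) failed\n",  \
                  __FILE__, __LINE__, #a, #b, #n);                       \
         sketch_hex_dump (stderr, #a, sketch_a_, sketch_n_);             \
         sketch_hex_dump (stderr, #b, sketch_b_, sketch_n_);             \
         abort ();                                                       \
      }                                                                  \
   } while (0)
```

Copying the arguments into locals first is what makes expressions with side effects safe to pass, and the hex dump works for float32 payloads and other binary data that would print as garbage with %s.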

Use ASSERT_MEMCMP for exact comparison of float32 and other bulk elements

1702a2a

Revert "Reconsider function name (_bson_round_up_alloc_size)"

3e3d4db

This reverts commit 35b0b31.

Revert "CDRIVER-5915: Fix for allocation of bson_t larger than half max size"

bcb2b87

This reverts commit f2bb318.

Merge branch 'master' into CDRIVER-5641

8e41a4c

vector-of-bool approved these changes Mar 6, 2025


ghost merged commit 1126b6e into mongodb:master Mar 6, 2025
40 of 42 checks passed

ghost mentioned this pull request Mar 6, 2025

CDRIVER-5641: Build fix for _FORTIFY_SOURCE #1899

Merged

This pull request was closed.
