-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-5641: BSON Binary Vector Subtype Support #1868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
From specifications repository commit 585b797c110b6709f81def6200b946b94c8d9c55
From specifications repository commit 585b797c110b6709f81def6200b946b94c8d9c55
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
* replace s/packed_bits/packed_bit/ * in filenames also * and fixup documentation headings and indents
…dition rather than address of subscript
The earlier fix for a bug in allocating bson_t near the max size revealed a flawed test that was only succeeding because of the allocation bug. This fixes the test to identify both sides of the boundary: we fail to allocate a bson_t that's one byte too large, and on 64-bit platforms we allocate a max-sized bson_t.
This reverts commit 6f732fc. (Moved to CDRIVER-5915)
This reverts commit 05bd56a. (Moved to CDRIVER-5915)
This reverts commit 758856a.
The main reason for this change is to avoid an unhelpful implementation of -Warray-bounds in gcc 11 which notices that the memcpy in these tests can be out-of-range, but doesn't notice that the out-of-range memcpy will never be executed. The easiest fix was to dynamically allocate the value buffers, to prevent gcc from associating range info with these pointers. This replaces some arbitrary nonzero test values with zeroes, out of convenience.
The new name doesn't misleadingly imply that the result is always a power of two.
This PR has commits to fix allocating max sized bson_t, to facilitate binary vector tests. I copied them out into #1891 for separate review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor suggestions remaining; otherwise, LGTM.
src/libbson/tests/test-bson-vector.c
Outdated
int8_t *expected_elements = bson_malloc (MAX_TESTED_VECTOR_LENGTH * sizeof *expected_elements); | ||
int8_t *actual_elements = bson_malloc (MAX_TESTED_VECTOR_LENGTH * sizeof *actual_elements); | ||
for (int fuzz_iter = 0; fuzz_iter < FUZZ_TEST_ITERS; fuzz_iter++) { | ||
int r = rand (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rand functions definitely leave a lot to be desired w.r.t. performance. Their addition in #898 was very much a "correctness first, performance later" decision. The suggestion to use them here was only for rand()
-avoidance and reuse of existing API to easily achieve size_t
type consistency without requiring explicit casts (i.e. in module expressions). imo given the execution of these tests are still under a second in total, I do not think the performance tradeoff is especially significant given the overall execution time of the test suite. If you still prefer the explicit rand()
, I am fine with that decision.
Co-authored-by: Ezra Chung <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Not sure why the Github UI is not giving me an option to reply directly to this above..
Switching to the wrapper just for consistency here would really not be an improvement. It's not just a waste of CPU time, it's a waste of the (extremely limited!) state space of the 32-bit PRNG on windows. Each call to the wrapper consumes 5 rand() calls, so the result is a 64-bit wide number with a period of only (2^32)/5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is the large amount of code duplication between vector types and the const
/non-const
variants. I can't think of a pretty solution that doesn't involve macro trickery, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I feel similarly. It's been tempting to macro-ize it more, but I feel like I'd be setting a trap for folks who are trying to debug this code later. The lowest hanging fruit for de-duplication at this point might be some kind of template engine for the documentation pages?
The concept: no new owned data types are defined. Instead, we encourage access to vector data that's already allocated inside a bson_t. When you append a vector to a bson_t, the result is a view. Views are effectively pointers decorated with type and length information, conferring a validity guarantee about the memory they reference.
The view data types are included in the ABI, and several performance-critical accessors are implemented as inline functions.
For review I'd recommend starting with the new
binary_vector.rst
documentation page and following the links from there. The documentation page for each view type includes a short example. These examples are also included in the test suite, in an adapted form.It's tempting to automatically generate the repetitive portions of this patch, but so far I have opted instead to avoid the extra complexity.
Commits are separated into new files, synced spec files, and modified files.
Verified with: