Skip to content

CDRIVER-4231: New random generation of OID values #931

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

vector-of-bool
Copy link
Contributor

bson-context initializes five bytes with randomness. Previously, this was done by XOR'ing certain machine attribute into an integral seed value, and then calling rand_r(seed) to get a random value. Using plain XOR on those attributes would very often result in a seed collision, and rand_r() would only return four bytes of randomness, with the fifth byte being generated by another call to rand_r(), and the two sequence counters initialized with further calls to rand_r(). This resulted in at most 32 bits of randomness being used for nine bytes of random output, and the seed was very likely to collide in the first place.

This changeset instead generates a struct containing those same machine attributes, and then performs a SipHash of the bit representation of that struct, resulting in 16 bytes of psuedorandomness. These bytes are then stored as the random data and used to initialize the sequence counters.

The changeset has tweaked a few aspects of the context, which may need be un-tweaked.

- Restore hashing of machine attributes, plus an atomic init-counter,
  plus the process ID.
- Instead of MD5 hashing, use SipHash.
Copy link
Contributor

@eramongodb eramongodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changeset has tweaked a few aspects of the context, which may need be un-tweaked.

Given it is declared in a private header, I think modifications should be fine.

then performs a SipHash of the bit representation of that struct

Please document the rationale behind the choice of SipHash for this purpose.

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with this repro that duplicates ObjectIDs are no longer produced an Ubuntu 18.04 spawn host.

Changes look good pending remaining questions, a fix to replace BSON_CONTEXT_DISABLE_HOST_CACHE with BSON_CONTEXT_DISABLE_PID_CACHE, and passing tests.

Since we are close to a release, I suggest configuring a full patch build to test debug-compile-* tasks on each platform once this is ready to merge.

No providing _rotl64. No different "thread-safe" OID generation.
@@ -249,7 +92,7 @@ _get_rand (unsigned int *pseed)
* --------------------------------------------------------------------------
*/
static void
_bson_context_get_hostname (char *out)
_bson_context_get_hostname (char out[HOST_NAME_MAX])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_bson_context_get_hostname (char out[HOST_NAME_MAX])
_bson_context_get_hostname (char* out)

Suggestion: avoid misleading array type parameters. Alternatively, deliberately opt into C99 "static array indices" (note: may not be supported by older compilers, e.g. VS 2013 and/or GCC 4.8):

Suggested change
_bson_context_get_hostname (char out[HOST_NAME_MAX])
_bson_context_get_hostname (char out[static HOST_NAME_MAX])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference with static is the introduction of additional UB consideration (and a helpful compiler warning). I switched from a plain pointer to an array to represent that the pointee must be an array of the given size. I'd add the static, but we aren't on C99 yet. Given that this is an internal-only API I think it is safe to use the array-decl as hints to the expected array-ness (and required bounds).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am generally opposed to syntax that differs from its actual semantics (for reasons colorfully described by Linus Torvalds). However, given it is not user-facing interface, its scope is limited enough that I do not consider its removal a blocker (I also see there are a few instances of it scattered throughout the C driver already, so there is precedent).

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the additional documentation and cleanup. LGTM!

@@ -249,7 +92,7 @@ _get_rand (unsigned int *pseed)
* --------------------------------------------------------------------------
*/
static void
_bson_context_get_hostname (char *out)
_bson_context_get_hostname (char out[HOST_NAME_MAX])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am generally opposed to syntax that differs from its actual semantics (for reasons colorfully described by Linus Torvalds). However, given it is not user-facing interface, its scope is limited enough that I do not consider its removal a blocker (I also see there are a few instances of it scattered throughout the C driver already, so there is precedent).

@vector-of-bool vector-of-bool merged commit ee64af1 into mongodb:master Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants