-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-4231: New random generation of OID values #931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDRIVER-4231: New random generation of OID values #931
Conversation
- Restore hashing of machine attributes, plus an atomic init-counter, plus the process ID. - Instead of MD5 hashing, use SipHash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changeset has tweaked a few aspects of the context, which may need be un-tweaked.
Given it is declared in a private header, I think modifications should be fine.
then performs a SipHash of the bit representation of that struct
Please document the rationale behind the choice of SipHash for this purpose.
…what's happening in the random init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed with this repro that duplicates ObjectIDs are no longer produced an Ubuntu 18.04 spawn host.
Changes look good pending remaining questions, a fix to replace BSON_CONTEXT_DISABLE_HOST_CACHE
with BSON_CONTEXT_DISABLE_PID_CACHE
, and passing tests.
Since we are close to a release, I suggest configuring a full patch build to test debug-compile-*
tasks on each platform once this is ready to merge.
No providing _rotl64. No different "thread-safe" OID generation.
@@ -249,7 +92,7 @@ _get_rand (unsigned int *pseed) | |||
* -------------------------------------------------------------------------- | |||
*/ | |||
static void | |||
_bson_context_get_hostname (char *out) | |||
_bson_context_get_hostname (char out[HOST_NAME_MAX]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_bson_context_get_hostname (char out[HOST_NAME_MAX]) | |
_bson_context_get_hostname (char* out) |
Suggestion: avoid misleading array type parameters. Alternatively, deliberately opt into C99 "static array indices" (note: may not be supported by older compilers, e.g. VS 2013 and/or GCC 4.8):
_bson_context_get_hostname (char out[HOST_NAME_MAX]) | |
_bson_context_get_hostname (char out[static HOST_NAME_MAX]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference with static
is the introduction of additional UB consideration (and a helpful compiler warning). I switched from a plain pointer to an array to represent that the pointee must be an array of the given size. I'd add the static
, but we aren't on C99 yet. Given that this is an internal-only API I think it is safe to use the array-decl as hints to the expected array-ness (and required bounds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am generally opposed to syntax that differs from its actual semantics (for reasons colorfully described by Linus Torvalds). However, given it is not user-facing interface, its scope is limited enough that I do not consider its removal a blocker (I also see there are a few instances of it scattered throughout the C driver already, so there is precedent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional documentation and cleanup. LGTM!
@@ -249,7 +92,7 @@ _get_rand (unsigned int *pseed) | |||
* -------------------------------------------------------------------------- | |||
*/ | |||
static void | |||
_bson_context_get_hostname (char *out) | |||
_bson_context_get_hostname (char out[HOST_NAME_MAX]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am generally opposed to syntax that differs from its actual semantics (for reasons colorfully described by Linus Torvalds). However, given it is not user-facing interface, its scope is limited enough that I do not consider its removal a blocker (I also see there are a few instances of it scattered throughout the C driver already, so there is precedent).
bson-context initializes five bytes with randomness. Previously, this was done by XOR'ing certain machine attribute into an integral seed value, and then calling
rand_r(seed)
to get a random value. Using plain XOR on those attributes would very often result in a seed collision, andrand_r()
would only return four bytes of randomness, with the fifth byte being generated by another call torand_r()
, and the two sequence counters initialized with further calls torand_r()
. This resulted in at most 32 bits of randomness being used for nine bytes of random output, and the seed was very likely to collide in the first place.This changeset instead generates a struct containing those same machine attributes, and then performs a SipHash of the bit representation of that struct, resulting in 16 bytes of psuedorandomness. These bytes are then stored as the random data and used to initialize the sequence counters.
The changeset has tweaked a few aspects of the context, which may need be un-tweaked.