-
Notifications
You must be signed in to change notification settings - Fork 455
Implement BSON to JSON encoder that limits encoded string length #690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement BSON to JSON encoder that limits encoded string length #690
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, I have a couple small suggestions an an alternate API proposal.
src/libbson/doc/bson_json_opts_t.rst
Outdated
|
||
typedef struct { | ||
bson_json_mode_t mode; | ||
int32_t max_len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest making this struct opaque so we can add more options in the future without breaking ABI. Perhaps with this API:
bson_json_opts_t* opts = bson_json_opts_new (BSON_JSON_MODE_CANONICAL, BSON_MAX_LEN_UNLIMITED);
bson_json_opts_destroy (opts);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed this to an opaque API, but I'm not thrilled about where all the parts are right now:
- public
bson_json_opts_t
typedef inbson-types.h
- full struct definition in
bson-json-private.h
new
anddestroy
methods inbson-json.h
This seemed to be the most idiomatic (to our driver) way to separate these parts, but I think it's a bit confusing. Open to suggestions. There is no bson-types-private.h
, and unfortunately bson.h
requires the public bson_json_opts_t
typedef.
src/libbson/src/bson/bson-types.h
Outdated
* | ||
* Denotes unlimited length limit when converting BSON to JSON. | ||
*/ | ||
#define BSON_MAX_LEN_UNLIMITED -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe 0
would be preferable. I don't think there's any realistic use of 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to say that -1
is more in-line with other driver functions that deal with string length, like the bson_append_*
functions that take -1
to mean "no specified key length." Here, it would be kind of parallel, "no specified length limit," or unlimited. Though I'm not sure why you would use it, I would also expect setting 0
for the max length to mean empty logging strings, and I think it would be a little counterintuitive if 0
meant "unlimited."
src/libbson/src/bson/bson.c
Outdated
@@ -3174,7 +3238,9 @@ _bson_as_json_visit_all (const bson_t *bson, | |||
return NULL; | |||
} | |||
|
|||
bson_string_append (state.str, " }"); | |||
if (!state.max_len_reached) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check that there is room for the last two characters? E.g. state.str->len + 2 <= state.max_len
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to inspect the remaining space in the limit before adding the closing space and curly brace.
src/libbson/src/bson/bson.c
Outdated
@@ -3249,7 +3333,9 @@ bson_array_as_json (const bson_t *bson, size_t *length) | |||
return NULL; | |||
} | |||
|
|||
bson_string_append (state.str, " ]"); | |||
if (!state.max_len_reached) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, should this check that there is room for the last two characters? E.g. state.str->len + 2 <= state.max_len
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this bit of the code, because this method is user-facing and doesn't actually take a limit. The max length is always set to BSON_MAX_LEN_UNLIMITED
. Do we want to add a limit to this? Was this logic just future-proofing for a day when we do take a limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bson_array_as_json
is existing API, so we can't change the behaviour here without a BC break, hence always using BSON_MAX_LEN_UNLIMITED
. This method is not used at all by libmongoc itself, so I don't think there's a use-case for adding the limit there. If there is, there should be a bson_array_as_json_with_opts
that takes a bson_json_opts_t
to pass the length, as was done with the other bson_as_*
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alcaeus that makes a lot of sense! Why not leave this code the way it was before, then? It feels a bit unnecessary to me to check if we've reached max length if we don't ever have a length limit. Do you feel there's a benefit to leaving this logic in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, I lost track of this comment thread. We can remove the logic from this method since it will never deal with max length and can add this back in if we do decide that we want to support this for bson_array_as_json
eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/libbson/tests/test-json.c
Outdated
bson_json_opts_t _opts = {_mode, _max_len}; \ | ||
size_t _json_len; \ | ||
char *_str = bson_as_json_with_opts (_bson, &_json_len, &_opts); \ | ||
ASSERT_CMPSTR (_str, _expected); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding another check to ensure that the produced string does not exceed maximum specified length when truncation occurs.
if (_max_len != BSON_MAX_LEN_UNLIMITED) {
ASSERT_CMPINT(_json_len, <=, _max_len);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
4b01d3b
to
7a7f271
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Just have a few small comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This commit was extracted from #684, which is the POC for CDRIVER-3775. Since this logic is not entirely related to logging, I figured it was easier to review this separately.
This introduces a new
bson_as_json_with_opts
method, which is used to pass extra options to the encoding process. For now, the opts struct holds a mode (corresponding to the individualbson_as_json
,bson_as_canonical_extended_json
, andbson_as_relaxed_extended_json
functions) and a maximum length, which is required by the new structured logging component.The goal behind this function is that not all BSON is read and encoded, but that the process ends when the character limit is reached. For the time being, this works by checking the length limit in a
visit_after
visitor. This means that while we won't continue encoding BSON that is essentially dropped later, each field is always encoded in full and the JSON string is truncated later to not exceed the maximum length. This also stops the encoding process.Example usage:
The example above is equivalent to this:
While working on this, I noticed that we don't really cover the current logic. I've added tests that encode into both relaxed and canonical extended formats with and without length limits. I've skipped the legacy encoding mode for the time being, but can add test coverage for that as well if desired. I've also included documentation for the new types (including
bson_json_mode_t
which previously was private). Please let me know if I missed something there.