Skip to content

Implement BSON to JSON encoder that limits encoded string length #690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 8, 2021

Conversation

alcaeus
Copy link
Member

@alcaeus alcaeus commented Oct 14, 2020

This commit was extracted from #684, which is the POC for CDRIVER-3775. Since this logic is not entirely related to logging, I figured it was easier to review this separately.

This introduces a new bson_as_json_with_opts method, which is used to pass extra options to the encoding process. For now, the opts struct holds a mode (corresponding to the individual bson_as_json, bson_as_canonical_extended_json, and bson_as_relaxed_extended_json functions) and a maximum length, which is required by the new structured logging component.

The goal behind this function is that not all BSON is read and encoded, but that the process ends when the character limit is reached. For the time being, this works by checking the length limit in a visit_after visitor. This means that while we won't continue encoding BSON that is essentially dropped later, each field is always encoded in full and the JSON string is truncated later to not exceed the maximum length. This also stops the encoding process.

Example usage:

// Encode up to 1000 characters in canonical extended JSON
bson_json_opts_t opts = { BSON_JSON_MODE_CANONICAL, 1000 };
bson_t b;

char *str = bson_as_json_with_opts (b, NULL, opts);
// Encode the entire BSON document as relaxed extended json
bson_json_opts_t opts = { BSON_JSON_MODE_CANONICAL, BSON_MAX_LEN_UNLIMITED };
bson_t b;

char *str = bson_as_json_with_opts (b, NULL, opts);

The example above is equivalent to this:

bson_t b;

char *str = bson_as_relaxed_extended_json (b, NULL);

While working on this, I noticed that we don't really cover the current logic. I've added tests that encode into both relaxed and canonical extended formats with and without length limits. I've skipped the legacy encoding mode for the time being, but can add test coverage for that as well if desired. I've also included documentation for the new types (including bson_json_mode_t which previously was private). Please let me know if I missed something there.

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, I have a couple small suggestions an an alternate API proposal.


typedef struct {
bson_json_mode_t mode;
int32_t max_len;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this struct opaque so we can add more options in the future without breaking ABI. Perhaps with this API:

bson_json_opts_t* opts = bson_json_opts_new (BSON_JSON_MODE_CANONICAL, BSON_MAX_LEN_UNLIMITED);
bson_json_opts_destroy (opts);

Copy link
Contributor

@samantharitter samantharitter Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed this to an opaque API, but I'm not thrilled about where all the parts are right now:

  • public bson_json_opts_t typedef in bson-types.h
  • full struct definition in bson-json-private.h
  • new and destroy methods in bson-json.h

This seemed to be the most idiomatic (to our driver) way to separate these parts, but I think it's a bit confusing. Open to suggestions. There is no bson-types-private.h, and unfortunately bson.h requires the public bson_json_opts_t typedef.

*
* Denotes unlimited length limit when converting BSON to JSON.
*/
#define BSON_MAX_LEN_UNLIMITED -1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe 0 would be preferable. I don't think there's any realistic use of 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to say that -1 is more in-line with other driver functions that deal with string length, like the bson_append_* functions that take -1 to mean "no specified key length." Here, it would be kind of parallel, "no specified length limit," or unlimited. Though I'm not sure why you would use it, I would also expect setting 0 for the max length to mean empty logging strings, and I think it would be a little counterintuitive if 0 meant "unlimited."

@@ -3174,7 +3238,9 @@ _bson_as_json_visit_all (const bson_t *bson,
return NULL;
}

bson_string_append (state.str, " }");
if (!state.max_len_reached) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check that there is room for the last two characters? E.g. state.str->len + 2 <= state.max_len?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this to inspect the remaining space in the limit before adding the closing space and curly brace.

@@ -3249,7 +3333,9 @@ bson_array_as_json (const bson_t *bson, size_t *length)
return NULL;
}

bson_string_append (state.str, " ]");
if (!state.max_len_reached) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, should this check that there is room for the last two characters? E.g. state.str->len + 2 <= state.max_len?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this bit of the code, because this method is user-facing and doesn't actually take a limit. The max length is always set to BSON_MAX_LEN_UNLIMITED. Do we want to add a limit to this? Was this logic just future-proofing for a day when we do take a limit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bson_array_as_json is existing API, so we can't change the behaviour here without a BC break, hence always using BSON_MAX_LEN_UNLIMITED. This method is not used at all by libmongoc itself, so I don't think there's a use-case for adding the limit there. If there is, there should be a bson_array_as_json_with_opts that takes a bson_json_opts_t to pass the length, as was done with the other bson_as_* methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alcaeus that makes a lot of sense! Why not leave this code the way it was before, then? It feels a bit unnecessary to me to check if we've reached max length if we don't ever have a length limit. Do you feel there's a benefit to leaving this logic in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, I lost track of this comment thread. We can remove the logic from this method since it will never deal with max length and can add this back in if we do decide that we want to support this for bson_array_as_json eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

bson_json_opts_t _opts = {_mode, _max_len}; \
size_t _json_len; \
char *_str = bson_as_json_with_opts (_bson, &_json_len, &_opts); \
ASSERT_CMPSTR (_str, _expected); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding another check to ensure that the produced string does not exceed maximum specified length when truncation occurs.

if (_max_len != BSON_MAX_LEN_UNLIMITED) {
    ASSERT_CMPINT(_json_len, <=, _max_len);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@samantharitter samantharitter self-assigned this Dec 9, 2020
Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Just have a few small comments.

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@samantharitter samantharitter merged commit 03ee519 into mongodb:master Jan 8, 2021
@alcaeus alcaeus deleted the truncating-bson-encoding branch January 10, 2021 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants