-
Notifications
You must be signed in to change notification settings - Fork 456
New internal bounded string append APIs #1816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If TXT strings extended past the end of the TXT record, this had no check to prevent a read past the buffer. I don't think the libc resolver does its own validation on these records at all, so we do need to validate here.
This isn't actually needed unless int is 64-bit, not common, but it was bugging me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial feedback. Still in review.
Co-authored-by: Ezra Chung <[email protected]>
Many of these are redundant, but some at least should be optimized out. This will improve some of our error messages at the expense of code size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remarkably, I only have minor comments. LGTM. Great work.
src/libbson/tests/test-string.c
Outdated
|
||
// Compatibility wrapper; deprecated private API. | ||
bson_string_t * | ||
_bson_string_alloc (const size_t size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest removing _bson_string_alloc
and _bson_string_append_ex
and associated tests. They were private API before, and appear no longer used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the wrappers here are only needed for the tests. I was trying to preserve existing test coverage during the refactor. If you think it'd be cleaner to remove them now, I'll include that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think remove. I see no reason to preserve test coverage. Test code using _bson_string_alloc
and _bson_string_append_ex
appear only to be test the functions themselves.
/* Parameters and outcome for a bounded append operation on a mcommon_string_t. Individual type-specific append | ||
* functions can consume this struct to communicate bounds info. "max_len_exceeded" can be tested any time an | ||
* algorithmic exit is convenient; the actual appended content will be limited by max_len. Truncation is guaranteed not | ||
* to split a valid UTF-8 byte sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect the behavior of truncation in bson_as_json_with_opts
was previously unspecified. Request add an entry to src/libbson/NEWS
to note the possible output change:
libbson 1.30.0 (Unreleased)
===========================
Fixes:
* Truncated output of `bson_as_json_with_opts` is changed to no longer split valid UTF-8 sequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Yes, my understanding is that the behavior was unspecified before.
comment wording Co-authored-by: Kevin Albertson <[email protected]>
Co-authored-by: Ezra Chung <[email protected]>
Ok, I'll need a minute to write a polished patch for this but I have a proposal for a new naming to make the ownership of strings and string append operations clearer. I considered renaming the string_append to be an "appender", but my main frustration with this is that it implies the existence of methods like "string_appender_append" and "string_appender_printf", whereas I'd rather treat the "append" as the conceptual object that various appending methods act on. The mcommon_string_append() method itself feels conceptually like operator() on a string_append object. I think this makes the naming of most actual append operations clearer, if the confusion about lifetimes can be cleared up. I'm avoiding the word "init" in describing how we come up with a mcommon_string_append_t. Something like "begin" or "do" could work, describing the expected placement of the operation, but I'd prefer to be more straightforward about what's happening. So, I'm considering that instead of a mcommon_string_append_t constructor, this could be written as a mcommon_string method that "sets" an append to itself. So |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subject-object reversal of string
and append
is a much appreciated improvement. Thank you very much! One last suggestion remaining; otherwise, LGTM.
…ior, rename 'atomic' append to 'all-or-none'
#1826) This PR is a follow-up for #1795, #1821, and #1816. The new internal string and json APIs are used to implement missing functionality and tests for the structured logging subsystem. * Adds public APIs for the max_document_length within a mongoc_structured_log_opts_t. * Addresses the remainder of CDRIVER-4485: JSON documents within structured log messages are truncated correctly according to the rules set out in the Logging specification. * Addresses the remainder of CDRIVER-4486 by implementing prose tests. * Addresses CDRIVER-4814. Command or payload data beyond the truncation limit no longer degrades logging performance. (New serializer for command-with-payload) * Unified tests use the suggested max document length of 10000 from the spec * Replace atomic counter with atomic flag for environment error guards * Tests for environment defaults now skip if relevant variables are set externally
This is a follow-up to #1816 and mongodb/mongo-cxx-driver#1322, documenting minor changes to JSON output subsequent to change f2c1bb7
To efficiently support JSON serialization with bounded length, this is a new internal string API that can express bounded UTF-8-preserving append operations directly.
This patch on its own should have limited effect, but it's a prerequisite for work on CDRIVER-3775 and CDRIVER-4814.
The main benefit to this new API is that a single buffer can be shared by all nesting levels of a complex document. Previously, all nesting levels of a BSON document would allocate their own string append buffer, and truncation happened as a separate step on the way out of these nested levels.
Several other temporary string buffers were eliminated, including the temporary buffer that was needed for every printf. Now mcommon_string_append_printf writes directly to the destination, growing it as needed.
This work wasn't focused on performance, but it has an overall positive effect. The new mcommon_string implementations can be optimized more effectively at compile-time, and the updated string escape implementation treats fewer bytes as 'special'.
Other than the internal API change itself, the effects should be:
The new internal APIs are documented in header comments. I've added test cases to cover UTF-8 truncation behavior, but most of the new code's test coverage is provided by existing tests.