[CXX-2625] Bring our very own string_view (plus: Some iterators + ranges backports) #1062

Merged
merged 50 commits into from
Dec 21, 2023

Conversation

vector-of-bool
Contributor

@vector-of-bool vector-of-bool commented Nov 16, 2023

Refer: CXX-2625

This changeset replaces our usage of external string_view implementations with a single custom implementation, and includes several smaller supporting library components and general code cleanup. The commits have been organized to reflect the iteration process taken while developing this changeset, and should be reviewable in isolation, rather than requiring the entire 1.8k-line change to be reviewed as a whole. For the reader's benefit, the changes are also summarized here, in commit order.

Change Summaries

Utility macro: bsoncxx_returns(...)

This macro is a shorthand for

noexcept(noexcept(...)) 
  -> decltype(...)
{ return ...; }
static_assert(true, "")

(The trailing static_assert forces usage to be followed by a semicolon, and prevents clang-format from being confused.)

It is used like this:

template <typename T>
auto some_func(T x)
  bsoncxx_returns(x.foo());

Besides being easier to read and write, this ensures that the noexcept, -> decltype, and return expressions are all equivalent and correct with respect to each other. Placing the return expression in the noexcept and decltype specifiers also means that semantically invalid code produces a substitution failure rather than a hard error, which is especially helpful when doing SFINAE trickery.
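
As a rough illustration of the SFINAE benefit, here is a minimal sketch. EXAMPLE_RETURNS and size_of are hypothetical names invented for this example; the real bsoncxx_returns definition may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for bsoncxx_returns (an assumption, not the real macro).
#define EXAMPLE_RETURNS(...)                                 \
    noexcept(noexcept(__VA_ARGS__))->decltype(__VA_ARGS__) { \
        return __VA_ARGS__;                                  \
    }                                                        \
    static_assert(true, "")

// Participates in overload resolution only when `x.size()` is a valid expression.
template <typename T>
auto size_of(const T& x) EXAMPLE_RETURNS(x.size());

// Fallback, selected only when the overload above is removed by substitution failure.
inline std::size_t size_of(...) noexcept {
    return 0;
}
```

Calling size_of on a std::string picks the template; calling it on an int silently falls back to the variadic overload instead of producing a hard error.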

decay_copy and rank

The decay_copy invocable object is an implementation of the C++23 auto(...) expression, previously known as the exposition-only decay-copy(...). It decays its argument and invokes the appropriate move/copy constructor thereof in situ. This is especially useful with bsoncxx_returns, in order to eliminate reference-returning operations. Edit: decay_copy was removed as YAGNI. May come back someday, if useful.

The rank<N> struct template is simply a class template that inherits from rank<N-1>, except for rank<0>, which inherits from nothing. It is used in later changes to create an ordering between otherwise ambiguous overloads using tag dispatch: a candidate conversion from rank<10> to rank<2> is preferable to a conversion from rank<10> to rank<1>.
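
A minimal sketch of the tag-dispatch idea follows. The pick functions are hypothetical and exist only to show how the derived-to-base conversion ranking breaks the tie.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of rank<N>: every rank derives from the rank below it.
template <std::size_t N>
struct rank : rank<N - 1> {};

template <>
struct rank<0> {};

// Both overloads accept any T, so without the tag parameter the call would be ambiguous.
template <typename T>
int pick(T, rank<1>) {
    return 1;  // lower priority
}

template <typename T>
int pick(T, rank<2>) {
    return 2;  // higher priority: rank<2> is a nearer base of rank<10>
}

// Entry point: always passes a high rank and lets overload resolution choose.
template <typename T>
int pick(T value) {
    return pick(value, rank<10>{});
}
```

In real use each ranked overload would also carry a SFINAE constraint, so the highest-ranked overload that is still viable wins.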

Operator Utilities in operators.hpp

The code in operators.hpp is not strictly based on any C++ feature; it is used to simplify the implementation of operator overloading in string_view (and eventually the optional<T> implementation). Potentially this could be extended to other types that provide overloaded operators, to eliminate existing boilerplate.

  1. The struct detail::is_equality_comparable<T, U> trait template detects whether const T& and const U& are comparable using the == and != operators.
  2. struct detail::equal_to implements a callable object like std::equal_to<void>. If given two arguments l and r that satisfy is_equality_comparable, it returns l == r.
  3. class detail::equality_operators is a mixin base class that provides ADL-only (hidden friend) implementations of operator==(x, y) and operator!=(x, y) if-and-only-if there exists a valid ADL-visible tag_invoke(detail::equal_to{}, x, y) implementation.
  4. class detail::strong_ordering is based on the C++20 std::strong_ordering class. This changeset does not introduce the other ordering types, as this is all we need for string_view. To simulate C++17 inline variables, the [[gnu::weak]] and __declspec(selectany) attributes are used for the strong_ordering constants.
  5. compare_three_way implements the std::compare_three_way invocable object from C++20. This is the first usage of rank<N>. Invoking this object as compare_three_way{}(l, r) does the first of the following that applies:
    1. If tag_invoke(compare_three_way{}, l, r) is valid, returns that value as strong_ordering.
    2. Otherwise, if l < r is valid and l == r is valid:
      1. If l < r, returns strong_ordering::less.
      2. Otherwise, if l == r, returns strong_ordering::equal.
      3. Otherwise, returns strong_ordering::greater.
    3. Otherwise, the operator() is ill-formed with a substitution error.
  6. class detail::ordering_operators is a mixin base class that provides all of operator<, operator>, operator<= and operator>= as ADL-only hidden friend functions based on tag_invoke(compare_three_way{}, lhs, rhs).
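
The fallback comparison and the mixin idea can be sketched roughly as below. This is a simplification under stated assumptions: it uses a plain ADL-found compare function instead of tag_invoke, a bare enum instead of the strong_ordering class, and all names in namespace sketch are hypothetical.

```cpp
#include <cassert>

namespace sketch {

// Simplified stand-in for detail::strong_ordering.
enum class ordering { less, equal, greater };

// The fallback steps: derive an ordering from operator< and operator==.
template <typename L, typename R>
ordering three_way_fallback(const L& l, const R& r) {
    if (l < r) return ordering::less;
    if (l == r) return ordering::equal;
    return ordering::greater;
}

// Mixin: the derived type supplies one ADL-visible compare(a, b) and
// inherits all four relational operators as hidden friends.
template <typename Derived>
struct ordering_operators {
    friend bool operator<(const Derived& a, const Derived& b) {
        return compare(a, b) == ordering::less;
    }
    friend bool operator>(const Derived& a, const Derived& b) {
        return compare(a, b) == ordering::greater;
    }
    friend bool operator<=(const Derived& a, const Derived& b) {
        return compare(a, b) != ordering::greater;
    }
    friend bool operator>=(const Derived& a, const Derived& b) {
        return compare(a, b) != ordering::less;
    }
};

// Example user type: one comparison function yields four operators.
struct version : ordering_operators<version> {
    int hi;
    int lo;
    version(int h, int l) : hi(h), lo(l) {}

    friend ordering compare(const version& a, const version& b) {
        return a.hi != b.hi ? three_way_fallback(a.hi, b.hi)
                            : three_way_fallback(a.lo, b.lo);
    }
};

}  // namespace sketch
```

Because the operators are hidden friends of the base class, they are found only through argument-dependent lookup on the derived type and do not pollute ordinary name lookup.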

C++20 Iterators

To support some niceties of C++20/23 additions to string_view, a detour was made for some iterator (and ranges) backports.

  1. pointer_traits and to_address. These are utilities simply for converting an iterator into a raw pointer to the referred-to object. to_address_t and dereference_t are not part of a standard, but are useful with is_detected.
  2. is_dereferencable is a trait template that detects an object to which we can apply unary operator*, yielding a non-void expression.
  3. iter_value_t, iter_reference_t, and iter_difference_t all come from C++20 to obtain the value type, reference type, and difference type of an iterator-like object, respectively.
  4. is_weakly_incrementable is a trait template based on the weakly_incrementable C++20 concept, and the basis for detecting iterators.
  5. is_iterator is a trait template that detects whether a type is usable as an iterator.
  6. contiguous_iterator_tag is a C++20 iterators concept tag that extends random_access_iterator_tag.
  7. iterator_concept_t is not part of C++20, but is useful for obtaining the iterator category. It is not possible to infer contiguous_iterator_tag, but it is extremely useful to know it if it's available. Unless C++20 support is enabled, only raw pointers will return contiguous_iterator_tag.
  8. is_{input,forward,bidirectional,random_access,contiguous}_iterator are rough trait templates that detect an iterator type based on the result of iterator_concept_t. They do not implement the full iterator checks of the C++20 concepts.
  9. is_sentinel_for<S, I> trait template detects whether S is a valid sentinel type for I.
  10. is_sized_sentinel_for<S, I> extends is_sentinel_for<S, I> to require that taking the difference between S and I returns a value convertible to iter_difference_t<I>.
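
A few of these traits can be approximated in C++11 with the detection idiom. The sketch below is an assumption about the general technique, not the code from this changeset; the names mirror the list above but live in a hypothetical namespace sketch, and to_address is reduced to the two simplest cases.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// C++11-compatible void_t (std::void_t only arrives in C++17).
template <typename...>
struct make_void {
    using type = void;
};
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// True when `*lvalue_of_T` is a valid expression with a non-void type.
template <typename T, typename = void>
struct is_dereferencable : std::false_type {};

template <typename T>
struct is_dereferencable<T, void_t<decltype(*std::declval<T&>())>>
    : std::integral_constant<bool, !std::is_void<decltype(*std::declval<T&>())>::value> {};

// The type produced by dereferencing an iterator-like object.
template <typename I>
using iter_reference_t = decltype(*std::declval<I&>());

// Simplified to_address: raw pointers pass through unchanged...
template <typename T>
constexpr T* to_address(T* p) noexcept {
    return p;
}

// ...and class-type iterators are asked for operator->().
template <typename I>
auto to_address(const I& it) -> decltype(it.operator->()) {
    return it.operator->();
}

}  // namespace sketch
```

The standard std::to_address also consults pointer_traits and recurses on the result of operator->(); that is omitted here for brevity.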

Detour: Preprocessor Macros

This is a bit of an aside change to add some "probably useful" macros to clean up some conditional compilation and diagnostic control.

  1. bsoncxx_concat concatenates tokens, and bsoncxx_stringify turns tokens into a string literal.
  2. bsoncxx_if_msvc, bsoncxx_if_gcc, bsoncxx_if_clang, and bsoncxx_if_gnu_like are function-like macros that expand to their arguments if-and-only-if we are compiling for the respective compiler.
  3. bsoncxx_pragma(...) is a function-like port of MSVC __pragma. It accepts a token soup, which is then evaluated as a preprocessor #pragma at the use site.
  4. bsoncxx_push_warnings()/bsoncxx_pop_warnings() is equivalent to the appropriate compiler pragma to push/pop the diagnostics settings at the use site.
  5. bsoncxx_disable_warning(Spec) disables warnings for a certain compiler (See macro code comment).
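
One plausible way to build macros of this kind is sketched below. The EX_ names are hypothetical, the non-MSVC branch assumes a GCC-compatible compiler, and the real bsoncxx macros may be structured differently.

```cpp
#include <cassert>
#include <string>

// Two-level expansion so that macro arguments are expanded before use.
#define EX_CONCAT_IMPL(a, b) a##b
#define EX_CONCAT(a, b) EX_CONCAT_IMPL(a, b)
#define EX_STRINGIFY_IMPL(...) #__VA_ARGS__
#define EX_STRINGIFY(...) EX_STRINGIFY_IMPL(__VA_ARGS__)

#if defined(_MSC_VER)
// MSVC offers __pragma, which accepts tokens directly.
#define EX_PRAGMA(...) __pragma(__VA_ARGS__)
#define EX_PUSH_WARNINGS() EX_PRAGMA(warning(push))
#define EX_POP_WARNINGS() EX_PRAGMA(warning(pop))
#define EX_DISABLE_UNUSED_VARIABLE() EX_PRAGMA(warning(disable : 4189))
#else
// Assumed GCC/Clang: _Pragma takes a string literal, so stringify the tokens.
#define EX_PRAGMA(...) _Pragma(EX_STRINGIFY(__VA_ARGS__))
#define EX_PUSH_WARNINGS() EX_PRAGMA(GCC diagnostic push)
#define EX_POP_WARNINGS() EX_PRAGMA(GCC diagnostic pop)
#define EX_DISABLE_UNUSED_VARIABLE() EX_PRAGMA(GCC diagnostic ignored "-Wunused-variable")
#endif

EX_PUSH_WARNINGS();
EX_DISABLE_UNUSED_VARIABLE();

inline int answer() {
    int unused = 0;  // would normally be reported as an unused variable
    return 42;
}

EX_POP_WARNINGS();

// Token pasting followed by stringification.
inline std::string joined_name() {
    return EX_STRINGIFY(EX_CONCAT(foo, bar));
}
```

Keeping the push and pop at namespace scope around the affected code limits the suppression to exactly that region.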

A Little C++20 Ranges, as a Treat

A very small bit of C++20 ranges and the range-algorithms were backported, plus some additional utilities. This was finicky to get working on VS 2015, but doable. The following features are available in the bsoncxx::detail namespace:

  1. The begin(), end(), size(), ssize(), and data() invocable objects.
  2. iterator_t, sentinel_t, range_size_t, range_difference_t, range_data_t, range_value_t, and range_concept_t alias templates.
  3. is_range<R> trait template detects iterator_t<R> and sentinel_t<R>
  4. is_contiguous_range<R> is not strictly equivalent to the contiguous_range concept, but suits our needs. It requires a valid data(R) and size(R).
  5. A subrange<> class template for creating ranges from iterator pairs.
  6. unreachable_sentinel for unbounded views.
  7. Algorithms: advance, next, equal, find, find_if, and search. Also: not_fn and default_searcher.
  8. Other non-standard bits: make_reversed_view, equal_to_any_of, and equal_to_value.

The algorithms differ from std::ranges in that the iterator-only overloads are omitted; instead, all of them require a range and are bounds-checked. To use a "zero overhead" unchecked algorithm, use a subrange with unreachable_sentinel.
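
The bounded versus unbounded distinction might look like the following sketch. The types in namespace sketch are simplified assumptions and are not the detail implementations from this changeset.

```cpp
#include <cassert>

namespace sketch {

// A sentinel that never compares equal to any iterator.
struct unreachable_sentinel_t {
    template <typename I>
    friend constexpr bool operator==(const I&, unreachable_sentinel_t) noexcept {
        return false;
    }
    template <typename I>
    friend constexpr bool operator!=(const I&, unreachable_sentinel_t) noexcept {
        return true;
    }
};

// Minimal subrange: an iterator paired with a (possibly different) sentinel type.
template <typename I, typename S = I>
struct subrange {
    I first;
    S last;
    I begin() const { return first; }
    S end() const { return last; }
};

// Range-only find: every step is checked against the range's end().
template <typename R, typename T>
auto find(const R& r, const T& value) -> decltype(r.begin()) {
    auto it = r.begin();
    const auto last = r.end();
    for (; it != last; ++it) {
        if (*it == value) break;
    }
    return it;
}

}  // namespace sketch
```

With unreachable_sentinel_t the `it != last` test is a compile-time constant, so the loop performs no bounds check; the caller is then responsible for guaranteeing that the value is present.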

The Actual string_view

Finally, the actual stdx::basic_string_view class template. It builds upon all of the previous additions and allows us to stop using mnmlstc-core and Boost for our string-viewing needs.

The basic_string_view class template is not identical to the C++23 std::basic_string_view, but provides most of the same functionality.

  1. Constructible from:
    1. Nothing. Default-constructs to null.
    2. A pointer+size pair.
    3. An iterator+sentinel pair, from C++23. Requires that they be a sized sentinel and contiguous iterator.
    4. A contiguous range of the character type, from C++23. This allows construction from std::vector<char>, std::array<char>, etc. This is an explicit conversion.
    5. An implicit conversion from string-like ranges of the character type. This includes C++ std::string, as well as std::string_view itself. It detects "string-like" based on the presence of a .c_str() method.
    6. Construction from nullptr_t is explicitly deleted (from C++23).
  2. All of begin(), cbegin(), rbegin(), and crbegin() are defined, as well as the corresponding end()s
  3. operator[] and all non-throwing access functions that specify UB for out-of-bounds access include an assertion with a useful error message rather than terminating without explanation, done using _assert_inbounds(). This method is defined inline so that the assertion condition is visible to the inliner without requiring LTO, while the actual diagnostic+termination function is implemented out-of-line to reduce code size.
  4. The C++20 additions of starts_with and ends_with are included.
  5. The C++23 addition of contains is included.
  6. All find-etc members are included.
  7. The equality and ordering operators are implemented using the base class mixins from operators.hpp using tag_invoke. Only two functions make all six operators!
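
To make the bounds-check design concrete, here is a heavily reduced sketch of a char-only view. It is an illustration under stated assumptions, not the stdx::basic_string_view from this changeset: the class, fail_out_of_bounds, and the error message are hypothetical, and the failure function is marked inline only so that the example fits in one file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

namespace sketch {

// In a real library this would be defined out-of-line in a .cpp file,
// so that each call site only pays for a compare and a call.
[[noreturn]] inline void fail_out_of_bounds(std::size_t pos, std::size_t size) {
    std::fprintf(stderr, "string_view index %zu is out of bounds (size %zu)\n", pos, size);
    std::abort();
}

class string_view {
   public:
    constexpr string_view() noexcept : _data(nullptr), _size(0) {}
    constexpr string_view(const char* s, std::size_t n) noexcept : _data(s), _size(n) {}
    string_view(const std::string& s) noexcept : _data(s.data()), _size(s.size()) {}
    string_view(std::nullptr_t) = delete;

    constexpr std::size_t size() const noexcept { return _size; }

    // The inline check keeps the condition visible to the optimizer.
    char operator[](std::size_t pos) const {
        if (pos >= _size) fail_out_of_bounds(pos, _size);
        return _data[pos];
    }

    bool starts_with(string_view prefix) const noexcept {
        return prefix._size <= _size && same(_data, prefix._data, prefix._size);
    }

    bool ends_with(string_view suffix) const noexcept {
        return suffix._size <= _size &&
               same(_data + (_size - suffix._size), suffix._data, suffix._size);
    }

    bool contains(string_view infix) const noexcept {
        if (infix._size == 0) return true;
        const char* const last = _data + _size;
        return std::search(_data, last, infix._data, infix._data + infix._size) != last;
    }

   private:
    // Avoids calling memcmp with null pointers when n is zero.
    static bool same(const char* a, const char* b, std::size_t n) noexcept {
        return n == 0 || std::memcmp(a, b, n) == 0;
    }

    const char* _data;
    std::size_t _size;
};

}  // namespace sketch
```

The comparison operators are left out here; in the changeset they come from the operators.hpp mixins described earlier.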

@vector-of-bool changed the title from "Bring our very own string_view (plus: Some iterators + ranges backports)" to "[MONGOCXX-2625] Bring our very own string_view (plus: Some iterators + ranges backports)" on Nov 16, 2023
@vector-of-bool marked this pull request as ready for review November 16, 2023 21:41
@kevinAlbs changed the title from "[MONGOCXX-2625] Bring our very own string_view (plus: Some iterators + ranges backports)" to "[CXX-2625] Bring our very own string_view (plus: Some iterators + ranges backports)" on Nov 17, 2023
Collaborator

@kevinAlbs kevinAlbs left a comment

Impressive.

using self_type = basic_string_view;

/**
* @brief If R is a type for which we want to permit implicit conversion,
Collaborator

Suggested change
* @brief If R is a type for which we want to permit implicit conversion,
* @brief If S is a type for which we want to permit implicit conversion,

Contributor

@eramongodb eramongodb left a comment

Thank you for your hard work on this PR. A couple suggestions remaining; otherwise, LGTM. 👍

@@ -679,6 +674,7 @@ BSONCXX_INLINE bool operator==(const b_maxkey&, const b_maxkey&) {
} // namespace v_noabi
} // namespace bsoncxx

BSONCXX_POP_WARNINGS();
Contributor

Suggested change
BSONCXX_POP_WARNINGS();
BSONCXX_POP_WARNINGS();

Spacing.

Comment on lines 83 to 87
constexpr basic_string_view(const basic_string_view&) noexcept = default;
bsoncxx_cxx14_constexpr basic_string_view& operator=(const basic_string_view&) noexcept =
default;

/// Default copy/move/assign/destroy
Contributor

Suggested change
constexpr basic_string_view(const basic_string_view&) noexcept = default;
bsoncxx_cxx14_constexpr basic_string_view& operator=(const basic_string_view&) noexcept =
default;
/// Default copy/move/assign/destroy
constexpr basic_string_view(const basic_string_view&) noexcept = default;
bsoncxx_cxx14_constexpr basic_string_view& operator=(const basic_string_view&) noexcept =
default;

Fix(?) spacing by separating the copy special member functions from the default ctor + remove a stray comment.

*
* @param n The number of characters to remove from the beginning. Must be less than size()
*/
bsoncxx_cxx14_constexpr void remove_prefix(size_type n) noexcept {
Contributor

Suggested change
bsoncxx_cxx14_constexpr void remove_prefix(size_type n) noexcept {
bsoncxx_cxx14_constexpr void remove_prefix(size_type n) {

Stray noexcept.

*
* @throws std::out_of_range if pos > size()
*/
bsoncxx_cxx14_constexpr size_type copy(pointer dest, size_type count, size_type pos = 0) const {
Contributor

Suggested change
bsoncxx_cxx14_constexpr size_type copy(pointer dest, size_type count, size_type pos = 0) const {
size_type copy(pointer dest, size_type count, size_type pos = 0) const {

Not yet addressed.

Comment on lines 336 to 337
bsoncxx_cxx14_constexpr size_type find(basic_string_view infix,
size_type pos = 0) const noexcept {
Contributor

Suggested change
bsoncxx_cxx14_constexpr size_type find(basic_string_view infix,
size_type pos = 0) const noexcept {
bsoncxx_cxx14_constexpr size_type find(basic_string_view infix, size_type pos = 0) const
noexcept {

ClangFormat.

Contributor Author

Newer clang-format wants to break this line differently. Pain.

Comment on lines 357 to 358
bsoncxx_cxx14_constexpr size_type rfind(basic_string_view infix,
size_type pos = npos) const noexcept {
Contributor

Suggested change
bsoncxx_cxx14_constexpr size_type rfind(basic_string_view infix,
size_type pos = npos) const noexcept {
bsoncxx_cxx14_constexpr size_type rfind(basic_string_view infix, size_type pos = npos) const
noexcept {

ClangFormat.

Comment on lines 400 to 401
constexpr size_type find_last_not_of(basic_string_view set,
size_type pos = npos) const noexcept {
Contributor

Suggested change
constexpr size_type find_last_not_of(basic_string_view set,
size_type pos = npos) const noexcept {
constexpr size_type find_last_not_of(basic_string_view set, size_type pos = npos) const
noexcept {

ClangFormat.
