Skip to content

[libc++] Optimize char_traits a bit #72799

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 20, 2024

Conversation

philnik777
Copy link
Contributor

This implements two kinds of optimizations. Specifically

  • char_traits<char8_t> uses char code paths; these are heavily
    optimized and the operations are equivalent
  • char16_t and char32_t find uses std::find to forward to
    wmemchr if they have the same size

@philnik777 philnik777 requested a review from a team as a code owner November 19, 2023 16:58
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 19, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 19, 2023

@llvm/pr-subscribers-libcxx

Author: None (philnik777)

Changes

This implements two kinds of optimizations. Specifically

  • char_traits&lt;char8_t&gt; uses char code paths; these are heavily
    optimized and the operations are equivalent
  • char16_t and char32_t find uses std::find to forward to
    wmemchr if they have the same size

Full diff: https://github.com/llvm/llvm-project/pull/72799.diff

2 Files Affected:

  • (modified) libcxx/include/__string/char_traits.h (+42-76)
  • (modified) libcxx/include/__string/constexpr_c_functions.h (+9-4)
diff --git a/libcxx/include/__string/char_traits.h b/libcxx/include/__string/char_traits.h
index 17eade2f8188f76..b3bde455eb4a260 100644
--- a/libcxx/include/__string/char_traits.h
+++ b/libcxx/include/__string/char_traits.h
@@ -11,12 +11,14 @@
 
 #include <__algorithm/copy_n.h>
 #include <__algorithm/fill_n.h>
+#include <__algorithm/find.h>
 #include <__algorithm/find_end.h>
 #include <__algorithm/find_first_of.h>
 #include <__algorithm/min.h>
 #include <__compare/ordering.h>
 #include <__config>
 #include <__functional/hash.h>
+#include <__functional/identity.h>
 #include <__iterator/iterator_traits.h>
 #include <__string/constexpr_c_functions.h>
 #include <__type_traits/is_constant_evaluated.h>
@@ -355,34 +357,35 @@ struct _LIBCPP_TEMPLATE_VIS char_traits<char8_t>
         {return __c1 < __c2;}
 
   static _LIBCPP_HIDE_FROM_ABI constexpr int
-  compare(const char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT {
+  compare(const char_type* __s1, const char_type* __s2, size_t __n) noexcept {
       return std::__constexpr_memcmp(__s1, __s2, __element_count(__n));
   }
 
-    static _LIBCPP_HIDE_FROM_ABI constexpr
-    size_t           length(const char_type* __s) _NOEXCEPT;
+  static _LIBCPP_HIDE_FROM_ABI constexpr size_t length(const char_type* __str) noexcept {
+    return std::__constexpr_strlen(__str);
+  }
 
-    _LIBCPP_INLINE_VISIBILITY static constexpr
-    const char_type* find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT;
+  static _LIBCPP_HIDE_FROM_ABI constexpr const char_type*
+  find(const char_type* __s, size_t __n, const char_type& __a) noexcept {
+    return std::__constexpr_memchr(__s, __a, __n);
+  }
 
-    static _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
-    char_type*       move(char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT {
-        return std::__constexpr_memmove(__s1, __s2, __element_count(__n));
-    }
+  static _LIBCPP_HIDE_FROM_ABI constexpr char_type*
+  move(char_type* __s1, const char_type* __s2, size_t __n) noexcept {
+    return std::__constexpr_memmove(__s1, __s2, __element_count(__n));
+  }
 
-    static _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
-    char_type*       copy(char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT {
-      _LIBCPP_ASSERT_NON_OVERLAPPING_RANGES(!std::__is_pointer_in_range(__s1, __s1 + __n, __s2),
-                                            "char_traits::copy: source and destination ranges overlap");
-      std::copy_n(__s2, __n, __s1);
-      return __s1;
-    }
+  static _LIBCPP_HIDE_FROM_ABI constexpr char_type* copy(char_type* __s1, const char_type* __s2, size_t __n) noexcept {
+    _LIBCPP_ASSERT_NON_OVERLAPPING_RANGES(!std::__is_pointer_in_range(__s1, __s1 + __n, __s2),
+                                          "char_traits::copy: source and destination ranges overlap");
+    std::copy_n(__s2, __n, __s1);
+    return __s1;
+  }
 
-    static _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
-    char_type*       assign(char_type* __s, size_t __n, char_type __a) _NOEXCEPT {
-        std::fill_n(__s, __n, __a);
-        return __s;
-    }
+  static _LIBCPP_HIDE_FROM_ABI constexpr char_type* assign(char_type* __s, size_t __n, char_type __a) noexcept {
+    std::fill_n(__s, __n, __a);
+    return __s;
+  }
 
     static inline _LIBCPP_HIDE_FROM_ABI constexpr int_type  not_eof(int_type __c) noexcept
         {return eq_int_type(__c, eof()) ? ~eof() : __c;}
@@ -396,31 +399,6 @@ struct _LIBCPP_TEMPLATE_VIS char_traits<char8_t>
         {return int_type(EOF);}
 };
 
-// TODO use '__builtin_strlen' if it ever supports char8_t ??
-inline constexpr
-size_t
-char_traits<char8_t>::length(const char_type* __s) _NOEXCEPT
-{
-    size_t __len = 0;
-    for (; !eq(*__s, char_type(0)); ++__s)
-        ++__len;
-    return __len;
-}
-
-// TODO use '__builtin_char_memchr' if it ever supports char8_t ??
-inline constexpr
-const char8_t*
-char_traits<char8_t>::find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT
-{
-    for (; __n; --__n)
-    {
-        if (eq(*__s, __a))
-            return __s;
-        ++__s;
-    }
-    return nullptr;
-}
-
 #endif // _LIBCPP_HAS_NO_CHAR8_T
 
 template <>
@@ -446,8 +424,15 @@ struct _LIBCPP_TEMPLATE_VIS char_traits<char16_t>
     int              compare(const char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT;
     _LIBCPP_INLINE_VISIBILITY static _LIBCPP_CONSTEXPR_SINCE_CXX17
     size_t           length(const char_type* __s) _NOEXCEPT;
-    _LIBCPP_INLINE_VISIBILITY static _LIBCPP_CONSTEXPR_SINCE_CXX17
-    const char_type* find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT;
+
+    static _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX17 const char_type*
+    find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT {
+      __identity __proj;
+      const char_type* __match = std::__find_impl(__s, __s + __n, __a, __proj);
+      if (__match == __s + __n)
+        return nullptr;
+      return __match;
+    }
 
     _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20
     static char_type*       move(char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT {
@@ -504,19 +489,6 @@ char_traits<char16_t>::length(const char_type* __s) _NOEXCEPT
     return __len;
 }
 
-inline _LIBCPP_CONSTEXPR_SINCE_CXX17
-const char16_t*
-char_traits<char16_t>::find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT
-{
-    for (; __n; --__n)
-    {
-        if (eq(*__s, __a))
-            return __s;
-        ++__s;
-    }
-    return nullptr;
-}
-
 template <>
 struct _LIBCPP_TEMPLATE_VIS char_traits<char32_t>
 {
@@ -540,8 +512,15 @@ struct _LIBCPP_TEMPLATE_VIS char_traits<char32_t>
     int              compare(const char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT;
     _LIBCPP_INLINE_VISIBILITY static _LIBCPP_CONSTEXPR_SINCE_CXX17
     size_t           length(const char_type* __s) _NOEXCEPT;
-    _LIBCPP_INLINE_VISIBILITY static _LIBCPP_CONSTEXPR_SINCE_CXX17
-    const char_type* find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT;
+
+    static _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX17 const char_type*
+    find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT {
+      __identity __proj;
+      const char_type* __match = std::__find_impl(__s, __s + __n, __a, __proj);
+      if (__match == __s + __n)
+        return nullptr;
+      return __match;
+    }
 
     _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20
     static char_type*       move(char_type* __s1, const char_type* __s2, size_t __n) _NOEXCEPT {
@@ -596,19 +575,6 @@ char_traits<char32_t>::length(const char_type* __s) _NOEXCEPT
     return __len;
 }
 
-inline _LIBCPP_CONSTEXPR_SINCE_CXX17
-const char32_t*
-char_traits<char32_t>::find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT
-{
-    for (; __n; --__n)
-    {
-        if (eq(*__s, __a))
-            return __s;
-        ++__s;
-    }
-    return nullptr;
-}
-
 // helper fns for basic_string and string_view
 
 // __str_find
diff --git a/libcxx/include/__string/constexpr_c_functions.h b/libcxx/include/__string/constexpr_c_functions.h
index 198f0f5e6809147..c029c2d29fec88c 100644
--- a/libcxx/include/__string/constexpr_c_functions.h
+++ b/libcxx/include/__string/constexpr_c_functions.h
@@ -35,18 +35,23 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 // of elements as opposed to a number of bytes.
 enum class __element_count : size_t {};
 
-inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 size_t __constexpr_strlen(const char* __str) {
+template <class _Tp>
+inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 size_t __constexpr_strlen(const _Tp* __str) _NOEXCEPT {
+  static_assert(
+      is_integral<_Tp>::value && sizeof(_Tp) == 1, "__constexpr_strlen only works with integral types of size 1");
   // GCC currently doesn't support __builtin_strlen for heap-allocated memory during constant evaluation.
   // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70816
-#ifdef _LIBCPP_COMPILER_GCC
   if (__libcpp_is_constant_evaluated()) {
+#if _LIBCPP_STD_VER >= 17 && defined(_LIBCPP_COMPILER_CLANG_BASED)
+    if constexpr (is_same_v<_Tp, char>)
+      return __builtin_strlen(__str);
+#endif
     size_t __i = 0;
     for (; __str[__i] != '\0'; ++__i)
       ;
     return __i;
   }
-#endif
-  return __builtin_strlen(__str);
+  return __builtin_strlen(reinterpret_cast<const char*>(__str));
 }
 
 // Because of __libcpp_is_trivially_lexicographically_comparable we know that comparing the object representations is

@EricWF
Copy link
Member

EricWF commented Nov 21, 2023

Could you please provide information about:

(A) Benchmark results for these functions.
(B) The changes to compile times and compiler memory usage.

@EricWF EricWF self-requested a review November 21, 2023 01:25
Copy link
Member

@EricWF EricWF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, I would like to see some data on the improvements.

@philnik777
Copy link
Contributor Author

Could you please provide information about:

(A) Benchmark results for these functions.

These are the numbers from optimizing std::find:

---------------------------------------------------------------
Benchmark                                   old             new
---------------------------------------------------------------
bm_find<char>/1                         10.5 ns         10.5 ns
bm_find<char>/2                         18.3 ns         10.4 ns
bm_find<char>/3                         21.1 ns         10.4 ns
bm_find<char>/4                         22.4 ns         10.3 ns
bm_find<char>/5                         23.3 ns         10.3 ns
bm_find<char>/6                         23.7 ns         10.4 ns
bm_find<char>/7                         24.0 ns         10.3 ns
bm_find<char>/8                         24.5 ns         10.2 ns
bm_find<char>/16                        26.2 ns         10.2 ns
bm_find<char>/64                        29.9 ns         20.8 ns
bm_find<char>/512                       80.2 ns         36.0 ns
bm_find<char>/4096                       523 ns         43.0 ns
bm_find<char>/32768                     3838 ns          105 ns
bm_find<char>/262144                   30832 ns          988 ns
bm_find<char>/1048576                 122541 ns         4854 ns
bm_find<int>/1                          10.4 ns         12.1 ns
bm_find<int>/2                          18.4 ns         12.0 ns
bm_find<int>/3                          21.9 ns         12.0 ns
bm_find<int>/4                          23.3 ns         11.9 ns
bm_find<int>/5                          24.2 ns         12.1 ns
bm_find<int>/6                          24.7 ns         12.0 ns
bm_find<int>/7                          24.8 ns         11.9 ns
bm_find<int>/8                          25.2 ns         11.9 ns
bm_find<int>/16                         26.2 ns         21.9 ns
bm_find<int>/64                         30.5 ns         33.8 ns
bm_find<int>/512                        80.4 ns         40.4 ns
bm_find<int>/4096                        497 ns         71.7 ns
bm_find<int>/32768                      3844 ns          498 ns
bm_find<int>/262144                    30722 ns         4901 ns
bm_find<int>/1048576                  123303 ns        22864 ns

Since it's basically the same optimization I think these are pretty representative of the expected speedup. The numbers for length would also be similar, since strlen is basically memchr(str, '\0', <buffer_len>), just with possible out-of-bounds reads.

(B) The changes to compile times and compiler memory usage.

I don't know how to measure memory usage properly, but the time of including <string> is within margin of error. I've checked with -std=c++23, which eagerly instantiates all the functions.

Copy link

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

✅ With the latest revision this PR passed the Python code formatter.

This implements two kinds of optimizations. Specifically
- `char_traits<char8_t>` uses `char` code paths; these are heavily
  optimized and the operations are equivalent
- `char16_t` and `char32_t` `find` uses `std::find` to forward to
  `wmemchr` if they have the same size
@philnik777 philnik777 force-pushed the optimize_char_traits branch from 03069d6 to 5e31d44 Compare March 25, 2024 08:30
Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming we already have appropriate coverage in char_traits for char8_t and friends.

_LIBCPP_HIDE_FROM_ABI static _LIBCPP_CONSTEXPR_SINCE_CXX17 const char_type*
find(const char_type* __s, size_t __n, const char_type& __a) _NOEXCEPT {
__identity __proj;
const char_type* __match = std::__find_impl(__s, __s + __n, __a, __proj);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this patch, but can we rename __find_impl to __find for consistency?

@philnik777
Copy link
Contributor Author

ping @EricWF

@philnik777
Copy link
Contributor Author

Landing, since I addressed all the feedback. @EricWF if you have any more thoughts feel free to comment.

@philnik777 philnik777 merged commit 29312d3 into llvm:main Apr 20, 2024
@philnik777 philnik777 deleted the optimize_char_traits branch April 20, 2024 09:40
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
This implements two kinds of optimizations. Specifically
- `char_traits<char8_t>` uses `char` code paths; these are heavily
  optimized and the operations are equivalent
- `char16_t` and `char32_t` `find` uses `std::find` to forward to
  `wmemchr` if they have the same size
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants