Skip to content

[libc++] Optimize num_put integral functions #120859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 5, 2025

Conversation

philnik777
Copy link
Contributor

-------------------------------------------------------
Benchmark                              old          new
-------------------------------------------------------
BM_num_put<bool>                   76.2 ns      32.0 ns
BM_num_put<long>                   76.9 ns      33.1 ns
BM_num_put<long long>              77.9 ns      34.2 ns
BM_num_put<unsigned long>          78.4 ns      33.1 ns
BM_num_put<unsigned long long>     78.0 ns      34.4 ns
BM_num_put<double>                  224 ns       228 ns
BM_num_put<long double>             239 ns       230 ns
BM_num_put<const void*>            68.7 ns      35.1 ns

Fixes #40109.

Copy link

github-actions bot commented Dec 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@philnik777 philnik777 changed the title [libc++] Optimize num_get integral functions [libc++] Optimize num_put integral functions Dec 22, 2024
@philnik777 philnik777 force-pushed the optimize_num_put branch 8 times, most recently from b6118e1 to bb7789e Compare December 27, 2024 10:11
@philnik777 philnik777 marked this pull request as ready for review December 27, 2024 15:40
@philnik777 philnik777 requested a review from a team as a code owner December 27, 2024 15:40
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Dec 27, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 27, 2024

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes
-------------------------------------------------------
Benchmark                              old          new
-------------------------------------------------------
BM_num_put&lt;bool&gt;                   76.2 ns      32.0 ns
BM_num_put&lt;long&gt;                   76.9 ns      33.1 ns
BM_num_put&lt;long long&gt;              77.9 ns      34.2 ns
BM_num_put&lt;unsigned long&gt;          78.4 ns      33.1 ns
BM_num_put&lt;unsigned long long&gt;     78.0 ns      34.4 ns
BM_num_put&lt;double&gt;                  224 ns       228 ns
BM_num_put&lt;long double&gt;             239 ns       230 ns
BM_num_put&lt;const void*&gt;            68.7 ns      35.1 ns

Fixes #40109.


Patch is 35.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/120859.diff

14 Files Affected:

  • (modified) libcxx/docs/ReleaseNotes/20.rst (+2)
  • (modified) libcxx/include/__charconv/tables.h (+8-12)
  • (modified) libcxx/include/__charconv/to_chars_base_10.h (+14-18)
  • (modified) libcxx/include/__charconv/to_chars_integral.h (+45-27)
  • (modified) libcxx/include/__charconv/to_chars_result.h (+9)
  • (modified) libcxx/include/__charconv/traits.h (+3-7)
  • (modified) libcxx/include/__format/formatter_floating_point.h (+1)
  • (modified) libcxx/include/__format/formatter_integral.h (+1-1)
  • (modified) libcxx/include/__format/formatter_output.h (-18)
  • (modified) libcxx/include/locale (+65-39)
  • (modified) libcxx/include/module.modulemap (+5-1)
  • (added) libcxx/test/benchmarks/locale/num_put.bench.cpp (+39)
  • (modified) libcxx/test/std/input.output/iostream.format/output.streams/ostream.formatted/ostream.inserters.arithmetic/minus1.pass.cpp (+2-1)
  • (modified) libcxx/test/std/localization/locale.categories/category.numeric/locale.nm.put/facet.num.put.members/put_pointer.pass.cpp (+21-26)
diff --git a/libcxx/docs/ReleaseNotes/20.rst b/libcxx/docs/ReleaseNotes/20.rst
index c8a07fb8b73348..4cac642f22948d 100644
--- a/libcxx/docs/ReleaseNotes/20.rst
+++ b/libcxx/docs/ReleaseNotes/20.rst
@@ -73,6 +73,8 @@ Improvements and New Features
   optimized, resulting in a performance improvement of up to 2x for trivial element types (e.g., `std::vector<int>`),
   and up to 3.4x for non-trivial element types (e.g., `std::vector<std::vector<int>>`).
 
+- The ``num_get::do_put`` integral overloads have been optimized, resulting in a performance improvement of up to 2.4x.
+
 Deprecations and Removals
 -------------------------
 
diff --git a/libcxx/include/__charconv/tables.h b/libcxx/include/__charconv/tables.h
index 9568bf841cd029..b8c6fd8af0a0f8 100644
--- a/libcxx/include/__charconv/tables.h
+++ b/libcxx/include/__charconv/tables.h
@@ -19,16 +19,14 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-#if _LIBCPP_STD_VER >= 17
-
 namespace __itoa {
 
-inline constexpr char __base_2_lut[64] = {
+inline _LIBCPP_CONSTEXPR const char __base_2_lut[64] = {
     '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '1', '0', '0', '0', '1', '1', '0', '1', '0', '0', '0', '1',
     '0', '1', '0', '1', '1', '0', '0', '1', '1', '1', '1', '0', '0', '0', '1', '0', '0', '1', '1', '0', '1', '0',
     '1', '0', '1', '1', '1', '1', '0', '0', '1', '1', '0', '1', '1', '1', '1', '0', '1', '1', '1', '1'};
 
-inline constexpr char __base_8_lut[128] = {
+inline _LIBCPP_CONSTEXPR const char __base_8_lut[128] = {
     '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '1', '0', '1', '1', '1', '2',
     '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5',
     '2', '6', '2', '7', '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '4', '0',
@@ -36,7 +34,7 @@ inline constexpr char __base_8_lut[128] = {
     '5', '4', '5', '5', '5', '6', '5', '7', '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6',
     '6', '7', '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7'};
 
-inline constexpr char __base_16_lut[512] = {
+inline _LIBCPP_CONSTEXPR const char __base_16_lut[512] = {
     '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9', '0', 'a', '0',
     'b', '0', 'c', '0', 'd', '0', 'e', '0', 'f', '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6',
     '1', '7', '1', '8', '1', '9', '1', 'a', '1', 'b', '1', 'c', '1', 'd', '1', 'e', '1', 'f', '2', '0', '2', '1', '2',
@@ -61,7 +59,7 @@ inline constexpr char __base_16_lut[512] = {
     '1', 'f', '2', 'f', '3', 'f', '4', 'f', '5', 'f', '6', 'f', '7', 'f', '8', 'f', '9', 'f', 'a', 'f', 'b', 'f', 'c',
     'f', 'd', 'f', 'e', 'f', 'f'};
 
-inline constexpr uint32_t __pow10_32[10] = {
+inline _LIBCPP_CONSTEXPR const uint32_t __pow10_32[10] = {
     UINT32_C(0),
     UINT32_C(10),
     UINT32_C(100),
@@ -73,7 +71,7 @@ inline constexpr uint32_t __pow10_32[10] = {
     UINT32_C(100000000),
     UINT32_C(1000000000)};
 
-inline constexpr uint64_t __pow10_64[20] = {
+inline _LIBCPP_CONSTEXPR const uint64_t __pow10_64[20] = {
     UINT64_C(0),
     UINT64_C(10),
     UINT64_C(100),
@@ -96,8 +94,8 @@ inline constexpr uint64_t __pow10_64[20] = {
     UINT64_C(10000000000000000000)};
 
 #  if _LIBCPP_HAS_INT128
-inline constexpr int __pow10_128_offset      = 0;
-inline constexpr __uint128_t __pow10_128[40] = {
+inline _LIBCPP_CONSTEXPR const int __pow10_128_offset      = 0;
+inline _LIBCPP_CONSTEXPR const __uint128_t __pow10_128[40] = {
     UINT64_C(0),
     UINT64_C(10),
     UINT64_C(100),
@@ -140,7 +138,7 @@ inline constexpr __uint128_t __pow10_128[40] = {
     (__uint128_t(UINT64_C(10000000000000000000)) * UINT64_C(10000000000000000000)) * 10};
 #  endif
 
-inline constexpr char __digits_base_10[200] = {
+inline _LIBCPP_CONSTEXPR const char __digits_base_10[200] = {
     // clang-format off
     '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
     '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
@@ -156,8 +154,6 @@ inline constexpr char __digits_base_10[200] = {
 
 } // namespace __itoa
 
-#endif // _LIBCPP_STD_VER >= 17
-
 _LIBCPP_END_NAMESPACE_STD
 
 #endif // _LIBCPP___CHARCONV_TABLES
diff --git a/libcxx/include/__charconv/to_chars_base_10.h b/libcxx/include/__charconv/to_chars_base_10.h
index 06e4e692337df5..d90952ea71f356 100644
--- a/libcxx/include/__charconv/to_chars_base_10.h
+++ b/libcxx/include/__charconv/to_chars_base_10.h
@@ -26,55 +26,53 @@ _LIBCPP_PUSH_MACROS
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-#if _LIBCPP_STD_VER >= 17
-
 namespace __itoa {
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append1(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append1(char* __first, uint32_t __value) _NOEXCEPT {
   *__first = '0' + static_cast<char>(__value);
   return __first + 1;
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append2(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append2(char* __first, uint32_t __value) _NOEXCEPT {
   return std::copy_n(&__digits_base_10[__value * 2], 2, __first);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append3(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append3(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append2(__itoa::__append1(__first, __value / 100), __value % 100);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append4(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append4(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append2(__itoa::__append2(__first, __value / 100), __value % 100);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append5(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append5(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append4(__itoa::__append1(__first, __value / 10000), __value % 10000);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append6(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append6(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append4(__itoa::__append2(__first, __value / 10000), __value % 10000);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append7(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append7(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append6(__itoa::__append1(__first, __value / 1000000), __value % 1000000);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append8(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append8(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append6(__itoa::__append2(__first, __value / 1000000), __value % 1000000);
 }
 
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append9(char* __first, uint32_t __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char* __append9(char* __first, uint32_t __value) _NOEXCEPT {
   return __itoa::__append8(__itoa::__append1(__first, __value / 100000000), __value % 100000000);
 }
 
 template <class _Tp>
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI char* __append10(char* __first, _Tp __value) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI char* __append10(char* __first, _Tp __value) _NOEXCEPT {
   return __itoa::__append8(__itoa::__append2(__first, static_cast<uint32_t>(__value / 100000000)),
                            static_cast<uint32_t>(__value % 100000000));
 }
 
 _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char*
-__base_10_u32(char* __first, uint32_t __value) noexcept {
+__base_10_u32(char* __first, uint32_t __value) _NOEXCEPT {
   if (__value < 1000000) {
     if (__value < 10000) {
       if (__value < 100) {
@@ -110,7 +108,7 @@ __base_10_u32(char* __first, uint32_t __value) noexcept {
 }
 
 _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char*
-__base_10_u64(char* __buffer, uint64_t __value) noexcept {
+__base_10_u64(char* __buffer, uint64_t __value) _NOEXCEPT {
   if (__value <= UINT32_MAX)
     return __itoa::__base_10_u32(__buffer, static_cast<uint32_t>(__value));
 
@@ -132,13 +130,13 @@ __base_10_u64(char* __buffer, uint64_t __value) noexcept {
 /// \note The lookup table contains a partial set of exponents limiting the
 /// range that can be used. However the range is sufficient for
 /// \ref __base_10_u128.
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline __uint128_t __pow_10(int __exp) noexcept {
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline __uint128_t __pow_10(int __exp) _NOEXCEPT {
   _LIBCPP_ASSERT_INTERNAL(__exp >= __pow10_128_offset, "Index out of bounds");
   return __pow10_128[__exp - __pow10_128_offset];
 }
 
 _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI inline char*
-__base_10_u128(char* __buffer, __uint128_t __value) noexcept {
+__base_10_u128(char* __buffer, __uint128_t __value) _NOEXCEPT {
   _LIBCPP_ASSERT_INTERNAL(
       __value > numeric_limits<uint64_t>::max(), "The optimizations for this algorithm fails when this isn't true.");
 
@@ -179,8 +177,6 @@ __base_10_u128(char* __buffer, __uint128_t __value) noexcept {
 #  endif
 } // namespace __itoa
 
-#endif // _LIBCPP_STD_VER >= 17
-
 _LIBCPP_END_NAMESPACE_STD
 
 _LIBCPP_POP_MACROS
diff --git a/libcxx/include/__charconv/to_chars_integral.h b/libcxx/include/__charconv/to_chars_integral.h
index 710299df9b4da8..238c96d7c7a041 100644
--- a/libcxx/include/__charconv/to_chars_integral.h
+++ b/libcxx/include/__charconv/to_chars_integral.h
@@ -39,16 +39,12 @@ _LIBCPP_PUSH_MACROS
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-#if _LIBCPP_STD_VER >= 17
-
-to_chars_result to_chars(char*, char*, bool, int = 10) = delete;
-
 template <typename _Tp>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
 __to_chars_itoa(char* __first, char* __last, _Tp __value, false_type);
 
 template <typename _Tp>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
 __to_chars_itoa(char* __first, char* __last, _Tp __value, true_type) {
   auto __x = std::__to_unsigned_like(__value);
   if (__value < 0 && __first != __last) {
@@ -60,7 +56,7 @@ __to_chars_itoa(char* __first, char* __last, _Tp __value, true_type) {
 }
 
 template <typename _Tp>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
 __to_chars_itoa(char* __first, char* __last, _Tp __value, false_type) {
   using __tx  = __itoa::__traits<_Tp>;
   auto __diff = __last - __first;
@@ -73,7 +69,7 @@ __to_chars_itoa(char* __first, char* __last, _Tp __value, false_type) {
 
 #  if _LIBCPP_HAS_INT128
 template <>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
 __to_chars_itoa(char* __first, char* __last, __uint128_t __value, false_type) {
   // When the value fits in 64-bits use the 64-bit code path. This reduces
   // the number of expensive calculations on 128-bit values.
@@ -92,20 +88,20 @@ __to_chars_itoa(char* __first, char* __last, __uint128_t __value, false_type) {
 }
 #  endif
 
-template <class _Tp>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
-__to_chars_integral(char* __first, char* __last, _Tp __value, int __base, false_type);
+template <class _Tp, __enable_if_t<!is_signed<_Tp>::value, int> = 0>
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
+__to_chars_integral(char* __first, char* __last, _Tp __value, int __base);
 
-template <typename _Tp>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
-__to_chars_integral(char* __first, char* __last, _Tp __value, int __base, true_type) {
+template <class _Tp, __enable_if_t<is_signed<_Tp>::value, int> = 0>
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
+__to_chars_integral(char* __first, char* __last, _Tp __value, int __base) {
   auto __x = std::__to_unsigned_like(__value);
   if (__value < 0 && __first != __last) {
     *__first++ = '-';
     __x        = std::__complement(__x);
   }
 
-  return std::__to_chars_integral(__first, __last, __x, __base, false_type());
+  return std::__to_chars_integral(__first, __last, __x, __base);
 }
 
 namespace __itoa {
@@ -116,7 +112,7 @@ struct _LIBCPP_HIDDEN __integral;
 template <>
 struct _LIBCPP_HIDDEN __integral<2> {
   template <typename _Tp>
-  _LIBCPP_HIDE_FROM_ABI static constexpr int __width(_Tp __value) noexcept {
+  _LIBCPP_HIDE_FROM_ABI static _LIBCPP_CONSTEXPR int __width(_Tp __value) _NOEXCEPT {
     // If value == 0 still need one digit. If the value != this has no
     // effect since the code scans for the most significant bit set. (Note
     // that __libcpp_clz doesn't work for 0.)
@@ -124,7 +120,7 @@ struct _LIBCPP_HIDDEN __integral<2> {
   }
 
   template <typename _Tp>
-  _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI static to_chars_result
+  _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI static __to_chars_result
   __to_chars(char* __first, char* __last, _Tp __value) {
     ptrdiff_t __cap = __last - __first;
     int __n         = __width(__value);
@@ -152,7 +148,7 @@ struct _LIBCPP_HIDDEN __integral<2> {
 template <>
 struct _LIBCPP_HIDDEN __integral<8> {
   template <typename _Tp>
-  _LIBCPP_HIDE_FROM_ABI static constexpr int __width(_Tp __value) noexcept {
+  _LIBCPP_HIDE_FROM_ABI static _LIBCPP_CONSTEXPR int __width(_Tp __value) _NOEXCEPT {
     // If value == 0 still need one digit. If the value != this has no
     // effect since the code scans for the most significat bit set. (Note
     // that __libcpp_clz doesn't work for 0.)
@@ -160,7 +156,7 @@ struct _LIBCPP_HIDDEN __integral<8> {
   }
 
   template <typename _Tp>
-  _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI static to_chars_result
+  _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI static __to_chars_result
   __to_chars(char* __first, char* __last, _Tp __value) {
     ptrdiff_t __cap = __last - __first;
     int __n         = __width(__value);
@@ -188,7 +184,7 @@ struct _LIBCPP_HIDDEN __integral<8> {
 template <>
 struct _LIBCPP_HIDDEN __integral<16> {
   template <typename _Tp>
-  _LIBCPP_HIDE_FROM_ABI static constexpr int __width(_Tp __value) noexcept {
+  _LIBCPP_HIDE_FROM_ABI static _LIBCPP_CONSTEXPR int __width(_Tp __value) _NOEXCEPT {
     // If value == 0 still need one digit. If the value != this has no
     // effect since the code scans for the most significat bit set. (Note
     // that __libcpp_clz doesn't work for 0.)
@@ -196,7 +192,7 @@ struct _LIBCPP_HIDDEN __integral<16> {
   }
 
   template <typename _Tp>
-  _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI static to_chars_result
+  _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI static __to_chars_result
   __to_chars(char* __first, char* __last, _Tp __value) {
     ptrdiff_t __cap = __last - __first;
     int __n         = __width(__value);
@@ -235,13 +231,13 @@ _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI int __to_chars_integral_widt
 }
 
 template <unsigned _Base, typename _Tp, __enable_if_t<(sizeof(_Tp) >= sizeof(unsigned)), int> = 0>
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
 __to_chars_integral(char* __first, char* __last, _Tp __value) {
   return __itoa::__integral<_Base>::__to_chars(__first, __last, __value);
 }
 
 template <unsigned _Base, typename _Tp, __enable_if_t<(sizeof(_Tp) < sizeof(unsigned)), int> = 0>
-_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
+_LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
 __to_chars_integral(char* __first, char* __last, _Tp __value) {
   return std::__to_chars_integral<_Base>(__first, __last, static_cast<unsigned>(__value));
 }
@@ -272,9 +268,9 @@ _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI int __to_chars_integral_widt
   __libcpp_unreachable();
 }
 
-template <typename _Tp>
-inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
-__to_chars_integral(char* __first, char* __last, _Tp __value, int __base, false_type) {
+template <class _Tp, __enable_if_t<!is_signed<_Tp>::value, int> >
+inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI __to_chars_result
+__to_chars_integral(char* __first, char* __last, _Tp __value, int __base) {
   if (__base == 10) [[likely]]
     return std::__to_chars_itoa(__first, __last, __value, false_type());
 
@@ -302,6 +298,28 @@ __to_chars_integral(char* __first, char* __last, _Tp __value, int __base, false_
   return {__last, errc(0)};
 }
 
+_LIBCPP_HIDE_FROM_ABI inline _LIBCPP_CONSTEXPR_SINCE_CXX14 char __hex_to_upper(char __c) {
+  switch (__c) {
+  case 'a':
+    return 'A';
+  case 'b':
+    return 'B';
+  case 'c':
+    return 'C';
+  case 'd':
+    return 'D';
+  case 'e':
+    return 'E';
+  case 'f':
+    return 'F';
+  }
+  return __c;
+}
+
+#if _LIBCPP_STD_VER >= 17
+
+to_chars_result to_chars(char*, char*, bool, int = 10) = delete;
+
 template <typename _Tp, __enable_if_t<is_integral<_Tp>::value, int> = 0>
 inline _LIBCPP_CONSTEXPR_SINCE_CXX23 _LIBCPP_HIDE_FROM_ABI to_chars_result
 to_chars(char* __first, char* __last, _Tp __value) {
@@ -316,7 +334,7 @@ to_chars(char* __first, char* __last, _Tp __value, int __base) {
   _LIBCPP_ASSERT_UNCATEGORIZED(2 <= __base && __base <= 36, "base not in [2, 36]");
 
   using _Type = __make_32_64_or_128_bit_t<_Tp>;
-  return std::__to_chars_integral(__first, __last, static_cast<_Type>(__value), __base, is_signed<_Tp>());
+  return std::__to_chars_integral(__first, __last, static_cast<_Type>(__value), __base);
 }
 
 #endif // _LIBCPP_STD_VER >= 17
diff --git a/libcxx/include/__charconv/to_chars_result.h b/libcxx/include/__charconv/to_chars_result.h
index 8df0897a49fbbd..2ff7bd39ea3e1d 100644
--- a/libcxx/include/__charconv/to_chars_result.h
+++ b/libcxx/include/__charconv/to_chars_result.h
@@ -34,6 +34,15 @@ struct _LIBCPP_EXPORTED_FROM_ABI to_chars_result {
 
 #endif // _LIBCPP_STD_VER >= 17
 
+struct _LIBCPP_EXPORTED_FROM_ABI __to_chars_result {
+  char* __ptr;
+  errc __ec;
+
+#if _LIBCPP_STD_VER >= 17
+  _LIBCPP_HIDE_FROM_ABI constexpr operator to_chars_result() { return {__ptr, __ec}; }
+#endif
+};
+
 _LIBCPP_END_NAMESPACE_STD
 
 #endif // _LIBCPP___CHARCONV_TO_CHARS_RESULT_H
diff --git a/libcxx/include/__charconv/traits.h b/libcxx/include/__charconv/traits.h
index 2cb37c8cfb0238..36339c8c5ebf1b 100644
--- a/libcxx/include/__charconv/traits.h
+++ b/libcxx/include/__charconv/traits.h
@@ -29,15 +29,13 @@ _LIBCPP_PUSH_MACROS
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-#if _LIBCPP_STD_VER >= 17
-
 namespace __itoa {
 
 template <typename _Tp, typename = void>
 struct _LIBCPP_HIDDEN __traits_base;
 
 template <typename _Tp>
-struct _LIBCPP_HIDDEN __traits_base<_Tp, __enable_if_t<sizeof(_Tp) <= sizeof(uint32_t)>> {
+struct _LIBCPP_HIDDEN __traits_base<_Tp, __enable_if_t<sizeof(_Tp) <= sizeof(uint32_t)> > {
   using type = uint32_t;
 
   /// The width estimation using a log10 algorithm.
@@ -63,7 +61,7 @@ struct _LIBCPP_HIDDEN __traits_base<_Tp, __enable_if_t<sizeof(_Tp) <= sizeof(uin
 };
 
 template <typename _Tp>
-struct _LIBCPP_HIDDEN __traits_base<_Tp, __enable_if_t<sizeof(_Tp) == sizeof(uint64_t)>> {
+struct _LIBCPP_HIDDEN __traits_base<_Tp, __enable_if_t<sizeof(_Tp) == sizeof(uint64_t)> > {
   using type = uint64_t;
 
   /// The width estimation using a log10 algorithm.
@@ -152,7 +150,7 @@ inline _LIBCPP_HIDE_FROM_ABI bool _LIBCPP_CONSTEXPR_SINCE_CXX23 __mul_overflowed
 
 template <typename _Tp>
 struct _LIBCPP_H...
[truncated]

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a LWG issue (please post the link)

@philnik777
Copy link
Contributor Author

The failures look unrelated.

@philnik777 philnik777 merged commit 15edf87 into llvm:main Mar 5, 2025
68 of 78 checks passed
@nico
Copy link
Contributor

nico commented Mar 5, 2025

We have a test that checks that streaming a pointer to an ostringstream prints a 0x prefix on non-win but not on win.

(I suppose the test is just supposed to document current behavior.)

This seems to change the behavior on windows to print a 0x prefix there too. Is that intentional?

@nico
Copy link
Contributor

nico commented Mar 5, 2025

@philnik777
Copy link
Contributor Author

We have a test that checks that streaming a pointer to an ostringstream prints a 0x prefix on non-win but not on win.

(I suppose the test is just supposed to document current behavior.)

This seems to change the behavior on windows to print a 0x prefix there too. Is that intentional?

Depends on what you mean by "intentional". We're aware that this changed and think that the improved performance is worth the change in behaviour. It also makes the output more consistent across platforms. Do you think this is important enough that we should try to mimic the platforms behaviour?

Also, one of these now seems to fail on chromeos: https://source.chromium.org/chromium/chromium/src/+/main:third_party/shell-encryption/src/int256_test.cc;l=527

Since int256 seems to be a custom type of yours, I'm not sure how it outputs stuff. Could you give a bit more detail on what change you observe?

@nico
Copy link
Contributor

nico commented Mar 5, 2025

Depends on what you mean by "intentional". We're aware that this changed and think that the improved performance is worth the change in behaviour. It also makes the output more consistent across platforms. Do you think this is important enough that we should try to mimic the platforms behaviour?

No, I was just wondering if it's intentional, since the commit description says "optimize" and normally optimizations don't change behavior, and it added a libcxx/docs/ReleaseNotes/20.rst change but didn't mention the behavior change. We can update the test if it's intentional.

@philnik777
Copy link
Contributor Author

Depends on what you mean by "intentional". We're aware that this changed and think that the improved performance is worth the change in behaviour. It also makes the output more consistent across platforms. Do you think this is important enough that we should try to mimic the platforms behaviour?

No, I was just wondering if it's intentional, since the commit description says "optimize" and normally optimizations don't change behavior, and it added a libcxx/docs/ReleaseNotes/20.rst change but didn't mention the behavior change. We can update the test if it's intentional.

Yeah, maybe we should add a release notes about that. Seems like it's worth mentioning.

@nico
Copy link
Contributor

nico commented Mar 11, 2025

Did you end up adding a release notes entry? (This would be useful to understand what exactly the intentional change is, and to point our upstream projects to as motivation for updating their tests.)

I reduced the int256 thing mentioned above to

#include <algorithm>
#include <sstream>
#include <utility>

#include <gtest/gtest.h>

struct MyType {};

std::ostream& operator<<(std::ostream& o, const MyType& b) {
  return o << 0;
}

TEST(Int256, OStream) {
  struct {
    std::ios_base::fmtflags flags;
    const char* rep;
  } cases[] = {
      // showbase does nothing on zero
      { std::ios::dec | std::ios::showbase, "0"},
      { std::ios::oct | std::ios::showbase, "0"},
      { std::ios::hex | std::ios::showbase, "0"},
  };
  for (size_t i = 0; i < ABSL_ARRAYSIZE(cases); ++i) {
    std::ostringstream os;
    os.flags(cases[i].flags);
    MyType type;
    os << type;
    EXPECT_EQ(cases[i].rep, os.str());
  }
}

That used to pass; now it outputs 0, 00, and 0x0 instead. (I.e. showbase on 0 used to not add a prefix and now it does.)

I'm guessing that's also intentional?

nico added a commit to nico/shell-encryption that referenced this pull request Mar 11, 2025
Upsteam libc++ is now also showing the octal `0` and hex `0x`
prefixes with zero, see
llvm/llvm-project#120859

These tests were failing with upstream libc++. The previous three
lines test this enough; just remove the tests that now have
inconsistent output across C++ standard libraries.
kwlyeo pushed a commit to google/shell-encryption that referenced this pull request Mar 12, 2025
Upsteam libc++ is now also showing the octal `0` and hex `0x`
prefixes with zero, see
llvm/llvm-project#120859

These tests were failing with upstream libc++. The previous three
lines test this enough; just remove the tests that now have
inconsistent output across C++ standard libraries.
@alexfh
Copy link
Contributor

alexfh commented Mar 14, 2025

No, I was just wondering if it's intentional, since the commit description says "optimize" and normally optimizations don't change behavior, and it added a libcxx/docs/ReleaseNotes/20.rst change but didn't mention the behavior change. We can update the test if it's intentional.

Yeah, maybe we should add a release notes about that. Seems like it's worth mentioning.

It's definitely worth adding release notes, especially given that the new behavior diverges both from the previous libc++ and the current libstdc++ behavior - https://gcc.godbolt.org/z/zK5689Y8P.

@alexfh
Copy link
Contributor

alexfh commented Mar 14, 2025

Actually, this behavior change affects a non-trivial amount of code, both Google-internal and open-source (for now, I found breakages in Abseil and tinyformat - https://github.com/c42f/tinyformat). I wonder if the C++ standard says anything about the particular formatting that should be used in these cases? https://gcc.godbolt.org/z/zK5689Y8P

@alexfh
Copy link
Contributor

alexfh commented Mar 14, 2025

I was pointed at https://bugzilla.redhat.com/show_bug.cgi?id=166735, which seems to claim that 0x0 should just be 0, IIUC.

@alexfh
Copy link
Contributor

alexfh commented Mar 14, 2025

The relevant parts of the recent C++ and C standards are: https://eel.is/c++draft/facet.num.put.virtuals, https://eel.is/c++draft/intro.refs, and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf 7.21.6.1 paragraph 6 (page 229).

@mordante
Copy link
Member

I was pointed at https://bugzilla.redhat.com/show_bug.cgi?id=166735, which seems to claim that 0x0 should just be 0, IIUC.

@nico @alexfh Thanks for the information. I agree that # for octal and hexadeximal should print 0 instead of 00 and 0x0.

@alexfh
Copy link
Contributor

alexfh commented Mar 15, 2025

I was pointed at https://bugzilla.redhat.com/show_bug.cgi?id=166735, which seems to claim that 0x0 should just be 0, IIUC.

@nico @alexfh Thanks for the information. I agree that # for octal and hexadeximal should print 0 instead of 00 and 0x0.

Apart from that, showpos should always print sign, i.e. for 0 the formatting output should be +0 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf says + The result of a signed conversion always begins with a plus or minus sign.)

As for pointers with null value, The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner., so this part isn't regulated by the standard.

@ldionne @philnik777 do you agree with the above arguments?

@alexfh
Copy link
Contributor

alexfh commented Mar 17, 2025

I've sent #131613 to test the feasibility of a revert at this point, but a timely fix forward is definitely an option as well. @ldionne @philnik777 what's your take on this?

alexfh added a commit that referenced this pull request Mar 18, 2025
Reverts #120859

This change breaks formatting of `0` with `std::showbase` + `std::hex`
or `std::oct`, as well as `+0` with `std::showpos`. I believe the new
behavior is violating the standard. See
#120859 (comment)
and later comments for details and explanation.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 18, 2025
…1613)

Reverts llvm/llvm-project#120859

This change breaks formatting of `0` with `std::showbase` + `std::hex`
or `std::oct`, as well as `+0` with `std::showpos`. I believe the new
behavior is violating the standard. See
llvm/llvm-project#120859 (comment)
and later comments for details and explanation.
@ldionne
Copy link
Member

ldionne commented Mar 18, 2025

Thanks a lot for the investigation. I filed #131710 for @philnik777 to take a look at this again when he's back from vacation.

FWIW I'm not surprised that our tests in this area were not 100% sufficient: we talked about it during the review but I'm not very surprised that something was missed, since this is a very widely used API. During review we had discussed testing against other implementations as well but I think this wasn't followed through -- when we re-land this we should try to also increase coverage.

jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
```
-------------------------------------------------------
Benchmark                              old          new
-------------------------------------------------------
BM_num_put<bool>                   76.2 ns      32.0 ns
BM_num_put<long>                   76.9 ns      33.1 ns
BM_num_put<long long>              77.9 ns      34.2 ns
BM_num_put<unsigned long>          78.4 ns      33.1 ns
BM_num_put<unsigned long long>     78.0 ns      34.4 ns
BM_num_put<double>                  224 ns       228 ns
BM_num_put<long double>             239 ns       230 ns
BM_num_put<const void*>            68.7 ns      35.1 ns
```

Fixes llvm#40109.
@philnik777 philnik777 deleted the optimize_num_put branch March 29, 2025 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ostream uses snprintf for formatting which results in 2x slower performance than libstdc++
6 participants