Skip to content

[libc++] Fix bitset conversion functions and refactor constructor #121348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 24, 2025

Conversation

winner245
Copy link
Contributor

@winner245 winner245 commented Dec 30, 2024

This patch addresses several implementation issues in bitset's conversion functions to_ullong and to_ulong, and refactors its converting constructor __bitset(unsigned long long __v) to a more generic and elegant implementation.

to_ullong:

for (size_t __i = 1; __i < sizeof(unsigned long long) / sizeof(__storage_type); ++__i)
__r |= static_cast<unsigned long long>(__first_[__i]) << (sizeof(__storage_type) * CHAR_BIT);

The existing implementation incorrectly concatenates multiple words using a fixed shifting amount __bits_per_word inside a loop. The correct shifting amount should be __i * __bits_per_word, which depends on the loop index __i. This is now correctly implemented in this patch.

Additionally, the previous implementation relied on tag-dispatching, which became difficult to follow due to the combination of multiple tags. This patch improves readability by replacing tag-dispatching with constexpr if, placing conditions in-line for a more straightforward and maintainable approach.

to_ulong:

template <size_t _Size>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long long __bitset<1, _Size>::to_ullong() const {
return __first_;
}

The to_ulong(unsigned long ) function in the one-word specialization template __bitset<1, _Size> is not standard-conforming as it unconditionally returns __first_, while the standard requires that it throws std::overflow_error if the value cannot fit within unsigned long (i.e., sizeof(size_t) > sizeof(unsigned long)). On LLP64 (e.g., Windows and MinGW), we have sizeof(size_t) = 64 and sizeof(unsigned long) = 32 (according to: https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models), and in this case, we should throw std::overflow_error. This is now fixed in this PR.

__bitset(unsigned long long __v):

The converting constructor is now refactored using a variadic template and an internal implementation of integer_sequence (to support C++03). This approach eliminates conditional preprocessing statements, improving readability and maintainability.

Copy link
Contributor

@frederick-vs-ja frederick-vs-ja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I think the change is actually "Fix possible out of range access in bitset::to_ullong implementation". The current title and PR description are indescriptive to me.

The intented change look good to me. But I guess we need far more changes to properly support 16-bit platforms. Has this been discussed before?

@winner245 winner245 changed the title [libc++] Improve bitset::to_ullong Implementation [libc++] Fix possible out of range access in bitset::to_ullong implementation Dec 31, 2024
@winner245 winner245 force-pushed the improve-to_ullong branch 10 times, most recently from 94e34bb to f85d6e7 Compare January 2, 2025 22:19
Copy link

github-actions bot commented Jan 2, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@winner245 winner245 force-pushed the improve-to_ullong branch 16 times, most recently from 67e021d to c86557e Compare January 3, 2025 19:13
@winner245
Copy link
Contributor Author

This patch is not just addressing issue for sizeof(size_t) == 2 only. It actually addresses all cases where the ratio sizeof(unsigned long long) / sizeof(size_t) != 1. It also fixes several issues in the previous implementation. The previous implementation, for example to_ullong, already considered the cases sizeof(unsigned long long) / sizeof(size_t) > 1, which includes sizeof(size_t) == 2 as a special case. However, its implementation is incorrect in that it concatenate the words with a fixed shifting amount __bits_per_word inside a for loop, where the correct shifting amount is i * __bits_per_word. Also, the previous implementation of __bitset<1, _Size>::to_ulong is incorrect for LLP64 with sizeof(size_t) = 64 and sizeof(unsigned long) = 32, where we should throw std::overflow_error.

This patch fixes the previous implementation for different scenarios with sizeof(unsigned long long) / sizeof(size_t) != 1. It also refactors the implementation to be more readable. Previously, the implementation was based on tag-dispatching, which is very misleading given that we have more that one tag in to_ullong. For example, it's really hard to understand what the different combinations of the tags to_ullong(false_type | true_type, false_type | true_type) mean, and we have to jump around to figure out. This patch makes it easier to read by using constexpr if with the conditions placed in-line.

@philnik777
Copy link
Contributor

@winner245 You're still adding branches here which are effectively dead code AFAICT. I'm not against fixing bugs, but I am against introducing effectively dead code. Given that your commit message starts with that case, it seems like it is the most important part of the patch. If it's not, I'd rip these parts out to make the actual intention clear.

@ldionne
Copy link
Member

ldionne commented Jun 11, 2025

@winner245 You're still adding branches here which are effectively dead code AFAICT. I'm not against fixing bugs, but I am against introducing effectively dead code. Given that your commit message starts with that case, it seems like it is the most important part of the patch. If it's not, I'd rip these parts out to make the actual intention clear.

I think I'm not following fully here. Looking at the code after the patch, I don't see what part of it is dead code. We do have platforms where size_t is 32 bits, which means the ratio between unsigned long long (64 bits) and size_t is 2. It is true that we don't officially support any platform where size_t is 16 bits, but I don't think this patch introduces any specific code to handle this. It just handles all sizes generically in a correct way, and this happens to also work for a 16 bit size_t. Perhaps the commit message needs to be adjusted to explain the motivation more clearly?

If I missed something and there is indeed dead code being introduced, then I agree that we shouldn't be checking in dead code -- but I'm not seeing anything dead in the latest version of the patch.

Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I missed something and there is indeed dead code being introduced, then I agree that we shouldn't be checking in dead code -- but I'm not seeing anything dead in the latest version of the patch.

I've left a comment where I think we're introducing dead code.

Edit:

Perhaps the commit message needs to be adjusted to explain the motivation more clearly?

I think so. Starting with something that happens to work now and explain extensively why isn't exactly screaming "falls out of this patch" but rather "this is the main goal of this patch".

Comment on lines 620 to 621
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long to_ulong() const { return 0UL; }
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 unsigned long long to_ullong() const { return 0ULL; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

Comment on lines 9 to 10
// UNSUPPORTED: no-exceptions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only guard the cases where exceptions are required. Otherwise we lose all coverage for to_ullong with exceptions disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I agree we should only guard the test cases which actually check exceptions. I am now using #ifndef TEST_HAS_NO_EXCEPTIONS around the test cases.


// unsigned long may span multiple words which are concatenated to form the result
template <typename _StorageType = __storage_type,
__enable_if_t<sizeof(_StorageType) < sizeof(unsigned long), int> = 0>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never true AFAICT. On all 32 bit platforms sizeof(unsigned long) is at most 32 bits, and sizeof(_StorageType) (i.e. sizeof(size_t)) should always be 32 bits as well.

On 64 bit platforms sizeof(size_t) is always eight, and I'm quite certain there is no platform where unsigned long is larger than that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested, I've removed this code branch, and I am now focusing only on the case where sizeof(size_t) >= sizeof(unsigned long).

@@ -80,7 +80,7 @@ public:
constexpr bool operator[](size_t pos) const;
reference operator[](size_t pos); // constexpr since C++23
unsigned long to_ulong() const; // constexpr since C++23
unsigned long long to_ullong() const; // constexpr since C++23
unsigned long long to_ullong() const; // since C++11, constexpr since C++23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is LWG694. I don't think LWG694 retroactively applied to C++03 because long long was added in C++11 as a new feature. But since libc++ supports many C++11 features in C++03 mode, perhaps it's also fine not to mention "since C++11".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't mention "sine C++11" in most places, since we back-port a humongous amount of code from C++11.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this function was in C++11 standard mode, but we backported it to C++03. So I've now removed the "since C++11" description.

Comment on lines 530 to 522
// TODO: This is a workaround for a gdb test failure (gdb_pretty_printer_test.sh.cpp) in
// stage1 CI (generic-gcc, gcc-14, g++-14), due to the __bits_per_word name lookup failure
// if not referenced in the constructor initializer list.
// See: https://github.com/llvm/llvm-project/actions/runs/15071518915/job/42368867929?pr=121348#logs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO: This is a workaround for a gdb test failure (gdb_pretty_printer_test.sh.cpp) in
// stage1 CI (generic-gcc, gcc-14, g++-14), due to the __bits_per_word name lookup failure
// if not referenced in the constructor initializer list.
// See: https://github.com/llvm/llvm-project/actions/runs/15071518915/job/42368867929?pr=121348#logs
// TODO: We must refer to __bits_per_word in order to work around an issue with the GDB pretty-printers.
// Without it, the pretty-printers complain about a missing __bits_per_word member. This must
// be investigated.

@winner245 winner245 changed the title [libc++] Fix possible out of range access in bitset [libc++] Fix bitset conversion functions and refactor constructor Jun 13, 2025
@winner245 winner245 force-pushed the improve-to_ullong branch from cf28025 to 958a1c3 Compare June 13, 2025 15:52
Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolves my concerns, thanks! I'll leave the rest of the review to Louis.

Comment on lines 231 to 232
"bitset only supports platforms where sizeof(size_t) >= sizeof(unsigned long), such as 32-bit and "
"64-bit platforms");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"bitset only supports platforms where sizeof(size_t) >= sizeof(unsigned long), such as 32-bit and "
"64-bit platforms");
"libc++ only supports platforms where sizeof(size_t) >= sizeof(unsigned long), such as 32-bit and "
"64-bit platforms. If you're interested in supporting a platform where that is not the case, please contact the libc++ developers.");

Otherwise people might think that we have no interest in supporting their platform.

@winner245
Copy link
Contributor Author

Perhaps the commit message needs to be adjusted to explain the motivation more clearly?

I think so. Starting with something that happens to work now and explain extensively why isn't exactly screaming "falls out of this patch" but rather "this is the main goal of this patch".

I've completely rewritten the commit message, which now has a much clearer motivation.

@winner245 winner245 force-pushed the improve-to_ullong branch from 958a1c3 to 746e44b Compare June 13, 2025 16:31
@winner245 winner245 force-pushed the improve-to_ullong branch from 746e44b to db4565b Compare June 16, 2025 14:46
@winner245 winner245 force-pushed the improve-to_ullong branch from 3f7d57b to 3700be1 Compare June 18, 2025 21:45
Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@ldionne ldionne merged commit d807661 into llvm:main Jun 24, 2025
268 of 279 checks passed
DrSergei pushed a commit to DrSergei/llvm-project that referenced this pull request Jun 24, 2025
…vm#121348)

This patch addresses several implementation issues in `bitset`'s
conversion functions `to_ullong` and `to_ulong`, and refactors its
converting constructor `__bitset(unsigned long long __v)` to a more
generic and elegant implementation.
@winner245 winner245 deleted the improve-to_ullong branch June 24, 2025 21:57
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…vm#121348)

This patch addresses several implementation issues in `bitset`'s
conversion functions `to_ullong` and `to_ulong`, and refactors its
converting constructor `__bitset(unsigned long long __v)` to a more
generic and elegant implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants