[libc++] Do not guard inclusion of wchar.h with _LIBCPP_HAS_WIDE_CHARACTERS #126924

Merged

stevew817
Contributor

@stevew817 stevew817 commented Feb 12, 2025

mbstate_t needs to be visible to libc++, even when it is not providing wide
character functionality (i.e. _LIBCPP_HAS_WIDE_CHARACTERS is turned off)
and thus not using any of the C library's wide character functions.

There are C libraries (such as newlib-nano/nanolib/picolibc) which do
provide their definition of mbstate_t in <wchar.h> even though they do not
come with wide character functions.

Since there is a way to conditionally include the C library's <wchar.h>
only if it exists, we should rely on the fact that if it exists, it will
provide mbstate_t. Removing this guard will allow using libc++ on top of
newlib-nano/picolibc while not breaking the cases where it is used on top
of a C library which doesn't provide <wchar.h> (since it would then still
go look for <uchar.h> or error out).
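
As a rough illustration of why `mbstate_t` has to be visible even in a narrow-only configuration, the sketch below checks that the narrow `char` stream machinery is itself defined in terms of `std::mbstate_t`. The helper names are hypothetical and exist only for this example; nothing here calls a wide character function.

```cpp
#include <cassert>
#include <cwchar>       // declares std::mbstate_t
#include <ios>          // std::fpos, std::streampos
#include <string>       // std::char_traits
#include <type_traits>

// Hypothetical helper: char_traits<char> uses mbstate_t as its state type,
// so the type is needed even when only narrow characters are in play.
constexpr bool narrow_traits_use_mbstate() {
  return std::is_same<std::char_traits<char>::state_type,
                      std::mbstate_t>::value;
}

// Hypothetical helper: stream positions for narrow streams carry an
// mbstate_t as well.
constexpr bool streampos_uses_mbstate() {
  return std::is_same<std::streampos, std::fpos<std::mbstate_t>>::value;
}

static_assert(narrow_traits_use_mbstate(),
              "char_traits<char>::state_type should be mbstate_t");
static_assert(streampos_uses_mbstate(),
              "streampos should be fpos<mbstate_t>");
```

Both checks are compile-time only, so on a toolchain where they hold the snippet adds no code to the final image.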

@stevew817 stevew817 requested a review from a team as a code owner February 12, 2025 15:26
@llvmbot llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Feb 12, 2025
@llvmbot
Member

llvmbot commented Feb 12, 2025

@llvm/pr-subscribers-libcxx

Author: Steven Cooreman (stevew817)

Changes

mbstate_t needs to be visible to libc++, even when it is not relying on wide character functionality in the C library. C90 amendment 1 says that this type is to be visible through wchar.h. That header file should be provided by the C standard library even when that library is not implementing wchar functions (such as newlib with --nano-formatted-io).

The reason this guard was in place is purely historical, I believe. Tracing the modification history, I can't find any reason other than maybe __has_include_next not having been in place when this file was created. Looking at the other include chains, libc++'s <wchar.h> already unconditionally includes the underlying C library's wchar.h, so I don't really see why this guard still needs to be in place. If the C library provides wchar.h, it can be used to get hold of mbstate_t regardless of whether libc++ is allowed to call the wide character functions.

Tagging @ldionne for review according to the maintainer file.


Full diff: https://github.com/llvm/llvm-project/pull/126924.diff

1 Files Affected:

  • (modified) libcxx/include/__mbstate_t.h (+4-4)
diff --git a/libcxx/include/__mbstate_t.h b/libcxx/include/__mbstate_t.h
index e013384454b41..e32c1dbd0c7ab 100644
--- a/libcxx/include/__mbstate_t.h
+++ b/libcxx/include/__mbstate_t.h
@@ -43,12 +43,12 @@
 #  include <bits/types/mbstate_t.h> // works on most Unixes
 #elif __has_include(<sys/_types/_mbstate_t.h>)
 #  include <sys/_types/_mbstate_t.h> // works on Darwin
-#elif _LIBCPP_HAS_WIDE_CHARACTERS && __has_include_next(<wchar.h>)
-#  include_next <wchar.h> // fall back to the C standard provider of mbstate_t
+#elif __has_include_next(<wchar.h>)
+#  include_next <wchar.h> // use the C standard provider of mbstate_t if present
 #elif __has_include_next(<uchar.h>)
-#  include_next <uchar.h> // <uchar.h> is also required to make mbstate_t visible
+#  include_next <uchar.h> // <uchar.h> can alternatively provide mbstate_t
 #else
-#  error "We don't know how to get the definition of mbstate_t without <wchar.h> on your platform."
+#  error "We don't know how to get the definition of mbstate_t on your platform."
 #endif
 
 #endif // _LIBCPP___MBSTATE_T_H
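
The fallback order in the tail of this chain can be tried outside libc++ with a small standalone sketch. It is an approximation under stated assumptions: it uses the standard C++17 `__has_include` instead of `__has_include_next` (which is meant for wrapper headers on the include path), and the `DEMO_`/`demo_` names are hypothetical, not part of libc++.

```cpp
#include <cassert>
#include <cstring>

// Probe for a provider of mbstate_t, in the same order as the patched chain:
// <wchar.h> first (no wide-character feature check), then <uchar.h>.
#if !defined(__has_include)
#  error "This sketch assumes a compiler that supports __has_include."
#elif __has_include(<wchar.h>)
#  include <wchar.h>
#  define DEMO_MBSTATE_SOURCE "<wchar.h>"
#elif __has_include(<uchar.h>)
#  include <uchar.h>
#  define DEMO_MBSTATE_SOURCE "<uchar.h>"
#else
#  error "No known header provides mbstate_t on this platform."
#endif

// Hypothetical helper: reports which header was selected above.
inline const char* demo_mbstate_source() { return DEMO_MBSTATE_SOURCE; }

// Hypothetical helper: only confirms that mbstate_t is a complete type that
// can be value-initialized; it calls no wide character function.
inline bool demo_mbstate_is_usable() {
  mbstate_t state{};
  (void)state;
  return sizeof(mbstate_t) > 0;
}
```

On a C library that ships `<wchar.h>` without the wide character functions, the first branch is still taken, which is the behaviour the patch relies on.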

@mordante
Member

Thanks for your patch! Running libc++ without wide characters is not a standard conforming implementation. Can you explain what problem this solves (for you)?

@stevew817
Contributor Author

@mordante Sure! And thanks for taking a look. The context for this is bare-metal Cortex-M class applications, where code size is of paramount concern. 99% of our applications are built with newlib-nano (aka nanolib), as shipped with ARM GCC, as the C library. We are looking at adding support for ARM LLVM in addition to ARM GCC, while not breaking too much in existing codebases. That means providing newlib-nano/nanolib as a C library, since migrating to picolibc means having to change more than just the build line. But since nanolib doesn't provide wchar functionality, the C++ library on top of it can't make use of wchar functions either, which I assume is the point of even having the _LIBCPP_HAS_WIDE_CHARACTERS flag.

For more background on where this particular change originated from, see arm/arm-toolchain#60

Picolibc has been hitting the same snag, by the way, but they decided to work around it by adding a mock bits/types/mbstate_t.h, thereby making use of the include path for "most unixes" even though picolibc is absolutely not a Unix.

This change would allow picolibc to get rid of that workaround, as well as enable arm-toolchain's build of newlib-nano to not have to be patched to work around this either.

@mordante mordante requested a review from ldionne February 12, 2025 19:31
@mordante
Member

Thanks for the additional information. It would be great to have some of that information in the original submission so it will be part of the commit. This code is quite sensitive to changes, so I'd like @ldionne to have a look.

@stevew817 stevew817 force-pushed the libcpp/mbstate_without_wchar_support branch from 2689a99 to de6e558 February 14, 2025 13:55
@stevew817
Contributor Author

> It would be great to have some of that information in the original submission so it will be part of the commit.

Thanks for the feedback. I edited the commit log to be clearer about this, and fixed a spelling error I inadvertently introduced. Does this look better to you now?

@mordante
Member

> > It would be great to have some of that information in the original submission so it will be part of the commit.
>
> Thanks for the feedback. I edited the commit log to be clearer about this, and fixed a spelling error I inadvertently introduced. Does this look better to you now?

Yes, thanks!
There was a minor issue: the final commit message will be the message in the PR, not the message in the (first) commit. I've updated the PR message with your wording.

Member

@ldionne ldionne left a comment

This looks reasonable to me, pending CI. I suspect there's a reason why I didn't want to include wchar.h if wide character support was disabled, but I don't know what it was. If the CI's passing, this should be OK.

@ldionne ldionne added the pending-ci label (Merging the PR is only pending completion of CI) Feb 17, 2025
@stevew817
Contributor Author

Thanks for the approval! May I ask which CI this is pending on, @ldionne? I don’t see any pending checks.

@philnik777
Contributor

@stevew817 The CI probably wasn't finished when Louis looked at it.

@philnik777 philnik777 merged commit 7620011 into llvm:main Feb 18, 2025
77 checks passed
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Mar 25, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Apr 2, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Apr 17, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Apr 30, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request May 15, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request May 29, 2025
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Jun 13, 2025