-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Add support for char8_t (from C++20). #26153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for char8_t (from C++20). #26153
Conversation
lib/PrintAsObjC/PrintAsObjC.cpp
Outdated
@@ -2579,6 +2580,7 @@ class ModuleWriter { | |||
"# if __has_include(<uchar.h>)\n" | |||
"# include <uchar.h>\n" | |||
"# elif !defined(__cplusplus)\n" | |||
"typedef char char8_t;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: this should probably be unsigned char
, since that's how it's defined in the standard. (Actual citation: C++20 N4820 [basic.fundamental]p9.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -100,6 +100,9 @@ public typealias CLongDouble = Double | |||
/// The C++ 'wchar_t' type. | |||
public typealias CWideChar = Unicode.Scalar | |||
|
|||
/// The C++20 'char8_t' type, which has UTF-8 encoding. | |||
public typealias CChar8 = UInt8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@airspeedswift, how are we handling availability for new top-level typealiases? These can't affect ABI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I suggested Varun pick this up as a starter-ish task, but this is a public change to the stdlib, and CChar8
is close to CChar
. Does it need formal review or even a proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, given that WG14 is working on a TN for char16_t
and char32_t
, I could see char8_t
being introduced into C, where it gets interesting with -fsigned-char
and -fno-signed-char
. I think that on Darwin targets where you default to -fsigned-char
, having CChar
be signed char
, and CChar8
be unsigned char
is slightly scary. Might be worth bringing this up through a proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@compnerd but what's the alternative, really? Yeah, it would be slightly messy, but there's really no other option on the table except maybe "just don't import these types". I think this is a thing that we just live with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that "just don't import these types" is a viable solution (particularly long term). Sure, we can get away with it for the time being, but, I don't think that we would be able to escape it forever.
Now, I'm not saying that this is what we should do or even that what I'm suggesting is good. Hopefully someone can come up with something more clever.
We could import this as CXX.Char8
(where CXX
is a @frozen
enum which I believe should allow inling).
Yes, I am a bad person doing bad things to the compiler :-(.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, non-generic containing contexts don't really affect codegen at all for nested types. So yeah, that is technically an option. It's a weird choice if C22 adopts char8_t
as well, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we could introduce CChar8
at that time for the C char8_t
and if they make it compatible with char
(including -fsigned-char
), we can handle both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see char8_t
ever being signed; it's defined as a UTF-8 code unit. And we can't introduce CChar8
separate from CXX.Char8
because they'd be different types you could overload on (unless one is a typealias).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, they're all typealiases. Retracted, at least partially.
I think it's correct not to modify |
Fixes rdar://39988329.
I don't quite understand what the status of the discussion is ATM or what my next steps should be:
I'll run the tests for now: @swift-ci please smoke test. |
I think we should probably just put it on ice, and assign the Radar over to @airspeedswift or @stephentyrone, saying "here's an implementation for if and when we decide we can do this". Sorry for putting it in the list! |
Closing the PR since we're not merging this right now but someone can cherry-pick the commit if they'd like to do so when the Radar is being fixed. |
That's the entire point of creating the char8_t type in the first place. char16_t and char32_t were created to get around platform specific behavior with wchar_t, char8_t is defined to be unsigned so it works across platforms so it can store the 8 bit code units used by the UTF-8 encoding. |
https://en.cppreference.com/w/cpp/keyword/char8_t This is based on a patch from Varun Gandhi: #26153 rdar://39988329 / resolves #68726
https://en.cppreference.com/w/cpp/keyword/char8_t This is based on a patch from Varun Gandhi: #26153 rdar://39988329 / resolves #68726
Fixes rdar://39988329.
I believe I've covered all the places where the related types
char16_t
andchar32_t
also appear. One question though -- what should be done about stdlib-stable.json? That file is marked as read-only, so I have not added an entry forCChar8
there.