Skip to content

Add support for char8_t (from C++20). #26153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Add support for char8_t (from C++20). #26153

wants to merge 1 commit into from

Conversation

varungandhi-apple
Copy link
Contributor

@varungandhi-apple varungandhi-apple commented Jul 15, 2019

Fixes rdar://39988329.

I believe I've covered all the places where the related types char16_t and char32_t also appear. One question though -- what should be done about stdlib-stable.json? That file is marked as read-only, so I have not added an entry for CChar8 there.

@@ -2579,6 +2580,7 @@ class ModuleWriter {
"# if __has_include(<uchar.h>)\n"
"# include <uchar.h>\n"
"# elif !defined(__cplusplus)\n"
"typedef char char8_t;\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: this should probably be unsigned char, since that's how it's defined in the standard. (Actual citation: C++20 N4820 [basic.fundamental]p9.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -100,6 +100,9 @@ public typealias CLongDouble = Double
/// The C++ 'wchar_t' type.
public typealias CWideChar = Unicode.Scalar

/// The C++20 'char8_t' type, which has UTF-8 encoding.
public typealias CChar8 = UInt8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@airspeedswift, how are we handling availability for new top-level typealiases? These can't affect ABI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I suggested Varun pick this up as a starter-ish task, but this is a public change to the stdlib, and CChar8 is close to CChar. Does it need formal review or even a proposal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, given that WG14 is working on a TN for char16_t and char32_t, I could see char8_t being introduced into C, where it gets interesting with -fsigned-char and -fno-signed-char. I think that on Darwin targets where you default to -fsigned-char, having CChar be signed char, and CChar8 be unsigned char is slightly scary. Might be worth bringing this up through a proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@compnerd but what's the alternative, really? Yeah, it would be slightly messy, but there's really no other option on the table except maybe "just don't import these types". I think this is a thing that we just live with.

Copy link
Member

@compnerd compnerd Jul 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that "just don't import these types" is a viable solution (particularly long term). Sure, we can get away with it for the time being, but, I don't think that we would be able to escape it forever.

Now, I'm not saying that this is what we should do or even that what I'm suggesting is good. Hopefully someone can come up with something more clever.

We could import this as CXX.Char8 (where CXX is a @frozen enum which I believe should allow inling).

Yes, I am a bad person doing bad things to the compiler :-(.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, non-generic containing contexts don't really affect codegen at all for nested types. So yeah, that is technically an option. It's a weird choice if C22 adopts char8_t as well, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we could introduce CChar8 at that time for the C char8_t and if they make it compatible with char (including -fsigned-char), we can handle both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see char8_t ever being signed; it's defined as a UTF-8 code unit. And we can't introduce CChar8 separate from CXX.Char8 because they'd be different types you could overload on (unless one is a typealias).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, they're all typealiases. Retracted, at least partially.

@jrose-apple
Copy link
Contributor

I think it's correct not to modify stdlib-stable.json. That describes the state of the stdlib as it was in Swift 5.0, i.e. what it needs to be compatible with.

@varungandhi-apple
Copy link
Contributor Author

I don't quite understand what the status of the discussion is ATM or what my next steps should be:

  • Do I submit a written proposal for this, specifying the semantics, particularly what happens if/when C adds char8_t and the user specifies -fsigned-char/-funsigned-char?
  • Wait for some more discussion/consensus between the reviewers?
  • Something else?

I'll run the tests for now: @swift-ci please smoke test.

@jrose-apple
Copy link
Contributor

I think we should probably just put it on ice, and assign the Radar over to @airspeedswift or @stephentyrone, saying "here's an implementation for if and when we decide we can do this". Sorry for putting it in the list!

@varungandhi-apple
Copy link
Contributor Author

varungandhi-apple commented Nov 20, 2019

Closing the PR since we're not merging this right now but someone can cherry-pick the commit if they'd like to do so when the Radar is being fixed.

@MarcusJohnson91
Copy link

Do I submit a written proposal for this, specifying the semantics, particularly what happens if/when C adds char8_t and the user specifies -fsigned-char/-funsigned-char?

That's the entire point of creating the char8_t type in the first place.

char16_t and char32_t were created to get around platform specific behavior with wchar_t, char8_t is defined to be unsigned so it works across platforms so it can store the 8 bit code units used by the UTF-8 encoding.

egorzhdan added a commit that referenced this pull request Sep 26, 2024
https://en.cppreference.com/w/cpp/keyword/char8_t

This is based on a patch from Varun Gandhi: #26153

rdar://39988329 / resolves #68726
egorzhdan added a commit that referenced this pull request Sep 27, 2024
https://en.cppreference.com/w/cpp/keyword/char8_t

This is based on a patch from Varun Gandhi: #26153

rdar://39988329 / resolves #68726
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants