[C99] Claim conformance to WG14 N717 #87228
Merged
6 commits
f8e130d [C99] Claim conformance to WG14 N717 (AaronBallman)
23f7554 Merge remote-tracking branch 'origin/main' into aballman-wg14-n717 (AaronBallman)
9638a52 Update based on review comments (AaronBallman)
2237fa1 Update the test based on offline discussions (AaronBallman)
b870898 Switch from character constant to string constant (AaronBallman)
10db0df Add some more tests based on offline discussions (AaronBallman)
@@ -0,0 +1,75 @@
// RUN: %clang_cc1 -verify -std=c99 %s
// RUN: %clang_cc1 -verify -std=c99 -fno-dollars-in-identifiers %s

/* WG14 N717: Clang 17
 * Extended identifiers
 */

// Used as a sink for UCNs.
#define M(arg)

// C99 6.4.3p1 specifies the grammar for UCNs. A \u must be followed by exactly
// four hex digits, and \U must be followed by exactly eight.
M(\u1) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\u12) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\u123) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\u1234) // Okay
M(\u12345)// Okay, two tokens (UCN followed by 5)

M(\U1) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\U12) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\U123) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\U1234) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}} \
             expected-note {{did you mean to use '\u'?}}
M(\U12345) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\U123456) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\U1234567) // expected-warning {{incomplete universal character name; treating as '\' followed by identifier}}
M(\U12345678) // Okay
M(\U123456789) // Okay-ish, two tokens (valid-per-spec-but-actually-invalid UCN followed by 9)

// Now test the ones that should work. Note, these work in C17 and earlier but
// are part of the basic character set in C23 and thus should be diagnosed in
// that mode. They're valid in a character constant, but not valid in an
// identifier, except for U+0024 which is allowed if -fdollars-in-identifiers
// is enabled.
// FIXME: These three should be handled the same way, and should be accepted
// when dollar signs are allowed in identifiers, rather than rejected, see
// GH87106.
M(\u0024) // expected-error {{character '$' cannot be specified by a universal character name}}
M(\U00000024) // expected-error {{character '$' cannot be specified by a universal character name}}
M($)

// These should always be rejected because they're not valid identifier
// characters.
// FIXME: the diagnostic could be improved to make it clear this is an issue
// with forming an identifier rather than a UCN.
M(\u0040) // expected-error {{character '@' cannot be specified by a universal character name}}
M(\u0060) // expected-error {{character '`' cannot be specified by a universal character name}}
M(\U00000040) // expected-error {{character '@' cannot be specified by a universal character name}}
M(\U00000060) // expected-error {{character '`' cannot be specified by a universal character name}}

// UCNs outside of identifiers are handled in Phase 5 of translation, so we
// cannot use the macro expansion to test their behavior.

// This is outside of the range of values specified by ISO 10646.
const char *c1 = "\U00110000"; // expected-error {{invalid universal character}}
// This does not fall outside of the range.
const char *c2 = "\U0010FFFF";

// These should always be accepted because they're valid in a character
// constant.
int c3 = '\u0024';
int c4 = '\u0040';
int c5 = '\u0060';

int c6 = '\U00000024';
int c7 = '\U00000040';
int c8 = '\U00000060';

// Valid code points adjacent to the surrogate range (not surrogates).
M(\uD799)
const char *c9 = "\U0000E000";

// Invalid lone surrogates, which are excluded explicitly by 6.4.3p2.
M(\uD800) // expected-error {{invalid universal character}}
const char *c10 = "\U0000DFFF"; // expected-error {{invalid universal character}}
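
The rules this test exercises can be sketched as a small standalone checker. This is only an illustration under stated assumptions, not Clang's lexer: `hex_value`, `parse_ucn`, `ucn_value_is_valid`, and `check_ucn_examples` are hypothetical names. The length rule follows the test's reading of C99 6.4.3p1 (exactly four hex digits after `\u`, exactly eight after `\U`), and the value rule follows 6.4.3p2 (nothing below 00A0 except 0024, 0040, and 0060; nothing in D800 through DFFF). The upper bound of 10FFFF mirrors the `\U00110000` case in the test rather than the 6.4.3p2 text. The sketch covers UCNs in general; the extra restrictions on identifiers are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch: try to read one UCN at the start of s.
 * Returns the number of characters consumed (6 for \uXXXX, 10 for
 * \UXXXXXXXX) and stores the code point in *out. Returns 0 if the
 * UCN is incomplete. Extra digits are left for the next token.
 */
size_t parse_ucn(const char *s, unsigned long *out) {
    size_t digits, i;
    unsigned long value = 0;

    if (s[0] != '\\') return 0;
    if (s[1] == 'u') digits = 4;
    else if (s[1] == 'U') digits = 8;
    else return 0;

    for (i = 0; i < digits; ++i) {
        int h = hex_value(s[2 + i]);
        if (h < 0) return 0; /* also stops at the terminating '\0' */
        value = (value << 4) | (unsigned long)h;
    }
    *out = value;
    return 2 + digits;
}

/* Value constraints: 6.4.3p2 plus the ISO 10646 upper bound (assumption). */
int ucn_value_is_valid(unsigned long cp) {
    if (cp > 0x10FFFFUL) return 0;                  /* beyond ISO 10646 */
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) return 0; /* surrogate range */
    if (cp < 0xA0UL) return cp == 0x24 || cp == 0x40 || cp == 0x60;
    return 1;
}

/* A few of the cases from the test, restated against the sketch. */
void check_ucn_examples(void) {
    unsigned long cp = 0;
    assert(parse_ucn("\\u123", &cp) == 0);                     /* incomplete */
    assert(parse_ucn("\\u12345", &cp) == 6 && cp == 0x1234UL); /* '5' is left over */
    assert(parse_ucn("\\U1234", &cp) == 0);                    /* \U needs eight */
    assert(parse_ucn("\\U00110000", &cp) == 10 && !ucn_value_is_valid(cp));
    assert(ucn_value_is_valid(0xD799UL) && !ucn_value_is_valid(0xD800UL));
}
```

Splitting the length check from the value check matches how the test separates the "incomplete universal character name" warnings from the "invalid universal character" errors.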
Do you know what was not supported before?
UCNs weren't supported in C89 but were added in C99, so this tests UCNs as specified in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n717.htm, but based on the final text of the C99 standard (so it incorporates changes from NB comments, etc.).
I mean, before Clang 17. I don't remember us changing anything there (for C99) in a while.
Ah! We were missing some diagnostics: https://godbolt.org/z/hEKsEKqzT
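
For readers less familiar with the feature under discussion, here is a minimal sketch of the two places a UCN can appear in C99 code: inside an identifier and inside a string literal. The function names are hypothetical, and the byte count in the second function assumes a UTF-8 execution character set (the default for Clang and GCC as far as I know); other configurations may encode U+00E9 differently.

```c
#include <assert.h>
#include <string.h>

/* \u00E9 (U+00E9) is used here as part of an ordinary identifier. */
int ucn_in_identifier(void) {
    int caf\u00E9 = 3;
    return caf\u00E9;
}

/*
 * In a string literal the UCN is converted to the execution character
 * set. Assumption: with UTF-8 that is the two bytes 0xC3 0xA9.
 */
size_t ucn_in_string_length(void) {
    return strlen("\u00E9");
}

/* Hypothetical usage check, valid only under the UTF-8 assumption. */
void check_ucn_usage(void) {
    assert(ucn_in_identifier() == 3);
    assert(ucn_in_string_length() == 2);
}
```

The identifier case is what the macro-based `M(...)` lines in the test probe, while the string and character constant cases need real declarations because those UCNs are handled in a later translation phase.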