[Clang] Allow raw string literals in C as an extension #88265
@llvm/pr-subscribers-clang-format @llvm/pr-subscribers-clang-driver
Author: None (Sirraide)
Changes
This is a tentative implementation of support for raw string literals in C following the discussion on #85703. GCC supports raw string literals in C in
Full diff: https://github.com/llvm/llvm-project/pull/88265.diff 9 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index f96cebbde3d825..20d14130fb62bc 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -43,6 +43,9 @@ code bases.
C/C++ Language Potentially Breaking Changes
-------------------------------------------
+- Clang now supports raw string literals in ``-std=gnuXY`` mode as an extension in
+ C. This behaviour can also be overridden using ``-f[no-]raw-string-literals``.
+
C++ Specific Potentially Breaking Changes
-----------------------------------------
- Clang now diagnoses function/variable templates that shadow their own template parameters, e.g. ``template<class T> void T();``.
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 8ef6700ecdc78e..96bd339bb1851d 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -454,6 +454,8 @@ LANGOPT(MatrixTypes, 1, 0, "Enable or disable the builtin matrix type")
LANGOPT(CXXAssumptions, 1, 1, "Enable or disable codegen and compile-time checks for C++23's [[assume]] attribute")
+LANGOPT(RawStringLiterals, 1, 0, "Enable or disable raw string literals")
+
ENUM_LANGOPT(StrictFlexArraysLevel, StrictFlexArraysLevelKind, 2,
StrictFlexArraysLevelKind::Default,
"Rely on strict definition of flexible arrays")
diff --git a/clang/include/clang/Basic/LangStandard.h b/clang/include/clang/Basic/LangStandard.h
index 8e25afc833661c..0a308b93ada746 100644
--- a/clang/include/clang/Basic/LangStandard.h
+++ b/clang/include/clang/Basic/LangStandard.h
@@ -130,6 +130,12 @@ struct LangStandard {
/// hasDigraphs - Language supports digraphs.
bool hasDigraphs() const { return Flags & Digraphs; }
+ /// hasRawStringLiterals - Language supports R"()" raw string literals.
+ bool hasRawStringLiterals() const {
+ // GCC supports raw string literals in C, but not in C++ before C++11.
+ return isCPlusPlus11() || (!isCPlusPlus() && isGNUMode());
+ }
+
/// isGNUMode - Language includes GNU extensions.
bool isGNUMode() const { return Flags & GNUMode; }
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index f745e573eb2686..32e6c10e1251b7 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4142,6 +4142,12 @@ def fenable_matrix : Flag<["-"], "fenable-matrix">, Group<f_Group>,
HelpText<"Enable matrix data type and related builtin functions">,
MarshallingInfoFlag<LangOpts<"MatrixTypes">>;
+defm raw_string_literals : BoolFOption<"raw-string-literals",
+ LangOpts<"RawStringLiterals">, Default<std#".hasRawStringLiterals()">,
+ PosFlag<SetTrue, [], [], "Enable">,
+ NegFlag<SetFalse, [], [], "Disable">,
+ BothFlags<[], [ClangOption, CC1Option], " raw string literals">>;
+
def fzero_call_used_regs_EQ
: Joined<["-"], "fzero-call-used-regs=">, Group<f_Group>,
Visibility<[ClangOption, CC1Option]>,
diff --git a/clang/lib/Basic/LangOptions.cpp b/clang/lib/Basic/LangOptions.cpp
index a0adfbf61840e3..c34f0ed5ed7174 100644
--- a/clang/lib/Basic/LangOptions.cpp
+++ b/clang/lib/Basic/LangOptions.cpp
@@ -124,6 +124,7 @@ void LangOptions::setLangDefaults(LangOptions &Opts, Language Lang,
Opts.HexFloats = Std.hasHexFloats();
Opts.WChar = Std.isCPlusPlus();
Opts.Digraphs = Std.hasDigraphs();
+ Opts.RawStringLiterals = Std.hasRawStringLiterals();
Opts.HLSL = Lang == Language::HLSL;
if (Opts.HLSL && Opts.IncludeDefaultHeader)
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 766a9b91e3c0ad..c99bfe4efc4137 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6536,6 +6536,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
Args.AddLastArg(CmdArgs, options::OPT_fheinous_gnu_extensions);
Args.AddLastArg(CmdArgs, options::OPT_fdigraphs, options::OPT_fno_digraphs);
Args.AddLastArg(CmdArgs, options::OPT_fzero_call_used_regs_EQ);
+ Args.AddLastArg(CmdArgs, options::OPT_fraw_string_literals,
+ options::OPT_fno_raw_string_literals);
if (Args.hasFlag(options::OPT_femulated_tls, options::OPT_fno_emulated_tls,
Triple.hasDefaultEmulatedTLS()))
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 89e6c19b0af45c..71865bb061f57e 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -3850,6 +3850,7 @@ LangOptions getFormattingLangOpts(const FormatStyle &Style) {
// the sequence "<::" will be unconditionally treated as "[:".
// Cf. Lexer::LexTokenInternal.
LangOpts.Digraphs = LexingStd >= FormatStyle::LS_Cpp11;
+ LangOpts.RawStringLiterals = LexingStd >= FormatStyle::LS_Cpp11;
LangOpts.LineComment = 1;
bool AlternativeOperators = Style.isCpp();
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index c98645993abe07..67d75c1140b232 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -3867,7 +3867,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
tok::utf16_char_constant);
// UTF-16 raw string literal
- if (Char == 'R' && LangOpts.CPlusPlus11 &&
+ if (Char == 'R' && LangOpts.RawStringLiterals &&
getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
return LexRawStringLiteral(Result,
ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
@@ -3889,7 +3889,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
SizeTmp2, Result),
tok::utf8_char_constant);
- if (Char2 == 'R' && LangOpts.CPlusPlus11) {
+ if (Char2 == 'R' && LangOpts.RawStringLiterals) {
unsigned SizeTmp3;
char Char3 = getCharAndSize(CurPtr + SizeTmp + SizeTmp2, SizeTmp3);
// UTF-8 raw string literal
@@ -3925,7 +3925,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
tok::utf32_char_constant);
// UTF-32 raw string literal
- if (Char == 'R' && LangOpts.CPlusPlus11 &&
+ if (Char == 'R' && LangOpts.RawStringLiterals &&
getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
return LexRawStringLiteral(Result,
ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
@@ -3940,7 +3940,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
// Notify MIOpt that we read a non-whitespace/non-comment token.
MIOpt.ReadToken();
- if (LangOpts.CPlusPlus11) {
+ if (LangOpts.RawStringLiterals) {
Char = getCharAndSize(CurPtr, SizeTmp);
if (Char == '"')
@@ -3963,7 +3963,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
tok::wide_string_literal);
// Wide raw string literal.
- if (LangOpts.CPlusPlus11 && Char == 'R' &&
+ if (LangOpts.RawStringLiterals && Char == 'R' &&
getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
return LexRawStringLiteral(Result,
ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
diff --git a/clang/test/Lexer/raw-string-ext.c b/clang/test/Lexer/raw-string-ext.c
new file mode 100644
index 00000000000000..45e3990cadf3d2
--- /dev/null
+++ b/clang/test/Lexer/raw-string-ext.c
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -fsyntax-only -std=gnu11 -verify=gnu -DGNU %s
+// RUN: %clang_cc1 -fsyntax-only -std=c11 -fraw-string-literals -verify=gnu -DGNU %s
+// RUN: %clang_cc1 -fsyntax-only -std=c11 -verify=std %s
+// RUN: %clang_cc1 -fsyntax-only -std=gnu11 -fno-raw-string-literals -verify=std %s
+
+void f() {
+ (void) R"foo()foo"; // std-error {{use of undeclared identifier 'R'}}
+ (void) LR"foo()foo"; // std-error {{use of undeclared identifier 'LR'}}
+ (void) uR"foo()foo"; // std-error {{use of undeclared identifier 'uR'}}
+ (void) u8R"foo()foo"; // std-error {{use of undeclared identifier 'u8R'}}
+ (void) UR"foo()foo"; // std-error {{use of undeclared identifier 'UR'}}
+}
+
+// gnu-error@* {{missing terminating delimiter}}
+// gnu-error@* {{expected expression}}
+// gnu-error@* {{expected ';' after top level declarator}}
+#define R "bar"
+const char* s = R"foo(";
Thank you for working on this! Broadly speaking, I think the idea makes a lot of sense.
GCC does not seem to support raw string literals in C++ before C++11, even if e.g. -std=gnu++03 is passed. Should we follow this behaviour or should we enable raw string literals in earlier C++ language modes as well if -gnu++XY is passed? -fraw-string-literals currently makes it possible to enable them in e.g. C++03.
I think we should follow that behavior; because R can be a valid macro identifier, being conservative is defensible.
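To illustrate the concern about `R` being a valid macro name, here is a minimal C++11 sketch. The names `kConcatenated`, `kRaw`, and `macroAndRawStringDiffer` are made up for this example and do not come from the patch. It shows that once raw string literals are enabled, `R` immediately followed by a quote starts a raw string literal and the macro is never expanded, whereas `R` followed by whitespace is still an ordinary identifier.

```cpp
#include <cstring>

// A macro named R, as in the test case in this PR.
#define R "bar"

// With a space, R is an ordinary identifier: it expands to "bar" and the
// adjacent string literals are concatenated into "barbaz".
static const char *const kConcatenated = R "baz";

// Without a space, a C++11 lexer sees the start of a raw string literal, so
// the macro is not expanded and the value is just "baz".
static const char *const kRaw = R"(baz)";

// Returns true if both literals have the values described above.
bool macroAndRawStringDiffer() {
  return std::strcmp(kConcatenated, "barbaz") == 0 &&
         std::strcmp(kRaw, "baz") == 0;
}
```

In a mode without raw string literals, the second declaration would instead lex as the identifier `R` followed by the ordinary string literal `"(baz)"`, yielding `"bar(baz)"`. That difference is why enabling the extension in older modes can change the meaning of existing code.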
-fno-raw-string-literals allows users to disable raw string literals in -gnuXY mode. I thought it might be useful to have this, but do we want it?
I think it's reasonable to have it, but I don't think we should allow it for C++11 and later modes unless there's some rationale I'm missing. (I don't think we want to let users disable language features in standards modes where the feature is standardized without some sort of reasonable justification.)
The implementation of this currently adds a RawStringLiterals option to the LangOpts; -f[no-]raw-string-literals overrides the default value for it which depends on the language standard. As a consequence, passing e.g. -std=c++11 -fno-raw-string-literals will disable raw string literals even though we’re in C++11 mode. Do we want to allow this or should we just ignore -f[no-]raw-string-literals if we’re in C++11 or later?
I think we should either ignore or diagnose it in C++11 or later.
This probably deserves a note in LanguageExtensions.rst, but I’m not exactly sure where.
It definitely should be noted in there; I would probably recommend https://clang.llvm.org/docs/LanguageExtensions.html#c-11-raw-string-literals for the C++ side of things and then something similar for C around where we document those.
Should we add a flag for this to __has_feature/__has_extension?
Yes, but it's a fun question as to which one. We currently use __has_feature for it in C++:
FEATURE(cxx_raw_string_literals, LangOpts.CPlusPlus11)
and it seems like it would make sense to continue to do so for C++. But this isn't a language feature of C, so __has_extension makes sense there. But that's confusing because then we've got both, so I'm not entirely certain that's the right approach. Perhaps using __has_feature for both C and C++ makes the most sense?
To clarify, should we allow enabling them in e.g. |
I think we should allow users to enable them in C++03 modes if |
Btw, it seems that precommit CI found some valid issues to be addressed |
In that case I think it might just make sense to ignore the flag in C++11 and later then and allow it before C++11.
Ah, I didn’t know that all new warnings should have a |
@AaronBallman I just noticed something that I’ve somehow not realised until now even though I’d already written a test case for it: Not only does GCC allow raw string literals in gnuXY mode, but also UTF string literals, e.g. Should we follow suit here? And if so, should we add a separate flag for that or rename |
C has Unicode string literals as well: https://godbolt.org/z/chdjYrK9v and so if we're allowing raw string literals, it makes sense to also allow raw unicode string literals IMO. I don't think we need to rename the flag though. |
I think that makes the most sense. |
Yes, these things are completely orthogonal, it makes no sense to treat raw strings with an encoding prefix differently |
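As a small illustration of the point that the encoding prefix and the raw-ness of a literal are orthogonal, the C++11 sketch below counts code units for the same raw content under each prefix. The helper `codeUnits` and the constant names are made up for this example.

```cpp
#include <cstddef>

// Number of code units in a string literal, including the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t codeUnits(const CharT (&)[N]) {
  return N;
}

// Each literal holds three characters (a, backslash, n) because raw strings
// do no escape processing, plus a terminating null. The encoding prefix only
// changes the element type, not the content.
constexpr std::size_t kPlain = codeUnits(R"(a\n)");
constexpr std::size_t kWide = codeUnits(LR"(a\n)");
constexpr std::size_t kUtf8 = codeUnits(u8R"(a\n)");
constexpr std::size_t kUtf16 = codeUnits(uR"(a\n)");
constexpr std::size_t kUtf32 = codeUnits(UR"(a\n)");
```

All five constants have the same value, which is the behaviour the `R`, `LR`, `uR`, `u8R`, and `UR` cases in the new test file exercise for C.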
Alright, I think this has the behaviour that we want now:
|
So, apparently, this test here llvm-project/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp Lines 586 to 589 in cb76896
is now failing, presumably because of this: llvm-project/clang/lib/Lex/DependencyDirectivesScanner.cpp Lines 71 to 79 in cb76896
I’m not entirely sure how to fix this candidly. It doesn’t look like unconditionally enabling raw string literals is an option here... This situation reminds me of a similar issue we’re having with |
Why gnu99 mode and not gnu89 mode? I see GCC has that behavior, but I'm not certain why.
Yeah, it's pretty frustrating that we've found two instances of this in such a short period of time. :-/ That test was added in ee8ed0b and it seems to be a bit of a drive-by as the author noticed the behavior. Given that dependency scanning is never going to care about raw string literals to begin with (at least that I can think of), I'm not certain there's any harm in always supporting raw string literals from dependency scanning, so we could probably do that in the worst case. But my concerns from #93753 (comment) are still relevant too. CC @jansvoboda11 |
We went over this a while back: #88265 (comment)
👍 |
THAT is why this was so familiar to me! :-D Thanks! |
Clang changes LGTM modulo the dependency scanner bits.
Alright, I’ll wait for a reply from @jansvoboda11 then |
I assume that @benlangmuir added the scanner unit-test to demonstrate the current behavior instead of trying to make sure it's preserved. I think making it so that the test passes (actually handles raw string literals) and updating the FIXME in |
To clarify, that means setting the |
Yes, that's what I had in mind 👍 |
Alright, I just enabled raw string literals in the dependency scanner by default; barring any further complications, I’ll merge this once CI is done. |
Correct. They're only interesting to the scanner insofar as they're used in Thanks for working on this! |
Silly me forgot to actually update the test after enabling raw string literals in the dependency scanner, but now everything should pass. |
This enables raw R"" string literals in C in some language modes and adds an option to disable or enable them explicitly as an extension.

Background: GCC supports raw string literals in C in `-gnuXY` modes starting with gnu99. This PR both enables raw string literals in gnu99 mode and later in C and adds an `-f[no-]raw-string-literals` flag to override this behaviour.

The decision not to enable raw string literals in gnu89 mode, according to the GCC devs, is intentional as that mode is supposed to be used for ‘old code’ that they don’t want to break; we’ve decided to match GCC’s behaviour here as well.

The `-fraw-string-literals` flag can additionally be used to enable raw string literals in modes where they aren’t enabled by default (such as c99, as opposed to gnu99, or even e.g. C++03); conversely, the negated flag can be used to disable them in any gnuXY modes that *do* provide them by default, or to override a previous flag. However, we do *not* support disabling raw string literals (or indeed either of these two options) in C++11 mode and later, because we don’t want to just start supporting disabling features that are actually part of the language in the general case.

This fixes llvm#85703.
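The final behaviour described in this commit message can be summarised as a small decision function. The following C++11 sketch is only a model of that description: `Mode`, `Flag`, and `rawStringLiteralsEnabled` are hypothetical names and are not Clang's actual types or code. It also assumes, following the review discussion about matching GCC, that `-std=gnu++03` does not enable raw string literals by default.

```cpp
// Hypothetical model of a language mode; these are not Clang's real types.
struct Mode {
  bool isCPlusPlus;   // C++ rather than C
  bool isCPlusPlus11; // C++11 or later
  bool isC99;         // C99 or later (meaningful for C only)
  bool isGNU;         // -std=gnuXY or -std=gnu++XY
};

// State of -f[no-]raw-string-literals on the command line.
enum class Flag { Unset, On, Off };

bool rawStringLiteralsEnabled(const Mode &M, Flag F) {
  // C++11 and later: always on; the flag is not honoured.
  if (M.isCPlusPlus11)
    return true;
  // Otherwise an explicit flag wins.
  if (F != Flag::Unset)
    return F == Flag::On;
  // Default: on in gnu99 and later C modes, off elsewhere
  // (gnu89, c99, C++03, and, by assumption, gnu++03).
  return !M.isCPlusPlus && M.isGNU && M.isC99;
}
```

For example, `rawStringLiteralsEnabled({false, false, true, true}, Flag::Unset)` models `-std=gnu99` and returns true, while the same mode with `Flag::Off` models `-std=gnu99 -fno-raw-string-literals` and returns false.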
This is a tentative implementation of support for raw string literals in C following the discussion on #85703.

GCC supports raw string literals in C in `-gnuXY` mode. This PR both enables raw string literals in `-gnuXY` mode in C and adds a `-f[no-]raw-string-literals` flag to override this behaviour. There are a few questions I still have though:

- GCC does not seem to support raw string literals in C++ before C++11, even if e.g. `-std=gnu++03` is passed. Should we follow this behaviour or should we enable raw string literals in earlier C++ language modes as well if `-gnu++XY` is passed? `-fraw-string-literals` currently makes it possible to enable them in e.g. C++03.
- `-fno-raw-string-literals` allows users to disable raw string literals in `-gnuXY` mode. I thought it might be useful to have this, but do we want it?
- The implementation of this currently adds a `RawStringLiterals` option to the LangOpts; `-f[no-]raw-string-literals` overrides the default value for it which depends on the language standard. As a consequence, passing e.g. `-std=c++11 -fno-raw-string-literals` will disable raw string literals even though we’re in C++11 mode. Do we want to allow this or should we just ignore `-f[no-]raw-string-literals` if we’re in C++11 or later?
- This probably deserves a note in `LanguageExtensions.rst`, but I’m not exactly sure where.
- Should we add a flag for this to `__has_feature`/`__has_extension`?