[Clang][Preprocessor] Expand UCNs in macro concatenation #145351
Signed-off-by: yronglin <[email protected]>
@llvm/pr-subscribers-clang
Author: None (yronglin)
Changes: Fixes #145240.
Full diff: https://github.com/llvm/llvm-project/pull/145351.diff
3 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 96477ef6ddc9a..af107a2d51062 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -720,6 +720,7 @@ Bug Fixes in This Version
- Fixed incorrect token location when emitting diagnostics for tokens expanded from macros. (#GH143216)
- Fixed an infinite recursion when checking constexpr destructors. (#GH141789)
- Fixed a crash when a malformed using declaration appears in a ``constexpr`` function. (#GH144264)
+- Fixed a bug where universal character names (UCNs) were not expanded in identifiers formed by macro token pasting. (#GH145240)
Bug Fixes to Compiler Builtins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/lib/Lex/TokenLexer.cpp b/clang/lib/Lex/TokenLexer.cpp
index 6e93416e01c0c..72f1ffa7ed06e 100644
--- a/clang/lib/Lex/TokenLexer.cpp
+++ b/clang/lib/Lex/TokenLexer.cpp
@@ -748,6 +748,7 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
const char *ResultTokStrPtr = nullptr;
SourceLocation StartLoc = LHSTok.getLocation();
SourceLocation PasteOpLoc;
+ bool HasUCNs = false;
auto IsAtEnd = [&TokenStream, &CurIdx] {
return TokenStream.size() == CurIdx;
@@ -885,6 +886,9 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
// Finally, replace LHS with the result, consume the RHS, and iterate.
++CurIdx;
+
+ // Set Token::HasUCN flag if LHS or RHS contains any UCNs.
+ HasUCNs = LHSTok.hasUCN() || RHS.hasUCN() || HasUCNs;
LHSTok = Result;
} while (!IsAtEnd() && TokenStream[CurIdx].is(tok::hashhash));
@@ -913,6 +917,13 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
// token pasting re-lexes the result token in raw mode, identifier information
// isn't looked up. As such, if the result is an identifier, look up id info.
if (LHSTok.is(tok::raw_identifier)) {
+
+ // If any of the pasted tokens contains a UCN, mark the resulting token
+ // with the Token::HasUCN flag so that LookUpIdentifierInfo expands the
+ // UCNs in it.
+ if (HasUCNs)
+ LHSTok.setFlag(Token::HasUCN);
+
// Look up the identifier info for the token. We disabled identifier lookup
// by saying we're skipping contents, so we need to do this manually.
PP.LookUpIdentifierInfo(LHSTok);
diff --git a/clang/test/Preprocessor/macro_paste_identifier_ucn.c b/clang/test/Preprocessor/macro_paste_identifier_ucn.c
new file mode 100644
index 0000000000000..c9eb8190edfe8
--- /dev/null
+++ b/clang/test/Preprocessor/macro_paste_identifier_ucn.c
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -fms-extensions %s -verify
+// RUN: %clang_cc1 -E -fms-extensions %s | FileCheck %s
+// expected-no-diagnostics
+
+#define CAT(a,b) a##b
+
+char foo\u00b5;
+char*p = &CAT(foo, \u00b5);
+// CHECK: char fooµ;
+// CHECK-NEXT: char*p = &fooµ;
A summary should not just be a link to an issue; it should at minimum briefly explain the problem and the fix. For a simple PR, a reviewer should be able to digest the PR without leaving the review to look for more information.
Thanks for your review!
Sorry about that, I'll update the summary.
LGTM
Thanks for the review!
Fixes llvm#145240.

A UCN in a preprocessor-pasted identifier is not resolved to its Unicode character, which can cause the following issue:

```c
#define CAT(a,b) a##b
char foo\u00b5;
char*p = &CAT(foo, \u00b5);
// error: use of undeclared identifier 'foo\u00b5'
```

The real identifier after pasting is `fooµ`. This PR fixes the issue in `TokenLexer::pasteTokens`: if any of the pasted tokens contains a UCN, the final pasted token gets the `Token::HasUCN` flag, and `Preprocessor::LookUpIdentifierInfo` then expands the UCNs in that token.

Signed-off-by: yronglin <[email protected]>
Fixes #145240.
A UCN in a preprocessor-pasted identifier is not resolved to its Unicode character. With `#define CAT(a,b) a##b`, the identifier produced by `CAT(foo, \u00b5)` does not match the declared `foo\u00b5`, and Clang reports `use of undeclared identifier 'foo\u00b5'`.
The real identifier after pasting is `fooµ`. This PR fixes the issue in `TokenLexer::pasteTokens`: if any of the pasted tokens contains a UCN, the final pasted token gets the `Token::HasUCN` flag, and `Preprocessor::LookUpIdentifierInfo` then expands the UCNs in that token.
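To make the mechanism easier to follow, here is a small standalone toy model of the idea, not Clang's implementation. `ToyToken`, `expandUCNs`, and `pasteAndLookUp` are hypothetical names invented for this sketch; it handles only well-formed `\uXXXX` escapes and does no validation of the code point. It assumes, as the description above suggests, that the pasted token does not carry the flag unless it is set explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a preprocessor token; this is NOT clang::Token.
struct ToyToken {
  std::string Spelling; // raw spelling, may contain \uXXXX escapes
  bool HasUCN;          // plays the role of the Token::HasUCN flag
};

static int hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Append the UTF-8 encoding of a code point below 0x10000.
static void appendUTF8(std::string &Out, uint32_t CP) {
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
}

// Replace each well-formed \uXXXX with its UTF-8 bytes; copy the rest as-is.
std::string expandUCNs(const std::string &S) {
  std::string Out;
  std::size_t I = 0;
  while (I < S.size()) {
    bool Ok = S[I] == '\\' && I + 6 <= S.size() && S[I + 1] == 'u';
    uint32_t CP = 0;
    for (std::size_t J = 0; Ok && J < 4; ++J) {
      int V = hexValue(S[I + 2 + J]);
      if (V < 0)
        Ok = false;
      else
        CP = CP * 16 + static_cast<uint32_t>(V);
    }
    if (Ok) {
      appendUTF8(Out, CP);
      I += 6;
    } else {
      Out += S[I++];
    }
  }
  return Out;
}

// Paste a chain of tokens (a ## b ## c ...) and return the identifier name
// that a lookup would see. PropagateFlag models the behaviour with the fix.
std::string pasteAndLookUp(const std::vector<ToyToken> &Toks,
                           bool PropagateFlag) {
  ToyToken Result{"", false}; // assumption: pasted token starts unflagged
  bool HasUCNs = false;       // accumulated over the whole paste chain
  for (const ToyToken &T : Toks) {
    Result.Spelling += T.Spelling;
    HasUCNs = HasUCNs || T.HasUCN;
  }
  if (PropagateFlag && HasUCNs)
    Result.HasUCN = true;
  // The lookup only expands UCNs when the flag says there are any.
  return Result.HasUCN ? expandUCNs(Result.Spelling) : Result.Spelling;
}
```

In this model, pasting `{"foo", false}` and `{"\\u00b5", true}` with `PropagateFlag` set yields the UTF-8 spelling of `fooµ`, while without it the name stays `foo\u00b5`, which mirrors the undeclared-identifier error described above. Accumulating the flag over the whole chain matters because a paste such as `a##b##c` may carry the UCN in any operand.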