Commit 8b0d112

[Clang][Preprocessor] Expand UCNs in macro concatenation (#145351)
Fixes #145240. A UCN in an identifier produced by preprocessor token pasting was not resolved to its Unicode character, which could cause the following error:

```c
#define CAT(a,b) a##b

char foo\u00b5;
char*p = &CAT(foo, \u00b5); // error: use of undeclared identifier 'foo\u00b5'
```

The real identifier after pasting is `fooµ`. This PR fixes the issue in `TokenLexer::pasteTokens`: if any of the pasted tokens contains a UCN, the final pasted token gets the `Token::HasUCN` flag, and `Preprocessor::LookUpIdentifierInfo` then expands the UCNs in that token.

Signed-off-by: yronglin <[email protected]>
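To make the "expand UCNs" step concrete, here is a minimal, self-contained sketch of what turning a spelling such as `foo\u00b5` into `fooµ` involves. It is not Clang's implementation: `appendUTF8`, `expandUCNs`, and `checkExample` are hypothetical names invented for this illustration, and the sketch assumes the expanded identifier is stored as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper (not a Clang API): append code point CP as UTF-8.
static void appendUTF8(std::uint32_t CP, std::string &Out) {
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
}

// Hypothetical helper: replace each well-formed \uXXXX or \UXXXXXXXX in
// Spelling with its UTF-8 encoding. Malformed sequences, surrogates, and
// values above U+10FFFF are copied through unchanged.
std::string expandUCNs(std::string_view Spelling) {
  std::string Out;
  for (std::size_t I = 0; I < Spelling.size();) {
    std::size_t Digits = 0;
    if (Spelling[I] == '\\' && I + 1 < Spelling.size()) {
      if (Spelling[I + 1] == 'u')
        Digits = 4;
      else if (Spelling[I + 1] == 'U')
        Digits = 8;
    }
    bool OK = Digits != 0 && I + 2 + Digits <= Spelling.size();
    std::uint32_t CP = 0;
    for (std::size_t J = 0; OK && J < Digits; ++J) {
      char C = Spelling[I + 2 + J];
      unsigned V = 0;
      if (C >= '0' && C <= '9')
        V = C - '0';
      else if (C >= 'a' && C <= 'f')
        V = C - 'a' + 10;
      else if (C >= 'A' && C <= 'F')
        V = C - 'A' + 10;
      else
        OK = false;
      CP = (CP << 4) | V;
    }
    OK = OK && CP <= 0x10FFFF && !(CP >= 0xD800 && CP <= 0xDFFF);
    if (OK) {
      appendUTF8(CP, Out);
      I += 2 + Digits;
    } else {
      Out += Spelling[I];
      ++I;
    }
  }
  return Out;
}

// Usage check mirroring the example above; "\xC2\xB5" is UTF-8 for U+00B5.
inline void checkExample() {
  assert(expandUCNs("foo\\u00b5") == "foo\xC2\xB5");
}
```

In this model, the pasted spelling `foo\u00b5` and the declared name `fooµ` only compare equal after expansion, which is why the lookup in the example fails when the expansion is skipped.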
1 parent 353f754 commit 8b0d112

3 files changed, 22 additions and 0 deletions

clang/docs/ReleaseNotes.rst

Lines changed: 1 addition & 0 deletions
@@ -722,6 +722,7 @@ Bug Fixes in This Version
 - Fixed incorrect token location when emitting diagnostics for tokens expanded from macros. (#GH143216)
 - Fixed an infinite recursion when checking constexpr destructors. (#GH141789)
 - Fixed a crash when a malformed using declaration appears in a ``constexpr`` function. (#GH144264)
+- Fixed a bug when use unicode character name in macro concatenation. (#GH145240)
 
 Bug Fixes to Compiler Builtins
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

clang/lib/Lex/TokenLexer.cpp

Lines changed: 11 additions & 0 deletions
@@ -748,6 +748,7 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
   const char *ResultTokStrPtr = nullptr;
   SourceLocation StartLoc = LHSTok.getLocation();
   SourceLocation PasteOpLoc;
+  bool HasUCNs = false;
 
   auto IsAtEnd = [&TokenStream, &CurIdx] {
     return TokenStream.size() == CurIdx;
@@ -885,6 +886,9 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
 
     // Finally, replace LHS with the result, consume the RHS, and iterate.
     ++CurIdx;
+
+    // Set Token::HasUCN flag if LHS or RHS contains any UCNs.
+    HasUCNs = LHSTok.hasUCN() || RHS.hasUCN() || HasUCNs;
     LHSTok = Result;
   } while (!IsAtEnd() && TokenStream[CurIdx].is(tok::hashhash));
 
@@ -913,6 +917,13 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
   // token pasting re-lexes the result token in raw mode, identifier information
   // isn't looked up. As such, if the result is an identifier, look up id info.
   if (LHSTok.is(tok::raw_identifier)) {
+
+    // If there has any UNCs in concated token, we should mark this token
+    // with Token::HasUCN flag, then LookUpIdentifierInfo will expand UCNs in
+    // token.
+    if (HasUCNs)
+      LHSTok.setFlag(Token::HasUCN);
+
     // Look up the identifier info for the token. We disabled identifier lookup
     // by saying we're skipping contents, so we need to do this manually.
     PP.LookUpIdentifierInfo(LHSTok);
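
The reason for the separate `HasUCNs` accumulator, rather than checking only the last operands, can be seen in a toy model of a paste chain such as `a##b##c`. The sketch below is an illustration, not Clang code: `ToyToken`, `pasteAll`, and `checkPasteChain` are hypothetical names, and it assumes that each intermediate paste result starts with its flags cleared, which is what the accumulator in the diff suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for clang::Token with only the two fields this sketch needs.
struct ToyToken {
  std::string Spelling;
  bool HasUCN;
};

// Paste Toks[0] ## Toks[1] ## ... from left to right, remembering whether
// any operand carried the UCN flag, as the loop in the diff does.
ToyToken pasteAll(const std::vector<ToyToken> &Toks) {
  ToyToken LHS = Toks.empty() ? ToyToken{} : Toks.front();
  bool HasUCNs = false;
  for (std::size_t I = 1; I < Toks.size(); ++I) {
    const ToyToken &RHS = Toks[I];
    // Record the flag before LHS is overwritten by the pasted result.
    HasUCNs = LHS.HasUCN || RHS.HasUCN || HasUCNs;
    // Assumption: the freshly built result token has no flags set.
    ToyToken Result{LHS.Spelling + RHS.Spelling, false};
    LHS = Result;
  }
  // Mark the final token once, after the whole chain has been pasted.
  if (HasUCNs)
    LHS.HasUCN = true;
  return LHS;
}

// Usage check: the UCN sits in the middle operand of a three-token chain.
inline void checkPasteChain() {
  ToyToken T = pasteAll({{"foo", false}, {"\\u00b5", true}, {"bar", false}});
  assert(T.Spelling == "foo\\u00b5bar");
  assert(T.HasUCN);
}
```

Under that assumption, a check of only the final `LHS` and `RHS` would miss the UCN contributed by the middle operand, because the intermediate result no longer carries the flag.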
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -fms-extensions %s -verify
+// RUN: %clang_cc1 -E -fms-extensions %s | FileCheck %s
+// expected-no-diagnostics
+
+#define CAT(a,b) a##b
+
+char foo\u00b5;
+char*p = &CAT(foo, \u00b5);
+// CHECK: char fooµ;
+// CHECK-NEXT: char*p = &fooµ;
