[Clang][Preprocessor] Expand UCNs in macro concatenation #145351

yronglin · 2025-06-23T16:03:45Z

A universal character name (UCN) in an identifier produced by preprocessor token pasting is not resolved to its Unicode character, which can cause the following error:

#define CAT(a,b) a##b

char foo\u00b5;
char*p = &CAT(foo, \u00b5); // error: use of undeclared identifier 'foo\u00b5'

The identifier after pasting should be fooµ. This PR fixes the issue in TokenLexer::pasteTokens: if any of the pasted tokens contains a UCN, the final pasted token is marked with the Token::HasUCN flag, so that Preprocessor::LookUpIdentifierInfo expands the UCNs in it.
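
To make "expand UCNs" concrete, here is a minimal standalone sketch (C++17) of what expansion means for an identifier spelling. It is not Clang's implementation: `expandUCNs`, `appendUTF8`, and `hexValue` are hypothetical helpers written for illustration only, and apart from rejecting values above U+10FFFF the sketch does none of the validation a real lexer performs (for example, whether the character is allowed in an identifier).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper (not a Clang API): append code point CP to Out as UTF-8.
static void appendUTF8(std::string &Out, std::uint32_t CP) {
  assert(CP <= 0x10FFFF && "caller must reject out-of-range code points");
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
}

// Hypothetical helper: value of a hex digit, or -1 if C is not one.
static int hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Hypothetical helper: replace every well-formed \uXXXX or \UXXXXXXXX in
// Spelling with the UTF-8 encoding of that code point. Malformed or
// out-of-range escapes are copied through unchanged.
std::string expandUCNs(std::string_view Spelling) {
  std::string Out;
  std::size_t I = 0;
  while (I < Spelling.size()) {
    if (Spelling[I] == '\\' && I + 1 < Spelling.size() &&
        (Spelling[I + 1] == 'u' || Spelling[I + 1] == 'U')) {
      const std::size_t Digits = Spelling[I + 1] == 'u' ? 4 : 8;
      if (I + 2 + Digits <= Spelling.size()) {
        std::uint32_t CP = 0;
        bool Valid = true;
        for (std::size_t J = 0; J < Digits; ++J) {
          const int V = hexValue(Spelling[I + 2 + J]);
          if (V < 0) {
            Valid = false;
            break;
          }
          CP = (CP << 4) | static_cast<std::uint32_t>(V);
        }
        if (Valid && CP <= 0x10FFFF) {
          appendUTF8(Out, CP);
          I += 2 + Digits;
          continue;
        }
      }
    }
    Out += Spelling[I++];
  }
  return Out;
}
```

With this sketch, the pasted spelling `foo\u00b5` expands to the UTF-8 bytes of fooµ (`foo` followed by 0xC2 0xB5), which is the same name the declaration `char foo\u00b5;` introduces. Before the fix, the pasted token was looked up under its unexpanded spelling, hence the "undeclared identifier" error.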

Signed-off-by: yronglin <[email protected]>

llvmbot · 2025-06-23T16:04:17Z

@llvm/pr-subscribers-clang

Author: None (yronglin)

Changes

Fixes #145240.

Full diff: https://github.com/llvm/llvm-project/pull/145351.diff

3 Files Affected:

(modified) clang/docs/ReleaseNotes.rst (+1)
(modified) clang/lib/Lex/TokenLexer.cpp (+11)
(added) clang/test/Preprocessor/macro_paste_identifier_ucn.c (+10)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 96477ef6ddc9a..af107a2d51062 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -720,6 +720,7 @@ Bug Fixes in This Version
 - Fixed incorrect token location when emitting diagnostics for tokens expanded from macros. (#GH143216)
 - Fixed an infinite recursion when checking constexpr destructors. (#GH141789)
 - Fixed a crash when a malformed using declaration appears in a ``constexpr`` function. (#GH144264)
+- Fixed a bug where universal character names (UCNs) in identifiers formed by macro token pasting were not expanded. (#GH145240)
 
 Bug Fixes to Compiler Builtins
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/lib/Lex/TokenLexer.cpp b/clang/lib/Lex/TokenLexer.cpp
index 6e93416e01c0c..72f1ffa7ed06e 100644
--- a/clang/lib/Lex/TokenLexer.cpp
+++ b/clang/lib/Lex/TokenLexer.cpp
@@ -748,6 +748,7 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
   const char *ResultTokStrPtr = nullptr;
   SourceLocation StartLoc = LHSTok.getLocation();
   SourceLocation PasteOpLoc;
+  bool HasUCNs = false;
 
   auto IsAtEnd = [&TokenStream, &CurIdx] {
     return TokenStream.size() == CurIdx;
@@ -885,6 +886,9 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
 
     // Finally, replace LHS with the result, consume the RHS, and iterate.
     ++CurIdx;
+
+    // Set Token::HasUCN flag if LHS or RHS contains any UCNs.
+    HasUCNs = LHSTok.hasUCN() || RHS.hasUCN() || HasUCNs;
     LHSTok = Result;
   } while (!IsAtEnd() && TokenStream[CurIdx].is(tok::hashhash));
 
@@ -913,6 +917,13 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
   // token pasting re-lexes the result token in raw mode, identifier information
   // isn't looked up.  As such, if the result is an identifier, look up id info.
   if (LHSTok.is(tok::raw_identifier)) {
+
+    // If any of the pasted tokens contained a UCN, mark the result with the
+    // Token::HasUCN flag so that LookUpIdentifierInfo expands the UCNs in the
+    // token.
+    if (HasUCNs)
+      LHSTok.setFlag(Token::HasUCN);
+
     // Look up the identifier info for the token.  We disabled identifier lookup
     // by saying we're skipping contents, so we need to do this manually.
     PP.LookUpIdentifierInfo(LHSTok);
diff --git a/clang/test/Preprocessor/macro_paste_identifier_ucn.c b/clang/test/Preprocessor/macro_paste_identifier_ucn.c
new file mode 100644
index 0000000000000..c9eb8190edfe8
--- /dev/null
+++ b/clang/test/Preprocessor/macro_paste_identifier_ucn.c
@@ -0,0 +1,10 @@
+// RUN: %clang_cc1 -fms-extensions %s -verify
+// RUN: %clang_cc1 -E -fms-extensions %s | FileCheck %s
+// expected-no-diagnostics
+
+#define CAT(a,b) a##b
+
+char foo\u00b5;
+char*p = &CAT(foo, \u00b5);
+// CHECK: char fooµ;
+// CHECK-NEXT: char*p = &fooµ;
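
The `|| HasUCNs` term in the patch matters for chained pastes such as `a##b##c`. The following toy model sketches why. It is a simplification under stated assumptions, not Clang code: `ToyToken` and `pasteAll` are hypothetical names, and the model assumes that each intermediate paste result starts out without the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a preprocessor token; this is not clang::Token.
struct ToyToken {
  std::string Spelling;
  bool HasUCN;
};

// Hypothetical model of a chained paste T0 ## T1 ## T2 ...
// Mirrors the shape of the patch: accumulate a running HasUCNs flag across
// iterations, then apply it once to the final token.
ToyToken pasteAll(const std::vector<ToyToken> &Toks) {
  assert(!Toks.empty() && "need at least one token to paste");
  ToyToken LHS = Toks.front();
  bool HasUCNs = false;
  for (std::size_t I = 1; I < Toks.size(); ++I) {
    const ToyToken &RHS = Toks[I];
    // Assumption of this model: the freshly built result carries no flags.
    ToyToken Result{LHS.Spelling + RHS.Spelling, false};
    // Remember a UCN seen in this step or in any earlier step.
    HasUCNs = LHS.HasUCN || RHS.HasUCN || HasUCNs;
    LHS = Result;
  }
  if (HasUCNs)
    LHS.HasUCN = true;
  return LHS;
}
```

In this model, pasting `\u00b5`, `a`, and `b` loses the flag on the intermediate result after the first step. Only the running `HasUCNs` value carries it to the final token, which is what then lets identifier lookup expand the UCN.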

shafik

A summary should not just be a link to an issue; at a minimum it should briefly explain the problem and the fix. For a simple PR, a reviewer should be able to digest the PR without leaving the review to look for more information.

yronglin · 2025-06-24T01:42:17Z

Thanks for your review!

> A summary should not just be a link to an issue; at a minimum it should briefly explain the problem and the fix. For a simple PR, a reviewer should be able to digest the PR without leaving the review to look for more information.

Sorry about that, I'll update the summary.

cor3ntin

LGTM

yronglin · 2025-06-24T16:37:47Z

Thanks for the review!

Fixes llvm#145240.

A universal character name (UCN) in an identifier produced by preprocessor token pasting is not resolved to its Unicode character, which can cause the following error:

```c
#define CAT(a,b) a##b

char foo\u00b5;
char*p = &CAT(foo, \u00b5); // error: use of undeclared identifier 'foo\u00b5'
```

The identifier after pasting should be `fooµ`. This PR fixes the issue in `TokenLexer::pasteTokens`: if any of the pasted tokens contains a UCN, the final pasted token is marked with the `Token::HasUCN` flag, so that `Preprocessor::LookUpIdentifierInfo` expands the UCNs in it.

Signed-off-by: yronglin <[email protected]>

[Clang][Preprocessor] Expand UCNs in macro concatenation

4892892

Signed-off-by: yronglin <[email protected]>

yronglin requested review from cor3ntin, AaronBallman and erichkeane June 23, 2025 16:03

llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels Jun 23, 2025

yronglin requested a review from shafik June 23, 2025 16:54

shafik reviewed Jun 23, 2025

cor3ntin approved these changes Jun 24, 2025

yronglin merged commit 8b0d112 into llvm:main Jun 24, 2025
11 checks passed
