Skip to content

[clang-format] Recognize TableGen paste operator on separate line #133722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 10, 2025

Conversation

sstwcw
Copy link
Contributor

@sstwcw sstwcw commented Mar 31, 2025

Formatting this piece of code made the program crash.

class TypedVecListRegOperand<RegisterClass Reg, int lanes, string eltsize>
    : RegisterOperand<Reg, "printTypedVectorList<" # lanes # ", '"
                                                   # eltsize # "'>">;

The line starting with the # was treated as a separate preprocessor directive line. Then the code dereferenced a null pointer when it tried to continue parsing the first line that did not end in a semicolon.

Now the 2 problems are fixed.

Formatting this piece of code made the program crash.

```
class TypedVecListRegOperand<RegisterClass Reg, int lanes, string eltsize>
    : RegisterOperand<Reg, "printTypedVectorList<" # lanes # ", '"
                                                   # eltsize # "'>">;
```

The line starting with the `#` was treated as a separate preprocessor
directive line.  Then the code dereferenced a null pointer when it tried
to continue parsing the first line that did not end in a semicolon.

Now the 2 problems are fixed.
@llvmbot
Copy link
Member

llvmbot commented Mar 31, 2025

@llvm/pr-subscribers-clang-format

Author: None (sstwcw)

Changes

Formatting this piece of code made the program crash.

class TypedVecListRegOperand&lt;RegisterClass Reg, int lanes, string eltsize&gt;
    : RegisterOperand&lt;Reg, "printTypedVectorList&lt;" # lanes # ", '"
                                                   # eltsize # "'&gt;"&gt;;

The line starting with the # was treated as a separate preprocessor directive line. Then the code dereferenced a null pointer when it tried to continue parsing the first line that did not end in a semicolon.

Now the 2 problems are fixed.


Full diff: https://github.com/llvm/llvm-project/pull/133722.diff

4 Files Affected:

  • (modified) clang/lib/Format/TokenAnnotator.cpp (+2)
  • (modified) clang/lib/Format/UnwrappedLineParser.cpp (+9-2)
  • (modified) clang/unittests/Format/FormatTestTableGen.cpp (+6)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+17)
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index d87b3a6088bd8..278355aa58586 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -949,6 +949,8 @@ class AnnotatingParser {
       HashTok->setType(TT_Unknown);
       if (!parseTableGenValue(ParseNameMode))
         return false;
+      if (!CurrentToken)
+        return true;
     }
     // In name mode, '{' is regarded as the end of the value.
     // See TGParser::ParseValue in TGParser.cpp
diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp
index f7712bea01c2c..aa0c372d5e15f 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -4853,9 +4853,16 @@ void UnwrappedLineParser::readToken(int LevelDifference) {
     PreviousWasComment = FormatTok->is(tok::comment);
 
     while (!Line->InPPDirective && FormatTok->is(tok::hash) &&
-           (!Style.isVerilog() ||
-            Keywords.isVerilogPPDirective(*Tokens->peekNextToken())) &&
            FirstNonCommentOnLine) {
+      // In Verilog, the backtick is used for macro invocations. In TableGen,
+      // the single hash is used for the paste operator.
+      const FormatToken *Next = Tokens->peekNextToken();
+      assert(Next); // There is an EOF token at the end.
+      if ((Style.isVerilog() && !Keywords.isVerilogPPDirective(*Next)) ||
+          (Style.isTableGen() &&
+           !Next->isOneOf(tok::pp_define, tok::pp_ifdef, tok::pp_ifndef))) {
+        break;
+      }
       distributeComments(Comments, FormatTok);
       Comments.clear();
       // If there is an unfinished unwrapped line, we flush the preprocessor
diff --git a/clang/unittests/Format/FormatTestTableGen.cpp b/clang/unittests/Format/FormatTestTableGen.cpp
index 92377c31f2e91..b78f79f20704f 100644
--- a/clang/unittests/Format/FormatTestTableGen.cpp
+++ b/clang/unittests/Format/FormatTestTableGen.cpp
@@ -218,6 +218,12 @@ TEST_F(FormatTestTableGen, PasteOperator) {
                "  string Z = [\"Traring\", \"Paste\", \"Traring\", \"Paste\",\n"
                "              \"Traring\", \"Paste\"]#;\n"
                "}");
+  verifyFormat("def x#x {}", "def x\n"
+                             "#x {}");
+  verifyFormat("def x#x {}", "def x\n"
+                             "#\n"
+                             "x {}");
+  verifyFormat("def x#x");
 }
 
 TEST_F(FormatTestTableGen, ClassDefinition) {
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index ac5e979aea071..fb8f5d30a669f 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2832,6 +2832,23 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
   Tokens = Annotate("!cond");
   EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenCondOperator);
 
+  // The paste operator should not be treated as a preprocessor directive even
+  // if it is on a separate line.
+  Tokens = Annotate("def x\n"
+                    "#embed {}");
+  ASSERT_EQ(Tokens.size(), 7u) << Tokens;
+  EXPECT_TOKEN(Tokens[1], tok::identifier, TT_StartOfName);
+  EXPECT_TOKEN(Tokens[2], tok::hash, TT_Unknown);
+  EXPECT_EQ(Tokens[1]->Next, Tokens[2]);
+  Tokens = Annotate("def x\n"
+                    "#define x\n"
+                    "#embed {}");
+  ASSERT_EQ(Tokens.size(), 10u) << Tokens;
+  EXPECT_TOKEN(Tokens[1], tok::identifier, TT_StartOfName);
+  EXPECT_TOKEN(Tokens[2], tok::hash, TT_Unknown);
+  EXPECT_TOKEN(Tokens[5], tok::hash, TT_Unknown);
+  EXPECT_EQ(Tokens[1]->Next, Tokens[5]);
+
   auto AnnotateValue = [this, &Style](StringRef Code) {
     // Values are annotated only in specific context.
     auto Result = annotate(("def X { let V = " + Code + "; }").str(), Style);

Comment on lines 4859 to 4860
const FormatToken *Next = Tokens->peekNextToken();
assert(Next); // There is an EOF token at the end.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const FormatToken *Next = Tokens->peekNextToken();
assert(Next); // There is an EOF token at the end.
const auto *Next = Tokens->peekNextToken();

It seems that peekNextToken() never returns null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that asserts were for things that never happen. So I added one. Is this something that should happen even less so that asserts are no good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO assert() should not be used on something that's obviously true, and for this reason we never assert on peekNextToken().

@sstwcw sstwcw merged commit f7617f7 into llvm:main Apr 10, 2025
11 checks passed
@sstwcw sstwcw deleted the format-table-gen-paste branch April 10, 2025 12:52
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…vm#133722)

Formatting this piece of code made the program crash.

```
class TypedVecListRegOperand<RegisterClass Reg, int lanes, string eltsize>
    : RegisterOperand<Reg, "printTypedVectorList<" # lanes # ", '"
                                                   # eltsize # "'>">;
```

The line starting with the `#` was treated as a separate preprocessor
directive line. Then the code dereferenced a null pointer when it tried
to continue parsing the first line that did not end in a semicolon.

Now the 2 problems are fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants