Skip to content

[clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. #78996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

hnakamura5
Copy link
Contributor

Adds the support for tokens that have forms like unary operators.

  • bang operators: !name
  • cond operator: !cond
  • numeric literals: +1, -1
    cond operator are one of bang operators but is distinguished because it has very specific syntax.

@hnakamura5 hnakamura5 changed the title [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literal. [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. Jan 22, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 22, 2024

@llvm/pr-subscribers-clang-format

Author: Hirofumi Nakamura (hnakamura5)

Changes

Adds the support for tokens that have forms like unary operators.

  • bang operators: !name
  • cond operator: !cond
  • numeric literals: +1, -1
    cond operator are one of bang operators but is distinguished because it has very specific syntax.

Full diff: https://github.com/llvm/llvm-project/pull/78996.diff

3 Files Affected:

  • (modified) clang/lib/Format/FormatToken.h (+2)
  • (modified) clang/lib/Format/FormatTokenLexer.cpp (+38-7)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+20-4)
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index dede89f2600150f..bace91b5f99b4df 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -148,6 +148,8 @@ namespace format {
   TYPE(StructLBrace)                                                           \
   TYPE(StructRBrace)                                                           \
   TYPE(StructuredBindingLSquare)                                               \
+  TYPE(TableGenBangOperator)                                                   \
+  TYPE(TableGenCondOperator)                                                   \
   TYPE(TableGenMultiLineString)                                                \
   TYPE(TemplateCloser)                                                         \
   TYPE(TemplateOpener)                                                         \
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index 52a55ea23b5f2f7..d7de09ef0e12ab6 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
-  // TableGen's Multi line string starts with [{
-  if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
-                                           TT_TableGenMultiLineString)) {
-    // Set again with finalizing. This must never be annotated as other types.
-    Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
-    Tokens.back()->Tok.setKind(tok::string_literal);
-    return;
+  if (Style.isTableGen()) {
+    // TableGen's Multi line string starts with [{
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      // Set again with finalizing. This must never be annotated as other types.
+      Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    // TableGen's bang operator is the form !<name>.
+    // !cond is a special case with specific syntax.
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+      else
+        Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if".  That is, ! captures if even when the space
+      // exists. That is only one possibility in TableGen's syntax.
+      Tokens.back()->Tok.setKind(tok::identifier);
+      Tokens.back()->Tok.setIdentifierInfo(nullptr);
+      Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+      return;
+    }
+    // +, - with numbers are literals. Not unary operators.
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
+    if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
   }
 }
 
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 3dbf504c35ed55e..cb93930e0fc3bc8 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2210,16 +2210,24 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
   EXPECT_TRUE(Tokens[0]->IsMultiline);
   EXPECT_EQ(Tokens[0]->LastLineColumnWidth, sizeof("   the string. }]") - 1);
 
+  // Numeric literals.
+  Tokens = Annotate("1234");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("-1");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("+1234");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("0b0110");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+  Tokens = Annotate("0x1abC");
+  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+
   // Identifier tokens. In TableGen, identifiers can begin with a number.
   // In ambiguous cases, the lexer tries to lex it as a number.
   // Even if the try fails, it does not fall back to identifier lexing and
   // regard as an error.
   // The ambiguity is not documented. The result of those tests are based on the
   // implementation of llvm::TGLexer::LexToken.
-  Tokens = Annotate("1234");
-  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
-  Tokens = Annotate("0x1abC");
-  EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
   // This is invalid syntax of number, but not an identifier.
   Tokens = Annotate("0x1234x");
   EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
@@ -2244,6 +2252,14 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
   EXPECT_TOKEN(Tokens[6], tok::l_brace, TT_ElseLBrace);
   Tokens = Annotate("defset Foo Def2 = {}");
   EXPECT_TOKEN(Tokens[4], tok::l_brace, TT_FunctionLBrace);
+
+  // Bang Operators.
+  Tokens = Annotate("!foreach");
+  EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenBangOperator);
+  Tokens = Annotate("!if");
+  EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenBangOperator);
+  Tokens = Annotate("!cond");
+  EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenCondOperator);
 }
 
 TEST_F(TokenAnnotatorTest, UnderstandConstructors) {

Copy link
Contributor

@mydeveloperday mydeveloperday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't use TableGen myself so I can't really say if this is quite the correct thing to do, I feel it needs a TableGenTest.cpp example of real TableGen code for every example you give here. Then we should try and cover the other use cases. That we might be breaking

I have slight concern of merging + and a number because of X + 1 type logic that might exist. but Like I said I don't know enough about TableGen syntax

// +, - with numbers are literals. Not unary operators.
if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
Tokens.back()->Tok.setKind(tok::numeric_constant);
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a TableGen expert, but what does this do to [Offset + 1] type code in a td file? could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://llvm.org/docs/TableGen/ProgRef.html#values-and-expressions

As far as I read from the manual, TableGen does not have + as infix binary operator.
And as noted in the warning above, - is lexed as the integer's prefix rather than infix operator for range and slice.

could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?

I agree, and actually there is a comprehensive set of unit test for TableGen's syntax here. (Even this may be missing real examples of TableGen usage in target definition, mlir and so on.)
https://github.com/llvm/llvm-project/pull/76059/files#diff-2ce45a84684fe19d813e79bab2f732809f3544d38f344e3d2cfe23aa9216a1c8

Current pull request is separated from this PR. I'm wondering when to add the test. Because now it only recognizes tokens, and cannot format many part of that yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mydeveloperday
Thank you for reviewing this pull request.
Now it comes to 1 week since this PR is started. I want to continue before I forget.
Could you mind accepting or adding some suggestion?
Or if you do not intend neither, can I request another reviewer?

}
if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
Tokens.back()->Tok.setKind(tok::numeric_constant);
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't say anything because I was also waiting on @mydeveloperday.

I see no problem with the current approach and think token annotator tests are enough.

@hnakamura5 hnakamura5 merged commit 0058263 into llvm:main Jan 30, 2024
@hnakamura5
Copy link
Contributor Author

@HazardyKnusperkeks
Thank you for checking and accepting!

@mydeveloperday
You will be able to see the points in the consequent PRs.

Copy link
Contributor

@mydeveloperday mydeveloperday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants