-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. #78996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. #78996
Conversation
…rm, bang operators and numeric literal.
@llvm/pr-subscribers-clang-format Author: Hirofumi Nakamura (hnakamura5) ChangesAdds the support for tokens that have forms like unary operators.
Full diff: https://github.com/llvm/llvm-project/pull/78996.diff 3 Files Affected:
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index dede89f2600150f..bace91b5f99b4df 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -148,6 +148,8 @@ namespace format {
TYPE(StructLBrace) \
TYPE(StructRBrace) \
TYPE(StructuredBindingLSquare) \
+ TYPE(TableGenBangOperator) \
+ TYPE(TableGenCondOperator) \
TYPE(TableGenMultiLineString) \
TYPE(TemplateCloser) \
TYPE(TemplateOpener) \
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index 52a55ea23b5f2f7..d7de09ef0e12ab6 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
return;
}
}
- // TableGen's Multi line string starts with [{
- if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
- TT_TableGenMultiLineString)) {
- // Set again with finalizing. This must never be annotated as other types.
- Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
- Tokens.back()->Tok.setKind(tok::string_literal);
- return;
+ if (Style.isTableGen()) {
+ // TableGen's Multi line string starts with [{
+ if (tryMergeTokens({tok::l_square, tok::l_brace},
+ TT_TableGenMultiLineString)) {
+ // Set again with finalizing. This must never be annotated as other types.
+ Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
+ Tokens.back()->Tok.setKind(tok::string_literal);
+ return;
+ }
+ // TableGen's bang operator is the form !<name>.
+ // !cond is a special case with specific syntax.
+ if (tryMergeTokens({tok::exclaim, tok::identifier},
+ TT_TableGenBangOperator)) {
+ Tokens.back()->Tok.setKind(tok::identifier);
+ Tokens.back()->Tok.setIdentifierInfo(nullptr);
+ if (Tokens.back()->TokenText == "!cond")
+ Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
+ else
+ Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+ return;
+ }
+ if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+ // Here, "! if" becomes "!if". That is, ! captures if even when the space
+ // exists. That is only one possibility in TableGen's syntax.
+ Tokens.back()->Tok.setKind(tok::identifier);
+ Tokens.back()->Tok.setIdentifierInfo(nullptr);
+ Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
+ return;
+ }
+ // +, - with numbers are literals. Not unary operators.
+ if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+ Tokens.back()->Tok.setKind(tok::numeric_constant);
+ return;
+ }
+ if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
+ Tokens.back()->Tok.setKind(tok::numeric_constant);
+ return;
+ }
}
}
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 3dbf504c35ed55e..cb93930e0fc3bc8 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2210,16 +2210,24 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
EXPECT_TRUE(Tokens[0]->IsMultiline);
EXPECT_EQ(Tokens[0]->LastLineColumnWidth, sizeof(" the string. }]") - 1);
+ // Numeric literals.
+ Tokens = Annotate("1234");
+ EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+ Tokens = Annotate("-1");
+ EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+ Tokens = Annotate("+1234");
+ EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+ Tokens = Annotate("0b0110");
+ EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+ Tokens = Annotate("0x1abC");
+ EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
+
// Identifier tokens. In TableGen, identifiers can begin with a number.
// In ambiguous cases, the lexer tries to lex it as a number.
// Even if the try fails, it does not fall back to identifier lexing and
// regard as an error.
// The ambiguity is not documented. The result of those tests are based on the
// implementation of llvm::TGLexer::LexToken.
- Tokens = Annotate("1234");
- EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
- Tokens = Annotate("0x1abC");
- EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
// This is invalid syntax of number, but not an identifier.
Tokens = Annotate("0x1234x");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
@@ -2244,6 +2252,14 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
EXPECT_TOKEN(Tokens[6], tok::l_brace, TT_ElseLBrace);
Tokens = Annotate("defset Foo Def2 = {}");
EXPECT_TOKEN(Tokens[4], tok::l_brace, TT_FunctionLBrace);
+
+ // Bang Operators.
+ Tokens = Annotate("!foreach");
+ EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenBangOperator);
+ Tokens = Annotate("!if");
+ EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenBangOperator);
+ Tokens = Annotate("!cond");
+ EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenCondOperator);
}
TEST_F(TokenAnnotatorTest, UnderstandConstructors) {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't use TableGen myself so I can't really say if this is quite the correct thing to do, I feel it needs a TableGenTest.cpp example of real TableGen code for every example you give here. Then we should try and cover the other use cases. That we might be breaking
I have slight concern of merging +
and a number
because of X + 1
type logic that might exist. but Like I said I don't know enough about TableGen syntax
// +, - with numbers are literals. Not unary operators. | ||
if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) { | ||
Tokens.back()->Tok.setKind(tok::numeric_constant); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a TableGen expert, but what does this do to [Offset + 1]
type code in a td file? could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://llvm.org/docs/TableGen/ProgRef.html#values-and-expressions
As far as I read from the manual, TableGen does not have +
as infix binary operator.
And as noted in the warning above, -
is lexed as the integer's prefix rather than infix operator for range and slice.
could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?
I agree, and actually there is a comprehensive set of unit test for TableGen's syntax here. (Even this may be missing real examples of TableGen usage in target definition, mlir and so on.)
https://github.com/llvm/llvm-project/pull/76059/files#diff-2ce45a84684fe19d813e79bab2f732809f3544d38f344e3d2cfe23aa9216a1c8
Current pull request is separated from this PR. I'm wondering when to add the test. Because now it only recognizes tokens, and cannot format many part of that yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mydeveloperday
Thank you for reviewing this pull request.
Now it comes to 1 week since this PR is started. I want to continue before I forget.
Could you mind accepting or adding some suggestion?
Or if you do not intend neither, can I request another reviewer?
} | ||
if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) { | ||
Tokens.back()->Tok.setKind(tok::numeric_constant); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't say anything because I was also waiting on @mydeveloperday.
I see no problem with the current approach and think token annotator tests are enough.
@HazardyKnusperkeks @mydeveloperday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds the support for tokens that have forms like unary operators.
!name
!cond
+1
,-1
cond operator are one of bang operators but is distinguished because it has very specific syntax.