Skip to content

[clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. #78996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions clang/lib/Format/FormatToken.h
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,8 @@ namespace format {
TYPE(StructLBrace) \
TYPE(StructRBrace) \
TYPE(StructuredBindingLSquare) \
TYPE(TableGenBangOperator) \
TYPE(TableGenCondOperator) \
TYPE(TableGenMultiLineString) \
TYPE(TemplateCloser) \
TYPE(TemplateOpener) \
Expand Down
45 changes: 38 additions & 7 deletions clang/lib/Format/FormatTokenLexer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -276,13 +276,44 @@ void FormatTokenLexer::tryMergePreviousTokens() {
return;
}
}
// TableGen's Multi line string starts with [{
if (Style.isTableGen() && tryMergeTokens({tok::l_square, tok::l_brace},
TT_TableGenMultiLineString)) {
// Set again with finalizing. This must never be annotated as other types.
Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
Tokens.back()->Tok.setKind(tok::string_literal);
return;
if (Style.isTableGen()) {
// TableGen's Multi line string starts with [{
if (tryMergeTokens({tok::l_square, tok::l_brace},
TT_TableGenMultiLineString)) {
// Set again with finalizing. This must never be annotated as other types.
Tokens.back()->setFinalizedType(TT_TableGenMultiLineString);
Tokens.back()->Tok.setKind(tok::string_literal);
return;
}
// TableGen's bang operator is the form !<name>.
// !cond is a special case with specific syntax.
if (tryMergeTokens({tok::exclaim, tok::identifier},
TT_TableGenBangOperator)) {
Tokens.back()->Tok.setKind(tok::identifier);
Tokens.back()->Tok.setIdentifierInfo(nullptr);
if (Tokens.back()->TokenText == "!cond")
Tokens.back()->setFinalizedType(TT_TableGenCondOperator);
else
Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
return;
}
if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
// Here, "! if" becomes "!if". That is, ! captures if even when the space
// exists. That is only one possibility in TableGen's syntax.
Tokens.back()->Tok.setKind(tok::identifier);
Tokens.back()->Tok.setIdentifierInfo(nullptr);
Tokens.back()->setFinalizedType(TT_TableGenBangOperator);
return;
}
// +, - with numbers are literals. Not unary operators.
if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
Tokens.back()->Tok.setKind(tok::numeric_constant);
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a TableGen expert, but what does this do to [Offset + 1] type code in a td file? could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://llvm.org/docs/TableGen/ProgRef.html#values-and-expressions

As far as I read from the manual, TableGen does not have + as infix binary operator.
And as noted in the warning above, - is lexed as the integer's prefix rather than infix operator for range and slice.

could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?could we build a better set of FormatTableGen unit tests to ensure we don't cause any regressions?

I agree, and actually there is a comprehensive set of unit test for TableGen's syntax here. (Even this may be missing real examples of TableGen usage in target definition, mlir and so on.)
https://github.com/llvm/llvm-project/pull/76059/files#diff-2ce45a84684fe19d813e79bab2f732809f3544d38f344e3d2cfe23aa9216a1c8

Current pull request is separated from this PR. I'm wondering when to add the test. Because now it only recognizes tokens, and cannot format many part of that yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mydeveloperday
Thank you for reviewing this pull request.
Now it comes to 1 week since this PR is started. I want to continue before I forget.
Could you mind accepting or adding some suggestion?
Or if you do not intend neither, can I request another reviewer?

}
if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
Tokens.back()->Tok.setKind(tok::numeric_constant);
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

}
}
}

Expand Down
24 changes: 20 additions & 4 deletions clang/unittests/Format/TokenAnnotatorTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2210,16 +2210,24 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
EXPECT_TRUE(Tokens[0]->IsMultiline);
EXPECT_EQ(Tokens[0]->LastLineColumnWidth, sizeof(" the string. }]") - 1);

// Numeric literals.
Tokens = Annotate("1234");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
Tokens = Annotate("-1");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
Tokens = Annotate("+1234");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
Tokens = Annotate("0b0110");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
Tokens = Annotate("0x1abC");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);

// Identifier tokens. In TableGen, identifiers can begin with a number.
// In ambiguous cases, the lexer tries to lex it as a number.
// Even if the try fails, it does not fall back to identifier lexing and
// regard as an error.
// The ambiguity is not documented. The result of those tests are based on the
// implementation of llvm::TGLexer::LexToken.
Tokens = Annotate("1234");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
Tokens = Annotate("0x1abC");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
// This is invalid syntax of number, but not an identifier.
Tokens = Annotate("0x1234x");
EXPECT_TOKEN(Tokens[0], tok::numeric_constant, TT_Unknown);
Expand All @@ -2244,6 +2252,14 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
EXPECT_TOKEN(Tokens[6], tok::l_brace, TT_ElseLBrace);
Tokens = Annotate("defset Foo Def2 = {}");
EXPECT_TOKEN(Tokens[4], tok::l_brace, TT_FunctionLBrace);

// Bang Operators.
Tokens = Annotate("!foreach");
EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenBangOperator);
Tokens = Annotate("!if");
EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenBangOperator);
Tokens = Annotate("!cond");
EXPECT_TOKEN(Tokens[0], tok::identifier, TT_TableGenCondOperator);
}

TEST_F(TokenAnnotatorTest, UnderstandConstructors) {
Expand Down