Skip to content

[clang][refactor] Refactor findNextTokenIncludingComments #123060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 16, 2025

Conversation

legrosbuffle
Copy link
Contributor

We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremenly similar to Lexer::findNextToken, so just add an extra agument to the latter.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang-tools-extra clang-tidy clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 15, 2025

@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-tidy

Author: Clement Courbet (legrosbuffle)

Changes

We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremenly similar to Lexer::findNextToken, so just add an extra agument to the latter.


Full diff: https://github.com/llvm/llvm-project/pull/123060.diff

7 Files Affected:

  • (modified) clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp (+3-31)
  • (modified) clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp (+1-1)
  • (modified) clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp (+2-3)
  • (modified) clang-tools-extra/clang-tidy/utils/LexerUtils.cpp (-23)
  • (modified) clang-tools-extra/clang-tidy/utils/LexerUtils.h (-4)
  • (modified) clang/include/clang/Lex/Lexer.h (+2-1)
  • (modified) clang/lib/Lex/Lexer.cpp (+3-1)
diff --git a/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp b/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp
index 80ee31368fe9a5..40c96f92254e42 100644
--- a/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp
+++ b/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp
@@ -118,35 +118,6 @@ findMembersUsedInInitExpr(const CXXCtorInitializer *Initializer,
   return Results;
 }
 
-/// Returns the next token after `Loc` (including comment tokens).
-static std::optional<Token> getTokenAfter(SourceLocation Loc,
-                                          const SourceManager &SM,
-                                          const LangOptions &LangOpts) {
-  if (Loc.isMacroID()) {
-    return std::nullopt;
-  }
-  Loc = Lexer::getLocForEndOfToken(Loc, 0, SM, LangOpts);
-
-  // Break down the source location.
-  std::pair<FileID, unsigned> LocInfo = SM.getDecomposedLoc(Loc);
-
-  // Try to load the file buffer.
-  bool InvalidTemp = false;
-  StringRef File = SM.getBufferData(LocInfo.first, &InvalidTemp);
-  if (InvalidTemp)
-    return std::nullopt;
-
-  const char *TokenBegin = File.data() + LocInfo.second;
-
-  Lexer lexer(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
-              TokenBegin, File.end());
-  lexer.SetCommentRetentionState(true);
-  // Find the token.
-  Token Tok;
-  lexer.LexFromRawLexer(Tok);
-  return Tok;
-}
-
 /// Returns the end of the trailing comments after `Loc`.
 static SourceLocation getEndOfTrailingComment(SourceLocation Loc,
                                               const SourceManager &SM,
@@ -154,11 +125,12 @@ static SourceLocation getEndOfTrailingComment(SourceLocation Loc,
   // We consider any following comment token that is indented more than the
   // first comment to be part of the trailing comment.
   const unsigned Column = SM.getPresumedColumnNumber(Loc);
-  std::optional<Token> Tok = getTokenAfter(Loc, SM, LangOpts);
+  std::optional<Token> Tok =
+      Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
   while (Tok && Tok->is(tok::comment) &&
          SM.getPresumedColumnNumber(Tok->getLocation()) > Column) {
     Loc = Tok->getEndLoc();
-    Tok = getTokenAfter(Loc, SM, LangOpts);
+    Tok = Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
   }
   return Loc;
 }
diff --git a/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp b/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp
index d24b613015d8ee..b4f54d02fc3362 100644
--- a/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp
+++ b/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp
@@ -59,7 +59,7 @@ SourceRange NS::getNamespaceBackRange(const SourceManager &SM,
   // Back from '}' to conditional '// namespace xxx'
   SourceLocation Loc = front()->getRBraceLoc();
   std::optional<Token> Tok =
-      utils::lexer::findNextTokenIncludingComments(Loc, SM, LangOpts);
+      Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
   if (!Tok)
     return getDefaultNamespaceBackRange();
   if (Tok->getKind() != tok::TokenKind::comment)
diff --git a/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp b/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp
index fd5bd9f0b181b1..6191ebfbfb01f0 100644
--- a/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp
+++ b/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp
@@ -229,9 +229,8 @@ void UseOverrideCheck::check(const MatchFinder::MatchResult &Result) {
   if (HasVirtual) {
     for (Token Tok : Tokens) {
       if (Tok.is(tok::kw_virtual)) {
-        std::optional<Token> NextToken =
-            utils::lexer::findNextTokenIncludingComments(
-                Tok.getEndLoc(), Sources, getLangOpts());
+        std::optional<Token> NextToken = Lexer::findNextToken(
+            Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments*/ true);
         if (NextToken.has_value()) {
           Diag << FixItHint::CreateRemoval(CharSourceRange::getCharRange(
               Tok.getLocation(), NextToken->getLocation()));
diff --git a/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp b/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp
index 92c3e0ed7894e1..50da196315d3b3 100644
--- a/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp
+++ b/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp
@@ -86,29 +86,6 @@ SourceLocation findNextTerminator(SourceLocation Start, const SourceManager &SM,
   return findNextAnyTokenKind(Start, SM, LangOpts, tok::comma, tok::semi);
 }
 
-std::optional<Token>
-findNextTokenIncludingComments(SourceLocation Start, const SourceManager &SM,
-                               const LangOptions &LangOpts) {
-  // `Lexer::findNextToken` will ignore comment
-  if (Start.isMacroID())
-    return std::nullopt;
-  Start = Lexer::getLocForEndOfToken(Start, 0, SM, LangOpts);
-  // Break down the source location.
-  std::pair<FileID, unsigned> LocInfo = SM.getDecomposedLoc(Start);
-  bool InvalidTemp = false;
-  StringRef File = SM.getBufferData(LocInfo.first, &InvalidTemp);
-  if (InvalidTemp)
-    return std::nullopt;
-  // Lex from the start of the given location.
-  Lexer L(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
-          File.data() + LocInfo.second, File.end());
-  L.SetCommentRetentionState(true);
-  // Find the token.
-  Token Tok;
-  L.LexFromRawLexer(Tok);
-  return Tok;
-}
-
 std::optional<Token>
 findNextTokenSkippingComments(SourceLocation Start, const SourceManager &SM,
                               const LangOptions &LangOpts) {
diff --git a/clang-tools-extra/clang-tidy/utils/LexerUtils.h b/clang-tools-extra/clang-tidy/utils/LexerUtils.h
index ea9bd512b68b8f..c3f8721bca5ffa 100644
--- a/clang-tools-extra/clang-tidy/utils/LexerUtils.h
+++ b/clang-tools-extra/clang-tidy/utils/LexerUtils.h
@@ -89,10 +89,6 @@ SourceLocation findNextAnyTokenKind(SourceLocation Start,
   }
 }
 
-std::optional<Token>
-findNextTokenIncludingComments(SourceLocation Start, const SourceManager &SM,
-                               const LangOptions &LangOpts);
-
 // Finds next token that's not a comment.
 std::optional<Token> findNextTokenSkippingComments(SourceLocation Start,
                                                    const SourceManager &SM,
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index b6ecc7e5ded9e2..82a041ea3f848a 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -554,7 +554,8 @@ class Lexer : public PreprocessorLexer {
   /// Returns the next token, or std::nullopt if the location is inside a macro.
   static std::optional<Token> findNextToken(SourceLocation Loc,
                                             const SourceManager &SM,
-                                            const LangOptions &LangOpts);
+                                            const LangOptions &LangOpts,
+                                            bool IncludeComments = false);
 
   /// Checks that the given token is the first token that occurs after
   /// the given location (this excludes comments and whitespace). Returns the
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 72364500a48f9f..115b6c1606a022 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -1323,7 +1323,8 @@ const char *Lexer::SkipEscapedNewLines(const char *P) {
 
 std::optional<Token> Lexer::findNextToken(SourceLocation Loc,
                                           const SourceManager &SM,
-                                          const LangOptions &LangOpts) {
+                                          const LangOptions &LangOpts,
+                                          bool IncludeComments) {
   if (Loc.isMacroID()) {
     if (!Lexer::isAtEndOfMacroExpansion(Loc, SM, LangOpts, &Loc))
       return std::nullopt;
@@ -1344,6 +1345,7 @@ std::optional<Token> Lexer::findNextToken(SourceLocation Loc,
   // Lex from the start of the given location.
   Lexer lexer(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
                                       TokenBegin, File.end());
+  lexer.SetCommentRetentionState(IncludeComments);
   // Find the token.
   Token Tok;
   lexer.LexFromRawLexer(Tok);

Copy link
Contributor

@cor3ntin cor3ntin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat :)

utils::lexer::findNextTokenIncludingComments(
Tok.getEndLoc(), Sources, getLangOpts());
std::optional<Token> NextToken = Lexer::findNextToken(
Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments*/ true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments*/ true);
Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments=*/true);

@@ -59,7 +59,7 @@ SourceRange NS::getNamespaceBackRange(const SourceManager &SM,
// Back from '}' to conditional '// namespace xxx'
SourceLocation Loc = front()->getRBraceLoc();
std::optional<Token> Tok =
utils::lexer::findNextTokenIncludingComments(Loc, SM, LangOpts);
Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);

Comment on lines 92 to 95
std::optional<Token>
findNextTokenIncludingComments(SourceLocation Start, const SourceManager &SM,
const LangOptions &LangOpts);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could keep that, and just use Lexer::findNextToken as its implementation (it would avoid the /*IncludeComments=*/true everywhere)

We have two copies of the same code in clang-tidy and
clang-reorder-fields, and those are extremenly similar to
`Lexer::findNextToken`, so just add an extra agument to the latter.
@legrosbuffle
Copy link
Contributor Author

Thanks, and sorry I had to force-push because github would not refresh the PR (https://github.com/orgs/community/discussions/78775)

Apply suggestions

Co-authored-by: cor3ntin <[email protected]>
Comment on lines +128 to +129
std::optional<Token> Tok =
Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::optional<Token> Tok =
Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);
std::optional<Token> Tok =
findNextTokenIncludingComments(Loc, SM, LangOpts);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that, right?

while (Tok && Tok->is(tok::comment) &&
SM.getPresumedColumnNumber(Tok->getLocation()) > Column) {
Loc = Tok->getEndLoc();
Tok = getTokenAfter(Loc, SM, LangOpts);
Tok = Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tok = Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);
Tok = findNextTokenIncludingComments(Loc, SM, LangOpts);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think clang-reorder-fields is supposed to depend on stuff from clang-tidy ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think it's not worth it (adding a dependency on clang-tidy for the sake of findNextTokenIncludingComments seems like overkill).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed actually, thanks

Copy link

github-actions bot commented Jan 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@alexander-shaposhnikov alexander-shaposhnikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@cor3ntin cor3ntin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@legrosbuffle legrosbuffle merged commit 1819646 into llvm:main Jan 16, 2025
8 checks passed
Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lexer::findNextToken has an additional check:

if (!Lexer::isAtEndOfMacroExpansion(Loc, SM, LangOpts, &Loc))

Does this not change behavior in the case we are replacing? I imagine folks just copied from Lexer::findNextToken and purposely left this out, is that wrong?

@@ -118,47 +118,19 @@ findMembersUsedInInitExpr(const CXXCtorInitializer *Initializer,
return Results;
}

/// Returns the next token after `Loc` (including comment tokens).
static std::optional<Token> getTokenAfter(SourceLocation Loc,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like just a better formatted findNextTokenIncludingComments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why we wanted to get rid of that code, it does not really make sense to have 3 quasi-identical copies of the same code.

@legrosbuffle
Copy link
Contributor Author

Lexer::findNextToken has an additional check:

if (!Lexer::isAtEndOfMacroExpansion(Loc, SM, LangOpts, &Loc))

Does this not change behavior in the case we are replacing? I imagine folks just copied from Lexer::findNextToken and purposely left this out, is that wrong?

I think the intent was clearly to copy the whole thing, the original author states: "This part of code copy from Lexer::findNextToken except this line. " source "this line" referring to L.SetCommentRetentionState(WithComment);.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category clang-tidy clang-tools-extra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants