[clang][refactor] Refactor `findNextTokenIncludingComments` #123060

legrosbuffle · 2025-01-15T14:03:33Z

We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremenly similar to Lexer::findNextToken, so just add an extra agument to the latter.

llvmbot · 2025-01-15T14:04:04Z

@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-tidy

Author: Clement Courbet (legrosbuffle)

Changes

We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremenly similar to Lexer::findNextToken, so just add an extra agument to the latter.

Full diff: https://github.com/llvm/llvm-project/pull/123060.diff

7 Files Affected:

(modified) clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp (+3-31)
(modified) clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp (+1-1)
(modified) clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp (+2-3)
(modified) clang-tools-extra/clang-tidy/utils/LexerUtils.cpp (-23)
(modified) clang-tools-extra/clang-tidy/utils/LexerUtils.h (-4)
(modified) clang/include/clang/Lex/Lexer.h (+2-1)
(modified) clang/lib/Lex/Lexer.cpp (+3-1)

diff --git a/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp b/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp
index 80ee31368fe9a5..40c96f92254e42 100644
--- a/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp
+++ b/clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp
@@ -118,35 +118,6 @@ findMembersUsedInInitExpr(const CXXCtorInitializer *Initializer,
   return Results;
 }
 
-/// Returns the next token after `Loc` (including comment tokens).
-static std::optional<Token> getTokenAfter(SourceLocation Loc,
-                                          const SourceManager &SM,
-                                          const LangOptions &LangOpts) {
-  if (Loc.isMacroID()) {
-    return std::nullopt;
-  }
-  Loc = Lexer::getLocForEndOfToken(Loc, 0, SM, LangOpts);
-
-  // Break down the source location.
-  std::pair<FileID, unsigned> LocInfo = SM.getDecomposedLoc(Loc);
-
-  // Try to load the file buffer.
-  bool InvalidTemp = false;
-  StringRef File = SM.getBufferData(LocInfo.first, &InvalidTemp);
-  if (InvalidTemp)
-    return std::nullopt;
-
-  const char *TokenBegin = File.data() + LocInfo.second;
-
-  Lexer lexer(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
-              TokenBegin, File.end());
-  lexer.SetCommentRetentionState(true);
-  // Find the token.
-  Token Tok;
-  lexer.LexFromRawLexer(Tok);
-  return Tok;
-}
-
 /// Returns the end of the trailing comments after `Loc`.
 static SourceLocation getEndOfTrailingComment(SourceLocation Loc,
                                               const SourceManager &SM,
@@ -154,11 +125,12 @@ static SourceLocation getEndOfTrailingComment(SourceLocation Loc,
   // We consider any following comment token that is indented more than the
   // first comment to be part of the trailing comment.
   const unsigned Column = SM.getPresumedColumnNumber(Loc);
-  std::optional<Token> Tok = getTokenAfter(Loc, SM, LangOpts);
+  std::optional<Token> Tok =
+      Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
   while (Tok && Tok->is(tok::comment) &&
          SM.getPresumedColumnNumber(Tok->getLocation()) > Column) {
     Loc = Tok->getEndLoc();
-    Tok = getTokenAfter(Loc, SM, LangOpts);
+    Tok = Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
   }
   return Loc;
 }
diff --git a/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp b/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp
index d24b613015d8ee..b4f54d02fc3362 100644
--- a/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp
+++ b/clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp
@@ -59,7 +59,7 @@ SourceRange NS::getNamespaceBackRange(const SourceManager &SM,
   // Back from '}' to conditional '// namespace xxx'
   SourceLocation Loc = front()->getRBraceLoc();
   std::optional<Token> Tok =
-      utils::lexer::findNextTokenIncludingComments(Loc, SM, LangOpts);
+      Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);
   if (!Tok)
     return getDefaultNamespaceBackRange();
   if (Tok->getKind() != tok::TokenKind::comment)
diff --git a/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp b/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp
index fd5bd9f0b181b1..6191ebfbfb01f0 100644
--- a/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp
+++ b/clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp
@@ -229,9 +229,8 @@ void UseOverrideCheck::check(const MatchFinder::MatchResult &Result) {
   if (HasVirtual) {
     for (Token Tok : Tokens) {
       if (Tok.is(tok::kw_virtual)) {
-        std::optional<Token> NextToken =
-            utils::lexer::findNextTokenIncludingComments(
-                Tok.getEndLoc(), Sources, getLangOpts());
+        std::optional<Token> NextToken = Lexer::findNextToken(
+            Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments*/ true);
         if (NextToken.has_value()) {
           Diag << FixItHint::CreateRemoval(CharSourceRange::getCharRange(
               Tok.getLocation(), NextToken->getLocation()));
diff --git a/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp b/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp
index 92c3e0ed7894e1..50da196315d3b3 100644
--- a/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp
+++ b/clang-tools-extra/clang-tidy/utils/LexerUtils.cpp
@@ -86,29 +86,6 @@ SourceLocation findNextTerminator(SourceLocation Start, const SourceManager &SM,
   return findNextAnyTokenKind(Start, SM, LangOpts, tok::comma, tok::semi);
 }
 
-std::optional<Token>
-findNextTokenIncludingComments(SourceLocation Start, const SourceManager &SM,
-                               const LangOptions &LangOpts) {
-  // `Lexer::findNextToken` will ignore comment
-  if (Start.isMacroID())
-    return std::nullopt;
-  Start = Lexer::getLocForEndOfToken(Start, 0, SM, LangOpts);
-  // Break down the source location.
-  std::pair<FileID, unsigned> LocInfo = SM.getDecomposedLoc(Start);
-  bool InvalidTemp = false;
-  StringRef File = SM.getBufferData(LocInfo.first, &InvalidTemp);
-  if (InvalidTemp)
-    return std::nullopt;
-  // Lex from the start of the given location.
-  Lexer L(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
-          File.data() + LocInfo.second, File.end());
-  L.SetCommentRetentionState(true);
-  // Find the token.
-  Token Tok;
-  L.LexFromRawLexer(Tok);
-  return Tok;
-}
-
 std::optional<Token>
 findNextTokenSkippingComments(SourceLocation Start, const SourceManager &SM,
                               const LangOptions &LangOpts) {
diff --git a/clang-tools-extra/clang-tidy/utils/LexerUtils.h b/clang-tools-extra/clang-tidy/utils/LexerUtils.h
index ea9bd512b68b8f..c3f8721bca5ffa 100644
--- a/clang-tools-extra/clang-tidy/utils/LexerUtils.h
+++ b/clang-tools-extra/clang-tidy/utils/LexerUtils.h
@@ -89,10 +89,6 @@ SourceLocation findNextAnyTokenKind(SourceLocation Start,
   }
 }
 
-std::optional<Token>
-findNextTokenIncludingComments(SourceLocation Start, const SourceManager &SM,
-                               const LangOptions &LangOpts);
-
 // Finds next token that's not a comment.
 std::optional<Token> findNextTokenSkippingComments(SourceLocation Start,
                                                    const SourceManager &SM,
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index b6ecc7e5ded9e2..82a041ea3f848a 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -554,7 +554,8 @@ class Lexer : public PreprocessorLexer {
   /// Returns the next token, or std::nullopt if the location is inside a macro.
   static std::optional<Token> findNextToken(SourceLocation Loc,
                                             const SourceManager &SM,
-                                            const LangOptions &LangOpts);
+                                            const LangOptions &LangOpts,
+                                            bool IncludeComments = false);
 
   /// Checks that the given token is the first token that occurs after
   /// the given location (this excludes comments and whitespace). Returns the
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 72364500a48f9f..115b6c1606a022 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -1323,7 +1323,8 @@ const char *Lexer::SkipEscapedNewLines(const char *P) {
 
 std::optional<Token> Lexer::findNextToken(SourceLocation Loc,
                                           const SourceManager &SM,
-                                          const LangOptions &LangOpts) {
+                                          const LangOptions &LangOpts,
+                                          bool IncludeComments) {
   if (Loc.isMacroID()) {
     if (!Lexer::isAtEndOfMacroExpansion(Loc, SM, LangOpts, &Loc))
       return std::nullopt;
@@ -1344,6 +1345,7 @@ std::optional<Token> Lexer::findNextToken(SourceLocation Loc,
   // Lex from the start of the given location.
   Lexer lexer(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
                                       TokenBegin, File.end());
+  lexer.SetCommentRetentionState(IncludeComments);
   // Find the token.
   Token Tok;
   lexer.LexFromRawLexer(Tok);

cor3ntin

This is neat :)

cor3ntin · 2025-01-15T14:23:34Z

clang-tools-extra/clang-tidy/modernize/UseOverrideCheck.cpp

-            utils::lexer::findNextTokenIncludingComments(
-                Tok.getEndLoc(), Sources, getLangOpts());
+        std::optional<Token> NextToken = Lexer::findNextToken(
+            Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments*/ true);


Suggested change

Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments*/ true);

Tok.getEndLoc(), Sources, getLangOpts(), /*IncludeComments=*/true);

cor3ntin · 2025-01-15T14:23:43Z

clang-tools-extra/clang-tidy/modernize/ConcatNestedNamespacesCheck.cpp

@@ -59,7 +59,7 @@ SourceRange NS::getNamespaceBackRange(const SourceManager &SM,
  // Back from '}' to conditional '// namespace xxx'
  SourceLocation Loc = front()->getRBraceLoc();
  std::optional<Token> Tok =
-      utils::lexer::findNextTokenIncludingComments(Loc, SM, LangOpts);
+      Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);


Suggested change

Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments*/ true);

Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);

clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp

cor3ntin · 2025-01-15T14:27:53Z

clang-tools-extra/clang-tidy/utils/LexerUtils.h

-std::optional<Token>
-findNextTokenIncludingComments(SourceLocation Start, const SourceManager &SM,
-                               const LangOptions &LangOpts);
-


I think you could keep that, and just use Lexer::findNextToken as its implementation (it would avoid the /*IncludeComments=*/true everywhere)

We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremenly similar to `Lexer::findNextToken`, so just add an extra agument to the latter.

legrosbuffle · 2025-01-16T07:13:32Z

Thanks, and sorry I had to force-push because github would not refresh the PR (https://github.com/orgs/community/discussions/78775)

Apply suggestions Co-authored-by: cor3ntin <[email protected]>

cor3ntin · 2025-01-16T08:28:29Z

clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp

+  std::optional<Token> Tok =
+      Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);


Suggested change

std::optional<Token> Tok =

Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);

std::optional<Token> Tok =

findNextTokenIncludingComments(Loc, SM, LangOpts);

We can do that, right?

cor3ntin · 2025-01-16T08:29:39Z

clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp

  while (Tok && Tok->is(tok::comment) &&
         SM.getPresumedColumnNumber(Tok->getLocation()) > Column) {
    Loc = Tok->getEndLoc();
-    Tok = getTokenAfter(Loc, SM, LangOpts);
+    Tok = Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);


Suggested change

Tok = Lexer::findNextToken(Loc, SM, LangOpts, /*IncludeComments=*/true);

Tok = findNextTokenIncludingComments(Loc, SM, LangOpts);

I don't think clang-reorder-fields is supposed to depend on stuff from clang-tidy ?

yeah, i think it's not worth it (adding a dependency on clang-tidy for the sake of findNextTokenIncludingComments seems like overkill).

Agreed actually, thanks

github-actions · 2025-01-16T08:53:15Z

✅ With the latest revision this PR passed the C/C++ code formatter.

alexander-shaposhnikov

lgtm

cor3ntin

LGTM

shafik

Lexer::findNextToken has an additional check:

if (!Lexer::isAtEndOfMacroExpansion(Loc, SM, LangOpts, &Loc))

Does this not change behavior in the case we are replacing? I imagine folks just copied from Lexer::findNextToken and purposely left this out, is that wrong?

shafik · 2025-01-16T21:19:50Z

clang-tools-extra/clang-reorder-fields/ReorderFieldsAction.cpp

@@ -118,47 +118,19 @@ findMembersUsedInInitExpr(const CXXCtorInitializer *Initializer,
  return Results;
 }

-/// Returns the next token after `Loc` (including comment tokens).
-static std::optional<Token> getTokenAfter(SourceLocation Loc,


This looks like just a better formatted findNextTokenIncludingComments

Yes, that's why we wanted to get rid of that code, it does not really make sense to have 3 quasi-identical copies of the same code.

legrosbuffle · 2025-01-17T09:32:31Z

Lexer::findNextToken has an additional check:

if (!Lexer::isAtEndOfMacroExpansion(Loc, SM, LangOpts, &Loc))

Does this not change behavior in the case we are replacing? I imagine folks just copied from Lexer::findNextToken and purposely left this out, is that wrong?

I think the intent was clearly to copy the whole thing, the original author states: "This part of code copy from Lexer::findNextToken except this line. " source "this line" referring to L.SetCommentRetentionState(WithComment);.

legrosbuffle requested a review from alexander-shaposhnikov January 15, 2025 14:03

llvmbot added clang Clang issues not falling into any other category clang-tools-extra clang-tidy clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 15, 2025

cor3ntin reviewed Jan 15, 2025

View reviewed changes

legrosbuffle added 2 commits January 15, 2025 15:49

[clang][refactor] Refactor findNextTokenIncludingComments

005a730

We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremenly similar to `Lexer::findNextToken`, so just add an extra agument to the latter.

address review comments.

c1f4603

legrosbuffle force-pushed the refactor-lexer branch from 46d06ab to c1f4603 Compare January 16, 2025 07:12

Apply suggestions from code review

d256171

Apply suggestions Co-authored-by: cor3ntin <[email protected]>

cor3ntin reviewed Jan 16, 2025

View reviewed changes

add unit tests

66acd22

alexander-shaposhnikov approved these changes Jan 16, 2025

View reviewed changes

format

7df85b4

cor3ntin approved these changes Jan 16, 2025

View reviewed changes

legrosbuffle merged commit 1819646 into llvm:main Jan 16, 2025
8 checks passed

shafik reviewed Jan 16, 2025

View reviewed changes

	Tok.getEndLoc(), Sources, getLangOpts(), /IncludeComments/ true);
	Tok.getEndLoc(), Sources, getLangOpts(), /IncludeComments=/true);

	Lexer::findNextToken(Loc, SM, LangOpts, /IncludeComments/ true);
	Lexer::findNextToken(Loc, SM, LangOpts, /IncludeComments=/true);

		std::optional<Token> Tok =
		Lexer::findNextToken(Loc, SM, LangOpts, /IncludeComments=/true);

	Tok = Lexer::findNextToken(Loc, SM, LangOpts, /IncludeComments=/true);
	Tok = findNextTokenIncludingComments(Loc, SM, LangOpts);

[clang][refactor] Refactor findNextTokenIncludingComments #123060

[clang][refactor] Refactor findNextTokenIncludingComments #123060

Uh oh!

Conversation

legrosbuffle commented Jan 15, 2025

Uh oh!

llvmbot commented Jan 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cor3ntin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

legrosbuffle commented Jan 16, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexander-shaposhnikov left a comment

Choose a reason for hiding this comment

Uh oh!

cor3ntin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

shafik left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

legrosbuffle commented Jan 17, 2025

Uh oh!

Uh oh!

[clang][refactor] Refactor `findNextTokenIncludingComments` #123060

[clang][refactor] Refactor `findNextTokenIncludingComments` #123060

llvmbot commented Jan 15, 2025 •

edited

Loading

github-actions bot commented Jan 16, 2025 •

edited

Loading