Skip to content

[clangd] Collect comments from function definitions into the index #67802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 21, 2024

Conversation

ckandeler
Copy link
Member

This is useful with projects that put their (doxygen) comments at the implementation site, rather than the header.

@llvmbot llvmbot added clang Clang issues not falling into any other category clangd clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 29, 2023
@llvmbot
Copy link
Member

llvmbot commented Sep 29, 2023

@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clangd

Changes

This is useful with projects that put their (doxygen) comments at the implementation site, rather than the header.


Full diff: https://github.com/llvm/llvm-project/pull/67802.diff

3 Files Affected:

  • (modified) clang-tools-extra/clangd/index/Symbol.h (+3-1)
  • (modified) clang-tools-extra/clangd/index/SymbolCollector.cpp (+24-3)
  • (modified) clang/lib/AST/ASTContext.cpp (+10-1)
diff --git a/clang-tools-extra/clangd/index/Symbol.h b/clang-tools-extra/clangd/index/Symbol.h
index 1aa5265299231b7..0917abd3c7edad9 100644
--- a/clang-tools-extra/clangd/index/Symbol.h
+++ b/clang-tools-extra/clangd/index/Symbol.h
@@ -75,7 +75,9 @@ struct Symbol {
   /// (When snippets are disabled, the symbol name alone is used).
   /// Only set when the symbol is indexed for completion.
   llvm::StringRef CompletionSnippetSuffix;
-  /// Documentation including comment for the symbol declaration.
+  /// Comment for the symbol declaration.
+  llvm::StringRef DocComment;
+  /// Documentation including DocComment.
   llvm::StringRef Documentation;
   /// Type when this symbol is used in an expression. (Short display form).
   /// e.g. return type of a function, or type of a variable.
diff --git a/clang-tools-extra/clangd/index/SymbolCollector.cpp b/clang-tools-extra/clangd/index/SymbolCollector.cpp
index 74aca9b99c8a584..71833b56d8a02e2 100644
--- a/clang-tools-extra/clangd/index/SymbolCollector.cpp
+++ b/clang-tools-extra/clangd/index/SymbolCollector.cpp
@@ -1016,15 +1016,16 @@ const Symbol *SymbolCollector::addDeclaration(const NamedDecl &ND, SymbolID ID,
       *ASTCtx, *PP, CodeCompletionContext::CCC_Symbol, *CompletionAllocator,
       *CompletionTUInfo,
       /*IncludeBriefComments*/ false);
-  std::string Documentation =
-      formatDocumentation(*CCS, getDocComment(Ctx, SymbolCompletion,
-                                              /*CommentsFromHeaders=*/true));
+  std::string DocComment = getDocComment(Ctx, SymbolCompletion,
+                                         /*CommentsFromHeaders=*/true);
+  std::string Documentation = formatDocumentation(*CCS, DocComment);
   if (!(S.Flags & Symbol::IndexedForCodeCompletion)) {
     if (Opts.StoreAllDocumentation)
       S.Documentation = Documentation;
     Symbols.insert(S);
     return Symbols.find(S.ID);
   }
+  S.DocComment = DocComment;
   S.Documentation = Documentation;
   std::string Signature;
   std::string SnippetSuffix;
@@ -1065,6 +1066,26 @@ void SymbolCollector::addDefinition(const NamedDecl &ND,
   Symbol S = DeclSym;
   // FIXME: use the result to filter out symbols.
   S.Definition = *DefLoc;
+
+  std::string DocComment;
+  std::string Documentation;
+  if (S.DocComment.empty() &&
+      (llvm::isa<FunctionDecl>(ND) || llvm::isa<CXXMethodDecl>(ND))) {
+    CodeCompletionResult SymbolCompletion(&getTemplateOrThis(ND), 0);
+    const auto *CCS = SymbolCompletion.CreateCodeCompletionString(
+        *ASTCtx, *PP, CodeCompletionContext::CCC_Symbol, *CompletionAllocator,
+        *CompletionTUInfo,
+        /*IncludeBriefComments*/ false);
+    DocComment = getDocComment(ND.getASTContext(), SymbolCompletion,
+                               /*CommentsFromHeaders=*/true);
+    if (!S.Documentation.empty())
+      Documentation = S.Documentation.str() + '\n' + DocComment;
+    else
+      Documentation = formatDocumentation(*CCS, DocComment);
+    S.DocComment = DocComment;
+    S.Documentation = Documentation;
+  }
+
   Symbols.insert(S);
 }
 
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 57aaa05b1d81ddb..28cdc70ed444440 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -509,8 +509,17 @@ const RawComment *ASTContext::getRawCommentForAnyRedecl(
     if (LastCheckedRedecl) {
       if (LastCheckedRedecl == Redecl) {
         LastCheckedRedecl = nullptr;
+        continue;
+      }
+      if (auto F = llvm::dyn_cast<FunctionDecl>(Redecl)) {
+        if (!F->isThisDeclarationADefinition())
+          continue;
+      } else if (auto M = llvm::dyn_cast<CXXMethodDecl>(Redecl)) {
+        if (!M->isThisDeclarationADefinition())
+          continue;
+      } else {
+        continue;
       }
-      continue;
     }
     const RawComment *RedeclComment = getRawCommentForDeclNoCache(Redecl);
     if (RedeclComment) {

@ckandeler ckandeler requested a review from kadircet September 29, 2023 14:35
@ckandeler
Copy link
Member Author

I suspect that this is not The Proper Way. If so, I'll be thankful for any suggestions.

@ckandeler
Copy link
Member Author

ping

@HighCommander4
Copy link
Collaborator

Do you have another patch where you use the new DocComment field? Is it for showing in a hover?

@ckandeler
Copy link
Member Author

Do you have another patch where you use the new DocComment field? Is it for showing in a hover?

Yes, it is for showing documentation in a hover. clangd already supports that; it's just that it currently works only if the comments are attached to the declaration. With this patch it works also for comments at the implementation site, (which I think was the intended behavior all along). No additional patch is necessary.

@HighCommander4
Copy link
Collaborator

Ok, I see. (I was confused because nothing in the patch looks at the contents of Symbol::DocComment other than an empty() check; maybe a bool HasDocComment flag is sufficient?)

I'll have a more detailed look when I get a chance, but one suggestion I wanted to make in the meantime: for changes that add new information to the index, it helps to have a sense of how large of an increase to the index's disk and memory footprint they entail. In the past, I've measured this with the LLVM codebase's index as a "test case".

The disk footprint can be measured with a simple du -hs .cache/clangd/index or similar. For the memory footprint, we have a $/memoryUsage protocol extension that provides this information (the background_index entry in particular is of interest).

Perhaps you would be interested in taking some before/after measurements along these lines? Feel free to choose a different codebase than LLVM as the test case, especially if you know of one that uses "doc comments at the definition" as the prevailing style.

@ckandeler
Copy link
Member Author

Ok, I see. (I was confused because nothing in the patch looks at the contents of Symbol::DocComment other than
an empty() check; maybe a bool HasDocComment flag is sufficient?)

Right, we just need to save the information whether there was a doc comment before clangd put default values into the Documentation field.

I'll have a more detailed look when I get a chance, but one suggestion I wanted to make in the meantime: for
changes that add new information to the index, it helps to have a sense of how large of an increase to the index's
disk and memory footprint they entail. In the past, I've measured this with the LLVM codebase's index as a "test
case".

Will check.

@ckandeler
Copy link
Member Author

I have removed the extra Symbol member and extended the Flags enum instead.
I benchmarked with Qt as the test project (qtbase, to be exact), which is heavily documented and has all its function documentation in the cpp files, so it should provide an upper bound on the effects of this patch.
Index size before this patch : 68MB on-disk, 552MB in-memory
With original patch (extra Symbol member): 70MB/576MB
With latest patch set: 70MB/564 MB
I only did one measurement each, so I don't know if there might be unrelated fluctuations there.

@ckandeler
Copy link
Member Author

ping

@ckandeler
Copy link
Member Author

ping

Copy link
Collaborator

@HighCommander4 HighCommander4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the slow response time. I finally had a chance to look at this in a bit more detail, though I still don't think I fully understand this patch.

At this point, my main feedback is that it would be good to include some unit tests. SymbolCollectorTests.cpp is a good place for them; see e.g. this test for an example of testing on a header/source file pair as input, and this test for an example of testing the value of the documentation field.

What is the purpose of the change to ASTContext.cpp? This codepath has consumers other than clangd, so if we're changing the behaviour, we need to evaluate whether it's appropriate for other consumers (and maybe add libAST unit tests as well).


std::string DocComment;
std::string Documentation;
if (!(S.Flags & Symbol::HasDocComment) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for both addDeclaration() and addDefinition() to be called for a given Decl* (if e.g. the definition is the first encountered declaration).

In such cases, if addDeclaration() failed to find a doc comment, we should probably avoid trying again in addDefinition() (since that will fail too and just run CreateCodeCompletionString() and getDocComment() needlessly).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ckandeler
Copy link
Member Author

At this point, my main feedback is that it would be good to include some unit tests. SymbolCollectorTests.cpp is a good
place for them.

I've added a new test and verified that it fails without this patch.

What is the purpose of the change to ASTContext.cpp?

There was some assumption about the order of redeclarations that did not match reality, like that the definition always comes first or something along those lines. I do not know whether the assumption there was wrong or whether some other code messed up by breaking it.

This codepath has consumers other than clangd, so if we're changing the behaviour, we need to evaluate whether it's
appropriate for other consumers (and maybe add libAST unit tests as well).

Hence my original comment. Though it should be noted that no existing tests appear to have been broken.

Copy link
Collaborator

@HighCommander4 HighCommander4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the test case. Debugging it helped me better understand the changes to ASTContext.cpp.

I believe the fact that these changes are needed to get the test to pass, tells us that there is a bug in ASTContext::getRawCommentForAnyRedecl(). However, I'm not sure that these changes are an appropriate fix for the bug.

I wrote a libAST unit test to demonstrate the bug, and filed #108145 about it. Hopefully folks more familiar with that code can suggest an appropriate fix there.

@ckandeler
Copy link
Member Author

I wrote a libAST unit test to demonstrate the bug, and filed #108145 about it. Hopefully folks more familiar with that code can suggest an appropriate fix there.

Awesome, thanks.

Copy link
Collaborator

@HighCommander4 HighCommander4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the test cases look good.

@HighCommander4
Copy link
Collaborator

Meanwhile, I thought of a fix for #108145 and submitted it as #108475.

If that's accepted, you should be able to drop the ASTContext.cpp changes from this patch.

@ckandeler
Copy link
Member Author

Meanwhile, I thought of a fix for #108145 and submitted it as #108475.
If that's accepted, you should be able to drop the ASTContext.cpp changes from this patch.

Nice, will do.

@@ -451,8 +451,17 @@ const RawComment *ASTContext::getRawCommentForAnyRedecl(
if (LastCheckedRedecl) {
if (LastCheckedRedecl == Redecl) {
LastCheckedRedecl = nullptr;
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for #108145 has merged now, so let's go ahead and drop this hunk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This is useful with projects that put their (doxygen) comments at the
implementation site, rather than the header.
Copy link
Collaborator

@HighCommander4 HighCommander4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The patch looks good to me. And the index size measurements reported in this comment look good as well, thank you for taking them.

@HighCommander4 HighCommander4 merged commit 0659fd9 into llvm:main Sep 21, 2024
8 checks passed
@ckandeler ckandeler deleted the def-comments branch September 23, 2024 10:36
cristianadam pushed a commit to cristianadam/qt-creator that referenced this pull request Sep 23, 2024
See llvm/llvm-project#67802.

Task-number: QTCREATORBUG-4557
Change-Id: I1b7135414104d67426eefded679a03fdaa306944
qtprojectorg pushed a commit to qt-creator/qt-creator that referenced this pull request Sep 26, 2024
See llvm/llvm-project#67802.

Task-number: QTCREATORBUG-4557
Change-Id: I1b7135414104d67426eefded679a03fdaa306944
Reviewed-by: David Schulz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category clang-tools-extra clangd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants