Skip to content

[Clang][Comments] Attach comments to decl even if preproc directives are in between #88367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 1, 2024

Conversation

hdoc
Copy link
Contributor

@hdoc hdoc commented Apr 11, 2024

Background

It's surprisingly common for C++ code in the wild to conditionally show/hide declarations to Doxygen through the use of preprocessor directives. One especially common version of this pattern is demonstrated below:

/// @brief Test comment
#ifdef DOXYGEN_BUILD_ENABLED
template<typename T>
#else
template <typename T>
typename std::enable_if<std::is_integral<T>::value>::type
#endif
void f() {}

There are more examples I've collected below to demonstrate usage of this pattern:

From my research, it seems like the most common rationale for this functionality is hiding difficult-to-parse code from Doxygen, especially where template metaprogramming is concerned.

Currently, Clang does not support attaching comments to decls if there are preprocessor comments between the comment and the decl. This is enforced here:

// There should be no other declarations or preprocessor directives between
// comment and declaration.
if (Text.find_last_of(";{}#@") != StringRef::npos)
return nullptr;

Alongside preprocessor directives, any instance of ;{}#@ between a comment and decl will cause the comment to not be attached to the decl.

Rationale

It would be nice for Clang-based documentation tools, such as hdoc, to support code using this pattern. Users expect to see comments attached to the relevant decl — even if there is an #ifdef in the way — which Clang does not currently do.

History

Originally, commas were also in the list of "banned" characters, but were removed in b534d3a0ef69 (link) because availability macros often have commas in them. From my reading of the code, it appears that the original intent of the code was to exclude macros and decorators between comments and decls, possibly in an attempt to properly attribute comments to macros (discussed further in "Complications", below). There's some more discussion here: https://reviews.llvm.org/D125061.

Change

This modifies Clang comment parsing so that comments are attached to subsequent declarations even if there are preprocessor directives between the end of the comment and the start of the decl. Furthermore, this change:

  • Adds tests to verify that comments are attached to their associated decls even if there are preprocessor directives in between
  • Adds tests to verify that current behavior has not changed (i.e. use of the other characters between comment and decl will result in the comment not being attached to the decl)
  • Updates existing lit tests which would otherwise break.

Complications

Clang does not yet support attaching doc comments to macros. Consequently, the change proposed in this RFC affects cases where a doc comment attached to a macro is followed immediately by a normal declaration. In these cases, the macro's doc comments will be attached to the subsequent decl. Previously they would be ignored because any preprocessor directives between a comment and a decl would result in the comment not being attached to the decl. An example of this is shown below.

/// Doc comment for a function-like macro
/// @param n
///    A macro argument
#define custom_sqrt(n) __internal_sqrt(n)

int __internal_sqrt(int n) { return __builtin_sqrt(n); }

// NB: the doc comment for the custom_sqrt macro will actually be attached to __internal_sqrt!

There is a real instance of this problem in the Clang codebase, namely here:

/// Perform matrix multiplication of two tiles containing complex elements and
/// accumulate the results into a packed single precision tile. Each dword
/// element in input tiles \a a and \a b is interpreted as a complex number
/// with FP16 real part and FP16 imaginary part.
/// Calculates the real part of the result. For each possible combination
/// of (row of \a a, column of \a b), it performs a set of multiplication
/// and accumulations on all corresponding complex numbers (one from \a a
/// and one from \a b). The real part of the \a a element is multiplied
/// with the real part of the corresponding \a b element, and the negated
/// imaginary part of the \a a element is multiplied with the imaginary
/// part of the corresponding \a b elements. The two accumulated results
/// are added, and then accumulated into the corresponding row and column
/// of \a dst.
///
/// \headerfile <x86intrin.h>
///
/// \code
/// void _tile_cmmrlfp16ps(__tile dst, __tile a, __tile b);
/// \endcode
///
/// \code{.operation}
/// FOR m := 0 TO dst.rows - 1
/// tmp := dst.row[m]
/// FOR k := 0 TO (a.colsb / 4) - 1
/// FOR n := 0 TO (dst.colsb / 4) - 1
/// tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+0])
/// tmp.fp32[n] += FP32(-a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+1])
/// ENDFOR
/// ENDFOR
/// write_row_and_zero(dst, m, tmp, dst.colsb)
/// ENDFOR
/// zero_upper_rows(dst, dst.rows)
/// zero_tileconfig_start()
/// \endcode
///
/// This intrinsic corresponds to the \c TCMMIMFP16PS instruction.
///
/// \param dst
/// The destination tile. Max size is 1024 Bytes.
/// \param a
/// The 1st source tile. Max size is 1024 Bytes.
/// \param b
/// The 2nd source tile. Max size is 1024 Bytes.
#define _tile_cmmrlfp16ps(dst, a, b) __builtin_ia32_tcmmrlfp16ps(dst, a, b)
static __inline__ _tile1024i __DEFAULT_FN_ATTRS_COMPLEX
_tile_cmmimfp16ps_internal(unsigned short m, unsigned short n, unsigned short k,
_tile1024i dst, _tile1024i src1, _tile1024i src2) {
return __builtin_ia32_tcmmimfp16ps_internal(m, n, k, dst, src1, src2);
}

As part of this RFC, I've added a semicolon to break up the Clang comment parsing so that the -Wdocumentation errors go away, but this is a hack. The real solution is to fix Clang comment parsing so that doc comments are properly attached to macros, however this would be a large change that is outside of the scope of this RFC.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@hdoc hdoc force-pushed the dev/attach-comment-to-decl branch from 54c1147 to 61612c5 Compare April 11, 2024 09:28
@hdoc
Copy link
Contributor Author

hdoc commented May 5, 2024

Ping

@Boddlnagg
Copy link

Regarding the positioning of Doxygen comments, I just saw this and was wondering if it's possible to also support parsing of Doxygen comments that come after the signature, which happens to be the preferred style in our codebase ...

static int doSomething(int foobar)
/** \brief ...
 * \param foobar foo bar
 * \return  return value
 */
{
...

Doxygen accepts this, although I don't know if it's specified/documented as such.

@hdoc
Copy link
Contributor Author

hdoc commented May 7, 2024

From my reading of the Doxygen documentation it seems like anything goes except putting the doc comment inside the actual body of a function. That being said, I think that change is out of the scope of this PR because it would likely require modifications to the Clang comment-to-decl matching functionality. It's better to do that in a separate PR

@AaronBallman
Copy link
Collaborator

Is this still in Draft status or should this be marked as ready for review?

@hdoc hdoc marked this pull request as ready for review May 9, 2024 16:26
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 labels May 9, 2024
@hdoc
Copy link
Contributor Author

hdoc commented May 9, 2024

@AaronBallman this one is ready for review :)

@llvmbot llvmbot added clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics labels May 9, 2024
@llvmbot
Copy link
Member

llvmbot commented May 9, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: hdoc (hdoc)

Changes

Background

It's surprisingly common for C++ code in the wild to conditionally show/hide declarations to Doxygen through the use of preprocessor directives. One especially common version of this pattern is demonstrated below:

/// @<!-- -->brief Test comment
#ifdef DOXYGEN_BUILD_ENABLED
template&lt;typename T&gt;
#else
template &lt;typename T&gt;
typename std::enable_if&lt;std::is_integral&lt;T&gt;::value&gt;::type
#endif
void f() {}

There are more examples I've collected below to demonstrate usage of this pattern:

From my research, it seems like the most common rationale for this functionality is hiding difficult-to-parse code from Doxygen, especially where template metaprogramming is concerned.

Currently, Clang does not support attaching comments to decls if there are preprocessor comments between the comment and the decl. This is enforced here:

// There should be no other declarations or preprocessor directives between
// comment and declaration.
if (Text.find_last_of(";{}#@") != StringRef::npos)
return nullptr;

Alongside preprocessor directives, any instance of ;{}#@ between a comment and decl will cause the comment to not be attached to the decl.

Rationale

It would be nice for Clang-based documentation tools, such as hdoc, to support code using this pattern. Users expect to see comments attached to the relevant decl — even if there is an #ifdef in the way — which Clang does not currently do.

History

Originally, commas were also in the list of "banned" characters, but were removed in b534d3a0ef69 (link) because availability macros often have commas in them. From my reading of the code, it appears that the original intent of the code was to exclude macros and decorators between comments and decls, possibly in an attempt to properly attribute comments to macros (discussed further in "Complications", below). There's some more discussion here: https://reviews.llvm.org/D125061.

Change

This modifies Clang comment parsing so that comments are attached to subsequent declarations even if there are preprocessor directives between the end of the comment and the start of the decl. Furthermore, this change:

  • Adds tests to verify that comments are attached to their associated decls even if there are preprocessor directives in between
  • Adds tests to verify that current behavior has not changed (i.e. use of the other characters between comment and decl will result in the comment not being attached to the decl)
  • Updates existing lit tests which would otherwise break.

Complications

Clang does not yet support attaching doc comments to macros. Consequently, the change proposed in this RFC affects cases where a doc comment attached to a macro is followed immediately by a normal declaration. In these cases, the macro's doc comments will be attached to the subsequent decl. Previously they would be ignored because any preprocessor directives between a comment and a decl would result in the comment not being attached to the decl. An example of this is shown below.

/// Doc comment for a function-like macro
/// @<!-- -->param n
///    A macro argument
#define custom_sqrt(n) __internal_sqrt(n)

int __internal_sqrt(int n) { return __builtin_sqrt(n); }

// NB: the doc comment for the custom_sqrt macro will actually be attached to __internal_sqrt!

There is a real instance of this problem in the Clang codebase, namely here:

/// Perform matrix multiplication of two tiles containing complex elements and
/// accumulate the results into a packed single precision tile. Each dword
/// element in input tiles \a a and \a b is interpreted as a complex number
/// with FP16 real part and FP16 imaginary part.
/// Calculates the real part of the result. For each possible combination
/// of (row of \a a, column of \a b), it performs a set of multiplication
/// and accumulations on all corresponding complex numbers (one from \a a
/// and one from \a b). The real part of the \a a element is multiplied
/// with the real part of the corresponding \a b element, and the negated
/// imaginary part of the \a a element is multiplied with the imaginary
/// part of the corresponding \a b elements. The two accumulated results
/// are added, and then accumulated into the corresponding row and column
/// of \a dst.
///
/// \headerfile <x86intrin.h>
///
/// \code
/// void _tile_cmmrlfp16ps(__tile dst, __tile a, __tile b);
/// \endcode
///
/// \code{.operation}
/// FOR m := 0 TO dst.rows - 1
/// tmp := dst.row[m]
/// FOR k := 0 TO (a.colsb / 4) - 1
/// FOR n := 0 TO (dst.colsb / 4) - 1
/// tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+0])
/// tmp.fp32[n] += FP32(-a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+1])
/// ENDFOR
/// ENDFOR
/// write_row_and_zero(dst, m, tmp, dst.colsb)
/// ENDFOR
/// zero_upper_rows(dst, dst.rows)
/// zero_tileconfig_start()
/// \endcode
///
/// This intrinsic corresponds to the \c TCMMIMFP16PS instruction.
///
/// \param dst
/// The destination tile. Max size is 1024 Bytes.
/// \param a
/// The 1st source tile. Max size is 1024 Bytes.
/// \param b
/// The 2nd source tile. Max size is 1024 Bytes.
#define _tile_cmmrlfp16ps(dst, a, b) __builtin_ia32_tcmmrlfp16ps(dst, a, b)
static __inline__ _tile1024i __DEFAULT_FN_ATTRS_COMPLEX
_tile_cmmimfp16ps_internal(unsigned short m, unsigned short n, unsigned short k,
_tile1024i dst, _tile1024i src1, _tile1024i src2) {
return __builtin_ia32_tcmmimfp16ps_internal(m, n, k, dst, src1, src2);
}

As part of this RFC, I've added a semicolon to break up the Clang comment parsing so that the -Wdocumentation errors go away, but this is a hack. The real solution is to fix Clang comment parsing so that doc comments are properly attached to macros, however this would be a large change that is outside of the scope of this RFC.


Full diff: https://github.com/llvm/llvm-project/pull/88367.diff

5 Files Affected:

  • (modified) clang/lib/AST/ASTContext.cpp (+4-3)
  • (modified) clang/lib/Headers/amxcomplexintrin.h (+18)
  • (modified) clang/lib/Headers/ia32intrin.h (+2)
  • (modified) clang/test/Index/annotate-comments.cpp (+3-2)
  • (modified) clang/unittests/AST/DeclTest.cpp (+310)
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 2fa6aedca4c6..1c8d55b86274 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -282,9 +282,10 @@ RawComment *ASTContext::getRawCommentForDeclNoCacheImpl(
   StringRef Text(Buffer + CommentEndOffset,
                  DeclLocDecomp.second - CommentEndOffset);
 
-  // There should be no other declarations or preprocessor directives between
-  // comment and declaration.
-  if (Text.find_last_of(";{}#@") != StringRef::npos)
+  // There should be no other declarations between comment and declaration.
+  // Preprocessor directives are implicitly allowed to be between a comment and
+  // its associated decl.
+  if (Text.find_last_of(";{}@") != StringRef::npos)
     return nullptr;
 
   return CommentBeforeDecl;
diff --git a/clang/lib/Headers/amxcomplexintrin.h b/clang/lib/Headers/amxcomplexintrin.h
index 84ef972fcadf..6ae40d2eab5c 100644
--- a/clang/lib/Headers/amxcomplexintrin.h
+++ b/clang/lib/Headers/amxcomplexintrin.h
@@ -107,6 +107,24 @@
 ///    The 2nd source tile. Max size is 1024 Bytes.
 #define _tile_cmmrlfp16ps(dst, a, b) __builtin_ia32_tcmmrlfp16ps(dst, a, b)
 
+/// Perform matrix multiplication of two tiles containing complex elements and
+///    accumulate the results into a packed single precision tile.
+///
+/// \param m
+///    The number of rows in the first tile and the number of rows in the result
+///    tile.
+/// \param n
+///    The number of columns in the second tile and the number of columns in the
+///    result tile.
+/// \param k
+///    The number of columns in the first tile and the number of rows in the
+///    second tile.
+/// \param dst
+///    Pointer to the destination tile where the result will be stored.
+/// \param src1
+///    Pointer to the first source tile.
+/// \param src2
+///    Pointer to the second source tile.
 static __inline__ _tile1024i __DEFAULT_FN_ATTRS_COMPLEX
 _tile_cmmimfp16ps_internal(unsigned short m, unsigned short n, unsigned short k,
                            _tile1024i dst, _tile1024i src1, _tile1024i src2) {
diff --git a/clang/lib/Headers/ia32intrin.h b/clang/lib/Headers/ia32intrin.h
index 8e65f232a0de..c9c92b9ee0f9 100644
--- a/clang/lib/Headers/ia32intrin.h
+++ b/clang/lib/Headers/ia32intrin.h
@@ -533,6 +533,8 @@ __rdtscp(unsigned int *__A) {
 /// \see __rdpmc
 #define _rdpmc(A) __rdpmc(A)
 
+;
+
 static __inline__ void __DEFAULT_FN_ATTRS
 _wbinvd(void) {
   __builtin_ia32_wbinvd();
diff --git a/clang/test/Index/annotate-comments.cpp b/clang/test/Index/annotate-comments.cpp
index 6f9f8f0bbbc9..bff25d46cf80 100644
--- a/clang/test/Index/annotate-comments.cpp
+++ b/clang/test/Index/annotate-comments.cpp
@@ -204,9 +204,9 @@ void isdoxy45(void);
 /// Ggg. IS_DOXYGEN_END
 void isdoxy46(void);
 
-/// IS_DOXYGEN_NOT_ATTACHED
+/// isdoxy47 IS_DOXYGEN_SINGLE
 #define FOO
-void notdoxy47(void);
+void isdoxy47(void);
 
 /// IS_DOXYGEN_START Aaa bbb
 /// \param ccc
@@ -330,6 +330,7 @@ void isdoxy54(int);
 // CHECK: annotate-comments.cpp:185:6: FunctionDecl=isdoxy44:{{.*}} BriefComment=[IS_DOXYGEN_START Aaa bbb ccc.]
 // CHECK: annotate-comments.cpp:195:6: FunctionDecl=isdoxy45:{{.*}} BriefComment=[Ddd eee. Fff.]
 // CHECK: annotate-comments.cpp:205:6: FunctionDecl=isdoxy46:{{.*}} BriefComment=[Ddd eee. Fff.]
+// CHECK: annotate-comments.cpp:209:6: FunctionDecl=isdoxy47:{{.*}} isdoxy47 IS_DOXYGEN_SINGLE
 // CHECK: annotate-comments.cpp:214:6: FunctionDecl=isdoxy48:{{.*}} BriefComment=[IS_DOXYGEN_START Aaa bbb]
 // CHECK: annotate-comments.cpp:218:6: FunctionDecl=isdoxy49:{{.*}} BriefComment=[IS_DOXYGEN_START Aaa]
 // CHECK: annotate-comments.cpp:222:6: FunctionDecl=isdoxy50:{{.*}} BriefComment=[Returns ddd IS_DOXYGEN_END]
diff --git a/clang/unittests/AST/DeclTest.cpp b/clang/unittests/AST/DeclTest.cpp
index 2530ce74eb6a..e5c1a5788642 100644
--- a/clang/unittests/AST/DeclTest.cpp
+++ b/clang/unittests/AST/DeclTest.cpp
@@ -545,3 +545,313 @@ TEST(Decl, TemplateArgumentDefaulted) {
   EXPECT_TRUE(ArgList.get(2).getIsDefaulted());
   EXPECT_TRUE(ArgList.get(3).getIsDefaulted());
 }
+
+TEST(Decl, CommentsAttachedToDecl1) {
+  const SmallVector<StringRef> Sources{
+      R"(
+        /// Test comment
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+        #if 0
+        // tralala
+        #endif
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+
+        #if 0
+        // tralala
+        #endif
+
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+        #ifdef DOCS
+        template<typename T>
+        #endif
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+
+        #ifdef DOCS
+        template<typename T>
+        #endif
+
+        void f();
+      )",
+  };
+
+  for (const auto code : Sources) {
+    auto AST = tooling::buildASTFromCodeWithArgs(code, /*Args=*/{"-std=c++20"});
+    ASTContext &Ctx = AST->getASTContext();
+
+    auto const *F = selectFirst<FunctionDecl>(
+        "id", match(functionDecl(hasName("f")).bind("id"), Ctx));
+    ASSERT_NE(F, nullptr);
+
+    auto const *C = Ctx.getRawCommentForDeclNoCache(F);
+    ASSERT_NE(C, nullptr);
+    EXPECT_EQ(C->getRawText(Ctx.getSourceManager()), "/// Test comment");
+  }
+}
+
+TEST(Decl, CommentsAttachedToDecl2) {
+  const SmallVector<StringRef> Sources{
+      R"(
+        /** Test comment
+         */
+        void f();
+      )",
+
+      R"(
+        /** Test comment
+         */
+
+        void f();
+      )",
+
+      R"(
+        /** Test comment
+         */
+        #if 0
+        /* tralala */
+        #endif
+        void f();
+      )",
+
+      R"(
+        /** Test comment
+         */
+
+        #if 0
+        /* tralala */
+        #endif
+
+        void f();
+      )",
+
+      R"(
+        /** Test comment
+         */
+        #ifdef DOCS
+        template<typename T>
+        #endif
+        void f();
+      )",
+
+      R"(
+        /** Test comment
+         */
+
+        #ifdef DOCS
+        template<typename T>
+        #endif
+
+        void f();
+      )",
+  };
+
+  for (const auto code : Sources) {
+    auto AST = tooling::buildASTFromCodeWithArgs(code, /*Args=*/{"-std=c++20"});
+    ASTContext &Ctx = AST->getASTContext();
+
+    auto const *F = selectFirst<FunctionDecl>(
+        "id", match(functionDecl(hasName("f")).bind("id"), Ctx));
+    ASSERT_NE(F, nullptr);
+
+    auto const *C = Ctx.getRawCommentForDeclNoCache(F);
+    ASSERT_NE(C, nullptr);
+    EXPECT_EQ(C->getRawText(Ctx.getSourceManager()),
+              "/** Test comment\n         */");
+  }
+}
+
+TEST(Decl, CommentsAttachedToDecl3) {
+  const SmallVector<StringRef> Sources{
+      R"(
+        /// @brief Test comment
+        void f();
+      )",
+
+      R"(
+        /// @brief Test comment
+
+        void f();
+      )",
+
+      R"(
+        /// @brief Test comment
+        #if 0
+        // tralala
+        #endif
+        void f();
+      )",
+
+      R"(
+        /// @brief Test comment
+
+        #if 0
+        // tralala
+        #endif
+
+        void f();
+      )",
+
+      R"(
+        /// @brief Test comment
+        #ifdef DOCS
+        template<typename T>
+        #endif
+        void f();
+      )",
+
+      R"(
+        /// @brief Test comment
+
+        #ifdef DOCS
+        template<typename T>
+        #endif
+
+        void f();
+     )",
+  };
+
+  for (const auto code : Sources) {
+    auto AST = tooling::buildASTFromCodeWithArgs(code, /*Args=*/{"-std=c++20"});
+    ASTContext &Ctx = AST->getASTContext();
+
+    auto const *F = selectFirst<FunctionDecl>(
+        "id", match(functionDecl(hasName("f")).bind("id"), Ctx));
+    ASSERT_NE(F, nullptr);
+
+    auto const *C = Ctx.getRawCommentForDeclNoCache(F);
+    ASSERT_NE(C, nullptr);
+    EXPECT_EQ(C->getRawText(Ctx.getSourceManager()), "/// @brief Test comment");
+  }
+}
+
+TEST(Decl, CommentsAttachedToDecl4) {
+  const SmallVector<StringRef> Sources{
+      R"(
+        /** \brief Test comment
+         */
+        void f();
+      )",
+
+      R"(
+        /** \brief Test comment
+         */
+
+        void f();
+      )",
+
+      R"(
+        /** \brief Test comment
+         */
+        #if 0
+        /* tralala */
+        #endif
+        void f();
+      )",
+
+      R"(
+        /** \brief Test comment
+         */
+
+        #if 0
+        /* tralala */
+        #endif
+
+        void f();
+      )",
+
+      R"(
+        /** \brief Test comment
+         */
+        #ifdef DOCS
+        template<typename T>
+        #endif
+        void f();
+      )",
+
+      R"(
+        /** \brief Test comment
+         */
+
+        #ifdef DOCS
+        template<typename T>
+        #endif
+
+        void f();
+     )",
+  };
+
+  for (const auto code : Sources) {
+    auto AST = tooling::buildASTFromCodeWithArgs(code, /*Args=*/{"-std=c++20"});
+    ASTContext &Ctx = AST->getASTContext();
+
+    auto const *F = selectFirst<FunctionDecl>(
+        "id", match(functionDecl(hasName("f")).bind("id"), Ctx));
+    ASSERT_NE(F, nullptr);
+
+    auto const *C = Ctx.getRawCommentForDeclNoCache(F);
+    ASSERT_NE(C, nullptr);
+    EXPECT_EQ(C->getRawText(Ctx.getSourceManager()),
+              "/** \\brief Test comment\n         */");
+  }
+}
+
+/// This example intentionally inserts characters between a doc comment and the
+/// associated declaration to verify that the comment does not become associated
+/// with the FunctionDecl.
+/// By default, Clang does not allow for other declarations (aside from
+/// preprocessor directives, as shown above) to be placed between a doc comment
+/// and a declaration.
+TEST(Decl, CommentsAttachedToDecl5) {
+  const SmallVector<StringRef> Sources{
+      R"(
+        /// Test comment
+        ;
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+        // @
+        void f();
+      )",
+
+      R"(
+        /// Test comment
+        // {}
+        void f();
+      )",
+  };
+
+  for (const auto code : Sources) {
+    auto AST = tooling::buildASTFromCodeWithArgs(code, /*Args=*/{"-std=c++20"});
+    ASTContext &Ctx = AST->getASTContext();
+
+    auto const *F = selectFirst<FunctionDecl>(
+        "id", match(functionDecl(hasName("f")).bind("id"), Ctx));
+    ASSERT_NE(F, nullptr);
+
+    auto const *C = Ctx.getRawCommentForDeclNoCache(F);
+    ASSERT_EQ(C, nullptr);
+  }
+}

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because doxygen supports documenting macros (https://www.doxygen.nl/manual/commands.html#cmddef), I am worried how often this will cause us to associate comments incorrectly on the declaration.

I wonder if we should be a bit smarter and check for #define at the start of a line when we encounter a #. e.g.,

/*!
  \def DERP(x)
  Does a derpy thing with x
*/
#define DERP(x) (x)

void derp(void); // Does not get the doxygen comment

while

/// Does amazing things, like works in the presence of
/// #define doing stupid things.
void func(); // does get the doxygen comment

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes generally LG aside from an update to a test, but this should also have a release note in clang/docs/ReleaseNotes.rst so users know about the new functionality and the limitations regarding macros.

@hdoc hdoc force-pushed the dev/attach-comment-to-decl branch from 31c954f to 571fb55 Compare June 7, 2024 01:45
@hdoc
Copy link
Contributor Author

hdoc commented Jun 9, 2024

@AaronBallman Added the blurb to the release notes, let me know if that matches your expectations or if any changes are required. Thank you!

@hdoc
Copy link
Contributor Author

hdoc commented Jun 19, 2024

Ping

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hdoc hdoc force-pushed the dev/attach-comment-to-decl branch from 823cd80 to d42bb16 Compare July 1, 2024 05:43
@hdoc
Copy link
Contributor Author

hdoc commented Jul 1, 2024

This is rebased onto the latest rev of the main branch and should be ready to go. We do not have commit permissions, so @AaronBallman could you please help merge this in? Thank you :)

@AaronBallman AaronBallman merged commit 9f04d75 into llvm:main Jul 1, 2024
8 checks passed
Copy link

github-actions bot commented Jul 1, 2024

@hdoc Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 1, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux running on sanitizer-buildbot1 while building clang at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/835

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/i386-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m32', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-Wthread-safety', '-Wthread-safety-reference', '-Wthread-safety-beta', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 9980 tests, 80 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: libFuzzer-x86_64-libcxx-Linux :: fuzzer-finalstats.test (1523 of 9980)
******************** TEST 'libFuzzer-x86_64-libcxx-Linux :: fuzzer-finalstats.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang    -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/SimpleTest.cpp -o /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest
+ /b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/SimpleTest.cpp -o /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest
RUN: at line 2: /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest -seed=1 -runs=77 -print_final_stats=1 2>&1 | FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS
+ FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS
+ /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest -seed=1 -runs=77 -print_final_stats=1
RUN: at line 9: /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/dict1.txt -runs=33 -print_final_stats=1 2>&1 | FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS1
+ /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/dict1.txt -runs=33 -print_final_stats=1
+ FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS1
/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test:10:15: error: FINAL_STATS1: expected string not found in input
FINAL_STATS1: stat::number_of_executed_units: 33
              ^
<stdin>:1:1: note: scanning from here
INFO: Running with entropic power schedule (0xFF, 100).
^
<stdin>:12:1: note: possible intended match here
stat::number_of_executed_units: 34
^

Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: INFO: Running with entropic power schedule (0xFF, 100). 
check:10'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2: INFO: Seed: 607427166 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~
            3: INFO: Loaded 1 modules (9 inline 8-bit counters): 9 [0x5616712bc4a8, 0x5616712bc4b1),  
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4: INFO: Loaded 1 PC tables (9 PCs): 9 [0x5616712bc4b8,0x5616712bc548),  
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Step 11 (test compiler-rt symbolizer) failure: test compiler-rt symbolizer (failure)
...
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/i386-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m32', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-Wthread-safety', '-Wthread-safety-reference', '-Wthread-safety-beta', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/lit.common.cfg.py:60: warning: Path reported by clang does not exist: "/b/sanitizer-x86_64-linux/build/build_symbolizer/lib/clang/19/lib/x86_64-unknown-linux-gnu". This path was found by running ['/b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang', '--target=x86_64-unknown-linux-gnu', '-m64', '-print-runtime-dir'].
llvm-lit: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 9980 tests, 80 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: libFuzzer-x86_64-libcxx-Linux :: fuzzer-finalstats.test (1523 of 9980)
******************** TEST 'libFuzzer-x86_64-libcxx-Linux :: fuzzer-finalstats.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang    -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/SimpleTest.cpp -o /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest
+ /b/sanitizer-x86_64-linux/build/build_symbolizer/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/fuzzer -m64 /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/SimpleTest.cpp -o /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest
RUN: at line 2: /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest -seed=1 -runs=77 -print_final_stats=1 2>&1 | FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS
+ FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS
+ /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest -seed=1 -runs=77 -print_final_stats=1
RUN: at line 9: /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/dict1.txt -runs=33 -print_final_stats=1 2>&1 | FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS1
+ /b/sanitizer-x86_64-linux/build/build_symbolizer/runtimes/runtimes-bins/compiler-rt/test/fuzzer/X86_64LibcxxLinuxConfig/Output/fuzzer-finalstats.test.tmp-SimpleTest /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/dict1.txt -runs=33 -print_final_stats=1
+ FileCheck /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test --check-prefix=FINAL_STATS1
/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test:10:15: error: FINAL_STATS1: expected string not found in input
FINAL_STATS1: stat::number_of_executed_units: 33
              ^
<stdin>:1:1: note: scanning from here
INFO: Running with entropic power schedule (0xFF, 100).
^
<stdin>:12:1: note: possible intended match here
stat::number_of_executed_units: 34
^

Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/fuzzer/fuzzer-finalstats.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: INFO: Running with entropic power schedule (0xFF, 100). 
check:10'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2: INFO: Seed: 607427166 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~
            3: INFO: Loaded 1 modules (9 inline 8-bit counters): 9 [0x5616712bc4a8, 0x5616712bc4b1),  
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4: INFO: Loaded 1 PC tables (9 PCs): 9 [0x5616712bc4b8,0x5616712bc548),  
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@gribozavr
Copy link
Collaborator

I'd like to push back against this change, I believe it is too broad. (Full disclosure: I'm the original author of this subsystem in Clang.)

We (Google) are integrating this change downstream and it causes a bunch of comments to get associated to unrelated declarations. The problem is particularly bad with file-level comments, which get attached to the first declaration.

For example: https://github.com/google/crubit/blob/main/rs_bindings_from_cc/test/golden/definition_of_forward_declaration.h

In this file the license comment gets attached to ForwardDeclaredStruct even though they are separated by an #include directive.

I think you should at the very minimum check for #include directives as a negative signal that breaks the connection.

@danakj
Copy link
Contributor

danakj commented Jul 2, 2024

#include and #define at a minimum should break the connection. Comments above a #define can be attributed to the macro defined there, which is something that Subdoc does (example result: https://suslib.cc/macro.sus_check.html)

AaronBallman added a commit that referenced this pull request Jul 2, 2024
…ectives are in between (#88367)"

This reverts commit 9f04d75.

There was post-commit feedback on the direction this PR took.
@AaronBallman
Copy link
Collaborator

Thanks for the post-commit feedback! I've reverted the changes in 072e81d so that @hdoc can investigate further.

@hdoc
Copy link
Contributor Author

hdoc commented Jul 2, 2024

+1 on the revert and thank you for letting me know about this. I'll make changes to limit how aggressively the frontend attaches comments to decls inline with what @gribozavr and @danakj suggested.

lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…are in between (llvm#88367)

### Background

It's surprisingly common for C++ code in the wild to conditionally
show/hide declarations to Doxygen through the use of preprocessor
directives. One especially common version of this pattern is
demonstrated below:

```cpp
/// @brief Test comment
#ifdef DOXYGEN_BUILD_ENABLED
template<typename T>
#else
template <typename T>
typename std::enable_if<std::is_integral<T>::value>::type
#endif
void f() {}
```

There are more examples I've collected below to demonstrate usage of
this pattern:
- Example 1:
[Magnum](https://github.com/mosra/magnum/blob/8538610fa27e1db37070eaabe34f1e4e41648bab/src/Magnum/Resource.h#L117-L127)
- Example 2:
[libcds](https://github.com/khizmax/libcds/blob/9985d2a87feaa3e92532e28f4ab762a82855a49c/cds/container/michael_list_nogc.h#L36-L54)
- Example 3:
[rocPRIM](https://github.com/ROCm/rocPRIM/blob/609ae19565ff6a3499168b76a0be5652762e24f6/rocprim/include/rocprim/block/detail/block_reduce_raking_reduce.hpp#L60-L65)
 
From my research, it seems like the most common rationale for this
functionality is hiding difficult-to-parse code from Doxygen, especially
where template metaprogramming is concerned.

Currently, Clang does not support attaching comments to decls if there
are preprocessor comments between the comment and the decl. This is
enforced here:
https://github.com/llvm/llvm-project/blob/b6ebea7972cd05a8e4dcf0d5a54f2c793999995a/clang/lib/AST/ASTContext.cpp#L284-L287

Alongside preprocessor directives, any instance of `;{}#@` between a
comment and decl will cause the comment to not be attached to the decl.

#### Rationale

It would be nice for Clang-based documentation tools, such as
[hdoc](https://hdoc.io), to support code using this pattern. Users
expect to see comments attached to the relevant decl — even if there is
an `#ifdef` in the way — which Clang does not currently do.

#### History

Originally, commas were also in the list of "banned" characters, but
were removed in `b534d3a0ef69`
([link](llvm@b534d3a))
because availability macros often have commas in them. From my reading
of the code, it appears that the original intent of the code was to
exclude macros and decorators between comments and decls, possibly in an
attempt to properly attribute comments to macros (discussed further in
"Complications", below). There's some more discussion here:
https://reviews.llvm.org/D125061.

### Change

This modifies Clang comment parsing so that comments are attached to
subsequent declarations even if there are preprocessor directives
between the end of the comment and the start of the decl. Furthermore,
this change:

- Adds tests to verify that comments are attached to their associated
decls even if there are preprocessor directives in between
- Adds tests to verify that current behavior has not changed (i.e. use
of the other characters between comment and decl will result in the
comment not being attached to the decl)
- Updates existing `lit` tests which would otherwise break.

#### Complications

Clang [does not yet
support](llvm#38206) attaching
doc comments to macros. Consequently, the change proposed in this RFC
affects cases where a doc comment attached to a macro is followed
immediately by a normal declaration. In these cases, the macro's doc
comments will be attached to the subsequent decl. Previously they would
be ignored because any preprocessor directives between a comment and a
decl would result in the comment not being attached to the decl. An
example of this is shown below.

```cpp
/// Doc comment for a function-like macro
/// @param n
///    A macro argument
#define custom_sqrt(n) __internal_sqrt(n)

int __internal_sqrt(int n) { return __builtin_sqrt(n); }

// NB: the doc comment for the custom_sqrt macro will actually be attached to __internal_sqrt!
```

There is a real instance of this problem in the Clang codebase, namely
here:
https://github.com/llvm/llvm-project/blob/be10070f91b86a6f126d2451852242bfcb2cd366/clang/lib/Headers/amxcomplexintrin.h#L65-L114

As part of this RFC, I've added a semicolon to break up the Clang
comment parsing so that the `-Wdocumentation` errors go away, but this
is a hack. The real solution is to fix Clang comment parsing so that doc
comments are properly attached to macros, however this would be a large
change that is outside of the scope of this RFC.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…ectives are in between (llvm#88367)"

This reverts commit 9f04d75.

There was post-commit feedback on the direction this PR took.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…are in between (llvm#88367)

### Background

It's surprisingly common for C++ code in the wild to conditionally
show/hide declarations to Doxygen through the use of preprocessor
directives. One especially common version of this pattern is
demonstrated below:

```cpp
/// @brief Test comment
#ifdef DOXYGEN_BUILD_ENABLED
template<typename T>
#else
template <typename T>
typename std::enable_if<std::is_integral<T>::value>::type
#endif
void f() {}
```

There are more examples I've collected below to demonstrate usage of
this pattern:
- Example 1:
[Magnum](https://github.com/mosra/magnum/blob/8538610fa27e1db37070eaabe34f1e4e41648bab/src/Magnum/Resource.h#L117-L127)
- Example 2:
[libcds](https://github.com/khizmax/libcds/blob/9985d2a87feaa3e92532e28f4ab762a82855a49c/cds/container/michael_list_nogc.h#L36-L54)
- Example 3:
[rocPRIM](https://github.com/ROCm/rocPRIM/blob/609ae19565ff6a3499168b76a0be5652762e24f6/rocprim/include/rocprim/block/detail/block_reduce_raking_reduce.hpp#L60-L65)
 
From my research, it seems like the most common rationale for this
functionality is hiding difficult-to-parse code from Doxygen, especially
where template metaprogramming is concerned.

Currently, Clang does not support attaching comments to decls if there
are preprocessor comments between the comment and the decl. This is
enforced here:
https://github.com/llvm/llvm-project/blob/b6ebea7972cd05a8e4dcf0d5a54f2c793999995a/clang/lib/AST/ASTContext.cpp#L284-L287

Alongside preprocessor directives, any instance of `;{}#@` between a
comment and decl will cause the comment to not be attached to the decl.

#### Rationale

It would be nice for Clang-based documentation tools, such as
[hdoc](https://hdoc.io), to support code using this pattern. Users
expect to see comments attached to the relevant decl — even if there is
an `#ifdef` in the way — which Clang does not currently do.

#### History

Originally, commas were also in the list of "banned" characters, but
were removed in `b534d3a0ef69`
([link](llvm@b534d3a))
because availability macros often have commas in them. From my reading
of the code, it appears that the original intent of the code was to
exclude macros and decorators between comments and decls, possibly in an
attempt to properly attribute comments to macros (discussed further in
"Complications", below). There's some more discussion here:
https://reviews.llvm.org/D125061.

### Change

This modifies Clang comment parsing so that comments are attached to
subsequent declarations even if there are preprocessor directives
between the end of the comment and the start of the decl. Furthermore,
this change:

- Adds tests to verify that comments are attached to their associated
decls even if there are preprocessor directives in between
- Adds tests to verify that current behavior has not changed (i.e. use
of the other characters between comment and decl will result in the
comment not being attached to the decl)
- Updates existing `lit` tests which would otherwise break.

#### Complications

Clang [does not yet
support](llvm#38206) attaching
doc comments to macros. Consequently, the change proposed in this RFC
affects cases where a doc comment attached to a macro is followed
immediately by a normal declaration. In these cases, the macro's doc
comments will be attached to the subsequent decl. Previously they would
be ignored because any preprocessor directives between a comment and a
decl would result in the comment not being attached to the decl. An
example of this is shown below.

```cpp
/// Doc comment for a function-like macro
/// @param n
///    A macro argument
#define custom_sqrt(n) __internal_sqrt(n)

int __internal_sqrt(int n) { return __builtin_sqrt(n); }

// NB: the doc comment for the custom_sqrt macro will actually be attached to __internal_sqrt!
```

There is a real instance of this problem in the Clang codebase, namely
here:
https://github.com/llvm/llvm-project/blob/be10070f91b86a6f126d2451852242bfcb2cd366/clang/lib/Headers/amxcomplexintrin.h#L65-L114

As part of this RFC, I've added a semicolon to break up the Clang
comment parsing so that the `-Wdocumentation` errors go away, but this
is a hack. The real solution is to fix Clang comment parsing so that doc
comments are properly attached to macros, however this would be a large
change that is outside of the scope of this RFC.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…ectives are in between (llvm#88367)"

This reverts commit 9f04d75.

There was post-commit feedback on the direction this PR took.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants