Reland: [clang][test] add testing for the AST matcher reference #112168

5chmidti
Contributor

Problem Statement

Previously, the examples in the AST matcher reference, which is generated from the Doxygen comments in ASTMatchers.h, were untested and best effort.
Some of the matchers had no examples, or wrong examples, of how to use the matcher.

Solution

This patch introduces a simple DSL around Doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy to use.
In ASTMatchers.h, most matchers are documented with a Doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text. The documentation is tested by using Doxygen's alias feature to declare custom aliases. These aliases forward to <tt>text</tt> (which is what Doxygen's \c does, but for multiple words). Using the Doxygen aliases is the obvious choice, because there are (now) four consumers:

  • people reading the header/using signature help
  • the Doxygen generated documentation
  • the generated HTML AST matcher reference
  • (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example.
The new generate_ast_matcher_doc_tests.py script will warn on any undocumented matchers (but not on matchers without a Doxygen comment) and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6

The tests are generated at build time, and the script only prints something if it finds an issue with the specified tests (e.g., missing tests).

Description

DSL for generating the tests from documentation.

TLDR:

  \header{a.h}
  \endheader     <- zero or more headers

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a  = 42} <- only applies to the previous matcher (not to the
                         previous case)

The above block can be repeated inside a Doxygen command for multiple code examples for a single matcher.
The test generation script will only look for these annotations and ignore anything else, such as \c or the sentences in which these annotations are embedded: The matcher \matcher{expr()} matches the number \match{42}.
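As a rough illustration of the "new matcher resets the context" rule, the following is a minimal sketch and not the logic of generate_ast_matcher_doc_tests.py itself. The helper names `find_annotations` and `group_cases` are hypothetical, and the sketch assumes that braces inside an annotation body are balanced.

```python
import re

# Hypothetical helpers for illustration only; the real script may work differently.
ANNOTATION_START = re.compile(r"\\(matcher|match)\{")


def find_annotations(doc):
    """Return (kind, body) pairs for every \\matcher{...} and \\match{...} in doc.

    Assumes braces inside a body are balanced, e.g. \\match{namespace {}}.
    """
    found = []
    for m in ANNOTATION_START.finditer(doc):
        depth, i = 1, m.end()
        while i < len(doc) and depth:
            depth += {"{": 1, "}": -1}.get(doc[i], 0)
            i += 1
        found.append((m.group(1), doc[m.end():i - 1]))
    return found


def group_cases(doc):
    """Group annotations into cases: one or more matchers, then their matches."""
    cases = []
    matchers, matches = [], []
    for kind, body in find_annotations(doc):
        if kind == "matcher":
            if matches:
                # A matcher that follows matches starts a new case.
                cases.append((matchers, matches))
                matchers, matches = [], []
            matchers.append(body)
        else:
            matches.append(body)
    if matchers or matches:
        cases.append((matchers, matches))
    return cases
```

Everything that is not one of the two annotations, including the surrounding sentence and the \code block, is simply skipped by the scan.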

Language Grammar

[] denotes an optional element, and <> denotes user input

  compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count || sub
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
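To make the tag syntax concrete, here is a small sketch of how the body of a \match{...} could be split into its optional tags and the match text. The function `split_tags` is hypothetical. The fallback of treating a '$' as ordinary content when the prefix is not a valid tag list is an assumption of this sketch, not documented behaviour.

```python
# Tag keys taken from the grammar above; split_tags itself is a hypothetical helper.
MATCH_TAG_KEYS = {"type", "std", "count", "sub"}
MATCHER_TAG_KEYS = {"type"}


def split_tags(body, allowed=MATCH_TAG_KEYS):
    """Split 'key=value;key=value$content' into (tags, content).

    If there is no '$', or the part before it is not a list of known
    key=value pairs, the whole body is treated as content (assumption).
    """
    head, sep, rest = body.partition("$")
    if not sep:
        return {}, body
    tags = {}
    for item in head.split(";"):
        key, eq, value = item.partition("=")
        if not eq or key not in allowed:
            return {}, body
        tags[key] = value
    return tags, rest
```

For example, `split_tags("type=name;count=2$S")` separates the two tags from the match text `S`, while a plain body such as `42` is returned unchanged with no tags.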

Language Standard Versions

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator. For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

  • c all available versions of C
  • c++11 only C++11
  • c++11-or-later C++11 or later
  • c++11-or-earlier C++11 or earlier
  • c++11-or-later,c++23-or-earlier,c all of C and C++ between 11 and
    23 (inclusive)
  • c++11-23,c same as above
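The disjunction/conjunction rule can be sketched as follows. This is an illustration under stated assumptions rather than the script's implementation: the function `std_allows`, the `VERSIONS` table, and the exact set of versions are hypothetical, and the spec string is the value after `-std=`.

```python
import re

# Assumed version orderings; the set supported by the real script may differ.
VERSIONS = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}

# <lang>, <lang><ver>, <lang><ver>-or-later, <lang><ver>-or-earlier, <lang><ver>-<ver>
SPEC_ITEM = re.compile(r"^(c\+\+|c)(?:(\d+)(?:-(or-later|or-earlier|\d+))?)?$")


def std_allows(spec, lang, version):
    """Return True if (lang, version) satisfies a comma-separated std spec."""
    order = VERSIONS[lang]
    idx = order.index(version)
    lang_mentioned = False
    for item in spec.split(","):
        m = SPEC_ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"unrecognized std spec item: {item!r}")
        item_lang, low, suffix = m.groups()
        if item_lang != lang:
            continue  # predicates for other languages do not constrain this one
        lang_mentioned = True
        if low is None:
            continue  # whole language: no version constraint
        lo = order.index(low)
        if suffix is None:
            ok = idx == lo
        elif suffix == "or-later":
            ok = idx >= lo
        elif suffix == "or-earlier":
            ok = idx <= lo
        else:
            ok = lo <= idx <= order.index(suffix)
        if not ok:
            return False  # conjunction within one language
    return lang_mentioned  # disjunction across languages
```

Versions are compared by their position in an ordered list rather than numerically, because a numeric comparison would sort C++98 after C++11.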

Tags

type:

Match types are used to select where the string that is used to check if a node matches comes from.
Available: code, name, typestr, typeofstr. The default is code.

  • code: Forwards to tooling::fixit::getText(...) and should be the preferred way to show what matches.
  • name: Casts the match to a NamedDecl and returns the result of getNameAsString. Useful when the matched AST node is not easy to spell out (code type), e.g., namespaces or classes with many members.
  • typestr: Returns the result of QualType::getAsString for the type derived from Type (otherwise, if it is derived from Decl, recurses with Node->getTypeForDecl())

Matcher types are used to mark matchers as sub-matchers with 'sub' or as deactivated using 'none'. Testing sub-matchers is not implemented.

count:

Specifying a 'count=n' on a match will result in a test that requires that the specified match will be matched n times. Default is 1.
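The count requirement can be pictured with a short sketch. `check_matches` is a hypothetical helper, and treating a string that matches more often than its expected count as an unexpected match is an assumption of this sketch.

```python
from collections import Counter


def check_matches(expected, found):
    """Compare expected match counts against the strings actually matched.

    expected: dict mapping match text -> required count (1 when no count tag)
    found:    list of strings produced by the matcher
    Returns (missing, unexpected), each a dict of text -> number of occurrences.
    """
    seen = Counter(found)
    missing = {text: n - seen[text] for text, n in expected.items() if seen[text] < n}
    unexpected = {
        text: c - expected.get(text, 0)
        for text, c in seen.items()
        if c > expected.get(text, 0)
    }
    return missing, unexpected
```

A case passes in this sketch only when both returned dictionaries are empty.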

std:

A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.

sub:

The sub tag on a \match will indicate that the match is for a node of a bound sub-matcher.
E.g., \matcher{expr(expr().bind("inner"))} has a sub-matcher that binds to inner, which is the value for the sub tag of the expected match for the sub-matcher \match{sub=inner$...}. Currently, sub-matchers are not tested in any way.

What if ...?

... I want to add a matcher?

Add a Doxygen comment to the matcher with a code example, corresponding matchers and matches, that shows what the matcher is supposed to do. Specify the compile arguments/supported languages if required, and run ninja check-clang-unit to test the documentation.

... the example I wrote is wrong?

The test-failure output of the generated test file will provide information about

  • where the generated test file is located
  • which line in ASTMatchers.h the example is from
  • which matches were: found, not-(yet)-found, expected
  • in case of an unexpected match: what the node looks like using the different types
  • the language version and if the test ran with a windows -target flag (also in failure summary)

... I don't adhere to the required order of the syntax?

The script will diagnose any issues it finds, such as "matcher is missing an example", with a file:line: prefix,
which should provide enough information about the issue.

... the script diagnoses a false-positive issue with a Doxygen comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code and documented it with Doxygen, then the script will treat that as matcher documentation. As a result, the script will print that it detected a mismatch between the actual and the expected number of failures. If the diagnostic truly is a false positive, change the expected_failure_statistics at the top of the generate_ast_matcher_doc_tests.py file.
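The kind of comparison described here might look roughly like the sketch below. Only the name expected_failure_statistics comes from the description above. Its keys and values in this sketch are assumptions borrowed from the statistics listing, and `statistics_mismatches` is a hypothetical helper.

```python
# Assumed contents, taken from the statistics listing in the description;
# the real dictionary in the script may have different keys.
expected_failure_statistics = {
    "missing_tests": 10,
    "skipped_objc": 42,
    "none_type_matchers": 6,
}


def statistics_mismatches(actual, expected=expected_failure_statistics):
    """Return {key: (expected, actual)} for every statistic that differs."""
    return {
        key: (want, actual.get(key, 0))
        for key, want in expected.items()
        if actual.get(key, 0) != want
    }
```

In this sketch, a newly documented non-matcher declaration would raise one of the counters, and the fix for a genuine false positive is to adjust the corresponding expected value.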

Fixes #57607
Fixes #63748

@5chmidti 5chmidti added the clang Clang issues not falling into any other category label Oct 14, 2024
@llvmbot
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-clang

Author: Julian Schmidt (5chmidti)

Changes


Patch is 906.88 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/112168.diff

8 Files Affected:

  • (modified) clang/docs/LibASTMatchersReference.html (+5679-2272)
  • (modified) clang/docs/ReleaseNotes.rst (+3)
  • (modified) clang/docs/doxygen.cfg.in (+8-1)
  • (modified) clang/docs/tools/dump_ast_matchers.py (+63-5)
  • (modified) clang/include/clang/ASTMatchers/ASTMatchers.h (+4117-1664)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersTest.h (+427-3)
  • (modified) clang/unittests/ASTMatchers/CMakeLists.txt (+15)
  • (added) clang/utils/generate_ast_matcher_doc_tests.py (+1097)
diff --git a/clang/docs/LibASTMatchersReference.html b/clang/docs/LibASTMatchersReference.html
index a16b9c44ef0eab..baf39befd796a5 100644
--- a/clang/docs/LibASTMatchersReference.html
+++ b/clang/docs/LibASTMatchersReference.html
@@ -586,28 +586,36 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
   #pragma omp declare simd
   int min();
-attr()
-  matches "nodiscard", "nonnull", "noinline", and the whole "#pragma" line.
+
+The matcher attr()
+matches nodiscard, nonnull, noinline, and
+declare simd.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>&gt;</td><td class="name" onclick="toggle('cxxBaseSpecifier0')"><a name="cxxBaseSpecifier0Anchor">cxxBaseSpecifier</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxBaseSpecifier0"><pre>Matches class bases.
 
-Examples matches public virtual B.
+Given
   class B {};
   class C : public virtual B {};
+
+The matcher cxxRecordDecl(hasDirectBase(cxxBaseSpecifier()))
+matches C.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXCtorInitializer.html">CXXCtorInitializer</a>&gt;</td><td class="name" onclick="toggle('cxxCtorInitializer0')"><a name="cxxCtorInitializer0Anchor">cxxCtorInitializer</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXCtorInitializer.html">CXXCtorInitializer</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxCtorInitializer0"><pre>Matches constructor initializers.
 
-Examples matches i(42).
+Given
   class C {
     C() : i(42) {}
     int i;
   };
+
+The matcher cxxCtorInitializer()
+matches i(42).
 </pre></td></tr>
 
 
@@ -619,17 +627,22 @@ <h2 id="decl-matchers">Node Matchers</h2>
   public:
     int a;
   };
-accessSpecDecl()
-  matches 'public:'
+
+The matcher accessSpecDecl()
+matches public:.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('bindingDecl0')"><a name="bindingDecl0Anchor">bindingDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html">BindingDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="bindingDecl0"><pre>Matches binding declarations
-Example matches foo and bar
-(matcher = bindingDecl()
 
-  auto [foo, bar] = std::make_pair{42, 42};
+Given
+  struct pair { int x; int y; };
+  pair make(int, int);
+  auto [foo, bar] = make(42, 42);
+
+The matcher bindingDecl()
+matches foo and bar.
 </pre></td></tr>
 
 
@@ -642,14 +655,18 @@ <h2 id="decl-matchers">Node Matchers</h2>
   myFunc(^(int p) {
     printf("%d", p);
   })
+
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('classTemplateDecl0')"><a name="classTemplateDecl0Anchor">classTemplateDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1ClassTemplateDecl.html">ClassTemplateDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="classTemplateDecl0"><pre>Matches C++ class template declarations.
 
-Example matches Z
+Given
   template&lt;class T&gt; class Z {};
+
+The matcher classTemplateDecl()
+matches Z.
 </pre></td></tr>
 
 
@@ -660,13 +677,14 @@ <h2 id="decl-matchers">Node Matchers</h2>
   template&lt;class T1, class T2, int I&gt;
   class A {};
 
-  template&lt;class T, int I&gt;
-  class A&lt;T, T*, I&gt; {};
+  template&lt;class T, int I&gt; class A&lt;T, T*, I&gt; {};
 
   template&lt;&gt;
   class A&lt;int, int, 1&gt; {};
-classTemplatePartialSpecializationDecl()
-  matches the specialization A&lt;T,T*,I&gt; but not A&lt;int,int,1&gt;
+
+The matcher classTemplatePartialSpecializationDecl()
+matches template&lt;class T, int I&gt; class A&lt;T, T*, I&gt; {},
+but does not match A&lt;int, int, 1&gt;.
 </pre></td></tr>
 
 
@@ -677,87 +695,128 @@ <h2 id="decl-matchers">Node Matchers</h2>
   template&lt;typename T&gt; class A {};
   template&lt;&gt; class A&lt;double&gt; {};
   A&lt;int&gt; a;
-classTemplateSpecializationDecl()
-  matches the specializations A&lt;int&gt; and A&lt;double&gt;
+
+The matcher classTemplateSpecializationDecl()
+matches class A&lt;int&gt;
+and class A&lt;double&gt;.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('conceptDecl0')"><a name="conceptDecl0Anchor">conceptDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1ConceptDecl.html">ConceptDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="conceptDecl0"><pre>Matches concept declarations.
 
-Example matches integral
-  template&lt;typename T&gt;
-  concept integral = std::is_integral_v&lt;T&gt;;
+Given
+  template&lt;typename T&gt; concept my_concept = true;
+
+
+The matcher conceptDecl()
+matches template&lt;typename T&gt;
+concept my_concept = true.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxConstructorDecl0')"><a name="cxxConstructorDecl0Anchor">cxxConstructorDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructorDecl.html">CXXConstructorDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxConstructorDecl0"><pre>Matches C++ constructor declarations.
 
-Example matches Foo::Foo() and Foo::Foo(int)
+Given
   class Foo {
    public:
     Foo();
     Foo(int);
     int DoSomething();
   };
+
+  struct Bar {};
+
+
+The matcher cxxConstructorDecl()
+matches Foo() and Foo(int).
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxConversionDecl0')"><a name="cxxConversionDecl0Anchor">cxxConversionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConversionDecl.html">CXXConversionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxConversionDecl0"><pre>Matches conversion operator declarations.
 
-Example matches the operator.
+Given
   class X { operator int() const; };
+
+
+The matcher cxxConversionDecl()
+matches operator int() const.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxDeductionGuideDecl0')"><a name="cxxDeductionGuideDecl0Anchor">cxxDeductionGuideDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXDeductionGuideDecl.html">CXXDeductionGuideDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxDeductionGuideDecl0"><pre>Matches user-defined and implicitly generated deduction guide.
 
-Example matches the deduction guide.
+Given
   template&lt;typename T&gt;
-  class X { X(int) };
+  class X { X(int); };
   X(int) -&gt; X&lt;int&gt;;
+
+
+The matcher cxxDeductionGuideDecl()
+matches the written deduction guide
+auto (int) -&gt; X&lt;int&gt;,
+the implicit copy deduction guide auto (int) -&gt; X&lt;T&gt;
+and the implicitly declared deduction guide
+auto (X&lt;T&gt;) -&gt; X&lt;T&gt;.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxDestructorDecl0')"><a name="cxxDestructorDecl0Anchor">cxxDestructorDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXDestructorDecl.html">CXXDestructorDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxDestructorDecl0"><pre>Matches explicit C++ destructor declarations.
 
-Example matches Foo::~Foo()
+Given
   class Foo {
    public:
     virtual ~Foo();
   };
+
+  struct Bar {};
+
+
+The matcher cxxDestructorDecl()
+matches virtual ~Foo().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxMethodDecl0')"><a name="cxxMethodDecl0Anchor">cxxMethodDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXMethodDecl.html">CXXMethodDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxMethodDecl0"><pre>Matches method declarations.
 
-Example matches y
+Given
   class X { void y(); };
+
+
+The matcher cxxMethodDecl()
+matches void y().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxRecordDecl0')"><a name="cxxRecordDecl0Anchor">cxxRecordDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXRecordDecl.html">CXXRecordDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxRecordDecl0"><pre>Matches C++ class declarations.
 
-Example matches X, Z
+Given
   class X;
   template&lt;class T&gt; class Z {};
+
+The matcher cxxRecordDecl()
+matches X and Z.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('decl0')"><a name="decl0Anchor">decl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="decl0"><pre>Matches declarations.
 
-Examples matches X, C, and the friend declaration inside C;
+Given
   void X();
   class C {
-    friend X;
+    friend void X();
   };
+
+The matcher decl()
+matches void X(), C
+and friend void X().
 </pre></td></tr>
 
 
@@ -767,40 +826,49 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   class X { int y; };
-declaratorDecl()
-  matches int y.
+
+The matcher declaratorDecl()
+matches int y.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('decompositionDecl0')"><a name="decompositionDecl0Anchor">decompositionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1DecompositionDecl.html">DecompositionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="decompositionDecl0"><pre>Matches decomposition-declarations.
 
-Examples matches the declaration node with foo and bar, but not
-number.
-(matcher = declStmt(has(decompositionDecl())))
-
+Given
+  struct pair { int x; int y; };
+  pair make(int, int);
   int number = 42;
-  auto [foo, bar] = std::make_pair{42, 42};
+  auto [foo, bar] = make(42, 42);
+
+The matcher decompositionDecl()
+matches auto [foo, bar] = make(42, 42),
+but does not match number.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('enumConstantDecl0')"><a name="enumConstantDecl0Anchor">enumConstantDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1EnumConstantDecl.html">EnumConstantDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="enumConstantDecl0"><pre>Matches enum constants.
 
-Example matches A, B, C
+Given
   enum X {
     A, B, C
   };
+The matcher enumConstantDecl()
+matches A, B and C.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('enumDecl0')"><a name="enumDecl0Anchor">enumDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1EnumDecl.html">EnumDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="enumDecl0"><pre>Matches enum declarations.
 
-Example matches X
+Given
   enum X {
     A, B, C
   };
+
+The matcher enumDecl()
+matches the enum X.
 </pre></td></tr>
 
 
@@ -808,9 +876,14 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="fieldDecl0"><pre>Matches field declarations.
 
 Given
-  class X { int m; };
-fieldDecl()
-  matches 'm'.
+  int a;
+  struct Foo {
+    int x;
+  };
+  void bar(int val);
+
+The matcher fieldDecl()
+matches int x.
 </pre></td></tr>
 
 
@@ -819,16 +892,20 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   class X { friend void foo(); };
-friendDecl()
-  matches 'friend void foo()'.
+
+The matcher friendDecl()
+matches friend void foo().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('functionDecl0')"><a name="functionDecl0Anchor">functionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html">FunctionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="functionDecl0"><pre>Matches function declarations.
 
-Example matches f
+Given
   void f();
+
+The matcher functionDecl()
+matches void f().
 </pre></td></tr>
 
 
@@ -837,6 +914,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Example matches f
   template&lt;class T&gt; void f(T t) {}
+
+
+The matcher functionTemplateDecl()
+matches template&lt;class T&gt; void f(T t) {}.
 </pre></td></tr>
 
 
@@ -845,8 +926,8 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   struct X { struct { int a; }; };
-indirectFieldDecl()
-  matches 'a'.
+The matcher indirectFieldDecl()
+matches a.
 </pre></td></tr>
 
 
@@ -854,10 +935,13 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="labelDecl0"><pre>Matches a declaration of label.
 
 Given
-  goto FOO;
-  FOO: bar();
-labelDecl()
-  matches 'FOO:'
+  void bar();
+  void foo() {
+    goto FOO;
+    FOO: bar();
+  }
+The matcher labelDecl()
+matches FOO: bar().
 </pre></td></tr>
 
 
@@ -866,8 +950,9 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   extern "C" {}
-linkageSpecDecl()
-  matches "extern "C" {}"
+
+The matcher linkageSpecDecl()
+matches extern "C" {}.
 </pre></td></tr>
 
 
@@ -875,12 +960,18 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="namedDecl0"><pre>Matches a declaration of anything that could have a name.
 
 Example matches X, S, the anonymous union type, i, and U;
+Given
   typedef int X;
   struct S {
     union {
       int i;
     } U;
   };
+The matcher namedDecl()
+matches typedef int X, S, int i
+ and U,
+with S matching twice in C++.
+Once for the injected class name and once for the declaration itself.
 </pre></td></tr>
 
 
@@ -890,8 +981,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   namespace test {}
   namespace alias = ::test;
-namespaceAliasDecl()
-  matches "namespace alias" but not "namespace test"
+
+The matcher namespaceAliasDecl()
+matches alias,
+but does not match test.
 </pre></td></tr>
 
 
@@ -901,8 +994,9 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   namespace {}
   namespace test {}
-namespaceDecl()
-  matches "namespace {}" and "namespace test {}"
+
+The matcher namespaceDecl()
+matches namespace {} and namespace test {}.
 </pre></td></tr>
 
 
@@ -911,8 +1005,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;typename T, int N&gt; struct C {};
-nonTypeTemplateParmDecl()
-  matches 'N', but not 'T'.
+
+The matcher nonTypeTemplateParmDecl()
+matches int N,
+but does not match typename T.
 </pre></td></tr>
 
 
@@ -922,6 +1018,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo (Additions)
   @interface Foo (Additions)
   @end
+
 </pre></td></tr>
 
 
@@ -931,6 +1028,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo (Additions)
   @implementation Foo (Additions)
   @end
+
 </pre></td></tr>
 
 
@@ -940,6 +1038,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo
   @implementation Foo
   @end
+
 </pre></td></tr>
 
 
@@ -949,6 +1048,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo
   @interface Foo
   @end
+
 </pre></td></tr>
 
 
@@ -960,6 +1060,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
     BOOL _enabled;
   }
   @end
+
 </pre></td></tr>
 
 
@@ -974,6 +1075,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
   @implementation Foo
   - (void)method {}
   @end
+
 </pre></td></tr>
 
 
@@ -984,6 +1086,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
   @interface Foo
   @property BOOL enabled;
   @end
+
 </pre></td></tr>
 
 
@@ -993,6 +1096,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches FooDelegate
   @protocol FooDelegate
   @end
+
 </pre></td></tr>
 
 
@@ -1001,48 +1105,58 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   void f(int x);
-parmVarDecl()
-  matches int x.
+The matcher parmVarDecl()
+matches int x.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('recordDecl0')"><a name="recordDecl0Anchor">recordDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1RecordDecl.html">RecordDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="recordDecl0"><pre>Matches class, struct, and union declarations.
 
-Example matches X, Z, U, and S
+Given
   class X;
   template&lt;class T&gt; class Z {};
   struct S {};
   union U {};
+
+The matcher recordDecl()
+matches X, Z,
+S and U.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('staticAssertDecl0')"><a name="staticAssertDecl0Anchor">staticAssertDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1StaticAssertDecl.html">StaticAssertDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="staticAssertDecl0"><pre>Matches a C++ static_assert declaration.
 
-Example:
-  staticAssertDecl()
-matches
-  static_assert(sizeof(S) == sizeof(int))
-in
+Given
   struct S {
     int x;
   };
   static_assert(sizeof(S) == sizeof(int));
+
+
+The matcher staticAssertDecl()
+matches static_assert(sizeof(S) == sizeof(int)).
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('tagDecl0')"><a name="tagDecl0Anchor">tagDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1TagDecl.html">TagDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="tagDecl0"><pre>Matches tag declarations.
 
-Example matches X, Z, U, S, E
+Given
   class X;
   template&lt;class T&gt; class Z {};
   struct S {};
   union U {};
-  enum E {
-    A, B, C
-  };
+  enum E { A, B, C };
+
+
+The matcher tagDecl()
+matches class X, class Z {}, the injected class name
+class Z, struct S {},
+the injected class name struct S, union U {},
+the injected class name union U
+and enum E { A, B, C }.
 </pre></td></tr>
 
 
@@ -1051,8 +1165,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;template &lt;typename&gt; class Z, int N&gt; struct C {};
-templateTypeParmDecl()
-  matches 'Z', but not 'N'.
+
+The matcher templateTemplateParmDecl()
+matches template &lt;typename&gt; class Z,
+but does not match int N.
 </pre></td></tr>
 
 
@@ -1061,8 +1177,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;typename T, int N&gt; struct C {};
-templateTypeParmDecl()
-  matches 'T', but not 'N'.
+
+The matcher templateTypeParmDecl()
+matches typename T,
+but does not match int N.
 </pre></td></tr>
 
 
@@ -1072,10 +1190,12 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   int X;
   namespace NS {
-  int Y;
+    int Y;
   }  // namespace NS
-decl(hasDeclContext(translationUnitDecl()))
-  matches "int X", but not "int Y".
+
+The matcher namedDecl(hasDeclContext(translationUnitDecl()))
+matches X and NS,
+but does not match Y.
 </pre></td></tr>
 
 
@@ -1085,17 +1205,22 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   typedef int X;
   using Y = int;
-typeAliasDecl()
-  matches "using Y = int", but not "typedef int X"
+
+The matcher typeAliasDecl()
+matches using Y = int,
+but does not match typedef int X.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('typeAliasTemplateDecl0')"><a name="typeAliasTemplateDecl0Anchor">typeAliasTemplateDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1TypeAliasTemplateDecl.html">TypeAliasTemplateDecl</a>&gt;...</td></tr>
 <tr><t...
[truncated]

@5chmidti
Contributor Author

The last five commits are the cumulative changes for fixing the Buildbots. I will check in with the Buildbot owner to see whether the previous issue has been solved by 4a85fa3.

Previously, the examples in the AST matcher reference, which is
generated from the Doxygen comments in `ASTMatchers.h`, were untested
and written on a best-effort basis.
Some of the matchers had no examples, or incorrect ones, of how to use
the matcher.

This patch introduces a simple DSL around Doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy to use.
In `ASTMatchers.h`, most matchers are documented with a Doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The documentation is tested by using Doxygen's alias feature to declare
custom aliases. These aliases forward to
`<tt>text</tt>` (which is what Doxygen's \c does, but for multiple words).
Using the Doxygen aliases was the obvious choice, because there are
(now) four consumers:
 - people reading the header/using signature help
 - the Doxygen-generated documentation
 - the generated HTML AST matcher reference
 - (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a Doxygen comment)
and provides diagnostics and statistics about the matchers.
Below is a file-level comment from the test generation script, which
describes on a slightly more technical level how to document matchers
so that they are tested. In general, the new comments can be used as a
reference for how to write tested documentation.

The current statistics emitted by the parser are:

```text
Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6
```

The tests are generated during the build, and the script prints output
only if it finds an issue (a compile failure, a parsing issue, or a
mismatch between the expected and actual number of failures).

DSL for generating the tests from documentation.

TLDR:
The order for a single code snippet example is:

  \header{a.h}
  \endheader     <- zero or more header

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a  = 42} <- only applies to the previous matcher (not the
                         previous case)

The above block can be repeated inside of a doxygen command for multiple
code examples.

Language Grammar:
  [] denotes an optional element, and <> denotes user input

  compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
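
A minimal sketch of how the `\matcher{...}` and `\match{...}` productions above could be split into tags and body. The helper `parse_command` and the regular expression are hypothetical and are not taken from `generate_ast_matcher_doc_tests.py`; the sketch assumes one command per line and that tags, when present, are `key=value` pairs separated by `;` and terminated by `$`, as the grammar states.

```python
import re
from typing import Dict, Tuple

# Hypothetical pattern for a single-line \matcher{...} or \match{...} command.
COMMAND_RE = re.compile(r"\\(matcher|match)\{(.*)\}\s*$")


def parse_command(line: str) -> Tuple[str, Dict[str, str], str]:
    """Split a command into (kind, tags, body), e.g. tags 'count=2' and body 'S'."""
    found = COMMAND_RE.match(line.strip())
    if found is None:
        raise ValueError(f"not a \\matcher or \\match command: {line!r}")
    kind, content = found.groups()
    tags: Dict[str, str] = {}
    body = content
    # Text before the first '$' counts as tags only if every ';'-separated
    # piece looks like key=value; otherwise the whole content is the body.
    head, sep, tail = content.partition("$")
    if sep and all(re.fullmatch(r"\w+=[^;=]+", p) for p in head.split(";")):
        tags = dict(p.split("=", 1) for p in head.split(";"))
        body = tail
    return kind, tags, body
```

With this sketch, `\match{type=name;count=2$S}` yields the kind `match`, the tags `type=name` and `count=2`, and the body `S`, while `\matcher{expr()}` yields no tags at all.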

The 'std' tag and '\compile_args' support specifying a specific
language version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the matcher.
Predicates for the 'std' compiler flag are used with disjunction between
languages (e.g. 'c || c++') and conjunction for all predicates specific
to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:
 - c                                    all available versions of C
 - c++11                                only C++11
 - c++11-or-later                       C++11 or later
 - c++11-or-earlier                     C++11 or earlier
 - c++11-or-later,c++23-or-earlier,c    all of C and C++ between 11 and
                                        23 (inclusive)
 - c++11-23,c                           same as above
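
The rule above (disjunction between languages, conjunction within one language) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `VERSIONS` table, the helper names, and the restriction to bare specifiers without a `-std=` prefix are not taken from the actual script.

```python
from typing import Callable, Dict, List, Tuple

# Assumed version lists, ordered oldest to newest; the real script may differ.
VERSIONS: Dict[str, List[str]] = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}


def split_lang(item: str) -> Tuple[str, str]:
    """Split 'c++11-or-later' into ('c++', '11-or-later')."""
    lang = "c++" if item.startswith("c++") else "c"
    return lang, item[len(lang):]


def make_predicate(lang: str, rest: str) -> Callable[[str], bool]:
    """Build a predicate over version strings of one language."""
    order = VERSIONS[lang]
    if rest == "":  # whole language, e.g. 'c'
        return lambda v: True
    if rest.endswith("-or-later"):
        low = order.index(rest[: -len("-or-later")])
        return lambda v: order.index(v) >= low
    if rest.endswith("-or-earlier"):
        high = order.index(rest[: -len("-or-earlier")])
        return lambda v: order.index(v) <= high
    if "-" in rest:  # inclusive range, e.g. '11-23'
        first, last = rest.split("-")
        low, high = order.index(first), order.index(last)
        return lambda v: low <= order.index(v) <= high
    return lambda v: v == rest  # exact version, e.g. '11'


def enabled_standards(spec: str) -> List[str]:
    """Expand a comma-separated spec into the standards it selects."""
    per_lang: Dict[str, List[Callable[[str], bool]]] = {}
    for item in spec.split(","):
        lang, rest = split_lang(item.strip())
        per_lang.setdefault(lang, []).append(make_predicate(lang, rest))
    result: List[str] = []
    for lang, preds in per_lang.items():
        # Conjunction within a language, union across languages.
        result += [lang + v for v in VERSIONS[lang] if all(p(v) for p in preds)]
    return result
```

Under these assumptions, `enabled_standards("c++11-or-later,c++23-or-earlier,c")` and `enabled_standards("c++11-23,c")` select the same set of standards, which mirrors the "same as above" entry in the list.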

Tags:

  Type:
  Match types are used to select where the string that is used to check if
  a node matches comes from.
  Available: code, name, typestr, typeofstr.
  The default is 'code'.

  Matcher types are used to mark matchers as submatchers with 'sub' or as
  deactivated using 'none'. Testing submatchers is not implemented.

  Count:
  Specifying a 'count=n' on a match will result in a test that requires that
  the specified match will be matched n times. Default is 1.

  Std:
  A match allows specifying if it matches only in specific language versions.
  This may be needed when the AST differs between language versions.
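
To make the `count` tag concrete, here is a hedged sketch of the check it implies: every expected match must be found exactly the stated number of times. The function `check_counts` is hypothetical and only models the semantics described above; the generated tests themselves are not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, List


def check_counts(expected: Dict[str, int], found: Iterable[str]) -> List[str]:
    """Return one diagnostic per expected match whose count is off."""
    seen = Counter(found)  # missing keys count as 0
    return [
        f"{text!r}: expected {count} match(es), found {seen[text]}"
        for text, count in expected.items()
        if seen[text] != count
    ]
```

For example, a match on `S` that is documented as matching twice would be modelled as `check_counts({"S": 2}, found)`, which returns an empty list only when `S` appears exactly twice in `found`.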

Fixes llvm#57607
Fixes llvm#63748
Fix for the Buildbot failure caused by older Python versions that do not
support subscripting some types. Tested with Python 3.8.
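
The failure described above is consistent with built-in collection types such as `list` and `dict` only becoming subscriptable in Python 3.9 (PEP 585). The fix commit itself is not shown here, so the following is only a sketch of the usual workaround, using the `typing` aliases; the function `count_kinds` is hypothetical.

```python
from typing import Dict, List


# Writing `list[str]` or `dict[str, int]` in this signature raises
# "TypeError: 'type' object is not subscriptable" on Python 3.8,
# because annotations are evaluated when the function is defined.
def count_kinds(kinds: List[str]) -> Dict[str, int]:
    """Count how often each kind occurs (hypothetical helper)."""
    counts: Dict[str, int] = {}
    for kind in kinds:
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

An alternative on Python 3.7 and later is `from __future__ import annotations`, which defers annotation evaluation, although that does not help for subscripted types used outside annotations.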
@5chmidti 5chmidti force-pushed the users/5chmidti/add_testing_for_the_AST_matcher_reference branch from 210b60c to c59ab52 Compare October 31, 2024 08:32
@omjavaid
Contributor

@5chmidti Sorry for the delay. I have tested this and it seems to compile on windows msvc without any regressions.

@5chmidti
Contributor Author

> @5chmidti Sorry for the delay. I have tested this and it seems to compile on windows msvc without any regressions.

Thank you for checking that it works 👍

@5chmidti 5chmidti merged commit 53e92e4 into llvm:main Nov 15, 2024
9 checks passed
@5chmidti 5chmidti deleted the users/5chmidti/add_testing_for_the_AST_matcher_reference branch November 15, 2024 09:51
@llvm-ci
Collaborator

llvm-ci commented Nov 16, 2024

LLVM Buildbot has detected a new failure on builder clang-arm64-windows-msvc running on linaro-armv8-windows-msvc-04 while building clang at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/161/builds/3227

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
...
[1516/1535] Linking CXX executable unittests\Transforms\Coroutines\CoroTests.exe
[1517/1535] Linking CXX executable unittests\Transforms\Vectorize\SandboxVectorizer\SandboxVectorizerTests.exe
[1518/1535] Linking CXX executable unittests\Transforms\Instrumentation\InstrumentationTests.exe
[1519/1535] Linking CXX executable unittests\tools\llvm-profgen\LLVMProfgenTests.exe
[1520/1535] Linking CXX executable unittests\Transforms\Utils\UtilsTests.exe
[1521/1535] Linking CXX executable unittests\tools\llvm-profdata\LLVMProfdataTests.exe
[1522/1535] Linking CXX executable unittests\Transforms\Scalar\ScalarTests.exe
[1523/1535] Linking CXX executable unittests\tools\llvm-cfi-verify\CFIVerifyTests.exe
[1524/1535] Linking CXX executable unittests\tools\llvm-mca\LLVMMCATests.exe
[1525/1535] Linking CXX executable unittests\tools\llvm-exegesis\LLVMExegesisTests.exe
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
program finished with exit code 1
elapsedTime=2216.650482

Labels
clang Clang issues not falling into any other category
Development

Successfully merging this pull request may close these issues.

Errors in AST matcher documentation/examples? classTemplateSpecializationDecl does not work as described
4 participants