[clang][test] add testing for the AST matcher reference #110258
Previously, the examples in the AST matcher reference, which is generated from the doxygen comments in `ASTMatchers.h`, were untested and best effort. Some matchers had no example, or a wrong example, of how to use them.

This patch introduces a simple DSL around doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy.

In `ASTMatchers.h`, most matchers are documented with a doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text. The documentation is tested by using doxygen's alias feature to declare custom aliases. These aliases forward to `<tt>text</tt>` (which is what doxygen's `\c` does, but for multiple words). Using the doxygen aliases was the obvious choice, because there are (now) four consumers:

- people reading the header/using signature help
- the doxygen generated documentation
- the generated html AST matcher reference
- (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example. The new `generate_ast_matcher_doc_tests.py` script will warn on any undocumented matchers (but not on matchers without a doxygen comment) and provides diagnostics and statistics about the matchers.

Below is a file-level comment from the test generation script that describes how documenting matchers to be tested works on a slightly more technical level. In general, the new comments can be used as a reference for how to implement a tested documentation.
The current statistics emitted by the parser are:

```text
Statistics:
  doxygen_blocks     : 519
  missing_tests      : 10
  skipped_objc       : 42
  code_snippets      : 503
  matches            : 820
  matchers           : 580
  tested_matchers    : 574
  none_type_matchers : 6
```

The tests are generated during building and the script will only print something if it found an issue (a compile failure, parsing issues, or a difference between the expected and actual number of failures).

DSL for generating the tests from documentation.

TLDR:
The order for a single code snippet example is:

    \header{a.h}
    \endheader     <- zero or more header

    \code
      int a = 42;
    \endcode
    \compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and
                                            whole languages

    \matcher{expr()} <- one or more matchers in succession
    \match{42}       <- one or more matches in succession

    \matcher{varDecl()} <- new matcher resets the context, the above
                           \match will not count for this new
                           matcher(-group)
    \match{int a = 42}  <- only applies to the previous matcher (not to the
                           previous case)

The above block can be repeated inside of a doxygen command for multiple code examples.

Language Grammar:
[] denotes an optional, and <> denotes user-input

    compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
    matcher_tag_key ::= type
    match_tag_key ::= type || std || count
    matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
    match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
    matcher ::= \matcher{[matcher_tags$]<matcher>}
    matchers ::= [matcher] matcher
    match ::= \match{[match_tags$]<match>}
    matches ::= [match] match
    case ::= matchers matches
    cases ::= [case] case
    header-block ::= \header{<name>} <code> \endheader
    code-block ::= \code <code> \endcode
    testcase ::= code-block [compile_args] cases

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator.
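The ordering rule from the TLDR, where a `\matcher` that follows a `\match` starts a new case, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from `generate_ast_matcher_doc_tests.py`: the `Case` type, the `group_cases` function, and the regular expression are hypothetical, and the sketch ignores tags, headers, compile arguments, and annotation bodies that contain braces.

```python
import re
from typing import List, NamedTuple


class Case(NamedTuple):
    """Hypothetical container for one 'case ::= matchers matches' group."""
    matchers: List[str]
    matches: List[str]


# Simplification: the body of an annotation must not contain '}'.
ANNOTATION = re.compile(r"\\(matcher|match)\{([^}]*)\}")


def group_cases(doc: str) -> List[Case]:
    """Group \\matcher and \\match annotations into cases, in document order."""
    cases: List[Case] = []
    matchers: List[str] = []
    matches: List[str] = []
    for kind, body in ANNOTATION.findall(doc):
        if kind == "matcher":
            if matches:
                # A \matcher after at least one \match resets the context:
                # close the current case and start a new one.
                cases.append(Case(matchers, matches))
                matchers, matches = [], []
            matchers.append(body)
        else:
            matches.append(body)
    if matchers or matches:
        cases.append(Case(matchers, matches))
    return cases


example = r"""
\code
  int a = 42;
\endcode
\matcher{expr()}
\match{42}
\matcher{varDecl()}
\match{int a = 42}
"""

for case in group_cases(example):
    print(case.matchers, "->", case.matches)
```

In this sketch, `\match{42}` is attributed only to `expr()`, and `\match{int a = 42}` only to `varDecl()`, mirroring the comments in the TLDR block.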
For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

- c                                    all available versions of C
- c++11                                only C++11
- c++11-or-later                       C++11 or later
- c++11-or-earlier                     C++11 or earlier
- c++11-or-later,c++23-or-earlier,c    all of C and C++ between 11 and 23 (inclusive)
- c++11-23,c                           same as above

Tags:

Type:
Match types are used to select where the string that is used to check if a node matches comes from. Available: code, name, typestr, typeofstr. The default is 'code'.

Matcher types are used to mark matchers as submatchers with 'sub' or as deactivated using 'none'. Testing submatchers is not implemented.

Count:
Specifying a 'count=n' on a match will result in a test that requires that the specified match will be matched n times. Default is 1.

Std:
A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.

Fixes #57607
Fixes #63748
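The `std` semantics described above (disjunction between languages, conjunction of all predicates within one language) can be sketched in Python as follows. This is a minimal illustration, not the script's implementation: `VERSIONS`, `parse_std`, and `std_allows` are hypothetical names, and the version lists are an assumed ordering that may not match what the real script supports.

```python
import re
from typing import Callable, Dict, List

# Assumed version orderings, oldest first (hypothetical; the real list may differ).
VERSIONS: Dict[str, List[str]] = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}

# language, optional version, optional '-or-later' / '-or-earlier' / '-<upper>'
ITEM = re.compile(r"^(c\+\+|c)(\d+)?(?:-(or-later|or-earlier|\d+))?$")


def parse_std(spec: str) -> Dict[str, List[Callable[[int], bool]]]:
    """Map each language named in `spec` to predicates over a version index."""
    result: Dict[str, List[Callable[[int], bool]]] = {}
    for item in spec.split(","):
        m = ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"unsupported std item: {item!r}")
        lang, version, suffix = m.groups()
        preds = result.setdefault(lang, [])
        if version is None:
            continue  # a whole language adds no version constraint
        idx = VERSIONS[lang].index(version)
        if suffix is None:
            preds.append(lambda v, i=idx: v == i)
        elif suffix == "or-later":
            preds.append(lambda v, i=idx: v >= i)
        elif suffix == "or-earlier":
            preds.append(lambda v, i=idx: v <= i)
        else:  # explicit range such as c++11-23
            upper = VERSIONS[lang].index(suffix)
            preds.append(lambda v, lo=idx, hi=upper: lo <= v <= hi)
    return result


def std_allows(spec: str, lang: str, version: str) -> bool:
    """True if (lang, version) satisfies `spec`."""
    by_lang = parse_std(spec)
    if lang not in by_lang:  # disjunction: the language must be named at all
        return False
    v = VERSIONS[lang].index(version)
    return all(pred(v) for pred in by_lang[lang])  # conjunction per language


print(std_allows("c++11-or-later,c++23-or-earlier,c", "c++", "17"))
print(std_allows("c++11-or-later,c++23-or-earlier,c", "c++", "98"))
```

Under the rule stated above, a test for a given language version would run only if both the `\compile_args` spec of the code and the `std` tag of the match allow it, for example `std_allows(compile_std, lang, ver) and std_allows(match_std, lang, ver)` in terms of this sketch.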
Fix for the buildbot failure due to lower python versions not supporting some types to be subscripted. Tested with python3.8.
This is an attempt to reland this PR, which created buildbot failures because of the python version not supporting subscripted types, and because previously, the test generation script would try to compile the code with a

Removed the type-hints that didn't work, and testing that the examples work has been removed from the test generation script. That part will also be tested when running the unit test, and was only added to catch compile failures of the examples earlier.

Original PR: #94248

The buildkite failure is unrelated
## Problem Statement

Previously, the examples in the AST matcher reference, which is generated from the Doxygen comments in `ASTMatchers.h`, were untested and best effort. Some matchers had no example, or a wrong example, of how to use them.

## Solution

This patch introduces a simple DSL around Doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy to use.

In `ASTMatchers.h`, most matchers are documented with a Doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text. The documentation is tested by using Doxygen's alias feature to declare custom aliases. These aliases forward to `<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple words). Using the Doxygen aliases is the obvious choice, because there are (now) four consumers:

- people reading the header/using signature help
- the Doxygen generated documentation
- the generated HTML AST matcher reference
- (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example. The new `generate_ast_matcher_doc_tests.py` script will warn on any undocumented matchers (but not on matchers without a Doxygen comment) and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

```text
Statistics:
  doxygen_blocks     : 519
  missing_tests      : 10
  skipped_objc       : 42
  code_snippets      : 503
  matches            : 820
  matchers           : 580
  tested_matchers    : 574
  none_type_matchers : 6
```

The tests are generated during building, and the script will only print something if it found an issue with the specified tests (e.g., missing tests).

## Description

DSL for generating the tests from documentation.
TLDR:

```
\header{a.h}
\endheader     <- zero or more header

\code
  int a = 42;
\endcode
\compile_args{-std=c++,c23-or-later} <- optional, the std flag supports
                                        std ranges and whole languages

\matcher{expr()} <- one or more matchers in succession
\match{42}       <- one or more matches in succession

\matcher{varDecl()} <- new matcher resets the context, the above
                       \match will not count for this new
                       matcher(-group)
\match{int a = 42}  <- only applies to the previous matcher (not to the
                       previous case)
```

The above block can be repeated inside a Doxygen command for multiple code examples for a single matcher. The test generation script will only look for these annotations and ignore anything else, such as `\c` or the sentences in which these annotations are embedded: `The matcher \matcher{expr()} matches the number \match{42}.`.

### Language Grammar

[] denotes an optional, and <> denotes user-input

```
compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
matcher_tag_key ::= type
match_tag_key ::= type || std || count || sub
matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
matcher ::= \matcher{[matcher_tags$]<matcher>}
matchers ::= [matcher] matcher
match ::= \match{[match_tags$]<match>}
matches ::= [match] match
case ::= matchers matches
cases ::= [case] case
header-block ::= \header{<name>} <code> \endheader
code-block ::= \code <code> \endcode
testcase ::= code-block [compile_args] cases
```

### Language Standard Versions

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator. For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

- `c` all available versions of C
- `c++11` only C++11
- `c++11-or-later` C++11 or later
- `c++11-or-earlier` C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c` all of C and C++ between 11 and 23 (inclusive)
- `c++11-23,c` same as above

### Tags

#### `type`:

**Match types** are used to select where the string that is used to check if a node matches comes from. Available: `code`, `name`, `typestr`, `typeofstr`. The default is `code`.

- `code`: Forwards to `tooling::fixit::getText(...)` and should be the preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of `getNameAsString`. Useful when the matched AST node is not easy to spell out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type derived from `Type` (otherwise, if it is derived from `Decl`, recurses with `Node->getTypeForDecl()`)

**Matcher types** are used to mark matchers as sub-matchers with 'sub' or as deactivated using 'none'. Testing sub-matchers is not implemented.

#### `count`:

Specifying a 'count=n' on a match will result in a test that requires that the specified match will be matched n times. Default is 1.

#### `std`:

A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.

#### `sub`:

The `sub` tag on a `\match` will indicate that the match is for a node of a bound sub-matcher. E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that binds to `inner`, which is the value for the `sub` tag of the expected match for the sub-matcher `\match{sub=inner$...}`. Currently, sub-matchers are not tested in any way.

### What if ...?

#### ... I want to add a matcher?
Add a Doxygen comment to the matcher with a code example, corresponding matchers and matches, that shows what the matcher is supposed to do. Specify the compile arguments/supported languages if required, and run `ninja check-clang-unit` to test the documentation.

#### ... the example I wrote is wrong?

The test-failure output of the generated test file will provide information about

- where the generated test file is located
- which line in `ASTMatchers.h` the example is from
- which matches were: found, not-(yet)-found, expected
- in case of an unexpected match: what the node looks like using the different `type`s
- the language version and if the test ran with a windows `-target` flag (also in failure summary)

#### ... I don't adhere to the required order of the syntax?

The script will diagnose any found issues, such as `matcher is missing an example` with a `file:line:` prefix, which should provide enough information about the issue.

#### ... the script diagnoses a false-positive issue with a Doxygen comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code and documented it with Doxygen, then the script will consider that as a matcher documentation. As a result, the script will print that it detected a mismatch between the actual and the expected number of failures. If the diagnostic truly is a false-positive, change the `expected_failure_statistics` at the top of the `generate_ast_matcher_doc_tests.py` file.

Fixes llvm#57607
Fixes llvm#63748
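The `\match{[match_tags$]<match>}` syntax and the `count` tag described under Tags can be sketched in Python as follows. This is an illustration only, not the parser from `generate_ast_matcher_doc_tests.py`: `parse_match` and `count_satisfied` are hypothetical helpers, and the fallback that treats an unrecognized tag list as plain match text is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Defaults taken from the description: type defaults to 'code', count to 1.
MATCH_TAG_DEFAULTS = {"type": "code", "count": "1"}
MATCH_TAG_KEYS = {"type", "std", "count", "sub"}


def parse_match(body: str) -> Tuple[Dict[str, str], str]:
    """Split the body of a \\match{...} into its tags and the expected text."""
    tags = dict(MATCH_TAG_DEFAULTS)
    head, sep, rest = body.partition("$")
    if not sep:
        return tags, body  # no '$': the whole body is the expected match
    parsed: Dict[str, str] = {}
    for entry in head.split(";"):
        key, eq, value = entry.partition("=")
        if not eq or key not in MATCH_TAG_KEYS:
            # Assumption: anything that is not a valid tag list is match text.
            return tags, body
        parsed[key] = value
    tags.update(parsed)
    return tags, rest


def count_satisfied(found: List[str], body: str) -> bool:
    """Check that the expected text was matched exactly 'count' times."""
    tags, expected = parse_match(body)
    return found.count(expected) == int(tags["count"])


print(parse_match("int a = 42"))
print(parse_match("type=name;count=2$f"))
print(parse_match("sub=inner$42"))
```

A match without tags, such as `\match{int a = 42}`, keeps the defaults, while `\match{type=name;count=2$f}` would require the name `f` to be matched exactly twice.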
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/161/builds/2385 Here is the relevant piece of the build log for the reference
)" This reverts commit e42cc3f.
…0354) Reverts #110258 The commit caused a timeout for clang-arm64-windows-msvc: https://lab.llvm.org/buildbot/#/builders/161/builds/2385 and it looks like my commit is at fault.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/554 Here is the relevant piece of the build log for the reference