Commit e42cc3f

[clang][test] add testing for the AST matcher reference (#110258)
## Problem Statement

Previously, the examples in the AST matcher reference, which is generated from the Doxygen comments in `ASTMatchers.h`, were untested and best effort. Some of the matchers had no examples, or wrong examples, of how to use the matcher.

## Solution

This patch introduces a simple DSL around Doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy to use. In `ASTMatchers.h`, most matchers are documented with a Doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text.

The documentation is tested by using Doxygen's alias feature to declare custom aliases. These aliases forward to `<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple words). Using the Doxygen aliases is the obvious choice, because there are (now) four consumers:

- people reading the header/using signature help
- the Doxygen generated documentation
- the generated HTML AST matcher reference
- (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example. The new `generate_ast_matcher_doc_tests.py` script will warn on any undocumented matchers (but not on matchers without a Doxygen comment) and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

```text
Statistics:
  doxygen_blocks : 519
  missing_tests : 10
  skipped_objc : 42
  code_snippets : 503
  matches : 820
  matchers : 580
  tested_matchers : 574
  none_type_matchers : 6
```

The tests are generated during building, and the script will only print something if it found an issue with the specified tests (e.g., missing tests).

## Description

DSL for generating the tests from documentation.
TLDR:

```
\header{a.h}
\endheader     <- zero or more header

\code
  int a = 42;
\endcode
\compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and whole languages

\matcher{expr()}    <- one or more matchers in succession
\match{42}          <- one or more matches in succession

\matcher{varDecl()} <- new matcher resets the context, the above
                       \match will not count for this new
                       matcher(-group)
\match{int a = 42}  <- only applies to the previous matcher (not to the
                       previous case)
```

The above block can be repeated inside a Doxygen command for multiple code examples for a single matcher. The test generation script will only look for these annotations and ignore anything else, like `\c` or the sentences in which these annotations are embedded: `The matcher \matcher{expr()} matches the number \match{42}.`.

### Language Grammar

`[]` denotes an optional element, and `<>` denotes user input.

```
compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
matcher_tag_key ::= type
match_tag_key ::= type || std || count || sub
matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
matcher ::= \matcher{[matcher_tags$]<matcher>}
matchers ::= [matcher] matcher
match ::= \match{[match_tags$]<match>}
matches ::= [match] match
case ::= matchers matches
cases ::= [case] case
header-block ::= \header{<name>} <code> \endheader
code-block ::= \code <code> \endcode
testcase ::= code-block [compile_args] cases
```

### Language Standard Versions

The `std` tag and `\compile_args` support specifying a specific language version, a whole language and all of its versions, and thresholds (which implies ranges). Multiple arguments are passed with a `,` separator. For a language and version to execute a tested matcher, it has to match the specified `\compile_args` for the code, and the `std` tag for the matcher. Predicates for the `std` compiler flag are combined with disjunction between languages (e.g. `c || c++`) and conjunction for all predicates specific to each language (e.g. `c++11-or-later && c++23-or-earlier`).

Examples:

- `c` all available versions of C
- `c++11` only C++11
- `c++11-or-later` C++11 or later
- `c++11-or-earlier` C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c` all of C and C++ between 11 and 23 (inclusive)
- `c++11-23,c` same as above

### Tags

#### `type`:

**Match types** are used to select where the string that is used to check if a node matches comes from. Available: `code`, `name`, `typestr`, `typeofstr`. The default is `code`.

- `code`: Forwards to `tooling::fixit::getText(...)` and should be the preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of `getNameAsString`. Useful when the matched AST node is not easy to spell out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type derived from `Type` (otherwise, if it is derived from `Decl`, recurses with `Node->getTypeForDecl()`)

**Matcher types** are used to mark matchers as sub-matchers with `sub` or as deactivated using `none`. Testing sub-matchers is not implemented.

#### `count`:

Specifying a `count=n` on a match will result in a test that requires the specified match to be matched n times. Default is 1.

#### `std`:

A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.

#### `sub`:

The `sub` tag on a `\match` will indicate that the match is for a node of a bound sub-matcher. E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that binds to `inner`, which is the value for the `sub` tag of the expected match for the sub-matcher `\match{sub=inner$...}`. Currently, sub-matchers are not tested in any way.

### What if ...?

#### ... I want to add a matcher?

Add a Doxygen comment to the matcher with a code example, corresponding matchers and matches, that shows what the matcher is supposed to do. Specify the compile arguments/supported languages if required, and run `ninja check-clang-unit` to test the documentation.

#### ... the example I wrote is wrong?

The test-failure output of the generated test file will provide information about

- where the generated test file is located
- which line in `ASTMatchers.h` the example is from
- which matches were: found, not-(yet)-found, expected
- in case of an unexpected match: what the node looks like using the different `type`s
- the language version and if the test ran with a windows `-target` flag (also in failure summary)

#### ... I don't adhere to the required order of the syntax?

The script will diagnose any found issues, such as `matcher is missing an example` with a `file:line:` prefix, which should provide enough information about the issue.

#### ... the script diagnoses a false-positive issue with a Doxygen comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code and documented it with Doxygen, then the script will consider that as a matcher documentation. As a result, the script will print that it detected a mismatch between the actual and the expected number of failures. If the diagnostic truly is a false-positive, change the `expected_failure_statistics` at the top of the `generate_ast_matcher_doc_tests.py` file.

Fixes #57607
Fixes #63748
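
As a rough illustration of the predicates described under "Language Standard Versions", the following Python sketch expands a `std` spec string into the set of versions it allows per language. It is not taken from `generate_ast_matcher_doc_tests.py`: the names `VERSIONS`, `parse_std_spec` and `std_matches` are hypothetical, and the list of known standards is an assumption that may differ from what the real script supports.

```python
import re

# Assumed, ordered lists of language standards (oldest first). The actual
# script may know a different set of versions.
VERSIONS = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}

# <lang>[<version>[-or-later | -or-earlier | -<version>]]
ITEM_RE = re.compile(r"(c\+\+|c)(?:(\d+)(?:-(or-later|or-earlier|\d+))?)?")


def parse_std_spec(spec: str) -> dict[str, set[str]]:
    """Map every language named in a comma-separated spec to its allowed versions."""
    allowed: dict[str, set[str]] = {}
    for item in spec.split(","):
        m = ITEM_RE.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"unrecognized std spec item: {item!r}")
        lang, low, suffix = m.groups()
        versions = VERSIONS[lang]
        if low is None:  # whole language, e.g. "c"
            selected = set(versions)
        else:
            i = versions.index(low)
            if suffix is None:  # exact version, e.g. "c++11"
                selected = {low}
            elif suffix == "or-later":
                selected = set(versions[i:])
            elif suffix == "or-earlier":
                selected = set(versions[: i + 1])
            else:  # closed range, e.g. "c++11-23"
                selected = set(versions[i : versions.index(suffix) + 1])
        # Predicates for the same language are combined with conjunction;
        # different languages are kept side by side (disjunction).
        allowed[lang] = allowed[lang] & selected if lang in allowed else selected
    return allowed


def std_matches(spec: str, lang: str, version: str) -> bool:
    """Return True if (lang, version) satisfies the spec."""
    return version in parse_std_spec(spec).get(lang, set())
```

Under these assumptions, `c++11-or-later,c++23-or-earlier,c` and `c++11-23,c` expand to the same sets, which mirrors the "same as above" entry in the examples list.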
1 parent b65930c commit e42cc3f

8 files changed

+11407
-3945
lines changed

clang/docs/LibASTMatchersReference.html

Lines changed: 5679 additions & 2272 deletions
Large diffs are not rendered by default.

clang/docs/ReleaseNotes.rst

Lines changed: 3 additions & 0 deletions
@@ -568,6 +568,9 @@ AST Matchers
 
 - Fixed a crash when traverse lambda expr with invalid captures. (#GH106444)
 
+- The examples in the AST matcher reference are now tested and additional
+  examples and descriptions were added.
+
 clang-format
 ------------
 
clang/docs/doxygen.cfg.in

Lines changed: 8 additions & 1 deletion
@@ -220,7 +220,14 @@ TAB_SIZE = 2
 # "Side Effects:". You can put \n's in the value part of an alias to insert
 # newlines.
 
-ALIASES =
+ALIASES += compile_args{1}="Compiled with <tt>\1</tt>.\n"
+ALIASES += matcher{1}="<tt>\1</tt>"
+ALIASES += matcher{2$}="<tt>\2</tt>"
+ALIASES += match{1}="<tt>\1</tt>"
+ALIASES += match{2$}="<tt>\2</tt>"
+ALIASES += nomatch{1}="<tt>\1</tt>"
+ALIASES += header{1}="\code"
+ALIASES += endheader="\endcode"
 
 # This tag can be used to specify a number of word-keyword mappings (TCL only).
 # A mapping has the form "name=value". For example adding "class=itcl::class"

clang/docs/tools/dump_ast_matchers.py

Lines changed: 63 additions & 5 deletions
@@ -100,15 +100,72 @@ def extract_result_types(comment):
         comment = m.group(1)
 
 
+def find_next_closing_rbrace(
+    data: str, start_pos: int, braces_to_be_matched: int
+) -> int:
+    """Finds the location of the closing rbrace '}' inside of data."""
+    """'start_pos' should be one past the opening lbrace and braces_to_be_matched is initialized with 0"""
+    next_lbrace = data.find("{", start_pos)
+    next_rbrace = data.find("}", start_pos)
+    if next_lbrace != -1:
+        if next_lbrace < next_rbrace:
+            return find_next_closing_rbrace(
+                data, next_lbrace + 1, braces_to_be_matched + 1
+            )
+        if braces_to_be_matched == 0:
+            return next_rbrace
+        return find_next_closing_rbrace(data, next_rbrace + 1, braces_to_be_matched - 1)
+
+    if braces_to_be_matched > 0:
+        return find_next_closing_rbrace(data, next_rbrace + 1, braces_to_be_matched - 1)
+
+    return next_rbrace
+
+
 def strip_doxygen(comment):
     """Returns the given comment without \-escaped words."""
-    # If there is only a doxygen keyword in the line, delete the whole line.
-    comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
-
     # If there is a doxygen \see command, change the \see prefix into "See also:".
     # FIXME: it would be better to turn this into a link to the target instead.
     comment = re.sub(r"\\see", r"See also:", comment)
 
+    commands: list[str] = [
+        "\\compile_args{",
+        "\\matcher{",
+        "\\match{",
+        "\\nomatch{",
+    ]
+
+    for command in commands:
+        delete_command = command == "\\compile_args{"
+        command_begin_loc = comment.find(command)
+        while command_begin_loc != -1:
+            command_end_loc = command_begin_loc + len(command)
+            end_brace_loc = find_next_closing_rbrace(comment, command_end_loc + 1, 0)
+            if end_brace_loc == -1:
+                print("found unmatched {")
+                command_begin_loc = comment.find(command, command_end_loc)
+                continue
+
+            if delete_command:
+                comment = comment[0:command_begin_loc] + comment[end_brace_loc + 1 :]
+                command_begin_loc = comment.find(command, command_begin_loc)
+                continue
+
+            tag_seperator_loc = comment.find("$", command_end_loc)
+            if tag_seperator_loc != -1 and tag_seperator_loc < end_brace_loc:
+                command_end_loc = tag_seperator_loc + 1
+
+            comment = (
+                comment[0:command_begin_loc]
+                + comment[command_end_loc:end_brace_loc]
+                + comment[end_brace_loc + 1 :]
+            )
+
+            command_begin_loc = comment.find(command, command_begin_loc)
+
+    # If there is only a doxygen keyword in the line, delete the whole line.
+    comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
+
     # Delete the doxygen command and the following whitespace.
     comment = re.sub(r"\\[^\s]+\s+", r"", comment)
     return comment
@@ -191,8 +248,9 @@ def act_on_decl(declaration, comment, allowed_types):
     definition.
     """
     if declaration.strip():
-
-        if re.match(r"^\s?(#|namespace|using|template <typename NodeType> using|})", declaration):
+        if re.match(
+            r"^\s?(#|namespace|using|template <typename NodeType> using|})", declaration
+        ):
             return
 
         # Node matchers are defined by writing:
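
To make the annotation handling in the `strip_doxygen` change easier to follow, here is a small standalone sketch that extracts `\matcher{...}`, `\match{...}` and `\nomatch{...}` annotations together with their `$`-separated tags. It is not the code from this commit: `find_closing_brace` and `parse_annotations` are hypothetical helpers, the brace matching is written iteratively rather than recursively, and treating the text before `$` as tags only when every `;`-separated part contains `=` is an assumption of this sketch (the patch itself treats any `$` before the closing brace as the tag separator).

```python
import re

# Matches the start of an annotation; "matcher" is listed before "match"
# so that the longer command name is tried first.
ANNOTATION_RE = re.compile(r"\\(matcher|match|nomatch)\{")


def find_closing_brace(text: str, start: int) -> int:
    """Return the index of the '}' closing a '{' located just before start, or -1."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            if depth == 0:
                return i
            depth -= 1
    return -1


def parse_annotations(comment: str) -> list[tuple[str, dict[str, str], str]]:
    """Return (command, tags, payload) for every annotation found in comment."""
    results = []
    pos = 0
    while (m := ANNOTATION_RE.search(comment, pos)) is not None:
        end = find_closing_brace(comment, m.end())
        if end == -1:  # unmatched '{': stop scanning
            break
        body = comment[m.end() : end]
        tags: dict[str, str] = {}
        head, sep, rest = body.partition("$")
        # Assumption: only treat the prefix as tags if it looks like key=value pairs.
        if sep and all("=" in part for part in head.split(";")):
            tags = dict(part.split("=", 1) for part in head.split(";"))
            body = rest
        results.append((m.group(1), tags, body))
        pos = end + 1
    return results
```

For a comment such as `\matcher{expr(expr().bind("inner"))}` followed by `\match{sub=inner$42}`, the sketch yields the matcher text without tags and the match `42` with the tag `sub=inner`, which is the information a generated test would need.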
