Commit 2e90b54

[clang][test] add testing for the AST matcher reference
Previously, the examples in the AST matcher reference, which is generated from the doxygen comments in `ASTMatchers.h`, were untested and best effort. Some of the matchers had no examples, or wrong examples, of how to use them.

This patch introduces a simple DSL around doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy. In `ASTMatchers.h`, most matchers are documented with a doxygen comment. Most of these also have a code example that aims to show what the matcher will match, given a matcher somewhere in the documentation text.

The documentation is tested by using doxygen's alias feature to declare custom aliases. These aliases forward to `<tt>text</tt>` (which is what doxygen's \c does, but for multiple words). Using the doxygen aliases was the obvious choice, because there are (now) four consumers:

- people reading the header/using signature help
- the doxygen generated documentation
- the generated html AST matcher reference
- (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example. The new `generate_ast_matcher_doc_tests.py` script will warn on any undocumented matchers (but not on matchers without a doxygen comment) and provides diagnostics and statistics about the matchers.

Below is a file-level comment from the test generation script that describes how documenting matchers to be tested works on a slightly more technical level. In general, the new comments can be used as a reference for how to implement tested documentation.
The current statistics emitted by the parser are:

```text
Statistics:
  doxygen_blocks     : 519
  missing_tests      : 10
  skipped_objc       : 42
  code_snippets      : 503
  matches            : 820
  matchers           : 580
  tested_matchers    : 574
  none_type_matchers : 6
```

The tests are generated during building and the script will only print something if it found an issue (compile failure, parsing issues, or a difference between the expected and actual number of failures).

DSL for generating the tests from documentation.

TLDR:
The order for a single code snippet example is:

  \header{a.h}
  \endheader     <- zero or more headers

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}       <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a = 42}  <- only applies to the previous matcher (not to the
                         previous case)

The above block can be repeated inside of a doxygen command for multiple code examples.

Language Grammar:
  [] denotes an optional, and <> denotes user-input

  compile_args    ::= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key   ::= type || std || count
  matcher_tags    ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags      ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher         ::= \matcher{[matcher_tags$]<matcher>}
  matchers        ::= [matcher] matcher
  match           ::= \match{[match_tags$]<match>}
  matches         ::= [match] match
  case            ::= matchers matches
  cases           ::= [case] case
  header-block    ::= \header{<name>} <code> \endheader
  code-block      ::= \code <code> \endcode
  testcase        ::= code-block [compile_args] cases

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are combined with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:
  - c                                    all available versions of C
  - c++11                                only C++11
  - c++11-or-later                       C++11 or later
  - c++11-or-earlier                     C++11 or earlier
  - c++11-or-later,c++23-or-earlier,c    all of C and C++ between 11 and
                                         23 (inclusive)
  - c++11-23,c                           same as above

Tags:

  Type:
    Match types are used to select where the string that is used to check if a node matches comes from.
    Available: code, name, typestr, typeofstr.
    The default is 'code'.

    Matcher types are used to mark matchers as submatchers with 'sub' or as deactivated using 'none'. Testing submatchers is not implemented.

  Count:
    Specifying a 'count=n' on a match will result in a test that requires that the specified match will be matched n times. Default is 1.

  Std:
    A match allows specifying if it matches only in specific language versions. This may be needed when the AST differs between language versions.

Fixes #57607
Fixes #63748
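
To make the tag and 'std' rules above concrete, here is a minimal Python sketch. It is not taken from `generate_ast_matcher_doc_tests.py`: the function names (`split_tags`, `parse_predicate`, `std_allows`) and the `VERSIONS` table are hypothetical, and the version lists are only an assumed, chronologically ordered sample. It splits an optional `tags$` prefix from a `\match{...}` body and evaluates a 'std' specification with disjunction between languages and conjunction within one language.

```python
# Illustrative sketch only. All names and the version lists below are
# assumptions, not the implementation used by the patch.
VERSIONS = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}


def split_tags(body):
    """Split the text inside \\match{...} into (tags, payload)."""
    head, sep, tail = body.partition("$")
    if not sep or "=" not in head:
        return {}, body  # no '[tags$]' prefix present
    tags = {}
    for item in head.split(";"):
        key, _, value = item.partition("=")
        tags[key.strip()] = value.strip()
    return tags, tail


def parse_predicate(pred):
    """Turn one predicate into (language, low_index, high_index), inclusive."""
    lang = "c++" if pred.startswith("c++") else "c"
    rest = pred[len(lang):]
    order = VERSIONS[lang]
    last = len(order) - 1
    if rest == "":  # whole language, e.g. 'c'
        return lang, 0, last
    if rest.endswith("-or-later"):
        return lang, order.index(rest[: -len("-or-later")]), last
    if rest.endswith("-or-earlier"):
        return lang, 0, order.index(rest[: -len("-or-earlier")])
    if "-" in rest:  # explicit range, e.g. 'c++11-23'
        low, high = rest.split("-")
        return lang, order.index(low), order.index(high)
    exact = order.index(rest)  # single version, e.g. 'c++11'
    return lang, exact, exact


def std_allows(spec, lang, version):
    """OR between languages, AND between predicates of the same language."""
    per_lang = {}
    for pred in spec.split(","):
        p_lang, low, high = parse_predicate(pred.strip())
        per_lang.setdefault(p_lang, []).append((low, high))
    if lang not in per_lang:
        return False
    index = VERSIONS[lang].index(version)
    return all(low <= index <= high for low, high in per_lang[lang])


if __name__ == "__main__":
    tags, payload = split_tags("std=c++11-or-later;count=2$auto x = 1")
    compile_args_spec = "c++11-23,c"
    # A test runs only if both the snippet's spec and the match's 'std' agree.
    runs = std_allows(compile_args_spec, "c++", "17") and std_allows(
        tags["std"], "c++", "17"
    )
    print(tags, payload, runs)
```

Under these assumptions, `c++11-or-later,c++23-or-earlier,c` and `c++11-23,c` accept the same language/version pairs, which mirrors the "same as above" entry in the examples list.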
1 parent 615f30b commit 2e90b54

8 files changed

+11263
-3913
lines changed

clang/docs/LibASTMatchersReference.html

Lines changed: 5679 additions & 2272 deletions
Large diffs are not rendered by default.

clang/docs/ReleaseNotes.rst

Lines changed: 2 additions & 0 deletions
@@ -980,6 +980,8 @@ AST Matchers
 - Fixed ``forEachArgumentWithParam`` and ``forEachArgumentWithParamType`` to
   not skip the explicit object parameter for operator calls.
 - Fixed captureVars assertion failure if not capturesVariables. (#GH76425)
+- The examples in the AST matcher reference are now tested and additional
+  examples and descriptions were added.
 
 clang-format
 ------------

clang/docs/doxygen.cfg.in

Lines changed: 8 additions & 1 deletion
@@ -220,7 +220,14 @@ TAB_SIZE = 2
 # "Side Effects:". You can put \n's in the value part of an alias to insert
 # newlines.
 
-ALIASES =
+ALIASES += compile_args{1}="Compiled with <tt>\1</tt>.\n"
+ALIASES += matcher{1}="<tt>\1</tt>"
+ALIASES += matcher{2$}="<tt>\2</tt>"
+ALIASES += match{1}="<tt>\1</tt>"
+ALIASES += match{2$}="<tt>\2</tt>"
+ALIASES += nomatch{1}="<tt>\1</tt>"
+ALIASES += header{1}="\code"
+ALIASES += endheader="\endcode"
 
 # This tag can be used to specify a number of word-keyword mappings (TCL only).
 # A mapping has the form "name=value". For example adding "class=itcl::class"

clang/docs/tools/dump_ast_matchers.py

Lines changed: 63 additions & 5 deletions
@@ -100,15 +100,72 @@ def extract_result_types(comment):
         comment = m.group(1)
 
 
+def find_next_closing_rbrace(
+    data: str, start_pos: int, braces_to_be_matched: int
+) -> int:
+    """Finds the location of the closing rbrace '}' inside of data."""
+    """'start_pos' should be one past the opening lbrace and braces_to_be_matched is initialized with 0"""
+    next_lbrace = data.find("{", start_pos)
+    next_rbrace = data.find("}", start_pos)
+    if next_lbrace != -1:
+        if next_lbrace < next_rbrace:
+            return find_next_closing_rbrace(
+                data, next_lbrace + 1, braces_to_be_matched + 1
+            )
+        if braces_to_be_matched == 0:
+            return next_rbrace
+        return find_next_closing_rbrace(data, next_rbrace + 1, braces_to_be_matched - 1)
+
+    if braces_to_be_matched > 0:
+        return find_next_closing_rbrace(data, next_rbrace + 1, braces_to_be_matched - 1)
+
+    return next_rbrace
+
+
 def strip_doxygen(comment):
     """Returns the given comment without \-escaped words."""
-    # If there is only a doxygen keyword in the line, delete the whole line.
-    comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
-
     # If there is a doxygen \see command, change the \see prefix into "See also:".
     # FIXME: it would be better to turn this into a link to the target instead.
     comment = re.sub(r"\\see", r"See also:", comment)
 
+    commands: list[str] = [
+        "\\compile_args{",
+        "\\matcher{",
+        "\\match{",
+        "\\nomatch{",
+    ]
+
+    for command in commands:
+        delete_command = command == "\\compile_args{"
+        command_begin_loc = comment.find(command)
+        while command_begin_loc != -1:
+            command_end_loc = command_begin_loc + len(command)
+            end_brace_loc = find_next_closing_rbrace(comment, command_end_loc + 1, 0)
+            if end_brace_loc == -1:
+                print("found unmatched {")
+                command_begin_loc = comment.find(command, command_end_loc)
+                continue
+
+            if delete_command:
+                comment = comment[0:command_begin_loc] + comment[end_brace_loc + 1 :]
+                command_begin_loc = comment.find(command, command_begin_loc)
+                continue
+
+            tag_seperator_loc = comment.find("$", command_end_loc)
+            if tag_seperator_loc != -1 and tag_seperator_loc < end_brace_loc:
+                command_end_loc = tag_seperator_loc + 1
+
+            comment = (
+                comment[0:command_begin_loc]
+                + comment[command_end_loc:end_brace_loc]
+                + comment[end_brace_loc + 1 :]
+            )
+
+            command_begin_loc = comment.find(command, command_begin_loc)
+
+    # If there is only a doxygen keyword in the line, delete the whole line.
+    comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
+
     # Delete the doxygen command and the following whitespace.
     comment = re.sub(r"\\[^\s]+\s+", r"", comment)
     return comment
@@ -191,8 +248,9 @@ def act_on_decl(declaration, comment, allowed_types):
     definition.
     """
     if declaration.strip():
-
-        if re.match(r"^\s?(#|namespace|using|template <typename NodeType> using|})", declaration):
+        if re.match(
+            r"^\s?(#|namespace|using|template <typename NodeType> using|})", declaration
+        ):
             return
 
         # Node matchers are defined by writing:
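
The `strip_doxygen` change above unwraps `\matcher{...}`, `\match{...}` and `\nomatch{...}` so that only the payload reaches the HTML reference. The following standalone sketch illustrates that idea under stated assumptions: `find_closing_brace` and `unwrap_command` are hypothetical names, and the brace scan uses an iterative depth counter instead of the recursion used in the patch, so it is an illustration of the technique rather than a copy of the committed code.

```python
def find_closing_brace(data: str, start_pos: int) -> int:
    """Return the index of the '}' that closes the brace opened before start_pos.

    Nested '{...}' pairs inside the payload are skipped by tracking depth.
    Returns -1 if no matching brace exists.
    """
    depth = 0
    for pos in range(start_pos, len(data)):
        if data[pos] == "{":
            depth += 1
        elif data[pos] == "}":
            if depth == 0:
                return pos
            depth -= 1
    return -1


def unwrap_command(comment: str, command: str) -> str:
    """Replace every 'command[tags$]payload}' occurrence with just 'payload'.

    'command' includes the opening brace, e.g. '\\\\matcher{' as a Python literal.
    """
    begin = comment.find(command)
    while begin != -1:
        payload_start = begin + len(command)
        end = find_closing_brace(comment, payload_start)
        if end == -1:
            break  # unmatched '{': leave the rest of the comment untouched
        # Drop an optional 'key=value;...$' tag prefix inside the braces.
        separator = comment.find("$", payload_start, end)
        if separator != -1:
            payload_start = separator + 1
        comment = comment[:begin] + comment[payload_start:end] + comment[end + 1 :]
        begin = comment.find(command, begin)
    return comment


if __name__ == "__main__":
    text = "Given \\matcher{type=sub$varDecl()} matches \\match{int a = 42}."
    for cmd in ("\\matcher{", "\\match{"):
        text = unwrap_command(text, cmd)
    print(text)
```

The depth counter avoids recursion limits on long comments; whether that matters for the real header is not something the commit states.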
