fix(clang/**.py): fix invalid escape sequences #94029
Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository.

If you have received no comments on your PR for a week, you can request a review by pinging the PR.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-clang-static-analyzer-1
@llvm/pr-subscribers-clang

Author: Eisuke Kawashima (e-kwsm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/94029.diff

2 Files Affected:
diff --git a/clang/docs/tools/dump_ast_matchers.py b/clang/docs/tools/dump_ast_matchers.py
index 705ff0d4d4098..d47111819a1e2 100755
--- a/clang/docs/tools/dump_ast_matchers.py
+++ b/clang/docs/tools/dump_ast_matchers.py
@@ -86,11 +86,11 @@ def extract_result_types(comment):
parsed.
"""
result_types = []
- m = re.search(r"Usable as: Any Matcher[\s\n]*$", comment, re.S)
+ m = re.search("Usable as: Any Matcher[\\s\n]*$", comment, re.S)
if m:
return ["*"]
while True:
- m = re.match(r"^(.*)Matcher<([^>]+)>\s*,?[\s\n]*$", comment, re.S)
+ m = re.match("^(.*)Matcher<([^>]+)>\\s*,?[\\s\n]*$", comment, re.S)
if not m:
if re.search(r"Usable as:\s*$", comment):
return result_types
@@ -101,9 +101,9 @@ def extract_result_types(comment):
def strip_doxygen(comment):
- """Returns the given comment without \-escaped words."""
+ r"""Returns the given comment without \-escaped words."""
# If there is only a doxygen keyword in the line, delete the whole line.
- comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
+ comment = re.sub("^\\\\[^\\s]+\n", r"", comment, flags=re.M)
# If there is a doxygen \see command, change the \see prefix into "See also:".
# FIXME: it would be better to turn this into a link to the target instead.
@@ -236,7 +236,7 @@ def act_on_decl(declaration, comment, allowed_types):
# Parse the various matcher definition macros.
m = re.match(
- """.*AST_TYPE(LOC)?_TRAVERSE_MATCHER(?:_DECL)?\(
+ r""".*AST_TYPE(LOC)?_TRAVERSE_MATCHER(?:_DECL)?\(
\s*([^\s,]+\s*),
\s*(?:[^\s,]+\s*),
\s*AST_POLYMORPHIC_SUPPORTED_TYPES\(([^)]*)\)
diff --git a/clang/test/Analysis/check-analyzer-fixit.py b/clang/test/Analysis/check-analyzer-fixit.py
index b616255de89b0..43968f4b1b6e8 100644
--- a/clang/test/Analysis/check-analyzer-fixit.py
+++ b/clang/test/Analysis/check-analyzer-fixit.py
@@ -55,7 +55,7 @@ def run_test_once(args, extra_args):
# themselves. We need to keep the comments to preserve line numbers while
# avoiding empty lines which could potentially trigger formatting-related
# checks.
- cleaned_test = re.sub("// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
+ cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
write_file(temp_file_name, cleaned_test)
original_file_name = temp_file_name + ".orig"
(force-pushed from 1358b29 to 3d9fa04)
Hi, could you explain the motivation for this patch?
In Python, the presence or absence of the "r" prefix does not influence the type of the object represented by the string literal -- it only influences the contents of the string object. For example, a raw string literal can denote exactly the same string object as a regular string literal whose backslashes are escaped.

Raw strings are indeed frequently used for regular expressions, because a string that represents a regexp usually contains many backslashes and it is more comfortable to specify them as a raw string literal -- but there is no formal connection between the two. (Unlike in languages such as Perl or shell scripts, regular expressions in Python are implemented purely within the standard library; there is no special syntax for them.)
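To make the point above concrete, here is a minimal sketch (not taken from the PR; the literals and variable names are chosen purely for illustration) showing that the "r" prefix changes only how the literal is read, while the resulting object is an ordinary `str` that `re` treats like any other string:

```python
import re

raw = r"\s+"        # raw literal: the backslash is kept as written
escaped = "\\s+"    # regular literal: "\\" is the escape for one backslash

# Both literals produce a plain str with identical contents.
same_type = type(raw) is str and type(escaped) is str
same_contents = raw == escaped

# The prefix only affects how escapes inside the literal are interpreted.
newline_len = len("\n")       # one character: a line feed
raw_newline_len = len(r"\n")  # two characters: a backslash and "n"

# re.compile accepts any str; it cannot tell which literal form was used.
text = "a  b\tc"
matches_equal = re.compile(raw).findall(text) == re.compile(escaped).findall(text)
```

All four checks hold, which is why choosing between the two spellings is a matter of readability rather than behavior.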
The change in clang/test/Analysis/check-analyzer-fixit.py is a good step forward; it indeed fixes an invalid escape sequence [1].
However, I don't see any reason for the changes in clang/docs/tools/dump_ast_matchers.py: those were raw strings, so there cannot be "invalid escape sequences" within them. Please elaborate on why you want to apply these changes.
[1]: For readers unfamiliar with Python: the character combination \- does not have a special meaning in Python string literals, so in older Python versions these two characters were included in the string as-is. However, this behavior is deprecated, and in the future unrecognized escape sequences will cause a SyntaxError (in non-raw strings).
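One way to observe the deprecation described in the footnote is to compile a snippet of source text and record the warnings it triggers. The sketch below is an illustration only: the helper name `has_invalid_escape` and the sample snippets are hypothetical, and it assumes a Python version that reports the problem either as a warning or as a `SyntaxError` whose message contains "invalid escape sequence".

```python
import warnings


def has_invalid_escape(source: str) -> bool:
    """Return True if compiling `source` reports an invalid escape sequence."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        try:
            compile(source, "<snippet>", "exec")
        except SyntaxError as exc:
            # Stricter settings (or future versions) raise instead of warning.
            return "invalid escape sequence" in str(exc)
    return any("invalid escape sequence" in str(w.message) for w in caught)


# Source text of three one-line programs (written as raw strings so that the
# backslashes reach the compiler exactly as shown).
unescaped = r'pattern = "// *CHECK-[A-Z0-9\-]*:"'   # regular literal with \-
doubled = r'pattern = "// *CHECK-[A-Z0-9\\-]*:"'    # backslash doubled
raw_form = r'pattern = r"// *CHECK-[A-Z0-9\-]*:"'   # raw literal
```

Only the first snippet is flagged; the doubled-backslash and raw-string forms compile without complaint.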
@@ -55,7 +55,7 @@ def run_test_once(args, extra_args):
# themselves. We need to keep the comments to preserve line numbers while
# avoiding empty lines which could potentially trigger formatting-related
# checks.
- cleaned_test = re.sub("// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
+ cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
Suggested change:
- cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
+ cleaned_test = re.sub(r"// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
I would prefer switching to a raw string literal here -- it is functionally equivalent to your change, and specifying a regular expression as a raw string literal is more idiomatic.
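The equivalence claimed here can be checked with a short sketch. The sample input below is made up for illustration; only the two patterns come from the PR. Note that the two pattern strings are not character-for-character identical: in the regular literal `\r\n` become real CR and LF characters, while in the raw literal the `re` module itself interprets `\r` and `\n`. Both compile to a regex that matches the same text.

```python
import re

escaped = "// *CHECK-[A-Z0-9\\-]*:[^\r\n]*"   # form used in the PR
raw = r"// *CHECK-[A-Z0-9\-]*:[^\r\n]*"       # form suggested in review

# Hypothetical test input: one CHECK line and one ordinary comment.
sample = "int x; // CHECK-FIXES: int y;\nint z; // plain comment\n"

result_escaped = re.sub(escaped, "//", sample)
result_raw = re.sub(raw, "//", sample)

strings_differ = escaped != raw           # different characters in the pattern
same_behavior = result_escaped == result_raw
```

On this input both calls strip the CHECK directive down to a bare `//` and leave the ordinary comment untouched.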