fix(clang/**.py): fix invalid escape sequences #94029

e-kwsm · 2024-05-31T19:45:09Z

No description provided.

github-actions · 2024-05-31T19:45:25Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-05-31T19:45:54Z

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Eisuke Kawashima (e-kwsm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/94029.diff

2 Files Affected:

(modified) clang/docs/tools/dump_ast_matchers.py (+5-5)
(modified) clang/test/Analysis/check-analyzer-fixit.py (+1-1)

diff --git a/clang/docs/tools/dump_ast_matchers.py b/clang/docs/tools/dump_ast_matchers.py
index 705ff0d4d4098..d47111819a1e2 100755
--- a/clang/docs/tools/dump_ast_matchers.py
+++ b/clang/docs/tools/dump_ast_matchers.py
@@ -86,11 +86,11 @@ def extract_result_types(comment):
     parsed.
     """
     result_types = []
-    m = re.search(r"Usable as: Any Matcher[\s\n]*$", comment, re.S)
+    m = re.search("Usable as: Any Matcher[\\s\n]*$", comment, re.S)
     if m:
         return ["*"]
     while True:
-        m = re.match(r"^(.*)Matcher<([^>]+)>\s*,?[\s\n]*$", comment, re.S)
+        m = re.match("^(.*)Matcher<([^>]+)>\\s*,?[\\s\n]*$", comment, re.S)
         if not m:
             if re.search(r"Usable as:\s*$", comment):
                 return result_types
@@ -101,9 +101,9 @@ def extract_result_types(comment):
 
 
 def strip_doxygen(comment):
-    """Returns the given comment without \-escaped words."""
+    r"""Returns the given comment without \-escaped words."""
     # If there is only a doxygen keyword in the line, delete the whole line.
-    comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
+    comment = re.sub("^\\\\[^\\s]+\n", r"", comment, flags=re.M)
 
     # If there is a doxygen \see command, change the \see prefix into "See also:".
     # FIXME: it would be better to turn this into a link to the target instead.
@@ -236,7 +236,7 @@ def act_on_decl(declaration, comment, allowed_types):
 
         # Parse the various matcher definition macros.
         m = re.match(
-            """.*AST_TYPE(LOC)?_TRAVERSE_MATCHER(?:_DECL)?\(
+            r""".*AST_TYPE(LOC)?_TRAVERSE_MATCHER(?:_DECL)?\(
                        \s*([^\s,]+\s*),
                        \s*(?:[^\s,]+\s*),
                        \s*AST_POLYMORPHIC_SUPPORTED_TYPES\(([^)]*)\)
diff --git a/clang/test/Analysis/check-analyzer-fixit.py b/clang/test/Analysis/check-analyzer-fixit.py
index b616255de89b0..43968f4b1b6e8 100644
--- a/clang/test/Analysis/check-analyzer-fixit.py
+++ b/clang/test/Analysis/check-analyzer-fixit.py
@@ -55,7 +55,7 @@ def run_test_once(args, extra_args):
     # themselves.  We need to keep the comments to preserve line numbers while
     # avoiding empty lines which could potentially trigger formatting-related
     # checks.
-    cleaned_test = re.sub("// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
+    cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
     write_file(temp_file_name, cleaned_test)
 
     original_file_name = temp_file_name + ".orig"

steakhal · 2025-03-12T08:23:06Z

Hi, could you explain the motivation of this patch?
I'm not an expert on Python, but to me it seemed like the r"..." strings are supposed to be used for regular expressions, and in this change you appear to transform those strings into plain old strings.
Could you help me understand this?

NagyDonat · 2025-03-12T09:49:42Z

to me it seemed like the r"..." strings are supposed to be used for regular expressions, and in this change you appear to transform those strings into plain old strings. Could you help me understand this?

In Python the r string prefix stands for a raw string literal where the escape sequences are not interpreted (see relevant part of the language reference).

The presence or absence of the "r" prefix does not influence the type of the object represented by the string literal -- it only influences the contents of the string object. For example the raw string literal r"\n+" (three characters: backslash, letter "n", plus) is exactly equivalent to the plain old string literal "\\n+" (where the two backslashes are interpreted as an escape sequence that produces a single backslash). (Note that without the r the literal "\n+" consists of two characters: a newline and a plus sign.)

Raw strings are indeed frequently used for regular expressions, because a string that represents a regexp usually contains many backslashes and it's more comfortable to specify them as a raw string literal -- but there is no formal connection between them. (Unlike languages like Perl or shell scripts, regular expressions in Python are purely implemented within the standard library, there is no special syntax for them.)
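The explanation above can be checked with a minimal sketch using only the standard library. The variable names are illustrative and do not come from the patch.

```python
import re

raw = r"\n+"       # three characters: backslash, 'n', '+'
escaped = "\\n+"   # the same three characters, written with an escaped backslash
plain = "\n+"      # two characters: a newline and '+'

# The r prefix does not change the type of the object, only its contents.
same_type = type(raw) is type(plain) is str
raw_equals_escaped = raw == escaped
lengths = (len(raw), len(escaped), len(plain))

# The re module receives an ordinary str either way; here the regex escape \n
# matches newline characters.
matches_newlines = re.fullmatch(raw, "\n\n") is not None
```

Here `raw` and `escaped` compare equal, while `plain` is a different, shorter string.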

NagyDonat

The change in clang/test/Analysis/check-analyzer-fixit.py is a good step forward, it indeed fixes an invalid escape sequence[1].

However, I don't see any reason for the changes in clang/docs/tools/dump_ast_matchers.py: those were raw strings, so there cannot be "invalid escape sequences" within them. Please elaborate on the reason why you want to apply these changes.

[1]: For readers unfamiliar with Python: the character combination \- does not have special meaning in Python string literals, so in older Python versions these two characters were included in the string as-is. However, this behavior is deprecated, and in the future unrecognized escape sequences will cause a SyntaxError (in non-raw strings).
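As a rough illustration of the footnote, the sketch below compiles two one-line source snippets and records any warnings the compiler raises. The helper `escape_warnings` and the sample snippets are hypothetical, not part of the patch. The warning category depends on the interpreter version (DeprecationWarning before Python 3.12, SyntaxWarning from 3.12 on), so the sketch does not assume a particular one.

```python
import warnings

def escape_warnings(source):
    """Compile source text and return the categories of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<example>", "exec")
        except SyntaxError:
            # A future Python version may reject the literal outright.
            return [SyntaxError]
    return [w.category for w in caught]

# Source text with \- inside a plain (non-raw) string literal.
plain_source = 'pattern = "[A-Z0-9\\-]*"'
# The same source text with the r prefix added.
raw_source = 'pattern = r"[A-Z0-9\\-]*"'

plain_categories = escape_warnings(plain_source)
raw_categories = escape_warnings(raw_source)
```

Under these assumptions, only the plain literal produces a diagnostic; the raw literal compiles silently.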

NagyDonat · 2025-03-12T09:53:32Z

clang/test/Analysis/check-analyzer-fixit.py

@@ -55,7 +55,7 @@ def run_test_once(args, extra_args):
    # themselves.  We need to keep the comments to preserve line numbers while
    # avoiding empty lines which could potentially trigger formatting-related
    # checks.
-    cleaned_test = re.sub("// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
+    cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)


Suggested change

-    cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
+    cleaned_test = re.sub(r"// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)

I would prefer switching to a raw string literal here -- it's functionally equivalent to your change, and specifying a regular expression as a raw string literal is more idiomatic.
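A small sketch of what "functionally equivalent" means here, assuming a made-up `input_text`. The two literals are not identical strings, because `\r` and `\n` become real control characters in the plain literal but stay as backslash sequences in the raw one. The `re` module treats both spellings the same way inside the character class, so the substitution result is the same.

```python
import re

# Pattern as written in the patch: plain literal with an escaped backslash.
escaped_pattern = "// *CHECK-[A-Z0-9\\-]*:[^\r\n]*"
# Pattern as written in the suggestion: raw literal.
raw_pattern = r"// *CHECK-[A-Z0-9\-]*:[^\r\n]*"

# Hypothetical test-file content, used only for illustration.
input_text = "int x; // CHECK-FIXES: int x = 0;\nint y;\n"

cleaned_escaped = re.sub(escaped_pattern, "//", input_text)
cleaned_raw = re.sub(raw_pattern, "//", input_text)
```

Both calls strip the CHECK directive and keep the bare `//` comment marker, which preserves the line numbers of the test file.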

llvmbot added the clang Clang issues not falling into any other category label May 31, 2024

e-kwsm mentioned this pull request May 31, 2024

fix(python): fix invalid escape sequences #91856

Closed

e-kwsm force-pushed the clang/W605 branch from 3827289 to 5ad3e56 Compare June 23, 2024 12:22

e-kwsm force-pushed the clang/W605 branch 2 times, most recently from 1358b29 to 3d9fa04 Compare September 2, 2024 07:11

e-kwsm force-pushed the clang/W605 branch from 3d9fa04 to 8ebfa30 Compare November 20, 2024 03:08

e-kwsm force-pushed the clang/W605 branch from 8ebfa30 to 93cb19e Compare November 29, 2024 14:25

e-kwsm force-pushed the clang/W605 branch from 93cb19e to e6ea87d Compare December 10, 2024 22:52

e-kwsm force-pushed the clang/W605 branch from e6ea87d to 511dda8 Compare December 25, 2024 20:13

e-kwsm force-pushed the clang/W605 branch from 511dda8 to 6f595ed Compare January 13, 2025 14:21

fix(clang/**.py): fix invalid escape sequences

a125660

e-kwsm force-pushed the clang/W605 branch from 6f595ed to a125660 Compare March 12, 2025 04:39

llvmbot added the clang:static analyzer label Mar 12, 2025

NagyDonat reviewed Mar 12, 2025
