
fix(clang/**.py): fix invalid escape sequences #94029


Open
wants to merge 1 commit into
base: main

e-kwsm (Contributor) commented May 31, 2024

No description provided.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In that case, you can instead tag reviewers by
name in a comment, using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "pinging" the PR, that is, by adding a comment that says "Ping". The common
courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot added the clang label (Clang issues not falling into any other category) May 31, 2024
llvmbot (Member) commented May 31, 2024

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Eisuke Kawashima (e-kwsm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/94029.diff

2 Files Affected:

  • (modified) clang/docs/tools/dump_ast_matchers.py (+5-5)
  • (modified) clang/test/Analysis/check-analyzer-fixit.py (+1-1)
diff --git a/clang/docs/tools/dump_ast_matchers.py b/clang/docs/tools/dump_ast_matchers.py
index 705ff0d4d4098..d47111819a1e2 100755
--- a/clang/docs/tools/dump_ast_matchers.py
+++ b/clang/docs/tools/dump_ast_matchers.py
@@ -86,11 +86,11 @@ def extract_result_types(comment):
     parsed.
     """
     result_types = []
-    m = re.search(r"Usable as: Any Matcher[\s\n]*$", comment, re.S)
+    m = re.search("Usable as: Any Matcher[\\s\n]*$", comment, re.S)
     if m:
         return ["*"]
     while True:
-        m = re.match(r"^(.*)Matcher<([^>]+)>\s*,?[\s\n]*$", comment, re.S)
+        m = re.match("^(.*)Matcher<([^>]+)>\\s*,?[\\s\n]*$", comment, re.S)
         if not m:
             if re.search(r"Usable as:\s*$", comment):
                 return result_types
@@ -101,9 +101,9 @@ def extract_result_types(comment):
 
 
 def strip_doxygen(comment):
-    """Returns the given comment without \-escaped words."""
+    r"""Returns the given comment without \-escaped words."""
     # If there is only a doxygen keyword in the line, delete the whole line.
-    comment = re.sub(r"^\\[^\s]+\n", r"", comment, flags=re.M)
+    comment = re.sub("^\\\\[^\\s]+\n", r"", comment, flags=re.M)
 
     # If there is a doxygen \see command, change the \see prefix into "See also:".
     # FIXME: it would be better to turn this into a link to the target instead.
@@ -236,7 +236,7 @@ def act_on_decl(declaration, comment, allowed_types):
 
         # Parse the various matcher definition macros.
         m = re.match(
-            """.*AST_TYPE(LOC)?_TRAVERSE_MATCHER(?:_DECL)?\(
+            r""".*AST_TYPE(LOC)?_TRAVERSE_MATCHER(?:_DECL)?\(
                        \s*([^\s,]+\s*),
                        \s*(?:[^\s,]+\s*),
                        \s*AST_POLYMORPHIC_SUPPORTED_TYPES\(([^)]*)\)
diff --git a/clang/test/Analysis/check-analyzer-fixit.py b/clang/test/Analysis/check-analyzer-fixit.py
index b616255de89b0..43968f4b1b6e8 100644
--- a/clang/test/Analysis/check-analyzer-fixit.py
+++ b/clang/test/Analysis/check-analyzer-fixit.py
@@ -55,7 +55,7 @@ def run_test_once(args, extra_args):
     # themselves.  We need to keep the comments to preserve line numbers while
     # avoiding empty lines which could potentially trigger formatting-related
     # checks.
-    cleaned_test = re.sub("// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
+    cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
     write_file(temp_file_name, cleaned_test)
 
     original_file_name = temp_file_name + ".orig"

steakhal (Contributor) commented

Hi, could you explain the motivation for this patch?
I'm not an expert on Python, but to me it seemed like the r"..." strings are supposed to be used for regular expressions, and in this change you appear to transform those strings into plain old strings.
Could you help me understand this?

NagyDonat (Contributor) commented Mar 12, 2025

> to me it seemed like the r"..." strings are supposed to be used for regular expressions, and in this change you appear to transform those strings into plain old strings. Could you help me understand this?

In Python, the r string prefix denotes a raw string literal, in which escape sequences are not interpreted (see the relevant part of the language reference).

The presence or absence of the "r" prefix does not influence the type of the object represented by the string literal -- it only influences the contents of the string object. For example the raw string literal r"\n+" (three characters: backslash, letter "n", plus) is exactly equivalent to the plain old string literal "\\n+" (where the two backslashes are interpreted as an escape sequence that produces a single backslash). (Note that without the r the literal "\n+" consists of two characters: a newline and a plus sign.)
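The equivalence described above can be checked directly. This is only a small sketch; the variable names are made up for illustration.

```python
# Three ways of writing a short string literal.
raw = r"\n+"        # raw literal: backslash, letter "n", plus
escaped = "\\n+"    # plain literal: "\\" is an escape that yields one backslash
newline = "\n+"     # plain literal: "\n" is an escape that yields a newline

# Same contents and same type; only the spelling of the literal differs.
assert raw == escaped
assert type(raw) is type(escaped) is str

assert len(raw) == 3       # backslash, "n", "+"
assert len(newline) == 2   # newline, "+"

print(list(raw))      # ['\\', 'n', '+']
print(list(newline))  # ['\n', '+']
```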

Raw strings are indeed frequently used for regular expressions, because a string that represents a regexp usually contains many backslashes and it's more comfortable to specify them as a raw string literal -- but there is no formal connection between them. (Unlike in Perl or shell scripts, regular expressions in Python are implemented purely within the standard library; there is no special syntax for them.)
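As a rough illustration that the re module only ever sees an ordinary str, the sketch below compiles the same pattern from a raw and a non-raw literal, then builds a pattern at run time. The `keyword` value and the sample text are hypothetical.

```python
import re

# A pattern is just a str; how the literal was spelled does not matter.
digits_raw = re.compile(r"\d+")
digits_plain = re.compile("\\d+")
assert digits_raw.pattern == digits_plain.pattern

# Because patterns are plain strings, they can also be assembled at run time.
keyword = "CHECK"  # hypothetical value, for illustration only
pattern = re.compile(re.escape(keyword) + r"-\w+")
match = pattern.search("// CHECK-FIXES: foo")
print(match.group(0))  # CHECK-FIXES
```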

NagyDonat (Contributor) left a comment

The change in clang/test/Analysis/check-analyzer-fixit.py is a good step forward: it does fix an invalid escape sequence[1].

However, I don't see any reason for the changes in clang/docs/tools/dump_ast_matchers.py: those were raw strings, so there cannot be "invalid escape sequences" within them. Please explain why you want to apply these changes.

[1]: For readers unfamiliar with Python: the character combination \- has no special meaning in Python string literals, so older Python versions included these two characters in the string unchanged. However, this behavior is deprecated, and in the future unrecognized escape sequences in non-raw strings will cause a SyntaxError.
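One way to see the footnote in action is to compile small snippets of source text and record any warnings. This is a sketch, not part of the patch: `has_invalid_escape` is a hypothetical helper, and the warning category is assumed to be DeprecationWarning or SyntaxWarning, depending on the Python version.

```python
import warnings

def has_invalid_escape(source):
    """Return True if compiling `source` reports an invalid escape sequence."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<demo>", "exec")
        except SyntaxError:
            # Possible future behavior: the escape is rejected outright.
            return True
    # Assumption: the compiler reports the problem as one of these categories.
    return any(
        issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        for w in caught
    )

# Each value below is Python source text that gets compiled.
plain_source = 'pattern = "[A-Z0-9\\-]*"'     # plain literal containing \-
raw_source = 'pattern = r"[A-Z0-9\\-]*"'      # raw literal containing \-
fixed_source = 'pattern = "[A-Z0-9\\\\-]*"'   # plain literal containing \\-

print(has_invalid_escape(plain_source))
print(has_invalid_escape(raw_source))
print(has_invalid_escape(fixed_source))
```

Only the plain literal with \- should be flagged; the raw literal and the doubled backslash both avoid the problem.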

@@ -55,7 +55,7 @@ def run_test_once(args, extra_args):
     # themselves.  We need to keep the comments to preserve line numbers while
     # avoiding empty lines which could potentially trigger formatting-related
     # checks.
-    cleaned_test = re.sub("// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)
+    cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)

Suggested change
-    cleaned_test = re.sub("// *CHECK-[A-Z0-9\\-]*:[^\r\n]*", "//", input_text)
+    cleaned_test = re.sub(r"// *CHECK-[A-Z0-9\-]*:[^\r\n]*", "//", input_text)

I would prefer switching to a raw string literal here -- it is functionally equivalent to your change, and specifying a regular expression as a raw string literal is more idiomatic.
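The claimed equivalence can be spot-checked as follows. This is a sketch rather than a proof, and the `sample` input is made up. The two pattern strings are not identical (the plain one holds real CR/LF characters where the raw one holds backslash escapes that the regex engine interprets), but they match the same text.

```python
import re

plain = "// *CHECK-[A-Z0-9\\-]*:[^\r\n]*"   # version from this PR
raw = r"// *CHECK-[A-Z0-9\-]*:[^\r\n]*"     # suggested raw-string version

# Different string contents, same matching behavior.
assert plain != raw

# Hypothetical test input mixing LF and CRLF line endings.
sample = "int x; // CHECK-FIXES: int x = 0;\nint y;\n//CHECK-NOT-X: foo\r\n"

cleaned_plain = re.sub(plain, "//", sample)
cleaned_raw = re.sub(raw, "//", sample)
assert cleaned_plain == cleaned_raw

print(repr(cleaned_raw))  # 'int x; //\nint y;\n//\r\n'
```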

Labels: clang:static analyzer, clang (Clang issues not falling into any other category)
4 participants