Skip to content

Add PR check to suggest alternatives to using undef #118506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 11, 2024
109 changes: 108 additions & 1 deletion llvm/utils/git/code-format-helper.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@

import argparse
import os
import re
import shlex
import subprocess
import sys
from typing import List, Optional
Expand Down Expand Up @@ -312,7 +314,112 @@ def format_run(self, changed_files: List[str], args: FormatArgs) -> Optional[str
return None


ALL_FORMATTERS = (DarkerFormatHelper(), ClangFormatHelper())
class UndefGetFormatHelper(FormatHelper):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little strange this being in code-format-helper.py but that's perhaps more the naming of the file than this being in the wrong place. If we think of this as "stuff that runs on the changed lines" then it's not so weird. Plus you get for free the feature that updates the comment once the PR author fixes it, duplicating that would be annoying.

I would at least override pr_comment_text_for_diff because the explanation from the tool is not a diff. Also if you do that, you can present it as Markdown with code blocks so it's nicer to read.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I had the same thought as @DavidSpickett at first. I always thought of code-format-helper as something that's just checking formatting issues with clang format and friends. But I guess it makes sense to have other checks without having to create a separate github flow for it.

But I think we need to change the name of the script and the integration name in that case to really make it a bit more generic. Otherwise I think it might cause confusion.

What do you think @boomanaiden154 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it in here seems like the best place. It allows taking advantage of the existing infrastructure. I agree that we should probably change the name to reflect this later.

Perfectly fine to leave that to a follow up patch in my opinion though. There are probably some other things that can get cleaned up at the same time.

name = "undef deprecator"
friendly_name = "undef deprecator"

@property
def instructions(self) -> str:
return " ".join(shlex.quote(c) for c in self.cmd)

def filter_changed_files(self, changed_files: List[str]) -> List[str]:
filtered_files = []
for path in changed_files:
_, ext = os.path.splitext(path)
if ext in (".cpp", ".c", ".h", ".hpp", ".hxx", ".cxx", ".inc", ".cppm", ".ll"):
filtered_files.append(path)
return filtered_files

def has_tool(self) -> bool:
return True

def pr_comment_text_for_diff(self, diff: str) -> str:
return f"""
:warning: {self.name} found issues in your code. :warning:

<details>
<summary>
You can test this locally with the following command:
</summary>

``````````bash
{self.instructions}
``````````

</details>

{diff}
"""

def format_run(self, changed_files: List[str], args: FormatArgs) -> Optional[str]:
files = self.filter_changed_files(changed_files)

# Use git to find files that have had a change in the number of undefs
regex = "([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)"
cmd = ["git", "diff", "-U0", "--pickaxe-regex", "-S", regex]

if args.start_rev and args.end_rev:
cmd.append(args.start_rev)
cmd.append(args.end_rev)

cmd += files
self.cmd = cmd

if args.verbose:
print(f"Running: {self.instructions}")

proc = subprocess.run(
cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding="utf-8"
)
sys.stdout.write(proc.stderr)
stdout = proc.stdout

files = []
# Split the diff so we have one array entry per file.
# Each file is prefixed like:
# diff --git a/file b/file
for file in re.split("^diff --git ", stdout, 0, re.MULTILINE):
# search for additions of undef
if re.search("^[+].*" + regex, file, re.MULTILINE):
files.append(re.match("a/([^ ]+)", file.splitlines()[0])[1])

if not files:
return None

files = "\n".join(" - " + f for f in files)
report = f"""
The following files introduce new uses of undef:
{files}

[Undef](https://llvm.org/docs/LangRef.html#undefined-values) is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields `undef`. You should use `poison` values for placeholders instead.

In tests, avoid using `undef` and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:
```llvm
define void @fn() {{
...
br i1 undef, ...
}}
```

Please use the following instead:
```llvm
define void @fn(i1 %cond) {{
...
br i1 %cond, ...
}}
```

Please refer to the [Undefined Behavior Manual](https://llvm.org/docs/UndefinedBehavior.html) for more information.
"""
if args.verbose:
print(f"error: {self.name} failed")
print(report)
return report


ALL_FORMATTERS = (DarkerFormatHelper(), ClangFormatHelper(), UndefGetFormatHelper())


def hook_main():
Expand Down
Loading