added a script to update llvm-mc test file #107246


Merged: 4 commits merged into llvm:main on Sep 23, 2024


@broxigarchen (Contributor) commented Sep 4, 2024

Added a script to update test files that are generated with the llvm-mc binary. The script accepts .s files (assembly) and .txt files (disassembly).

In the MC tests I am targeting there is no function name that can be used as a key, so there is no clear mapping between input and output. The script therefore assumes that the tests are always line-by-line, and it updates the check line for each test line individually.
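A minimal sketch of the line-by-line model, assuming a single CHECK prefix and a caller-supplied run_tool callback (a hypothetical stand-in for piping one line through llvm-mc). It is an illustration, not the PR's code, which also handles multiple RUN lines and prefix sharing.

```python
COMMENT = {"asm": "//", "dasm": "#"}  # comment marker per mode, as in the PR


def is_test_line(line, mode):
    """A line is a test unless it is blank or starts with the mode's comment marker."""
    stripped = line.strip(" \t\r")
    return stripped != "" and not stripped.startswith(COMMENT[mode])


def interleave_checks(lines, mode, run_tool, prefix="CHECK"):
    """Drop stale check lines and emit one fresh check line above each test line."""
    out = []
    for line in lines:
        if is_test_line(line, mode):
            out.append(f"{COMMENT[mode]} {prefix}: {run_tool(line)}")
            out.append(line)
        elif not line.lstrip().startswith(f"{COMMENT[mode]} {prefix}:"):
            out.append(line)  # keep RUN lines, other comments and blank lines
    return out
```

Because every test line is processed on its own, an old check line is simply discarded and regenerated next to the line it describes.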

@llvmbot (Member) commented Sep 4, 2024

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-testing-tools

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

Added a script to update test files that are generated with the llvm-mc binary. The script parses the test assembly and disassembly line-by-line, and emits check lines the same way as update_llc_test_checks.py.

The script currently accepts .s files for assembly and .txt files for disassembly. It assumes the test is always line-by-line and propagates the output accordingly.
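The extension-based mode selection and the check-line rendering can be sketched as follows. detect_mode and render_checks are hypothetical names; the output shape mirrors the .expected files in this PR, including the `# COM:` wrapper the script emits for disassembler diagnostics.

```python
import os

COMMENT = {"asm": "//", "dasm": "#"}


def detect_mode(path):
    """Map a test file to 'asm' (.s) or 'dasm' (.txt); None means the file is skipped."""
    ext = os.path.splitext(path)[1]
    return {".s": "asm", ".txt": "dasm"}.get(ext)


def render_checks(prefix, output, mode, is_error=False):
    """One check line per line of tool output; dasm diagnostics are wrapped in COM:."""
    marker = COMMENT[mode]
    if is_error and mode == "dasm":
        marker += " COM:"
    return [f"{marker} {prefix}: {line}" for line in output.splitlines()]
```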


Full diff: https://github.com/llvm/llvm-project/pull/107246.diff

7 Files Affected:

  • (added) llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s (+3)
  • (added) llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s.expected (+5)
  • (added) llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt (+5)
  • (added) llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt.expected (+8)
  • (added) llvm/test/tools/UpdateTestChecks/update_mc_test_checks/amdgpu-basic.test (+7)
  • (modified) llvm/utils/UpdateTestChecks/common.py (+1-1)
  • (added) llvm/utils/update_mc_test_check.py (+330)
diff --git a/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s
new file mode 100644
index 00000000000000..b21935e1d1a3ab
--- /dev/null
+++ b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s
@@ -0,0 +1,3 @@
+// RUN: llvm-mc -triple=amdgcn -show-encoding %s 2>&1 | FileCheck --check-prefixes=CHECK %s
+
+v_bfrev_b32 v5, v1
diff --git a/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s.expected b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s.expected
new file mode 100644
index 00000000000000..d29e1fc121e852
--- /dev/null
+++ b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_asm.s.expected
@@ -0,0 +1,5 @@
+; NOTE: Assertions have been autogenerated by utils/update_mc_test_check.py UTC_ARGS: --version 5
+// RUN: llvm-mc -triple=amdgcn -show-encoding %s 2>&1 | FileCheck --check-prefixes=CHECK %s
+
+// CHECK: v_bfrev_b32_e32 v5, v1                  ; encoding: [0x01,0x71,0x0a,0x7e]
+v_bfrev_b32 v5, v1
diff --git a/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt
new file mode 100644
index 00000000000000..9f5fba6e50df25
--- /dev/null
+++ b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt
@@ -0,0 +1,5 @@
+# RUN: llvm-mc -triple=amdgcn -mcpu=gfx1100 -disassemble -show-encoding %s 2>&1 | FileCheck -check-prefixes=CHECK %s
+
+0x00,0x00,0x00,0x7e
+
+0xfd,0xb8,0x0a,0x7f
diff --git a/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt.expected b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt.expected
new file mode 100644
index 00000000000000..896d5beb12d575
--- /dev/null
+++ b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/Inputs/amdgpu_dasm.txt.expected
@@ -0,0 +1,8 @@
+; NOTE: Assertions have been autogenerated by utils/update_mc_test_check.py UTC_ARGS: --version 5
+# RUN: llvm-mc -triple=amdgcn -mcpu=gfx1100 -disassemble -show-encoding %s 2>&1 | FileCheck -check-prefixes=CHECK %s
+
+# CHECK: v_nop                                   ; encoding: [0x00,0x00,0x00,0x7e]
+0x00,0x00,0x00,0x7e
+
+# COM: CHECK: warning: invalid instruction encoding
+0xfd,0xb8,0x0a,0x7f
diff --git a/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/amdgpu-basic.test b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/amdgpu-basic.test
new file mode 100644
index 00000000000000..a74e0ae4e76f95
--- /dev/null
+++ b/llvm/test/tools/UpdateTestChecks/update_mc_test_checks/amdgpu-basic.test
@@ -0,0 +1,7 @@
+# REQUIRES: amdgpu-registered-target
+## Check that basic asm/dasm process is correct
+
+# RUN: cp -f %S/Inputs/amdgpu_asm.s %t.s && %update_mc_test_checks %t.s
+# RUN: diff -u %S/Inputs/amdgpu_asm.s.expected %t.s
+# RUN: cp -f %S/Inputs/amdgpu_dasm.txt %t.txt && %update_mc_test_checks %t.txt
+# RUN: diff -u %S/Inputs/amdgpu_dasm.txt.expected %t.txt
diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index 9b9be69ee38448..b861bd010e2b25 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -573,7 +573,7 @@ def invoke_tool(exe, cmd_args, ir, preprocess_cmd=None, verbose=False):
 
 IR_FUNCTION_RE = re.compile(r'^\s*define\s+(?:internal\s+)?[^@]*@"?([\w.$-]+)"?\s*\(')
 TRIPLE_IR_RE = re.compile(r'^\s*target\s+triple\s*=\s*"([^"]+)"$')
-TRIPLE_ARG_RE = re.compile(r"-mtriple[= ]([^ ]+)")
+TRIPLE_ARG_RE = re.compile(r"-m?triple[= ]([^ ]+)")
 MARCH_ARG_RE = re.compile(r"-march[= ]([^ ]+)")
 DEBUG_ONLY_ARG_RE = re.compile(r"-debug-only[= ]([^ ]+)")
 
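llvm-mc spells the option -triple, while llc uses -mtriple, so the old pattern found no triple in llvm-mc RUN lines. A quick check of both patterns (the regexes are copied from the hunk above; triple_of is a hypothetical helper):

```python
import re

OLD_TRIPLE_ARG_RE = re.compile(r"-mtriple[= ]([^ ]+)")
NEW_TRIPLE_ARG_RE = re.compile(r"-m?triple[= ]([^ ]+)")


def triple_of(regex, cmd):
    """Return the triple captured from a RUN-line command, or None."""
    m = regex.search(cmd)
    return m.group(1) if m else None
```

Because the pattern is unanchored, it also accepts the double-dash spelling --triple=..., since -triple= is a substring of it.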
diff --git a/llvm/utils/update_mc_test_check.py b/llvm/utils/update_mc_test_check.py
new file mode 100755
index 00000000000000..ccaee25b3fa6ad
--- /dev/null
+++ b/llvm/utils/update_mc_test_check.py
@@ -0,0 +1,330 @@
+#!/usr/bin/env python3
+"""
+A test update script.  This script is a utility to update LLVM 'llvm-mc' based test cases with new FileCheck patterns.
+"""
+
+from __future__ import print_function
+
+import argparse
+import os  # Used to advertise this file's name ("autogenerated_note").
+
+from UpdateTestChecks import common
+
+import subprocess
+import re
+
+mc_LIKE_TOOLS = [
+    "llvm-mc",
+]
+
+ERROR_RE = re.compile(r"(warning|error): .*")
+ERROR_CHECK_RE = re.compile(r"# COM: .*")
+OUTPUT_SKIPPED_RE = re.compile(r"(.text)")
+COMMENT = {
+        "asm" : "//",
+        "dasm" : "#"
+        }
+
+
+def invoke_tool(exe, cmd_args, testline, verbose=False):
+    if isinstance(cmd_args, list):
+        args = [applySubstitutions(a, substitutions) for a in cmd_args]
+    else:
+        args = cmd_args
+
+    cmd = "echo \"" + testline + "\" | " + exe + " " + args
+    if verbose:
+        print("Command: ", cmd)
+    out = subprocess.check_output(cmd, shell=True)
+    # Fix line endings to unix CR style.
+    return out.decode().replace("\r\n", "\n")
+
+
+# create tests line-by-line, here we just filter out the check lines and comments
+# and treat all others as tests
+def isTestLine(input_line, mc_mode):
+    # Skip comment lines
+    if input_line.strip(' \t\r').startswith(COMMENT[mc_mode]):
+        return False
+    elif input_line.strip(' \t\r') == '':
+        return False
+    # skip any CHECK lines.
+    elif common.CHECK_RE.match(input_line):
+        return False
+    return True
+
+def hasErr(err):
+    if err is None or len(err) == 0:
+        return False
+    if ERROR_RE.search(err):
+        return True
+    return False
+
+def getErrString(err):
+    if err is None or len(err) == 0:
+        return ""
+
+    lines = err.split('\n')
+    # take the first match
+    for line in lines:
+        s = ERROR_RE.search(line)
+        if s:
+            return s.group(0)
+    return ""
+
+def getOutputString(out):
+    if out is None or len(out) == 0:
+        return ""
+    lines = out.split('\n')
+    output = ""
+
+    for line in lines:
+        if OUTPUT_SKIPPED_RE.search(line):
+            continue
+        if line.strip('\t ') == '':
+            continue
+        output += line.lstrip('\t ')
+    return output
+
+def should_add_line_to_output(input_line, prefix_set, mc_mode):
+    # special check line
+    if mc_mode == 'dasm' and ERROR_CHECK_RE.search(input_line):
+        return False
+    else:
+        return common.should_add_line_to_output(input_line, prefix_set, comment_marker=COMMENT[mc_mode])
+
+
+def getStdCheckLine(prefix, output, mc_mode):
+    lines = output.split('\n')
+    output = ""
+    for line in lines:
+        output += COMMENT[mc_mode] + ' ' + prefix + ": " + line + '\n'
+    return output
+
+def getErrCheckLine(prefix, output, mc_mode):
+    if mc_mode == 'asm':
+        return COMMENT[mc_mode] + ' ' + prefix + ": " + output + '\n'
+    elif mc_mode == 'dasm':
+        return COMMENT[mc_mode] + ' COM: ' + prefix + ": " + output + '\n'
+
+def main():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--mc-binary",
+        default=None,
+        help='The "mc" binary to use to generate the test case',
+    )
+    parser.add_argument(
+        "--tool",
+        default=None,
+        help="Treat the given tool name as an mc-like tool for which check lines should be generated",
+    )
+    parser.add_argument(
+        "--default-march",
+        default=None,
+        help="Set a default -march for when neither triple nor arch are found in a RUN line",
+    )
+    parser.add_argument("tests", nargs="+")
+    initial_args = common.parse_commandline_args(parser)
+
+    script_name = os.path.basename(__file__)
+
+    for ti in common.itertests(
+        initial_args.tests, parser, script_name="utils/" + script_name
+    ):
+        if ti.path.endswith('.s'):
+            mc_mode = "asm"
+        elif ti.path.endswith('.txt'):
+            mc_mode = "dasm"
+        else:
+            common.warn("Expected .s and .txt, Skipping file : ", ti.path)
+            continue
+
+        triple_in_ir = None
+        for l in ti.input_lines:
+            m = common.TRIPLE_IR_RE.match(l)
+            if m:
+                triple_in_ir = m.groups()[0]
+                break
+
+        run_list = []
+        for l in ti.run_lines:
+            if "|" not in l:
+                common.warn("Skipping unparsable RUN line: " + l)
+                continue
+
+            commands = [cmd.strip() for cmd in l.split("|")]
+            assert len(commands) >= 2
+            mc_cmd = " | ".join(commands[:-1])
+            filecheck_cmd = commands[-1]
+            mc_tool = mc_cmd.split(" ")[0]
+
+            triple_in_cmd = None
+            m = common.TRIPLE_ARG_RE.search(mc_cmd)
+            if m:
+                triple_in_cmd = m.groups()[0]
+
+            march_in_cmd = ti.args.default_march
+            m = common.MARCH_ARG_RE.search(mc_cmd)
+            if m:
+                march_in_cmd = m.groups()[0]
+
+            common.verify_filecheck_prefixes(filecheck_cmd)
+
+            mc_like_tools = mc_LIKE_TOOLS[:]
+            if ti.args.tool:
+                mc_like_tools.append(ti.args.tool)
+            if mc_tool not in mc_like_tools:
+                common.warn("Skipping non-mc RUN line: " + l)
+                continue
+
+            if not filecheck_cmd.startswith("FileCheck "):
+                common.warn("Skipping non-FileChecked RUN line: " + l)
+                continue
+
+            mc_cmd_args = mc_cmd[len(mc_tool) :].strip()
+            mc_cmd_args = mc_cmd_args.replace("< %s", "").replace("%s", "").strip()
+            check_prefixes = common.get_check_prefixes(filecheck_cmd)
+
+            run_list.append(
+                (
+                    check_prefixes,
+                    mc_tool,
+                    mc_cmd_args,
+                    triple_in_cmd,
+                    march_in_cmd,
+                )
+            )
+        
+
+        # find all test line from input
+        testlines = [l for l in ti.input_lines if isTestLine(l, mc_mode)]
+        run_list_size = len(run_list)
+        testnum = len(testlines)
+
+        raw_output = []
+        raw_prefixes = []
+        for (
+            prefixes,
+            mc_tool,
+            mc_args,
+            triple_in_cmd,
+            march_in_cmd,
+        ) in run_list:
+            common.debug("Extracted mc cmd:", mc_tool, mc_args)
+            common.debug("Extracted FileCheck prefixes:", str(prefixes))
+            common.debug("Extracted triple :", str(triple_in_cmd))
+            common.debug("Extracted march:", str(march_in_cmd))
+
+            triple = triple_in_cmd or triple_in_ir
+            if not triple:
+                triple = common.get_triple_from_march(march_in_cmd)
+
+            raw_output.append([])
+            for line in testlines:
+                # get output for each testline
+                out = invoke_tool(
+                    ti.args.mc_binary or mc_tool,
+                    mc_args,
+                    line,
+                    verbose=ti.args.verbose,
+                )
+                raw_output[-1].append(out)
+
+            common.debug("Collect raw tool lines:", str(len(raw_output[-1])))
+            
+            raw_prefixes.append(prefixes)
+
+        output_lines = []
+        generated_prefixes = []
+        used_prefixes = set()
+        prefix_set = set([prefix for p in run_list for prefix in p[0]])
+        common.debug("Rewriting FileCheck prefixes:", str(prefix_set))
+
+        for test_id in range(testnum):
+            input_line = testlines[test_id]
+
+            # a {prefix : output, [runid] } dict
+            # insert output to a prefix-key dict, and do a max sorting
+            # to select the most-used prefix which share the same output string
+            p_dict = {}
+            for run_id in range(run_list_size):
+                out = raw_output[run_id][test_id]
+
+                if hasErr(out):
+                    o = getErrString(out)
+                else:
+                    o = getOutputString(out)
+                
+                prefixes = raw_prefixes[run_id]
+
+                for p in prefixes:
+                    if p not in p_dict:
+                        p_dict[p] = o, [run_id]
+                    else:
+                        if p_dict[p] == (None, []):
+                            continue
+
+                        prev_o, run_ids = p_dict[p]
+                        if o == prev_o:
+                            run_ids.append(run_id)
+                            p_dict[p] = o, run_ids
+                        else:
+                            # conflict, discard
+                            p_dict[p] = None, []
+
+            p_dict_sorted = dict(sorted(p_dict.items(), key=lambda item: -len(item[1][1])))
+
+            # prefix is selected and generated with most shared output lines
+            # each run_id can only be used once
+            gen_prefix = ""
+            used_runid = set()
+            for prefix, tup in p_dict_sorted.items():
+                o, run_ids = tup
+
+                if len(run_ids) == 0:
+                    continue
+
+                skip = False
+                for i in run_ids:
+                    if i in used_runid:
+                        skip = True
+                    else:
+                        used_runid.add(i)
+                if not skip:
+                    used_prefixes.add(prefix)
+
+                    if hasErr(o):
+                        gen_prefix += getErrCheckLine(prefix, o, mc_mode)
+                    else:
+                        gen_prefix += getStdCheckLine(prefix, o, mc_mode)
+
+            generated_prefixes.append(gen_prefix.rstrip('\n'))
+
+        # write output
+        prefix_id = 0
+        for input_info in ti.iterlines(output_lines):
+            input_line = input_info.line
+            if isTestLine(input_line, mc_mode):
+                output_lines.append(generated_prefixes[prefix_id])
+                output_lines.append(input_line)
+                prefix_id += 1
+
+            elif should_add_line_to_output(input_line, prefix_set, mc_mode):
+                output_lines.append(input_line)
+
+            elif input_line in ti.run_lines or input_line == "":
+                output_lines.append(input_line)
+
+        if ti.args.gen_unused_prefix_body:
+            output_lines.extend(
+                ti.get_checks_for_unused_prefixes(run_list, used_prefixes)
+            )
+
+        common.debug("Writing %d lines to %s..." % (len(output_lines), ti.path))
+        with open(ti.path, "wb") as f:
+            f.writelines(["{}\n".format(l).encode("utf-8") for l in output_lines])
+
+
+if __name__ == "__main__":
+    main()
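The per-test-line prefix selection in main() (the p_dict / used_runid loop) can be restated as a standalone function. This is a simplified sketch, not the merged code. For each test line, a prefix keeps an output only if every RUN line using it produced the same string; prefixes shared by the most RUN lines are emitted first, and each RUN line is covered at most once. One difference: here a prefix whose RUN lines are partly covered is skipped without marking anything, whereas the loop in the diff adds the uncovered run ids to used_runid even when it then skips the prefix.

```python
def select_prefixes(run_prefixes, outputs):
    """
    run_prefixes: one list of FileCheck prefixes per RUN line.
    outputs: the tool output for one test line, one string per RUN line.
    Returns [(prefix, output)] to emit, most widely shared prefixes first.
    """
    groups = {}  # prefix -> (output, [run ids]); output None marks a conflict
    for run_id, (prefixes, out) in enumerate(zip(run_prefixes, outputs)):
        for p in prefixes:
            if p not in groups:
                groups[p] = (out, [run_id])
            elif groups[p][0] == out:
                groups[p][1].append(run_id)
            else:
                groups[p] = (None, [])  # RUN lines disagree: prefix unusable

    chosen, covered = [], set()
    # sorted() is stable, so ties keep first-seen (RUN-line) order.
    for p, (out, run_ids) in sorted(groups.items(), key=lambda kv: -len(kv[1][1])):
        if run_ids and covered.isdisjoint(run_ids):
            covered.update(run_ids)
            chosen.append((p, out))
    return chosen
```

With RUN lines prefixed [CHECK, GFX11] and [CHECK, GFX12], identical outputs collapse to a single CHECK line, while differing outputs fall back to GFX11 and GFX12.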

@broxigarchen (Contributor Author) commented:

Hi reviewers, I wrote this script while working on AMDGPU development. I'm not sure how useful it is, but I'm posting it for review to collect feedback.

github-actions bot commented Sep 4, 2024

✅ With the latest revision this PR passed the Python code formatter.

@kosarev (Collaborator) left a comment

Nice idea.

Does this work for combined asm/disasm tests, e.g., gfx12_asm_vop1.s? Does this work for other backends' asm/disasm tests?

@kosarev kosarev requested review from rampitec and arsenm September 4, 2024 17:17
@arichardson (Member) left a comment

This looks really useful! I wonder if we could also make it work for error message checking?

E.g. generate the following check lines
CHECK-ERR: [[#@LINE]]:<col>: error: ...

if the RUN: line contains something like not llvm-mc .... 2>&1 | FileCheck?
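A sketch of how a diagnostic could become a line-relative check, assuming llvm-mc reports errors read from stdin as `<stdin>:LINE:COL: error: message` (an assumption about the diagnostic format, not something this PR states). diag_to_check is a hypothetical helper. Since each test line is piped in on its own, the reported line is always 1, so it is replaced by an [[@LINE+offset]] expression; in the real test run llvm-mc reads %s as a file, and the leading ":" lets the pattern match `path:LINE:COL:` there.

```python
import re

# Assumed shape of an llvm-mc diagnostic when the input is read from stdin, e.g.
# "<stdin>:1:17: error: register index is out of range".
DIAG_RE = re.compile(r"^<stdin>:(\d+):(\d+): (error|warning): (.*)$")


def diag_to_check(diag, prefix="CHECK", comment="//", line_offset=-1):
    """Turn one diagnostic into a FileCheck line that matches it by relative line number.

    The default offset of -1 assumes the check line is written directly below the
    test line; the column is kept unchanged.
    """
    m = DIAG_RE.match(diag)
    if m is None:
        return None  # not a diagnostic (e.g. a normal encoding line)
    _line, col, kind, msg = m.groups()
    at = "[[@LINE]]" if line_offset == 0 else f"[[@LINE{line_offset:+d}]]"
    return f"{comment} {prefix}: :{at}:{col}: {kind}: {msg}"
```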

@broxigarchen (Contributor Author) commented Sep 4, 2024

> Nice idea.
>
> Does this work for combined asm/disasm tests, e.g., gfx12_asm_vop1.s? Does this work for other backends' asm/disasm tests?

I tried gfx12_asm_vop1.s, but the script does not seem to understand the %extract_encoding% substitution in the RUN line. Handling these special tokens would require some extra parsing.

Regarding other backends, from a quick look some tests use .section as a test separator, so there are probably many cases that are not covered.

--

I took a look, and it seems this would require parsing the lit.local.cfg file. The test infrastructure does not seem to provide anything for this yet. For now, replacing %extract_encoding% with the actual command it stands for should work.
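A minimal sketch of that workaround, assuming the caller supplies the expansion table by hand (nothing here reads lit.local.cfg). expand_substitutions is a hypothetical helper, and the value used for %extract_encoding% in the example is a placeholder, not the real expansion. Relatedly, the diff's invoke_tool calls applySubstitutions(a, substitutions), neither of which is defined in the file; that branch is only reached when cmd_args is a list, which main() never passes.

```python
def expand_substitutions(cmd, substitutions):
    """Apply lit-style literal substitutions to a RUN-line command string."""
    # Longest keys first, so a short key such as "%s" cannot clobber part of a longer one.
    for key in sorted(substitutions, key=len, reverse=True):
        cmd = cmd.replace(key, substitutions[key])
    return cmd
```

Usage, with a placeholder filter command standing in for whatever %extract_encoding% really expands to:

    expand_substitutions(
        "llvm-mc -triple=amdgcn -show-encoding %s | %extract_encoding% | FileCheck %s",
        {"%extract_encoding%": "my-filter", "%s": "test.s"},
    )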

@broxigarchen (Contributor Author) commented:

> This looks really useful! I wonder if we could also make it work for error message checking?
>
> E.g. generate the following check lines CHECK-ERR: [[#@LINE]]:<col>: error: ...
>
> if the RUN: line contains something like not llvm-mc .... 2>&1 | FileCheck?

I was looking at this. It's currently not supported, because the leading not breaks the check on the tool name. I'll see if I can get this to work.
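One way to support not, sketched with hypothetical helpers (not the author's follow-up commit). In the diff the tool name is mc_cmd.split(" ")[0], which is "not" for such RUN lines, so they fail the mc-like tool check; and subprocess.check_output raises CalledProcessError when the tool exits non-zero, which is exactly what not expects. The sketch also passes the test line on stdin instead of building an echo "<line>" | tool shell pipeline, so quotes or $ in an instruction are not interpreted by the shell.

```python
import shlex
import subprocess


def split_mc_command(mc_cmd):
    """Split 'not llvm-mc -triple=amdgcn ...' into (expect_failure, tool, args)."""
    argv = shlex.split(mc_cmd)
    expect_failure = bool(argv) and argv[0] == "not"
    if expect_failure:
        argv = argv[1:]
    if not argv:
        raise ValueError("no tool in command: %r" % mc_cmd)
    return expect_failure, argv[0], argv[1:]


def run_on_line(argv, test_line):
    """Feed one test line on stdin; return stdout+stderr (like 2>&1) without raising on failure."""
    proc = subprocess.run(
        argv,
        input=test_line + "\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge diagnostics into the captured output
        text=True,  # decode output; also normalizes \r\n to \n
    )
    return proc.stdout
```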

@llvmbot llvmbot added the mc Machine (object) code label Sep 9, 2024
@broxigarchen (Contributor Author) commented:

>> This looks really useful! I wonder if we could also make it work for error message checking?
>> E.g. generate the following check lines CHECK-ERR: [[#@LINE]]:<col>: error: ...
>> if the RUN: line contains something like not llvm-mc .... 2>&1 | FileCheck?
>
> I was looking at this. It's currently not supported, because the leading not breaks the check on the tool name. I'll see if I can get this to work.

Added.

@kosarev (Collaborator) left a comment

LVGTM. Please also wait for Alexander's approval.

// RUN: not llvm-mc -triple=amdgcn -show-encoding %s 2>&1 | FileCheck --check-prefixes=CHECK %s

v_bfrev_b32 v5, v299
// CHECK: error: register index is out of range
Collaborator (inline comment):

Daydreaming: I guess this could even do the [[@LINE-1]] thing?

Contributor Author (reply):

I think this can work. Let me try it.

Contributor Author (reply):

Added.

@broxigarchen (Contributor Author) commented Sep 10, 2024

After some investigation on Windows, I realized that all lit tests for the update scripts are effectively disabled there (either the binary check fails or the platform is marked unsupported). On Windows, llc is named llc.exe, so the os.path.isfile(llc) check returns false and the tests do not run.

I disabled the Windows platform test for this script as well.
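A platform-aware existence check could look like the sketch below. tool_candidates and find_tool are hypothetical helpers, not the lit configuration LLVM actually uses; the platform parameter only exists to make the behavior testable. The standard library's shutil.which is an alternative, since it consults PATHEXT on Windows.

```python
import os
import sys


def tool_candidates(name, platform=sys.platform):
    """File names to probe for a tool; Windows builds add an .exe suffix."""
    return [name, name + ".exe"] if platform == "win32" else [name]


def find_tool(tool_dir, name, platform=sys.platform):
    """Return the first existing candidate path in tool_dir, or None."""
    for candidate in tool_candidates(name, platform):
        path = os.path.join(tool_dir, candidate)
        if os.path.isfile(path):
            return path
    return None
```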

@arichardson (Member) left a comment

A few comments inline; looking forward to being able to use this script.

@broxigarchen (Contributor Author) commented:

Quick ping! This PR should be ready to land.

@arichardson (Member) left a comment

Thanks, this LGTM.

A few minor simplification suggestions, if you think they make it better.

simplify the code

Co-authored-by: Alexander Richardson <[email protected]>
@broxigarchen broxigarchen merged commit 2b892b0 into llvm:main Sep 23, 2024
8 checks passed