[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889

asb · 2025-03-05T14:52:21Z

This change removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.

I think this is a reasonable way to handle this case, but it's possible I'm missing an alternate approach. isNopCopy is attractively named, but it is focused on removing cases where a copy is made redundant by a previous copy, rather than the case covered here of a copy that is literally a no-op (src == dst).

…on leaves behind a no-op reg move

…ter forwarded uses This changes removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.

llvmbot · 2025-03-05T14:52:58Z

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-llvm-globalisel

Author: Alex Bradbury (asb)

Changes

This changes removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.

I think this is a reasonable way to handle this case, but it's possible I'm missing an alternate approach. isNopCopy is attractively named, but it is focused on removing cases where a copy is made redundant by a previous copy, rather than the case covered here of a copy that is literally a no-op (src == dst).

Full diff: https://github.com/llvm/llvm-project/pull/129889.diff

5 Files Affected:

(modified) llvm/lib/CodeGen/MachineCopyPropagation.cpp (+11)
(modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll (-1)
(modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll (-1)
(added) llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir (+74)
(modified) llvm/test/CodeGen/RISCV/sextw-removal.ll (-1)

diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 1105b8c15515f..9bc0f5836f33a 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -971,6 +971,17 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
 
     forwardUses(MI);
 
+    // It's possible that the previous transformation has resulted in a no-op
+    // register move (i.e. one where source and destination registers are the
+    // same). If so, delete it.
+    CopyOperands = isCopyInstr(MI, *TII, UseCopyInstr);
+    if (CopyOperands && CopyOperands->Source->getReg() == CopyOperands->Destination->getReg()) {
+      MI.eraseFromParent();
+      NumDeletes++;
+      Changed = true;
+      continue;
+    }
+
     // Not a copy.
     SmallVector<Register, 4> Defs;
     const MachineOperand *RegMask = nullptr;
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
index 2fcb911c2654a..309ebf71127c4 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
@@ -38,7 +38,6 @@ define void @constant_fold_barrier_i128(ptr %p) {
 ; RV32-NEXT:    seqz a7, a6
 ; RV32-NEXT:    and a1, a7, a1
 ; RV32-NEXT:    add a7, a4, zero
-; RV32-NEXT:    add a5, a5, zero
 ; RV32-NEXT:    sltu a4, a4, a4
 ; RV32-NEXT:    or a1, a3, a1
 ; RV32-NEXT:    add a7, a7, a1
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
index 2c3e3faddc391..8e4c0376ad0a5 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
@@ -26,7 +26,6 @@ define i128 @constant_fold_barrier_i128(i128 %x) {
 ; RV64-NEXT:    and a0, a0, a2
 ; RV64-NEXT:    add a0, a0, a2
 ; RV64-NEXT:    sltu a2, a0, a2
-; RV64-NEXT:    add a1, a1, zero
 ; RV64-NEXT:    add a1, a1, a2
 ; RV64-NEXT:    ret
 entry:
diff --git a/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir b/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir
new file mode 100644
index 0000000000000..d739537b50d05
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir
@@ -0,0 +1,74 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -o - %s -mtriple=riscv64 -run-pass=machine-cp -mcp-use-is-copy-instr | FileCheck %s
+
+## This test was added to capture a case where MachineCopyPropagation risks
+## leaving a no-op register move (add, x0, reg).
+
+---
+name: ham
+body: |
+  ; CHECK-LABEL: name: ham
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.3(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $x11 = ANDI killed renamable $x10, 1
+  ; CHECK-NEXT:   renamable $x10 = ADDI $x0, 1
+  ; CHECK-NEXT:   BEQ killed renamable $x11, $x0, %bb.3
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $x11 = ADDI $x0, 0
+  ; CHECK-NEXT:   BEQ renamable $x10, $x0, %bb.4
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   $x10 = ADDI $x0, 0
+  ; CHECK-NEXT:   PseudoRET implicit $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $x11 = ADDI $x0, 1
+  ; CHECK-NEXT:   renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+  ; CHECK-NEXT:   BNE renamable $x10, $x0, %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4:
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   PseudoRET implicit $x10
+  bb.0:
+    successors: %bb.1(0x40000000), %bb.2(0x40000000)
+    liveins: $x10
+
+    renamable $x11 = ANDI killed renamable $x10, 1
+    renamable $x10 = ADDI $x0, 1
+    BEQ killed renamable $x11, $x0, %bb.2
+
+  bb.1:
+    successors: %bb.4(0x40000000), %bb.5(0x40000000)
+    liveins: $x10
+
+    $x11 = ADDI $x0, 0
+    renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+    BEQ renamable $x10, $x0, %bb.4
+
+  bb.5:
+    $x10 = ADDI $x0, 0
+    PseudoRET implicit $x10
+
+  bb.2:
+    successors: %bb.4(0x40000000), %bb.5(0x40000000)
+    liveins: $x10
+
+    renamable $x11 = ADDI $x0, 1
+    renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+    BNE renamable $x10, $x0, %bb.5
+
+  bb.4:
+    liveins: $x10
+
+    PseudoRET implicit $x10
+...
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index e0a16aa05cd00..49494608eee4d 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1352,7 +1352,6 @@ define signext i32 @sextw_sh2add(i1 zeroext %0, ptr %1, i32 signext %2, i32 sign
 ; NOREMOVAL-LABEL: sextw_sh2add:
 ; NOREMOVAL:       # %bb.0:
 ; NOREMOVAL-NEXT:    sh2add a2, a2, a3
-; NOREMOVAL-NEXT:    mv a2, a2
 ; NOREMOVAL-NEXT:    beqz a0, .LBB22_2
 ; NOREMOVAL-NEXT:  # %bb.1:
 ; NOREMOVAL-NEXT:    sw a2, 0(a1)

github-actions · 2025-03-05T14:55:58Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/CodeGen/MachineCopyPropagation.cpp

preames

LGTM

…pagation-noop

…on leaves behind a no-op reg move Pre-commit for #129889.

…pagation-noop

mstorsjo · 2025-04-25T13:25:39Z

Hi Alex!

Sorry to say, but this commit seems to have introduced a miscompilation for aarch64 targets. When compiling Clang for aarch64-w64-mingw32 (possibly any aarch64 Windows, possibly any aarch64 target) from a Clang source version at 378ac57 or later, with a Clang version including this commit, then the resulting Clang binary would crash when compiling - but not always.

(I didn't notice this right away, as I don't have full continuous testing of Clang running on Windows on ARM, I only test some cross compiled applications like ffmpeg, in my local nightly testing. But as free github actions runners with Windows on ARM is available now, I should be able to test this continuously; that's how I noticed the issue, when testing out such setups.)

In any case, once pinpointed, the issue looks quite clear. The issue can be reproduced with https://martin.st/temp/DAGCombiner-preproc.cpp.

This input file, compiled like this:

$ clang -target aarch64-w64-mingw32 -O3 -std=c++17 -fno-exceptions -funwind-tables -fno-rtti DAGCombiner-preproc.cpp -S -o dagcombiner.s

After this commit, we can observe the following diff in the generated output:

--- dagcombiner-good.s  2025-04-25 16:18:25.317278754 +0300
+++ dagcombiner-bad.s   2025-04-25 16:16:10.153507344 +0300
@@ -87849,7 +87849,6 @@
        b.ne    .LBB167_141
 .LBB167_121:
        ldr     x8, [sp, #80]                   // 8-byte Folded Reload
-       mov     w29, w29
        ldr     x8, [x8, #48]
        ldr     q0, [x8, x29, lsl #4]
        umov    w8, v0.h[0]
@@ -97136,7 +97135,6 @@
        lsl     x10, x9, #3
        add     w9, w9, #1
        str     w9, [x20, #12]
-       mov     w9, w9
        str     x1, [x8, x10]
        mov     w1, #1                          // =0x1
        ldr     x8, [x20]
@@ -129458,7 +129456,6 @@
 // %bb.61:
        mov     w8, w27
        mov     w0, w20
-       mov     w29, w29
        str     x8, [sp, #40]                   // 8-byte Folded Spill
        bl      _ZN4llvm3ISD23getSetCCSwappedOperandsENS0_8CondCodeE
        mov     w27, w0
@@ -153275,7 +153272,6 @@
        lsl     x10, x9, #3
        add     w9, w9, #1
        str     w9, [x20, #12]
-       mov     w9, w9
        str     x1, [x8, x10]
        mov     w1, #1                          // =0x1
        ldr     x8, [x20]
@@ -202049,7 +202045,6 @@
        add     x8, x10, w8, uxth #1
        ldurh   w0, [x8, #-2]
        and     x8, x0, #0xffff
-       mov     x8, x8
        ands    x9, x8, #0xffff
        stp     x8, xzr, [sp, #224]
        b.ne    .LBB792_16
@@ -202959,7 +202954,6 @@
        add     x8, x10, w8, uxtw #1
        ldurh   w0, [x8, #-2]
        and     x8, x0, #0xffff
-       mov     x8, x8
        ands    x9, x8, #0xffff
        stp     x8, xzr, [sp, #48]
        b.ne    .LBB794_22

Removing instructions like mov x8, x8 should be fine to do. However, removing an instruction like mov w29, w29 may not be. On aarch64, w29 is the 32 bit lower half of the 64 bit register x29, and an instruction like mov w29, w29 will implicitly clear the upper 32 bits of the register. In the first context of the diff, we can see the following instruction ldr q0, [x8, x29, lsl #4]. And that instruction uses the whole 64 bit register x29 as offset/index, while the mov w29, w29 will have cleared the upper bits. (I think this should be possible to express as ldr q0, [x8, w29, uxtw #4] as well - but that's not what we have here anyway.)

In any case, that first removed mov w29, w29 is the culprit - as it now uses potentially uninitialized/undefined upper bits as offset. This explains why the built Clang now sometimes runs successfully and sometimes crashes.

Can we revert until we have this issue fixed properly? (Offhand I have no clue how to fix it.) As this has been in tree for over a month already, chances are that many things depend on it and can't be trivially reverted at this point.

CC @davemgreen @DavidSpickett for the aarch64 backend.

davemgreen · 2025-04-25T13:45:10Z

I believe it should look something like this, with an implicit def of x29 (which makes it not a "CopyInstr").

renamable $w29 = ORRWrs $wzr, killed renamable $w29, 0, implicit-def $x29

It seems to be OK in simpler cases https://godbolt.org/z/3n5zKxfjn. Do you know if it has the implicit-def, or if it doesn't why it is missing in this case?

mstorsjo · 2025-04-25T13:49:09Z

I believe it should look something like this, with an implicit def of x29 (which makes it not a "CopyInstr").
renamable $w29 = ORRWrs $wzr, killed renamable $w29, 0, implicit-def $x29
It seems to be OK in simpler cases https://godbolt.org/z/3n5zKxfjn. Do you know if it has the implicit-def, or if it doesn't why it is missing in this case?

Sorry, I'm not too familiar with those middle/backend lowering passes, so it would probably take me a good while to pinpoint exactly which instruction is being modified here - I hope someone more familiar and fluent with the tools can dig it up from the repro example above.

davemgreen · 2025-04-25T14:23:10Z

Ah - I was hoping you might have looked already, the reproducer is fairly large and you might have known the function to look at.

There is a smaller version from llvm-extract, of what I think is the right function in https://godbolt.org/z/xvdTzM5nP. It looks like block land.lhs.true252 is where w29 gets removed.

$w29 = ORRWrs $wzr, killed renamable $w29, 0, implicit $fp, implicit-def $fp

The problem appears to be that the registers are not in the order we thought they should be. We assume that - W0 + X0 will be the equivalent X reg, but that goes out-of-order for w29/FP. I will try to put together a patch, the fix should be relatively simple but I need to create a test.

mstorsjo · 2025-04-25T14:26:58Z

Ah - I was hoping you might have looked already, the reproducer is fairly large and you might have known the function to look at.

Nah - I have been hunting to reliably reproduce and pinpoint the bug for a week already; once I had an undisputable reproducer, I wanted to hand it over to someone else :-)

The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.

davemgreen · 2025-04-25T15:33:45Z

Thanks in either case for the report. Do you mind giving #137348 a run with your original problem case to see if it fixes all the issues you are seeing or if some remain?

mstorsjo · 2025-04-25T17:55:25Z

Thanks in either case for the report. Do you mind giving #137348 a run with your original problem case to see if it fixes all the issues you are seeing or if some remain?

Thanks for the very quick fix! This does seem to avoid the issue; a Clang built with a Clang with this patch, no longer crashes for the single reproducer input that I had. I've started a longer full build chain which will try to build a larger number of test projects with the produced Windows/arm64 compiler, I'll know by tomorrow if that fixed things fully or not. But the fix for this issue does look good, thanks! (And it would have taken me ages to isolate, dig in and figure out what's broken and how to fix it!)

The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on #129889.

mstorsjo · 2025-04-26T11:12:20Z

I've started a longer full build chain which will try to build a larger number of test projects with the produced Windows/arm64 compiler, I'll know by tomorrow if that fixed things fully or not.

The full test sequence passed flawlessly, so it seems like this is fully fixed with this patch (now merged), thanks!

The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.

asb added 2 commits March 5, 2025 14:36

[RISCV][test] Add test case showing case where machine copy propagati…

32d1aa1

…on leaves behind a no-op reg move

[MachineCopyPropagation] Recognise and delete no-op moves produced af…

03e255f

…ter forwarded uses This changes removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.

asb requested review from preames, topperc and wangpc-pp March 5, 2025 14:52

llvmbot added llvm:regalloc llvm:globalisel labels Mar 5, 2025

Formatting fix

d209c13

preames reviewed Mar 5, 2025

View reviewed changes

llvm/lib/CodeGen/MachineCopyPropagation.cpp Show resolved Hide resolved

llvm/lib/CodeGen/MachineCopyPropagation.cpp Show resolved Hide resolved

Don't delete apparent no-op if src==def is a reserved reg

c0498f8

preames reviewed Mar 7, 2025

View reviewed changes

preames approved these changes Mar 7, 2025

View reviewed changes

Merge remote-tracking branch 'origin/main' into 2025q1-machinecopypro…

a9ff08e

…pagation-noop

asb added a commit that referenced this pull request Mar 10, 2025

[RISCV][test] Add test case showing case where machine copy propagati…

7fb71d1

…on leaves behind a no-op reg move Pre-commit for #129889.

Merge remote-tracking branch 'origin/main' into 2025q1-machinecopypro…

85468c1

…pagation-noop

asb merged commit dffbc03 into llvm:main Mar 10, 2025
6 of 10 checks passed

shiltian mentioned this pull request Mar 10, 2025

[AMDGPU] Fix test failures when expensive checks are enabled #130644

Merged

asb mentioned this pull request Mar 28, 2025

[RISCV] Address instances of redundant or suboptimally canonicalised instructions in llvm-test-suite (tracking issue) #133449

Open

12 tasks

davemgreen mentioned this pull request Apr 25, 2025

[AArch64] Correctly detect X reg from w reg in isCopyImpl #137348

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889

[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889

Uh oh!

asb commented Mar 5, 2025 •

edited

Loading

Uh oh!

llvmbot commented Mar 5, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Mar 5, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

preames left a comment

Uh oh!

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

davemgreen commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

davemgreen commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

davemgreen commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 26, 2025

Uh oh!

Uh oh!

[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889

[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889

Uh oh!

Conversation

asb commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

preames left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

davemgreen commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

davemgreen commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

davemgreen commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 25, 2025

Uh oh!

mstorsjo commented Apr 26, 2025

Uh oh!

Uh oh!

asb commented Mar 5, 2025 •

edited

Loading

llvmbot commented Mar 5, 2025 •

edited

Loading

github-actions bot commented Mar 5, 2025 •

edited

Loading