Skip to content

[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 10, 2025

Conversation

asb
Copy link
Contributor

@asb asb commented Mar 5, 2025

This change removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.


I think this is a reasonable way to handle this case, but it's possible I'm missing an alternate approach. isNopCopy is attractively named, but it is focused on removing cases where a copy is made redundant by a previous copy, rather than the case covered here of a copy that is literally a no-op (src == dst).

asb added 2 commits March 5, 2025 14:36
…ter forwarded uses

This changes removes 189 static instances of no-op reg-reg moves (i.e.
where src == dest) across llvm-test-suite when compiled for RISC-V
rv64gc and with SPEC included.
@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2025

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-llvm-globalisel

Author: Alex Bradbury (asb)

Changes

This changes removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.


I think this is a reasonable way to handle this case, but it's possible I'm missing an alternate approach. isNopCopy is attractively named, but it is focused on removing cases where a copy is made redundant by a previous copy, rather than the case covered here of a copy that is literally a no-op (src == dst).


Full diff: https://github.com/llvm/llvm-project/pull/129889.diff

5 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineCopyPropagation.cpp (+11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll (-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll (-1)
  • (added) llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir (+74)
  • (modified) llvm/test/CodeGen/RISCV/sextw-removal.ll (-1)
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 1105b8c15515f..9bc0f5836f33a 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -971,6 +971,17 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
 
     forwardUses(MI);
 
+    // It's possible that the previous transformation has resulted in a no-op
+    // register move (i.e. one where source and destination registers are the
+    // same). If so, delete it.
+    CopyOperands = isCopyInstr(MI, *TII, UseCopyInstr);
+    if (CopyOperands && CopyOperands->Source->getReg() == CopyOperands->Destination->getReg()) {
+      MI.eraseFromParent();
+      NumDeletes++;
+      Changed = true;
+      continue;
+    }
+
     // Not a copy.
     SmallVector<Register, 4> Defs;
     const MachineOperand *RegMask = nullptr;
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
index 2fcb911c2654a..309ebf71127c4 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
@@ -38,7 +38,6 @@ define void @constant_fold_barrier_i128(ptr %p) {
 ; RV32-NEXT:    seqz a7, a6
 ; RV32-NEXT:    and a1, a7, a1
 ; RV32-NEXT:    add a7, a4, zero
-; RV32-NEXT:    add a5, a5, zero
 ; RV32-NEXT:    sltu a4, a4, a4
 ; RV32-NEXT:    or a1, a3, a1
 ; RV32-NEXT:    add a7, a7, a1
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
index 2c3e3faddc391..8e4c0376ad0a5 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
@@ -26,7 +26,6 @@ define i128 @constant_fold_barrier_i128(i128 %x) {
 ; RV64-NEXT:    and a0, a0, a2
 ; RV64-NEXT:    add a0, a0, a2
 ; RV64-NEXT:    sltu a2, a0, a2
-; RV64-NEXT:    add a1, a1, zero
 ; RV64-NEXT:    add a1, a1, a2
 ; RV64-NEXT:    ret
 entry:
diff --git a/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir b/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir
new file mode 100644
index 0000000000000..d739537b50d05
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir
@@ -0,0 +1,74 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -o - %s -mtriple=riscv64 -run-pass=machine-cp -mcp-use-is-copy-instr | FileCheck %s
+
+## This test was added to capture a case where MachineCopyPropagation risks
+## leaving a no-op register move (add, x0, reg).
+
+---
+name: ham
+body: |
+  ; CHECK-LABEL: name: ham
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.3(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $x11 = ANDI killed renamable $x10, 1
+  ; CHECK-NEXT:   renamable $x10 = ADDI $x0, 1
+  ; CHECK-NEXT:   BEQ killed renamable $x11, $x0, %bb.3
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $x11 = ADDI $x0, 0
+  ; CHECK-NEXT:   BEQ renamable $x10, $x0, %bb.4
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   $x10 = ADDI $x0, 0
+  ; CHECK-NEXT:   PseudoRET implicit $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   renamable $x11 = ADDI $x0, 1
+  ; CHECK-NEXT:   renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+  ; CHECK-NEXT:   BNE renamable $x10, $x0, %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4:
+  ; CHECK-NEXT:   liveins: $x10
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   PseudoRET implicit $x10
+  bb.0:
+    successors: %bb.1(0x40000000), %bb.2(0x40000000)
+    liveins: $x10
+
+    renamable $x11 = ANDI killed renamable $x10, 1
+    renamable $x10 = ADDI $x0, 1
+    BEQ killed renamable $x11, $x0, %bb.2
+
+  bb.1:
+    successors: %bb.4(0x40000000), %bb.5(0x40000000)
+    liveins: $x10
+
+    $x11 = ADDI $x0, 0
+    renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+    BEQ renamable $x10, $x0, %bb.4
+
+  bb.5:
+    $x10 = ADDI $x0, 0
+    PseudoRET implicit $x10
+
+  bb.2:
+    successors: %bb.4(0x40000000), %bb.5(0x40000000)
+    liveins: $x10
+
+    renamable $x11 = ADDI $x0, 1
+    renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+    BNE renamable $x10, $x0, %bb.5
+
+  bb.4:
+    liveins: $x10
+
+    PseudoRET implicit $x10
+...
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index e0a16aa05cd00..49494608eee4d 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1352,7 +1352,6 @@ define signext i32 @sextw_sh2add(i1 zeroext %0, ptr %1, i32 signext %2, i32 sign
 ; NOREMOVAL-LABEL: sextw_sh2add:
 ; NOREMOVAL:       # %bb.0:
 ; NOREMOVAL-NEXT:    sh2add a2, a2, a3
-; NOREMOVAL-NEXT:    mv a2, a2
 ; NOREMOVAL-NEXT:    beqz a0, .LBB22_2
 ; NOREMOVAL-NEXT:  # %bb.1:
 ; NOREMOVAL-NEXT:    sw a2, 0(a1)

Copy link

github-actions bot commented Mar 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

asb added a commit that referenced this pull request Mar 10, 2025
…on leaves behind a no-op reg move

Pre-commit for #129889.
@asb asb merged commit dffbc03 into llvm:main Mar 10, 2025
6 of 10 checks passed
@mstorsjo
Copy link
Member

Hi Alex!

Sorry to say, but this commit seems to have introduced a miscompilation for aarch64 targets. When compiling Clang for aarch64-w64-mingw32 (possibly any aarch64 Windows, possibly any aarch64 target) from a Clang source version at 378ac57 or later, with a Clang version including this commit, then the resulting Clang binary would crash when compiling - but not always.

(I didn't notice this right away, as I don't have full continuous testing of Clang running on Windows on ARM, I only test some cross compiled applications like ffmpeg, in my local nightly testing. But as free github actions runners with Windows on ARM is available now, I should be able to test this continuously; that's how I noticed the issue, when testing out such setups.)

In any case, once pinpointed, the issue looks quite clear. The issue can be reproduced with https://martin.st/temp/DAGCombiner-preproc.cpp.

This input file, compiled like this:

$ clang -target aarch64-w64-mingw32 -O3 -std=c++17 -fno-exceptions -funwind-tables -fno-rtti DAGCombiner-preproc.cpp -S -o dagcombiner.s

After this commit, we can observe the following diff in the generated output:

--- dagcombiner-good.s  2025-04-25 16:18:25.317278754 +0300
+++ dagcombiner-bad.s   2025-04-25 16:16:10.153507344 +0300
@@ -87849,7 +87849,6 @@
        b.ne    .LBB167_141
 .LBB167_121:
        ldr     x8, [sp, #80]                   // 8-byte Folded Reload
-       mov     w29, w29
        ldr     x8, [x8, #48]
        ldr     q0, [x8, x29, lsl #4]
        umov    w8, v0.h[0]
@@ -97136,7 +97135,6 @@
        lsl     x10, x9, #3
        add     w9, w9, #1
        str     w9, [x20, #12]
-       mov     w9, w9
        str     x1, [x8, x10]
        mov     w1, #1                          // =0x1
        ldr     x8, [x20]
@@ -129458,7 +129456,6 @@
 // %bb.61:
        mov     w8, w27
        mov     w0, w20
-       mov     w29, w29
        str     x8, [sp, #40]                   // 8-byte Folded Spill
        bl      _ZN4llvm3ISD23getSetCCSwappedOperandsENS0_8CondCodeE
        mov     w27, w0
@@ -153275,7 +153272,6 @@
        lsl     x10, x9, #3
        add     w9, w9, #1
        str     w9, [x20, #12]
-       mov     w9, w9
        str     x1, [x8, x10]
        mov     w1, #1                          // =0x1
        ldr     x8, [x20]
@@ -202049,7 +202045,6 @@
        add     x8, x10, w8, uxth #1
        ldurh   w0, [x8, #-2]
        and     x8, x0, #0xffff
-       mov     x8, x8
        ands    x9, x8, #0xffff
        stp     x8, xzr, [sp, #224]
        b.ne    .LBB792_16
@@ -202959,7 +202954,6 @@
        add     x8, x10, w8, uxtw #1
        ldurh   w0, [x8, #-2]
        and     x8, x0, #0xffff
-       mov     x8, x8
        ands    x9, x8, #0xffff
        stp     x8, xzr, [sp, #48]
        b.ne    .LBB794_22

Removing instructions like mov x8, x8 should be fine to do. However, removing an instruction like mov w29, w29 may not be. On aarch64, w29 is the 32 bit lower half of the 64 bit register x29, and an instruction like mov w29, w29 will implicitly clear the upper 32 bits of the register. In the first context of the diff, we can see the following instruction ldr q0, [x8, x29, lsl #4]. And that instruction uses the whole 64 bit register x29 as offset/index, while the mov w29, w29 will have cleared the upper bits. (I think this should be possible to express as ldr q0, [x8, w29, uxtw #4] as well - but that's not what we have here anyway.)

In any case, that first removed mov w29, w29 is the culprit - as it now uses potentially uninitialized/undefined upper bits as offset. This explains why the built Clang now sometimes runs successfully and sometimes crashes.

Can we revert until we have this issue fixed properly? (Offhand I have no clue how to fix it.) As this has been in tree for over a month already, chances are that many things depend on it and can't be trivially reverted at this point.

CC @davemgreen @DavidSpickett for the aarch64 backend.

@davemgreen
Copy link
Collaborator

I believe it should look something like this, with an implicit def of x29 (which makes it not a "CopyInstr").

renamable $w29 = ORRWrs $wzr, killed renamable $w29, 0, implicit-def $x29

It seems to be OK in simpler cases https://godbolt.org/z/3n5zKxfjn. Do you know if it has the implicit-def, or if it doesn't why it is missing in this case?

@mstorsjo
Copy link
Member

I believe it should look something like this, with an implicit def of x29 (which makes it not a "CopyInstr").

renamable $w29 = ORRWrs $wzr, killed renamable $w29, 0, implicit-def $x29

It seems to be OK in simpler cases https://godbolt.org/z/3n5zKxfjn. Do you know if it has the implicit-def, or if it doesn't why it is missing in this case?

Sorry, I'm not too familiar with those middle/backend lowering passes, so it would probably take me a good while to pinpoint exactly which instruction is being modified here - I hope someone more familiar and fluent with the tools can dig it up from the repro example above.

@davemgreen
Copy link
Collaborator

Ah - I was hoping you might have looked already, the reproducer is fairly large and you might have known the function to look at.

There is a smaller version from llvm-extract, of what I think is the right function in https://godbolt.org/z/xvdTzM5nP. It looks like block land.lhs.true252 is where w29 gets removed.

$w29 = ORRWrs $wzr, killed renamable $w29, 0, implicit $fp, implicit-def $fp

The problem appears to be that the registers are not in the order we thought they should be. We assume that - W0 + X0 will be the equivalent X reg, but that goes out-of-order for w29/FP. I will try to put together a patch, the fix should be relatively simple but I need to create a test.

@mstorsjo
Copy link
Member

Ah - I was hoping you might have looked already, the reproducer is fairly large and you might have known the function to look at.

Nah - I have been hunting to reliably reproduce and pinpoint the bug for a week already; once I had an undisputable reproducer, I wanted to hand it over to someone else :-)

davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Apr 25, 2025
The MachineCopyPropagation pass was incorrectly removing copy
(ORRWrs) instruction that appeared to be a nop. The instruction
should not have been marked as a copy though, the code was
incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is
fp it was out of order with the other registers.

Fixes an issue reported on llvm#129889.
@davemgreen
Copy link
Collaborator

Thanks in either case for the report. Do you mind giving #137348 a run with your original problem case to see if it fixes all the issues you are seeing or if some remain?

@mstorsjo
Copy link
Member

Thanks in either case for the report. Do you mind giving #137348 a run with your original problem case to see if it fixes all the issues you are seeing or if some remain?

Thanks for the very quick fix! This does seem to avoid the issue; a Clang built with a Clang with this patch, no longer crashes for the single reproducer input that I had. I've started a longer full build chain which will try to build a larger number of test projects with the produced Windows/arm64 compiler, I'll know by tomorrow if that fixed things fully or not. But the fix for this issue does look good, thanks! (And it would have taken me ages to isolate, dig in and figure out what's broken and how to fix it!)

davemgreen added a commit that referenced this pull request Apr 26, 2025
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs)
instruction that appeared to be a nop. The instruction should not have
been marked as a copy though, the code was incorrectly assuming that
w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other
registers.

Fixes an issue reported on #129889.
@mstorsjo
Copy link
Member

I've started a longer full build chain which will try to build a larger number of test projects with the produced Windows/arm64 compiler, I'll know by tomorrow if that fixed things fully or not.

The full test sequence passed flawlessly, so it seems like this is fully fixed with this patch (now merged), thanks!

jyli0116 pushed a commit to jyli0116/llvm-project that referenced this pull request Apr 28, 2025
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs)
instruction that appeared to be a nop. The instruction should not have
been marked as a copy though, the code was incorrectly assuming that
w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other
registers.

Fixes an issue reported on llvm#129889.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs)
instruction that appeared to be a nop. The instruction should not have
been marked as a copy though, the code was incorrectly assuming that
w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other
registers.

Fixes an issue reported on llvm#129889.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs)
instruction that appeared to be a nop. The instruction should not have
been marked as a copy though, the code was incorrectly assuming that
w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other
registers.

Fixes an issue reported on llvm#129889.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs)
instruction that appeared to be a nop. The instruction should not have
been marked as a copy though, the code was incorrectly assuming that
w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other
registers.

Fixes an issue reported on llvm#129889.
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs)
instruction that appeared to be a nop. The instruction should not have
been marked as a copy though, the code was incorrectly assuming that
w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other
registers.

Fixes an issue reported on llvm#129889.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants