-
Notifications
You must be signed in to change notification settings - Fork 14.2k
[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses #129889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…on leaves behind a no-op reg move
…ter forwarded uses This changes removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.
@llvm/pr-subscribers-llvm-regalloc @llvm/pr-subscribers-llvm-globalisel Author: Alex Bradbury (asb) ChangesThis changes removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included. I think this is a reasonable way to handle this case, but it's possible I'm missing an alternate approach. Full diff: https://github.com/llvm/llvm-project/pull/129889.diff 5 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 1105b8c15515f..9bc0f5836f33a 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -971,6 +971,17 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
forwardUses(MI);
+ // It's possible that the previous transformation has resulted in a no-op
+ // register move (i.e. one where source and destination registers are the
+ // same). If so, delete it.
+ CopyOperands = isCopyInstr(MI, *TII, UseCopyInstr);
+ if (CopyOperands && CopyOperands->Source->getReg() == CopyOperands->Destination->getReg()) {
+ MI.eraseFromParent();
+ NumDeletes++;
+ Changed = true;
+ continue;
+ }
+
// Not a copy.
SmallVector<Register, 4> Defs;
const MachineOperand *RegMask = nullptr;
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
index 2fcb911c2654a..309ebf71127c4 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
@@ -38,7 +38,6 @@ define void @constant_fold_barrier_i128(ptr %p) {
; RV32-NEXT: seqz a7, a6
; RV32-NEXT: and a1, a7, a1
; RV32-NEXT: add a7, a4, zero
-; RV32-NEXT: add a5, a5, zero
; RV32-NEXT: sltu a4, a4, a4
; RV32-NEXT: or a1, a3, a1
; RV32-NEXT: add a7, a7, a1
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
index 2c3e3faddc391..8e4c0376ad0a5 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
@@ -26,7 +26,6 @@ define i128 @constant_fold_barrier_i128(i128 %x) {
; RV64-NEXT: and a0, a0, a2
; RV64-NEXT: add a0, a0, a2
; RV64-NEXT: sltu a2, a0, a2
-; RV64-NEXT: add a1, a1, zero
; RV64-NEXT: add a1, a1, a2
; RV64-NEXT: ret
entry:
diff --git a/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir b/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir
new file mode 100644
index 0000000000000..d739537b50d05
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/machine-copyprop-noop-removal.mir
@@ -0,0 +1,74 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -o - %s -mtriple=riscv64 -run-pass=machine-cp -mcp-use-is-copy-instr | FileCheck %s
+
+## This test was added to capture a case where MachineCopyPropagation risks
+## leaving a no-op register move (add, x0, reg).
+
+---
+name: ham
+body: |
+ ; CHECK-LABEL: name: ham
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.3(0x40000000)
+ ; CHECK-NEXT: liveins: $x10
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $x11 = ANDI killed renamable $x10, 1
+ ; CHECK-NEXT: renamable $x10 = ADDI $x0, 1
+ ; CHECK-NEXT: BEQ killed renamable $x11, $x0, %bb.3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: liveins: $x10
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $x11 = ADDI $x0, 0
+ ; CHECK-NEXT: BEQ renamable $x10, $x0, %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: $x10 = ADDI $x0, 0
+ ; CHECK-NEXT: PseudoRET implicit $x10
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: liveins: $x10
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $x11 = ADDI $x0, 1
+ ; CHECK-NEXT: renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+ ; CHECK-NEXT: BNE renamable $x10, $x0, %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: liveins: $x10
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: PseudoRET implicit $x10
+ bb.0:
+ successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ liveins: $x10
+
+ renamable $x11 = ANDI killed renamable $x10, 1
+ renamable $x10 = ADDI $x0, 1
+ BEQ killed renamable $x11, $x0, %bb.2
+
+ bb.1:
+ successors: %bb.4(0x40000000), %bb.5(0x40000000)
+ liveins: $x10
+
+ $x11 = ADDI $x0, 0
+ renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+ BEQ renamable $x10, $x0, %bb.4
+
+ bb.5:
+ $x10 = ADDI $x0, 0
+ PseudoRET implicit $x10
+
+ bb.2:
+ successors: %bb.4(0x40000000), %bb.5(0x40000000)
+ liveins: $x10
+
+ renamable $x11 = ADDI $x0, 1
+ renamable $x10 = ADD killed renamable $x11, killed renamable $x10
+ BNE renamable $x10, $x0, %bb.5
+
+ bb.4:
+ liveins: $x10
+
+ PseudoRET implicit $x10
+...
diff --git a/llvm/test/CodeGen/RISCV/sextw-removal.ll b/llvm/test/CodeGen/RISCV/sextw-removal.ll
index e0a16aa05cd00..49494608eee4d 100644
--- a/llvm/test/CodeGen/RISCV/sextw-removal.ll
+++ b/llvm/test/CodeGen/RISCV/sextw-removal.ll
@@ -1352,7 +1352,6 @@ define signext i32 @sextw_sh2add(i1 zeroext %0, ptr %1, i32 signext %2, i32 sign
; NOREMOVAL-LABEL: sextw_sh2add:
; NOREMOVAL: # %bb.0:
; NOREMOVAL-NEXT: sh2add a2, a2, a3
-; NOREMOVAL-NEXT: mv a2, a2
; NOREMOVAL-NEXT: beqz a0, .LBB22_2
; NOREMOVAL-NEXT: # %bb.1:
; NOREMOVAL-NEXT: sw a2, 0(a1)
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…on leaves behind a no-op reg move Pre-commit for #129889.
Hi Alex! Sorry to say, but this commit seems to have introduced a miscompilation for aarch64 targets. When compiling Clang for aarch64-w64-mingw32 (possibly any aarch64 Windows, possibly any aarch64 target) from a Clang source version at 378ac57 or later, with a Clang version including this commit, then the resulting Clang binary would crash when compiling - but not always. (I didn't notice this right away, as I don't have full continuous testing of Clang running on Windows on ARM, I only test some cross compiled applications like ffmpeg, in my local nightly testing. But as free github actions runners with Windows on ARM is available now, I should be able to test this continuously; that's how I noticed the issue, when testing out such setups.) In any case, once pinpointed, the issue looks quite clear. The issue can be reproduced with https://martin.st/temp/DAGCombiner-preproc.cpp. This input file, compiled like this:
After this commit, we can observe the following diff in the generated output: --- dagcombiner-good.s 2025-04-25 16:18:25.317278754 +0300
+++ dagcombiner-bad.s 2025-04-25 16:16:10.153507344 +0300
@@ -87849,7 +87849,6 @@
b.ne .LBB167_141
.LBB167_121:
ldr x8, [sp, #80] // 8-byte Folded Reload
- mov w29, w29
ldr x8, [x8, #48]
ldr q0, [x8, x29, lsl #4]
umov w8, v0.h[0]
@@ -97136,7 +97135,6 @@
lsl x10, x9, #3
add w9, w9, #1
str w9, [x20, #12]
- mov w9, w9
str x1, [x8, x10]
mov w1, #1 // =0x1
ldr x8, [x20]
@@ -129458,7 +129456,6 @@
// %bb.61:
mov w8, w27
mov w0, w20
- mov w29, w29
str x8, [sp, #40] // 8-byte Folded Spill
bl _ZN4llvm3ISD23getSetCCSwappedOperandsENS0_8CondCodeE
mov w27, w0
@@ -153275,7 +153272,6 @@
lsl x10, x9, #3
add w9, w9, #1
str w9, [x20, #12]
- mov w9, w9
str x1, [x8, x10]
mov w1, #1 // =0x1
ldr x8, [x20]
@@ -202049,7 +202045,6 @@
add x8, x10, w8, uxth #1
ldurh w0, [x8, #-2]
and x8, x0, #0xffff
- mov x8, x8
ands x9, x8, #0xffff
stp x8, xzr, [sp, #224]
b.ne .LBB792_16
@@ -202959,7 +202954,6 @@
add x8, x10, w8, uxtw #1
ldurh w0, [x8, #-2]
and x8, x0, #0xffff
- mov x8, x8
ands x9, x8, #0xffff
stp x8, xzr, [sp, #48]
b.ne .LBB794_22 Removing instructions like In any case, that first removed Can we revert until we have this issue fixed properly? (Offhand I have no clue how to fix it.) As this has been in tree for over a month already, chances are that many things depend on it and can't be trivially reverted at this point. CC @davemgreen @DavidSpickett for the aarch64 backend. |
I believe it should look something like this, with an implicit def of x29 (which makes it not a "CopyInstr").
It seems to be OK in simpler cases https://godbolt.org/z/3n5zKxfjn. Do you know if it has the implicit-def, or if it doesn't why it is missing in this case? |
Sorry, I'm not too familiar with those middle/backend lowering passes, so it would probably take me a good while to pinpoint exactly which instruction is being modified here - I hope someone more familiar and fluent with the tools can dig it up from the repro example above. |
Ah - I was hoping you might have looked already, the reproducer is fairly large and you might have known the function to look at. There is a smaller version from llvm-extract, of what I think is the right function in https://godbolt.org/z/xvdTzM5nP. It looks like block land.lhs.true252 is where w29 gets removed.
The problem appears to be that the registers are not in the order we thought they should be. We assume that - W0 + X0 will be the equivalent X reg, but that goes out-of-order for w29/FP. I will try to put together a patch, the fix should be relatively simple but I need to create a test. |
Nah - I have been hunting to reliably reproduce and pinpoint the bug for a week already; once I had an undisputable reproducer, I wanted to hand it over to someone else :-) |
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.
Thanks in either case for the report. Do you mind giving #137348 a run with your original problem case to see if it fixes all the issues you are seeing or if some remain? |
Thanks for the very quick fix! This does seem to avoid the issue; a Clang built with a Clang with this patch, no longer crashes for the single reproducer input that I had. I've started a longer full build chain which will try to build a larger number of test projects with the produced Windows/arm64 compiler, I'll know by tomorrow if that fixed things fully or not. But the fix for this issue does look good, thanks! (And it would have taken me ages to isolate, dig in and figure out what's broken and how to fix it!) |
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on #129889.
The full test sequence passed flawlessly, so it seems like this is fully fixed with this patch (now merged), thanks! |
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.
The MachineCopyPropagation pass was incorrectly removing copy (ORRWrs) instruction that appeared to be a nop. The instruction should not have been marked as a copy though, the code was incorrectly assuming that w29 - w0 + x0 == x29, but as x29 is fp it was out of order with the other registers. Fixes an issue reported on llvm#129889.
This change removes 189 static instances of no-op reg-reg moves (i.e. where src == dest) across llvm-test-suite when compiled for RISC-V rv64gc and with SPEC included.
I think this is a reasonable way to handle this case, but it's possible I'm missing an alternate approach.
isNopCopy
is attractively named, but it is focused on removing cases where a copy is made redundant by a previous copy, rather than the case covered here of a copy that is literally a no-op (src == dst).