[PHIElimination] Reuse existing COPY in predecessor basic block #131837
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-hexagon @llvm/pr-subscribers-backend-loongarch

Author: Guy David (guy-david)

Changes

The insertion point of COPY isn't always optimal and could lead to a worse block layout; see the regression test in the first commit (which needs to be reduced).

Patch is 2.30 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131837.diff

127 Files Affected:
diff --git a/llvm/lib/CodeGen/PHIElimination.cpp b/llvm/lib/CodeGen/PHIElimination.cpp
index 14f91a87f75b4..cc3d4aac55b9d 100644
--- a/llvm/lib/CodeGen/PHIElimination.cpp
+++ b/llvm/lib/CodeGen/PHIElimination.cpp
@@ -587,6 +587,15 @@ void PHIEliminationImpl::LowerPHINode(MachineBasicBlock &MBB,
MachineBasicBlock::iterator InsertPos =
findPHICopyInsertPoint(&opBlock, &MBB, SrcReg);
+ // Reuse an existing copy in the block if possible.
+ if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
+ if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
+ MRI->use_empty(SrcReg)) {
+ DefMI->getOperand(0).setReg(IncomingReg);
+ continue;
+ }
+ }
+
// Insert the copy.
MachineInstr *NewSrcInstr = nullptr;
if (!reusedIncoming && IncomingReg) {
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
index c1c5c53aa7df2..6c300b04508b2 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
@@ -118,8 +118,8 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_unordered:
; -O0: bl __aarch64_cas16_relax
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_unordered:
; -O1: ldxp xzr, x8, [x2]
@@ -131,8 +131,8 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_monotonic:
; -O0: bl __aarch64_cas16_relax
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
@@ -144,8 +144,8 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_release:
; -O0: bl __aarch64_cas16_rel
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_release:
; -O1: ldxp xzr, x8, [x2]
@@ -157,8 +157,8 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
; -O0: bl __aarch64_cas16_acq_rel
-; -O0: subs x10, x10, x11
-; -O0: ccmp x8, x9, #0, eq
+; -O0: subs x9, x0, x9
+; -O0: ccmp x1, x8, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
; -O1: ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
index d1047d84e2956..2a7bbad9d6454 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
@@ -117,13 +117,13 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_unordered:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_unordered:
; -O1: ldxp xzr, x8, [x2]
@@ -134,13 +134,13 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_monotonic:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
@@ -151,13 +151,13 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_release:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_release:
; -O1: ldxp xzr, x8, [x2]
@@ -168,13 +168,13 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
-; -O0: ldaxp x10, x12, [x9]
+; -O0: ldaxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
; -O1: ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
index 1a79c73355143..493bc742f7663 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
@@ -117,13 +117,13 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_unordered:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_unordered:
; -O1: ldxp xzr, x8, [x2]
@@ -134,13 +134,13 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_monotonic:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stxp w8, x14, x15, [x9]
-; -O0: stxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stxp w12, x14, x15, [x13]
+; -O0: stxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_monotonic:
; -O1: ldxp xzr, x8, [x2]
@@ -151,13 +151,13 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_release:
-; -O0: ldxp x10, x12, [x9]
+; -O0: ldxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_release:
; -O1: ldxp xzr, x8, [x2]
@@ -168,13 +168,13 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
-; -O0: ldaxp x10, x12, [x9]
+; -O0: ldaxp x8, x10, [x13]
+; -O0: cmp x8, x9
; -O0: cmp x10, x11
-; -O0: cmp x12, x13
-; -O0: stlxp w8, x14, x15, [x9]
-; -O0: stlxp w8, x10, x12, [x9]
-; -O0: subs x12, x12, x13
-; -O0: ccmp x10, x11, #0, eq
+; -O0: stlxp w12, x14, x15, [x13]
+; -O0: stlxp w12, x8, x10, [x13]
+; -O0: subs x10, x10, x11
+; -O0: ccmp x8, x9, #0, eq
;
; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
; -O1: ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir b/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
index 01c44e3f253bb..993d1c1f1b5f0 100644
--- a/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
+++ b/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
@@ -37,7 +37,7 @@ body: |
bb.1:
%x:gpr32 = COPY $wzr
; Test that the debug location is not copied into bb1!
- ; CHECK: %3:gpr32 = COPY killed %x{{$}}
+ ; CHECK: %3:gpr32 = COPY $wzr
; CHECK-LABEL: bb.2:
bb.2:
%y:gpr32 = PHI %x:gpr32, %bb.1, undef %undef:gpr32, %bb.0, debug-location !14
diff --git a/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir b/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir
new file mode 100644
index 0000000000000..883d130bfac4e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir
@@ -0,0 +1,35 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -run-pass=phi-node-elimination -mtriple=aarch64-linux-gnu -o - %s | FileCheck %s
+
+# Verify that the original COPY in bb.1 is reappropriated as the PHI source in bb.2,
+# instead of creating a new COPY with the same source register.
+
+---
+name: test
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
+ ; CHECK-NEXT: liveins: $nzcv, $wzr
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:gpr32 = IMPLICIT_DEF
+ ; CHECK-NEXT: Bcc 8, %bb.2, implicit $nzcv
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[DEF:%[0-9]+]]:gpr32 = COPY $wzr
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: %y:gpr32 = COPY [[DEF]]
+ ; CHECK-NEXT: $wzr = COPY %y
+ bb.0:
+ liveins: $nzcv, $wzr
+ Bcc 8, %bb.2, implicit $nzcv
+ bb.1:
+ %x:gpr32 = COPY $wzr
+ bb.2:
+ %y:gpr32 = PHI %x:gpr32, %bb.1, undef %undef:gpr32, %bb.0
+ $wzr = COPY %y:gpr32
+...
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index fb6575cc0ee83..10fc431b07b18 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -587,8 +587,8 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
; CHECK-SD-NEXT: mov w10, w2
; CHECK-SD-NEXT: b.hi .LBB5_4
; CHECK-SD-NEXT: // %bb.2:
-; CHECK-SD-NEXT: mov x11, xzr
; CHECK-SD-NEXT: mov w8, wzr
+; CHECK-SD-NEXT: mov x11, xzr
; CHECK-SD-NEXT: b .LBB5_7
; CHECK-SD-NEXT: .LBB5_3:
; CHECK-SD-NEXT: mov w8, wzr
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll b/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
index 37a7782caeed9..cab6fba59cbd1 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
@@ -45,7 +45,7 @@ define i8 @test_rmw_add_8(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_8:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: ldaddalb w8, w0, [x0]
; LSE-NEXT: ret
entry:
@@ -94,7 +94,7 @@ define i16 @test_rmw_add_16(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_16:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: ldaddalh w8, w0, [x0]
; LSE-NEXT: ret
entry:
@@ -143,7 +143,7 @@ define i32 @test_rmw_add_32(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_32:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: ldaddal w8, w0, [x0]
; LSE-NEXT: ret
entry:
@@ -192,7 +192,7 @@ define i64 @test_rmw_add_64(ptr %dst) {
;
; LSE-LABEL: test_rmw_add_64:
; LSE: // %bb.0: // %entry
-; LSE-NEXT: mov w8, #1
+; LSE-NEXT: mov w8, #1 // =0x1
; LSE-NEXT: // kill: def $x8 killed $w8
; LSE-NEXT: ldaddal x8, x0, [x0]
; LSE-NEXT: ret
@@ -207,16 +207,16 @@ define i128 @test_rmw_add_128(ptr %dst) {
; NOLSE-NEXT: sub sp, sp, #48
; NOLSE-NEXT: .cfi_def_cfa_offset 48
; NOLSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; NOLSE-NEXT: ldr x8, [x0, #8]
-; NOLSE-NEXT: ldr x9, [x0]
+; NOLSE-NEXT: ldr x9, [x0, #8]
+; NOLSE-NEXT: ldr x8, [x0]
; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; NOLSE-NEXT: b .LBB4_1
; NOLSE-NEXT: .LBB4_1: // %atomicrmw.start
; NOLSE-NEXT: // =>This Loop Header: Depth=1
; NOLSE-NEXT: // Child Loop BB4_2 Depth 2
-; NOLSE-NEXT: ldr x13, [sp, #40] // 8-byte Folded Reload
-; NOLSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x13, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
; NOLSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload
; NOLSE-NEXT: adds x14, x11, #1
; NOLSE-NEXT: cinc x15, x13, hs
@@ -246,8 +246,8 @@ define i128 @test_rmw_add_128(ptr %dst) {
; NOLSE-NEXT: str x9, [sp, #16] // 8-byte Folded Spill
; NOLSE-NEXT: subs x12, x12, x13
; NOLSE-NEXT: ccmp x10, x11, #0, eq
-; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: b.ne .LBB4_1
; NOLSE-NEXT: b .LBB4_6
; NOLSE-NEXT: .LBB4_6: // %atomicrmw.end
@@ -261,15 +261,15 @@ define i128 @test_rmw_add_128(ptr %dst) {
; LSE-NEXT: sub sp, sp, #48
; LSE-NEXT: .cfi_def_cfa_offset 48
; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; LSE-NEXT: ldr x8, [x0, #8]
-; LSE-NEXT: ldr x9, [x0]
+; LSE-NEXT: ldr x9, [x0, #8]
+; LSE-NEXT: ldr x8, [x0]
; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; LSE-NEXT: b .LBB4_1
; LSE-NEXT: .LBB4_1: // %atomicrmw.start
; LSE-NEXT: // =>This Inner Loop Header: Depth=1
-; LSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
-; LSE-NEXT: ldr x10, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x10, [sp, #40] // 8-byte Folded Reload
; LSE-NEXT: ldr x8, [sp, #24] // 8-byte Folded Reload
; LSE-NEXT: mov x0, x10
; LSE-NEXT: mov x1, x11
@@ -284,8 +284,8 @@ define i128 @test_rmw_add_128(ptr %dst) {
; LSE-NEXT: str x8, [sp, #16] // 8-byte Folded Spill
; LSE-NEXT: subs x11, x8, x11
; LSE-NEXT: ccmp x9, x10, #0, eq
-; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: b.ne .LBB4_1
; LSE-NEXT: b .LBB4_2
; LSE-NEXT: .LBB4_2: // %atomicrmw.end
@@ -597,23 +597,23 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; NOLSE-NEXT: sub sp, sp, #48
; NOLSE-NEXT: .cfi_def_cfa_offset 48
; NOLSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; NOLSE-NEXT: ldr x8, [x0, #8]
-; NOLSE-NEXT: ldr x9, [x0]
+; NOLSE-NEXT: ldr x9, [x0, #8]
+; NOLSE-NEXT: ldr x8, [x0]
; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; NOLSE-NEXT: b .LBB9_1
; NOLSE-NEXT: .LBB9_1: // %atomicrmw.start
; NOLSE-NEXT: // =>This Loop Header: Depth=1
; NOLSE-NEXT: // Child Loop BB9_2 Depth 2
-; NOLSE-NEXT: ldr x13, [sp, #40] // 8-byte Folded Reload
-; NOLSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x13, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
; NOLSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload
; NOLSE-NEXT: mov w8, w11
; NOLSE-NEXT: mvn w10, w8
; NOLSE-NEXT: // implicit-def: $x8
; NOLSE-NEXT: mov w8, w10
; NOLSE-NEXT: orr x14, x8, #0xfffffffffffffffe
-; NOLSE-NEXT: mov x15, #-1
+; NOLSE-NEXT: mov x15, #-1 // =0xffffffffffffffff
; NOLSE-NEXT: .LBB9_2: // %atomicrmw.start
; NOLSE-NEXT: // Parent Loop BB9_1 Depth=1
; NOLSE-NEXT: // => This Inner Loop Header: Depth=2
@@ -640,8 +640,8 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; NOLSE-NEXT: str x9, [sp, #16] // 8-byte Folded Spill
; NOLSE-NEXT: subs x12, x12, x13
; NOLSE-NEXT: ccmp x10, x11, #0, eq
-; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; NOLSE-NEXT: b.ne .LBB9_1
; NOLSE-NEXT: b .LBB9_6
; NOLSE-NEXT: .LBB9_6: // %atomicrmw.end
@@ -655,15 +655,15 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; LSE-NEXT: sub sp, sp, #48
; LSE-NEXT: .cfi_def_cfa_offset 48
; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
-; LSE-NEXT: ldr x8, [x0, #8]
-; LSE-NEXT: ldr x9, [x0]
+; LSE-NEXT: ldr x9, [x0, #8]
+; LSE-NEXT: ldr x8, [x0]
; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
; LSE-NEXT: b .LBB9_1
; LSE-NEXT: .LBB9_1: // %atomicrmw.start
; LSE-NEXT: // =>This Inner Loop Header: Depth=1
-; LSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
-; LSE-NEXT: ldr x10, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT: ldr x10, [sp, #40] // 8-byte Folded Reload
; LSE-NEXT: ldr x8, [sp, #24] // 8-byte Folded Reload
; LSE-NEXT: mov x0, x10
; LSE-NEXT: mov x1, x11
@@ -672,7 +672,7 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; LSE-NEXT: // implicit-def: $x9
; LSE-NEXT: mov w9, w12
; LSE-NEXT: orr x2, x9, #0xfffffffffffffffe
-; LSE-NEXT: mov x9, #-1
+; LSE-NEXT: mov x9, #-1 // =0xffffffffffffffff
; LSE-NEXT: // kill: def $x2 killed $x2 def $x2_x3
; LSE-NEXT: mov x3, x9
; LSE-NEXT: caspal x0, x1, x2, x3, [x8]
@@ -682,8 +682,8 @@ define i128 @test_rmw_nand_128(ptr %dst) {
; LSE-NEXT: str x8, [sp, #16] // 8-byte Folded Spill
; LSE-NEXT: subs x11, x8, x11
; LSE-NEXT: ccmp x9, x10, #0, eq
-; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
-; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT: str x8, [sp, #32] // 8-byte Folded Spill
; LSE-NEXT: b.ne .LBB9_1
; LSE-NEXT: b .LBB9_2
; LSE-NEXT: .LBB9_2: // %atomicrmw.end
diff --git a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
index 43d49da1abd21..b0339222bc2df 100644
--- a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
+++ b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
@@ -14,8 +14,8 @@ define i64 @bfi...
[truncated]
ping :)
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/10862

Here is the relevant piece of the build log for reference:
Hello @guy-david
The following starts crashing with this patch:
It crashes like:
I originally saw the same crash without any special flags for my out-of-tree target, and then saw I could reproduce on aarch64 (with added flags) as well. PHI elimination turns
into
and I think that is what then trips the coalescer over.
Thanks for looking into this, issued a fix in: #146320.
Thanks, that fix seems to solve that problem. It looks like there are other problems as well, though. I don't have a reproducer I can share right now, but if we have virtual registers of two different register classes, "32BitRC" with 32-bit registers and "16BitRC" with 16-bit registers, it looks like it turns
into
i.e. it ignores the sub register in the PHI?
PR which introduced the bug: llvm/llvm-project#131837. Fixes a crash around dead registers which started in f5c62ee by verifying that the reused incoming register is also virtual.
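A minimal sketch of what the guard described in that commit message could look like when added to the reuse check introduced by this PR; which operand is actually tested and its exact placement in #146320 are assumptions, not the committed code:

```cpp
// Hedged sketch, not the committed #146320 fix: bail out unless the register
// the COPY would be redirected to is a real virtual register. Assumption: for
// a dead PHI the incoming register can be 0 ($noreg), and rewriting the COPY
// to define $noreg is what later passes crash on.
if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
  if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
      MRI->use_empty(SrcReg) && Register(IncomingReg).isVirtual()) {
    DefMI->getOperand(0).setReg(IncomingReg);
    continue;
  }
}
```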
I fiddled with the repro for my out-of-tree target and changed it to something for aarch64:
Now, I don't know aarch64 and its register classes, but the sub register access "%1.sub_32" is just dropped in the output.
Now I feel bad 😆 Can you verify whether c4c9e0e solves the issue?
It does. Thanks! :)
The COPY you reuse is dead (right?) so why is reusing it any better than inserting a new one and allowing the dead one to be DCEd? (Or, why wasn't the dead COPY already DCEd before we got to this point?)
// Reuse an existing copy in the block if possible.
if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
  if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
      MRI->use_empty(SrcReg)) {
use_nodbg_empty to avoid debug instruction effects
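A minimal sketch of that suggestion applied to the hunk quoted above, assuming the only change is swapping the use-emptiness query; debug-only uses of SrcReg would then no longer block the reuse:

```cpp
// Hedged sketch of the reviewer's suggestion: ignore debug instructions when
// checking that SrcReg has no remaining uses, so that compiling with -g does
// not change which COPYs get reused.
if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
  if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
      MRI->use_nodbg_empty(SrcReg)) {
    DefMI->getOperand(0).setReg(IncomingReg);
    continue;
  }
}
```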
That's an unfortunate edge case for the problem I was trying to solve, for which I added a regression test in #146320. In the original issue there were no dead instructions.
No, I mean in your patch DefMI is a COPY which defines SrcReg which has no uses, therefore it's a dead COPY, right?
…ck (llvm#131837)" hangs hipCatch2 tests This reverts commit f5c62ee.
The user is the PHI node, which is being removed in flight.
Hi @guy-david, another verifier error like this:
It fails with:
I'm also running into miscompilations caused by this in libvpx, for armv7 targets, with the latest git main of LLVM. The reproducer for the miscompile is https://martin.st/temp/y4minput-preproc.c, compiled like this:

$ clang -target armv7-w64-mingw32 y4minput-preproc.c -c -o y4minput.c.o -O2
Sorry for the inconvenience. I was not able to reproduce locally; can you test whether #146337 fixes the issue?
…#131837) The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit. This change affects many architectures but the amount of total instructions in the test cases seems too be slightly lower.
PR which introduced the bug: llvm#131837. Fixes a crash around dead registers which started in f5c62ee by verifying that the reused incoming register is also virtual.
…ass, update livevars. (#146337) Follow up to the second bug that #131837 introduced, described in #131837 (comment).
…register class, update livevars. (#146337) Follow up to the second bug that llvm/llvm-project#131837 introduced, described in llvm/llvm-project#131837 (comment).
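Based on the #146337 commit message above, a minimal sketch of the tightened reuse condition it describes; the live-variable bookkeeping it also mentions is omitted, and the exact check in the committed change may differ:

```cpp
// Hedged sketch, not the committed #146337 change: only reuse the existing
// COPY when the incoming register's class matches the PHI operand's class,
// so a subregister read such as %1.sub_32 is never silently widened to the
// full register by rewriting the COPY's destination.
if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
  if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
      MRI->use_nodbg_empty(SrcReg) && Register(IncomingReg).isVirtual() &&
      MRI->getRegClass(IncomingReg) == MRI->getRegClass(SrcReg)) {
    DefMI->getOperand(0).setReg(IncomingReg);
    continue;
  }
}
```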
The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout; see the regression test in the first commit.
This change affects many architectures, but the total number of instructions in the test cases seems to be slightly lower.