[PHIElimination] Reuse existing COPY in predecessor basic block #131837


Merged
merged 3 commits on Jun 29, 2025

Conversation

guy-david
Contributor

@guy-david commented Mar 18, 2025

The insertion point of the COPY isn't always optimal and can eventually lead to a worse block layout; see the regression test in the first commit.

This change affects many architectures, but the total number of instructions across the test cases appears to be slightly lower.
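
In rough terms (a sketch based on the new PHIElimination-reuse-copy.mir test below; the virtual-register names are illustrative), PHI elimination used to append a fresh COPY at the end of the predecessor block even when that block already ended with a COPY defining the PHI's source vreg. With this change, the existing COPY is simply retargeted to the incoming register:

    ; Before: PHI lowering adds a second, back-to-back COPY in bb.1.
    bb.1:
      %x:gpr32 = COPY $wzr
      %incoming:gpr32 = COPY killed %x   ; inserted by PHI elimination
    bb.2:
      ; uses %incoming

    ; After: the original COPY's destination is rewritten, so no extra copy is emitted.
    bb.1:
      %incoming:gpr32 = COPY $wzr
    bb.2:
      ; uses %incoming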

@llvmbot
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-backend-hexagon

Author: Guy David (guy-david)

Changes

The insertion point of the COPY isn't always optimal and could lead to a worse block layout; see the regression test in the first commit (which needs to be reduced).


Patch is 2.30 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131837.diff

127 Files Affected:

  • (modified) llvm/lib/CodeGen/PHIElimination.cpp (+9)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll (+8-8)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll (+24-24)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll (+24-24)
  • (modified) llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir (+1-1)
  • (added) llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir (+35)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/atomicrmw-O0.ll (+30-30)
  • (modified) llvm/test/CodeGen/AArch64/bfis-in-loop.ll (+1-1)
  • (added) llvm/test/CodeGen/AArch64/block-layout-regression.mir (+107)
  • (modified) llvm/test/CodeGen/AArch64/complex-deinterleaving-crash.ll (+15-15)
  • (modified) llvm/test/CodeGen/AArch64/complex-deinterleaving-reductions-predicated-scalable.ll (+14-14)
  • (modified) llvm/test/CodeGen/AArch64/complex-deinterleaving-reductions.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/phi.ll (+20-20)
  • (modified) llvm/test/CodeGen/AArch64/pr48188.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/ragreedy-csr.ll (+11-11)
  • (modified) llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll (+56-57)
  • (modified) llvm/test/CodeGen/AArch64/reduce-or-opt.ll (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sink-and-fold.ll (+3-3)
  • (modified) llvm/test/CodeGen/AArch64/sve-lsrchain.ll (+7-7)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-sink.ll (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/swifterror.ll (+8-8)
  • (modified) llvm/test/CodeGen/AArch64/tbl-loops.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll (+74-72)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll (+74-72)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll (+832-789)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+110-100)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fadd.ll (+1387-1378)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmax.ll (+924-908)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointer-atomicrmw-fmin.ll (+924-908)
  • (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+914-922)
  • (modified) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+114-114)
  • (modified) llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmax.ll (+29-33)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-atomicrmw-fmin.ll (+29-33)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll (+952-950)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmax.ll (+658-656)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fmin.ll (+658-656)
  • (modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fsub.ll (+793-791)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_i32_system.ll (+323-323)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_i64_system.ll (+461-461)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+255-255)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+225-225)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+225-225)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+227-227)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+62-77)
  • (modified) llvm/test/CodeGen/AMDGPU/move-to-valu-atomicrmw-system.ll (+17-17)
  • (modified) llvm/test/CodeGen/AMDGPU/mul.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/rem_i128.ll (+869-871)
  • (modified) llvm/test/CodeGen/AMDGPU/sdiv64.ll (+117-117)
  • (modified) llvm/test/CodeGen/AMDGPU/srem64.ll (+117-117)
  • (modified) llvm/test/CodeGen/AMDGPU/udiv64.ll (+105-105)
  • (modified) llvm/test/CodeGen/AMDGPU/urem64.ll (+89-89)
  • (modified) llvm/test/CodeGen/AMDGPU/vni8-across-blocks.ll (+42-41)
  • (modified) llvm/test/CodeGen/AMDGPU/wave32.ll (+4-4)
  • (modified) llvm/test/CodeGen/ARM/and-cmp0-sink.ll (+11-11)
  • (modified) llvm/test/CodeGen/ARM/cttz.ll (+46-46)
  • (modified) llvm/test/CodeGen/ARM/select-imm.ll (+8-8)
  • (modified) llvm/test/CodeGen/ARM/struct-byval-loop.ll (+8-8)
  • (modified) llvm/test/CodeGen/ARM/swifterror.ll (+154-154)
  • (modified) llvm/test/CodeGen/AVR/bug-81911.ll (+17-17)
  • (modified) llvm/test/CodeGen/Hexagon/swp-conv3x3-nested.ll (+1-2)
  • (modified) llvm/test/CodeGen/Hexagon/swp-epilog-phi7.ll (+1)
  • (modified) llvm/test/CodeGen/Hexagon/swp-matmul-bitext.ll (+1-1)
  • (modified) llvm/test/CodeGen/Hexagon/swp-stages4.ll (+2-5)
  • (modified) llvm/test/CodeGen/Hexagon/tinycore.ll (+8-3)
  • (modified) llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll (+28-28)
  • (modified) llvm/test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir (+1-2)
  • (modified) llvm/test/CodeGen/PowerPC/disable-ctr-ppcf128.ll (+3-3)
  • (modified) llvm/test/CodeGen/PowerPC/phi-eliminate.mir (+3-6)
  • (modified) llvm/test/CodeGen/PowerPC/ppcf128-freeze.mir (+15-15)
  • (modified) llvm/test/CodeGen/PowerPC/pr116071.ll (+18-7)
  • (modified) llvm/test/CodeGen/PowerPC/sms-phi-2.ll (+6-7)
  • (modified) llvm/test/CodeGen/PowerPC/sms-phi-3.ll (+12-12)
  • (modified) llvm/test/CodeGen/PowerPC/stack-restore-with-setjmp.ll (+4-6)
  • (modified) llvm/test/CodeGen/PowerPC/subreg-postra-2.ll (+9-9)
  • (modified) llvm/test/CodeGen/PowerPC/vsx.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/abds.ll (+100-100)
  • (modified) llvm/test/CodeGen/RISCV/machine-pipeliner.ll (+13-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+60-60)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+30-31)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/xcvbi.ll (+30-30)
  • (modified) llvm/test/CodeGen/SystemZ/swifterror.ll (+2-2)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll (+48-48)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-disabled-in-loloops.ll (+22-22)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll (+16-16)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/while-loops.ll (+53-58)
  • (modified) llvm/test/CodeGen/Thumb2/mve-blockplacement.ll (+9-12)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float32regloops.ll (+23-20)
  • (modified) llvm/test/CodeGen/Thumb2/mve-laneinterleaving-reduct.ll (+4-4)
  • (modified) llvm/test/CodeGen/Thumb2/mve-memtp-loop.ll (+50-51)
  • (modified) llvm/test/CodeGen/Thumb2/mve-phireg.ll (+7-7)
  • (modified) llvm/test/CodeGen/Thumb2/mve-pipelineloops.ll (+41-44)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll (+8-11)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-distribute.ll (+9-8)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll (+22-22)
  • (modified) llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll (+17-16)
  • (modified) llvm/test/CodeGen/Thumb2/pr52817.ll (+8-8)
  • (modified) llvm/test/CodeGen/VE/Scalar/br_jt.ll (+19-19)
  • (modified) llvm/test/CodeGen/X86/2012-01-10-UndefExceptionEdge.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll (+9-9)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll (+16-16)
  • (modified) llvm/test/CodeGen/X86/atomic32.ll (+72-54)
  • (modified) llvm/test/CodeGen/X86/atomic64.ll (+20-15)
  • (modified) llvm/test/CodeGen/X86/atomic6432.ll (+36-36)
  • (modified) llvm/test/CodeGen/X86/callbr-asm-branch-folding.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/callbr-asm-kill.mir (+3-6)
  • (modified) llvm/test/CodeGen/X86/coalescer-breaks-subreg-to-reg-liveness-reduced.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/combine-pmuldq.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/fp128-select.ll (+11-10)
  • (modified) llvm/test/CodeGen/X86/madd.ll (+58-58)
  • (modified) llvm/test/CodeGen/X86/masked_load.ll (+13-14)
  • (modified) llvm/test/CodeGen/X86/min-legal-vector-width.ll (+15-15)
  • (modified) llvm/test/CodeGen/X86/pcsections-atomics.ll (+158-138)
  • (modified) llvm/test/CodeGen/X86/pr15705.ll (+9-8)
  • (modified) llvm/test/CodeGen/X86/pr32256.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/pr38795.ll (+9-6)
  • (modified) llvm/test/CodeGen/X86/pr49451.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/pr63108.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/sad.ll (+13-13)
  • (modified) llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll (+40-48)
  • (modified) llvm/test/CodeGen/X86/statepoint-cmp-sunk-past-statepoint.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/swifterror.ll (+9-8)
  • (modified) llvm/test/DebugInfo/MIR/InstrRef/phi-regallocd-to-stack.mir (+3-4)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/RISCV/lsr-drop-solution.ll (+7-11)
diff --git a/llvm/lib/CodeGen/PHIElimination.cpp b/llvm/lib/CodeGen/PHIElimination.cpp
index 14f91a87f75b4..cc3d4aac55b9d 100644
--- a/llvm/lib/CodeGen/PHIElimination.cpp
+++ b/llvm/lib/CodeGen/PHIElimination.cpp
@@ -587,6 +587,15 @@ void PHIEliminationImpl::LowerPHINode(MachineBasicBlock &MBB,
     MachineBasicBlock::iterator InsertPos =
         findPHICopyInsertPoint(&opBlock, &MBB, SrcReg);
 
+    // Reuse an existing copy in the block if possible.
+    if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
+      if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
+          MRI->use_empty(SrcReg)) {
+        DefMI->getOperand(0).setReg(IncomingReg);
+        continue;
+      }
+    }
+
     // Insert the copy.
     MachineInstr *NewSrcInstr = nullptr;
     if (!reusedIncoming && IncomingReg) {
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
index c1c5c53aa7df2..6c300b04508b2 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-outline_atomics.ll
@@ -118,8 +118,8 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
 define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_unordered:
 ; -O0:    bl __aarch64_cas16_relax
-; -O0:    subs x10, x10, x11
-; -O0:    ccmp x8, x9, #0, eq
+; -O0:    subs x9, x0, x9
+; -O0:    ccmp x1, x8, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_unordered:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -131,8 +131,8 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
 define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_monotonic:
 ; -O0:    bl __aarch64_cas16_relax
-; -O0:    subs x10, x10, x11
-; -O0:    ccmp x8, x9, #0, eq
+; -O0:    subs x9, x0, x9
+; -O0:    ccmp x1, x8, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_monotonic:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -144,8 +144,8 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
 define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_release:
 ; -O0:    bl __aarch64_cas16_rel
-; -O0:    subs x10, x10, x11
-; -O0:    ccmp x8, x9, #0, eq
+; -O0:    subs x9, x0, x9
+; -O0:    ccmp x1, x8, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_release:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -157,8 +157,8 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
 define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
 ; -O0:    bl __aarch64_cas16_acq_rel
-; -O0:    subs x10, x10, x11
-; -O0:    ccmp x8, x9, #0, eq
+; -O0:    subs x9, x0, x9
+; -O0:    ccmp x1, x8, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
 ; -O1:    ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
index d1047d84e2956..2a7bbad9d6454 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-rcpc.ll
@@ -117,13 +117,13 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
 
 define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_unordered:
-; -O0:    ldxp x10, x12, [x9]
+; -O0:    ldxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stxp w8, x14, x15, [x9]
-; -O0:    stxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stxp w12, x14, x15, [x13]
+; -O0:    stxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_unordered:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -134,13 +134,13 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
 
 define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_monotonic:
-; -O0:    ldxp x10, x12, [x9]
+; -O0:    ldxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stxp w8, x14, x15, [x9]
-; -O0:    stxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stxp w12, x14, x15, [x13]
+; -O0:    stxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_monotonic:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -151,13 +151,13 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
 
 define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_release:
-; -O0:    ldxp x10, x12, [x9]
+; -O0:    ldxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stlxp w8, x14, x15, [x9]
-; -O0:    stlxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stlxp w12, x14, x15, [x13]
+; -O0:    stlxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_release:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -168,13 +168,13 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
 
 define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
-; -O0:    ldaxp x10, x12, [x9]
+; -O0:    ldaxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stlxp w8, x14, x15, [x9]
-; -O0:    stlxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stlxp w12, x14, x15, [x13]
+; -O0:    stlxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
 ; -O1:    ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
index 1a79c73355143..493bc742f7663 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-store-v8a.ll
@@ -117,13 +117,13 @@ define dso_local void @store_atomic_i64_aligned_seq_cst(i64 %value, ptr %ptr) {
 
 define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_unordered:
-; -O0:    ldxp x10, x12, [x9]
+; -O0:    ldxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stxp w8, x14, x15, [x9]
-; -O0:    stxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stxp w12, x14, x15, [x13]
+; -O0:    stxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_unordered:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -134,13 +134,13 @@ define dso_local void @store_atomic_i128_aligned_unordered(i128 %value, ptr %ptr
 
 define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_monotonic:
-; -O0:    ldxp x10, x12, [x9]
+; -O0:    ldxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stxp w8, x14, x15, [x9]
-; -O0:    stxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stxp w12, x14, x15, [x13]
+; -O0:    stxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_monotonic:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -151,13 +151,13 @@ define dso_local void @store_atomic_i128_aligned_monotonic(i128 %value, ptr %ptr
 
 define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_release:
-; -O0:    ldxp x10, x12, [x9]
+; -O0:    ldxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stlxp w8, x14, x15, [x9]
-; -O0:    stlxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stlxp w12, x14, x15, [x13]
+; -O0:    stlxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_release:
 ; -O1:    ldxp xzr, x8, [x2]
@@ -168,13 +168,13 @@ define dso_local void @store_atomic_i128_aligned_release(i128 %value, ptr %ptr)
 
 define dso_local void @store_atomic_i128_aligned_seq_cst(i128 %value, ptr %ptr) {
 ; -O0-LABEL: store_atomic_i128_aligned_seq_cst:
-; -O0:    ldaxp x10, x12, [x9]
+; -O0:    ldaxp x8, x10, [x13]
+; -O0:    cmp x8, x9
 ; -O0:    cmp x10, x11
-; -O0:    cmp x12, x13
-; -O0:    stlxp w8, x14, x15, [x9]
-; -O0:    stlxp w8, x10, x12, [x9]
-; -O0:    subs x12, x12, x13
-; -O0:    ccmp x10, x11, #0, eq
+; -O0:    stlxp w12, x14, x15, [x13]
+; -O0:    stlxp w12, x8, x10, [x13]
+; -O0:    subs x10, x10, x11
+; -O0:    ccmp x8, x9, #0, eq
 ;
 ; -O1-LABEL: store_atomic_i128_aligned_seq_cst:
 ; -O1:    ldaxp xzr, x8, [x2]
diff --git a/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir b/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
index 01c44e3f253bb..993d1c1f1b5f0 100644
--- a/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
+++ b/llvm/test/CodeGen/AArch64/PHIElimination-debugloc.mir
@@ -37,7 +37,7 @@ body: |
   bb.1:
     %x:gpr32 = COPY $wzr
   ; Test that the debug location is not copied into bb1!
-  ; CHECK: %3:gpr32 = COPY killed %x{{$}}
+  ; CHECK: %3:gpr32 = COPY $wzr
   ; CHECK-LABEL: bb.2:
   bb.2:
     %y:gpr32 = PHI %x:gpr32, %bb.1, undef %undef:gpr32, %bb.0, debug-location !14
diff --git a/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir b/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir
new file mode 100644
index 0000000000000..883d130bfac4e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/PHIElimination-reuse-copy.mir
@@ -0,0 +1,35 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -run-pass=phi-node-elimination -mtriple=aarch64-linux-gnu -o - %s | FileCheck %s
+
+# Verify that the original COPY in bb.1 is reappropriated as the PHI source in bb.2,
+# instead of creating a new COPY with the same source register.
+
+---
+name: test
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: test
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.2(0x40000000), %bb.1(0x40000000)
+  ; CHECK-NEXT:   liveins: $nzcv, $wzr
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:gpr32 = IMPLICIT_DEF
+  ; CHECK-NEXT:   Bcc 8, %bb.2, implicit $nzcv
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:gpr32 = COPY $wzr
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   %y:gpr32 = COPY [[DEF]]
+  ; CHECK-NEXT:   $wzr = COPY %y
+  bb.0:
+    liveins: $nzcv, $wzr
+    Bcc 8, %bb.2, implicit $nzcv
+  bb.1:
+    %x:gpr32 = COPY $wzr
+  bb.2:
+    %y:gpr32 = PHI %x:gpr32, %bb.1, undef %undef:gpr32, %bb.0
+    $wzr = COPY %y:gpr32
+...
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index fb6575cc0ee83..10fc431b07b18 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -587,8 +587,8 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-SD-NEXT:    mov w10, w2
 ; CHECK-SD-NEXT:    b.hi .LBB5_4
 ; CHECK-SD-NEXT:  // %bb.2:
-; CHECK-SD-NEXT:    mov x11, xzr
 ; CHECK-SD-NEXT:    mov w8, wzr
+; CHECK-SD-NEXT:    mov x11, xzr
 ; CHECK-SD-NEXT:    b .LBB5_7
 ; CHECK-SD-NEXT:  .LBB5_3:
 ; CHECK-SD-NEXT:    mov w8, wzr
diff --git a/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll b/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
index 37a7782caeed9..cab6fba59cbd1 100644
--- a/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
+++ b/llvm/test/CodeGen/AArch64/atomicrmw-O0.ll
@@ -45,7 +45,7 @@ define i8 @test_rmw_add_8(ptr %dst)   {
 ;
 ; LSE-LABEL: test_rmw_add_8:
 ; LSE:       // %bb.0: // %entry
-; LSE-NEXT:    mov w8, #1
+; LSE-NEXT:    mov w8, #1 // =0x1
 ; LSE-NEXT:    ldaddalb w8, w0, [x0]
 ; LSE-NEXT:    ret
 entry:
@@ -94,7 +94,7 @@ define i16 @test_rmw_add_16(ptr %dst)   {
 ;
 ; LSE-LABEL: test_rmw_add_16:
 ; LSE:       // %bb.0: // %entry
-; LSE-NEXT:    mov w8, #1
+; LSE-NEXT:    mov w8, #1 // =0x1
 ; LSE-NEXT:    ldaddalh w8, w0, [x0]
 ; LSE-NEXT:    ret
 entry:
@@ -143,7 +143,7 @@ define i32 @test_rmw_add_32(ptr %dst)   {
 ;
 ; LSE-LABEL: test_rmw_add_32:
 ; LSE:       // %bb.0: // %entry
-; LSE-NEXT:    mov w8, #1
+; LSE-NEXT:    mov w8, #1 // =0x1
 ; LSE-NEXT:    ldaddal w8, w0, [x0]
 ; LSE-NEXT:    ret
 entry:
@@ -192,7 +192,7 @@ define i64 @test_rmw_add_64(ptr %dst)   {
 ;
 ; LSE-LABEL: test_rmw_add_64:
 ; LSE:       // %bb.0: // %entry
-; LSE-NEXT:    mov w8, #1
+; LSE-NEXT:    mov w8, #1 // =0x1
 ; LSE-NEXT:    // kill: def $x8 killed $w8
 ; LSE-NEXT:    ldaddal x8, x0, [x0]
 ; LSE-NEXT:    ret
@@ -207,16 +207,16 @@ define i128 @test_rmw_add_128(ptr %dst)   {
 ; NOLSE-NEXT:    sub sp, sp, #48
 ; NOLSE-NEXT:    .cfi_def_cfa_offset 48
 ; NOLSE-NEXT:    str x0, [sp, #24] // 8-byte Folded Spill
-; NOLSE-NEXT:    ldr x8, [x0, #8]
-; NOLSE-NEXT:    ldr x9, [x0]
+; NOLSE-NEXT:    ldr x9, [x0, #8]
+; NOLSE-NEXT:    ldr x8, [x0]
 ; NOLSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
 ; NOLSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
 ; NOLSE-NEXT:    b .LBB4_1
 ; NOLSE-NEXT:  .LBB4_1: // %atomicrmw.start
 ; NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; NOLSE-NEXT:    // Child Loop BB4_2 Depth 2
-; NOLSE-NEXT:    ldr x13, [sp, #40] // 8-byte Folded Reload
-; NOLSE-NEXT:    ldr x11, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT:    ldr x13, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT:    ldr x11, [sp, #40] // 8-byte Folded Reload
 ; NOLSE-NEXT:    ldr x9, [sp, #24] // 8-byte Folded Reload
 ; NOLSE-NEXT:    adds x14, x11, #1
 ; NOLSE-NEXT:    cinc x15, x13, hs
@@ -246,8 +246,8 @@ define i128 @test_rmw_add_128(ptr %dst)   {
 ; NOLSE-NEXT:    str x9, [sp, #16] // 8-byte Folded Spill
 ; NOLSE-NEXT:    subs x12, x12, x13
 ; NOLSE-NEXT:    ccmp x10, x11, #0, eq
-; NOLSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
-; NOLSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT:    str x9, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT:    str x8, [sp, #32] // 8-byte Folded Spill
 ; NOLSE-NEXT:    b.ne .LBB4_1
 ; NOLSE-NEXT:    b .LBB4_6
 ; NOLSE-NEXT:  .LBB4_6: // %atomicrmw.end
@@ -261,15 +261,15 @@ define i128 @test_rmw_add_128(ptr %dst)   {
 ; LSE-NEXT:    sub sp, sp, #48
 ; LSE-NEXT:    .cfi_def_cfa_offset 48
 ; LSE-NEXT:    str x0, [sp, #24] // 8-byte Folded Spill
-; LSE-NEXT:    ldr x8, [x0, #8]
-; LSE-NEXT:    ldr x9, [x0]
+; LSE-NEXT:    ldr x9, [x0, #8]
+; LSE-NEXT:    ldr x8, [x0]
 ; LSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
 ; LSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
 ; LSE-NEXT:    b .LBB4_1
 ; LSE-NEXT:  .LBB4_1: // %atomicrmw.start
 ; LSE-NEXT:    // =>This Inner Loop Header: Depth=1
-; LSE-NEXT:    ldr x11, [sp, #40] // 8-byte Folded Reload
-; LSE-NEXT:    ldr x10, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT:    ldr x11, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT:    ldr x10, [sp, #40] // 8-byte Folded Reload
 ; LSE-NEXT:    ldr x8, [sp, #24] // 8-byte Folded Reload
 ; LSE-NEXT:    mov x0, x10
 ; LSE-NEXT:    mov x1, x11
@@ -284,8 +284,8 @@ define i128 @test_rmw_add_128(ptr %dst)   {
 ; LSE-NEXT:    str x8, [sp, #16] // 8-byte Folded Spill
 ; LSE-NEXT:    subs x11, x8, x11
 ; LSE-NEXT:    ccmp x9, x10, #0, eq
-; LSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
-; LSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT:    str x9, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT:    str x8, [sp, #32] // 8-byte Folded Spill
 ; LSE-NEXT:    b.ne .LBB4_1
 ; LSE-NEXT:    b .LBB4_2
 ; LSE-NEXT:  .LBB4_2: // %atomicrmw.end
@@ -597,23 +597,23 @@ define i128 @test_rmw_nand_128(ptr %dst)   {
 ; NOLSE-NEXT:    sub sp, sp, #48
 ; NOLSE-NEXT:    .cfi_def_cfa_offset 48
 ; NOLSE-NEXT:    str x0, [sp, #24] // 8-byte Folded Spill
-; NOLSE-NEXT:    ldr x8, [x0, #8]
-; NOLSE-NEXT:    ldr x9, [x0]
+; NOLSE-NEXT:    ldr x9, [x0, #8]
+; NOLSE-NEXT:    ldr x8, [x0]
 ; NOLSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
 ; NOLSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
 ; NOLSE-NEXT:    b .LBB9_1
 ; NOLSE-NEXT:  .LBB9_1: // %atomicrmw.start
 ; NOLSE-NEXT:    // =>This Loop Header: Depth=1
 ; NOLSE-NEXT:    // Child Loop BB9_2 Depth 2
-; NOLSE-NEXT:    ldr x13, [sp, #40] // 8-byte Folded Reload
-; NOLSE-NEXT:    ldr x11, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT:    ldr x13, [sp, #32] // 8-byte Folded Reload
+; NOLSE-NEXT:    ldr x11, [sp, #40] // 8-byte Folded Reload
 ; NOLSE-NEXT:    ldr x9, [sp, #24] // 8-byte Folded Reload
 ; NOLSE-NEXT:    mov w8, w11
 ; NOLSE-NEXT:    mvn w10, w8
 ; NOLSE-NEXT:    // implicit-def: $x8
 ; NOLSE-NEXT:    mov w8, w10
 ; NOLSE-NEXT:    orr x14, x8, #0xfffffffffffffffe
-; NOLSE-NEXT:    mov x15, #-1
+; NOLSE-NEXT:    mov x15, #-1 // =0xffffffffffffffff
 ; NOLSE-NEXT:  .LBB9_2: // %atomicrmw.start
 ; NOLSE-NEXT:    // Parent Loop BB9_1 Depth=1
 ; NOLSE-NEXT:    // => This Inner Loop Header: Depth=2
@@ -640,8 +640,8 @@ define i128 @test_rmw_nand_128(ptr %dst)   {
 ; NOLSE-NEXT:    str x9, [sp, #16] // 8-byte Folded Spill
 ; NOLSE-NEXT:    subs x12, x12, x13
 ; NOLSE-NEXT:    ccmp x10, x11, #0, eq
-; NOLSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
-; NOLSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT:    str x9, [sp, #40] // 8-byte Folded Spill
+; NOLSE-NEXT:    str x8, [sp, #32] // 8-byte Folded Spill
 ; NOLSE-NEXT:    b.ne .LBB9_1
 ; NOLSE-NEXT:    b .LBB9_6
 ; NOLSE-NEXT:  .LBB9_6: // %atomicrmw.end
@@ -655,15 +655,15 @@ define i128 @test_rmw_nand_128(ptr %dst)   {
 ; LSE-NEXT:    sub sp, sp, #48
 ; LSE-NEXT:    .cfi_def_cfa_offset 48
 ; LSE-NEXT:    str x0, [sp, #24] // 8-byte Folded Spill
-; LSE-NEXT:    ldr x8, [x0, #8]
-; LSE-NEXT:    ldr x9, [x0]
+; LSE-NEXT:    ldr x9, [x0, #8]
+; LSE-NEXT:    ldr x8, [x0]
 ; LSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
 ; LSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
 ; LSE-NEXT:    b .LBB9_1
 ; LSE-NEXT:  .LBB9_1: // %atomicrmw.start
 ; LSE-NEXT:    // =>This Inner Loop Header: Depth=1
-; LSE-NEXT:    ldr x11, [sp, #40] // 8-byte Folded Reload
-; LSE-NEXT:    ldr x10, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT:    ldr x11, [sp, #32] // 8-byte Folded Reload
+; LSE-NEXT:    ldr x10, [sp, #40] // 8-byte Folded Reload
 ; LSE-NEXT:    ldr x8, [sp, #24] // 8-byte Folded Reload
 ; LSE-NEXT:    mov x0, x10
 ; LSE-NEXT:    mov x1, x11
@@ -672,7 +672,7 @@ define i128 @test_rmw_nand_128(ptr %dst)   {
 ; LSE-NEXT:    // implicit-def: $x9
 ; LSE-NEXT:    mov w9, w12
 ; LSE-NEXT:    orr x2, x9, #0xfffffffffffffffe
-; LSE-NEXT:    mov x9, #-1
+; LSE-NEXT:    mov x9, #-1 // =0xffffffffffffffff
 ; LSE-NEXT:    // kill: def $x2 killed $x2 def $x2_x3
 ; LSE-NEXT:    mov x3, x9
 ; LSE-NEXT:    caspal x0, x1, x2, x3, [x8]
@@ -682,8 +682,8 @@ define i128 @test_rmw_nand_128(ptr %dst)   {
 ; LSE-NEXT:    str x8, [sp, #16] // 8-byte Folded Spill
 ; LSE-NEXT:    subs x11, x8, x11
 ; LSE-NEXT:    ccmp x9, x10, #0, eq
-; LSE-NEXT:    str x9, [sp, #32] // 8-byte Folded Spill
-; LSE-NEXT:    str x8, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT:    str x9, [sp, #40] // 8-byte Folded Spill
+; LSE-NEXT:    str x8, [sp, #32] // 8-byte Folded Spill
 ; LSE-NEXT:    b.ne .LBB9_1
 ; LSE-NEXT:    b .LBB9_2
 ; LSE-NEXT:  .LBB9_2: // %atomicrmw.end
diff --git a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
index 43d49da1abd21..b0339222bc2df 100644
--- a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
+++ b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
@@ -14,8 +14,8 @@ define i64 @bfi...
[truncated]

@llvmbot
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-backend-loongarch

 ; LSE-NEXT:    b.ne .LBB9_1
 ; LSE-NEXT:    b .LBB9_2
 ; LSE-NEXT:  .LBB9_2: // %atomicrmw.end
diff --git a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
index 43d49da1abd21..b0339222bc2df 100644
--- a/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
+++ b/llvm/test/CodeGen/AArch64/bfis-in-loop.ll
@@ -14,8 +14,8 @@ define i64 @bfi...
[truncated]

@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch from 1f7635b to 3593737 Compare March 20, 2025 12:26
@guy-david guy-david requested a review from arsenm March 20, 2025 12:26
@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch 2 times, most recently from 0ae66b8 to d87dc5b Compare March 24, 2025 07:57
@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch from d87dc5b to c7d638d Compare March 30, 2025 07:50
@guy-david
Contributor Author

ping :)

@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch from c7d638d to 8848f2e Compare April 2, 2025 15:38
@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch 2 times, most recently from e549696 to 045edd6 Compare April 20, 2025 19:43
@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch from 045edd6 to 5d768f6 Compare April 28, 2025 09:38
@guy-david guy-david force-pushed the users/guy-david/phi-elimination-reuse-copy branch from 5d768f6 to 8cb3107 Compare June 26, 2025 14:17
@llvm-ci
Collaborator

llvm-ci commented Jun 29, 2025

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-multistage running on ppc64le-clang-multistage-test while building llvm at step 10 "build stage 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/10862

Here is the relevant piece of the build log for reference
Step 10 (build stage 2) failure: 'ninja' (failure)
...
[2413/6442] Building CXX object lib/Passes/CMakeFiles/LLVMPasses.dir/CodeGenPassBuilder.cpp.o
[2414/6442] Building CXX object lib/Transforms/Coroutines/CMakeFiles/LLVMCoroutines.dir/CoroSplit.cpp.o
[2415/6442] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/Dominators.cpp.o
[2416/6442] Building CXX object lib/Transforms/IPO/CMakeFiles/LLVMipo.dir/GlobalOpt.cpp.o
[2417/6442] Building CXX object lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilderBindings.cpp.o
[2418/6442] Building CXX object utils/TableGen/CMakeFiles/llvm-tblgen.dir/GlobalISelEmitter.cpp.o
[2419/6442] Building CXX object lib/Transforms/Scalar/CMakeFiles/LLVMScalarOpts.dir/LoopIdiomRecognize.cpp.o
[2420/6442] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineBasicBlock.cpp.o
[2421/6442] Building CXX object lib/Transforms/Scalar/CMakeFiles/LLVMScalarOpts.dir/ConstraintElimination.cpp.o
[2422/6442] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o
FAILED: lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o 
ccache /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/lib/CodeGen -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/CodeGen -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fPIC  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o -MF lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o.d -o lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o -c /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/CodeGen/LiveDebugVariables.cpp
clang++: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/include/llvm/CodeGen/Register.h:83: unsigned int llvm::Register::virtRegIndex() const: Assertion `isVirtual() && "Not a virtual register"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -std=c++17 -fPIC -fno-exceptions -funwind-tables -fno-rtti -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/lib/CodeGen -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/CodeGen -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/include -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/include -DNDEBUG -UNDEBUG -c -o lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/LiveDebugVariables.cpp.o /home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/CodeGen/LiveDebugVariables.cpp
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/llvm/llvm/lib/CodeGen/LiveDebugVariables.cpp'.
4.	Running pass 'Register Coalescer' on function '@_ZN4llvm18LiveDebugVariables7LDVImpl18collectDebugValuesERNS_15MachineFunctionEb'
 #0 0x00007fff8def6e40 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMSupport.so.21.0git+0x256e40)
 #1 0x00007fff8def4754 llvm::sys::CleanupOnSignal(unsigned long) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMSupport.so.21.0git+0x254754)
 #2 0x00007fff8dd95858 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007fff9bbe04d8 (linux-vdso64.so.1+0x4d8)
 #4 0x00007fff8d71a4c8 raise (/lib64/libc.so.6+0x4a4c8)
 #5 0x00007fff8d6f4a54 abort (/lib64/libc.so.6+0x24a54)
 #6 0x00007fff8d70dcb0 __assert_fail_base (/lib64/libc.so.6+0x3dcb0)
 #7 0x00007fff8d70dd54 __assert_fail (/lib64/libc.so.6+0x3dd54)
 #8 0x00007fff922a2154 llvm::VirtReg2IndexFunctor::operator()(llvm::Register) const (.isra.72.part.73) InlineSpiller.cpp:0:0
 #9 0x00007fff922a382c llvm::MachineRegisterInfo::getRegClass(llvm::Register) const (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMCodeGen.so.21.0git+0x31382c)
#10 0x00007fff926cb3e0 llvm::CoalescerPair::setRegisters(llvm::MachineInstr const*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMCodeGen.so.21.0git+0x73b3e0)
#11 0x00007fff926d6d14 (anonymous namespace)::RegisterCoalescer::copyCoalesceWorkList(llvm::MutableArrayRef<llvm::MachineInstr*>) RegisterCoalescer.cpp:0:0
#12 0x00007fff926dd1f0 (anonymous namespace)::RegisterCoalescer::run(llvm::MachineFunction&) RegisterCoalescer.cpp:0:0
#13 0x00007fff926de290 (anonymous namespace)::RegisterCoalescerLegacy::runOnMachineFunction(llvm::MachineFunction&) RegisterCoalescer.cpp:0:0
#14 0x00007fff92424e80 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.100) MachineFunctionPass.cpp:0:0
#15 0x00007fff8e40fb5c llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMCore.so.21.0git+0x32fb5c)
#16 0x00007fff8e40feb8 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMCore.so.21.0git+0x32feb8)
#17 0x00007fff8e4110dc llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMCore.so.21.0git+0x3310dc)
#18 0x00007fff92eedfdc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangCodeGen.so.21.0git+0x14dfdc)
#19 0x00007fff9335fa1c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangCodeGen.so.21.0git+0x5bfa1c)
#20 0x00007fff89f54204 clang::ParseAST(clang::Sema&, bool, bool) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/../lib/libclangParse.so.21.0git+0x44204)
#21 0x00007fff90ff920c clang::ASTFrontendAction::ExecuteAction() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangFrontend.so.21.0git+0x17920c)
#22 0x00007fff93360700 clang::CodeGenAction::ExecuteAction() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangCodeGen.so.21.0git+0x5c0700)
#23 0x00007fff910000c8 clang::FrontendAction::Execute() (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangFrontend.so.21.0git+0x1800c8)
#24 0x00007fff90f79340 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangFrontend.so.21.0git+0xf9340)
#25 0x00007fff952a6094 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libclangFrontendTool.so.21.0git+0x6094)
#26 0x000000001001b2ac cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/clang+++0x1001b2ac)
#27 0x00000000100107e0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#28 0x00007fff90b0c688 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0

mtrofin added a commit to mtrofin/llvm-project that referenced this pull request Jun 30, 2025
mtrofin added a commit that referenced this pull request Jun 30, 2025
@mikaelholmen
Collaborator

Hello @guy-david

The following starts crashing with this patch:

llc -O0 -o /dev/null bbi-108462.ll -enable-subreg-liveness=1 -optimize-regalloc -mtriple=aarch64-none-linux-gnu

It crashes like this:

llc: ../include/llvm/CodeGen/Register.h:83: unsigned int llvm::Register::virtRegIndex() const: Assertion `isVirtual() && "Not a virtual register"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/llc -O0 -o /dev/null bbi-108462.ll -enable-subreg-liveness=1 -optimize-regalloc -mtriple=aarch64-none-linux-gnu
1.	Running pass 'Function Pass Manager' on module 'bbi-108462.ll'.
2.	Running pass 'Register Coalescer' on function '@f3'
 #0 0x000055cf75860066 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/llc+0x76a7066)
 #1 0x000055cf7585db85 llvm::sys::RunSignalHandlers() (build-all/bin/llc+0x76a4b85)
 #2 0x000055cf75860799 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f433c539d10 __restore_rt (/lib64/libpthread.so.0+0x12d10)
 #4 0x00007f4339ed952f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f4339eace65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f4339eacd39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f4339ed1e86 (/lib64/libc.so.6+0x46e86)
 #8 0x000055cf74ac72d5 llvm::CoalescerPair::setRegisters(llvm::MachineInstr const*) (build-all/bin/llc+0x690e2d5)
 #9 0x000055cf74acc643 (anonymous namespace)::RegisterCoalescer::joinCopy(llvm::MachineInstr*, bool&, llvm::SmallPtrSetImpl<llvm::MachineInstr*>&) RegisterCoalescer.cpp:0:0
#10 0x000055cf74acbd38 (anonymous namespace)::RegisterCoalescer::copyCoalesceWorkList(llvm::MutableArrayRef<llvm::MachineInstr*>) RegisterCoalescer.cpp:0:0
#11 0x000055cf74ac9220 (anonymous namespace)::RegisterCoalescer::run(llvm::MachineFunction&) RegisterCoalescer.cpp:0:0
#12 0x000055cf74aca7a6 (anonymous namespace)::RegisterCoalescerLegacy::runOnMachineFunction(llvm::MachineFunction&) RegisterCoalescer.cpp:0:0
#13 0x000055cf74879de7 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (build-all/bin/llc+0x66c0de7)
#14 0x000055cf74dd6209 llvm::FPPassManager::runOnFunction(llvm::Function&) (build-all/bin/llc+0x6c1d209)
#15 0x000055cf74dde7e2 llvm::FPPassManager::runOnModule(llvm::Module&) (build-all/bin/llc+0x6c257e2)
#16 0x000055cf74dd6cc8 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build-all/bin/llc+0x6c1dcc8)
#17 0x000055cf72819c70 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#18 0x000055cf72817380 main (build-all/bin/llc+0x465e380)
#19 0x00007f4339ec57e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#20 0x000055cf728167ee _start (build-all/bin/llc+0x465d7ee)
Abort (core dumped)

I originally saw the same crash without any special flags for my out-of-tree target and then saw I could reproduce on aarch64 (with added flags) as well.

PHI elimination turns

dead %5:gpr64 = COPY $xzr

into

dead $noreg = COPY $xzr

and I think that is what then trips the coalescer over.
bbi-108462.ll.gz

@guy-david
Contributor Author

Thanks for looking into this, issued a fix in: #146320.
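
For context, here is a minimal sketch of the shape of that guard, using names assumed from LowerPHINode (IncomingReg, SrcReg, opBlock, MRI); it illustrates the idea and is not necessarily the exact code of #146320:

// Sketch only. For a dead PHI no IncomingReg is allocated, so blindly
// redirecting the existing COPY rewrites its destination to $noreg and
// yields `dead $noreg = COPY $xzr`. Guarding the reuse path on a real
// virtual register avoids that case.
if (IncomingReg.isVirtual()) {
  if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
    if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
        MRI->use_empty(SrcReg))
      DefMI->getOperand(0).setReg(IncomingReg); // reuse instead of a new COPY
  }
}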

@mikaelholmen
Collaborator

Thanks for looking into this, issued a fix in: #146320.

Thanks, that fix seems to solve that problem.

It looks like there are other problems as well, though. I don't have a reproducer I can share right now, but if we have virtual registers of two different register classes, "32BitRC" with 32-bit registers and "16BitRC" with 16-bit registers, it looks like it turns

  %8:32BitRC = COPY %7:32BitRC
  [...]
  %2:16BitRC = PHI %8.low16:32BitRC, %bb.0, %1:16BitRC, %bb.1

into

  %9:16BitRC = COPY %7:32BitRC
  [...]
  %2:16BitRC = COPY killed %9:16BitRC

i.e., it ignores the subregister in the PHI?
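
One plausible shape of the missing check, as a hedged sketch with assumed names (SrcSubReg for the subregister index on the PHI operand); the actual fix is referenced later in this thread:

// Sketch only. %2:16BitRC = PHI %8.low16, ... reads 16 bits of %8, so
// redirecting the wider COPY %8 = COPY %7 would hand the PHI the
// full-width value. Only reuse when no subregister index is involved
// and the register classes agree.
bool CanReuse = SrcSubReg == 0 && SrcReg.isVirtual() &&
                IncomingReg.isVirtual() &&
                MRI->getRegClass(SrcReg) == MRI->getRegClass(IncomingReg);
if (!CanReuse) {
  // Fall back to inserting a fresh COPY of the (sub)register at the end
  // of the predecessor block.
}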

guy-david added a commit that referenced this pull request Jun 30, 2025
PR which introduced the bug:
#131837.
Fixes a crash around dead registers which started in f5c62ee by
verifying that the reused incoming register is also virtual.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 30, 2025
PR which introduced the bug:
llvm/llvm-project#131837.
Fixes a crash around dead registers which started in f5c62ee by
verifying that the reused incoming register is also virtual.
@mikaelholmen
Collaborator

It looks like there are other problems as well, though. I don't have a reproducer I can share right now

I fiddled with the repro for my out-of-tree target and changed it to something for aarch64:
llc bbi-108462_2_aarch64.mir -mtriple=aarch64 -o - -run-pass phi-node-elimination

Now, I don't know aarch64 and its register classes, but the subregister access "%1.sub_32" is just dropped in the output.
For my target at least, that is wrong, because in the case where it originally broke, the subregister specifies whether we should use the high or low 16 bits of the 32-bit operand.

bbi-108462_2_aarch64.mir.gz

@guy-david
Contributor Author

guy-david commented Jun 30, 2025

Now I feel bad 😆 Can you verify whether c4c9e0e solves the issue?

@mikaelholmen
Collaborator

Now I feel bad 😆 Can you verify whether c4c9e0e solves the issue?

It does. Thanks! :)

@jayfoad
Contributor

jayfoad commented Jun 30, 2025

The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit.

This change affects many architectures but the amount of total instructions in the test cases seems to be slightly lower.

The COPY you reuse is dead (right?) so why is reusing it any better than inserting a new one and allowing the dead one to be DCEd? (Or, why wasn't the dead COPY already DCEd before we got to this point?)

// Reuse an existing copy in the block if possible.
if (MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg)) {
  if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
      MRI->use_empty(SrcReg)) {

use_nodbg_empty to avoid debug instruction effects
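
As a sketch, the suggested variant of the quoted condition, with the debug-insensitive query swapped in:

// use_nodbg_empty ignores DBG_VALUE users, so whether the COPY gets
// reused cannot differ between -g and non-debug builds of the same input.
if (DefMI->isCopy() && DefMI->getParent() == &opBlock &&
    MRI->use_nodbg_empty(SrcReg)) {
  // ... reuse path ...
}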

@guy-david
Contributor Author

guy-david commented Jun 30, 2025

The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit.
This change affects many architectures but the amount of total instructions in the test cases seems to be slightly lower.

The COPY you reuse is dead (right?) so why is reusing it any better than inserting a new one and allowing the dead one to be DCEd? (Or, why wasn't the dead COPY already DCEd before we got to this point?)

That's an unfortunate edge case for the problem I was trying to solve, for which I added a regression test in #146320. In the original issue there were no dead instructions.

@jayfoad
Contributor

jayfoad commented Jun 30, 2025

The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit.
This change affects many architectures but the amount of total instructions in the test cases seems to be slightly lower.

The COPY you reuse is dead (right?) so why is reusing it any better than inserting a new one and allowing the dead one to be DCEd? (Or, why wasn't the dead COPY already DCEd before we got to this point?)

That's an unfortunate edge case for the problem I was trying to solve, for which I added a regression test in #146320. In the original issue there were no dead instructions.

No, I mean in your patch DefMI is a COPY which defines SrcReg which has no uses, therefore it's a dead COPY, right?

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jun 30, 2025
@guy-david
Contributor Author

guy-david commented Jun 30, 2025

The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit.
This change affects many architectures but the amount of total instructions in the test cases seems to be slightly lower.

The COPY you reuse is dead (right?) so why is reusing it any better than inserting a new one and allowing the dead one to be DCEd? (Or, why wasn't the dead COPY already DCEd before we got to this point?)

That's an unfortunate edge case for the problem I was trying to solve, for which I added a regression test in #146320. In the original issue there were no dead instructions.

No, I mean in your patch DefMI is a COPY which defines SrcReg which has no uses, therefore it's a dead COPY, right?

The user is the PHI node itself, which is being removed in flight, so the COPY only looks dead at this point in the pass.

@mikaelholmen
Collaborator

Hi @guy-david,

Another verifier error like this:

llc bbi-108462_3_aarch64.mir -o - -verify-machineinstrs -run-pass livevars,phi-node-elimination -mtriple=aarch64

It fails with:

# After Eliminate PHI nodes for register allocation
# Machine code for function main: NoPHIs, TracksLiveness

bb.0:
  successors: %bb.1(0x80000000); %bb.1(100.00%)
  liveins: $w0, $w1, $nzcv
  %0:gpr32 = COPY killed $w0
  %4:gpr32 = COPY killed $w1
  B %bb.1

bb.1:
; predecessors: %bb.0, %bb.1, %bb.2
  successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
  liveins: $nzcv
  dead %2:gpr32 = COPY killed %4:gpr32
  %4:gpr32 = COPY %0:gpr32
  Bcc 1, %bb.1, implicit $nzcv

bb.2:
; predecessors: %bb.1
  successors: %bb.1(0x80000000); %bb.1(100.00%)
  liveins: $nzcv
  %4:gpr32 = IMPLICIT_DEF
  B %bb.1

# End machine code for function main.

*** Bad machine code: LiveVariables: Block should not be in AliveBlocks ***
- function:    main
- basic block: %bb.2  (0x5620b8ee9ba0)
Virtual register %3 is not needed live through the block.
LLVM ERROR: Found 1 machine code errors.

bbi-108462_3_aarch64.mir.gz
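
The stale liveness lives in LiveVariables' per-register VarInfo. A hedged sketch of the kind of bookkeeping a fix needs, assuming the pass holds a LiveVariables *LV; the follow-up that landed later in this thread (#146337, "update livevars") may well do this differently:

// Sketch only. Once the reused COPY takes over the definition, the old
// liveness must not claim the register is live through blocks it no
// longer reaches, or the verifier reports
// "Block should not be in AliveBlocks".
if (LV) {
  LiveVariables::VarInfo &VI = LV->getVarInfo(IncomingReg);
  VI.AliveBlocks.reset(opBlock.getNumber()); // drop the stale bit
}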

@mstorsjo
Member

mstorsjo commented Jul 1, 2025

I'm also running into miscompilations caused by this in libvpx, for armv7 targets, with the latest git main of llvm.

The reproducer for the miscompile is https://martin.st/temp/y4minput-preproc.c, compiled like this:

$ clang -target armv7-w64-mingw32 y4minput-preproc.c -c -o y4minput.c.o -O2

@guy-david
Contributor Author

guy-david commented Jul 1, 2025

I'm also running into miscompilations caused by this in libvpx, for armv7 targets, with the latest git main of llvm.

The reproducer for the miscompile is https://martin.st/temp/y4minput-preproc.c, compiled like this:

$ clang -target armv7-w64-mingw32 y4minput-preproc.c -c -o y4minput.c.o -O2

Sorry for the inconvenience. I was not able to reproduce locally; can you test whether #146337 fixes the issue?

rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
…#131837)

The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.

This change affects many architectures but the amount of total
instructions in the test cases seems to be slightly lower.
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
rlavaee pushed a commit to rlavaee/llvm-project that referenced this pull request Jul 1, 2025
PR which introduced the bug:
llvm#131837.
Fixes a crash around dead registers which started in f5c62ee by
verifying that the reused incoming register is also virtual.
guy-david added a commit that referenced this pull request Jul 1, 2025
…ass, update livevars. (#146337)

Follow-up to the second bug that #131837 introduced, described in #131837 (comment).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 1, 2025
…register class, update livevars. (#146337)

Follow-up to the second bug that llvm/llvm-project#131837 introduced, described in llvm/llvm-project#131837 (comment).