SplitKit: Take register class directly from instruction definition #129727

Conversation

arsenm
Contributor

@arsenm arsenm commented Mar 4, 2025

This fixes an expensive-checks failure after 8476a5d. The issue
was essentially that getRegClassConstraintEffectForVReg was sometimes not
doing anything useful. If the register passed to it is not present
in the instruction, it is a no-op and returns the original class. The
Edit->getReg() register may not be the register as it appears in either
the use or the def instruction; it may be some split register, so take
the register directly from the instruction being rematerialized.

Also directly query the constraint from the def instruction, with a
hardcoded operand index. This isn't ideal, but all the other
rematerialization code makes the same assumption.

So far I've been unable to reproduce this with a standalone MIR test. In the
original case, using -stop-before=greedy and then running the one pass does
not work.
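
To make the intent concrete, here is a condensed sketch of the new logic, distilled from the patch further below rather than copied verbatim (SuperRC, DefMI, UseMI, TII and TRI are the existing locals and members of SplitEditor::rematWillIncreaseRestriction):

  // Old behaviour: the query was keyed on Edit->getReg(); if that register
  // does not appear in DefMI, getRegClassConstraintEffectForVReg is a no-op
  // and simply hands back SuperRC. Instead, read the static constraint off
  // the def operand itself and key the use-side query on the register that
  // actually appears in the rematerialized instruction.
  const unsigned DefOperandIdx = 0; // remat code assumes the def is operand 0
  const TargetRegisterClass *DefConstrainRC =
      DefMI->getRegClassConstraint(DefOperandIdx, &TII, &TRI);
  if (!DefConstrainRC)
    return false; // no static constraint, so remat cannot tighten anything
  Register DefReg = DefMI->getOperand(DefOperandIdx).getReg();
  const TargetRegisterClass *UseConstrainRC =
      UseMI->getRegClassConstraintEffectForVReg(DefReg, SuperRC, &TII, &TRI,
                                                /*ExploreBundle=*/true);
  return UseConstrainRC->hasSubClass(DefConstrainRC);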

Contributor Author

arsenm commented Mar 4, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@arsenm arsenm marked this pull request as ready for review March 4, 2025 15:45
@llvmbot
Member

llvmbot commented Mar 4, 2025

@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Changes

This fixes an expensive-checks failure after 8476a5d. The issue
was essentially that getRegClassConstraintEffectForVReg was sometimes not
doing anything useful. If the register passed to it is not present
in the instruction, it is a no-op and returns the original class. The
Edit->getReg() register may not be the register as it appears in either
the use or the def instruction; it may be some split register, so take
the register directly from the instruction being rematerialized.

Also directly query the constraint from the def instruction, with a
hardcoded operand index. This isn't ideal, but all the other
rematerialization code makes the same assumption.

So far I've been unable to reproduce this with a standalone MIR test. In the
original case, using -stop-before=greedy and then running the one pass does
not work.


Patch is 20.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129727.diff

6 Files Affected:

  • (modified) llvm/lib/CodeGen/SplitKit.cpp (+13-11)
  • (added) llvm/test/CodeGen/ARM/splitkit-remat-regclass-constraint-regression.ll (+39)
  • (modified) llvm/test/CodeGen/ARM/splitkit.ll (+15-1)
  • (modified) llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll (+28-28)
  • (modified) llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll (+36-36)
  • (modified) llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir (+2-3)
diff --git a/llvm/lib/CodeGen/SplitKit.cpp b/llvm/lib/CodeGen/SplitKit.cpp
index 63ca45b2e69ad..332c42abfea3f 100644
--- a/llvm/lib/CodeGen/SplitKit.cpp
+++ b/llvm/lib/CodeGen/SplitKit.cpp
@@ -598,25 +598,27 @@ bool SplitEditor::rematWillIncreaseRestriction(const MachineInstr *DefMI,
   if (!UseMI)
     return false;
 
-  Register Reg = Edit->getReg();
-  const TargetRegisterClass *RC = MRI.getRegClass(Reg);
+  // Currently code assumes rematerialization only happens for a def at 0.
+  const unsigned DefOperandIdx = 0;
+  // We want to compute the static register class constraint for the instruction
+  // def. If it is a smaller subclass than getLargestLegalSuperClass at the use
+  // site, then rematerializing it will increase the constraints.
+  const TargetRegisterClass *DefConstrainRC =
+      DefMI->getRegClassConstraint(DefOperandIdx, &TII, &TRI);
+  if (!DefConstrainRC)
+    return false;
+
+  const TargetRegisterClass *RC = MRI.getRegClass(Edit->getReg());
 
   // We want to find the register class that can be inflated to after the split
   // occurs in recomputeRegClass
   const TargetRegisterClass *SuperRC =
       TRI.getLargestLegalSuperClass(RC, *MBB.getParent());
 
-  // We want to compute the static register class constraint for the instruction
-  // def. If it is a smaller subclass than getLargestLegalSuperClass at the use
-  // site, then rematerializing it will increase the constraints.
-  const TargetRegisterClass *DefConstrainRC =
-      DefMI->getRegClassConstraintEffectForVReg(Reg, SuperRC, &TII, &TRI,
-                                                /*ExploreBundle=*/true);
-
+  Register DefReg = DefMI->getOperand(DefOperandIdx).getReg();
   const TargetRegisterClass *UseConstrainRC =
-      UseMI->getRegClassConstraintEffectForVReg(Reg, SuperRC, &TII, &TRI,
+      UseMI->getRegClassConstraintEffectForVReg(DefReg, SuperRC, &TII, &TRI,
                                                 /*ExploreBundle=*/true);
-
   return UseConstrainRC->hasSubClass(DefConstrainRC);
 }
 
diff --git a/llvm/test/CodeGen/ARM/splitkit-remat-regclass-constraint-regression.ll b/llvm/test/CodeGen/ARM/splitkit-remat-regclass-constraint-regression.ll
new file mode 100644
index 0000000000000..1ef6322bb48c8
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/splitkit-remat-regclass-constraint-regression.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -verify-machineinstrs -stress-regalloc=2 -o - %s | FileCheck %s
+
+; Check that no register class constraint error is produced during
+; rematerialization
+
+target triple = "thumbv7-apple-ios"
+
+declare ptr @_Znwm()
+
+define void @func() {
+; CHECK-LABEL: func:
+; CHECK:       @ %bb.0: @ %bb14
+; CHECK-NEXT:    str lr, [sp, #-4]!
+; CHECK-NEXT:    movs r0, #0
+; CHECK-NEXT:    movs r1, #4
+; CHECK-NEXT:    str r0, [r1]
+; CHECK-NEXT:    movs r1, #8
+; CHECK-NEXT:    str r0, [r1]
+; CHECK-NEXT:    str r0, [r0]
+; CHECK-NEXT:    bl __Znwm
+; CHECK-NEXT:    movs r1, #0
+; CHECK-NEXT:    movs r0, #4
+; CHECK-NEXT:    str r1, [r0]
+; CHECK-NEXT:    movs r0, #8
+; CHECK-NEXT:    str r1, [r0]
+; CHECK-NEXT:    str r1, [r1]
+; CHECK-NEXT:    ldr lr, [sp], #4
+; CHECK-NEXT:    bx lr
+bb14:
+  call void @llvm.memset.p0.i32(ptr null, i8 0, i32 12, i1 false)
+  %tmp34 = call ptr @_Znwm()
+  call void @llvm.memset.p0.i32(ptr null, i8 0, i32 12, i1 false)
+  ret void
+}
+
+declare void @llvm.memset.p0.i32(ptr writeonly captures(none), i8, i32, i1 immarg) #0
+
+attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: write) }
diff --git a/llvm/test/CodeGen/ARM/splitkit.ll b/llvm/test/CodeGen/ARM/splitkit.ll
index d9db9f7cf4151..6bf3a64b3ed3e 100644
--- a/llvm/test/CodeGen/ARM/splitkit.ll
+++ b/llvm/test/CodeGen/ARM/splitkit.ll
@@ -94,7 +94,7 @@ declare ptr @bar(ptr returned)
 
 declare i32 @__cxa_atexit(ptr, ptr, ptr)
 
-declare ptr @wobble(ptr returned, ptr ) 
+declare ptr @wobble(ptr returned, ptr )
 
 declare i32 @quux(...)
 
@@ -236,6 +236,20 @@ bbunwind:
   resume { ptr, i32 } undef
 }
 
+; CHECK-LABEL: func_reduced_remat_regclass_error:
+define void @func_reduced_remat_regclass_error(ptr %global.10, ptr %global.15) {
+bb14:
+  store i32 999, ptr %global.10, align 4
+  call void @llvm.memset.p0.i32(ptr null, i8 0, i32 12, i1 false)
+  call void @llvm.memcpy.p0.p0.i32(ptr null, ptr null, i32 60, i1 false)
+  %tmp34 = call ptr @_Znwm()
+  store i32 999, ptr %global.15, align 4
+  call void @llvm.memcpy.p0.p0.i32(ptr %global.10, ptr null, i32 52, i1 false)
+  call void @llvm.memset.p0.i32(ptr null, i8 0, i32 12, i1 false)
+  call void @llvm.memset.p0.i32(ptr null, i8 0, i32 12, i1 false)
+  ret void
+}
+
 declare void @llvm.trap()
 
 declare void @llvm.memcpy.p0.p0.i32(ptr , ptr , i32, i1)
diff --git a/llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll b/llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
index d2faed51bc502..536a1ae3b918d 100644
--- a/llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
+++ b/llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll
@@ -241,21 +241,21 @@ define <4 x i128> @test_signed_v4i128_v4f32(<4 x float> %f) nounwind {
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    callq __fixsfti@PLT
 ; CHECK-NEXT:    movq %rdx, %r15
-; CHECK-NEXT:    xorl %edx, %edx
+; CHECK-NEXT:    xorl %r14d, %r14d
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %rdx, %rax
-; CHECK-NEXT:    movabsq $-9223372036854775808, %r14 # imm = 0x8000000000000000
-; CHECK-NEXT:    cmovbq %r14, %r15
+; CHECK-NEXT:    cmovbq %r14, %rax
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
+; CHECK-NEXT:    cmovbq %rcx, %r15
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    movabsq $9223372036854775807, %rbp # imm = 0x7FFFFFFFFFFFFFFF
 ; CHECK-NEXT:    cmovaq %rbp, %r15
 ; CHECK-NEXT:    movq $-1, %rcx
 ; CHECK-NEXT:    cmovaq %rcx, %rax
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm0
-; CHECK-NEXT:    cmovpq %rdx, %rax
+; CHECK-NEXT:    cmovpq %r14, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovpq %rdx, %r15
+; CHECK-NEXT:    cmovpq %r14, %r15
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
@@ -264,16 +264,16 @@ define <4 x i128> @test_signed_v4i128_v4f32(<4 x float> %f) nounwind {
 ; CHECK-NEXT:    movq %rdx, %r13
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    movl $0, %eax
-; CHECK-NEXT:    cmovbq %rax, %r12
-; CHECK-NEXT:    cmovbq %r14, %r13
+; CHECK-NEXT:    cmovbq %r14, %r12
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000
+; CHECK-NEXT:    cmovbq %rax, %r13
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovaq %rbp, %r13
-; CHECK-NEXT:    movq $-1, %rcx
-; CHECK-NEXT:    cmovaq %rcx, %r12
+; CHECK-NEXT:    movq $-1, %rax
+; CHECK-NEXT:    cmovaq %rax, %r12
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm0
-; CHECK-NEXT:    cmovpq %rax, %r12
-; CHECK-NEXT:    cmovpq %rax, %r13
+; CHECK-NEXT:    cmovpq %r14, %r12
+; CHECK-NEXT:    cmovpq %r14, %r13
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
@@ -1187,14 +1187,14 @@ define <8 x i128> @test_signed_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovbq %r12, %rax
-; CHECK-NEXT:    movabsq $-9223372036854775808, %r13 # imm = 0x8000000000000000
-; CHECK-NEXT:    cmovbq %r13, %rdx
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rbp # imm = 0x8000000000000000
+; CHECK-NEXT:    cmovbq %rbp, %rdx
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    movabsq $9223372036854775807, %r15 # imm = 0x7FFFFFFFFFFFFFFF
 ; CHECK-NEXT:    cmovaq %r15, %rdx
 ; CHECK-NEXT:    movq $-1, %rcx
 ; CHECK-NEXT:    cmovaq %rcx, %rax
-; CHECK-NEXT:    movq $-1, %rbp
+; CHECK-NEXT:    movq $-1, %r13
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm0
 ; CHECK-NEXT:    cmovpq %r12, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
@@ -1209,10 +1209,10 @@ define <8 x i128> @test_signed_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovbq %r12, %rax
-; CHECK-NEXT:    cmovbq %r13, %rdx
+; CHECK-NEXT:    cmovbq %rbp, %rdx
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovaq %r15, %rdx
-; CHECK-NEXT:    cmovaq %rbp, %rax
+; CHECK-NEXT:    cmovaq %r13, %rax
 ; CHECK-NEXT:    movq $-1, %r14
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm0
 ; CHECK-NEXT:    cmovpq %r12, %rax
@@ -1228,7 +1228,7 @@ define <8 x i128> @test_signed_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovbq %r12, %rax
-; CHECK-NEXT:    cmovbq %r13, %rdx
+; CHECK-NEXT:    cmovbq %rbp, %rdx
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovaq %r15, %rdx
 ; CHECK-NEXT:    cmovaq %r14, %rax
@@ -1247,7 +1247,8 @@ define <8 x i128> @test_signed_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovbq %r12, %rax
-; CHECK-NEXT:    cmovbq %r13, %rdx
+; CHECK-NEXT:    cmovbq %rbp, %rdx
+; CHECK-NEXT:    movq %rbp, %r13
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovaq %r15, %rdx
 ; CHECK-NEXT:    cmovaq %r14, %rax
@@ -1286,17 +1287,16 @@ define <8 x i128> @test_signed_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    movl $0, %eax
-; CHECK-NEXT:    cmovbq %rax, %r14
-; CHECK-NEXT:    movabsq $-9223372036854775808, %rcx # imm = 0x8000000000000000
-; CHECK-NEXT:    cmovbq %rcx, %r15
+; CHECK-NEXT:    cmovbq %r12, %r14
+; CHECK-NEXT:    movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000
+; CHECK-NEXT:    cmovbq %rax, %r15
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
 ; CHECK-NEXT:    cmovaq %r13, %r15
-; CHECK-NEXT:    movq $-1, %rcx
-; CHECK-NEXT:    cmovaq %rcx, %r14
+; CHECK-NEXT:    movq $-1, %rax
+; CHECK-NEXT:    cmovaq %rax, %r14
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm0
-; CHECK-NEXT:    cmovpq %rax, %r14
-; CHECK-NEXT:    cmovpq %rax, %r15
+; CHECK-NEXT:    cmovpq %r12, %r14
+; CHECK-NEXT:    cmovpq %r12, %r15
 ; CHECK-NEXT:    movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    callq __extendhfsf2@PLT
diff --git a/llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll b/llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
index 4d28ef7884954..4305886168abe 100644
--- a/llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
+++ b/llvm/test/CodeGen/X86/fptoui-sat-vector-128.ll
@@ -263,17 +263,17 @@ define <4 x i128> @test_unsigned_v4i128_v4f32(<4 x float> %f) nounwind {
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
 ; CHECK-NEXT:    callq __fixunssfti@PLT
 ; CHECK-NEXT:    movq %rdx, %r15
-; CHECK-NEXT:    xorl %ebp, %ebp
+; CHECK-NEXT:    xorl %r14d, %r14d
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 16-byte Reload
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm1
-; CHECK-NEXT:    cmovbq %rbp, %r15
-; CHECK-NEXT:    cmovbq %rbp, %rax
+; CHECK-NEXT:    cmovbq %r14, %r15
+; CHECK-NEXT:    cmovbq %r14, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    movq $-1, %r14
-; CHECK-NEXT:    cmovaq %r14, %rax
+; CHECK-NEXT:    movq $-1, %rbp
+; CHECK-NEXT:    cmovaq %rbp, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovaq %r14, %r15
+; CHECK-NEXT:    cmovaq %rbp, %r15
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
@@ -282,11 +282,11 @@ define <4 x i128> @test_unsigned_v4i128_v4f32(<4 x float> %f) nounwind {
 ; CHECK-NEXT:    movq %rdx, %r13
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %rbp, %r13
-; CHECK-NEXT:    cmovbq %rbp, %r12
+; CHECK-NEXT:    cmovbq %r14, %r13
+; CHECK-NEXT:    cmovbq %r14, %r12
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovaq %r14, %r12
-; CHECK-NEXT:    cmovaq %r14, %r13
+; CHECK-NEXT:    cmovaq %rbp, %r12
+; CHECK-NEXT:    cmovaq %rbp, %r13
 ; CHECK-NEXT:    movaps (%rsp), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
 ; CHECK-NEXT:    movaps %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
@@ -1149,18 +1149,18 @@ define <8 x i128> @test_unsigned_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    callq __extendhfsf2@PLT
 ; CHECK-NEXT:    movd %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 4-byte Folded Spill
 ; CHECK-NEXT:    callq __fixunssfti@PLT
-; CHECK-NEXT:    xorl %r13d, %r13d
+; CHECK-NEXT:    xorl %r12d, %r12d
 ; CHECK-NEXT:    pxor %xmm0, %xmm0
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm1 # 4-byte Reload
 ; CHECK-NEXT:    # xmm1 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss %xmm0, %xmm1
-; CHECK-NEXT:    cmovbq %r13, %rdx
-; CHECK-NEXT:    cmovbq %r13, %rax
+; CHECK-NEXT:    cmovbq %r12, %rdx
+; CHECK-NEXT:    cmovbq %r12, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; CHECK-NEXT:    movq $-1, %r12
-; CHECK-NEXT:    cmovaq %r12, %rax
+; CHECK-NEXT:    movq $-1, %r13
+; CHECK-NEXT:    cmovaq %r13, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovaq %r12, %rdx
+; CHECK-NEXT:    cmovaq %r13, %rdx
 ; CHECK-NEXT:    movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[1,1,1,1]
@@ -1170,12 +1170,12 @@ define <8 x i128> @test_unsigned_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %r13, %rdx
-; CHECK-NEXT:    cmovbq %r13, %rax
+; CHECK-NEXT:    cmovbq %r12, %rdx
+; CHECK-NEXT:    cmovbq %r12, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovaq %r12, %rax
+; CHECK-NEXT:    cmovaq %r13, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovaq %r12, %rdx
+; CHECK-NEXT:    cmovaq %r13, %rdx
 ; CHECK-NEXT:    movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    psrlq $48, %xmm0
@@ -1185,12 +1185,12 @@ define <8 x i128> @test_unsigned_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %r13, %rdx
-; CHECK-NEXT:    cmovbq %r13, %rax
+; CHECK-NEXT:    cmovbq %r12, %rdx
+; CHECK-NEXT:    cmovbq %r12, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovaq %r12, %rax
+; CHECK-NEXT:    cmovaq %r13, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovaq %r12, %rdx
+; CHECK-NEXT:    cmovaq %r13, %rdx
 ; CHECK-NEXT:    movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    movhlps {{.*#+}} xmm0 = xmm0[1,1]
@@ -1200,12 +1200,12 @@ define <8 x i128> @test_unsigned_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %r13, %rdx
-; CHECK-NEXT:    cmovbq %r13, %rax
+; CHECK-NEXT:    cmovbq %r12, %rdx
+; CHECK-NEXT:    cmovbq %r12, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovaq %r12, %rax
+; CHECK-NEXT:    cmovaq %r13, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovaq %r12, %rdx
+; CHECK-NEXT:    cmovaq %r13, %rdx
 ; CHECK-NEXT:    movq %rdx, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
 ; CHECK-NEXT:    movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    psrldq {{.*#+}} xmm0 = xmm0[10,11,12,13,14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
@@ -1216,12 +1216,12 @@ define <8 x i128> @test_unsigned_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %r13, %rbp
-; CHECK-NEXT:    cmovbq %r13, %rax
+; CHECK-NEXT:    cmovbq %r12, %rbp
+; CHECK-NEXT:    cmovbq %r12, %rax
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovaq %r12, %rax
+; CHECK-NEXT:    cmovaq %r13, %rax
 ; CHECK-NEXT:    movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
-; CHECK-NEXT:    cmovaq %r12, %rbp
+; CHECK-NEXT:    cmovaq %r13, %rbp
 ; CHECK-NEXT:    movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    shufps {{.*#+}} xmm0 = xmm0[3,3,3,3]
 ; CHECK-NEXT:    callq __extendhfsf2@PLT
@@ -1232,11 +1232,11 @@ define <8 x i128> @test_unsigned_v8i128_v8f16(<8 x half> %f) nounwind {
 ; CHECK-NEXT:    movss {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 4-byte Reload
 ; CHECK-NEXT:    # xmm0 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovbq %r13, %r15
-; CHECK-NEXT:    cmovbq %r13, %r14
+; CHECK-NEXT:    cmovbq %r12, %r15
+; CHECK-NEXT:    cmovbq %r12, %r14
 ; CHECK-NEXT:    ucomiss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT:    cmovaq %r12, %r14
-; CHECK-NEXT:    cmovaq %r12, %r15
+; CHECK-NEXT:    cmovaq %r13, %r14
+; CHECK-NEXT:    cmovaq %r13, %r15
 ; CHECK-NEXT:    movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
 ; CHECK-NEXT:    psrldq {{.*#+}} xmm0 = xmm0[14,15],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    callq __extendhfsf2@PLT
diff --git a/llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir b/llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
index fc0efd275ae25..5f05270729fde 100644
--- a/llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
+++ b/llvm/test/CodeGen/X86/statepoint-invoke-ra-enter-at-end.mir
@@ -343,9 +343,8 @@ body:             |
   ; CHECK-NEXT:   [[MOV64rm4:%[0-9]+]]:gr64 = NOT64r [[MOV64rm4]]
   ; CHECK-NEXT:   CMP64rr [[MOV64rm4]], [[COPY7]], implicit-def $eflags
   ; CHECK-NEXT:   undef [[MOV32ri1:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = MOV32ri 0
-  ; CHECK-NEXT:   [[COPY8:%[0-9]+]]:gr64 = COPY [[MOV32ri1]]
-  ; CHECK-NEXT:   [[MOV64rm4:%[0-9]+]]:gr64 = CMOV64rr [[MOV64rm4]], [[COPY8]], 4, implicit killed $eflags
-  ; CHECK-NEXT:   INLINEASM &"lock btsq $0,($1)", 1 /* sideeffect attdialect */, 4784137 /* reguse:GR64 */, [[COPY8]], 4784137 /* reguse:GR64 */, undef %56:gr64, 12 /* clobber */, implicit-def dead early-clobber $df, 12 /* clobber */, implicit-def early-clobber $fpsw, 12 /* clobber */, implicit-def dead early-clobber $eflags
+  ; CHECK-NEXT:   [[MOV64rm4:%[0-9]+]]:gr64 = CMOV64rr [[MOV64rm4]], [[MOV32ri1]], 4, implicit killed $eflags
+  ; CHECK-NEXT:   INLINEASM &"lock btsq $0,($1)", 1 /* side...
[truncated]

@mikaelholmen
Collaborator

I've verified that this patch solves downstream problems I bisected to
#122110
which I mentioned in
#122110 (comment)

Collaborator

@RKSimon RKSimon left a comment

Add -verify-machineinstrs to the x86 sat-vector tests?
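
For reference, that would mean adding the flag to the RUN lines of those tests, roughly along these lines (the triple and other llc options here are placeholders; only the added -verify-machineinstrs is the actual suggestion):

; RUN: llc < %s -mtriple=x86_64-- -verify-machineinstrs | FileCheck %s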

@arsenm
Contributor Author

arsenm commented Mar 6, 2025

Add -verify-machineinstrs to the x86 sat-vector tests?

I don't think it's worth it; EXPENSIVE_CHECKS is working as intended.

@RKSimon
Collaborator

RKSimon commented Mar 6, 2025

OK - can you see if you can create useful test coverage from #129740?

@arsenm arsenm force-pushed the users/arsenm/regalloc/regression-expensive-checks-splitkit-no-remat-if-increases-constraints branch from 14aeafc to 432fb2d March 6, 2025 10:25
@arsenm arsenm merged commit b21663c into main Mar 6, 2025
11 checks passed
@arsenm arsenm deleted the users/arsenm/regalloc/regression-expensive-checks-splitkit-no-remat-if-increases-constraints branch March 6, 2025 13:06
@mikaelholmen
Collaborator

I've verified that this patch solves downstream problems I bisected to #122110 which I mentioned in #122110 (comment)

As I mentioned in
#122110 (comment)
after more downstream testing I see that while many of the problems we saw were fixed with this patch, there are still
verifier errors like

*** Bad machine code: Illegal virtual register for instruction ***
- function:    func_4
- basic block: %bb.32 cont299 (0x557fac4ba7e0) [6672B;7040B)
- instruction: 6680B	%1145:anh_rn = ldx_or_optionally_mv_nimm16_a16 0, 0, $noreg, 0
- operand 0:   %1145:anh_rn
Expected a aNh_0_7 register, but got a aNh_rN register

And this happens for our downstream target, so I can't share a reproducer. A bit problematic. :/

@arsenm
Contributor Author

arsenm commented Mar 7, 2025

FWIW I suspect we have other constraint bugs somewhere. In principle this patch is making the code more conservative. Does your problem go away if you add a getCommonSubClass check to the use and def rc?
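
A minimal sketch of that idea, as it might slot into the end of rematWillIncreaseRestriction (an illustration of the suggestion only, not code from this patch; the exact bail-out semantics are an assumption):

  // Hypothetical extra check: if the use-site and def-site constraints
  // share no common subclass, assume rematerialization would over-constrain.
  const TargetRegisterClass *CommonRC =
      TRI.getCommonSubClass(UseConstrainRC, DefConstrainRC);
  if (!CommonRC)
    return true;
  return UseConstrainRC->hasSubClass(DefConstrainRC);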

@mikaelholmen
Collaborator

FWIW I suspect we have other constraint bugs somewhere. In principle this patch is making the code more conservative. Does your problem go away if you add a getCommonSubClass check to the use and def rc?

I'm happy to try something out but you'd need to be more specific about what and where in that case.

I reduced the problem down to a smallish MIR reproducer, and from "-debug" printouts (with an added LIS->dump() in RAGreedy::selectOrSplit) I see that we go from

176B	  %10:anh_0_7 = ldx_or_optionally_mv_nimm16_a16 0
[...]
288B	  nop implicit %10:anh_0_7
[...]
336B	  nop implicit %10:anh_0_7
[...]
496B	  dead %14:rn = COPY %10:anh_0_7

in the input to

176B	  dead %29:anh_0_7 = ldx_or_optionally_mv_nimm16_a16 0
[...]
232B	  %27:anh_0_7 = ldx_or_optionally_mv_nimm16_a16 0
[...]
280B	  %28:anh_rn = COPY %27:anh_0_7
288B	  nop implicit %28:anh_rn
[...]
328B	  %30:anh_rn = COPY %28:anh_rn
336B	  nop implicit %28:anh_rn
[...]
472B	  %32:anh_rn = COPY %30:anh_rn
[...]
496B	  dead %14:rn = COPY %32:anh_rn

which might still be correct, even if I don't understand why it has started using the register class "anh_rn" so much, but then after dealing with %30 we get

176B	  dead %29:anh_0_7 = ldx_or_optionally_mv_nimm16_a16 0
[...]
232B	  %27:anh_0_7 = ldx_or_optionally_mv_nimm16_a16 0
[...]
296B	  %28:anh_rn = COPY %27:anh_0_7
304B	  nop implicit %28:anh_rn
[...]
368B	  nop implicit %28:anh_rn
[...]
412B	  %46:anh_rn = ldx_or_optionally_mv_nimm16_a16 0
472B	  %32:anh_rn = COPY %46:anh_rn
[...]
512B	  dead %14:rn = COPY %32:anh_rn

and this is where the verifier protests on the %46 instruction, since the ldx instruction should produce an "anh_0_7" register and not an "anh_rn" one.

Our machine has "a" and "r" registers and the "ldx" instruction can only write to "a" (i.e. "anh_0_7") registers and not to both "a" and "r" ("anh_rn").

(I've used "nop" instructions with implicit defs/uses as much as possible to keep our own instruction set out of the reduced reproducer.)

I also attached the regalloc log (with added LIS->dump()) in case that is interesting.
bbi-104668_2.log.gz

jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…lvm#129727)
