
AMDGPU/GlobalISel: Partially move constant selection to patterns #100786

Conversation

arsenm
Contributor

@arsenm arsenm commented Jul 26, 2024

This still relies on the manual code for splitting 64-bit constants
and for handling pointers.

We were missing some of the tablegen patterns for all immediate types,
so this also brings some incidental improvements on the DAG path. It
also reduces the diff between the two selectors' outputs.

@llvmbot
Member

llvmbot commented Jul 26, 2024

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)

Changes

This still relies on the manual code for splitting 64-bit constants
and for handling pointers.

We were missing some of the tablegen patterns for all immediate types,
so this also brings some incidental improvements on the DAG path. It
also reduces the diff between the two selectors' outputs.
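
To make the shape of the change easier to see before the full diff: the selector now tries the tablegen-imported patterns first and only falls back to the manual path for what the patterns still do not cover. The following is a condensed paraphrase of the AMDGPUInstructionSelector.cpp hunk below, not the verbatim code; the rest of the manual fallback is elided.

bool AMDGPUInstructionSelector::selectG_CONSTANT(MachineInstr &I) const {
  // Try the patterns imported from tablegen first; these now handle the
  // ordinary integer and FP immediates.
  if (selectImpl(I, *CoverageInfo))
    return true;

  // FIXME: Still relying on manual selection for 64-bit constants (split
  // into two 32-bit moves) and pointer-typed constants.
  Register DstReg = I.getOperand(0).getReg();
  LLT Ty = MRI->getType(DstReg);
  assert((Ty.getSizeInBits() == 64 || Ty.isPointer()) &&
         "patterns should have selected this");

  // ... existing manual splitting / pointer handling continues here ...
}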


Patch is 268.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100786.diff

37 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUGISel.td (+3-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+15-12)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h (+11-2)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+3-1)
  • (modified) llvm/lib/Target/AMDGPU/SIInstructions.td (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll (+15-14)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/floor.f64.ll (+36-36)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/global-atomic-fadd.f64.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-constant.mir (+213-40)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-fconstant.mir (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-global-saddr.mir (+80-112)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-smrd.mir (+3-9)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-ptrmask.mir (+16-26)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll (+13-16)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mfma.gfx90a.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.rsq.clamp.ll (+56-48)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.set.inactive.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memset.ll (+3-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll (+56-109)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll (+22-22)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+64-64)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa-memcpy.ll (+10-11)
  • (modified) llvm/test/CodeGen/AMDGPU/mmra.ll (+2-3)
  • (modified) llvm/test/CodeGen/AMDGPU/offset-split-flat.ll (+237-642)
  • (modified) llvm/test/CodeGen/AMDGPU/offset-split-global.ll (+174-492)
  • (modified) llvm/test/CodeGen/AMDGPU/roundeven.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/scalar_to_vector_v2x16.ll (+4-8)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-2)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
index 537d3a43aa9fa..825c1de6c91f1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
@@ -398,8 +398,10 @@ def gi_as_i1timm : GICustomOperandRenderer<"renderTruncTImm">,
 def gi_NegateImm : GICustomOperandRenderer<"renderNegateImm">,
   GISDNodeXFormEquiv<NegateImm>;
 
-def gi_bitcast_fpimm_to_i32 : GICustomOperandRenderer<"renderBitcastImm">,
+def gi_bitcast_fpimm_to_i32 : GICustomOperandRenderer<"renderBitcastFPImm32">,
   GISDNodeXFormEquiv<bitcast_fpimm_to_i32>;
+def gi_bitcast_fpimm_to_i64 : GICustomOperandRenderer<"renderBitcastFPImm64">,
+  GISDNodeXFormEquiv<bitcast_fpimm_to_i64>;
 
 def gi_IMMPopCount : GICustomOperandRenderer<"renderPopcntImm">,
   GISDNodeXFormEquiv<IMMPopCount>;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index aaa292291334c..9a73629b0f0cd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -2504,10 +2504,19 @@ bool AMDGPUInstructionSelector::selectG_FPEXT(MachineInstr &I) const {
 }
 
 bool AMDGPUInstructionSelector::selectG_CONSTANT(MachineInstr &I) const {
+  if (selectImpl(I, *CoverageInfo))
+    return true;
+
+  // FIXME: Relying on manual selection for 64-bit case, and pointer typed
+  // constants.
   MachineBasicBlock *BB = I.getParent();
   MachineOperand &ImmOp = I.getOperand(1);
   Register DstReg = I.getOperand(0).getReg();
-  unsigned Size = MRI->getType(DstReg).getSizeInBits();
+  LLT Ty = MRI->getType(DstReg);
+  unsigned Size = Ty.getSizeInBits();
+  assert((Size == 64 || Ty.isPointer()) &&
+         "patterns should have selected this");
+
   bool IsFP = false;
 
   // The AMDGPU backend only supports Imm operands and not CImm or FPImm.
@@ -5606,18 +5615,12 @@ void AMDGPUInstructionSelector::renderNegateImm(MachineInstrBuilder &MIB,
   MIB.addImm(-MI.getOperand(1).getCImm()->getSExtValue());
 }
 
-void AMDGPUInstructionSelector::renderBitcastImm(MachineInstrBuilder &MIB,
-                                                 const MachineInstr &MI,
-                                                 int OpIdx) const {
-  assert(OpIdx == -1);
-
+void AMDGPUInstructionSelector::renderBitcastFPImm(MachineInstrBuilder &MIB,
+                                                   const MachineInstr &MI,
+                                                   int OpIdx) const {
   const MachineOperand &Op = MI.getOperand(1);
-  if (MI.getOpcode() == TargetOpcode::G_FCONSTANT)
-    MIB.addImm(Op.getFPImm()->getValueAPF().bitcastToAPInt().getZExtValue());
-  else {
-    assert(MI.getOpcode() == TargetOpcode::G_CONSTANT && "Expected G_CONSTANT");
-    MIB.addImm(Op.getCImm()->getSExtValue());
-  }
+  assert(MI.getOpcode() == TargetOpcode::G_FCONSTANT && OpIdx == -1);
+  MIB.addImm(Op.getFPImm()->getValueAPF().bitcastToAPInt().getZExtValue());
 }
 
 void AMDGPUInstructionSelector::renderPopcntImm(MachineInstrBuilder &MIB,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
index 43ed210508d33..0e98d25e36a3a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
@@ -333,8 +333,17 @@ class AMDGPUInstructionSelector final : public InstructionSelector {
   void renderNegateImm(MachineInstrBuilder &MIB, const MachineInstr &MI,
                        int OpIdx) const;
 
-  void renderBitcastImm(MachineInstrBuilder &MIB, const MachineInstr &MI,
-                        int OpIdx) const;
+  void renderBitcastFPImm(MachineInstrBuilder &MIB, const MachineInstr &MI,
+                          int OpIdx) const;
+
+  void renderBitcastFPImm32(MachineInstrBuilder &MIB, const MachineInstr &MI,
+                            int OpIdx) const {
+    renderBitcastFPImm(MIB, MI, OpIdx);
+  }
+  void renderBitcastFPImm64(MachineInstrBuilder &MIB, const MachineInstr &MI,
+                            int OpIdx) const {
+    renderBitcastFPImm(MIB, MI, OpIdx);
+  }
 
   void renderPopcntImm(MachineInstrBuilder &MIB, const MachineInstr &MI,
                        int OpIdx) const;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index 00a8b159c9526..4ab903f8c9ad0 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -828,7 +828,9 @@ def InlineImmFP64 : FPImmLeaf<f64, [{
 
 class VGPRImm <dag frag> : PatLeaf<frag, [{
   return isVGPRImm(N);
-}]>;
+}]> {
+  let GISelPredicateCode = [{return true;}];
+}
 
 def NegateImm : SDNodeXForm<imm, [{
   return CurDAG->getConstant(-N->getSExtValue(), SDLoc(N), MVT::i32);
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 15078bc941292..d2101654d2acb 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2163,6 +2163,11 @@ def : GCNPat <
   (S_MOV_B32 $ga)
 >;
 
+def : GCNPat <
+  (VGPRImm<(i16 imm)>:$imm),
+  (V_MOV_B32_e32 imm:$imm)
+>;
+
 // FIXME: Workaround for ordering issue with peephole optimizer where
 // a register class copy interferes with immediate folding.  Should
 // use s_mov_b32, which can be shrunk to s_movk_i32
@@ -2229,20 +2234,15 @@ def : GCNPat <
   (S_MOV_B64 InlineImm64:$imm)
 >;
 
-// XXX - Should this use a s_cmp to set SCC?
-
 // Set to sign-extended 64-bit value (true = -1, false = 0)
-def : GCNPat <
-  (i1 imm:$imm),
-  (S_MOV_B64 (i64 (as_i64imm $imm)))
-> {
+// Set to sign-extended 64-bit value (true = -1, false = 0)
+def : GCNPat <(i1 imm:$imm),
+              (S_MOV_B64 imm:$imm)> {
   let WaveSizePredicate = isWave64;
 }
 
-def : GCNPat <
-  (i1 imm:$imm),
-  (S_MOV_B32 (i32 (as_i32imm $imm)))
-> {
+def : GCNPat <(i1 imm:$imm),
+              (S_MOV_B32 imm:$imm)> {
   let WaveSizePredicate = isWave32;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
index c701e873fdd2c..1270e69998e6c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmax.ll
@@ -501,8 +501,8 @@ define float @global_agent_atomic_fmax_ret_f32__amdgpu_no_fine_grained_memory(pt
 ; GFX7-NEXT:    s_mov_b32 s7, 0xf000
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dword v3, v[0:1], s[4:7], 0 addr64
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
 ; GFX7-NEXT:    v_mul_f32_e32 v2, 1.0, v2
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB4_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
@@ -710,8 +710,8 @@ define void @global_agent_atomic_fmax_noret_f32__amdgpu_no_fine_grained_memory(p
 ; GFX7-NEXT:    s_mov_b32 s7, 0xf000
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dword v3, v[0:1], s[4:7], 0 addr64
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
 ; GFX7-NEXT:    v_mul_f32_e32 v4, 1.0, v2
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB5_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
@@ -936,7 +936,7 @@ define double @global_agent_atomic_fmax_ret_f64__amdgpu_no_fine_grained_memory(p
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dwordx2 v[0:1], v[4:5], s[4:7], 0 addr64
 ; GFX7-NEXT:    v_max_f64 v[6:7], v[2:3], v[2:3]
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB6_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
@@ -1150,7 +1150,7 @@ define void @global_agent_atomic_fmax_noret_f64__amdgpu_no_fine_grained_memory(p
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dwordx2 v[4:5], v[0:1], s[4:7], 0 addr64
 ; GFX7-NEXT:    v_max_f64 v[6:7], v[2:3], v[2:3]
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB7_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll
index 90110e6e0c09e..e36161a936e36 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/atomicrmw_fmin.ll
@@ -501,8 +501,8 @@ define float @global_agent_atomic_fmin_ret_f32__amdgpu_no_fine_grained_memory(pt
 ; GFX7-NEXT:    s_mov_b32 s7, 0xf000
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dword v3, v[0:1], s[4:7], 0 addr64
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
 ; GFX7-NEXT:    v_mul_f32_e32 v2, 1.0, v2
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB4_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
@@ -710,8 +710,8 @@ define void @global_agent_atomic_fmin_noret_f32__amdgpu_no_fine_grained_memory(p
 ; GFX7-NEXT:    s_mov_b32 s7, 0xf000
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dword v3, v[0:1], s[4:7], 0 addr64
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
 ; GFX7-NEXT:    v_mul_f32_e32 v4, 1.0, v2
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB5_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
@@ -936,7 +936,7 @@ define double @global_agent_atomic_fmin_ret_f64__amdgpu_no_fine_grained_memory(p
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dwordx2 v[0:1], v[4:5], s[4:7], 0 addr64
 ; GFX7-NEXT:    v_max_f64 v[6:7], v[2:3], v[2:3]
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB6_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
@@ -1150,7 +1150,7 @@ define void @global_agent_atomic_fmin_noret_f64__amdgpu_no_fine_grained_memory(p
 ; GFX7-NEXT:    s_mov_b64 s[4:5], 0
 ; GFX7-NEXT:    buffer_load_dwordx2 v[4:5], v[0:1], s[4:7], 0 addr64
 ; GFX7-NEXT:    v_max_f64 v[6:7], v[2:3], v[2:3]
-; GFX7-NEXT:    s_mov_b64 s[8:9], 0
+; GFX7-NEXT:    s_mov_b64 s[8:9], s[4:5]
 ; GFX7-NEXT:  .LBB7_1: ; %atomicrmw.start
 ; GFX7-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
index d4d5cb18bbd30..966a481b6594d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
@@ -226,15 +226,16 @@ exit:
 define amdgpu_cs void @single_lane_execution_attribute(i32 inreg %.userdata0, <3 x i32> inreg %.WorkgroupId, <3 x i32> %.LocalInvocationId) #0 {
 ; GFX10-LABEL: single_lane_execution_attribute:
 ; GFX10:       ; %bb.0: ; %.entry
+; GFX10-NEXT:    s_mov_b32 s6, 0
 ; GFX10-NEXT:    s_getpc_b64 s[4:5]
-; GFX10-NEXT:    s_mov_b32 s12, 0
-; GFX10-NEXT:    s_mov_b32 s13, -1
-; GFX10-NEXT:    s_mov_b32 s2, s0
-; GFX10-NEXT:    s_and_b64 s[4:5], s[4:5], s[12:13]
-; GFX10-NEXT:    s_mov_b32 s3, s12
+; GFX10-NEXT:    s_mov_b32 s7, -1
+; GFX10-NEXT:    s_mov_b32 s2, s1
+; GFX10-NEXT:    s_and_b64 s[4:5], s[4:5], s[6:7]
+; GFX10-NEXT:    s_mov_b32 s1, 0
 ; GFX10-NEXT:    v_mbcnt_lo_u32_b32 v1, -1, 0
-; GFX10-NEXT:    s_or_b64 s[2:3], s[4:5], s[2:3]
-; GFX10-NEXT:    s_load_dwordx8 s[4:11], s[2:3], 0x0
+; GFX10-NEXT:    s_or_b64 s[12:13], s[4:5], s[0:1]
+; GFX10-NEXT:    s_mov_b32 s3, -1
+; GFX10-NEXT:    s_load_dwordx8 s[4:11], s[12:13], 0x0
 ; GFX10-NEXT:    v_mbcnt_hi_u32_b32 v1, -1, v1
 ; GFX10-NEXT:    v_lshlrev_b32_e32 v2, 2, v1
 ; GFX10-NEXT:    v_and_b32_e32 v3, 1, v1
@@ -248,8 +249,8 @@ define amdgpu_cs void @single_lane_execution_attribute(i32 inreg %.userdata0, <3
 ; GFX10-NEXT:    v_cmp_eq_u32_e64 s0, 0, v2
 ; GFX10-NEXT:    s_cbranch_vccnz .LBB4_4
 ; GFX10-NEXT:  ; %bb.1: ; %.preheader.preheader
-; GFX10-NEXT:    v_mov_b32_e32 v3, s12
-; GFX10-NEXT:    v_mov_b32_e32 v4, s12
+; GFX10-NEXT:    v_mov_b32_e32 v3, s1
+; GFX10-NEXT:    v_mov_b32_e32 v4, s1
 ; GFX10-NEXT:  .LBB4_2: ; %.preheader
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX10-NEXT:    buffer_load_dword v5, v3, s[4:7], 0 offen
@@ -261,17 +262,17 @@ define amdgpu_cs void @single_lane_execution_attribute(i32 inreg %.userdata0, <3
 ; GFX10-NEXT:    s_cbranch_vccnz .LBB4_2
 ; GFX10-NEXT:  ; %bb.3: ; %.preheader._crit_edge
 ; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v2
-; GFX10-NEXT:    s_mov_b32 s13, 0
-; GFX10-NEXT:    s_or_b32 s2, s0, vcc_lo
-; GFX10-NEXT:    v_cndmask_b32_e64 v3, 0, 1, s2
+; GFX10-NEXT:    s_mov_b32 s3, 0
+; GFX10-NEXT:    s_or_b32 s1, s0, vcc_lo
+; GFX10-NEXT:    v_cndmask_b32_e64 v3, 0, 1, s1
 ; GFX10-NEXT:  .LBB4_4: ; %Flow
-; GFX10-NEXT:    s_and_b32 vcc_lo, exec_lo, s13
+; GFX10-NEXT:    s_and_b32 vcc_lo, exec_lo, s3
 ; GFX10-NEXT:    s_cbranch_vccz .LBB4_6
 ; GFX10-NEXT:  ; %bb.5: ; %.19
 ; GFX10-NEXT:    v_cndmask_b32_e64 v1, 0, 1, s0
 ; GFX10-NEXT:    v_or_b32_e32 v3, 2, v1
 ; GFX10-NEXT:  .LBB4_6: ; %.22
-; GFX10-NEXT:    v_add_lshl_u32 v0, v0, s1, 2
+; GFX10-NEXT:    v_add_lshl_u32 v0, v0, s2, 2
 ; GFX10-NEXT:    buffer_store_dword v3, v0, s[8:11], 0 offen
 ; GFX10-NEXT:    s_endpgm
 .entry:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll
index 5515de0cd2fee..f52b7c635a66f 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll
@@ -193,12 +193,12 @@ bb12:
 define amdgpu_kernel void @break_loop(i32 %arg) {
 ; CHECK-LABEL: break_loop:
 ; CHECK:       ; %bb.0: ; %bb
-; CHECK-NEXT:    s_load_dword s2, s[6:7], 0x0
-; CHECK-NEXT:    s_mov_b64 s[0:1], 0
+; CHECK-NEXT:    s_load_dword s0, s[6:7], 0x0
+; CHECK-NEXT:    ; implicit-def: $sgpr2_sgpr3
 ; CHECK-NEXT:    ; implicit-def: $vgpr1
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:    v_subrev_u32_e32 v0, s2, v0
-; CHECK-NEXT:    ; implicit-def: $sgpr2_sgpr3
+; CHECK-NEXT:    v_subrev_u32_e32 v0, s0, v0
+; CHECK-NEXT:    s_mov_b64 s[0:1], 0
 ; CHECK-NEXT:    s_branch .LBB5_3
 ; CHECK-NEXT:  .LBB5_1: ; %bb4
 ; CHECK-NEXT:    ; in Loop: Header=BB5_3 Depth=1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/floor.f64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/floor.f64.ll
index 34635b077cd92..e19e7823053c8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/floor.f64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/floor.f64.ll
@@ -8,9 +8,9 @@ define double @v_floor_f64_ieee(double %x) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e32 v[2:3], v[0:1]
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_cmp_o_f64_e32 vcc, v[0:1], v[0:1]
 ; GFX6-NEXT:    v_cndmask_b32_e32 v2, v2, v0, vcc
 ; GFX6-NEXT:    v_cndmask_b32_e32 v3, v3, v1, vcc
@@ -31,9 +31,9 @@ define double @v_floor_f64_ieee_nnan(double %x) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e32 v[2:3], v[0:1]
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_add_f64 v[0:1], v[0:1], -v[2:3]
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -51,9 +51,9 @@ define double @v_floor_f64_ieee_fneg(double %x) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e64 v[2:3], -v[0:1]
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_cmp_o_f64_e32 vcc, v[0:1], v[0:1]
 ; GFX6-NEXT:    v_cndmask_b32_e32 v2, v2, v0, vcc
 ; GFX6-NEXT:    v_cndmask_b32_e32 v3, v3, v1, vcc
@@ -75,9 +75,9 @@ define double @v_floor_f64_nonieee(double %x) #1 {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e32 v[2:3], v[0:1]
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_cmp_o_f64_e32 vcc, v[0:1], v[0:1]
 ; GFX6-NEXT:    v_cndmask_b32_e32 v2, v2, v0, vcc
 ; GFX6-NEXT:    v_cndmask_b32_e32 v3, v3, v1, vcc
@@ -98,9 +98,9 @@ define double @v_floor_f64_nonieee_nnan(double %x) #1 {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e32 v[2:3], v[0:1]
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_add_f64 v[0:1], v[0:1], -v[2:3]
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -118,9 +118,9 @@ define double @v_floor_f64_non_ieee_fneg(double %x) #1 {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e64 v[2:3], -v[0:1]
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_cmp_o_f64_e32 vcc, v[0:1], v[0:1]
 ; GFX6-NEXT:    v_cndmask_b32_e32 v2, v2, v0, vcc
 ; GFX6-NEXT:    v_cndmask_b32_e32 v3, v3, v1, vcc
@@ -142,9 +142,9 @@ define double @v_floor_f64_fabs(double %x) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e64 v[2:3], |v[0:1]|
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_cmp_o_f64_e32 vcc, v[0:1], v[0:1]
 ; GFX6-NEXT:    v_cndmask_b32_e32 v2, v2, v0, vcc
 ; GFX6-NEXT:    v_cndmask_b32_e32 v3, v3, v1, vcc
@@ -166,9 +166,9 @@ define double @v_floor_f64_fneg_fabs(double %x) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_fract_f64_e64 v[2:3], -|v[0:1]|
-; GFX6-NEXT:    s_mov_b32 s4, -1
-; GFX6-NEXT:    s_mov_b32 s5, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], s[4:5]
+; GFX6-NEXT:    v_mov_b32_e32 v4, -1
+; GFX6-NEXT:    v_mov_b32_e32 v5, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[2:3], v[2:3], v[4:5]
 ; GFX6-NEXT:    v_cmp_o_f64_e32 vcc, v[0:1], v[0:1]
 ; GFX6-NEXT:    v_cndmask_b32_e32 v2, v2, v0, vcc
 ; GFX6-NEXT:    v_cndmask_b32_e32 v3, v3, v1, vcc
@@ -190,9 +190,9 @@ define amdgpu_ps <2 x float> @s_floor_f64(double inreg %x) {
 ; GFX6-LABEL: s_floor_f64:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    v_fract_f64_e32 v[0:1], s[2:3]
-; GFX6-NEXT:    s_mov_b32 s0, -1
-; GFX6-NEXT:    s_mov_b32 s1, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[0:1], v[0:1], s[0:1]
+; GFX6-NEXT:    v_mov_b32_e32 v2, -1
+; GFX6-NEXT:    v_mov_b32_e32 v3, 0x3fefffff
+; GFX6-NEXT:    v_min_f64 v[0:1], v[0:1], v[2:3]
 ; GFX6-NEXT:    v_cmp_o_f64_e64 vcc, s[2:3], s[2:3]
 ; GFX6-NEXT:    v_mov_b32_e32 v2, s2
 ; GFX6-NEXT:    v_mov_b32_e32 v3, s3
@@ -214,9 +214,9 @@ define amdgpu_ps <2 x float> @s_floor_f64_fneg(double inreg %x) {
 ; GFX6-LABEL: s_floor_f64_fneg:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    v_fract_f64_e64 v[0:1], -s[2:3]
-; GFX6-NEXT:    s_mov_b32 s0, -1
-; GFX6-NEXT:    s_mov_b32 s1, 0x3fefffff
-; GFX6-NEXT:    v_min_f64 v[0:1], v[0:1], s[0:1]
+; GFX6-NEXT:    v_mov_b32_e32...
[truncated]

@arsenm arsenm marked this pull request as ready for review July 26, 2024 18:01
; GFX6-NEXT: s_mov_b32 s4, -1
; GFX6-NEXT: s_mov_b32 s5, 0x3fefffff
; GFX6-NEXT: v_min_f64 v[2:3], v[2:3], s[4:5]
; GFX6-NEXT: v_mov_b32_e32 v4, -1
Contributor

Is this a regression, to use VGPRs instead of SGPRs?

Contributor Author

Sort of yes and sort of no. It's not slower, but it would probably be best to use SGPRs when possible.

; GFX12-NEXT: v_mov_b32_e32 v2, -1
; GFX12-NEXT: v_mov_b32_e32 v3, 0x7fefffff
; GFX12-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(VALU_DEP_1)
; GFX12-NEXT: v_min_num_f64_e32 v[0:1], v[0:1], v[2:3]
Contributor

Same question: is this regressing the 64-bit case by using VGPRs instead of SGPRs? I don't see any fix for that in later patches in the stack.

Contributor Author

It's the same thing. It's not slower, but we probably should be trying to use SGPRs more. I think we need a replacement for SIFoldOperands to address this kind of thing, since it's contextually dependent on the uses.

Contributor Author

I guess we could also make the VGPRImm predicate do a similar analysis to the one the DAG path does.

Contributor

I guess we could also make the VGPRImm predicate do a similar analysis to the one the DAG path does.

If that fixes the issue I would be for it. Or perhaps it could be a fix to SIFoldOperands, if it's doing something different for 64-bit values or instructions. We should always be trying to save VGPRs.

Contributor Author

The problem here is that we're disregarding what regbankselect said the bank should be, but it has to be conservative to avoid violating the constant bus restriction. I don't want to add a later pass to fix up constant bus violations, so this is a more complicated patch that I think should be done separately.

Contributor

It can be a separate patch. LGTM
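
For reference, here is a rough, purely hypothetical sketch of the kind of use-scanning check discussed in this thread, mirroring the spirit of the SelectionDAG isVGPRImm() analysis on the GlobalISel side. The helper name and its hook point are assumptions, it is not part of this patch, and a real version would need the context-sensitive reasoning described above.

// Sketch only; assumes the usual AMDGPU backend includes (SIInstrInfo.h,
// llvm/CodeGen/MachineRegisterInfo.h) and the llvm namespace.
static bool allUsersAreVALU(const MachineInstr &MI,
                            const MachineRegisterInfo &MRI) {
  // GlobalISel selects a block bottom-up, so the users of this G_CONSTANT
  // have normally been selected already and can be classified.
  Register Def = MI.getOperand(0).getReg();
  for (const MachineInstr &UseMI : MRI.use_nodbg_instructions(Def)) {
    if (!SIInstrInfo::isVALU(UseMI))
      return false; // Scalar user: prefer keeping the constant in an SGPR.
  }
  // Every user is a VALU instruction; materializing the immediate with a
  // V_MOV avoids using the constant bus in those users.
  return true;
}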

@arsenm arsenm force-pushed the users/arsenm/amdgpu-globalisel-partially-move-constant-selection-to-patterns branch from 4a7ffeb to 1e492ae Compare July 30, 2024 07:26

Contributor Author

arsenm commented Jul 30, 2024

Merge activity

  • Jul 30, 10:11 AM EDT: @arsenm started a stack merge that includes this pull request via Graphite.
  • Jul 30, 10:13 AM EDT: Graphite rebased this pull request as part of a merge.
  • Jul 30, 10:18 AM EDT: @arsenm merged this pull request with Graphite.

arsenm added 2 commits July 30, 2024 14:12
This still relies on the manual code for splitting 64-bit constants
and for handling pointers.

We were missing some of the tablegen patterns for all immediate types,
so this also brings some incidental improvements on the DAG path. It
also reduces the diff between the two selectors' outputs.
@arsenm arsenm force-pushed the users/arsenm/amdgpu-globalisel-partially-move-constant-selection-to-patterns branch from 1e492ae to 8104287 Compare July 30, 2024 14:12
@arsenm arsenm merged commit b356aa3 into main Jul 30, 2024
4 of 6 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu-globalisel-partially-move-constant-selection-to-patterns branch July 30, 2024 14:18