[RISCV] Add Tune to DontSinkSplatOperands #79199

Merged
michaelmaitland merged 5 commits into llvm:main from tune-done-sink-splat on Jan 25, 2024

Conversation

michaelmaitland (Contributor) commented Jan 23, 2024

A CPU may prefer not to sink splat operands, one reason being that it could require an S2V transfer buffer to move scalars into vectors.

This commit is stacked on #79015 and will be updated to mark the p670 with this Tune once it lands.

llvmbot (Member) commented Jan 23, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

A CPU may prefer not to sink splat operands, one reason being that it could require an S2V transfer buffer to move scalars into vectors.

This is a precommit for #79015.
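
For illustration only (this sketch is not part of the patch, and exact register assignments may differ from real output), the inner loop of sink_splat_add looks roughly like this with and without the new tune. By default the splat is sunk into the loop and instruction selection folds it into a .vx form, which reads the scalar register every iteration; with +dont-sink-splat-operands the splat stays hoisted and a vector-vector add is used, so the scalar is moved into a vector register only once:

; Default (splat operand sunk, folded into vadd.vx inside the loop; vsetvli setup omitted):
.LBB0_1:                                # %vector.body
        vle32.v v8, (a0)
        vadd.vx v8, v8, a1
        vse32.v v8, (a0)
        addi    a0, a0, 16
        bne     a0, a2, .LBB0_1

; With +dont-sink-splat-operands (splat stays outside the loop, as in the test below):
        vmv.v.x v8, a1
.LBB0_1:                                # %vector.body
        vle32.v v9, (a0)
        vadd.vv v9, v9, v8
        vse32.v v9, (a0)
        addi    a0, a0, 16
        bne     a0, a1, .LBB0_1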


Full diff: https://github.com/llvm/llvm-project/pull/79199.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+7)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+8)
  • (added) llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll (+353)
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index cbb096ba20ae67..d9522eb5e2dbc8 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1082,6 +1082,13 @@ def TuneShortForwardBranchOpt
 def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
 def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
 
+// Some subtargets require a S2V transfer buffer to move scalars into vectors.
+// FIXME: Forming .vx/.vf can reduce register pressure.
+def TuneDontSinkSplatOperands
+    : SubtargetFeature<"dont-sink-splat-operands", "DontSinkSplatOperands",
+                       "true", "Don't sink splat operands to enable .vx or .vf "
+                       "instructions">;
+
 def TuneConditionalCompressedMoveFusion
     : SubtargetFeature<"conditional-cmv-fusion", "HasConditionalCompressedMoveFusion",
                        "true", "Enable branch+c.mv fusion">;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index b41e2f40dc72f0..6737f1c1623890 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -2000,6 +2000,14 @@ bool RISCVTargetLowering::shouldSinkOperands(
   if (!I->getType()->isVectorTy() || !Subtarget.hasVInstructions())
     return false;
 
+  // Don't sink splat operands if the target prefers not to. Some targets
+  // require S2V transfer buffers and we can run out of them copying the same
+  // value repeatedly.
+  // FIXME: It could still be worth doing if it would improve vector register
+  // pressure and prevent a vector spill.
+  if (Subtarget.dontSinkSplatOperands())
+    return false;
+
   for (auto OpIdx : enumerate(I->operands())) {
     if (!canSplatOperand(I, OpIdx.index()))
       continue;
diff --git a/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll b/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll
new file mode 100644
index 00000000000000..38c1ee6a9c71a5
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/dont-sink-splat-operands.ll
@@ -0,0 +1,353 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+f -target-abi=lp64f \
+; RUN:   -mattr=+dont-sink-splat-operands -riscv-v-vector-bits-min=128 | FileCheck %s
+
+; Test that we don't sink splat operands when compiling with dont-sink-splat-operands.
+; Each scalar register access requires a S2V transfer buffer entry. Using too many
+; limits performance.
+; FIXME: This is potentially bad for register pressure. Need a better heuristic.
+
+define void @sink_splat_add(i32* nocapture %a, i32 signext %x) {
+; CHECK-LABEL: sink_splat_add:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vmv.v.x v8, a1
+; CHECK-NEXT:    lui a1, 1
+; CHECK-NEXT:    add a1, a0, a1
+; CHECK-NEXT:  .LBB0_1: # %vector.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vle32.v v9, (a0)
+; CHECK-NEXT:    vadd.vv v9, v9, v8
+; CHECK-NEXT:    vse32.v v9, (a0)
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    bne a0, a1, .LBB0_1
+; CHECK-NEXT:  # %bb.2: # %for.cond.cleanup
+; CHECK-NEXT:    ret
+entry:
+  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i32 0
+  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %entry
+  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
+  %0 = getelementptr inbounds i32, i32* %a, i64 %index
+  %1 = bitcast i32* %0 to <4 x i32>*
+  %wide.load = load <4 x i32>, <4 x i32>* %1, align 4
+  %2 = add <4 x i32> %wide.load, %broadcast.splat
+  %3 = bitcast i32* %0 to <4 x i32>*
+  store <4 x i32> %2, <4 x i32>* %3, align 4
+  %index.next = add nuw i64 %index, 4
+  %4 = icmp eq i64 %index.next, 1024
+  br i1 %4, label %for.cond.cleanup, label %vector.body
+
+for.cond.cleanup:                                 ; preds = %vector.body
+  ret void
+}
+
+declare i64 @llvm.vscale.i64()
+
+define void @sink_splat_add_scalable(i32* nocapture %a, i32 signext %x) {
+; CHECK-LABEL: sink_splat_add_scalable:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    csrr a5, vlenb
+; CHECK-NEXT:    srli a2, a5, 1
+; CHECK-NEXT:    li a3, 1024
+; CHECK-NEXT:    bgeu a3, a2, .LBB1_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a3, 0
+; CHECK-NEXT:    j .LBB1_5
+; CHECK-NEXT:  .LBB1_2: # %vector.ph
+; CHECK-NEXT:    addi a3, a2, -1
+; CHECK-NEXT:    andi a4, a3, 1024
+; CHECK-NEXT:    xori a3, a4, 1024
+; CHECK-NEXT:    vsetvli a6, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vmv.v.x v8, a1
+; CHECK-NEXT:    slli a5, a5, 1
+; CHECK-NEXT:    mv a6, a0
+; CHECK-NEXT:    mv a7, a3
+; CHECK-NEXT:  .LBB1_3: # %vector.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vl2re32.v v10, (a6)
+; CHECK-NEXT:    vadd.vv v10, v10, v8
+; CHECK-NEXT:    vs2r.v v10, (a6)
+; CHECK-NEXT:    sub a7, a7, a2
+; CHECK-NEXT:    add a6, a6, a5
+; CHECK-NEXT:    bnez a7, .LBB1_3
+; CHECK-NEXT:  # %bb.4: # %middle.block
+; CHECK-NEXT:    beqz a4, .LBB1_7
+; CHECK-NEXT:  .LBB1_5: # %for.body.preheader
+; CHECK-NEXT:    slli a2, a3, 2
+; CHECK-NEXT:    add a2, a0, a2
+; CHECK-NEXT:    lui a3, 1
+; CHECK-NEXT:    add a0, a0, a3
+; CHECK-NEXT:  .LBB1_6: # %for.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    lw a3, 0(a2)
+; CHECK-NEXT:    add a3, a3, a1
+; CHECK-NEXT:    sw a3, 0(a2)
+; CHECK-NEXT:    addi a2, a2, 4
+; CHECK-NEXT:    bne a2, a0, .LBB1_6
+; CHECK-NEXT:  .LBB1_7: # %for.cond.cleanup
+; CHECK-NEXT:    ret
+entry:
+  %0 = call i64 @llvm.vscale.i64()
+  %1 = shl i64 %0, 2
+  %min.iters.check = icmp ugt i64 %1, 1024
+  br i1 %min.iters.check, label %for.body.preheader, label %vector.ph
+
+vector.ph:                                        ; preds = %entry
+  %2 = call i64 @llvm.vscale.i64()
+  %3 = shl i64 %2, 2
+  %n.mod.vf = urem i64 1024, %3
+  %n.vec = sub nsw i64 1024, %n.mod.vf
+  %broadcast.splatinsert = insertelement <vscale x 4 x i32> poison, i32 %x, i32 0
+  %broadcast.splat = shufflevector <vscale x 4 x i32> %broadcast.splatinsert, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
+  %4 = call i64 @llvm.vscale.i64()
+  %5 = shl i64 %4, 2
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %vector.ph
+  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
+  %6 = getelementptr inbounds i32, i32* %a, i64 %index
+  %7 = bitcast i32* %6 to <vscale x 4 x i32>*
+  %wide.load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %7, align 4
+  %8 = add <vscale x 4 x i32> %wide.load, %broadcast.splat
+  %9 = bitcast i32* %6 to <vscale x 4 x i32>*
+  store <vscale x 4 x i32> %8, <vscale x 4 x i32>* %9, align 4
+  %index.next = add nuw i64 %index, %5
+  %10 = icmp eq i64 %index.next, %n.vec
+  br i1 %10, label %middle.block, label %vector.body
+
+middle.block:                                     ; preds = %vector.body
+  %cmp.n = icmp eq i64 %n.mod.vf, 0
+  br i1 %cmp.n, label %for.cond.cleanup, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry, %middle.block
+  %indvars.iv.ph = phi i64 [ 0, %entry ], [ %n.vec, %middle.block ]
+  br label %for.body
+
+for.cond.cleanup:                                 ; preds = %for.body, %middle.block
+  ret void
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %indvars.iv.ph, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
+  %11 = load i32, i32* %arrayidx, align 4
+  %add = add i32 %11, %x
+  store i32 %add, i32* %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %cmp.not = icmp eq i64 %indvars.iv.next, 1024
+  br i1 %cmp.not, label %for.cond.cleanup, label %for.body
+}
+
+declare <4 x i32> @llvm.vp.add.v4i32(<4 x i32>, <4 x i32>, <4 x i1>, i32)
+
+define void @sink_splat_vp_add(i32* nocapture %a, i32 signext %x, <4 x i1> %m, i32 zeroext %vl) {
+; CHECK-LABEL: sink_splat_vp_add:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vmv.v.x v8, a1
+; CHECK-NEXT:    lui a1, 1
+; CHECK-NEXT:    add a1, a0, a1
+; CHECK-NEXT:  .LBB2_1: # %vector.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vle32.v v9, (a0)
+; CHECK-NEXT:    vsetvli zero, a2, e32, m1, ta, ma
+; CHECK-NEXT:    vadd.vv v9, v9, v8, v0.t
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vse32.v v9, (a0)
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    bne a0, a1, .LBB2_1
+; CHECK-NEXT:  # %bb.2: # %for.cond.cleanup
+; CHECK-NEXT:    ret
+entry:
+  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i32 0
+  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %entry
+  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
+  %0 = getelementptr inbounds i32, i32* %a, i64 %index
+  %1 = bitcast i32* %0 to <4 x i32>*
+  %wide.load = load <4 x i32>, <4 x i32>* %1, align 4
+  %2 = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %wide.load, <4 x i32> %broadcast.splat, <4 x i1> %m, i32 %vl)
+  %3 = bitcast i32* %0 to <4 x i32>*
+  store <4 x i32> %2, <4 x i32>* %3, align 4
+  %index.next = add nuw i64 %index, 4
+  %4 = icmp eq i64 %index.next, 1024
+  br i1 %4, label %for.cond.cleanup, label %vector.body
+
+for.cond.cleanup:                                 ; preds = %vector.body
+  ret void
+}
+
+define void @sink_splat_fadd(float* nocapture %a, float %x) {
+; CHECK-LABEL: sink_splat_fadd:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfmv.v.f v8, fa0
+; CHECK-NEXT:    lui a1, 1
+; CHECK-NEXT:    add a1, a0, a1
+; CHECK-NEXT:  .LBB3_1: # %vector.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vle32.v v9, (a0)
+; CHECK-NEXT:    vfadd.vv v9, v9, v8
+; CHECK-NEXT:    vse32.v v9, (a0)
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    bne a0, a1, .LBB3_1
+; CHECK-NEXT:  # %bb.2: # %for.cond.cleanup
+; CHECK-NEXT:    ret
+entry:
+  %broadcast.splatinsert = insertelement <4 x float> poison, float %x, i32 0
+  %broadcast.splat = shufflevector <4 x float> %broadcast.splatinsert, <4 x float> poison, <4 x i32> zeroinitializer
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %entry
+  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
+  %0 = getelementptr inbounds float, float* %a, i64 %index
+  %1 = bitcast float* %0 to <4 x float>*
+  %wide.load = load <4 x float>, <4 x float>* %1, align 4
+  %2 = fadd <4 x float> %wide.load, %broadcast.splat
+  %3 = bitcast float* %0 to <4 x float>*
+  store <4 x float> %2, <4 x float>* %3, align 4
+  %index.next = add nuw i64 %index, 4
+  %4 = icmp eq i64 %index.next, 1024
+  br i1 %4, label %for.cond.cleanup, label %vector.body
+
+for.cond.cleanup:                                 ; preds = %vector.body
+  ret void
+}
+
+define void @sink_splat_fadd_scalable(float* nocapture %a, float %x) {
+; CHECK-LABEL: sink_splat_fadd_scalable:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    csrr a1, vlenb
+; CHECK-NEXT:    srli a2, a1, 2
+; CHECK-NEXT:    li a3, 1024
+; CHECK-NEXT:    bgeu a3, a2, .LBB4_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a3, 0
+; CHECK-NEXT:    j .LBB4_5
+; CHECK-NEXT:  .LBB4_2: # %vector.ph
+; CHECK-NEXT:    addi a3, a2, -1
+; CHECK-NEXT:    andi a4, a3, 1024
+; CHECK-NEXT:    xori a3, a4, 1024
+; CHECK-NEXT:    vsetvli a5, zero, e32, m1, ta, ma
+; CHECK-NEXT:    vfmv.v.f v8, fa0
+; CHECK-NEXT:    mv a5, a0
+; CHECK-NEXT:    mv a6, a3
+; CHECK-NEXT:  .LBB4_3: # %vector.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vl1re32.v v9, (a5)
+; CHECK-NEXT:    vfadd.vv v9, v9, v8
+; CHECK-NEXT:    vs1r.v v9, (a5)
+; CHECK-NEXT:    sub a6, a6, a2
+; CHECK-NEXT:    add a5, a5, a1
+; CHECK-NEXT:    bnez a6, .LBB4_3
+; CHECK-NEXT:  # %bb.4: # %middle.block
+; CHECK-NEXT:    beqz a4, .LBB4_7
+; CHECK-NEXT:  .LBB4_5: # %for.body.preheader
+; CHECK-NEXT:    slli a1, a3, 2
+; CHECK-NEXT:    add a1, a0, a1
+; CHECK-NEXT:    lui a2, 1
+; CHECK-NEXT:    add a0, a0, a2
+; CHECK-NEXT:  .LBB4_6: # %for.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    flw fa5, 0(a1)
+; CHECK-NEXT:    fadd.s fa5, fa5, fa0
+; CHECK-NEXT:    fsw fa5, 0(a1)
+; CHECK-NEXT:    addi a1, a1, 4
+; CHECK-NEXT:    bne a1, a0, .LBB4_6
+; CHECK-NEXT:  .LBB4_7: # %for.cond.cleanup
+; CHECK-NEXT:    ret
+entry:
+  %0 = call i64 @llvm.vscale.i64()
+  %1 = shl i64 %0, 1
+  %min.iters.check = icmp ugt i64 %1, 1024
+  br i1 %min.iters.check, label %for.body.preheader, label %vector.ph
+
+vector.ph:                                        ; preds = %entry
+  %2 = call i64 @llvm.vscale.i64()
+  %3 = shl i64 %2, 1
+  %n.mod.vf = urem i64 1024, %3
+  %n.vec = sub nsw i64 1024, %n.mod.vf
+  %broadcast.splatinsert = insertelement <vscale x 2 x float> poison, float %x, i32 0
+  %broadcast.splat = shufflevector <vscale x 2 x float> %broadcast.splatinsert, <vscale x 2 x float> poison, <vscale x 2 x i32> zeroinitializer
+  %4 = call i64 @llvm.vscale.i64()
+  %5 = shl i64 %4, 1
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %vector.ph
+  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
+  %6 = getelementptr inbounds float, float* %a, i64 %index
+  %7 = bitcast float* %6 to <vscale x 2 x float>*
+  %wide.load = load <vscale x 2 x float>, <vscale x 2 x float>* %7, align 4
+  %8 = fadd <vscale x 2 x float> %wide.load, %broadcast.splat
+  %9 = bitcast float* %6 to <vscale x 2 x float>*
+  store <vscale x 2 x float> %8, <vscale x 2 x float>* %9, align 4
+  %index.next = add nuw i64 %index, %5
+  %10 = icmp eq i64 %index.next, %n.vec
+  br i1 %10, label %middle.block, label %vector.body
+
+middle.block:                                     ; preds = %vector.body
+  %cmp.n = icmp eq i64 %n.mod.vf, 0
+  br i1 %cmp.n, label %for.cond.cleanup, label %for.body.preheader
+
+for.body.preheader:                               ; preds = %entry, %middle.block
+  %indvars.iv.ph = phi i64 [ 0, %entry ], [ %n.vec, %middle.block ]
+  br label %for.body
+
+for.cond.cleanup:                                 ; preds = %for.body, %middle.block
+  ret void
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %indvars.iv.ph, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
+  %11 = load float, float* %arrayidx, align 4
+  %mul = fadd float %11, %x
+  store float %mul, float* %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %cmp.not = icmp eq i64 %indvars.iv.next, 1024
+  br i1 %cmp.not, label %for.cond.cleanup, label %for.body
+}
+
+declare <4 x float> @llvm.vp.fadd.v4i32(<4 x float>, <4 x float>, <4 x i1>, i32)
+
+define void @sink_splat_vp_fadd(float* nocapture %a, float %x, <4 x i1> %m, i32 zeroext %vl) {
+; CHECK-LABEL: sink_splat_vp_fadd:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vfmv.v.f v8, fa0
+; CHECK-NEXT:    lui a2, 1
+; CHECK-NEXT:    add a2, a0, a2
+; CHECK-NEXT:  .LBB5_1: # %vector.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    vle32.v v9, (a0)
+; CHECK-NEXT:    vsetvli zero, a1, e32, m1, ta, ma
+; CHECK-NEXT:    vfadd.vv v9, v9, v8, v0.t
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vse32.v v9, (a0)
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    bne a0, a2, .LBB5_1
+; CHECK-NEXT:  # %bb.2: # %for.cond.cleanup
+; CHECK-NEXT:    ret
+entry:
+  %broadcast.splatinsert = insertelement <4 x float> poison, float %x, i32 0
+  %broadcast.splat = shufflevector <4 x float> %broadcast.splatinsert, <4 x float> poison, <4 x i32> zeroinitializer
+  br label %vector.body
+
+vector.body:                                      ; preds = %vector.body, %entry
+  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
+  %0 = getelementptr inbounds float, float* %a, i64 %index
+  %1 = bitcast float* %0 to <4 x float>*
+  %wide.load = load <4 x float>, <4 x float>* %1, align 4
+  %2 = call <4 x float> @llvm.vp.fadd.v4i32(<4 x float> %wide.load, <4 x float> %broadcast.splat, <4 x i1> %m, i32 %vl)
+  %3 = bitcast float* %0 to <4 x float>*
+  store <4 x float> %2, <4 x float>* %3, align 4
+  %index.next = add nuw i64 %index, 4
+  %4 = icmp eq i64 %index.next, 1024
+  br i1 %4, label %for.cond.cleanup, label %vector.body
+
+for.cond.cleanup:                                 ; preds = %vector.body
+  ret void
+}

@@ -0,0 +1,353 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+f -target-abi=lp64f \
; RUN: -mattr=+dont-sink-splat-operands -riscv-v-vector-bits-min=128 | FileCheck %s
Contributor:

Add a RUN without +dont-sink-splat-operands?

// repeatedly.
// FIXME: It could still be worth doing if it would improve vector register
// pressure and prevent a vector spill.
if (Subtarget.dontSinkSplatOperands())
Contributor:

Is this the only place that controls the generation of sunk V instructions?

Collaborator:

The middle-end LICM pass will usually hoist splats of the following form out of loops.

%a = insertelement <vscale x 2 x i64> poison, i64 0, i32 0
%b = shufflevector <vscale x 2 x i64> %a, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer

The code in this function tries to sink these splats back into loops if doing so would enable a .vx/.vf/.wx/.wf instruction to be formed. This tuning flag disables the sinking and leaves the splats outside the loop the way LICM left them.

This is a simple way to prevent many cases of .vx/.vf/.wx/.wf instructions in loops. But as noted in the FIXMEs, it is bad for register pressure. We've been trying to write an optimization pass to do this better, but we don't have anything that gives good results yet.
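
For context, a rough, hypothetical IR-level sketch of what the sinking does (simplified; the %ins.sunk/%splat.sunk names are made up for illustration). shouldSinkOperands tells CodeGenPrepare to sink copies of the splat instructions into the block of their user, so that instruction selection, which works one basic block at a time, sees the splat and its use together and can match the .vx form:

; Before sinking (as LICM leaves it):
entry:
  %ins = insertelement <vscale x 2 x i64> poison, i64 %x, i32 0
  %splat = shufflevector <vscale x 2 x i64> %ins, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
  br label %loop

loop:
  %v = load <vscale x 2 x i64>, <vscale x 2 x i64>* %p
  %add = add <vscale x 2 x i64> %v, %splat
  ; ...

; After sinking (splat duplicated next to its user inside the loop):
loop:
  %v = load <vscale x 2 x i64>, <vscale x 2 x i64>* %p
  %ins.sunk = insertelement <vscale x 2 x i64> poison, i64 %x, i32 0
  %splat.sunk = shufflevector <vscale x 2 x i64> %ins.sunk, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
  %add = add <vscale x 2 x i64> %v, %splat.sunk
  ; ...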

// FIXME: Forming .vx/.vf can reduce register pressure.
def TuneDontSinkSplatOperands
: SubtargetFeature<"dont-sink-splat-operands", "DontSinkSplatOperands",
"true", "Don't sink splat operands to enable .vx or .vf "
Contributor:

.wx and .wf?

Contributor (Author):

Updated.

wangpc-pp (Contributor) left a comment

LGTM if comment is addressed.

// Some subtargets require a S2V transfer buffer to move scalars into vectors.
// FIXME: Forming .vx/.vf/.wx/.wf can reduce register pressure.
def TuneDontSinkSplatOperands
: SubtargetFeature<"dont-sink-splat-operands", "DontSinkSplatOperands",
Contributor:

DontSinkSplatOperands -> SinkSplatOperands, and then change "true" to "false"?
This would match other SubtargetFeatures that disable features.

michaelmaitland (Contributor, Author) commented Jan 25, 2024

It looks like TuneNoDefaultUnroll, TuneNoOptimizedZeroStride, and FeatureNoRVCHints disable features and contain the No string, similar to Dont. I don't see any other SubtargetFeatures that use false without the No string. Are you sure your suggestion matches how we disable features?

I made an attempt to implement your suggestion, and here's what I found:

If we had SinkSplatOperands and false, then we would want to have

if (!Subtarget.sinkSplatOperands())
  return false;

since

if (Subtarget.sinkSplatOperands())
  return true;

is not the desired behavior: a true value for this tuning does not mean that splat operands must be sunk. Instead, a true value means the rest of the logic in that function should determine whether to sink the splat operands. The desired behavior is that if sinkSplatOperands is false, then we never sink splat operands.

Now let's consider the three command-line options:

  1. +sink-splat-operands
  2. -sink-splat-operands
  3. Tuning not specified

If we had SinkSplatOperands with a value of false, then only the "tuning not specified" case works as expected.
If we had SinkSplatOperands with a value of true, then +sink-splat-operands and -sink-splat-operands work correctly, but the "tuning not specified" case does not sink when it should.

Am I misunderstanding something? Based on my understanding, I think the tuning is in line with other tunings that try to disable features. The one refinement I can think of is that we can rename to TuneNoSinkSplatOperands since the other features use the No instead of Dont naming scheme.

topperc (Collaborator) commented Jan 25, 2024

The suggestion is something like

def TuneNoSinkSplatOperands : SubtargetFeature<"no-sink-splat-operands", "SinkSplatOperands", "false"

Notice the command-line option has "No", but the Subtarget field does not.

The use of "false" will make "SinkSplatOperands" default to true, and passing -mattr=+no-sink-splat-operands will set it to false.
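
For reference, a minimal sketch of what the suggested renaming could look like (an illustration of the suggestion only, not necessarily the exact text that landed; the description string is a guess):

def TuneNoSinkSplatOperands
    : SubtargetFeature<"no-sink-splat-operands", "SinkSplatOperands",
                       "false", "Disable sinking of splat operands that would "
                       "enable .vx, .vf, .wx, and .wf instructions">;

The check in shouldSinkOperands then uses the negated form already quoted above, i.e. if (!Subtarget.sinkSplatOperands()) return false;.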

Contributor (Author):

Thanks for the explanation. Updated.

; FIXME: This is potentially bad for register pressure. Need a better heuristic.

define void @sink_splat_add(i32* nocapture %a, i32 signext %x) {
; DONT-SINK-LABEL: sink_splat_add:
Collaborator:

DONT -> NO

@michaelmaitland michaelmaitland merged commit 594b92a into llvm:main Jan 25, 2024
@michaelmaitland michaelmaitland deleted the tune-done-sink-splat branch January 25, 2024 19:44