[LV, VPlan] Check if plan is compatible to EVL transform #92092

Merged: 5 commits merged into llvm:main from check-evl-compat on May 25, 2024

Conversation

arcbbb (Contributor) commented May 14, 2024

The transform updates all users of inductions to work based on EVL instead of the VF directly. At the moment, widened inductions cannot be updated, so bail out if the plan contains any.
This patch introduces a check before applying the EVL transform: if any recipe in the loop relies on RuntimeVF, the plan is discarded.

llvmbot (Member) commented May 14, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

Vector loops generated from the EVL transform may experience issues on architectures where EVL differs from RuntimeVF at the second-to-last iteration.

This patch introduces a check before applying the EVL transform. If any recipe in the loop relies on RuntimeVF, the plan is discarded.


Full diff: https://github.com/llvm/llvm-project/pull/92092.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+28-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (+103)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll (+9-57)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index adae7caf5917c..22fc627dc9141 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8539,6 +8539,29 @@ VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
   return tryToWiden(Instr, Operands, VPBB);
 }
 
+// EVL transform doesn't support backends where EVL diffs from RuntimeVF
+// in the second-to-last iteration.
+// Return false if the vector region has recipes relying on
+// RuntimeVF.
+static bool isCompatibleToEVLTransform(VPlan &Plan) {
+  auto HasAnyRuntimeVFUserInLoop = [](VPlan &Plan) -> bool {
+    for (auto &Phi : Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis())
+      if (isa<VPWidenIntOrFpInductionRecipe>(&Phi) ||
+          isa<VPWidenPointerInductionRecipe>(&Phi))
+        return true;
+    for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+             vp_depth_first_deep(Plan.getVectorLoopRegion())))
+      for (VPRecipeBase &Recipe : *VPBB)
+        if (auto *VecPtrR = dyn_cast<VPVectorPointerRecipe>(&Recipe))
+          if (VecPtrR->isReverse())
+            return true;
+    return false;
+  };
+  if (HasAnyRuntimeVFUserInLoop(Plan))
+    return false;
+  return true;
+}
+
 void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
                                                         ElementCount MaxVF) {
   assert(OrigLoop->isInnermost() && "Inner loop expected.");
@@ -8553,8 +8576,12 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
             *Plan, CM.getMinimalBitwidths(), PSE.getSE()->getContext());
       VPlanTransforms::optimize(*Plan, *PSE.getSE());
       // TODO: try to put it close to addActiveLaneMask().
-      if (CM.foldTailWithEVL())
+      if (CM.foldTailWithEVL()) {
+        // Don't generate plan if the plan is not EVL-compatible
+        if (!isCompatibleToEVLTransform(*Plan))
+          break;
         VPlanTransforms::addExplicitVectorLength(*Plan);
+      }
       assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c74329a0bcc4a..94c342089fbdc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1576,6 +1576,7 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags {
 
   void execute(VPTransformState &State) override;
 
+  bool isReverse() { return IsReverse; }
   bool onlyFirstLaneUsed(const VPValue *Op) const override {
     assert(is_contained(operands(), Op) &&
            "Op must be an operand of the recipe");
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll b/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll
new file mode 100644
index 0000000000000..1b652fdda9e44
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll
@@ -0,0 +1,103 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=loop-vectorize -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
+
+; Check loops having VPWidenIntOrFpInductionRecipe
+define void @foo(ptr noalias %a, i64 %N) {
+; CHECK-LABEL: define void @foo(
+; CHECK-SAME: ptr noalias [[A:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
+; CHECK-NEXT:    store i64 [[IV]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
+  store i64 %iv, ptr %arrayidx, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
+
+; Check loops having VPWidenPointerInductionRecipe
+define void @foo2(ptr noalias %a, ptr noalias %b, i64 %N) {
+; CHECK-LABEL: define void @foo2(
+; CHECK-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[B]], [[ENTRY]] ]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[ADDR]], i64 8
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
+; CHECK-NEXT:    store ptr [[ADDR]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %addr = phi ptr [ %incdec.ptr, %for.body ], [ %b, %entry ]
+  %incdec.ptr = getelementptr inbounds i8, ptr %addr, i64 8
+  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
+  store ptr %addr, ptr %arrayidx, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
+
+; Check loops having VPVectorPointerRecipe that access in reverse order
+define void @foo3(ptr noalias %a, i64 %N) {
+; CHECK-LABEL: define void @foo3(
+; CHECK-SAME: ptr noalias [[A:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[A]], [[ENTRY]] ]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[ADDR]], i64 -4
+; CHECK-NEXT:    store i64 [[N]], ptr [[ADDR]], align 8
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %addr = phi ptr [ %incdec.ptr, %for.body ], [ %a, %entry ]
+  %incdec.ptr = getelementptr inbounds i8, ptr %addr, i64 -4
+  store i64 %N, ptr %addr, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
index ae01bdd371106..a52da79ee3963 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
@@ -12,66 +12,18 @@
 define void @gather_scatter(ptr noalias %in, ptr noalias %out, ptr noalias %index, i64 %n) {
 ; IF-EVL-LABEL: @gather_scatter(
 ; IF-EVL-NEXT:  entry:
-; IF-EVL-NEXT:    [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT:    [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT:    br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; IF-EVL:       vector.ph:
-; IF-EVL-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 2
-; IF-EVL-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP6]], 2
-; IF-EVL-NEXT:    [[TMP8:%.*]] = sub i64 [[TMP7]], 1
-; IF-EVL-NEXT:    [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP8]]
-; IF-EVL-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
-; IF-EVL-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; IF-EVL-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 2
-; IF-EVL-NEXT:    [[TMP11:%.*]] = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
-; IF-EVL-NEXT:    [[TMP12:%.*]] = add <vscale x 2 x i64> [[TMP11]], zeroinitializer
-; IF-EVL-NEXT:    [[TMP13:%.*]] = mul <vscale x 2 x i64> [[TMP12]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
-; IF-EVL-NEXT:    [[INDUCTION:%.*]] = add <vscale x 2 x i64> zeroinitializer, [[TMP13]]
-; IF-EVL-NEXT:    [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP15:%.*]] = mul i64 [[TMP14]], 2
-; IF-EVL-NEXT:    [[TMP16:%.*]] = mul i64 1, [[TMP15]]
-; IF-EVL-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP16]], i64 0
-; IF-EVL-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
-; IF-EVL-NEXT:    br label [[VECTOR_BODY:%.*]]
-; IF-EVL:       vector.body:
-; IF-EVL-NEXT:    [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT:    [[VEC_IND:%.*]] = phi <vscale x 2 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT:    [[TMP17:%.*]] = sub i64 [[N]], [[EVL_BASED_IV]]
-; IF-EVL-NEXT:    [[TMP18:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[TMP17]], i32 2, i1 true)
-; IF-EVL-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], <vscale x 2 x i64> [[VEC_IND]]
-; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT:    [[TMP21:%.*]] = getelementptr inbounds float, ptr [[IN:%.*]], <vscale x 2 x i64> [[WIDE_MASKED_GATHER]]
-; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 2 x float> @llvm.vp.gather.nxv2f32.nxv2p0(<vscale x 2 x ptr> align 4 [[TMP21]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT:    [[TMP22:%.*]] = getelementptr inbounds float, ptr [[OUT:%.*]], <vscale x 2 x i64> [[WIDE_MASKED_GATHER]]
-; IF-EVL-NEXT:    call void @llvm.vp.scatter.nxv2f32.nxv2p0(<vscale x 2 x float> [[WIDE_MASKED_GATHER2]], <vscale x 2 x ptr> align 4 [[TMP22]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT:    [[TMP23:%.*]] = zext i32 [[TMP18]] to i64
-; IF-EVL-NEXT:    [[INDEX_EVL_NEXT]] = add i64 [[TMP23]], [[EVL_BASED_IV]]
-; IF-EVL-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX1]], [[TMP10]]
-; IF-EVL-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
-; IF-EVL-NEXT:    [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; IF-EVL-NEXT:    br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
-; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
-; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
-; IF-EVL-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
-; IF-EVL-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[INDEX]], i64 [[INDVARS_IV]]
-; IF-EVL-NEXT:    [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX3]], align 8
-; IF-EVL-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds float, ptr [[IN]], i64 [[TMP25]]
-; IF-EVL-NEXT:    [[TMP26:%.*]] = load float, ptr [[ARRAYIDX5]], align 4
-; IF-EVL-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, ptr [[OUT]], i64 [[TMP25]]
-; IF-EVL-NEXT:    store float [[TMP26]], ptr [[ARRAYIDX7]], align 4
+; IF-EVL-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; IF-EVL-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], i64 [[INDVARS_IV]]
+; IF-EVL-NEXT:    [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX3]], align 8
+; IF-EVL-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds float, ptr [[IN:%.*]], i64 [[TMP0]]
+; IF-EVL-NEXT:    [[TMP1:%.*]] = load float, ptr [[ARRAYIDX5]], align 4
+; IF-EVL-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, ptr [[OUT:%.*]], i64 [[TMP0]]
+; IF-EVL-NEXT:    store float [[TMP1]], ptr [[ARRAYIDX7]], align 4
 ; IF-EVL-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; IF-EVL-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
-; IF-EVL-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; IF-EVL-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N:%.*]]
+; IF-EVL-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; IF-EVL:       for.end:
 ; IF-EVL-NEXT:    ret void
 ;

arcbbb added a commit to arcbbb/llvm-project that referenced this pull request May 15, 2024
A precommit test case to show vector loops generated from EVL transform
- This is a precommit test for llvm#92092
arcbbb force-pushed the check-evl-compat branch from b393ff1 to 07a62be on May 15, 2024 07:33
fhahn (Contributor) left a comment

LGTM, thanks!

> Vector loops generated from the EVL transform may experience issues on architectures where EVL differs from RuntimeVF at the second-to-last iteration.

For the part above, I'd use the same wording as the commit that added addExplicitVectorLength.

> This patch introduces a check before applying EVL transform. If any recipes in loop rely on RuntimeVF, the vectorizer stop to generate the plan.

the vectorizer stop to generate the plan -> the plan is discarded?

arcbbb added a commit that referenced this pull request May 24, 2024
A precommit test case to show vector loops generated from EVL transform
- This is a precommit test for
#92092
arcbbb added 5 commits May 24, 2024 08:26
Vector loops generated from the EVL transform may experience issues on
architectures where EVL differs from RuntimeVF at the second-to-last
iteration.

This patch introduces a check before applying the EVL transform. If any recipes in
the loop rely on RuntimeVF, the plan is discarded.
arcbbb force-pushed the check-evl-compat branch from 64e8908 to 3a85956 on May 24, 2024 15:37
arcbbb merged commit 0338c55 into llvm:main on May 25, 2024
7 checks passed
arcbbb deleted the check-evl-compat branch on May 27, 2024 06:07
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request May 27, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 5, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 10, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 14, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 19, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 20, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 24, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 28, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jul 8, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jul 15, 2024
wangpc-pp (Contributor) commented

I found that some loops can't be vectorized, and I finally traced it to this PR. :-)
If I simply comment out the lines added in this PR, the loops vectorize correctly. What's the status now? Is there anything I missed?

zixuan-wu (Contributor) commented Mar 20, 2025

> I found that some loops can't be vectorized, and I finally traced it to this PR. :-) If I simply comment out the lines added in this PR, the loops vectorize correctly. What's the status now? Is there anything I missed?

Have you found any other cases with functional or performance issues if it bails out to a DataWithoutLaneMask-style mask instead of discarding the VPlan? Would it be better to bail out to TailFoldingStyle::Data, which is what getPreferredTailFoldingStyle prefers when the V extension is enabled? If it bails out to TailFoldingStyle::Data, then useActiveLaneMask would need some work in tryToBuildVPlanWithVPRecipes.

wangpc-pp (Contributor) commented

> Have you found any other cases with functional or performance issues if it bails out to a DataWithoutLaneMask-style mask instead of discarding the VPlan? Would it be better to bail out to TailFoldingStyle::Data, which is what getPreferredTailFoldingStyle prefers when the V extension is enabled?

The problem I hit is that there are some VPWidenPointerInductionRecipes, which can't be handled when using EVL tail folding. I don't see any problem when using Data or DataWithoutLaneMask.
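
For anyone reproducing this comparison, a sketch of forcing the other styles from the command line, reusing the flags from the RUN lines in the tests above; data and data-without-lane-mask are assumed here to be the matching values of the -force-tail-folding-style option, and loop.ll is a placeholder input:

opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v \
    -prefer-predicate-over-epilogue=predicate-dont-vectorize \
    -force-tail-folding-style=data-without-lane-mask -S < loop.ll

Swapping data-with-evl for data or data-without-lane-mask in an existing RUN line is the quickest way to compare the generated vector bodies.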

zixuan-wu (Contributor) commented

> The problem I hit is that there are some VPWidenPointerInductionRecipes, which can't be handled when using EVL tail folding. I don't see any problem when using Data or DataWithoutLaneMask.

Bailing out to DataWithoutLaneMask is still a kind of tail folding. Is it better than a scalar epilogue? As 'predicate-else-scalar-epilogue' describes, a scalar epilogue is created if tail folding fails. So the question is whether tail folding should count as failed when the DataWithEVL style fails but DataWithoutLaneMask would succeed.

arcbbb (Contributor, Author) commented Mar 21, 2025

> I found that some loops can't be vectorized, and I finally traced it to this PR. :-) If I simply comment out the lines added in this PR, the loops vectorize correctly. What's the status now? Is there anything I missed?

It depends on the two PRs

Enabling the vectorization causes issues on hardware that implements ceil(AVL / 2) ≤ vl ≤ VLMAX when AVL < (2 * VLMAX), as described in https://github.com/riscvarchive/riscv-v-spec/blob/master/v-spec.adoc#sec-vector-config:

6.3 Constraints on Setting vl
The vset{i}vl{i} instructions first set VLMAX according to their vtype argument, then set vl obeying the following constraints:

vl = AVL if AVL ≤ VLMAX

ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)

vl = VLMAX if AVL ≥ (2 * VLMAX)
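
To make the constraint concrete, here is a minimal C++ model of one spec-conforming vsetvl policy. The even-split behavior is a hypothetical implementation choice (the numbers match the follow-up example below), not how any particular core works:

#include <cstdint>
#include <cstdio>

// Hypothetical vsetvl policy: any vl in [ceil(AVL/2), VLMAX] is legal
// when VLMAX <= AVL < 2 * VLMAX; this model picks an even split there.
uint64_t vsetvlEvenSplit(uint64_t AVL, uint64_t VLMAX) {
  if (AVL <= VLMAX)
    return AVL;            // vl = AVL if AVL <= VLMAX
  if (AVL < 2 * VLMAX)
    return (AVL + 1) / 2;  // ceil(AVL / 2) <= vl <= VLMAX
  return VLMAX;            // vl = VLMAX if AVL >= 2 * VLMAX
}

int main() {
  // With VLMAX = 16 and AVL = 20, a greedy core may return vl = 16
  // (then 4), while this even-split policy returns 10 (then 10).
  printf("%llu\n", (unsigned long long)vsetvlEvenSplit(20, 16)); // 10
  printf("%llu\n", (unsigned long long)vsetvlEvenSplit(10, 16)); // 10
}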

wangpc-pp (Contributor) commented

> Enabling the vectorization causes issues on hardware that implements ceil(AVL / 2) ≤ vl ≤ VLMAX when AVL < (2 * VLMAX)

@arcbbb Actually I don't understand this, can you explain it to me?

arcbbb (Contributor, Author) commented Mar 21, 2025

>> Enabling the vectorization causes issues on hardware that implements ceil(AVL / 2) ≤ vl ≤ VLMAX when AVL < (2 * VLMAX)
>
> @arcbbb Actually I don't understand this, can you explain it to me?

Under the constraint ceil(AVL / 2) ≤ VL ≤ VLMAX when AVL < 2 * VLMAX:
assuming VLMAX = 16 and AVL = 20, the loop requires 2 iterations.
The VL values returned by vsetvl in those iterations could be either (16, 4) or (10, 10);
this is implementation-defined in hardware.

And the increment of the widened IV (pointer, integer, or FP) uses VLMAX instead of VL, so it doesn't work on HW implementations that go (10, 10), as the diff change shows:

; CHECK-NEXT:    [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT:    [[TMP15:%.*]] = mul i64 [[TMP14]], 2
; CHECK-NEXT:    [[TMP16:%.*]] = mul i64 1, [[TMP15]]
; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP16]], i64 0
; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
; CHECK:       vector.body:
; .....
; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[DOTSPLAT]]
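
To see the failure mode end to end, here is a minimal scalar simulation (assuming the hypothetical even-split policy sketched earlier; the variable names are made up) of how the EVL-based IV and a widened IV stepped by RuntimeVF drift apart:

#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t VLMAX = 16; // runtime VF (vscale x 2 lanes in the diff above)
  const uint64_t N = 20;     // trip count, matching the example above
  uint64_t EVLBasedIV = 0;   // advanced by the VL actually returned
  uint64_t WidenIVLane0 = 0; // lane 0 of VEC_IND, advanced by VLMAX (the bug)

  while (EVLBasedIV < N) {
    uint64_t AVL = N - EVLBasedIV;
    // Even-split policy: yields VL = 10, 10 instead of 16, 4.
    uint64_t VL = AVL <= VLMAX       ? AVL
                  : AVL < 2 * VLMAX  ? (AVL + 1) / 2
                                     : VLMAX;
    printf("EVL-based IV = %2llu, widened IV lane 0 = %2llu%s\n",
           (unsigned long long)EVLBasedIV, (unsigned long long)WidenIVLane0,
           EVLBasedIV == WidenIVLane0 ? "" : "  <-- out of sync");
    EVLBasedIV += VL;      // memory recipes advance by EVL: correct
    WidenIVLane0 += VLMAX; // VEC_IND_NEXT adds the VLMAX splat: stale lanes
  }
}

In the second iteration the widened IV's lanes start at 16 while the remaining elements start at 10, which is exactly the mismatch the compatibility check guards against.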

wangpc-pp (Contributor) commented

@arcbbb Get it! Thanks!

zixuan-wu (Contributor) commented

So after those two enhancements, will it no longer discard the VPlan?

arcbbb (Contributor, Author) commented Mar 21, 2025

> So after those two enhancements, will it no longer discard the VPlan?

Yes, as long as EVL is applied properly.
