[LV, VPlan] Check if plan is compatible to EVL transform #92092
Conversation
@llvm/pr-subscribers-llvm-transforms
Author: Shih-Po Hung (arcbbb)
Changes: Vector loops generated from the EVL transform may experience issues on architectures where EVL differs from RuntimeVF at the second-to-last iteration. This patch introduces a check before applying the EVL transform: if any recipes in the loop rely on RuntimeVF, the vectorizer stops generating the plan.
Full diff: https://github.com/llvm/llvm-project/pull/92092.diff
4 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index adae7caf5917c..22fc627dc9141 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8539,6 +8539,29 @@ VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
return tryToWiden(Instr, Operands, VPBB);
}
+// EVL transform doesn't support backends where EVL diffs from RuntimeVF
+// in the second-to-last iteration.
+// Return false if the vector region has recipes relying on
+// RuntimeVF.
+static bool isCompatibleToEVLTransform(VPlan &Plan) {
+ auto HasAnyRuntimeVFUserInLoop = [](VPlan &Plan) -> bool {
+ for (auto &Phi : Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis())
+ if (isa<VPWidenIntOrFpInductionRecipe>(&Phi) ||
+ isa<VPWidenPointerInductionRecipe>(&Phi))
+ return true;
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+ vp_depth_first_deep(Plan.getVectorLoopRegion())))
+ for (VPRecipeBase &Recipe : *VPBB)
+ if (auto *VecPtrR = dyn_cast<VPVectorPointerRecipe>(&Recipe))
+ if (VecPtrR->isReverse())
+ return true;
+ return false;
+ };
+ if (HasAnyRuntimeVFUserInLoop(Plan))
+ return false;
+ return true;
+}
+
void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
ElementCount MaxVF) {
assert(OrigLoop->isInnermost() && "Inner loop expected.");
@@ -8553,8 +8576,12 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
*Plan, CM.getMinimalBitwidths(), PSE.getSE()->getContext());
VPlanTransforms::optimize(*Plan, *PSE.getSE());
// TODO: try to put it close to addActiveLaneMask().
- if (CM.foldTailWithEVL())
+ if (CM.foldTailWithEVL()) {
+ // Don't generate plan if the plan is not EVL-compatible
+ if (!isCompatibleToEVLTransform(*Plan))
+ break;
VPlanTransforms::addExplicitVectorLength(*Plan);
+ }
assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
VPlans.push_back(std::move(Plan));
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c74329a0bcc4a..94c342089fbdc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1576,6 +1576,7 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags {
void execute(VPTransformState &State) override;
+ bool isReverse() { return IsReverse; }
bool onlyFirstLaneUsed(const VPValue *Op) const override {
assert(is_contained(operands(), Op) &&
"Op must be an operand of the recipe");
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll b/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll
new file mode 100644
index 0000000000000..1b652fdda9e44
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll
@@ -0,0 +1,103 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=loop-vectorize -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
+
+; Check loops having VPWidenIntOrFpInductionRecipe
+define void @foo(ptr noalias %a, i64 %N) {
+; CHECK-LABEL: define void @foo(
+; CHECK-SAME: ptr noalias [[A:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
+; CHECK-NEXT: store i64 [[IV]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK: for.cond.cleanup:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
+ store i64 %iv, ptr %arrayidx, align 8
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond.not = icmp eq i64 %iv.next, %N
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+ ret void
+}
+
+; Check loops having VPWidenPointerInductionRecipe
+define void @foo2(ptr noalias %a, ptr noalias %b, i64 %N) {
+; CHECK-LABEL: define void @foo2(
+; CHECK-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[B]], [[ENTRY]] ]
+; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[ADDR]], i64 8
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
+; CHECK-NEXT: store ptr [[ADDR]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK: for.cond.cleanup:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %addr = phi ptr [ %incdec.ptr, %for.body ], [ %b, %entry ]
+ %incdec.ptr = getelementptr inbounds i8, ptr %addr, i64 8
+ %arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
+ store ptr %addr, ptr %arrayidx, align 8
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond.not = icmp eq i64 %iv.next, %N
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+ ret void
+}
+
+; Check loops having VPVectorPointerRecipe that access in reverse order
+define void @foo3(ptr noalias %a, i64 %N) {
+; CHECK-LABEL: define void @foo3(
+; CHECK-SAME: ptr noalias [[A:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[A]], [[ENTRY]] ]
+; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[ADDR]], i64 -4
+; CHECK-NEXT: store i64 [[N]], ptr [[ADDR]], align 8
+; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK: for.cond.cleanup:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %addr = phi ptr [ %incdec.ptr, %for.body ], [ %a, %entry ]
+ %incdec.ptr = getelementptr inbounds i8, ptr %addr, i64 -4
+ store i64 %N, ptr %addr, align 8
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond.not = icmp eq i64 %iv.next, %N
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+ ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
index ae01bdd371106..a52da79ee3963 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
@@ -12,66 +12,18 @@
define void @gather_scatter(ptr noalias %in, ptr noalias %out, ptr noalias %index, i64 %n) {
; IF-EVL-LABEL: @gather_scatter(
; IF-EVL-NEXT: entry:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; IF-EVL: vector.ph:
-; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 2
-; IF-EVL-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP7:%.*]] = mul i64 [[TMP6]], 2
-; IF-EVL-NEXT: [[TMP8:%.*]] = sub i64 [[TMP7]], 1
-; IF-EVL-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP8]]
-; IF-EVL-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
-; IF-EVL-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; IF-EVL-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP10:%.*]] = mul i64 [[TMP9]], 2
-; IF-EVL-NEXT: [[TMP11:%.*]] = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
-; IF-EVL-NEXT: [[TMP12:%.*]] = add <vscale x 2 x i64> [[TMP11]], zeroinitializer
-; IF-EVL-NEXT: [[TMP13:%.*]] = mul <vscale x 2 x i64> [[TMP12]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
-; IF-EVL-NEXT: [[INDUCTION:%.*]] = add <vscale x 2 x i64> zeroinitializer, [[TMP13]]
-; IF-EVL-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP15:%.*]] = mul i64 [[TMP14]], 2
-; IF-EVL-NEXT: [[TMP16:%.*]] = mul i64 1, [[TMP15]]
-; IF-EVL-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP16]], i64 0
-; IF-EVL-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
-; IF-EVL-NEXT: br label [[VECTOR_BODY:%.*]]
-; IF-EVL: vector.body:
-; IF-EVL-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT: [[EVL_BASED_IV:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT: [[VEC_IND:%.*]] = phi <vscale x 2 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT: [[TMP17:%.*]] = sub i64 [[N]], [[EVL_BASED_IV]]
-; IF-EVL-NEXT: [[TMP18:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[TMP17]], i32 2, i1 true)
-; IF-EVL-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], <vscale x 2 x i64> [[VEC_IND]]
-; IF-EVL-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT: [[TMP21:%.*]] = getelementptr inbounds float, ptr [[IN:%.*]], <vscale x 2 x i64> [[WIDE_MASKED_GATHER]]
-; IF-EVL-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 2 x float> @llvm.vp.gather.nxv2f32.nxv2p0(<vscale x 2 x ptr> align 4 [[TMP21]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = getelementptr inbounds float, ptr [[OUT:%.*]], <vscale x 2 x i64> [[WIDE_MASKED_GATHER]]
-; IF-EVL-NEXT: call void @llvm.vp.scatter.nxv2f32.nxv2p0(<vscale x 2 x float> [[WIDE_MASKED_GATHER2]], <vscale x 2 x ptr> align 4 [[TMP22]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT: [[TMP23:%.*]] = zext i32 [[TMP18]] to i64
-; IF-EVL-NEXT: [[INDEX_EVL_NEXT]] = add i64 [[TMP23]], [[EVL_BASED_IV]]
-; IF-EVL-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX1]], [[TMP10]]
-; IF-EVL-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
-; IF-EVL-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; IF-EVL-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
-; IF-EVL: middle.block:
-; IF-EVL-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
-; IF-EVL: scalar.ph:
-; IF-EVL-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; IF-EVL-NEXT: br label [[FOR_BODY:%.*]]
; IF-EVL: for.body:
-; IF-EVL-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
-; IF-EVL-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[INDEX]], i64 [[INDVARS_IV]]
-; IF-EVL-NEXT: [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX3]], align 8
-; IF-EVL-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, ptr [[IN]], i64 [[TMP25]]
-; IF-EVL-NEXT: [[TMP26:%.*]] = load float, ptr [[ARRAYIDX5]], align 4
-; IF-EVL-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, ptr [[OUT]], i64 [[TMP25]]
-; IF-EVL-NEXT: store float [[TMP26]], ptr [[ARRAYIDX7]], align 4
+; IF-EVL-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; IF-EVL-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], i64 [[INDVARS_IV]]
+; IF-EVL-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX3]], align 8
+; IF-EVL-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, ptr [[IN:%.*]], i64 [[TMP0]]
+; IF-EVL-NEXT: [[TMP1:%.*]] = load float, ptr [[ARRAYIDX5]], align 4
+; IF-EVL-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, ptr [[OUT:%.*]], i64 [[TMP0]]
+; IF-EVL-NEXT: store float [[TMP1]], ptr [[ARRAYIDX7]], align 4
; IF-EVL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; IF-EVL-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
-; IF-EVL-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; IF-EVL-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N:%.*]]
+; IF-EVL-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
; IF-EVL: for.end:
; IF-EVL-NEXT: ret void
;
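As an aside, here is a minimal standalone sketch (illustration only, not part of the patch) of the schedule difference the summary refers to: a greedy implementation keeps EVL equal to RuntimeVF (VLMax) until the final iteration, while an implementation that balances the last two iterations already deviates on the second-to-last one. The policy names and numbers are made up for this example.

```c++
#include <algorithm>
#include <cstdio>

// With TripCount = 20 and RuntimeVF (VLMax) = 16, a greedy schedule yields
// EVL = 16, 4; a schedule that balances the last two iterations yields
// EVL = 10, 10, so EVL != RuntimeVF already on the second-to-last iteration.
int main() {
  const unsigned VLMax = 16, TripCount = 20;

  // Greedy policy: EVL = min(remaining, VLMax).
  for (unsigned Rem = TripCount; Rem != 0;) {
    unsigned EVL = std::min(Rem, VLMax);
    std::printf("greedy EVL = %u\n", EVL);
    Rem -= EVL;
  }

  // Balanced policy: when VLMax < remaining < 2 * VLMax, split evenly.
  for (unsigned Rem = TripCount; Rem != 0;) {
    unsigned EVL =
        Rem > VLMax ? (Rem < 2 * VLMax ? (Rem + 1) / 2 : VLMax) : Rem;
    std::printf("balanced EVL = %u\n", EVL);
    Rem -= EVL;
  }
  return 0;
}
```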
A precommit test case to show vector loops generated from EVL transform - This is a precommit test for llvm#92092
LGTM, thanks!
Vector loops generated from the EVL transform may experience issues on architectures where EVL differs from RuntimeVF at the second-to-last iteration.
For the part above, I'd use the same wording as the commit in addExplicitVectorLength
This patch introduces a check before applying EVL transform. If any recipes in loop rely on RuntimeVF, the vectorizer stop to generate the plan.
the vectorizer stop to generate the plan -> the plan is discarded?
I found some loops that can't be vectorized, and I finally dug into this PR. :-)
Did you find any other cases with functional or performance issues if it bails out to a DataWithoutLaneMask-style mask instead of discarding the VPlan? Would it be better to bail out to the TailFoldingStyle::Data style, which getPreferredTailFoldingStyle prefers when the V extension is enabled? If it bails out to TailFoldingStyle::Data, then useActiveLaneMask should do some work in tryToBuildVPlanWithVPRecipes.
The problem I met is that there are some
If we bail out to DataWithoutLaneMask, it's still a kind of tail folding. Is it better than a scalar epilogue? As 'predicate-else-scalar-epilogue' says, create a scalar epilogue if tail folding fails. So the question is whether tail folding counts as having failed when the DataWithEVL style fails but DataWithoutLaneMask succeeds.
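For readers comparing the styles mentioned above, a rough scalar model of one tail-folded vector iteration (illustrative only; it does not reflect the vectorizer's actual recipes): a lane-mask style keeps all VLMax lanes and predicates the tail lanes off, while the EVL style tells the iteration up front how many lanes are active.

```c++
#include <algorithm>
#include <cstdio>

// One tail-folded iteration with VLMax = 4 and TripCount = 6:
// the lane-mask style computes a per-lane predicate; the EVL style computes
// an active-lane count instead.
int main() {
  const unsigned VLMax = 4, TripCount = 6;
  for (unsigned Base = 0; Base < TripCount; Base += VLMax) {
    bool Mask[VLMax];
    for (unsigned Lane = 0; Lane < VLMax; ++Lane)
      Mask[Lane] = Base + Lane < TripCount;            // lane-mask style
    unsigned EVL = std::min(TripCount - Base, VLMax);  // EVL style
    std::printf("base %u: mask = %d%d%d%d, EVL = %u\n", Base, Mask[0],
                Mask[1], Mask[2], Mask[3], EVL);
  }
  return 0;
}
```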
It depends on the two PRs
Enabling the vectorization causes issues on HW that implements
@arcbbb Actually I don't understand this; can you explain it to me?
Under that constraint, the increment of the widened IV (pointer or integer/fp) uses VLMax instead of VL, so it doesn't work on HW implementations which go (10, 10).
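A small scalar model of the mismatch described above (hypothetical code, not the actual recipes): if the widened IV is advanced by VLMax while the loop actually retires EVL elements, its lanes drift from the processed elements under a (10, 10) schedule; advancing it by EVL stays in sync.

```c++
#include <cstdio>

// Lane 0 of a widened integer IV should equal the index of the first element
// processed in the current iteration. Stepping it by RuntimeVF (VLMax) is only
// correct when every iteration except the last retires exactly VLMax elements;
// stepping it by EVL is correct for any schedule.
int main() {
  const unsigned VLMax = 16;
  const unsigned EVLs[] = {10, 10}; // hypothetical (10, 10) hardware schedule

  unsigned ProcessedBase = 0; // first element actually processed this iteration
  unsigned IVByVLMax = 0;     // widened IV stepped by VLMax
  unsigned IVByEVL = 0;       // widened IV stepped by EVL
  for (unsigned EVL : EVLs) {
    std::printf("processed from %u, IV(VLMax step) = %u, IV(EVL step) = %u\n",
                ProcessedBase, IVByVLMax, IVByEVL);
    ProcessedBase += EVL;
    IVByVLMax += VLMax; // drifts on the second iteration (16 vs 10)
    IVByEVL += EVL;     // stays in sync
  }
  return 0;
}
```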
@arcbbb Got it! Thanks!
So after those two enhancements, it will not discard the VPlan?
Yes, as long as EVL is applied properly.
The transform updates all users of inductions to work based on EVL, instead of the VF directly. At the moment, widened inductions cannot be updated, so bail out if the plan contains any.
This patch introduces a check before applying the EVL transform. If any recipes in the loop rely on RuntimeVF, the plan is discarded.