[LV, VPlan] Check if plan is compatible to EVL transform #92092

Merged: 5 commits merged into llvm:main from check-evl-compat on May 25, 2024

Conversation

arcbbb (Contributor) commented May 14, 2024

The transform updates all users of inductions to work based on EVL instead of the VF directly. At the moment, widened inductions cannot be updated, so bail out if the plan contains any.
This patch introduces a check before applying the EVL transform: if any recipe in the loop relies on RuntimeVF, the plan is discarded.

llvmbot (Member) commented May 14, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

Vector loops generated from the EVL transform may experience issues on architectures where EVL differs from RuntimeVF at the second-to-last iteration.

This patch introduces a check before applying the EVL transform. If any recipe in the loop relies on RuntimeVF, the plan is discarded.


Full diff: https://github.com/llvm/llvm-project/pull/92092.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+28-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (+103)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll (+9-57)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index adae7caf5917c..22fc627dc9141 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8539,6 +8539,29 @@ VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
   return tryToWiden(Instr, Operands, VPBB);
 }
 
+// EVL transform doesn't support backends where EVL diffs from RuntimeVF
+// in the second-to-last iteration.
+// Return false if the vector region has recipes relying on
+// RuntimeVF.
+static bool isCompatibleToEVLTransform(VPlan &Plan) {
+  auto HasAnyRuntimeVFUserInLoop = [](VPlan &Plan) -> bool {
+    for (auto &Phi : Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis())
+      if (isa<VPWidenIntOrFpInductionRecipe>(&Phi) ||
+          isa<VPWidenPointerInductionRecipe>(&Phi))
+        return true;
+    for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+             vp_depth_first_deep(Plan.getVectorLoopRegion())))
+      for (VPRecipeBase &Recipe : *VPBB)
+        if (auto *VecPtrR = dyn_cast<VPVectorPointerRecipe>(&Recipe))
+          if (VecPtrR->isReverse())
+            return true;
+    return false;
+  };
+  if (HasAnyRuntimeVFUserInLoop(Plan))
+    return false;
+  return true;
+}
+
 void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
                                                         ElementCount MaxVF) {
   assert(OrigLoop->isInnermost() && "Inner loop expected.");
@@ -8553,8 +8576,12 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
             *Plan, CM.getMinimalBitwidths(), PSE.getSE()->getContext());
       VPlanTransforms::optimize(*Plan, *PSE.getSE());
       // TODO: try to put it close to addActiveLaneMask().
-      if (CM.foldTailWithEVL())
+      if (CM.foldTailWithEVL()) {
+        // Don't generate plan if the plan is not EVL-compatible
+        if (!isCompatibleToEVLTransform(*Plan))
+          break;
         VPlanTransforms::addExplicitVectorLength(*Plan);
+      }
       assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
       VPlans.push_back(std::move(Plan));
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c74329a0bcc4a..94c342089fbdc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1576,6 +1576,7 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags {
 
   void execute(VPTransformState &State) override;
 
+  bool isReverse() { return IsReverse; }
   bool onlyFirstLaneUsed(const VPValue *Op) const override {
     assert(is_contained(operands(), Op) &&
            "Op must be an operand of the recipe");
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll b/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll
new file mode 100644
index 0000000000000..1b652fdda9e44
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll
@@ -0,0 +1,103 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=loop-vectorize -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
+
+; Check loops having VPWidenIntOrFpInductionRecipe
+define void @foo(ptr noalias %a, i64 %N) {
+; CHECK-LABEL: define void @foo(
+; CHECK-SAME: ptr noalias [[A:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
+; CHECK-NEXT:    store i64 [[IV]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
+  store i64 %iv, ptr %arrayidx, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
+
+; Check loops having VPWidenPointerInductionRecipe
+define void @foo2(ptr noalias %a, ptr noalias %b, i64 %N) {
+; CHECK-LABEL: define void @foo2(
+; CHECK-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[B]], [[ENTRY]] ]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[ADDR]], i64 8
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
+; CHECK-NEXT:    store ptr [[ADDR]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %addr = phi ptr [ %incdec.ptr, %for.body ], [ %b, %entry ]
+  %incdec.ptr = getelementptr inbounds i8, ptr %addr, i64 8
+  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
+  store ptr %addr, ptr %arrayidx, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
+
+; Check loops having VPVectorPointerRecipe that access in reverse order
+define void @foo3(ptr noalias %a, i64 %N) {
+; CHECK-LABEL: define void @foo3(
+; CHECK-SAME: ptr noalias [[A:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[A]], [[ENTRY]] ]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[ADDR]], i64 -4
+; CHECK-NEXT:    store i64 [[N]], ptr [[ADDR]], align 8
+; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %addr = phi ptr [ %incdec.ptr, %for.body ], [ %a, %entry ]
+  %incdec.ptr = getelementptr inbounds i8, ptr %addr, i64 -4
+  store i64 %N, ptr %addr, align 8
+  %iv.next = add nuw nsw i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
index ae01bdd371106..a52da79ee3963 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-gather-scatter.ll
@@ -12,66 +12,18 @@
 define void @gather_scatter(ptr noalias %in, ptr noalias %out, ptr noalias %index, i64 %n) {
 ; IF-EVL-LABEL: @gather_scatter(
 ; IF-EVL-NEXT:  entry:
-; IF-EVL-NEXT:    [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT:    [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT:    br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; IF-EVL:       vector.ph:
-; IF-EVL-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 2
-; IF-EVL-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP6]], 2
-; IF-EVL-NEXT:    [[TMP8:%.*]] = sub i64 [[TMP7]], 1
-; IF-EVL-NEXT:    [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP8]]
-; IF-EVL-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
-; IF-EVL-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; IF-EVL-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 2
-; IF-EVL-NEXT:    [[TMP11:%.*]] = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
-; IF-EVL-NEXT:    [[TMP12:%.*]] = add <vscale x 2 x i64> [[TMP11]], zeroinitializer
-; IF-EVL-NEXT:    [[TMP13:%.*]] = mul <vscale x 2 x i64> [[TMP12]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
-; IF-EVL-NEXT:    [[INDUCTION:%.*]] = add <vscale x 2 x i64> zeroinitializer, [[TMP13]]
-; IF-EVL-NEXT:    [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT:    [[TMP15:%.*]] = mul i64 [[TMP14]], 2
-; IF-EVL-NEXT:    [[TMP16:%.*]] = mul i64 1, [[TMP15]]
-; IF-EVL-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP16]], i64 0
-; IF-EVL-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
-; IF-EVL-NEXT:    br label [[VECTOR_BODY:%.*]]
-; IF-EVL:       vector.body:
-; IF-EVL-NEXT:    [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT:    [[VEC_IND:%.*]] = phi <vscale x 2 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
-; IF-EVL-NEXT:    [[TMP17:%.*]] = sub i64 [[N]], [[EVL_BASED_IV]]
-; IF-EVL-NEXT:    [[TMP18:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[TMP17]], i32 2, i1 true)
-; IF-EVL-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], <vscale x 2 x i64> [[VEC_IND]]
-; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 2 x i64> @llvm.vp.gather.nxv2i64.nxv2p0(<vscale x 2 x ptr> align 8 [[TMP20]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT:    [[TMP21:%.*]] = getelementptr inbounds float, ptr [[IN:%.*]], <vscale x 2 x i64> [[WIDE_MASKED_GATHER]]
-; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 2 x float> @llvm.vp.gather.nxv2f32.nxv2p0(<vscale x 2 x ptr> align 4 [[TMP21]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT:    [[TMP22:%.*]] = getelementptr inbounds float, ptr [[OUT:%.*]], <vscale x 2 x i64> [[WIDE_MASKED_GATHER]]
-; IF-EVL-NEXT:    call void @llvm.vp.scatter.nxv2f32.nxv2p0(<vscale x 2 x float> [[WIDE_MASKED_GATHER2]], <vscale x 2 x ptr> align 4 [[TMP22]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP18]])
-; IF-EVL-NEXT:    [[TMP23:%.*]] = zext i32 [[TMP18]] to i64
-; IF-EVL-NEXT:    [[INDEX_EVL_NEXT]] = add i64 [[TMP23]], [[EVL_BASED_IV]]
-; IF-EVL-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX1]], [[TMP10]]
-; IF-EVL-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
-; IF-EVL-NEXT:    [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; IF-EVL-NEXT:    br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
-; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
-; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
-; IF-EVL-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
-; IF-EVL-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[INDEX]], i64 [[INDVARS_IV]]
-; IF-EVL-NEXT:    [[TMP25:%.*]] = load i64, ptr [[ARRAYIDX3]], align 8
-; IF-EVL-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds float, ptr [[IN]], i64 [[TMP25]]
-; IF-EVL-NEXT:    [[TMP26:%.*]] = load float, ptr [[ARRAYIDX5]], align 4
-; IF-EVL-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, ptr [[OUT]], i64 [[TMP25]]
-; IF-EVL-NEXT:    store float [[TMP26]], ptr [[ARRAYIDX7]], align 4
+; IF-EVL-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; IF-EVL-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[INDEX:%.*]], i64 [[INDVARS_IV]]
+; IF-EVL-NEXT:    [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX3]], align 8
+; IF-EVL-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds float, ptr [[IN:%.*]], i64 [[TMP0]]
+; IF-EVL-NEXT:    [[TMP1:%.*]] = load float, ptr [[ARRAYIDX5]], align 4
+; IF-EVL-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, ptr [[OUT:%.*]], i64 [[TMP0]]
+; IF-EVL-NEXT:    store float [[TMP1]], ptr [[ARRAYIDX7]], align 4
 ; IF-EVL-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; IF-EVL-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
-; IF-EVL-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; IF-EVL-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N:%.*]]
+; IF-EVL-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; IF-EVL:       for.end:
 ; IF-EVL-NEXT:    ret void
 ;

arcbbb added a commit to arcbbb/llvm-project that referenced this pull request May 15, 2024
A precommit test case to show vector loops generated from EVL transform
- This is a precommit test for llvm#92092
arcbbb force-pushed the check-evl-compat branch from b393ff1 to 07a62be on May 15, 2024 07:33
fhahn (Contributor) left a comment

LGTM, thanks!

> Vector loops generated from the EVL transform may experience issues on architectures where EVL differs from RuntimeVF at the second-to-last iteration.

For the part above, I'd use the same wording as the commit that added addExplicitVectorLength.

> This patch introduces a check before applying EVL transform. If any recipes in loop rely on RuntimeVF, the vectorizer stop to generate the plan.

the vectorizer stop to generate the plan -> the plan is discarded?

arcbbb added a commit that referenced this pull request May 24, 2024
A precommit test case to show vector loops generated from EVL transform
- This is a precommit test for
#92092
arcbbb added 5 commits May 24, 2024 08:26
Vector loops generated from the EVL transform may experience issues on
architectures where EVL differs from RuntimeVF at the second-to-last
iteration.

This patch introduces a check before applying the EVL transform. If any recipes in
the loop rely on RuntimeVF, the plan is discarded.
arcbbb force-pushed the check-evl-compat branch from 64e8908 to 3a85956 on May 24, 2024 15:37
arcbbb merged commit 0338c55 into llvm:main on May 25, 2024
7 checks passed
arcbbb deleted the check-evl-compat branch on May 27, 2024 06:07
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request May 27, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 5, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 10, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 14, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 19, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 20, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 24, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jun 28, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jul 8, 2024
Mel-Chen added a commit to Mel-Chen/llvm-project that referenced this pull request Jul 15, 2024
wangpc-pp (Contributor) commented

I found that some loops can't be vectorized, and I finally traced it to this PR. :-)
If I simply comment out the lines added in this PR, the loops vectorize correctly. What's the status now? Is there anything I missed?

zixuan-wu (Contributor) commented Mar 20, 2025

> I found that some loops can't be vectorized, and I finally traced it to this PR. :-) If I simply comment out the lines added in this PR, the loops vectorize correctly. What's the status now? Is there anything I missed?

Have you found any other cases with functional or performance issues if it bails out to a DataWithoutLaneMask-style mask instead of discarding the VPlan? Would it be better to bail out to TailFoldingStyle::Data, which is what getPreferredTailFoldingStyle prefers when the V extension is enabled? If it bails out to TailFoldingStyle::Data, then useActiveLaneMask would need some work in tryToBuildVPlanWithVPRecipes.

wangpc-pp (Contributor) commented

> Have you found any other cases with functional or performance issues if it bails out to a DataWithoutLaneMask-style mask instead of discarding the VPlan? Would it be better to bail out to TailFoldingStyle::Data, which is what getPreferredTailFoldingStyle prefers when the V extension is enabled?

The problem I hit is that there are some VPWidenPointerInductionRecipes, which can't be handled when using EVL tail folding. I don't see any problem when using Data or DataWithoutLaneMask.
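
For anyone reproducing this comparison, a sketch of forcing the other styles from the command line, reusing the flags from the RUN lines in the tests above; data and data-without-lane-mask are assumed here to be the matching values of the -force-tail-folding-style option, and loop.ll is a placeholder input:

opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v \
    -prefer-predicate-over-epilogue=predicate-dont-vectorize \
    -force-tail-folding-style=data-without-lane-mask -S < loop.ll

Swapping data-with-evl for data or data-without-lane-mask in an existing RUN line is the quickest way to compare the generated vector bodies.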

zixuan-wu (Contributor) commented

> The problem I hit is that there are some VPWidenPointerInductionRecipes, which can't be handled when using EVL tail folding. I don't see any problem when using Data or DataWithoutLaneMask.

Bailing out to DataWithoutLaneMask is still a kind of tail folding. Is it better than a scalar epilogue? As 'predicate-else-scalar-epilogue' describes, a scalar epilogue is created if tail folding fails. So the question is whether tail folding should count as failed when the DataWithEVL style fails but DataWithoutLaneMask would succeed.

arcbbb (Contributor, Author) commented Mar 21, 2025

> I found that some loops can't be vectorized, and I finally traced it to this PR. :-) If I simply comment out the lines added in this PR, the loops vectorize correctly. What's the status now? Is there anything I missed?

It depends on the two PRs

Enabling the vectorization causes issues on hardware that implements ceil(AVL / 2) ≤ vl ≤ VLMAX when AVL < (2 * VLMAX), as described in https://github.com/riscvarchive/riscv-v-spec/blob/master/v-spec.adoc#sec-vector-config:

6.3 Constraints on Setting vl
The vset{i}vl{i} instructions first set VLMAX according to their vtype argument, then set vl obeying the following constraints:

vl = AVL if AVL ≤ VLMAX

ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)

vl = VLMAX if AVL ≥ (2 * VLMAX)
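
To make the constraint concrete, here is a minimal C++ model of one spec-conforming vsetvl policy. The even-split behavior is a hypothetical implementation choice (the numbers match the follow-up example below), not how any particular core works:

#include <cstdint>
#include <cstdio>

// Hypothetical vsetvl policy: any vl in [ceil(AVL/2), VLMAX] is legal
// when VLMAX <= AVL < 2 * VLMAX; this model picks an even split there.
uint64_t vsetvlEvenSplit(uint64_t AVL, uint64_t VLMAX) {
  if (AVL <= VLMAX)
    return AVL;            // vl = AVL if AVL <= VLMAX
  if (AVL < 2 * VLMAX)
    return (AVL + 1) / 2;  // ceil(AVL / 2) <= vl <= VLMAX
  return VLMAX;            // vl = VLMAX if AVL >= 2 * VLMAX
}

int main() {
  // With VLMAX = 16 and AVL = 20, a greedy core may return vl = 16
  // (then 4), while this even-split policy returns 10 (then 10).
  printf("%llu\n", (unsigned long long)vsetvlEvenSplit(20, 16)); // 10
  printf("%llu\n", (unsigned long long)vsetvlEvenSplit(10, 16)); // 10
}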

wangpc-pp (Contributor) commented

> Enabling the vectorization causes issues on hardware that implements ceil(AVL / 2) ≤ vl ≤ VLMAX when AVL < (2 * VLMAX)

@arcbbb Actually I don't understand this, can you explain it to me?

arcbbb (Contributor, Author) commented Mar 21, 2025

>> Enabling the vectorization causes issues on hardware that implements ceil(AVL / 2) ≤ vl ≤ VLMAX when AVL < (2 * VLMAX)
>
> @arcbbb Actually I don't understand this, can you explain it to me?

Under the constraint ceil(AVL / 2) ≤ VL ≤ VLMAX when AVL < 2 * VLMAX:
assuming VLMAX = 16 and AVL = 20, the loop requires 2 iterations.
The VL values returned by vsetvl in those iterations could be either (16, 4) or (10, 10);
this is implementation-defined in hardware.

And the increment of the widened IV (pointer, integer, or FP) uses VLMAX instead of VL, so it doesn't work on HW implementations that go (10, 10), as the diff change shows:

; CHECK-NEXT:    [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT:    [[TMP15:%.*]] = mul i64 [[TMP14]], 2
; CHECK-NEXT:    [[TMP16:%.*]] = mul i64 1, [[TMP15]]
; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[TMP16]], i64 0
; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
; CHECK:       vector.body:
; .....
; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[DOTSPLAT]]
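
To see the failure mode end to end, here is a minimal scalar simulation (assuming the hypothetical even-split policy sketched earlier; the variable names are made up) of how the EVL-based IV and a widened IV stepped by RuntimeVF drift apart:

#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t VLMAX = 16; // runtime VF (vscale x 2 lanes in the diff above)
  const uint64_t N = 20;     // trip count, matching the example above
  uint64_t EVLBasedIV = 0;   // advanced by the VL actually returned
  uint64_t WidenIVLane0 = 0; // lane 0 of VEC_IND, advanced by VLMAX (the bug)

  while (EVLBasedIV < N) {
    uint64_t AVL = N - EVLBasedIV;
    // Even-split policy: yields VL = 10, 10 instead of 16, 4.
    uint64_t VL = AVL <= VLMAX       ? AVL
                  : AVL < 2 * VLMAX  ? (AVL + 1) / 2
                                     : VLMAX;
    printf("EVL-based IV = %2llu, widened IV lane 0 = %2llu%s\n",
           (unsigned long long)EVLBasedIV, (unsigned long long)WidenIVLane0,
           EVLBasedIV == WidenIVLane0 ? "" : "  <-- out of sync");
    EVLBasedIV += VL;      // memory recipes advance by EVL: correct
    WidenIVLane0 += VLMAX; // VEC_IND_NEXT adds the VLMAX splat: stale lanes
  }
}

In the second iteration the widened IV's lanes start at 16 while the remaining elements start at 10, which is exactly the mismatch the compatibility check guards against.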

wangpc-pp (Contributor) commented

@arcbbb Get it! Thanks!

zixuan-wu (Contributor) commented

So after those two enhancements, will it no longer discard the VPlan?

arcbbb (Contributor, Author) commented Mar 21, 2025

> So after those two enhancements, will it no longer discard the VPlan?

Yes, as long as EVL is applied properly.
