-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LV] Account for vp_merge in out of loop EVL reductions in legacy cost model #115903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LV] Account for vp_merge in out of loop EVL reductions in legacy cost model #115903
Conversation
…t model In llvm#101641, support for out of loop reductions with EVL tail folding was added by transforming selects to vp_merges in transformRecipestoEVLRecipes. Whilst the select was previously free, the vp_merge wasn't and incurs a cost on RISC-V with the VPlan cost model. But this diverged from the legacy cost model and caused the "VPlan cost model and legacy cost model disagreed" assertion to trigger when building 502.gcc_r from SPEC CPU 2017. Neither the select nor vp_merge recipes from the VPlan exist in the underlying instructions, so I thought it would make the most sense to fix this by adding the cost to the underlying phi instruction in getInstructionCost. It's worth noting that on RISC-V this vp_merge won't actually generate any instructions because the mask is all true, and will be folded away. So we should update the cost model at some point to reflect that.
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesIn #101641, support for out of loop reductions with EVL tail folding was added by transforming selects to vp_merges in transformRecipestoEVLRecipes. Whilst the select was previously free, the vp_merge wasn't and incurs a cost on RISC-V with the VPlan cost model. But this diverged from the legacy cost model and caused the "VPlan cost model and legacy cost model disagreed" assertion to trigger when building 502.gcc_r from SPEC CPU 2017. Neither the select nor vp_merge recipes from the VPlan exist in the underlying instructions, so I thought it would make the most sense to fix this by adding the cost to the underlying phi instruction in getInstructionCost. It's worth noting that on RISC-V this vp_merge won't actually generate any instructions because the mask is all true, and will be folded away. So we should update the cost model at some point to reflect that. Full diff: https://github.com/llvm/llvm-project/pull/115903.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 1ebc62f9843905..5c3afbe5214feb 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6567,6 +6567,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
CmpInst::BAD_ICMP_PREDICATE, CostKind);
}
+ // When tail folding with EVL, if the phi is part of an out of loop reduction
+ // then it will be transformed into a wide vp_merge.
+ if (VF.isVector() && foldTailWithEVL() &&
+ Legal->getReductionVars().contains(Phi) && !isInLoopReduction(Phi)) {
+ IntrinsicCostAttributes ICA(
+ Intrinsic::vp_merge, ToVectorTy(Phi->getType(), VF),
+ {ToVectorTy(Type::getInt1Ty(Phi->getContext()), VF)});
+ return TTI.getIntrinsicInstrCost(ICA, CostKind);
+ }
+
return TTI.getCFInstrCost(Instruction::PHI, CostKind);
}
case Instruction::UDiv:
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction-cost.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction-cost.ll
new file mode 100644
index 00000000000000..6d20731d2502b4
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction-cost.ll
@@ -0,0 +1,34 @@
+; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize \
+; RUN: -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s 2>&1 | FileCheck %s \
+; RUN: --check-prefix=EVL
+
+; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s 2>&1 | FileCheck %s \
+; RUN: --check-prefix=NO-EVL
+
+; EVL: Cost of 2 for VF vscale x 4: WIDEN-INTRINSIC vp<%{{.+}}> = call llvm.vp.merge(ir<true>, ir<%add>, ir<%rdx>, vp<%{{.+}}>)
+; EVL: LV: Found an estimated cost of 2 for VF vscale x 4 For instruction: %rdx = phi i32 [ %start, %entry ], [ %add, %for.body ]
+
+; NO-EVL: Cost of 0 for VF vscale x 4: EMIT vp<%{{.+}}> = select vp<%active.lane.mask>, ir<%add>, ir<%rdx>
+; NO-EVL: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: %rdx = phi i32 [ %start, %entry ], [ %add, %for.body ]
+
+define i32 @add(ptr %a, i64 %n, i32 %start) {
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %rdx = phi i32 [ %start, %entry ], [ %add, %for.body ]
+ %arrayidx = getelementptr inbounds i32, ptr %a, i64 %iv
+ %0 = load i32, ptr %arrayidx, align 4
+ %add = add nsw i32 %0, %rdx
+ %iv.next = add nuw nsw i64 %iv, 1
+ %exitcond.not = icmp eq i64 %iv.next, %n
+ br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end:
+ ret i32 %add
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Please make sure the formatting is fixed before landing
|
||
; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize \ | ||
; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \ | ||
; RUN: -mtriple=riscv64 -mattr=+v -S < %s 2>&1 | FileCheck %s \ | ||
; RUN: --check-prefix=NO-EVL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just testing the EVL cost here, no need to have this second run line?
%exitcond.not = icmp eq i64 %iv.next, %n | ||
br i1 %exitcond.not, label %for.end, label %for.body | ||
|
||
for.end: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for.end: | |
exit: |
slightly shorter name
entry: | ||
br label %for.body | ||
|
||
for.body: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for.body: | |
loop: |
nit: slightly shorter name
In #101641, support for out of loop reductions with EVL tail folding was added by transforming selects to vp_merges in transformRecipestoEVLRecipes.
Whilst the select was previously free, the vp_merge wasn't and incurs a cost on RISC-V with the VPlan cost model. But this diverged from the legacy cost model and caused the "VPlan cost model and legacy cost model disagreed" assertion to trigger when building 502.gcc_r from SPEC CPU 2017.
Neither the select nor vp_merge recipes from the VPlan exist in the underlying instructions, so I thought it would make the most sense to fix this by adding the cost to the underlying phi instruction in getInstructionCost.
It's worth noting that on RISC-V this vp_merge won't actually generate any instructions because the mask is all true, and will be folded away. So we should update the cost model at some point to reflect that.