[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. #101641

Merged
merged 35 commits into from
Nov 6, 2024
Commits
49ca386
Remove the FIXME
Mel-Chen Jun 21, 2024
41823f0
VPlan Pattern match for Instruction::Select
Mel-Chen Jun 26, 2024
d4307d0
Init implement TU select
Mel-Chen Jul 10, 2024
453bc3b
Transform select(HeaderMask, LHS, RHS)
Mel-Chen Jul 10, 2024
5a4aba6
Add vplan test for cond reduction with basic block.
Mel-Chen Aug 1, 2024
7b21944
Revert "Add vplan test for cond reduction with basic block."
Mel-Chen Aug 2, 2024
8660beb
Sort code and comments.
Mel-Chen Aug 2, 2024
fd5a5de
Force to predicate a reduction operation when fold by EVL.
Mel-Chen Aug 12, 2024
6052fa8
Rebase and update test case.
Mel-Chen Aug 30, 2024
32556dd
Rebase and updated
Mel-Chen Oct 3, 2024
5636af8
Updated VPlanVerifier
Mel-Chen Oct 3, 2024
ca9e34a
Updated test case
Mel-Chen Oct 3, 2024
921931a
Updated VPlanAnalysis
Mel-Chen Oct 3, 2024
c6a389d
Move Ctx into transformRecipestoEVLRecipes
Mel-Chen Oct 7, 2024
a52d3de
Comment for all true condition. nfc
Mel-Chen Oct 7, 2024
9f4f2b8
Remove unnecessary code
Mel-Chen Oct 7, 2024
31c3e70
NEED_FIX: Requires cost of vp.merge in RISCV TTI
Mel-Chen Oct 14, 2024
efebb1d
Emit VPWidenIntrinsicRecipe
Mel-Chen Oct 14, 2024
7cce658
Update VPlanVerify
Mel-Chen Oct 14, 2024
d88b12b
Update test case
Mel-Chen Oct 14, 2024
36c857b
Revert "NEED_FIX: Requires cost of vp.merge in RISCV TTI"
Mel-Chen Oct 18, 2024
bc20d2a
Revert "Update VPlanVerify"
Mel-Chen Oct 18, 2024
ba74ffd
Rebase and update test cases
Mel-Chen Oct 18, 2024
02475a0
Revert "Updated VPlanVerifier"
Mel-Chen Oct 18, 2024
5b2ee05
Revert "Updated VPlanAnalysis"
Mel-Chen Oct 18, 2024
f35fbe6
Remove VPInstruction::MergeUntilPivot
Mel-Chen Oct 18, 2024
7445806
Remove irrelevant updates
Mel-Chen Oct 18, 2024
2f9c256
Rebase
Mel-Chen Oct 22, 2024
908250c
doc the code.
Mel-Chen Oct 22, 2024
d020ce0
drop the blank line
Mel-Chen Oct 25, 2024
012079e
Refine constructor of VPWidenIntrinsicRecipe
Mel-Chen Oct 28, 2024
f5ad37e
Refine the comments according to Florian's comment
Mel-Chen Oct 29, 2024
8d9daf6
Updated test case
Mel-Chen Oct 30, 2024
e8cb377
Rebase and update test case
Mel-Chen Nov 5, 2024
53d9ac6
Updated comment
Mel-Chen Nov 5, 2024
18 changes: 14 additions & 4 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1484,6 +1484,18 @@ class LoopVectorizationCostModel {
     return InLoopReductions.contains(Phi);
   }

+  /// Returns true if the predicated reduction select should be used to set the
+  /// incoming value for the reduction phi.
+  bool usePredicatedReductionSelect(unsigned Opcode, Type *PhiTy) const {
+    // Force to use predicated reduction select since the EVL of the
+    // second-to-last iteration might not be VF*UF.
+    if (foldTailWithEVL())
+      return true;
+    return PreferPredicatedReductionSelect ||
+           TTI.preferPredicatedReductionSelect(
+               Opcode, PhiTy, TargetTransformInfo::ReductionFlags());
+  }
+
   /// Estimate cost of an intrinsic call instruction CI if it were vectorized
   /// with factor VF. Return the cost of the instruction, including
   /// scalarization overhead if it's needed.
@@ -9453,10 +9465,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
                  cast<VPInstruction>(&U)->getOpcode() ==
                      VPInstruction::ComputeReductionResult;
         });
-    if (PreferPredicatedReductionSelect ||
-        TTI.preferPredicatedReductionSelect(
-            PhiR->getRecurrenceDescriptor().getOpcode(), PhiTy,
-            TargetTransformInfo::ReductionFlags()))
+    if (CM.usePredicatedReductionSelect(
+            PhiR->getRecurrenceDescriptor().getOpcode(), PhiTy))
Comment on lines +9468 to +9469
Contributor

Would it be sufficient to adjust the reduction phi recipe when introducing EVL recipes instead?

Contributor Author

@Mel-Chen Mel-Chen Oct 22, 2024

It can be adjusted in the EVL transformation (see patch 8a3982f).
However, I don't recommend this. Such an implementation is more complicated, especially since a future VPlan transformation may sink the non-predicated reduction select out of the vectorized loop.
Could you point out why you want to adjust it in the EVL transformation?

Contributor

At the moment, everything EVL related is applied during the transform that introduces EVL recipes; one potential issue is that we assume EVL is used here, but the transform may not apply.

I don't have a strong preference; doing it later does indeed seem to require some extra work.

Contributor Author

You've raised a good point.
Adjusting the reduction phi too early can indeed cause some issues. Fortunately, this issue is related to performance rather than correctness. We can proceed with this approach for now and address this performance issue in a later patch.

Contributor

Sounds good to me.

       PhiR->setOperand(1, NewExitingVPV);
   }

6 changes: 6 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1672,6 +1672,12 @@ class VPWidenIntrinsicRecipe : public VPRecipeWithIRFlags {
                            !Attrs.hasFnAttr(Attribute::WillReturn);
   }

+  VPWidenIntrinsicRecipe(Intrinsic::ID VectorIntrinsicID,
+                         std::initializer_list<VPValue *> CallArguments,
+                         Type *Ty, DebugLoc DL = {})
+      : VPWidenIntrinsicRecipe(VectorIntrinsicID,
+                               ArrayRef<VPValue *>(CallArguments), Ty, DL) {}
+
   ~VPWidenIntrinsicRecipe() override = default;

   VPWidenIntrinsicRecipe *clone() override {
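The new overload only forwards a braced operand list to the existing constructor. A minimal standalone sketch of that delegation pattern is shown here; `WidenCallSketch` and its `int` operands are hypothetical stand-ins for the recipe and its `VPValue *` operands, and the pointer-plus-count constructor is assumed in place of the `ArrayRef` one.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for a recipe that stores a copy of its operands.
class WidenCallSketch {
  std::vector<int> Operands;

public:
  // Primary constructor: copies operands from a contiguous range.
  WidenCallSketch(const int *Begin, std::size_t Count)
      : Operands(Begin, Begin + Count) {}

  // Convenience overload: delegates to the primary constructor so that
  // callers can pass a braced list directly.
  WidenCallSketch(std::initializer_list<int> Args)
      : WidenCallSketch(Args.begin(), Args.size()) {}

  std::size_t getNumOperands() const { return Operands.size(); }
  int getOperand(std::size_t I) const { return Operands[I]; }
};
```

In the sketch the operands are copied during construction, so the temporary array behind the braced list does not need to outlive the constructor call. With the overload in place, a call site can be written as `WidenCallSketch S{7, 8, 9, 10};` without building a container first.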
11 changes: 11 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -55,6 +55,17 @@ template <typename Class> struct bind_ty {
   }
 };

+/// Match a specified VPValue.
+struct specificval_ty {
+  const VPValue *Val;
+
+  specificval_ty(const VPValue *V) : Val(V) {}
+
+  bool match(VPValue *VPV) const { return VPV == Val; }
+};
+
+inline specificval_ty m_Specific(const VPValue *VPV) { return VPV; }
+
 /// Match a specified integer value or vector of all elements of that
 /// value. \p BitWidth optionally specifies the bitwidth the matched constant
 /// must have. If it is 0, the matched constant can have any bitwidth.
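To illustrate how a specific-value matcher composes with binding matchers, here is a small standalone sketch. Everything in it (`Value`, `SpecificMatcher`, `BindMatcher`, `SelectMatcher`, and the `mSpecific`/`mValue`/`mSelect` helpers) is a hypothetical stand-in written for this example; it only mirrors the shape of `m_Specific`, `m_VPValue` and `m_Select`, not the real VPlan classes.

```cpp
#include <cassert>

// Hypothetical stand-in for VPValue; a "select" has three operands.
struct Value {
  Value *Cond = nullptr;
  Value *TrueV = nullptr;
  Value *FalseV = nullptr;
  bool IsSelect = false;
};

// Matches exactly one value, compared by pointer identity.
struct SpecificMatcher {
  const Value *Val;
  bool match(Value *V) const { return V == Val; }
};

// Matches any value and records it in the bound slot.
struct BindMatcher {
  Value *&Slot;
  bool match(Value *V) const {
    Slot = V;
    return true;
  }
};

// Matches a select whose three operands satisfy the sub-matchers.
template <typename C, typename T, typename F> struct SelectMatcher {
  C Cond;
  T OnTrue;
  F OnFalse;
  bool match(Value *V) const {
    return V && V->IsSelect && Cond.match(V->Cond) &&
           OnTrue.match(V->TrueV) && OnFalse.match(V->FalseV);
  }
};

inline SpecificMatcher mSpecific(const Value *V) {
  assert(V && "expected a value to compare against");
  return {V};
}

inline BindMatcher mValue(Value *&Slot) { return {Slot}; }

template <typename C, typename T, typename F>
SelectMatcher<C, T, F> mSelect(C Cond, T OnTrue, F OnFalse) {
  return {Cond, OnTrue, OnFalse};
}
```

With these helpers, `mSelect(mSpecific(&Mask), mValue(LHS), mValue(RHS)).match(&Sel)` succeeds only when the condition of `Sel` is exactly `Mask`, and on success `LHS` and `RHS` point at the other two operands.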
32 changes: 22 additions & 10 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1442,8 +1442,11 @@ void VPlanTransforms::addActiveLaneMask(

 /// Replace recipes with their EVL variants.
 static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
+  using namespace llvm::VPlanPatternMatch;
+  Type *CanonicalIVType = Plan.getCanonicalIV()->getScalarType();
+  VPTypeAnalysis TypeInfo(CanonicalIVType);
+  LLVMContext &Ctx = CanonicalIVType->getContext();
   SmallVector<VPValue *> HeaderMasks = collectAllHeaderMasks(Plan);
-  VPTypeAnalysis TypeInfo(Plan.getCanonicalIV()->getScalarType());
   for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) {
     for (VPUser *U : collectUsersRecursively(HeaderMask)) {
       auto *CurRecipe = cast<VPRecipeBase>(U);
@@ -1480,7 +1483,23 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
                                                       TypeInfo.inferScalarType(Sel),
                                                       Sel->getDebugLoc());
                   })
-
+              .Case<VPInstruction>([&](VPInstruction *VPI) -> VPRecipeBase * {
+                VPValue *LHS, *RHS;
+                // Transform select with a header mask condition
+                //   select(header_mask, LHS, RHS)
+                // into vector predication merge.
+                //   vp.merge(all-true, LHS, RHS, EVL)
+                if (!match(VPI, m_Select(m_Specific(HeaderMask), m_VPValue(LHS),
+                                         m_VPValue(RHS))))
+                  return nullptr;
+                // Use all true as the condition because this transformation is
+                // limited to selects whose condition is a header mask.
+                VPValue *AllTrue =
+                    Plan.getOrAddLiveIn(ConstantInt::getTrue(Ctx));
+                return new VPWidenIntrinsicRecipe(
+                    Intrinsic::vp_merge, {AllTrue, LHS, RHS, &EVL},
+                    TypeInfo.inferScalarType(LHS), VPI->getDebugLoc());
+              })
               .Default([&](VPRecipeBase *R) { return nullptr; });

       if (!NewRecipe)
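The rewrite relies on select(header_mask, LHS, RHS) and vp.merge(all-true, LHS, RHS, EVL) choosing the same lanes. The standalone sketch below models both on plain arrays to make that equivalence concrete. It assumes a fixed width of four lanes and a header mask of the form "lane index < EVL"; `selectLanes`, `vpMerge`, `headerMask` and `allTrueMask` are hypothetical helpers for this example, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // assumed vector width for this sketch

using Vec = std::array<int, VF>;
using Mask = std::array<bool, VF>;

// select(Cond, OnTrue, OnFalse): plain lane-wise select.
Vec selectLanes(const Mask &Cond, const Vec &OnTrue, const Vec &OnFalse) {
  Vec Result{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    Result[Lane] = Cond[Lane] ? OnTrue[Lane] : OnFalse[Lane];
  return Result;
}

// vp.merge(Cond, OnTrue, OnFalse, Pivot): lanes below Pivot behave like a
// select; lanes at or above Pivot always take OnFalse.
Vec vpMerge(const Mask &Cond, const Vec &OnTrue, const Vec &OnFalse,
            std::size_t Pivot) {
  assert(Pivot <= VF && "pivot must not exceed the vector width");
  Vec Result{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    Result[Lane] = (Lane < Pivot && Cond[Lane]) ? OnTrue[Lane] : OnFalse[Lane];
  return Result;
}

// Assumed header mask for an iteration with EVL active lanes.
Mask headerMask(std::size_t EVL) {
  Mask M{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    M[Lane] = Lane < EVL;
  return M;
}

Mask allTrueMask() {
  Mask M{};
  M.fill(true);
  return M;
}
```

Under these assumptions, `selectLanes(headerMask(EVL), A, B)` and `vpMerge(allTrueMask(), A, B, EVL)` produce the same vector for every EVL from 0 to VF, which is why an all-true condition is enough once EVL is passed as the pivot.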
@@ -1553,14 +1572,7 @@ bool VPlanTransforms::tryAddExplicitVectorLength(
         return isa<VPWidenIntOrFpInductionRecipe, VPWidenPointerInductionRecipe>(
             &Phi);
       });
-  // FIXME: Remove this once we can transform (select header_mask, true_value,
-  // false_value) into vp.merge.
-  bool ContainsOutloopReductions =
-      any_of(Header->phis(), [&](VPRecipeBase &Phi) {
-        auto *R = dyn_cast<VPReductionPHIRecipe>(&Phi);
-        return R && !R->isInLoop();
-      });
-  if (ContainsWidenInductions || ContainsOutloopReductions)
+  if (ContainsWidenInductions)
     return false;

   auto *CanonicalIVPHI = Plan.getCanonicalIV();
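To see why an out-of-loop reduction needs the merge once the tail is folded with EVL, the following scalar simulation may help. It is a sketch under stated assumptions: the EVL of each iteration is modelled as min(VF, remaining elements), which is a simplification, and lanes at or beyond EVL are filled with an arbitrary `JunkLane` value to stand in for lanes that carry no meaningful data. `sumWithEVL` and `JunkLane` are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arbitrary value modelling the contents of lanes at or beyond EVL.
constexpr int JunkLane = 1000;

// Sums Data using a VF-lane accumulator that is reduced once after the loop
// (an out-of-loop reduction). When MergeWithEVL is true, the accumulator
// update keeps the old value in lanes >= EVL, as vp.merge would.
int sumWithEVL(const std::vector<int> &Data, std::size_t VF,
               bool MergeWithEVL) {
  assert(VF > 0 && "vector factor must be positive");
  std::vector<int> Acc(VF, 0);
  std::size_t I = 0;
  while (I < Data.size()) {
    // Assumed EVL policy: as many lanes as remain, capped at VF.
    const std::size_t EVL = std::min(VF, Data.size() - I);
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      const bool Active = Lane < EVL;
      const int Loaded = Active ? Data[I + Lane] : JunkLane;
      const int Updated = Acc[Lane] + Loaded;
      // Without the merge, inactive lanes are overwritten as well.
      if (Active || !MergeWithEVL)
        Acc[Lane] = Updated;
    }
    I += EVL;
  }
  // Final horizontal reduction outside the loop.
  return std::accumulate(Acc.begin(), Acc.end(), 0);
}
```

For ten elements and VF = 4, the last iteration has EVL = 2. With the merge the two inactive accumulator lanes keep their previous partial sums and the final reduction is correct; without it they absorb the junk values and the result is wrong.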