Skip to content

[LV] Improve AnyOf reduction codegen. #78304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 0 additions & 9 deletions llvm/include/llvm/Transforms/Utils/LoopUtils.h
Original file line number Diff line number Diff line change
Expand Up @@ -372,15 +372,6 @@ RecurKind getMinMaxReductionRecurKind(Intrinsic::ID RdxID);
/// Returns the comparison predicate used when expanding a min/max reduction.
CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);

/// See RecurrenceDescriptor::isAnyOfPattern for a description of the pattern we
/// are trying to match. In this pattern, we are only ever selecting between two
/// values: 1) an initial start value \p StartVal of the reduction PHI, and 2) a
/// loop invariant value. If any of lane value in \p Left, \p Right is not equal
/// to \p StartVal, select the loop invariant value. This is done by selecting
/// \p Right iff \p Left is equal to \p StartVal.
Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
Value *Left, Value *Right);

/// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
/// The Builder's fast-math-flags must be set to propagate the expected values.
Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
Expand Down
24 changes: 6 additions & 18 deletions llvm/lib/Transforms/Utils/LoopUtils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1034,15 +1034,6 @@ CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
}
}

Value *llvm::createAnyOfOp(IRBuilderBase &Builder, Value *StartVal,
RecurKind RK, Value *Left, Value *Right) {
if (auto VTy = dyn_cast<VectorType>(Left->getType()))
StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
}

Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
Value *Right) {
Type *Ty = Left->getType();
Expand Down Expand Up @@ -1151,16 +1142,13 @@ Value *llvm::createAnyOfTargetReduction(IRBuilderBase &Builder, Value *Src,
NewVal = SI->getTrueValue();
}

// Create a splat vector with the new value and compare this to the vector
// we want to reduce.
ElementCount EC = cast<VectorType>(Src->getType())->getElementCount();
Value *Right = Builder.CreateVectorSplat(EC, InitVal);
Value *Cmp =
Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");

// If any predicate is true it means that we want to select the new value.
Cmp = Builder.CreateOrReduce(Cmp);
return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
Value *AnyOf =
Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
// The compares in the loop may yield poison, which propagates through the
// bitwise ORs. Freeze it here before the condition is used.
AnyOf = Builder.CreateFreeze(AnyOf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further refactoring thought: the AnyOf boolean reduction completes with the CreateOrReduce(), and freezing it. Similar to a plain "result |= value[i]" OR reduction, on i1 values, where freezing may also be needed? The subsequent select deserves a separate recipe, which post-processes the result of the AnyOf reduction and also depends on the "IfAny" and "Else" (or "Start" and "Other") Live-In VP/Values directly, rather than looking for them here and now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to a plain "result |= value[i]" OR reduction,

Freeze won't be needed in that case; if there's already a binary OR in the input, poison from the compare gets already propagated. It is only needed when converting from the select form (which doesn't propagate poison from the condition to its result)

Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.

return Builder.CreateSelect(AnyOf, NewVal, InitVal, "rdx.select");
}

Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ class VPBuilder {
public:
VPBuilder() = default;
VPBuilder(VPBasicBlock *InsertBB) { setInsertPoint(InsertBB); }
VPBuilder(VPRecipeBase *InsertPt) { setInsertPoint(InsertPt); }

/// Clear the insertion point: created instructions will not be inserted into
/// a block.
Expand Down
46 changes: 43 additions & 3 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7405,7 +7405,6 @@ static void createAndCollectMergePhiForReduction(
auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();

TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
Value *FinalValue =
State.get(RedResult, VPIteration(State.UF - 1, VPLane::getFirstLane()));
auto *ResumePhi =
Expand All @@ -7430,7 +7429,7 @@ static void createAndCollectMergePhiForReduction(
BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
Incoming);
else
BCBlockPhi->addIncoming(ReductionStartValue, Incoming);
BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
}

auto *OrigPhi = cast<PHINode>(PhiR->getUnderlyingValue());
Expand Down Expand Up @@ -8854,6 +8853,10 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// A ComputeReductionResult recipe is added to the middle block, also for
// in-loop reductions which compute their result in-loop, because generating
// the subsequent bc.merge.rdx phi is driven by ComputeReductionResult recipes.
//
// Adjust AnyOf reductions; replace the reduction phi for the selected value
// with a boolean reduction phi node to check if the condition is true in any
// iteration. The final value is selected by the final ComputeReductionResult.
void LoopVectorizationPlanner::adjustRecipesForReductions(
VPBasicBlock *LatchVPBB, VPlanPtr &Plan, VPRecipeBuilder &RecipeBuilder,
ElementCount MinVF) {
Expand Down Expand Up @@ -9027,6 +9030,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
continue;

const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
// Adjust AnyOf reductions; replace the reduction phi for the selected value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjustRecipesForReductions() is getting excessively long, should be refactored. Its documentation above should (also) include the adjustment of AnyOf reductions described here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, probably best as followup?

// with a boolean reduction phi node to check if the condition is true in
// any iteration. The final value is selected by the final
// ComputeReductionResult.
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
auto *Select = cast<VPRecipeBase>(*find_if(PhiR->users(), [](VPUser *U) {
return isa<VPWidenSelectRecipe>(U) ||
(isa<VPReplicateRecipe>(U) &&
cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
Instruction::Select);
}));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert(Select && "a meaningful error message");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast should already assert to check for non-null.

VPValue *Cmp = Select->getOperand(0);
// If the compare is checking the reduction PHI node, adjust it to check
// the start value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, either the true or false values of the select should use PhiR, but only it? The condition of an AnyOf reduction should be any boolean-expression predicate that depends on the current iteration only, i.e., should be a recipe (if it's live-in the entire select should be LICM'd), independent of AnyOf's PhiR, neither directly nor indirectly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, AnyOf reduction are also formed for code like

define i32 @select_i32_from_icmp_same_inputs(i32 %a, i32 %b, i64 %n) {                                   
entry:
  br label %for.body

for.body:                                      ; preds = %entry, %for.body
  %0 = phi i64 [ 0, %entry ], [ %4, %for.body ]
  %1 = phi i32 [ %a, %entry ], [ %3, %for.body ]
  %2 = icmp eq i32 %1, 3
  %3 = select i1 %2, i32 %1, i32 %b
  %4 = add nuw nsw i64 %0, 1
  %5 = icmp eq i64 %4, %n
  br i1 %5, label %exit, label %for.body

exit:                                     ; preds = %for.body
  ret i32 %3
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayalz The current recognition for AnyOf reduction involves a set of cmp- select instruction and cannot identify cases with only a single select. This requires expanding isAnyOfPattern or applying conditional reduction to model the AnyOf idiom for support.

if (VPRecipeBase *CmpR = Cmp->getDefiningRecipe()) {
for (unsigned I = 0; I != CmpR->getNumOperands(); ++I)
if (CmpR->getOperand(I) == PhiR)
CmpR->setOperand(I, PhiR->getStartValue());
}
VPBuilder::InsertPointGuard Guard(Builder);
Builder.setInsertPoint(Select);

// If the true value of the select is the reduction phi, the new value is
// selected if the negated condition is true in any iteration.
if (Select->getOperand(1) == PhiR)
Cmp = Builder.createNot(Cmp);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Mel pointed out, this case of "Red = cond ? PhiR : Other" where a single false cond suffices for the result to be Other, could be considered an AllOf reduction starting with true using createAnd() instead of createOr(), resulting in "AND(conds) ? Start : Other".

Negating cond swaps the operands and translates into the "Red = !cond ? Other : PhiR" form of an AnyOf reduction starting with false and using createOr(), where a single true !cond suffices for the result to be Other, resulting in "OR(!conds) ? Other : Start".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could be handled like that, but I think then we would need to support both patterns here and also in codegen, so negating seems simpler (and the negation should be removable by instcombine). Left as is for now, but happy to adjust if needed. But then it should probably be modeled as AllOf directly in the reduction descriptor.

VPValue *Or = Builder.createOr(PhiR, Cmp);
Select->getVPSingleValue()->replaceAllUsesWith(Or);

// Convert the reduction phi to operate on bools.
PhiR->setOperand(0, Plan->getVPValueOrAddLiveIn(ConstantInt::getFalse(
OrigLoop->getHeader()->getContext())));
}

// If tail is folded by masking, introduce selects between the phi
// and the live-out instruction of each reduction, at the beginning of the
// dedicated latch block.
Expand Down Expand Up @@ -9059,7 +9097,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
// then extend the loop exit value to enable InstCombine to evaluate the
// entire expression in the smaller type.
Type *PhiTy = PhiR->getStartValue()->getLiveInIRValue()->getType();
if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
!RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be it better to truncate AnyOf reductions to smaller type (boolean) here instead of above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only reach this path now because we adjust PhiR's start value to a bool. It requires more than plain truncates as below, so I think it's probably worth to keep it separate. I think it also needs handling before introducing selects for tail-folding; otherwise those selects would also need updating.

assert(!PhiR->isInLoop() && "Unexpected truncated inloop reduction!");
Type *RdxTy = RdxDesc.getRecurrenceType();
auto *Trunc =
Expand Down
13 changes: 6 additions & 7 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -442,6 +442,8 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
// Reduce all of the unrolled parts into a single vector.
Value *ReducedPartRdx = RdxParts[0];
unsigned Op = RecurrenceDescriptor::getOpcode(RK);
if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))
Op = Instruction::Or;

if (PhiR->isOrdered()) {
ReducedPartRdx = RdxParts[State.UF - 1];
Expand All @@ -454,19 +456,16 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
if (Op != Instruction::ICmp && Op != Instruction::FCmp)
ReducedPartRdx = Builder.CreateBinOp(
(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
TrackingVH<Value> ReductionStartValue =
RdxDesc.getRecurrenceStartValue();
ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
ReducedPartRdx, RdxPart);
} else
else
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
}
}

// Create the reduction after the loop. Note that inloop reductions create
// the target reduction in the loop using a Reduction recipe.
if (State.VF.isVector() && !PhiR->isInLoop()) {
if ((State.VF.isVector() ||
RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
!PhiR->isInLoop()) {
ReducedPartRdx =
createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As raised above, it may be better to have ComputeReductionResult recipe take care only of reducing AnyOf to a boolean here, followed by a Select recipe to chose between Start and Other live-in values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.

// If the reduction can be performed in a smaller type, we need to extend
Expand Down
Loading