Commit 745bf6c

[LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiply them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction operation, specified with an llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop, not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
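As an illustration of the difference the message describes, the following is a minimal scalar model in plain C++, not code from this patch. The names `dotOutOfLoop`, `dotInLoop` and the constant `VF` are hypothetical, and the inner lane loops only stand in for what a single vector instruction would do. The out-of-loop form keeps one i32 partial sum per lane and reduces after the loop; the in-loop form reduces each group of 16 i8 products straight into a scalar accumulator, roughly in the spirit of VMLAVA.s8.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorization factor: 16 i8 lanes fill one 128-bit vector.
constexpr std::size_t VF = 16;

// Out-of-loop reduction: the loop carries a vector of per-lane partial sums
// and the horizontal add happens once, after the loop. With i32 partial sums
// this needs 16 x 32 bits, which is wider than one 128-bit register.
int32_t dotOutOfLoop(const std::vector<int8_t> &A,
                     const std::vector<int8_t> &B) {
  assert(A.size() == B.size() && A.size() % VF == 0);
  std::array<int32_t, VF> Acc{};
  for (std::size_t I = 0; I < A.size(); I += VF)
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Acc[Lane] += int32_t(A[I + Lane]) * int32_t(B[I + Lane]);
  int32_t Sum = 0;
  for (int32_t V : Acc)
    Sum += V;
  return Sum;
}

// In-loop reduction: the loop carries a single scalar. Each iteration widens,
// multiplies and sums a whole group of VF lanes into that scalar.
int32_t dotInLoop(const std::vector<int8_t> &A,
                  const std::vector<int8_t> &B) {
  assert(A.size() == B.size() && A.size() % VF == 0);
  int32_t Sum = 0;
  for (std::size_t I = 0; I < A.size(); I += VF) {
    int32_t Partial = 0; // models the result of one reduction instruction
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Partial += int32_t(A[I + Lane]) * int32_t(B[I + Lane]);
    Sum += Partial;
  }
  return Sum;
}
```

Both functions compute the same integer result; the point of the sketch is only where the horizontal sum sits relative to the loop.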
1 parent 5446ec8 commit 745bf6c

8 files changed: +444 -130 lines changed

llvm/include/llvm/Analysis/IVDescriptors.h

Lines changed: 5 additions & 0 deletions
@@ -220,6 +220,11 @@ class RecurrenceDescriptor {
   /// Returns true if all source operands of the recurrence are SExtInsts.
   bool isSigned() { return IsSigned; }
 
+  /// Attempts to find a chain of operations from Phi to LoopExitInst that can
+  /// be treated as a set of reductions instructions for in-loop reductions.
+  SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,
+                                                    Loop *L) const;
+
 private:
   // The starting value of the recurrence.
   // It does not have to be zero!

llvm/lib/Analysis/IVDescriptors.cpp

Lines changed: 72 additions & 0 deletions
@@ -437,6 +437,8 @@ bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurrenceKind Kind,
     //       instructions that are a part of the reduction. The vectorizer cost
     //       model could then apply the recurrence type to these instructions,
     //       without needing a white list of instructions to ignore.
+    //       This may also be useful for the inloop reductions, if it can be
+    //       kept simple enough.
     collectCastsToIgnore(TheLoop, ExitInstruction, RecurrenceType, CastInsts);
   }
 
@@ -796,6 +798,76 @@ unsigned RecurrenceDescriptor::getRecurrenceBinOp(RecurrenceKind Kind) {
   }
 }
 
+SmallVector<Instruction *, 4>
+RecurrenceDescriptor::getReductionOpChain(PHINode *Phi, Loop *L) const {
+  SmallVector<Instruction *, 4> ReductionOperations;
+  unsigned RedOp = getRecurrenceBinOp(Kind);
+
+  // Search down from the Phi to the LoopExitInstr, looking for instructions
+  // with a single user of the correct type for the reduction.
+
+  // Note that we check that the type of the operand is correct for each item in
+  // the chain, including the last (the loop exit value). This can come up from
+  // sub, which would otherwise be treated as an add reduction. MinMax also need
+  // to check for a pair of icmp/select, for which we use getNextInstruction and
+  // isCorrectOpcode functions to step the right number of instruction, and
+  // check the icmp/select pair.
+  // FIXME: We also do not attempt to look through Phi/Select's yet, which might
+  // be part of the reduction chain, or attempt to looks through And's to find a
+  // smaller bitwidth. Subs are also currently not allowed (which are usually
+  // treated as part of a add reduction) as they are expected to generally be
+  // more expensive than out-of-loop reductions, and need to be costed more
+  // carefully.
+  unsigned ExpectedUses = 1;
+  if (RedOp == Instruction::ICmp || RedOp == Instruction::FCmp)
+    ExpectedUses = 2;
+
+  auto getNextInstruction = [&](Instruction *Cur) {
+    if (RedOp == Instruction::ICmp || RedOp == Instruction::FCmp) {
+      // We are expecting a icmp/select pair, which we go to the next select
+      // instruction if we can. We already know that Cur has 2 uses.
+      if (isa<SelectInst>(*Cur->user_begin()))
+        return cast<Instruction>(*Cur->user_begin());
+      else
+        return cast<Instruction>(*std::next(Cur->user_begin()));
+    }
+    return cast<Instruction>(*Cur->user_begin());
+  };
+  auto isCorrectOpcode = [&](Instruction *Cur) {
+    if (RedOp == Instruction::ICmp || RedOp == Instruction::FCmp) {
+      Value *LHS, *RHS;
+      return SelectPatternResult::isMinOrMax(
+          matchSelectPattern(Cur, LHS, RHS).Flavor);
+    }
+    return Cur->getOpcode() == RedOp;
+  };
+
+  // The loop exit instruction we check first (as a quick test) but add last. We
+  // check the opcode is correct (and dont allow them to be Subs) and that they
+  // have expected to have the expected number of uses. They will have one use
+  // from the phi and one from a LCSSA value, no matter the type.
+  if (!isCorrectOpcode(LoopExitInstr) || !LoopExitInstr->hasNUses(2))
+    return {};
+
+  // Check that the Phi has one (or two for min/max) uses.
+  if (!Phi->hasNUses(ExpectedUses))
+    return {};
+  Instruction *Cur = getNextInstruction(Phi);
+
+  // Each other instruction in the chain should have the expected number of uses
+  // and be the correct opcode.
+  while (Cur != LoopExitInstr) {
+    if (!isCorrectOpcode(Cur) || !Cur->hasNUses(ExpectedUses))
+      return {};
+
+    ReductionOperations.push_back(Cur);
+    Cur = getNextInstruction(Cur);
+  }
+
+  ReductionOperations.push_back(Cur);
+  return ReductionOperations;
+}
+
 InductionDescriptor::InductionDescriptor(Value *Start, InductionKind K,
                                          const SCEV *Step, BinaryOperator *BOp,
                                          SmallVectorImpl<Instruction *> *Casts)
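The chain walk in getReductionOpChain can be pictured with a toy use-graph. The sketch below is a simplified stand-in written for illustration and does not use the LLVM API: `Node`, `Op` and `reductionOpChain` are hypothetical names, only the single-use (add-like) case is modelled, and the icmp/select pairing needed for min/max is left out. It assumes the graph comes from a well-formed loop, so following users from the phi either reaches the exit value or hits a node that fails a check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for the toy graph.
enum class Op { Phi, Add, Sub, Other };

// Hypothetical stand-in for an instruction: an opcode plus its users.
struct Node {
  Op Opcode;
  std::vector<Node *> Users;
};

// Walks from Phi towards Exit and returns the operations in between
// (Exit included, added last), or an empty vector if the shape is not
// a simple single-use chain of RedOp operations.
std::vector<Node *> reductionOpChain(Node *Phi, Node *Exit, Op RedOp) {
  assert(Phi && Exit);

  // Quick test on the exit value first: correct opcode and exactly two
  // users, one being the phi and one being the value used after the loop.
  if (Exit->Opcode != RedOp || Exit->Users.size() != 2)
    return {};

  // The phi must feed exactly one operation in this add-like case.
  if (Phi->Users.size() != 1)
    return {};

  std::vector<Node *> Chain;
  Node *Cur = Phi->Users.front();
  while (Cur != Exit) {
    // Every intermediate link needs the right opcode and a single user.
    if (Cur->Opcode != RedOp || Cur->Users.size() != 1)
      return {};
    Chain.push_back(Cur);
    Cur = Cur->Users.front();
  }
  Chain.push_back(Cur);
  return Chain;
}
```

For a graph Phi -> Add1 -> Add2, where Add2 is used by the phi and by one value outside the loop, the walk yields {Add1, Add2}. Changing Add1 to a Sub makes it return an empty chain, mirroring the comment in the diff that Subs are not allowed.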

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

Lines changed: 8 additions & 0 deletions
@@ -34,6 +34,7 @@ namespace llvm {
 class LoopVectorizationLegality;
 class LoopVectorizationCostModel;
 class PredicatedScalarEvolution;
+class VPRecipeBuilder;
 
 /// VPlan-based builder utility analogous to IRBuilder.
 class VPBuilder {
@@ -294,6 +295,13 @@ class LoopVectorizationPlanner {
   /// according to the information gathered by Legal when it checked if it is
   /// legal to vectorize the loop. This method creates VPlans using VPRecipes.
   void buildVPlansWithVPRecipes(unsigned MinVF, unsigned MaxVF);
+
+  /// Adjust the recipes for any inloop reductions. The chain of instructions
+  /// leading from the loop exit instr to the phi need to be converted to
+  /// reductions, with one operand being vector and the other being the scalar
+  /// reduction chain.
+  void adjustRecipesForInLoopReductions(VPlanPtr &Plan,
+                                        VPRecipeBuilder &RecipeBuilder);
 };
 
 } // namespace llvm
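The doc comment on adjustRecipesForInLoopReductions describes each link of the chain becoming a reduction with one vector operand and one scalar operand. A rough scalar model of that shape, under stated assumptions, might look like the following. This is not VPlan code: `reduceAdd`, `sumScalarChain`, `sumInLoopChain` and the constant `Lanes` are hypothetical names, and `reduceAdd` only stands in for a vector add reduction over four lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorization factor for the sketch.
constexpr std::size_t Lanes = 4;

// Stand-in for a vector add reduction over Lanes consecutive elements.
int32_t reduceAdd(const int32_t *V) {
  int32_t R = 0;
  for (std::size_t L = 0; L < Lanes; ++L)
    R += V[L];
  return R;
}

// Original scalar loop: a chain of two adds per iteration, phi -> add -> add.
int32_t sumScalarChain(const std::vector<int32_t> &A,
                       const std::vector<int32_t> &B) {
  assert(A.size() == B.size());
  int32_t Sum = 0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Sum = Sum + A[I]; // first link of the chain
    Sum = Sum + B[I]; // second link, the loop exit value
  }
  return Sum;
}

// In-loop form: each add in the chain becomes "scalar + reduce(vector)".
// The scalar operand carries the running chain; the vector operand is the
// widened value that used to be added one element at a time.
int32_t sumInLoopChain(const std::vector<int32_t> &A,
                       const std::vector<int32_t> &B) {
  assert(A.size() == B.size() && A.size() % Lanes == 0);
  int32_t Sum = 0;
  for (std::size_t I = 0; I < A.size(); I += Lanes) {
    Sum = Sum + reduceAdd(&A[I]);
    Sum = Sum + reduceAdd(&B[I]);
  }
  return Sum;
}
```

The accumulator stays scalar across iterations in the second function, which is the property that distinguishes an in-loop reduction from the usual vector accumulator reduced after the loop.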
