-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[VPlan] Use VPInstruction for uniform binops. #141429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -8395,7 +8395,7 @@ VPRecipeBuilder::tryToWidenHistogram(const HistogramInfo *HI, | |||||
return new VPHistogramRecipe(Opcode, HGramOps, HI->Store->getDebugLoc()); | ||||||
} | ||||||
|
||||||
VPReplicateRecipe * | ||||||
VPSingleDefRecipe * | ||||||
VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands, | ||||||
VFRange &Range) { | ||||||
bool IsUniform = LoopVectorizationPlanner::getDecisionAndClampRange( | ||||||
|
@@ -8453,6 +8453,21 @@ VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands, | |||||
assert((Range.Start.isScalar() || !IsUniform || !IsPredicated || | ||||||
(Range.Start.isScalable() && isa<IntrinsicInst>(I))) && | ||||||
"Should not predicate a uniform recipe"); | ||||||
if (IsUniform && !IsPredicated) { | ||||||
VPInstruction *VPI = nullptr; | ||||||
if (Instruction::isCast(I->getOpcode())) { | ||||||
VPI = new VPInstructionWithType(I->getOpcode(), Operands, I->getType(), | ||||||
VPIRFlags(*I), I->getDebugLoc(), | ||||||
I->getName()); | ||||||
Comment on lines
+8459
to
+8461
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could VPInstructionWithType also represent binary ops which are uniform - thereby placing IsSingleScalar alongside Type? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that could work, the only question is what to do with |
||||||
} else if (Instruction::isBinaryOp(I->getOpcode())) { | ||||||
VPI = new VPInstruction(I->getOpcode(), Operands, VPIRFlags(*I), | ||||||
I->getDebugLoc(), I->getName(), true); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also done in #140623 |
||||||
} | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. else - a uniform replicate recipe still takes care of all other cases? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep |
||||||
if (VPI) { | ||||||
VPI->setUnderlyingValue(I); | ||||||
return VPI; | ||||||
} | ||||||
} | ||||||
auto *Recipe = new VPReplicateRecipe(I, Operands, IsUniform, BlockInMask, | ||||||
VPIRMetadata(*I, LVer)); | ||||||
return Recipe; | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -876,6 +876,9 @@ class VPInstruction : public VPRecipeWithIRFlags, | |
public VPUnrollPartAccessor<1> { | ||
friend class VPlanSlp; | ||
|
||
/// True if the VPInstruction produces a single scalar value. | ||
bool IsSingleScalar; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Being single scalar is conceptually part of a VPValue's type, complementing its element type. A non single scalar value is either a vector of VF (fixed, scalable, EVL) elements or a collection of VF scalars, which may also deserve clear indication(?), potentially extended in the future to support other widths considering SLP starting with interleave groups, mixed vectorization, etc. VPlan keeps VPValues agnostic of their Type, relying on VPTypeAnalysis to infer the element type, taking care of caching it. Only recipes that change their type contain that information, as in VPInstructionWithType casts, loads, calls, and initial VPValues. Narrowing bit widths introduces such casts to convey newly inferred element types. Should single-scalar rely on similar analysis with only initial VPValues, and recipes that change it (vector to scalar reduction and extraction, broadcast and build steps), know about it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think in some cases it can also be seens as property of the producer (e.g. if we proved uniformity for the VF for the specific operation + operands). In those cases, whether a single scalar is produced doesn't depend on either users or operands (e.g. adding a new wide user doesn't change what is produced by the recipe). I think contrary to scalar types, whether the result is a single scalar is currently queried in many more places during various transforms and would be needed when executing each VPInstruction. The proposed patch would allow us to start the transition by using a similiar field to current VPReplicateRecipe and hence simplify VPReplicateRecipe to only handle the replicating case. Similarly we could also use VPInstruction for various VPWiden*Recipes. It might be worth consolidating and simplifying the existing recipe classes first, before adjusting how we reason/represent single-scalar/uniformity? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Indeed in some cases the producer knows it generates a single scalar, as in reducing a vector to a scalar and extracting a scalar from a vector.
Execution and cost arguably need complete type information - both number of elements and type of each element. In addition to the relative frequency of other usages, is the issue of cache-invalidation or updating the type information when recipes are added/removed/replaced/updated.
This is indeed a good incremental step forward, although replicate recipe still handles remaining uniform cases?
Would be good to make incremental progress alongside clarifying roadmap goal. If current representation is expected to be revisited, worth a note. I.e., reason about the inconsistency between element type information recorded where needed and propagated via type analysis, versus recording IsSingleScalar in each VPInstruction - all other VPSingleDefRecipe's are implicitly non single scalar? All other VPValues are implicitly single scalar - live-ins? Should There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yep I think that would be good to re-visit at some point.
Most recipes w/o attached type can construct IR instruction w/o querying the type explicitly as opcode + operands is sufficient in most cases. For cost, both will be needed.
Yep there's still some way to go to complete the full transition. Would probably be good to start with one of the simplest cases first (casts, #140623), then binops, then the rest.
There are some recipes that are always scalar (e.g.
I think it at least in the first patch (casts, https://github.com/llvm/llvm-project/pull/1406230) it is too early, as we don't yet set it for all relevant opcodes . But once we set it for all relevant opcodes, |
||
|
||
public: | ||
/// VPlan opcodes, extending LLVM IR with idiomatics instructions. | ||
enum { | ||
|
@@ -966,7 +969,7 @@ class VPInstruction : public VPRecipeWithIRFlags, | |
|
||
VPInstruction(unsigned Opcode, ArrayRef<VPValue *> Operands, | ||
const VPIRFlags &Flags, DebugLoc DL = {}, | ||
const Twine &Name = ""); | ||
const Twine &Name = "", bool IsSingleScalar = false); | ||
|
||
VP_CLASSOF_IMPL(VPDef::VPInstructionSC) | ||
|
||
|
@@ -1051,7 +1054,8 @@ class VPInstructionWithType : public VPInstruction { | |
VPInstructionWithType(unsigned Opcode, ArrayRef<VPValue *> Operands, | ||
Type *ResultTy, const VPIRFlags &Flags, DebugLoc DL, | ||
const Twine &Name = "") | ||
: VPInstruction(Opcode, Operands, Flags, DL, Name), ResultTy(ResultTy) {} | ||
: VPInstruction(Opcode, Operands, Flags, DL, Name, true), | ||
ResultTy(ResultTy) {} | ||
|
||
static inline bool classof(const VPRecipeBase *R) { | ||
// VPInstructionWithType are VPInstructions with specific opcodes requiring | ||
|
@@ -1087,10 +1091,7 @@ class VPInstructionWithType : public VPInstruction { | |
|
||
/// Return the cost of this VPInstruction. | ||
InstructionCost computeCost(ElementCount VF, | ||
VPCostContext &Ctx) const override { | ||
// TODO: Compute accurate cost after retiring the legacy cost model. | ||
return 0; | ||
} | ||
VPCostContext &Ctx) const override; | ||
|
||
Type *getResultType() const { return ResultTy; } | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -408,9 +408,9 @@ template class VPUnrollPartAccessor<3>; | |||||
|
||||||
VPInstruction::VPInstruction(unsigned Opcode, ArrayRef<VPValue *> Operands, | ||||||
const VPIRFlags &Flags, DebugLoc DL, | ||||||
const Twine &Name) | ||||||
const Twine &Name, bool IsSingleScalar) | ||||||
: VPRecipeWithIRFlags(VPDef::VPInstructionSC, Operands, Flags, DL), | ||||||
Opcode(Opcode), Name(Name.str()) { | ||||||
IsSingleScalar(IsSingleScalar), Opcode(Opcode), Name(Name.str()) { | ||||||
assert(flagsValidForOpcode(getOpcode()) && | ||||||
"Set flags not supported for the provided opcode"); | ||||||
} | ||||||
|
@@ -841,7 +841,8 @@ bool VPInstruction::isVectorToScalar() const { | |||||
} | ||||||
|
||||||
bool VPInstruction::isSingleScalar() const { | ||||||
return getOpcode() == VPInstruction::ResumePhi || | ||||||
// TODO: Set IsSingleScalar for ResumePhi and PHI. | ||||||
return IsSingleScalar || getOpcode() == VPInstruction::ResumePhi || | ||||||
getOpcode() == Instruction::PHI; | ||||||
} | ||||||
|
||||||
|
@@ -965,7 +966,7 @@ void VPInstruction::dump() const { | |||||
|
||||||
void VPInstruction::print(raw_ostream &O, const Twine &Indent, | ||||||
VPSlotTracker &SlotTracker) const { | ||||||
O << Indent << "EMIT "; | ||||||
O << Indent << (isSingleScalar() ? "SINGLE-SCALAR " : "EMIT "); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Needs rebasing to emit EMIT-SCALAR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done, thanks |
||||||
|
||||||
if (hasResult()) { | ||||||
printAsOperand(O, SlotTracker); | ||||||
|
@@ -1049,15 +1050,17 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent, | |||||
|
||||||
void VPInstructionWithType::execute(VPTransformState &State) { | ||||||
State.setDebugLocFrom(getDebugLoc()); | ||||||
switch (getOpcode()) { | ||||||
case Instruction::ZExt: | ||||||
case Instruction::Trunc: { | ||||||
if (Instruction::isCast(getOpcode())) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is needed now that SExt/UIToFP/? are also represented as VPInstructionWithType? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, now we need to generate code for any cast |
||||||
Value *Op = State.get(getOperand(0), VPLane(0)); | ||||||
Value *Cast = State.Builder.CreateCast(Instruction::CastOps(getOpcode()), | ||||||
Op, ResultTy); | ||||||
if (auto *I = dyn_cast<Instruction>(Cast)) | ||||||
applyFlags(*I); | ||||||
Comment on lines
+1057
to
+1058
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is needed now that flagged casts are represented as VPInstructionWithType's? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep |
||||||
State.set(this, Cast, VPLane(0)); | ||||||
break; | ||||||
return; | ||||||
} | ||||||
|
||||||
switch (getOpcode()) { | ||||||
case VPInstruction::StepVector: { | ||||||
Value *StepVector = | ||||||
State.Builder.CreateStepVector(VectorType::get(ResultTy, State.VF)); | ||||||
|
@@ -1069,10 +1072,19 @@ void VPInstructionWithType::execute(VPTransformState &State) { | |||||
} | ||||||
} | ||||||
|
||||||
InstructionCost VPInstructionWithType::computeCost(ElementCount VF, | ||||||
VPCostContext &Ctx) const { | ||||||
// TODO: Compute cost for VPInstructions without underlying values once | ||||||
// the legacy cost model has been retired. | ||||||
if (!getUnderlyingValue()) | ||||||
return 0; | ||||||
return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF); | ||||||
} | ||||||
|
||||||
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) | ||||||
void VPInstructionWithType::print(raw_ostream &O, const Twine &Indent, | ||||||
VPSlotTracker &SlotTracker) const { | ||||||
O << Indent << "EMIT "; | ||||||
O << Indent << (isSingleScalar() ? "SINGLE-SCALAR " : "EMIT "); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
consistent with VPInstruction::print(). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done thanks |
||||||
printAsOperand(O, SlotTracker); | ||||||
O << " = "; | ||||||
|
||||||
|
@@ -1585,23 +1597,25 @@ bool VPIRFlags::flagsValidForOpcode(unsigned Opcode) const { | |||||
switch (OpType) { | ||||||
case OperationType::OverflowingBinOp: | ||||||
return Opcode == Instruction::Add || Opcode == Instruction::Sub || | ||||||
Opcode == Instruction::Mul || | ||||||
Opcode == Instruction::Mul || Opcode == Instruction::Shl || | ||||||
Opcode == VPInstruction::VPInstruction::CanonicalIVIncrementForPart; | ||||||
case OperationType::DisjointOp: | ||||||
return Opcode == Instruction::Or; | ||||||
case OperationType::PossiblyExactOp: | ||||||
return Opcode == Instruction::AShr; | ||||||
return Opcode == Instruction::AShr || Opcode == Instruction::LShr || | ||||||
Opcode == Instruction::SDiv || Opcode == Instruction::UDiv; | ||||||
case OperationType::GEPOp: | ||||||
return Opcode == Instruction::GetElementPtr || | ||||||
Opcode == VPInstruction::PtrAdd; | ||||||
case OperationType::FPMathOp: | ||||||
return Opcode == Instruction::FAdd || Opcode == Instruction::FMul || | ||||||
Opcode == Instruction::FSub || Opcode == Instruction::FNeg || | ||||||
Opcode == Instruction::FDiv || Opcode == Instruction::FRem || | ||||||
Opcode == Instruction::FPTrunc || Opcode == Instruction::FPExt || | ||||||
Opcode == Instruction::FCmp || Opcode == Instruction::Select || | ||||||
Opcode == VPInstruction::WideIVStep; | ||||||
case OperationType::NonNegOp: | ||||||
return Opcode == Instruction::ZExt; | ||||||
return Opcode == Instruction::UIToFP || Opcode == Instruction::ZExt; | ||||||
break; | ||||||
case OperationType::Cmp: | ||||||
return Opcode == Instruction::ICmp; | ||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -154,6 +154,10 @@ static bool sinkScalarOperands(VPlan &Plan) { | |||||||||||||||
if (auto *RepR = dyn_cast<VPReplicateRecipe>(SinkCandidate)) { | ||||||||||||||||
if (!ScalarVFOnly && RepR->isSingleScalar()) | ||||||||||||||||
continue; | ||||||||||||||||
Comment on lines
154
to
156
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could be simplified and generalized to
Suggested change
? |
||||||||||||||||
} else if (auto *RepR = dyn_cast<VPInstruction>(SinkCandidate)) { | ||||||||||||||||
if ((!ScalarVFOnly && RepR->isSingleScalar()) || | ||||||||||||||||
!RepR->getUnderlyingValue()) | ||||||||||||||||
Comment on lines
+157
to
+159
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So now non-single-scalar VPInstructions with underlying values are also subject to sinking?
Suggested change
|
||||||||||||||||
continue; | ||||||||||||||||
} else if (!isa<VPScalarIVStepsRecipe>(SinkCandidate)) | ||||||||||||||||
continue; | ||||||||||||||||
|
||||||||||||||||
|
@@ -196,6 +200,15 @@ static bool sinkScalarOperands(VPlan &Plan) { | |||||||||||||||
SinkCandidate->replaceUsesWithIf(Clone, [SinkTo](VPUser &U, unsigned) { | ||||||||||||||||
return cast<VPRecipeBase>(&U)->getParent() != SinkTo; | ||||||||||||||||
}); | ||||||||||||||||
} else { | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth a comment how non-single-scalar VPInstructions are sunk. |
||||||||||||||||
if (auto *VPI = dyn_cast<VPInstruction>(SinkCandidate)) { | ||||||||||||||||
Comment on lines
+203
to
+204
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||
auto *OldCand = SinkCandidate; | ||||||||||||||||
SinkCandidate = new VPReplicateRecipe(VPI->getUnderlyingInstr(), | ||||||||||||||||
SinkCandidate->operands(), true, | ||||||||||||||||
nullptr /*Mask*/); | ||||||||||||||||
Comment on lines
+206
to
+208
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can VPI be cloned instead? |
||||||||||||||||
SinkCandidate->insertBefore(OldCand); | ||||||||||||||||
OldCand->replaceAllUsesWith(SinkCandidate); | ||||||||||||||||
} | ||||||||||||||||
} | ||||||||||||||||
SinkCandidate->moveBefore(*SinkTo, SinkTo->getFirstNonPhi()); | ||||||||||||||||
for (VPValue *Op : SinkCandidate->operands()) | ||||||||||||||||
|
@@ -1047,8 +1060,14 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) { | |||||||||||||||
unsigned ExtOpcode = match(R.getOperand(0), m_SExt(m_VPValue())) | ||||||||||||||||
? Instruction::SExt | ||||||||||||||||
: Instruction::ZExt; | ||||||||||||||||
auto *VPC = | ||||||||||||||||
new VPWidenCastRecipe(Instruction::CastOps(ExtOpcode), A, TruncTy); | ||||||||||||||||
VPSingleDefRecipe *VPC; | ||||||||||||||||
if (vputils::isSingleScalar(R.getVPSingleValue())) | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done in #140623 |
||||||||||||||||
VPC = new VPInstructionWithType(Instruction::CastOps(ExtOpcode), {A}, | ||||||||||||||||
TruncTy, {}, {}); | ||||||||||||||||
else | ||||||||||||||||
VPC = new VPWidenCastRecipe(Instruction::CastOps(ExtOpcode), A, | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should both be VPInstructionWithType, one with IsSingleScalar turned on and the other off? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, there's an earlier WIP patch to use VPInstructionWithType also for VPWidenCastRecipe (#129712), although I think it would probably make sense to re-visit this after support for uniform casts land. |
||||||||||||||||
TruncTy); | ||||||||||||||||
|
||||||||||||||||
if (auto *UnderlyingExt = R.getOperand(0)->getUnderlyingValue()) { | ||||||||||||||||
// UnderlyingExt has distinct return type, used to retain legacy cost. | ||||||||||||||||
VPC->setUnderlyingValue(UnderlyingExt); | ||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not asserted earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added assert to #140623