Skip to content

[LLVM][VPlan] Keep all VPBlend masks until VPlan transformation. #104015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 0 additions & 4 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8170,8 +8170,6 @@ VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
// builder. At this point we generate the predication tree. There may be
// duplications since this is a simple recursive scan, but future
// optimizations will clean it up.
// TODO: At the moment the first mask is always skipped, but it would be
// better to skip the most expensive mask.
SmallVector<VPValue *, 2> OperandsWithMask;

for (unsigned In = 0; In < NumIncoming; In++) {
Expand All @@ -8184,8 +8182,6 @@ VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
"Distinct incoming values with one having a full mask");
break;
}
if (In == 0)
continue;
OperandsWithMask.push_back(EdgeMask);
}
return new VPBlendRecipe(Phi, OperandsWithMask);
Expand Down
26 changes: 16 additions & 10 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -2033,12 +2033,12 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe {
class VPBlendRecipe : public VPSingleDefRecipe {
public:
/// The blend operation is a User of the incoming values and of their
/// respective masks, ordered [I0, I1, M1, I2, M2, ...]. Note that the first
/// incoming value does not have a mask associated.
/// respective masks, ordered [I0, M0, I1, M1, I2, M2, ...]. Note that M0 can
/// be ommited (implied by passing an odd number of operands) in which case
/// all other incoming values are merged into it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

M0 should be the complement of all other M's, so could be rematerialized if dropped, but it may have a simpler way of computing, which might be lost if dropped?

VPBlendRecipe(PHINode *Phi, ArrayRef<VPValue *> Operands)
: VPSingleDefRecipe(VPDef::VPBlendSC, Operands, Phi, Phi->getDebugLoc()) {
assert((Operands.size() + 1) % 2 == 0 &&
"Expected an odd number of operands");
assert(Operands.size() > 0 && "Expected at least one operand!");
}

VPBlendRecipe *clone() override {
Expand All @@ -2048,19 +2048,25 @@ class VPBlendRecipe : public VPSingleDefRecipe {

VP_CLASSOF_IMPL(VPDef::VPBlendSC)

/// Return the number of incoming values, taking into account that the first
/// incoming value has no mask.
unsigned getNumIncomingValues() const { return (getNumOperands() + 1) / 2; }
/// A normalized blend is one that has an odd number of operands, whereby the
/// first operand does not have an associated mask.
bool isNormalized() const { return getNumOperands() % 2; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operands that may or may not be present typically appear last, thereby fixing the access to all other operands. While we're making changes here, how about reversing the order of operand pairs, so that the last operand drops its mask rather than the first?

Note that an absent mask represents a full mask elsewhere in VPlan, so best clarify that this last mask is dropped because it is not needed rather than because it is known to be full. The last mask can be retained (until execution) rather than dropped but doing so might prevent dead code elimination.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've investigated a little and to me this change makes VBlend harder to work with. Nothing major, just minor things like not being able to write for i = 1 < num_of_coming values. What do you think about VBlend taking an explicit initial value which the masked operands are blended in to? For most cases this would be undef or perhaps poison assuming we're happy to always lose the poison during normalisation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've investigated a little and to me this change makes VBlend harder to work with. Nothing major, just minor things like not being able to write for i = 1 < num_of_coming values.

Changing the order in which operand pairs are held should keep the API of accessing and traversing them in tact, only simplifying how these are implemented. Perhaps main distinction is traversing from last to first instead of from first to last.

What do you think about VBlend taking an explicit initial value which the masked operands are blended in to? For most cases this would be undef or perhaps poison assuming we're happy to always lose the poison during normalisation.

Sounds like another "optional" operand. Suffice to simplify how the redundant mask operand is handled?


/// Return the number of incoming values, taking into account when normalized
/// the first incoming value will have no mask.
unsigned getNumIncomingValues() const {
return (getNumOperands() + isNormalized()) / 2;
}

/// Return incoming value number \p Idx.
VPValue *getIncomingValue(unsigned Idx) const {
return Idx == 0 ? getOperand(0) : getOperand(Idx * 2 - 1);
return Idx == 0 ? getOperand(0) : getOperand(Idx * 2 - isNormalized());
}

/// Return mask number \p Idx.
VPValue *getMask(unsigned Idx) const {
assert(Idx > 0 && "First index has no mask associated.");
return getOperand(Idx * 2);
assert((Idx > 0 || !isNormalized()) && "First index has no mask!");
return Idx == 0 ? getOperand(1) : getOperand(Idx * 2 + !isNormalized());
}

/// Generate the phi/select nodes.
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1635,6 +1635,7 @@ void VPVectorPointerRecipe::print(raw_ostream &O, const Twine &Indent,
#endif

void VPBlendRecipe::execute(VPTransformState &State) {
assert(isNormalized() && "Expected blend to be normalized!");
State.setDebugLocFrom(getDebugLoc());
// We know that all PHIs in non-header blocks are converted into
// selects, so we don't have to worry about the insertion order and we
Expand Down
44 changes: 38 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -989,15 +989,47 @@ void VPlanTransforms::clearReductionWrapFlags(VPlan &Plan) {
/// Try to simplify recipe \p R.
static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
using namespace llvm::VPlanPatternMatch;
// Try to remove redundant blend recipes.

if (auto *Blend = dyn_cast<VPBlendRecipe>(&R)) {
VPValue *Inc0 = Blend->getIncomingValue(0);
// Try to remove redundant blend recipes.
SmallPtrSet<VPValue *, 4> UniqueValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UniqueValues is used here only for checking if all non-masked-out operands are the same, which is more simply done by comparing them to the first non-masked-out operand, as the previous implementation was doing (assuming first operand is not masked out).

if (Blend->isNormalized() || !match(Blend->getMask(0), m_False()))
UniqueValues.insert(Blend->getIncomingValue(0));
for (unsigned I = 1; I != Blend->getNumIncomingValues(); ++I)
if (Inc0 != Blend->getIncomingValue(I) &&
!match(Blend->getMask(I), m_False()))
return;
Blend->replaceAllUsesWith(Inc0);
if (!match(Blend->getMask(I), m_False()))
UniqueValues.insert(Blend->getIncomingValue(I));

if (UniqueValues.size() == 1) {
Blend->replaceAllUsesWith(*UniqueValues.begin());
Blend->eraseFromParent();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth running recursivelyDeleteDeadRecipes() in this case too - on all masks? Or does the elimination of Blend here suffice to trigger dce.

return;
}

if (Blend->isNormalized())
return;

// Normalize the blend so its first incomming value is used as the initial
// value with the others blended into it.

unsigned StartIndex = 0;
SmallVector<VPValue *, 4> OperandsWithMask;
OperandsWithMask.push_back(Blend->getIncomingValue(StartIndex));

for (unsigned I = 0; I != Blend->getNumIncomingValues(); ++I) {
if (I == StartIndex)
continue;
OperandsWithMask.push_back(Blend->getIncomingValue(I));
OperandsWithMask.push_back(Blend->getMask(I));
}

auto *NewBlend = new VPBlendRecipe(
cast<PHINode>(Blend->getUnderlyingValue()), OperandsWithMask);
NewBlend->insertBefore(&R);

VPValue *DeadMask = Blend->getMask(StartIndex);
Blend->replaceAllUsesWith(NewBlend);
Blend->eraseFromParent();
recursivelyDeleteDeadRecipes(DeadMask);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed here? Other patterns below and here previously rely on DCE to clean them up later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#104019 shows relying on DCE to clean up the plan can hamper VPlanTransformations, which is only run once, so I figure it's better to exit a transformation with the plan in the most optimal state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, that PR introduces a pattern that checks the number of uses?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. It tries to pick a mask that it knows will be dead once removed from the blend.

return;
}

Expand Down
Loading