Skip to content

[VPlan] Build initial VPlan 0 using HCFGBuilder for inner loops. (NFC) #124432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 18, 2025
138 changes: 94 additions & 44 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9298,6 +9298,7 @@ static void addExitUsersForFirstOrderRecurrences(
VPlanPtr
LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {

using namespace llvm::VPlanPatternMatch;
SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;

// ---------------------------------------------------------------------------
Expand All @@ -9321,6 +9322,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

// Build hierarchical CFG.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
Comment on lines +9325 to +9327
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps VPlanHCFGBuilder::buildingHierarchicalCFG() which creates initial VPBB's inside loop region ( via PlainCFGBuilder::buildPlainCFG()) should now be combined with VPlan::createInitialVPlan() which creates initial VPBB's elsewhere?
Can be done as follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, maybe best to merge as follow-up


// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
Expand Down Expand Up @@ -9371,12 +9376,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// Construct recipes for the instructions in the loop
// ---------------------------------------------------------------------------

// Scan the body of the loop in a topological order to visit each basic block
// after having visited its predecessor basic blocks.
LoopBlocksDFS DFS(OrigLoop);
DFS.perform(LI);

VPBasicBlock *HeaderVPBB = Plan->getVectorLoopRegion()->getEntryBasicBlock();
VPRegionBlock *LoopRegion = Plan->getVectorLoopRegion();
VPBasicBlock *HeaderVPBB = LoopRegion->getEntryBasicBlock();
VPBasicBlock *VPBB = HeaderVPBB;
BasicBlock *HeaderBB = OrigLoop->getHeader();
bool NeedsMasks =
Expand All @@ -9389,26 +9390,70 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
RecipeBuilder.collectScaledReductions(Range);

auto *MiddleVPBB = Plan->getMiddleBlock();

// Scan the body of the loop in a topological order to visit each basic block
// after having visited its predecessor basic blocks.
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the above comment over here:

Suggested change
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
// Scan the body of the loop in a topological order to visit each basic block
// after having visited its predecessor basic blocks.
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

HeaderVPBB);

VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove

  LoopBlocksDFS DFS(OrigLoop);
  DFS.perform(LI);

above which become dead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

// Relevant instructions from basic block BB will be grouped into VPRecipe
// ingredients and fill a new VPBasicBlock.
if (VPBB != HeaderVPBB)
VPBB->setName(BB->getName());
Builder.setInsertPoint(VPBB);
VPBlockBase *PrevVPBB = nullptr;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
// Handle VPBBs down to the latch.
if (VPBB == LoopRegion->getExiting()) {
assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
"the latch block shouldn't have a corresponding IRBB");
Comment on lines +9404 to +9405
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... We always have a latch VPBB, which isn't associated with an IRBB, which is what this covered

Initially the latch in the IR is not the latch/exiting block in the VPlan; the exiting block is created as part of the skeleton.

Thanks for clarifying. Should this discrepancy be fixed, perhaps as part of integrating HCFGBuilder with skeleton creation, to form "buildLoop" VPlan whose loop VPBB's are in full 1-1 correspondence with the original IRBB's of the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, currently they are quite intertwined already, but once we use it in the inner loop path I'll continue to further refactor/improve on the current structure

VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
break;
}

if (VPBB == HeaderVPBB)
// Create mask based on the IR BB corresponding to VPBB.
// TODO: Predicate directly based on VPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we set the insert point of Builder once, here, as done now, possibly to first non-Phi of VPBB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that doesn't play nice with remove the recipes.

Builder.setInsertPoint(VPBB, VPBB->begin());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: setting insert point here to first non phi of VPBB instead of beginning of VPBB doesn't play well with recipe removal?

Suggested change
Builder.setInsertPoint(VPBB, VPBB->begin());
Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to insert at the begining here, as the blocks other than the header may contain VPWidenPhis that need to be converted to VPBlend, and the mask needs to be inserted at the beginning of the block for those blends.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So non-phi recipes that compute a block's mask must be introduced before the phi recipes of the block, because the latter is eventually replaced by blends that use the former? This calls for a FIXME note.
One alternative may be to sink the blends when they replace the phis.
Another is to introduce masking recipes along with replacing phi recipes with blends, as done currently in tryToBuildVPlanWithRecipes().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a FIXME for now. We should be able to sink the blends at the right place.

The whole mask creation logic will be replaced when rewriting to perform predication on VPlan, cleaning up the logic here.

if (VPBB == HeaderVPBB) {
Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
RecipeBuilder.createHeaderMask();
else if (NeedsMasks)
RecipeBuilder.createBlockInMask(BB);
} else if (NeedsMasks) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the header mask also be created only if NeedsMasks? (Seems independent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently createHeaderMask itself checks if tail folding is enabled, and only then creates the mask

// FIXME: At the moment, masks need to be placed at the beginning of the
// block, as blends introduced for phi nodes need to use it. The created
// blends should be sunk after the mask recipes.
RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to move edge masks and block masks to be cached according to VPBB's rather than IRBB's?
Rather than hacking VPlanHCFGBuilder and PlainCFGBuilder to record and retrieve IRBB for VPBB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is somewhat a chicken-and-egg problem. Without this patch, we cannot do that, because we only ever create the flattened CFG in VPlan. I might be missing another option, but I think once this patch lands as first step we can move the mask creation to be based on the VPBlocks. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a try, but it doesn't work out with the flattening of the CFG here, because we might already removed edges earlier before querying a mask on that edge.

We could first compute all masks in a separate loop, but there also are quite a few places that need updating, include BasicBlock -> VPBasicBlock, various places that get the mask for the parent of an instruction, looking for BranchOnCond instead of BranchInst, so that would also increase the diff quite a bit unfortunately. It would probably be better to tackle that separately, also to make testing easier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, another suggestion: getIRBBForVPB() is needed only here - for non-header (non-latch) (non-empty?) VPBB, whose recipes are all expected to have underlying Instructions, which know their parental BasicBlock? Can the latter be retrieved as in below rather than hacking HCFGBuilder to record and expose this VRBB-to-IRBB mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfrotunately we don't create recipes for branches, so there may be empty blocks for which we would need to create masks (because there succesors may require them at the moment).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's go with HCFGBuilder.getIRBBForVPB(), for now ...
Following our roadmap, HCFGBuilder would be the VPlan-to-VPlan transformation producing "buildLoop" from "wrapInput", and tryToBuildVPlanWithRecipes would follow as a subsequent VPlan-to-VPlan transformation(s). Obtaining this VPBB-to-IRBB mapping from HCFGBuilder should be removed in one of two ways (or more):

  1. If the buildLoop VPlan is to maintain its correspondence with IR basic blocks then a new type of VPBB can be introduced - one providing getIRBasicBlock() similar to VPIRBasicBlock but having an execute() similar to VPBasicBlock - that generates new basic blocks, or deemed abstract/unreachable - to be materialized into a concreate VPBB prior to codegen.
  2. If, OTOH, getIRBBForVPB() is meant only as a temporary solution to support createBlockInMask() until the latter is upgraded to work directly on VPBB's, then a FIXME should be added.

(Recipes are created for internal conditional branches but not for unconditional branches (including early exits) - which imply a single successor. An empty block also implies a single predecessor due to lack of phi recipes. Figuring out the BasicBlock from the single predecessor and/or successor of an empty VPBB seems cumbersome; getIRBBForVPB() could record the mapping for empty VPBB's only; simplest to probably to record all VPBB's in loop, for now.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a FXIME to the declaration, thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
auto *SingleDef = cast<VPSingleDefRecipe>(VPBB->begin());
Instruction *Instr = SingleDef->getUnderlyingInstr();
RecipeBuilder.createBlockInMask(Instr->getParent());

?

}

// Convert input VPInstructions to widened recipes.
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
auto *SingleDef = cast<VPSingleDefRecipe>(&R);
auto *UnderlyingValue = SingleDef->getUnderlyingValue();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

auto *UnderlyingValue = SingleDef->getUnderlyingValue();
// Skip recipes that do not need transforming, including canonical IV,
// wide canonical IV and VPInstructions without underlying values. The
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a FIXME to stop relying on the underlying values of VPInstructions.
The "buildLoop" VPlan0 should use VPInstructions with flags and/or other recipes to best model a copy of the original scalar loop, in contrast to VPIRInstructions which model the original scalar loop itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

// latter are added above for masking.
// FIXME: Migrate code relying on the underlying instruction from VPlan0
// to construct recipes below to not use the underlying instruction.
if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
(isa<VPInstruction>(&R) && !UnderlyingValue))
continue;

// Introduce each ingredient into VPlan.
// TODO: Model and preserve debug intrinsics in VPlan.
for (Instruction &I : drop_end(BB->instructionsWithoutDebug(false))) {
Instruction *Instr = &I;
// FIXME: VPlan0, which models a copy of the original scalar loop, should
// not use VPWidenPHIRecipe to model the phis.
assert((isa<VPWidenPHIRecipe>(&R) || isa<VPInstruction>(&R)) &&
UnderlyingValue && "unsupported recipe");

if (isa<VPInstruction>(&R) &&
(cast<VPInstruction>(&R)->getOpcode() ==
VPInstruction::BranchOnCond ||
(cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch))) {
R.eraseFromParent();
break;
}

// TODO: Gradually replace uses of underlying instruction by analyses on
// VPlan.
Instruction *Instr = cast<Instruction>(UnderlyingValue);
Builder.setInsertPoint(SingleDef);
SmallVector<VPValue *, 4> Operands;
auto *Phi = dyn_cast<PHINode>(Instr);
if (Phi && Phi->getParent() == HeaderBB) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth commenting that the other operand of the header phi - the one across the back-edge, will be added later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

// The backedge value will be added in fixHeaderPhis later.
Operands.push_back(Plan->getOrAddLiveIn(
Phi->getIncomingValueForBlock(OrigLoop->getLoopPreheader())));
} else {
Expand All @@ -9420,15 +9465,16 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// in the exit block, a uniform store recipe will be created for the final
// invariant store of the reduction.
StoreInst *SI;
if ((SI = dyn_cast<StoreInst>(&I)) &&
if ((SI = dyn_cast<StoreInst>(Instr)) &&
Legal->isInvariantAddressOfReduction(SI->getPointerOperand())) {
// Only create recipe for the final invariant store of the reduction.
if (!Legal->isInvariantStoreOfReduction(SI))
continue;
auto *Recipe = new VPReplicateRecipe(
SI, make_range(Operands.begin(), Operands.end()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
if (Legal->isInvariantStoreOfReduction(SI)) {
auto *Recipe = new VPReplicateRecipe(
SI, make_range(Operands.begin(), Operands.end()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
}
R.eraseFromParent();
Comment on lines +9476 to +9477
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
R.eraseFromParent();
R.eraseFromParent();
}

otherwise R seems to be dropped altogether? Is this case tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be multiple stores to the invariant address of a reduction (Legal->isInvariantAddressOfReduction(SI->getPointerOperand())). All of those need to be removed, and only the final one (isInvariantStoreOfReduction) will produce the final store recipe .

There are tests covering the cases with single and multiple stores

continue;
}

Expand All @@ -9438,25 +9484,29 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
Recipe = RecipeBuilder.handleReplication(Instr, Operands, Range);

RecipeBuilder.setRecipe(Instr, Recipe);
if (isa<VPHeaderPHIRecipe>(Recipe)) {
// VPHeaderPHIRecipes must be kept in the phi section of HeaderVPBB. In
// the following cases, VPHeaderPHIRecipes may be created after non-phi
// recipes and need to be moved to the phi section of HeaderVPBB:
// * tail-folding (non-phi recipes computing the header mask are
// introduced earlier than regular header phi recipes, and should appear
// after them)
// * Optimizing truncates to VPWidenIntOrFpInductionRecipe.

assert((HeaderVPBB->getFirstNonPhi() == VPBB->end() ||
CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
"unexpected recipe needs moving");
Comment on lines -9450 to -9452
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is associated with the preceding explanation. Both become obsolete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to only move truncated inductions, for which it still is needed

if (isa<VPWidenIntOrFpInductionRecipe>(Recipe) && isa<TruncInst>(Instr)) {
// Optimized a truncate to VPWidenIntOrFpInductionRecipe. It needs to be
// moved to the phi section in the header.
Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
} else
VPBB->appendRecipe(Recipe);
}

VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
} else {
Builder.insert(Recipe);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
Builder.insert(Recipe);
Builder.insert(Recipe);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

}
if (Recipe->getNumDefinedValues() == 1)
SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
else
assert(Recipe->getNumDefinedValues() == 0 &&
"Unexpected multidef recipe");
R.eraseFromParent();
}

// Flatten the CFG in the loop. Masks for blocks have already been generated
// and added to recipes as needed. To do so, first disconnect VPBB from its
// successors. Then connect VPBB to the previously visited VPBB.
for (auto *Succ : to_vector(VPBB->getSuccessors()))
VPBlockUtils::disconnectBlocks(VPBB, Succ);
if (PrevVPBB)
VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
PrevVPBB = VPBB;
}

// After here, VPBB should not be used.
Expand Down
17 changes: 13 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -600,16 +600,25 @@ static bool hasConditionalTerminator(const VPBasicBlock *VPBB) {
}

const VPRecipeBase *R = &VPBB->back();
bool IsSwitch = isa<VPInstruction>(R) &&
cast<VPInstruction>(R)->getOpcode() == Instruction::Switch;
bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
(void)IsCondBranch;

if (VPBB->getNumSuccessors() >= 2 ||
(void)IsSwitch;
if (VPBB->getNumSuccessors() == 2 ||
(VPBB->isExiting() && !VPBB->getParent()->isReplicator())) {
assert(IsCondBranch && "block with multiple successors not terminated by "
"conditional branch recipe");
assert((IsCondBranch || IsSwitch) &&
"block with multiple successors not terminated by "
"conditional branch nor switch recipe");

return true;
}

if (VPBB->getNumSuccessors() > 2) {
assert(IsSwitch && "block with more than 2 successors not terminated by "
"a switch recipe");
return true;
}

Expand Down
35 changes: 28 additions & 7 deletions llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ class PlainCFGBuilder {
: TheLoop(Lp), LI(LI), Plan(P) {}

/// Build plain CFG for TheLoop and connects it to Plan's entry.
void buildPlainCFG();
void buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB);
};
} // anonymous namespace

Expand Down Expand Up @@ -237,10 +237,10 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
// Instruction definition is in outermost loop PH.
return false;

// Check whether Instruction definition is in the loop exit.
BasicBlock *Exit = TheLoop->getUniqueExitBlock();
assert(Exit && "Expected loop with single exit.");
if (InstParent == Exit) {
// Check whether Instruction definition is in a loop exit.
SmallVector<BasicBlock *> ExitBlocks;
TheLoop->getExitBlocks(ExitBlocks);
if (is_contained(ExitBlocks, InstParent)) {
// Instruction definition is in outermost loop exit.
return false;
}
Comment on lines +241 to 246
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer expecting a loop with single exit?
Teaching PlainCFGBuilder about multiple exits - does this imply that "native" can now vectorize such outerloops, and worth testing?
OTOH, PlainCFGBuilder::buildPlainCFG() does handle multiple/non-unique exits(?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for the inner loop multi-exit support. The native path still won't vectorize early exits, due to checks in legality I think. I can add a test case, thanks

Expand Down Expand Up @@ -283,6 +283,7 @@ VPValue *PlainCFGBuilder::getOrCreateVPOperand(Value *IRVal) {
void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
BasicBlock *BB) {
VPIRBuilder.setInsertPoint(VPBB);
// TODO: Model and preserve debug intrinsics in VPlan.
for (Instruction &InstRef : BB->instructionsWithoutDebug(false)) {
Instruction *Inst = &InstRef;

Expand All @@ -308,6 +309,14 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
continue;
}

if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
for (auto Case : SI->cases())
Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
continue;
}

Comment on lines +312 to +319
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teaching PlainCFGBuilder about switch statements - does this imply that "native" can now vectorize outerloops with switches, and worth testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The native path still won't vectorize switches, due to checks in legality I think. I can add a test case, thanks

VPValue *NewVPV;
if (auto *Phi = dyn_cast<PHINode>(Inst)) {
// Phi node's operands may have not been visited at this point. We create
Expand All @@ -334,7 +343,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
}

// Main interface to build the plain CFG.
void PlainCFGBuilder::buildPlainCFG() {
void PlainCFGBuilder::buildPlainCFG(
DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
// 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
// skeleton. These were created directly rather than via getOrCreateVPBB(),
// revisit them now to update BB2VPBB. Note that header/entry and
Expand Down Expand Up @@ -423,6 +433,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// Set VPBB successors. We create empty VPBBs for successors if they don't
// exist already. Recipes will be created when the successor is visited
// during the RPO traversal.
if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
SmallVector<VPBlockBase *> Succs = {
getOrCreateVPBB(SI->getDefaultDest())};
for (auto Case : SI->cases())
Succs.push_back(getOrCreateVPBB(Case.getCaseSuccessor()));
VPBB->setSuccessors(Succs);
continue;
}
auto *BI = cast<BranchInst>(BB->getTerminator());
unsigned NumSuccs = succ_size(BB);
if (NumSuccs == 1) {
Expand Down Expand Up @@ -476,11 +494,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// have a VPlan couterpart. Fix VPlan phi nodes by adding their corresponding
// VPlan operands.
fixPhiNodes();

for (const auto &[IRBB, VPB] : BB2VPBB)
VPB2IRBB[VPB] = IRBB;
}

void VPlanHCFGBuilder::buildPlainCFG() {
PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
PCFGBuilder.buildPlainCFG();
PCFGBuilder.buildPlainCFG(VPB2IRBB);
}

// Public interface to build a H-CFG.
Expand Down
12 changes: 12 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ class VPlanHCFGBuilder {
// are introduced.
VPDominatorTree VPDomTree;

/// Map of create VP blocks to their input IR basic blocks, if they have been
/// created for a input IR basic block.
DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB;

/// Build plain CFG for TheLoop and connects it to Plan's entry.
void buildPlainCFG();

Expand All @@ -62,6 +66,14 @@ class VPlanHCFGBuilder {

/// Build H-CFG for TheLoop and update Plan accordingly.
void buildHierarchicalCFG();

/// Return the input IR BasicBlock corresponding to \p VPB. Returns nullptr if
/// there is no such corresponding block.
/// FIXME: This is a temporary workaround to drive the createBlockInMask.
/// Remove once mask creation is done on VPlan.
BasicBlock *getIRBBForVPB(const VPBlockBase *VPB) const {
return VPB2IRBB.lookup(VPB);
}
};
} // namespace llvm

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why/Is this line no longer NEXT? Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now also include printing the VPlan after HCFG construction, should add a test to check that or drop the extra output?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, whatever you prefer.

; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
Expand Down Expand Up @@ -295,7 +295,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds float, ptr %A, i64 %idxprom
Expand Down