-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[VPlan] Remove hoisting to preheader in VPTransformState::get #142594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[VPlan] Remove hoisting to preheader in VPTransformState::get #142594
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesI noticed this after updating #118638 on top of #117506, and seeing that some broadcasts were no longer being hoisted into the loop preheader. It was calling VPlan::getVectorPreheader(), which at this point now is dissolved and returns null. This fixes it by getting the header from vputils::getFirstLoopHeader and getting the preheader from its first successor. I've also added an assertion in getVectorPreheader() to make sure its not called after the regions are dissolved. Full diff: https://github.com/llvm/llvm-project/pull/142594.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 280ea47c5d7cc..b69278bac047e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -290,18 +290,22 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
return Data.VPV2Vector[Def];
auto GetBroadcastInstrs = [this, Def](Value *V) {
+ VPBasicBlock *PreheaderVPBB = nullptr;
+ if (auto *Header = vputils::getFirstLoopHeader(*Plan, VPDT))
+ PreheaderVPBB = cast<VPBasicBlock>(Header->getPredecessors()[0]);
+
bool SafeToHoist =
- !Def->hasDefiningRecipe() ||
- VPDT.properlyDominates(Def->getDefiningRecipe()->getParent(),
- Plan->getVectorPreheader());
+ PreheaderVPBB &&
+ (!Def->hasDefiningRecipe() ||
+ VPDT.properlyDominates(Def->getDefiningRecipe()->getParent(),
+ PreheaderVPBB));
if (VF.isScalar())
return V;
// Place the code for broadcasting invariant variables in the new preheader.
IRBuilder<>::InsertPointGuard Guard(Builder);
if (SafeToHoist) {
- BasicBlock *LoopVectorPreHeader =
- CFG.VPBB2IRBB[Plan->getVectorPreheader()];
+ BasicBlock *LoopVectorPreHeader = CFG.VPBB2IRBB[PreheaderVPBB];
if (LoopVectorPreHeader)
Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 44f0b6d964a6e..736f600773094 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3984,13 +3984,11 @@ class VPlan {
VPBasicBlock *getEntry() { return Entry; }
const VPBasicBlock *getEntry() const { return Entry; }
- /// Returns the preheader of the vector loop region, if one exists, or null
- /// otherwise.
+ /// Returns the preheader of the vector loop region, provided it exists.
VPBasicBlock *getVectorPreheader() {
VPRegionBlock *VectorRegion = getVectorLoopRegion();
- return VectorRegion
- ? cast<VPBasicBlock>(VectorRegion->getSinglePredecessor())
- : nullptr;
+ assert(VectorRegion && "vector loop region no longer exists?");
+ return cast<VPBasicBlock>(VectorRegion->getSinglePredecessor());
}
/// Returns the VPRegionBlock of the vector loop.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll b/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll
index 9a70ed451cf42..9d377a95e8252 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll
@@ -14,6 +14,8 @@ define i8 @pr141968(i1 %cond, i8 %v) {
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i1> poison, i1 [[COND]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i1> [[BROADCAST_SPLATINSERT]], <16 x i1> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[TMP0:%.*]] = xor <16 x i1> [[BROADCAST_SPLAT]], splat (i1 true)
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT31:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT32:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT31]], <16 x i8> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_SDIV_CONTINUE30:.*]] ]
@@ -97,8 +99,6 @@ define i8 @pr141968(i1 %cond, i8 %v) {
; CHECK: [[PRED_SDIV_IF29]]:
; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE30]]
; CHECK: [[PRED_SDIV_CONTINUE30]]:
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT31:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT32:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT31]], <16 x i8> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[PREDPHI:%.*]] = select <16 x i1> [[BROADCAST_SPLAT]], <16 x i8> zeroinitializer, <16 x i8> [[BROADCAST_SPLAT32]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 16
; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256
|
Ping |
@@ -290,18 +290,22 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) { | |||
return Data.VPV2Vector[Def]; | |||
|
|||
auto GetBroadcastInstrs = [this, Def](Value *V) { | |||
VPBasicBlock *PreheaderVPBB = nullptr; | |||
if (auto *Header = vputils::getFirstLoopHeader(*Plan, VPDT)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the definition of getFirstLoopHeader
:
VPBasicBlock *vputils::getFirstLoopHeader(VPlan &Plan, VPDominatorTree &VPDT) {
auto DepthFirst = vp_depth_first_shallow(Plan.getEntry());
auto I = find_if(DepthFirst, [&VPDT](VPBlockBase *VPB) {
return VPBlockUtils::isHeader(VPB, VPDT);
});
return I == DepthFirst.end() ? nullptr : cast<VPBasicBlock>(*I);
}
and isHeader
:
bool VPBlockUtils::isHeader(const VPBlockBase *VPB,
const VPDominatorTree &VPDT) {
auto *VPBB = dyn_cast<VPBasicBlock>(VPB);
if (!VPBB)
return false;
// If VPBB is in a region R, VPBB is a loop header if R is a loop region with
// VPBB as its entry, i.e., free of predecessors.
if (auto *R = VPBB->getParent())
return !R->isReplicator() && VPBB->getNumPredecessors() == 0;
// A header dominates its second predecessor (the latch), with the other
// predecessor being the preheader
return VPB->getPredecessors().size() == 2 &&
VPDT.dominates(VPB, VPB->getPredecessors()[1]);
}
It suggests a case where getFirstLoopHeader
returns a Header
without any predecessors:
// If VPBB is in a region R, VPBB is a loop header if R is a loop region with
// VPBB as its entry, i.e., free of predecessors.
Does this mean the code on the line below is unsafe because there are no predecessors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage during execution the regions will have been dissolved so VPBB->getParent()
should be nullptr, except for replicator regions. But there's a check for R->isReplicator()
, so I think VPBlockUtils::isHeader
should only return true if there's 2 predecessors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for explaining! In that case, is it worth asserting assert(!getVectorLoopRegion() && ...)
at the start of GetBroadcastInstrs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done in ab70f47
@@ -97,8 +99,6 @@ define i8 @pr141968(i1 %cond, i8 %v) { | |||
; CHECK: [[PRED_SDIV_IF29]]: | |||
; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE30]] | |||
; CHECK: [[PRED_SDIV_CONTINUE30]]: | |||
; CHECK-NEXT: [[BROADCAST_SPLATINSERT31:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i64 0 | |||
; CHECK-NEXT: [[BROADCAST_SPLAT32:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT31]], <16 x i8> poison, <16 x i32> zeroinitializer | |||
; CHECK-NEXT: [[PREDPHI:%.*]] = select <16 x i1> [[BROADCAST_SPLAT]], <16 x i8> zeroinitializer, <16 x i8> [[BROADCAST_SPLAT32]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is the only user a VPPredInstPHIRecipe? It uses the vector value, but has usesScalars
returns true. Should it return false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, removed the overridden definition in ba9e503 so it should return false now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the changes to GetBroadcastInstrs
still needed? It sounds like the SafeToHoist
could be removed now? I think the last user of the code has been removed in 6108d50.
If that's the case would be good to have one patch updating VPPredInstPHIRecipe and one stripping the SafeToHoist logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will result in some small test diffs with #118638, where previously we were inserting broadcasts into the preheader but now it will be in the loop body, e.g. see https://github.com/llvm/llvm-project/pull/118638/files#diff-3967d969cafbfb0bd4e6d92db8cb1d5f36181d85e5c5a7ab4bcdfe81d03df0c1
But it's not critical though. If we're happy with the test diff, which I presume LICM will hoist anyway, then agreed we should remove SafeToHoist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #143798 for VPPredInstPHIRecipe, and reworked this PR to remove the SafeToHoist logic.
In either case we still end up with the test diff in this test case, i.e. changing usesScalars doesn't seem to affect it.
In either case we still end up with this broadcast not being hoisted, but it's not a big deal since LICM will hoist it afterwards.
It uses the vector value of its operand if possible. Use the default definition in VPUser, which is false since onlyFirstLaneUsed is false. From the discussion at llvm#142594 (comment)
Now that the start values of reductions are explicitly modelled as a VPInstruction in 6108d50, we no longer really need to hoist when executing. From the discussion in llvm#142594 (comment)
ba9e503
to
cf4a828
Compare
Now that the start values of reductions are explicitly modelled as a VPInstruction in 6108d50, we no longer really need to hoist when executing.
The last remaining use of this is one
VPPredInstPHIRecipe
test case