Skip to content

[VPlan] Remove hoisting to preheader in VPTransformState::get #142594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Jun 3, 2025

Now that the start values of reductions are explicitly modelled as a VPInstruction in 6108d50, we no longer really need to hoist when executing.

The last remaining use of this is one VPPredInstPHIRecipe test case

@llvmbot
Copy link
Member

llvmbot commented Jun 3, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

I noticed this after updating #118638 on top of #117506, and seeing that some broadcasts were no longer being hoisted into the loop preheader.

It was calling VPlan::getVectorPreheader(), which at this point now is dissolved and returns null.

This fixes it by getting the header from vputils::getFirstLoopHeader and getting the preheader from its first successor.

I've also added an assertion in getVectorPreheader() to make sure its not called after the regions are dissolved.


Full diff: https://github.com/llvm/llvm-project/pull/142594.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+9-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 280ea47c5d7cc..b69278bac047e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -290,18 +290,22 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
     return Data.VPV2Vector[Def];
 
   auto GetBroadcastInstrs = [this, Def](Value *V) {
+    VPBasicBlock *PreheaderVPBB = nullptr;
+    if (auto *Header = vputils::getFirstLoopHeader(*Plan, VPDT))
+      PreheaderVPBB = cast<VPBasicBlock>(Header->getPredecessors()[0]);
+
     bool SafeToHoist =
-        !Def->hasDefiningRecipe() ||
-        VPDT.properlyDominates(Def->getDefiningRecipe()->getParent(),
-                               Plan->getVectorPreheader());
+        PreheaderVPBB &&
+        (!Def->hasDefiningRecipe() ||
+         VPDT.properlyDominates(Def->getDefiningRecipe()->getParent(),
+                                PreheaderVPBB));
 
     if (VF.isScalar())
       return V;
     // Place the code for broadcasting invariant variables in the new preheader.
     IRBuilder<>::InsertPointGuard Guard(Builder);
     if (SafeToHoist) {
-      BasicBlock *LoopVectorPreHeader =
-          CFG.VPBB2IRBB[Plan->getVectorPreheader()];
+      BasicBlock *LoopVectorPreHeader = CFG.VPBB2IRBB[PreheaderVPBB];
       if (LoopVectorPreHeader)
         Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 44f0b6d964a6e..736f600773094 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3984,13 +3984,11 @@ class VPlan {
   VPBasicBlock *getEntry() { return Entry; }
   const VPBasicBlock *getEntry() const { return Entry; }
 
-  /// Returns the preheader of the vector loop region, if one exists, or null
-  /// otherwise.
+  /// Returns the preheader of the vector loop region, provided it exists.
   VPBasicBlock *getVectorPreheader() {
     VPRegionBlock *VectorRegion = getVectorLoopRegion();
-    return VectorRegion
-               ? cast<VPBasicBlock>(VectorRegion->getSinglePredecessor())
-               : nullptr;
+    assert(VectorRegion && "vector loop region no longer exists?");
+    return cast<VPBasicBlock>(VectorRegion->getSinglePredecessor());
   }
 
   /// Returns the VPRegionBlock of the vector loop.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll b/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll
index 9a70ed451cf42..9d377a95e8252 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr141968-instsimplifyfolder.ll
@@ -14,6 +14,8 @@ define i8 @pr141968(i1 %cond, i8 %v) {
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i1> poison, i1 [[COND]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i1> [[BROADCAST_SPLATINSERT]], <16 x i1> poison, <16 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP0:%.*]] = xor <16 x i1> [[BROADCAST_SPLAT]], splat (i1 true)
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT31:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT32:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT31]], <16 x i8> poison, <16 x i32> zeroinitializer
 ; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_SDIV_CONTINUE30:.*]] ]
@@ -97,8 +99,6 @@ define i8 @pr141968(i1 %cond, i8 %v) {
 ; CHECK:       [[PRED_SDIV_IF29]]:
 ; CHECK-NEXT:    br label %[[PRED_SDIV_CONTINUE30]]
 ; CHECK:       [[PRED_SDIV_CONTINUE30]]:
-; CHECK-NEXT:    [[BROADCAST_SPLATINSERT31:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i64 0
-; CHECK-NEXT:    [[BROADCAST_SPLAT32:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT31]], <16 x i8> poison, <16 x i32> zeroinitializer
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <16 x i1> [[BROADCAST_SPLAT]], <16 x i8> zeroinitializer, <16 x i8> [[BROADCAST_SPLAT32]]
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 16
 ; CHECK-NEXT:    [[TMP17:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256

@lukel97
Copy link
Contributor Author

lukel97 commented Jun 10, 2025

Ping

@@ -290,18 +290,22 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
return Data.VPV2Vector[Def];

auto GetBroadcastInstrs = [this, Def](Value *V) {
VPBasicBlock *PreheaderVPBB = nullptr;
if (auto *Header = vputils::getFirstLoopHeader(*Plan, VPDT))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the definition of getFirstLoopHeader:

VPBasicBlock *vputils::getFirstLoopHeader(VPlan &Plan, VPDominatorTree &VPDT) {
  auto DepthFirst = vp_depth_first_shallow(Plan.getEntry());
  auto I = find_if(DepthFirst, [&VPDT](VPBlockBase *VPB) {
    return VPBlockUtils::isHeader(VPB, VPDT);
  });
  return I == DepthFirst.end() ? nullptr : cast<VPBasicBlock>(*I);
}

and isHeader:

bool VPBlockUtils::isHeader(const VPBlockBase *VPB,
                            const VPDominatorTree &VPDT) {
  auto *VPBB = dyn_cast<VPBasicBlock>(VPB);
  if (!VPBB)
    return false;

  // If VPBB is in a region R, VPBB is a loop header if R is a loop region with
  // VPBB as its entry, i.e., free of predecessors.
  if (auto *R = VPBB->getParent())
    return !R->isReplicator() && VPBB->getNumPredecessors() == 0;

  // A header dominates its second predecessor (the latch), with the other
  // predecessor being the preheader
  return VPB->getPredecessors().size() == 2 &&
         VPDT.dominates(VPB, VPB->getPredecessors()[1]);
}

It suggests a case where getFirstLoopHeader returns a Header without any predecessors:

  // If VPBB is in a region R, VPBB is a loop header if R is a loop region with
  // VPBB as its entry, i.e., free of predecessors.

Does this mean the code on the line below is unsafe because there are no predecessors?

Copy link
Contributor Author

@lukel97 lukel97 Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage during execution the regions will have been dissolved so VPBB->getParent() should be nullptr, except for replicator regions. But there's a check for R->isReplicator(), so I think VPBlockUtils::isHeader should only return true if there's 2 predecessors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for explaining! In that case, is it worth asserting assert(!getVectorLoopRegion() && ...) at the start of GetBroadcastInstrs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done in ab70f47

@@ -97,8 +99,6 @@ define i8 @pr141968(i1 %cond, i8 %v) {
; CHECK: [[PRED_SDIV_IF29]]:
; CHECK-NEXT: br label %[[PRED_SDIV_CONTINUE30]]
; CHECK: [[PRED_SDIV_CONTINUE30]]:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT31:%.*]] = insertelement <16 x i8> poison, i8 [[V]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT32:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT31]], <16 x i8> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[PREDPHI:%.*]] = select <16 x i1> [[BROADCAST_SPLAT]], <16 x i8> zeroinitializer, <16 x i8> [[BROADCAST_SPLAT32]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is the only user a VPPredInstPHIRecipe? It uses the vector value, but has usesScalars returns true. Should it return false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, removed the overridden definition in ba9e503 so it should return false now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes to GetBroadcastInstrs still needed? It sounds like the SafeToHoist could be removed now? I think the last user of the code has been removed in 6108d50.

If that's the case would be good to have one patch updating VPPredInstPHIRecipe and one stripping the SafeToHoist logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will result in some small test diffs with #118638, where previously we were inserting broadcasts into the preheader but now it will be in the loop body, e.g. see https://github.com/llvm/llvm-project/pull/118638/files#diff-3967d969cafbfb0bd4e6d92db8cb1d5f36181d85e5c5a7ab4bcdfe81d03df0c1

But it's not critical though. If we're happy with the test diff, which I presume LICM will hoist anyway, then agreed we should remove SafeToHoist.

Copy link
Contributor Author

@lukel97 lukel97 Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #143798 for VPPredInstPHIRecipe, and reworked this PR to remove the SafeToHoist logic.

In either case we still end up with the test diff in this test case, i.e. changing usesScalars doesn't seem to affect it.

In either case we still end up with this broadcast not being hoisted, but it's not a big deal since LICM will hoist it afterwards.

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jun 11, 2025
It uses the vector value of its operand if possible. Use the default definition in VPUser, which is false since onlyFirstLaneUsed is false.

From the discussion at llvm#142594 (comment)
Now that the start values of reductions are explicitly modelled as a VPInstruction in 6108d50, we no longer really need to hoist when executing.

From the discussion in llvm#142594 (comment)
@lukel97 lukel97 force-pushed the loop-vectorize/dont-use-region-preheader-in-execute branch from ba9e503 to cf4a828 Compare June 11, 2025 23:14
@lukel97 lukel97 changed the title [VPlan] Fix broadcasted values using loop region during execution [VPlan] Remove hoisting to preheader in VPTransformState::get Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants