Skip to content

[LV] Limits the splat operations be hoisted must not be defined by a recipe. #117138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 11, 2025

Conversation

Mel-Chen
Copy link
Contributor

@Mel-Chen Mel-Chen commented Nov 21, 2024

Issue #117139

@llvmbot
Copy link
Member

llvmbot commented Nov 21, 2024

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Mel Chen (Mel-Chen)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/117138.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 8b1a4aeb88f81f..32996426c28490 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -266,7 +266,7 @@ Value *VPTransformState::get(VPValue *Def, bool NeedsScalar) {
     return Data.VPV2Vector[Def];
 
   auto GetBroadcastInstrs = [this, Def](Value *V) {
-    bool SafeToHoist = Def->isDefinedOutsideLoopRegions();
+    bool SafeToHoist = !Def->hasDefiningRecipe();
     if (VF.isScalar())
       return V;
     // Place the code for broadcasting invariant variables in the new preheader.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
index aa78113ebaa48c..b1c202eab9dd3d 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
@@ -23,9 +23,9 @@
 ; Check that the extractvalue operands are actually free in vector code.
 
 ; FORCED:         [[E1:%.+]] = extractvalue { i64, i64 } %sv, 0
-; FORCED-NEXT:    [[E2:%.+]] = extractvalue { i64, i64 } %sv, 1
 ; FORCED-NEXT:    %broadcast.splatinsert = insertelement <2 x i64> poison, i64 [[E1]], i64 0
 ; FORCED-NEXT:    %broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> poison, <2 x i32> zeroinitializer
+; FORCED-NEXT:    [[E2:%.+]] = extractvalue { i64, i64 } %sv, 1
 ; FORCED-NEXT:    %broadcast.splatinsert1 = insertelement <2 x i64> poison, i64 [[E2]], i64 0
 ; FORCED-NEXT:    %broadcast.splat2 = shufflevector <2 x i64> %broadcast.splatinsert1, <2 x i64> poison, <2 x i32> zeroinitializer
 ; FORCED-NEXT:    [[ADD:%.+]] = add <2 x i64> %broadcast.splat, %broadcast.splat2
@@ -75,9 +75,9 @@ declare float @powf(float, float) readnone nounwind
 ; FORCED-LABEL: define void @test_getVectorCallCost
 
 ; FORCED:         [[E1:%.+]] = extractvalue { float, float } %sv, 0
-; FORCED-NEXT:    [[E2:%.+]] = extractvalue { float, float } %sv, 1
 ; FORCED-NEXT:    %broadcast.splatinsert = insertelement <2 x float> poison, float [[E1]], i64 0
 ; FORCED-NEXT:    %broadcast.splat = shufflevector <2 x float> %broadcast.splatinsert, <2 x float> poison, <2 x i32> zeroinitializer
+; FORCED-NEXT:    [[E2:%.+]] = extractvalue { float, float } %sv, 1
 ; FORCED-NEXT:    %broadcast.splatinsert1 = insertelement <2 x float> poison, float [[E2]], i64 0
 ; FORCED-NEXT:    %broadcast.splat2 = shufflevector <2 x float> %broadcast.splatinsert1, <2 x float> poison, <2 x i32> zeroinitializer
 
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
index 7778f01c58dc34..fb5087db254b23 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-extractvalue.ll
@@ -5,9 +5,9 @@ target triple = "aarch64-unknown-linux-gnu"
 define void @widen_extractvalue(ptr %dst, {i64, i64} %sv) #0 {
 ; CHECK-LABEL: @widen_extractvalue(
 ; CHECK:        [[EXTRACT0:%.*]] = extractvalue { i64, i64 } [[SV:%.*]], 0
-; CHECK-NEXT:   [[EXTRACT1:%.*]] = extractvalue { i64, i64 } [[SV]], 1
 ; CHECK-NEXT:   [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[EXTRACT0]], i64 0
 ; CHECK-NEXT:   [[DOTSPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
+; CHECK-NEXT:   [[EXTRACT1:%.*]] = extractvalue { i64, i64 } [[SV]], 1
 ; CHECK-NEXT:   [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[EXTRACT1]], i64 0
 ; CHECK-NEXT:   [[DOTSPLAT2:%.*]] = shufflevector <vscale x 2 x i64> [[DOTSPLATINSERT1]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
 ; CHECK:        [[ADD:%.*]] = add <vscale x 2 x i64> [[DOTSPLAT]], [[DOTSPLAT2]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll b/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
index e94bd841360256..7840a9dec794b3 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
@@ -137,6 +137,8 @@ define void @test_induction_step_needs_expansion(ptr noalias %j, ptr %k, i64 %l,
 ; CHECK-LABEL: @test_induction_step_needs_expansion(
 ; CHECK-NEXT:  iter.check:
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub i16 0, [[OFF:%.*]]
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i16> poison, i16 [[TMP0]], i64 0
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <16 x i16> [[BROADCAST_SPLATINSERT]], <16 x i16> poison, <16 x i32> zeroinitializer
 ; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[L:%.*]], 8
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
 ; CHECK:       vector.main.loop.iter.check:
@@ -145,8 +147,6 @@ define void @test_induction_step_needs_expansion(ptr noalias %j, ptr %k, i64 %l,
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[L]], 64
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[L]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[DOTSPLATINSERT2:%.*]] = insertelement <16 x i16> poison, i16 [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <16 x i16> [[DOTSPLATINSERT2]], <16 x i16> poison, <16 x i32> zeroinitializer
 ; CHECK-NEXT:    [[TMP1:%.*]] = mul <16 x i16> splat (i16 16), [[TMP2]]
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <16 x i16> poison, i16 [[TMP0]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <16 x i16> [[DOTSPLATINSERT]], <16 x i16> poison, <16 x i32> zeroinitializer

@fhahn
Copy link
Contributor

fhahn commented Nov 21, 2024

How about adding a VPDominatorTree to VPTransformState, using that to check for dominance? VPlan's CFG shouldn't change during execution, so no need to worry about updates

@Mel-Chen
Copy link
Contributor Author

Ping @fhahn, do you have any thoughts on circular dependency between VPlan.h and VPlanDominatorTree.h? Or could we proceed with using the VPDominatorTree pointer?

@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented Jan 7, 2025

How about adding a VPDominatorTree to VPTransformState, using that to check for dominance? VPlan's CFG shouldn't change during execution, so no need to worry about updates

@fhahn 40c2dee
I have to update the VPDominatorTree before VPlan::execute since the plan will be changed after State was constructed:

    // 2. Copy and widen instructions from the old loop into the new loop.
    BestVPlan.prepareToExecute(
        ILV.getTripCount(),
        ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
    replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);  <--- This function will change CFG of plan

    BestVPlan.execute(&State);

And also, could you please take a look the issue about circular dependency between VPlan.h and VPlanDominatorTree.h?
Or could we proceed with using the pointer?

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 23, 2025
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(llvm#117138).
@fhahn
Copy link
Contributor

fhahn commented Jan 23, 2025

How about adding a VPDominatorTree to VPTransformState, using that to check for dominance? VPlan's CFG shouldn't change during execution, so no need to worry about updates

@fhahn 40c2dee I have to update the VPDominatorTree before VPlan::execute since the plan will be changed after State was constructed:

    // 2. Copy and widen instructions from the old loop into the new loop.
    BestVPlan.prepareToExecute(
        ILV.getTripCount(),
        ILV.getOrCreateVectorTripCount(ILV.LoopVectorPreHeader), State);
    replaceVPBBWithIRVPBB(VectorPH, State.CFG.PrevBB);  <--- This function will change CFG of plan

    BestVPlan.execute(&State);

And also, could you please take a look the issue about circular dependency between VPlan.h and VPlanDominatorTree.h? Or could we proceed with using the pointer?

I think it should be possible to organize the declarations in a better way to enable this (with the main other benefit being that it also reduces the size of VPlan.h): #124104

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 23, 2025
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(llvm#117138).
fhahn added a commit to fhahn/llvm-project that referenced this pull request Jan 30, 2025
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(llvm#117138).
fhahn added a commit that referenced this pull request Feb 2, 2025
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(#117138).

PR: #124104
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 2, 2025
…(#124104)

Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header to reduce the size of widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(llvm/llvm-project#117138).

PR: llvm/llvm-project#124104
@Mel-Chen Mel-Chen force-pushed the safe-to-hoist branch 2 times, most recently from ef3da20 to 324ba39 Compare February 5, 2025 13:58
@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented Feb 5, 2025

I think it should be possible to organize the declarations in a better way to enable this (with the main other benefit being that it also reduces the size of VPlan.h): #124104

Thanks, #124104 help me a lot! I j already rebased this patch on that.:)
But I just saw #124644, do we still need to use VPDominatorTree?
In my opinion, the explicit splat is the best approach to resolve this issue.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be possible to organize the declarations in a better way to enable this (with the main other benefit being that it also reduces the size of VPlan.h): #124104

Thanks, #124104 help me a lot! I j already rebased this patch on that.:) But I just saw #124644, do we still need to use VPDominatorTree? In my opinion, the explicit splat is the best approach to resolve this issue.

It depends if anything is blocked on this at the moment. #124644 is the first step, there is a bit more follow-up work needed to completely remove SafeToHoist, but after that the VPDT shouldn't be needed.

If you need this fixed urgently to unblock follow-up patches then it may make sense to use the VPDT here for now.

@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented Feb 6, 2025

I think it should be possible to organize the declarations in a better way to enable this (with the main other benefit being that it also reduces the size of VPlan.h): #124104

Thanks, #124104 help me a lot! I j already rebased this patch on that.:) But I just saw #124644, do we still need to use VPDominatorTree? In my opinion, the explicit splat is the best approach to resolve this issue.

It depends if anything is blocked on this at the moment. #124644 is the first step, there is a bit more follow-up work needed to completely remove SafeToHoist, but after that the VPDT shouldn't be needed.

If you need this fixed urgently to unblock follow-up patches then it may make sense to use the VPDT here for now.

Got it. #124644 is currently limited to LiveIn values, and further work is needed to support splatting non-LiveIn values.
In that case, maybe we can proceed with the VPDT approach first. :)

@Mel-Chen Mel-Chen requested review from david-arm and fhahn February 6, 2025 09:55
@Mel-Chen Mel-Chen requested a review from arcbbb February 6, 2025 09:55
bool SafeToHoist =
!Def->hasDefiningRecipe() ||
VPDT.properlyDominates(Def->getDefiningRecipe()->getParent(),
Plan->getVectorPreheader());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IICU properlyDominates means that we won't consider things as safe-to-hoist if they are defined int he vector preheader, but that would also be safe, as we insert at the end of the vector preheader. Can we catch this also by checking if it properly dominates the vector loop region?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use dominates instead of properlyDominates to achieve this.

However, I don’t understand the benefit of doing so. In my opinion, if Def is already defined in the preheader yet we still determine that the corresponding splat operation can be hoisted, this might move the splat operation further away from Def, extending Def's live range. (please see the change of llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll)

I might be missing something—could you please explain the advantages of this change in more detail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @fhahn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have users in the entry block as well, but here we only ever hoist to the vector preheader, so that should be fine as-is for now, thanks

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

bool SafeToHoist =
!Def->hasDefiningRecipe() ||
VPDT.properlyDominates(Def->getDefiningRecipe()->getParent(),
Plan->getVectorPreheader());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have users in the entry block as well, but here we only ever hoist to the vector preheader, so that should be fine as-is for now, thanks

@Mel-Chen Mel-Chen merged commit 1ff10fa into llvm:main Mar 11, 2025
11 checks passed
Mel-Chen added a commit that referenced this pull request Mar 13, 2025
Mel-Chen added a commit that referenced this pull request Mar 13, 2025
…ee (#117138)

This patch restricts broadcast operations from being hoisted to the vector
preheader unless the basic block that defines the broadcasted value properly
dominates the vector preheader.

This prevents potential use-before-definition issues when the broadcasted
value is defined within the plan. VPDominatorTree is used to confirm this
restriction while still allowing safe hoisting for broadcasted values defined
outside the plan.

Issue #117139
@Mel-Chen
Copy link
Contributor Author

Accidentally used the wrong commit log. :(
Reverted and recommitted with the corrected commit log.
5d5e706

frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
…ee (llvm#117138)

This patch restricts broadcast operations from being hoisted to the vector
preheader unless the basic block that defines the broadcasted value properly
dominates the vector preheader.

This prevents potential use-before-definition issues when the broadcasted
value is defined within the plan. VPDominatorTree is used to confirm this
restriction while still allowing safe hoisting for broadcasted values defined
outside the plan.

Issue llvm#117139
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants