[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan #125058

david-arm · 2025-01-30T13:42:58Z

This patch adds an initial implementation of
VPInstruction::computeCost with support for only one
instruction so far - VPInstruction::AnyOf. This is only
used when vectorising loops with uncountable early exits.
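To make the role of AnyOf concrete, here is a scalar C++ model of a vectorised loop with an uncountable early exit. Each "vector iteration" builds a mask of lanes that want to exit, OR-reduces it (the AnyOf step), and, if any lane is active, leaves through an early-exit block that locates the first active lane. The exit condition (divide by 20000, compare with 1) is borrowed from the `@loop_contains_safe_div` test in this patch, but the fixed VF of 4, the function names and the surrounding loop are assumptions for illustration only; this is not how the vectoriser itself is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed vector width for this sketch.
constexpr std::size_t VF = 4;
using Mask = std::array<bool, VF>;

// Scalar model of VPInstruction::AnyOf: OR-reduce the lanes of an i1 mask.
bool anyOf(const Mask &M) {
  bool Result = false;
  for (bool Lane : M)
    Result = Result || Lane;
  return Result;
}

// Scalar model of finding the first active lane (cf. the
// llvm.experimental.cttz.elts call with zero-is-poison set): the caller
// must guarantee that at least one lane is active.
std::size_t firstActiveLane(const Mask &M) {
  assert(anyOf(M) && "no active lane");
  std::size_t Lane = 0;
  while (!M[Lane])
    ++Lane;
  return Lane;
}

// Hypothetical exit condition, modelled on the udiv/icmp in the test.
bool exitCond(uint32_t V) { return V / 20000 != 1; }

// Returns the index of the first element satisfying exitCond, or
// std::nullopt if the loop runs to completion without exiting early.
std::optional<std::size_t> findEarlyExit(const std::vector<uint32_t> &Data) {
  std::size_t VecEnd = Data.size() - Data.size() % VF;
  for (std::size_t I = 0; I < VecEnd; I += VF) {      // vector.body
    Mask M;
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      M[Lane] = exitCond(Data[I + Lane]);
    if (anyOf(M))                                     // AnyOf feeds the exit branch
      return I + firstActiveLane(M);                  // vector.early.exit
  }
  for (std::size_t I = VecEnd; I < Data.size(); ++I)  // scalar remainder loop
    if (exitCond(Data[I]))
      return I;
  return std::nullopt;
}
```

The AnyOf reduction is executed on every vector iteration, which is why its cost matters when comparing vector and scalar plans for such loops.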

llvmbot · 2025-01-30T13:43:55Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-vectorizers

Author: David Sherwood (david-arm)

Full diff: https://github.com/llvm/llvm-project/pull/125058.diff

4 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-4)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+14)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+32-16)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 1e948d2c756c24..e2947d73361cc5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7539,6 +7539,7 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
                         CM.CostKind);
   precomputeCosts(BestPlan, BestFactor.Width, CostCtx);
   assert((BestFactor.Width == LegacyVF.Width ||
+          Legal->hasUncountableEarlyExit() ||
           planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width),
                                                 CostCtx, OrigLoop) ||
           planContainsAdditionalSimplifications(getPlanFor(LegacyVF.Width),
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 459222234bc37f..051cda12875917 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1316,10 +1316,7 @@ class VPInstruction : public VPRecipeWithIRFlags,
 
   /// Return the cost of this VPInstruction.
   InstructionCost computeCost(ElementCount VF,
-                              VPCostContext &Ctx) const override {
-    // TODO: Compute accurate cost after retiring the legacy cost model.
-    return 0;
-  }
+                              VPCostContext &Ctx) const override;
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   /// Print the VPInstruction to \p O.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 81031b9401ca09..3e484a68751a8e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -709,6 +709,20 @@ Value *VPInstruction::generate(VPTransformState &State) {
   }
 }
 
+InstructionCost VPInstruction::computeCost(ElementCount VF,
+                                           VPCostContext &Ctx) const {
+  switch (getOpcode()) {
+  case VPInstruction::AnyOf: {
+    auto *VecI1Ty = VectorType::get(Type::getInt1Ty(Ctx.LLVMCtx), VF);
+    return Ctx.TTI.getArithmeticReductionCost(Instruction::Or, VecI1Ty,
+                                              std::nullopt, Ctx.CostKind);
+  }
+  default:
+    // TODO: Fill out other opcodes!
+    return 0;
+  }
+}
+
 bool VPInstruction::isVectorToScalar() const {
   return getOpcode() == VPInstruction::ExtractFromEnd ||
          getOpcode() == VPInstruction::ExtractFirstActive ||
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
index 9d21ea0ab6de39..b439b64e829e5f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll
@@ -272,36 +272,52 @@ define i64 @loop_contains_safe_div() #1 {
 ; CHECK-NEXT:    [[P2:%.*]] = alloca [1024 x i8], align 4
 ; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
 ; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
+; CHECK-NEXT:    [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP12:%.*]] = mul i64 [[TMP11]], 4
 ; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
+; CHECK-NEXT:    [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP3:%.*]] = mul i64 [[TMP10]], 4
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 64, [[TMP3]]
+; CHECK-NEXT:    [[INDEX1:%.*]] = sub i64 64, [[N_MOD_VF]]
+; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 4
+; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 3, [[INDEX1]]
+; CHECK-NEXT:    [[TMP16:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
+; CHECK-NEXT:    [[TMP17:%.*]] = mul <vscale x 4 x i64> [[TMP16]], splat (i64 1)
+; CHECK-NEXT:    [[INDUCTION:%.*]] = add <vscale x 4 x i64> splat (i64 3), [[TMP17]]
+; CHECK-NEXT:    [[TMP9:%.*]] = mul i64 1, [[TMP5]]
+; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP9]], i64 0
+; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
-; CHECK-NEXT:    [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT2:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 3, i64 4>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 3, [[INDEX1]]
-; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT:    [[INDEX2:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT2:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[OFFSET_IDX1:%.*]] = add i64 3, [[INDEX2]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[OFFSET_IDX1]], 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P1]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP2]], align 1
-; CHECK-NEXT:    [[TMP3:%.*]] = udiv <2 x i32> [[WIDE_LOAD]], splat (i32 20000)
-; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq <2 x i32> [[TMP3]], splat (i32 1)
-; CHECK-NEXT:    [[INDEX_NEXT2]] = add nuw i64 [[INDEX1]], 2
-; CHECK-NEXT:    [[TMP5:%.*]] = xor <2 x i1> [[TMP4]], splat (i1 true)
-; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> [[TMP5]])
-; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT2]], 64
-; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], splat (i64 2)
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP2]], align 1
+; CHECK-NEXT:    [[TMP13:%.*]] = udiv <vscale x 4 x i32> [[WIDE_LOAD]], splat (i32 20000)
+; CHECK-NEXT:    [[TMP14:%.*]] = icmp eq <vscale x 4 x i32> [[TMP13]], splat (i32 1)
+; CHECK-NEXT:    [[INDEX_NEXT2]] = add nuw i64 [[INDEX2]], [[TMP5]]
+; CHECK-NEXT:    [[TMP15:%.*]] = xor <vscale x 4 x i1> [[TMP14]], splat (i1 true)
+; CHECK-NEXT:    [[TMP6:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[TMP15]])
+; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT2]], [[INDEX1]]
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
 ; CHECK-NEXT:    [[TMP8:%.*]] = or i1 [[TMP6]], [[TMP7]]
 ; CHECK-NEXT:    br i1 [[TMP8]], label [[MIDDLE_SPLIT:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       middle.split:
 ; CHECK-NEXT:    br i1 [[TMP6]], label [[VECTOR_EARLY_EXIT:%.*]], label [[MIDDLE_BLOCK:%.*]]
 ; CHECK:       vector.early.exit:
-; CHECK-NEXT:    [[FIRST_ACTIVE_LANE:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.v2i1(<2 x i1> [[TMP5]], i1 true)
-; CHECK-NEXT:    [[EARLY_EXIT_VALUE:%.*]] = extractelement <2 x i64> [[VEC_IND]], i64 [[FIRST_ACTIVE_LANE]]
+; CHECK-NEXT:    [[FIRST_ACTIVE_LANE:%.*]] = call i64 @llvm.experimental.cttz.elts.i64.nxv4i1(<vscale x 4 x i1> [[TMP15]], i1 true)
+; CHECK-NEXT:    [[EARLY_EXIT_VALUE:%.*]] = extractelement <vscale x 4 x i64> [[VEC_IND]], i64 [[FIRST_ACTIVE_LANE]]
 ; CHECK-NEXT:    br label [[LOOP_END:%.*]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[LOOP_END]], label [[SCALAR_PH]]
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 64, [[INDEX1]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label [[LOOP_END]], label [[SCALAR_PH]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 67, [[MIDDLE_BLOCK]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[OFFSET_IDX]], [[MIDDLE_BLOCK]] ], [ 3, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[LOOP_INC:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
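The CHECK lines above compute the vector trip count and the scalar resume value in `vector.ph` and `middle.block`. With the old fixed VF of 2 these folded to the constant `67` and `br i1 true`; with VF = vscale x 4 they depend on the runtime value of vscale. The following C++ sketch reproduces that arithmetic for the 64-iteration loop whose induction starts at 3. `computeTripInfo` and `VectorTripInfo` are hypothetical names, and the vscale values used with it are assumptions (3 is chosen only to exercise the remainder path).

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the values named in the CHECK lines of @loop_contains_safe_div.
struct VectorTripInfo {
  uint64_t VF;          // vscale * 4
  uint64_t NModVF;      // N_MOD_VF = urem 64, VF
  uint64_t NVec;        // 64 - N_MOD_VF: elements handled by the vector loop
  uint64_t ResumeVal;   // BC_RESUME_VAL = 3 + NVec
  bool SkipScalarLoop;  // CMP_N = icmp eq 64, NVec
};

VectorTripInfo computeTripInfo(uint64_t VScale, uint64_t TripCount = 64,
                               uint64_t Start = 3) {
  assert(VScale > 0 && "vscale is at least 1");
  VectorTripInfo Info;
  Info.VF = VScale * 4;
  Info.NModVF = TripCount % Info.VF;
  Info.NVec = TripCount - Info.NModVF;
  Info.ResumeVal = Start + Info.NVec;
  Info.SkipScalarLoop = TripCount == Info.NVec;
  return Info;
}
```

For vscale = 1 this gives VF = 4, a resume value of 67 and no scalar remainder, matching the old constant; for vscale = 3 it gives VF = 12, a vector trip count of 60 and a resume value of 63, so `middle.block` branches to the scalar loop.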

hassnaaHamdi · 2025-01-31T14:49:52Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+  switch (getOpcode()) {
+  case VPInstruction::AnyOf: {
+    auto *VecI1Ty = VectorType::get(Type::getInt1Ty(Ctx.LLVMCtx), VF);
+    return Ctx.TTI.getArithmeticReductionCost(Instruction::Or, VecI1Ty,


Hi Dave,
Is there a reason behind using the reduction cost specifically?

Yeah, so in VPInstruction::generate the AnyOf instruction is implemented as follows:

case VPInstruction::AnyOf: {
  Value *A = State.get(getOperand(0));
  return Builder.CreateOrReduce(A);
}

i.e. it creates an OR reduction across the input vector, so I've just created a cost that matches the generated IR.
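As a rough intuition for what is being costed: `Builder.CreateOrReduce` emits `llvm.vector.reduce.or`, and one common way to lower such a reduction on a fixed-width vector is to repeatedly OR the upper half into the lower half, so the work grows with log2(VF). The sketch below is a scalar C++ model of that halving scheme. The names `orReduce` and `OrReduceResult` are hypothetical and the step count is illustrative only; the number the vectoriser actually uses comes from `TTI::getArithmeticReductionCost` and depends on the target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the modelled reduction: the reduced value plus how many
// "OR the upper half into the lower half" steps were needed.
struct OrReduceResult {
  bool Value;
  unsigned Steps;
};

// Pairwise (tree) OR-reduction of an i1 mask. Assumes a power-of-two lane
// count, as the halving scheme requires.
OrReduceResult orReduce(std::vector<bool> Mask) {
  assert(!Mask.empty() && (Mask.size() & (Mask.size() - 1)) == 0 &&
         "sketch assumes a power-of-two lane count");
  unsigned Steps = 0;
  for (std::size_t Width = Mask.size(); Width > 1; Width /= 2) {
    std::size_t Half = Width / 2;
    for (std::size_t I = 0; I < Half; ++I)
      Mask[I] = Mask[I] || Mask[I + Half];  // fold the upper half down
    ++Steps;
  }
  return {Mask[0], Steps};
}
```

For an 8-lane mask this takes three halving steps regardless of the mask contents, which is why a cost that scales with VF is a better fit than the previous hard-coded 0.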

Aha, I see. Thanks for the clarification.

fhahn · 2025-02-04T11:44:07Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7539,7 +7539,7 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
                        CM.CostKind);
  precomputeCosts(BestPlan, BestFactor.Width, CostCtx);
  assert((BestFactor.Width == LegacyVF.Width ||
-          Legal->hasUncountableEarlyExit() ||
+          BestPlan.hasRegionWithEarlyExit() ||


Thanks for the update. I've been thinking about the naming a bit: the region/plan doesn't technically have any early exits anymore, since we converted them, but the input had one. I don't really have any good suggestions for the naming. Any ideas for better terminology to use here?

Alternatively we could just inline the check here, with an explanation.

Yeah, I struggled with naming too. There is only one exiting block, which just happens to have its exit conditions merged, and it has only a single successor: the middle split block. The name that most accurately describes the CFG is something like:

doesRegionTerminatorHaveMultipleConditionsForExiting

or

isMiddleBlockIndirectSuccessorOfVectorRegion

but these seem a bit wordy! I'm happy to inline the check here for now, although I imagine at some point it will be useful to have some way of querying this.

I think it would probably be best to inline things for now and reconsider after more uses arise. Maybe things will become clearer then.

fhahn

LGTM with suggestion inline, thanks

david-arm · 2025-02-05T14:56:32Z

I had to rebase + force push just to work around the ridiculous "Processing Updates" hang in github since this morning.

david-arm · 2025-02-05T14:56:57Z

I still miss Phabricator. :(

Following on from llvm#125058, this patch takes into account the work done in the vector early exit block when assessing the profitability of vectorising the loop. I have renamed areRuntimeChecksProfitable to isOutsideLoopWorkProfitable and we now pass in the early exit costs. As part of this, I have added the ExtractFirstActive opcode to VPInstruction::computeCost. It's worth pointing out that when we assess profitability of the loop we calculate a minimum trip count and compare that against the *maximum* trip count. However, since the loop has an early exit the runtime trip count can still end up being less than the minimum. Alternatively, we may never take the early exit at all at runtime and so we have the opposite problem of over-estimating the cost of the loop. The loop vectoriser cannot simultaneously take two contradictory positions and so I feel the only sensible thing to do is be conservative and assume the loop will be more expensive than loops without early exits. We may find in future that we need to adjust the cost according to the probability of taking the early exit. This will become even more important once we support multiple early exits. However, we have to start somewhere and we can always revisit this later.
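The description above refers to computing a minimum profitable trip count and comparing it against the maximum trip count. The following C++ sketch is a deliberately simplified, hypothetical break-even model (it is not LLVM's actual formula, and `minProfitableTripCount`/`isWorthVectorising` are made-up names). It only shows why folding extra fixed work, such as the instructions in the vector early exit block, into the outside-loop cost raises the threshold. Costs are abstract units and remainder iterations are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Vectorising is treated as profitable for TC iterations when
//   ScalarCost * TC > OutsideCost + VectorCost * TC / VF.
// Returns the smallest such TC, or std::nullopt if none exists.
std::optional<uint64_t> minProfitableTripCount(uint64_t ScalarCost,
                                               uint64_t VectorCost,
                                               uint64_t VF,
                                               uint64_t OutsideCost) {
  assert(VF > 0 && "VF must be non-zero");
  // Multiply through by VF to stay in integers:
  //   TC * (ScalarCost * VF - VectorCost) > OutsideCost * VF
  if (ScalarCost * VF <= VectorCost)
    return std::nullopt;  // the vector body is never cheaper per element
  uint64_t Gain = ScalarCost * VF - VectorCost;
  return OutsideCost * VF / Gain + 1;  // smallest TC strictly above the bound
}

// Conservatively compare the break-even point against the maximum trip
// count; an early-exit loop may actually run for far fewer iterations.
bool isWorthVectorising(uint64_t MaxTripCount, uint64_t ScalarCost,
                        uint64_t VectorCost, uint64_t VF,
                        uint64_t OutsideCost) {
  std::optional<uint64_t> MinTC =
      minProfitableTripCount(ScalarCost, VectorCost, VF, OutsideCost);
  return MinTC && MaxTripCount >= *MinTC;
}
```

With these made-up costs (scalar 4, vector 6, VF 4), raising the outside-loop cost from 20 to 60 moves the break-even point from 9 to 25 iterations, so a loop with a maximum trip count of 16 is no longer considered worth vectorising.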


david-arm requested review from fhahn and hassnaaHamdi January 30, 2025 13:42

llvmbot added vectorizers llvm:transforms labels Jan 30, 2025

hassnaaHamdi reviewed Jan 31, 2025

fhahn reviewed Jan 31, 2025

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (outdated)

fhahn reviewed Jan 31, 2025

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (outdated)

fhahn reviewed Feb 4, 2025

fhahn approved these changes Feb 4, 2025

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (outdated)

fhahn mentioned this pull request Feb 5, 2025

[VPlan] Compute cost for binary op VPInstruction with underlying values. #125434

Merged

david-arm added 4 commits February 5, 2025 14:54

[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan

57998f8

This patch adds an initial implementation of VPInstruction::computeCost with support for only one instruction so far - VPInstruction::AnyOf. This is only used when vectorising loops with uncountable early exits.

Address review comments

7dae35d

Address review comment

686adb2

Address review comment

4944d54

david-arm force-pushed the ee_costs1 branch from 6e1629d to 4944d54 February 5, 2025 14:54

david-arm merged commit f07cd36 into llvm:main Feb 5, 2025
8 checks passed

david-arm mentioned this pull request Feb 7, 2025

[LoopVectorize] Further improve cost model for early exit loops #126235

Merged

david-arm deleted the ee_costs1 branch February 27, 2025 13:02
