-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LV][VPlan] Prevent calculate cost for skiped instructions in precomputeCosts(). #127966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LV][VPlan] Prevent calculate cost for skiped instructions in precomputeCosts(). #127966
Conversation
…uteCosts() Skip calaulating instruction costs for exit conditions in precomputeCosts() when it should be skiped.
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Elvis Wang (ElvisWang123) ChangesSkip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped. Reported from: #115744 (comment) Full diff: https://github.com/llvm/llvm-project/pull/127966.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 584cda34f902e..73478becf75e5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7291,6 +7291,7 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
for (unsigned I = 0; I != ExitInstrs.size(); ++I) {
Instruction *CondI = ExitInstrs[I];
if (!OrigLoop->contains(CondI) ||
+ CostCtx.skipCostComputation(CondI, VF.isVector()) ||
!CostCtx.SkipCostComputation.insert(CondI).second)
continue;
InstructionCost CondICost = CostCtx.getLegacyCost(CondI, VF);
diff --git a/llvm/test/Transforms/LoopVectorize/X86/vplan-based-cost-assertion.ll b/llvm/test/Transforms/LoopVectorize/X86/vplan-based-cost-assertion.ll
new file mode 100644
index 0000000000000..5c515af35d991
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/vplan-based-cost-assertion.ll
@@ -0,0 +1,86 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=loop-vectorize -mtriple=x86_64-unknown-linux-gnu -S | FileCheck %s
+
+; Check if the vplan-based cost model select same VF to the legacy cost model.
+; Reduced from: https://github.com/llvm/llvm-project/issues/115744#issuecomment-2670479463
+
+; int *e;
+; int f;
+; void g() {
+; for (; e[f]; f++)
+; e[b(f + 9)];
+; }
+
+define void @g(i64 %n) {
+; CHECK-LABEL: @g(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = trunc i64 [[N:%.*]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[TMP0]], 1
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], 8
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK: vector.scevcheck:
+; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i64 [[N]], 4294967295
+; CHECK-NEXT: br i1 [[TMP2]], label [[SCALAR_PH]], label [[ENTRY:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 8
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP1]], [[N_MOD_VF]]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[N]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[SELECT:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SELECT_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[ENTRY]] ], [ [[TMP9:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[ENTRY]] ], [ [[TMP10:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i32> [[VEC_IND]], splat (i32 4)
+; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i32> [[VEC_IND]] to <4 x i64>
+; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i32> [[STEP_ADD]] to <4 x i64>
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq <4 x i64> [[BROADCAST_SPLAT]], [[TMP3]]
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq <4 x i64> [[BROADCAST_SPLAT]], [[TMP4]]
+; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> zeroinitializer, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP8:%.*]] = select <4 x i1> [[TMP6]], <4 x i32> zeroinitializer, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP9]] = or <4 x i32> [[TMP7]], [[VEC_PHI]]
+; CHECK-NEXT: [[TMP10]] = or <4 x i32> [[TMP8]], [[VEC_PHI1]]
+; CHECK-NEXT: [[SELECT_NEXT]] = add nuw i32 [[SELECT]], 8
+; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[STEP_ADD]], splat (i32 4)
+; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[SELECT_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[LOOP]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: middle.block:
+; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i32> [[TMP10]], [[TMP9]]
+; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[BIN_RDX]])
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY1:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
+; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP12]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY1]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
+; CHECK-NEXT: br label [[LOOP1:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP1]] ]
+; CHECK-NEXT: [[SELECT1:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[SELECT_NEXT1:%.*]], [[LOOP1]] ]
+; CHECK-NEXT: [[IV_WIDEN:%.*]] = zext i32 [[IV]] to i64
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[N]], [[IV_WIDEN]]
+; CHECK-NEXT: [[SELECT_I:%.*]] = select i1 [[EXITCOND]], i32 0, i32 0
+; CHECK-NEXT: [[SELECT_NEXT1]] = or i32 [[SELECT_I]], [[SELECT1]]
+; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT]], label [[LOOP1]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: exit:
+; CHECK-NEXT: [[SPEC_SELECT_I_LCSSA:%.*]] = phi i32 [ [[SELECT_NEXT1]], [[LOOP1]] ], [ [[TMP12]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+ %select = phi i32 [ 0, %entry ], [ %select.next, %loop ]
+ %iv.widen = zext i32 %iv to i64
+ %exitcond = icmp eq i64 %n, %iv.widen
+ %select.i = select i1 %exitcond, i32 0, i32 0
+ %select.next = or i32 %select.i, %select
+ %iv.next = add i32 %iv, 1
+ br i1 %exitcond, label %exit, label %loop
+
+exit:
+ %spec.select.i.lcssa = phi i32 [ %select.next, %loop ]
+ ret void
+}
|
llvm/test/Transforms/LoopVectorize/X86/vplan-based-cost-assertion.ll
Outdated
Show resolved
Hide resolved
@@ -7291,6 +7291,7 @@ LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF, | |||
for (unsigned I = 0; I != ExitInstrs.size(); ++I) { | |||
Instruction *CondI = ExitInstrs[I]; | |||
if (!OrigLoop->contains(CondI) || | |||
CostCtx.skipCostComputation(CondI, VF.isVector()) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we skip adding the instructions directly in the loop above?
exit: | ||
store i32 %select.next, ptr %f, align 4 | ||
ret void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly simpler?
exit: | |
store i32 %select.next, ptr %f, align 4 | |
ret void | |
exit: | |
ret i32 %select.next, |
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py | ||
; RUN: opt < %s -passes=loop-vectorize -mtriple=x86_64-unknown-linux-gnu -S | FileCheck %s | ||
|
||
; Check if the vplan-based cost model select same VF to the legacy cost model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to group with other tests for various crashes, llvm/test/Transforms/LoopVectorize/X86/cost-model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
I think it would be good to pick this on the 20.x release branch. I requested a back port: #128694 |
…uteCosts(). (llvm#127966) Skip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped. Reported from: llvm#115744 (comment) Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv (cherry picked from commit 8009c1f)
…uteCosts(). (llvm#127966) Skip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped. Reported from: llvm#115744 (comment) Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv (cherry picked from commit 8009c1f)
Skip calculating instruction costs for exit conditions in precomputeCosts() when it should be skipped.
Reported from: #115744 (comment)
Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv