-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[SCEV][LV] Invalidate LCSSA exit phis more thoroughly #69909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Nikita Popov (nikic) ChangesThis an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns used by the LCSSA phi node. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Full diff: https://github.com/llvm/llvm-project/pull/69909.diff 4 Files Affected:
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 2765f1286d8bce5..ab2007abe6b3d5d 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -943,6 +943,10 @@ class ScalarEvolution {
/// def-use chain linking it to a loop.
void forgetValue(Value *V);
+ /// Forget LCSSA Phi node to which a new predecessor was added, such that
+ /// it may no longer be trivial.
+ void forgetLcssaPhiWithNewPredecessor(PHINode *V);
+
/// Called when the client has changed the disposition of values in
/// this loop.
///
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 4850a6aa5625d42..17edce0aff61758 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -8408,6 +8408,35 @@ void ScalarEvolution::forgetValue(Value *V) {
forgetMemoizedResults(ToForget);
}
+void ScalarEvolution::forgetLcssaPhiWithNewPredecessor(PHINode *V) {
+ // If SCEV looked through a trivial LCSSA phi node, we might have SCEV's
+ // directly using a SCEVUnknown defined in the loop. After an extra
+ // predecessor is added, this is no longer valid. Find all SCEVUnknown's
+ // defined in the loop and invalidate any SCEV's making use of them.
+ if (isSCEVable(V->getType())) {
+ if (const SCEV *S = getExistingSCEV(V)) {
+ struct SCEVUnknownCollector {
+ SmallVector<const SCEV *, 8> Unknowns;
+
+ bool follow(const SCEV *S) {
+ // This could be restricted to unknowns defined in the loop.
+ if (isa<SCEVUnknown>(S))
+ Unknowns.push_back(S);
+ return true;
+ }
+ bool isDone() const { return false; }
+ };
+
+ SCEVUnknownCollector C;
+ visitAll(S, C);
+ forgetMemoizedResults(C.Unknowns);
+ }
+ }
+
+ // Also perform the normal invalidation.
+ forgetValue(V);
+}
+
void ScalarEvolution::forgetLoopDispositions() { LoopDispositions.clear(); }
void ScalarEvolution::forgetBlockAndLoopDispositions(Value *V) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0807d2a7e5a2671..d17412f52981e0d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3552,7 +3552,7 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
OrigLoop->getExitBlocks(ExitBlocks);
for (BasicBlock *Exit : ExitBlocks)
for (PHINode &PN : Exit->phis())
- PSE.getSE()->forgetValue(&PN);
+ PSE.getSE()->forgetLcssaPhiWithNewPredecessor(&PN);
VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
Loop *VectorLoop = LI->getLoopFor(State.CFG.VPBB2IRBB[LatchVPBB]);
diff --git a/llvm/test/Transforms/LoopVectorize/pr66616.ll b/llvm/test/Transforms/LoopVectorize/pr66616.ll
new file mode 100644
index 000000000000000..2fb7f88e5341e5f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/pr66616.ll
@@ -0,0 +1,95 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -passes="print<scalar-evolution>,loop-vectorize" --verify-scev -S < %s -force-vector-width=4 2>/dev/null | FileCheck %s
+
+; Make sure users of SCEVUnknowns from the scalar loop are invalidated.
+
+define void @pr66616(ptr %ptr) {
+; CHECK-LABEL: define void @pr66616(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i64 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256
+; CHECK-NEXT: br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: middle.block:
+; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
+; CHECK-NEXT: br i1 true, label [[PREHEADER:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT: br label [[LOOP_1:%.*]]
+; CHECK: loop.1:
+; CHECK-NEXT: [[IV_1:%.*]] = phi i8 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[LOOP_1]] ]
+; CHECK-NEXT: [[LOAD:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT: [[ADD3:%.*]] = add i32 [[LOAD]], 1
+; CHECK-NEXT: [[INC]] = add i8 [[IV_1]], 1
+; CHECK-NEXT: [[COND1:%.*]] = icmp eq i8 [[INC]], 0
+; CHECK-NEXT: br i1 [[COND1]], label [[PREHEADER]], label [[LOOP_1]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: preheader:
+; CHECK-NEXT: [[ADD3_LCSSA:%.*]] = phi i32 [ [[ADD3]], [[LOOP_1]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT: [[TMP4:%.*]] = sub i32 0, [[ADD3_LCSSA]]
+; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
+; CHECK-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 1
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP6]], 4
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH2:%.*]], label [[VECTOR_PH3:%.*]]
+; CHECK: vector.ph3:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP6]], 4
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP6]], [[N_MOD_VF]]
+; CHECK-NEXT: [[DOTCAST:%.*]] = trunc i64 [[N_VEC]] to i32
+; CHECK-NEXT: [[IND_END:%.*]] = add i32 [[ADD3_LCSSA]], [[DOTCAST]]
+; CHECK-NEXT: [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR]], i64 [[N_VEC]]
+; CHECK-NEXT: br label [[VECTOR_BODY7:%.*]]
+; CHECK: vector.body7:
+; CHECK-NEXT: [[INDEX8:%.*]] = phi i64 [ 0, [[VECTOR_PH3]] ], [ [[INDEX_NEXT9:%.*]], [[VECTOR_BODY7]] ]
+; CHECK-NEXT: [[INDEX_NEXT9]] = add nuw i64 [[INDEX8]], 4
+; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT9]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK1:%.*]], label [[VECTOR_BODY7]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK: middle.block1:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP6]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH2]]
+; CHECK: scalar.ph2:
+; CHECK-NEXT: [[BC_RESUME_VAL4:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK1]] ], [ [[ADD3_LCSSA]], [[PREHEADER]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL6:%.*]] = phi ptr [ [[IND_END5]], [[MIDDLE_BLOCK1]] ], [ [[PTR]], [[PREHEADER]] ]
+; CHECK-NEXT: br label [[LOOP_2:%.*]]
+; CHECK: loop.2:
+; CHECK-NEXT: [[IV_2:%.*]] = phi i32 [ [[IV_2_I:%.*]], [[LOOP_2]] ], [ [[BC_RESUME_VAL4]], [[SCALAR_PH2]] ]
+; CHECK-NEXT: [[IV_3:%.*]] = phi ptr [ [[IV_3_I:%.*]], [[LOOP_2]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH2]] ]
+; CHECK-NEXT: [[IV_2_I]] = add i32 [[IV_2]], 1
+; CHECK-NEXT: [[IV_3_I]] = getelementptr i8, ptr [[IV_3]], i64 1
+; CHECK-NEXT: [[COND2:%.*]] = icmp eq i32 [[IV_2]], 0
+; CHECK-NEXT: br i1 [[COND2]], label [[EXIT]], label [[LOOP_2]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %loop.1
+
+loop.1:
+ %iv.1 = phi i8 [ 0, %entry ], [ %inc, %loop.1 ]
+ %load = load i32, ptr %ptr, align 4
+ %add3 = add i32 %load, 1
+ %inc = add i8 %iv.1, 1
+ %cond1 = icmp eq i8 %inc, 0
+ br i1 %cond1, label %preheader, label %loop.1
+
+preheader:
+ br label %loop.2
+
+loop.2:
+ %iv.2 = phi i32 [ %iv.2.i, %loop.2 ], [ %add3, %preheader ]
+ %iv.3 = phi ptr [ %iv.3.i, %loop.2 ], [ %ptr, %preheader ]
+ %iv.2.i = add i32 %iv.2, 1
+ %iv.3.i = getelementptr i8, ptr %iv.3, i64 1
+ %cond2 = icmp eq i32 %iv.2, 0
+ br i1 %cond2, label %exit, label %loop.2
+
+exit:
+ ret void
+}
|
Oh, this solution is better than #69886. I forgot the IR is in LCSSA form. |
ping |
Could you please check whether this patch fixed b22917e#commitcomment-131198746 ? I assumed it's because loop optimzations share some invalid analysis cache. |
@fhahn Ping. |
ping :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for extending the invalidation.
I'm wondering if expressions other than SCEVUnknown could also cause issues? IIUC any operand from the SCEV expression for the phi could be used outside and wouldn't those uses need invalidating as well? They might not cause crashes via isAvailableAtLoopEntry
, could they cause slightly incorrect BECounts?
@@ -8408,6 +8408,35 @@ void ScalarEvolution::forgetValue(Value *V) { | |||
forgetMemoizedResults(ToForget); | |||
} | |||
|
|||
void ScalarEvolution::forgetLcssaPhiWithNewPredecessor(PHINode *V) { | |||
// If SCEV looked through a trivial LCSSA phi node, we might have SCEV's | |||
// directly using a SCEVUnknown defined in the loop. After an extra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this accurate? follow
has a comment that it could be restricted, but isn't at the moment AFAICT?
SmallVector<const SCEV *, 8> Unknowns; | ||
|
||
bool follow(const SCEV *S) { | ||
// This could be restricted to unknowns defined in the loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and implemented the extra check.
} | ||
|
||
// Also perform the normal invalidation. | ||
forgetValue(V); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sufficient to do this if V
is SCEVable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. forgetValue() will skip non-SCEVable itself. I've converted this into an early exit for the whole function.
We are invalidating everything using SCEVUnknowns defined in the loop. Keep in mind that forgetMemoizedResults() does a recursive walk over the users. So this is not just the SCEVUnknowns themselves. I think the only other thing that we might need to consider as invalidation roots in addition to SCEVUnknown are SCEVAddRecs, as these are loop-defined values that are not based on SCEVUnknown. Though those would typically get invalided by forgetLoop() anyway. Do you think I should add addrec handling to be on the safe side? |
Yep, addrecs where the main thing I was thinking about. In theory, I think there may also be issues with constants (e.g. a lcssa phi with a constant value) where we would need to invalidate those uses once we add an incoming value. This may be mostly theoretical as those would be simplified elsewhere. Mostly curious how expensive invalidating addrecs and/or all roots would be |
I don't think constants or similar are relevant here, because the change of the phi node still has to preserve its value. If it previously evaluated to a constant, it has to evaluate to the same constant afterwards as well. The value doesn't change, but the old representation of the value may no longer be valid. |
Yeah probably not necessary. Would still be good to handle AddRecs as well for completeness, even if they may already be invalidated. |
Fix this an alternative to llvm#69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as thoe in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns used by the LCSSA phi node. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unroling and loop peeling.
fe57709
to
7711c01
Compare
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This an alternative to llvm#69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Fixes llvm#69097. Fixes llvm#66616. Fixes llvm#63970.
This an alternative to llvm#69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Fixes llvm#69097. Fixes llvm#66616. Fixes llvm#63970.
This an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts.
Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop.
We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling.
Fixes #69097.
Fixes #66616.
Fixes #63970.