[SCEV][LV] Invalidate LCSSA exit phis more thoroughly #69909

nikic · 2023-10-23T10:00:06Z

This an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling.

Fixes #69097.
Fixes #66616.
Fixes #63970.

llvmbot · 2023-10-23T10:01:48Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Nikita Popov (nikic)

Changes

This an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns used by the LCSSA phi node.

We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling.

Full diff: https://github.com/llvm/llvm-project/pull/69909.diff

4 Files Affected:

(modified) llvm/include/llvm/Analysis/ScalarEvolution.h (+4)
(modified) llvm/lib/Analysis/ScalarEvolution.cpp (+29)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
(added) llvm/test/Transforms/LoopVectorize/pr66616.ll (+95)

diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 2765f1286d8bce5..ab2007abe6b3d5d 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -943,6 +943,10 @@ class ScalarEvolution {
   /// def-use chain linking it to a loop.
   void forgetValue(Value *V);
 
+  /// Forget LCSSA Phi node to which a new predecessor was added, such that
+  /// it may no longer be trivial.
+  void forgetLcssaPhiWithNewPredecessor(PHINode *V);
+
   /// Called when the client has changed the disposition of values in
   /// this loop.
   ///
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 4850a6aa5625d42..17edce0aff61758 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -8408,6 +8408,35 @@ void ScalarEvolution::forgetValue(Value *V) {
   forgetMemoizedResults(ToForget);
 }
 
+void ScalarEvolution::forgetLcssaPhiWithNewPredecessor(PHINode *V) {
+  // If SCEV looked through a trivial LCSSA phi node, we might have SCEV's
+  // directly using a SCEVUnknown defined in the loop. After an extra
+  // predecessor is added, this is no longer valid. Find all SCEVUnknown's
+  // defined in the loop and invalidate any SCEV's making use of them.
+  if (isSCEVable(V->getType())) {
+    if (const SCEV *S = getExistingSCEV(V)) {
+      struct SCEVUnknownCollector {
+        SmallVector<const SCEV *, 8> Unknowns;
+
+        bool follow(const SCEV *S) {
+          // This could be restricted to unknowns defined in the loop.
+          if (isa<SCEVUnknown>(S))
+            Unknowns.push_back(S);
+          return true;
+        }
+        bool isDone() const { return false; }
+      };
+
+      SCEVUnknownCollector C;
+      visitAll(S, C);
+      forgetMemoizedResults(C.Unknowns);
+    }
+  }
+
+  // Also perform the normal invalidation.
+  forgetValue(V);
+}
+
 void ScalarEvolution::forgetLoopDispositions() { LoopDispositions.clear(); }
 
 void ScalarEvolution::forgetBlockAndLoopDispositions(Value *V) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0807d2a7e5a2671..d17412f52981e0d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3552,7 +3552,7 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
   OrigLoop->getExitBlocks(ExitBlocks);
   for (BasicBlock *Exit : ExitBlocks)
     for (PHINode &PN : Exit->phis())
-      PSE.getSE()->forgetValue(&PN);
+      PSE.getSE()->forgetLcssaPhiWithNewPredecessor(&PN);
 
   VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
   Loop *VectorLoop = LI->getLoopFor(State.CFG.VPBB2IRBB[LatchVPBB]);
diff --git a/llvm/test/Transforms/LoopVectorize/pr66616.ll b/llvm/test/Transforms/LoopVectorize/pr66616.ll
new file mode 100644
index 000000000000000..2fb7f88e5341e5f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/pr66616.ll
@@ -0,0 +1,95 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -passes="print<scalar-evolution>,loop-vectorize" --verify-scev -S < %s -force-vector-width=4 2>/dev/null | FileCheck %s
+
+; Make sure users of SCEVUnknowns from the scalar loop are invalidated.
+
+define void @pr66616(ptr %ptr) {
+; CHECK-LABEL: define void @pr66616(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP1:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256
+; CHECK-NEXT:    br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       middle.block:
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
+; CHECK-NEXT:    br i1 true, label [[PREHEADER:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i8 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    br label [[LOOP_1:%.*]]
+; CHECK:       loop.1:
+; CHECK-NEXT:    [[IV_1:%.*]] = phi i8 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[LOOP_1]] ]
+; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    [[ADD3:%.*]] = add i32 [[LOAD]], 1
+; CHECK-NEXT:    [[INC]] = add i8 [[IV_1]], 1
+; CHECK-NEXT:    [[COND1:%.*]] = icmp eq i8 [[INC]], 0
+; CHECK-NEXT:    br i1 [[COND1]], label [[PREHEADER]], label [[LOOP_1]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       preheader:
+; CHECK-NEXT:    [[ADD3_LCSSA:%.*]] = phi i32 [ [[ADD3]], [[LOOP_1]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = sub i32 0, [[ADD3_LCSSA]]
+; CHECK-NEXT:    [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
+; CHECK-NEXT:    [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 1
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP6]], 4
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH2:%.*]], label [[VECTOR_PH3:%.*]]
+; CHECK:       vector.ph3:
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP6]], 4
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP6]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[DOTCAST:%.*]] = trunc i64 [[N_VEC]] to i32
+; CHECK-NEXT:    [[IND_END:%.*]] = add i32 [[ADD3_LCSSA]], [[DOTCAST]]
+; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR]], i64 [[N_VEC]]
+; CHECK-NEXT:    br label [[VECTOR_BODY7:%.*]]
+; CHECK:       vector.body7:
+; CHECK-NEXT:    [[INDEX8:%.*]] = phi i64 [ 0, [[VECTOR_PH3]] ], [ [[INDEX_NEXT9:%.*]], [[VECTOR_BODY7]] ]
+; CHECK-NEXT:    [[INDEX_NEXT9]] = add nuw i64 [[INDEX8]], 4
+; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT9]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP7]], label [[MIDDLE_BLOCK1:%.*]], label [[VECTOR_BODY7]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK:       middle.block1:
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP6]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH2]]
+; CHECK:       scalar.ph2:
+; CHECK-NEXT:    [[BC_RESUME_VAL4:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK1]] ], [ [[ADD3_LCSSA]], [[PREHEADER]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL6:%.*]] = phi ptr [ [[IND_END5]], [[MIDDLE_BLOCK1]] ], [ [[PTR]], [[PREHEADER]] ]
+; CHECK-NEXT:    br label [[LOOP_2:%.*]]
+; CHECK:       loop.2:
+; CHECK-NEXT:    [[IV_2:%.*]] = phi i32 [ [[IV_2_I:%.*]], [[LOOP_2]] ], [ [[BC_RESUME_VAL4]], [[SCALAR_PH2]] ]
+; CHECK-NEXT:    [[IV_3:%.*]] = phi ptr [ [[IV_3_I:%.*]], [[LOOP_2]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH2]] ]
+; CHECK-NEXT:    [[IV_2_I]] = add i32 [[IV_2]], 1
+; CHECK-NEXT:    [[IV_3_I]] = getelementptr i8, ptr [[IV_3]], i64 1
+; CHECK-NEXT:    [[COND2:%.*]] = icmp eq i32 [[IV_2]], 0
+; CHECK-NEXT:    br i1 [[COND2]], label [[EXIT]], label [[LOOP_2]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %loop.1
+
+loop.1:
+  %iv.1 = phi i8 [ 0, %entry ], [ %inc, %loop.1 ]
+  %load = load i32, ptr %ptr, align 4
+  %add3 = add i32 %load, 1
+  %inc = add i8 %iv.1, 1
+  %cond1 = icmp eq i8 %inc, 0
+  br i1 %cond1, label %preheader, label %loop.1
+
+preheader:
+  br label %loop.2
+
+loop.2:
+  %iv.2 = phi i32 [ %iv.2.i, %loop.2 ], [ %add3, %preheader ]
+  %iv.3 = phi ptr [ %iv.3.i, %loop.2 ], [ %ptr, %preheader ]
+  %iv.2.i = add i32 %iv.2, 1
+  %iv.3.i = getelementptr i8, ptr %iv.3, i64 1
+  %cond2 = icmp eq i32 %iv.2, 0
+  br i1 %cond2, label %exit, label %loop.2
+
+exit:
+  ret void
+}

dtcxzyw · 2023-10-23T10:55:17Z

Oh, this solution is better than #69886. I forgot the IR is in LCSSA form.

nikic · 2023-10-30T08:08:32Z

ping

XChy · 2023-10-30T10:16:33Z

Could you please check whether this patch fixed b22917e#commitcomment-131198746 ? I assumed it's because loop optimzations share some invalid analysis cache.

dtcxzyw · 2023-11-03T06:25:17Z

@fhahn Ping.

yuxuanchen1997 · 2023-11-03T20:27:02Z

Hey @nikic, is this patch in scope to fix #69744 as well? I think this still repros on your branch.

fhahn · 2023-11-04T17:43:30Z

Hey @nikic, is this patch in scope to fix #69744 as well? I think this still repros on your branch.

That’s something I’d like to check in the next few days

fhahn · 2023-11-04T22:19:17Z

Hey @nikic, is this patch in scope to fix #69744 as well? I think this still repros on your branch.

That’s something I’d like to check in the next few days

Ok looks unrelated to this PR. (Planning to take a look at this one tomorrow or on Monday)

nikic · 2023-11-13T08:50:59Z

ping :)

fhahn

Thanks for extending the invalidation.

I'm wondering if expressions other than SCEVUnknown could also cause issues? IIUC any operand from the SCEV expression for the phi could be used outside and wouldn't those uses need invalidating as well? They might not cause crashes via isAvailableAtLoopEntry, could they cause slightly incorrect BECounts?

fhahn · 2023-11-15T14:51:54Z

llvm/lib/Analysis/ScalarEvolution.cpp

@@ -8408,6 +8408,35 @@ void ScalarEvolution::forgetValue(Value *V) {
  forgetMemoizedResults(ToForget);
 }

+void ScalarEvolution::forgetLcssaPhiWithNewPredecessor(PHINode *V) {
+  // If SCEV looked through a trivial LCSSA phi node, we might have SCEV's
+  // directly using a SCEVUnknown defined in the loop. After an extra


Is this accurate? follow has a comment that it could be restricted, but isn't at the moment AFAICT?

fhahn · 2023-11-15T14:52:12Z

llvm/lib/Analysis/ScalarEvolution.cpp

+        SmallVector<const SCEV *, 8> Unknowns;
+
+        bool follow(const SCEV *S) {
+          // This could be restricted to unknowns defined in the loop.


Should this be a TODO?

I went ahead and implemented the extra check.

fhahn · 2023-11-15T15:16:09Z

llvm/lib/Analysis/ScalarEvolution.cpp

+  }
+
+  // Also perform the normal invalidation.
+  forgetValue(V);


Sufficient to do this if V is SCEVable?

Yes. forgetValue() will skip non-SCEVable itself. I've converted this into an early exit for the whole function.

nikic · 2023-11-15T16:01:17Z

I'm wondering if expressions other than SCEVUnknown could also cause issues? IIUC any operand from the SCEV expression for the phi could be used outside and wouldn't those uses need invalidating as well? They might not cause crashes via isAvailableAtLoopEntry, could they cause slightly incorrect BECounts?

We are invalidating everything using SCEVUnknowns defined in the loop. Keep in mind that forgetMemoizedResults() does a recursive walk over the users. So this is not just the SCEVUnknowns themselves.

I think the only other thing that we might need to consider as invalidation roots in addition to SCEVUnknown are SCEVAddRecs, as these are loop-defined values that are not based on SCEVUnknown. Though those would typically get invalided by forgetLoop() anyway. Do you think I should add addrec handling to be on the safe side?

fhahn · 2023-11-15T19:50:42Z

Yep, addrecs where the main thing I was thinking about.

In theory, I think there may also be issues with constants (e.g. a lcssa phi with a constant value) where we would need to invalidate those uses once we add an incoming value. This may be mostly theoretical as those would be simplified elsewhere. Mostly curious how expensive invalidating addrecs and/or all roots would be

nikic · 2023-11-15T20:32:09Z

Yep, addrecs where the main thing I was thinking about.

In theory, I think there may also be issues with constants (e.g. a lcssa phi with a constant value) where we would need to invalidate those uses once we add an incoming value. This may be mostly theoretical as those would be simplified elsewhere. Mostly curious how expensive invalidating addrecs and/or all roots would be

I don't think constants or similar are relevant here, because the change of the phi node still has to preserve its value. If it previously evaluated to a constant, it has to evaluate to the same constant afterwards as well. The value doesn't change, but the old representation of the value may no longer be valid.

fhahn · 2023-11-15T22:04:07Z

I don't think constants or similar are relevant here, because the change of the phi node still has to preserve its value. If it previously evaluated to a constant, it has to evaluate to the same constant afterwards as well. The value doesn't change, but the old representation of the value may no longer be valid.

Yeah probably not necessary. Would still be good to handle AddRecs as well for completeness, even if they may already be invalidated.

Fix this an alternative to llvm#69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as thoe in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns used by the LCSSA phi node. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unroling and loop peeling.

nikic · 2023-11-16T09:16:59Z

Yeah probably not necessary. Would still be good to handle AddRecs as well for completeness, even if they may already be invalidated.

Done!

fhahn

LGTM, thanks!

This an alternative to llvm#69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts. Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop. We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling. Fixes llvm#69097. Fixes llvm#66616. Fixes llvm#63970.

nikic requested review from fhahn and dtcxzyw October 23, 2023 10:00

llvmbot added vectorizers llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Oct 23, 2023

nikic mentioned this pull request Oct 23, 2023

[LV] Invalidate SCEV values in the scalar loop after loop vectorization #69886

Closed

fhahn mentioned this pull request Nov 4, 2023

LoopVectorize Miscompilation with Aliases in clang 15+ #69744

Closed

fhahn reviewed Nov 15, 2023

View reviewed changes

nikic force-pushed the perf/lv-scev-fix branch from fe57709 to 7711c01 Compare November 16, 2023 09:15

fhahn approved these changes Nov 16, 2023

View reviewed changes

nikic merged commit de176d8 into llvm:main Nov 17, 2023

[SCEV][LV] Invalidate LCSSA exit phis more thoroughly #69909

[SCEV][LV] Invalidate LCSSA exit phis more thoroughly #69909

Uh oh!

Conversation

nikic commented Oct 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dtcxzyw commented Oct 23, 2023

Uh oh!

nikic commented Oct 30, 2023

Uh oh!

XChy commented Oct 30, 2023

Uh oh!

dtcxzyw commented Nov 3, 2023

Uh oh!

yuxuanchen1997 commented Nov 3, 2023

Uh oh!

fhahn commented Nov 4, 2023

Uh oh!

fhahn commented Nov 4, 2023

Uh oh!

nikic commented Nov 13, 2023

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

fhahn Nov 15, 2023

Choose a reason for hiding this comment

Uh oh!

fhahn Nov 15, 2023

Choose a reason for hiding this comment

Uh oh!

nikic Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

fhahn Nov 15, 2023

Choose a reason for hiding this comment

Uh oh!

nikic Nov 16, 2023

Choose a reason for hiding this comment

Uh oh!

nikic commented Nov 15, 2023

Uh oh!

fhahn commented Nov 15, 2023

Uh oh!

nikic commented Nov 15, 2023

Uh oh!

fhahn commented Nov 15, 2023

Uh oh!

nikic commented Nov 16, 2023

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nikic commented Oct 23, 2023 •

edited

Loading

llvmbot commented Oct 23, 2023 •

edited

Loading