Skip to content

[SCEV][LV] Invalidate LCSSA exit phis more thoroughly #69909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 17, 2023

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Oct 23, 2023

This an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling.

Fixes #69097.
Fixes #66616.
Fixes #63970.

@nikic nikic requested review from fhahn and dtcxzyw October 23, 2023 10:00
@llvmbot llvmbot added vectorizers llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Oct 23, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 23, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Nikita Popov (nikic)

Changes

This an alternative to #69886. The basic problem is that SCEV can look through trivial LCSSA phis. When the phi node later becomes non-trivial, we do invalidate it, but this doesn't catch uses that are not covered by the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which will also invalidate all the SCEVUnknowns used by the LCSSA phi node.

We should probably also use this invalidation method in other places that add predecessors to exit blocks, such as loop unrolling and loop peeling.


Full diff: https://github.com/llvm/llvm-project/pull/69909.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ScalarEvolution.h (+4)
  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+29)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
  • (added) llvm/test/Transforms/LoopVectorize/pr66616.ll (+95)
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 2765f1286d8bce5..ab2007abe6b3d5d 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -943,6 +943,10 @@ class ScalarEvolution {
   /// def-use chain linking it to a loop.
   void forgetValue(Value *V);
 
+  /// Forget LCSSA Phi node to which a new predecessor was added, such that
+  /// it may no longer be trivial.
+  void forgetLcssaPhiWithNewPredecessor(PHINode *V);
+
   /// Called when the client has changed the disposition of values in
   /// this loop.
   ///
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 4850a6aa5625d42..17edce0aff61758 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -8408,6 +8408,35 @@ void ScalarEvolution::forgetValue(Value *V) {
   forgetMemoizedResults(ToForget);
 }
 
+void ScalarEvolution::forgetLcssaPhiWithNewPredecessor(PHINode *V) {
+  // If SCEV looked through a trivial LCSSA phi node, we might have SCEV's
+  // directly using a SCEVUnknown defined in the loop. After an extra
+  // predecessor is added, this is no longer valid. Find all SCEVUnknown's
+  // defined in the loop and invalidate any SCEV's making use of them.
+  if (isSCEVable(V->getType())) {
+    if (const SCEV *S = getExistingSCEV(V)) {
+      struct SCEVUnknownCollector {
+        SmallVector<const SCEV *, 8> Unknowns;
+
+        bool follow(const SCEV *S) {
+          // This could be restricted to unknowns defined in the loop.
+          if (isa<SCEVUnknown>(S))
+            Unknowns.push_back(S);
+          return true;
+        }
+        bool isDone() const { return false; }
+      };
+
+      SCEVUnknownCollector C;
+      visitAll(S, C);
+      forgetMemoizedResults(C.Unknowns);
+    }
+  }
+
+  // Also perform the normal invalidation.
+  forgetValue(V);
+}
+
 void ScalarEvolution::forgetLoopDispositions() { LoopDispositions.clear(); }
 
 void ScalarEvolution::forgetBlockAndLoopDispositions(Value *V) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0807d2a7e5a2671..d17412f52981e0d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3552,7 +3552,7 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
   OrigLoop->getExitBlocks(ExitBlocks);
   for (BasicBlock *Exit : ExitBlocks)
     for (PHINode &PN : Exit->phis())
-      PSE.getSE()->forgetValue(&PN);
+      PSE.getSE()->forgetLcssaPhiWithNewPredecessor(&PN);
 
   VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
   Loop *VectorLoop = LI->getLoopFor(State.CFG.VPBB2IRBB[LatchVPBB]);
diff --git a/llvm/test/Transforms/LoopVectorize/pr66616.ll b/llvm/test/Transforms/LoopVectorize/pr66616.ll
new file mode 100644
index 000000000000000..2fb7f88e5341e5f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/pr66616.ll
@@ -0,0 +1,95 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -passes="print<scalar-evolution>,loop-vectorize" --verify-scev -S < %s -force-vector-width=4 2>/dev/null | FileCheck %s
+
+; Make sure users of SCEVUnknowns from the scalar loop are invalidated.
+
+define void @pr66616(ptr %ptr) {
+; CHECK-LABEL: define void @pr66616(
+; CHECK-SAME: ptr [[PTR:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP1:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256
+; CHECK-NEXT:    br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       middle.block:
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 3
+; CHECK-NEXT:    br i1 true, label [[PREHEADER:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i8 [ 0, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    br label [[LOOP_1:%.*]]
+; CHECK:       loop.1:
+; CHECK-NEXT:    [[IV_1:%.*]] = phi i8 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[LOOP_1]] ]
+; CHECK-NEXT:    [[LOAD:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT:    [[ADD3:%.*]] = add i32 [[LOAD]], 1
+; CHECK-NEXT:    [[INC]] = add i8 [[IV_1]], 1
+; CHECK-NEXT:    [[COND1:%.*]] = icmp eq i8 [[INC]], 0
+; CHECK-NEXT:    br i1 [[COND1]], label [[PREHEADER]], label [[LOOP_1]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       preheader:
+; CHECK-NEXT:    [[ADD3_LCSSA:%.*]] = phi i32 [ [[ADD3]], [[LOOP_1]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = sub i32 0, [[ADD3_LCSSA]]
+; CHECK-NEXT:    [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
+; CHECK-NEXT:    [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 1
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP6]], 4
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH2:%.*]], label [[VECTOR_PH3:%.*]]
+; CHECK:       vector.ph3:
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP6]], 4
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP6]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[DOTCAST:%.*]] = trunc i64 [[N_VEC]] to i32
+; CHECK-NEXT:    [[IND_END:%.*]] = add i32 [[ADD3_LCSSA]], [[DOTCAST]]
+; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR]], i64 [[N_VEC]]
+; CHECK-NEXT:    br label [[VECTOR_BODY7:%.*]]
+; CHECK:       vector.body7:
+; CHECK-NEXT:    [[INDEX8:%.*]] = phi i64 [ 0, [[VECTOR_PH3]] ], [ [[INDEX_NEXT9:%.*]], [[VECTOR_BODY7]] ]
+; CHECK-NEXT:    [[INDEX_NEXT9]] = add nuw i64 [[INDEX8]], 4
+; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT9]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP7]], label [[MIDDLE_BLOCK1:%.*]], label [[VECTOR_BODY7]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK:       middle.block1:
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP6]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH2]]
+; CHECK:       scalar.ph2:
+; CHECK-NEXT:    [[BC_RESUME_VAL4:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK1]] ], [ [[ADD3_LCSSA]], [[PREHEADER]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL6:%.*]] = phi ptr [ [[IND_END5]], [[MIDDLE_BLOCK1]] ], [ [[PTR]], [[PREHEADER]] ]
+; CHECK-NEXT:    br label [[LOOP_2:%.*]]
+; CHECK:       loop.2:
+; CHECK-NEXT:    [[IV_2:%.*]] = phi i32 [ [[IV_2_I:%.*]], [[LOOP_2]] ], [ [[BC_RESUME_VAL4]], [[SCALAR_PH2]] ]
+; CHECK-NEXT:    [[IV_3:%.*]] = phi ptr [ [[IV_3_I:%.*]], [[LOOP_2]] ], [ [[BC_RESUME_VAL6]], [[SCALAR_PH2]] ]
+; CHECK-NEXT:    [[IV_2_I]] = add i32 [[IV_2]], 1
+; CHECK-NEXT:    [[IV_3_I]] = getelementptr i8, ptr [[IV_3]], i64 1
+; CHECK-NEXT:    [[COND2:%.*]] = icmp eq i32 [[IV_2]], 0
+; CHECK-NEXT:    br i1 [[COND2]], label [[EXIT]], label [[LOOP_2]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %loop.1
+
+loop.1:
+  %iv.1 = phi i8 [ 0, %entry ], [ %inc, %loop.1 ]
+  %load = load i32, ptr %ptr, align 4
+  %add3 = add i32 %load, 1
+  %inc = add i8 %iv.1, 1
+  %cond1 = icmp eq i8 %inc, 0
+  br i1 %cond1, label %preheader, label %loop.1
+
+preheader:
+  br label %loop.2
+
+loop.2:
+  %iv.2 = phi i32 [ %iv.2.i, %loop.2 ], [ %add3, %preheader ]
+  %iv.3 = phi ptr [ %iv.3.i, %loop.2 ], [ %ptr, %preheader ]
+  %iv.2.i = add i32 %iv.2, 1
+  %iv.3.i = getelementptr i8, ptr %iv.3, i64 1
+  %cond2 = icmp eq i32 %iv.2, 0
+  br i1 %cond2, label %exit, label %loop.2
+
+exit:
+  ret void
+}

@dtcxzyw
Copy link
Member

dtcxzyw commented Oct 23, 2023

Oh, this solution is better than #69886. I forgot the IR is in LCSSA form.

@nikic
Copy link
Contributor Author

nikic commented Oct 30, 2023

ping

@XChy
Copy link
Member

XChy commented Oct 30, 2023

Could you please check whether this patch fixed b22917e#commitcomment-131198746 ? I assumed it's because loop optimzations share some invalid analysis cache.

@dtcxzyw
Copy link
Member

dtcxzyw commented Nov 3, 2023

@fhahn Ping.

@yuxuanchen1997
Copy link
Member

Hey @nikic, is this patch in scope to fix #69744 as well? I think this still repros on your branch.

@fhahn
Copy link
Contributor

fhahn commented Nov 4, 2023

Hey @nikic, is this patch in scope to fix #69744 as well? I think this still repros on your branch.

That’s something I’d like to check in the next few days

@fhahn
Copy link
Contributor

fhahn commented Nov 4, 2023

Hey @nikic, is this patch in scope to fix #69744 as well? I think this still repros on your branch.

That’s something I’d like to check in the next few days

Ok looks unrelated to this PR. (Planning to take a look at this one tomorrow or on Monday)

@nikic
Copy link
Contributor Author

nikic commented Nov 13, 2023

ping :)

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for extending the invalidation.

I'm wondering if expressions other than SCEVUnknown could also cause issues? IIUC any operand from the SCEV expression for the phi could be used outside and wouldn't those uses need invalidating as well? They might not cause crashes via isAvailableAtLoopEntry, could they cause slightly incorrect BECounts?

@@ -8408,6 +8408,35 @@ void ScalarEvolution::forgetValue(Value *V) {
forgetMemoizedResults(ToForget);
}

void ScalarEvolution::forgetLcssaPhiWithNewPredecessor(PHINode *V) {
// If SCEV looked through a trivial LCSSA phi node, we might have SCEV's
// directly using a SCEVUnknown defined in the loop. After an extra
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accurate? follow has a comment that it could be restricted, but isn't at the moment AFAICT?

SmallVector<const SCEV *, 8> Unknowns;

bool follow(const SCEV *S) {
// This could be restricted to unknowns defined in the loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and implemented the extra check.

}

// Also perform the normal invalidation.
forgetValue(V);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sufficient to do this if V is SCEVable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. forgetValue() will skip non-SCEVable itself. I've converted this into an early exit for the whole function.

@nikic
Copy link
Contributor Author

nikic commented Nov 15, 2023

I'm wondering if expressions other than SCEVUnknown could also cause issues? IIUC any operand from the SCEV expression for the phi could be used outside and wouldn't those uses need invalidating as well? They might not cause crashes via isAvailableAtLoopEntry, could they cause slightly incorrect BECounts?

We are invalidating everything using SCEVUnknowns defined in the loop. Keep in mind that forgetMemoizedResults() does a recursive walk over the users. So this is not just the SCEVUnknowns themselves.

I think the only other thing that we might need to consider as invalidation roots in addition to SCEVUnknown are SCEVAddRecs, as these are loop-defined values that are not based on SCEVUnknown. Though those would typically get invalided by forgetLoop() anyway. Do you think I should add addrec handling to be on the safe side?

@fhahn
Copy link
Contributor

fhahn commented Nov 15, 2023

Yep, addrecs where the main thing I was thinking about.

In theory, I think there may also be issues with constants (e.g. a lcssa phi with a constant value) where we would need to invalidate those uses once we add an incoming value. This may be mostly theoretical as those would be simplified elsewhere. Mostly curious how expensive invalidating addrecs and/or all roots would be

@nikic
Copy link
Contributor Author

nikic commented Nov 15, 2023

Yep, addrecs where the main thing I was thinking about.

In theory, I think there may also be issues with constants (e.g. a lcssa phi with a constant value) where we would need to invalidate those uses once we add an incoming value. This may be mostly theoretical as those would be simplified elsewhere. Mostly curious how expensive invalidating addrecs and/or all roots would be

I don't think constants or similar are relevant here, because the change of the phi node still has to preserve its value. If it previously evaluated to a constant, it has to evaluate to the same constant afterwards as well. The value doesn't change, but the old representation of the value may no longer be valid.

@fhahn
Copy link
Contributor

fhahn commented Nov 15, 2023

I don't think constants or similar are relevant here, because the change of the phi node still has to preserve its value. If it previously evaluated to a constant, it has to evaluate to the same constant afterwards as well. The value doesn't change, but the old representation of the value may no longer be valid.

Yeah probably not necessary. Would still be good to handle AddRecs as well for completeness, even if they may already be invalidated.

Fix this an alternative to llvm#69886. The basic problem is that
SCEV can look through trivial LCSSA phis. When the phi node
later becomes non-trivial, we do invalidate it, but this doesn't
catch uses that are not covered by the IR use-def walk, such as
thoe in BECounts.

Fix this by adding a special invalidation method for LCSSA phis,
which will also invalidate all the SCEVUnknowns used by the LCSSA
phi node.

We should probably also use this invalidation method in other
places that add predecessors to exit blocks, such as loop unroling
and loop peeling.
@nikic
Copy link
Contributor Author

nikic commented Nov 16, 2023

Yeah probably not necessary. Would still be good to handle AddRecs as well for completeness, even if they may already be invalidated.

Done!

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@nikic nikic merged commit de176d8 into llvm:main Nov 17, 2023
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
This an alternative to llvm#69886. The basic problem is that SCEV can look
through trivial LCSSA phis. When the phi node later becomes non-trivial,
we do invalidate it, but this doesn't catch uses that are not covered by
the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which
will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the
LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places
that add predecessors to exit blocks, such as loop unrolling and loop
peeling.

Fixes llvm#69097.
Fixes llvm#66616.
Fixes llvm#63970.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
This an alternative to llvm#69886. The basic problem is that SCEV can look
through trivial LCSSA phis. When the phi node later becomes non-trivial,
we do invalidate it, but this doesn't catch uses that are not covered by
the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which
will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the
LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places
that add predecessors to exit blocks, such as loop unrolling and loop
peeling.

Fixes llvm#69097.
Fixes llvm#66616.
Fixes llvm#63970.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms vectorizers
Projects
None yet
6 participants