Skip to content

[LAA] Use SCEVUse to add extra NUW flags to pointer bounds. (WIP) #91962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: users/fhahn/scevuse
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 13, 2024

Use SCEVUse to add a NUW flag to the upper bound of an accessed pointer.
We must already have proved that the pointers do not wrap, as otherwise
we could not use them for runtime check computations.

By adding the use-specific NUW flag, we can detect cases where SCEV can
prove that the compared pointers must overlap, so the runtime checks
will always be false. In that case, there is no point in vectorizing
with runtime checks.

Note that this depends c2895cd27fbf200d1da056bc66d77eeb62690bf0, which
could be submitted separately if desired; without the current change, I
don't think it triggers in practice though.

Depends on #91961

@fhahn fhahn requested review from preames and efriedma-quic May 13, 2024 13:31
@fhahn fhahn requested a review from nikic as a code owner May 13, 2024 13:31
@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label May 13, 2024
@llvmbot
Copy link
Member

llvmbot commented May 13, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Use SCEVUse to add a NUW flag to the upper bound of an accessed pointer.
We must already have proved that the pointers do not wrap, as otherwise
we could not use them for runtime check computations.

By adding the use-specific NUW flag, we can detect cases where SCEV can
prove that the compared pointers must overlap, so the runtime checks
will always be false. In that case, there is no point in vectorizing
with runtime checks.

Note that this depends c2895cd27fbf200d1da056bc66d77eeb62690bf0, which
could be submitted separately if desired; without the current change, I
don't think it triggers in practice though.

Depends on #91961


Patch is 35.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91962.diff

5 Files Affected:

  • (modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+9-5)
  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+15-5)
  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+15-5)
  • (modified) llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll (+67-67)
  • (added) llvm/test/Analysis/LoopAccessAnalysis/scoped-scevs.ll (+149)
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index 6ebd0fb8477a0..0663ef6a2865d 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -372,10 +372,10 @@ struct RuntimeCheckingPtrGroup {
 
   /// The SCEV expression which represents the upper bound of all the
   /// pointers in this group.
-  const SCEV *High;
+  SCEVUse High;
   /// The SCEV expression which represents the lower bound of all the
   /// pointers in this group.
-  const SCEV *Low;
+  SCEVUse Low;
   /// Indices of all the pointers that constitute this grouping.
   SmallVector<unsigned, 2> Members;
   /// Address space of the involved pointers.
@@ -413,10 +413,10 @@ class RuntimePointerChecking {
     TrackingVH<Value> PointerValue;
     /// Holds the smallest byte address accessed by the pointer throughout all
     /// iterations of the loop.
-    const SCEV *Start;
+    SCEVUse Start;
     /// Holds the largest byte address accessed by the pointer throughout all
     /// iterations of the loop, plus 1.
-    const SCEV *End;
+    SCEVUse End;
     /// Holds the information if this pointer is used for writing to memory.
     bool IsWritePtr;
     /// Holds the id of the set of pointers that could be dependent because of a
@@ -429,7 +429,7 @@ class RuntimePointerChecking {
     /// True if the pointer expressions needs to be frozen after expansion.
     bool NeedsFreeze;
 
-    PointerInfo(Value *PointerValue, const SCEV *Start, const SCEV *End,
+    PointerInfo(Value *PointerValue, SCEVUse Start, SCEVUse End,
                 bool IsWritePtr, unsigned DependencySetId, unsigned AliasSetId,
                 const SCEV *Expr, bool NeedsFreeze)
         : PointerValue(PointerValue), Start(Start), End(End),
@@ -443,8 +443,10 @@ class RuntimePointerChecking {
   /// Reset the state of the pointer runtime information.
   void reset() {
     Need = false;
+    AlwaysFalse = false;
     Pointers.clear();
     Checks.clear();
+    CheckingGroups.clear();
   }
 
   /// Insert a pointer and calculate the start and end SCEVs.
@@ -501,6 +503,8 @@ class RuntimePointerChecking {
   /// This flag indicates if we need to add the runtime check.
   bool Need = false;
 
+  bool AlwaysFalse = false;
+
   /// Information about the pointers that may require checking.
   SmallVector<PointerInfo, 2> Pointers;
 
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index e1d0f900c9050..208c034763f7c 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -210,8 +210,8 @@ void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, const SCEV *PtrExpr,
                                     bool NeedsFreeze) {
   ScalarEvolution *SE = PSE.getSE();
 
-  const SCEV *ScStart;
-  const SCEV *ScEnd;
+  SCEVUse ScStart;
+  SCEVUse ScEnd;
 
   if (SE->isLoopInvariant(PtrExpr, Lp)) {
     ScStart = ScEnd = PtrExpr;
@@ -223,6 +223,8 @@ void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, const SCEV *PtrExpr,
     ScStart = AR->getStart();
     ScEnd = AR->evaluateAtIteration(Ex, *SE);
     const SCEV *Step = AR->getStepRecurrence(*SE);
+    if (auto *Comm = dyn_cast<SCEVCommutativeExpr>(ScEnd))
+      ScEnd = SCEVUse(ScEnd, 2);
 
     // For expressions with negative step, the upper bound is ScStart and the
     // lower bound is ScEnd.
@@ -244,7 +246,10 @@ void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, const SCEV *PtrExpr,
   auto &DL = Lp->getHeader()->getModule()->getDataLayout();
   Type *IdxTy = DL.getIndexType(Ptr->getType());
   const SCEV *EltSizeSCEV = SE->getStoreSizeOfExpr(IdxTy, AccessTy);
-  ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);
+  // TODO: this computes one-past-the-end. ScEnd + EltSizeSCEV - 1 is the last
+  // accessed byte. Not entirely sure if one-past-the-end must also not wrap? If
+  // it does, could compute and use last accessed byte instead.
+  ScEnd = SCEVUse(SE->getAddExpr(ScEnd, EltSizeSCEV), 2);
 
   Pointers.emplace_back(Ptr, ScStart, ScEnd, WritePtr, DepSetId, ASId, PtrExpr,
                         NeedsFreeze);
@@ -379,6 +384,11 @@ SmallVector<RuntimePointerCheck, 4> RuntimePointerChecking::generateChecks() {
       if (needsChecking(CGI, CGJ)) {
         tryToCreateDiffCheck(CGI, CGJ);
         Checks.push_back(std::make_pair(&CGI, &CGJ));
+        if (SE->isKnownPredicate(CmpInst::ICMP_UGT, CGI.High, CGJ.Low) &&
+            SE->isKnownPredicate(CmpInst::ICMP_ULE, CGI.Low, CGJ.High)) {
+          AlwaysFalse = true;
+          return {};
+        }
       }
     }
   }
@@ -635,8 +645,7 @@ void RuntimePointerChecking::print(raw_ostream &OS, unsigned Depth) const {
     const auto &CG = CheckingGroups[I];
 
     OS.indent(Depth + 2) << "Group " << &CG << ":\n";
-    OS.indent(Depth + 4) << "(Low: " << *CG.Low << " High: " << *CG.High
-                         << ")\n";
+    OS.indent(Depth + 4) << "(Low: " << CG.Low << " High: " << CG.High << ")\n";
     for (unsigned J = 0; J < CG.Members.size(); ++J) {
       OS.indent(Depth + 6) << "Member: " << *Pointers[CG.Members[J]].Expr
                            << "\n";
@@ -1274,6 +1283,7 @@ bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
   // If we can do run-time checks, but there are no checks, no runtime checks
   // are needed. This can happen when all pointers point to the same underlying
   // object for example.
+  CanDoRT &= !RtCheck.AlwaysFalse;
   RtCheck.Need = CanDoRT ? RtCheck.getNumberOfChecks() != 0 : MayNeedRTCheck;
 
   bool CanDoRTIfNeeded = !RtCheck.Need || CanDoRT;
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 320be6f26fc0a..04d7450e6cc2e 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -11228,8 +11228,7 @@ bool ScalarEvolution::isKnownPredicateViaNoOverflow(ICmpInst::Predicate Pred,
       XNonConstOp = X;
       XFlagsPresent = ExpectedFlags;
     }
-    if (!isa<SCEVConstant>(XConstOp) ||
-        (XFlagsPresent & ExpectedFlags) != ExpectedFlags)
+    if (!isa<SCEVConstant>(XConstOp))
       return false;
 
     if (!splitBinaryAdd(Y, YConstOp, YNonConstOp, YFlagsPresent)) {
@@ -11238,13 +11237,22 @@ bool ScalarEvolution::isKnownPredicateViaNoOverflow(ICmpInst::Predicate Pred,
       YFlagsPresent = ExpectedFlags;
     }
 
-    if (!isa<SCEVConstant>(YConstOp) ||
-        (YFlagsPresent & ExpectedFlags) != ExpectedFlags)
+    if (YNonConstOp != XNonConstOp)
       return false;
 
-    if (YNonConstOp != XNonConstOp)
+    if (!isa<SCEVConstant>(YConstOp))
       return false;
 
+    if (YNonConstOp != Y && ExpectedFlags == SCEV::FlagNUW) {
+      if ((YFlagsPresent & ExpectedFlags) != ExpectedFlags)
+        return false;
+    } else {
+      if ((XFlagsPresent & ExpectedFlags) != ExpectedFlags)
+        return false;
+      if ((YFlagsPresent & ExpectedFlags) != ExpectedFlags)
+        return false;
+    }
+
     OutC1 = cast<SCEVConstant>(XConstOp)->getAPInt();
     OutC2 = cast<SCEVConstant>(YConstOp)->getAPInt();
 
@@ -11878,6 +11886,8 @@ bool ScalarEvolution::splitBinaryAdd(SCEVUse Expr, SCEVUse &L, SCEVUse &R,
   L = AE->getOperand(0);
   R = AE->getOperand(1);
   Flags = AE->getNoWrapFlags();
+  Flags = setFlags(AE->getNoWrapFlags(),
+                   static_cast<SCEV::NoWrapFlags>(Expr.getInt()));
   return true;
 }
 
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll b/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
index cd388b4ee87f2..2f9f6dc39b19d 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
@@ -24,14 +24,14 @@ define void @forked_ptrs_simple(ptr nocapture readonly %Base1, ptr nocapture rea
 ; CHECK-NEXT:          %select = select i1 %cmp, ptr %gep.1, ptr %gep.2
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP1]]:
-; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest)(u nuw))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%loop>
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%loop>
 ; CHECK-NEXT:        Group [[GRP2]]:
-; CHECK-NEXT:          (Low: %Base1 High: (400 + %Base1))
+; CHECK-NEXT:          (Low: %Base1 High: (400 + %Base1)(u nuw))
 ; CHECK-NEXT:            Member: {%Base1,+,4}<nw><%loop>
 ; CHECK-NEXT:        Group [[GRP3]]:
-; CHECK-NEXT:          (Low: %Base2 High: (400 + %Base2))
+; CHECK-NEXT:          (Low: %Base2 High: (400 + %Base2)(u nuw))
 ; CHECK-NEXT:            Member: {%Base2,+,4}<nw><%loop>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -58,14 +58,14 @@ define void @forked_ptrs_simple(ptr nocapture readonly %Base1, ptr nocapture rea
 ; RECURSE-NEXT:          %select = select i1 %cmp, ptr %gep.1, ptr %gep.2
 ; RECURSE-NEXT:      Grouped accesses:
 ; RECURSE-NEXT:        Group [[GRP4]]:
-; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest)(u nuw))
 ; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%loop>
 ; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%loop>
 ; RECURSE-NEXT:        Group [[GRP5]]:
-; RECURSE-NEXT:          (Low: %Base1 High: (400 + %Base1))
+; RECURSE-NEXT:          (Low: %Base1 High: (400 + %Base1)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base1,+,4}<nw><%loop>
 ; RECURSE-NEXT:        Group [[GRP6]]:
-; RECURSE-NEXT:          (Low: %Base2 High: (400 + %Base2))
+; RECURSE-NEXT:          (Low: %Base2 High: (400 + %Base2)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base2,+,4}<nw><%loop>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -132,16 +132,16 @@ define dso_local void @forked_ptrs_different_base_same_offset(ptr nocapture read
 ; CHECK-NEXT:          %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP7]]:
-; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest)(u nuw))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP8]]:
-; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP9]]:
-; CHECK-NEXT:          (Low: %Base2 High: (400 + %Base2))
+; CHECK-NEXT:          (Low: %Base2 High: (400 + %Base2)(u nuw))
 ; CHECK-NEXT:            Member: {%Base2,+,4}<nw><%for.body>
 ; CHECK-NEXT:        Group [[GRP10]]:
-; CHECK-NEXT:          (Low: %Base1 High: (400 + %Base1))
+; CHECK-NEXT:          (Low: %Base1 High: (400 + %Base1)(u nuw))
 ; CHECK-NEXT:            Member: {%Base1,+,4}<nw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -171,16 +171,16 @@ define dso_local void @forked_ptrs_different_base_same_offset(ptr nocapture read
 ; RECURSE-NEXT:          %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
 ; RECURSE-NEXT:        Group [[GRP11]]:
-; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest))
+; RECURSE-NEXT:          (Low: %Dest High: (400 + %Dest)(u nuw))
 ; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP12]]:
-; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP13]]:
-; RECURSE-NEXT:          (Low: %Base2 High: (400 + %Base2))
+; RECURSE-NEXT:          (Low: %Base2 High: (400 + %Base2)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base2,+,4}<nw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP14]]:
-; RECURSE-NEXT:          (Low: %Base1 High: (400 + %Base1))
+; RECURSE-NEXT:          (Low: %Base1 High: (400 + %Base1)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base1,+,4}<nw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -232,16 +232,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_64b(ptr nocapture
 ; CHECK-NEXT:          %.sink.in = getelementptr inbounds double, ptr %spec.select, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP15]]:
-; CHECK-NEXT:          (Low: %Dest High: (800 + %Dest))
+; CHECK-NEXT:          (Low: %Dest High: (800 + %Dest)(u nuw))
 ; CHECK-NEXT:            Member: {%Dest,+,8}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP16]]:
-; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP17]]:
-; CHECK-NEXT:          (Low: %Base2 High: (800 + %Base2))
+; CHECK-NEXT:          (Low: %Base2 High: (800 + %Base2)(u nuw))
 ; CHECK-NEXT:            Member: {%Base2,+,8}<nw><%for.body>
 ; CHECK-NEXT:        Group [[GRP18]]:
-; CHECK-NEXT:          (Low: %Base1 High: (800 + %Base1))
+; CHECK-NEXT:          (Low: %Base1 High: (800 + %Base1)(u nuw))
 ; CHECK-NEXT:            Member: {%Base1,+,8}<nw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -271,16 +271,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_64b(ptr nocapture
 ; RECURSE-NEXT:          %.sink.in = getelementptr inbounds double, ptr %spec.select, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
 ; RECURSE-NEXT:        Group [[GRP19]]:
-; RECURSE-NEXT:          (Low: %Dest High: (800 + %Dest))
+; RECURSE-NEXT:          (Low: %Dest High: (800 + %Dest)(u nuw))
 ; RECURSE-NEXT:            Member: {%Dest,+,8}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP20]]:
-; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP21]]:
-; RECURSE-NEXT:          (Low: %Base2 High: (800 + %Base2))
+; RECURSE-NEXT:          (Low: %Base2 High: (800 + %Base2)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base2,+,8}<nw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP22]]:
-; RECURSE-NEXT:          (Low: %Base1 High: (800 + %Base1))
+; RECURSE-NEXT:          (Low: %Base1 High: (800 + %Base1)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base1,+,8}<nw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -332,16 +332,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_23b(ptr nocapture
 ; CHECK-NEXT:          %.sink.in = getelementptr inbounds i23, ptr %spec.select, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP23]]:
-; CHECK-NEXT:          (Low: %Dest High: (399 + %Dest))
+; CHECK-NEXT:          (Low: %Dest High: (399 + %Dest)(u nuw))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP24]]:
-; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP25]]:
-; CHECK-NEXT:          (Low: %Base2 High: (399 + %Base2))
+; CHECK-NEXT:          (Low: %Base2 High: (399 + %Base2)(u nuw))
 ; CHECK-NEXT:            Member: {%Base2,+,4}<nw><%for.body>
 ; CHECK-NEXT:        Group [[GRP26]]:
-; CHECK-NEXT:          (Low: %Base1 High: (399 + %Base1))
+; CHECK-NEXT:          (Low: %Base1 High: (399 + %Base1)(u nuw))
 ; CHECK-NEXT:            Member: {%Base1,+,4}<nw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -371,16 +371,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_23b(ptr nocapture
 ; RECURSE-NEXT:          %.sink.in = getelementptr inbounds i23, ptr %spec.select, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
 ; RECURSE-NEXT:        Group [[GRP27]]:
-; RECURSE-NEXT:          (Low: %Dest High: (399 + %Dest))
+; RECURSE-NEXT:          (Low: %Dest High: (399 + %Dest)(u nuw))
 ; RECURSE-NEXT:            Member: {%Dest,+,4}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP28]]:
-; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP29]]:
-; RECURSE-NEXT:          (Low: %Base2 High: (399 + %Base2))
+; RECURSE-NEXT:          (Low: %Base2 High: (399 + %Base2)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base2,+,4}<nw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP30]]:
-; RECURSE-NEXT:          (Low: %Base1 High: (399 + %Base1))
+; RECURSE-NEXT:          (Low: %Base1 High: (399 + %Base1)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base1,+,4}<nw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -432,16 +432,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_6b(ptr nocapture r
 ; CHECK-NEXT:          %.sink.in = getelementptr inbounds i6, ptr %spec.select, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP31]]:
-; CHECK-NEXT:          (Low: %Dest High: (100 + %Dest))
+; CHECK-NEXT:          (Low: %Dest High: (100 + %Dest)(u nuw))
 ; CHECK-NEXT:            Member: {%Dest,+,1}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP32]]:
-; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds))
+; CHECK-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; CHECK-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; CHECK-NEXT:        Group [[GRP33]]:
-; CHECK-NEXT:          (Low: %Base2 High: (100 + %Base2))
+; CHECK-NEXT:          (Low: %Base2 High: (100 + %Base2)(u nuw))
 ; CHECK-NEXT:            Member: {%Base2,+,1}<nw><%for.body>
 ; CHECK-NEXT:        Group [[GRP34]]:
-; CHECK-NEXT:          (Low: %Base1 High: (100 + %Base1))
+; CHECK-NEXT:          (Low: %Base1 High: (100 + %Base1)(u nuw))
 ; CHECK-NEXT:            Member: {%Base1,+,1}<nw><%for.body>
 ; CHECK-EMPTY:
 ; CHECK-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -471,16 +471,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_6b(ptr nocapture r
 ; RECURSE-NEXT:          %.sink.in = getelementptr inbounds i6, ptr %spec.select, i64 %indvars.iv
 ; RECURSE-NEXT:      Grouped accesses:
 ; RECURSE-NEXT:        Group [[GRP35]]:
-; RECURSE-NEXT:          (Low: %Dest High: (100 + %Dest))
+; RECURSE-NEXT:          (Low: %Dest High: (100 + %Dest)(u nuw))
 ; RECURSE-NEXT:            Member: {%Dest,+,1}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP36]]:
-; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds))
+; RECURSE-NEXT:          (Low: %Preds High: (400 + %Preds)(u nuw))
 ; RECURSE-NEXT:            Member: {%Preds,+,4}<nuw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP37]]:
-; RECURSE-NEXT:          (Low: %Base2 High: (100 + %Base2))
+; RECURSE-NEXT:          (Low: %Base2 High: (100 + %Base2)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base2,+,1}<nw><%for.body>
 ; RECURSE-NEXT:        Group [[GRP38]]:
-; RECURSE-NEXT:          (Low: %Base1 High: (100 + %Base1))
+; RECURSE-NEXT:          (Low: %Base1 High: (100 + %Base1)(u nuw))
 ; RECURSE-NEXT:            Member: {%Base1,+,1}<nw><%for.body>
 ; RECURSE-EMPTY:
 ; RECURSE-NEXT:      Non vectorizable stores to invariant address were not found in loop.
@@ -532,16 +532,16 @@ define dso_local void @forked_ptrs_different_base_same_offset_possible_poison(pt
 ; CHECK-NEXT:          %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
 ; CHECK-NEXT:      Grouped accesses:
 ; CHECK-NEXT:        Group [[GRP39]]:
-; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest))
+; CHECK-NEXT:          (Low: %Dest High: (400 + %Dest)(u nuw))
 ; CHECK-NEXT:            Member: {%Dest,+,4}<nw><%for.body>
 ; CHECK-NEXT:        Group [[GRP40]]:
-; CHECK-NE...
[truncated]

fhahn added 5 commits June 25, 2024 13:16
This patch introduces SCEVUse, which is a tagged pointer containing the
used const SCEV *, plus extra bits to store NUW/NSW flags that are only
valid at the specific use.

This was suggested by @nikic as an alternative
to llvm#90742.

This patch just updates most SCEV infrastructure to operate on SCEVUse
instead of const SCEV *. It does not introduce any code that makes use
of the use-specific flags yet which I'll share as follow-ups.

Note that this should be NFC, but currently there's at least one case
where it is not (turn-to-invariant.ll), which I'll investigate once we
agree on the overall direction.

This PR at the moment also contains a commit that updates various SCEV
clients to use `const SCEV *` instead of `const auto *`, to prepare for
this patch. This reduces the number of changes needed, as SCEVUse will
automatically convert to `const SCEV *`. This is a safe default, as it
just drops the use-specific flags for the expression (it will not drop
any use-specific flags for any of its operands though).

This probably
SCEVUse could probably also be used to address mis-compiles due to
equivalent AddRecs modulo flags result in an AddRec with incorrect flags
for some uses of some phis, e.g. the one
llvm#80430 attempted to fix

Compile-time impact:
stage1-O3: +0.06%
stage1-ReleaseThinLTO: +0.07%
stage1-ReleaseLTO-g: +0.07%
stage2-O3: +0.11%

https://llvm-compile-time-tracker.com/compare.php?from=ce055843e2be9643bd58764783a7bb69f6db8c9a&to=8c7f4e9e154ebc4862c4e2716cedc3c688352d7c&stat=instructions:u
@fhahn fhahn force-pushed the users/fhahn/scevuse branch from 6104124 to 4b0c62b Compare June 25, 2024 13:13
fhahn added 3 commits June 25, 2024 14:18
Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the
second operand (Y) is a BinOp. The code only simplifies the condition if
C1 < C2, so if the BinOp is NUW, it doesn't matter whether the first
operand also has the NUW flag, as it cannot wrap if C1 < C2.
Use SCEVUse to add a NUW flag to the upper bound of an accessed pointer.
We must already have proved that the pointers do not wrap, as otherwise
we could not use them for runtime check computations.

By adding the use-specific NUW flag, we can detect cases where SCEV can
prove that the compared pointers must overlap, so the runtime checks
will always be false. In that case, there is no point in vectorizing
with runtime checks.

Note that this depends c2895cd27fbf200d1da056bc66d77eeb62690bf0, which
could be submitted separately if desired; without the current change, I
don't think it triggers in practice though.
@fhahn fhahn force-pushed the users/fhahn/scevuse branch from 4b0c62b to 352daca Compare August 1, 2024 19:47
@fhahn fhahn force-pushed the users/fhahn/scevuse branch 2 times, most recently from 45b5e62 to a777a0e Compare October 9, 2024 13:35
@fhahn fhahn force-pushed the users/fhahn/scevuse branch from bcc33a2 to a15daba Compare November 25, 2024 19:18
@fhahn fhahn force-pushed the users/fhahn/scevuse branch 2 times, most recently from f3905cd to 6eeb7f7 Compare December 13, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants