Skip to content

[ConstraintElimination] Use SCEV ranges information for Loop counter #91457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

v01dXYZ
Copy link
Contributor

@v01dXYZ v01dXYZ commented May 8, 2024

Fixes #90417

The modifications allow to use SCEV ranges for the loop counter to allow simplifications that doesn't depend on the branch taken.

As SCEV could create large numbers that would very likely trigger an overflow with ConstraintSystem, ConstraintSystem was modified to skip the constraint when overflow occurs.

Copy link

github-actions bot commented May 8, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels May 8, 2024
@llvmbot
Copy link
Member

llvmbot commented May 8, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (v01dXYZ)

Changes

Fixes #90417

The modifications allow to use SCEV ranges for the loop counter to allow simplifications that doesn't depend on the branch taken.

As SCEV could create large numbers that would very likely trigger an overflow with ConstraintSystem, ConstraintSystem was modified to skip the constraint when overflow occurs.


Patch is 25.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91457.diff

12 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ConstraintSystem.h (+9-1)
  • (modified) llvm/lib/Analysis/ConstraintSystem.cpp (+32-29)
  • (modified) llvm/lib/Transforms/Scalar/ConstraintElimination.cpp (+39-5)
  • (modified) llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll (+1-2)
  • (modified) llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll (+5-10)
  • (modified) llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll (+13-24)
  • (modified) llvm/test/Transforms/ConstraintElimination/loops.ll (+4-7)
  • (modified) llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll (+1-2)
  • (modified) llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll (+3-6)
  • (added) llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll (+83)
  • (modified) llvm/test/Transforms/ConstraintElimination/transfer-signed-facts-to-unsigned-is-known-non-negative.ll (+3-6)
  • (modified) llvm/test/Transforms/ConstraintElimination/transfer-unsigned-facts-to-signed-is-known-non-negative.ll (+3-4)
diff --git a/llvm/include/llvm/Analysis/ConstraintSystem.h b/llvm/include/llvm/Analysis/ConstraintSystem.h
index 7b02b618f7cb4..d8698a6f64347 100644
--- a/llvm/include/llvm/Analysis/ConstraintSystem.h
+++ b/llvm/include/llvm/Analysis/ConstraintSystem.h
@@ -43,6 +43,8 @@ class ConstraintSystem {
     return 0;
   }
 
+  // WARNING: it represents rather the maximum number of coefficients in the constraints
+  // which is actually the number of variables PLUS one (for the constant part).
   size_t NumVariables = 0;
 
   /// Current linear constraints in the system.
@@ -54,7 +56,13 @@ class ConstraintSystem {
   /// constraint system.
   DenseMap<Value *, unsigned> Value2Index;
 
-  // Eliminate constraints from the system using Fourier–Motzkin elimination.
+  // Eliminate the last variable from the system using Fourier–Motzkin
+  // elimination, while possibly relaxing it if it is beyond
+  // acceptable means (too many created constraints, overflow when
+  // computing the coefficients)
+  //
+  // return true if the updated system is equivalent, otherwise return
+  // false if it is relaxed.
   bool eliminateUsingFM();
 
   /// Returns true if there may be a solution for the constraints in the system.
diff --git a/llvm/lib/Analysis/ConstraintSystem.cpp b/llvm/lib/Analysis/ConstraintSystem.cpp
index 1a9c7c21e9ced..9dc6d25d11492 100644
--- a/llvm/lib/Analysis/ConstraintSystem.cpp
+++ b/llvm/lib/Analysis/ConstraintSystem.cpp
@@ -47,23 +47,24 @@ bool ConstraintSystem::eliminateUsingFM() {
     }
   }
 
+  // Track if an overflow occurred when computing coefficents
+  bool Overflow = false;
+
   // Process rows where the variable is != 0.
   unsigned NumRemainingConstraints = RemainingRows.size();
   for (unsigned R1 = 0; R1 < NumRemainingConstraints; R1++) {
     // FIXME do not use copy
     for (unsigned R2 = R1 + 1; R2 < NumRemainingConstraints; R2++) {
-      if (R1 == R2)
-        continue;
-
       int64_t UpperLast = getLastCoefficient(RemainingRows[R2], LastIdx);
       int64_t LowerLast = getLastCoefficient(RemainingRows[R1], LastIdx);
       assert(
           UpperLast != 0 && LowerLast != 0 &&
           "RemainingRows should only contain rows where the variable is != 0");
 
+      // ensure signs are different then swap if necessary to only
+      // deal with UpperLast > 0
       if ((LowerLast < 0 && UpperLast < 0) || (LowerLast > 0 && UpperLast > 0))
         continue;
-
       unsigned LowerR = R1;
       unsigned UpperR = R2;
       if (UpperLast < 0) {
@@ -71,43 +72,42 @@ bool ConstraintSystem::eliminateUsingFM() {
         std::swap(LowerLast, UpperLast);
       }
 
+      // The following loop does the element-wise operation on sparse
+      // vectors:
+      //
+      // NR = - UpperRow * LowerLast + LowerRow * UpperLast
       SmallVector<Entry, 8> NR;
       unsigned IdxUpper = 0;
       unsigned IdxLower = 0;
       auto &LowerRow = RemainingRows[LowerR];
       auto &UpperRow = RemainingRows[UpperR];
-      while (true) {
-        if (IdxUpper >= UpperRow.size() || IdxLower >= LowerRow.size())
-          break;
+
+      while (IdxUpper < UpperRow.size() && IdxLower < LowerRow.size()) {
         int64_t M1, M2, N;
         int64_t UpperV = 0;
         int64_t LowerV = 0;
-        uint16_t CurrentId = std::numeric_limits<uint16_t>::max();
-        if (IdxUpper < UpperRow.size()) {
-          CurrentId = std::min(UpperRow[IdxUpper].Id, CurrentId);
-        }
-        if (IdxLower < LowerRow.size()) {
-          CurrentId = std::min(LowerRow[IdxLower].Id, CurrentId);
-        }
+        uint16_t CurrentId =
+            std::min(UpperRow[IdxUpper].Id, LowerRow[IdxLower].Id);
 
-        if (IdxUpper < UpperRow.size() && UpperRow[IdxUpper].Id == CurrentId) {
+        if (UpperRow[IdxUpper].Id == CurrentId) {
           UpperV = UpperRow[IdxUpper].Coefficient;
-          IdxUpper++;
+          IdxUpper += 1;
         }
-
-        if (MulOverflow(UpperV, -1 * LowerLast, M1))
-          return false;
-        if (IdxLower < LowerRow.size() && LowerRow[IdxLower].Id == CurrentId) {
+        if (LowerRow[IdxLower].Id == CurrentId) {
           LowerV = LowerRow[IdxLower].Coefficient;
-          IdxLower++;
+          IdxLower += 1;
+        }
+
+        if (MulOverflow(UpperV, -1 * LowerLast, M1) ||
+            MulOverflow(LowerV, UpperLast, M2) || AddOverflow(M1, M2, N)) {
+          Overflow = true;
+          continue;
         }
 
-        if (MulOverflow(LowerV, UpperLast, M2))
-          return false;
-        if (AddOverflow(M1, M2, N))
-          return false;
+        // useless to add a 0 to a sparse vector
         if (N == 0)
           continue;
+
         NR.emplace_back(N, CurrentId);
       }
       if (NR.empty())
@@ -120,13 +120,16 @@ bool ConstraintSystem::eliminateUsingFM() {
   }
   NumVariables -= 1;
 
-  return true;
+  return !Overflow;
 }
 
 bool ConstraintSystem::mayHaveSolutionImpl() {
-  while (!Constraints.empty() && NumVariables > 1) {
-    if (!eliminateUsingFM())
-      return true;
+  while (!Constraints.empty() && Constraints.size() <= 500 && NumVariables > 1) {
+    if(!eliminateUsingFM()) {
+      LLVM_DEBUG(
+          dbgs()
+          << "Some new constraints has been ignored during elimination.\n");
+    };
   }
 
   if (Constraints.empty() || NumVariables > 1)
diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 7e48c28176bd1..e4f2feb1b28bf 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -916,6 +916,45 @@ void State::addInfoForInductions(BasicBlock &BB) {
       !SE.isSCEVable(PN->getType()))
     return;
 
+  auto *AR = dyn_cast_or_null<SCEVAddRecExpr>(SE.getSCEV(PN));
+  BasicBlock *LoopPred = L->getLoopPredecessor();
+  if (!AR || AR->getLoop() != L || !LoopPred)
+    return;
+
+  // If SCEV provides relevant range information, we push those facts
+  // to the worklist relatively to the header node itself (and not its
+  // successor).
+  auto UnsignedRange = SE.getUnsignedRange(AR);
+  auto SignedRange = SE.getSignedRange(AR);
+  DomTreeNode *DTNHeader = DT.getNode(&BB);
+
+  // The range can wrap thus we take Min/Max instead of Lower/Upper
+  auto UnsignedMin = UnsignedRange.getUnsignedMin();
+  auto UnsignedMax = UnsignedRange.getUnsignedMax();
+  if (!UnsignedMax.isMaxValue()) {
+    WorkList.push_back(FactOrCheck::getConditionFact(
+        DTNHeader, CmpInst::ICMP_ULE, PN,
+        Constant::getIntegerValue(PN->getType(), UnsignedMax)));
+  }
+  if (!UnsignedMin.isMinValue()) {
+    WorkList.push_back(FactOrCheck::getConditionFact(
+        DTNHeader, CmpInst::ICMP_UGE, PN,
+        Constant::getIntegerValue(PN->getType(), UnsignedMin)));
+  }
+
+  auto SignedMin = SignedRange.getSignedMin();
+  auto SignedMax = SignedRange.getSignedMax();
+  if (!SignedMax.isMaxSignedValue()) {
+    WorkList.push_back(FactOrCheck::getConditionFact(
+        DTNHeader, CmpInst::ICMP_SLE, PN,
+        Constant::getIntegerValue(PN->getType(), SignedMax)));
+  }
+  if (!SignedMin.isMinSignedValue()) {
+    WorkList.push_back(FactOrCheck::getConditionFact(
+        DTNHeader, CmpInst::ICMP_SGE, PN,
+        Constant::getIntegerValue(PN->getType(), SignedMin)));
+  }
+
   BasicBlock *InLoopSucc = nullptr;
   if (Pred == CmpInst::ICMP_NE)
     InLoopSucc = cast<BranchInst>(BB.getTerminator())->getSuccessor(0);
@@ -927,11 +966,6 @@ void State::addInfoForInductions(BasicBlock &BB) {
   if (!L->contains(InLoopSucc) || !L->isLoopExiting(&BB) || InLoopSucc == &BB)
     return;
 
-  auto *AR = dyn_cast_or_null<SCEVAddRecExpr>(SE.getSCEV(PN));
-  BasicBlock *LoopPred = L->getLoopPredecessor();
-  if (!AR || AR->getLoop() != L || !LoopPred)
-    return;
-
   const SCEV *StartSCEV = AR->getStart();
   Value *StartValue = nullptr;
   if (auto *C = dyn_cast<SCEVConstant>(StartSCEV)) {
diff --git a/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll b/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll
index 88f87f4afab28..efb89fd0512d9 100644
--- a/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll
+++ b/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll
@@ -13,8 +13,7 @@ define i32 @f(i64 %a3, i64 %numElements) {
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[A1]], [[A3]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[IF_END_I:%.*]], label [[ABORT:%.*]]
 ; CHECK:       if.end.i:
-; CHECK-NEXT:    [[CMP2_NOT_I:%.*]] = icmp ult i64 [[A1]], [[A3]]
-; CHECK-NEXT:    br i1 [[CMP2_NOT_I]], label [[ABORT]], label [[EXIT:%.*]]
+; CHECK-NEXT:    br i1 false, label [[ABORT]], label [[EXIT:%.*]]
 ; CHECK:       abort:
 ; CHECK-NEXT:    ret i32 -1
 ; CHECK:       exit:
diff --git a/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll b/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll
index 3dbea9496da8d..72a998e71ff87 100644
--- a/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll
+++ b/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll
@@ -11,10 +11,8 @@ define void @loop_iv_cond_variable_bound(i32 %n) {
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT:    [[T_1:%.*]] = icmp ule i32 [[IV]], [[N:%.*]]
 ; CHECK-NEXT:    call void @use(i1 [[T_1]])
-; CHECK-NEXT:    [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT:    call void @use(i1 [[T_2]])
-; CHECK-NEXT:    [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT:    call void @use(i1 [[T_3]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[C_1:%.*]] = icmp ult i32 [[IV]], [[N]]
 ; CHECK-NEXT:    call void @use(i1 [[C_1]])
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
@@ -56,12 +54,9 @@ define void @loop_iv_cond_constant_bound() {
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[T_1:%.*]] = icmp ule i32 [[IV]], 2
-; CHECK-NEXT:    call void @use(i1 [[T_1]])
-; CHECK-NEXT:    [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT:    call void @use(i1 [[T_2]])
-; CHECK-NEXT:    [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT:    call void @use(i1 [[T_3]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[C_1:%.*]] = icmp ult i32 [[IV]], 2
 ; CHECK-NEXT:    call void @use(i1 [[C_1]])
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
diff --git a/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll b/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll
index 7b928a030614b..61b3411a8cded 100644
--- a/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll
+++ b/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll
@@ -14,15 +14,11 @@ define void @loop_phi_pos_start_value(i32 %y, i1 %c, i32 %n) {
 ; CHECK:       loop.latch:
 ; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    call void @use(i1 false)
-; CHECK-NEXT:    [[T_2:%.*]] = icmp sge i32 [[X]], 10
-; CHECK-NEXT:    call void @use(i1 [[T_2]])
-; CHECK-NEXT:    [[C_2:%.*]] = icmp sle i32 [[X]], 9
-; CHECK-NEXT:    call void @use(i1 [[C_2]])
-; CHECK-NEXT:    [[C_3:%.*]] = icmp sgt i32 [[X]], 9
-; CHECK-NEXT:    call void @use(i1 [[C_3]])
 ; CHECK-NEXT:    call void @use(i1 true)
-; CHECK-NEXT:    [[C_5:%.*]] = icmp sge i32 [[X]], 9
-; CHECK-NEXT:    call void @use(i1 [[C_5]])
+; CHECK-NEXT:    call void @use(i1 false)
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[X_NEXT]] = add nsw i32 [[X]], 1
 ; CHECK-NEXT:    br label [[LOOP_HEADER]]
 ; CHECK:       exit:
@@ -75,8 +71,7 @@ define void @loop_phi_neg_start_value(i32 %y, i1 %c, i32 %n) {
 ; CHECK:       loop.latch:
 ; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    call void @use(i1 false)
-; CHECK-NEXT:    [[T_2:%.*]] = icmp sge i32 [[X]], -10
-; CHECK-NEXT:    call void @use(i1 [[T_2]])
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp sle i32 [[X]], 9
 ; CHECK-NEXT:    call void @use(i1 [[C_2]])
 ; CHECK-NEXT:    [[C_3:%.*]] = icmp sgt i32 [[X]], 9
@@ -248,12 +243,11 @@ define void @loop_latch_not_executed_constant_bound(i32 %y, i1 %c) {
 ; CHECK-NEXT:    br i1 [[C:%.*]], label [[LOOP_HEADER:%.*]], label [[EXIT:%.*]]
 ; CHECK:       loop.header:
 ; CHECK-NEXT:    [[X:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[X_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[C_1:%.*]] = icmp ugt i32 [[X]], 10
-; CHECK-NEXT:    br i1 [[C_1]], label [[LOOP_LATCH]], label [[EXIT]]
+; CHECK-NEXT:    br i1 false, label [[LOOP_LATCH]], label [[EXIT]]
 ; CHECK:       loop.latch:
-; CHECK-NEXT:    call void @use(i1 false)
 ; CHECK-NEXT:    call void @use(i1 true)
-; CHECK-NEXT:    call void @use(i1 false)
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[X_NEXT]] = add i32 [[X]], 1
 ; CHECK-NEXT:    br label [[LOOP_HEADER]]
@@ -299,10 +293,8 @@ define void @loop_iv_cond_variable_bound(i32 %n) {
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
 ; CHECK-NEXT:    [[T_1:%.*]] = icmp ule i32 [[IV]], [[N:%.*]]
 ; CHECK-NEXT:    call void @use(i1 [[T_1]])
-; CHECK-NEXT:    [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT:    call void @use(i1 [[T_2]])
-; CHECK-NEXT:    [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT:    call void @use(i1 [[T_3]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[C_1:%.*]] = icmp ult i32 [[IV]], [[N]]
 ; CHECK-NEXT:    call void @use(i1 [[C_1]])
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
@@ -362,12 +354,9 @@ define void @loop_iv_cond_constant_bound() {
 ; CHECK-NEXT:    br label [[LOOP_HEADER:%.*]]
 ; CHECK:       loop.header:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[T_1:%.*]] = icmp ule i32 [[IV]], 2
-; CHECK-NEXT:    call void @use(i1 [[T_1]])
-; CHECK-NEXT:    [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT:    call void @use(i1 [[T_2]])
-; CHECK-NEXT:    [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT:    call void @use(i1 [[T_3]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[C_1:%.*]] = icmp ult i32 [[IV]], 2
 ; CHECK-NEXT:    call void @use(i1 [[C_1]])
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
diff --git a/llvm/test/Transforms/ConstraintElimination/loops.ll b/llvm/test/Transforms/ConstraintElimination/loops.ll
index 25d4ac436ce8c..8753fff4aac7e 100644
--- a/llvm/test/Transforms/ConstraintElimination/loops.ll
+++ b/llvm/test/Transforms/ConstraintElimination/loops.ll
@@ -107,15 +107,12 @@ define i32 @loop_header_dom_successors_flipped(i32 %y, i1 %c) {
 ; CHECK-NEXT:    br i1 [[C:%.*]], label [[LOOP_HEADER:%.*]], label [[EXIT:%.*]]
 ; CHECK:       loop.header:
 ; CHECK-NEXT:    [[X:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[X_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[C_1:%.*]] = icmp ule i32 [[X]], 10
-; CHECK-NEXT:    br i1 [[C_1]], label [[EXIT]], label [[LOOP_LATCH]]
+; CHECK-NEXT:    br i1 true, label [[EXIT]], label [[LOOP_LATCH]]
 ; CHECK:       loop.latch:
-; CHECK-NEXT:    call void @use(i1 false)
 ; CHECK-NEXT:    call void @use(i1 true)
-; CHECK-NEXT:    [[C_2:%.*]] = icmp ugt i32 [[X]], 11
-; CHECK-NEXT:    call void @use(i1 [[C_2]])
-; CHECK-NEXT:    [[C_3:%.*]] = icmp ule i32 [[X]], 11
-; CHECK-NEXT:    call void @use(i1 [[C_3]])
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[X_NEXT]] = add i32 [[X]], 1
 ; CHECK-NEXT:    br label [[LOOP_HEADER]]
 ; CHECK:       exit:
diff --git a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll
index 155423d33a69a..a81f2059026c0 100644
--- a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll
+++ b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll
@@ -197,8 +197,7 @@ define void @add_rec_decreasing_2_cond_true_constant(i8 noundef %len) {
 ; CHECK-NEXT:    [[CMP2_NOT:%.*]] = icmp eq i8 [[K_0]], 0
 ; CHECK-NEXT:    br i1 [[CMP2_NOT]], label [[EXIT:%.*]], label [[LOOP_LATCH]]
 ; CHECK:       loop.latch:
-; CHECK-NEXT:    [[CMP_NOT_I:%.*]] = icmp ult i8 [[K_0]], 5
-; CHECK-NEXT:    call void @use(i1 [[CMP_NOT_I]])
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[K_DEC]] = add i8 [[K_0]], -2
 ; CHECK-NEXT:    br label [[LOOP_HEADER]]
 ; CHECK:       exit:
diff --git a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll
index 17aa4d54abfba..2a58220b78f00 100644
--- a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll
+++ b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll
@@ -49,8 +49,7 @@ define void @test_iv_nuw_nsw_1_uge_start(i8 %len.n, i8 %a) {
 ; CHECK-NEXT:    br label [[LOOP_HEADER:%.*]]
 ; CHECK:       loop.header:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ -1, [[LOOP_PH:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[IV]], 1
-; CHECK-NEXT:    br i1 [[C]], label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
+; CHECK-NEXT:    br i1 false, label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[C_2:%.*]] = call i1 @cond()
 ; CHECK-NEXT:    br i1 [[C_2]], label [[LOOP_LATCH]], label [[EXIT]]
@@ -90,8 +89,7 @@ define void @test_iv_nuw_nsw_2_uge_start(i8 %len.n, i8 %a) {
 ; CHECK-NEXT:    br label [[LOOP_HEADER:%.*]]
 ; CHECK:       loop.header:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ [[START]], [[LOOP_PH:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[IV]], 1
-; CHECK-NEXT:    br i1 [[C]], label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
+; CHECK-NEXT:    br i1 false, label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[C_2:%.*]] = call i1 @cond()
 ; CHECK-NEXT:    br i1 [[C_2]], label [[LOOP_LATCH]], label [[EXIT]]
@@ -131,8 +129,7 @@ define void @test_iv_nsw_nuw_1_ult_end(i8 %len.n, i8 %a) {
 ; CHECK-NEXT:    br label [[LOOP_HEADER:%.*]]
 ; CHECK:       loop.header:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ -2, [[LOOP_PH:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[IV]], 1
-; CHECK-NEXT:    br i1 [[C]], label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
+; CHECK-NEXT:    br i1 false, label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[C_2:%.*]] = call i1 @cond()
 ; CHECK-NEXT:    br i1 [[C_2]], label [[LOOP_LATCH]], label [[EXIT]]
diff --git a/llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll b/llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll
new file mode 100644
index 0000000000000..0c77574add33d
--- /dev/null
+++ b/llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll
@@ -0,0 +1,83 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=constraint-elimination -S %s | FileCheck %s
+
+define i1 @test_ult_loop_with_ult_icmp_N_at_end(ptr %s) {
+; CHECK-LABEL: define i1 @test_ult_loop_with_ult_icmp_N_at_end(
+; CHECK-SAME: ptr [[S:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[WHILE_COND:%.*]]
+; CHECK:       while.cond:
+; CHECK-NEXT:    [[I_0:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[ADD:%.*]], [[WHILE_BODY:%.*]] ]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i64 [[I_0]], 1234
+; CHECK-NEXT:    br i1 [[CMP]], label [[WHILE_BODY]], label [[WHILE_END:%.*]]
+; CHECK:       while.body:
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[S]], i64 [[I_0]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT:    [[TMP1:%.*]] = add i8 [[TMP0]], -48
+; CHECK-NEXT:    [[OR_COND:%.*]] = icmp ult i8 [[TMP1]], 10
+; CHECK-NEXT:    [[ADD]] = add n...
[truncated]

@v01dXYZ v01dXYZ marked this pull request as draft May 8, 2024 14:29
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 8, 2024

I switch the PR to draft as there is a failing test: polly Support/pipelineposition.ll

Input was:
<<<<<<
          22:  n/a
          23:  Statements {
          24:  Stmt_body_i
          25:  Domain :=
          26:  [n] -> { Stmt_body_i[i0, i1] : 0 <= i0 < n and 0 <= i1 < n };
          27:  Schedule :=
next:84'0                 X error: no match found
          28:  [n] -> { Stmt_body_i[i0, i1] -> [i0, i1] };
next:84'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:84'1      ?                                            possible intended match
          29:  MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
next:84'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          30:  [n] -> { Stmt_body_i[i0, i1] -> MemRef_A[i0 + i1] };
next:84'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          31:  }
next:84'0     ~~~
          32:  Function: caller
next:84'0     ~~~~~~~~~~~~~~~~~~
          33:  Region: %entry.split---%polly.merge_new_and_old
next:84'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
           .
           .
>>>>>>

Simplify the code of the loop performing the element-wise multiplication
as there were strange patterns. I added some comments to help understand
the intent of the code for some external viewers.
@v01dXYZ v01dXYZ force-pushed the 90417-constraintelim-add-scev-ranges branch from dd0e727 to e174704 Compare May 9, 2024 08:39
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 9, 2024

Below the test that fails:

define internal void @callee(i32 %n, ptr noalias nonnull %A, i32 %i) {
entry:
  br label %for

for:
  %j = phi i32 [0, %entry], [%j.inc, %inc]
  %j.cmp = icmp slt i32 %j, %n
  br i1 %j.cmp, label %body, label %exit

    body:
      %idx = add i32 %i, %j
      %arrayidx = getelementptr inbounds double, ptr %A, i32 %idx
      store double 42.0, ptr %arrayidx
      br label %inc

inc:
  %j.inc = add nuw nsw i32 %j, 1
  br label %for

exit:
  br label %return

return:
  ret void
}

I rebased as I don't understand what's going on. I'll take a closer look at it if it fails again.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 9, 2024

it fails (as expected actually), I'm taking a look at it.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 9, 2024

Very strange error it seems the output of ScopInfo is almost the same, just the name of the block that changes. I don't know why that occurs.

Below the output of the twos:

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 9, 2024

I understand now why it fails. Polly uses for tests -O3 which means any changes to the default passes could break them.

Attached bellow the result depending

$LLVM_BUILD/bin/opt \
-polly-process-unprofitable  \
-polly-remarks-minimal  \
-polly-use-llvm-names  \
-polly-import-jscop-dir=$LLVM_ROOT/polly/test/Support  \
-polly-codegen-verify  \
-O3 \
-polly \
-polly-position=before-vectorizer \
-disable-output \
$LLVM_ROOT/polly/test/Support/pipelineposition.ll \
--print-before-all 2>&1

Below is the output of the above comment depending on the commit

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 9, 2024

More precisely, here is the IR dump before constraint elimination:

define void @caller(i32 %n, ptr noalias nocapture nonnull writeonly %A) local_unnamed_addr {
entry:
  br label %for

for:                                              ; preds = %callee.exit, %entry
  %i = phi i32 [ 0, %entry ], [ %j.inc, %callee.exit ]
  %i.cmp = icmp slt i32 %i, %n
  br i1 %i.cmp, label %body, label %return

body:                                             ; preds = %for
  %j.cmp1.i = icmp sgt i32 %n, 0
  br i1 %j.cmp1.i, label %body.i, label %callee.exit

body.i:                                           ; preds = %body.i, %body
  %j2.i = phi i32 [ %j.inc.i, %body.i ], [ 0, %body ]
  %idx.i = add nuw i32 %j2.i, %i
  %0 = sext i32 %idx.i to i64
  %arrayidx.i = getelementptr inbounds double, ptr %A, i64 %0
  store double 4.200000e+01, ptr %arrayidx.i, align 8, !alias.scope !0
  %j.inc.i = add nuw nsw i32 %j2.i, 1
  %j.cmp.i = icmp slt i32 %j.inc.i, %n
  br i1 %j.cmp.i, label %body.i, label %callee.exit

callee.exit:                                      ; preds = %body, %body.i
  %j.inc = add nuw nsw i32 %i, 1
  br label %for

return:                                           ; preds = %for
  ret void
}

When adding the SCEV ranges as facts to the header, it adds %i >= 0 which would allow elimination of %n >= 0 in body.

It doesn't seem to alter what's being tested for Polly (and neither the produced optimised IR).

@v01dXYZ v01dXYZ force-pushed the 90417-constraintelim-add-scev-ranges branch 2 times, most recently from cea65de to 86f4583 Compare May 9, 2024 21:11
@v01dXYZ v01dXYZ marked this pull request as ready for review May 10, 2024 05:15
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 10, 2024

@fhahn I think the PR is ready. The changes are mainly for ConstraintElimination/ConstraintSystem which is a part where the major contributor seems to be you. Thus my request for a review.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 13, 2024

@fhahn ping

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff febd89cafea11e6603f593e41be1a21ca9d009ac 86f4583def5ebd87a1ac08a819fee85f6fc53ae7 -- llvm/include/llvm/Analysis/ConstraintSystem.h llvm/lib/Analysis/ConstraintSystem.cpp llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/Analysis/ConstraintSystem.h b/llvm/include/llvm/Analysis/ConstraintSystem.h
index d8698a6f64..32f61caee0 100644
--- a/llvm/include/llvm/Analysis/ConstraintSystem.h
+++ b/llvm/include/llvm/Analysis/ConstraintSystem.h
@@ -43,8 +43,9 @@ class ConstraintSystem {
     return 0;
   }
 
-  // WARNING: it represents rather the maximum number of coefficients in the constraints
-  // which is actually the number of variables PLUS one (for the constant part).
+  // WARNING: it represents rather the maximum number of coefficients in the
+  // constraints which is actually the number of variables PLUS one (for the
+  // constant part).
   size_t NumVariables = 0;
 
   /// Current linear constraints in the system.
diff --git a/llvm/lib/Analysis/ConstraintSystem.cpp b/llvm/lib/Analysis/ConstraintSystem.cpp
index 9dc6d25d11..9f7c32207f 100644
--- a/llvm/lib/Analysis/ConstraintSystem.cpp
+++ b/llvm/lib/Analysis/ConstraintSystem.cpp
@@ -124,8 +124,9 @@ bool ConstraintSystem::eliminateUsingFM() {
 }
 
 bool ConstraintSystem::mayHaveSolutionImpl() {
-  while (!Constraints.empty() && Constraints.size() <= 500 && NumVariables > 1) {
-    if(!eliminateUsingFM()) {
+  while (!Constraints.empty() && Constraints.size() <= 500 &&
+         NumVariables > 1) {
+    if (!eliminateUsingFM()) {
       LLVM_DEBUG(
           dbgs()
           << "Some new constraints has been ignored during elimination.\n");

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 13, 2024

thanks I'm solving the formatting thing right now. I don't know why I didn't spot it previously. Is there any other important CI pipelines that need manual actions ?

v01dxyz added 2 commits May 13, 2024 15:25
When constructing a new constraint from a pair of lower/upper
constraints on the variable to eliminate, it simply ignores the new
constraint by not adding it if an overflow occurs when computing one
of its coefficients or if there are already too many constraints.

Previously, it simply stopped at this stage and considered the system
as having solution. Thus adding new constraints could turn a
unsatisfiable system into a possibly satisfiable one which is a little
bit strange.

Unsatisfiability of the new system still implies unsatisfiability of
the former one (the projected solution set of the original system is
included in the solution set of the new system).
When looking at the loop counter, if the min/max values of the SCEV
ranges are relevant (not extremal), add this information as
ConditionFacts for the Loop header.

It allows icmp simplifications independently of the branch taken, such
as using `i <= N` after a block `for(i=0; i<N; i+=1) { ... }`.

Fixes llvm#90417

IMPORTANT: Concerning modifications to Polly

One test for polly was modified as an icmp simplification could be
done earlier in the pipeline than previously. The input of
`polly::CodePreparationPass` is almost the same one, except the name
of one block that is different because of this simplification done
earlier.
@v01dXYZ v01dXYZ force-pushed the 90417-constraintelim-add-scev-ranges branch from 86f4583 to 7062594 Compare May 13, 2024 13:26
@fhahn
Copy link
Contributor

fhahn commented May 13, 2024

Computing the ranges unconditionally requires computing the BTC for all loops, which is quite expensive https://llvm-compile-time-tracker.com/compare.php?from=53d6cfb1d184127a57132657f41104b21e7c0f2b&to=86f4583def5ebd87a1ac08a819fee85f6fc53ae7&stat=instructions:u

Not sure if the benefit warrants the increase; do you have any statistics on larger software how beneficial this is?

Also it would probably be good to split off c41b5b5 at least to a separate PR

@nikic
Copy link
Contributor

nikic commented May 14, 2024

#76299 seems related to the second commit.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 14, 2024

@fhahn I don't have statistics to motivate this change wrt perf regressions, my first goal was to have feature parity with gcc and I didn't take the performance regressions into account as I should have. I will take a look at llvm-opt-benchmark.

Also, to better justify which kind of expression where it would be acceptable to perform a range analysis, I'm currently taking a look at gcc. gcc uses the Ranger framework which was already mentioned in the review for ConstraintSystem: https://reviews.llvm.org/D84547#2174140.

With only O2 level, the analysis is done during Early Variable Range Propagation (evrp). It is able to infer the range if the induction variable is additive, but if we use a multiplicative induction variable the inferred range is too approximated. With a O3 level, the right range could be nonetheless inferred in the dom3 pass, which is not present at lower optimisation levels.

I think it will make me busy for two weeks at least before writing a message here with perf stats and good justifications of which sort of variables should be range analysed.

@nikic thanks for pointing out this MR, I made a mistake continue should be replaced by NR.clear(); break;.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 14, 2024
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 14, 2024

As there is work to do on my side, I switch this PR to Draft.

@v01dXYZ v01dXYZ marked this pull request as draft May 14, 2024 14:31
@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 20, 2024

Little update to show some progress even if the work is not completed yet.

To have a verbose output from gcc/Ranger, use the following command line

gcc faulty.c -O2 -o - -S -Q --fdump-tree-all-details=- --param=ranger-debug=all

The evrp pass outputs a lot of interesting information.

The function number_of_iterations_exit is similar to computeExitLimit. tree_niter_desc is almost the same as ExitLimit.
The function estimate_number_of_iterations is similar to computeBackedgeTakenCount.

It seems to me at the moment gcc tries to compute the loop BTC for the loop (or at least the smallest upper bound it can be sure of).

Below to remind me the calling functions a backtrace of number_of_iterations_exit:

Backtrace
#0  number_of_iterations_exit (loop=loop@entry=0x7ffff7b2c4b0, exit=exit@entry=0x7ffff7d08a80, niter=niter@entry=0x7fffffffbe20, warn=warn@entry=false, every_iteration=every_iteration@entry=false, body=body@entry=0x3160520) at ../.././gcc/tree-ssa-loop-niter.cc:3285
#1  0x0000000001195796 in estimate_numbers_of_iterations (loop=loop@entry=0x7ffff7b2c4b0) at ../.././gcc/tree-ssa-loop-niter.cc:4882
#2  0x0000000001199156 in loop_exits_before_overflow (base=0x7ffff7d01de0, step=0x7ffff7d01de0, at_stmt=0x7ffff7cdaa00, loop=0x7ffff7b2c4b0) at ../.././gcc/tree-ssa-loop-niter.cc:5286
#3  scev_probably_wraps_p (var=var@entry=0x0, base=base@entry=0x7ffff7d01de0, step=step@entry=0x7ffff7d01de0, at_stmt=at_stmt@entry=0x7ffff7cdaa00, loop=0x7ffff7b2c4b0, use_overflow_semantics=use_overflow_semantics@entry=true) at ../.././gcc/tree-ssa-loop-niter.cc:5538
#4  0x00000000013a2622 in get_scev_info (r=..., name=0x7ffff7b2c4b0, stmt=<optimized out>, l=0x7ffff7b2c4b0, init=<synthetic pointer>: <optimized out>, step=<synthetic pointer>: <optimized out>, dir=<synthetic pointer>: <optimized out>) at ../.././gcc/vr-values.cc:204
#5  range_of_var_in_loop (v=..., name=name@entry=0x7ffff7d130d8, l=0x7ffff7b2c4b0, stmt=0x7ffff7cdaa00, query=0x313c140) at ../.././gcc/vr-values.cc:271
#6  0x0000000001f0992c in fold_using_range::range_of_ssa_name_with_loop_info (this=<optimized out>, r=..., name=0x7ffff7d130d8, l=<optimized out>, phi=<optimized out>, src=...) at ../.././gcc/gimple-range-fold.cc:1114
#7  0x0000000001f09fdb in fold_using_range::range_of_phi (this=this@entry=0x7fffffffd64f, r=..., phi=phi@entry=0x7ffff7cdaa00, src=...) at ../.././gcc/gimple-range-fold.cc:981
#8  0x0000000001f0d3d1 in fold_using_range::fold_stmt (this=this@entry=0x7fffffffd64f, r=..., s=s@entry=0x7ffff7cdaa00, src=..., name=name@entry=0x7ffff7d130d8) at ../.././gcc/gimple-range-fold.cc:606
#9  0x0000000001ef896e in gimple_ranger::fold_range_internal (this=0x313c140, r=..., s=0x7ffff7cdaa00, name=0x7ffff7d130d8) at ../.././gcc/gimple-range.cc:277
#10 gimple_ranger::range_of_stmt (this=0x313c140, r=..., s=0x7ffff7cdaa00, name=<optimized out>) at ../.././gcc/gimple-range.cc:338
#11 0x0000000001ef8bb2 in gimple_ranger::register_inferred_ranges (this=0x313c140, s=s@entry=0x7ffff7cdaa00) at ../.././gcc/gimple-range.cc:498
#12 0x0000000001304d41 in rvrp_folder::pre_fold_bb (this=0x7fffffffde20, bb=0x7ffff7cdca80) at ../.././gcc/tree-vrp.cc:1017
#13 0x00000000011e377e in substitute_and_fold_dom_walker::before_dom_children (this=0x7fffffffdd50, bb=0x7ffff7cdca80) at ../.././gcc/tree-ssa-propagate.cc:762
#14 0x0000000001e9db9e in dom_walker::walk (this=0x7fffffffdd50, bb=0x7ffff7cdca80) at ../.././gcc/domwalk.cc:311
#15 0x00000000011e2526 in substitute_and_fold_engine::substitute_and_fold (this=this@entry=0x7fffffffde20, block=block@entry=0x0) at ../.././gcc/tree-ssa-propagate.cc:999
#16 0x000000000130201d in execute_ranger_vrp (fun=0x7ffff7d0d0c0, warn_array_bounds_p=false, final_p=<optimized out>) at ../.././gcc/tree-vrp.cc:1080
#17 0x0000000000ec6419 in execute_one_pass (pass=pass@entry=0x3099300) at ../.././gcc/passes.cc:2649
#18 0x0000000000ec6d20 in execute_pass_list_1 (pass=0x3099300) at ../.././gcc/passes.cc:2758
#19 0x0000000000ec6d32 in execute_pass_list_1 (pass=0x3098e00) at ../.././gcc/passes.cc:2759
#20 0x0000000000ec6d59 in execute_pass_list (fn=0x7ffff7d0d0c0, pass=<optimized out>) at ../.././gcc/passes.cc:2769
#21 0x0000000000ec768d in do_per_function_toporder (callback=callback@entry=0xec6d40 <execute_pass_list(function*, opt_pass*)>, data=0x3098b60) at ../.././gcc/passes.cc:1774
#22 0x0000000000ec78cf in do_per_function_toporder (data=<optimized out>, callback=0xec6d40 <execute_pass_list(function*, opt_pass*)>) at ../.././gcc/passes.cc:1741
#23 execute_ipa_pass_list (pass=0x3098ae0) at ../.././gcc/passes.cc:3103
#24 0x0000000000aed8d7 in ipa_passes () at ../.././gcc/cgraphunit.cc:2214
#25 symbol_table::compile (this=0x7ffff7b2d000) at ../.././gcc/cgraphunit.cc:2337
#26 0x0000000000af06ad in symbol_table::compile (this=0x7ffff7b2d000) at ../.././gcc/cgraphunit.cc:2315
#27 symbol_table::finalize_compilation_unit (this=0x7ffff7b2d000) at ../.././gcc/cgraphunit.cc:2589
#28 0x0000000000ff46b2 in compile_file () at ../.././gcc/toplev.cc:476
#29 0x00000000009076dc in do_compile () at ../.././gcc/toplev.cc:2158
#30 toplev::main (this=this@entry=0x7fffffffe6de, argc=<optimized out>, argc@entry=15, argv=<optimized out>, argv@entry=0x7fffffffe738) at ../.././gcc/toplev.cc:2314
#31 0x0000000000908ebe in main (argc=15, argv=0x7fffffffe738) at ../.././gcc/main.cc:39

Some interesting classes I am still trying to understand:

  • rvrp_folder inherits substitute_and_fold. The latter uses a dom walker which will do a DFS while locally sorting the children according to the CFG reverse post-order, see before_dom_children for that.
  • range_query has a member relation_oracle which is an equivalent to ConstraintSystem, keeping track of the relations between different expressions.
  • gimple_ranger, ranger_cache

Some abbreviations:

  • GORI: Generates Outgoing Range Info
  • chrec: chain of recurrence
  • fur: fold under range

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 20, 2024

A little fact, LLVM also computes the BTC when the add is not marked explicity with both nuw nsw to compute the NUW/NSW flag when creating the SCEV expression. Reminder for myself, look at proveNoWrapViaConstantRanges(), use AR->hasNoSelfWrap() to find an upper bound for the AR, which is absolutely equivalent to upper bound the BTC. ( computeMaxBECountForLT howManyLessThans/howManyGreaterThans ?).

It seems SCEV don't use the NSW/NUW info when computing BTC which is a little bit suprising (when having a start greater than the end I expect it to have a range looking like [start, start+1) ie BTC == 0 but it's rather [start, TYPE_MAX).

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 25, 2024

I think you're right about not using the range information in this pass, as this pass seems to simplify using mainly a forward approach which traverses the CFG while keeping track of the branch conditions associated to the dominating edges (among other things).

Supporting this case would be against this simple forward approach.

There are other passes that do a range analysis but do not simplify icmp.

  • SCCP: lattice -> set NUW/NSW for add
  • Correlated Propagation: Lazy Value Info / Value Lattice -> set NUW/NSW for add
  • Ind Var: SCEV
  • InstCombine: ValueTracking

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 26, 2024

I think I found a possible solution with a minor change in IndVarSimplify / SimplifyIndVar. It seems the instructions eligible to simplifications are only the ones within the current loop (cf pushIVUsers), I wonder if we can also consider eligible instructions that are not in any loop. It seems to work but I didn't test the performance regressions. So still WIP.

@v01dXYZ
Copy link
Contributor Author

v01dXYZ commented May 31, 2024

This PR has been superseded by #93598 as it appears ConstraintElimination is not the right pass for optimisation of Induction Variable Users that occur post-loop. I stop working on it and close it.

@v01dXYZ v01dXYZ closed this May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reopen of Bug 49389 - Condition not constant folded
4 participants