-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[ConstraintElimination] Use SCEV ranges information for Loop counter #91457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: None (v01dXYZ) ChangesFixes #90417 The modifications allow to use SCEV ranges for the loop counter to allow simplifications that doesn't depend on the branch taken. As SCEV could create large numbers that would very likely trigger an overflow with Patch is 25.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91457.diff 12 Files Affected:
diff --git a/llvm/include/llvm/Analysis/ConstraintSystem.h b/llvm/include/llvm/Analysis/ConstraintSystem.h
index 7b02b618f7cb4..d8698a6f64347 100644
--- a/llvm/include/llvm/Analysis/ConstraintSystem.h
+++ b/llvm/include/llvm/Analysis/ConstraintSystem.h
@@ -43,6 +43,8 @@ class ConstraintSystem {
return 0;
}
+ // WARNING: it represents rather the maximum number of coefficients in the constraints
+ // which is actually the number of variables PLUS one (for the constant part).
size_t NumVariables = 0;
/// Current linear constraints in the system.
@@ -54,7 +56,13 @@ class ConstraintSystem {
/// constraint system.
DenseMap<Value *, unsigned> Value2Index;
- // Eliminate constraints from the system using Fourier–Motzkin elimination.
+ // Eliminate the last variable from the system using Fourier–Motzkin
+ // elimination, while possibly relaxing it if it is beyond
+ // acceptable means (too many created constraints, overflow when
+ // computing the coefficients)
+ //
+ // return true if the updated system is equivalent, otherwise return
+ // false if it is relaxed.
bool eliminateUsingFM();
/// Returns true if there may be a solution for the constraints in the system.
diff --git a/llvm/lib/Analysis/ConstraintSystem.cpp b/llvm/lib/Analysis/ConstraintSystem.cpp
index 1a9c7c21e9ced..9dc6d25d11492 100644
--- a/llvm/lib/Analysis/ConstraintSystem.cpp
+++ b/llvm/lib/Analysis/ConstraintSystem.cpp
@@ -47,23 +47,24 @@ bool ConstraintSystem::eliminateUsingFM() {
}
}
+ // Track if an overflow occurred when computing coefficents
+ bool Overflow = false;
+
// Process rows where the variable is != 0.
unsigned NumRemainingConstraints = RemainingRows.size();
for (unsigned R1 = 0; R1 < NumRemainingConstraints; R1++) {
// FIXME do not use copy
for (unsigned R2 = R1 + 1; R2 < NumRemainingConstraints; R2++) {
- if (R1 == R2)
- continue;
-
int64_t UpperLast = getLastCoefficient(RemainingRows[R2], LastIdx);
int64_t LowerLast = getLastCoefficient(RemainingRows[R1], LastIdx);
assert(
UpperLast != 0 && LowerLast != 0 &&
"RemainingRows should only contain rows where the variable is != 0");
+ // ensure signs are different then swap if necessary to only
+ // deal with UpperLast > 0
if ((LowerLast < 0 && UpperLast < 0) || (LowerLast > 0 && UpperLast > 0))
continue;
-
unsigned LowerR = R1;
unsigned UpperR = R2;
if (UpperLast < 0) {
@@ -71,43 +72,42 @@ bool ConstraintSystem::eliminateUsingFM() {
std::swap(LowerLast, UpperLast);
}
+ // The following loop does the element-wise operation on sparse
+ // vectors:
+ //
+ // NR = - UpperRow * LowerLast + LowerRow * UpperLast
SmallVector<Entry, 8> NR;
unsigned IdxUpper = 0;
unsigned IdxLower = 0;
auto &LowerRow = RemainingRows[LowerR];
auto &UpperRow = RemainingRows[UpperR];
- while (true) {
- if (IdxUpper >= UpperRow.size() || IdxLower >= LowerRow.size())
- break;
+
+ while (IdxUpper < UpperRow.size() && IdxLower < LowerRow.size()) {
int64_t M1, M2, N;
int64_t UpperV = 0;
int64_t LowerV = 0;
- uint16_t CurrentId = std::numeric_limits<uint16_t>::max();
- if (IdxUpper < UpperRow.size()) {
- CurrentId = std::min(UpperRow[IdxUpper].Id, CurrentId);
- }
- if (IdxLower < LowerRow.size()) {
- CurrentId = std::min(LowerRow[IdxLower].Id, CurrentId);
- }
+ uint16_t CurrentId =
+ std::min(UpperRow[IdxUpper].Id, LowerRow[IdxLower].Id);
- if (IdxUpper < UpperRow.size() && UpperRow[IdxUpper].Id == CurrentId) {
+ if (UpperRow[IdxUpper].Id == CurrentId) {
UpperV = UpperRow[IdxUpper].Coefficient;
- IdxUpper++;
+ IdxUpper += 1;
}
-
- if (MulOverflow(UpperV, -1 * LowerLast, M1))
- return false;
- if (IdxLower < LowerRow.size() && LowerRow[IdxLower].Id == CurrentId) {
+ if (LowerRow[IdxLower].Id == CurrentId) {
LowerV = LowerRow[IdxLower].Coefficient;
- IdxLower++;
+ IdxLower += 1;
+ }
+
+ if (MulOverflow(UpperV, -1 * LowerLast, M1) ||
+ MulOverflow(LowerV, UpperLast, M2) || AddOverflow(M1, M2, N)) {
+ Overflow = true;
+ continue;
}
- if (MulOverflow(LowerV, UpperLast, M2))
- return false;
- if (AddOverflow(M1, M2, N))
- return false;
+ // useless to add a 0 to a sparse vector
if (N == 0)
continue;
+
NR.emplace_back(N, CurrentId);
}
if (NR.empty())
@@ -120,13 +120,16 @@ bool ConstraintSystem::eliminateUsingFM() {
}
NumVariables -= 1;
- return true;
+ return !Overflow;
}
bool ConstraintSystem::mayHaveSolutionImpl() {
- while (!Constraints.empty() && NumVariables > 1) {
- if (!eliminateUsingFM())
- return true;
+ while (!Constraints.empty() && Constraints.size() <= 500 && NumVariables > 1) {
+ if(!eliminateUsingFM()) {
+ LLVM_DEBUG(
+ dbgs()
+ << "Some new constraints has been ignored during elimination.\n");
+ };
}
if (Constraints.empty() || NumVariables > 1)
diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index 7e48c28176bd1..e4f2feb1b28bf 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -916,6 +916,45 @@ void State::addInfoForInductions(BasicBlock &BB) {
!SE.isSCEVable(PN->getType()))
return;
+ auto *AR = dyn_cast_or_null<SCEVAddRecExpr>(SE.getSCEV(PN));
+ BasicBlock *LoopPred = L->getLoopPredecessor();
+ if (!AR || AR->getLoop() != L || !LoopPred)
+ return;
+
+ // If SCEV provides relevant range information, we push those facts
+ // to the worklist relatively to the header node itself (and not its
+ // successor).
+ auto UnsignedRange = SE.getUnsignedRange(AR);
+ auto SignedRange = SE.getSignedRange(AR);
+ DomTreeNode *DTNHeader = DT.getNode(&BB);
+
+ // The range can wrap thus we take Min/Max instead of Lower/Upper
+ auto UnsignedMin = UnsignedRange.getUnsignedMin();
+ auto UnsignedMax = UnsignedRange.getUnsignedMax();
+ if (!UnsignedMax.isMaxValue()) {
+ WorkList.push_back(FactOrCheck::getConditionFact(
+ DTNHeader, CmpInst::ICMP_ULE, PN,
+ Constant::getIntegerValue(PN->getType(), UnsignedMax)));
+ }
+ if (!UnsignedMin.isMinValue()) {
+ WorkList.push_back(FactOrCheck::getConditionFact(
+ DTNHeader, CmpInst::ICMP_UGE, PN,
+ Constant::getIntegerValue(PN->getType(), UnsignedMin)));
+ }
+
+ auto SignedMin = SignedRange.getSignedMin();
+ auto SignedMax = SignedRange.getSignedMax();
+ if (!SignedMax.isMaxSignedValue()) {
+ WorkList.push_back(FactOrCheck::getConditionFact(
+ DTNHeader, CmpInst::ICMP_SLE, PN,
+ Constant::getIntegerValue(PN->getType(), SignedMax)));
+ }
+ if (!SignedMin.isMinSignedValue()) {
+ WorkList.push_back(FactOrCheck::getConditionFact(
+ DTNHeader, CmpInst::ICMP_SGE, PN,
+ Constant::getIntegerValue(PN->getType(), SignedMin)));
+ }
+
BasicBlock *InLoopSucc = nullptr;
if (Pred == CmpInst::ICMP_NE)
InLoopSucc = cast<BranchInst>(BB.getTerminator())->getSuccessor(0);
@@ -927,11 +966,6 @@ void State::addInfoForInductions(BasicBlock &BB) {
if (!L->contains(InLoopSucc) || !L->isLoopExiting(&BB) || InLoopSucc == &BB)
return;
- auto *AR = dyn_cast_or_null<SCEVAddRecExpr>(SE.getSCEV(PN));
- BasicBlock *LoopPred = L->getLoopPredecessor();
- if (!AR || AR->getLoop() != L || !LoopPred)
- return;
-
const SCEV *StartSCEV = AR->getStart();
Value *StartValue = nullptr;
if (auto *C = dyn_cast<SCEVConstant>(StartSCEV)) {
diff --git a/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll b/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll
index 88f87f4afab28..efb89fd0512d9 100644
--- a/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll
+++ b/llvm/test/Transforms/ConstraintElimination/constraint-overflow.ll
@@ -13,8 +13,7 @@ define i32 @f(i64 %a3, i64 %numElements) {
; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A1]], [[A3]]
; CHECK-NEXT: br i1 [[CMP]], label [[IF_END_I:%.*]], label [[ABORT:%.*]]
; CHECK: if.end.i:
-; CHECK-NEXT: [[CMP2_NOT_I:%.*]] = icmp ult i64 [[A1]], [[A3]]
-; CHECK-NEXT: br i1 [[CMP2_NOT_I]], label [[ABORT]], label [[EXIT:%.*]]
+; CHECK-NEXT: br i1 false, label [[ABORT]], label [[EXIT:%.*]]
; CHECK: abort:
; CHECK-NEXT: ret i32 -1
; CHECK: exit:
diff --git a/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll b/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll
index 3dbea9496da8d..72a998e71ff87 100644
--- a/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll
+++ b/llvm/test/Transforms/ConstraintElimination/loops-bottom-tested-base.ll
@@ -11,10 +11,8 @@ define void @loop_iv_cond_variable_bound(i32 %n) {
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[T_1:%.*]] = icmp ule i32 [[IV]], [[N:%.*]]
; CHECK-NEXT: call void @use(i1 [[T_1]])
-; CHECK-NEXT: [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT: call void @use(i1 [[T_2]])
-; CHECK-NEXT: [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT: call void @use(i1 [[T_3]])
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[C_1:%.*]] = icmp ult i32 [[IV]], [[N]]
; CHECK-NEXT: call void @use(i1 [[C_1]])
; CHECK-NEXT: [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
@@ -56,12 +54,9 @@ define void @loop_iv_cond_constant_bound() {
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT: [[T_1:%.*]] = icmp ule i32 [[IV]], 2
-; CHECK-NEXT: call void @use(i1 [[T_1]])
-; CHECK-NEXT: [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT: call void @use(i1 [[T_2]])
-; CHECK-NEXT: [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT: call void @use(i1 [[T_3]])
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[C_1:%.*]] = icmp ult i32 [[IV]], 2
; CHECK-NEXT: call void @use(i1 [[C_1]])
; CHECK-NEXT: [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
diff --git a/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll b/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll
index 7b928a030614b..61b3411a8cded 100644
--- a/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll
+++ b/llvm/test/Transforms/ConstraintElimination/loops-header-tested-base.ll
@@ -14,15 +14,11 @@ define void @loop_phi_pos_start_value(i32 %y, i1 %c, i32 %n) {
; CHECK: loop.latch:
; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: call void @use(i1 false)
-; CHECK-NEXT: [[T_2:%.*]] = icmp sge i32 [[X]], 10
-; CHECK-NEXT: call void @use(i1 [[T_2]])
-; CHECK-NEXT: [[C_2:%.*]] = icmp sle i32 [[X]], 9
-; CHECK-NEXT: call void @use(i1 [[C_2]])
-; CHECK-NEXT: [[C_3:%.*]] = icmp sgt i32 [[X]], 9
-; CHECK-NEXT: call void @use(i1 [[C_3]])
; CHECK-NEXT: call void @use(i1 true)
-; CHECK-NEXT: [[C_5:%.*]] = icmp sge i32 [[X]], 9
-; CHECK-NEXT: call void @use(i1 [[C_5]])
+; CHECK-NEXT: call void @use(i1 false)
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[X_NEXT]] = add nsw i32 [[X]], 1
; CHECK-NEXT: br label [[LOOP_HEADER]]
; CHECK: exit:
@@ -75,8 +71,7 @@ define void @loop_phi_neg_start_value(i32 %y, i1 %c, i32 %n) {
; CHECK: loop.latch:
; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: call void @use(i1 false)
-; CHECK-NEXT: [[T_2:%.*]] = icmp sge i32 [[X]], -10
-; CHECK-NEXT: call void @use(i1 [[T_2]])
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[C_2:%.*]] = icmp sle i32 [[X]], 9
; CHECK-NEXT: call void @use(i1 [[C_2]])
; CHECK-NEXT: [[C_3:%.*]] = icmp sgt i32 [[X]], 9
@@ -248,12 +243,11 @@ define void @loop_latch_not_executed_constant_bound(i32 %y, i1 %c) {
; CHECK-NEXT: br i1 [[C:%.*]], label [[LOOP_HEADER:%.*]], label [[EXIT:%.*]]
; CHECK: loop.header:
; CHECK-NEXT: [[X:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[X_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT: [[C_1:%.*]] = icmp ugt i32 [[X]], 10
-; CHECK-NEXT: br i1 [[C_1]], label [[LOOP_LATCH]], label [[EXIT]]
+; CHECK-NEXT: br i1 false, label [[LOOP_LATCH]], label [[EXIT]]
; CHECK: loop.latch:
-; CHECK-NEXT: call void @use(i1 false)
; CHECK-NEXT: call void @use(i1 true)
-; CHECK-NEXT: call void @use(i1 false)
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[X_NEXT]] = add i32 [[X]], 1
; CHECK-NEXT: br label [[LOOP_HEADER]]
@@ -299,10 +293,8 @@ define void @loop_iv_cond_variable_bound(i32 %n) {
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[T_1:%.*]] = icmp ule i32 [[IV]], [[N:%.*]]
; CHECK-NEXT: call void @use(i1 [[T_1]])
-; CHECK-NEXT: [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT: call void @use(i1 [[T_2]])
-; CHECK-NEXT: [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT: call void @use(i1 [[T_3]])
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[C_1:%.*]] = icmp ult i32 [[IV]], [[N]]
; CHECK-NEXT: call void @use(i1 [[C_1]])
; CHECK-NEXT: [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
@@ -362,12 +354,9 @@ define void @loop_iv_cond_constant_bound() {
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK: loop.header:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT: [[T_1:%.*]] = icmp ule i32 [[IV]], 2
-; CHECK-NEXT: call void @use(i1 [[T_1]])
-; CHECK-NEXT: [[T_2:%.*]] = icmp sge i32 [[IV]], 0
-; CHECK-NEXT: call void @use(i1 [[T_2]])
-; CHECK-NEXT: [[T_3:%.*]] = icmp sge i32 [[IV]], -1
-; CHECK-NEXT: call void @use(i1 [[T_3]])
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[C_1:%.*]] = icmp ult i32 [[IV]], 2
; CHECK-NEXT: call void @use(i1 [[C_1]])
; CHECK-NEXT: [[C_2:%.*]] = icmp ugt i32 [[IV]], 1
diff --git a/llvm/test/Transforms/ConstraintElimination/loops.ll b/llvm/test/Transforms/ConstraintElimination/loops.ll
index 25d4ac436ce8c..8753fff4aac7e 100644
--- a/llvm/test/Transforms/ConstraintElimination/loops.ll
+++ b/llvm/test/Transforms/ConstraintElimination/loops.ll
@@ -107,15 +107,12 @@ define i32 @loop_header_dom_successors_flipped(i32 %y, i1 %c) {
; CHECK-NEXT: br i1 [[C:%.*]], label [[LOOP_HEADER:%.*]], label [[EXIT:%.*]]
; CHECK: loop.header:
; CHECK-NEXT: [[X:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[X_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT: [[C_1:%.*]] = icmp ule i32 [[X]], 10
-; CHECK-NEXT: br i1 [[C_1]], label [[EXIT]], label [[LOOP_LATCH]]
+; CHECK-NEXT: br i1 true, label [[EXIT]], label [[LOOP_LATCH]]
; CHECK: loop.latch:
-; CHECK-NEXT: call void @use(i1 false)
; CHECK-NEXT: call void @use(i1 true)
-; CHECK-NEXT: [[C_2:%.*]] = icmp ugt i32 [[X]], 11
-; CHECK-NEXT: call void @use(i1 [[C_2]])
-; CHECK-NEXT: [[C_3:%.*]] = icmp ule i32 [[X]], 11
-; CHECK-NEXT: call void @use(i1 [[C_3]])
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[X_NEXT]] = add i32 [[X]], 1
; CHECK-NEXT: br label [[LOOP_HEADER]]
; CHECK: exit:
diff --git a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll
index 155423d33a69a..a81f2059026c0 100644
--- a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll
+++ b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-decrement.ll
@@ -197,8 +197,7 @@ define void @add_rec_decreasing_2_cond_true_constant(i8 noundef %len) {
; CHECK-NEXT: [[CMP2_NOT:%.*]] = icmp eq i8 [[K_0]], 0
; CHECK-NEXT: br i1 [[CMP2_NOT]], label [[EXIT:%.*]], label [[LOOP_LATCH]]
; CHECK: loop.latch:
-; CHECK-NEXT: [[CMP_NOT_I:%.*]] = icmp ult i8 [[K_0]], 5
-; CHECK-NEXT: call void @use(i1 [[CMP_NOT_I]])
+; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: [[K_DEC]] = add i8 [[K_0]], -2
; CHECK-NEXT: br label [[LOOP_HEADER]]
; CHECK: exit:
diff --git a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll
index 17aa4d54abfba..2a58220b78f00 100644
--- a/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll
+++ b/llvm/test/Transforms/ConstraintElimination/monotonic-int-phis-wrapping.ll
@@ -49,8 +49,7 @@ define void @test_iv_nuw_nsw_1_uge_start(i8 %len.n, i8 %a) {
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK: loop.header:
; CHECK-NEXT: [[IV:%.*]] = phi i8 [ -1, [[LOOP_PH:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[IV]], 1
-; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
+; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[C_2:%.*]] = call i1 @cond()
; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[EXIT]]
@@ -90,8 +89,7 @@ define void @test_iv_nuw_nsw_2_uge_start(i8 %len.n, i8 %a) {
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK: loop.header:
; CHECK-NEXT: [[IV:%.*]] = phi i8 [ [[START]], [[LOOP_PH:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[IV]], 1
-; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
+; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[C_2:%.*]] = call i1 @cond()
; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[EXIT]]
@@ -131,8 +129,7 @@ define void @test_iv_nsw_nuw_1_ult_end(i8 %len.n, i8 %a) {
; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
; CHECK: loop.header:
; CHECK-NEXT: [[IV:%.*]] = phi i8 [ -2, [[LOOP_PH:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[IV]], 1
-; CHECK-NEXT: br i1 [[C]], label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
+; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[C_2:%.*]] = call i1 @cond()
; CHECK-NEXT: br i1 [[C_2]], label [[LOOP_LATCH]], label [[EXIT]]
diff --git a/llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll b/llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll
new file mode 100644
index 0000000000000..0c77574add33d
--- /dev/null
+++ b/llvm/test/Transforms/ConstraintElimination/scev-phis-range-facts.ll
@@ -0,0 +1,83 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -passes=constraint-elimination -S %s | FileCheck %s
+
+define i1 @test_ult_loop_with_ult_icmp_N_at_end(ptr %s) {
+; CHECK-LABEL: define i1 @test_ult_loop_with_ult_icmp_N_at_end(
+; CHECK-SAME: ptr [[S:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[WHILE_COND:%.*]]
+; CHECK: while.cond:
+; CHECK-NEXT: [[I_0:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[ADD:%.*]], [[WHILE_BODY:%.*]] ]
+; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[I_0]], 1234
+; CHECK-NEXT: br i1 [[CMP]], label [[WHILE_BODY]], label [[WHILE_END:%.*]]
+; CHECK: while.body:
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[S]], i64 [[I_0]]
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[TMP0]], -48
+; CHECK-NEXT: [[OR_COND:%.*]] = icmp ult i8 [[TMP1]], 10
+; CHECK-NEXT: [[ADD]] = add n...
[truncated]
|
I switch the PR to draft as there is a failing test: Input was:
<<<<<<
22: n/a
23: Statements {
24: Stmt_body_i
25: Domain :=
26: [n] -> { Stmt_body_i[i0, i1] : 0 <= i0 < n and 0 <= i1 < n };
27: Schedule :=
next:84'0 X error: no match found
28: [n] -> { Stmt_body_i[i0, i1] -> [i0, i1] };
next:84'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:84'1 ? possible intended match
29: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0]
next:84'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30: [n] -> { Stmt_body_i[i0, i1] -> MemRef_A[i0 + i1] };
next:84'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31: }
next:84'0 ~~~
32: Function: caller
next:84'0 ~~~~~~~~~~~~~~~~~~
33: Region: %entry.split---%polly.merge_new_and_old
next:84'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.
.
.
>>>>>> |
Simplify the code of the loop performing the element-wise multiplication as there were strange patterns. I added some comments to help understand the intent of the code for some external viewers.
dd0e727
to
e174704
Compare
Below the test that fails: define internal void @callee(i32 %n, ptr noalias nonnull %A, i32 %i) {
entry:
br label %for
for:
%j = phi i32 [0, %entry], [%j.inc, %inc]
%j.cmp = icmp slt i32 %j, %n
br i1 %j.cmp, label %body, label %exit
body:
%idx = add i32 %i, %j
%arrayidx = getelementptr inbounds double, ptr %A, i32 %idx
store double 42.0, ptr %arrayidx
br label %inc
inc:
%j.inc = add nuw nsw i32 %j, 1
br label %for
exit:
br label %return
return:
ret void
} I rebased as I don't understand what's going on. I'll take a closer look at it if it fails again. |
it fails (as expected actually), I'm taking a look at it. |
Very strange error it seems the output of Below the output of the twos: |
I understand now why it fails. Polly uses for tests Attached bellow the result depending $LLVM_BUILD/bin/opt \
-polly-process-unprofitable \
-polly-remarks-minimal \
-polly-use-llvm-names \
-polly-import-jscop-dir=$LLVM_ROOT/polly/test/Support \
-polly-codegen-verify \
-O3 \
-polly \
-polly-position=before-vectorizer \
-disable-output \
$LLVM_ROOT/polly/test/Support/pipelineposition.ll \
--print-before-all 2>&1 Below is the output of the above comment depending on the commit |
More precisely, here is the IR dump before constraint elimination: define void @caller(i32 %n, ptr noalias nocapture nonnull writeonly %A) local_unnamed_addr {
entry:
br label %for
for: ; preds = %callee.exit, %entry
%i = phi i32 [ 0, %entry ], [ %j.inc, %callee.exit ]
%i.cmp = icmp slt i32 %i, %n
br i1 %i.cmp, label %body, label %return
body: ; preds = %for
%j.cmp1.i = icmp sgt i32 %n, 0
br i1 %j.cmp1.i, label %body.i, label %callee.exit
body.i: ; preds = %body.i, %body
%j2.i = phi i32 [ %j.inc.i, %body.i ], [ 0, %body ]
%idx.i = add nuw i32 %j2.i, %i
%0 = sext i32 %idx.i to i64
%arrayidx.i = getelementptr inbounds double, ptr %A, i64 %0
store double 4.200000e+01, ptr %arrayidx.i, align 8, !alias.scope !0
%j.inc.i = add nuw nsw i32 %j2.i, 1
%j.cmp.i = icmp slt i32 %j.inc.i, %n
br i1 %j.cmp.i, label %body.i, label %callee.exit
callee.exit: ; preds = %body, %body.i
%j.inc = add nuw nsw i32 %i, 1
br label %for
return: ; preds = %for
ret void
} When adding the SCEV ranges as facts to the header, it adds It doesn't seem to alter what's being tested for Polly (and neither the produced optimised IR). |
cea65de
to
86f4583
Compare
@fhahn I think the PR is ready. The changes are mainly for |
@fhahn ping |
You can test this locally with the following command:git-clang-format --diff febd89cafea11e6603f593e41be1a21ca9d009ac 86f4583def5ebd87a1ac08a819fee85f6fc53ae7 -- llvm/include/llvm/Analysis/ConstraintSystem.h llvm/lib/Analysis/ConstraintSystem.cpp llvm/lib/Transforms/Scalar/ConstraintElimination.cpp View the diff from clang-format here.diff --git a/llvm/include/llvm/Analysis/ConstraintSystem.h b/llvm/include/llvm/Analysis/ConstraintSystem.h
index d8698a6f64..32f61caee0 100644
--- a/llvm/include/llvm/Analysis/ConstraintSystem.h
+++ b/llvm/include/llvm/Analysis/ConstraintSystem.h
@@ -43,8 +43,9 @@ class ConstraintSystem {
return 0;
}
- // WARNING: it represents rather the maximum number of coefficients in the constraints
- // which is actually the number of variables PLUS one (for the constant part).
+ // WARNING: it represents rather the maximum number of coefficients in the
+ // constraints which is actually the number of variables PLUS one (for the
+ // constant part).
size_t NumVariables = 0;
/// Current linear constraints in the system.
diff --git a/llvm/lib/Analysis/ConstraintSystem.cpp b/llvm/lib/Analysis/ConstraintSystem.cpp
index 9dc6d25d11..9f7c32207f 100644
--- a/llvm/lib/Analysis/ConstraintSystem.cpp
+++ b/llvm/lib/Analysis/ConstraintSystem.cpp
@@ -124,8 +124,9 @@ bool ConstraintSystem::eliminateUsingFM() {
}
bool ConstraintSystem::mayHaveSolutionImpl() {
- while (!Constraints.empty() && Constraints.size() <= 500 && NumVariables > 1) {
- if(!eliminateUsingFM()) {
+ while (!Constraints.empty() && Constraints.size() <= 500 &&
+ NumVariables > 1) {
+ if (!eliminateUsingFM()) {
LLVM_DEBUG(
dbgs()
<< "Some new constraints has been ignored during elimination.\n");
|
thanks I'm solving the formatting thing right now. I don't know why I didn't spot it previously. Is there any other important CI pipelines that need manual actions ? |
When constructing a new constraint from a pair of lower/upper constraints on the variable to eliminate, it simply ignores the new constraint by not adding it if an overflow occurs when computing one of its coefficients or if there are already too many constraints. Previously, it simply stopped at this stage and considered the system as having solution. Thus adding new constraints could turn a unsatisfiable system into a possibly satisfiable one which is a little bit strange. Unsatisfiability of the new system still implies unsatisfiability of the former one (the projected solution set of the original system is included in the solution set of the new system).
When looking at the loop counter, if the min/max values of the SCEV ranges are relevant (not extremal), add this information as ConditionFacts for the Loop header. It allows icmp simplifications independently of the branch taken, such as using `i <= N` after a block `for(i=0; i<N; i+=1) { ... }`. Fixes llvm#90417 IMPORTANT: Concerning modifications to Polly One test for polly was modified as an icmp simplification could be done earlier in the pipeline than previously. The input of `polly::CodePreparationPass` is almost the same one, except the name of one block that is different because of this simplification done earlier.
86f4583
to
7062594
Compare
Computing the ranges unconditionally requires computing the BTC for all loops, which is quite expensive https://llvm-compile-time-tracker.com/compare.php?from=53d6cfb1d184127a57132657f41104b21e7c0f2b&to=86f4583def5ebd87a1ac08a819fee85f6fc53ae7&stat=instructions:u Not sure if the benefit warrants the increase; do you have any statistics on larger software how beneficial this is? Also it would probably be good to split off c41b5b5 at least to a separate PR |
#76299 seems related to the second commit. |
@fhahn I don't have statistics to motivate this change wrt perf regressions, my first goal was to have feature parity with gcc and I didn't take the performance regressions into account as I should have. I will take a look at Also, to better justify which kind of expression where it would be acceptable to perform a range analysis, I'm currently taking a look at gcc. gcc uses the Ranger framework which was already mentioned in the review for With only I think it will make me busy for two weeks at least before writing a message here with perf stats and good justifications of which sort of variables should be range analysed. @nikic thanks for pointing out this MR, I made a mistake |
As there is work to do on my side, I switch this PR to Draft. |
Little update to show some progress even if the work is not completed yet. To have a verbose output from gcc/Ranger, use the following command line
The The function It seems to me at the moment Below to remind me the calling functions a backtrace of Backtrace#0 number_of_iterations_exit (loop=loop@entry=0x7ffff7b2c4b0, exit=exit@entry=0x7ffff7d08a80, niter=niter@entry=0x7fffffffbe20, warn=warn@entry=false, every_iteration=every_iteration@entry=false, body=body@entry=0x3160520) at ../.././gcc/tree-ssa-loop-niter.cc:3285
#1 0x0000000001195796 in estimate_numbers_of_iterations (loop=loop@entry=0x7ffff7b2c4b0) at ../.././gcc/tree-ssa-loop-niter.cc:4882
#2 0x0000000001199156 in loop_exits_before_overflow (base=0x7ffff7d01de0, step=0x7ffff7d01de0, at_stmt=0x7ffff7cdaa00, loop=0x7ffff7b2c4b0) at ../.././gcc/tree-ssa-loop-niter.cc:5286
#3 scev_probably_wraps_p (var=var@entry=0x0, base=base@entry=0x7ffff7d01de0, step=step@entry=0x7ffff7d01de0, at_stmt=at_stmt@entry=0x7ffff7cdaa00, loop=0x7ffff7b2c4b0, use_overflow_semantics=use_overflow_semantics@entry=true) at ../.././gcc/tree-ssa-loop-niter.cc:5538
#4 0x00000000013a2622 in get_scev_info (r=..., name=0x7ffff7b2c4b0, stmt=<optimized out>, l=0x7ffff7b2c4b0, init=<synthetic pointer>: <optimized out>, step=<synthetic pointer>: <optimized out>, dir=<synthetic pointer>: <optimized out>) at ../.././gcc/vr-values.cc:204
#5 range_of_var_in_loop (v=..., name=name@entry=0x7ffff7d130d8, l=0x7ffff7b2c4b0, stmt=0x7ffff7cdaa00, query=0x313c140) at ../.././gcc/vr-values.cc:271
#6 0x0000000001f0992c in fold_using_range::range_of_ssa_name_with_loop_info (this=<optimized out>, r=..., name=0x7ffff7d130d8, l=<optimized out>, phi=<optimized out>, src=...) at ../.././gcc/gimple-range-fold.cc:1114
#7 0x0000000001f09fdb in fold_using_range::range_of_phi (this=this@entry=0x7fffffffd64f, r=..., phi=phi@entry=0x7ffff7cdaa00, src=...) at ../.././gcc/gimple-range-fold.cc:981
#8 0x0000000001f0d3d1 in fold_using_range::fold_stmt (this=this@entry=0x7fffffffd64f, r=..., s=s@entry=0x7ffff7cdaa00, src=..., name=name@entry=0x7ffff7d130d8) at ../.././gcc/gimple-range-fold.cc:606
#9 0x0000000001ef896e in gimple_ranger::fold_range_internal (this=0x313c140, r=..., s=0x7ffff7cdaa00, name=0x7ffff7d130d8) at ../.././gcc/gimple-range.cc:277
#10 gimple_ranger::range_of_stmt (this=0x313c140, r=..., s=0x7ffff7cdaa00, name=<optimized out>) at ../.././gcc/gimple-range.cc:338
#11 0x0000000001ef8bb2 in gimple_ranger::register_inferred_ranges (this=0x313c140, s=s@entry=0x7ffff7cdaa00) at ../.././gcc/gimple-range.cc:498
#12 0x0000000001304d41 in rvrp_folder::pre_fold_bb (this=0x7fffffffde20, bb=0x7ffff7cdca80) at ../.././gcc/tree-vrp.cc:1017
#13 0x00000000011e377e in substitute_and_fold_dom_walker::before_dom_children (this=0x7fffffffdd50, bb=0x7ffff7cdca80) at ../.././gcc/tree-ssa-propagate.cc:762
#14 0x0000000001e9db9e in dom_walker::walk (this=0x7fffffffdd50, bb=0x7ffff7cdca80) at ../.././gcc/domwalk.cc:311
#15 0x00000000011e2526 in substitute_and_fold_engine::substitute_and_fold (this=this@entry=0x7fffffffde20, block=block@entry=0x0) at ../.././gcc/tree-ssa-propagate.cc:999
#16 0x000000000130201d in execute_ranger_vrp (fun=0x7ffff7d0d0c0, warn_array_bounds_p=false, final_p=<optimized out>) at ../.././gcc/tree-vrp.cc:1080
#17 0x0000000000ec6419 in execute_one_pass (pass=pass@entry=0x3099300) at ../.././gcc/passes.cc:2649
#18 0x0000000000ec6d20 in execute_pass_list_1 (pass=0x3099300) at ../.././gcc/passes.cc:2758
#19 0x0000000000ec6d32 in execute_pass_list_1 (pass=0x3098e00) at ../.././gcc/passes.cc:2759
#20 0x0000000000ec6d59 in execute_pass_list (fn=0x7ffff7d0d0c0, pass=<optimized out>) at ../.././gcc/passes.cc:2769
#21 0x0000000000ec768d in do_per_function_toporder (callback=callback@entry=0xec6d40 <execute_pass_list(function*, opt_pass*)>, data=0x3098b60) at ../.././gcc/passes.cc:1774
#22 0x0000000000ec78cf in do_per_function_toporder (data=<optimized out>, callback=0xec6d40 <execute_pass_list(function*, opt_pass*)>) at ../.././gcc/passes.cc:1741
#23 execute_ipa_pass_list (pass=0x3098ae0) at ../.././gcc/passes.cc:3103
#24 0x0000000000aed8d7 in ipa_passes () at ../.././gcc/cgraphunit.cc:2214
#25 symbol_table::compile (this=0x7ffff7b2d000) at ../.././gcc/cgraphunit.cc:2337
#26 0x0000000000af06ad in symbol_table::compile (this=0x7ffff7b2d000) at ../.././gcc/cgraphunit.cc:2315
#27 symbol_table::finalize_compilation_unit (this=0x7ffff7b2d000) at ../.././gcc/cgraphunit.cc:2589
#28 0x0000000000ff46b2 in compile_file () at ../.././gcc/toplev.cc:476
#29 0x00000000009076dc in do_compile () at ../.././gcc/toplev.cc:2158
#30 toplev::main (this=this@entry=0x7fffffffe6de, argc=<optimized out>, argc@entry=15, argv=<optimized out>, argv@entry=0x7fffffffe738) at ../.././gcc/toplev.cc:2314
#31 0x0000000000908ebe in main (argc=15, argv=0x7fffffffe738) at ../.././gcc/main.cc:39 Some interesting classes I am still trying to understand:
Some abbreviations:
|
A little fact, LLVM also computes the BTC when the It seems SCEV don't use the NSW/NUW info when computing BTC which is a little bit suprising (when having a start greater than the end I expect it to have a range looking like |
I think you're right about not using the range information in this pass, as this pass seems to simplify using mainly a forward approach which traverses the CFG while keeping track of the branch conditions associated to the dominating edges (among other things). Supporting this case would be against this simple forward approach. There are other passes that do a range analysis but do not simplify icmp.
|
I think I found a possible solution with a minor change in |
This PR has been superseded by #93598 as it appears |
Fixes #90417
The modifications allow to use SCEV ranges for the loop counter to allow simplifications that doesn't depend on the branch taken.
As SCEV could create large numbers that would very likely trigger an overflow with
ConstraintSystem
,ConstraintSystem
was modified to skip the constraint when overflow occurs.