[SCEV] Apply loop guards to End in computeMaxBECountForLT #116187
Conversation
This is a follow-on from llvm#115705. Applying the loop guards allows us to calculate the maximum trip count in more places, which in turn allows isIndvarOverflowCheckKnownFalse to skip the overflow check.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes: This is a follow-on from #115705. Applying the loop guards allows us to calculate the maximum trip count in more places, which in turn allows isIndvarOverflowCheckKnownFalse to skip the overflow check.

Full diff: https://github.com/llvm/llvm-project/pull/116187.diff

3 Files Affected:
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 4b8cb3a39a86db..8b7745e6ab1034 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -2218,8 +2218,8 @@ class ScalarEvolution {
/// actually doesn't, or we'd have to immediately execute UB)
/// We *don't* assert these preconditions so please be careful.
const SCEV *computeMaxBECountForLT(const SCEV *Start, const SCEV *Stride,
- const SCEV *End, unsigned BitWidth,
- bool IsSigned);
+ const SCEV *End, const Loop *L,
+ unsigned BitWidth, bool IsSigned);
/// Verify if an linear IV with positive stride can overflow when in a
/// less-than comparison, knowing the invariant term of the comparison,
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b10811133770e1..bb7306f5ba3778 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -12857,11 +12857,10 @@ const SCEV *ScalarEvolution::getUDivCeilSCEV(const SCEV *N, const SCEV *D) {
return getAddExpr(MinNOne, getUDivExpr(NMinusOne, D));
}
-const SCEV *ScalarEvolution::computeMaxBECountForLT(const SCEV *Start,
- const SCEV *Stride,
- const SCEV *End,
- unsigned BitWidth,
- bool IsSigned) {
+const SCEV *
+ScalarEvolution::computeMaxBECountForLT(const SCEV *Start, const SCEV *Stride,
+ const SCEV *End, const Loop *L,
+ unsigned BitWidth, bool IsSigned) {
// The logic in this function assumes we can represent a positive stride.
// If we can't, the backedge-taken count must be zero.
if (IsSigned && BitWidth == 1)
@@ -12895,8 +12894,10 @@ const SCEV *ScalarEvolution::computeMaxBECountForLT(const SCEV *Start,
// the case End = RHS of the loop termination condition. This is safe because
// in the other case (End - Start) is zero, leading to a zero maximum backedge
// taken count.
- APInt MaxEnd = IsSigned ? APIntOps::smin(getSignedRangeMax(End), Limit)
- : APIntOps::umin(getUnsignedRangeMax(End), Limit);
+ const SCEV *GuardedEnd = applyLoopGuards(End, L);
+ APInt MaxEnd = IsSigned
+ ? APIntOps::smin(getSignedRangeMax(GuardedEnd), Limit)
+ : APIntOps::umin(getUnsignedRangeMax(GuardedEnd), Limit);
// MaxBECount = ceil((max(MaxEnd, MinStart) - MinStart) / Stride)
MaxEnd = IsSigned ? APIntOps::smax(MaxEnd, MinStart)
@@ -13150,7 +13151,7 @@ ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
// loop (RHS), and the fact that IV does not overflow (which is
// checked above).
const SCEV *MaxBECount = computeMaxBECountForLT(
- Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);
+ Start, Stride, RHS, L, getTypeSizeInBits(LHS->getType()), IsSigned);
return ExitLimit(getCouldNotCompute() /* ExactNotTaken */, MaxBECount,
MaxBECount, false /*MaxOrZero*/, Predicates);
}
@@ -13334,7 +13335,7 @@ ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
MaxOrZero = true;
} else {
ConstantMaxBECount = computeMaxBECountForLT(
- Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);
+ Start, Stride, RHS, L, getTypeSizeInBits(LHS->getType()), IsSigned);
}
if (isa<SCEVCouldNotCompute>(ConstantMaxBECount) &&
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
index a8cf002182e240..735421a4f65ee8 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
@@ -4,8 +4,7 @@
; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
-; TODO: We know the IV will never overflow here so we can skip the overflow
-; check
+; We know the IV will never overflow here so we can skip the overflow check
define void @trip_count_max_1024(ptr %p, i64 %tc) vscale_range(2, 1024) {
; CHECK-LABEL: define void @trip_count_max_1024(
@@ -15,11 +14,7 @@ define void @trip_count_max_1024(ptr %p, i64 %tc) vscale_range(2, 1024) {
; CHECK-NEXT: br i1 [[GUARD]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
; CHECK: [[LOOP_PREHEADER]]:
; CHECK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[TC]], i64 1)
-; CHECK-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; CHECK-NEXT: br i1 [[TMP3]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 2
LGTM! The patch seems reasonable to me - both howFarToZero and howManyLessThans already seem to apply loop guards in other places.
The pre-merge CI has picked up a lot more test changes that I completely forgot to run locally, my bad. I'll check the diffs on my end and then update this PR soon.
Hmm, that's also my bad for being too eager to accept before the pre-commit results. :) No worries - I'll take a look again once you've updated the patch!
Please make sure there are SCEV-only tests for the change |
This adds significant compile time overhead: https://llvm-compile-time-tracker.com/compare.php?from=e52238b59f250aef5dc0925866d0308305a19dbf&to=49f1bf83d7a988020bea4492e76620ae685d73aa&stat=instructions:u
Possibly using pre-computed guards in howManyLessThans would help, though I'm not sure (the existing guard uses are conditional).
Or going one level higher, we could pre-compute the guards for all exit count calculation for a given loop, which would be beneficial for multi-exit loops. Possibly this needs to be lazy though for the cases where we don't need the guards at all.
I see that
and perhaps this function is used less in the code base so the additional compile time cost is minimal? If we did something similar in |
I went through and updated the failed tests; this actually ends up reducing the max trip count in a lot more places than expected.
I gave this a try but it doesn't seem to work; it looks like we need to apply the guards before we call I've also noticed that a good few of the other calls to I'll continue poking around to see if there's anywhere else we can make this more efficient. My gut feeling is that hoisting the loop guard collection to
I'm going to close this for now since I can't really think of another way of avoiding the compile time hit, and this was kind of just a follow-up that fell out of #115705. I benchmarked this on SPEC CPU 2017 on RISC-V and whilst the generated code was different (I think I saw one loop vectorized where it wasn't previously?), it didn't have any measurable performance impact, so I'm happy to go without it for now. If we end up moving to EVL tail folding eventually this might be worth picking up again to remove the overflow check, but on RISC-V vscale will always be a power-of-2 so there might be other ways of avoiding it.
This is a follow-on from #115705. In `computeMaxBECountForLT`, we can apply the loop guards to the RHS before computing its maximum range to get a more accurate constant maximum trip count, similarly to what is done in `howFarToZero`. One of the effects of this is that it allows `isIndvarOverflowCheckKnownFalse` to skip the overflow check in more cases.