
[SCEV] Apply loop guards to End in computeMaxBECountForLT #116187


Closed

Conversation

Contributor

@lukel97 lukel97 commented Nov 14, 2024

This is a follow on from #115705. In computeMaxBECountForLT, we can apply the loop guards to the RHS before computing its maximum range to get a more accurate constant maximum trip count, similarly to what is done in howFarToZero.

One of the effects of this is that it allows isIndvarOverflowCheckKnownFalse to skip the overflow check in more cases.

@lukel97 lukel97 requested a review from nikic as a code owner November 14, 2024 09:21
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Nov 14, 2024
@llvmbot
Member

llvmbot commented Nov 14, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes

This is a follow on from #115705. Applying the loop guard allows us to calculate the maximum trip count in more places, which in turn allows isIndvarOverflowCheckKnownFalse to skip the overflow check.


Full diff: https://github.com/llvm/llvm-project/pull/116187.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ScalarEvolution.h (+2-2)
  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+10-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+2-7)
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 4b8cb3a39a86db..8b7745e6ab1034 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -2218,8 +2218,8 @@ class ScalarEvolution {
   ///   actually doesn't, or we'd have to immediately execute UB)
   /// We *don't* assert these preconditions so please be careful.
   const SCEV *computeMaxBECountForLT(const SCEV *Start, const SCEV *Stride,
-                                     const SCEV *End, unsigned BitWidth,
-                                     bool IsSigned);
+                                     const SCEV *End, const Loop *L,
+                                     unsigned BitWidth, bool IsSigned);
 
   /// Verify if an linear IV with positive stride can overflow when in a
   /// less-than comparison, knowing the invariant term of the comparison,
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b10811133770e1..bb7306f5ba3778 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -12857,11 +12857,10 @@ const SCEV *ScalarEvolution::getUDivCeilSCEV(const SCEV *N, const SCEV *D) {
   return getAddExpr(MinNOne, getUDivExpr(NMinusOne, D));
 }
 
-const SCEV *ScalarEvolution::computeMaxBECountForLT(const SCEV *Start,
-                                                    const SCEV *Stride,
-                                                    const SCEV *End,
-                                                    unsigned BitWidth,
-                                                    bool IsSigned) {
+const SCEV *
+ScalarEvolution::computeMaxBECountForLT(const SCEV *Start, const SCEV *Stride,
+                                        const SCEV *End, const Loop *L,
+                                        unsigned BitWidth, bool IsSigned) {
   // The logic in this function assumes we can represent a positive stride.
   // If we can't, the backedge-taken count must be zero.
   if (IsSigned && BitWidth == 1)
@@ -12895,8 +12894,10 @@ const SCEV *ScalarEvolution::computeMaxBECountForLT(const SCEV *Start,
   // the case End = RHS of the loop termination condition. This is safe because
   // in the other case (End - Start) is zero, leading to a zero maximum backedge
   // taken count.
-  APInt MaxEnd = IsSigned ? APIntOps::smin(getSignedRangeMax(End), Limit)
-                          : APIntOps::umin(getUnsignedRangeMax(End), Limit);
+  const SCEV *GuardedEnd = applyLoopGuards(End, L);
+  APInt MaxEnd = IsSigned
+                     ? APIntOps::smin(getSignedRangeMax(GuardedEnd), Limit)
+                     : APIntOps::umin(getUnsignedRangeMax(GuardedEnd), Limit);
 
   // MaxBECount = ceil((max(MaxEnd, MinStart) - MinStart) / Stride)
   MaxEnd = IsSigned ? APIntOps::smax(MaxEnd, MinStart)
@@ -13150,7 +13151,7 @@ ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
       // loop (RHS), and the fact that IV does not overflow (which is
       // checked above).
       const SCEV *MaxBECount = computeMaxBECountForLT(
-          Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);
+          Start, Stride, RHS, L, getTypeSizeInBits(LHS->getType()), IsSigned);
       return ExitLimit(getCouldNotCompute() /* ExactNotTaken */, MaxBECount,
                        MaxBECount, false /*MaxOrZero*/, Predicates);
     }
@@ -13334,7 +13335,7 @@ ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
     MaxOrZero = true;
   } else {
     ConstantMaxBECount = computeMaxBECountForLT(
-        Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);
+        Start, Stride, RHS, L, getTypeSizeInBits(LHS->getType()), IsSigned);
   }
 
   if (isa<SCEVCouldNotCompute>(ConstantMaxBECount) &&
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
index a8cf002182e240..735421a4f65ee8 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
@@ -4,8 +4,7 @@
 ; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
 ; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
 
-; TODO: We know the IV will never overflow here so we can skip the overflow
-; check
+; We know the IV will never overflow here so we can skip the overflow check
 
 define void @trip_count_max_1024(ptr %p, i64 %tc) vscale_range(2, 1024) {
 ; CHECK-LABEL: define void @trip_count_max_1024(
@@ -15,11 +14,7 @@ define void @trip_count_max_1024(ptr %p, i64 %tc) vscale_range(2, 1024) {
 ; CHECK-NEXT:    br i1 [[GUARD]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
 ; CHECK:       [[LOOP_PREHEADER]]:
 ; CHECK-NEXT:    [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[TC]], i64 1)
-; CHECK-NEXT:    [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; CHECK-NEXT:    [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; CHECK-NEXT:    br i1 [[TMP3]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 2

@lukel97 lukel97 changed the title [SCEV] Apply loop guards to End computeMaxBECountForLT [SCEV] Apply loop guards to End in computeMaxBECountForLT Nov 14, 2024
Contributor

@david-arm david-arm left a comment


LGTM! The patch seems reasonable to me - both howFarToZero and howManyLessThans already seem to apply loop guards in other places.

@lukel97
Contributor Author

lukel97 commented Nov 14, 2024

The pre-merge CI has picked up a lot more test changes that I completely forgot to run locally, my bad. I'll check the diffs on my end and then update this PR soon.

@david-arm
Contributor

Hmm, that's also my bad for being too eager to accept before the pre-commit results. :) No worries - I'll take a look again once you've updated the patch!

@lukel97 lukel97 marked this pull request as draft November 14, 2024 10:03
@fhahn
Contributor

fhahn commented Nov 14, 2024

Please make sure there are SCEV-only tests for the change

Contributor

@nikic nikic left a comment


This adds significant compile time overhead: https://llvm-compile-time-tracker.com/compare.php?from=e52238b59f250aef5dc0925866d0308305a19dbf&to=49f1bf83d7a988020bea4492e76620ae685d73aa&stat=instructions:u

Possibly using pre-computed guards in howManyLessThans would help, though I'm not sure (the existing guard uses are conditional).

@nikic
Contributor

nikic commented Nov 14, 2024

Possibly using pre-computed guards in howManyLessThans would help, though I'm not sure (the existing guard uses are conditional).

Or going one level higher, we could pre-compute the guards for all exit count calculation for a given loop, which would be beneficial for multi-exit loops. Possibly this needs to be lazy though for the cases where we don't need the guards at all.

@david-arm
Contributor

Possibly using pre-computed guards in howManyLessThans would help, though I'm not sure (the existing guard uses are conditional).

Or going one level higher, we could pre-compute the guards for all exit count calculation for a given loop, which would be beneficial for multi-exit loops. Possibly this needs to be lazy though for the cases where we don't need the guards at all.

I see that getSmallConstantTripMultiple applies the loop guards after calculating the exit count, i.e.

unsigned ScalarEvolution::getSmallConstantTripMultiple(const Loop *L,
                                                       const SCEV *ExitCount) {
  if (ExitCount == getCouldNotCompute())
    return 1;

  // Get the trip count
  const SCEV *TCExpr = getTripCountFromExitCount(applyLoopGuards(ExitCount, L));

...
}

and perhaps this function is used less in the code base so the additional compile time cost is minimal? If we did something similar in getSmallConstantTripCount perhaps that might keep the compile time down? Not sure if applying the loop guards this late still gives us the benefits though.

@lukel97 lukel97 marked this pull request as ready for review November 19, 2024 11:43
@lukel97
Contributor Author

lukel97 commented Nov 19, 2024

I went through and updated the failed tests, this actually ends up reducing the max trip count in a lot more places than expected.

If we did something similar in getSmallConstantTripCount perhaps that might keep the compile time down? Not sure if applying the loop guards this late still gives us the benefits though.

I gave this a try but it doesn't seem to work: it looks like we need to apply the guards before we call get{Signed,Unsigned}RangeMax in order for ConstantMaxBECount to actually be reduced.

I've also noticed that a good few of the other calls to get{Signed,Unsigned}RangeMax tend to wrap the SCEV with applyLoopGuards, e.g. in howFarToZero, so this seems to be something of a pattern.

I'll continue poking around to see if there's anywhere else we can make this more efficient. My gut feeling is that hoisting the loop guard collection to howManyLessThans won't help, given that the only other uses of applyLoopGuards there are mutually exclusive with computeMaxBECountForLT, so I don't think it would end up deduplicating any work.

@lukel97
Contributor Author

lukel97 commented Nov 21, 2024

I'm going to close this for now since I can't really think of another way of avoiding the compile-time hit, and this was kind of just a follow-up that fell out of #115705.

I benchmarked this on SPEC CPU 2017 on RISC-V and while the generated code was different (I think I saw one loop vectorized where it wasn't previously), it didn't have any measurable performance impact, so I'm happy to go without it for now.

If we end up moving to EVL tail folding eventually, this might be worth picking up again to remove the overflow check; but on RISC-V vscale will always be a power of 2, so there might be other ways of avoiding it.

@lukel97 lukel97 closed this Nov 21, 2024