[LSR][term-fold] Adjust expansion budget based on trip count #80304

preames · 2024-02-01T16:06:10Z

Follow up to #74747

This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations.

Follow up to llvm@4d57ba1. This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations.

llvmbot · 2024-02-01T16:06:41Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

Follow up to #74747

This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations.

Full diff: https://github.com/llvm/llvm-project/pull/80304.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (+12-2)
(modified) llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll (+12-20)

diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 831911ab21c22..356cc5b11ae48 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -6813,6 +6813,17 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
   if (!isAlmostDeadIV(ToFold, LoopLatch, TermCond))
     return std::nullopt;
 
+  // Inserting instructions in the preheader has a runtime cost, scale
+  // the allowed cost with the loops trip count as best we can.
+  const unsigned ExpansionBudget = [&]() {
+    if (unsigned SmallTC = SE.getSmallConstantMaxTripCount(L))
+      return std::min(2*SCEVCheapExpansionBudget, SmallTC);
+    if (std::optional<unsigned> SmallTC = getLoopEstimatedTripCount(L))
+      return std::min(2*SCEVCheapExpansionBudget, *SmallTC);
+    // Unknown trip count, assume long running by default.
+    return 2*SCEVCheapExpansionBudget;
+  }();
+
   const SCEV *BECount = SE.getBackedgeTakenCount(L);
   const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();
   SCEVExpander Expander(SE, DL, "lsr_fold_term_cond");
@@ -6862,8 +6873,7 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
       continue;
     }
 
-    if (Expander.isHighCostExpansion(TermValueSLocal, L,
-                                     2*SCEVCheapExpansionBudget,
+    if (Expander.isHighCostExpansion(TermValueSLocal, L, ExpansionBudget,
                                      &TTI, InsertPt)) {
       LLVM_DEBUG(
           dbgs() << "Is too expensive to expand terminating value for phi node"
diff --git a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
index be24ecf112e30..7299a014b7983 100644
--- a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
+++ b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
@@ -482,20 +482,15 @@ define void @profiled_short_tc(ptr %a, i32 %offset, i32 %n) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
-; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[N:%.*]], -1
-; CHECK-NEXT:    [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
-; CHECK-NEXT:    [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
-; CHECK-NEXT:    [[TMP3:%.*]] = sext i32 [[OFFSET_NONZERO]] to i64
-; CHECK-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
-; CHECK-NEXT:    [[TMP5:%.*]] = add nsw i64 [[TMP4]], 84
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[LSR_IV:%.*]] = phi i32 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
 ; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[LSR_IV_NEXT]] = add nsw i32 [[LSR_IV]], 1
 ; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET_NONZERO]]
-; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
-; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]], !prof [[PROF0:![0-9]+]]
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[LSR_IV_NEXT]], [[N:%.*]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]], !prof [[PROF0:![0-9]+]]
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret void
 ;
@@ -637,16 +632,15 @@ define void @small_tc_trivial_loop(ptr %a, i32 %offset) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
-; CHECK-NEXT:    [[TMP0:%.*]] = sext i32 [[OFFSET_NONZERO]] to i64
-; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i64 [[TMP0]], 84
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[LSR_IV:%.*]] = phi i32 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
 ; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[LSR_IV_NEXT]] = add nsw i32 [[LSR_IV]], 1
 ; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET_NONZERO]]
-; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
-; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[LSR_IV_NEXT]], 1
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret void
 ;
@@ -673,17 +667,15 @@ define void @small_tc_below_threshold(ptr %a, i32 %offset) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[OFFSET_NONZERO:%.*]] = or i32 [[OFFSET:%.*]], 1
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
-; CHECK-NEXT:    [[TMP0:%.*]] = sext i32 [[OFFSET_NONZERO]] to i64
-; CHECK-NEXT:    [[TMP1:%.*]] = shl nsw i64 [[TMP0]], 1
-; CHECK-NEXT:    [[TMP2:%.*]] = add nsw i64 [[TMP1]], 84
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[LSR_IV:%.*]] = phi i32 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
 ; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[LSR_IV_NEXT]] = add nsw i32 [[LSR_IV]], 1
 ; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET_NONZERO]]
-; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
-; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[LSR_IV_NEXT]], 2
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret void
 ;

github-actions · 2024-02-01T16:08:34Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 390d66b03b5a5e352b7d696aaed679a2deb25c6c 0d04b0b1ed5d6dfe44b02a8d94329e93861476ab -- llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 6f6d1f9c76..c0be6d2ef6 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -6874,8 +6874,8 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
       continue;
     }
 
-    if (Expander.isHighCostExpansion(TermValueSLocal, L, ExpansionBudget,
-                                     &TTI, InsertPt)) {
+    if (Expander.isHighCostExpansion(TermValueSLocal, L, ExpansionBudget, &TTI,
+                                     InsertPt)) {
       LLVM_DEBUG(
           dbgs() << "Is too expensive to expand terminating value for phi node"
                  << PN << "\n");

nikic

Sounds reasonable, LGTM.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

Co-authored-by: Nikita Popov <[email protected]>

) Follow up to llvm#74747 This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations. --------- Co-authored-by: Nikita Popov <[email protected]>

preames requested review from nikic and topperc February 1, 2024 16:06

preames added the backend:RISC-V label Feb 1, 2024

llvmbot added the llvm:transforms label Feb 1, 2024

nikic approved these changes Feb 2, 2024

View reviewed changes

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp Outdated Show resolved Hide resolved

Update llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

0d04b0b

Co-authored-by: Nikita Popov <[email protected]>

preames merged commit 28865da into llvm:main Feb 2, 2024

preames deleted the pr-lsr-term-fold-budget-by-trip-count branch February 2, 2024 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LSR][term-fold] Adjust expansion budget based on trip count #80304

[LSR][term-fold] Adjust expansion budget based on trip count #80304

Uh oh!

preames commented Feb 1, 2024

Uh oh!

llvmbot commented Feb 1, 2024 •

edited

Loading

Uh oh!

github-actions bot commented Feb 1, 2024 •

edited

Loading

Uh oh!

nikic left a comment

Uh oh!

Uh oh!

Uh oh!

[LSR][term-fold] Adjust expansion budget based on trip count #80304

[LSR][term-fold] Adjust expansion budget based on trip count #80304

Uh oh!

Conversation

preames commented Feb 1, 2024

Uh oh!

llvmbot commented Feb 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Feb 1, 2024 •

edited

Loading

github-actions bot commented Feb 1, 2024 •

edited

Loading