[LoopPeeling] Fix weights updating of peeled off branches #70094

aleks-tmb · 2023-10-24T19:06:14Z

In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies.

It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where:

F is a weight of the edge from latch to header.
E is a weight of the edge from latch to exit.
I is a number of peeling iteration.

E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)

https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate.
Underestimated trip count can lead to a performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable.

Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)

llvmbot · 2023-10-24T19:07:23Z

@llvm/pr-subscribers-llvm-transforms

Author: Aleksandr Popov (aleks-tmb)

Changes

In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies.

It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where:

F is a weight of the edge from latch to header.
E is a weight of the edge from latch to exit.
I is a number of peeling iteration.

E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)

https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate.
Underestimated trip count can lead to a performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable.

Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this: 0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)

Full diff: https://github.com/llvm/llvm-project/pull/70094.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+4-3)
(modified) llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll (+2-3)
(modified) llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll (+2-3)

diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 31f065b691f864c..0681812b2c64e62 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -636,9 +636,10 @@ static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
                     MDB.createBranchWeights(Info.Weights));
   for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
     if (SubWeight != 0)
-      Info.Weights[Idx] = Info.Weights[Idx] > SubWeight
-                              ? Info.Weights[Idx] - SubWeight
-                              : 1;
+      Info.Weights[Idx] =
+          Info.Weights[Idx] > SubWeight
+              ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
+              : SubWeight;
 }
 
 /// Initialize the weights for all exiting blocks.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
index a0dc216133fb7d5..d91cb5bab382782 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
@@ -21,7 +21,7 @@
 ; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !18
 ; CHECK: [[NEXT2]]:
 ; CHECK: br i1 %c, label %{{.*}}, label %side_exit.loopexit, !prof !15
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !19
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
 
 define i32 @basic(ptr %p, i32 %k, i1 %c) #0 !prof !15 {
 entry:
@@ -85,6 +85,5 @@ attributes #1 = { nounwind optsize }
 ; This is a weights of latch and its copies.
 ;CHECK: !16 = !{!"branch_weights", i32 3001, i32 1001}
 ;CHECK: !17 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 999, i32 1001}
-;CHECK: !19 = !{!"branch_weights", i32 1, i32 1001}
+;CHECK: !18 = !{!"branch_weights", i32 1001, i32 1001}
 
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
index cadb6739dbc3fbb..15dce234baee91d 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
@@ -24,7 +24,7 @@
 ; CHECK: [[NEXT1]]:
 ; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !17
 ; CHECK: [[NEXT2]]:
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !17
 
 define void @basic(ptr %p, i32 %k) #0 !prof !15 {
 entry:
@@ -105,6 +105,5 @@ attributes #1 = { nounwind optsize }
 
 ;CHECK: !15 = !{!"branch_weights", i32 3001, i32 1001}
 ;CHECK: !16 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 999, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 1, i32 1001}
+;CHECK: !17 = !{!"branch_weights", i32 1001, i32 1001}

github-actions · 2023-10-24T19:31:56Z

✅ With the latest revision this PR passed the C/C++ code formatter.

aleks-tmb · 2023-10-30T17:42:10Z

@MatzeB Hi, could you please take a look. Described issue manifested after your patch:

SHA: b30c9c937802a78ef986cb4219eba51148f76e6c
Author: Matthias Braun
Date: 2023-09-11 14:23:29 -0700 -0700
LoopUnrollRuntime: Add weights to all branches

Actually your patch added branch weights scaling. So imagine a loop with scaled latch weights ratio from 1:2 to 127:254 one. If we decide to peel 3 iterations of such loop, initial 1:2 ratio will be updated as:

0th peeled latch: 1:2
1st peeled latch: 1:1 
2nd peeled latch: 1:1

And with initial 127:254:

0th peeled latch: 127:254
1st peeled latch: 127:127
2nd peeled latch: 127:1

So the hotness of the mail loop header changed from neutral 1:1 to 127:1, making it almost unreachable.

In my patch I tried to make it:

0th peeled latch: 127:254
1st peeled latch: 127:127
2nd peeled latch: 127:127

MatzeB

Seems like a sensible backstop for slightly incorrect profile data leading to extreme branch weights. Added some nitpicks, either way LGTM

llvm/lib/Transforms/Utils/LoopPeel.cpp

In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies. It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where: - F is a weight of the edge from latch to header. - E is a weight of the edge from latch to exit. - I is a number of peeling iteration. E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be: 0: (100,300) 1: (100,200) 2: (100,100) 3: (100,1) https://godbolt.org/z/93KnoEsT6 So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate. Underestimated trip count can lead to a performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable. Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this: 0: (100,300) 1: (100,200) 2: (100,100) 3: (100,100)

llvmbot added the llvm:transforms label Oct 24, 2023

aleks-tmb force-pushed the loop-peeling-branch-weights branch from 06b6a94 to 46a9bdf Compare October 24, 2023 19:21

aleks-tmb requested review from aeubanks, nikic, DaniilSuchkov, annamthomas and arpilipe October 24, 2023 19:22

aleks-tmb force-pushed the loop-peeling-branch-weights branch from 46a9bdf to 4265cc4 Compare October 25, 2023 10:36

nikic requested a review from MatzeB October 26, 2023 09:29

MatzeB approved these changes Oct 30, 2023

View reviewed changes

llvm/lib/Transforms/Utils/LoopPeel.cpp Outdated Show resolved Hide resolved

llvm/lib/Transforms/Utils/LoopPeel.cpp Show resolved Hide resolved

aleks-tmb force-pushed the loop-peeling-branch-weights branch from c7c5a01 to 7d7299d Compare October 31, 2023 11:26

aleks-tmb merged commit e8d5db2 into llvm:main Oct 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LoopPeeling] Fix weights updating of peeled off branches #70094

[LoopPeeling] Fix weights updating of peeled off branches #70094

Uh oh!

aleks-tmb commented Oct 24, 2023 •

edited

Loading

Uh oh!

llvmbot commented Oct 24, 2023

Uh oh!

github-actions bot commented Oct 24, 2023 •

edited

Loading

Uh oh!

aleks-tmb commented Oct 30, 2023 •

edited

Loading

Uh oh!

MatzeB left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[LoopPeeling] Fix weights updating of peeled off branches #70094

[LoopPeeling] Fix weights updating of peeled off branches #70094

Uh oh!

Conversation

aleks-tmb commented Oct 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 24, 2023

Uh oh!

github-actions bot commented Oct 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aleks-tmb commented Oct 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MatzeB left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aleks-tmb commented Oct 24, 2023 •

edited

Loading

github-actions bot commented Oct 24, 2023 •

edited

Loading

aleks-tmb commented Oct 30, 2023 •

edited

Loading