Skip to content

[LoopPeeling] Fix weights updating of peeled off branches #70094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 31, 2023

Conversation

aleks-tmb
Copy link
Contributor

@aleks-tmb aleks-tmb commented Oct 24, 2023

In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies.

It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where:

  • F is a weight of the edge from latch to header.
  • E is a weight of the edge from latch to exit.
  • I is a number of peeling iteration.

E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)

https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate.
Underestimated trip count can lead to a performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable.

Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)

@llvmbot
Copy link
Member

llvmbot commented Oct 24, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Aleksandr Popov (aleks-tmb)

Changes

In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies.

It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where:

  • F is a weight of the edge from latch to header.
  • E is a weight of the edge from latch to exit.
  • I is a number of peeling iteration.

E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)

https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate.
Underestimated trip count can lead to a performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable.

Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this: 0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)


Full diff: https://github.com/llvm/llvm-project/pull/70094.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+4-3)
  • (modified) llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll (+2-3)
  • (modified) llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll (+2-3)
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 31f065b691f864c..0681812b2c64e62 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -636,9 +636,10 @@ static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
                     MDB.createBranchWeights(Info.Weights));
   for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
     if (SubWeight != 0)
-      Info.Weights[Idx] = Info.Weights[Idx] > SubWeight
-                              ? Info.Weights[Idx] - SubWeight
-                              : 1;
+      Info.Weights[Idx] =
+          Info.Weights[Idx] > SubWeight
+              ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
+              : SubWeight;
 }
 
 /// Initialize the weights for all exiting blocks.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
index a0dc216133fb7d5..d91cb5bab382782 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
@@ -21,7 +21,7 @@
 ; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !18
 ; CHECK: [[NEXT2]]:
 ; CHECK: br i1 %c, label %{{.*}}, label %side_exit.loopexit, !prof !15
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !19
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
 
 define i32 @basic(ptr %p, i32 %k, i1 %c) #0 !prof !15 {
 entry:
@@ -85,6 +85,5 @@ attributes #1 = { nounwind optsize }
 ; This is a weights of latch and its copies.
 ;CHECK: !16 = !{!"branch_weights", i32 3001, i32 1001}
 ;CHECK: !17 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 999, i32 1001}
-;CHECK: !19 = !{!"branch_weights", i32 1, i32 1001}
+;CHECK: !18 = !{!"branch_weights", i32 1001, i32 1001}
 
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
index cadb6739dbc3fbb..15dce234baee91d 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
@@ -24,7 +24,7 @@
 ; CHECK: [[NEXT1]]:
 ; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !17
 ; CHECK: [[NEXT2]]:
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !17
 
 define void @basic(ptr %p, i32 %k) #0 !prof !15 {
 entry:
@@ -105,6 +105,5 @@ attributes #1 = { nounwind optsize }
 
 ;CHECK: !15 = !{!"branch_weights", i32 3001, i32 1001}
 ;CHECK: !16 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 999, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 1, i32 1001}
+;CHECK: !17 = !{!"branch_weights", i32 1001, i32 1001}
 

@github-actions
Copy link

github-actions bot commented Oct 24, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@aleks-tmb aleks-tmb force-pushed the loop-peeling-branch-weights branch from 46a9bdf to 4265cc4 Compare October 25, 2023 10:36
@nikic nikic requested a review from MatzeB October 26, 2023 09:29
@aleks-tmb
Copy link
Contributor Author

aleks-tmb commented Oct 30, 2023

@MatzeB Hi, could you please take a look. Described issue manifested after your patch:

SHA: b30c9c937802a78ef986cb4219eba51148f76e6c
Author: Matthias Braun
Date: 2023-09-11 14:23:29 -0700 -0700
LoopUnrollRuntime: Add weights to all branches

Actually your patch added branch weights scaling. So imagine a loop with scaled latch weights ratio from 1:2 to 127:254 one. If we decide to peel 3 iterations of such loop, initial 1:2 ratio will be updated as:

0th peeled latch: 1:2
1st peeled latch: 1:1 
2nd peeled latch: 1:1 

And with initial 127:254:

0th peeled latch: 127:254
1st peeled latch: 127:127
2nd peeled latch: 127:1

So the hotness of the mail loop header changed from neutral 1:1 to 127:1, making it almost unreachable.

In my patch I tried to make it:

0th peeled latch: 127:254
1st peeled latch: 127:127
2nd peeled latch: 127:127

Copy link
Contributor

@MatzeB MatzeB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a sensible backstop for slightly incorrect profile data leading to extreme branch weights. Added some nitpicks, either way LGTM

In https://reviews.llvm.org/D64235 a new algorithm has been introduced
for updating the branch weights of latch blocks and their copies.

It increases the probability of going to the exit block for each next
peel iteration, calculating weights by (F - I * E, E), where:
- F is a weight of the edge from latch to header.
- E is a weight of the edge from latch to exit.
- I is a number of peeling iteration.

E.g: Let's say the latch branch weights are (100,300) and the estimated
trip count is 4. If we peel off all 4 iterations the weights of the
copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)

https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy
according to the profile data. But that's only true if the profiling
data is accurate.
Underestimated trip count can lead to a performance issues with the
register allocator, which may decide to spill intervals inside the loop
assuming it's unreachable.

Since we don't know how accurate the profiling data is, it seems better
to set neutral 1/1 weights on the last peeled latch branch. After this
change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)
@aleks-tmb aleks-tmb force-pushed the loop-peeling-branch-weights branch from c7c5a01 to 7d7299d Compare October 31, 2023 11:26
@aleks-tmb aleks-tmb merged commit e8d5db2 into llvm:main Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants