[LoopPeeling] Fix weights updating of peeled off branches #70094
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Aleksandr Popov (aleks-tmb)

Changes

In https://reviews.llvm.org/D64235 a new algorithm was introduced for updating the branch weights of latch blocks and their copies. It increases the probability of going to the exit block for each subsequent peel iteration, calculating the weights as (F - I * E, E), where:
- F is the weight of the edge from the latch to the header.
- E is the weight of the edge from the latch to the exit.
- I is the number of the peeling iteration.

E.g.: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations, the weights of the copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)
https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate. Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)

Full diff: https://github.com/llvm/llvm-project/pull/70094.diff

3 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index 31f065b691f864c..0681812b2c64e62 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -636,9 +636,10 @@ static void updateBranchWeights(Instruction *Term, WeightInfo &Info) {
MDB.createBranchWeights(Info.Weights));
for (auto [Idx, SubWeight] : enumerate(Info.SubWeights))
if (SubWeight != 0)
- Info.Weights[Idx] = Info.Weights[Idx] > SubWeight
- ? Info.Weights[Idx] - SubWeight
- : 1;
+ Info.Weights[Idx] =
+ Info.Weights[Idx] > SubWeight
+ ? std::max(Info.Weights[Idx] - SubWeight, SubWeight)
+ : SubWeight;
}
/// Initialize the weights for all exiting blocks.
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
index a0dc216133fb7d5..d91cb5bab382782 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
@@ -21,7 +21,7 @@
; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !18
; CHECK: [[NEXT2]]:
; CHECK: br i1 %c, label %{{.*}}, label %side_exit.loopexit, !prof !15
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !19
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
define i32 @basic(ptr %p, i32 %k, i1 %c) #0 !prof !15 {
entry:
@@ -85,6 +85,5 @@ attributes #1 = { nounwind optsize }
; This is a weights of latch and its copies.
;CHECK: !16 = !{!"branch_weights", i32 3001, i32 1001}
;CHECK: !17 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 999, i32 1001}
-;CHECK: !19 = !{!"branch_weights", i32 1, i32 1001}
+;CHECK: !18 = !{!"branch_weights", i32 1001, i32 1001}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
index cadb6739dbc3fbb..15dce234baee91d 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo.ll
@@ -24,7 +24,7 @@
; CHECK: [[NEXT1]]:
; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !17
; CHECK: [[NEXT2]]:
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !17
define void @basic(ptr %p, i32 %k) #0 !prof !15 {
entry:
@@ -105,6 +105,5 @@ attributes #1 = { nounwind optsize }
;CHECK: !15 = !{!"branch_weights", i32 3001, i32 1001}
;CHECK: !16 = !{!"branch_weights", i32 2000, i32 1001}
-;CHECK: !17 = !{!"branch_weights", i32 999, i32 1001}
-;CHECK: !18 = !{!"branch_weights", i32 1, i32 1001}
+;CHECK: !17 = !{!"branch_weights", i32 1001, i32 1001}
The branch was force-pushed from 06b6a94 to 46a9bdf.
✅ With the latest revision this PR passed the C/C++ code formatter.
The branch was force-pushed from 46a9bdf to 4265cc4.
@MatzeB Hi, could you please take a look? The described issue manifested after your patch: it added branch weight scaling. So imagine a loop whose latch weight ratio is scaled from 1:2 to 127:254. If we decide to peel 3 iterations of such a loop, the initial 1:2 ratio and the scaled 127:254 ratio end up being updated very differently (see the sketch below): the hotness of the main loop header changes from a neutral 1:1 to 127:1, making it almost unreachable. In my patch I try to keep the last peeled latch weights neutral instead.
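A minimal standalone sketch of the arithmetic being discussed (not the LLVM code itself; `updateOld`, `updateNew`, and `peel` are made-up names). It assumes the pairs are written as (back-edge weight, exit weight) and that the latch update runs once per peeled iteration; under those assumptions it shows how the same 1:2 probability leaves very different remaining-loop weights at the two scales with the old clamp-to-1 rule, while the rule from this patch stays neutral:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Rule before this patch (from D64235): subtract the exit weight, clamp to 1.
static uint64_t updateOld(uint64_t Back, uint64_t Exit) {
  return Back > Exit ? Back - Exit : 1;
}

// Rule from this patch: never let the back-edge weight drop below the exit weight.
static uint64_t updateNew(uint64_t Back, uint64_t Exit) {
  return Back > Exit ? std::max(Back - Exit, Exit) : Exit;
}

// Apply the update once per peeled iteration and print the weights that would
// remain on the original loop latch.
static void peel(uint64_t Back, uint64_t Exit, unsigned PeelCount,
                 uint64_t (*Update)(uint64_t, uint64_t)) {
  for (unsigned I = 0; I < PeelCount; ++I)
    Back = Update(Back, Exit);
  std::printf("remaining loop latch: %llu:%llu\n", (unsigned long long)Back,
              (unsigned long long)Exit);
}

int main() {
  peel(2, 1, 3, updateOld);     // old rule, unscaled: 1:1
  peel(254, 127, 3, updateOld); // old rule, scaled:   1:127 (loop looks dead)
  peel(2, 1, 3, updateNew);     // new rule, unscaled: 1:1
  peel(254, 127, 3, updateNew); // new rule, scaled:   127:127, i.e. still 1:1
}
```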
Seems like a sensible backstop for slightly incorrect profile data leading to extreme branch weights. Added some nitpicks; either way, LGTM.
The branch was force-pushed from c7c5a01 to 7d7299d.
In https://reviews.llvm.org/D64235 a new algorithm was introduced for updating the branch weights of latch blocks and their copies.
It increases the probability of going to the exit block for each subsequent peel iteration, calculating the weights as (F - I * E, E), where:
- F is the weight of the edge from the latch to the header.
- E is the weight of the edge from the latch to the exit.
- I is the number of the peeling iteration.
E.g.: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations, the weights of the copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)
https://godbolt.org/z/93KnoEsT6
So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate.
An underestimated trip count can lead to performance issues with the register allocator, which may decide to spill intervals inside the loop, assuming it's unreachable.
Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)
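For reference, a minimal standalone sketch (not the LLVM implementation; variable names are made up) that reproduces the sequences above by applying the clamping expression from the LoopPeel.cpp hunk once per peeled iteration, with the pair written as (exit weight, back-edge weight) as in the example:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t Exit = 100;    // E: weight of the latch->exit edge
  uint64_t Back = 300;          // F: weight of the latch->header edge
  const unsigned TripCount = 4; // estimated trip count; all iterations peeled

  for (unsigned Iter = 0; Iter < TripCount; ++Iter) {
    // Weights attached to the Iter-th peeled copy of the latch branch.
    std::printf("%u: (%llu,%llu)\n", Iter, (unsigned long long)Exit,
                (unsigned long long)Back);
    // Updated rule from this patch; the pre-patch code used
    //   Back = Back > Exit ? Back - Exit : 1;
    // which yields 1 on the last peeled copy instead of 100.
    Back = Back > Exit ? std::max(Back - Exit, Exit) : Exit;
  }
  // Prints: 0: (100,300)  1: (100,200)  2: (100,100)  3: (100,100)
}
```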