-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LoopUnswitch] Allow i1 truncs in loop unswitch #89738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
With the addition of llvm#84628, truncs to i1 are being emitted as conditions to branch instructions. This caused significant regressions in cases which were previously improved by loop unswitch. Adding truncs to i1 restore the previous performance seen.
@llvm/pr-subscribers-llvm-transforms Author: Matthew Devereau (MDevereau) ChangesWith the addition of #84628, truncs to i1 are being emitted as conditions to branch instructions. This caused significant regressions in cases which were previously improved by loop unswitch. Adding truncs to i1 restore the previous performance seen. Full diff: https://github.com/llvm/llvm-project/pull/89738.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 73c5d636782294..e10c5dcbd218aa 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -1930,7 +1930,15 @@ llvm::hasPartialIVCondition(const Loop &L, unsigned MSSAThreshold,
if (!TI || !TI->isConditional())
return {};
- auto *CondI = dyn_cast<CmpInst>(TI->getCondition());
+ Instruction *CondI = nullptr;
+ CondI = dyn_cast<CmpInst>(TI->getCondition());
+
+ if (!CondI) {
+ CondI = dyn_cast<TruncInst>(TI->getCondition());
+ if (CondI && CondI->getType() != Type::getInt1Ty(TI->getContext())) {
+ return {};
+ }
+ }
// The case with the condition outside the loop should already be handled
// earlier.
if (!CondI || !L.contains(CondI))
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll b/llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
index 0d3aa8b243109e..a5ad182ad0b3e0 100644
--- a/llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
@@ -106,3 +106,96 @@ for.inc: ; preds = %for.cond5
store i8 0, ptr @b, align 1
br label %for.cond5
}
+
+define void @e() {
+; CHECK-LABEL: @e(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_COND:%.*]]
+; CHECK: for.cond:
+; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_COND]]
+; CHECK: for.end:
+; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr null, align 2
+; CHECK-NEXT: [[TMP1:%.*]] = trunc i16 [[TMP0]] to i1
+; CHECK-NEXT: br i1 [[TMP1]], label [[FOR_END_SPLIT:%.*]], label [[FOR_END_SPLIT_US:%.*]]
+; CHECK: for.end.split.us:
+; CHECK-NEXT: br label [[G_US:%.*]]
+; CHECK: g.us:
+; CHECK-NEXT: br label [[G_SPLIT_US6:%.*]]
+; CHECK: for.cond1.us1:
+; CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr null, align 2
+; CHECK-NEXT: [[TOBOOL4_NOT_US:%.*]] = trunc i16 [[TMP2]] to i1
+; CHECK-NEXT: br i1 [[TOBOOL4_NOT_US]], label [[FOR_COND5_PREHEADER_US4:%.*]], label [[G_LOOPEXIT_US:%.*]]
+; CHECK: for.cond5.us2:
+; CHECK-NEXT: br i1 false, label [[FOR_COND1_LOOPEXIT_US5:%.*]], label [[FOR_INC_US3:%.*]]
+; CHECK: for.inc.us3:
+; CHECK-NEXT: store i8 0, ptr @b, align 1
+; CHECK-NEXT: br label [[FOR_COND5_US2:%.*]]
+; CHECK: for.cond5.preheader.us4:
+; CHECK-NEXT: br label [[FOR_COND5_US2]]
+; CHECK: for.cond1.loopexit.us5:
+; CHECK-NEXT: br label [[FOR_COND1_US1:%.*]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK: g.loopexit.us:
+; CHECK-NEXT: br label [[G_US]]
+; CHECK: g.split.us6:
+; CHECK-NEXT: br label [[FOR_COND1_US1]]
+; CHECK: for.end.split:
+; CHECK-NEXT: br label [[G:%.*]]
+; CHECK: g.loopexit:
+; CHECK-NEXT: br label [[G]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK: g:
+; CHECK-NEXT: [[TMP3:%.*]] = load i16, ptr null, align 2
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i16 [[TMP3]] to i1
+; CHECK-NEXT: br i1 [[TMP4]], label [[G_SPLIT_US:%.*]], label [[G_SPLIT:%.*]]
+; CHECK: g.split.us:
+; CHECK-NEXT: br label [[FOR_COND1_US:%.*]]
+; CHECK: for.cond1.us:
+; CHECK-NEXT: br label [[FOR_COND5_PREHEADER_US:%.*]]
+; CHECK: for.cond5.us:
+; CHECK-NEXT: br i1 false, label [[FOR_COND1_LOOPEXIT_US:%.*]], label [[FOR_INC_US:%.*]]
+; CHECK: for.inc.us:
+; CHECK-NEXT: store i8 0, ptr @b, align 1
+; CHECK-NEXT: br label [[FOR_COND5_US:%.*]]
+; CHECK: for.cond5.preheader.us:
+; CHECK-NEXT: br label [[FOR_COND5_US]]
+; CHECK: for.cond1.loopexit.us:
+; CHECK-NEXT: br label [[FOR_COND1_US]]
+; CHECK: g.split:
+; CHECK-NEXT: br label [[FOR_COND1:%.*]]
+; CHECK: for.cond1.loopexit:
+; CHECK-NEXT: br label [[FOR_COND1]], !llvm.loop [[LOOP3]]
+; CHECK: for.cond1:
+; CHECK-NEXT: [[TMP5:%.*]] = load i16, ptr null, align 2
+; CHECK-NEXT: [[TOBOOL4_NOT:%.*]] = trunc i16 [[TMP5]] to i1
+; CHECK-NEXT: br i1 [[TOBOOL4_NOT]], label [[FOR_COND5_PREHEADER:%.*]], label [[G_LOOPEXIT:%.*]]
+; CHECK: for.cond5.preheader:
+; CHECK-NEXT: br label [[FOR_COND5:%.*]]
+; CHECK: for.cond5:
+; CHECK-NEXT: br i1 false, label [[FOR_COND1_LOOPEXIT:%.*]], label [[FOR_INC:%.*]]
+; CHECK: for.inc:
+; CHECK-NEXT: store i8 0, ptr @b, align 1
+; CHECK-NEXT: br label [[FOR_COND5]]
+;
+entry:
+ br label %for.cond
+
+for.cond: ; preds = %for.cond, %entry
+ br i1 false, label %for.end, label %for.cond
+
+for.end: ; preds = %for.cond
+ br label %g
+
+g: ; preds = %for.cond1, %for.end
+ br label %for.cond1
+
+for.cond1: ; preds = %for.cond5, %g
+ %0 = load i16, ptr null, align 2
+ %tobool4.not = trunc i16 %0 to i1
+ br i1 %tobool4.not, label %for.cond5, label %g
+
+for.cond5: ; preds = %for.inc, %for.cond1
+ br i1 false, label %for.cond1, label %for.inc
+
+for.inc: ; preds = %for.cond5
+ store i8 0, ptr @b, align 1
+ br label %for.cond5
+}
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll b/llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
index f97e5c3eec9d46..1d8942079ffd81 100644
--- a/llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
@@ -1326,6 +1326,136 @@ exit:
ret i32 10
}
+define i32 @partial_unswitch_true_successor_trunc(ptr %ptr, i32 %N) {
+; CHECK-LABEL: @partial_unswitch_true_successor_trunc(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[PTR:%.*]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 [[TMP0]] to i1
+; CHECK-NEXT: br i1 [[TMP1]], label [[ENTRY_SPLIT_US:%.*]], label [[ENTRY_SPLIT:%.*]]
+; CHECK: entry.split.us:
+; CHECK-NEXT: br label [[LOOP_HEADER_US:%.*]]
+; CHECK: loop.header.us:
+; CHECK-NEXT: [[IV_US:%.*]] = phi i32 [ 0, [[ENTRY_SPLIT_US]] ], [ [[IV_NEXT_US:%.*]], [[LOOP_LATCH_US:%.*]] ]
+; CHECK-NEXT: br label [[NOCLOBBER_US:%.*]]
+; CHECK: noclobber.us:
+; CHECK-NEXT: br label [[LOOP_LATCH_US]]
+; CHECK: loop.latch.us:
+; CHECK-NEXT: [[C_US:%.*]] = icmp ult i32 [[IV_US]], [[N:%.*]]
+; CHECK-NEXT: [[IV_NEXT_US]] = add i32 [[IV_US]], 1
+; CHECK-NEXT: br i1 [[C_US]], label [[LOOP_HEADER_US]], label [[EXIT_SPLIT_US:%.*]]
+; CHECK: exit.split.us:
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: entry.split:
+; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
+; CHECK: loop.header:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY_SPLIT]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
+; CHECK-NEXT: [[LV:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT: [[SC:%.*]] = trunc i32 [[LV]] to i1
+; CHECK-NEXT: br i1 [[SC]], label [[NOCLOBBER:%.*]], label [[CLOBBER:%.*]]
+; CHECK: noclobber:
+; CHECK-NEXT: br label [[LOOP_LATCH]]
+; CHECK: clobber:
+; CHECK-NEXT: call void @clobber()
+; CHECK-NEXT: br label [[LOOP_LATCH]]
+; CHECK: loop.latch:
+; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[IV]], [[N]]
+; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: br i1 [[C]], label [[LOOP_HEADER]], label [[EXIT_SPLIT:%.*]], !llvm.loop [[LOOP12:![0-9]+]]
+; CHECK: exit.split:
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret i32 10
+;
+entry:
+ br label %loop.header
+
+loop.header:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
+ %lv = load i32, ptr %ptr
+ %sc = trunc i32 %lv to i1
+ br i1 %sc, label %noclobber, label %clobber
+
+noclobber:
+ br label %loop.latch
+
+clobber:
+ call void @clobber()
+ br label %loop.latch
+
+loop.latch:
+ %c = icmp ult i32 %iv, %N
+ %iv.next = add i32 %iv, 1
+ br i1 %c, label %loop.header, label %exit
+
+exit:
+ ret i32 10
+}
+
+define i32 @partial_unswitch_false_successor_trunc(ptr %ptr, i32 %N) {
+; CHECK-LABEL: @partial_unswitch_false_successor_trunc(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[PTR:%.*]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 [[TMP0]] to i1
+; CHECK-NEXT: br i1 [[TMP1]], label [[ENTRY_SPLIT:%.*]], label [[ENTRY_SPLIT_US:%.*]]
+; CHECK: entry.split.us:
+; CHECK-NEXT: br label [[LOOP_HEADER_US:%.*]]
+; CHECK: loop.header.us:
+; CHECK-NEXT: [[IV_US:%.*]] = phi i32 [ 0, [[ENTRY_SPLIT_US]] ], [ [[IV_NEXT_US:%.*]], [[LOOP_LATCH_US:%.*]] ]
+; CHECK-NEXT: br label [[NOCLOBBER_US:%.*]]
+; CHECK: noclobber.us:
+; CHECK-NEXT: br label [[LOOP_LATCH_US]]
+; CHECK: loop.latch.us:
+; CHECK-NEXT: [[C_US:%.*]] = icmp ult i32 [[IV_US]], [[N:%.*]]
+; CHECK-NEXT: [[IV_NEXT_US]] = add i32 [[IV_US]], 1
+; CHECK-NEXT: br i1 [[C_US]], label [[LOOP_HEADER_US]], label [[EXIT_SPLIT_US:%.*]]
+; CHECK: exit.split.us:
+; CHECK-NEXT: br label [[EXIT:%.*]]
+; CHECK: entry.split:
+; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
+; CHECK: loop.header:
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY_SPLIT]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
+; CHECK-NEXT: [[LV:%.*]] = load i32, ptr [[PTR]], align 4
+; CHECK-NEXT: [[SC:%.*]] = trunc i32 [[LV]] to i1
+; CHECK-NEXT: br i1 [[SC]], label [[CLOBBER:%.*]], label [[NOCLOBBER:%.*]]
+; CHECK: clobber:
+; CHECK-NEXT: call void @clobber()
+; CHECK-NEXT: br label [[LOOP_LATCH]]
+; CHECK: noclobber:
+; CHECK-NEXT: br label [[LOOP_LATCH]]
+; CHECK: loop.latch:
+; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[IV]], [[N]]
+; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT: br i1 [[C]], label [[LOOP_HEADER]], label [[EXIT_SPLIT:%.*]], !llvm.loop [[LOOP13:![0-9]+]]
+; CHECK: exit.split:
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret i32 10
+;
+entry:
+ br label %loop.header
+
+loop.header:
+ %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
+ %lv = load i32, ptr %ptr
+ %sc = trunc i32 %lv to i1
+ br i1 %sc, label %clobber, label %noclobber
+
+clobber:
+ call void @clobber()
+ br label %loop.latch
+
+noclobber:
+ br label %loop.latch
+
+loop.latch:
+ %c = icmp ult i32 %iv, %N
+ %iv.next = add i32 %iv, 1
+ br i1 %c, label %loop.header, label %exit
+
+exit:
+ ret i32 10
+}
+
; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[UNSWITCH_PARTIAL_DISABLE:![0-9]+]]}
; CHECK: [[UNSWITCH_PARTIAL_DISABLE]] = !{!"llvm.loop.unswitch.partial.disable"}
; CHECK: [[LOOP2]] = distinct !{[[LOOP2]], [[UNSWITCH_PARTIAL_DISABLE]]}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice fix @MDevereau! I just had a couple of minor comments, but other than that it looks reasonable to me.
@@ -1930,7 +1930,15 @@ llvm::hasPartialIVCondition(const Loop &L, unsigned MSSAThreshold, | |||
if (!TI || !TI->isConditional()) | |||
return {}; | |||
|
|||
auto *CondI = dyn_cast<CmpInst>(TI->getCondition()); | |||
Instruction *CondI = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can write this more simply as:
Instruction *CondI = dyn_cast<CmpInst>(TI->getCondition());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
if (!CondI) { | ||
CondI = dyn_cast<TruncInst>(TI->getCondition()); | ||
if (CondI && CondI->getType() != Type::getInt1Ty(TI->getContext())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the {} braces around the if block here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
✅ With the latest revision this PR passed the C/C++ code formatter. |
br label %for.cond1 | ||
|
||
for.cond1: ; preds = %for.cond5, %g | ||
%0 = load i16, ptr null, align 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid loading from null as it is UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; I've changed this function to take a ptr %p parameter and have used that instead.
@@ -1930,7 +1930,13 @@ llvm::hasPartialIVCondition(const Loop &L, unsigned MSSAThreshold, | |||
if (!TI || !TI->isConditional()) | |||
return {}; | |||
|
|||
auto *CondI = dyn_cast<CmpInst>(TI->getCondition()); | |||
Instruction *CondI = dyn_cast<CmpInst>(TI->getCondition()); | |||
if (!CondI){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like this is mis-formatted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Instruction *CondI = dyn_cast<CmpInst>(TI->getCondition()); | ||
if (!CondI) { | ||
CondI = dyn_cast<TruncInst>(TI->getCondition()); | ||
if (CondI && CondI->getType() != Type::getInt1Ty(TI->getContext())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CondI
must have type i1 here as it is used as condition of the branch, so no check should be necessary?
Maybe simpler to have
auto * CondI = dyn_cast<Instruction>(TI->getCondition());
if (!CondI || !isa<CmpInst, TruncInst>(CondI))
return {}
Would be good to also document the restriction to compares & truncs. Perhaps it would be best to remove the restriction if there is no strong reason to retain it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've got a good point that it may be best to remove the restrictions, but I removed them and they broke quite a few tests, such as nested cases which shouldn't be unswitched and cases with phi nodes as the condition. Given this patch is a response to a specific regression I'd prefer to keep it simple if that's alright.
As for the CondI suggestion I've put that in and it looks cleaner - thanks. I've also added a short comment above to describe why it's just limited to Cmp and Trunc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This patch is music to my ears. :)
With the addition of #84628, truncs to i1 are being emitted as conditions to branch instructions. This caused significant regressions in cases which were previously improved by loop unswitch. Adding truncs to i1 restore the previous performance seen.