[SCEV] Fold zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if possible. #142599
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if
* zext (C + A)<nsw> >=s 0 and
* A >=s V.

For now this is limited to cases where the first operand is a constant, so the SExt can be folded to a new constant. This can be relaxed in the future.

Alive2 proof of the general pattern and the test changes in zext-nuw.ll (times out in the online instance but verifies locally): https://alive2.llvm.org/ce/z/_BtyGy

Full diff: https://github.com/llvm/llvm-project/pull/142599.diff

5 Files Affected:
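To make the intended effect concrete, here is a hypothetical IR snippet (not taken from the patch's tests; the function name, the assume-based precondition, and the constants are made up for illustration) showing the kind of expression the new fold targets. Whether SCEV proves the preconditions for this exact snippet depends on its range analysis:

```llvm
; With %a known to be >= 1, the add result is non-negative (it is nsw and
; C = -1 <=s A = %a), so the new fold lets SCEV rewrite
;   zext((-1 + %a)<nsw>)  as  (-1 + zext(%a))<nsw>
; in the wider type, moving the constant outside the extend.
define i64 @zext_of_nsw_add(i32 %a) {
  %precond = icmp sge i32 %a, 1
  call void @llvm.assume(i1 %precond)
  %add = add nsw i32 %a, -1        ; C = -1, A = %a
  %ext = zext i32 %add to i64      ; SCEV: (-1 + (zext i32 %a to i64))<nsw>
  ret i64 %ext
}

declare void @llvm.assume(i1)
```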
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 56cdfabccb66f..453aa10ce82b0 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -1793,6 +1793,18 @@ const SCEV *ScalarEvolution::getZeroExtendExprImpl(const SCEV *Op, Type *Ty,
return getAddExpr(Ops, SCEV::FlagNUW, Depth + 1);
}
+ const SCEVConstant *C;
+ const SCEV *A;
+ // zext (C + A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0
+ // and A >=s V.
+ if (SA->hasNoSignedWrap() && isKnownNonNegative(SA) &&
+ match(SA, m_scev_Add(m_SCEVConstant(C), m_SCEV(A))) &&
+ isKnownPredicate(CmpInst::ICMP_SGE, A, C)) {
+ SmallVector<const SCEV *, 4> Ops = {getSignExtendExpr(C, Ty, Depth + 1),
+ getZeroExtendExpr(A, Ty, Depth + 1)};
+ return getAddExpr(Ops, SCEV::FlagNSW, Depth + 1);
+ }
+
// zext(C + x + y + ...) --> (zext(D) + zext((C - D) + x + y + ...))
// if D + (C - D + x + y + ...) could be proven to not unsigned wrap
// where D maximizes the number of trailing zeros of (C - D + x + y + ...)
diff --git a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
index 9bf2427eddb9c..1a04b0c72cf2c 100644
--- a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
+++ b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
@@ -1231,7 +1231,7 @@ define void @optimized_range_check_unsigned3(ptr %pred, i1 %c) {
; CHECK-NEXT: %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
; CHECK-NEXT: --> {0,+,1}<nuw><nsw><%loop> U: [0,3) S: [0,3) Exits: (-1 + %N)<nsw> LoopDispositions: { %loop: Computable }
; CHECK-NEXT: %gep = getelementptr inbounds i16, ptr %pred, i32 %iv
-; CHECK-NEXT: --> {%pred,+,2}<nuw><%loop> U: full-set S: full-set Exits: ((2 * (zext i32 (-1 + %N)<nsw> to i64))<nuw><nsw> + %pred) LoopDispositions: { %loop: Computable }
+; CHECK-NEXT: --> {%pred,+,2}<nuw><%loop> U: full-set S: full-set Exits: (-2 + (2 * (zext i32 %N to i64))<nuw><nsw> + %pred) LoopDispositions: { %loop: Computable }
; CHECK-NEXT: %iv.next = add nuw nsw i32 %iv, 1
; CHECK-NEXT: --> {1,+,1}<nuw><nsw><%loop> U: [1,4) S: [1,4) Exits: %N LoopDispositions: { %loop: Computable }
; CHECK-NEXT: Determining loop execution counts for: @optimized_range_check_unsigned3
diff --git a/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll b/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
index d24f9a4e40e38..17921afc5ff06 100644
--- a/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
+++ b/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
@@ -15,11 +15,9 @@ define void @_Z3fn1v() {
; CHECK-NEXT: [[J_SROA_0_0_COPYLOAD:%.*]] = load i8, ptr [[X5]], align 1
; CHECK-NEXT: br label [[DOTPREHEADER4_LR_PH:%.*]]
; CHECK: .preheader4.lr.ph:
-; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[X4]], -1
-; CHECK-NEXT: [[TMP2:%.*]] = zext nneg i32 [[TMP1]] to i64
-; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
; CHECK-NEXT: [[TMP4:%.*]] = sext i8 [[J_SROA_0_0_COPYLOAD]] to i64
-; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP3]], [[TMP4]]
+; CHECK-NEXT: [[TMP2:%.*]] = zext nneg i32 [[X4]] to i64
+; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], [[TMP2]]
; CHECK-NEXT: br label [[DOTPREHEADER4:%.*]]
; CHECK: .preheader4:
; CHECK-NEXT: [[K_09:%.*]] = phi ptr [ undef, [[DOTPREHEADER4_LR_PH]] ], [ [[X25:%.*]], [[X22:%.*]] ]
diff --git a/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll b/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
index ea2cfe74be264..feef268bc7412 100644
--- a/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
+++ b/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
@@ -15,11 +15,11 @@ define void @test(ptr %ptr) {
; CHECK: for.body.preheader:
; CHECK-NEXT: [[LIM_0:%.*]] = phi i32 [ 65, [[ENTRY:%.*]] ], [ 1, [[DEAD:%.*]] ]
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 8
-; CHECK-NEXT: [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[LIM_0]], i32 2)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i32 [[UMAX]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext nneg i32 [[TMP0]] to i64
+; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i32 [[LIM_0]] to i64
+; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.umax.i64(i64 [[TMP0]], i64 2)
; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[TMP1]], 3
-; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[SCEVGEP]], i8 0, i64 [[TMP2]], i1 false)
+; CHECK-NEXT: [[TMP3:%.*]] = add nsw i64 [[TMP2]], -8
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[SCEVGEP]], i8 0, i64 [[TMP3]], i1 false)
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/reduction.ll b/llvm/test/Transforms/LoopVectorize/reduction.ll
index 757be041afbb5..af6aa9373b3cb 100644
--- a/llvm/test/Transforms/LoopVectorize/reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/reduction.ll
@@ -1199,13 +1199,13 @@ define i64 @reduction_with_phi_with_one_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-SAME: i16 [[N:%.*]], ptr [[A:%.*]]) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SMAX:%.*]] = call i16 @llvm.smax.i16(i16 [[N]], i16 2)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i16 [[SMAX]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext nneg i16 [[TMP0]] to i32
+; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i16 [[SMAX]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp slt i16 [[N]], 5
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
-; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], 32764
-; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nuw nsw i32 [[N_VEC]] to i16
+; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], -4
+; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nsw i32 [[N_VEC]] to i16
; CHECK-NEXT: [[IND_END:%.*]] = or disjoint i16 [[DOTCAST]], 1
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -1222,7 +1222,7 @@ define i64 @reduction_with_phi_with_one_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP4]])
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[TMP1]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
@@ -1277,13 +1277,13 @@ define i64 @reduction_with_phi_with_two_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-SAME: i16 [[N:%.*]], ptr [[A:%.*]]) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SMAX:%.*]] = call i16 @llvm.smax.i16(i16 [[N]], i16 2)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i16 [[SMAX]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext nneg i16 [[TMP0]] to i32
+; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i16 [[SMAX]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp slt i16 [[N]], 5
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
-; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], 32764
-; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nuw nsw i32 [[N_VEC]] to i16
+; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], -4
+; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nsw i32 [[N_VEC]] to i16
; CHECK-NEXT: [[IND_END:%.*]] = or disjoint i16 [[DOTCAST]], 1
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -1300,7 +1300,7 @@ define i64 @reduction_with_phi_with_two_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP4]])
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[TMP1]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
const SCEVConstant *C;
const SCEV *A;
// zext (C + A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0
// and A >=s V.
Suggested change:
- // and A >=s V.
+ // and A >=s C.
Though, wouldn't the more natural fold here be something like https://alive2.llvm.org/ce/z/RF9XaY? For the case where A >= 0 the sext would become a zext.
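As I read that suggestion (the Alive2 link above is the authoritative statement; this is only a paraphrase with made-up names), the generalized rewrite would distribute the extend over both operands rather than requiring a constant:

```llvm
; If (%c + %a) is nsw and its result is known non-negative, then
;   zext(%c + %a) == sext(%c + %a) == sext(%c) + sext(%a),
; and when %a itself is known non-negative, sext(%a) becomes zext(%a).
define i64 @generalized_form(i32 %c, i32 %a) {
  %add = add nsw i32 %c, %a
  %nonneg = icmp sge i32 %add, 0
  call void @llvm.assume(i1 %nonneg)
  %ext = zext i32 %add to i64   ; foldable to (sext %c) + (sext %a) in i64
  ret i64 %ext
}

declare void @llvm.assume(i1)
```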
Updated, thanks. I was originally a bit worried that replacing the zext with two inner sexts may make things worse, but it should probably be fine.
Still need to check if there's a test case for the 2-sext case.
Here are some changes with the generalization: 7f8f937
- in @add_nsw_zext_fold_results_in_sext we have a more complex expansion
- in @fold_add_zext_to_sext we miss some re-use during expansion
The core issue here seems to be that pushing sext through add nsw is a non-reversible transform. You can convert sext of add nsw to add nsw of sext, but you generally can't go from add nsw of sext to sext of add.
But given that SCEV does that in general, it probably makes sense to still do it here as well.
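A small, hypothetical illustration of that asymmetry (the values are made up; 100 + 100 overflows i8 but not i16):

```llvm
; Forward direction: with an nsw add in the narrow type, the sext can be
; pushed through it, i.e. sext(%x + %y) == sext(%x) + sext(%y).
define i16 @sext_of_add(i8 %x, i8 %y) {
  %add = add nsw i8 %x, %y
  %r = sext i8 %add to i16
  ret i16 %r
}

; The reverse direction does not hold in general: for %x = %y = 100 the wide
; sum 200 satisfies nsw in i16, but "add nsw i8 100, 100" would wrap, so the
; narrow add (and hence the sext-of-add form) cannot be reconstructed.
define i16 @add_of_sext(i8 %x, i8 %y) {
  %xs = sext i8 %x to i16
  %ys = sext i8 %y to i16
  %r = add nsw i16 %xs, %ys
  ret i16 %r
}
```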
Sounds good. Should we go with the current version straight away, or first land the restricted version, which is less likely to introduce any regressions?
Force-pushed from dc4287a to dcf279a.

Commits:
- Add extra test coverage for #142599.
- Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0 and A >=s V. For now this is limited to cases where the first operand is a constant, so the SExt can be folded to a new constant. This can be relaxed in the future. Alive2 proof of the general pattern and the test changes in zext-nuw.ll (times out in the online instance but verifies locally): https://alive2.llvm.org/ce/z/_BtyGy
Ah interesting. Let me see if I can pin down where this is coming from.
Hmm, the impact in Clang seems to be down to the extra work of constructing and reasoning about the newly created expression (or additional transforms), not the additional checks for whether the transform is valid. Just doing the analysis without constructing the new SCEV completely removes the compile-time impact for the Clang build: http://llvm-compile-time-tracker.com/compare.php?from=e2639eefaabdfc06adad1a4458b6900d9838e64f&to=011614dacd9e201587843351dd54d1342ad0d622&stat=instructions:u
Not sure what the best next steps would be to get this wrapped up.