
[SCEV] Fold zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if possible. #142599


Open · fhahn wants to merge 4 commits into main from scev-zext-add-nsw
Conversation

@fhahn (Contributor) commented Jun 3, 2025

Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if

  • zext (C + A)<nsw> >=s 0 and
  • A >=s C.

For now this is limited to cases where the first operand is a constant, so the SExt can be folded to a new constant. This can be relaxed in the future.

Alive2 proof of the general pattern and the test changes in zext-nuw.ll (times out in the online instance, but verifies locally):

https://alive2.llvm.org/ce/z/_BtyGy
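
As a concrete instance of the pattern (an illustrative sketch with a hypothetical `%n`, not one of the tests below): with C = -1 and A = %n, both preconditions hold whenever %n >=s 1, and the fold rewrites

```llvm
; Before: the constant is folded inside the narrow add, then zero-extended.
; With %n >=s 1, (-1 + %n)<nsw> is non-negative and %n >=s -1, so both
; preconditions are satisfied.
%sub = add nsw i32 %n, -1
%ext = zext nneg i32 %sub to i64

; After: zero-extend the variable operand and apply the sign-extended
; constant in the wide type (sext i32 -1 to i64 is still -1).
%n.ext = zext nneg i32 %n to i64
%res   = add nsw i64 %n.ext, -1
```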

@fhahn fhahn requested review from preames and efriedma-quic June 3, 2025 13:08
@fhahn fhahn requested a review from nikic as a code owner June 3, 2025 13:08
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jun 3, 2025
@llvmbot (Member) commented Jun 3, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if

  • zext (C + A)<nsw> >=s 0 and
  • A >=s C.

For now this is limited to cases where the first operand is a constant, so the SExt can be folded to a new constant. This can be relaxed in the future.

Alive2 proof of the general pattern and the test changes in zext-nuw.ll (times out in the online instance, but verifies locally):

https://alive2.llvm.org/ce/z/_BtyGy


Full diff: https://github.com/llvm/llvm-project/pull/142599.diff

5 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+12)
  • (modified) llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll (+1-1)
  • (modified) llvm/test/Transforms/IndVarSimplify/zext-nuw.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+10-10)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 56cdfabccb66f..453aa10ce82b0 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -1793,6 +1793,18 @@ const SCEV *ScalarEvolution::getZeroExtendExprImpl(const SCEV *Op, Type *Ty,
       return getAddExpr(Ops, SCEV::FlagNUW, Depth + 1);
     }
 
+    const SCEVConstant *C;
+    const SCEV *A;
+    // zext (C + A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0
+    // and A >=s V.
+    if (SA->hasNoSignedWrap() && isKnownNonNegative(SA) &&
+        match(SA, m_scev_Add(m_SCEVConstant(C), m_SCEV(A))) &&
+        isKnownPredicate(CmpInst::ICMP_SGE, A, C)) {
+      SmallVector<const SCEV *, 4> Ops = {getSignExtendExpr(C, Ty, Depth + 1),
+                                          getZeroExtendExpr(A, Ty, Depth + 1)};
+      return getAddExpr(Ops, SCEV::FlagNSW, Depth + 1);
+    }
+
     // zext(C + x + y + ...) --> (zext(D) + zext((C - D) + x + y + ...))
     // if D + (C - D + x + y + ...) could be proven to not unsigned wrap
     // where D maximizes the number of trailing zeros of (C - D + x + y + ...)
diff --git a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
index 9bf2427eddb9c..1a04b0c72cf2c 100644
--- a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
+++ b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
@@ -1231,7 +1231,7 @@ define void @optimized_range_check_unsigned3(ptr %pred, i1 %c) {
 ; CHECK-NEXT:    %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
 ; CHECK-NEXT:    --> {0,+,1}<nuw><nsw><%loop> U: [0,3) S: [0,3) Exits: (-1 + %N)<nsw> LoopDispositions: { %loop: Computable }
 ; CHECK-NEXT:    %gep = getelementptr inbounds i16, ptr %pred, i32 %iv
-; CHECK-NEXT:    --> {%pred,+,2}<nuw><%loop> U: full-set S: full-set Exits: ((2 * (zext i32 (-1 + %N)<nsw> to i64))<nuw><nsw> + %pred) LoopDispositions: { %loop: Computable }
+; CHECK-NEXT:    --> {%pred,+,2}<nuw><%loop> U: full-set S: full-set Exits: (-2 + (2 * (zext i32 %N to i64))<nuw><nsw> + %pred) LoopDispositions: { %loop: Computable }
 ; CHECK-NEXT:    %iv.next = add nuw nsw i32 %iv, 1
 ; CHECK-NEXT:    --> {1,+,1}<nuw><nsw><%loop> U: [1,4) S: [1,4) Exits: %N LoopDispositions: { %loop: Computable }
 ; CHECK-NEXT:  Determining loop execution counts for: @optimized_range_check_unsigned3
diff --git a/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll b/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
index d24f9a4e40e38..17921afc5ff06 100644
--- a/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
+++ b/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
@@ -15,11 +15,9 @@ define void @_Z3fn1v() {
 ; CHECK-NEXT:    [[J_SROA_0_0_COPYLOAD:%.*]] = load i8, ptr [[X5]], align 1
 ; CHECK-NEXT:    br label [[DOTPREHEADER4_LR_PH:%.*]]
 ; CHECK:       .preheader4.lr.ph:
-; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i32 [[X4]], -1
-; CHECK-NEXT:    [[TMP2:%.*]] = zext nneg i32 [[TMP1]] to i64
-; CHECK-NEXT:    [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
 ; CHECK-NEXT:    [[TMP4:%.*]] = sext i8 [[J_SROA_0_0_COPYLOAD]] to i64
-; CHECK-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP3]], [[TMP4]]
+; CHECK-NEXT:    [[TMP2:%.*]] = zext nneg i32 [[X4]] to i64
+; CHECK-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], [[TMP2]]
 ; CHECK-NEXT:    br label [[DOTPREHEADER4:%.*]]
 ; CHECK:       .preheader4:
 ; CHECK-NEXT:    [[K_09:%.*]] = phi ptr [ undef, [[DOTPREHEADER4_LR_PH]] ], [ [[X25:%.*]], [[X22:%.*]] ]
diff --git a/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll b/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
index ea2cfe74be264..feef268bc7412 100644
--- a/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
+++ b/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
@@ -15,11 +15,11 @@ define void @test(ptr %ptr) {
 ; CHECK:       for.body.preheader:
 ; CHECK-NEXT:    [[LIM_0:%.*]] = phi i32 [ 65, [[ENTRY:%.*]] ], [ 1, [[DEAD:%.*]] ]
 ; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 8
-; CHECK-NEXT:    [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[LIM_0]], i32 2)
-; CHECK-NEXT:    [[TMP0:%.*]] = add nsw i32 [[UMAX]], -1
-; CHECK-NEXT:    [[TMP1:%.*]] = zext nneg i32 [[TMP0]] to i64
+; CHECK-NEXT:    [[TMP0:%.*]] = zext nneg i32 [[LIM_0]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.umax.i64(i64 [[TMP0]], i64 2)
 ; CHECK-NEXT:    [[TMP2:%.*]] = shl nuw nsw i64 [[TMP1]], 3
-; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SCEVGEP]], i8 0, i64 [[TMP2]], i1 false)
+; CHECK-NEXT:    [[TMP3:%.*]] = add nsw i64 [[TMP2]], -8
+; CHECK-NEXT:    call void @llvm.memset.p0.i64(ptr align 8 [[SCEVGEP]], i8 0, i64 [[TMP3]], i1 false)
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/reduction.ll b/llvm/test/Transforms/LoopVectorize/reduction.ll
index 757be041afbb5..af6aa9373b3cb 100644
--- a/llvm/test/Transforms/LoopVectorize/reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/reduction.ll
@@ -1199,13 +1199,13 @@ define i64 @reduction_with_phi_with_one_incoming_on_backedge(i16 %n, ptr %A) {
 ; CHECK-SAME: i16 [[N:%.*]], ptr [[A:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SMAX:%.*]] = call i16 @llvm.smax.i16(i16 [[N]], i16 2)
-; CHECK-NEXT:    [[TMP0:%.*]] = add nsw i16 [[SMAX]], -1
-; CHECK-NEXT:    [[TMP1:%.*]] = zext nneg i16 [[TMP0]] to i32
+; CHECK-NEXT:    [[TMP0:%.*]] = zext nneg i16 [[SMAX]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i32 [[TMP0]], -1
 ; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp slt i16 [[N]], 5
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP1]], 32764
-; CHECK-NEXT:    [[DOTCAST:%.*]] = trunc nuw nsw i32 [[N_VEC]] to i16
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP1]], -4
+; CHECK-NEXT:    [[DOTCAST:%.*]] = trunc nsw i32 [[N_VEC]] to i16
 ; CHECK-NEXT:    [[IND_END:%.*]] = or disjoint i16 [[DOTCAST]], 1
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
@@ -1222,7 +1222,7 @@ define i64 @reduction_with_phi_with_one_incoming_on_backedge(i16 %n, ptr %A) {
 ; CHECK-NEXT:    br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP4]])
-; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[TMP1]]
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i16 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
@@ -1277,13 +1277,13 @@ define i64 @reduction_with_phi_with_two_incoming_on_backedge(i16 %n, ptr %A) {
 ; CHECK-SAME: i16 [[N:%.*]], ptr [[A:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SMAX:%.*]] = call i16 @llvm.smax.i16(i16 [[N]], i16 2)
-; CHECK-NEXT:    [[TMP0:%.*]] = add nsw i16 [[SMAX]], -1
-; CHECK-NEXT:    [[TMP1:%.*]] = zext nneg i16 [[TMP0]] to i32
+; CHECK-NEXT:    [[TMP0:%.*]] = zext nneg i16 [[SMAX]] to i32
+; CHECK-NEXT:    [[TMP1:%.*]] = add nsw i32 [[TMP0]], -1
 ; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp slt i16 [[N]], 5
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP1]], 32764
-; CHECK-NEXT:    [[DOTCAST:%.*]] = trunc nuw nsw i32 [[N_VEC]] to i16
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[TMP1]], -4
+; CHECK-NEXT:    [[DOTCAST:%.*]] = trunc nsw i32 [[N_VEC]] to i16
 ; CHECK-NEXT:    [[IND_END:%.*]] = or disjoint i16 [[DOTCAST]], 1
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
@@ -1300,7 +1300,7 @@ define i64 @reduction_with_phi_with_two_incoming_on_backedge(i16 %n, ptr %A) {
 ; CHECK-NEXT:    br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP4]])
-; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[TMP1]]
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i16 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]

const SCEVConstant *C;
const SCEV *A;
// zext (C + A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0
// and A >=s V.
Contributor

Suggested change
// and A >=s V.
// and A >=s C.

Contributor

Though, wouldn't the more natural fold here be something like https://alive2.llvm.org/ce/z/RF9XaY? For the case where A >= 0 the sext would become a zext.
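
In IR terms, the suggested shape would be roughly the following (a sketch only; the exact preconditions are in the Alive2 link):

```llvm
; When the narrow add has <nsw> and its result is known non-negative,
; the zext of the sum splits into sign-extended operands.
%sum  = add nsw i16 %a, %b
%wide = zext nneg i16 %sum to i32

; ... is equivalent to:
%a.ext = sext i16 %a to i32
%b.ext = sext i16 %b to i32
%wide2 = add nsw i32 %a.ext, %b.ext
; and if %a >=s 0 is known, "sext %a" simplifies to "zext %a".
```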

Contributor Author

Updated, thanks. I was originally a bit worried that replacing the zext with two inner sexts might make things worse, but it should probably be fine.

Still need to check if there's a test case for the two-sext case.

Contributor Author

Here are some changes with the generalization: 7f8f937

  • in @add_nsw_zext_fold_results_in_sext we have a more complex expansion.
  • in @fold_add_zext_to_sext we miss some re-use during expansion.

Contributor

The core issue here seems to be that pushing sext through add nsw is a non-reversible transform. You can convert sext of add nsw to add nsw of sext, but you generally can't go from add nsw of sext to sext of add.
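
For example (a minimal sketch):

```llvm
; Forward direction always holds: sext distributes over add nsw.
%s = add nsw i8 %x, %y
%w = sext i8 %s to i16            ; == sext(%x) + sext(%y) in i16

; Reverse direction does not: for %x = %y = 100 the wide sum below is a
; well-defined 200, but 100 + 100 wraps in i8, so no equivalent
; "sext (add nsw i8 ...)" form exists.
%x.e = sext i8 %x to i16
%y.e = sext i8 %y to i16
%sum = add nsw i16 %x.e, %y.e
```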

But given that SCEV does that in general, it probably makes sense to still do it here as well.

Contributor Author

Sounds good. Should we go with the current version straight away, or first land the restricted version, which is less likely to introduce any regressions?

@fhahn fhahn force-pushed the scev-zext-add-nsw branch 2 times, most recently from dc4287a to dcf279a Compare June 3, 2025 15:41
fhahn added a commit that referenced this pull request Jun 3, 2025
@fhahn fhahn force-pushed the scev-zext-add-nsw branch from dcf279a to 7f8f937 Compare June 3, 2025 21:06
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 3, 2025
fhahn added 4 commits June 5, 2025 19:26
Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if
 * zext (C + A)<nsw> >=s 0 and
 * A >=s C.

For now this is limited to cases where the first operand is a constant,
so the SExt can be folded to a new constant. This can be relaxed in the
future.

Alive2 proof of the general pattern and the test changes in zext-nuw.ll
(times out in the online instance but verifies locally)

https://alive2.llvm.org/ce/z/_BtyGy
@fhahn fhahn force-pushed the scev-zext-add-nsw branch from 7f8f937 to 842564e Compare June 5, 2025 19:10
@fhahn (Contributor Author) commented Jun 5, 2025

rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
@fhahn (Contributor Author) commented Jun 11, 2025

Hmm, the compile-time impact on Clang seems to be down to the extra work of constructing and reasoning about the newly created expression (or the additional transforms it enables), not the additional checks for whether the transform is valid. Just doing the analysis without constructing the new SCEV completely removes the compile-time impact for the Clang build: http://llvm-compile-time-tracker.com/compare.php?from=e2639eefaabdfc06adad1a4458b6900d9838e64f&to=011614dacd9e201587843351dd54d1342ad0d622&stat=instructions:u

Not sure what the best next steps would be to get this wrapped up.

DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025