[SCEV] Fold zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if possible. #142599
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if
* zext (C + A)<nsw> >=s 0 and
* A >=s V.

For now this is limited to cases where the first operand is a constant, so the SExt can be folded to a new constant. This can be relaxed in the future.

Alive2 proof of the general pattern and the test changes in zext-nuw.ll (times out in the online instance but verifies locally): https://alive2.llvm.org/ce/z/_BtyGy

Full diff: https://github.com/llvm/llvm-project/pull/142599.diff

5 Files Affected:
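To make the intended effect concrete, here is a hypothetical IR snippet (not taken from the patch's tests; the function name, the assume-based precondition, and the constants are made up for illustration) showing the kind of expression the new fold targets. Whether SCEV proves the preconditions for this exact snippet depends on its range analysis:

```llvm
; With %a known to be >= 1, the add result is non-negative (it is nsw and
; C = -1 <=s A = %a), so the new fold lets SCEV rewrite
;   zext((-1 + %a)<nsw>)  as  (-1 + zext(%a))<nsw>
; in the wider type, moving the constant outside the extend.
define i64 @zext_of_nsw_add(i32 %a) {
  %precond = icmp sge i32 %a, 1
  call void @llvm.assume(i1 %precond)
  %add = add nsw i32 %a, -1        ; C = -1, A = %a
  %ext = zext i32 %add to i64      ; SCEV: (-1 + (zext i32 %a to i64))<nsw>
  ret i64 %ext
}

declare void @llvm.assume(i1)
```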
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 56cdfabccb66f..453aa10ce82b0 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -1793,6 +1793,18 @@ const SCEV *ScalarEvolution::getZeroExtendExprImpl(const SCEV *Op, Type *Ty,
return getAddExpr(Ops, SCEV::FlagNUW, Depth + 1);
}
+ const SCEVConstant *C;
+ const SCEV *A;
+ // zext (C + A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0
+ // and A >=s V.
+ if (SA->hasNoSignedWrap() && isKnownNonNegative(SA) &&
+ match(SA, m_scev_Add(m_SCEVConstant(C), m_SCEV(A))) &&
+ isKnownPredicate(CmpInst::ICMP_SGE, A, C)) {
+ SmallVector<const SCEV *, 4> Ops = {getSignExtendExpr(C, Ty, Depth + 1),
+ getZeroExtendExpr(A, Ty, Depth + 1)};
+ return getAddExpr(Ops, SCEV::FlagNSW, Depth + 1);
+ }
+
// zext(C + x + y + ...) --> (zext(D) + zext((C - D) + x + y + ...))
// if D + (C - D + x + y + ...) could be proven to not unsigned wrap
// where D maximizes the number of trailing zeros of (C - D + x + y + ...)
diff --git a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
index 9bf2427eddb9c..1a04b0c72cf2c 100644
--- a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
+++ b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
@@ -1231,7 +1231,7 @@ define void @optimized_range_check_unsigned3(ptr %pred, i1 %c) {
; CHECK-NEXT: %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
; CHECK-NEXT: --> {0,+,1}<nuw><nsw><%loop> U: [0,3) S: [0,3) Exits: (-1 + %N)<nsw> LoopDispositions: { %loop: Computable }
; CHECK-NEXT: %gep = getelementptr inbounds i16, ptr %pred, i32 %iv
-; CHECK-NEXT: --> {%pred,+,2}<nuw><%loop> U: full-set S: full-set Exits: ((2 * (zext i32 (-1 + %N)<nsw> to i64))<nuw><nsw> + %pred) LoopDispositions: { %loop: Computable }
+; CHECK-NEXT: --> {%pred,+,2}<nuw><%loop> U: full-set S: full-set Exits: (-2 + (2 * (zext i32 %N to i64))<nuw><nsw> + %pred) LoopDispositions: { %loop: Computable }
; CHECK-NEXT: %iv.next = add nuw nsw i32 %iv, 1
; CHECK-NEXT: --> {1,+,1}<nuw><nsw><%loop> U: [1,4) S: [1,4) Exits: %N LoopDispositions: { %loop: Computable }
; CHECK-NEXT: Determining loop execution counts for: @optimized_range_check_unsigned3
diff --git a/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll b/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
index d24f9a4e40e38..17921afc5ff06 100644
--- a/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
+++ b/llvm/test/Transforms/IndVarSimplify/zext-nuw.ll
@@ -15,11 +15,9 @@ define void @_Z3fn1v() {
; CHECK-NEXT: [[J_SROA_0_0_COPYLOAD:%.*]] = load i8, ptr [[X5]], align 1
; CHECK-NEXT: br label [[DOTPREHEADER4_LR_PH:%.*]]
; CHECK: .preheader4.lr.ph:
-; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[X4]], -1
-; CHECK-NEXT: [[TMP2:%.*]] = zext nneg i32 [[TMP1]] to i64
-; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
; CHECK-NEXT: [[TMP4:%.*]] = sext i8 [[J_SROA_0_0_COPYLOAD]] to i64
-; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP3]], [[TMP4]]
+; CHECK-NEXT: [[TMP2:%.*]] = zext nneg i32 [[X4]] to i64
+; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], [[TMP2]]
; CHECK-NEXT: br label [[DOTPREHEADER4:%.*]]
; CHECK: .preheader4:
; CHECK-NEXT: [[K_09:%.*]] = phi ptr [ undef, [[DOTPREHEADER4_LR_PH]] ], [ [[X25:%.*]], [[X22:%.*]] ]
diff --git a/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll b/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
index ea2cfe74be264..feef268bc7412 100644
--- a/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
+++ b/llvm/test/Transforms/LoopIdiom/X86/memset-size-compute.ll
@@ -15,11 +15,11 @@ define void @test(ptr %ptr) {
; CHECK: for.body.preheader:
; CHECK-NEXT: [[LIM_0:%.*]] = phi i32 [ 65, [[ENTRY:%.*]] ], [ 1, [[DEAD:%.*]] ]
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 8
-; CHECK-NEXT: [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[LIM_0]], i32 2)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i32 [[UMAX]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext nneg i32 [[TMP0]] to i64
+; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i32 [[LIM_0]] to i64
+; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.umax.i64(i64 [[TMP0]], i64 2)
; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[TMP1]], 3
-; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[SCEVGEP]], i8 0, i64 [[TMP2]], i1 false)
+; CHECK-NEXT: [[TMP3:%.*]] = add nsw i64 [[TMP2]], -8
+; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 8 [[SCEVGEP]], i8 0, i64 [[TMP3]], i1 false)
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/reduction.ll b/llvm/test/Transforms/LoopVectorize/reduction.ll
index 757be041afbb5..af6aa9373b3cb 100644
--- a/llvm/test/Transforms/LoopVectorize/reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/reduction.ll
@@ -1199,13 +1199,13 @@ define i64 @reduction_with_phi_with_one_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-SAME: i16 [[N:%.*]], ptr [[A:%.*]]) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SMAX:%.*]] = call i16 @llvm.smax.i16(i16 [[N]], i16 2)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i16 [[SMAX]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext nneg i16 [[TMP0]] to i32
+; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i16 [[SMAX]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp slt i16 [[N]], 5
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
-; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], 32764
-; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nuw nsw i32 [[N_VEC]] to i16
+; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], -4
+; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nsw i32 [[N_VEC]] to i16
; CHECK-NEXT: [[IND_END:%.*]] = or disjoint i16 [[DOTCAST]], 1
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -1222,7 +1222,7 @@ define i64 @reduction_with_phi_with_one_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP4]])
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[TMP1]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
@@ -1277,13 +1277,13 @@ define i64 @reduction_with_phi_with_two_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-SAME: i16 [[N:%.*]], ptr [[A:%.*]]) {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SMAX:%.*]] = call i16 @llvm.smax.i16(i16 [[N]], i16 2)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i16 [[SMAX]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext nneg i16 [[TMP0]] to i32
+; CHECK-NEXT: [[TMP0:%.*]] = zext nneg i16 [[SMAX]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp slt i16 [[N]], 5
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
-; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], 32764
-; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nuw nsw i32 [[N_VEC]] to i16
+; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[TMP1]], -4
+; CHECK-NEXT: [[DOTCAST:%.*]] = trunc nsw i32 [[N_VEC]] to i16
; CHECK-NEXT: [[IND_END:%.*]] = or disjoint i16 [[DOTCAST]], 1
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -1300,7 +1300,7 @@ define i64 @reduction_with_phi_with_two_incoming_on_backedge(i16 %n, ptr %A) {
; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP4]])
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[TMP1]]
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[ENTRY:%.*]] ]
const SCEVConstant *C;
const SCEV *A;
// zext (C + A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0
// and A >=s V.
Suggested change:
- // and A >=s V.
+ // and A >=s C.
Though, wouldn't the more natural fold here be something like https://alive2.llvm.org/ce/z/RF9XaY? For the case where A >= 0 the sext would become a zext.
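As I read that suggestion (the Alive2 link above is the authoritative statement; this is only a paraphrase with made-up names), the generalized rewrite would distribute the extend over both operands rather than requiring a constant:

```llvm
; If (%c + %a) is nsw and its result is known non-negative, then
;   zext(%c + %a) == sext(%c + %a) == sext(%c) + sext(%a),
; and when %a itself is known non-negative, sext(%a) becomes zext(%a).
define i64 @generalized_form(i32 %c, i32 %a) {
  %add = add nsw i32 %c, %a
  %nonneg = icmp sge i32 %add, 0
  call void @llvm.assume(i1 %nonneg)
  %ext = zext i32 %add to i64   ; foldable to (sext %c) + (sext %a) in i64
  ret i64 %ext
}

declare void @llvm.assume(i1)
```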
Updated, thanks. I was originally a bit worried that replacing the zext with two inner sexts may make things worse, but it should probably be fine.
Still need to check if there's a test case for the 2-sext case.
Here are some changes with the generalization: 7f8f937
- in @add_nsw_zext_fold_results_in_sext we have a more complex expansion
- in @fold_add_zext_to_sext we miss some re-use during expansion
The core issue here seems to be that pushing sext through add nsw is a non-reversible transform. You can convert sext of add nsw to add nsw of sext, but you generally can't go from add nsw of sext to sext of add.
But given that SCEV does that in general, it probably makes sense to still do it here as well.
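A small, hypothetical illustration of that asymmetry (the values are made up; 100 + 100 overflows i8 but not i16):

```llvm
; Forward direction: with an nsw add in the narrow type, the sext can be
; pushed through it, i.e. sext(%x + %y) == sext(%x) + sext(%y).
define i16 @sext_of_add(i8 %x, i8 %y) {
  %add = add nsw i8 %x, %y
  %r = sext i8 %add to i16
  ret i16 %r
}

; The reverse direction does not hold in general: for %x = %y = 100 the wide
; sum 200 satisfies nsw in i16, but "add nsw i8 100, 100" would wrap, so the
; narrow add (and hence the sext-of-add form) cannot be reconstructed.
define i16 @add_of_sext(i8 %x, i8 %y) {
  %xs = sext i8 %x to i16
  %ys = sext i8 %y to i16
  %r = add nsw i16 %xs, %ys
  ret i16 %r
}
```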
Sounds good. Should we go with the current version straight away, or first land the restricted version, which is less likely to introduce any regressions?
Force-pushed from dc4287a to dcf279a.

Commits:
- Add extra test coverage for #142599.
- Simplify zext(C+A)<nsw> -> (sext(C) + zext(A))<nsw> if zext (C + A)<nsw> >=s 0 and A >=s V. For now this is limited to cases where the first operand is a constant, so the SExt can be folded to a new constant. This can be relaxed in the future. Alive2 proof of the general pattern and the test changes in zext-nuw.ll (times out in the online instance but verifies locally): https://alive2.llvm.org/ce/z/_BtyGy
Ah interesting. Let me see if I can pin down where this is coming from.
Hmm, the impact in Clang seems to be down to the extra work of constructing and reasoning about the newly created expression (or additional transforms), not the additional checks for whether the transform is valid. Just doing the analysis without constructing the new SCEV completely removes the compile-time impact for the Clang build: http://llvm-compile-time-tracker.com/compare.php?from=e2639eefaabdfc06adad1a4458b6900d9838e64f&to=011614dacd9e201587843351dd54d1342ad0d622&stat=instructions:u
Not sure what the best next steps would be to get this wrapped up.