SCEV: teach isImpliedViaOperations about samesign #124270

Merged
artagnon merged 2 commits into llvm:main from scev-implied-via-oper-samesign on Feb 6, 2025

Conversation

artagnon
Contributor

Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass samesign information to isImpliedViaOperations, and teach it to call CmpPredicate::getPreferredSignedPredicate, effectively making it optimize with samesign information.
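
For readers unfamiliar with the flag: samesign on an icmp asserts that both operands have the same sign (both non-negative or both negative), which is what allows an unsigned predicate to be treated as its signed counterpart. A minimal hypothetical sketch of the fact this patch lets SCEV derive, mirroring the implied1_samesign test added below (the function name is made up; this is not code from the patch):

declare void @llvm.assume(i1)

define void @samesign_sketch(i32 %n) {
entry:
  ; samesign ugt %n, 1 can be read as sgt %n, 1, since both operands are
  ; known to have the same sign.
  %cmp = icmp samesign ugt i32 %n, 1
  call void @llvm.assume(i1 %cmp)
  %n.div.2 = sdiv i32 %n, 2
  ; With this change, isImpliedViaOperations can conclude %n.div.2 s> 0
  ; here, just as it already could when the assumed predicate was sgt.
  ret void
}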

@artagnon artagnon requested a review from nikic as a code owner January 24, 2025 13:41
@llvmbot llvmbot added the llvm:analysis (Includes value tracking, cost tables and constant folding) label on Jan 24, 2025
@llvmbot
Member

llvmbot commented Jan 24, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Ramkumar Ramachandra (artagnon)

Changes

Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass samesign information to isImpliedViaOperations, and teach it to call CmpPredicate::getPreferredSignedPredicate, effectively making it optimize with samesign information.


Full diff: https://github.com/llvm/llvm-project/pull/124270.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+17-16)
  • (modified) llvm/test/Analysis/ScalarEvolution/implied-via-division.ll (+131-33)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 7d7d37b3d228dd..d3e060e4f2be84 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -11863,15 +11863,13 @@ bool ScalarEvolution::isImpliedCondBalancedTypes(
   }
 
   // Check whether the found predicate is the same as the desired predicate.
-  // FIXME: use CmpPredicate::getMatching here.
-  if (FoundPred == static_cast<CmpInst::Predicate>(Pred))
-    return isImpliedCondOperands(Pred, LHS, RHS, FoundLHS, FoundRHS, CtxI);
+  if (auto P = CmpPredicate::getMatching(FoundPred, Pred))
+    return isImpliedCondOperands(*P, LHS, RHS, FoundLHS, FoundRHS, CtxI);
 
   // Check whether swapping the found predicate makes it the same as the
   // desired predicate.
-  // FIXME: use CmpPredicate::getMatching here.
-  if (ICmpInst::getSwappedCmpPredicate(FoundPred) ==
-      static_cast<CmpInst::Predicate>(Pred)) {
+  if (auto P = CmpPredicate::getMatching(
+          ICmpInst::getSwappedCmpPredicate(FoundPred), Pred)) {
     // We can write the implication
     // 0.  LHS Pred      RHS  <-   FoundLHS SwapPred  FoundRHS
     // using one of the following ways:
@@ -11882,22 +11880,23 @@ bool ScalarEvolution::isImpliedCondBalancedTypes(
     // Forms 1. and 2. require swapping the operands of one condition. Don't
     // do this if it would break canonical constant/addrec ordering.
     if (!isa<SCEVConstant>(RHS) && !isa<SCEVAddRecExpr>(LHS))
-      return isImpliedCondOperands(FoundPred, RHS, LHS, FoundLHS, FoundRHS,
-                                   CtxI);
+      return isImpliedCondOperands(ICmpInst::getSwappedCmpPredicate(*P), RHS,
+                                   LHS, FoundLHS, FoundRHS, CtxI);
     if (!isa<SCEVConstant>(FoundRHS) && !isa<SCEVAddRecExpr>(FoundLHS))
-      return isImpliedCondOperands(Pred, LHS, RHS, FoundRHS, FoundLHS, CtxI);
+      return isImpliedCondOperands(*P, LHS, RHS, FoundRHS, FoundLHS, CtxI);
 
     // There's no clear preference between forms 3. and 4., try both.  Avoid
     // forming getNotSCEV of pointer values as the resulting subtract is
     // not legal.
     if (!LHS->getType()->isPointerTy() && !RHS->getType()->isPointerTy() &&
-        isImpliedCondOperands(FoundPred, getNotSCEV(LHS), getNotSCEV(RHS),
-                              FoundLHS, FoundRHS, CtxI))
+        isImpliedCondOperands(ICmpInst::getSwappedCmpPredicate(*P),
+                              getNotSCEV(LHS), getNotSCEV(RHS), FoundLHS,
+                              FoundRHS, CtxI))
       return true;
 
     if (!FoundLHS->getType()->isPointerTy() &&
         !FoundRHS->getType()->isPointerTy() &&
-        isImpliedCondOperands(Pred, LHS, RHS, getNotSCEV(FoundLHS),
+        isImpliedCondOperands(*P, LHS, RHS, getNotSCEV(FoundLHS),
                               getNotSCEV(FoundRHS), CtxI))
       return true;
 
@@ -12567,14 +12566,16 @@ bool ScalarEvolution::isImpliedViaOperations(CmpPredicate Pred, const SCEV *LHS,
     return false;
 
   // We only want to work with GT comparison so far.
-  if (Pred == ICmpInst::ICMP_ULT || Pred == ICmpInst::ICMP_SLT) {
+  if (ICmpInst::isLT(Pred)) {
     Pred = ICmpInst::getSwappedCmpPredicate(Pred);
     std::swap(LHS, RHS);
     std::swap(FoundLHS, FoundRHS);
   }
 
+  CmpInst::Predicate P = Pred.getPreferredSignedPredicate();
+
   // For unsigned, try to reduce it to corresponding signed comparison.
-  if (Pred == ICmpInst::ICMP_UGT)
+  if (P == ICmpInst::ICMP_UGT)
     // We can replace unsigned predicate with its signed counterpart if all
     // involved values are non-negative.
     // TODO: We could have better support for unsigned.
@@ -12587,10 +12588,10 @@ bool ScalarEvolution::isImpliedViaOperations(CmpPredicate Pred, const SCEV *LHS,
                                 FoundRHS) &&
           isImpliedCondOperands(ICmpInst::ICMP_SGT, RHS, MinusOne, FoundLHS,
                                 FoundRHS))
-        Pred = ICmpInst::ICMP_SGT;
+        P = ICmpInst::ICMP_SGT;
     }
 
-  if (Pred != ICmpInst::ICMP_SGT)
+  if (P != ICmpInst::ICMP_SGT)
     return false;
 
   auto GetOpFromSExt = [&](const SCEV *S) {
diff --git a/llvm/test/Analysis/ScalarEvolution/implied-via-division.ll b/llvm/test/Analysis/ScalarEvolution/implied-via-division.ll
index a1d30406095ec5..d83301243ef30b 100644
--- a/llvm/test/Analysis/ScalarEvolution/implied-via-division.ll
+++ b/llvm/test/Analysis/ScalarEvolution/implied-via-division.ll
@@ -2,12 +2,10 @@
 ; RUN: opt < %s -disable-output -passes="print<scalar-evolution>" \
 ; RUN:   -scalar-evolution-classify-expressions=0 2>&1 | FileCheck %s
 
-declare void @llvm.experimental.guard(i1, ...)
-
-define void @test_1(i32 %n) nounwind {
-; Prove that (n > 1) ===> (n / 2 > 0).
-; CHECK-LABEL: 'test_1'
-; CHECK-NEXT:  Determining loop execution counts for: @test_1
+define void @implied1(i32 %n) {
+; Prove that (n s> 1) ===> (n / 2 s> 0).
+; CHECK-LABEL: 'implied1'
+; CHECK-NEXT:  Determining loop execution counts for: @implied1
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (-1 + %n.div.2)<nsw>
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741822
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (-1 + %n.div.2)<nsw>
@@ -29,10 +27,35 @@ exit:
   ret void
 }
 
-define void @test_1neg(i32 %n) nounwind {
-; Prove that (n > 0) =\=> (n / 2 > 0).
-; CHECK-LABEL: 'test_1neg'
-; CHECK-NEXT:  Determining loop execution counts for: @test_1neg
+define void @implied1_samesign(i32 %n) {
+; Prove that (n > 1) ===> (n / 2 s> 0).
+; CHECK-LABEL: 'implied1_samesign'
+; CHECK-NEXT:  Determining loop execution counts for: @implied1_samesign
+; CHECK-NEXT:  Loop %header: backedge-taken count is (-1 + %n.div.2)<nsw>
+; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741822
+; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (-1 + %n.div.2)<nsw>
+; CHECK-NEXT:  Loop %header: Trip multiple is 1
+;
+entry:
+  %cmp1 = icmp samesign ugt i32 %n, 1
+  %n.div.2 = sdiv i32 %n, 2
+  call void @llvm.assume(i1 %cmp1)
+  br label %header
+
+header:
+  %indvar = phi i32 [ %indvar.next, %header ], [ 0, %entry ]
+  %indvar.next = add i32 %indvar, 1
+  %exitcond = icmp sgt i32 %n.div.2, %indvar.next
+  br i1 %exitcond, label %header, label %exit
+
+exit:
+  ret void
+}
+
+define void @implied1_neg(i32 %n) {
+; Prove that (n s> 0) =\=> (n / 2 s> 0).
+; CHECK-LABEL: 'implied1_neg'
+; CHECK-NEXT:  Determining loop execution counts for: @implied1_neg
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (-1 + (1 smax %n.div.2))<nsw>
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741822
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (-1 + (1 smax %n.div.2))<nsw>
@@ -54,10 +77,10 @@ exit:
   ret void
 }
 
-define void @test_2(i32 %n) nounwind {
-; Prove that (n >= 2) ===> (n / 2 > 0).
-; CHECK-LABEL: 'test_2'
-; CHECK-NEXT:  Determining loop execution counts for: @test_2
+define void @implied2(i32 %n) {
+; Prove that (n s>= 2) ===> (n / 2 s> 0).
+; CHECK-LABEL: 'implied2'
+; CHECK-NEXT:  Determining loop execution counts for: @implied2
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (-1 + %n.div.2)<nsw>
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741822
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (-1 + %n.div.2)<nsw>
@@ -79,10 +102,35 @@ exit:
   ret void
 }
 
-define void @test_2neg(i32 %n) nounwind {
-; Prove that (n >= 1) =\=> (n / 2 > 0).
-; CHECK-LABEL: 'test_2neg'
-; CHECK-NEXT:  Determining loop execution counts for: @test_2neg
+define void @implied2_samesign(i32 %n) {
+; Prove that (n >= 2) ===> (n / 2 s> 0).
+; CHECK-LABEL: 'implied2_samesign'
+; CHECK-NEXT:  Determining loop execution counts for: @implied2_samesign
+; CHECK-NEXT:  Loop %header: backedge-taken count is (-1 + (1 smax %n.div.2))<nsw>
+; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741822
+; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (-1 + (1 smax %n.div.2))<nsw>
+; CHECK-NEXT:  Loop %header: Trip multiple is 1
+;
+entry:
+  %cmp1 = icmp samesign uge i32 %n, 2
+  %n.div.2 = sdiv i32 %n, 2
+  call void @llvm.assume(i1 %cmp1)
+  br label %header
+
+header:
+  %indvar = phi i32 [ %indvar.next, %header ], [ 0, %entry ]
+  %indvar.next = add i32 %indvar, 1
+  %exitcond = icmp sgt i32 %n.div.2, %indvar.next
+  br i1 %exitcond, label %header, label %exit
+
+exit:
+  ret void
+}
+
+define void @implied2_neg(i32 %n) {
+; Prove that (n s>= 1) =\=> (n / 2 s> 0).
+; CHECK-LABEL: 'implied2_neg'
+; CHECK-NEXT:  Determining loop execution counts for: @implied2_neg
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (-1 + (1 smax %n.div.2))<nsw>
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741822
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (-1 + (1 smax %n.div.2))<nsw>
@@ -104,10 +152,10 @@ exit:
   ret void
 }
 
-define void @test_3(i32 %n) nounwind {
-; Prove that (n > -2) ===> (n / 2 >= 0).
-; CHECK-LABEL: 'test_3'
-; CHECK-NEXT:  Determining loop execution counts for: @test_3
+define void @implied3(i32 %n) {
+; Prove that (n s> -2) ===> (n / 2 s>= 0).
+; CHECK-LABEL: 'implied3'
+; CHECK-NEXT:  Determining loop execution counts for: @implied3
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (1 + %n.div.2)<nsw>
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741824
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (1 + %n.div.2)<nsw>
@@ -129,10 +177,35 @@ exit:
   ret void
 }
 
-define void @test_3neg(i32 %n) nounwind {
+define void @implied3_samesign(i32 %n) {
+; Prove that (n > -2) ===> (n / 2 s>= 0).
+; CHECK-LABEL: 'implied3_samesign'
+; CHECK-NEXT:  Determining loop execution counts for: @implied3_samesign
+; CHECK-NEXT:  Loop %header: backedge-taken count is (1 + %n.div.2)<nsw>
+; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1
+; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (1 + %n.div.2)<nsw>
+; CHECK-NEXT:  Loop %header: Trip multiple is 1
+;
+entry:
+  %cmp1 = icmp samesign ugt i32 %n, -2
+  %n.div.2 = sdiv i32 %n, 2
+  call void @llvm.assume(i1 %cmp1)
+  br label %header
+
+header:
+  %indvar = phi i32 [ %indvar.next, %header ], [ 0, %entry ]
+  %indvar.next = add i32 %indvar, 1
+  %exitcond = icmp sge i32 %n.div.2, %indvar
+  br i1 %exitcond, label %header, label %exit
+
+exit:
+  ret void
+}
+
+define void @implied3_neg(i32 %n) {
 ; Prove that (n > -3) =\=> (n / 2 >= 0).
-; CHECK-LABEL: 'test_3neg'
-; CHECK-NEXT:  Determining loop execution counts for: @test_3neg
+; CHECK-LABEL: 'implied3_neg'
+; CHECK-NEXT:  Determining loop execution counts for: @implied3_neg
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (0 smax (1 + %n.div.2)<nsw>)
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741824
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (0 smax (1 + %n.div.2)<nsw>)
@@ -154,10 +227,10 @@ exit:
   ret void
 }
 
-define void @test_4(i32 %n) nounwind {
-; Prove that (n >= -1) ===> (n / 2 >= 0).
-; CHECK-LABEL: 'test_4'
-; CHECK-NEXT:  Determining loop execution counts for: @test_4
+define void @implied4(i32 %n) {
+; Prove that (n s>= -1) ===> (n / 2 s>= 0).
+; CHECK-LABEL: 'implied4'
+; CHECK-NEXT:  Determining loop execution counts for: @implied4
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (1 + %n.div.2)<nsw>
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741824
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (1 + %n.div.2)<nsw>
@@ -179,10 +252,35 @@ exit:
   ret void
 }
 
-define void @test_4neg(i32 %n) nounwind {
-; Prove that (n >= -2) =\=> (n / 2 >= 0).
-; CHECK-LABEL: 'test_4neg'
-; CHECK-NEXT:  Determining loop execution counts for: @test_4neg
+define void @implied4_samesign(i32 %n) {
+; Prove that (n >= -1) ===> (n / 2 s>= 0).
+; CHECK-LABEL: 'implied4_samesign'
+; CHECK-NEXT:  Determining loop execution counts for: @implied4_samesign
+; CHECK-NEXT:  Loop %header: backedge-taken count is (1 + %n.div.2)<nsw>
+; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1
+; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (1 + %n.div.2)<nsw>
+; CHECK-NEXT:  Loop %header: Trip multiple is 1
+;
+entry:
+  %cmp1 = icmp samesign uge i32 %n, -1
+  %n.div.2 = sdiv i32 %n, 2
+  call void @llvm.assume(i1 %cmp1)
+  br label %header
+
+header:
+  %indvar = phi i32 [ %indvar.next, %header ], [ 0, %entry ]
+  %indvar.next = add i32 %indvar, 1
+  %exitcond = icmp sge i32 %n.div.2, %indvar
+  br i1 %exitcond, label %header, label %exit
+
+exit:
+  ret void
+}
+
+define void @implied4_neg(i32 %n) {
+; Prove that (n s>= -2) =\=> (n / 2 s>= 0).
+; CHECK-LABEL: 'implied4_neg'
+; CHECK-NEXT:  Determining loop execution counts for: @implied4_neg
 ; CHECK-NEXT:  Loop %header: backedge-taken count is (0 smax (1 + %n.div.2)<nsw>)
 ; CHECK-NEXT:  Loop %header: constant max backedge-taken count is i32 1073741824
 ; CHECK-NEXT:  Loop %header: symbolic max backedge-taken count is (0 smax (1 + %n.div.2)<nsw>)

@dtcxzyw
Member

dtcxzyw commented Jan 29, 2025

Regression:

; bin/opt -O3 reduced.ll -vectorize-loops=false -S
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define i32 @QRsol(i32 %0, ptr %1) {
  %3 = sub nsw i32 %0, 1
  br label %4

4:                                                ; preds = %15, %2
  %.0 = phi i32 [ %3, %2 ], [ %16, %15 ]
  %5 = icmp sge i32 %.0, 0
  br i1 %5, label %6, label %17

6:                                                ; preds = %4
  %7 = sext i32 %.0 to i64
  %8 = getelementptr double, ptr null, i64 %7
  store double 0.000000e+00, ptr %8, align 8
  br label %9

9:                                                ; preds = %11, %6
  %.08 = phi i32 [ 0, %6 ], [ %14, %11 ]
  %10 = icmp slt i32 %.08, %.0
  br i1 %10, label %11, label %15

11:                                               ; preds = %9
  %12 = sext i32 %.08 to i64
  %13 = getelementptr double, ptr %1, i64 %12
  store double 1.000000e+00, ptr %13, align 8
  %14 = add i32 %.08, 1
  br label %9

15:                                               ; preds = %9
  %16 = add i32 %.0, -1
  br label %4

17:                                               ; preds = %4
  ret i32 0
}

Before:

define noundef i32 @QRsol(i32 %0, ptr nocapture writeonly %1) local_unnamed_addr #0 {
  %3 = icmp sgt i32 %0, 0
  br i1 %3, label %.lr.ph4.preheader, label %._crit_edge

.lr.ph4.preheader:                                ; preds = %2
  %4 = zext nneg i32 %0 to i64
  br label %.lr.ph4

.loopexit:                                        ; preds = %.lr.ph
  %5 = icmp sgt i64 %indvars.iv6, 1
  br i1 %5, label %.lr.ph4, label %._crit_edge

.lr.ph4:                                          ; preds = %.lr.ph4.preheader, %.loopexit
  %indvars.iv6 = phi i64 [ %4, %.lr.ph4.preheader ], [ %indvars.iv.next7, %.loopexit ]
  %indvars.iv.next7 = add nsw i64 %indvars.iv6, -1
  %6 = icmp samesign ugt i64 %indvars.iv6, 1
  br i1 %6, label %.lr.ph, label %._crit_edge

.lr.ph:                                           ; preds = %.lr.ph4, %.lr.ph
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %.lr.ph4 ]
  %7 = getelementptr double, ptr %1, i64 %indvars.iv
  store double 1.000000e+00, ptr %7, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %8 = icmp slt i64 %indvars.iv.next, %indvars.iv.next7
  br i1 %8, label %.lr.ph, label %.loopexit

._crit_edge:                                      ; preds = %.loopexit, %.lr.ph4, %2
  ret i32 0
}

After:

define noundef i32 @QRsol(i32 %0, ptr nocapture writeonly %1) local_unnamed_addr #0 {
  %3 = icmp sgt i32 %0, 0
  br i1 %3, label %.lr.ph4.preheader, label %._crit_edge

.lr.ph4.preheader:                                ; preds = %2
  %4 = add nsw i32 %0, -1
  %5 = zext nneg i32 %0 to i64
  %6 = zext nneg i32 %4 to i64
  br label %.lr.ph4

.loopexit:                                        ; preds = %.lr.ph
  %7 = icmp sgt i64 %indvars.iv10, 1
  %indvars.iv.next9 = add nsw i64 %indvars.iv8, -1
  br i1 %7, label %.lr.ph4, label %._crit_edge

.lr.ph4:                                          ; preds = %.lr.ph4.preheader, %.loopexit
  %indvars.iv10 = phi i64 [ %5, %.lr.ph4.preheader ], [ %indvars.iv.next11, %.loopexit ]
  %indvars.iv8 = phi i64 [ %6, %.lr.ph4.preheader ], [ %indvars.iv.next9, %.loopexit ]
  %indvars.iv.next11 = add nsw i64 %indvars.iv10, -1
  %8 = icmp samesign ugt i64 %indvars.iv10, 1
  br i1 %8, label %.lr.ph, label %._crit_edge

.lr.ph:                                           ; preds = %.lr.ph4, %.lr.ph
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %.lr.ph4 ]
  %9 = getelementptr double, ptr %1, i64 %indvars.iv
  store double 1.000000e+00, ptr %9, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %indvars.iv8
  br i1 %exitcond.not, label %.loopexit, label %.lr.ph

._crit_edge:                                      ; preds = %.loopexit, %.lr.ph4, %2
  ret i32 0
}

@artagnon
Contributor Author

I'm confused about where the regression is coming from because the output from opt -passes='print<scalar-evolution>' is identical (?)

@dtcxzyw
Member

dtcxzyw commented Jan 29, 2025

I'm confused about where the regression is coming from because the output from opt -passes='print<scalar-evolution>' is identical (?)

Further reduced case:

; bin/opt -passes="print<scalar-evolution>,indvars" reduced.ll -S
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define i32 @QRsol(i32 %0, ptr nocapture writeonly %1) local_unnamed_addr {
  %3 = icmp sgt i32 %0, 0
  br i1 %3, label %.lr.ph4.preheader, label %._crit_edge

.lr.ph4.preheader:                                ; preds = %2
  br label %.lr.ph4

.loopexit.loopexit:                               ; preds = %.lr.ph
  br label %.loopexit

.loopexit:                                        ; preds = %.loopexit.loopexit, %.lr.ph4
  %4 = icmp sgt i32 %.03.in, 1
  br i1 %4, label %.lr.ph4, label %._crit_edge.loopexit
.lr.ph4:                                          ; preds = %.lr.ph4.preheader, %.loopexit
  %.03.in = phi i32 [ %.03, %.loopexit ], [ %0, %.lr.ph4.preheader ]
  %.03 = add nsw i32 %.03.in, -1
  %5 = zext nneg i32 %.03 to i64
  %6 = getelementptr double, ptr null, i64 %5
  store double poison, ptr %6, align 8
  %7 = icmp samesign ugt i32 %.03.in, 1
  br i1 %7, label %.lr.ph.preheader, label %.loopexit
.lr.ph.preheader:                                 ; preds = %.lr.ph4
  br label %.lr.ph
.lr.ph:                                           ; preds = %.lr.ph.preheader, %.lr.ph
  %.081 = phi i32 [ %10, %.lr.ph ], [ 0, %.lr.ph.preheader ]
  %8 = zext nneg i32 %.081 to i64
  %9 = getelementptr double, ptr %1, i64 %8
  store double 1.000000e+00, ptr %9, align 8
  %10 = add nuw nsw i32 %.081, 1
  %11 = icmp slt i32 %10, %.03
  br i1 %11, label %.lr.ph, label %.loopexit.loopexit
._crit_edge.loopexit:                             ; preds = %.loopexit
  br label %._crit_edge
._crit_edge:                                      ; preds = %._crit_edge.loopexit, %2
  ret i32 0
}

Before:

Classifying expressions for: @QRsol
  %.03.in = phi i32 [ %.03, %.loopexit ], [ %0, %.lr.ph4.preheader ]
  -->  {%0,+,-1}<nsw><%.lr.ph4> U: full-set S: full-set         Exits: 1                LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %.03 = add nsw i32 %.03.in, -1
  -->  {(-1 + %0),+,-1}<nsw><%.lr.ph4> U: full-set S: full-set          Exits: 0                LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %5 = zext nneg i32 %.03 to i64
  -->  {(zext i32 (-1 + %0) to i64),+,-1}<nsw><%.lr.ph4> U: [-2147483646,4294967296) S: [-2147483646,4294967296)                Exits: 0                LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %6 = getelementptr double, ptr null, i64 %5
  -->  {((8 * (zext i32 (-1 + %0) to i64))<nuw><nsw> + null),+,-8}<nw><%.lr.ph4> U: [0,-7) S: [-17179869168,34359738361)                Exits: null        LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %.081 = phi i32 [ %10, %.lr.ph ], [ 0, %.lr.ph.preheader ]
  -->  {0,+,1}<nuw><nsw><%.lr.ph> U: [0,2147483647) S: [0,2147483647)           Exits: (-1 + (1 smax {(-1 + %0),+,-1}<nsw><%.lr.ph4>))<nsw>             LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
  %8 = zext nneg i32 %.081 to i64
  -->  {0,+,1}<nuw><nsw><%.lr.ph> U: [0,2147483647) S: [0,2147483647)           Exits: (zext i32 (-1 + (1 smax {(-1 + %0),+,-1}<nsw><%.lr.ph4>))<nsw> to i64)               LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
  %9 = getelementptr double, ptr %1, i64 %8
  -->  {%1,+,8}<nw><%.lr.ph> U: full-set S: full-set            Exits: ((8 * (zext i32 (-1 + (1 smax {(-1 + %0),+,-1}<nsw><%.lr.ph4>))<nsw> to i64))<nuw><nsw> + %1)                LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
  %10 = add nuw nsw i32 %.081, 1
  -->  {1,+,1}<nuw><nsw><%.lr.ph> U: [1,-2147483648) S: [1,-2147483648)         Exits: (1 smax {(-1 + %0),+,-1}<nsw><%.lr.ph4>)         LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
Determining loop execution counts for: @QRsol
Loop %.lr.ph: backedge-taken count is (-1 + (1 smax {(-1 + %0),+,-1}<nsw><%.lr.ph4>))<nsw>
Loop %.lr.ph: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph: symbolic max backedge-taken count is (-1 + (1 smax {(-1 + %0),+,-1}<nsw><%.lr.ph4>))<nsw>
Loop %.lr.ph: Trip multiple is 1
Loop %.lr.ph4: backedge-taken count is (-1 + %0)
Loop %.lr.ph4: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph4: symbolic max backedge-taken count is (-1 + %0)
Loop %.lr.ph4: Trip multiple is 1
; ModuleID = 'test.ll'
source_filename = "test.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define i32 @QRsol(i32 %0, ptr nocapture writeonly %1) local_unnamed_addr {
  %3 = icmp sgt i32 %0, 0
  br i1 %3, label %.lr.ph4.preheader, label %._crit_edge

.lr.ph4.preheader:                                ; preds = %2
  %4 = sext i32 %0 to i64
  br label %.lr.ph4

.loopexit.loopexit:                               ; preds = %.lr.ph
  br label %.loopexit

.loopexit:                                        ; preds = %.lr.ph4, %.loopexit.loopexit
  %5 = icmp sgt i64 %indvars.iv2, 1
  br i1 %5, label %.lr.ph4, label %._crit_edge.loopexit

.lr.ph4:                                          ; preds = %.loopexit, %.lr.ph4.preheader
  %indvars.iv2 = phi i64 [ %indvars.iv.next3, %.loopexit ], [ %4, %.lr.ph4.preheader ]
  %indvars.iv.next3 = add nsw i64 %indvars.iv2, -1
  %6 = getelementptr double, ptr null, i64 %indvars.iv.next3
  store double poison, ptr %6, align 8
  %7 = icmp samesign ugt i64 %indvars.iv2, 1
  br i1 %7, label %.lr.ph.preheader, label %.loopexit

.lr.ph.preheader:                                 ; preds = %.lr.ph4
  br label %.lr.ph

.lr.ph:                                           ; preds = %.lr.ph, %.lr.ph.preheader
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %.lr.ph.preheader ]
  %8 = getelementptr double, ptr %1, i64 %indvars.iv
  store double 1.000000e+00, ptr %8, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %9 = icmp slt i64 %indvars.iv.next, %indvars.iv.next3
  br i1 %9, label %.lr.ph, label %.loopexit.loopexit

._crit_edge.loopexit:                             ; preds = %.loopexit
  br label %._crit_edge

._crit_edge:                                      ; preds = %._crit_edge.loopexit, %2
  ret i32 0
}

After:

Printing analysis 'Scalar Evolution Analysis' for function 'QRsol':
Classifying expressions for: @QRsol
  %.03.in = phi i32 [ %.03, %.loopexit ], [ %0, %.lr.ph4.preheader ]
  -->  {%0,+,-1}<nsw><%.lr.ph4> U: full-set S: full-set         Exits: 1                LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %.03 = add nsw i32 %.03.in, -1
  -->  {(-1 + %0),+,-1}<nsw><%.lr.ph4> U: full-set S: full-set          Exits: 0                LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %5 = zext nneg i32 %.03 to i64
  -->  {(zext i32 (-1 + %0) to i64),+,-1}<nsw><%.lr.ph4> U: [-2147483646,4294967296) S: [-2147483646,4294967296)                Exits: 0                LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %6 = getelementptr double, ptr null, i64 %5
  -->  {((8 * (zext i32 (-1 + %0) to i64))<nuw><nsw> + null),+,-8}<nw><%.lr.ph4> U: [0,-7) S: [-17179869168,34359738361)                Exits: null        LoopDispositions: { %.lr.ph4: Computable, %.lr.ph: Invariant }
  %.081 = phi i32 [ %10, %.lr.ph ], [ 0, %.lr.ph.preheader ]
  -->  {0,+,1}<nuw><nsw><%.lr.ph> U: [0,2147483647) S: [0,2147483647)           Exits: {(-2 + %0),+,-1}<nw><%.lr.ph4>           LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
  %8 = zext nneg i32 %.081 to i64
  -->  {0,+,1}<nuw><nsw><%.lr.ph> U: [0,2147483647) S: [0,2147483647)           Exits: (zext i32 {(-2 + %0),+,-1}<nw><%.lr.ph4> to i64)         LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
  %9 = getelementptr double, ptr %1, i64 %8
  -->  {%1,+,8}<nw><%.lr.ph> U: full-set S: full-set            Exits: ((8 * (zext i32 {(-2 + %0),+,-1}<nw><%.lr.ph4> to i64))<nuw><nsw> + %1)          LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
  %10 = add nuw nsw i32 %.081, 1
  -->  {1,+,1}<nuw><nsw><%.lr.ph> U: [1,-2147483648) S: [1,-2147483648)         Exits: {(-1 + %0),+,-1}<nsw><%.lr.ph4>          LoopDispositions: { %.lr.ph: Computable, %.lr.ph4: Variant }
Determining loop execution counts for: @QRsol
Loop %.lr.ph: backedge-taken count is {(-2 + %0),+,-1}<nw><%.lr.ph4>
Loop %.lr.ph: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph: symbolic max backedge-taken count is {(-2 + %0),+,-1}<nw><%.lr.ph4>
Loop %.lr.ph: Trip multiple is 1
Loop %.lr.ph4: backedge-taken count is (-1 + %0)
Loop %.lr.ph4: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph4: symbolic max backedge-taken count is (-1 + %0)
Loop %.lr.ph4: Trip multiple is 1
; ModuleID = 'test.ll'
source_filename = "test.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define i32 @QRsol(i32 %0, ptr nocapture writeonly %1) local_unnamed_addr {
  %3 = icmp sgt i32 %0, 0
  br i1 %3, label %.lr.ph4.preheader, label %._crit_edge

.lr.ph4.preheader:                                ; preds = %2
  %4 = add i32 %0, -1
  %5 = zext i32 %4 to i64
  %6 = zext i32 %0 to i64
  br label %.lr.ph4

.loopexit.loopexit:                               ; preds = %.lr.ph
  br label %.loopexit

.loopexit:                                        ; preds = %.lr.ph4, %.loopexit.loopexit
  %7 = icmp sgt i64 %indvars.iv6, 1
  %indvars.iv.next5 = add nsw i64 %indvars.iv4, -1
  br i1 %7, label %.lr.ph4, label %._crit_edge.loopexit

.lr.ph4:                                          ; preds = %.loopexit, %.lr.ph4.preheader
  %indvars.iv6 = phi i64 [ %indvars.iv.next7, %.loopexit ], [ %6, %.lr.ph4.preheader ]
  %indvars.iv4 = phi i64 [ %indvars.iv.next5, %.loopexit ], [ %5, %.lr.ph4.preheader ]
  %indvars.iv.next7 = add nsw i64 %indvars.iv6, -1
  %8 = getelementptr double, ptr null, i64 %indvars.iv.next7
  store double poison, ptr %8, align 8
  %9 = icmp samesign ugt i64 %indvars.iv6, 1
  br i1 %9, label %.lr.ph.preheader, label %.loopexit

.lr.ph.preheader:                                 ; preds = %.lr.ph4
  br label %.lr.ph

.lr.ph:                                           ; preds = %.lr.ph, %.lr.ph.preheader
  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %.lr.ph.preheader ]
  %10 = getelementptr double, ptr %1, i64 %indvars.iv
  store double 1.000000e+00, ptr %10, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, %indvars.iv4
  br i1 %exitcond, label %.lr.ph, label %.loopexit.loopexit

._crit_edge.loopexit:                             ; preds = %.loopexit
  br label %._crit_edge

._crit_edge:                                      ; preds = %._crit_edge.loopexit, %2
  ret i32 0
}

@artagnon
Contributor Author

artagnon commented Jan 29, 2025

Further reduced case:

Yes, after loop-simplify.

Before:
Loop %.lr.ph: backedge-taken count is (-1 + (1 smax {(-1 + %0),+,-1}<%.lr.ph4>))
Loop %.lr.ph: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph: symbolic max backedge-taken count is (-1 + (1 smax {(-1 + %0),+,-1}<%.lr.ph4>))
Loop %.lr.ph: Trip multiple is 1
Loop %.lr.ph4: backedge-taken count is (-1 + %0)
Loop %.lr.ph4: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph4: symbolic max backedge-taken count is (-1 + %0)
Loop %.lr.ph4: Trip multiple is 1

After:
Loop %.lr.ph: backedge-taken count is {(-2 + %0),+,-1}<%.lr.ph4>
Loop %.lr.ph: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph: symbolic max backedge-taken count is {(-2 + %0),+,-1}<%.lr.ph4>
Loop %.lr.ph: Trip multiple is 1
Loop %.lr.ph4: backedge-taken count is (-1 + %0)
Loop %.lr.ph4: constant max backedge-taken count is i32 2147483646
Loop %.lr.ph4: symbolic max backedge-taken count is (-1 + %0)
Loop %.lr.ph4: Trip multiple is 1

Isn't the symbolic-BTC more refined? Have to think about why this introduces a regression.

@artagnon
Contributor Author

I think the issue is in IndVarSimplify's handling of the icmp: the code around the call to SCEV's getLoopInvariantPredicate seems not to handle samesign properly.

@artagnon
Contributor Author

Isn't the symbolic-BTC more refined? Have to think about why this introduces a regression.

After some more investigation, it appears that SCEVExpander's isHighCostExpansion returns false now, and the LFTR in IndVarSimplify kicks in and replaces things. Still have to figure out how to avoid the regression.

@artagnon
Contributor Author

artagnon commented Feb 5, 2025

Okay, I've finished my investigation, and I believe the regression is orthogonal to samesign support: it is purely due to the LFTR logic in IndVarSimplify when it attempts to optimize loop exits.

  %8 = icmp slt i64 %indvars.iv.next, %indvars.iv.next7

was changed to:

  %exitcond.not = icmp eq i64 %indvars.iv.next, %indvars.iv8

In the first case, SCEVExpander::isHighCostExpansion returned true, and the LFTR logic was disabled. In the second case, since we now have a better exit count, isHighCostExpansion returned false, and the LFTR logic kicked in, changing the exit condition and adding extra instructions to compensate.

I'm not sure what we can do about this, because isHighCostExpansion is quite ad-hoc about costing, so I would say that we have to live with this regression unless there is a fundamental change in how the LFTR works. As a consolation, I improved IndVarSimplify in #125764 and #125828.

I think this patch is good to go as-is, although I will rebase it before landing.

@nikic
Contributor

nikic commented Feb 5, 2025

Yeah, there is a general problem with LFTR being too aggressive. We might want to disable it for the case where it just wants to canonicalize the predicate. Introducing extra instructions for the trip count calculation in that case doesn't seem worthwhile (unlike for example the case where we can replace an and/or of multiple conditions with a single one).

@artagnon
Contributor Author

artagnon commented Feb 5, 2025

Yeah, there is a general problem with LFTR being too aggressive. We might want to disable it for the case where it just wants to canonicalize the predicate. Introducing extra instructions for the trip count calculation in that case doesn't seem worthwhile (unlike for example the case where we can replace an and/or of multiple conditions with a single one).

If I understand correctly, this is quite non-trivial, as we don't immediately have information about whether the LFTR is just canonicalizing a predicate or performing additional simplifications as well.

Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass
samesign information to isImpliedViaOperations, and teach it to call
CmpPredicate::getPreferredSignedPredicate, effectively making it
optimize with samesign information.
@artagnon artagnon force-pushed the scev-implied-via-oper-samesign branch from e7d8606 to c485e50 on February 6, 2025 11:58
@artagnon
Contributor Author

artagnon commented Feb 6, 2025

Rebased and ready to review. It's clear that the regressions are coming from the LFTR being too aggressive.

Contributor

@nikic nikic left a comment


LGTM, I agree we don't need to block this on LFTR regressions.

We can check if something along the lines of #126086 would be worthwhile separately.

@artagnon artagnon merged commit f5d24e6 into llvm:main Feb 6, 2025
9 checks passed
@artagnon artagnon deleted the scev-implied-via-oper-samesign branch February 6, 2025 18:14
@mikaelholmen
Collaborator

Hi,

I bisected (what I think is) a miscompile back to this patch.

Reduced indvars reproducer:
opt -passes="indvars" bbi-103889.ll -S -o -

The problem is that with this patch it changes the input
%add = add i16 %0, 2
to
%add = add nuw nsw i16 %1, 2
but we're iterating from 32760 to 32780, doing "ult" comparisons, so we do get signed wrap around in the add.

bbi-103889.ll.gz
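
To make the reported wrap concrete, here is a hypothetical reduction sketched from the description above (names invented; this is not the attached bbi-103889.ll):

define void @wrap_sketch() {
entry:
  br label %loop

loop:
  %iv = phi i16 [ 32760, %entry ], [ %add, %loop ]
  ; The signed i16 range ends at 32767, so for example 32766 + 2 reads as
  ; -32768 when interpreted as signed; an nsw flag on this add therefore
  ; produces poison, even though the unsigned ult exit test below is
  ; well-defined.
  %add = add i16 %iv, 2
  %cmp = icmp ult i16 %add, -32756   ; -32756 is 32780 as an unsigned i16
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}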

@artagnon
Contributor Author

Hi,

I bisected (what I think is) a miscompile back to this patch.

Reduced indvars reproducer: opt -passes="indvars" bbi-103889.ll -S -o -

The problem is that with this patch it changes the input %add = add i16 %0, 2 to %add = add nuw nsw i16 %1, 2 but we're iterating from 32760 to 32780, doing "ult" comparisons, so we do get signed wrap around in the add.

bbi-103889.ll.gz

I think #126409 has a reduced test case. Currently verifying that this patch is buggy: will revert and investigate, as it's not clear what's wrong with the patch.

artagnon added a commit to artagnon/llvm-project that referenced this pull request Feb 10, 2025
The commit f5d24e6 is buggy, and following miscompiles have been
reported: llvm#126409 and
llvm#124270 (comment)

Revert it while we investigate.
@artagnon
Contributor Author

Hi,
I bisected (what I think is) a miscompile back to this patch.
Reduced indvars reproducer: opt -passes="indvars" bbi-103889.ll -S -o -
The problem is that with this patch it changes the input %add = add i16 %0, 2 to %add = add nuw nsw i16 %1, 2 but we're iterating from 32760 to 32780, doing "ult" comparisons, so we do get signed wrap around in the add.
bbi-103889.ll.gz

I think #126409 has a reduced test case. Currently verifying that this patch is buggy: will revert and investigate, as it's not clear what's wrong with the patch.

Thanks for reporting the miscompile. See #126506.

artagnon added a commit that referenced this pull request Feb 10, 2025
The commit f5d24e6 is buggy, and following miscompiles have been
reported: #126409 and
#124270 (comment)

Revert it while we investigate.
@mikaelholmen
Collaborator

Hi,
I bisected (what I think is) a miscompile back to this patch.
Reduced indvars reproducer: opt -passes="indvars" bbi-103889.ll -S -o -
The problem is that with this patch it changes the input %add = add i16 %0, 2 to %add = add nuw nsw i16 %1, 2 but we're iterating from 32760 to 32780, doing "ult" comparisons, so we do get signed wrap around in the add.
bbi-103889.ll.gz

I think #126409 has a reduced test case. Currently verifying that this patch is buggy: will revert and investigate, as it's not clear what's wrong with the patch.

Thanks for reporting the miscompile. See #126506.

Thanks!

github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 10, 2025
… (#126506)

The commit f5d24e6 is buggy, and following miscompiles have been
reported: #126409 and
llvm/llvm-project#124270 (comment)

Revert it while we investigate.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass
samesign information to isImpliedViaOperations, and teach it to call
CmpPredicate::getPreferredSignedPredicate, effectively making it
optimize with samesign information.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
The commit f5d24e6 is buggy, and following miscompiles have been
reported: llvm#126409 and
llvm#124270 (comment)

Revert it while we investigate.
dtcxzyw pushed a commit to dtcxzyw/llvm-project that referenced this pull request Mar 31, 2025
Use CmpPredicate::getMatching in isImpliedCondBalancedTypes to pass
samesign information to isImpliedViaOperations, and teach it to call
CmpPredicate::getPreferredSignedPredicate, effectively making it
optimize with samesign information.
dtcxzyw added a commit that referenced this pull request Apr 2, 2025
This patch relands #124270.
Closes #126409.

The root cause is that we incorrectly preserve the samesign flag after
truncating operands of an icmp:
https://alive2.llvm.org/ce/z/4NE9gS

---------

Co-authored-by: Ramkumar Ramachandra <[email protected]>
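
For intuition on that root cause, consider a hypothetical example (not the alive2 test itself): samesign is justified on a wide compare whose operands are both non-negative, but truncation can leave the narrowed operands with different signs, so the flag has to be dropped rather than copied over.

define i1 @samesign_trunc_pitfall(i32 %a, i32 %b) {
  ; Suppose %a = 40000 and %b = 1000: both are non-negative as i32, so
  ; samesign is valid on the wide compare.
  %wide = icmp samesign ult i32 %a, %b
  %a.t = trunc i32 %a to i16   ; 40000 truncates to -25536 (negative)
  %b.t = trunc i32 %b to i16   ; 1000 stays 1000 (non-negative)
  ; The truncated operands no longer have the same sign, so carrying
  ; samesign over to this narrow compare would assert a property that
  ; does not hold; that is the miscompile described above.
  %narrow = icmp ult i16 %a.t, %b.t
  ret i1 %narrow
}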
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
… (#133711)

This patch relands llvm/llvm-project#124270.
Closes llvm/llvm-project#126409.

The root cause is that we incorrectly preserve the samesign flag after
truncating operands of an icmp:
https://alive2.llvm.org/ce/z/4NE9gS

---------

Co-authored-by: Ramkumar Ramachandra <[email protected]>