ValueTracking: simplify udiv/urem recurrences #108973

artagnon · 2024-09-17T13:02:59Z

A urem recurrence has the property that the result can never exceed the start value. A udiv recurrence has the property that the result can never exceed either the start value or the numerator, whichever is greater. Implement a simplification based on these properties.

llvmbot · 2024-09-18T21:04:37Z

@llvm/pr-subscribers-llvm-analysis

Author: Ramkumar Ramachandra (artagnon)

Changes

matchSimpleRecurrence only misses recurrence cases for UDiv, SDiv, URem, and SRem, as no meaningful simplification can be performed on an Xor recurrence using the KnownBits infrastructure; Xor with zero is already simplified. Hence, write simplifications for UDiv, SDiv, URem and SRem recurrences using KnownBits, completing the function.

-- 8< --
Based on #109198.

Full diff: https://github.com/llvm/llvm-project/pull/108973.diff

2 Files Affected:

(modified) llvm/lib/Analysis/ValueTracking.cpp (+47-27)
(modified) llvm/test/Analysis/ValueTracking/recurrence-knownbits.ll (+200)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index ba3ba7cc98136d..f19397d6810d03 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1451,34 +1451,34 @@ static void computeKnownBitsFromOperator(const Operator *I,
         };
       }
 
-      // Check for operations that have the property that if
-      // both their operands have low zero bits, the result
-      // will have low zero bits.
-      if (Opcode == Instruction::Add ||
-          Opcode == Instruction::Sub ||
-          Opcode == Instruction::And ||
-          Opcode == Instruction::Or ||
-          Opcode == Instruction::Mul) {
-        // Change the context instruction to the "edge" that flows into the
-        // phi. This is important because that is where the value is actually
-        // "evaluated" even though it is used later somewhere else. (see also
-        // D69571).
-        SimplifyQuery RecQ = Q.getWithoutCondContext();
+      // Change the context instruction to the "edge" that flows into the
+      // phi. This is important because that is where the value is actually
+      // "evaluated" even though it is used later somewhere else. (see also
+      // D69571).
+      SimplifyQuery RecQ = Q.getWithoutCondContext();
 
-        unsigned OpNum = P->getOperand(0) == R ? 0 : 1;
-        Instruction *RInst = P->getIncomingBlock(OpNum)->getTerminator();
-        Instruction *LInst = P->getIncomingBlock(1 - OpNum)->getTerminator();
+      unsigned OpNum = P->getOperand(0) == R ? 0 : 1;
+      Instruction *RInst = P->getIncomingBlock(OpNum)->getTerminator();
+      Instruction *LInst = P->getIncomingBlock(1 - OpNum)->getTerminator();
 
-        // Ok, we have a PHI of the form L op= R. Check for low
-        // zero bits.
-        RecQ.CxtI = RInst;
-        computeKnownBits(R, DemandedElts, Known2, Depth + 1, RecQ);
+      // Ok, we have a PHI of the form L op= R. R and Known2 correspond to the
+      // start value.
+      RecQ.CxtI = RInst;
+      computeKnownBits(R, DemandedElts, Known2, Depth + 1, RecQ);
 
-        // We need to take the minimum number of known bits
-        KnownBits Known3(BitWidth);
-        RecQ.CxtI = LInst;
-        computeKnownBits(L, DemandedElts, Known3, Depth + 1, RecQ);
+      KnownBits Known3(BitWidth);
+      RecQ.CxtI = LInst;
+      computeKnownBits(L, DemandedElts, Known3, Depth + 1, RecQ);
 
+      switch (Opcode) {
+      // Check for operations that have the property that if both their operands
+      // have low zero bits, the result will have low zero bits.
+      case Instruction::Add:
+      case Instruction::Sub:
+      case Instruction::And:
+      case Instruction::Or:
+      case Instruction::Mul: {
+        // We need to take the minimum number of known bits
         Known.Zero.setLowBits(std::min(Known2.countMinTrailingZeros(),
                                        Known3.countMinTrailingZeros()));
 
@@ -1514,9 +1514,26 @@ static void computeKnownBitsFromOperator(const Operator *I,
                    Known3.isNonNegative())
             Known.makeNonNegative();
         }
-
         break;
       }
+      case Instruction::UDiv:
+      case Instruction::URem:
+         // Result cannot be larger than start value.
+         Known.Zero.setHighBits(Known2.countMinLeadingZeros());
+         break;
+      case Instruction::SDiv:
+      case Instruction::SRem: {
+        // Magnitude of result cannot be larger than that of start value.
+        if (Known2.isNonNegative())
+          Known.Zero.setHighBits(Known2.countMinLeadingZeros());
+        else if (Known2.isNegative())
+          Known.One.setHighBits(Known2.countMinLeadingOnes());
+        break;
+      }
+      default:
+        break;
+      }
+      break;
     }
 
     // Unreachable blocks may have zero-operand PHI nodes.
@@ -8968,7 +8985,6 @@ bool llvm::matchSimpleRecurrence(const PHINode *P, BinaryOperator *&BO,
     switch (Opcode) {
     default:
       continue;
-    // TODO: Expand list -- xor, div, gep, uaddo, etc..
     case Instruction::LShr:
     case Instruction::AShr:
     case Instruction::Shl:
@@ -8977,7 +8993,11 @@ bool llvm::matchSimpleRecurrence(const PHINode *P, BinaryOperator *&BO,
     case Instruction::And:
     case Instruction::Or:
     case Instruction::Mul:
-    case Instruction::FMul: {
+    case Instruction::FMul:
+    case Instruction::UDiv:
+    case Instruction::SDiv:
+    case Instruction::URem:
+    case Instruction::SRem: {
       Value *LL = LU->getOperand(0);
       Value *LR = LU->getOperand(1);
       // Find a recurrence.
diff --git a/llvm/test/Analysis/ValueTracking/recurrence-knownbits.ll b/llvm/test/Analysis/ValueTracking/recurrence-knownbits.ll
index 3355328cad9ecf..cdf55ba6e92f0a 100644
--- a/llvm/test/Analysis/ValueTracking/recurrence-knownbits.ll
+++ b/llvm/test/Analysis/ValueTracking/recurrence-knownbits.ll
@@ -81,6 +81,206 @@ exit:
   ret i64 %res
 }
 
+define i64 @test_udiv(i1 %c) {
+; CHECK-LABEL: @test_udiv(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [9, %entry], [%iv.next, %loop]
+  %iv.next = udiv i64 %iv, 3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 16
+  ret i64 %res
+}
+
+define i64 @test_sdiv(i1 %c) {
+; CHECK-LABEL: @test_sdiv(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 16
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [-9, %entry], [%iv.next, %loop]
+  %iv.next = sdiv i64 %iv, -3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 16
+  ret i64 %res
+}
+
+define i64 @test_sdiv2(i1 %c) {
+; CHECK-LABEL: @test_sdiv2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 16
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [-9, %entry], [%iv.next, %loop]
+  %iv.next = sdiv i64 %iv, 3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 16
+  ret i64 %res
+}
+
+define i64 @test_sdiv3(i1 %c) {
+; CHECK-LABEL: @test_sdiv3(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [9, %entry], [%iv.next, %loop]
+  %iv.next = sdiv i64 %iv, -3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, -16
+  ret i64 %res
+}
+
+define i64 @test_urem(i1 %c) {
+; CHECK-LABEL: @test_urem(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [3, %entry], [%iv.next, %loop]
+  %iv.next = urem i64 9, %iv
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 4
+  ret i64 %res
+}
+
+define i64 @test_srem(i1 %c) {
+; CHECK-LABEL: @test_srem(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 16
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [-9, %entry], [%iv.next, %loop]
+  %iv.next = srem i64 %iv, -3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 16
+  ret i64 %res
+}
+
+define i64 @test_srem2(i1 %c) {
+; CHECK-LABEL: @test_srem2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 16
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [-9, %entry], [%iv.next, %loop]
+  %iv.next = srem i64 %iv, 3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 16
+  ret i64 %res
+}
+
+define i64 @test_srem2_flipped(i1 %c) {
+; CHECK-LABEL: @test_srem2_flipped(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [3, %entry], [%iv.next, %loop]
+  %iv.next = srem i64 -9, %iv
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, 4
+  ret i64 %res
+}
+
+define i64 @test_srem3(i1 %c) {
+; CHECK-LABEL: @test_srem3(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 0
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [9, %entry], [%iv.next, %loop]
+  %iv.next = srem i64 %iv, -3
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, -16
+  ret i64 %res
+}
+
+define i64 @test_srem3_flipped(i1 %c) {
+; CHECK-LABEL: @test_srem3_flipped(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 [[C:%.*]], label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret i64 -4
+;
+entry:
+  br label %loop
+loop:
+  %iv = phi i64 [-3, %entry], [%iv.next, %loop]
+  %iv.next = srem i64 9, %iv
+  br i1 %c, label %exit, label %loop
+exit:
+  %res = and i64 %iv, -4
+  ret i64 %res
+}
+
 define i64 @test_and(i1 %c) {
 ; CHECK-LABEL: @test_and(
 ; CHECK-NEXT:  entry:

github-actions · 2024-09-18T21:07:31Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/Analysis/ValueTracking.cpp

artagnon · 2024-09-19T09:35:15Z

Sorry, this is incorrect, and there are subtle problems with it. Is it possible to experiment with an Alive2 proof for this example? (I believe Alive2 doesn't handle loops)

dtcxzyw · 2024-09-19T09:38:12Z

Sorry, this is incorrect, and there are subtle problems with it. Is it possible to experiment with an Alive2 proof for this example? (I believe Alive2 doesn't handle loops)

You can try with -src-unroll/-tgt-unroll.

artagnon · 2024-10-07T09:56:57Z

As far as I can tell, there is no possible sdiv/srem recurrence simplification: with negative values, the result alternates between positive and negative values. I've now implemented just udiv/urem simplification, and I think it is simple enough that the patch should be good.

llvm/lib/Analysis/ValueTracking.cpp

artagnon · 2024-10-14T11:19:58Z

Gentle ping.

artagnon · 2024-10-21T09:10:21Z

Gentle ping.

llvm/lib/Analysis/ValueTracking.cpp

artagnon · 2024-11-04T11:33:52Z

Gentle ping.

artagnon · 2024-11-04T13:52:51Z

This patch is a very minor improvement, but an improvement nevertheless. I don't see it regressing compile-time, so it should be good, no?

nikic

Can this code be integrated in the shift handling above? It seems pretty similar -- but not quite the same, and I'm suspicious of differences :)

artagnon · 2024-11-05T15:36:19Z

Can this code be integrated in the shift handling above? It seems pretty similar -- but not quite the same, and I'm suspicious of differences :)

The only difference is a smarter context. Given that the context is a little difficult to test, I've integrated the code at the cost of a worse context. We can try to improve the context in a follow-up, but I'm not sure what tests we can write.

nikic

LGTM with some nits.

llvm/lib/Analysis/ValueTracking.cpp

llvm/test/Analysis/ValueTracking/recurrence-knownbits.ll

udiv and urem recurrences have the property that the result can never exceed the start value. Implement a simplification based on this property.

artagnon force-pushed the vt-recur-divrem branch from 81c1be6 to 4a4d517 Compare September 18, 2024 21:03

artagnon marked this pull request as ready for review September 18, 2024 21:04

artagnon requested a review from nikic as a code owner September 18, 2024 21:04

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Sep 18, 2024

artagnon requested a review from dtcxzyw September 18, 2024 21:04

artagnon force-pushed the vt-recur-divrem branch from 4a4d517 to 37f3777 Compare September 18, 2024 21:08

dtcxzyw reviewed Sep 19, 2024

View reviewed changes