
Recommit "[CodeGenPrepare] Folding urem with loop invariant value" #104877


Closed
wants to merge 2 commits

Conversation

goldsteinn
Contributor

@goldsteinn goldsteinn commented Aug 19, 2024

  • [CodeGenPrepare][X86] Add tests for fixing urem transform; NFC
  • Recommit "[CodeGenPrepare] Folding urem with loop invariant value"

Was missing remainder on `Start` value.

Also changed logic as nikic suggested (getting the loop from `PN`
instead of `Rem`). The prior implementation increased the complexity of
the code and made debugging it more difficult.

Reproduced the original issue, verified the fix, and added lit test coverage for the case.
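The transform being recommitted replaces a loop-carried `urem` with a running counter that is reset via a select. A minimal C sketch of the before/after semantics (function names are illustrative, not from the patch; `Start` is zero here, matching the common case):

```c
#include <stdint.h>

/* Before: a urem is executed on every iteration. */
uint32_t sum_rems_before(uint32_t n, uint32_t rem_amt) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; ++i)
        acc += i % rem_amt;            /* urem each iteration */
    return acc;
}

/* After: the urem becomes an extra IV reset with a select. */
uint32_t sum_rems_after(uint32_t n, uint32_t rem_amt) {
    uint32_t acc = 0;
    uint32_t rem = 0;                  /* Start % rem_amt; Start == 0 here */
    for (uint32_t i = 0; i < n; ++i) {
        acc += rem;
        uint32_t next = rem + 1;       /* (urem x, y) + 1 never overflows */
        rem = (next == rem_amt) ? 0 : next;  /* select replaces the urem */
    }
    return acc;
}
```

The select form trades a division per iteration for an add/compare/cmov, which is the profitability argument behind the fold.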

@llvmbot
Member

llvmbot commented Aug 19, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-x86

Author: None (goldsteinn)

Changes
  • [CodeGenPrepare][X86] Add tests for fixing urem transform; NFC
  • Recommit "[CodeGenPrepare] Folding urem with loop invariant value"

Patch is 53.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/104877.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/CodeGenPrepare.cpp (+134)
  • (modified) llvm/test/CodeGen/X86/fold-loop-of-urem.ll (+462-171)
  • (modified) llvm/test/Transforms/CodeGenPrepare/X86/fold-loop-of-urem.ll (+20-6)
diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 48253a613b41d2..bf48c1fdab0ff0 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -472,6 +472,7 @@ class CodeGenPrepare {
   bool replaceMathCmpWithIntrinsic(BinaryOperator *BO, Value *Arg0, Value *Arg1,
                                    CmpInst *Cmp, Intrinsic::ID IID);
   bool optimizeCmp(CmpInst *Cmp, ModifyDT &ModifiedDT);
+  bool optimizeURem(Instruction *Rem);
   bool combineToUSubWithOverflow(CmpInst *Cmp, ModifyDT &ModifiedDT);
   bool combineToUAddWithOverflow(CmpInst *Cmp, ModifyDT &ModifiedDT);
   void verifyBFIUpdates(Function &F);
@@ -1975,6 +1976,135 @@ static bool foldFCmpToFPClassTest(CmpInst *Cmp, const TargetLowering &TLI,
   return true;
 }
 
+static bool isRemOfLoopIncrementWithLoopInvariant(Instruction *Rem,
+                                                  const LoopInfo *LI,
+                                                  Value *&RemAmtOut,
+                                                  PHINode *&LoopIncrPNOut) {
+  Value *Incr, *RemAmt;
+  // NB: If RemAmt is a power of 2 it *should* have been transformed by now.
+  if (!match(Rem, m_URem(m_Value(Incr), m_Value(RemAmt))))
+    return false;
+
+  // Find the loop increment PHI.
+  auto *PN = dyn_cast<PHINode>(Incr);
+  if (!PN)
+    return false;
+
+  // This isn't strictly necessary; what we really need is one increment and
+  // any number of initial values, all being the same.
+  if (PN->getNumIncomingValues() != 2)
+    return false;
+
+  // Only trivially analyzable loops.
+  Loop *L = LI->getLoopFor(PN->getParent());
+  if (!L || !L->getLoopPreheader() || !L->getLoopLatch())
+    return false;
+
+  // Require that the remainder is in the loop.
+  if (!L->contains(Rem))
+    return false;
+
+  // Only works if the remainder amount is a loop invariant.
+  if (!L->isLoopInvariant(RemAmt))
+    return false;
+
+  // Is the PHI a loop increment?
+  auto LoopIncrInfo = getIVIncrement(PN, LI);
+  if (!LoopIncrInfo)
+    return false;
+
+  // We need remainder_amount % increment_amount to be zero. Increment of one
+  // satisfies that without any special logic and is overwhelmingly the common
+  // case.
+  if (!match(LoopIncrInfo->second, m_One()))
+    return false;
+
+  // Need the increment to not overflow.
+  if (!match(LoopIncrInfo->first, m_c_NUWAdd(m_Specific(PN), m_Value())))
+    return false;
+
+  // Set output variables.
+  RemAmtOut = RemAmt;
+  LoopIncrPNOut = PN;
+
+  return true;
+}
+
+// Try to transform:
+//
+// for(i = Start; i < End; ++i)
+//    Rem = (i nuw+ IncrLoopInvariant) u% RemAmtLoopInvariant;
+//
+// ->
+//
+// Rem = (Start nuw+ IncrLoopInvariant) % RemAmtLoopInvariant;
+// for(i = Start; i < End; ++i, ++Rem)
+//    Rem = Rem == RemAmtLoopInvariant ? 0 : Rem;
+//
+// Currently only implemented for `IncrLoopInvariant` being zero.
+static bool foldURemOfLoopIncrement(Instruction *Rem, const DataLayout *DL,
+                                    const LoopInfo *LI,
+                                    SmallSet<BasicBlock *, 32> &FreshBBs,
+                                    bool IsHuge) {
+  Value *RemAmt;
+  PHINode *LoopIncrPN;
+  if (!isRemOfLoopIncrementWithLoopInvariant(Rem, LI, RemAmt, LoopIncrPN))
+    return false;
+
+  // Only handle a non-constant remainder amount, as the extra IV is probably
+  // not profitable in that case.
+  //
+  // Potential TODO(1): `urem` of a const ends up as `mul` + `shift` + `add`. If
+  // we can rule out register pressure and ensure this `urem` is executed each
+// iteration, it's probably profitable to handle the const case as well.
+  //
+  // Potential TODO(2): Should we have a check for how "nested" this remainder
+  // operation is? The new code runs every iteration so if the remainder is
+  // guarded behind unlikely conditions this might not be worth it.
+  if (match(RemAmt, m_ImmConstant()))
+    return false;
+
+  Loop *L = LI->getLoopFor(LoopIncrPN->getParent());
+  Value *Start = LoopIncrPN->getIncomingValueForBlock(L->getLoopPreheader());
+  // If we can't fully optimize out the `rem`, skip this transform.
+  Start = simplifyURemInst(Start, RemAmt, *DL);
+  if (!Start)
+    return false;
+
+  // Create new remainder with induction variable.
+  Type *Ty = Rem->getType();
+  IRBuilder<> Builder(Rem->getContext());
+
+  Builder.SetInsertPoint(LoopIncrPN);
+  PHINode *NewRem = Builder.CreatePHI(Ty, 2);
+
+  Builder.SetInsertPoint(cast<Instruction>(
+      LoopIncrPN->getIncomingValueForBlock(L->getLoopLatch())));
+  // `(add (urem x, y), 1)` is always nuw.
+  Value *RemAdd = Builder.CreateNUWAdd(NewRem, ConstantInt::get(Ty, 1));
+  Value *RemCmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, RemAdd, RemAmt);
+  Value *RemSel =
+      Builder.CreateSelect(RemCmp, Constant::getNullValue(Ty), RemAdd);
+
+  NewRem->addIncoming(Start, L->getLoopPreheader());
+  NewRem->addIncoming(RemSel, L->getLoopLatch());
+
+  // Insert all touched BBs.
+  FreshBBs.insert(LoopIncrPN->getParent());
+  FreshBBs.insert(L->getLoopLatch());
+  FreshBBs.insert(Rem->getParent());
+
+  replaceAllUsesWith(Rem, NewRem, FreshBBs, IsHuge);
+  Rem->eraseFromParent();
+  return true;
+}
+
+bool CodeGenPrepare::optimizeURem(Instruction *Rem) {
+  if (foldURemOfLoopIncrement(Rem, DL, LI, FreshBBs, IsHugeFunc))
+    return true;
+  return false;
+}
+
 bool CodeGenPrepare::optimizeCmp(CmpInst *Cmp, ModifyDT &ModifiedDT) {
   if (sinkCmpExpression(Cmp, *TLI))
     return true;
@@ -8358,6 +8488,10 @@ bool CodeGenPrepare::optimizeInst(Instruction *I, ModifyDT &ModifiedDT) {
     if (optimizeCmp(Cmp, ModifiedDT))
       return true;
 
+  if (match(I, m_URem(m_Value(), m_Value())))
+    if (optimizeURem(I))
+      return true;
+
   if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
     LI->setMetadata(LLVMContext::MD_invariant_group, nullptr);
     bool Modified = optimizeLoadExt(LI);
diff --git a/llvm/test/CodeGen/X86/fold-loop-of-urem.ll b/llvm/test/CodeGen/X86/fold-loop-of-urem.ll
index aad2e0dd7bd248..c4b130a8b4e717 100644
--- a/llvm/test/CodeGen/X86/fold-loop-of-urem.ll
+++ b/llvm/test/CodeGen/X86/fold-loop-of-urem.ll
@@ -15,25 +15,31 @@ define void @simple_urem_to_sel(i32 %N, i32 %rem_amt) nounwind {
 ; CHECK-NEXT:    je .LBB0_4
 ; CHECK-NEXT:  # %bb.1: # %for.body.preheader
 ; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    pushq %r15
 ; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    pushq %r12
 ; CHECK-NEXT:    pushq %rbx
 ; CHECK-NEXT:    movl %esi, %ebx
 ; CHECK-NEXT:    movl %edi, %ebp
+; CHECK-NEXT:    xorl %r15d, %r15d
 ; CHECK-NEXT:    xorl %r14d, %r14d
+; CHECK-NEXT:    xorl %r12d, %r12d
 ; CHECK-NEXT:    .p2align 4, 0x90
 ; CHECK-NEXT:  .LBB0_2: # %for.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    movl %r14d, %eax
-; CHECK-NEXT:    xorl %edx, %edx
-; CHECK-NEXT:    divl %ebx
-; CHECK-NEXT:    movl %edx, %edi
+; CHECK-NEXT:    movl %r14d, %edi
 ; CHECK-NEXT:    callq use.i32@PLT
 ; CHECK-NEXT:    incl %r14d
-; CHECK-NEXT:    cmpl %r14d, %ebp
+; CHECK-NEXT:    cmpl %ebx, %r14d
+; CHECK-NEXT:    cmovel %r15d, %r14d
+; CHECK-NEXT:    incl %r12d
+; CHECK-NEXT:    cmpl %r12d, %ebp
 ; CHECK-NEXT:    jne .LBB0_2
 ; CHECK-NEXT:  # %bb.3:
 ; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    popq %r12
 ; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    popq %r15
 ; CHECK-NEXT:    popq %rbp
 ; CHECK-NEXT:  .LBB0_4: # %for.cond.cleanup
 ; CHECK-NEXT:    retq
@@ -53,53 +59,271 @@ for.body:
   br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
 }
 
-define void @simple_urem_to_sel_nested2(i32 %N, i32 %rem_amt) nounwind {
-; CHECK-LABEL: simple_urem_to_sel_nested2:
+define void @simple_urem_to_sel_fail_not_in_loop(i32 %N, i32 %rem_amt) nounwind {
+; CHECK-LABEL: simple_urem_to_sel_fail_not_in_loop:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    movl %esi, %ebx
+; CHECK-NEXT:    testl %edi, %edi
+; CHECK-NEXT:    je .LBB1_1
+; CHECK-NEXT:  # %bb.3: # %for.body.preheader
+; CHECK-NEXT:    movl %edi, %r14d
+; CHECK-NEXT:    xorl %ebp, %ebp
+; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:  .LBB1_4: # %for.body
+; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    movl %ebp, %edi
+; CHECK-NEXT:    callq use.i32@PLT
+; CHECK-NEXT:    incl %ebp
+; CHECK-NEXT:    cmpl %ebp, %r14d
+; CHECK-NEXT:    jne .LBB1_4
+; CHECK-NEXT:    jmp .LBB1_2
+; CHECK-NEXT:  .LBB1_1:
+; CHECK-NEXT:    xorl %ebp, %ebp
+; CHECK-NEXT:  .LBB1_2: # %for.cond.cleanup
+; CHECK-NEXT:    movl %ebp, %eax
+; CHECK-NEXT:    xorl %edx, %edx
+; CHECK-NEXT:    divl %ebx
+; CHECK-NEXT:    movl %edx, %edi
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    jmp use.i32@PLT # TAILCALL
+entry:
+  %cmp3.not = icmp eq i32 %N, 0
+  br i1 %cmp3.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  %i.05 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
+  %rem = urem i32 %i.05, %rem_amt
+  tail call void @use.i32(i32 %rem)
+  ret void
+
+for.body:
+  %i.04 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
+  tail call void @use.i32(i32 %i.04)
+  %inc = add nuw i32 %i.04, 1
+  %exitcond.not = icmp eq i32 %inc, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+define void @simple_urem_to_sel_inner_loop(i32 %N, i32 %M) nounwind {
+; CHECK-LABEL: simple_urem_to_sel_inner_loop:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    pushq %r15
+; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    pushq %r13
+; CHECK-NEXT:    pushq %r12
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    movl %esi, %r12d
+; CHECK-NEXT:    movl %edi, %ebp
+; CHECK-NEXT:    callq get.i32@PLT
+; CHECK-NEXT:    testl %ebp, %ebp
+; CHECK-NEXT:    je .LBB2_6
+; CHECK-NEXT:  # %bb.1: # %for.body.preheader
+; CHECK-NEXT:    movl %eax, %r14d
+; CHECK-NEXT:    xorl %r15d, %r15d
+; CHECK-NEXT:    xorl %r13d, %r13d
+; CHECK-NEXT:    jmp .LBB2_2
+; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:  .LBB2_5: # %for.inner.cond.cleanup
+; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:    incl %r15d
+; CHECK-NEXT:    cmpl %r14d, %r15d
+; CHECK-NEXT:    movl $0, %eax
+; CHECK-NEXT:    cmovel %eax, %r15d
+; CHECK-NEXT:    incl %r13d
+; CHECK-NEXT:    cmpl %ebp, %r13d
+; CHECK-NEXT:    je .LBB2_6
+; CHECK-NEXT:  .LBB2_2: # %for.body
+; CHECK-NEXT:    # =>This Loop Header: Depth=1
+; CHECK-NEXT:    # Child Loop BB2_4 Depth 2
+; CHECK-NEXT:    testl %r12d, %r12d
+; CHECK-NEXT:    je .LBB2_5
+; CHECK-NEXT:  # %bb.3: # %for.inner.body.preheader
+; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:    movl %r12d, %ebx
+; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:  .LBB2_4: # %for.inner.body
+; CHECK-NEXT:    # Parent Loop BB2_2 Depth=1
+; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
+; CHECK-NEXT:    movl %r15d, %edi
+; CHECK-NEXT:    callq use.i32@PLT
+; CHECK-NEXT:    decl %ebx
+; CHECK-NEXT:    jne .LBB2_4
+; CHECK-NEXT:    jmp .LBB2_5
+; CHECK-NEXT:  .LBB2_6: # %for.cond.cleanup
+; CHECK-NEXT:    addq $8, %rsp
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    popq %r12
+; CHECK-NEXT:    popq %r13
+; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    popq %r15
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    retq
+entry:
+  %rem_amt = call i32 @get.i32()
+  %cmp3.not = icmp eq i32 %N, 0
+  br i1 %cmp3.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i.04 = phi i32 [ %inc, %for.inner.cond.cleanup ], [ 0, %entry ]
+
+  %cmp_inner = icmp eq i32 %M, 0
+  br i1 %cmp_inner, label %for.inner.cond.cleanup, label %for.inner.body
+
+for.inner.body:
+  %j = phi i32 [ %inc_inner, %for.inner.body ], [ 0, %for.body ]
+  %rem = urem i32 %i.04, %rem_amt
+  tail call void @use.i32(i32 %rem)
+  %inc_inner = add nuw i32 %j, 1
+  %exitcond_inner = icmp eq i32 %inc_inner, %M
+  br i1 %exitcond_inner, label %for.inner.cond.cleanup, label %for.inner.body
+
+for.inner.cond.cleanup:
+  %inc = add nuw i32 %i.04, 1
+  %exitcond.not = icmp eq i32 %inc, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+define void @simple_urem_to_sel_inner_loop_fail_not_invariant(i32 %N, i32 %M) nounwind {
+; CHECK-LABEL: simple_urem_to_sel_inner_loop_fail_not_invariant:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    testl %edi, %edi
-; CHECK-NEXT:    je .LBB1_8
+; CHECK-NEXT:    je .LBB3_7
 ; CHECK-NEXT:  # %bb.1: # %for.body.preheader
 ; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    pushq %r15
 ; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    pushq %r12
 ; CHECK-NEXT:    pushq %rbx
 ; CHECK-NEXT:    movl %esi, %ebx
 ; CHECK-NEXT:    movl %edi, %ebp
 ; CHECK-NEXT:    xorl %r14d, %r14d
-; CHECK-NEXT:    jmp .LBB1_2
+; CHECK-NEXT:    jmp .LBB3_2
 ; CHECK-NEXT:    .p2align 4, 0x90
-; CHECK-NEXT:  .LBB1_5: # %for.body1
-; CHECK-NEXT:    # in Loop: Header=BB1_2 Depth=1
+; CHECK-NEXT:  .LBB3_5: # %for.inner.cond.cleanup
+; CHECK-NEXT:    # in Loop: Header=BB3_2 Depth=1
+; CHECK-NEXT:    incl %r14d
+; CHECK-NEXT:    cmpl %ebp, %r14d
+; CHECK-NEXT:    je .LBB3_6
+; CHECK-NEXT:  .LBB3_2: # %for.body
+; CHECK-NEXT:    # =>This Loop Header: Depth=1
+; CHECK-NEXT:    # Child Loop BB3_4 Depth 2
+; CHECK-NEXT:    callq get.i32@PLT
+; CHECK-NEXT:    testl %ebx, %ebx
+; CHECK-NEXT:    je .LBB3_5
+; CHECK-NEXT:  # %bb.3: # %for.inner.body.preheader
+; CHECK-NEXT:    # in Loop: Header=BB3_2 Depth=1
+; CHECK-NEXT:    movl %eax, %r15d
+; CHECK-NEXT:    movl %ebx, %r12d
+; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:  .LBB3_4: # %for.inner.body
+; CHECK-NEXT:    # Parent Loop BB3_2 Depth=1
+; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
 ; CHECK-NEXT:    movl %r14d, %eax
 ; CHECK-NEXT:    xorl %edx, %edx
-; CHECK-NEXT:    divl %ebx
+; CHECK-NEXT:    divl %r15d
 ; CHECK-NEXT:    movl %edx, %edi
 ; CHECK-NEXT:    callq use.i32@PLT
-; CHECK-NEXT:  .LBB1_6: # %for.body.tail
-; CHECK-NEXT:    # in Loop: Header=BB1_2 Depth=1
+; CHECK-NEXT:    decl %r12d
+; CHECK-NEXT:    jne .LBB3_4
+; CHECK-NEXT:    jmp .LBB3_5
+; CHECK-NEXT:  .LBB3_6:
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    popq %r12
+; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    popq %r15
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:  .LBB3_7: # %for.cond.cleanup
+; CHECK-NEXT:    retq
+entry:
+  %cmp3.not = icmp eq i32 %N, 0
+  br i1 %cmp3.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i.04 = phi i32 [ %inc, %for.inner.cond.cleanup ], [ 0, %entry ]
+  %rem_amt = call i32 @get.i32()
+  %cmp_inner = icmp eq i32 %M, 0
+  br i1 %cmp_inner, label %for.inner.cond.cleanup, label %for.inner.body
+
+for.inner.body:
+  %j = phi i32 [ %inc_inner, %for.inner.body ], [ 0, %for.body ]
+  %rem = urem i32 %i.04, %rem_amt
+  tail call void @use.i32(i32 %rem)
+  %inc_inner = add nuw i32 %j, 1
+  %exitcond_inner = icmp eq i32 %inc_inner, %M
+  br i1 %exitcond_inner, label %for.inner.cond.cleanup, label %for.inner.body
+
+for.inner.cond.cleanup:
+  %inc = add nuw i32 %i.04, 1
+  %exitcond.not = icmp eq i32 %inc, %N
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+}
+
+define void @simple_urem_to_sel_nested2(i32 %N, i32 %rem_amt) nounwind {
+; CHECK-LABEL: simple_urem_to_sel_nested2:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    testl %edi, %edi
+; CHECK-NEXT:    je .LBB4_8
+; CHECK-NEXT:  # %bb.1: # %for.body.preheader
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    pushq %r15
+; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    pushq %r12
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    movl %esi, %ebx
+; CHECK-NEXT:    movl %edi, %ebp
+; CHECK-NEXT:    xorl %r15d, %r15d
+; CHECK-NEXT:    xorl %r14d, %r14d
+; CHECK-NEXT:    xorl %r12d, %r12d
+; CHECK-NEXT:    jmp .LBB4_2
+; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:  .LBB4_5: # %for.body1
+; CHECK-NEXT:    # in Loop: Header=BB4_2 Depth=1
+; CHECK-NEXT:    movl %r14d, %edi
+; CHECK-NEXT:    callq use.i32@PLT
+; CHECK-NEXT:  .LBB4_6: # %for.body.tail
+; CHECK-NEXT:    # in Loop: Header=BB4_2 Depth=1
 ; CHECK-NEXT:    incl %r14d
-; CHECK-NEXT:    cmpl %r14d, %ebp
-; CHECK-NEXT:    je .LBB1_7
-; CHECK-NEXT:  .LBB1_2: # %for.body
+; CHECK-NEXT:    cmpl %ebx, %r14d
+; CHECK-NEXT:    cmovel %r15d, %r14d
+; CHECK-NEXT:    incl %r12d
+; CHECK-NEXT:    cmpl %r12d, %ebp
+; CHECK-NEXT:    je .LBB4_7
+; CHECK-NEXT:  .LBB4_2: # %for.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    testb $1, %al
-; CHECK-NEXT:    je .LBB1_6
+; CHECK-NEXT:    je .LBB4_6
 ; CHECK-NEXT:  # %bb.3: # %for.body0
-; CHECK-NEXT:    # in Loop: Header=BB1_2 Depth=1
+; CHECK-NEXT:    # in Loop: Header=BB4_2 Depth=1
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    testb $1, %al
-; CHECK-NEXT:    jne .LBB1_5
+; CHECK-NEXT:    jne .LBB4_5
 ; CHECK-NEXT:  # %bb.4: # %for.body2
-; CHECK-NEXT:    # in Loop: Header=BB1_2 Depth=1
+; CHECK-NEXT:    # in Loop: Header=BB4_2 Depth=1
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    testb $1, %al
-; CHECK-NEXT:    jne .LBB1_5
-; CHECK-NEXT:    jmp .LBB1_6
-; CHECK-NEXT:  .LBB1_7:
+; CHECK-NEXT:    jne .LBB4_5
+; CHECK-NEXT:    jmp .LBB4_6
+; CHECK-NEXT:  .LBB4_7:
 ; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    popq %r12
 ; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    popq %r15
 ; CHECK-NEXT:    popq %rbp
-; CHECK-NEXT:  .LBB1_8: # %for.cond.cleanup
+; CHECK-NEXT:  .LBB4_8: # %for.cond.cleanup
 ; CHECK-NEXT:    retq
 entry:
   %cmp3.not = icmp eq i32 %N, 0
@@ -132,55 +356,55 @@ define void @simple_urem_fail_bad_incr3(i32 %N, i32 %rem_amt) nounwind {
 ; CHECK-LABEL: simple_urem_fail_bad_incr3:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    testl %edi, %edi
-; CHECK-NEXT:    je .LBB2_9
+; CHECK-NEXT:    je .LBB5_9
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    pushq %rbp
 ; CHECK-NEXT:    pushq %r14
 ; CHECK-NEXT:    pushq %rbx
 ; CHECK-NEXT:    movl %esi, %ebx
-; CHECK-NEXT:    jmp .LBB2_2
+; CHECK-NEXT:    jmp .LBB5_2
 ; CHECK-NEXT:    .p2align 4, 0x90
-; CHECK-NEXT:  .LBB2_6: # %for.body1
-; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:  .LBB5_6: # %for.body1
+; CHECK-NEXT:    # in Loop: Header=BB5_2 Depth=1
 ; CHECK-NEXT:    movl %ebp, %eax
 ; CHECK-NEXT:    xorl %edx, %edx
 ; CHECK-NEXT:    divl %ebx
 ; CHECK-NEXT:    movl %edx, %edi
 ; CHECK-NEXT:    callq use.i32@PLT
-; CHECK-NEXT:  .LBB2_7: # %for.body.tail
-; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:  .LBB5_7: # %for.body.tail
+; CHECK-NEXT:    # in Loop: Header=BB5_2 Depth=1
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    testb $1, %al
-; CHECK-NEXT:    jne .LBB2_8
-; CHECK-NEXT:  .LBB2_2: # %for.body
+; CHECK-NEXT:    jne .LBB5_8
+; CHECK-NEXT:  .LBB5_2: # %for.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    testb $1, %al
-; CHECK-NEXT:    je .LBB2_5
+; CHECK-NEXT:    je .LBB5_5
 ; CHECK-NEXT:  # %bb.3: # %for.body0
-; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:    # in Loop: Header=BB5_2 Depth=1
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    movl %eax, %r14d
 ; CHECK-NEXT:    callq get.i32@PLT
 ; CHECK-NEXT:    testb $1, %r14b
-; CHECK-NEXT:    je .LBB2_7
-; CHECK-NEXT:  # %bb.4: # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:    je .LBB5_7
+; CHECK-NEXT:  # %bb.4: # in Loop: Header=BB5_2 Depth=1
 ; CHECK-NEXT:    movl %eax, %ebp
 ; CHECK-NEXT:    incl %ebp
-; CHECK-NEXT:    jmp .LBB2_6
+; CHECK-NEXT:    jmp .LBB5_6
 ; CHECK-NEXT:    .p2align 4, 0x90
-; CHECK-NEXT:  .LBB2_5: # %for.body2
-; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
+; CHECK-NEXT:  .LBB5_5: # %for.body2
+; CHECK-NEXT:    # in Loop: Header=BB5_2 Depth=1
 ; CHECK-NEXT:    xorl %ebp, %ebp
 ; CHECK-NEXT:    callq get.i1@PLT
 ; CHECK-NEXT:    testb $1, %al
-; CHECK-NEXT:    jne .LBB2_6
-; CHECK-NEXT:    jmp .LBB2_7
-; CHECK-NEXT:  .LBB2_8:
+; CHECK-NEXT:    jne .LBB5_6
+; CHECK-NEXT:    jmp .LBB5_7
+; CHECK-NEXT:  .LBB5_8:
 ; CHECK-NEXT:    popq %rbx
 ; CHECK-NEXT:    popq %r14
 ; CHECK-NEXT:    popq %rbp
-; CHECK-NEXT:  .LBB2_9: # %for.cond.cleanup
+; CHECK-NEXT:  .LBB5_9: # %for.cond.cleanup
 ; CHECK-NEXT:    retq
 entry:
   %cmp3.not = icmp eq i32 %N, 0
@@ -213,40 +437,36 @@ for.body.tail:
 define void @simple_urem_to_sel_vec(<2 x i64> %rem_amt) nounwind {
 ; CHECK-LABEL: simple_urem_to_sel_vec:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    pushq %r14
-; CHECK-NEXT:    pushq %rbx
-; CHECK-NEXT:    subq $24, %rsp
+; CHECK...
[truncated]
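The fold requires the IV increment to carry `nuw` (checked via `m_c_NUWAdd` above). A small C sketch of why (the helper name is hypothetical, not from the patch): with an 8-bit counter that wraps, the select recurrence only keeps tracking `i % m` across the wrap when 2^8 happens to be a multiple of `m`:

```c
#include <stdint.h>

/* Returns 1 if the select recurrence equals i % m for all `iters`
 * steps, 0 if it diverges. `i` is 8-bit and may wrap, modeling an
 * increment without the nuw guarantee. */
int recurrence_matches_urem(uint32_t iters, uint8_t m) {
    uint8_t i = 0, rem = 0;            /* rem models the new PHI */
    for (uint32_t k = 0; k < iters; ++k) {
        if (rem != i % m)
            return 0;                  /* recurrence diverged */
        uint8_t next = (uint8_t)(rem + 1);
        rem = (next == m) ? 0 : next;  /* the select */
        i = (uint8_t)(i + 1);          /* wraps at 256: non-nuw add */
    }
    return 1;
}
```

For `m == 7` the recurrence matches until the counter wraps (256 is not a multiple of 7), then diverges; for `m == 8` it survives the wrap. The `nuw` requirement rules out the wrapping case entirely, and the power-of-2 case is expected to have been folded earlier anyway, per the `NB` comment in the patch.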

@goldsteinn goldsteinn changed the title goldsteinn/cgp urem recommit Recommit "[CodeGenPrepare] Folding urem with loop invariant value" Aug 19, 2024
@goldsteinn
Contributor Author

Fix for issues in #96625

Contributor

@nikic nikic left a comment


LGTM, but I think the new tests should go into the IR test file, not the asm test file...

@goldsteinn
Contributor Author

LGTM, but I think the new tests should go into the IR test file, not the asm test file...

Okay, I'll copy the tests to the IR file before pushing.
