
[InstCombine] Fix fail to fold (A >> C1) Pred C2 if shr is used multiple times #83430 #83563


Open · wants to merge 2 commits into main

Conversation

SahilPatidar
Contributor

Resolve #83430

@SahilPatidar
Contributor Author

@XChy, @dtcxzyw

@@ -10,7 +10,8 @@ define i64 @dont_break_minmax_i64(i64 %conv, i64 %conv2) {
; CHECK-SAME: (i64 [[CONV:%.*]], i64 [[CONV2:%.*]]) {
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i64 [[CONV]], [[CONV2]]
; CHECK-NEXT: [[SHR:%.*]] = ashr i64 [[MUL]], 4
; CHECK-NEXT: [[SPEC_SELECT_I:%.*]] = call i64 @llvm.smin.i64(i64 [[SHR]], i64 348731)
; CHECK-NEXT: [[CMP4_I:%.*]] = icmp slt i64 [[MUL]], 5579712
Member

It breaks the min/max idiom.

Contributor

You could try passing matchSelectPattern result to the function and only drop the 1 use check if the result is SPF_UNKNOWN

Contributor Author

Are you referring to this kind of usage: matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN?

Contributor

Yes

Contributor Author

When I executed matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN within the function, it returned SPF_UNKNOWN for the dont_break_minmax test on this code snippet.

Contributor

Hmm, I would have assumed that would indicate we are breaking the selection idiom. Would have to spend some time digging to see where it goes wrong. Maybe @dtcxzyw has an idea?

Member

The check already exists here:

// Test if the ICmpInst instruction is used exclusively by a select as
// part of a minimum or maximum operation. If so, refrain from doing
// any other folding. This helps out other analyses which understand
// non-obfuscated minimum and maximum idioms, such as ScalarEvolution
// and CodeGen. And in this case, at least one of the comparison
// operands has at least one user besides the compare (the select),
// which would often largely negate the benefit of folding anyway.
//
// Do the same for the other patterns recognized by matchSelectPattern.
  if (I.hasOneUse())
    if (SelectInst *SI = dyn_cast<SelectInst>(I.user_back())) {
      Value *A, *B;
      SelectPatternResult SPR = matchSelectPattern(SI, A, B);
      if (SPR.Flavor != SPF_UNKNOWN)
        return nullptr;
    }

Could you please move the call to foldICmpWithConstant after the check?
Related patch: #82472

Member

Failed Tests (3):
LLVM :: Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
LLVM :: Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
LLVM :: Transforms/PhaseOrdering/loop-access-checks.ll

@SahilPatidar Please update these tests.

@aemerson
Contributor

aemerson commented Mar 6, 2024

Absolutely do not simply remove this oneuse check; if you break the minmax tests at all, this can't proceed.

@SahilPatidar
Contributor Author

Since matchSelectPattern fails to recognize the min/max pattern here, the question is whether the issue lies in matchSelectPattern itself or whether additional logic is needed to identify the pattern correctly. Could you provide some guidance or hints to help me start investigating this?

@aemerson
Contributor

aemerson commented Mar 7, 2024

Since matchSelectPattern fails to recognize the min/max pattern here, the question is whether the issue lies in matchSelectPattern itself or whether additional logic is needed to identify the pattern correctly. Could you provide some guidance or hints to help me start investigating this?

You'll have to just step through it in a debugger to see why it fails to detect the pattern. I would guess that the minmax matching relies on some earlier simplifications having happened first. The whole InstCombine visitation-ordering issue is very tricky.

@SahilPatidar
Contributor Author

SahilPatidar commented Mar 8, 2024

@aemerson, @dtcxzyw, @goldsteinn
While reviewing the code, I noticed an issue caused by %switch.i = icmp ult i1 %cmp4.i, true. This part of the code checks if (TrueVal == CmpLHS && FalseVal == CmpRHS), indicating a match that could be handled more effectively with a pattern-matching approach. For example, in the following code snippet:

  %cmp4.i = icmp sgt i64 %shr, 348731
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i64 %shr, i64 348731

If we examine the value of %cmp4.i, we find that both branches have the same value. This suggests that we only need to handle the middle pattern. I only just noticed this and am not sure how effective it would be. Could you please provide your insight on this?

@SahilPatidar SahilPatidar requested a review from dtcxzyw March 12, 2024 04:07
@SahilPatidar
Contributor Author

@dtcxzyw

@dtcxzyw dtcxzyw marked this pull request as ready for review April 13, 2024 07:32
@dtcxzyw dtcxzyw requested a review from nikic as a code owner April 13, 2024 07:32
@llvmbot
Member

llvmbot commented Apr 13, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (SahilPatidar)

Changes

Resolve #83430


Full diff: https://github.com/llvm/llvm-project/pull/83563.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll (+2-1)
  • (modified) llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/icmp-shr.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll (+24-22)
  • (modified) llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll (+7-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll (+1-1)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 49e597171b1c6f..4eac73e8055589 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2469,7 +2469,7 @@ Instruction *InstCombinerImpl::foldICmpShrConstant(ICmpInst &Cmp,
   // constant-value-based preconditions in the folds below, then we could assert
   // those conditions rather than checking them. This is difficult because of
   // undef/poison (PR34838).
-  if (IsAShr && Shr->hasOneUse()) {
+  if (IsAShr) {
     if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {
       // When ShAmtC can be shifted losslessly:
       // icmp PRED (ashr exact X, ShAmtC), C --> icmp PRED X, (C << ShAmtC)
@@ -7025,9 +7025,6 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
   if (Instruction *Res = canonicalizeICmpPredicate(I))
     return Res;
 
-  if (Instruction *Res = foldICmpWithConstant(I))
-    return Res;
-
   if (Instruction *Res = foldICmpWithDominatingICmp(I))
     return Res;
 
@@ -7057,6 +7054,9 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
         return nullptr;
     }
 
+  if (Instruction *Res = foldICmpWithConstant(I))
+    return Res;
+
   // Do this after checking for min/max to prevent infinite looping.
   if (Instruction *Res = foldICmpWithZero(I))
     return Res;
diff --git a/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll b/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
index c6d6e916b2c786..0c445b3be3740b 100644
--- a/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
+++ b/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
@@ -10,7 +10,8 @@ define i64 @dont_break_minmax_i64(i64 %conv, i64 %conv2) {
 ; CHECK-SAME: (i64 [[CONV:%.*]], i64 [[CONV2:%.*]]) {
 ; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i64 [[CONV]], [[CONV2]]
 ; CHECK-NEXT:    [[SHR:%.*]] = ashr i64 [[MUL]], 4
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = call i64 @llvm.smin.i64(i64 [[SHR]], i64 348731)
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp slt i64 [[MUL]], 5579712
+; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[CMP4_I]], i64 [[SHR]], i64 348731
 ; CHECK-NEXT:    ret i64 [[SPEC_SELECT_I]]
 ;
   %mul = mul nsw i64 %conv, %conv2
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
index 1b8efe4351c6dc..08a763a50bf958 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
@@ -900,7 +900,7 @@ define i1 @ashrsgt_01_00(i4 %x) {
 define i1 @ashrsgt_01_00_multiuse(i4 %x, ptr %p) {
 ; CHECK-LABEL: @ashrsgt_01_00_multiuse(
 ; CHECK-NEXT:    [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT:    [[C:%.*]] = icmp sgt i4 [[S]], 0
+; CHECK-NEXT:    [[C:%.*]] = icmp sgt i4 [[X]], 1
 ; CHECK-NEXT:    store i4 [[S]], ptr [[P:%.*]], align 1
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
@@ -2442,7 +2442,7 @@ define i1 @ashr_sle_noexact(i8 %x) {
 define i1 @ashr_00_00_ashr_extra_use(i8 %x, ptr %ptr) {
 ; CHECK-LABEL: @ashr_00_00_ashr_extra_use(
 ; CHECK-NEXT:    [[S:%.*]] = ashr exact i8 [[X:%.*]], 3
-; CHECK-NEXT:    [[C:%.*]] = icmp ult i8 [[S]], 11
+; CHECK-NEXT:    [[C:%.*]] = icmp ult i8 [[X]], 88
 ; CHECK-NEXT:    store i8 [[S]], ptr [[PTR:%.*]], align 1
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr.ll b/llvm/test/Transforms/InstCombine/icmp-shr.ll
index 71b4f5a970c2f6..cdbe9c9e4986fd 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr.ll
@@ -780,7 +780,7 @@ define i1 @ashr_ult_2(i4 %x) {
 define i1 @ashr_ult_2_multiuse(i4 %x, ptr %p) {
 ; CHECK-LABEL: @ashr_ult_2_multiuse(
 ; CHECK-NEXT:    [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT:    [[R:%.*]] = icmp ult i4 [[S]], 2
+; CHECK-NEXT:    [[R:%.*]] = icmp ult i4 [[X]], 4
 ; CHECK-NEXT:    store i4 [[S]], ptr [[P:%.*]], align 1
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
diff --git a/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll b/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
index 73bcee5fb74f0c..055a420ccfc65b 100644
--- a/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
+++ b/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
@@ -14,11 +14,11 @@ define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockS
 ; CHECK-NEXT:    [[CMP_NOT2:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0
 ; CHECK-NEXT:    br i1 [[CMP_NOT2]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
 ; CHECK:       while.body.preheader:
-; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 8
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 4
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[WHILE_BODY_PREHEADER16:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -8
-; CHECK-NEXT:    [[IND_END:%.*]] = and i32 [[BLOCKSIZE]], 7
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -4
+; CHECK-NEXT:    [[IND_END:%.*]] = and i32 [[BLOCKSIZE]], 3
 ; CHECK-NEXT:    [[TMP0:%.*]] = shl i32 [[N_VEC]], 1
 ; CHECK-NEXT:    [[IND_END7:%.*]] = getelementptr i8, ptr [[PSRCA:%.*]], i32 [[TMP0]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl i32 [[N_VEC]], 1
@@ -34,18 +34,19 @@ define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockS
 ; CHECK-NEXT:    [[NEXT_GEP13:%.*]] = getelementptr i8, ptr [[PDST]], i32 [[TMP4]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = shl i32 [[INDEX]], 1
 ; CHECK-NEXT:    [[NEXT_GEP14:%.*]] = getelementptr i8, ptr [[PSRCB]], i32 [[TMP5]]
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <8 x i16>, ptr [[NEXT_GEP]], align 2
-; CHECK-NEXT:    [[TMP6:%.*]] = sext <8 x i16> [[WIDE_LOAD]] to <8 x i32>
-; CHECK-NEXT:    [[WIDE_LOAD15:%.*]] = load <8 x i16>, ptr [[NEXT_GEP14]], align 2
-; CHECK-NEXT:    [[TMP7:%.*]] = sext <8 x i16> [[WIDE_LOAD15]] to <8 x i32>
-; CHECK-NEXT:    [[TMP8:%.*]] = mul nsw <8 x i32> [[TMP7]], [[TMP6]]
-; CHECK-NEXT:    [[TMP9:%.*]] = ashr <8 x i32> [[TMP8]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
-; CHECK-NEXT:    [[TMP10:%.*]] = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> [[TMP9]], <8 x i32> <i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767>)
-; CHECK-NEXT:    [[TMP11:%.*]] = trunc <8 x i32> [[TMP10]] to <8 x i16>
-; CHECK-NEXT:    store <8 x i16> [[TMP11]], ptr [[NEXT_GEP13]], align 2
-; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
-; CHECK-NEXT:    [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i16>, ptr [[NEXT_GEP]], align 2
+; CHECK-NEXT:    [[TMP6:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
+; CHECK-NEXT:    [[WIDE_LOAD15:%.*]] = load <4 x i16>, ptr [[NEXT_GEP14]], align 2
+; CHECK-NEXT:    [[TMP7:%.*]] = sext <4 x i16> [[WIDE_LOAD15]] to <4 x i32>
+; CHECK-NEXT:    [[TMP8:%.*]] = mul nsw <4 x i32> [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    [[TMP9:%.*]] = lshr <4 x i32> [[TMP8]], <i32 15, i32 15, i32 15, i32 15>
+; CHECK-NEXT:    [[TMP10:%.*]] = icmp slt <4 x i32> [[TMP8]], <i32 1073741824, i32 1073741824, i32 1073741824, i32 1073741824>
+; CHECK-NEXT:    [[TMP11:%.*]] = trunc <4 x i32> [[TMP9]] to <4 x i16>
+; CHECK-NEXT:    [[TMP12:%.*]] = select <4 x i1> [[TMP10]], <4 x i16> [[TMP11]], <4 x i16> <i16 32767, i16 32767, i16 32767, i16 32767>
+; CHECK-NEXT:    store <4 x i16> [[TMP12]], ptr [[NEXT_GEP13]], align 2
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label [[WHILE_END]], label [[WHILE_BODY_PREHEADER16]]
@@ -61,15 +62,16 @@ define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockS
 ; CHECK-NEXT:    [[PDST_ADDR_04:%.*]] = phi ptr [ [[INCDEC_PTR4:%.*]], [[WHILE_BODY]] ], [ [[PDST_ADDR_04_PH]], [[WHILE_BODY_PREHEADER16]] ]
 ; CHECK-NEXT:    [[PSRCB_ADDR_03:%.*]] = phi ptr [ [[INCDEC_PTR1:%.*]], [[WHILE_BODY]] ], [ [[PSRCB_ADDR_03_PH]], [[WHILE_BODY_PREHEADER16]] ]
 ; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[PSRCA_ADDR_05]], i32 2
-; CHECK-NEXT:    [[TMP13:%.*]] = load i16, ptr [[PSRCA_ADDR_05]], align 2
-; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP13]] to i32
+; CHECK-NEXT:    [[TMP14:%.*]] = load i16, ptr [[PSRCA_ADDR_05]], align 2
+; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP14]] to i32
 ; CHECK-NEXT:    [[INCDEC_PTR1]] = getelementptr inbounds i8, ptr [[PSRCB_ADDR_03]], i32 2
-; CHECK-NEXT:    [[TMP14:%.*]] = load i16, ptr [[PSRCB_ADDR_03]], align 2
-; CHECK-NEXT:    [[CONV2:%.*]] = sext i16 [[TMP14]] to i32
+; CHECK-NEXT:    [[TMP15:%.*]] = load i16, ptr [[PSRCB_ADDR_03]], align 2
+; CHECK-NEXT:    [[CONV2:%.*]] = sext i16 [[TMP15]] to i32
 ; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i32 [[CONV2]], [[CONV]]
-; CHECK-NEXT:    [[SHR:%.*]] = ashr i32 [[MUL]], 15
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = tail call i32 @llvm.smin.i32(i32 [[SHR]], i32 32767)
-; CHECK-NEXT:    [[CONV3:%.*]] = trunc i32 [[SPEC_SELECT_I]] to i16
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 [[MUL]], 15
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp slt i32 [[MUL]], 1073741824
+; CHECK-NEXT:    [[TMP16:%.*]] = trunc i32 [[SHR]] to i16
+; CHECK-NEXT:    [[CONV3:%.*]] = select i1 [[CMP4_I]], i16 [[TMP16]], i16 32767
 ; CHECK-NEXT:    [[INCDEC_PTR4]] = getelementptr inbounds i8, ptr [[PDST_ADDR_04]], i32 2
 ; CHECK-NEXT:    store i16 [[CONV3]], ptr [[PDST_ADDR_04]], align 2
 ; CHECK-NEXT:    [[DEC]] = add i32 [[BLKCNT_06]], -1
diff --git a/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll b/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
index 67d721b23d6f00..8559f973f281ac 100644
--- a/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
+++ b/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
@@ -5,7 +5,8 @@ define i32 @testa(i32 %mul) {
 ; CHECK-LABEL: define i32 @testa(
 ; CHECK-SAME: i32 [[MUL:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = ashr i32 [[MUL]], 15
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = tail call i32 @llvm.smin.i32(i32 [[SHR]], i32 32767)
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp slt i32 [[MUL]], 1073741824
+; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[CMP4_I]], i32 [[SHR]], i32 32767
 ; CHECK-NEXT:    ret i32 [[SPEC_SELECT_I]]
 ;
   %shr = ashr i32 %mul, 15
@@ -19,8 +20,11 @@ define i32 @testb(i32 %mul) {
 ; CHECK-LABEL: define i32 @testb(
 ; CHECK-SAME: i32 [[MUL:%.*]]) local_unnamed_addr #[[ATTR0]] {
 ; CHECK-NEXT:    [[SHR102:%.*]] = ashr i32 [[MUL]], 7
-; CHECK-NEXT:    [[TMP1:%.*]] = tail call i32 @llvm.smax.i32(i32 [[SHR102]], i32 -128)
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = tail call i32 @llvm.smin.i32(i32 [[TMP1]], i32 127)
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp sgt i32 [[MUL]], 16383
+; CHECK-NEXT:    [[RETVAL_0_I:%.*]] = select i1 [[CMP4_I]], i32 127, i32 -128
+; CHECK-NEXT:    [[TMP1:%.*]] = add i32 [[MUL]], 16384
+; CHECK-NEXT:    [[CLEANUP_DEST_SLOT_0_I:%.*]] = icmp ult i32 [[TMP1]], 32768
+; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[CLEANUP_DEST_SLOT_0_I]], i32 [[SHR102]], i32 [[RETVAL_0_I]]
 ; CHECK-NEXT:    ret i32 [[SPEC_SELECT_I]]
 ;
   %shr102 = ashr i32 %mul, 7
diff --git a/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll b/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll
index bd509509c321f8..d41883fb788d3b 100644
--- a/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll
+++ b/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll
@@ -279,7 +279,7 @@ define void @loop_with_signed_induction(ptr noundef nonnull align 8 dereferencea
 ; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I_I:%.*]] = ptrtoint ptr [[TMP1]] to i64
 ; CHECK-NEXT:    [[SUB_PTR_SUB_I_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I_I]], [[SUB_PTR_RHS_CAST_I_I]]
 ; CHECK-NEXT:    [[SUB_PTR_DIV_I_I:%.*]] = ashr exact i64 [[SUB_PTR_SUB_I_I]], 3
-; CHECK-NEXT:    [[CMP9:%.*]] = icmp sgt i64 [[SUB_PTR_DIV_I_I]], 0
+; CHECK-NEXT:    [[CMP9:%.*]] = icmp sgt i64 [[SUB_PTR_SUB_I_I]], 0
 ; CHECK-NEXT:    br i1 [[CMP9]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 ; CHECK:       for.cond.cleanup:
 ; CHECK-NEXT:    ret void

@dtcxzyw
Member

dtcxzyw commented Apr 13, 2024

@aemerson, @dtcxzyw, @goldsteinn Throughout the code review, I noticed an issue caused by %switch.i = icmp ult i1 %cmp4.i, true. This part of the code checks if (TrueVal == CmpLHS && FalseVal == CmpRHS), indicating a match that could be handled more effectively with a pattern-matching approach. For example, in the following code snippet:

  %cmp4.i = icmp sgt i64 %shr, 348731
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i64 %shr, i64 348731

If we examine the value of %cmp4.i, we find that both branches have the same value. This suggests that we only need to handle the middle pattern. I just noticed this and I am not clear on how effective it could be. Could you please provide your insight on this?

cc @nikic

@nikic
Contributor

nikic commented Apr 13, 2024

I think the only way to move forward with this would be to teach SPF about this pattern (I mean the one with comparison of the ashr operand, not the one with the non-canonical not).

@SahilPatidar
Contributor Author

@dtcxzyw, I was experimenting with the SPF code and added some logic before the if (TrueVal == CmpLHS && FalseVal == CmpRHS) statement, but it didn't work because of this:

(lldb) expr SI->getCondition()->dump()
 %switch.i = xor i1 %cmp4.i, true

@dtcxzyw
Member

dtcxzyw commented Apr 15, 2024

@dtcxzyw, I was experimenting with the SPF code and added some logic before the if (TrueVal == CmpLHS && FalseVal == CmpRHS) statement, but it didn't work because of this:

(lldb) expr SI->getCondition()->dump()
 %switch.i = xor i1 %cmp4.i, true

You should match this with m_Not, then update the cmp predicate with CmpInst::getInversePredicate.

bool Invert = false;
Value *Condition = SI->getCondition();
Value *X;
if (match(Condition, m_Not(m_Value(X)))) {
  Invert = true;
  Condition = X;
}

CmpInst::Predicate Pred;
Value *CmpLHS, *CmpRHS;
if (!match(Condition, m_ICmp(Pred, m_Value(CmpLHS), m_Value(CmpRHS))))
   fail...

if (Invert)
  Pred = CmpInst::getInversePredicate(Pred);

...

@SahilPatidar
Contributor Author

@dtcxzyw,

if (IsAShr && matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN) {

After I made some code changes, this line now triggers three times. However, it only returns SPF_SMIN twice. On the third call, Shr->user_back() seems to be behaving unexpectedly, as shown by the following output:

(lldb) expr Shr->user_back()->dump()
 %cmp4.i = icmp slt i64 %shr, 348732

Can you help me understand why this discrepancy might be happening?

@dtcxzyw
Member

dtcxzyw commented Apr 19, 2024

@dtcxzyw,

if (IsAShr && matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN) {

After I made some code changes, this line now triggers three times. However, it only returns SPF_SMIN twice. On the third call, Shr->user_back() seems to be behaving unexpectedly, as shown by the following output:

(lldb) expr Shr->user_back()->dump()
 %cmp4.i = icmp slt i64 %shr, 348732

Can you help me understand why this discrepancy might be happening?

%spec.select.i = select i1 %switch.i, i64 %shr, i64 348731 seems to have been eliminated. You can pass -debug-only=instcombine to opt for more details :)

@SahilPatidar
Contributor Author

@dtcxzyw, in the first iteration:

IC: Visiting:   %switch.i = icmp ult i1 %cmp4.i, true
IC: Old =   %switch.i = icmp ult i1 %cmp4.i, true
    New =   <badref> = xor i1 %cmp4.i, true
ADD:   %switch.i = xor i1 %cmp4.i, true
IC: ERASE   %1 = icmp ult i1 %cmp4.i, true
ADD DEFERRED:   %cmp4.i = icmp sgt i64 %shr, 348731
ADD DEFERRED:   %switch.i = xor i1 %cmp4.i, true
ADD:   %cmp4.i = icmp sgt i64 %shr, 348731
IC: Visiting:   %cmp4.i = icmp sgt i64 %shr, 348731

second:

IC: Visiting:   %switch.i = xor i1 %cmp4.i, true
IC: Replacing   %switch.i = xor i1 %cmp4.i, true
    with   %cmp4.i = icmp sle i64 %shr, 348731
ADD:   %switch.i = xor i1 %cmp4.i, true
IC: Mod =   %switch.i = xor i1 %cmp4.i, true
    New =   %switch.i = xor i1 %cmp4.i, true
IC: ERASE   %switch.i = xor i1 %cmp4.i, true
ADD DEFERRED:   %cmp4.i = icmp sle i64 %shr, 348731
ADD DEFERRED:   %spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731
ADD:   %cmp4.i = icmp sle i64 %shr, 348731
IC: Visiting:   %cmp4.i = icmp sle i64 %shr, 348731
IC: Old =   %cmp4.i = icmp sle i64 %shr, 348731
    New =   <badref> = icmp slt i64 %shr, 348732
ADD:   %cmp4.i = icmp slt i64 %shr, 348732
IC: ERASE   %1 = icmp sle i64 %shr, 348731
ADD DEFERRED:   %shr = ashr i64 %mul, 4
ADD:   %shr = ashr i64 %mul, 4
IC: Visiting:   %shr = ashr i64 %mul, 4
IC: Visiting:   %cmp4.i = icmp slt i64 %shr, 348732

third:

IC: Old =   %cmp4.i = icmp slt i64 %shr, 348732
    New =   <badref> = icmp slt i64 %mul, 5579712
ADD:   %cmp4.i = icmp slt i64 %mul, 5579712
IC: ERASE   %1 = icmp slt i64 %shr, 348732
ADD DEFERRED:   %shr = ashr i64 %mul, 4
ADD DEFERRED:   %spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731
ADD:   %shr = ashr i64 %mul, 4
IC: Visiting:   %shr = ashr i64 %mul, 4
IC: Visiting:   %cmp4.i = icmp slt i64 %mul, 5579712
IC: Visiting:   %spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731
IC: Visiting:   ret i64 %spec.select.i

@dtcxzyw
Member

dtcxzyw commented Apr 20, 2024

After the second iteration:

%shr = ashr i64 %mul, 4
%cmp4.i = icmp slt i64 %shr, 348732
%spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731

As %shr has multiple users, the result of Shr->user_back() is undetermined. Can you try all_of(Shr->users(), [] (User* U) {return matchSelectPattern(U, Lhs, Rhs).Flavor == SPF_UNKNOWN;})?

@SahilPatidar
Contributor Author

@dtcxzyw, it seems to be working now, and I haven't noticed any test failures after the changes. However, the current implementation feels a bit like a brute-force approach, and the code could use some improvement; it's a bit rough around the edges. Given that, should I go ahead and commit to this PR?

Successfully merging this pull request may close these issues.

[InstCombine] Missed optimization: fail to fold (A >> C1) Pred C2 if shr is used multiple times