
[InstCombine] Fix fail to fold (A >> C1) Pred C2 if shr is used multiple times #83430 #83563


Open · wants to merge 2 commits into main

Conversation

SahilPatidar
Contributor

Resolve #83430

@SahilPatidar
Contributor Author

@XChy, @dtcxzyw

@@ -10,7 +10,8 @@ define i64 @dont_break_minmax_i64(i64 %conv, i64 %conv2) {
; CHECK-SAME: (i64 [[CONV:%.*]], i64 [[CONV2:%.*]]) {
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i64 [[CONV]], [[CONV2]]
; CHECK-NEXT: [[SHR:%.*]] = ashr i64 [[MUL]], 4
; CHECK-NEXT: [[SPEC_SELECT_I:%.*]] = call i64 @llvm.smin.i64(i64 [[SHR]], i64 348731)
; CHECK-NEXT: [[CMP4_I:%.*]] = icmp slt i64 [[MUL]], 5579712
Member

It breaks the min/max idiom.

Contributor

You could try passing matchSelectPattern result to the function and only drop the 1 use check if the result is SPF_UNKNOWN

Contributor Author

Are you referring to this kind of usage: matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN?

Contributor

Yes

Contributor Author

When I executed matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN within the function, it returned SPF_UNKNOWN for the dont_break_minmax test on this code snippet.

Contributor

Hmm, I would have assumed that would indicate we are breaking the selection idiom. Would have to spend some time digging to see where it goes wrong. Maybe @dtcxzyw has an idea?

Member

The check already exists here:

// Test if the ICmpInst instruction is used exclusively by a select as
// part of a minimum or maximum operation. If so, refrain from doing
// any other folding. This helps out other analyses which understand
// non-obfuscated minimum and maximum idioms, such as ScalarEvolution
// and CodeGen. And in this case, at least one of the comparison
// operands has at least one user besides the compare (the select),
// which would often largely negate the benefit of folding anyway.
//
// Do the same for the other patterns recognized by matchSelectPattern.
  if (I.hasOneUse())
    if (SelectInst *SI = dyn_cast<SelectInst>(I.user_back())) {
      Value *A, *B;
      SelectPatternResult SPR = matchSelectPattern(SI, A, B);
      if (SPR.Flavor != SPF_UNKNOWN)
        return nullptr;
    }

Could you please move the call to foldICmpWithConstant after the check?
Related patch: #82472

Member

Failed Tests (3):
LLVM :: Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
LLVM :: Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
LLVM :: Transforms/PhaseOrdering/loop-access-checks.ll

@SahilPatidar Please update these tests.

@aemerson
Contributor

aemerson commented Mar 6, 2024

Absolutely do not simply remove this oneuse check; if you break the minmax tests at all, this can't proceed.

@SahilPatidar
Contributor Author

Since matchSelectPattern fails to recognize the min/max pattern here, the question is whether the issue lies in matchSelectPattern itself or whether additional logic is needed to identify the pattern correctly. Could you provide some guidance or hints to help me start investigating this?

@aemerson
Contributor

aemerson commented Mar 7, 2024

Since matchSelectPattern fails to recognize the min/max pattern here, the question is whether the issue lies in matchSelectPattern itself or whether additional logic is needed to identify the pattern correctly. Could you provide some guidance or hints to help me start investigating this?

You'll have to just step through it in a debugger to see why it fails to detect the pattern. I would guess that the minmax matching relies on some earlier simplifications having happened first. The whole InstCombine visitation-ordering issue is very tricky.

@SahilPatidar
Contributor Author

SahilPatidar commented Mar 8, 2024

@aemerson, @dtcxzyw, @goldsteinn
While reviewing the code, I noticed an issue caused by %switch.i = icmp ult i1 %cmp4.i, true. This part of the code checks if (TrueVal == CmpLHS && FalseVal == CmpRHS), indicating a match that could be handled more effectively with a pattern-matching approach. For example, in the following code snippet:

  %cmp4.i = icmp sgt i64 %shr, 348731
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i64 %shr, i64 348731

If we examine the value of %cmp4.i, we find that both branches have the same value. This suggests that we only need to handle the middle pattern. I only just noticed this and am not sure how effective it would be. Could you please provide your insight on this?

@SahilPatidar SahilPatidar requested a review from dtcxzyw March 12, 2024 04:07
@SahilPatidar
Contributor Author

@dtcxzyw

@dtcxzyw dtcxzyw marked this pull request as ready for review April 13, 2024 07:32
@dtcxzyw dtcxzyw requested a review from nikic as a code owner April 13, 2024 07:32
@llvmbot
Member

llvmbot commented Apr 13, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (SahilPatidar)

Changes

Resolve #83430


Full diff: https://github.com/llvm/llvm-project/pull/83563.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll (+2-1)
  • (modified) llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/icmp-shr.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll (+24-22)
  • (modified) llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll (+7-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll (+1-1)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 49e597171b1c6f..4eac73e8055589 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2469,7 +2469,7 @@ Instruction *InstCombinerImpl::foldICmpShrConstant(ICmpInst &Cmp,
   // constant-value-based preconditions in the folds below, then we could assert
   // those conditions rather than checking them. This is difficult because of
   // undef/poison (PR34838).
-  if (IsAShr && Shr->hasOneUse()) {
+  if (IsAShr) {
     if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {
       // When ShAmtC can be shifted losslessly:
       // icmp PRED (ashr exact X, ShAmtC), C --> icmp PRED X, (C << ShAmtC)
@@ -7025,9 +7025,6 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
   if (Instruction *Res = canonicalizeICmpPredicate(I))
     return Res;
 
-  if (Instruction *Res = foldICmpWithConstant(I))
-    return Res;
-
   if (Instruction *Res = foldICmpWithDominatingICmp(I))
     return Res;
 
@@ -7057,6 +7054,9 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
         return nullptr;
     }
 
+  if (Instruction *Res = foldICmpWithConstant(I))
+    return Res;
+
   // Do this after checking for min/max to prevent infinite looping.
   if (Instruction *Res = foldICmpWithZero(I))
     return Res;
diff --git a/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll b/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
index c6d6e916b2c786..0c445b3be3740b 100644
--- a/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
+++ b/llvm/test/Transforms/InstCombine/ashr-icmp-minmax-idiom-break.ll
@@ -10,7 +10,8 @@ define i64 @dont_break_minmax_i64(i64 %conv, i64 %conv2) {
 ; CHECK-SAME: (i64 [[CONV:%.*]], i64 [[CONV2:%.*]]) {
 ; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i64 [[CONV]], [[CONV2]]
 ; CHECK-NEXT:    [[SHR:%.*]] = ashr i64 [[MUL]], 4
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = call i64 @llvm.smin.i64(i64 [[SHR]], i64 348731)
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp slt i64 [[MUL]], 5579712
+; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[CMP4_I]], i64 [[SHR]], i64 348731
 ; CHECK-NEXT:    ret i64 [[SPEC_SELECT_I]]
 ;
   %mul = mul nsw i64 %conv, %conv2
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
index 1b8efe4351c6dc..08a763a50bf958 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll
@@ -900,7 +900,7 @@ define i1 @ashrsgt_01_00(i4 %x) {
 define i1 @ashrsgt_01_00_multiuse(i4 %x, ptr %p) {
 ; CHECK-LABEL: @ashrsgt_01_00_multiuse(
 ; CHECK-NEXT:    [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT:    [[C:%.*]] = icmp sgt i4 [[S]], 0
+; CHECK-NEXT:    [[C:%.*]] = icmp sgt i4 [[X]], 1
 ; CHECK-NEXT:    store i4 [[S]], ptr [[P:%.*]], align 1
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
@@ -2442,7 +2442,7 @@ define i1 @ashr_sle_noexact(i8 %x) {
 define i1 @ashr_00_00_ashr_extra_use(i8 %x, ptr %ptr) {
 ; CHECK-LABEL: @ashr_00_00_ashr_extra_use(
 ; CHECK-NEXT:    [[S:%.*]] = ashr exact i8 [[X:%.*]], 3
-; CHECK-NEXT:    [[C:%.*]] = icmp ult i8 [[S]], 11
+; CHECK-NEXT:    [[C:%.*]] = icmp ult i8 [[X]], 88
 ; CHECK-NEXT:    store i8 [[S]], ptr [[PTR:%.*]], align 1
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/icmp-shr.ll b/llvm/test/Transforms/InstCombine/icmp-shr.ll
index 71b4f5a970c2f6..cdbe9c9e4986fd 100644
--- a/llvm/test/Transforms/InstCombine/icmp-shr.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-shr.ll
@@ -780,7 +780,7 @@ define i1 @ashr_ult_2(i4 %x) {
 define i1 @ashr_ult_2_multiuse(i4 %x, ptr %p) {
 ; CHECK-LABEL: @ashr_ult_2_multiuse(
 ; CHECK-NEXT:    [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT:    [[R:%.*]] = icmp ult i4 [[S]], 2
+; CHECK-NEXT:    [[R:%.*]] = icmp ult i4 [[X]], 4
 ; CHECK-NEXT:    store i4 [[S]], ptr [[P:%.*]], align 1
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
diff --git a/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll b/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
index 73bcee5fb74f0c..055a420ccfc65b 100644
--- a/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
+++ b/llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll
@@ -14,11 +14,11 @@ define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockS
 ; CHECK-NEXT:    [[CMP_NOT2:%.*]] = icmp eq i32 [[BLOCKSIZE:%.*]], 0
 ; CHECK-NEXT:    br i1 [[CMP_NOT2]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
 ; CHECK:       while.body.preheader:
-; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 8
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[BLOCKSIZE]], 4
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[WHILE_BODY_PREHEADER16:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -8
-; CHECK-NEXT:    [[IND_END:%.*]] = and i32 [[BLOCKSIZE]], 7
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i32 [[BLOCKSIZE]], -4
+; CHECK-NEXT:    [[IND_END:%.*]] = and i32 [[BLOCKSIZE]], 3
 ; CHECK-NEXT:    [[TMP0:%.*]] = shl i32 [[N_VEC]], 1
 ; CHECK-NEXT:    [[IND_END7:%.*]] = getelementptr i8, ptr [[PSRCA:%.*]], i32 [[TMP0]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl i32 [[N_VEC]], 1
@@ -34,18 +34,19 @@ define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockS
 ; CHECK-NEXT:    [[NEXT_GEP13:%.*]] = getelementptr i8, ptr [[PDST]], i32 [[TMP4]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = shl i32 [[INDEX]], 1
 ; CHECK-NEXT:    [[NEXT_GEP14:%.*]] = getelementptr i8, ptr [[PSRCB]], i32 [[TMP5]]
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <8 x i16>, ptr [[NEXT_GEP]], align 2
-; CHECK-NEXT:    [[TMP6:%.*]] = sext <8 x i16> [[WIDE_LOAD]] to <8 x i32>
-; CHECK-NEXT:    [[WIDE_LOAD15:%.*]] = load <8 x i16>, ptr [[NEXT_GEP14]], align 2
-; CHECK-NEXT:    [[TMP7:%.*]] = sext <8 x i16> [[WIDE_LOAD15]] to <8 x i32>
-; CHECK-NEXT:    [[TMP8:%.*]] = mul nsw <8 x i32> [[TMP7]], [[TMP6]]
-; CHECK-NEXT:    [[TMP9:%.*]] = ashr <8 x i32> [[TMP8]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
-; CHECK-NEXT:    [[TMP10:%.*]] = tail call <8 x i32> @llvm.smin.v8i32(<8 x i32> [[TMP9]], <8 x i32> <i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767>)
-; CHECK-NEXT:    [[TMP11:%.*]] = trunc <8 x i32> [[TMP10]] to <8 x i16>
-; CHECK-NEXT:    store <8 x i16> [[TMP11]], ptr [[NEXT_GEP13]], align 2
-; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
-; CHECK-NEXT:    [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i16>, ptr [[NEXT_GEP]], align 2
+; CHECK-NEXT:    [[TMP6:%.*]] = sext <4 x i16> [[WIDE_LOAD]] to <4 x i32>
+; CHECK-NEXT:    [[WIDE_LOAD15:%.*]] = load <4 x i16>, ptr [[NEXT_GEP14]], align 2
+; CHECK-NEXT:    [[TMP7:%.*]] = sext <4 x i16> [[WIDE_LOAD15]] to <4 x i32>
+; CHECK-NEXT:    [[TMP8:%.*]] = mul nsw <4 x i32> [[TMP7]], [[TMP6]]
+; CHECK-NEXT:    [[TMP9:%.*]] = lshr <4 x i32> [[TMP8]], <i32 15, i32 15, i32 15, i32 15>
+; CHECK-NEXT:    [[TMP10:%.*]] = icmp slt <4 x i32> [[TMP8]], <i32 1073741824, i32 1073741824, i32 1073741824, i32 1073741824>
+; CHECK-NEXT:    [[TMP11:%.*]] = trunc <4 x i32> [[TMP9]] to <4 x i16>
+; CHECK-NEXT:    [[TMP12:%.*]] = select <4 x i1> [[TMP10]], <4 x i16> [[TMP11]], <4 x i16> <i16 32767, i16 32767, i16 32767, i16 32767>
+; CHECK-NEXT:    store <4 x i16> [[TMP12]], ptr [[NEXT_GEP13]], align 2
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[N_VEC]], [[BLOCKSIZE]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label [[WHILE_END]], label [[WHILE_BODY_PREHEADER16]]
@@ -61,15 +62,16 @@ define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockS
 ; CHECK-NEXT:    [[PDST_ADDR_04:%.*]] = phi ptr [ [[INCDEC_PTR4:%.*]], [[WHILE_BODY]] ], [ [[PDST_ADDR_04_PH]], [[WHILE_BODY_PREHEADER16]] ]
 ; CHECK-NEXT:    [[PSRCB_ADDR_03:%.*]] = phi ptr [ [[INCDEC_PTR1:%.*]], [[WHILE_BODY]] ], [ [[PSRCB_ADDR_03_PH]], [[WHILE_BODY_PREHEADER16]] ]
 ; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[PSRCA_ADDR_05]], i32 2
-; CHECK-NEXT:    [[TMP13:%.*]] = load i16, ptr [[PSRCA_ADDR_05]], align 2
-; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP13]] to i32
+; CHECK-NEXT:    [[TMP14:%.*]] = load i16, ptr [[PSRCA_ADDR_05]], align 2
+; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[TMP14]] to i32
 ; CHECK-NEXT:    [[INCDEC_PTR1]] = getelementptr inbounds i8, ptr [[PSRCB_ADDR_03]], i32 2
-; CHECK-NEXT:    [[TMP14:%.*]] = load i16, ptr [[PSRCB_ADDR_03]], align 2
-; CHECK-NEXT:    [[CONV2:%.*]] = sext i16 [[TMP14]] to i32
+; CHECK-NEXT:    [[TMP15:%.*]] = load i16, ptr [[PSRCB_ADDR_03]], align 2
+; CHECK-NEXT:    [[CONV2:%.*]] = sext i16 [[TMP15]] to i32
 ; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i32 [[CONV2]], [[CONV]]
-; CHECK-NEXT:    [[SHR:%.*]] = ashr i32 [[MUL]], 15
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = tail call i32 @llvm.smin.i32(i32 [[SHR]], i32 32767)
-; CHECK-NEXT:    [[CONV3:%.*]] = trunc i32 [[SPEC_SELECT_I]] to i16
+; CHECK-NEXT:    [[SHR:%.*]] = lshr i32 [[MUL]], 15
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp slt i32 [[MUL]], 1073741824
+; CHECK-NEXT:    [[TMP16:%.*]] = trunc i32 [[SHR]] to i16
+; CHECK-NEXT:    [[CONV3:%.*]] = select i1 [[CMP4_I]], i16 [[TMP16]], i16 32767
 ; CHECK-NEXT:    [[INCDEC_PTR4]] = getelementptr inbounds i8, ptr [[PDST_ADDR_04]], i32 2
 ; CHECK-NEXT:    store i16 [[CONV3]], ptr [[PDST_ADDR_04]], align 2
 ; CHECK-NEXT:    [[DEC]] = add i32 [[BLKCNT_06]], -1
diff --git a/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll b/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
index 67d721b23d6f00..8559f973f281ac 100644
--- a/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
+++ b/llvm/test/Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll
@@ -5,7 +5,8 @@ define i32 @testa(i32 %mul) {
 ; CHECK-LABEL: define i32 @testa(
 ; CHECK-SAME: i32 [[MUL:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:    [[SHR:%.*]] = ashr i32 [[MUL]], 15
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = tail call i32 @llvm.smin.i32(i32 [[SHR]], i32 32767)
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp slt i32 [[MUL]], 1073741824
+; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[CMP4_I]], i32 [[SHR]], i32 32767
 ; CHECK-NEXT:    ret i32 [[SPEC_SELECT_I]]
 ;
   %shr = ashr i32 %mul, 15
@@ -19,8 +20,11 @@ define i32 @testb(i32 %mul) {
 ; CHECK-LABEL: define i32 @testb(
 ; CHECK-SAME: i32 [[MUL:%.*]]) local_unnamed_addr #[[ATTR0]] {
 ; CHECK-NEXT:    [[SHR102:%.*]] = ashr i32 [[MUL]], 7
-; CHECK-NEXT:    [[TMP1:%.*]] = tail call i32 @llvm.smax.i32(i32 [[SHR102]], i32 -128)
-; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = tail call i32 @llvm.smin.i32(i32 [[TMP1]], i32 127)
+; CHECK-NEXT:    [[CMP4_I:%.*]] = icmp sgt i32 [[MUL]], 16383
+; CHECK-NEXT:    [[RETVAL_0_I:%.*]] = select i1 [[CMP4_I]], i32 127, i32 -128
+; CHECK-NEXT:    [[TMP1:%.*]] = add i32 [[MUL]], 16384
+; CHECK-NEXT:    [[CLEANUP_DEST_SLOT_0_I:%.*]] = icmp ult i32 [[TMP1]], 32768
+; CHECK-NEXT:    [[SPEC_SELECT_I:%.*]] = select i1 [[CLEANUP_DEST_SLOT_0_I]], i32 [[SHR102]], i32 [[RETVAL_0_I]]
 ; CHECK-NEXT:    ret i32 [[SPEC_SELECT_I]]
 ;
   %shr102 = ashr i32 %mul, 7
diff --git a/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll b/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll
index bd509509c321f8..d41883fb788d3b 100644
--- a/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll
+++ b/llvm/test/Transforms/PhaseOrdering/loop-access-checks.ll
@@ -279,7 +279,7 @@ define void @loop_with_signed_induction(ptr noundef nonnull align 8 dereferencea
 ; CHECK-NEXT:    [[SUB_PTR_RHS_CAST_I_I:%.*]] = ptrtoint ptr [[TMP1]] to i64
 ; CHECK-NEXT:    [[SUB_PTR_SUB_I_I:%.*]] = sub i64 [[SUB_PTR_LHS_CAST_I_I]], [[SUB_PTR_RHS_CAST_I_I]]
 ; CHECK-NEXT:    [[SUB_PTR_DIV_I_I:%.*]] = ashr exact i64 [[SUB_PTR_SUB_I_I]], 3
-; CHECK-NEXT:    [[CMP9:%.*]] = icmp sgt i64 [[SUB_PTR_DIV_I_I]], 0
+; CHECK-NEXT:    [[CMP9:%.*]] = icmp sgt i64 [[SUB_PTR_SUB_I_I]], 0
 ; CHECK-NEXT:    br i1 [[CMP9]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 ; CHECK:       for.cond.cleanup:
 ; CHECK-NEXT:    ret void

@dtcxzyw
Member

dtcxzyw commented Apr 13, 2024

@aemerson, @dtcxzyw, @goldsteinn Throughout the code review, I noticed an issue caused by %switch.i = icmp ult i1 %cmp4.i, true. This part of the code checks if (TrueVal == CmpLHS && FalseVal == CmpRHS), indicating a match that could be handled more effectively with a pattern-matching approach. For example, in the following code snippet:

  %cmp4.i = icmp sgt i64 %shr, 348731
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i64 %shr, i64 348731

If we examine the value of %cmp4.i, we find that both branches have the same value. This suggests that we only need to handle the middle pattern. I just noticed this and I am not clear on how effective it could be. Could you please provide your insight on this?

cc @nikic

@nikic
Contributor

nikic commented Apr 13, 2024

I think the only way to move forward with this would be to teach SPF about this pattern (I mean the one with comparison of the ashr operand, not the one with the non-canonical not).

@SahilPatidar
Contributor Author

@dtcxzyw, I was experimenting with the SPF code and added some logic before the if (TrueVal == CmpLHS && FalseVal == CmpRHS) statement, but it didn't work because of this:

(lldb) expr SI->getCondition()->dump()
 %switch.i = xor i1 %cmp4.i, true

@dtcxzyw
Member

dtcxzyw commented Apr 15, 2024

@dtcxzyw, I was experimenting with the SPF code and added some logic before the if (TrueVal == CmpLHS && FalseVal == CmpRHS) statement, but it didn't work because of this:

(lldb) expr SI->getCondition()->dump()
 %switch.i = xor i1 %cmp4.i, true

You should match this with m_Not, then update the cmp predicate with CmpInst::getInversePredicate.

bool Invert = false;
Value *Condition = SI->getCondition();
Value *X;
if (match(Condition, m_Not(m_Value(X)))) {
  Invert = true;
  Condition = X;
}

CmpInst::Predicate Pred;
Value *CmpLHS, *CmpRHS;
if (!match(Condition, m_ICmp(Pred, m_Value(CmpLHS), m_Value(CmpRHS))))
   fail...

if (Invert)
  Pred = CmpInst::getInversePredicate(Pred);

...

@SahilPatidar
Contributor Author

@dtcxzyw,

if (IsAShr && matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN) {

After I made some code changes, this line now triggers three times. However, it only returns SPF_SMIN twice. On the third call, Shr->user_back() seems to be behaving unexpectedly, as shown by the following output:

(lldb) expr Shr->user_back()->dump()
 %cmp4.i = icmp slt i64 %shr, 348732

Can you help me understand why this discrepancy might be happening?

@dtcxzyw
Member

dtcxzyw commented Apr 19, 2024

@dtcxzyw,

if (IsAShr && matchSelectPattern(Shr->user_back(), Lhs, Rhs).Flavor == SPF_UNKNOWN) {

After I made some code changes, this line now triggers three times. However, it only returns SPF_SMIN twice. On the third call, Shr->user_back() seems to be behaving unexpectedly, as shown by the following output:

(lldb) expr Shr->user_back()->dump()
 %cmp4.i = icmp slt i64 %shr, 348732

Can you help me understand why this discrepancy might be happening?

%spec.select.i = select i1 %switch.i, i64 %shr, i64 348731 seems to have been eliminated. You can pass -debug-only=instcombine to opt for more details :)

@SahilPatidar
Contributor Author

@dtcxzyw, in the first iteration:

IC: Visiting:   %switch.i = icmp ult i1 %cmp4.i, true
IC: Old =   %switch.i = icmp ult i1 %cmp4.i, true
    New =   <badref> = xor i1 %cmp4.i, true
ADD:   %switch.i = xor i1 %cmp4.i, true
IC: ERASE   %1 = icmp ult i1 %cmp4.i, true
ADD DEFERRED:   %cmp4.i = icmp sgt i64 %shr, 348731
ADD DEFERRED:   %switch.i = xor i1 %cmp4.i, true
ADD:   %cmp4.i = icmp sgt i64 %shr, 348731
IC: Visiting:   %cmp4.i = icmp sgt i64 %shr, 348731

second:

IC: Visiting:   %switch.i = xor i1 %cmp4.i, true
IC: Replacing   %switch.i = xor i1 %cmp4.i, true
    with   %cmp4.i = icmp sle i64 %shr, 348731
ADD:   %switch.i = xor i1 %cmp4.i, true
IC: Mod =   %switch.i = xor i1 %cmp4.i, true
    New =   %switch.i = xor i1 %cmp4.i, true
IC: ERASE   %switch.i = xor i1 %cmp4.i, true
ADD DEFERRED:   %cmp4.i = icmp sle i64 %shr, 348731
ADD DEFERRED:   %spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731
ADD:   %cmp4.i = icmp sle i64 %shr, 348731
IC: Visiting:   %cmp4.i = icmp sle i64 %shr, 348731
IC: Old =   %cmp4.i = icmp sle i64 %shr, 348731
    New =   <badref> = icmp slt i64 %shr, 348732
ADD:   %cmp4.i = icmp slt i64 %shr, 348732
IC: ERASE   %1 = icmp sle i64 %shr, 348731
ADD DEFERRED:   %shr = ashr i64 %mul, 4
ADD:   %shr = ashr i64 %mul, 4
IC: Visiting:   %shr = ashr i64 %mul, 4
IC: Visiting:   %cmp4.i = icmp slt i64 %shr, 348732

third:

IC: Old =   %cmp4.i = icmp slt i64 %shr, 348732
    New =   <badref> = icmp slt i64 %mul, 5579712
ADD:   %cmp4.i = icmp slt i64 %mul, 5579712
IC: ERASE   %1 = icmp slt i64 %shr, 348732
ADD DEFERRED:   %shr = ashr i64 %mul, 4
ADD DEFERRED:   %spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731
ADD:   %shr = ashr i64 %mul, 4
IC: Visiting:   %shr = ashr i64 %mul, 4
IC: Visiting:   %cmp4.i = icmp slt i64 %mul, 5579712
IC: Visiting:   %spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731
IC: Visiting:   ret i64 %spec.select.i

@dtcxzyw
Member

dtcxzyw commented Apr 20, 2024

After the second iteration:

%shr = ashr i64 %mul, 4
%cmp4.i = icmp slt i64 %shr, 348732
%spec.select.i = select i1 %cmp4.i, i64 %shr, i64 348731

As %shr has multiple users, the result of Shr->user_back() is undetermined. Can you try all_of(Shr->users(), [] (User* U) {return matchSelectPattern(U, Lhs, Rhs).Flavor == SPF_UNKNOWN;})?

@SahilPatidar
Contributor Author

@dtcxzyw, it seems to be working now, and I haven't noticed any test failures after the changes. However, the current implementation feels a bit like a brute-force approach, and the code could use some improvement; it's a bit rough around the edges. Given that, should I go ahead and commit to this PR?

Successfully merging this pull request may close these issues.

[InstCombine] Missed optimization: fail to fold (A >> C1) Pred C2 if shr is used multiple times