
[llvm][InstCombine] Fold select to cmp for weak and inverted inequalities #143445


Merged
merged 2 commits into llvm:main from fold-scmp-x-0 on Jun 13, 2025

Conversation

yashnator
Contributor

Currently:

int signum(int x) {
    if (x < 0) return -1;
    if (x > 0) return +1;
    return 0;
}

is not identified as scmp(x,0).

This patch adds folding for the edge case: (x > -1) ? zext(x != 0) : -1, which is generated for the above signum and is equivalent to scmp(x, 0). For other constants/variables, the fold is already optimised.

Alive2 proof (taken from issue)

Resolves #143259

@yashnator yashnator requested a review from nikic as a code owner June 9, 2025 21:45
@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Jun 9, 2025
@llvmbot
Member

llvmbot commented Jun 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Yash Solanki (yashnator)

Changes

Currently:

int signum(int x) {
    if (x < 0) return -1;
    if (x > 0) return +1;
    return 0;
}

is not identified as scmp(x,0).

This patch adds folding for the edge case: (x > -1) ? zext(x != 0) : -1, which is generated for the above signum and is equivalent to scmp(x, 0). For other constants/variables, the fold is already optimised.

Alive2 proof (taken from issue)

Resolves #143259


Full diff: https://github.com/llvm/llvm-project/pull/143445.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+12)
  • (modified) llvm/test/Transforms/InstCombine/scmp.ll (+14)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 8f46ae304353d..ec0cda18a6492 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -3603,6 +3603,18 @@ Instruction *InstCombinerImpl::foldSelectToCmp(SelectInst &SI) {
        ICmpInst::getSwappedPredicate(ExtendedCmpPredicate) == Pred))
     Replace = true;
 
+  // Handle the edge case (x > -1) ? zext(x != 0), -1
+  if (IsSigned && ICmpInst::isGT(Pred) && match(FV, m_AllOnes()) &&
+      match(TV, m_ZExt(m_c_ICmp(ExtendedCmpPredicate, m_Specific(LHS),
+                                m_Zero()))) &&
+      (ExtendedCmpPredicate == ICmpInst::ICMP_NE ||
+       ICmpInst::getSwappedPredicate(ExtendedCmpPredicate) == Pred)) {
+    Value *Zero = ConstantInt::get(LHS->getType(), 0);
+    return replaceInstUsesWith(
+        SI,
+        Builder.CreateIntrinsic(SI.getType(), Intrinsic::scmp, {LHS, Zero}));
+  }
+
   // (x == y) ? 0 : (x > y ? 1 : -1)
   CmpPredicate FalseBranchSelectPredicate;
   const APInt *InnerTV, *InnerFV;
diff --git a/llvm/test/Transforms/InstCombine/scmp.ll b/llvm/test/Transforms/InstCombine/scmp.ll
index 2140a59de3fa9..b685ca20998bd 100644
--- a/llvm/test/Transforms/InstCombine/scmp.ll
+++ b/llvm/test/Transforms/InstCombine/scmp.ll
@@ -473,3 +473,17 @@ define i8 @scmp_from_select_eq_and_gt_neg3(i32 %x, i32 %y) {
   %r = select i1 %eq, i8 0, i8 %sel1
   ret i8 %r
 }
+
+; Fold (x > -1) ? zext(x != 0), -1 to scmp(x, 0)
+define i32 @scmp_x_0_from_gt_minus_1(i32 noundef %0) local_unnamed_addr #0 {
+; CHECK-LABEL: define i32 @scmp_x_0_from_gt_minus_1(
+; CHECK-SAME: i32 noundef [[TMP0:%.*]]) local_unnamed_addr {
+; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.scmp.i32.i32(i32 [[TMP0]], i32 0)
+; CHECK-NEXT:    ret i32 [[TMP2]]
+;
+  %2 = icmp ne i32 %0, 0
+  %3 = zext i1 %2 to i32
+  %4 = icmp sgt i32 %0, -1
+  %5 = select i1 %4, i32 %3, i32 -1
+  ret i32 %5
+}

Member

@el-ev el-ev left a comment


Is there any test for the change?

@yashnator
Contributor Author

@el-ev I added a test in scmp.ll: scmp_x_0_from_gt_minus_1

@el-ev
Member

el-ev commented Jun 10, 2025

@el-ev I added a test in scmp.ll: scmp_x_0_from_gt_minus_1

Could you please add additional test cases and pre-commit the tests?

Please check https://llvm.org/docs/InstCombineContributorGuide.html

@AZero13
Contributor

AZero13 commented Jun 10, 2025

Irrelevant to this PR itself, but I noticed something when running both src and tgt through llc: why does llc lower the src from the posted Alive2 better than the intrinsic?

@topperc
Collaborator

topperc commented Jun 10, 2025

Irrelevant to this PR itself, but I noticed something when running both src and tgt through llc: why does llc lower the src from the posted Alive2 better than the intrinsic?

I'm guessing you mean on X86. It looks neutral on RISC-V and scmp looks better for AArch64.

@yashnator yashnator force-pushed the fold-scmp-x-0 branch 2 times, most recently from 1d1ad2b to a9e0877 on June 11, 2025 17:09
@yashnator
Contributor Author

@el-ev I have added more tests. I also added changes to handle more general predicates.

@AZero13
Contributor

AZero13 commented Jun 11, 2025

Irrelevant to this PR itself, but I noticed something when running both src and tgt through llc: why does llc lower the src from the posted Alive2 better than the intrinsic?

I'm guessing you mean on X86. It looks neutral on RISC-V and scmp looks better for AArch64.

Yeah, might have to work on that.

I mean human written asm for this in particular is:

  add edi, edi
  setnz cl
  sbb eax, eax
  or al, cl
  ret

but I do not know if I can get the compiler to lower that to this.

@topperc
Collaborator

topperc commented Jun 11, 2025

Irrelevant to this PR itself, but I noticed something when running both src and tgt through llc: why does llc lower the src from the posted Alive2 better than the intrinsic?

I'm guessing you mean on X86. It looks neutral on RISC-V and scmp looks better for AArch64.

Yeah, might have to work on that.

I mean human written asm for this in particular is:

  add edi, edi
  setnz cl
  sbb eax, eax
  or al, cl
  ret

but I do not know if I can get the compiler to lower that to this.

add edi, edi seems wrong there

@AZero13
Contributor

AZero13 commented Jun 11, 2025

add edi, edi seems wrong there

Why?

@topperc
Collaborator

topperc commented Jun 12, 2025

add edi, edi seems wrong there

Why?

Nevermind, I think it's ok, but there's one partial register write and one false dependency in that code.

@AZero13
Contributor

AZero13 commented Jun 12, 2025

add edi, edi seems wrong there

Why?

Nevermind, I think it's ok, but there's one partial register write and one false dependency in that code.

partial register write, yes, deliberate
false dependency? Where?

@AZero13
Contributor

AZero13 commented Jun 12, 2025

if you're looking at sbb eax, eax, then (1) that's probably special-cased by the CPU, and (2) even if not, eax isn't touched before that point, so it's very unlikely that the value is still in flight

I don't know if LLVM is aware of special cases like that though. Or if it's worth teaching LLVM that

Contributor

@nikic nikic left a comment


We have a getFlippedStrictnessPredicateAndConstant() helper for cases like this. I think we can handle this generically by doing a swap of the select arms to get the constant into TV, inverting the predicate, and then using getFlippedStrictnessPredicateAndConstant() to convert the now non-strict predicate into a strict predicate.

@topperc
Collaborator

topperc commented Jun 12, 2025

if you're looking at sbb eax, eax, then (1) that's probably special-cased by the CPU, and (2) even if not, eax isn't touched before that point, so it's very unlikely that the value is still in flight

I don't know if LLVM is aware of special cases like that though. Or if it's worth teaching LLVM that

For a long time it was only special cased on AMD CPUs. I don't know if Intel finally fixed it or not. There's a TuningSBBDepBreaking flag in X86Subtarget for it.

Fold select to cmp with constants in non-canonical inequalities

For constant y, inequalities aren't always canonical
so we need to check the conditions such as
- (x > y - 1) ? zext(x != y) : -1
- (x > y - 1) ? zext(x > y) : -1
- (x < y + 1) ? sext(x != y) : 1
- (x < y + 1) ? sext(x < y) : 1
and similarly for the non-strict inequalities.

Fold select into scmp/ucmp based on signedness of
the comparison predicate.

Resolves llvm#143259
Contributor

@nikic nikic left a comment


LGTM

@yashnator yashnator changed the title [llvm][InstCombine] Fold signum(x) into scmp(x, 0) [llvm][InstCombine] Fold select to cmp for weak and inverted inequalities Jun 13, 2025
@yashnator
Contributor Author

I don't have commit access; can you help merge?

@el-ev el-ev merged commit a361a3d into llvm:main Jun 13, 2025
7 checks passed
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
Labels
llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms
Development

Successfully merging this pull request may close these issues.

Failure to spot scmp(x, 0) idiom
7 participants