[AArch64] Eliminate Common Subexpression of CSEL by Reassociation #121350


Merged

merged 2 commits into llvm:main from mskamp:fix_119606_reassociate_for_cmp_cse on Jan 11, 2025

Conversation

@mskamp (Contributor) commented Dec 30, 2024

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes #119606.
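
For a concrete picture, here is a minimal source/target pair for the (x == 0) case, in the spirit of the Alive2 proofs above (function and value names are illustrative):

define i32 @src(i32 %x0, i32 %x1) {
  %cmp = icmp eq i32 %x1, 0
  %add = add i32 %x0, %x1
  %sub = sub i32 %add, 1
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

define i32 @tgt(i32 %x0, i32 %x1) {
  %cmp = icmp ult i32 %x1, 1   ; (x == 0) rewritten as (x <u 1)
  %s = sub i32 %x1, 1          ; lowers to the SUBS shared with the compare
  %r = add i32 %s, %x0         ; reassociated add
  %ret = select i1 %cmp, i32 0, i32 %r
  ret i32 %ret
}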

@llvmbot (Member) commented Dec 30, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Marius Kamp (mskamp)

Changes

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

The transformation works for unsigned comparisons and equality
comparisons with 0 (by converting them to unsigned comparisons).

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

Fixes #119606.


Full diff: https://github.com/llvm/llvm-project/pull/121350.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+82)
  • (added) llvm/test/CodeGen/AArch64/csel-cmp-cse.ll (+383)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 24e1ebd8421fbf..d280e3350c3d11 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -24838,6 +24838,83 @@ static SDValue foldCSELOfCSEL(SDNode *Op, SelectionDAG &DAG) {
   return DAG.getNode(AArch64ISD::CSEL, DL, VT, L, R, CCValue, Cond);
 }
 
+// Reassociate the true/false expressions of a CSEL instruction to obtain a
+// common subexpression with the comparison instruction. For example, change
+// (CSEL (ADD (ADD x y) -c) f LO (SUBS x c)) to
+// (CSEL (ADD (SUBS x c) y) f LO (SUBS x c)) such that (SUBS x c) is a common
+// subexpression.
+static SDValue reassociateCSELOperandsForCSE(SDNode *N, SelectionDAG &DAG) {
+  SDValue SubsNode = N->getOperand(3);
+  if (SubsNode.getOpcode() != AArch64ISD::SUBS || !SubsNode.hasOneUse())
+    return SDValue();
+  auto *CmpOpConst = dyn_cast<ConstantSDNode>(SubsNode.getOperand(1));
+  if (!CmpOpConst)
+    return SDValue();
+
+  auto CC = static_cast<AArch64CC::CondCode>(N->getConstantOperandVal(2));
+  bool IsEquality = CC == AArch64CC::EQ || CC == AArch64CC::NE;
+  if (CC != AArch64CC::LO && CC != AArch64CC::HI &&
+      (!IsEquality || !CmpOpConst->isZero()))
+    return SDValue();
+  // The cases (x < c) and (x == 0) are later unified as (x < newconst).
+  // The cases (x > c) and (x != 0) are later unified as (x >= newconst).
+  APInt NewCmpConst = CC == AArch64CC::LO ? CmpOpConst->getAPIntValue()
+                                          : CmpOpConst->getAPIntValue() + 1;
+  APInt ExpectedConst = -NewCmpConst;
+
+  SDValue CmpOpOther = SubsNode.getOperand(0);
+  EVT VT = N->getValueType(0);
+  SDValue NewCmp = DAG.getNode(AArch64ISD::SUBS, SDLoc(SubsNode),
+                               DAG.getVTList(VT, MVT_CC), CmpOpOther,
+                               DAG.getConstant(NewCmpConst, SDLoc(CmpOpConst),
+                                               CmpOpConst->getValueType(0)));
+
+  auto Reassociate = [&](SDValue Op) {
+    if (Op.getOpcode() != ISD::ADD)
+      return SDValue();
+    auto *AddOpConst = dyn_cast<ConstantSDNode>(Op.getOperand(1));
+    if (!AddOpConst)
+      return SDValue();
+    if (AddOpConst->getAPIntValue() != ExpectedConst)
+      return SDValue();
+    if (Op.getOperand(0).getOpcode() != ISD::ADD ||
+        !Op.getOperand(0).hasOneUse())
+      return SDValue();
+    SDValue X = Op.getOperand(0).getOperand(0);
+    SDValue Y = Op.getOperand(0).getOperand(1);
+    if (X != CmpOpOther)
+      std::swap(X, Y);
+    if (X != CmpOpOther)
+      return SDValue();
+    SDNodeFlags Flags;
+    if (Op.getOperand(0).getNode()->getFlags().hasNoUnsignedWrap())
+      Flags.setNoUnsignedWrap(true);
+    return DAG.getNode(ISD::ADD, SDLoc(Op), VT, NewCmp.getValue(0), Y, Flags);
+  };
+
+  SDValue TValReassoc = Reassociate(N->getOperand(0));
+  SDValue FValReassoc = Reassociate(N->getOperand(1));
+  if (!TValReassoc && !FValReassoc)
+    return SDValue();
+  if (TValReassoc)
+    DAG.ReplaceAllUsesWith(N->getOperand(0), TValReassoc);
+  else
+    TValReassoc = N->getOperand(0);
+  if (FValReassoc)
+    DAG.ReplaceAllUsesWith(N->getOperand(1), FValReassoc);
+  else
+    FValReassoc = N->getOperand(1);
+
+  AArch64CC::CondCode NewCC = CC == AArch64CC::EQ || CC == AArch64CC::LO
+                                  ? AArch64CC::LO
+                                  : AArch64CC::HS;
+  return DAG.getNode(AArch64ISD::CSEL, SDLoc(N), VT, TValReassoc, FValReassoc,
+                     DAG.getConstant(NewCC, SDLoc(N->getOperand(2)), MVT_CC),
+                     NewCmp.getValue(1));
+}
+
 // Optimize CSEL instructions
 static SDValue performCSELCombine(SDNode *N,
                                   TargetLowering::DAGCombinerInfo &DCI,
@@ -24849,6 +24926,11 @@ static SDValue performCSELCombine(SDNode *N,
   if (SDValue R = foldCSELOfCSEL(N, DAG))
     return R;
 
+  // Try to reassociate the true/false expressions so that we can do CSE with
+  // a SUBS instruction used to perform the comparison.
+  if (SDValue R = reassociateCSELOperandsForCSE(N, DAG))
+    return R;
+
   // CSEL 0, cttz(X), eq(X, 0) -> AND cttz bitwidth-1
   // CSEL cttz(X), 0, ne(X, 0) -> AND cttz bitwidth-1
   if (SDValue Folded = foldCSELofCTTZ(N, DAG))
diff --git a/llvm/test/CodeGen/AArch64/csel-cmp-cse.ll b/llvm/test/CodeGen/AArch64/csel-cmp-cse.ll
new file mode 100644
index 00000000000000..48b542966cf52f
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/csel-cmp-cse.ll
@@ -0,0 +1,383 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s
+
+declare void @use_i1(i1 %x)
+declare void @use_i32(i32 %x)
+
+; Based on the IR generated for the `last` method of the type `slice` in Rust
+define ptr @test_last_elem_from_ptr(ptr noundef readnone %x0, i64 noundef %x1) {
+; CHECK-LABEL: test_last_elem_from_ptr:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs x8, x1, #1
+; CHECK-NEXT:    add x8, x8, x0
+; CHECK-NEXT:    csel x0, xzr, x8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i64 %x1, 0
+  %add.ptr = getelementptr inbounds nuw i8, ptr %x0, i64 %x1
+  %add.ptr1 = getelementptr inbounds i8, ptr %add.ptr, i64 -1
+  %retval.0 = select i1 %cmp, ptr null, ptr %add.ptr1
+  ret ptr %retval.0
+}
+
+define i32 @test_eq0_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ule7_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ule7_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #8
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ule i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 8
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ule0_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ule0_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ule i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ultminus2_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ultminus2_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    adds w8, w1, #2
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ult i32 %x1, -2
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, -2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ne0_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ne0_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, w8, wzr, hs
+; CHECK-NEXT:    ret
+  %cmp = icmp ne i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 %sub, i32 0
+  ret i32 %ret
+}
+
+define i32 @test_ugt7_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ugt7_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #8
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, hs
+; CHECK-NEXT:    ret
+  %cmp = icmp ugt i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 8
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_eq0_sub_addcomm_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_sub_addcomm_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add i32 %x1, %x0
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_eq0_subcomm_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_subcomm_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = add i32 -1, %add
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_eq0_multi_use_sub_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_multi_use_sub_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w30, -16
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w0, w8, w0
+; CHECK-NEXT:    csel w19, wzr, w0, lo
+; CHECK-NEXT:    bl use_i32
+; CHECK-NEXT:    mov w0, w19
+; CHECK-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  tail call void @use_i32(i32 %sub)
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq0_multi_use_cmp_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_multi_use_cmp_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w30, -16
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #1
+; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    csel w19, wzr, w8, eq
+; CHECK-NEXT:    bl use_i1
+; CHECK-NEXT:    mov w0, w19
+; CHECK-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  tail call void @use_i1(i1 %cmp)
+  %add = add nuw i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq0_multi_use_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_multi_use_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    str x30, [sp, #-32]! // 8-byte Folded Spill
+; CHECK-NEXT:    stp x20, x19, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w20, -16
+; CHECK-NEXT:    .cfi_offset w30, -32
+; CHECK-NEXT:    add w20, w0, w1
+; CHECK-NEXT:    mov w19, w1
+; CHECK-NEXT:    mov w0, w20
+; CHECK-NEXT:    bl use_i32
+; CHECK-NEXT:    sub w8, w20, #1
+; CHECK-NEXT:    cmp w19, #0
+; CHECK-NEXT:    ldp x20, x19, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ldr x30, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x1
+  tail call void @use_i32(i32 %add)
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq1_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq1_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #1
+; CHECK-NEXT:    sub w8, w8, #2
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 1
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ultminus1_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ultminus1_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmn w1, #1
+; CHECK-NEXT:    csinc w0, wzr, w8, ne
+; CHECK-NEXT:    ret
+  %cmp = icmp ult i32 %x1, -1
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, -1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ugtsmax_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ugtsmax_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-NEXT:    add w9, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    add w8, w9, w8
+; CHECK-NEXT:    csel w0, wzr, w8, lt
+; CHECK-NEXT:    ret
+  %cmp = icmp ugt i32 %x1, 2147483647
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2147483648
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ult_nonconst_i32(i32 %x0, i32 %x1, i32 %x2) {
+; CHECK-LABEL: test_ult_nonconst_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, w2
+; CHECK-NEXT:    sub w8, w8, w2
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ult i32 %x1, %x2
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, %x2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_sle7_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_sle7_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #8
+; CHECK-NEXT:    sub w8, w8, #8
+; CHECK-NEXT:    csel w0, wzr, w8, lt
+; CHECK-NEXT:    ret
+  %cmp = icmp sle i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 8
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #2
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ne_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ne_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #2
+; CHECK-NEXT:    csel w0, w8, wzr, ne
+; CHECK-NEXT:    ret
+  %cmp = icmp ne i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2
+  %ret = select i1 %cmp, i32 %sub, i32 0
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ule7_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ule7_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #8
+; CHECK-NEXT:    sub w8, w8, #7
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ule i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 7
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ugt7_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ugt7_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #7
+; CHECK-NEXT:    sub w8, w8, #7
+; CHECK-NEXT:    csel w0, wzr, w8, hi
+; CHECK-NEXT:    ret
+  %cmp = icmp ugt i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 7
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_unrelated_add_i32(i32 %x0, i32 %x1, i32 %x2) {
+; CHECK-LABEL: test_unrelated_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w2
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #1
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x2
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i16 @test_eq0_sub_add_i16(i16 %x0, i16 %x1) {
+; CHECK-LABEL: test_eq0_sub_add_i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    tst w1, #0xffff
+; CHECK-NEXT:    sub w8, w8, #1
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i16 %x1, 0
+  %add = add nuw i16 %x0, %x1
+  %sub = sub i16 %add, 1
+  %ret = select i1 %cmp, i16 0, i16 %sub
+  ret i16 %ret
+}

@davemgreen (Collaborator) left a comment


Hi - Am I right in thinking that any condition would be valid if we were just reassociating the adds and using the same constant? (As in https://alive2.llvm.org/ce/z/X7pFRT). The conditions allowed by this patch are to do with having to alter the constants to match the SUBS?

Do you have any tests where the type of the CSEL is different from the type of the SUBS?

@mskamp (Contributor, Author) commented Jan 3, 2025

Hi - Am I right in thinking that any condition would be valid if we were just reassociating the adds and using the same constant? (As in https://alive2.llvm.org/ce/z/X7pFRT). The conditions allowed by this patch are to do with having to alter the constants to match the SUBS?

If we have an (icmp cc %a, const) and a (add (add %a, %b), -const), we can reassociate the adds and remove the comparison instruction by using a SUBS instruction. The motivating pattern for this PR, however, was slightly different: If we have a (icmp eq %a, 0) and a (add (add %a, %b), -1), we can change the comparison to (icmp ult %a, 1) and then perform the transformation stated before. Currently, the implemented transformation mainly targets the latter case (with some generalizations), but it is probably better (and might even be easier to read) to extend the transformation to handle the former case, too. I'll adapt the implementation accordingly.
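
A minimal IR sketch of the two steps (names illustrative):

; step 1: rewrite (icmp eq %a, 0) as (icmp ult %a, 1)
; step 2: reassociate (add (add %a, %b), -1) to (add (sub %a, 1), %b),
;         so (sub %a, 1) lowers to a SUBS whose flags also feed the select
define i32 @sketch(i32 %a, i32 %b) {
  %cmp = icmp ult i32 %a, 1
  %s = sub i32 %a, 1
  %r = add i32 %s, %b
  %ret = select i1 %cmp, i32 0, i32 %r
  ret i32 %ret
}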

Do you have any tests where the type of the CSEL is different from the type of the SUBS?

I can add such tests, but I'm wondering whether they are useful. If the SUBS has a different type than the operands of the CSEL, the operands of the SUBS used for comparison and the ADD won't match (as they must have different types), or the operands of the CSEL wouldn't be chains of ADD instructions. Hence, the transformation as implemented would not apply. Or am I missing something here?

@mskamp mskamp force-pushed the fix_119606_reassociate_for_cmp_cse branch from b3f6a01 to 8c78b70 on January 4, 2025 08:32
@davemgreen (Collaborator) left a comment


I can add such tests, but I'm wondering whether they are useful. If the SUBS has a different type than the operands of the CSEL, the operands of the SUBS used for comparison and the ADD won't match (as they must have different types), or the operands of the CSEL wouldn't be chains of ADD instructions. Hence, the transformation as implemented would not apply. Or am I missing something here?

It is just to protect against crashes - APInt doesn't like it when the bitwidths don't match.
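
Such a test might look like the following sketch (illustrative; the i64 compare and the i32 add chain have different bit widths, so the combine should bail out instead of comparing mismatched APInts):

define i32 @test_mixed_width(i32 %x0, i64 %x1) {
  %cmp = icmp eq i64 %x1, 0               ; i64 SUBS
  %x1t = trunc i64 %x1 to i32
  %add = add i32 %x0, %x1t                ; i32 add chain
  %sub = sub i32 %add, 1
  %ret = select i1 %cmp, i32 0, i32 %sub  ; i32 CSEL
  ret i32 %ret
}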


auto CC = static_cast<AArch64CC::CondCode>(N->getConstantOperandVal(2));
bool IsEquality = CC == AArch64CC::EQ || CC == AArch64CC::NE;
if (IsEquality && !CmpOpConst->isZero())
A Collaborator commented:

Can we handle eq and ne in the same way?

define i32 @src(i32 %x0, i32 %x1) {
  %cmp = icmp eq i32 %x1, 7
  %add = add i32 %x0, %x1
  %sub = sub i32 %add, 7
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

define i32 @tgt(i32 %x0, i32 %x1) {
  %cmp = icmp eq i32 %x1, 7
  %add = sub i32 %x1, 7
  %sub = add i32 %add, %x0
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

(I guess for all of those the constant isn't actually necessary, and the basic pattern is that we can reassociate an add/sub to allow it to be shared with a sub. FYI, there is a related but different patch (swapping conditions to allow sharing) in #121412 (mostly for handling constants; for non-constants it already exists in a couple of places in LLVM). I understand this is getting further away from your motivating case. If we can fix the APInt issues, then the follow-on can easily be added in a later patch if that is simpler. Sometimes it is better to have smaller patches).
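
A sketch of the non-constant form (names illustrative): the identity (x0 + x1) - c == (x1 - c) + x0 holds regardless of the condition, so the sub can share a SUBS with the compare:

define i32 @src_nonconst(i32 %x0, i32 %x1, i32 %c) {
  %cmp = icmp ult i32 %x1, %c
  %add = add i32 %x0, %x1
  %sub = sub i32 %add, %c
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

define i32 @tgt_nonconst(i32 %x0, i32 %x1, i32 %c) {
  %cmp = icmp ult i32 %x1, %c   ; any condition works here
  %s = sub i32 %x1, %c          ; lowers to SUBS, shared with the compare
  %r = add i32 %s, %x0
  %ret = select i1 %cmp, i32 0, i32 %r
  ret i32 %ret
}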

@mskamp (Contributor, Author):

Yes, we can also handle eq and ne. I've also implemented this.

I agree that smaller patches are sometimes better. I'll look into non-constant operands, which would probably require some more restructuring, and start working on a follow-up patch. For this PR, it is probably easier to stick to constants to keep it concise.

mskamp added 2 commits January 8, 2025 20:28
If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes llvm#119606.
@mskamp mskamp force-pushed the fix_119606_reassociate_for_cmp_cse branch from 8c78b70 to 43f2da4 on January 9, 2025 18:31
@davemgreen (Collaborator) left a comment

Thanks, looks good now - LGTM.

Let me know if we should submit this.

@mskamp (Contributor, Author) commented Jan 11, 2025

Thanks, looks good now - LGTM.

Let me know if we should submit this.

From my point of view, this PR is ready. I'd be glad if it could be merged. Thank you for your time!

FYI, the suggested patch to support non-constant operands should be ready within the next few days.

@davemgreen davemgreen merged commit 1eed469 into llvm:main Jan 11, 2025
8 checks passed
BaiXilin pushed a commit to BaiXilin/llvm-fix-vnni-instr-types that referenced this pull request Jan 12, 2025
…vm#121350)

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes llvm#119606.