[AArch64] Eliminate Common Subexpression of CSEL by Reassociation #121350


Merged

merged 2 commits into llvm:main from mskamp:fix_119606_reassociate_for_cmp_cse on Jan 11, 2025

Conversation

@mskamp (Contributor) commented Dec 30, 2024

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes #119606.
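
For a concrete picture, here is a minimal source/target pair for the (x == 0) case, in the spirit of the Alive2 proofs above (function and value names are illustrative):

define i32 @src(i32 %x0, i32 %x1) {
  %cmp = icmp eq i32 %x1, 0
  %add = add i32 %x0, %x1
  %sub = sub i32 %add, 1
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

define i32 @tgt(i32 %x0, i32 %x1) {
  %cmp = icmp ult i32 %x1, 1   ; (x == 0) rewritten as (x <u 1)
  %s = sub i32 %x1, 1          ; lowers to the SUBS shared with the compare
  %r = add i32 %s, %x0         ; reassociated add
  %ret = select i1 %cmp, i32 0, i32 %r
  ret i32 %ret
}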

@llvmbot (Member) commented Dec 30, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Marius Kamp (mskamp)

Changes

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

The transformation works for unsigned comparisons and equality
comparisons with 0 (by converting them to unsigned comparisons).

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

Fixes #119606.


Full diff: https://github.com/llvm/llvm-project/pull/121350.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+82)
  • (added) llvm/test/CodeGen/AArch64/csel-cmp-cse.ll (+383)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 24e1ebd8421fbf..d280e3350c3d11 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -24838,6 +24838,83 @@ static SDValue foldCSELOfCSEL(SDNode *Op, SelectionDAG &DAG) {
   return DAG.getNode(AArch64ISD::CSEL, DL, VT, L, R, CCValue, Cond);
 }
 
+// Reassociate the true/false expressions of a CSEL instruction to obtain a
+// common subexpression with the comparison instruction. For example, change
+// (CSEL (ADD (ADD x y) -c) f LO (SUBS x c)) to
+// (CSEL (ADD (SUBS x c) y) f LO (SUBS x c)) such that (SUBS x c) is a common
+// subexpression.
+static SDValue reassociateCSELOperandsForCSE(SDNode *N, SelectionDAG &DAG) {
+  SDValue SubsNode = N->getOperand(3);
+  if (SubsNode.getOpcode() != AArch64ISD::SUBS || !SubsNode.hasOneUse())
+    return SDValue();
+  auto *CmpOpConst = dyn_cast<ConstantSDNode>(SubsNode.getOperand(1));
+  if (!CmpOpConst)
+    return SDValue();
+
+  auto CC = static_cast<AArch64CC::CondCode>(N->getConstantOperandVal(2));
+  bool IsEquality = CC == AArch64CC::EQ || CC == AArch64CC::NE;
+  if (CC != AArch64CC::LO && CC != AArch64CC::HI &&
+      (!IsEquality || !CmpOpConst->isZero()))
+    return SDValue();
+  // The cases (x < c) and (x == 0) are later unified as (x < newconst).
+  // The cases (x > c) and (x != 0) are later unified as (x >= newconst).
+  APInt NewCmpConst = CC == AArch64CC::LO ? CmpOpConst->getAPIntValue()
+                                          : CmpOpConst->getAPIntValue() + 1;
+  APInt ExpectedConst = -NewCmpConst;
+
+  SDValue CmpOpOther = SubsNode.getOperand(0);
+  EVT VT = N->getValueType(0);
+  SDValue NewCmp = DAG.getNode(AArch64ISD::SUBS, SDLoc(SubsNode),
+                               DAG.getVTList(VT, MVT_CC), CmpOpOther,
+                               DAG.getConstant(NewCmpConst, SDLoc(CmpOpConst),
+                                               CmpOpConst->getValueType(0)));
+
+  auto Reassociate = [&](SDValue Op) {
+    if (Op.getOpcode() != ISD::ADD)
+      return SDValue();
+    auto *AddOpConst = dyn_cast<ConstantSDNode>(Op.getOperand(1));
+    if (!AddOpConst)
+      return SDValue();
+    if (AddOpConst->getAPIntValue() != ExpectedConst)
+      return SDValue();
+    if (Op.getOperand(0).getOpcode() != ISD::ADD ||
+        !Op.getOperand(0).hasOneUse())
+      return SDValue();
+    SDValue X = Op.getOperand(0).getOperand(0);
+    SDValue Y = Op.getOperand(0).getOperand(1);
+    if (X != CmpOpOther)
+      std::swap(X, Y);
+    if (X != CmpOpOther)
+      return SDValue();
+    SDNodeFlags Flags;
+    if (Op.getOperand(0).getNode()->getFlags().hasNoUnsignedWrap())
+      Flags.setNoUnsignedWrap(true);
+    return DAG.getNode(ISD::ADD, SDLoc(Op), VT, NewCmp.getValue(0), Y, Flags);
+  };
+
+  SDValue TValReassoc = Reassociate(N->getOperand(0));
+  SDValue FValReassoc = Reassociate(N->getOperand(1));
+  if (!TValReassoc && !FValReassoc)
+    return SDValue();
+  if (TValReassoc)
+    DAG.ReplaceAllUsesWith(N->getOperand(0), TValReassoc);
+  else
+    TValReassoc = N->getOperand(0);
+  if (FValReassoc)
+    DAG.ReplaceAllUsesWith(N->getOperand(1), FValReassoc);
+  else
+    FValReassoc = N->getOperand(1);
+
+  AArch64CC::CondCode NewCC = CC == AArch64CC::EQ || CC == AArch64CC::LO
+                                  ? AArch64CC::LO
+                                  : AArch64CC::HS;
+  return DAG.getNode(AArch64ISD::CSEL, SDLoc(N), VT, TValReassoc, FValReassoc,
+                     DAG.getConstant(NewCC, SDLoc(N->getOperand(2)), MVT_CC),
+                     NewCmp.getValue(1));
+}
+
 // Optimize CSEL instructions
 static SDValue performCSELCombine(SDNode *N,
                                   TargetLowering::DAGCombinerInfo &DCI,
@@ -24849,6 +24926,11 @@ static SDValue performCSELCombine(SDNode *N,
   if (SDValue R = foldCSELOfCSEL(N, DAG))
     return R;
 
+  // Try to reassociate the true/false expressions so that we can do CSE with
+  // a SUBS instruction used to perform the comparison.
+  if (SDValue R = reassociateCSELOperandsForCSE(N, DAG))
+    return R;
+
   // CSEL 0, cttz(X), eq(X, 0) -> AND cttz bitwidth-1
   // CSEL cttz(X), 0, ne(X, 0) -> AND cttz bitwidth-1
   if (SDValue Folded = foldCSELofCTTZ(N, DAG))
diff --git a/llvm/test/CodeGen/AArch64/csel-cmp-cse.ll b/llvm/test/CodeGen/AArch64/csel-cmp-cse.ll
new file mode 100644
index 00000000000000..48b542966cf52f
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/csel-cmp-cse.ll
@@ -0,0 +1,383 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s
+
+declare void @use_i1(i1 %x)
+declare void @use_i32(i32 %x)
+
+; Based on the IR generated for the `last` method of the type `slice` in Rust
+define ptr @test_last_elem_from_ptr(ptr noundef readnone %x0, i64 noundef %x1) {
+; CHECK-LABEL: test_last_elem_from_ptr:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs x8, x1, #1
+; CHECK-NEXT:    add x8, x8, x0
+; CHECK-NEXT:    csel x0, xzr, x8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i64 %x1, 0
+  %add.ptr = getelementptr inbounds nuw i8, ptr %x0, i64 %x1
+  %add.ptr1 = getelementptr inbounds i8, ptr %add.ptr, i64 -1
+  %retval.0 = select i1 %cmp, ptr null, ptr %add.ptr1
+  ret ptr %retval.0
+}
+
+define i32 @test_eq0_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ule7_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ule7_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #8
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ule i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 8
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ule0_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ule0_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ule i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ultminus2_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ultminus2_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    adds w8, w1, #2
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ult i32 %x1, -2
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, -2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_ne0_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ne0_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, w8, wzr, hs
+; CHECK-NEXT:    ret
+  %cmp = icmp ne i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 %sub, i32 0
+  ret i32 %ret
+}
+
+define i32 @test_ugt7_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ugt7_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #8
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, hs
+; CHECK-NEXT:    ret
+  %cmp = icmp ugt i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 8
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_eq0_sub_addcomm_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_sub_addcomm_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add i32 %x1, %x0
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_eq0_subcomm_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_subcomm_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w8, w8, w0
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = add i32 -1, %add
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+define i32 @test_eq0_multi_use_sub_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_multi_use_sub_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w30, -16
+; CHECK-NEXT:    subs w8, w1, #1
+; CHECK-NEXT:    add w0, w8, w0
+; CHECK-NEXT:    csel w19, wzr, w0, lo
+; CHECK-NEXT:    bl use_i32
+; CHECK-NEXT:    mov w0, w19
+; CHECK-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  tail call void @use_i32(i32 %sub)
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq0_multi_use_cmp_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_multi_use_cmp_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w30, -16
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #1
+; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    csel w19, wzr, w8, eq
+; CHECK-NEXT:    bl use_i1
+; CHECK-NEXT:    mov w0, w19
+; CHECK-NEXT:    ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  tail call void @use_i1(i1 %cmp)
+  %add = add nuw i32 %x0, %x1
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq0_multi_use_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq0_multi_use_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    str x30, [sp, #-32]! // 8-byte Folded Spill
+; CHECK-NEXT:    stp x20, x19, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w20, -16
+; CHECK-NEXT:    .cfi_offset w30, -32
+; CHECK-NEXT:    add w20, w0, w1
+; CHECK-NEXT:    mov w19, w1
+; CHECK-NEXT:    mov w0, w20
+; CHECK-NEXT:    bl use_i32
+; CHECK-NEXT:    sub w8, w20, #1
+; CHECK-NEXT:    cmp w19, #0
+; CHECK-NEXT:    ldp x20, x19, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ldr x30, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x1
+  tail call void @use_i32(i32 %add)
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq1_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq1_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #1
+; CHECK-NEXT:    sub w8, w8, #2
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 1
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ultminus1_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ultminus1_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmn w1, #1
+; CHECK-NEXT:    csinc w0, wzr, w8, ne
+; CHECK-NEXT:    ret
+  %cmp = icmp ult i32 %x1, -1
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, -1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ugtsmax_sub_add_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ugtsmax_sub_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-NEXT:    add w9, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    add w8, w9, w8
+; CHECK-NEXT:    csel w0, wzr, w8, lt
+; CHECK-NEXT:    ret
+  %cmp = icmp ugt i32 %x1, 2147483647
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2147483648
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ult_nonconst_i32(i32 %x0, i32 %x1, i32 %x2) {
+; CHECK-LABEL: test_ult_nonconst_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, w2
+; CHECK-NEXT:    sub w8, w8, w2
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ult i32 %x1, %x2
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, %x2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_sle7_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_sle7_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #8
+; CHECK-NEXT:    sub w8, w8, #8
+; CHECK-NEXT:    csel w0, wzr, w8, lt
+; CHECK-NEXT:    ret
+  %cmp = icmp sle i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 8
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_eq_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_eq_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #2
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ne_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ne_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #2
+; CHECK-NEXT:    csel w0, w8, wzr, ne
+; CHECK-NEXT:    ret
+  %cmp = icmp ne i32 %x1, 0
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 2
+  %ret = select i1 %cmp, i32 %sub, i32 0
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ule7_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ule7_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #8
+; CHECK-NEXT:    sub w8, w8, #7
+; CHECK-NEXT:    csel w0, wzr, w8, lo
+; CHECK-NEXT:    ret
+  %cmp = icmp ule i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 7
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_ugt7_const_mismatch_i32(i32 %x0, i32 %x1) {
+; CHECK-LABEL: test_ugt7_const_mismatch_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    cmp w1, #7
+; CHECK-NEXT:    sub w8, w8, #7
+; CHECK-NEXT:    csel w0, wzr, w8, hi
+; CHECK-NEXT:    ret
+  %cmp = icmp ugt i32 %x1, 7
+  %add = add i32 %x0, %x1
+  %sub = sub i32 %add, 7
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i32 @test_unrelated_add_i32(i32 %x0, i32 %x1, i32 %x2) {
+; CHECK-LABEL: test_unrelated_add_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w2
+; CHECK-NEXT:    cmp w1, #0
+; CHECK-NEXT:    sub w8, w8, #1
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i32 %x1, 0
+  %add = add nuw i32 %x0, %x2
+  %sub = sub i32 %add, 1
+  %ret = select i1 %cmp, i32 0, i32 %sub
+  ret i32 %ret
+}
+
+; Negative test
+define i16 @test_eq0_sub_add_i16(i16 %x0, i16 %x1) {
+; CHECK-LABEL: test_eq0_sub_add_i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    tst w1, #0xffff
+; CHECK-NEXT:    sub w8, w8, #1
+; CHECK-NEXT:    csel w0, wzr, w8, eq
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i16 %x1, 0
+  %add = add nuw i16 %x0, %x1
+  %sub = sub i16 %add, 1
+  %ret = select i1 %cmp, i16 0, i16 %sub
+  ret i16 %ret
+}

@davemgreen (Collaborator) left a comment


Hi - Am I right in thinking that any condition would be valid if we were just reassociating the adds and using the same constant? (As in https://alive2.llvm.org/ce/z/X7pFRT). The conditions allowed by this patch are to do with having to alter the constants to match the SUBS?

Do you have any tests where the type of the CSEL is different from the type of the SUBS?

@mskamp (Contributor, Author) commented Jan 3, 2025

Hi - Am I right in thinking that any condition would be valid if we were just reassociating the adds and using the same constant? (As in https://alive2.llvm.org/ce/z/X7pFRT). The conditions allowed by this patch are to do with having to alter the constants to match the SUBS?

If we have an (icmp cc %a, const) and a (add (add %a, %b), -const), we can reassociate the adds and remove the comparison instruction by using a SUBS instruction. The motivating pattern for this PR, however, was slightly different: If we have a (icmp eq %a, 0) and a (add (add %a, %b), -1), we can change the comparison to (icmp ult %a, 1) and then perform the transformation stated before. Currently, the implemented transformation mainly targets the latter case (with some generalizations), but it is probably better (and might even be easier to read) to extend the transformation to handle the former case, too. I'll adapt the implementation accordingly.
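
A minimal IR sketch of the two steps (names illustrative):

; step 1: rewrite (icmp eq %a, 0) as (icmp ult %a, 1)
; step 2: reassociate (add (add %a, %b), -1) to (add (sub %a, 1), %b),
;         so (sub %a, 1) lowers to a SUBS whose flags also feed the select
define i32 @sketch(i32 %a, i32 %b) {
  %cmp = icmp ult i32 %a, 1
  %s = sub i32 %a, 1
  %r = add i32 %s, %b
  %ret = select i1 %cmp, i32 0, i32 %r
  ret i32 %ret
}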

Do you have any tests where the type of the CSEL is different from the type of the SUBS?

I can add such tests, but I'm wondering whether they are useful. If the SUBS has a different type than the operands of the CSEL, the operands of the SUBS used for comparison and the ADD won't match (as they must have different types), or the operands of the CSEL wouldn't be chains of ADD instructions. Hence, the transformation as implemented would not apply. Or am I missing something here?

@mskamp mskamp force-pushed the fix_119606_reassociate_for_cmp_cse branch from b3f6a01 to 8c78b70 on January 4, 2025 08:32
@davemgreen (Collaborator) left a comment


I can add such tests, but I'm wondering whether they are useful. If the SUBS has a different type than the operands of the CSEL, the operands of the SUBS used for comparison and the ADD won't match (as they must have different types), or the operands of the CSEL wouldn't be chains of ADD instructions. Hence, the transformation as implemented would not apply. Or am I missing something here?

It is just to protect against crashes - APInt doesn't like it when the bitwidths don't match.
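
Such a test might look like the following sketch (illustrative; the i64 compare and the i32 add chain have different bit widths, so the combine should bail out instead of comparing mismatched APInts):

define i32 @test_mixed_width(i32 %x0, i64 %x1) {
  %cmp = icmp eq i64 %x1, 0               ; i64 SUBS
  %x1t = trunc i64 %x1 to i32
  %add = add i32 %x0, %x1t                ; i32 add chain
  %sub = sub i32 %add, 1
  %ret = select i1 %cmp, i32 0, i32 %sub  ; i32 CSEL
  ret i32 %ret
}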


auto CC = static_cast<AArch64CC::CondCode>(N->getConstantOperandVal(2));
bool IsEquality = CC == AArch64CC::EQ || CC == AArch64CC::NE;
if (IsEquality && !CmpOpConst->isZero())
A Collaborator commented:

Can we handle eq and ne in the same way?

define i32 @src(i32 %x0, i32 %x1) {
  %cmp = icmp eq i32 %x1, 7
  %add = add i32 %x0, %x1
  %sub = sub i32 %add, 7
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

define i32 @tgt(i32 %x0, i32 %x1) {
  %cmp = icmp eq i32 %x1, 7
  %add = sub i32 %x1, 7
  %sub = add i32 %add, %x0
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

(I guess for all of those the constant isn't actually necessary, and the basic pattern is that we can reassociate an add/sub to allow it to be shared with a sub. FYI, there is a related but different patch (swapping conditions to allow sharing) in #121412 (mostly for handling constants; for non-constants it already exists in a couple of places in LLVM). I understand this is getting further away from your motivating case. If we can fix the APInt issues, then the follow-on can easily be added in a later patch if that is simpler. Sometimes it is better to have smaller patches).
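
A sketch of the non-constant form (names illustrative): the identity (x0 + x1) - c == (x1 - c) + x0 holds regardless of the condition, so the sub can share a SUBS with the compare:

define i32 @src_nonconst(i32 %x0, i32 %x1, i32 %c) {
  %cmp = icmp ult i32 %x1, %c
  %add = add i32 %x0, %x1
  %sub = sub i32 %add, %c
  %ret = select i1 %cmp, i32 0, i32 %sub
  ret i32 %ret
}

define i32 @tgt_nonconst(i32 %x0, i32 %x1, i32 %c) {
  %cmp = icmp ult i32 %x1, %c   ; any condition works here
  %s = sub i32 %x1, %c          ; lowers to SUBS, shared with the compare
  %r = add i32 %s, %x0
  %ret = select i1 %cmp, i32 0, i32 %r
  ret i32 %ret
}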

@mskamp (Contributor, Author):

Yes, we can also handle eq and ne. I've also implemented this.

I agree that smaller patches are sometimes better. I'll look into non-constant operands, which would probably require some more restructuring, and start working on a follow-up patch. For this PR, it is probably easier to stick to constants to keep it concise.

mskamp added 2 commits January 8, 2025 20:28
If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes llvm#119606.
@mskamp mskamp force-pushed the fix_119606_reassociate_for_cmp_cse branch from 8c78b70 to 43f2da4 on January 9, 2025 18:31
@davemgreen (Collaborator) left a comment

Thanks, looks good now - LGTM.

Let me know if we should submit this.

@mskamp (Contributor, Author) commented Jan 11, 2025

Thanks, looks good now - LGTM.

Let me know if we should submit this.

From my point of view, this PR is ready. I'd be glad if it could be merged. Thank you for your time!

FYI, the suggested patch to support non-constant operands should be ready within the next few days.

@davemgreen davemgreen merged commit 1eed469 into llvm:main Jan 11, 2025
8 checks passed
BaiXilin pushed a commit to BaiXilin/llvm-fix-vnni-instr-types that referenced this pull request Jan 12, 2025
…vm#121350)

If we have a CSEL instruction that depends on the flags set by a
(SUBS x c) instruction and the true and/or false expression is
(add (add x y) -c), we can reassociate the latter expression to
(add (SUBS x c) y) and save one instruction.

Proof for the basic transformation: https://alive2.llvm.org/ce/z/-337Pb

We can extend this transformation for slightly different constants. For
example, if we have (add (add x y) -(c-1)) and the comparison x <u c,
we can transform the comparison to x <=u c-1 to eliminate the comparison
instruction, too. Similarly, we can transform (x == 0) to (x <u 1).

Proofs for the transformations that alter the constants:
https://alive2.llvm.org/ce/z/3nVqgR

Fixes llvm#119606.