[SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath for SelectwithConstant #127599

Merged

vg0204
Contributor

vg0204 commented Feb 18, 2025

The target hook convertSelectOfConstantsToMath() needs to be checked in the SimplifySelectCC helper combine in SelectionDAG ISel, where a generic fold turns a select between constants into simple math ops on the condition value itself (for example, select C, 16, 0 -> shl C, 4).

This fixes #121145.
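As a rough illustration, the fold that the hook now gates can be modeled on plain integers. This is a hypothetical, simplified model, not LLVM code: SelectOfConstants, foldSelectToShift and the convertToMath flag (standing in for TLI.convertSelectOfConstantsToMath(VT)) are illustrative names. It mirrors the Fold/Swap checks in SimplifySelectCC: a select between a power of two and zero becomes a left shift of the zero-extended condition, with the condition inverted when the constants are swapped.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of the SimplifySelectCC power-of-two fold.
// None of these names are LLVM APIs; the compare is assumed to have already
// produced a 0/1 boolean (ZeroOrOneBooleanContent).
struct SelectOfConstants {
  bool cond;          // result of the setcc
  uint64_t trueVal;   // plays the role of N2
  uint64_t falseVal;  // plays the role of N3
};

static bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

static unsigned log2OfPowerOf2(uint64_t v) {
  assert(isPowerOf2(v) && "shift amount is only defined for powers of two");
  unsigned shift = 0;
  while (v >>= 1)
    ++shift;
  return shift;
}

// Returns the value produced by "shl (zext cond), log2(pow2)", or
// std::nullopt when the fold is not performed. convertToMath stands in for
// the target-hook check that this patch adds in front of the fold.
std::optional<uint64_t> foldSelectToShift(const SelectOfConstants &s,
                                          bool convertToMath) {
  bool fold = s.falseVal == 0 && isPowerOf2(s.trueVal);  // select c, 2^k, 0
  bool swap = s.trueVal == 0 && isPowerOf2(s.falseVal);  // select c, 0, 2^k
  if (!convertToMath || !(fold || swap))
    return std::nullopt;
  // For the swapped form, invert the condition and use the other constant.
  bool c = swap ? !s.cond : s.cond;
  uint64_t pow2 = swap ? s.falseVal : s.trueVal;
  return static_cast<uint64_t>(c) << log2OfPowerOf2(pow2);
}
```

With convertToMath set to false the model declines the fold, which corresponds to a target that keeps the default hook value.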

vg0204 changed the title [AMDGPU][AArch64][SelectionDAG] Added target hook check for Select fo… [AMDGPU][AArch64][SelectionDAG] Added target hook check for SelectwithConstant Feb 18, 2025
vg0204 marked this pull request as ready for review February 18, 2025 11:19
@llvmbot
Member

llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-arm

Author: Vikash Gupta (vg0204)

Changes

The target hook convertSelectOfConstantsToMath() needs to be checked in the SimplifySelectCC helper combine in SelectionDAG ISel, where a generic fold turns a select between constants into simple math ops on the condition value itself (for example, select C, 16, 0 -> shl C, 4).

For AArch64, the select-with-constant lit tests suggest that it is beneficial to override convertSelectOfConstantsToMath() to return true instead of the default (false).

This fixes #121145.
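The AArch64 change follows the usual TargetLowering pattern: a virtual hook with a conservative default that a backend opts into. The sketch below shows that pattern and the shape of the patched condition using hypothetical stand-in classes (MiniTargetLowering, MiniAArch64Lowering and canFoldSelectCCToShift are illustrative names, not LLVM APIs). The real hook also takes the value type (EVT VT), which the sketch omits.

```cpp
// Hypothetical stand-ins for TargetLowering; not LLVM classes. Only the
// default-false / override-true shape of the hook is mirrored here.
struct MiniTargetLowering {
  virtual ~MiniTargetLowering() = default;
  // Mirrors the base-class default: keep selects of constants as selects.
  virtual bool convertSelectOfConstantsToMath() const { return false; }
  // Mirrors getBooleanContents(...) == ZeroOrOneBooleanContent.
  virtual bool hasZeroOrOneBooleans() const { return true; }
};

// Mirrors the AArch64 override added by this patch.
struct MiniAArch64Lowering : MiniTargetLowering {
  bool convertSelectOfConstantsToMath() const override { return true; }
};

// Same shape as the patched condition in SimplifySelectCC: the hook is
// checked first, ahead of the Fold/Swap, boolean-contents and SETCC checks.
bool canFoldSelectCCToShift(const MiniTargetLowering &tli, bool foldOrSwap,
                            bool legalOperations, bool setccLegal) {
  return tli.convertSelectOfConstantsToMath() && foldOrSwap &&
         tli.hasZeroOrOneBooleans() && (!legalOperations || setccLegal);
}
```

Under this model, any target that does not override the hook no longer takes the shift-based fold, which is consistent with the AMDGPU, ARM, MSP430 and Thumb test updates listed below.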


Patch is 551.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127599.diff

39 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+1-1)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-ccmp.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-csel.ll (+2-3)
  • (modified) llvm/test/CodeGen/AArch64/arm64-zip.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/cmp-select-sign.ll (+8-12)
  • (modified) llvm/test/CodeGen/AArch64/fold-int-pow2-with-fmul-or-fdiv.ll (+3-3)
  • (modified) llvm/test/CodeGen/AArch64/i128-math.ll (+8-6)
  • (modified) llvm/test/CodeGen/AArch64/midpoint-int.ll (+90-80)
  • (modified) llvm/test/CodeGen/AArch64/select-constant-xor.ll (+4-2)
  • (modified) llvm/test/CodeGen/AArch64/select_const.ll (+28-35)
  • (modified) llvm/test/CodeGen/AArch64/selectcc-to-shiftand.ll (+16-30)
  • (modified) llvm/test/CodeGen/AArch64/signbit-shift.ll (+8-12)
  • (modified) llvm/test/CodeGen/AArch64/vselect-constants.ll (+17-28)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.private-memory.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+7-12)
  • (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+31-51)
  • (modified) llvm/test/CodeGen/AMDGPU/copysign-simplify-demanded-bits.ll (+1-2)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-fmul-sel.ll (+45-88)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_dynelt.ll (+8-10)
  • (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+118-138)
  • (modified) llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll (+143-128)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll (+20-21)
  • (modified) llvm/test/CodeGen/AMDGPU/fptrunc.ll (+133-151)
  • (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll (+25-50)
  • (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll (+31-36)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+26-29)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log.ll (+761-830)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log10.ll (+761-830)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.log2.ll (+244-313)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.set.rounding.ll (+18-68)
  • (modified) llvm/test/CodeGen/AMDGPU/private-memory-atomics.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/pseudo-scalar-transcendental.ll (+8-20)
  • (modified) llvm/test/CodeGen/AMDGPU/rsq.f64.ll (+261-269)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-alloca-bitcast.ll (+7-13)
  • (modified) llvm/test/CodeGen/ARM/select-imm.ll (+6-12)
  • (modified) llvm/test/CodeGen/MSP430/shift-amount-threshold.ll (+1-2)
  • (modified) llvm/test/CodeGen/Thumb/branchless-cmp.ll (+28-16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index bc7cdf38dbc2a..486fc00746064 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28189,7 +28189,7 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
   bool Fold = N2C && isNullConstant(N3) && N2C->getAPIntValue().isPowerOf2();
   bool Swap = N3C && isNullConstant(N2) && N3C->getAPIntValue().isPowerOf2();
 
-  if ((Fold || Swap) &&
+  if (TLI.convertSelectOfConstantsToMath(VT) && (Fold || Swap) &&
       TLI.getBooleanContents(CmpOpVT) ==
           TargetLowering::ZeroOrOneBooleanContent &&
       (!LegalOperations || TLI.isOperationLegal(ISD::SETCC, CmpOpVT))) {
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index b26f28dc79f88..048fd7abb907a 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -796,6 +796,8 @@ class AArch64TargetLowering : public TargetLowering {
   bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                          Type *Ty) const override;
 
+  bool convertSelectOfConstantsToMath(EVT VT) const override { return true; }
+
   /// Return true if EXTRACT_SUBVECTOR is cheap for this result type
   /// with this index.
   bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
diff --git a/llvm/test/CodeGen/AArch64/arm64-ccmp.ll b/llvm/test/CodeGen/AArch64/arm64-ccmp.ll
index 06e957fdcc6a2..c6c8bfa325c94 100644
--- a/llvm/test/CodeGen/AArch64/arm64-ccmp.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-ccmp.ll
@@ -501,8 +501,8 @@ define float @select_or_float(i32 %w0, i32 %w1, float %x2, float %x3) {
 define i64 @gccbug(i64 %x0, i64 %x1) {
 ; SDISEL-LABEL: gccbug:
 ; SDISEL:       ; %bb.0:
-; SDISEL-NEXT:    cmp x0, #2
-; SDISEL-NEXT:    ccmp x0, #4, #4, ne
+; SDISEL-NEXT:    cmp x0, #4
+; SDISEL-NEXT:    ccmp x0, #2, #4, ne
 ; SDISEL-NEXT:    ccmp x1, #0, #0, eq
 ; SDISEL-NEXT:    mov w8, #1 ; =0x1
 ; SDISEL-NEXT:    cinc x0, x8, eq
diff --git a/llvm/test/CodeGen/AArch64/arm64-csel.ll b/llvm/test/CodeGen/AArch64/arm64-csel.ll
index 1cf99d1b31a8b..a08ad5f52114a 100644
--- a/llvm/test/CodeGen/AArch64/arm64-csel.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-csel.ll
@@ -295,9 +295,8 @@ entry:
 define i64 @foo18_overflow3(i1 %cmp) nounwind readnone optsize ssp {
 ; CHECK-LABEL: foo18_overflow3:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    mov x8, #-9223372036854775808 // =0x8000000000000000
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    csel x0, x8, xzr, ne
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    lsl x0, x0, #63
 ; CHECK-NEXT:    ret
 entry:
   %. = select i1 %cmp, i64 -9223372036854775808, i64 0
diff --git a/llvm/test/CodeGen/AArch64/arm64-zip.ll b/llvm/test/CodeGen/AArch64/arm64-zip.ll
index 9955b253f563e..368429e1ad727 100644
--- a/llvm/test/CodeGen/AArch64/arm64-zip.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-zip.ll
@@ -455,7 +455,7 @@ define <4 x i32> @shuffle_zip3(<4 x i32> %arg) {
 ; CHECK-NEXT:    zip2.4h v0, v0, v1
 ; CHECK-NEXT:    movi.4s v1, #1
 ; CHECK-NEXT:    zip1.4h v0, v0, v0
-; CHECK-NEXT:    sshll.4s v0, v0, #0
+; CHECK-NEXT:    ushll.4s v0, v0, #0
 ; CHECK-NEXT:    and.16b v0, v0, v1
 ; CHECK-NEXT:    ret
 bb:
diff --git a/llvm/test/CodeGen/AArch64/cmp-select-sign.ll b/llvm/test/CodeGen/AArch64/cmp-select-sign.ll
index b4f179e992a0d..22bb2cea0e182 100644
--- a/llvm/test/CodeGen/AArch64/cmp-select-sign.ll
+++ b/llvm/test/CodeGen/AArch64/cmp-select-sign.ll
@@ -241,18 +241,14 @@ define <4 x i32> @not_sign_4xi32_3(<4 x i32> %a) {
 define <4 x i65> @sign_4xi65(<4 x i65> %a) {
 ; CHECK-LABEL: sign_4xi65:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sbfx x8, x5, #0, #1
-; CHECK-NEXT:    sbfx x9, x3, #0, #1
-; CHECK-NEXT:    sbfx x10, x1, #0, #1
-; CHECK-NEXT:    sbfx x11, x7, #0, #1
-; CHECK-NEXT:    lsr x1, x10, #63
-; CHECK-NEXT:    lsr x3, x9, #63
-; CHECK-NEXT:    lsr x5, x8, #63
-; CHECK-NEXT:    lsr x7, x11, #63
-; CHECK-NEXT:    orr x0, x10, #0x1
-; CHECK-NEXT:    orr x2, x9, #0x1
-; CHECK-NEXT:    orr x4, x8, #0x1
-; CHECK-NEXT:    orr x6, x11, #0x1
+; CHECK-NEXT:    sbfx x3, x3, #0, #1
+; CHECK-NEXT:    sbfx x1, x1, #0, #1
+; CHECK-NEXT:    sbfx x7, x7, #0, #1
+; CHECK-NEXT:    sbfx x5, x5, #0, #1
+; CHECK-NEXT:    orr x0, x1, #0x1
+; CHECK-NEXT:    orr x2, x3, #0x1
+; CHECK-NEXT:    orr x6, x7, #0x1
+; CHECK-NEXT:    orr x4, x5, #0x1
 ; CHECK-NEXT:    ret
   %c = icmp sgt <4 x i65> %a, <i65 -1, i65 -1, i65 -1, i65 -1>
   %res = select <4 x i1> %c, <4 x i65> <i65 1, i65 1, i65 1, i65 1>, <4 x i65 > <i65 -1, i65 -1, i65 -1, i65 -1>
diff --git a/llvm/test/CodeGen/AArch64/fold-int-pow2-with-fmul-or-fdiv.ll b/llvm/test/CodeGen/AArch64/fold-int-pow2-with-fmul-or-fdiv.ll
index b40c0656a60e4..a3b0eedbb9714 100644
--- a/llvm/test/CodeGen/AArch64/fold-int-pow2-with-fmul-or-fdiv.ll
+++ b/llvm/test/CodeGen/AArch64/fold-int-pow2-with-fmul-or-fdiv.ll
@@ -184,10 +184,10 @@ define double @fmul_pow_shl_cnt2(i64 %cnt) nounwind {
 define float @fmul_pow_select(i32 %cnt, i1 %c) nounwind {
 ; CHECK-LABEL: fmul_pow_select:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #1 // =0x1
-; CHECK-NEXT:    tst w1, #0x1
+; CHECK-NEXT:    mov w8, #2 // =0x2
+; CHECK-NEXT:    and w9, w1, #0x1
 ; CHECK-NEXT:    fmov s1, #9.00000000
-; CHECK-NEXT:    cinc w8, w8, eq
+; CHECK-NEXT:    sub w8, w8, w9
 ; CHECK-NEXT:    lsl w8, w8, w0
 ; CHECK-NEXT:    ucvtf s0, w8
 ; CHECK-NEXT:    fmul s0, s0, s1
diff --git a/llvm/test/CodeGen/AArch64/i128-math.ll b/llvm/test/CodeGen/AArch64/i128-math.ll
index 9e1c0c1b115ab..85792a529642d 100644
--- a/llvm/test/CodeGen/AArch64/i128-math.ll
+++ b/llvm/test/CodeGen/AArch64/i128-math.ll
@@ -457,17 +457,19 @@ define i128 @i128_saturating_mul(i128 %x, i128 %y) {
 ; CHECK-NEXT:    adc x10, x13, x14
 ; CHECK-NEXT:    adds x8, x11, x8
 ; CHECK-NEXT:    asr x11, x9, #63
-; CHECK-NEXT:    mul x13, x0, x2
+; CHECK-NEXT:    eor x13, x3, x1
+; CHECK-NEXT:    mul x14, x0, x2
 ; CHECK-NEXT:    adc x10, x12, x10
-; CHECK-NEXT:    eor x12, x3, x1
+; CHECK-NEXT:    lsr x12, x13, #63
 ; CHECK-NEXT:    eor x8, x8, x11
 ; CHECK-NEXT:    eor x10, x10, x11
-; CHECK-NEXT:    asr x11, x12, #63
+; CHECK-NEXT:    mov x11, #9223372036854775807 // =0x7fffffffffffffff
 ; CHECK-NEXT:    orr x8, x8, x10
-; CHECK-NEXT:    eor x10, x11, #0x7fffffffffffffff
+; CHECK-NEXT:    subs x10, x12, #1
+; CHECK-NEXT:    adc x11, xzr, x11
 ; CHECK-NEXT:    cmp x8, #0
-; CHECK-NEXT:    csinv x0, x13, x11, eq
-; CHECK-NEXT:    csel x1, x10, x9, ne
+; CHECK-NEXT:    csel x0, x10, x14, ne
+; CHECK-NEXT:    csel x1, x11, x9, ne
 ; CHECK-NEXT:    ret
   %1 = tail call { i128, i1 } @llvm.smul.with.overflow.i128(i128 %x, i128 %y)
   %2 = extractvalue { i128, i1 } %1, 0
diff --git a/llvm/test/CodeGen/AArch64/midpoint-int.ll b/llvm/test/CodeGen/AArch64/midpoint-int.ll
index bbdce7c6e933b..cca2c9e3a41f7 100644
--- a/llvm/test/CodeGen/AArch64/midpoint-int.ll
+++ b/llvm/test/CodeGen/AArch64/midpoint-int.ll
@@ -271,14 +271,15 @@ define i64 @scalar_i64_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 define i16 @scalar_i16_signed_reg_reg(i16 %a1, i16 %a2) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxth w9, w1
-; CHECK-NEXT:    sxth w10, w0
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxth w8, w1
+; CHECK-NEXT:    sxth w9, w0
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp sgt i16 %a1, %a2 ; signed
   %t4 = select i1 %t3, i16 -1, i16 1
@@ -294,14 +295,15 @@ define i16 @scalar_i16_signed_reg_reg(i16 %a1, i16 %a2) nounwind {
 define i16 @scalar_i16_unsigned_reg_reg(i16 %a1, i16 %a2) nounwind {
 ; CHECK-LABEL: scalar_i16_unsigned_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w9, w1, #0xffff
-; CHECK-NEXT:    and w10, w0, #0xffff
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, ls
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    and w8, w1, #0xffff
+; CHECK-NEXT:    and w9, w0, #0xffff
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, hi
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp ugt i16 %a1, %a2
   %t4 = select i1 %t3, i16 -1, i16 1
@@ -319,14 +321,15 @@ define i16 @scalar_i16_unsigned_reg_reg(i16 %a1, i16 %a2) nounwind {
 define i16 @scalar_i16_signed_mem_reg(ptr %a1_addr, i16 %a2) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_mem_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxth w9, w1
-; CHECK-NEXT:    ldrsh w10, [x0]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w10
+; CHECK-NEXT:    sxth w8, w1
+; CHECK-NEXT:    ldrsh w9, [x0]
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w8, w10, w9
 ; CHECK-NEXT:    ret
   %a1 = load i16, ptr %a1_addr
   %t3 = icmp sgt i16 %a1, %a2 ; signed
@@ -343,14 +346,15 @@ define i16 @scalar_i16_signed_mem_reg(ptr %a1_addr, i16 %a2) nounwind {
 define i16 @scalar_i16_signed_reg_mem(i16 %a1, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_reg_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxth w9, w0
-; CHECK-NEXT:    ldrsh w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w9, w10
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxth w8, w0
+; CHECK-NEXT:    ldrsh w9, [x1]
+; CHECK-NEXT:    subs w8, w8, w9
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %a2 = load i16, ptr %a2_addr
   %t3 = icmp sgt i16 %a1, %a2 ; signed
@@ -367,14 +371,15 @@ define i16 @scalar_i16_signed_reg_mem(i16 %a1, ptr %a2_addr) nounwind {
 define i16 @scalar_i16_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_mem_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    ldrsh w9, [x0]
-; CHECK-NEXT:    ldrsh w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w10, w9, w10
-; CHECK-NEXT:    cneg w10, w10, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w10, w10, #1
-; CHECK-NEXT:    madd w0, w10, w8, w9
+; CHECK-NEXT:    ldrsh w8, [x0]
+; CHECK-NEXT:    ldrsh w9, [x1]
+; CHECK-NEXT:    subs w9, w8, w9
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w9, w9, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w9, w9, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w9, w10, w8
 ; CHECK-NEXT:    ret
   %a1 = load i16, ptr %a1_addr
   %a2 = load i16, ptr %a2_addr
@@ -398,14 +403,15 @@ define i16 @scalar_i16_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 define i8 @scalar_i8_signed_reg_reg(i8 %a1, i8 %a2) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w9, w1
-; CHECK-NEXT:    sxtb w10, w0
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxtb w8, w1
+; CHECK-NEXT:    sxtb w9, w0
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp sgt i8 %a1, %a2 ; signed
   %t4 = select i1 %t3, i8 -1, i8 1
@@ -421,14 +427,15 @@ define i8 @scalar_i8_signed_reg_reg(i8 %a1, i8 %a2) nounwind {
 define i8 @scalar_i8_unsigned_reg_reg(i8 %a1, i8 %a2) nounwind {
 ; CHECK-LABEL: scalar_i8_unsigned_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w9, w1, #0xff
-; CHECK-NEXT:    and w10, w0, #0xff
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, ls
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    and w8, w1, #0xff
+; CHECK-NEXT:    and w9, w0, #0xff
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, hi
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp ugt i8 %a1, %a2
   %t4 = select i1 %t3, i8 -1, i8 1
@@ -446,14 +453,15 @@ define i8 @scalar_i8_unsigned_reg_reg(i8 %a1, i8 %a2) nounwind {
 define i8 @scalar_i8_signed_mem_reg(ptr %a1_addr, i8 %a2) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_mem_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w9, w1
-; CHECK-NEXT:    ldrsb w10, [x0]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w10
+; CHECK-NEXT:    sxtb w8, w1
+; CHECK-NEXT:    ldrsb w9, [x0]
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w8, w10, w9
 ; CHECK-NEXT:    ret
   %a1 = load i8, ptr %a1_addr
   %t3 = icmp sgt i8 %a1, %a2 ; signed
@@ -470,14 +478,15 @@ define i8 @scalar_i8_signed_mem_reg(ptr %a1_addr, i8 %a2) nounwind {
 define i8 @scalar_i8_signed_reg_mem(i8 %a1, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_reg_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w9, w0
-; CHECK-NEXT:    ldrsb w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w9, w10
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxtb w8, w0
+; CHECK-NEXT:    ldrsb w9, [x1]
+; CHECK-NEXT:    subs w8, w8, w9
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %a2 = load i8, ptr %a2_addr
   %t3 = icmp sgt i8 %a1, %a2 ; signed
@@ -494,14 +503,15 @@ define i8 @scalar_i8_signed_reg_mem(i8 %a1, ptr %a2_addr) nounwind {
 define i8 @scalar_i8_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_mem_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    ldrsb w9, [x0]
-; CHECK-NEXT:    ldrsb w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w10, w9, w10
-; CHECK-NEXT:    cneg w10, w10, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w10, w10, #1
-; CHECK-NEXT:    madd w0, w10, w8, w9
+; CHECK-NEXT:    ldrsb w8, [x0]
+; CHECK-NEXT:    ldrsb w9, [x1]
+; CHECK-NEXT:    subs w9, w8, w9
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w9, w9, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w9, w9, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w9, w10, w8
 ; CHECK-NEXT:    ret
   %a1 = load i8, ptr %a1_addr
   %a2 = load i8, ptr %a2_addr
diff --git a/llvm/test/CodeGen/AArch64/select-constant-xor.ll b/llvm/test/CodeGen/AArch64/select-constant-xor.ll
index 3adf48e84b44c..0c09dca186095 100644
--- a/llvm/test/CodeGen/AArch64/select-constant-xor.ll
+++ b/llvm/test/CodeGen/AArch64/select-constant-xor.ll
@@ -27,8 +27,10 @@ define i64 @selecti64i64(i64 %a) {
 define i32 @selecti64i32(i64 %a) {
 ; CHECK-LABEL: selecti64i32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    asr x8, x0, #63
-; CHECK-NEXT:    eor w0, w8, #0x7fffffff
+; CHECK-NEXT:    lsr x9, x0, #63
+; CHECK-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-NEXT:    eor w9, w9, #0x1
+; CHECK-NEXT:    sub w0, w8, w9
 ; CHECK-NEXT:    ret
   %c = icmp sgt i64 %a, -1
   %s = select i1 %c, i32 2147483647, i32 -2147483648
diff --git a/llvm/test/CodeGen/AArch64/select_const.ll b/llvm/test/CodeGen/AArch64/select_const.ll
index cd50d776e913f..484a888e12bb0 100644
--- a/llvm/test/CodeGen/AArch64/select_const.ll
+++ b/llvm/test/CodeGen/AArch64/select_const.ll
@@ -126,9 +126,8 @@ define i32 @select_neg1_or_0_signext(i1 signext %cond) {
 define i32 @select_Cplus1_C(i1 %cond) {
 ; CHECK-LABEL: select_Cplus1_C:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, ne
+; CHECK-NEXT:    and w8, w0, #0x1
+; CHECK-NEXT:    add w0, w8, #41
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 42, i32 41
   ret i32 %sel
@@ -137,9 +136,7 @@ define i32 @select_Cplus1_C(i1 %cond) {
 define i32 @select_Cplus1_C_zeroext(i1 zeroext %cond) {
 ; CHECK-LABEL: select_Cplus1_C_zeroext:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    cmp w0, #0
-; CHECK-NEXT:    cinc w0, w8, ne
+; CHECK-NEXT:    add w0, w0, #41
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 42, i32 41
   ret i32 %sel
@@ -149,8 +146,7 @@ define i32 @select_Cplus1_C_signext(i1 signext %cond) {
 ; CHECK-LABEL: select_Cplus1_C_signext:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, ne
+; CHECK-NEXT:    sub w0, w8, w0
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 42, i32 41
   ret i32 %sel
@@ -161,9 +157,9 @@ define i32 @select_Cplus1_C_signext(i1 signext %cond) {
 define i32 @select_C_Cplus1(i1 %cond) {
 ; CHECK-LABEL: select_C_Cplus1:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    mov w8, #42 // =0x2a
+; CHECK-NEXT:    and w9, w0, #0x1
+; CHECK-NEXT:    sub w0, w8, w9
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 41, i32 42
   ret i32 %sel
@@ -172,9 +168,8 @@ define i32 @select_C_Cplus1(i1 %cond) {
 define i32 @select_C_Cplus1_zeroext(i1 zeroext %cond) {
 ; CHECK-LABEL: select_C_Cplus1_zeroext:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    cmp w0, #0
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    mov w8, #42 // =0x2a
+; CHECK-NEXT:    sub w0, w8, w0
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 41, i32 42
   ret i32 %sel
@@ -183,9 +178,7 @@ define i32 @select_C_Cplus1_zeroext(i1 zeroext %cond) {
 define i32 @select_C_Cplus1_signext(i1 signext %cond) {
 ; CHECK-LABEL: select_C_Cplus1_signext:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    add w0, w0, #42
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 41, i32 42
   ret i32 %sel
@@ -360,9 +353,7 @@ define i8 @srem_constant_sel_constants(i1 %cond) {
 define i8 @sel_constants_urem_constant(i1 %cond) {
 ; CHECK-LABEL: sel_constants_urem_constant:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #2 // =0x2
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    eor w0, w0, #0x3
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i8 -4, i8 23
   %bo = urem i8 %sel, 5
@@ -385,9 +376,8 @@ define i8 @urem_constant_sel_constants(i1 %cond) {
 define i8 @sel_constants_and_constant(i1 %cond) {
 ; CHECK-LABEL: sel_constants_and_constant:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #4 // =0x4
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    ...
[truncated]
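Several of the updated AArch64 CHECK lines replace conditional-select sequences (csel, cinc, cneg) with straight-line arithmetic on the condition. The sketch below re-derives a few of them in portable C++ so each identity can be checked for both condition values. The function names are hypothetical, each function only mirrors the instruction sequence named in its comment, and unsigned arithmetic is used so wraparound is well-defined.

```cpp
#include <cstdint>

// Hypothetical helpers re-deriving new AArch64 sequences from the tests.
// The i1 condition is assumed to arrive in the low bit of w0/x0, or fully
// zero-extended (exactly 0 or 1) where the parameter is named *Zext.

// arm64-csel.ll foo18_overflow3: select i1 %c, i64 INT64_MIN, i64 0
//   new: lsl x0, x0, #63 (any garbage above bit 0 is shifted out)
uint64_t selMinOrZero(uint64_t cReg) { return cReg << 63; }

// select_const.ll select_Cplus1_C: select i1 %c, i32 42, i32 41
//   new: and w8, w0, #0x1 ; add w0, w8, #41
uint32_t selCplus1C(uint32_t cReg) { return (cReg & 1u) + 41u; }

// select_const.ll select_C_Cplus1_zeroext: select i1 zeroext %c, i32 41, i32 42
//   new: mov w8, #42 ; sub w0, w8, w0
uint32_t selCCplus1Zext(uint32_t cZext) { return 42u - cZext; }

// midpoint-int.ll: select i1 %t3, i16 -1, i16 1
//   new: cset ; sbfx #0, #1 ; orr #0x1
uint32_t selMinusOneOrOne(bool t3) {
  uint32_t bit = t3 ? 1u : 0u;  // cset: 1 when the compare holds
  uint32_t mask = 0u - bit;     // sbfx of bit 0: 0 or 0xffffffff
  return mask | 1u;             // orr #0x1: 0xffffffff (-1) or 1
}

// select-constant-xor.ll selecti64i32:
//   select (icmp sgt i64 %a, -1), i32 INT32_MAX, i32 INT32_MIN
//   new: lsr x9, x0, #63 ; eor w9, w9, #0x1 ; sub w0, w8 (=0x80000000), w9
uint32_t selMaxOrMin(int64_t a) {
  uint32_t sign = static_cast<uint32_t>(static_cast<uint64_t>(a) >> 63);
  uint32_t nonNegative = sign ^ 1u;  // 1 when a > -1
  return 0x80000000u - nonNegative;  // 0x7fffffff or 0x80000000
}

// select_const.ll sel_constants_urem_constant:
//   (select i1 %c, i8 -4, i8 23) urem 5 == select i1 %c, i8 2, i8 3
//   new: eor w0, w0, #0x3
uint8_t selUremFolded(uint8_t cZext) {
  return static_cast<uint8_t>(cZext ^ 3u);
}
```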

@llvmbot
Member

llvmbot commented Feb 18, 2025

@llvm/pr-subscribers-backend-amdgpu

 ; CHECK-NEXT:    fmul s0, s0, s1
diff --git a/llvm/test/CodeGen/AArch64/i128-math.ll b/llvm/test/CodeGen/AArch64/i128-math.ll
index 9e1c0c1b115ab..85792a529642d 100644
--- a/llvm/test/CodeGen/AArch64/i128-math.ll
+++ b/llvm/test/CodeGen/AArch64/i128-math.ll
@@ -457,17 +457,19 @@ define i128 @i128_saturating_mul(i128 %x, i128 %y) {
 ; CHECK-NEXT:    adc x10, x13, x14
 ; CHECK-NEXT:    adds x8, x11, x8
 ; CHECK-NEXT:    asr x11, x9, #63
-; CHECK-NEXT:    mul x13, x0, x2
+; CHECK-NEXT:    eor x13, x3, x1
+; CHECK-NEXT:    mul x14, x0, x2
 ; CHECK-NEXT:    adc x10, x12, x10
-; CHECK-NEXT:    eor x12, x3, x1
+; CHECK-NEXT:    lsr x12, x13, #63
 ; CHECK-NEXT:    eor x8, x8, x11
 ; CHECK-NEXT:    eor x10, x10, x11
-; CHECK-NEXT:    asr x11, x12, #63
+; CHECK-NEXT:    mov x11, #9223372036854775807 // =0x7fffffffffffffff
 ; CHECK-NEXT:    orr x8, x8, x10
-; CHECK-NEXT:    eor x10, x11, #0x7fffffffffffffff
+; CHECK-NEXT:    subs x10, x12, #1
+; CHECK-NEXT:    adc x11, xzr, x11
 ; CHECK-NEXT:    cmp x8, #0
-; CHECK-NEXT:    csinv x0, x13, x11, eq
-; CHECK-NEXT:    csel x1, x10, x9, ne
+; CHECK-NEXT:    csel x0, x10, x14, ne
+; CHECK-NEXT:    csel x1, x11, x9, ne
 ; CHECK-NEXT:    ret
   %1 = tail call { i128, i1 } @llvm.smul.with.overflow.i128(i128 %x, i128 %y)
   %2 = extractvalue { i128, i1 } %1, 0
diff --git a/llvm/test/CodeGen/AArch64/midpoint-int.ll b/llvm/test/CodeGen/AArch64/midpoint-int.ll
index bbdce7c6e933b..cca2c9e3a41f7 100644
--- a/llvm/test/CodeGen/AArch64/midpoint-int.ll
+++ b/llvm/test/CodeGen/AArch64/midpoint-int.ll
@@ -271,14 +271,15 @@ define i64 @scalar_i64_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 define i16 @scalar_i16_signed_reg_reg(i16 %a1, i16 %a2) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxth w9, w1
-; CHECK-NEXT:    sxth w10, w0
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxth w8, w1
+; CHECK-NEXT:    sxth w9, w0
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp sgt i16 %a1, %a2 ; signed
   %t4 = select i1 %t3, i16 -1, i16 1
@@ -294,14 +295,15 @@ define i16 @scalar_i16_signed_reg_reg(i16 %a1, i16 %a2) nounwind {
 define i16 @scalar_i16_unsigned_reg_reg(i16 %a1, i16 %a2) nounwind {
 ; CHECK-LABEL: scalar_i16_unsigned_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w9, w1, #0xffff
-; CHECK-NEXT:    and w10, w0, #0xffff
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, ls
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    and w8, w1, #0xffff
+; CHECK-NEXT:    and w9, w0, #0xffff
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, hi
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp ugt i16 %a1, %a2
   %t4 = select i1 %t3, i16 -1, i16 1
@@ -319,14 +321,15 @@ define i16 @scalar_i16_unsigned_reg_reg(i16 %a1, i16 %a2) nounwind {
 define i16 @scalar_i16_signed_mem_reg(ptr %a1_addr, i16 %a2) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_mem_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxth w9, w1
-; CHECK-NEXT:    ldrsh w10, [x0]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w10
+; CHECK-NEXT:    sxth w8, w1
+; CHECK-NEXT:    ldrsh w9, [x0]
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w8, w10, w9
 ; CHECK-NEXT:    ret
   %a1 = load i16, ptr %a1_addr
   %t3 = icmp sgt i16 %a1, %a2 ; signed
@@ -343,14 +346,15 @@ define i16 @scalar_i16_signed_mem_reg(ptr %a1_addr, i16 %a2) nounwind {
 define i16 @scalar_i16_signed_reg_mem(i16 %a1, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_reg_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxth w9, w0
-; CHECK-NEXT:    ldrsh w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w9, w10
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxth w8, w0
+; CHECK-NEXT:    ldrsh w9, [x1]
+; CHECK-NEXT:    subs w8, w8, w9
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %a2 = load i16, ptr %a2_addr
   %t3 = icmp sgt i16 %a1, %a2 ; signed
@@ -367,14 +371,15 @@ define i16 @scalar_i16_signed_reg_mem(i16 %a1, ptr %a2_addr) nounwind {
 define i16 @scalar_i16_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i16_signed_mem_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    ldrsh w9, [x0]
-; CHECK-NEXT:    ldrsh w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w10, w9, w10
-; CHECK-NEXT:    cneg w10, w10, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w10, w10, #1
-; CHECK-NEXT:    madd w0, w10, w8, w9
+; CHECK-NEXT:    ldrsh w8, [x0]
+; CHECK-NEXT:    ldrsh w9, [x1]
+; CHECK-NEXT:    subs w9, w8, w9
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w9, w9, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w9, w9, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w9, w10, w8
 ; CHECK-NEXT:    ret
   %a1 = load i16, ptr %a1_addr
   %a2 = load i16, ptr %a2_addr
@@ -398,14 +403,15 @@ define i16 @scalar_i16_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 define i8 @scalar_i8_signed_reg_reg(i8 %a1, i8 %a2) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w9, w1
-; CHECK-NEXT:    sxtb w10, w0
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxtb w8, w1
+; CHECK-NEXT:    sxtb w9, w0
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp sgt i8 %a1, %a2 ; signed
   %t4 = select i1 %t3, i8 -1, i8 1
@@ -421,14 +427,15 @@ define i8 @scalar_i8_signed_reg_reg(i8 %a1, i8 %a2) nounwind {
 define i8 @scalar_i8_unsigned_reg_reg(i8 %a1, i8 %a2) nounwind {
 ; CHECK-LABEL: scalar_i8_unsigned_reg_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    and w9, w1, #0xff
-; CHECK-NEXT:    and w10, w0, #0xff
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, ls
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    and w8, w1, #0xff
+; CHECK-NEXT:    and w9, w0, #0xff
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w9, hi
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %t3 = icmp ugt i8 %a1, %a2
   %t4 = select i1 %t3, i8 -1, i8 1
@@ -446,14 +453,15 @@ define i8 @scalar_i8_unsigned_reg_reg(i8 %a1, i8 %a2) nounwind {
 define i8 @scalar_i8_signed_mem_reg(ptr %a1_addr, i8 %a2) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_mem_reg:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w9, w1
-; CHECK-NEXT:    ldrsb w10, [x0]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w10, w9
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w10
+; CHECK-NEXT:    sxtb w8, w1
+; CHECK-NEXT:    ldrsb w9, [x0]
+; CHECK-NEXT:    subs w8, w9, w8
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w8, w10, w9
 ; CHECK-NEXT:    ret
   %a1 = load i8, ptr %a1_addr
   %t3 = icmp sgt i8 %a1, %a2 ; signed
@@ -470,14 +478,15 @@ define i8 @scalar_i8_signed_mem_reg(ptr %a1_addr, i8 %a2) nounwind {
 define i8 @scalar_i8_signed_reg_mem(i8 %a1, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_reg_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    sxtb w9, w0
-; CHECK-NEXT:    ldrsb w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w9, w9, w10
-; CHECK-NEXT:    cneg w9, w9, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w9, w9, #1
-; CHECK-NEXT:    madd w0, w9, w8, w0
+; CHECK-NEXT:    sxtb w8, w0
+; CHECK-NEXT:    ldrsb w9, [x1]
+; CHECK-NEXT:    subs w8, w8, w9
+; CHECK-NEXT:    cset w9, gt
+; CHECK-NEXT:    cneg w8, w8, mi
+; CHECK-NEXT:    sbfx w9, w9, #0, #1
+; CHECK-NEXT:    lsr w8, w8, #1
+; CHECK-NEXT:    orr w9, w9, #0x1
+; CHECK-NEXT:    madd w0, w8, w9, w0
 ; CHECK-NEXT:    ret
   %a2 = load i8, ptr %a2_addr
   %t3 = icmp sgt i8 %a1, %a2 ; signed
@@ -494,14 +503,15 @@ define i8 @scalar_i8_signed_reg_mem(i8 %a1, ptr %a2_addr) nounwind {
 define i8 @scalar_i8_signed_mem_mem(ptr %a1_addr, ptr %a2_addr) nounwind {
 ; CHECK-LABEL: scalar_i8_signed_mem_mem:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    ldrsb w9, [x0]
-; CHECK-NEXT:    ldrsb w10, [x1]
-; CHECK-NEXT:    mov w8, #-1 // =0xffffffff
-; CHECK-NEXT:    subs w10, w9, w10
-; CHECK-NEXT:    cneg w10, w10, mi
-; CHECK-NEXT:    cneg w8, w8, le
-; CHECK-NEXT:    lsr w10, w10, #1
-; CHECK-NEXT:    madd w0, w10, w8, w9
+; CHECK-NEXT:    ldrsb w8, [x0]
+; CHECK-NEXT:    ldrsb w9, [x1]
+; CHECK-NEXT:    subs w9, w8, w9
+; CHECK-NEXT:    cset w10, gt
+; CHECK-NEXT:    cneg w9, w9, mi
+; CHECK-NEXT:    sbfx w10, w10, #0, #1
+; CHECK-NEXT:    lsr w9, w9, #1
+; CHECK-NEXT:    orr w10, w10, #0x1
+; CHECK-NEXT:    madd w0, w9, w10, w8
 ; CHECK-NEXT:    ret
   %a1 = load i8, ptr %a1_addr
   %a2 = load i8, ptr %a2_addr
diff --git a/llvm/test/CodeGen/AArch64/select-constant-xor.ll b/llvm/test/CodeGen/AArch64/select-constant-xor.ll
index 3adf48e84b44c..0c09dca186095 100644
--- a/llvm/test/CodeGen/AArch64/select-constant-xor.ll
+++ b/llvm/test/CodeGen/AArch64/select-constant-xor.ll
@@ -27,8 +27,10 @@ define i64 @selecti64i64(i64 %a) {
 define i32 @selecti64i32(i64 %a) {
 ; CHECK-LABEL: selecti64i32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    asr x8, x0, #63
-; CHECK-NEXT:    eor w0, w8, #0x7fffffff
+; CHECK-NEXT:    lsr x9, x0, #63
+; CHECK-NEXT:    mov w8, #-2147483648 // =0x80000000
+; CHECK-NEXT:    eor w9, w9, #0x1
+; CHECK-NEXT:    sub w0, w8, w9
 ; CHECK-NEXT:    ret
   %c = icmp sgt i64 %a, -1
   %s = select i1 %c, i32 2147483647, i32 -2147483648
diff --git a/llvm/test/CodeGen/AArch64/select_const.ll b/llvm/test/CodeGen/AArch64/select_const.ll
index cd50d776e913f..484a888e12bb0 100644
--- a/llvm/test/CodeGen/AArch64/select_const.ll
+++ b/llvm/test/CodeGen/AArch64/select_const.ll
@@ -126,9 +126,8 @@ define i32 @select_neg1_or_0_signext(i1 signext %cond) {
 define i32 @select_Cplus1_C(i1 %cond) {
 ; CHECK-LABEL: select_Cplus1_C:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, ne
+; CHECK-NEXT:    and w8, w0, #0x1
+; CHECK-NEXT:    add w0, w8, #41
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 42, i32 41
   ret i32 %sel
@@ -137,9 +136,7 @@ define i32 @select_Cplus1_C(i1 %cond) {
 define i32 @select_Cplus1_C_zeroext(i1 zeroext %cond) {
 ; CHECK-LABEL: select_Cplus1_C_zeroext:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    cmp w0, #0
-; CHECK-NEXT:    cinc w0, w8, ne
+; CHECK-NEXT:    add w0, w0, #41
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 42, i32 41
   ret i32 %sel
@@ -149,8 +146,7 @@ define i32 @select_Cplus1_C_signext(i1 signext %cond) {
 ; CHECK-LABEL: select_Cplus1_C_signext:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, ne
+; CHECK-NEXT:    sub w0, w8, w0
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 42, i32 41
   ret i32 %sel
@@ -161,9 +157,9 @@ define i32 @select_Cplus1_C_signext(i1 signext %cond) {
 define i32 @select_C_Cplus1(i1 %cond) {
 ; CHECK-LABEL: select_C_Cplus1:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    mov w8, #42 // =0x2a
+; CHECK-NEXT:    and w9, w0, #0x1
+; CHECK-NEXT:    sub w0, w8, w9
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 41, i32 42
   ret i32 %sel
@@ -172,9 +168,8 @@ define i32 @select_C_Cplus1(i1 %cond) {
 define i32 @select_C_Cplus1_zeroext(i1 zeroext %cond) {
 ; CHECK-LABEL: select_C_Cplus1_zeroext:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    cmp w0, #0
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    mov w8, #42 // =0x2a
+; CHECK-NEXT:    sub w0, w8, w0
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 41, i32 42
   ret i32 %sel
@@ -183,9 +178,7 @@ define i32 @select_C_Cplus1_zeroext(i1 zeroext %cond) {
 define i32 @select_C_Cplus1_signext(i1 signext %cond) {
 ; CHECK-LABEL: select_C_Cplus1_signext:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #41 // =0x29
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    add w0, w0, #42
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i32 41, i32 42
   ret i32 %sel
@@ -360,9 +353,7 @@ define i8 @srem_constant_sel_constants(i1 %cond) {
 define i8 @sel_constants_urem_constant(i1 %cond) {
 ; CHECK-LABEL: sel_constants_urem_constant:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #2 // =0x2
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    cinc w0, w8, eq
+; CHECK-NEXT:    eor w0, w0, #0x3
 ; CHECK-NEXT:    ret
   %sel = select i1 %cond, i8 -4, i8 23
   %bo = urem i8 %sel, 5
@@ -385,9 +376,8 @@ define i8 @urem_constant_sel_constants(i1 %cond) {
 define i8 @sel_constants_and_constant(i1 %cond) {
 ; CHECK-LABEL: sel_constants_and_constant:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #4 // =0x4
-; CHECK-NEXT:    tst w0, #0x1
-; CHECK-NEXT:    ...
[truncated]

@@ -1,3 +1,4 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
Collaborator

If you're going to autogenerate this test, you can delete the CHECK-NOTs.

Contributor Author

Sure!

; CHECK-NEXT: beq .LBB4_2
; CHECK-NEXT: @ %bb.1:
; CHECK-NEXT: movs r0, #4
; CHECK-NEXT: .LBB4_2: @ %entry
Collaborator

Is the end of the function getting cut off by update_llc_test_checks.py? If auto-generation isn't working correctly, please take a look.

Branches are pretty expensive on a cortex-m0. The old code might be better? (Thumb1 in particular is a bit different from other AArch32 targets.)

Contributor Author

Is the end of the function getting cut off by update_llc_test_checks.py? If auto-generation isn't working correctly, please take a look.

I removed them explicitly after auto-generation, intending to keep only the relevant changed lines.

; CHECK-NEXT: csinv x0, x13, x11, eq
; CHECK-NEXT: csel x1, x10, x9, ne
; CHECK-NEXT: csel x0, x10, x14, ne
; CHECK-NEXT: csel x1, x11, x9, ne
Collaborator

This looks worse. (You don't need to necessarily improve every case, but it would be good to have a rough idea what's happening here.)

; CHECK-NEXT: lsr x9, x0, #63
; CHECK-NEXT: mov w8, #-2147483648 // =0x80000000
; CHECK-NEXT: eor w9, w9, #0x1
; CHECK-NEXT: sub w0, w8, w9
Collaborator

This looks worse.

Contributor

@arsenm arsenm left a comment

Description should have fewer tags and name the used target hook

@vg0204
Contributor Author

vg0204 commented Feb 19, 2025

@efriedma-quic, I am not quite sure about the AArch64 tests, as I'm not very familiar with its target-specific optimizations. As @arsenm suggested, the new AArch64 override of the target hook should be handled separately. I added it only because it looked beneficial, based on the LIT tests that changed once the hook was used. Could you comment on whether that override is needed for AArch64?

@vg0204 vg0204 changed the title [AMDGPU][AArch64][SelectionDAG] Added target hook check for SelectwithConstant [SelectionDAG] Added target hook check for SelectwithConstant Feb 20, 2025
@vg0204 vg0204 force-pushed the vg0204/add-targethook-check-select-folding-isel branch from 355c2e3 to 71dbc40 Compare February 20, 2025 08:55
@vg0204 vg0204 changed the title [SelectionDAG] Added target hook check for SelectwithConstant [SelectionDAG] Added target hook convertSelectOfConstantsToMath check for SelectwithConstant Feb 20, 2025
@vg0204
Contributor Author

vg0204 commented Feb 20, 2025

I've decided to add the override for any other target in a separate PR, with proper justification there, and to evaluate its cascading effects with this PR separately.

@vg0204 vg0204 self-assigned this Feb 20, 2025
Collaborator

@efriedma-quic efriedma-quic left a comment

LGTM

I suspect we might want an override for AArch64, and for Thumb1, after some further investigation. But the differences are small enough I'm fine with taking this as-is, and leaving investigating those for later.

This patch adds the required convertSelectOfConstantsToMath() target
hook check within the SimplifySelectCC helper combine function in
SelectionDAG ISel, where selects between constants are generically
folded into simple math ops that use the condition value directly.
@vg0204 vg0204 force-pushed the vg0204/add-targethook-check-select-folding-isel branch from 71dbc40 to e7eeeab Compare February 25, 2025 12:49
@arsenm
Contributor

arsenm commented Feb 25, 2025

Description should be fixed: you aren't adding a new hook, only introducing a new use of it

@vg0204 vg0204 changed the title [SelectionDAG] Added target hook convertSelectOfConstantsToMath check for SelectwithConstant [SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath check for SelectwithConstant Feb 25, 2025
@vg0204 vg0204 changed the title [SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath check for SelectwithConstant [SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath for SelectwithConstant Feb 25, 2025
@vg0204 vg0204 merged commit 352c48f into llvm:main Feb 25, 2025
11 checks passed
@llvm-ci
Collaborator

llvm-ci commented Feb 25, 2025

LLVM Buildbot has detected a new failure on builder clang-m68k-linux-cross running on suse-gary-m68k-cross while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/6598

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'LLVM :: CodeGen/M68k/Control/setcc.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/llc < /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll -mtriple=m68k-linux -verify-machineinstrs | /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/FileCheck /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll
+ /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/llc -mtriple=m68k-linux -verify-machineinstrs
+ /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/stage1/bin/FileCheck /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll:12:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: shi %d0
              ^
<stdin>:10:16: note: scanning from here
 sub.l #26, %d0
               ^
<stdin>:13:8: note: possible intended match here
 moveq #0, %d0
       ^
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll:28:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: scs %d0
              ^
<stdin>:28:16: note: scanning from here
 sub.l #26, %d0
               ^
<stdin>:31:8: note: possible intended match here
 moveq #0, %d0
       ^
/var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll:43:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: moveq #0, %d2
              ^
<stdin>:45:44: note: scanning from here
 movem.l %d2, (0,%sp) ; 8-byte Folded Spill
                                           ^
<stdin>:47:2: note: possible intended match here
 moveq #0, %d0
 ^

Input file: <stdin>
Check file: /var/lib/buildbot/workers/suse-gary-m68k-cross/clang-m68k-linux-cross/llvm/llvm/test/CodeGen/M68k/Control/setcc.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
           5:  .type t1,@function 
           6: t1: ; @t1 
           7: ; %bb.0: ; %entry 
           8:  move.w (6,%sp), %d0 
...

jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request May 27, 2025
…or SelectwithConstant (llvm#127599)

The target hook convertSelectOfConstantsToMath() needs to be used within the
SimplifySelectCC helper combine function in SelectionDAG ISel, where
selects between constants are generically folded into simple math ops
that use the condition value directly.

It necessarily fixes llvm#121145.

Change-Id: Ib62b4b7ba2e20b2c983dcedb8212b8bd4419dd5d

Successfully merging this pull request may close these issues.

[RFC] [AMDGPU] [SelectionDAG] [GlobalIsel] select with constant combine into binaryOp with zext/sext
5 participants