Skip to content

[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types #99913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

wizardengineer
Copy link
Contributor

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jul 22, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-aarch64

Author: Julius Alexandre (medievalghoul)

Changes

Issue: rust-lang/rust#124790
Previous PR: #99614

https://rust.godbolt.org/z/T7eKP3Tvo

Aarch64: https://alive2.llvm.org/ce/z/dqr2Kg
x86: https://alive2.llvm.org/ce/z/ze88Hw

cc: @RKSimon @topperc


Full diff: https://github.com/llvm/llvm-project/pull/99913.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+23-1)
  • (added) llvm/test/CodeGen/AArch64/avg-i128.ll (+129)
  • (added) llvm/test/CodeGen/X86/avg-i128.ll (+159)
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index c3a20b5044c5f..92795bd37a562 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -9318,7 +9318,8 @@ SDValue TargetLowering::expandAVG(SDNode *N, SelectionDAG &DAG) const {
   unsigned ExtOpc = IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
   assert((Opc == ISD::AVGFLOORS || Opc == ISD::AVGCEILS ||
           Opc == ISD::AVGFLOORU || Opc == ISD::AVGCEILU) &&
-         "Unknown AVG node");
+         "Unknown AVG node"); 
+  EVT SVT = VT.getScalarType();
 
   // If the operands are already extended, we can add+shift.
   bool IsExt =
@@ -9352,6 +9353,27 @@ SDValue TargetLowering::expandAVG(SDNode *N, SelectionDAG &DAG) const {
     }
   }
 
+  if (Opc == ISD::AVGFLOORU && SVT == MVT::i128) {
+    SDValue UAddWithOverflow = DAG.getNode(ISD::UADDO, dl, 
+                                           DAG.getVTList(VT, MVT::i1), { RHS, LHS });
+
+    SDValue Sum = UAddWithOverflow.getValue(0);
+    SDValue Overflow = UAddWithOverflow.getValue(1);
+
+    // Right shift the sum by 1
+    SDValue One = DAG.getConstant(1, dl, VT);
+    SDValue LShrVal = DAG.getNode(ISD::SRL, dl, VT, Sum, One);
+    
+    // Creating the select instruction
+    APInt SignMin = APInt::getSignedMinValue(VT.getSizeInBits());
+    SDValue SignMinVal = DAG.getConstant(SignMin, dl, VT);
+    SDValue ZeroOut = DAG.getConstant(0, dl, VT);
+
+    SDValue SelectVal = DAG.getSelect(dl, VT, Overflow, SignMinVal, ZeroOut);
+
+    return DAG.getNode(ISD::OR, dl, VT, LShrVal, SelectVal);
+  }
+
   // avgceils(lhs, rhs) -> sub(or(lhs,rhs),ashr(xor(lhs,rhs),1))
   // avgceilu(lhs, rhs) -> sub(or(lhs,rhs),lshr(xor(lhs,rhs),1))
   // avgfloors(lhs, rhs) -> add(and(lhs,rhs),ashr(xor(lhs,rhs),1))
diff --git a/llvm/test/CodeGen/AArch64/avg-i128.ll b/llvm/test/CodeGen/AArch64/avg-i128.ll
new file mode 100644
index 0000000000000..75ee52decbb70
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avg-i128.ll
@@ -0,0 +1,129 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+
+define i128 @avgflooru_i128(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    adds x9, x0, x2
+; CHECK-NEXT:    mov x8, #-9223372036854775808 // =0x8000000000000000
+; CHECK-NEXT:    adcs x10, x1, x3
+; CHECK-NEXT:    csel x1, x8, xzr, hs
+; CHECK-NEXT:    extr x0, x10, x9, #1
+; CHECK-NEXT:    bfxil x1, x10, #1, #63
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i128 %y, %x
+  %lshr = lshr i128 %xor, 1
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+declare void @use(i8)
+
+define i128 @avgflooru_i128_multi_use(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_multi_use:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    str x30, [sp, #-64]! // 8-byte Folded Spill
+; CHECK-NEXT:    stp x24, x23, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x22, x21, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x20, x19, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w20, -16
+; CHECK-NEXT:    .cfi_offset w21, -24
+; CHECK-NEXT:    .cfi_offset w22, -32
+; CHECK-NEXT:    .cfi_offset w23, -40
+; CHECK-NEXT:    .cfi_offset w24, -48
+; CHECK-NEXT:    .cfi_offset w30, -64
+; CHECK-NEXT:    eor x23, x3, x1
+; CHECK-NEXT:    eor x24, x2, x0
+; CHECK-NEXT:    mov x21, x1
+; CHECK-NEXT:    mov x22, x0
+; CHECK-NEXT:    mov x0, x24
+; CHECK-NEXT:    mov x1, x23
+; CHECK-NEXT:    mov x19, x3
+; CHECK-NEXT:    mov x20, x2
+; CHECK-NEXT:    bl use
+; CHECK-NEXT:    extr x0, x23, x24, #1
+; CHECK-NEXT:    lsr x1, x23, #1
+; CHECK-NEXT:    bl use
+; CHECK-NEXT:    adds x8, x22, x20
+; CHECK-NEXT:    mov x10, #-9223372036854775808 // =0x8000000000000000
+; CHECK-NEXT:    adcs x9, x21, x19
+; CHECK-NEXT:    ldp x20, x19, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x22, x21, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    csel x1, x10, xzr, hs
+; CHECK-NEXT:    ldp x24, x23, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    extr x0, x9, x8, #1
+; CHECK-NEXT:    bfxil x1, x9, #1, #63
+; CHECK-NEXT:    ldr x30, [sp], #64 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i128 %y, %x
+  call void @use(i128 %xor)
+  %lshr = lshr i128 %xor, 1
+  call void @use(i128 %lshr)
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+define i128 @avgflooru_i128_negative(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_negative:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    mvn x8, x0
+; CHECK-NEXT:    and x9, x2, x0
+; CHECK-NEXT:    mvn x10, x1
+; CHECK-NEXT:    and x11, x3, x1
+; CHECK-NEXT:    adds x0, x8, x9
+; CHECK-NEXT:    adc x1, x10, x11
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i128 %x, -1
+  %and = and i128 %y, %x
+  %add = add i128 %xor, %and
+  ret i128 %add
+}
+
+define i32 @avgflooru_i128_negative2(i32 %x, i32 %y) {
+; CHECK-LABEL: avgflooru_i128_negative2:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    mov w8, w1
+; CHECK-NEXT:    add x8, x8, w0, uxtw
+; CHECK-NEXT:    lsr x0, x8, #1
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i32 %y, %x
+  %lshr = lshr i32 %xor, 1
+  %and = and i32 %y, %x
+  %add = add i32 %lshr, %and
+  ret i32 %add
+}
+
+define <2 x i128> @avgflooru_i128_vec(<2 x i128> %x, <2 x i128> %y) {
+; CHECK-LABEL: avgflooru_i128_vec:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    adds x8, x0, x4
+; CHECK-NEXT:    mov x10, #-9223372036854775808 // =0x8000000000000000
+; CHECK-NEXT:    adcs x9, x1, x5
+; CHECK-NEXT:    csel x1, x10, xzr, hs
+; CHECK-NEXT:    adds x11, x2, x6
+; CHECK-NEXT:    extr x0, x9, x8, #1
+; CHECK-NEXT:    adcs x12, x3, x7
+; CHECK-NEXT:    bfxil x1, x9, #1, #63
+; CHECK-NEXT:    csel x3, x10, xzr, hs
+; CHECK-NEXT:    extr x10, x12, x11, #1
+; CHECK-NEXT:    bfxil x3, x12, #1, #63
+; CHECK-NEXT:    fmov d0, x10
+; CHECK-NEXT:    mov v0.d[1], x3
+; CHECK-NEXT:    fmov x2, d0
+; CHECK-NEXT:    ret
+start:
+  %xor = xor <2 x i128> %y, %x
+  %lshr = lshr <2 x i128> %xor, <i128 1, i128 1>
+  %and = and <2 x i128> %y, %x
+  %add = add <2 x i128> %lshr, %and
+  ret <2 x i128> %add
+}
diff --git a/llvm/test/CodeGen/X86/avg-i128.ll b/llvm/test/CodeGen/X86/avg-i128.ll
new file mode 100644
index 0000000000000..e0e3283c308d7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/avg-i128.ll
@@ -0,0 +1,159 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=x86_64 < %s | FileCheck %s
+
+define i128 @avgflooru_i128(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    addq %rdx, %rax
+; CHECK-NEXT:    adcq %rcx, %rsi
+; CHECK-NEXT:    setb %cl
+; CHECK-NEXT:    shrdq $1, %rsi, %rax
+; CHECK-NEXT:    movzbl %cl, %edx
+; CHECK-NEXT:    shldq $63, %rsi, %rdx
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i128 %y, %x
+  %lshr = lshr i128 %xor, 1
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+declare void @use(i8)
+
+define i128 @avgflooru_i128_multi_use(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_multi_use:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    pushq %r15
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    pushq %r13
+; CHECK-NEXT:    .cfi_def_cfa_offset 40
+; CHECK-NEXT:    pushq %r12
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    .cfi_def_cfa_offset 56
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    .cfi_offset %rbx, -56
+; CHECK-NEXT:    .cfi_offset %r12, -48
+; CHECK-NEXT:    .cfi_offset %r13, -40
+; CHECK-NEXT:    .cfi_offset %r14, -32
+; CHECK-NEXT:    .cfi_offset %r15, -24
+; CHECK-NEXT:    .cfi_offset %rbp, -16
+; CHECK-NEXT:    movq %rcx, %rbx
+; CHECK-NEXT:    movq %rdx, %r14
+; CHECK-NEXT:    movq %rsi, %r15
+; CHECK-NEXT:    movq %rdi, %r12
+; CHECK-NEXT:    movq %rdx, %r13
+; CHECK-NEXT:    xorq %rdi, %r13
+; CHECK-NEXT:    movq %rcx, %rbp
+; CHECK-NEXT:    xorq %rsi, %rbp
+; CHECK-NEXT:    movq %r13, %rdi
+; CHECK-NEXT:    movq %rbp, %rsi
+; CHECK-NEXT:    callq use@PLT
+; CHECK-NEXT:    shrdq $1, %rbp, %r13
+; CHECK-NEXT:    shrq %rbp
+; CHECK-NEXT:    movq %r13, %rdi
+; CHECK-NEXT:    movq %rbp, %rsi
+; CHECK-NEXT:    callq use@PLT
+; CHECK-NEXT:    addq %r14, %r12
+; CHECK-NEXT:    adcq %rbx, %r15
+; CHECK-NEXT:    setb %al
+; CHECK-NEXT:    shrdq $1, %r15, %r12
+; CHECK-NEXT:    movzbl %al, %edx
+; CHECK-NEXT:    shldq $63, %r15, %rdx
+; CHECK-NEXT:    movq %r12, %rax
+; CHECK-NEXT:    addq $8, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 56
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    popq %r12
+; CHECK-NEXT:    .cfi_def_cfa_offset 40
+; CHECK-NEXT:    popq %r13
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    popq %r15
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i128 %y, %x
+  call void @use(i128 %xor)
+  %lshr = lshr i128 %xor, 1
+  call void @use(i128 %lshr)
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+define i128 @avgflooru_i128_negative(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_negative:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    andq %rsi, %rcx
+; CHECK-NEXT:    notq %rsi
+; CHECK-NEXT:    andq %rdi, %rdx
+; CHECK-NEXT:    notq %rax
+; CHECK-NEXT:    addq %rdx, %rax
+; CHECK-NEXT:    adcq %rcx, %rsi
+; CHECK-NEXT:    movq %rsi, %rdx
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i128 %x, -1
+  %and = and i128 %y, %x
+  %add = add i128 %xor, %and
+  ret i128 %add
+}
+
+define i32 @avgflooru_i128_negative2(i32 %x, i32 %y) {
+; CHECK-LABEL: avgflooru_i128_negative2:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movl %edi, %ecx
+; CHECK-NEXT:    movl %esi, %eax
+; CHECK-NEXT:    addq %rcx, %rax
+; CHECK-NEXT:    shrq %rax
+; CHECK-NEXT:    # kill: def $eax killed $eax killed $rax
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i32 %y, %x
+  %lshr = lshr i32 %xor, 1
+  %and = and i32 %y, %x
+  %add = add i32 %lshr, %and
+  ret i32 %add
+}
+
+define <2 x i128> @avgflooru_i128_vec(<2 x i128> %x, <2 x i128> %y) {
+; CHECK-LABEL: avgflooru_i128_vec:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    addq {{[0-9]+}}(%rsp), %rsi
+; CHECK-NEXT:    adcq {{[0-9]+}}(%rsp), %rdx
+; CHECK-NEXT:    setb %dil
+; CHECK-NEXT:    movzbl %dil, %edi
+; CHECK-NEXT:    shldq $63, %rdx, %rdi
+; CHECK-NEXT:    addq {{[0-9]+}}(%rsp), %rcx
+; CHECK-NEXT:    adcq {{[0-9]+}}(%rsp), %r8
+; CHECK-NEXT:    setb %r9b
+; CHECK-NEXT:    movzbl %r9b, %r9d
+; CHECK-NEXT:    shldq $63, %r8, %r9
+; CHECK-NEXT:    shldq $63, %rsi, %rdx
+; CHECK-NEXT:    shldq $63, %rcx, %r8
+; CHECK-NEXT:    movq %r8, 16(%rax)
+; CHECK-NEXT:    movq %rdx, (%rax)
+; CHECK-NEXT:    movq %r9, 24(%rax)
+; CHECK-NEXT:    movq %rdi, 8(%rax)
+; CHECK-NEXT:    retq
+start:
+  %xor = xor <2 x i128> %y, %x
+  %lshr = lshr <2 x i128> %xor, <i128 1, i128 1>
+  %and = and <2 x i128> %y, %x
+  %add = add <2 x i128> %lshr, %and
+  ret <2 x i128> %add
+}

@llvmbot
Copy link
Member

llvmbot commented Jul 22, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: Julius Alexandre (medievalghoul)

Changes

Issue: rust-lang/rust#124790
Previous PR: #99614

https://rust.godbolt.org/z/T7eKP3Tvo

Aarch64: https://alive2.llvm.org/ce/z/dqr2Kg
x86: https://alive2.llvm.org/ce/z/ze88Hw

cc: @RKSimon @topperc


Full diff: https://github.com/llvm/llvm-project/pull/99913.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+23-1)
  • (added) llvm/test/CodeGen/AArch64/avg-i128.ll (+129)
  • (added) llvm/test/CodeGen/X86/avg-i128.ll (+159)
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index c3a20b5044c5f..92795bd37a562 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -9318,7 +9318,8 @@ SDValue TargetLowering::expandAVG(SDNode *N, SelectionDAG &DAG) const {
   unsigned ExtOpc = IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
   assert((Opc == ISD::AVGFLOORS || Opc == ISD::AVGCEILS ||
           Opc == ISD::AVGFLOORU || Opc == ISD::AVGCEILU) &&
-         "Unknown AVG node");
+         "Unknown AVG node"); 
+  EVT SVT = VT.getScalarType();
 
   // If the operands are already extended, we can add+shift.
   bool IsExt =
@@ -9352,6 +9353,27 @@ SDValue TargetLowering::expandAVG(SDNode *N, SelectionDAG &DAG) const {
     }
   }
 
+  if (Opc == ISD::AVGFLOORU && SVT == MVT::i128) {
+    SDValue UAddWithOverflow = DAG.getNode(ISD::UADDO, dl, 
+                                           DAG.getVTList(VT, MVT::i1), { RHS, LHS });
+
+    SDValue Sum = UAddWithOverflow.getValue(0);
+    SDValue Overflow = UAddWithOverflow.getValue(1);
+
+    // Right shift the sum by 1
+    SDValue One = DAG.getConstant(1, dl, VT);
+    SDValue LShrVal = DAG.getNode(ISD::SRL, dl, VT, Sum, One);
+    
+    // Creating the select instruction
+    APInt SignMin = APInt::getSignedMinValue(VT.getSizeInBits());
+    SDValue SignMinVal = DAG.getConstant(SignMin, dl, VT);
+    SDValue ZeroOut = DAG.getConstant(0, dl, VT);
+
+    SDValue SelectVal = DAG.getSelect(dl, VT, Overflow, SignMinVal, ZeroOut);
+
+    return DAG.getNode(ISD::OR, dl, VT, LShrVal, SelectVal);
+  }
+
   // avgceils(lhs, rhs) -> sub(or(lhs,rhs),ashr(xor(lhs,rhs),1))
   // avgceilu(lhs, rhs) -> sub(or(lhs,rhs),lshr(xor(lhs,rhs),1))
   // avgfloors(lhs, rhs) -> add(and(lhs,rhs),ashr(xor(lhs,rhs),1))
diff --git a/llvm/test/CodeGen/AArch64/avg-i128.ll b/llvm/test/CodeGen/AArch64/avg-i128.ll
new file mode 100644
index 0000000000000..75ee52decbb70
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avg-i128.ll
@@ -0,0 +1,129 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64 < %s | FileCheck %s
+
+define i128 @avgflooru_i128(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    adds x9, x0, x2
+; CHECK-NEXT:    mov x8, #-9223372036854775808 // =0x8000000000000000
+; CHECK-NEXT:    adcs x10, x1, x3
+; CHECK-NEXT:    csel x1, x8, xzr, hs
+; CHECK-NEXT:    extr x0, x10, x9, #1
+; CHECK-NEXT:    bfxil x1, x10, #1, #63
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i128 %y, %x
+  %lshr = lshr i128 %xor, 1
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+declare void @use(i8)
+
+define i128 @avgflooru_i128_multi_use(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_multi_use:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    str x30, [sp, #-64]! // 8-byte Folded Spill
+; CHECK-NEXT:    stp x24, x23, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x22, x21, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x20, x19, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    .cfi_offset w19, -8
+; CHECK-NEXT:    .cfi_offset w20, -16
+; CHECK-NEXT:    .cfi_offset w21, -24
+; CHECK-NEXT:    .cfi_offset w22, -32
+; CHECK-NEXT:    .cfi_offset w23, -40
+; CHECK-NEXT:    .cfi_offset w24, -48
+; CHECK-NEXT:    .cfi_offset w30, -64
+; CHECK-NEXT:    eor x23, x3, x1
+; CHECK-NEXT:    eor x24, x2, x0
+; CHECK-NEXT:    mov x21, x1
+; CHECK-NEXT:    mov x22, x0
+; CHECK-NEXT:    mov x0, x24
+; CHECK-NEXT:    mov x1, x23
+; CHECK-NEXT:    mov x19, x3
+; CHECK-NEXT:    mov x20, x2
+; CHECK-NEXT:    bl use
+; CHECK-NEXT:    extr x0, x23, x24, #1
+; CHECK-NEXT:    lsr x1, x23, #1
+; CHECK-NEXT:    bl use
+; CHECK-NEXT:    adds x8, x22, x20
+; CHECK-NEXT:    mov x10, #-9223372036854775808 // =0x8000000000000000
+; CHECK-NEXT:    adcs x9, x21, x19
+; CHECK-NEXT:    ldp x20, x19, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x22, x21, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    csel x1, x10, xzr, hs
+; CHECK-NEXT:    ldp x24, x23, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    extr x0, x9, x8, #1
+; CHECK-NEXT:    bfxil x1, x9, #1, #63
+; CHECK-NEXT:    ldr x30, [sp], #64 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i128 %y, %x
+  call void @use(i128 %xor)
+  %lshr = lshr i128 %xor, 1
+  call void @use(i128 %lshr)
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+define i128 @avgflooru_i128_negative(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_negative:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    mvn x8, x0
+; CHECK-NEXT:    and x9, x2, x0
+; CHECK-NEXT:    mvn x10, x1
+; CHECK-NEXT:    and x11, x3, x1
+; CHECK-NEXT:    adds x0, x8, x9
+; CHECK-NEXT:    adc x1, x10, x11
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i128 %x, -1
+  %and = and i128 %y, %x
+  %add = add i128 %xor, %and
+  ret i128 %add
+}
+
+define i32 @avgflooru_i128_negative2(i32 %x, i32 %y) {
+; CHECK-LABEL: avgflooru_i128_negative2:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    mov w8, w1
+; CHECK-NEXT:    add x8, x8, w0, uxtw
+; CHECK-NEXT:    lsr x0, x8, #1
+; CHECK-NEXT:    // kill: def $w0 killed $w0 killed $x0
+; CHECK-NEXT:    ret
+start:
+  %xor = xor i32 %y, %x
+  %lshr = lshr i32 %xor, 1
+  %and = and i32 %y, %x
+  %add = add i32 %lshr, %and
+  ret i32 %add
+}
+
+define <2 x i128> @avgflooru_i128_vec(<2 x i128> %x, <2 x i128> %y) {
+; CHECK-LABEL: avgflooru_i128_vec:
+; CHECK:       // %bb.0: // %start
+; CHECK-NEXT:    adds x8, x0, x4
+; CHECK-NEXT:    mov x10, #-9223372036854775808 // =0x8000000000000000
+; CHECK-NEXT:    adcs x9, x1, x5
+; CHECK-NEXT:    csel x1, x10, xzr, hs
+; CHECK-NEXT:    adds x11, x2, x6
+; CHECK-NEXT:    extr x0, x9, x8, #1
+; CHECK-NEXT:    adcs x12, x3, x7
+; CHECK-NEXT:    bfxil x1, x9, #1, #63
+; CHECK-NEXT:    csel x3, x10, xzr, hs
+; CHECK-NEXT:    extr x10, x12, x11, #1
+; CHECK-NEXT:    bfxil x3, x12, #1, #63
+; CHECK-NEXT:    fmov d0, x10
+; CHECK-NEXT:    mov v0.d[1], x3
+; CHECK-NEXT:    fmov x2, d0
+; CHECK-NEXT:    ret
+start:
+  %xor = xor <2 x i128> %y, %x
+  %lshr = lshr <2 x i128> %xor, <i128 1, i128 1>
+  %and = and <2 x i128> %y, %x
+  %add = add <2 x i128> %lshr, %and
+  ret <2 x i128> %add
+}
diff --git a/llvm/test/CodeGen/X86/avg-i128.ll b/llvm/test/CodeGen/X86/avg-i128.ll
new file mode 100644
index 0000000000000..e0e3283c308d7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/avg-i128.ll
@@ -0,0 +1,159 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=x86_64 < %s | FileCheck %s
+
+define i128 @avgflooru_i128(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    addq %rdx, %rax
+; CHECK-NEXT:    adcq %rcx, %rsi
+; CHECK-NEXT:    setb %cl
+; CHECK-NEXT:    shrdq $1, %rsi, %rax
+; CHECK-NEXT:    movzbl %cl, %edx
+; CHECK-NEXT:    shldq $63, %rsi, %rdx
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i128 %y, %x
+  %lshr = lshr i128 %xor, 1
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+declare void @use(i8)
+
+define i128 @avgflooru_i128_multi_use(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_multi_use:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    pushq %r15
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    pushq %r14
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    pushq %r13
+; CHECK-NEXT:    .cfi_def_cfa_offset 40
+; CHECK-NEXT:    pushq %r12
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    pushq %rbx
+; CHECK-NEXT:    .cfi_def_cfa_offset 56
+; CHECK-NEXT:    pushq %rax
+; CHECK-NEXT:    .cfi_def_cfa_offset 64
+; CHECK-NEXT:    .cfi_offset %rbx, -56
+; CHECK-NEXT:    .cfi_offset %r12, -48
+; CHECK-NEXT:    .cfi_offset %r13, -40
+; CHECK-NEXT:    .cfi_offset %r14, -32
+; CHECK-NEXT:    .cfi_offset %r15, -24
+; CHECK-NEXT:    .cfi_offset %rbp, -16
+; CHECK-NEXT:    movq %rcx, %rbx
+; CHECK-NEXT:    movq %rdx, %r14
+; CHECK-NEXT:    movq %rsi, %r15
+; CHECK-NEXT:    movq %rdi, %r12
+; CHECK-NEXT:    movq %rdx, %r13
+; CHECK-NEXT:    xorq %rdi, %r13
+; CHECK-NEXT:    movq %rcx, %rbp
+; CHECK-NEXT:    xorq %rsi, %rbp
+; CHECK-NEXT:    movq %r13, %rdi
+; CHECK-NEXT:    movq %rbp, %rsi
+; CHECK-NEXT:    callq use@PLT
+; CHECK-NEXT:    shrdq $1, %rbp, %r13
+; CHECK-NEXT:    shrq %rbp
+; CHECK-NEXT:    movq %r13, %rdi
+; CHECK-NEXT:    movq %rbp, %rsi
+; CHECK-NEXT:    callq use@PLT
+; CHECK-NEXT:    addq %r14, %r12
+; CHECK-NEXT:    adcq %rbx, %r15
+; CHECK-NEXT:    setb %al
+; CHECK-NEXT:    shrdq $1, %r15, %r12
+; CHECK-NEXT:    movzbl %al, %edx
+; CHECK-NEXT:    shldq $63, %r15, %rdx
+; CHECK-NEXT:    movq %r12, %rax
+; CHECK-NEXT:    addq $8, %rsp
+; CHECK-NEXT:    .cfi_def_cfa_offset 56
+; CHECK-NEXT:    popq %rbx
+; CHECK-NEXT:    .cfi_def_cfa_offset 48
+; CHECK-NEXT:    popq %r12
+; CHECK-NEXT:    .cfi_def_cfa_offset 40
+; CHECK-NEXT:    popq %r13
+; CHECK-NEXT:    .cfi_def_cfa_offset 32
+; CHECK-NEXT:    popq %r14
+; CHECK-NEXT:    .cfi_def_cfa_offset 24
+; CHECK-NEXT:    popq %r15
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    .cfi_def_cfa_offset 8
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i128 %y, %x
+  call void @use(i128 %xor)
+  %lshr = lshr i128 %xor, 1
+  call void @use(i128 %lshr)
+  %and = and i128 %y, %x
+  %add = add i128 %lshr, %and
+  ret i128 %add
+}
+
+define i128 @avgflooru_i128_negative(i128 %x, i128 %y) {
+; CHECK-LABEL: avgflooru_i128_negative:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    andq %rsi, %rcx
+; CHECK-NEXT:    notq %rsi
+; CHECK-NEXT:    andq %rdi, %rdx
+; CHECK-NEXT:    notq %rax
+; CHECK-NEXT:    addq %rdx, %rax
+; CHECK-NEXT:    adcq %rcx, %rsi
+; CHECK-NEXT:    movq %rsi, %rdx
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i128 %x, -1
+  %and = and i128 %y, %x
+  %add = add i128 %xor, %and
+  ret i128 %add
+}
+
+define i32 @avgflooru_i128_negative2(i32 %x, i32 %y) {
+; CHECK-LABEL: avgflooru_i128_negative2:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movl %edi, %ecx
+; CHECK-NEXT:    movl %esi, %eax
+; CHECK-NEXT:    addq %rcx, %rax
+; CHECK-NEXT:    shrq %rax
+; CHECK-NEXT:    # kill: def $eax killed $eax killed $rax
+; CHECK-NEXT:    retq
+start:
+  %xor = xor i32 %y, %x
+  %lshr = lshr i32 %xor, 1
+  %and = and i32 %y, %x
+  %add = add i32 %lshr, %and
+  ret i32 %add
+}
+
+define <2 x i128> @avgflooru_i128_vec(<2 x i128> %x, <2 x i128> %y) {
+; CHECK-LABEL: avgflooru_i128_vec:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    addq {{[0-9]+}}(%rsp), %rsi
+; CHECK-NEXT:    adcq {{[0-9]+}}(%rsp), %rdx
+; CHECK-NEXT:    setb %dil
+; CHECK-NEXT:    movzbl %dil, %edi
+; CHECK-NEXT:    shldq $63, %rdx, %rdi
+; CHECK-NEXT:    addq {{[0-9]+}}(%rsp), %rcx
+; CHECK-NEXT:    adcq {{[0-9]+}}(%rsp), %r8
+; CHECK-NEXT:    setb %r9b
+; CHECK-NEXT:    movzbl %r9b, %r9d
+; CHECK-NEXT:    shldq $63, %r8, %r9
+; CHECK-NEXT:    shldq $63, %rsi, %rdx
+; CHECK-NEXT:    shldq $63, %rcx, %r8
+; CHECK-NEXT:    movq %r8, 16(%rax)
+; CHECK-NEXT:    movq %rdx, (%rax)
+; CHECK-NEXT:    movq %r9, 24(%rax)
+; CHECK-NEXT:    movq %rdi, 8(%rax)
+; CHECK-NEXT:    retq
+start:
+  %xor = xor <2 x i128> %y, %x
+  %lshr = lshr <2 x i128> %xor, <i128 1, i128 1>
+  %and = and <2 x i128> %y, %x
+  %add = add <2 x i128> %lshr, %and
+  ret <2 x i128> %add
+}

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title should mention what operation

@wizardengineer wizardengineer changed the title [DAG] Reducing instructions by better legalization handling of i128 data types [DAG] Reducing instructions by better legalization handling of AVGFLOORU for i128 data types Jul 22, 2024
@wizardengineer
Copy link
Contributor Author

Title should mention what operation

fixed @arsenm

@wizardengineer wizardengineer requested a review from RKSimon July 23, 2024 03:20
@dtcxzyw
Copy link
Member

dtcxzyw commented Jul 23, 2024

Failed Tests (8):
LLVM :: CodeGen/RISCV/avgceils.ll
LLVM :: CodeGen/RISCV/avgceilu.ll
LLVM :: CodeGen/RISCV/avgfloors.ll
LLVM :: CodeGen/RISCV/avgflooru.ll
LLVM :: CodeGen/X86/avgceils-scalar.ll
LLVM :: CodeGen/X86/avgceilu-scalar.ll
LLVM :: CodeGen/X86/avgfloors-scalar.ll
LLVM :: CodeGen/X86/avgflooru-scalar.ll

Copy link

github-actions bot commented Jul 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@RKSimon RKSimon changed the title [DAG] Reducing instructions by better legalization handling of AVGFLOORU for i128 data types [DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types Jul 23, 2024
@RKSimon
Copy link
Collaborator

RKSimon commented Jul 23, 2024

It looks like there are still 2 additional test files that need regenerating

@wizardengineer
Copy link
Contributor Author

It looks like there are still 2 additional test files that need regenerating

@RKSimon Are you referring to me having to send another commit with them regenerated before and after the optimization?

@wizardengineer
Copy link
Contributor Author

@RKSimon is there anymore changes i need to make?

@wizardengineer wizardengineer requested review from arsenm and topperc July 25, 2024 20:40
@wizardengineer
Copy link
Contributor Author

reverse ping

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wizardengineer wizardengineer requested a review from jayfoad July 26, 2024 17:00
@wizardengineer
Copy link
Contributor Author

@topperc @jayfoad Is this good to merge?

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wizardengineer
Copy link
Contributor Author

wizardengineer commented Jul 26, 2024

Just in case:
I don't have write access to merge. I realized my statement was ambiguous. Is it okay for you guys to merge it?

@wizardengineer wizardengineer requested a review from RKSimon July 26, 2024 20:45
@wizardengineer
Copy link
Contributor Author

@RKSimon I'm sorry, I don't mean to be a bother, but is it okay if you can merge this?

@topperc topperc merged commit d5521d1 into llvm:main Jul 28, 2024
4 of 6 checks passed
Copy link

@medievalghoul Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

topperc added a commit that referenced this pull request Jul 28, 2024
…f AVGFLOORU for illegal data types (#99913)"

This reverts commit d5521d1.

The AArch64 test is failing on the bots.
@topperc
Copy link
Collaborator

topperc commented Jul 28, 2024

@medievalghoul I pushed this, but then I started getting build bot errors failing llvm/test/CodeGen/AArch64/avgflooru-i128.ll so I reverted to get the bots back to green. Can you take a look?

@wizardengineer wizardengineer deleted the _DAG_Reducing_instructions_by_better_legalization_handling_for_i128_data_types branch July 28, 2024 01:53
@wizardengineer
Copy link
Contributor Author

@topperc I'm still getting build bot errors failing llvm/test/CodeGen/AArch64/avgflooru-i128.ll, were you able to revert everything?

@topperc
Copy link
Collaborator

topperc commented Jul 28, 2024

@topperc I'm still getting build bot errors failing llvm/test/CodeGen/AArch64/avgflooru-i128.ll, were you able to revert everything?

I think so. It can take a while for all build bots to see the revert.

@wizardengineer
Copy link
Contributor Author

I think so. It can take a while for all build bots to see the revert.

Thank you for taking your time to help me with this. It seems like the build was breaking from getting conflicted with someone else build, is there anyway I can prevent this from happening in the future?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants