Skip to content

[LLVM][CodeGen][SVE] Add ISel for bfloat scalable vector compares. #138707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 8, 2025

Conversation

paulwalker-arm
Copy link
Collaborator

No description provided.

@llvmbot
Copy link
Member

llvmbot commented May 6, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

Patch is 34.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138707.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+3-3)
  • (added) llvm/test/CodeGen/AArch64/sve-bf16-compares.ll (+1001)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 1c889d67c81e0..d36e38884e3a2 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1758,10 +1758,10 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
 
     for (auto Opcode :
          {ISD::FCEIL, ISD::FDIV, ISD::FFLOOR, ISD::FNEARBYINT, ISD::FRINT,
-          ISD::FROUND, ISD::FROUNDEVEN, ISD::FSQRT, ISD::FTRUNC}) {
+          ISD::FROUND, ISD::FROUNDEVEN, ISD::FSQRT, ISD::FTRUNC, ISD::SETCC}) {
       setOperationPromotedToType(Opcode, MVT::nxv2bf16, MVT::nxv2f32);
       setOperationPromotedToType(Opcode, MVT::nxv4bf16, MVT::nxv4f32);
-      setOperationAction(Opcode, MVT::nxv8bf16, Expand);
+      setOperationPromotedToType(Opcode, MVT::nxv8bf16, MVT::nxv8f32);
     }
 
     if (!Subtarget->hasSVEB16B16()) {
@@ -1769,7 +1769,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
                           ISD::FMINIMUM, ISD::FMINNUM, ISD::FMUL, ISD::FSUB}) {
         setOperationPromotedToType(Opcode, MVT::nxv2bf16, MVT::nxv2f32);
         setOperationPromotedToType(Opcode, MVT::nxv4bf16, MVT::nxv4f32);
-        setOperationAction(Opcode, MVT::nxv8bf16, Expand);
+        setOperationPromotedToType(Opcode, MVT::nxv8bf16, MVT::nxv8f32);
       }
     }
 
diff --git a/llvm/test/CodeGen/AArch64/sve-bf16-compares.ll b/llvm/test/CodeGen/AArch64/sve-bf16-compares.ll
new file mode 100644
index 0000000000000..e0bf755c851b7
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-bf16-compares.ll
@@ -0,0 +1,1001 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --extra_scrub
+; RUN: llc -mattr=+sve                  < %s | FileCheck %s
+; RUN: llc -mattr=+sme -force-streaming < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+;
+; OEQ
+;
+
+define <vscale x 2 x i1> @fcmp_oeq_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_oeq_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmeq p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %res = fcmp oeq <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_oeq_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_oeq_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmeq p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %res = fcmp oeq <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_oeq_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_oeq_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z1.h
+; CHECK-NEXT:    uunpkhi z3.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    fcmeq p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmeq p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp oeq <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; OGT
+;
+
+define <vscale x 2 x i1> @fcmp_ogt_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ogt_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %res = fcmp ogt <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_ogt_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ogt_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %res = fcmp ogt <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_ogt_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ogt_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z1.h
+; CHECK-NEXT:    uunpkhi z3.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp ogt <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; OGE
+;
+
+define <vscale x 2 x i1> @fcmp_oge_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_oge_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmge p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %res = fcmp oge <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_oge_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_oge_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmge p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %res = fcmp oge <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_oge_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_oge_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z1.h
+; CHECK-NEXT:    uunpkhi z3.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    fcmge p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmge p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp oge <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; OLT
+;
+
+define <vscale x 2 x i1> @fcmp_olt_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_olt_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    ret
+  %res = fcmp olt <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_olt_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_olt_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    ret
+  %res = fcmp olt <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_olt_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_olt_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z0.h
+; CHECK-NEXT:    uunpkhi z3.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp olt <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; OLE
+;
+
+define <vscale x 2 x i1> @fcmp_ole_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ole_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmge p0.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    ret
+  %res = fcmp ole <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_ole_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ole_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmge p0.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    ret
+  %res = fcmp ole <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_ole_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ole_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z0.h
+; CHECK-NEXT:    uunpkhi z3.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    fcmge p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmge p0.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp ole <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; ONE
+;
+
+define <vscale x 2 x i1> @fcmp_one_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_one_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    sel p0.b, p0, p0.b, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp one <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_one_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_one_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    sel p0.b, p0, p0.b, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp one <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_one_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_one_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z0.h
+; CHECK-NEXT:    uunpkhi z3.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmgt p2.s, p0/z, z2.s, z3.s
+; CHECK-NEXT:    fcmgt p3.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    fcmgt p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    mov p1.b, p2/m, p2.b
+; CHECK-NEXT:    sel p0.b, p0, p0.b, p3.b
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp one <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; ORD
+;
+
+define <vscale x 2 x i1> @fcmp_ord_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ord_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmuo p1.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ord <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_ord_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ord_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmuo p1.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ord <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_ord_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ord_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z1.h
+; CHECK-NEXT:    uunpkhi z3.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    fcmuo p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmuo p2.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    not p1.b, p0/z, p1.b
+; CHECK-NEXT:    not p0.b, p0/z, p2.b
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp ord <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; UEQ
+;
+
+define <vscale x 2 x i1> @fcmp_ueq_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ueq_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmuo p1.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    fcmeq p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    sel p0.b, p0, p0.b, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ueq <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_ueq_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ueq_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmuo p1.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    fcmeq p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    sel p0.b, p0, p0.b, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ueq <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_ueq_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ueq_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z1.h
+; CHECK-NEXT:    uunpkhi z3.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    fcmuo p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmeq p2.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmuo p3.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    fcmeq p0.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    mov p1.b, p2/m, p2.b
+; CHECK-NEXT:    sel p0.b, p0, p0.b, p3.b
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp ueq <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; UGT
+;
+
+define <vscale x 2 x i1> @fcmp_ugt_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ugt_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmge p1.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ugt <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_ugt_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ugt_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmge p1.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ugt <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_ugt_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ugt_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z0.h
+; CHECK-NEXT:    uunpkhi z3.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    fcmge p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmge p2.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    not p1.b, p0/z, p1.b
+; CHECK-NEXT:    not p0.b, p0/z, p2.b
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp ugt <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; UGE
+;
+
+define <vscale x 2 x i1> @fcmp_uge_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_uge_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp uge <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_uge_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_uge_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp uge <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_uge_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_uge_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z0.h
+; CHECK-NEXT:    uunpkhi z3.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    fcmgt p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmgt p2.s, p0/z, z1.s, z0.s
+; CHECK-NEXT:    not p1.b, p0/z, p1.b
+; CHECK-NEXT:    not p0.b, p0/z, p2.b
+; CHECK-NEXT:    uzp1 p0.h, p0.h, p1.h
+; CHECK-NEXT:    ret
+  %res = fcmp uge <vscale x 8 x bfloat> %a, %b
+  ret <vscale x 8 x i1> %res
+}
+
+;
+; ULT
+;
+
+define <vscale x 2 x i1> @fcmp_ult_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ult_nxv2bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    fcmge p1.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ult <vscale x 2 x bfloat> %a, %b
+  ret <vscale x 2 x i1> %res
+}
+
+define <vscale x 4 x i1> @fcmp_ult_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ult_nxv4bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    fcmge p1.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    not p0.b, p0/z, p1.b
+; CHECK-NEXT:    ret
+  %res = fcmp ult <vscale x 4 x bfloat> %a, %b
+  ret <vscale x 4 x i1> %res
+}
+
+define <vscale x 8 x i1> @fcmp_ult_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b) {
+; CHECK-LABEL: fcmp_ult_nxv8bf16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uunpkhi z2.s, z1.h
+; CHECK-NEXT:    uunpkhi z3.s, z0.h
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    lsl z2.s, z2.s, #16
+; CHECK-NEXT:    lsl z3.s, z3.s, #16
+; CHECK-NEXT:    lsl z1.s, z1.s, #16
+; CHECK-NEXT:    lsl z0.s, z0.s, #16
+; CHECK-NEXT:    fcmge p1.s, p0/z, z3.s, z2.s
+; CHECK-NEXT:    fcmge p2.s, p0/z, z0.s, z1.s
+; CHECK-NEXT:    not p1.b, p0/z, p1.b
+; CHECK-NEXT:    not p0.b, p0/z, p2.b
+; CHECK-NEXT: ...
[truncated]

}

if (!Subtarget->hasSVEB16B16()) {
for (auto Opcode : {ISD::FADD, ISD::FMA, ISD::FMAXIMUM, ISD::FMAXNUM,
ISD::FMINIMUM, ISD::FMINNUM, ISD::FMUL, ISD::FSUB}) {
setOperationPromotedToType(Opcode, MVT::nxv2bf16, MVT::nxv2f32);
setOperationPromotedToType(Opcode, MVT::nxv4bf16, MVT::nxv4f32);
setOperationAction(Opcode, MVT::nxv8bf16, Expand);
setOperationPromotedToType(Opcode, MVT::nxv8bf16, MVT::nxv8f32);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking this is not required, but given it's related to legalisation of nxv8bf16 as above I figured it was worth keeping the two lines consistent?

Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@paulwalker-arm paulwalker-arm merged commit 160abfb into llvm:main May 8, 2025
13 checks passed
@paulwalker-arm paulwalker-arm deleted the sve-bfloat-compares branch May 8, 2025 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants