Skip to content

[RISCV] Allow f16/bf16 with zvfhmin/zvfbfmin as legal interleaved access #115257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Nov 7, 2024

This is another piece split off from the work to add zvfhmin/zvfbfmin to isLegalElementTypeForRVV.
This is needed to get InterleavedAccessPass to lower [de]interleaves to segment load/stores.

This is another piece split off from the work to add zvfhmin/zvfbfmin to isLegalElementTypeForRVV.
This is needed to get InterleavedAccessPass to lower [de]interleaves to segment load/stores.
@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This is another piece split off from the work to add zvfhmin/zvfbfmin to isLegalElementTypeForRVV.
This is needed to get InterleavedAccessPass to lower [de]interleaves to segment load/stores.


Full diff: https://github.com/llvm/llvm-project/pull/115257.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+6-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll (+37-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll (+37-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll (+37-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll (+37-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index a625e9d5efeb55..74c1910cffc47b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -21512,7 +21512,12 @@ bool RISCVTargetLowering::isLegalInterleavedAccessType(
   if (!isTypeLegal(VT))
     return false;
 
-  if (!isLegalElementTypeForRVV(VT.getScalarType()) ||
+  // TODO: Move bf16/f16 support into isLegalElementTypeForRVV
+  if (!(isLegalElementTypeForRVV(VT.getScalarType()) ||
+        (VT.getScalarType() == MVT::bf16 &&
+         Subtarget.hasVInstructionsBF16Minimal()) ||
+        (VT.getScalarType() == MVT::f16 &&
+         Subtarget.hasVInstructionsF16Minimal())) ||
       !allowsMemoryAccessForAlignment(VTy->getContext(), DL, VT, AddrSpace,
                                       Alignment))
     return false;
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
index 1d7496397670f9..76720c5641563c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh | FileCheck %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfhmin,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfhmin,+zvfbfmin | FileCheck %s
 
 ; Integers
 
@@ -107,6 +109,28 @@ declare {<2 x i64>, <2 x i64>} @llvm.vector.deinterleave2.v4i64(<4 x i64>)
 
 ; Floats
 
+define {<2 x bfloat>, <2 x bfloat>} @vector_deinterleave_load_v2bf16_v4bf16(ptr %p) {
+; CHECK-LABEL: vector_deinterleave_load_v2bf16_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vlseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %vec = load <4 x bfloat>, ptr %p
+  %retval = call {<2 x bfloat>, <2 x bfloat>} @llvm.vector.deinterleave2.v4bf16(<4 x bfloat> %vec)
+  ret {<2 x bfloat>, <2 x bfloat>} %retval
+}
+
+define {<4 x bfloat>, <4 x bfloat>} @vector_deinterleave_load_v4bf16_v8bf16(ptr %p) {
+; CHECK-LABEL: vector_deinterleave_load_v4bf16_v8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vlseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %vec = load <8 x bfloat>, ptr %p
+  %retval = call {<4 x bfloat>, <4 x bfloat>} @llvm.vector.deinterleave2.v8bf16(<8 x bfloat> %vec)
+  ret {<4 x bfloat>, <4 x bfloat>} %retval
+}
+
 define {<2 x half>, <2 x half>} @vector_deinterleave_load_v2f16_v4f16(ptr %p) {
 ; CHECK-LABEL: vector_deinterleave_load_v2f16_v4f16:
 ; CHECK:       # %bb.0:
@@ -140,6 +164,17 @@ define {<2 x float>, <2 x float>} @vector_deinterleave_load_v2f32_v4f32(ptr %p)
   ret {<2 x float>, <2 x float>} %retval
 }
 
+define {<8 x bfloat>, <8 x bfloat>} @vector_deinterleave_load_v8bf16_v16bf16(ptr %p) {
+; CHECK-LABEL: vector_deinterleave_load_v8bf16_v16bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vlseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %vec = load <16 x bfloat>, ptr %p
+  %retval = call {<8 x bfloat>, <8 x bfloat>} @llvm.vector.deinterleave2.v16bf16(<16 x bfloat> %vec)
+  ret {<8 x bfloat>, <8 x bfloat>} %retval
+}
+
 define {<8 x half>, <8 x half>} @vector_deinterleave_load_v8f16_v16f16(ptr %p) {
 ; CHECK-LABEL: vector_deinterleave_load_v8f16_v16f16:
 ; CHECK:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
index 7de9b59c6853f6..67c18b5eef736e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh | FileCheck %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfhmin,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfhmin,+zvfbfmin | FileCheck %s
 
 ; Integers
 
@@ -85,6 +87,28 @@ declare <4 x i64> @llvm.vector.interleave2.v4i64(<2 x i64>, <2 x i64>)
 
 ; Floats
 
+define void @vector_interleave_store_v4bf16_v2bf16(<2 x bfloat> %a, <2 x bfloat> %b, ptr %p) {
+; CHECK-LABEL: vector_interleave_store_v4bf16_v2bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vsseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %res = call <4 x bfloat> @llvm.vector.interleave2.v4bf16(<2 x bfloat> %a, <2 x bfloat> %b)
+  store <4 x bfloat> %res, ptr %p
+  ret void
+}
+
+define void @vector_interleave_store_v8bf16_v4bf16(<4 x bfloat> %a, <4 x bfloat> %b, ptr %p) {
+; CHECK-LABEL: vector_interleave_store_v8bf16_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vsseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %res = call <8 x bfloat> @llvm.vector.interleave2.v8bf16(<4 x bfloat> %a, <4 x bfloat> %b)
+  store <8 x bfloat> %res, ptr %p
+  ret void
+}
+
 define void @vector_interleave_store_v4f16_v2f16(<2 x half> %a, <2 x half> %b, ptr %p) {
 ; CHECK-LABEL: vector_interleave_store_v4f16_v2f16:
 ; CHECK:       # %bb.0:
@@ -118,6 +142,17 @@ define void @vector_interleave_store_v4f32_v2f32(<2 x float> %a, <2 x float> %b,
   ret void
 }
 
+define void @vector_interleave_store_v16bf16_v8bf16(<8 x bfloat> %a, <8 x bfloat> %b, ptr %p) {
+; CHECK-LABEL: vector_interleave_store_v16bf16_v8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vsseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %res = call <16 x bfloat> @llvm.vector.interleave2.v16bf16(<8 x bfloat> %a, <8 x bfloat> %b)
+  store <16 x bfloat> %res, ptr %p
+  ret void
+}
+
 define void @vector_interleave_store_v16f16_v8f16(<8 x half> %a, <8 x half> %b, ptr %p) {
 ; CHECK-LABEL: vector_interleave_store_v16f16_v8f16:
 ; CHECK:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
index 46e93087ab34ba..f4c7f0f13e984b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh,+m | FileCheck --check-prefixes=CHECK,RV32 %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh,+m | FileCheck --check-prefixes=CHECK,RV64 %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh,+zvfbfmin,+m | FileCheck --check-prefixes=CHECK,RV32 %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh,+zvfbfmin,+m | FileCheck --check-prefixes=CHECK,RV64 %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfhmin,+zvfbfmin,+m | FileCheck --check-prefixes=CHECK,RV32 %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfhmin,+zvfbfmin,+m | FileCheck --check-prefixes=CHECK,RV64 %s
 
 ; Integers
 
@@ -199,6 +201,28 @@ declare {<vscale x 8 x i64>, <vscale x 8 x i64>} @llvm.vector.deinterleave2.nxv1
 
 ; Floats
 
+define {<vscale x 2 x bfloat>, <vscale x 2 x bfloat>} @vector_deinterleave_load_nxv2bf16_nxv4bf16(ptr %p) {
+; CHECK-LABEL: vector_deinterleave_load_nxv2bf16_nxv4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vlseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %vec = load <vscale x 4 x bfloat>, ptr %p
+  %retval = call {<vscale x 2 x bfloat>, <vscale x 2 x bfloat>} @llvm.vector.deinterleave2.nxv4bf16(<vscale x 4 x bfloat> %vec)
+  ret {<vscale x 2 x bfloat>, <vscale x 2 x bfloat>} %retval
+}
+
+define {<vscale x 4 x bfloat>, <vscale x 4 x bfloat>} @vector_deinterleave_load_nxv4bf16_nxv8bf16(ptr %p) {
+; CHECK-LABEL: vector_deinterleave_load_nxv4bf16_nxv8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vlseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %vec = load <vscale x 8 x bfloat>, ptr %p
+  %retval = call {<vscale x 4 x bfloat>, <vscale x 4 x bfloat>} @llvm.vector.deinterleave2.nxv8bf16(<vscale x 8 x bfloat> %vec)
+  ret {<vscale x 4 x bfloat>, <vscale x 4 x bfloat>} %retval
+}
+
 define {<vscale x 2 x half>, <vscale x 2 x half>} @vector_deinterleave_load_nxv2f16_nxv4f16(ptr %p) {
 ; CHECK-LABEL: vector_deinterleave_load_nxv2f16_nxv4f16:
 ; CHECK:       # %bb.0:
@@ -232,6 +256,17 @@ define {<vscale x 2 x float>, <vscale x 2 x float>} @vector_deinterleave_load_nx
   ret {<vscale x 2 x float>, <vscale x 2 x float>} %retval
 }
 
+define {<vscale x 8 x bfloat>, <vscale x 8 x bfloat>} @vector_deinterleave_load_nxv8bf16_nxv16bf16(ptr %p) {
+; CHECK-LABEL: vector_deinterleave_load_nxv8bf16_nxv16bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vlseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %vec = load <vscale x 16 x bfloat>, ptr %p
+  %retval = call {<vscale x 8 x bfloat>, <vscale x 8 x bfloat>} @llvm.vector.deinterleave2.nxv16bf16(<vscale x 16 x bfloat> %vec)
+  ret {<vscale x 8 x bfloat>, <vscale x 8 x bfloat>} %retval
+}
+
 define {<vscale x 8 x half>, <vscale x 8 x half>} @vector_deinterleave_load_nxv8f16_nxv16f16(ptr %p) {
 ; CHECK-LABEL: vector_deinterleave_load_nxv8f16_nxv16f16:
 ; CHECK:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll b/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
index 04e1adcb37ca15..16ce25f86462e3 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll
@@ -1,6 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh | FileCheck --check-prefixes=CHECK,RV32 %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh | FileCheck --check-prefixes=CHECK,RV64 %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfh,+zvfbfmin | FileCheck --check-prefixes=CHECK,RV32 %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfh,+zvfbfmin | FileCheck --check-prefixes=CHECK,RV64 %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+v,+zvfhmin,+zvfbfmin | FileCheck --check-prefixes=CHECK,RV32 %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v,+zvfhmin,+zvfbfmin | FileCheck --check-prefixes=CHECK,RV64 %s
 
 ; Integers
 
@@ -154,6 +156,28 @@ declare <vscale x 16 x i64> @llvm.vector.interleave2.nxv16i64(<vscale x 8 x i64>
 
 ; Floats
 
+define void @vector_interleave_store_nxv4bf16_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b, ptr %p) {
+; CHECK-LABEL: vector_interleave_store_nxv4bf16_nxv2bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vsseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %res = call <vscale x 4 x bfloat> @llvm.vector.interleave2.nxv4bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b)
+  store <vscale x 4 x bfloat> %res, ptr %p
+  ret void
+}
+
+define void @vector_interleave_store_nxv8bf16_nxv4bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b, ptr %p) {
+; CHECK-LABEL: vector_interleave_store_nxv8bf16_nxv4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e16, m1, ta, ma
+; CHECK-NEXT:    vsseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %res = call <vscale x 8 x bfloat> @llvm.vector.interleave2.nxv8bf16(<vscale x 4 x bfloat> %a, <vscale x 4 x bfloat> %b)
+  store <vscale x 8 x bfloat> %res, ptr %p
+  ret void
+}
+
 define void @vector_interleave_store_nxv4f16_nxv2f16(<vscale x 2 x half> %a, <vscale x 2 x half> %b, ptr %p) {
 ; CHECK-LABEL: vector_interleave_store_nxv4f16_nxv2f16:
 ; CHECK:       # %bb.0:
@@ -187,6 +211,17 @@ define void @vector_interleave_store_nxv4f32_nxv2f32(<vscale x 2 x float> %a, <v
   ret void
 }
 
+define void @vector_interleave_store_nxv16bf16_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b, ptr %p) {
+; CHECK-LABEL: vector_interleave_store_nxv16bf16_nxv8bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a1, zero, e16, m2, ta, ma
+; CHECK-NEXT:    vsseg2e16.v v8, (a0)
+; CHECK-NEXT:    ret
+  %res = call <vscale x 16 x bfloat> @llvm.vector.interleave2.nxv16bf16(<vscale x 8 x bfloat> %a, <vscale x 8 x bfloat> %b)
+  store <vscale x 16 x bfloat> %res, ptr %p
+  ret void
+}
+
 define void @vector_interleave_store_nxv16f16_nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b, ptr %p) {
 ; CHECK-LABEL: vector_interleave_store_nxv16f16_nxv8f16:
 ; CHECK:       # %bb.0:

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lukel97 lukel97 merged commit f0e2301 into llvm:main Nov 7, 2024
7 of 9 checks passed
lukel97 added a commit that referenced this pull request Nov 11, 2024
)

This PR enables scalable loop vectorization for f16 with zvfhmin and
bf16 with zvfbfmin.

Enabling this was dependent on filling out the gaps for scalable
zvfhmin/zvfbfmin codegen, but everything that the loop vectorizer might
emit should now be handled.

It does this by marking f16 and bf16 as legal in
`isLegalElementTypeForRVV`. There are a few users of
`isLegalElementTypeForRVV` that have already been enabled in other PRs:

- `isLegalStridedLoadStore` #115264
- `isLegalInterleavedAccessType` #115257
- `isLegalMaskedLoadStore` #115145
- `isLegalMaskedGatherScatter` #114945

The remaining user is `isLegalToVectorizeReduction`. We can't promote
f16/bf16 reductions to f32 so we need to disable them for scalable
vectors. The cost model actually marks these as invalid, but for
out-of-tree reductions `ComputeReductionResult` doesn't get costed and
it will end up emitting a reduction intrinsic regardless, so we still
need to mark them as illegal. We might be able to remove this
restriction later for fmax and fmin reductions.
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
…#115272)

This PR enables scalable loop vectorization for f16 with zvfhmin and
bf16 with zvfbfmin.

Enabling this was dependent on filling out the gaps for scalable
zvfhmin/zvfbfmin codegen, but everything that the loop vectorizer might
emit should now be handled.

It does this by marking f16 and bf16 as legal in
`isLegalElementTypeForRVV`. There are a few users of
`isLegalElementTypeForRVV` that have already been enabled in other PRs:

- `isLegalStridedLoadStore` llvm#115264
- `isLegalInterleavedAccessType` llvm#115257
- `isLegalMaskedLoadStore` llvm#115145
- `isLegalMaskedGatherScatter` llvm#114945

The remaining user is `isLegalToVectorizeReduction`. We can't promote
f16/bf16 reductions to f32 so we need to disable them for scalable
vectors. The cost model actually marks these as invalid, but for
out-of-tree reductions `ComputeReductionResult` doesn't get costed and
it will end up emitting a reduction intrinsic regardless, so we still
need to mark them as illegal. We might be able to remove this
restriction later for fmax and fmin reductions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants