Skip to content

[RISCV] Lower vector_shuffle for bf16 #114731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Nov 4, 2024

This is much the same as with f16. Currently we scalarize if there's no zvfbfmin, and crash if there is zvfbfmin because it will try to create a bf16 build_vector, which we also can't lower.

This is much the same as with f16. Currently we scalarize if there's no +zvfbfmin, and crash if there is zvfbfmin because it will try to create a bf16 build_vector, which we also can't lower.
@llvmbot
Copy link
Member

llvmbot commented Nov 4, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This is much the same as with f16. Currently we scalarize if there's no +zvfbfmin, and crash if there is zvfbfmin because it will try to create a bf16 build_vector, which we also can't lower.


Full diff: https://github.com/llvm/llvm-project/pull/114731.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+4-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll (+103-16)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 920b06c7ba6ecd..54642a9ed80e88 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1381,6 +1381,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
         if (VT.getVectorElementType() == MVT::bf16) {
           setOperationAction(ISD::BITCAST, VT, Custom);
           setOperationAction({ISD::VP_FP_ROUND, ISD::VP_FP_EXTEND}, VT, Custom);
+          setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
           if (Subtarget.hasStdExtZfbfmin()) {
             setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
           } else {
@@ -5197,8 +5198,9 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
 
         MVT SplatVT = ContainerVT;
 
-        // If we don't have Zfh, we need to use an integer scalar load.
-        if (SVT == MVT::f16 && !Subtarget.hasStdExtZfh()) {
+        // f16 with zvfhmin and bf16 need to use an integer scalar load.
+        if (SVT == MVT::bf16 ||
+            (SVT == MVT::f16 && !Subtarget.hasStdExtZfh())) {
           SVT = MVT::i16;
           SplatVT = ContainerVT.changeVectorElementType(SVT);
         }
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
index 67ae562fa2ab50..c803b15913bb34 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
@@ -1,8 +1,19 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <4 x bfloat> @shuffle_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
+; CHECK-LABEL: shuffle_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vmv.v.i v0, 11
+; CHECK-NEXT:    vmerge.vvm v8, v9, v8, v0
+; CHECK-NEXT:    ret
+  %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
+  ret <4 x bfloat> %s
+}
 
 define <4 x half> @shuffle_v4f16(<4 x half> %x, <4 x half> %y) {
 ; CHECK-LABEL: shuffle_v4f16:
@@ -30,8 +41,8 @@ define <8 x float> @shuffle_v8f32(<8 x float> %x, <8 x float> %y) {
 define <4 x double> @shuffle_fv_v4f64(<4 x double> %x) {
 ; CHECK-LABEL: shuffle_fv_v4f64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, %hi(.LCPI2_0)
-; CHECK-NEXT:    fld fa5, %lo(.LCPI2_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI3_0)
+; CHECK-NEXT:    fld fa5, %lo(.LCPI3_0)(a0)
 ; CHECK-NEXT:    vsetivli zero, 1, e8, mf8, ta, ma
 ; CHECK-NEXT:    vmv.v.i v0, 9
 ; CHECK-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
@@ -44,8 +55,8 @@ define <4 x double> @shuffle_fv_v4f64(<4 x double> %x) {
 define <4 x double> @shuffle_vf_v4f64(<4 x double> %x) {
 ; CHECK-LABEL: shuffle_vf_v4f64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, %hi(.LCPI3_0)
-; CHECK-NEXT:    fld fa5, %lo(.LCPI3_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI4_0)
+; CHECK-NEXT:    fld fa5, %lo(.LCPI4_0)(a0)
 ; CHECK-NEXT:    vsetivli zero, 1, e8, mf8, ta, ma
 ; CHECK-NEXT:    vmv.v.i v0, 6
 ; CHECK-NEXT:    vsetivli zero, 4, e64, m2, ta, ma
@@ -92,8 +103,8 @@ define <4 x double> @vrgather_permute_shuffle_uv_v4f64(<4 x double> %x) {
 define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y) {
 ; CHECK-LABEL: vrgather_shuffle_vv_v4f64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, %hi(.LCPI6_0)
-; CHECK-NEXT:    addi a0, a0, %lo(.LCPI6_0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI7_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI7_0)
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vle16.v v14, (a0)
 ; CHECK-NEXT:    vmv.v.i v0, 8
@@ -109,8 +120,8 @@ define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y)
 define <4 x double> @vrgather_shuffle_xv_v4f64(<4 x double> %x) {
 ; CHECK-LABEL: vrgather_shuffle_xv_v4f64:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, %hi(.LCPI7_0)
-; CHECK-NEXT:    fld fa5, %lo(.LCPI7_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI8_0)
+; CHECK-NEXT:    fld fa5, %lo(.LCPI8_0)(a0)
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vid.v v10
 ; CHECK-NEXT:    vrsub.vi v12, v10, 4
@@ -129,8 +140,8 @@ define <4 x double> @vrgather_shuffle_vx_v4f64(<4 x double> %x) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vid.v v10
-; CHECK-NEXT:    lui a0, %hi(.LCPI8_0)
-; CHECK-NEXT:    fld fa5, %lo(.LCPI8_0)(a0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI9_0)
+; CHECK-NEXT:    fld fa5, %lo(.LCPI9_0)(a0)
 ; CHECK-NEXT:    li a0, 3
 ; CHECK-NEXT:    vmul.vx v12, v10, a0
 ; CHECK-NEXT:    vmv.v.i v0, 3
@@ -143,6 +154,28 @@ define <4 x double> @vrgather_shuffle_vx_v4f64(<4 x double> %x) {
   ret <4 x double> %s
 }
 
+define <4 x bfloat> @shuffle_v8bf16_to_vslidedown_1(<8 x bfloat> %x) {
+; CHECK-LABEL: shuffle_v8bf16_to_vslidedown_1:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 1
+; CHECK-NEXT:    ret
+entry:
+  %s = shufflevector <8 x bfloat> %x, <8 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
+  ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @shuffle_v8bf16_to_vslidedown_3(<8 x bfloat> %x) {
+; CHECK-LABEL: shuffle_v8bf16_to_vslidedown_3:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 3
+; CHECK-NEXT:    ret
+entry:
+  %s = shufflevector <8 x bfloat> %x, <8 x bfloat> poison, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+  ret <4 x bfloat> %s
+}
+
 define <4 x half> @shuffle_v8f16_to_vslidedown_1(<8 x half> %x) {
 ; CHECK-LABEL: shuffle_v8f16_to_vslidedown_1:
 ; CHECK:       # %bb.0: # %entry
@@ -176,6 +209,16 @@ entry:
   ret <2 x float> %s
 }
 
+define <4 x bfloat> @slidedown_v4bf16(<4 x bfloat> %x) {
+; CHECK-LABEL: slidedown_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 1
+; CHECK-NEXT:    ret
+  %s = shufflevector <4 x bfloat> %x, <4 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 undef>
+  ret <4 x bfloat> %s
+}
+
 define <4 x half> @slidedown_v4f16(<4 x half> %x) {
 ; CHECK-LABEL: slidedown_v4f16:
 ; CHECK:       # %bb.0:
@@ -265,6 +308,50 @@ define <8 x double> @splice_binary2(<8 x double> %x, <8 x double> %y) {
   ret <8 x double> %s
 }
 
+define <4 x bfloat> @vrgather_permute_shuffle_vu_v4bf16(<4 x bfloat> %x) {
+; CHECK-LABEL: vrgather_permute_shuffle_vu_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, 4096
+; CHECK-NEXT:    addi a0, a0, 513
+; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT:    vmv.s.x v9, a0
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT:    vsext.vf2 v10, v9
+; CHECK-NEXT:    vrgather.vv v9, v8, v10
+; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    ret
+  %s = shufflevector <4 x bfloat> %x, <4 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 0, i32 1>
+  ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @vrgather_shuffle_vv_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
+; CHECK-LABEL: vrgather_shuffle_vv_v4bf16:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lui a0, %hi(.LCPI25_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI25_0)
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
+; CHECK-NEXT:    vle16.v v11, (a0)
+; CHECK-NEXT:    vmv.v.i v0, 8
+; CHECK-NEXT:    vrgather.vv v10, v8, v11
+; CHECK-NEXT:    vrgather.vi v10, v9, 1, v0.t
+; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    ret
+  %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 1, i32 2, i32 0, i32 5>
+  ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @vrgather_shuffle_vx_v4bf16_load(ptr %p) {
+; CHECK-LABEL: vrgather_shuffle_vx_v4bf16_load:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    lh a0, 2(a0)
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vmv.v.x v8, a0
+; CHECK-NEXT:    ret
+  %v = load <4 x bfloat>, ptr %p
+  %s = shufflevector <4 x bfloat> %v, <4 x bfloat> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+  ret <4 x bfloat> %s
+}
+
 define <4 x half> @vrgather_permute_shuffle_vu_v4f16(<4 x half> %x) {
 ; CHECK-LABEL: vrgather_permute_shuffle_vu_v4f16:
 ; CHECK:       # %bb.0:
@@ -284,8 +371,8 @@ define <4 x half> @vrgather_permute_shuffle_vu_v4f16(<4 x half> %x) {
 define <4 x half> @vrgather_shuffle_vv_v4f16(<4 x half> %x, <4 x half> %y) {
 ; CHECK-LABEL: vrgather_shuffle_vv_v4f16:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    lui a0, %hi(.LCPI21_0)
-; CHECK-NEXT:    addi a0, a0, %lo(.LCPI21_0)
+; CHECK-NEXT:    lui a0, %hi(.LCPI28_0)
+; CHECK-NEXT:    addi a0, a0, %lo(.LCPI28_0)
 ; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, mu
 ; CHECK-NEXT:    vle16.v v11, (a0)
 ; CHECK-NEXT:    vmv.v.i v0, 8

@lukel97 lukel97 changed the title [RISCV] Lower shuffle_vector for bf16 [RISCV] Lower vector_shuffle for bf16 Nov 4, 2024
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Nov 4, 2024
…fbfmin

Similarly to llvm#114731, these don't actually require any instructions from the extensions.

The motivation for this and llvm#114731 is to eventually enable isLegalElementTypeForRVV for f16 with zvfhmin and bf16 with zvfbfmin in order to enable scalable vectorization.

Although the scalable codegen support for f16 and bf16 is now complete enough for anything the loop vectorizer may emit, enabling isLegalElementTypeForRVV would make certian hooks like isLegalInterleavedAccessType and isLegalStridedLoadStore return true for f16 and bf16. This means SLP would start emitting these intrinsics, so we need to add fixed-length codegen support.
Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lukel97 lukel97 merged commit 7d35368 into llvm:main Nov 4, 2024
10 checks passed
lukel97 added a commit that referenced this pull request Nov 4, 2024
…fbfmin (#114750)

Similarly to #114731, these don't actually require any instructions from
the extensions.

The motivation for this and #114731 is to eventually enable
isLegalElementTypeForRVV for f16 with zvfhmin and bf16 with zvfbfmin in
order to enable scalable vectorization.

Although the scalable codegen support for f16 and bf16 is now complete
enough for anything the loop vectorizer may emit, enabling
isLegalElementTypeForRVV would make certian hooks like
isLegalInterleavedAccessType and isLegalStridedLoadStore return true for
f16 and bf16. This means SLP would start emitting these intrinsics, so
we need to add fixed-length codegen support.
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
This is much the same as with f16. Currently we scalarize if there's no
zvfbfmin, and crash if there is zvfbfmin because it will try to create a
bf16 build_vector, which we also can't lower.
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
…fbfmin (llvm#114750)

Similarly to llvm#114731, these don't actually require any instructions from
the extensions.

The motivation for this and llvm#114731 is to eventually enable
isLegalElementTypeForRVV for f16 with zvfhmin and bf16 with zvfbfmin in
order to enable scalable vectorization.

Although the scalable codegen support for f16 and bf16 is now complete
enough for anything the loop vectorizer may emit, enabling
isLegalElementTypeForRVV would make certian hooks like
isLegalInterleavedAccessType and isLegalStridedLoadStore return true for
f16 and bf16. This means SLP would start emitting these intrinsics, so
we need to add fixed-length codegen support.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants