[RISCV] Lower vector_shuffle for bf16 #114731
Conversation
This is much the same as with f16. Currently we scalarize if there's no +zvfbfmin, and crash if there is +zvfbfmin, because lowering will try to create a bf16 build_vector, which we also can't lower.
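For reference, a minimal reproducer is just a two-source bf16 shuffle; this one is taken verbatim from the test added in this patch:

; Minimal reproducer, copied from the new test below. With
; -mattr=+v,+zvfbfmin this previously crashed trying to create a bf16
; build_vector; without +zvfbfmin it was scalarized.
define <4 x bfloat> @shuffle_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
  %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
  ret <4 x bfloat> %s
}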
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes: This is much the same as with f16. Currently we scalarize if there's no +zvfbfmin, and crash if there is +zvfbfmin, because lowering will try to create a bf16 build_vector, which we also can't lower.

Full diff: https://github.com/llvm/llvm-project/pull/114731.diff

2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 920b06c7ba6ecd..54642a9ed80e88 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1381,6 +1381,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
if (VT.getVectorElementType() == MVT::bf16) {
setOperationAction(ISD::BITCAST, VT, Custom);
setOperationAction({ISD::VP_FP_ROUND, ISD::VP_FP_EXTEND}, VT, Custom);
+ setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
if (Subtarget.hasStdExtZfbfmin()) {
setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
} else {
@@ -5197,8 +5198,9 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
MVT SplatVT = ContainerVT;
- // If we don't have Zfh, we need to use an integer scalar load.
- if (SVT == MVT::f16 && !Subtarget.hasStdExtZfh()) {
+ // f16 with zvfhmin and bf16 need to use an integer scalar load.
+ if (SVT == MVT::bf16 ||
+ (SVT == MVT::f16 && !Subtarget.hasStdExtZfh())) {
SVT = MVT::i16;
SplatVT = ContainerVT.changeVectorElementType(SVT);
}
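The new bf16 case takes the same path that zvfhmin-only f16 already used: the splat scalar is re-typed as i16 and splatted through an integer-element container type, so no scalar FP register or zfh/zfbfmin instruction is needed. The effect is visible in the splat-shuffle test added further down in this diff (assuming -mattr=+v,+zvfbfmin), which lowers to an integer halfword load plus an integer splat:

define <4 x bfloat> @vrgather_shuffle_vx_v4bf16_load(ptr %p) {
  %v = load <4 x bfloat>, ptr %p
  %s = shufflevector <4 x bfloat> %v, <4 x bfloat> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  ret <4 x bfloat> %s
}
; lowers to:
;   lh       a0, 2(a0)
;   vsetivli zero, 4, e16, mf2, ta, ma
;   vmv.v.x  v8, a0
;   ret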
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
index 67ae562fa2ab50..c803b15913bb34 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
@@ -1,8 +1,19 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <4 x bfloat> @shuffle_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
+; CHECK-LABEL: shuffle_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT: vmv.v.i v0, 11
+; CHECK-NEXT: vmerge.vvm v8, v9, v8, v0
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
+ ret <4 x bfloat> %s
+}
define <4 x half> @shuffle_v4f16(<4 x half> %x, <4 x half> %y) {
; CHECK-LABEL: shuffle_v4f16:
@@ -30,8 +41,8 @@ define <8 x float> @shuffle_v8f32(<8 x float> %x, <8 x float> %y) {
define <4 x double> @shuffle_fv_v4f64(<4 x double> %x) {
; CHECK-LABEL: shuffle_fv_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI2_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI2_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI3_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI3_0)(a0)
; CHECK-NEXT: vsetivli zero, 1, e8, mf8, ta, ma
; CHECK-NEXT: vmv.v.i v0, 9
; CHECK-NEXT: vsetivli zero, 4, e64, m2, ta, ma
@@ -44,8 +55,8 @@ define <4 x double> @shuffle_fv_v4f64(<4 x double> %x) {
define <4 x double> @shuffle_vf_v4f64(<4 x double> %x) {
; CHECK-LABEL: shuffle_vf_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI3_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI3_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI4_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI4_0)(a0)
; CHECK-NEXT: vsetivli zero, 1, e8, mf8, ta, ma
; CHECK-NEXT: vmv.v.i v0, 6
; CHECK-NEXT: vsetivli zero, 4, e64, m2, ta, ma
@@ -92,8 +103,8 @@ define <4 x double> @vrgather_permute_shuffle_uv_v4f64(<4 x double> %x) {
define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y) {
; CHECK-LABEL: vrgather_shuffle_vv_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI6_0)
-; CHECK-NEXT: addi a0, a0, %lo(.LCPI6_0)
+; CHECK-NEXT: lui a0, %hi(.LCPI7_0)
+; CHECK-NEXT: addi a0, a0, %lo(.LCPI7_0)
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vle16.v v14, (a0)
; CHECK-NEXT: vmv.v.i v0, 8
@@ -109,8 +120,8 @@ define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y)
define <4 x double> @vrgather_shuffle_xv_v4f64(<4 x double> %x) {
; CHECK-LABEL: vrgather_shuffle_xv_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI7_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI7_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI8_0)(a0)
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vid.v v10
; CHECK-NEXT: vrsub.vi v12, v10, 4
@@ -129,8 +140,8 @@ define <4 x double> @vrgather_shuffle_vx_v4f64(<4 x double> %x) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vid.v v10
-; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI8_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI9_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI9_0)(a0)
; CHECK-NEXT: li a0, 3
; CHECK-NEXT: vmul.vx v12, v10, a0
; CHECK-NEXT: vmv.v.i v0, 3
@@ -143,6 +154,28 @@ define <4 x double> @vrgather_shuffle_vx_v4f64(<4 x double> %x) {
ret <4 x double> %s
}
+define <4 x bfloat> @shuffle_v8bf16_to_vslidedown_1(<8 x bfloat> %x) {
+; CHECK-LABEL: shuffle_v8bf16_to_vslidedown_1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: ret
+entry:
+ %s = shufflevector <8 x bfloat> %x, <8 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
+ ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @shuffle_v8bf16_to_vslidedown_3(<8 x bfloat> %x) {
+; CHECK-LABEL: shuffle_v8bf16_to_vslidedown_3:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 3
+; CHECK-NEXT: ret
+entry:
+ %s = shufflevector <8 x bfloat> %x, <8 x bfloat> poison, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+ ret <4 x bfloat> %s
+}
+
define <4 x half> @shuffle_v8f16_to_vslidedown_1(<8 x half> %x) {
; CHECK-LABEL: shuffle_v8f16_to_vslidedown_1:
; CHECK: # %bb.0: # %entry
@@ -176,6 +209,16 @@ entry:
ret <2 x float> %s
}
+define <4 x bfloat> @slidedown_v4bf16(<4 x bfloat> %x) {
+; CHECK-LABEL: slidedown_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 undef>
+ ret <4 x bfloat> %s
+}
+
define <4 x half> @slidedown_v4f16(<4 x half> %x) {
; CHECK-LABEL: slidedown_v4f16:
; CHECK: # %bb.0:
@@ -265,6 +308,50 @@ define <8 x double> @splice_binary2(<8 x double> %x, <8 x double> %y) {
ret <8 x double> %s
}
+define <4 x bfloat> @vrgather_permute_shuffle_vu_v4bf16(<4 x bfloat> %x) {
+; CHECK-LABEL: vrgather_permute_shuffle_vu_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a0, 4096
+; CHECK-NEXT: addi a0, a0, 513
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vmv.s.x v9, a0
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT: vsext.vf2 v10, v9
+; CHECK-NEXT: vrgather.vv v9, v8, v10
+; CHECK-NEXT: vmv1r.v v8, v9
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 0, i32 1>
+ ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @vrgather_shuffle_vv_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
+; CHECK-LABEL: vrgather_shuffle_vv_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a0, %hi(.LCPI25_0)
+; CHECK-NEXT: addi a0, a0, %lo(.LCPI25_0)
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, mu
+; CHECK-NEXT: vle16.v v11, (a0)
+; CHECK-NEXT: vmv.v.i v0, 8
+; CHECK-NEXT: vrgather.vv v10, v8, v11
+; CHECK-NEXT: vrgather.vi v10, v9, 1, v0.t
+; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 1, i32 2, i32 0, i32 5>
+ ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @vrgather_shuffle_vx_v4bf16_load(ptr %p) {
+; CHECK-LABEL: vrgather_shuffle_vx_v4bf16_load:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lh a0, 2(a0)
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %v = load <4 x bfloat>, ptr %p
+ %s = shufflevector <4 x bfloat> %v, <4 x bfloat> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+ ret <4 x bfloat> %s
+}
+
define <4 x half> @vrgather_permute_shuffle_vu_v4f16(<4 x half> %x) {
; CHECK-LABEL: vrgather_permute_shuffle_vu_v4f16:
; CHECK: # %bb.0:
@@ -284,8 +371,8 @@ define <4 x half> @vrgather_permute_shuffle_vu_v4f16(<4 x half> %x) {
define <4 x half> @vrgather_shuffle_vv_v4f16(<4 x half> %x, <4 x half> %y) {
; CHECK-LABEL: vrgather_shuffle_vv_v4f16:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI21_0)
-; CHECK-NEXT: addi a0, a0, %lo(.LCPI21_0)
+; CHECK-NEXT: lui a0, %hi(.LCPI28_0)
+; CHECK-NEXT: addi a0, a0, %lo(.LCPI28_0)
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, mu
; CHECK-NEXT: vle16.v v11, (a0)
; CHECK-NEXT: vmv.v.i v0, 8
…fbfmin (#114750) Similarly to #114731, these don't actually require any instructions from the extensions. The motivation for this and #114731 is to eventually enable isLegalElementTypeForRVV for f16 with zvfhmin and bf16 with zvfbfmin, in order to enable scalable vectorization. Although the scalable codegen support for f16 and bf16 is now complete enough for anything the loop vectorizer may emit, enabling isLegalElementTypeForRVV would also make certain hooks like isLegalInterleavedAccessType and isLegalStridedLoadStore return true for f16 and bf16. This means SLP would start emitting these intrinsics, so we need to add fixed-length codegen support.
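As a hypothetical illustration of that last point (the function name and shapes here are ours, not from the patch): once isLegalStridedLoadStore returns true for bf16, SLP could start emitting fixed-length VP strided loads such as the following sketch.

; Hypothetical example of an intrinsic SLP could emit once
; isLegalStridedLoadStore returns true for bf16; the intrinsic and its
; signature come from the VP intrinsics documentation, the shape is ours.
define <4 x bfloat> @strided_gather(ptr %p) {
  %v = call <4 x bfloat> @llvm.experimental.vp.strided.load.v4bf16.p0.i64(
           ptr %p, i64 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, i32 4)
  ret <4 x bfloat> %v
}
declare <4 x bfloat> @llvm.experimental.vp.strided.load.v4bf16.p0.i64(ptr, i64, <4 x i1>, i32)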
LGTM.