[RISCV] Lower vector_shuffle for bf16 #114731
Conversation
This is much the same as with f16. Currently we scalarize if there's no +zvfbfmin, and crash if there is +zvfbfmin, because lowering will try to create a bf16 build_vector, which we also can't lower.
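For reference, a minimal reproducer is just a two-source bf16 shuffle; this one is taken verbatim from the test added in this patch:

; Minimal reproducer, copied from the new test below. With
; -mattr=+v,+zvfbfmin this previously crashed trying to create a bf16
; build_vector; without +zvfbfmin it was scalarized.
define <4 x bfloat> @shuffle_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
  %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
  ret <4 x bfloat> %s
}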
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes: This is much the same as with f16. Currently we scalarize if there's no +zvfbfmin, and crash if there is +zvfbfmin, because lowering will try to create a bf16 build_vector, which we also can't lower.

Full diff: https://github.com/llvm/llvm-project/pull/114731.diff

2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 920b06c7ba6ecd..54642a9ed80e88 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1381,6 +1381,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
if (VT.getVectorElementType() == MVT::bf16) {
setOperationAction(ISD::BITCAST, VT, Custom);
setOperationAction({ISD::VP_FP_ROUND, ISD::VP_FP_EXTEND}, VT, Custom);
+ setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
if (Subtarget.hasStdExtZfbfmin()) {
setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
} else {
@@ -5197,8 +5198,9 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
MVT SplatVT = ContainerVT;
- // If we don't have Zfh, we need to use an integer scalar load.
- if (SVT == MVT::f16 && !Subtarget.hasStdExtZfh()) {
+ // f16 with zvfhmin and bf16 need to use an integer scalar load.
+ if (SVT == MVT::bf16 ||
+ (SVT == MVT::f16 && !Subtarget.hasStdExtZfh())) {
SVT = MVT::i16;
SplatVT = ContainerVT.changeVectorElementType(SVT);
}
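The new bf16 case takes the same path that zvfhmin-only f16 already used: the splat scalar is re-typed as i16 and splatted through an integer-element container type, so no scalar FP register or zfh/zfbfmin instruction is needed. The effect is visible in the splat-shuffle test added further down in this diff (assuming -mattr=+v,+zvfbfmin), which lowers to an integer halfword load plus an integer splat:

define <4 x bfloat> @vrgather_shuffle_vx_v4bf16_load(ptr %p) {
  %v = load <4 x bfloat>, ptr %p
  %s = shufflevector <4 x bfloat> %v, <4 x bfloat> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  ret <4 x bfloat> %s
}
; lowers to:
;   lh       a0, 2(a0)
;   vsetivli zero, 4, e16, mf2, ta, ma
;   vmv.v.x  v8, a0
;   ret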
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
index 67ae562fa2ab50..c803b15913bb34 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
@@ -1,8 +1,19 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+v -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfh,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfh,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -mattr=+d,+zvfhmin,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -mattr=+d,+zvfhmin,+zvfbfmin,+v -verify-machineinstrs < %s | FileCheck %s
+
+define <4 x bfloat> @shuffle_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
+; CHECK-LABEL: shuffle_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT: vmv.v.i v0, 11
+; CHECK-NEXT: vmerge.vvm v8, v9, v8, v0
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
+ ret <4 x bfloat> %s
+}
define <4 x half> @shuffle_v4f16(<4 x half> %x, <4 x half> %y) {
; CHECK-LABEL: shuffle_v4f16:
@@ -30,8 +41,8 @@ define <8 x float> @shuffle_v8f32(<8 x float> %x, <8 x float> %y) {
define <4 x double> @shuffle_fv_v4f64(<4 x double> %x) {
; CHECK-LABEL: shuffle_fv_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI2_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI2_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI3_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI3_0)(a0)
; CHECK-NEXT: vsetivli zero, 1, e8, mf8, ta, ma
; CHECK-NEXT: vmv.v.i v0, 9
; CHECK-NEXT: vsetivli zero, 4, e64, m2, ta, ma
@@ -44,8 +55,8 @@ define <4 x double> @shuffle_fv_v4f64(<4 x double> %x) {
define <4 x double> @shuffle_vf_v4f64(<4 x double> %x) {
; CHECK-LABEL: shuffle_vf_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI3_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI3_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI4_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI4_0)(a0)
; CHECK-NEXT: vsetivli zero, 1, e8, mf8, ta, ma
; CHECK-NEXT: vmv.v.i v0, 6
; CHECK-NEXT: vsetivli zero, 4, e64, m2, ta, ma
@@ -92,8 +103,8 @@ define <4 x double> @vrgather_permute_shuffle_uv_v4f64(<4 x double> %x) {
define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y) {
; CHECK-LABEL: vrgather_shuffle_vv_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI6_0)
-; CHECK-NEXT: addi a0, a0, %lo(.LCPI6_0)
+; CHECK-NEXT: lui a0, %hi(.LCPI7_0)
+; CHECK-NEXT: addi a0, a0, %lo(.LCPI7_0)
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vle16.v v14, (a0)
; CHECK-NEXT: vmv.v.i v0, 8
@@ -109,8 +120,8 @@ define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y)
define <4 x double> @vrgather_shuffle_xv_v4f64(<4 x double> %x) {
; CHECK-LABEL: vrgather_shuffle_xv_v4f64:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI7_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI7_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI8_0)(a0)
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vid.v v10
; CHECK-NEXT: vrsub.vi v12, v10, 4
@@ -129,8 +140,8 @@ define <4 x double> @vrgather_shuffle_vx_v4f64(<4 x double> %x) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vid.v v10
-; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
-; CHECK-NEXT: fld fa5, %lo(.LCPI8_0)(a0)
+; CHECK-NEXT: lui a0, %hi(.LCPI9_0)
+; CHECK-NEXT: fld fa5, %lo(.LCPI9_0)(a0)
; CHECK-NEXT: li a0, 3
; CHECK-NEXT: vmul.vx v12, v10, a0
; CHECK-NEXT: vmv.v.i v0, 3
@@ -143,6 +154,28 @@ define <4 x double> @vrgather_shuffle_vx_v4f64(<4 x double> %x) {
ret <4 x double> %s
}
+define <4 x bfloat> @shuffle_v8bf16_to_vslidedown_1(<8 x bfloat> %x) {
+; CHECK-LABEL: shuffle_v8bf16_to_vslidedown_1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: ret
+entry:
+ %s = shufflevector <8 x bfloat> %x, <8 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
+ ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @shuffle_v8bf16_to_vslidedown_3(<8 x bfloat> %x) {
+; CHECK-LABEL: shuffle_v8bf16_to_vslidedown_3:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 3
+; CHECK-NEXT: ret
+entry:
+ %s = shufflevector <8 x bfloat> %x, <8 x bfloat> poison, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+ ret <4 x bfloat> %s
+}
+
define <4 x half> @shuffle_v8f16_to_vslidedown_1(<8 x half> %x) {
; CHECK-LABEL: shuffle_v8f16_to_vslidedown_1:
; CHECK: # %bb.0: # %entry
@@ -176,6 +209,16 @@ entry:
ret <2 x float> %s
}
+define <4 x bfloat> @slidedown_v4bf16(<4 x bfloat> %x) {
+; CHECK-LABEL: slidedown_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT: vslidedown.vi v8, v8, 1
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 undef>
+ ret <4 x bfloat> %s
+}
+
define <4 x half> @slidedown_v4f16(<4 x half> %x) {
; CHECK-LABEL: slidedown_v4f16:
; CHECK: # %bb.0:
@@ -265,6 +308,50 @@ define <8 x double> @splice_binary2(<8 x double> %x, <8 x double> %y) {
ret <8 x double> %s
}
+define <4 x bfloat> @vrgather_permute_shuffle_vu_v4bf16(<4 x bfloat> %x) {
+; CHECK-LABEL: vrgather_permute_shuffle_vu_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a0, 4096
+; CHECK-NEXT: addi a0, a0, 513
+; CHECK-NEXT: vsetivli zero, 4, e32, m1, ta, ma
+; CHECK-NEXT: vmv.s.x v9, a0
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT: vsext.vf2 v10, v9
+; CHECK-NEXT: vrgather.vv v9, v8, v10
+; CHECK-NEXT: vmv1r.v v8, v9
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> poison, <4 x i32> <i32 1, i32 2, i32 0, i32 1>
+ ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @vrgather_shuffle_vv_v4bf16(<4 x bfloat> %x, <4 x bfloat> %y) {
+; CHECK-LABEL: vrgather_shuffle_vv_v4bf16:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lui a0, %hi(.LCPI25_0)
+; CHECK-NEXT: addi a0, a0, %lo(.LCPI25_0)
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, mu
+; CHECK-NEXT: vle16.v v11, (a0)
+; CHECK-NEXT: vmv.v.i v0, 8
+; CHECK-NEXT: vrgather.vv v10, v8, v11
+; CHECK-NEXT: vrgather.vi v10, v9, 1, v0.t
+; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: ret
+ %s = shufflevector <4 x bfloat> %x, <4 x bfloat> %y, <4 x i32> <i32 1, i32 2, i32 0, i32 5>
+ ret <4 x bfloat> %s
+}
+
+define <4 x bfloat> @vrgather_shuffle_vx_v4bf16_load(ptr %p) {
+; CHECK-LABEL: vrgather_shuffle_vx_v4bf16_load:
+; CHECK: # %bb.0:
+; CHECK-NEXT: lh a0, 2(a0)
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT: vmv.v.x v8, a0
+; CHECK-NEXT: ret
+ %v = load <4 x bfloat>, ptr %p
+ %s = shufflevector <4 x bfloat> %v, <4 x bfloat> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+ ret <4 x bfloat> %s
+}
+
define <4 x half> @vrgather_permute_shuffle_vu_v4f16(<4 x half> %x) {
; CHECK-LABEL: vrgather_permute_shuffle_vu_v4f16:
; CHECK: # %bb.0:
@@ -284,8 +371,8 @@ define <4 x half> @vrgather_permute_shuffle_vu_v4f16(<4 x half> %x) {
define <4 x half> @vrgather_shuffle_vv_v4f16(<4 x half> %x, <4 x half> %y) {
; CHECK-LABEL: vrgather_shuffle_vv_v4f16:
; CHECK: # %bb.0:
-; CHECK-NEXT: lui a0, %hi(.LCPI21_0)
-; CHECK-NEXT: addi a0, a0, %lo(.LCPI21_0)
+; CHECK-NEXT: lui a0, %hi(.LCPI28_0)
+; CHECK-NEXT: addi a0, a0, %lo(.LCPI28_0)
; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, mu
; CHECK-NEXT: vle16.v v11, (a0)
; CHECK-NEXT: vmv.v.i v0, 8
…fbfmin (#114750) Similarly to #114731, these don't actually require any instructions from the extensions. The motivation for this and #114731 is to eventually enable isLegalElementTypeForRVV for f16 with zvfhmin and bf16 with zvfbfmin, in order to enable scalable vectorization. Although the scalable codegen support for f16 and bf16 is now complete enough for anything the loop vectorizer may emit, enabling isLegalElementTypeForRVV would also make certain hooks like isLegalInterleavedAccessType and isLegalStridedLoadStore return true for f16 and bf16. This means SLP would start emitting these intrinsics, so we need to add fixed-length codegen support.
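As a hypothetical illustration of that last point (the function name and shapes here are ours, not from the patch): once isLegalStridedLoadStore returns true for bf16, SLP could start emitting fixed-length VP strided loads such as the following sketch.

; Hypothetical example of an intrinsic SLP could emit once
; isLegalStridedLoadStore returns true for bf16; the intrinsic and its
; signature come from the VP intrinsics documentation, the shape is ours.
define <4 x bfloat> @strided_gather(ptr %p) {
  %v = call <4 x bfloat> @llvm.experimental.vp.strided.load.v4bf16.p0.i64(
           ptr %p, i64 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, i32 4)
  ret <4 x bfloat> %v
}
declare <4 x bfloat> @llvm.experimental.vp.strided.load.v4bf16.p0.i64(ptr, i64, <4 x i1>, i32)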
LGTM.