
[RISCV] Use integer VTypeInfo predicate for vmv_v_v_vl pattern #114915


Merged

Conversation

@lukel97 (Contributor) commented Nov 5, 2024

When lowering fixed-length f16 insert_subvector nodes at index 0, we crashed with zvfhmin because we couldn't select vmv_v_v_vl.
This was due to the predicates requiring full zvfh, even though we only need zve32x. Use the integer VTypeInfo instead, as VPatSlideVL_VX_VI already does.

The extract_subvector tests aren't affected by this change; they were added for consistency with the insert_subvector tests.
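For context, a minimal reproducer sketch distilled from the insert_nxv8f16_v2f16_0 test added in this PR; the RUN line and function name are assumptions for illustration. The point is that an index-0 fixed-length insert under +zvfhmin (without full +zvfh) previously failed to select vmv_v_v_vl:

; Hedged reproducer sketch; mirrors insert_nxv8f16_v2f16_0 from the diff below.
; RUN: llc -mtriple=riscv64 -mattr=+v,+zvfhmin < %s
define <vscale x 8 x half> @repro(<vscale x 8 x half> %vec, ptr %svp) {
  %sv = load <2 x half>, ptr %svp
  %v = call <vscale x 8 x half> @llvm.vector.insert.v2f16.nxv8f16(<vscale x 8 x half> %vec, <2 x half> %sv, i64 0)
  ret <vscale x 8 x half> %v
}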

@llvmbot (Member) commented Nov 5, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

When lowering fixed-length f16 insert_subvector nodes at index 0, we crashed with zvfhmin because we couldn't select vmv_v_v_vl.
This was due to the predicates requiring full zvfh, even though we only need zve32x. Use the integer VTypeInfo instead, as VPatSlideVL_VX_VI already does.

The extract_subvector tests aren't affected by this change; they were added for consistency with the insert_subvector tests.


Full diff: https://github.com/llvm/llvm-project/pull/114915.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td (+2-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+72-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll (+96-6)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
index 9d434cef5a96f1..4b938fc734e5c1 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
@@ -2305,7 +2305,8 @@ foreach vti = AllIntegerVectors in {
 
 // 11.16. Vector Integer Move Instructions
 foreach vti = AllVectors in {
-  let Predicates = GetVTypePredicates<vti>.Predicates in {
+  defvar ivti = GetIntVTypeInfo<vti>.Vti;
+  let Predicates = GetVTypePredicates<ivti>.Predicates in {
     def : Pat<(vti.Vector (riscv_vmv_v_v_vl vti.RegClass:$passthru,
                                             vti.RegClass:$rs2, VLOpFrag)),
               (!cast<Instruction>("PseudoVMV_V_V_"#vti.LMul.MX)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
index 33cd00c9f6af30..fdee80fb95627e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
@@ -1,12 +1,18 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
-; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfh,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfh,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
 
-; RUN: llc -mtriple=riscv32 -mattr=+m,+v -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
-; RUN: llc -mtriple=riscv64 -mattr=+m,+v -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfh,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfh,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA
 
-; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+m,v -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfh,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfh,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS %s
 
 define void @extract_v2i8_v4i8_0(ptr %x, ptr %y) {
 ; CHECK-LABEL: extract_v2i8_v4i8_0:
@@ -866,6 +872,66 @@ define <1 x i64> @extract_v1i64_v2i64_1(<2 x i64> %x) {
   ret <1 x i64> %v
 }
 
+define void @extract_v2bf16_v4bf16_0(ptr %x, ptr %y) {
+; CHECK-LABEL: extract_v2bf16_v4bf16_0:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vle16.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    ret
+  %a = load <4 x bfloat>, ptr %x
+  %c = call <2 x bfloat> @llvm.vector.extract.v2bf16.v4bf16(<4 x bfloat> %a, i64 0)
+  store <2 x bfloat> %c, ptr %y
+  ret void
+}
+
+define void @extract_v2bf16_v4bf16_2(ptr %x, ptr %y) {
+; CHECK-LABEL: extract_v2bf16_v4bf16_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vle16.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 2
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    ret
+  %a = load <4 x bfloat>, ptr %x
+  %c = call <2 x bfloat> @llvm.vector.extract.v2bf16.v4bf16(<4 x bfloat> %a, i64 2)
+  store <2 x bfloat> %c, ptr %y
+  ret void
+}
+
+define void @extract_v2f16_v4f16_0(ptr %x, ptr %y) {
+; CHECK-LABEL: extract_v2f16_v4f16_0:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vle16.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    ret
+  %a = load <4 x half>, ptr %x
+  %c = call <2 x half> @llvm.vector.extract.v2f16.v4f16(<4 x half> %a, i64 0)
+  store <2 x half> %c, ptr %y
+  ret void
+}
+
+define void @extract_v2f16_v4f16_2(ptr %x, ptr %y) {
+; CHECK-LABEL: extract_v2f16_v4f16_2:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
+; CHECK-NEXT:    vle16.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf2, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v8, 2
+; CHECK-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; CHECK-NEXT:    vse16.v v8, (a1)
+; CHECK-NEXT:    ret
+  %a = load <4 x half>, ptr %x
+  %c = call <2 x half> @llvm.vector.extract.v2f16.v4f16(<4 x half> %a, i64 2)
+  store <2 x half> %c, ptr %y
+  ret void
+}
+
 declare <2 x i1> @llvm.vector.extract.v2i1.v64i1(<64 x i1> %vec, i64 %idx)
 declare <8 x i1> @llvm.vector.extract.v8i1.v64i1(<64 x i1> %vec, i64 %idx)
 
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll
index e81f686a283032..2077a905da5f98 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll
@@ -1,12 +1,18 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV32VLA
-; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV64VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV32VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV64VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfh,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV32VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfh,+zvfbfmin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV64VLA
 
-; RUN: llc -mtriple=riscv32 -mattr=+m,+v -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV32VLA
-; RUN: llc -mtriple=riscv64 -mattr=+m,+v -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV64VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV32VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV64VLA
+; RUN: llc -mtriple=riscv32 -mattr=+m,+v,+zvfh,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV32VLA
+; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zvfh,+zvfbfmin -early-live-intervals -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,VLA,RV64VLA
 
-; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS,RV32VLS %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+m,v -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS,RV64VLS %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS,RV32VLS %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS,RV64VLS %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfh,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS,RV32VLS %s
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfh,+zvfbfmin -riscv-v-vector-bits-max=128 -verify-machineinstrs | FileCheck -check-prefixes=CHECK,VLS,RV64VLS %s
 
 define <vscale x 8 x i32> @insert_nxv8i32_v2i32_0(<vscale x 8 x i32> %vec, ptr %svp) {
 ; VLA-LABEL: insert_nxv8i32_v2i32_0:
@@ -860,6 +866,90 @@ define void @insert_v2i64_nxv16i64_hi(ptr %psv, ptr %out) {
   ret void
 }
 
+define <vscale x 8 x bfloat> @insert_nxv8bf16_v2bf16_0(<vscale x 8 x bfloat> %vec, ptr %svp) {
+; VLA-LABEL: insert_nxv8bf16_v2bf16_0:
+; VLA:       # %bb.0:
+; VLA-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLA-NEXT:    vle16.v v10, (a0)
+; VLA-NEXT:    vsetivli zero, 2, e16, m2, tu, ma
+; VLA-NEXT:    vmv.v.v v8, v10
+; VLA-NEXT:    ret
+;
+; VLS-LABEL: insert_nxv8bf16_v2bf16_0:
+; VLS:       # %bb.0:
+; VLS-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLS-NEXT:    vle16.v v10, (a0)
+; VLS-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
+; VLS-NEXT:    vmv.v.v v8, v10
+; VLS-NEXT:    ret
+  %sv = load <2 x bfloat>, ptr %svp
+  %v = call <vscale x 8 x bfloat> @llvm.vector.insert.v2bf16.nxv8bf16(<vscale x 8 x bfloat> %vec, <2 x bfloat> %sv, i64 0)
+  ret <vscale x 8 x bfloat> %v
+}
+
+define <vscale x 8 x bfloat> @insert_nxv8bf16_v2bf16_2(<vscale x 8 x bfloat> %vec, ptr %svp) {
+; VLA-LABEL: insert_nxv8bf16_v2bf16_2:
+; VLA:       # %bb.0:
+; VLA-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLA-NEXT:    vle16.v v10, (a0)
+; VLA-NEXT:    vsetivli zero, 4, e16, m2, tu, ma
+; VLA-NEXT:    vslideup.vi v8, v10, 2
+; VLA-NEXT:    ret
+;
+; VLS-LABEL: insert_nxv8bf16_v2bf16_2:
+; VLS:       # %bb.0:
+; VLS-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLS-NEXT:    vle16.v v10, (a0)
+; VLS-NEXT:    vsetivli zero, 4, e16, m1, tu, ma
+; VLS-NEXT:    vslideup.vi v8, v10, 2
+; VLS-NEXT:    ret
+  %sv = load <2 x bfloat>, ptr %svp
+  %v = call <vscale x 8 x bfloat> @llvm.vector.insert.v2bf16.nxv8bf16(<vscale x 8 x bfloat> %vec, <2 x bfloat> %sv, i64 2)
+  ret <vscale x 8 x bfloat> %v
+}
+
+define <vscale x 8 x half> @insert_nxv8f16_v2f16_0(<vscale x 8 x half> %vec, ptr %svp) {
+; VLA-LABEL: insert_nxv8f16_v2f16_0:
+; VLA:       # %bb.0:
+; VLA-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLA-NEXT:    vle16.v v10, (a0)
+; VLA-NEXT:    vsetivli zero, 2, e16, m2, tu, ma
+; VLA-NEXT:    vmv.v.v v8, v10
+; VLA-NEXT:    ret
+;
+; VLS-LABEL: insert_nxv8f16_v2f16_0:
+; VLS:       # %bb.0:
+; VLS-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLS-NEXT:    vle16.v v10, (a0)
+; VLS-NEXT:    vsetivli zero, 2, e16, m1, tu, ma
+; VLS-NEXT:    vmv.v.v v8, v10
+; VLS-NEXT:    ret
+  %sv = load <2 x half>, ptr %svp
+  %v = call <vscale x 8 x half> @llvm.vector.insert.v2f16.nxv8f16(<vscale x 8 x half> %vec, <2 x half> %sv, i64 0)
+  ret <vscale x 8 x half> %v
+}
+
+define <vscale x 8 x half> @insert_nxv8f16_v2f16_2(<vscale x 8 x half> %vec, ptr %svp) {
+; VLA-LABEL: insert_nxv8f16_v2f16_2:
+; VLA:       # %bb.0:
+; VLA-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLA-NEXT:    vle16.v v10, (a0)
+; VLA-NEXT:    vsetivli zero, 4, e16, m2, tu, ma
+; VLA-NEXT:    vslideup.vi v8, v10, 2
+; VLA-NEXT:    ret
+;
+; VLS-LABEL: insert_nxv8f16_v2f16_2:
+; VLS:       # %bb.0:
+; VLS-NEXT:    vsetivli zero, 2, e16, mf4, ta, ma
+; VLS-NEXT:    vle16.v v10, (a0)
+; VLS-NEXT:    vsetivli zero, 4, e16, m1, tu, ma
+; VLS-NEXT:    vslideup.vi v8, v10, 2
+; VLS-NEXT:    ret
+  %sv = load <2 x half>, ptr %svp
+  %v = call <vscale x 8 x half> @llvm.vector.insert.v2f16.nxv8f16(<vscale x 8 x half> %vec, <2 x half> %sv, i64 2)
+  ret <vscale x 8 x half> %v
+}
+
 declare <8 x i1> @llvm.vector.insert.v4i1.v8i1(<8 x i1>, <4 x i1>, i64)
 declare <32 x i1> @llvm.vector.insert.v8i1.v32i1(<32 x i1>, <8 x i1>, i64)
 

@topperc (Collaborator) left a comment


LGTM

@lukel97 lukel97 merged commit a7d1d38 into llvm:main Nov 5, 2024
8 of 9 checks passed
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Nov 5, 2024
In preparation for allowing zvfhmin and zvfbfmin in isLegalElementTypeForRVV, this lowers masked gathers and scatters.

We need to mark f16 and bf16 as legal in isLegalMaskedGatherScatter otherwise ScalarizeMaskedMemIntrin will just scalarize them, but we can move this back into isLegalElementTypeForRVV afterwards.

The scalarized codegen required llvm#114938, llvm#114927 and llvm#114915 to not crash.
lukel97 added a commit that referenced this pull request Nov 6, 2024
…4945)

In preparation for allowing zvfhmin and zvfbfmin in
isLegalElementTypeForRVV, this lowers fixed-length masked gathers and
scatters.

We need to mark f16 and bf16 as legal in isLegalMaskedGatherScatter
otherwise ScalarizeMaskedMemIntrin will just scalarize them, but we can
move this back into isLegalElementTypeForRVV afterwards.

The scalarized codegen required #114938, #114927 and #114915 to not
crash.
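As an illustration, a hedged sketch of the kind of fixed-length f16 masked gather that the follow-up lowers; the function name, element count, and alignment here are assumptions. Without marking f16 as legal in isLegalMaskedGatherScatter, ScalarizeMaskedMemIntrin would expand this intrinsic call into scalar loads:

declare <4 x half> @llvm.masked.gather.v4f16.v4p0(<4 x ptr>, i32, <4 x i1>, <4 x half>)

; Hedged sketch: a fixed-length f16 gather that would otherwise be scalarized.
define <4 x half> @gather_f16(<4 x ptr> %ptrs, <4 x i1> %m, <4 x half> %passthru) {
  %v = call <4 x half> @llvm.masked.gather.v4f16.v4p0(<4 x ptr> %ptrs, i32 2, <4 x i1> %m, <4 x half> %passthru)
  ret <4 x half> %v
}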
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
…114915)

When lowering fixed-length f16 insert_subvector nodes at index 0, we
crashed with zvfhmin because we couldn't select vmv_v_v_vl.
This was due to the predicates requiring full zvfh, even though we only
need zve32x. Use the integer VTypeInfo instead, as VPatSlideVL_VX_VI
already does.

The extract_subvector tests aren't affected by this change; they were
added for consistency with the insert_subvector tests.