[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available #136321

Merged
merged 3 commits into llvm:main from pr-riscv-xrivosvizip-deinterleave2 on Apr 18, 2025

Conversation

preames (Collaborator) commented Apr 18, 2025

If XRivosVizip is available, the ri.vunzip2a and ri.vunzip2b instructions can be used to perform the concatenation and register deinterleave shuffle. This patch only affects the intrinsic lowering (and thus scalable vectors, because fixed vectors go through shuffle lowering).

Note that this patch is restricted to e64 for staging purposes only. The e64 case is clearly profitable (i.e., we remove a vcompress). At e32 and below, the alternative is a vnsrl instead, and we need a bit more complexity around lowering with fractional LMUL before the ri.vunzip2a/b versions become profitable in all cases. I'll post the follow-up change once this lands.
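
For illustration, here is a minimal sketch of the case this patch targets, adapted from the vector-deinterleave.ll test updated below. The function name and the reduced CHECK lines are my own; the RUN attributes match the new ZIP RUN line added to that test.

; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin,+experimental-xrivosvizip | FileCheck %s

; A two-source e64 deinterleave: with XRivosVizip this now lowers to one
; ri.vunzip2a.vv (even elements) plus one ri.vunzip2b.vv (odd elements)
; instead of the vcompress-based sequence used without the extension.
define {<vscale x 2 x i64>, <vscale x 2 x i64>} @deinterleave_nxv4i64(<vscale x 4 x i64> %vec) {
; CHECK-LABEL: deinterleave_nxv4i64:
; CHECK:       ri.vunzip2a.vv
; CHECK:       ri.vunzip2b.vv
  %retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
  ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
}

declare {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64>)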

llvmbot (Member) commented Apr 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

If XRivosVizip is available, the ri.vunzip2a and ri.vunzip2b instructions can be used to perform the concatenation and register deinterleave shuffle. This patch only affects the intrinsic lowering (and thus scalable vectors, because fixed vectors go through shuffle lowering).

Note that this patch is restricted to e64 for staging purposes only. The e64 case is clearly profitable (i.e., we remove a vcompress). At e32 and below, the alternative is a vnsrl instead, and we need a bit more complexity around lowering with fractional LMUL before the ri.vunzip2a/b versions become profitable in all cases. I'll post the follow-up change once this lands.


Full diff: https://github.com/llvm/llvm-project/pull/136321.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+19-3)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+3-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td (+6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll (+103-73)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 98c8bdb4bc114..4dba01dd52492 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5018,7 +5018,8 @@ static SDValue lowerVZIP(unsigned Opc, SDValue Op0, SDValue Op1,
                          const SDLoc &DL, SelectionDAG &DAG,
                          const RISCVSubtarget &Subtarget) {
   assert(RISCVISD::RI_VZIPEVEN_VL == Opc || RISCVISD::RI_VZIPODD_VL == Opc ||
-         RISCVISD::RI_VZIP2A_VL == Opc);
+         RISCVISD::RI_VZIP2A_VL == Opc || RISCVISD::RI_VUNZIP2A_VL == Opc ||
+         RISCVISD::RI_VUNZIP2B_VL == Opc);
   assert(Op0.getSimpleValueType() == Op1.getSimpleValueType());
 
   MVT VT = Op0.getSimpleValueType();
@@ -6934,7 +6935,7 @@ static bool hasPassthruOp(unsigned Opcode) {
          Opcode <= RISCVISD::LAST_STRICTFP_OPCODE &&
          "not a RISC-V target specific op");
   static_assert(
-      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 130 &&
+      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 132 &&
       RISCVISD::LAST_STRICTFP_OPCODE - RISCVISD::FIRST_STRICTFP_OPCODE == 21 &&
       "adding target specific op should update this function");
   if (Opcode >= RISCVISD::ADD_VL && Opcode <= RISCVISD::VFMAX_VL)
@@ -6958,7 +6959,7 @@ static bool hasMaskOp(unsigned Opcode) {
          Opcode <= RISCVISD::LAST_STRICTFP_OPCODE &&
          "not a RISC-V target specific op");
   static_assert(
-      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 130 &&
+      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 132 &&
       RISCVISD::LAST_STRICTFP_OPCODE - RISCVISD::FIRST_STRICTFP_OPCODE == 21 &&
       "adding target specific op should update this function");
   if (Opcode >= RISCVISD::TRUNCATE_VECTOR_VL && Opcode <= RISCVISD::SETCC_VL)
@@ -11509,6 +11510,19 @@ SDValue RISCVTargetLowering::lowerVECTOR_DEINTERLEAVE(SDValue Op,
     return DAG.getMergeValues(Res, DL);
   }
 
+  // TODO: Remove the e64 restriction once the fractional LMUL lowering
+  // is improved to always beat the vnsrl lowering below.
+  if (Subtarget.hasVendorXRivosVizip() && Factor == 2 &&
+      VecVT.getVectorElementType() == MVT::i64) {
+    SDValue V1 = Op->getOperand(0);
+    SDValue V2 = Op->getOperand(1);
+    SDValue Even =
+      lowerVZIP(RISCVISD::RI_VUNZIP2A_VL, V1, V2, DL, DAG, Subtarget);
+    SDValue Odd =
+      lowerVZIP(RISCVISD::RI_VUNZIP2B_VL, V1, V2, DL, DAG, Subtarget);
+    return DAG.getMergeValues({Even, Odd}, DL);
+  }
+
   SmallVector<SDValue, 8> Ops(Op->op_values());
 
   // Concatenate the vectors as one vector to deinterleave
@@ -22242,6 +22256,8 @@ const char *RISCVTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(RI_VZIPEVEN_VL)
   NODE_NAME_CASE(RI_VZIPODD_VL)
   NODE_NAME_CASE(RI_VZIP2A_VL)
+  NODE_NAME_CASE(RI_VUNZIP2A_VL)
+  NODE_NAME_CASE(RI_VUNZIP2B_VL)
   NODE_NAME_CASE(READ_CSR)
   NODE_NAME_CASE(WRITE_CSR)
   NODE_NAME_CASE(SWAP_CSR)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index d624ee33d6a63..baf1b2e4d8e6e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -408,8 +408,10 @@ enum NodeType : unsigned {
   RI_VZIPEVEN_VL,
   RI_VZIPODD_VL,
   RI_VZIP2A_VL,
+  RI_VUNZIP2A_VL,
+  RI_VUNZIP2B_VL,
 
-  LAST_VL_VECTOR_OP = RI_VZIP2A_VL,
+  LAST_VL_VECTOR_OP = RI_VUNZIP2B_VL,
 
   // Read VLENB CSR
   READ_VLENB,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
index 3fe50503f937b..147f89850765a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
@@ -71,6 +71,8 @@ defm RI_VUNZIP2B_V : VALU_IV_V<"ri.vunzip2b", 0b011000>;
 def ri_vzipeven_vl : SDNode<"RISCVISD::RI_VZIPEVEN_VL", SDT_RISCVIntBinOp_VL>;
 def ri_vzipodd_vl : SDNode<"RISCVISD::RI_VZIPODD_VL", SDT_RISCVIntBinOp_VL>;
 def ri_vzip2a_vl : SDNode<"RISCVISD::RI_VZIP2A_VL", SDT_RISCVIntBinOp_VL>;
+def ri_vunzip2a_vl : SDNode<"RISCVISD::RI_VUNZIP2A_VL", SDT_RISCVIntBinOp_VL>;
+def ri_vunzip2b_vl : SDNode<"RISCVISD::RI_VUNZIP2B_VL", SDT_RISCVIntBinOp_VL>;
 
 multiclass RIVPseudoVALU_VV {
   foreach m = MxList in
@@ -82,6 +84,8 @@ let Predicates = [HasVendorXRivosVizip],
 defm PseudoRI_VZIPEVEN   : RIVPseudoVALU_VV;
 defm PseudoRI_VZIPODD   : RIVPseudoVALU_VV;
 defm PseudoRI_VZIP2A   : RIVPseudoVALU_VV;
+defm PseudoRI_VUNZIP2A   : RIVPseudoVALU_VV;
+defm PseudoRI_VUNZIP2B   : RIVPseudoVALU_VV;
 }
 
 multiclass RIVPatBinaryVL_VV<SDPatternOperator vop, string instruction_name,
@@ -98,6 +102,8 @@ multiclass RIVPatBinaryVL_VV<SDPatternOperator vop, string instruction_name,
 defm : RIVPatBinaryVL_VV<ri_vzipeven_vl, "PseudoRI_VZIPEVEN">;
 defm : RIVPatBinaryVL_VV<ri_vzipodd_vl, "PseudoRI_VZIPODD">;
 defm : RIVPatBinaryVL_VV<ri_vzip2a_vl, "PseudoRI_VZIP2A">;
+defm : RIVPatBinaryVL_VV<ri_vunzip2a_vl, "PseudoRI_VUNZIP2A">;
+defm : RIVPatBinaryVL_VV<ri_vunzip2b_vl, "PseudoRI_VUNZIP2B">;
 
 //===----------------------------------------------------------------------===//
 // XRivosVisni
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
index 2787bef0c893f..3ef1c945ffc15 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s --check-prefixes=CHECK,V
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s --check-prefixes=CHECK,V
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin,+experimental-xrivosvizip | FileCheck %s --check-prefixes=CHECK,ZIP
 
 ; Integers
 
@@ -66,37 +67,55 @@ ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %retval
 }
 
 define {<vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv4i64(<vscale x 4 x i64> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a0, 85
-; CHECK-NEXT:    vsetvli a1, zero, e8, m1, ta, ma
-; CHECK-NEXT:    vmv.v.x v16, a0
-; CHECK-NEXT:    li a0, 170
-; CHECK-NEXT:    vmv.v.x v20, a0
-; CHECK-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
-; CHECK-NEXT:    vcompress.vm v12, v8, v16
-; CHECK-NEXT:    vcompress.vm v16, v8, v20
-; CHECK-NEXT:    vmv2r.v v8, v12
-; CHECK-NEXT:    vmv2r.v v10, v16
-; CHECK-NEXT:    ret
+; V-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
+; V:       # %bb.0:
+; V-NEXT:    li a0, 85
+; V-NEXT:    vsetvli a1, zero, e8, m1, ta, ma
+; V-NEXT:    vmv.v.x v16, a0
+; V-NEXT:    li a0, 170
+; V-NEXT:    vmv.v.x v20, a0
+; V-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; V-NEXT:    vcompress.vm v12, v8, v16
+; V-NEXT:    vcompress.vm v16, v8, v20
+; V-NEXT:    vmv2r.v v8, v12
+; V-NEXT:    vmv2r.v v10, v16
+; V-NEXT:    ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
+; ZIP:       # %bb.0:
+; ZIP-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; ZIP-NEXT:    ri.vunzip2a.vv v12, v8, v10
+; ZIP-NEXT:    ri.vunzip2b.vv v14, v8, v10
+; ZIP-NEXT:    vmv.v.v v8, v12
+; ZIP-NEXT:    vmv.v.v v10, v14
+; ZIP-NEXT:    ret
 %retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
 ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
 }
 
 define {<vscale x 4 x i64>, <vscale x 4 x i64>} @vector_deinterleave_nxv4i64_nxv8i64(<vscale x 8 x i64> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv4i64_nxv8i64:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a0, 85
-; CHECK-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
-; CHECK-NEXT:    vmv.v.x v24, a0
-; CHECK-NEXT:    li a0, 170
-; CHECK-NEXT:    vmv.v.x v7, a0
-; CHECK-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
-; CHECK-NEXT:    vcompress.vm v16, v8, v24
-; CHECK-NEXT:    vcompress.vm v24, v8, v7
-; CHECK-NEXT:    vmv4r.v v8, v16
-; CHECK-NEXT:    vmv4r.v v12, v24
-; CHECK-NEXT:    ret
+; V-LABEL: vector_deinterleave_nxv4i64_nxv8i64:
+; V:       # %bb.0:
+; V-NEXT:    li a0, 85
+; V-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
+; V-NEXT:    vmv.v.x v24, a0
+; V-NEXT:    li a0, 170
+; V-NEXT:    vmv.v.x v7, a0
+; V-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; V-NEXT:    vcompress.vm v16, v8, v24
+; V-NEXT:    vcompress.vm v24, v8, v7
+; V-NEXT:    vmv4r.v v8, v16
+; V-NEXT:    vmv4r.v v12, v24
+; V-NEXT:    ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv4i64_nxv8i64:
+; ZIP:       # %bb.0:
+; ZIP-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; ZIP-NEXT:    ri.vunzip2a.vv v16, v8, v12
+; ZIP-NEXT:    ri.vunzip2b.vv v20, v8, v12
+; ZIP-NEXT:    vmv.v.v v8, v16
+; ZIP-NEXT:    vmv.v.v v12, v20
+; ZIP-NEXT:    ret
 %retval = call {<vscale x 4 x i64>, <vscale x 4 x i64>} @llvm.vector.deinterleave2.nxv8i64(<vscale x 8 x i64> %vec)
 ret {<vscale x 4 x i64>, <vscale x 4 x i64>} %retval
 }
@@ -171,51 +190,62 @@ ret {<vscale x 16 x i32>, <vscale x 16 x i32>} %retval
 }
 
 define {<vscale x 8 x i64>, <vscale x 8 x i64>} @vector_deinterleave_nxv8i64_nxv16i64(<vscale x 16 x i64> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv8i64_nxv16i64:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    addi sp, sp, -16
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 4
-; CHECK-NEXT:    sub sp, sp, a0
-; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT:    li a0, 85
-; CHECK-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
-; CHECK-NEXT:    vmv.v.x v7, a0
-; CHECK-NEXT:    li a0, 170
-; CHECK-NEXT:    vmv.v.x v6, a0
-; CHECK-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
-; CHECK-NEXT:    vcompress.vm v24, v8, v7
-; CHECK-NEXT:    vmv1r.v v28, v7
-; CHECK-NEXT:    vmv1r.v v29, v6
-; CHECK-NEXT:    vcompress.vm v0, v8, v29
-; CHECK-NEXT:    vcompress.vm v8, v16, v28
-; CHECK-NEXT:    addi a0, sp, 16
-; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
-; CHECK-NEXT:    vcompress.vm v8, v16, v29
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 3
-; CHECK-NEXT:    add a0, sp, a0
-; CHECK-NEXT:    addi a0, a0, 16
-; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
-; CHECK-NEXT:    addi a0, sp, 16
-; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
-; CHECK-NEXT:    vmv4r.v v28, v8
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 3
-; CHECK-NEXT:    add a0, sp, a0
-; CHECK-NEXT:    addi a0, a0, 16
-; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
-; CHECK-NEXT:    vmv4r.v v4, v8
-; CHECK-NEXT:    vmv8r.v v8, v24
-; CHECK-NEXT:    vmv8r.v v16, v0
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 4
-; CHECK-NEXT:    add sp, sp, a0
-; CHECK-NEXT:    .cfi_def_cfa sp, 16
-; CHECK-NEXT:    addi sp, sp, 16
-; CHECK-NEXT:    .cfi_def_cfa_offset 0
-; CHECK-NEXT:    ret
+; V-LABEL: vector_deinterleave_nxv8i64_nxv16i64:
+; V:       # %bb.0:
+; V-NEXT:    addi sp, sp, -16
+; V-NEXT:    .cfi_def_cfa_offset 16
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 4
+; V-NEXT:    sub sp, sp, a0
+; V-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; V-NEXT:    li a0, 85
+; V-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
+; V-NEXT:    vmv.v.x v7, a0
+; V-NEXT:    li a0, 170
+; V-NEXT:    vmv.v.x v6, a0
+; V-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; V-NEXT:    vcompress.vm v24, v8, v7
+; V-NEXT:    vmv1r.v v28, v7
+; V-NEXT:    vmv1r.v v29, v6
+; V-NEXT:    vcompress.vm v0, v8, v29
+; V-NEXT:    vcompress.vm v8, v16, v28
+; V-NEXT:    addi a0, sp, 16
+; V-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
+; V-NEXT:    vcompress.vm v8, v16, v29
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 3
+; V-NEXT:    add a0, sp, a0
+; V-NEXT:    addi a0, a0, 16
+; V-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
+; V-NEXT:    addi a0, sp, 16
+; V-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; V-NEXT:    vmv4r.v v28, v8
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 3
+; V-NEXT:    add a0, sp, a0
+; V-NEXT:    addi a0, a0, 16
+; V-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; V-NEXT:    vmv4r.v v4, v8
+; V-NEXT:    vmv8r.v v8, v24
+; V-NEXT:    vmv8r.v v16, v0
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 4
+; V-NEXT:    add sp, sp, a0
+; V-NEXT:    .cfi_def_cfa sp, 16
+; V-NEXT:    addi sp, sp, 16
+; V-NEXT:    .cfi_def_cfa_offset 0
+; V-NEXT:    ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv8i64_nxv16i64:
+; ZIP:       # %bb.0:
+; ZIP-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; ZIP-NEXT:    ri.vunzip2a.vv v28, v16, v20
+; ZIP-NEXT:    ri.vunzip2b.vv v4, v16, v20
+; ZIP-NEXT:    ri.vunzip2a.vv v24, v8, v12
+; ZIP-NEXT:    ri.vunzip2b.vv v0, v8, v12
+; ZIP-NEXT:    vmv8r.v v8, v24
+; ZIP-NEXT:    vmv8r.v v16, v0
+; ZIP-NEXT:    ret
 %retval = call {<vscale x 8 x i64>, <vscale x 8 x i64>} @llvm.vector.deinterleave2.nxv16i64(<vscale x 16 x i64> %vec)
 ret {<vscale x 8 x i64>, <vscale x 8 x i64>} %retval
 }


github-actions bot commented Apr 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

mshockwave (Member) left a comment

LGTM. The CI reported a clang-format error though

preames merged commit 722d589 into llvm:main on Apr 18, 2025
6 of 11 checks passed
preames deleted the pr-riscv-xrivosvizip-deinterleave2 branch on April 18, 2025 at 17:40
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (llvm#136321)

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (llvm#136321)

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (llvm#136321)

4 participants