[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available #136321

Merged
merged 3 commits into llvm:main from pr-riscv-xrivosvizip-deinterleave2 on Apr 18, 2025

Conversation

preames (Collaborator) commented Apr 18, 2025

If XRivosVizip is available, the ri.vunzip2a and ri.vunzip2b instructions can be used to perform the concatenation and register deinterleave shuffle. This patch only affects the intrinsic lowering (and thus scalable vectors, because fixed vectors go through shuffle lowering).

Note that this patch is restricted to e64 for staging purposes only. The e64 case is clearly profitable (i.e., we remove a vcompress). At e32 and below, the alternative is a vnsrl instead, and we need a bit more complexity around lowering with fractional LMUL before the ri.vunzip2a/b versions become profitable in all cases. I'll post the follow-up change once this lands.
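
For illustration, here is a minimal sketch of the case this patch targets, adapted from the vector-deinterleave.ll test updated below. The function name and the reduced CHECK lines are my own; the RUN attributes match the new ZIP RUN line added to that test.

; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin,+experimental-xrivosvizip | FileCheck %s

; A two-source e64 deinterleave: with XRivosVizip this now lowers to one
; ri.vunzip2a.vv (even elements) plus one ri.vunzip2b.vv (odd elements)
; instead of the vcompress-based sequence used without the extension.
define {<vscale x 2 x i64>, <vscale x 2 x i64>} @deinterleave_nxv4i64(<vscale x 4 x i64> %vec) {
; CHECK-LABEL: deinterleave_nxv4i64:
; CHECK:       ri.vunzip2a.vv
; CHECK:       ri.vunzip2b.vv
  %retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
  ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
}

declare {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64>)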

llvmbot (Member) commented Apr 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

If XRivosVizip is available, the ri.vunzip2a and ri.vunzip2b instructions can be used to perform the concatenation and register deinterleave shuffle. This patch only affects the intrinsic lowering (and thus scalable vectors, because fixed vectors go through shuffle lowering).

Note that this patch is restricted to e64 for staging purposes only. The e64 case is clearly profitable (i.e., we remove a vcompress). At e32 and below, the alternative is a vnsrl instead, and we need a bit more complexity around lowering with fractional LMUL before the ri.vunzip2a/b versions become profitable in all cases. I'll post the follow-up change once this lands.


Full diff: https://github.com/llvm/llvm-project/pull/136321.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+19-3)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+3-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td (+6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll (+103-73)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 98c8bdb4bc114..4dba01dd52492 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5018,7 +5018,8 @@ static SDValue lowerVZIP(unsigned Opc, SDValue Op0, SDValue Op1,
                          const SDLoc &DL, SelectionDAG &DAG,
                          const RISCVSubtarget &Subtarget) {
   assert(RISCVISD::RI_VZIPEVEN_VL == Opc || RISCVISD::RI_VZIPODD_VL == Opc ||
-         RISCVISD::RI_VZIP2A_VL == Opc);
+         RISCVISD::RI_VZIP2A_VL == Opc || RISCVISD::RI_VUNZIP2A_VL == Opc ||
+         RISCVISD::RI_VUNZIP2B_VL == Opc);
   assert(Op0.getSimpleValueType() == Op1.getSimpleValueType());
 
   MVT VT = Op0.getSimpleValueType();
@@ -6934,7 +6935,7 @@ static bool hasPassthruOp(unsigned Opcode) {
          Opcode <= RISCVISD::LAST_STRICTFP_OPCODE &&
          "not a RISC-V target specific op");
   static_assert(
-      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 130 &&
+      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 132 &&
       RISCVISD::LAST_STRICTFP_OPCODE - RISCVISD::FIRST_STRICTFP_OPCODE == 21 &&
       "adding target specific op should update this function");
   if (Opcode >= RISCVISD::ADD_VL && Opcode <= RISCVISD::VFMAX_VL)
@@ -6958,7 +6959,7 @@ static bool hasMaskOp(unsigned Opcode) {
          Opcode <= RISCVISD::LAST_STRICTFP_OPCODE &&
          "not a RISC-V target specific op");
   static_assert(
-      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 130 &&
+      RISCVISD::LAST_VL_VECTOR_OP - RISCVISD::FIRST_VL_VECTOR_OP == 132 &&
       RISCVISD::LAST_STRICTFP_OPCODE - RISCVISD::FIRST_STRICTFP_OPCODE == 21 &&
       "adding target specific op should update this function");
   if (Opcode >= RISCVISD::TRUNCATE_VECTOR_VL && Opcode <= RISCVISD::SETCC_VL)
@@ -11509,6 +11510,19 @@ SDValue RISCVTargetLowering::lowerVECTOR_DEINTERLEAVE(SDValue Op,
     return DAG.getMergeValues(Res, DL);
   }
 
+  // TODO: Remove the e64 restriction once the fractional LMUL lowering
+  // is improved to always beat the vnsrl lowering below.
+  if (Subtarget.hasVendorXRivosVizip() && Factor == 2 &&
+      VecVT.getVectorElementType() == MVT::i64) {
+    SDValue V1 = Op->getOperand(0);
+    SDValue V2 = Op->getOperand(1);
+    SDValue Even =
+      lowerVZIP(RISCVISD::RI_VUNZIP2A_VL, V1, V2, DL, DAG, Subtarget);
+    SDValue Odd =
+      lowerVZIP(RISCVISD::RI_VUNZIP2B_VL, V1, V2, DL, DAG, Subtarget);
+    return DAG.getMergeValues({Even, Odd}, DL);
+  }
+
   SmallVector<SDValue, 8> Ops(Op->op_values());
 
   // Concatenate the vectors as one vector to deinterleave
@@ -22242,6 +22256,8 @@ const char *RISCVTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(RI_VZIPEVEN_VL)
   NODE_NAME_CASE(RI_VZIPODD_VL)
   NODE_NAME_CASE(RI_VZIP2A_VL)
+  NODE_NAME_CASE(RI_VUNZIP2A_VL)
+  NODE_NAME_CASE(RI_VUNZIP2B_VL)
   NODE_NAME_CASE(READ_CSR)
   NODE_NAME_CASE(WRITE_CSR)
   NODE_NAME_CASE(SWAP_CSR)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index d624ee33d6a63..baf1b2e4d8e6e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -408,8 +408,10 @@ enum NodeType : unsigned {
   RI_VZIPEVEN_VL,
   RI_VZIPODD_VL,
   RI_VZIP2A_VL,
+  RI_VUNZIP2A_VL,
+  RI_VUNZIP2B_VL,
 
-  LAST_VL_VECTOR_OP = RI_VZIP2A_VL,
+  LAST_VL_VECTOR_OP = RI_VUNZIP2B_VL,
 
   // Read VLENB CSR
   READ_VLENB,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
index 3fe50503f937b..147f89850765a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXRivos.td
@@ -71,6 +71,8 @@ defm RI_VUNZIP2B_V : VALU_IV_V<"ri.vunzip2b", 0b011000>;
 def ri_vzipeven_vl : SDNode<"RISCVISD::RI_VZIPEVEN_VL", SDT_RISCVIntBinOp_VL>;
 def ri_vzipodd_vl : SDNode<"RISCVISD::RI_VZIPODD_VL", SDT_RISCVIntBinOp_VL>;
 def ri_vzip2a_vl : SDNode<"RISCVISD::RI_VZIP2A_VL", SDT_RISCVIntBinOp_VL>;
+def ri_vunzip2a_vl : SDNode<"RISCVISD::RI_VUNZIP2A_VL", SDT_RISCVIntBinOp_VL>;
+def ri_vunzip2b_vl : SDNode<"RISCVISD::RI_VUNZIP2B_VL", SDT_RISCVIntBinOp_VL>;
 
 multiclass RIVPseudoVALU_VV {
   foreach m = MxList in
@@ -82,6 +84,8 @@ let Predicates = [HasVendorXRivosVizip],
 defm PseudoRI_VZIPEVEN   : RIVPseudoVALU_VV;
 defm PseudoRI_VZIPODD   : RIVPseudoVALU_VV;
 defm PseudoRI_VZIP2A   : RIVPseudoVALU_VV;
+defm PseudoRI_VUNZIP2A   : RIVPseudoVALU_VV;
+defm PseudoRI_VUNZIP2B   : RIVPseudoVALU_VV;
 }
 
 multiclass RIVPatBinaryVL_VV<SDPatternOperator vop, string instruction_name,
@@ -98,6 +102,8 @@ multiclass RIVPatBinaryVL_VV<SDPatternOperator vop, string instruction_name,
 defm : RIVPatBinaryVL_VV<ri_vzipeven_vl, "PseudoRI_VZIPEVEN">;
 defm : RIVPatBinaryVL_VV<ri_vzipodd_vl, "PseudoRI_VZIPODD">;
 defm : RIVPatBinaryVL_VV<ri_vzip2a_vl, "PseudoRI_VZIP2A">;
+defm : RIVPatBinaryVL_VV<ri_vunzip2a_vl, "PseudoRI_VUNZIP2A">;
+defm : RIVPatBinaryVL_VV<ri_vunzip2b_vl, "PseudoRI_VUNZIP2B">;
 
 //===----------------------------------------------------------------------===//
 // XRivosVisni
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
index 2787bef0c893f..3ef1c945ffc15 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
@@ -1,6 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s
-; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s
+; RUN: llc < %s -mtriple=riscv32 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s --check-prefixes=CHECK,V
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin | FileCheck %s --check-prefixes=CHECK,V
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m,+v,+zvfhmin,+zvfbfmin,+experimental-xrivosvizip | FileCheck %s --check-prefixes=CHECK,ZIP
 
 ; Integers
 
@@ -66,37 +67,55 @@ ret {<vscale x 4 x i32>, <vscale x 4 x i32>} %retval
 }
 
 define {<vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv4i64(<vscale x 4 x i64> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a0, 85
-; CHECK-NEXT:    vsetvli a1, zero, e8, m1, ta, ma
-; CHECK-NEXT:    vmv.v.x v16, a0
-; CHECK-NEXT:    li a0, 170
-; CHECK-NEXT:    vmv.v.x v20, a0
-; CHECK-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
-; CHECK-NEXT:    vcompress.vm v12, v8, v16
-; CHECK-NEXT:    vcompress.vm v16, v8, v20
-; CHECK-NEXT:    vmv2r.v v8, v12
-; CHECK-NEXT:    vmv2r.v v10, v16
-; CHECK-NEXT:    ret
+; V-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
+; V:       # %bb.0:
+; V-NEXT:    li a0, 85
+; V-NEXT:    vsetvli a1, zero, e8, m1, ta, ma
+; V-NEXT:    vmv.v.x v16, a0
+; V-NEXT:    li a0, 170
+; V-NEXT:    vmv.v.x v20, a0
+; V-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; V-NEXT:    vcompress.vm v12, v8, v16
+; V-NEXT:    vcompress.vm v16, v8, v20
+; V-NEXT:    vmv2r.v v8, v12
+; V-NEXT:    vmv2r.v v10, v16
+; V-NEXT:    ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv2i64_nxv4i64:
+; ZIP:       # %bb.0:
+; ZIP-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; ZIP-NEXT:    ri.vunzip2a.vv v12, v8, v10
+; ZIP-NEXT:    ri.vunzip2b.vv v14, v8, v10
+; ZIP-NEXT:    vmv.v.v v8, v12
+; ZIP-NEXT:    vmv.v.v v10, v14
+; ZIP-NEXT:    ret
 %retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv4i64(<vscale x 4 x i64> %vec)
 ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
 }
 
 define {<vscale x 4 x i64>, <vscale x 4 x i64>} @vector_deinterleave_nxv4i64_nxv8i64(<vscale x 8 x i64> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv4i64_nxv8i64:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a0, 85
-; CHECK-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
-; CHECK-NEXT:    vmv.v.x v24, a0
-; CHECK-NEXT:    li a0, 170
-; CHECK-NEXT:    vmv.v.x v7, a0
-; CHECK-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
-; CHECK-NEXT:    vcompress.vm v16, v8, v24
-; CHECK-NEXT:    vcompress.vm v24, v8, v7
-; CHECK-NEXT:    vmv4r.v v8, v16
-; CHECK-NEXT:    vmv4r.v v12, v24
-; CHECK-NEXT:    ret
+; V-LABEL: vector_deinterleave_nxv4i64_nxv8i64:
+; V:       # %bb.0:
+; V-NEXT:    li a0, 85
+; V-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
+; V-NEXT:    vmv.v.x v24, a0
+; V-NEXT:    li a0, 170
+; V-NEXT:    vmv.v.x v7, a0
+; V-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; V-NEXT:    vcompress.vm v16, v8, v24
+; V-NEXT:    vcompress.vm v24, v8, v7
+; V-NEXT:    vmv4r.v v8, v16
+; V-NEXT:    vmv4r.v v12, v24
+; V-NEXT:    ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv4i64_nxv8i64:
+; ZIP:       # %bb.0:
+; ZIP-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; ZIP-NEXT:    ri.vunzip2a.vv v16, v8, v12
+; ZIP-NEXT:    ri.vunzip2b.vv v20, v8, v12
+; ZIP-NEXT:    vmv.v.v v8, v16
+; ZIP-NEXT:    vmv.v.v v12, v20
+; ZIP-NEXT:    ret
 %retval = call {<vscale x 4 x i64>, <vscale x 4 x i64>} @llvm.vector.deinterleave2.nxv8i64(<vscale x 8 x i64> %vec)
 ret {<vscale x 4 x i64>, <vscale x 4 x i64>} %retval
 }
@@ -171,51 +190,62 @@ ret {<vscale x 16 x i32>, <vscale x 16 x i32>} %retval
 }
 
 define {<vscale x 8 x i64>, <vscale x 8 x i64>} @vector_deinterleave_nxv8i64_nxv16i64(<vscale x 16 x i64> %vec) {
-; CHECK-LABEL: vector_deinterleave_nxv8i64_nxv16i64:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    addi sp, sp, -16
-; CHECK-NEXT:    .cfi_def_cfa_offset 16
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 4
-; CHECK-NEXT:    sub sp, sp, a0
-; CHECK-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT:    li a0, 85
-; CHECK-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
-; CHECK-NEXT:    vmv.v.x v7, a0
-; CHECK-NEXT:    li a0, 170
-; CHECK-NEXT:    vmv.v.x v6, a0
-; CHECK-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
-; CHECK-NEXT:    vcompress.vm v24, v8, v7
-; CHECK-NEXT:    vmv1r.v v28, v7
-; CHECK-NEXT:    vmv1r.v v29, v6
-; CHECK-NEXT:    vcompress.vm v0, v8, v29
-; CHECK-NEXT:    vcompress.vm v8, v16, v28
-; CHECK-NEXT:    addi a0, sp, 16
-; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
-; CHECK-NEXT:    vcompress.vm v8, v16, v29
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 3
-; CHECK-NEXT:    add a0, sp, a0
-; CHECK-NEXT:    addi a0, a0, 16
-; CHECK-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
-; CHECK-NEXT:    addi a0, sp, 16
-; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
-; CHECK-NEXT:    vmv4r.v v28, v8
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 3
-; CHECK-NEXT:    add a0, sp, a0
-; CHECK-NEXT:    addi a0, a0, 16
-; CHECK-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
-; CHECK-NEXT:    vmv4r.v v4, v8
-; CHECK-NEXT:    vmv8r.v v8, v24
-; CHECK-NEXT:    vmv8r.v v16, v0
-; CHECK-NEXT:    csrr a0, vlenb
-; CHECK-NEXT:    slli a0, a0, 4
-; CHECK-NEXT:    add sp, sp, a0
-; CHECK-NEXT:    .cfi_def_cfa sp, 16
-; CHECK-NEXT:    addi sp, sp, 16
-; CHECK-NEXT:    .cfi_def_cfa_offset 0
-; CHECK-NEXT:    ret
+; V-LABEL: vector_deinterleave_nxv8i64_nxv16i64:
+; V:       # %bb.0:
+; V-NEXT:    addi sp, sp, -16
+; V-NEXT:    .cfi_def_cfa_offset 16
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 4
+; V-NEXT:    sub sp, sp, a0
+; V-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; V-NEXT:    li a0, 85
+; V-NEXT:    vsetvli a1, zero, e8, mf8, ta, ma
+; V-NEXT:    vmv.v.x v7, a0
+; V-NEXT:    li a0, 170
+; V-NEXT:    vmv.v.x v6, a0
+; V-NEXT:    vsetvli a0, zero, e64, m8, ta, ma
+; V-NEXT:    vcompress.vm v24, v8, v7
+; V-NEXT:    vmv1r.v v28, v7
+; V-NEXT:    vmv1r.v v29, v6
+; V-NEXT:    vcompress.vm v0, v8, v29
+; V-NEXT:    vcompress.vm v8, v16, v28
+; V-NEXT:    addi a0, sp, 16
+; V-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
+; V-NEXT:    vcompress.vm v8, v16, v29
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 3
+; V-NEXT:    add a0, sp, a0
+; V-NEXT:    addi a0, a0, 16
+; V-NEXT:    vs8r.v v8, (a0) # vscale x 64-byte Folded Spill
+; V-NEXT:    addi a0, sp, 16
+; V-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; V-NEXT:    vmv4r.v v28, v8
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 3
+; V-NEXT:    add a0, sp, a0
+; V-NEXT:    addi a0, a0, 16
+; V-NEXT:    vl8r.v v8, (a0) # vscale x 64-byte Folded Reload
+; V-NEXT:    vmv4r.v v4, v8
+; V-NEXT:    vmv8r.v v8, v24
+; V-NEXT:    vmv8r.v v16, v0
+; V-NEXT:    csrr a0, vlenb
+; V-NEXT:    slli a0, a0, 4
+; V-NEXT:    add sp, sp, a0
+; V-NEXT:    .cfi_def_cfa sp, 16
+; V-NEXT:    addi sp, sp, 16
+; V-NEXT:    .cfi_def_cfa_offset 0
+; V-NEXT:    ret
+;
+; ZIP-LABEL: vector_deinterleave_nxv8i64_nxv16i64:
+; ZIP:       # %bb.0:
+; ZIP-NEXT:    vsetvli a0, zero, e64, m4, ta, ma
+; ZIP-NEXT:    ri.vunzip2a.vv v28, v16, v20
+; ZIP-NEXT:    ri.vunzip2b.vv v4, v16, v20
+; ZIP-NEXT:    ri.vunzip2a.vv v24, v8, v12
+; ZIP-NEXT:    ri.vunzip2b.vv v0, v8, v12
+; ZIP-NEXT:    vmv8r.v v8, v24
+; ZIP-NEXT:    vmv8r.v v16, v0
+; ZIP-NEXT:    ret
 %retval = call {<vscale x 8 x i64>, <vscale x 8 x i64>} @llvm.vector.deinterleave2.nxv16i64(<vscale x 16 x i64> %vec)
 ret {<vscale x 8 x i64>, <vscale x 8 x i64>} %retval
 }


github-actions bot commented Apr 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

mshockwave (Member) left a comment

LGTM. The CI reported a clang-format error though

preames merged commit 722d589 into llvm:main on Apr 18, 2025
6 of 11 checks passed
preames deleted the pr-riscv-xrivosvizip-deinterleave2 branch on April 18, 2025 at 17:40
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (llvm#136321)

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (llvm#136321)

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
[RISCV] Lower e64 vector_deinterleave via ri.vunzip2{a,b} if available (llvm#136321)

4 participants