[IR][RISCV] Add llvm.vector.(de)interleave3/5/7 #124825

mshockwave · 2025-01-28T19:28:49Z

These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results.
For RISC-V, it's important to have them in order to support segmented load/store with factor of 2 to 8: factor of 2/4/8 can be synthesized from (de)interleave2; factor of 6 can be synthesized from factor of 2 and 3; factor 5 and 7 have their own intrinsics added by this patch.

This patch only adds codegen support for these intrinsics, we still need to teach vectorizer to generate them as well as teaching InterleavedAccessPass to use them.

These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results. For RISC-V, it's important to have them in order to support segmented load/store with factor of 2 to 8: factor of 2/4/8 can be synthesized from (de)interleave2; factor of 6 can be synthesized from factor of 2 and 3; factor 5 and 7 have their own intrinsics added by this patch. This patch only adds codegen support for these intrinsics, we still need to teach vectorizer to generate them as well as teaching InterleavedAccessPass to use them. Co-Authored-By: Craig Topper <[email protected]>

llvmbot · 2025-01-28T19:29:21Z

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results.
For RISC-V, it's important to have them in order to support segmented load/store with factor of 2 to 8: factor of 2/4/8 can be synthesized from (de)interleave2; factor of 6 can be synthesized from factor of 2 and 3; factor 5 and 7 have their own intrinsics added by this patch.

This patch only adds codegen support for these intrinsics, we still need to teach vectorizer to generate them as well as teaching InterleavedAccessPass to use them.

Patch is 302.52 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124825.diff

11 Files Affected:

(modified) llvm/include/llvm/IR/DerivedTypes.h (+9)
(modified) llvm/include/llvm/IR/Intrinsics.h (+9-4)
(modified) llvm/include/llvm/IR/Intrinsics.td (+60)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (+12-8)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+48-20)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+58-24)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+2-2)
(modified) llvm/lib/IR/Intrinsics.cpp (+34)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+183-69)
(modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll (+4093-24)
(modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll (+2859-25)

diff --git a/llvm/include/llvm/IR/DerivedTypes.h b/llvm/include/llvm/IR/DerivedTypes.h
index b44f4f8c8687dc..60606d34c32c31 100644
--- a/llvm/include/llvm/IR/DerivedTypes.h
+++ b/llvm/include/llvm/IR/DerivedTypes.h
@@ -536,6 +536,15 @@ class VectorType : public Type {
                            EltCnt.divideCoefficientBy(2));
   }
 
+  static VectorType *getOneNthElementsVectorType(VectorType *VTy,
+                                                 unsigned Denominator) {
+    auto EltCnt = VTy->getElementCount();
+    assert(EltCnt.isKnownMultipleOf(Denominator) &&
+           "Cannot take one-nth of a vector");
+    return VectorType::get(VTy->getScalarType(),
+                           EltCnt.divideCoefficientBy(Denominator));
+  }
+
   /// This static method returns a VectorType with twice as many elements as the
   /// input type and the same element type.
   static VectorType *getDoubleElementsVectorType(VectorType *VTy) {
diff --git a/llvm/include/llvm/IR/Intrinsics.h b/llvm/include/llvm/IR/Intrinsics.h
index 82f72131b9d2f4..a6f243a2d98798 100644
--- a/llvm/include/llvm/IR/Intrinsics.h
+++ b/llvm/include/llvm/IR/Intrinsics.h
@@ -148,6 +148,9 @@ namespace Intrinsic {
       ExtendArgument,
       TruncArgument,
       HalfVecArgument,
+      OneThirdVecArgument,
+      OneFifthVecArgument,
+      OneSeventhVecArgument,
       SameVecWidthArgument,
       VecOfAnyPtrsToElt,
       VecElementArgument,
@@ -178,15 +181,17 @@ namespace Intrinsic {
     unsigned getArgumentNumber() const {
       assert(Kind == Argument || Kind == ExtendArgument ||
              Kind == TruncArgument || Kind == HalfVecArgument ||
-             Kind == SameVecWidthArgument || Kind == VecElementArgument ||
-             Kind == Subdivide2Argument || Kind == Subdivide4Argument ||
-             Kind == VecOfBitcastsToInt);
+             Kind == OneThirdVecArgument || Kind == OneFifthVecArgument ||
+             Kind == OneSeventhVecArgument || Kind == SameVecWidthArgument ||
+             Kind == VecElementArgument || Kind == Subdivide2Argument ||
+             Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
       return Argument_Info >> 3;
     }
     ArgKind getArgumentKind() const {
       assert(Kind == Argument || Kind == ExtendArgument ||
              Kind == TruncArgument || Kind == HalfVecArgument ||
-             Kind == SameVecWidthArgument ||
+             Kind == OneThirdVecArgument || Kind == OneFifthVecArgument ||
+             Kind == OneSeventhVecArgument || Kind == SameVecWidthArgument ||
              Kind == VecElementArgument || Kind == Subdivide2Argument ||
              Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
       return (ArgKind)(Argument_Info & 7);
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index ee877349a33149..3597400df9b771 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -300,6 +300,8 @@ def IIT_V1 : IIT_Vec<1, 28>;
 def IIT_VARARG : IIT_VT<isVoid, 29>;
 def IIT_HALF_VEC_ARG : IIT_Base<30>;
 def IIT_SAME_VEC_WIDTH_ARG : IIT_Base<31>;
+def IIT_ONE_THIRD_VEC_ARG : IIT_Base<32>;
+def IIT_ONE_FIFTH_VEC_ARG : IIT_Base<33>;
 def IIT_VEC_OF_ANYPTRS_TO_ELT : IIT_Base<34>;
 def IIT_I128 : IIT_Int<128, 35>;
 def IIT_V512 : IIT_Vec<512, 36>;
@@ -327,6 +329,7 @@ def IIT_I4 : IIT_Int<4, 58>;
 def IIT_AARCH64_SVCOUNT : IIT_VT<aarch64svcount, 59>;
 def IIT_V6 : IIT_Vec<6, 60>;
 def IIT_V10 : IIT_Vec<10, 61>;
+def IIT_ONE_SEVENTH_VEC_ARG : IIT_Base<62>;
 }
 
 defvar IIT_all_FixedTypes = !filter(iit, IIT_all,
@@ -467,6 +470,15 @@ class LLVMVectorElementType<int num> : LLVMMatchType<num, IIT_VEC_ELEMENT>;
 class LLVMHalfElementsVectorType<int num>
   : LLVMMatchType<num, IIT_HALF_VEC_ARG>;
 
+class LLVMOneThirdElementsVectorType<int num>
+  : LLVMMatchType<num, IIT_ONE_THIRD_VEC_ARG>;
+
+class LLVMOneFifthElementsVectorType<int num>
+  : LLVMMatchType<num, IIT_ONE_FIFTH_VEC_ARG>;
+
+class LLVMOneSeventhElementsVectorType<int num>
+  : LLVMMatchType<num, IIT_ONE_SEVENTH_VEC_ARG>;
+
 // Match the type of another intrinsic parameter that is expected to be a
 // vector type (i.e. <N x iM>) but with each element subdivided to
 // form a vector with more elements that are smaller than the original.
@@ -2728,6 +2740,54 @@ def int_vector_deinterleave2 : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType
                                                      [llvm_anyvector_ty],
                                                      [IntrNoMem]>;
 
+def int_vector_interleave3   : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                                                     [LLVMOneThirdElementsVectorType<0>,
+                                                      LLVMOneThirdElementsVectorType<0>,
+                                                      LLVMOneThirdElementsVectorType<0>],
+                                                     [IntrNoMem]>;
+
+def int_vector_deinterleave3 : DefaultAttrsIntrinsic<[LLVMOneThirdElementsVectorType<0>,
+                                                      LLVMOneThirdElementsVectorType<0>,
+                                                      LLVMOneThirdElementsVectorType<0>],
+                                                     [llvm_anyvector_ty],
+                                                     [IntrNoMem]>;
+
+def int_vector_interleave5   : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                                                     [LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>],
+                                                     [IntrNoMem]>;
+
+def int_vector_deinterleave5 : DefaultAttrsIntrinsic<[LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>,
+                                                      LLVMOneFifthElementsVectorType<0>],
+                                                     [llvm_anyvector_ty],
+                                                     [IntrNoMem]>;
+
+def int_vector_interleave7   : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+                                                     [LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>],
+                                                     [IntrNoMem]>;
+
+def int_vector_deinterleave7 : DefaultAttrsIntrinsic<[LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>,
+                                                      LLVMOneSeventhElementsVectorType<0>],
+                                                     [llvm_anyvector_ty],
+                                                     [IntrNoMem]>;
+
 //===-------------- Intrinsics to perform partial reduction ---------------===//
 
 def int_experimental_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index b0a624680231e9..c95f7b7eb8dec3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -5825,15 +5825,19 @@ SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SPLICE(SDNode *N) {
 }
 
 SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_INTERLEAVE_DEINTERLEAVE(SDNode *N) {
-  SDLoc dl(N);
+  SDLoc DL(N);
+  unsigned Factor = N->getNumOperands();
+
+  SmallVector<SDValue, 8> Ops(Factor);
+  for (unsigned i = 0; i != Factor; i++)
+    Ops[i] = GetPromotedInteger(N->getOperand(i));
+
+  SmallVector<EVT, 8> ResVTs(Factor, Ops[0].getValueType());
+  SDValue Res = DAG.getNode(N->getOpcode(), DL, DAG.getVTList(ResVTs), Ops);
+
+  for (unsigned i = 0; i != Factor; i++)
+    SetPromotedInteger(SDValue(N, i), Res.getValue(i));
 
-  SDValue V0 = GetPromotedInteger(N->getOperand(0));
-  SDValue V1 = GetPromotedInteger(N->getOperand(1));
-  EVT ResVT = V0.getValueType();
-  SDValue Res = DAG.getNode(N->getOpcode(), dl,
-                            DAG.getVTList(ResVT, ResVT), V0, V1);
-  SetPromotedInteger(SDValue(N, 0), Res.getValue(0));
-  SetPromotedInteger(SDValue(N, 1), Res.getValue(1));
   return SDValue();
 }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..03d0298e99ad4d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1668,6 +1668,15 @@ void DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR(SDNode *N, SDValue &Lo,
     return;
   }
 
+  if (getTypeAction(SubVecVT) == TargetLowering::TypeWidenVector &&
+      Vec.isUndef() && SubVecVT.getVectorElementType() == MVT::i1) {
+    SDValue WideSubVec = GetWidenedVector(SubVec);
+    if (WideSubVec.getValueType() == VecVT) {
+      std::tie(Lo, Hi) = DAG.SplitVector(WideSubVec, SDLoc(WideSubVec));
+      return;
+    }
+  }
+
   // Spill the vector to the stack.
   // In cases where the vector is illegal it will be broken down into parts
   // and stored in parts - we should use the alignment for the smallest part.
@@ -3183,34 +3192,53 @@ void DAGTypeLegalizer::SplitVecRes_VP_REVERSE(SDNode *N, SDValue &Lo,
 }
 
 void DAGTypeLegalizer::SplitVecRes_VECTOR_DEINTERLEAVE(SDNode *N) {
+  unsigned Factor = N->getNumOperands();
+
+  SmallVector<SDValue, 8> Ops(Factor * 2);
+  for (unsigned i = 0; i != Factor; ++i) {
+    SDValue OpLo, OpHi;
+    GetSplitVector(N->getOperand(i), OpLo, OpHi);
+    Ops[i * 2] = OpLo;
+    Ops[i * 2 + 1] = OpHi;
+  }
+
+  SmallVector<EVT, 8> VTs(Factor, Ops[0].getValueType());
 
-  SDValue Op0Lo, Op0Hi, Op1Lo, Op1Hi;
-  GetSplitVector(N->getOperand(0), Op0Lo, Op0Hi);
-  GetSplitVector(N->getOperand(1), Op1Lo, Op1Hi);
-  EVT VT = Op0Lo.getValueType();
   SDLoc DL(N);
-  SDValue ResLo = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
-                              DAG.getVTList(VT, VT), Op0Lo, Op0Hi);
-  SDValue ResHi = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
-                              DAG.getVTList(VT, VT), Op1Lo, Op1Hi);
+  SDValue ResLo = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTs,
+                              ArrayRef(Ops).slice(0, Factor));
+  SDValue ResHi = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTs,
+                              ArrayRef(Ops).slice(Factor, Factor));
 
-  SetSplitVector(SDValue(N, 0), ResLo.getValue(0), ResHi.getValue(0));
-  SetSplitVector(SDValue(N, 1), ResLo.getValue(1), ResHi.getValue(1));
+  for (unsigned i = 0; i != Factor; ++i)
+    SetSplitVector(SDValue(N, i), ResLo.getValue(i), ResHi.getValue(i));
 }
 
 void DAGTypeLegalizer::SplitVecRes_VECTOR_INTERLEAVE(SDNode *N) {
-  SDValue Op0Lo, Op0Hi, Op1Lo, Op1Hi;
-  GetSplitVector(N->getOperand(0), Op0Lo, Op0Hi);
-  GetSplitVector(N->getOperand(1), Op1Lo, Op1Hi);
-  EVT VT = Op0Lo.getValueType();
+  unsigned Factor = N->getNumOperands();
+
+  SmallVector<SDValue, 8> Ops(Factor * 2);
+  for (unsigned i = 0; i != Factor; ++i) {
+    SDValue OpLo, OpHi;
+    GetSplitVector(N->getOperand(i), OpLo, OpHi);
+    Ops[i] = OpLo;
+    Ops[i + Factor] = OpHi;
+  }
+
+  SmallVector<EVT, 8> VTs(Factor, Ops[0].getValueType());
+
   SDLoc DL(N);
-  SDValue Res[] = {DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
-                               DAG.getVTList(VT, VT), Op0Lo, Op1Lo),
-                   DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
-                               DAG.getVTList(VT, VT), Op0Hi, Op1Hi)};
+  SDValue Res[] = {DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTs,
+                               ArrayRef(Ops).slice(0, Factor)),
+                   DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTs,
+                               ArrayRef(Ops).slice(Factor, Factor))};
 
-  SetSplitVector(SDValue(N, 0), Res[0].getValue(0), Res[0].getValue(1));
-  SetSplitVector(SDValue(N, 1), Res[1].getValue(0), Res[1].getValue(1));
+  for (unsigned i = 0; i != Factor; ++i) {
+    unsigned IdxLo = 2 * i;
+    unsigned IdxHi = 2 * i + 1;
+    SetSplitVector(SDValue(N, i), Res[IdxLo / Factor].getValue(IdxLo % Factor),
+                   Res[IdxHi / Factor].getValue(IdxHi % Factor));
+  }
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 428e7a316d247b..6867944b5d8b4a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8251,10 +8251,28 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
     visitCallBrLandingPad(I);
     return;
   case Intrinsic::vector_interleave2:
-    visitVectorInterleave(I);
+    visitVectorInterleave(I, 2);
+    return;
+  case Intrinsic::vector_interleave3:
+    visitVectorInterleave(I, 3);
+    return;
+  case Intrinsic::vector_interleave5:
+    visitVectorInterleave(I, 5);
+    return;
+  case Intrinsic::vector_interleave7:
+    visitVectorInterleave(I, 7);
     return;
   case Intrinsic::vector_deinterleave2:
-    visitVectorDeinterleave(I);
+    visitVectorDeinterleave(I, 2);
+    return;
+  case Intrinsic::vector_deinterleave3:
+    visitVectorDeinterleave(I, 3);
+    return;
+  case Intrinsic::vector_deinterleave5:
+    visitVectorDeinterleave(I, 5);
+    return;
+  case Intrinsic::vector_deinterleave7:
+    visitVectorDeinterleave(I, 7);
     return;
   case Intrinsic::experimental_vector_compress:
     setValue(&I, DAG.getNode(ISD::VECTOR_COMPRESS, sdl,
@@ -12565,26 +12583,31 @@ void SelectionDAGBuilder::visitVectorReverse(const CallInst &I) {
   setValue(&I, DAG.getVectorShuffle(VT, DL, V, DAG.getUNDEF(VT), Mask));
 }
 
-void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I) {
+void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I,
+                                                  unsigned Factor) {
   auto DL = getCurSDLoc();
   SDValue InVec = getValue(I.getOperand(0));
-  EVT OutVT =
-      InVec.getValueType().getHalfNumVectorElementsVT(*DAG.getContext());
 
+  SmallVector<EVT, 4> ValueVTs;
+  ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(),
+                  ValueVTs);
+
+  EVT OutVT = ValueVTs[0];
   unsigned OutNumElts = OutVT.getVectorMinNumElements();
 
-  // ISD Node needs the input vectors split into two equal parts
-  SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
-                           DAG.getVectorIdxConstant(0, DL));
-  SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
-                           DAG.getVectorIdxConstant(OutNumElts, DL));
+  SmallVector<SDValue, 4> SubVecs(Factor);
+  for (unsigned i = 0; i != Factor; ++i) {
+    assert(ValueVTs[i] == OutVT && "Expected VTs to be the same");
+    SubVecs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
+                             DAG.getVectorIdxConstant(OutNumElts * i, DL));
+  }
 
   // Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
   // legalisation and combines.
-  if (OutVT.isFixedLengthVector()) {
-    SDValue Even = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
+  if (OutVT.isFixedLengthVector() && Factor == 2) {
+    SDValue Even = DAG.getVectorShuffle(OutVT, DL, SubVecs[0], SubVecs[1],
                                         createStrideMask(0, 2, OutNumElts));
-    SDValue Odd = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
+    SDValue Odd = DAG.getVectorShuffle(OutVT, DL, SubVecs[0], SubVecs[1],
                                        createStrideMask(1, 2, OutNumElts));
     SDValue Res = DAG.getMergeValues({Even, Odd}, getCurSDLoc());
     setValue(&I, Res);
@@ -12592,32 +12615,43 @@ void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I) {
   }
 
   SDValue Res = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
-                            DAG.getVTList(OutVT, OutVT), Lo, Hi);
+                            DAG.getVTList(ValueVTs), SubVecs);
   setValue(&I, Res);
 }
 
-void SelectionDAGBuilder::visitVectorInterleave(const CallInst &I) {
+void SelectionDAGBuilder::visitVectorInterleave(const CallInst &I,
+                                                unsigned Factor) {
   auto DL = getCurSDLoc();
-  EVT InVT = getValue(I.getOperand(0)).getValueType();
-  SDValue InVec0 = getValue(I.getOperand(0));
-  SDValue InVec1 = getValue(I.getOperand(1));
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  EVT InVT = getValue(I.getOperand(0)).getValueType();
   EVT OutVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
 
+  SmallVector<SDValue, 8> InVecs(Factor);
+  for (unsigned i = 0; i < Factor; ++i) {
+    InVecs[i] = getValue(I.getOperand(i));
+    assert(InVecs[i].getValueType() == InVecs[0].getValueType() &&
+           "Expected VTs to be the same");
+  }
+
   // Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
   // legalisation and combines.
-  if (OutVT.isFixedLengthVector()) {
+  if (OutVT.isFixedLengthVector() && Factor == 2) {
     unsigned NumElts = InVT.getVectorMinNumElements();
-    SDValue V = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, InVec0, InVec1);
+    SDValue V = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, InVecs);
     setValue(&I, DAG.getVectorShuffle(OutVT, DL, V, DAG.getUNDEF(OutVT),
                                       createInterleaveMask(NumElts, 2)));
     return;
   }
 
-  SDValue Res = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
-                            DAG.getVTList(InVT, InVT), InVec0, InVec1);
-  Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Res.getValue(0),
-                    Res.getValue(1));
+  SmallVector<EVT, 8> ValueVTs(Factor, InVT);
+  SDValue Res =
+      DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, DAG.getVTList(ValueVTs), InVecs);
+
+  SmallVector<SDValue, 8> Results(Factor);
+  for (unsigned i = 0; i < Factor; ++i)
+    Results[i] = Res.getValue(i);
+
+  Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Results);
   setValue(&I, Res);
 }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index ed85deef64fa79..ece48c9bedf722 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -659,8 +659,8 @@ class SelectionDAGBuilder {
   void visitVectorReduce(const CallInst &I, unsigned Intrinsic);
   void visitVectorReverse(const CallInst &I);
   void visitVectorSplice(const CallInst &I);
-  void visitVectorInterleave(const CallInst &I);
-  void visitVectorDeinterleave(const CallInst &I);
+  void visitVectorInterleave(const CallInst &I, unsigned Factor);
+  void visitVectorDeinterleave(const CallInst &I, unsigned Factor);
   void visitStepVector(const CallInst &I);
 
   void visitUserOp1(const Instruction &I) {
diff --git a/llvm/lib/IR/Intrinsics.cpp b/llvm/lib/IR/Intrinsics.cpp
index ec1184e8d835d6..107caebede1391 100644
--- a/llvm/lib/IR/Intrinsics.cpp
+++ b/llvm/lib/IR/Intrinsics.cpp
@@ -362,6 +362,24 @@ DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> In...
[truncated]

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

topperc · 2025-01-28T19:55:41Z

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

+  auto [Mask, VL] = getDefaultScalableVLOps(ConcatVT, DL, DAG, Subtarget);
+  SDValue Passthru = DAG.getUNDEF(ConcatVT);
+
+  // For the indices, use the same SEW to avoid an extra vsetvli


Do we need to worry about there being too many elements for SEW=8 to represent the indices? I wrote this code, but I can't figure out how that's not an issue.

We probably can use vrgatherei16.vv here just to be safe, because the longest (legal) type the concatenated vector can have is SEW=8 + LMUL=8, whose VLMAX can be safely put in 16-bit integer. Also, lowerVECTOR_INTERLEAVE is already using vrtahterhei16.vv

We probably can use vrgatherei16.vv here just to be safe, because the longest (legal) type the concatenated vector can have is SEW=8 + LMUL=8, whose VLMAX can be safely put in 16-bit integer.

Well...16-bit element can represent all 65536 indices, which only happens when data operand is SEW=8 + LMUL=8, but in that case the EMUL of the index operand would be invalid (because EMUL = (16/SEW) * LMUL). I guess we also need to spill to the stack and load them back with segmented store and load

But that would require a LMUL=16 vid.v to create the SEW=16 indices

I've changed the logics here to use unit-stride store + segmented load instead.

topperc · 2025-01-28T23:21:14Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

+    assert(ValueVTs[i] == OutVT && "Expected VTs to be the same");
+    SubVecs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
+                             DAG.getVectorIdxConstant(OutNumElts * i, DL));
+  }

  // Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing


Comment needs updating

topperc · 2025-01-28T23:21:31Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

+    assert(InVecs[i].getValueType() == InVecs[0].getValueType() &&
+           "Expected VTs to be the same");
+  }
+
  // Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing


Comment needs updating

topperc · 2025-01-28T23:24:37Z

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

+  auto [Mask, VL] = getDefaultScalableVLOps(ConcatVT, DL, DAG, Subtarget);
+  SDValue Passthru = DAG.getUNDEF(ConcatVT);
+
+  // For the indices, use the same SEW to avoid an extra vsetvli


But that would require a LMUL=16 vid.v to create the SEW=16 indices

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

When factor > 2.

llvm/lib/IR/Intrinsics.cpp

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

wangpc-pp · 2025-02-05T05:42:18Z

llvm/include/llvm/IR/Intrinsics.td

@@ -300,6 +300,8 @@ def IIT_V1 : IIT_Vec<1, 28>;
 def IIT_VARARG : IIT_VT<isVoid, 29>;
 def IIT_HALF_VEC_ARG : IIT_Base<30>;
 def IIT_SAME_VEC_WIDTH_ARG : IIT_Base<31>;
+def IIT_ONE_THIRD_VEC_ARG : IIT_Base<32>;


Why don't we make them consecutive?
(I don't know why we have a bubble between 31 and 34...)

(I don't know why we have a bubble between 31 and 34...)

There were two IIT w.r.t legacy pointer types that got deprecated after we adopted opaque pointers.

Why don't we make them consecutive?

I was going to make them more compact, but now you pointed out I think it's not really necessary. It is fixed now.

llvm/include/llvm/IR/Intrinsics.td

topperc

LGTM

llvm-ci · 2025-02-06T00:00:56Z

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b1 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/12767