[IR][RISCV] Add llvm.vector.(de)interleave3/5/7 #124825
These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results.
For RISC-V, it's important to have them in order to support segmented load/store with factors of 2 to 8: factors of 2/4/8 can be synthesized from (de)interleave2; factor 6 can be synthesized from factors 2 and 3; factors 3, 5, and 7 have their own intrinsics added by this patch.
This patch only adds codegen support for these intrinsics; we still need to teach the vectorizer to generate them and teach InterleavedAccessPass to use them.
Co-Authored-By: Craig Topper
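To make the element ordering concrete, here is a minimal scalar sketch of what a factor-N interleave and deinterleave compute. It models fixed-length vectors with `std::vector<int>`; the names `interleave`, `deinterleave`, and `Vec` are illustrative only and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Scalar model of a factor-N interleave: out[i * N + k] = inputs[k][i].
// Assumes at least one operand and equal operand lengths.
Vec interleave(const std::vector<Vec> &inputs) {
  const std::size_t factor = inputs.size();
  const std::size_t len = inputs.empty() ? 0 : inputs[0].size();
  Vec out(factor * len);
  for (std::size_t k = 0; k < factor; ++k) {
    assert(inputs[k].size() == len && "all operands must have the same length");
    for (std::size_t i = 0; i < len; ++i)
      out[i * factor + k] = inputs[k][i];
  }
  return out;
}

// Scalar model of a factor-N deinterleave: outs[k][i] = input[i * N + k].
std::vector<Vec> deinterleave(const Vec &input, std::size_t factor) {
  assert(factor != 0 && input.size() % factor == 0 &&
         "input length must be a multiple of the factor");
  const std::size_t len = input.size() / factor;
  std::vector<Vec> outs(factor, Vec(len));
  for (std::size_t k = 0; k < factor; ++k)
    for (std::size_t i = 0; i < len; ++i)
      outs[k][i] = input[i * factor + k];
  return outs;
}
```

With three operands `{0, 3}`, `{1, 4}`, `{2, 5}`, this model's `interleave` yields `{0, 1, 2, 3, 4, 5}`, and `deinterleave` with factor 3 recovers the three operands.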
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-risc-v
Author: Min-Yih Hsu (mshockwave)
Changes
These three intrinsics are similar to llvm.vector.(de)interleave2 but work with 3/5/7 vector operands or results. This patch only adds codegen support for these intrinsics, we still need to teach vectorizer to generate them as well as teaching InterleavedAccessPass to use them.
Patch is 302.52 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124825.diff
11 Files Affected:
diff --git a/llvm/include/llvm/IR/DerivedTypes.h b/llvm/include/llvm/IR/DerivedTypes.h
index b44f4f8c8687dc..60606d34c32c31 100644
--- a/llvm/include/llvm/IR/DerivedTypes.h
+++ b/llvm/include/llvm/IR/DerivedTypes.h
@@ -536,6 +536,15 @@ class VectorType : public Type {
EltCnt.divideCoefficientBy(2));
}
+ static VectorType *getOneNthElementsVectorType(VectorType *VTy,
+ unsigned Denominator) {
+ auto EltCnt = VTy->getElementCount();
+ assert(EltCnt.isKnownMultipleOf(Denominator) &&
+ "Cannot take one-nth of a vector");
+ return VectorType::get(VTy->getScalarType(),
+ EltCnt.divideCoefficientBy(Denominator));
+ }
+
/// This static method returns a VectorType with twice as many elements as the
/// input type and the same element type.
static VectorType *getDoubleElementsVectorType(VectorType *VTy) {
diff --git a/llvm/include/llvm/IR/Intrinsics.h b/llvm/include/llvm/IR/Intrinsics.h
index 82f72131b9d2f4..a6f243a2d98798 100644
--- a/llvm/include/llvm/IR/Intrinsics.h
+++ b/llvm/include/llvm/IR/Intrinsics.h
@@ -148,6 +148,9 @@ namespace Intrinsic {
ExtendArgument,
TruncArgument,
HalfVecArgument,
+ OneThirdVecArgument,
+ OneFifthVecArgument,
+ OneSeventhVecArgument,
SameVecWidthArgument,
VecOfAnyPtrsToElt,
VecElementArgument,
@@ -178,15 +181,17 @@ namespace Intrinsic {
unsigned getArgumentNumber() const {
assert(Kind == Argument || Kind == ExtendArgument ||
Kind == TruncArgument || Kind == HalfVecArgument ||
- Kind == SameVecWidthArgument || Kind == VecElementArgument ||
- Kind == Subdivide2Argument || Kind == Subdivide4Argument ||
- Kind == VecOfBitcastsToInt);
+ Kind == OneThirdVecArgument || Kind == OneFifthVecArgument ||
+ Kind == OneSeventhVecArgument || Kind == SameVecWidthArgument ||
+ Kind == VecElementArgument || Kind == Subdivide2Argument ||
+ Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
return Argument_Info >> 3;
}
ArgKind getArgumentKind() const {
assert(Kind == Argument || Kind == ExtendArgument ||
Kind == TruncArgument || Kind == HalfVecArgument ||
- Kind == SameVecWidthArgument ||
+ Kind == OneThirdVecArgument || Kind == OneFifthVecArgument ||
+ Kind == OneSeventhVecArgument || Kind == SameVecWidthArgument ||
Kind == VecElementArgument || Kind == Subdivide2Argument ||
Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
return (ArgKind)(Argument_Info & 7);
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index ee877349a33149..3597400df9b771 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -300,6 +300,8 @@ def IIT_V1 : IIT_Vec<1, 28>;
def IIT_VARARG : IIT_VT<isVoid, 29>;
def IIT_HALF_VEC_ARG : IIT_Base<30>;
def IIT_SAME_VEC_WIDTH_ARG : IIT_Base<31>;
+def IIT_ONE_THIRD_VEC_ARG : IIT_Base<32>;
+def IIT_ONE_FIFTH_VEC_ARG : IIT_Base<33>;
def IIT_VEC_OF_ANYPTRS_TO_ELT : IIT_Base<34>;
def IIT_I128 : IIT_Int<128, 35>;
def IIT_V512 : IIT_Vec<512, 36>;
@@ -327,6 +329,7 @@ def IIT_I4 : IIT_Int<4, 58>;
def IIT_AARCH64_SVCOUNT : IIT_VT<aarch64svcount, 59>;
def IIT_V6 : IIT_Vec<6, 60>;
def IIT_V10 : IIT_Vec<10, 61>;
+def IIT_ONE_SEVENTH_VEC_ARG : IIT_Base<62>;
}
defvar IIT_all_FixedTypes = !filter(iit, IIT_all,
@@ -467,6 +470,15 @@ class LLVMVectorElementType<int num> : LLVMMatchType<num, IIT_VEC_ELEMENT>;
class LLVMHalfElementsVectorType<int num>
: LLVMMatchType<num, IIT_HALF_VEC_ARG>;
+class LLVMOneThirdElementsVectorType<int num>
+ : LLVMMatchType<num, IIT_ONE_THIRD_VEC_ARG>;
+
+class LLVMOneFifthElementsVectorType<int num>
+ : LLVMMatchType<num, IIT_ONE_FIFTH_VEC_ARG>;
+
+class LLVMOneSeventhElementsVectorType<int num>
+ : LLVMMatchType<num, IIT_ONE_SEVENTH_VEC_ARG>;
+
// Match the type of another intrinsic parameter that is expected to be a
// vector type (i.e. <N x iM>) but with each element subdivided to
// form a vector with more elements that are smaller than the original.
@@ -2728,6 +2740,54 @@ def int_vector_deinterleave2 : DefaultAttrsIntrinsic<[LLVMHalfElementsVectorType
[llvm_anyvector_ty],
[IntrNoMem]>;
+def int_vector_interleave3 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [LLVMOneThirdElementsVectorType<0>,
+ LLVMOneThirdElementsVectorType<0>,
+ LLVMOneThirdElementsVectorType<0>],
+ [IntrNoMem]>;
+
+def int_vector_deinterleave3 : DefaultAttrsIntrinsic<[LLVMOneThirdElementsVectorType<0>,
+ LLVMOneThirdElementsVectorType<0>,
+ LLVMOneThirdElementsVectorType<0>],
+ [llvm_anyvector_ty],
+ [IntrNoMem]>;
+
+def int_vector_interleave5 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>],
+ [IntrNoMem]>;
+
+def int_vector_deinterleave5 : DefaultAttrsIntrinsic<[LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>,
+ LLVMOneFifthElementsVectorType<0>],
+ [llvm_anyvector_ty],
+ [IntrNoMem]>;
+
+def int_vector_interleave7 : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+ [LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>],
+ [IntrNoMem]>;
+
+def int_vector_deinterleave7 : DefaultAttrsIntrinsic<[LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>,
+ LLVMOneSeventhElementsVectorType<0>],
+ [llvm_anyvector_ty],
+ [IntrNoMem]>;
+
//===-------------- Intrinsics to perform partial reduction ---------------===//
def int_experimental_vector_partial_reduce_add : DefaultAttrsIntrinsic<[LLVMMatchType<0>],
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index b0a624680231e9..c95f7b7eb8dec3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -5825,15 +5825,19 @@ SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_SPLICE(SDNode *N) {
}
SDValue DAGTypeLegalizer::PromoteIntRes_VECTOR_INTERLEAVE_DEINTERLEAVE(SDNode *N) {
- SDLoc dl(N);
+ SDLoc DL(N);
+ unsigned Factor = N->getNumOperands();
+
+ SmallVector<SDValue, 8> Ops(Factor);
+ for (unsigned i = 0; i != Factor; i++)
+ Ops[i] = GetPromotedInteger(N->getOperand(i));
+
+ SmallVector<EVT, 8> ResVTs(Factor, Ops[0].getValueType());
+ SDValue Res = DAG.getNode(N->getOpcode(), DL, DAG.getVTList(ResVTs), Ops);
+
+ for (unsigned i = 0; i != Factor; i++)
+ SetPromotedInteger(SDValue(N, i), Res.getValue(i));
- SDValue V0 = GetPromotedInteger(N->getOperand(0));
- SDValue V1 = GetPromotedInteger(N->getOperand(1));
- EVT ResVT = V0.getValueType();
- SDValue Res = DAG.getNode(N->getOpcode(), dl,
- DAG.getVTList(ResVT, ResVT), V0, V1);
- SetPromotedInteger(SDValue(N, 0), Res.getValue(0));
- SetPromotedInteger(SDValue(N, 1), Res.getValue(1));
return SDValue();
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index f39d9ca15496a9..03d0298e99ad4d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -1668,6 +1668,15 @@ void DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR(SDNode *N, SDValue &Lo,
return;
}
+ if (getTypeAction(SubVecVT) == TargetLowering::TypeWidenVector &&
+ Vec.isUndef() && SubVecVT.getVectorElementType() == MVT::i1) {
+ SDValue WideSubVec = GetWidenedVector(SubVec);
+ if (WideSubVec.getValueType() == VecVT) {
+ std::tie(Lo, Hi) = DAG.SplitVector(WideSubVec, SDLoc(WideSubVec));
+ return;
+ }
+ }
+
// Spill the vector to the stack.
// In cases where the vector is illegal it will be broken down into parts
// and stored in parts - we should use the alignment for the smallest part.
@@ -3183,34 +3192,53 @@ void DAGTypeLegalizer::SplitVecRes_VP_REVERSE(SDNode *N, SDValue &Lo,
}
void DAGTypeLegalizer::SplitVecRes_VECTOR_DEINTERLEAVE(SDNode *N) {
+ unsigned Factor = N->getNumOperands();
+
+ SmallVector<SDValue, 8> Ops(Factor * 2);
+ for (unsigned i = 0; i != Factor; ++i) {
+ SDValue OpLo, OpHi;
+ GetSplitVector(N->getOperand(i), OpLo, OpHi);
+ Ops[i * 2] = OpLo;
+ Ops[i * 2 + 1] = OpHi;
+ }
+
+ SmallVector<EVT, 8> VTs(Factor, Ops[0].getValueType());
- SDValue Op0Lo, Op0Hi, Op1Lo, Op1Hi;
- GetSplitVector(N->getOperand(0), Op0Lo, Op0Hi);
- GetSplitVector(N->getOperand(1), Op1Lo, Op1Hi);
- EVT VT = Op0Lo.getValueType();
SDLoc DL(N);
- SDValue ResLo = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
- DAG.getVTList(VT, VT), Op0Lo, Op0Hi);
- SDValue ResHi = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
- DAG.getVTList(VT, VT), Op1Lo, Op1Hi);
+ SDValue ResLo = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTs,
+ ArrayRef(Ops).slice(0, Factor));
+ SDValue ResHi = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTs,
+ ArrayRef(Ops).slice(Factor, Factor));
- SetSplitVector(SDValue(N, 0), ResLo.getValue(0), ResHi.getValue(0));
- SetSplitVector(SDValue(N, 1), ResLo.getValue(1), ResHi.getValue(1));
+ for (unsigned i = 0; i != Factor; ++i)
+ SetSplitVector(SDValue(N, i), ResLo.getValue(i), ResHi.getValue(i));
}
void DAGTypeLegalizer::SplitVecRes_VECTOR_INTERLEAVE(SDNode *N) {
- SDValue Op0Lo, Op0Hi, Op1Lo, Op1Hi;
- GetSplitVector(N->getOperand(0), Op0Lo, Op0Hi);
- GetSplitVector(N->getOperand(1), Op1Lo, Op1Hi);
- EVT VT = Op0Lo.getValueType();
+ unsigned Factor = N->getNumOperands();
+
+ SmallVector<SDValue, 8> Ops(Factor * 2);
+ for (unsigned i = 0; i != Factor; ++i) {
+ SDValue OpLo, OpHi;
+ GetSplitVector(N->getOperand(i), OpLo, OpHi);
+ Ops[i] = OpLo;
+ Ops[i + Factor] = OpHi;
+ }
+
+ SmallVector<EVT, 8> VTs(Factor, Ops[0].getValueType());
+
SDLoc DL(N);
- SDValue Res[] = {DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
- DAG.getVTList(VT, VT), Op0Lo, Op1Lo),
- DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
- DAG.getVTList(VT, VT), Op0Hi, Op1Hi)};
+ SDValue Res[] = {DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTs,
+ ArrayRef(Ops).slice(0, Factor)),
+ DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTs,
+ ArrayRef(Ops).slice(Factor, Factor))};
- SetSplitVector(SDValue(N, 0), Res[0].getValue(0), Res[0].getValue(1));
- SetSplitVector(SDValue(N, 1), Res[1].getValue(0), Res[1].getValue(1));
+ for (unsigned i = 0; i != Factor; ++i) {
+ unsigned IdxLo = 2 * i;
+ unsigned IdxHi = 2 * i + 1;
+ SetSplitVector(SDValue(N, i), Res[IdxLo / Factor].getValue(IdxLo % Factor),
+ Res[IdxHi / Factor].getValue(IdxHi % Factor));
+ }
}
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 428e7a316d247b..6867944b5d8b4a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8251,10 +8251,28 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
visitCallBrLandingPad(I);
return;
case Intrinsic::vector_interleave2:
- visitVectorInterleave(I);
+ visitVectorInterleave(I, 2);
+ return;
+ case Intrinsic::vector_interleave3:
+ visitVectorInterleave(I, 3);
+ return;
+ case Intrinsic::vector_interleave5:
+ visitVectorInterleave(I, 5);
+ return;
+ case Intrinsic::vector_interleave7:
+ visitVectorInterleave(I, 7);
return;
case Intrinsic::vector_deinterleave2:
- visitVectorDeinterleave(I);
+ visitVectorDeinterleave(I, 2);
+ return;
+ case Intrinsic::vector_deinterleave3:
+ visitVectorDeinterleave(I, 3);
+ return;
+ case Intrinsic::vector_deinterleave5:
+ visitVectorDeinterleave(I, 5);
+ return;
+ case Intrinsic::vector_deinterleave7:
+ visitVectorDeinterleave(I, 7);
return;
case Intrinsic::experimental_vector_compress:
setValue(&I, DAG.getNode(ISD::VECTOR_COMPRESS, sdl,
@@ -12565,26 +12583,31 @@ void SelectionDAGBuilder::visitVectorReverse(const CallInst &I) {
setValue(&I, DAG.getVectorShuffle(VT, DL, V, DAG.getUNDEF(VT), Mask));
}
-void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I) {
+void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I,
+ unsigned Factor) {
auto DL = getCurSDLoc();
SDValue InVec = getValue(I.getOperand(0));
- EVT OutVT =
- InVec.getValueType().getHalfNumVectorElementsVT(*DAG.getContext());
+ SmallVector<EVT, 4> ValueVTs;
+ ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(),
+ ValueVTs);
+
+ EVT OutVT = ValueVTs[0];
unsigned OutNumElts = OutVT.getVectorMinNumElements();
- // ISD Node needs the input vectors split into two equal parts
- SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
- DAG.getVectorIdxConstant(0, DL));
- SDValue Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
- DAG.getVectorIdxConstant(OutNumElts, DL));
+ SmallVector<SDValue, 4> SubVecs(Factor);
+ for (unsigned i = 0; i != Factor; ++i) {
+ assert(ValueVTs[i] == OutVT && "Expected VTs to be the same");
+ SubVecs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
+ DAG.getVectorIdxConstant(OutNumElts * i, DL));
+ }
// Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
// legalisation and combines.
- if (OutVT.isFixedLengthVector()) {
- SDValue Even = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
+ if (OutVT.isFixedLengthVector() && Factor == 2) {
+ SDValue Even = DAG.getVectorShuffle(OutVT, DL, SubVecs[0], SubVecs[1],
createStrideMask(0, 2, OutNumElts));
- SDValue Odd = DAG.getVectorShuffle(OutVT, DL, Lo, Hi,
+ SDValue Odd = DAG.getVectorShuffle(OutVT, DL, SubVecs[0], SubVecs[1],
createStrideMask(1, 2, OutNumElts));
SDValue Res = DAG.getMergeValues({Even, Odd}, getCurSDLoc());
setValue(&I, Res);
@@ -12592,32 +12615,43 @@ void SelectionDAGBuilder::visitVectorDeinterleave(const CallInst &I) {
}
SDValue Res = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL,
- DAG.getVTList(OutVT, OutVT), Lo, Hi);
+ DAG.getVTList(ValueVTs), SubVecs);
setValue(&I, Res);
}
-void SelectionDAGBuilder::visitVectorInterleave(const CallInst &I) {
+void SelectionDAGBuilder::visitVectorInterleave(const CallInst &I,
+ unsigned Factor) {
auto DL = getCurSDLoc();
- EVT InVT = getValue(I.getOperand(0)).getValueType();
- SDValue InVec0 = getValue(I.getOperand(0));
- SDValue InVec1 = getValue(I.getOperand(1));
const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+ EVT InVT = getValue(I.getOperand(0)).getValueType();
EVT OutVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+ SmallVector<SDValue, 8> InVecs(Factor);
+ for (unsigned i = 0; i < Factor; ++i) {
+ InVecs[i] = getValue(I.getOperand(i));
+ assert(InVecs[i].getValueType() == InVecs[0].getValueType() &&
+ "Expected VTs to be the same");
+ }
+
// Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
// legalisation and combines.
- if (OutVT.isFixedLengthVector()) {
+ if (OutVT.isFixedLengthVector() && Factor == 2) {
unsigned NumElts = InVT.getVectorMinNumElements();
- SDValue V = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, InVec0, InVec1);
+ SDValue V = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, InVecs);
setValue(&I, DAG.getVectorShuffle(OutVT, DL, V, DAG.getUNDEF(OutVT),
createInterleaveMask(NumElts, 2)));
return;
}
- SDValue Res = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL,
- DAG.getVTList(InVT, InVT), InVec0, InVec1);
- Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Res.getValue(0),
- Res.getValue(1));
+ SmallVector<EVT, 8> ValueVTs(Factor, InVT);
+ SDValue Res =
+ DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, DAG.getVTList(ValueVTs), InVecs);
+
+ SmallVector<SDValue, 8> Results(Factor);
+ for (unsigned i = 0; i < Factor; ++i)
+ Results[i] = Res.getValue(i);
+
+ Res = DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Results);
setValue(&I, Res);
}
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index ed85deef64fa79..ece48c9bedf722 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -659,8 +659,8 @@ class SelectionDAGBuilder {
void visitVectorReduce(const CallInst &I, unsigned Intrinsic);
void visitVectorReverse(const CallInst &I);
void visitVectorSplice(const CallInst &I);
- void visitVectorInterleave(const CallInst &I);
- void visitVectorDeinterleave(const CallInst &I);
+ void visitVectorInterleave(const CallInst &I, unsigned Factor);
+ void visitVectorDeinterleave(const CallInst &I, unsigned Factor);
void visitStepVector(const CallInst &I);
void visitUserOp1(const Instruction &I) {
diff --git a/llvm/lib/IR/Intrinsics.cpp b/llvm/lib/IR/Intrinsics.cpp
index ec1184e8d835d6..107caebede1391 100644
--- a/llvm/lib/IR/Intrinsics.cpp
+++ b/llvm/lib/IR/Intrinsics.cpp
@@ -362,6 +362,24 @@ DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> In...
[truncated]
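The index arithmetic in the SplitVecRes_VECTOR_INTERLEAVE hunk above (`Res[Idx / Factor].getValue(Idx % Factor)`) is easier to follow on plain arrays. The sketch below is a scalar model under the assumption that a VECTOR_INTERLEAVE node with N operands produces N results whose concatenation is the element-wise interleaving; `interleaveNode` and `interleaveViaSplit` are hypothetical helper names, and fixed-length `std::vector<int>` stands in for the real vector types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of an interleave node: N operands in, N same-sized results out.
// Concatenating the results gives flat[i * N + k] = ops[k][i].
std::vector<Vec> interleaveNode(const std::vector<Vec> &ops) {
  const std::size_t factor = ops.size(), len = ops[0].size();
  Vec flat(factor * len);
  for (std::size_t k = 0; k < factor; ++k)
    for (std::size_t i = 0; i < len; ++i)
      flat[i * factor + k] = ops[k][i];
  std::vector<Vec> results;
  for (std::size_t r = 0; r < factor; ++r) {
    auto first = flat.begin() + static_cast<std::ptrdiff_t>(r * len);
    results.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
  }
  return results;
}

// Model of the split: interleave all low halves, interleave all high halves,
// then pick the two half-sized pieces that make up each original result.
std::vector<Vec> interleaveViaSplit(const std::vector<Vec> &ops) {
  const std::size_t factor = ops.size(), half = ops[0].size() / 2;
  std::vector<Vec> los, his;
  for (const Vec &op : ops) {
    auto mid = op.begin() + static_cast<std::ptrdiff_t>(half);
    los.emplace_back(op.begin(), mid);
    his.emplace_back(mid, op.end());
  }
  const std::vector<Vec> res[2] = {interleaveNode(los), interleaveNode(his)};
  std::vector<Vec> results;
  for (std::size_t i = 0; i < factor; ++i) {
    const std::size_t idxLo = 2 * i, idxHi = 2 * i + 1;
    Vec whole = res[idxLo / factor][idxLo % factor];    // low half of result i
    const Vec &hi = res[idxHi / factor][idxHi % factor]; // high half of result i
    whole.insert(whole.end(), hi.begin(), hi.end());
    results.push_back(whole);
  }
  return results;
}
```

For three operands of four elements each, both functions in this model return the same three results, which suggests why the half-sized results can be indexed as one flat list of 2 * Factor pieces.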
auto [Mask, VL] = getDefaultScalableVLOps(ConcatVT, DL, DAG, Subtarget);
SDValue Passthru = DAG.getUNDEF(ConcatVT);

// For the indices, use the same SEW to avoid an extra vsetvli
Do we need to worry about there being too many elements for SEW=8 to represent the indices? I wrote this code, but I can't figure out how that's not an issue.
We can probably use vrgatherei16.vv here just to be safe, because the longest (legal) type the concatenated vector can have is SEW=8 + LMUL=8, whose VLMAX can safely fit in a 16-bit integer. Also, lowerVECTOR_INTERLEAVE is already using vrgatherei16.vv.
> We probably can use vrgatherei16.vv here just to be safe, because the longest (legal) type the concatenated vector can have is SEW=8 + LMUL=8, whose VLMAX can be safely put in 16-bit integer.
Well... a 16-bit element can represent all 65536 indices, which only happens when the data operand is SEW=8 + LMUL=8, but in that case the EMUL of the index operand would be invalid (because EMUL = (16/SEW) * LMUL). I guess we also need to spill to the stack and load them back with segmented store and load.
But that would require a LMUL=16 vid.v to create the SEW=16 indices
I've changed the logic here to use unit-stride store + segmented load instead.
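The arithmetic behind this thread can be sketched as follows. This is a back-of-the-envelope model, not code from the patch: the helper names are hypothetical, LMUL is restricted to integer values, and the VLEN values (512 and 65536 bits) are assumed purely for illustration. It uses VLMAX = LMUL * VLEN / SEW and the EMUL formula quoted in the thread, together with the vector specification's allowed EMUL range of 1/8 to 8.

```cpp
#include <cstdint>

// VLMAX = LMUL * VLEN / SEW, restricted here to integer LMUL for simplicity.
constexpr std::uint64_t vlmax(std::uint64_t vlenBits, unsigned sew,
                              unsigned lmul) {
  return lmul * vlenBits / sew;
}

// EMUL of an index operand = (index EEW / SEW) * LMUL, returned in units of
// 1/8 so that fractional values stay integral.
constexpr unsigned emulInEighths(unsigned indexEew, unsigned sew,
                                 unsigned lmul) {
  return indexEew * lmul * 8 / sew;
}

// EMUL must lie between 1/8 and 8, i.e. between 1 and 64 eighths.
constexpr bool isValidEmul(unsigned eighths) {
  return eighths >= 1 && eighths <= 64;
}

// Indices range over [0, VLMAX), so they fit if VLMAX <= 2^indexEew.
// Assumes indexEew < 64.
constexpr bool indicesFit(std::uint64_t vlmaxValue, unsigned indexEew) {
  return vlmaxValue <= (std::uint64_t{1} << indexEew);
}

// Data operand at SEW=8, LMUL=8 with an assumed VLEN of 512 bits:
// VLMAX is 512, which 8-bit indices cannot cover.
static_assert(!indicesFit(vlmax(512, 8, 8), 8), "8-bit indices overflow");
// With an assumed VLEN of 65536 bits, 16-bit indices are just wide enough...
static_assert(indicesFit(vlmax(65536, 8, 8), 16), "16-bit indices suffice");
// ...but a 16-bit index operand would need EMUL = (16/8) * 8 = 16.
static_assert(!isValidEmul(emulInEighths(16, 8, 8)), "EMUL=16 is out of range");
```

Under these assumptions, neither SEW=8 indices nor vrgatherei16.vv covers the SEW=8 + LMUL=8 case, which is consistent with the switch to a store followed by a segmented load.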
assert(ValueVTs[i] == OutVT && "Expected VTs to be the same");
SubVecs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, OutVT, InVec,
DAG.getVectorIdxConstant(OutNumElts * i, DL));
}

// Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
Comment needs updating
Fixed.
assert(InVecs[i].getValueType() == InVecs[0].getValueType() &&
"Expected VTs to be the same");
}

// Use VECTOR_SHUFFLE for fixed-length vectors to benefit from existing
Comment needs updating
Fixed.
llvm/include/llvm/IR/Intrinsics.td
Outdated
@@ -300,6 +300,8 @@ def IIT_V1 : IIT_Vec<1, 28>;
def IIT_VARARG : IIT_VT<isVoid, 29>;
def IIT_HALF_VEC_ARG : IIT_Base<30>;
def IIT_SAME_VEC_WIDTH_ARG : IIT_Base<31>;
def IIT_ONE_THIRD_VEC_ARG : IIT_Base<32>;
Why don't we make them consecutive?
(I don't know why we have a bubble between 31 and 34...)
> (I don't know why we have a bubble between 31 and 34...)
There were two IITs for legacy pointer types that were deprecated after we adopted opaque pointers.
> Why don't we make them consecutive?
I was going to make them more compact, but now that you point it out, I think it's not really necessary. It is fixed now.
LGTM
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/12767 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/12729 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/12912 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/6639 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/10918 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/18021 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/18926 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/13338 Here is the relevant piece of the build log for the reference
Many bot failures above. I can just add what I've seen manually:
Most of the buildbot failures should be fixed by e335ca7. I'm looking into the expensive check failure.
Fix: #126155
Sometimes when we're breaking up a large vector copy into several smaller ones, not every smaller source register is initialized at the time the original COPY happens, and the verifier will not be pleased when it sees the smaller copies reading from an undef register. This patch works around the issue by attaching an implicit read of the source operand to the newly generated copies.
This is tested by llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll, which would have crashed the compiler without this fix when LLVM_ENABLE_EXPENSIVE_CHECKS is enabled.
Original context: #124825 (comment)
Co-authored-by: Craig Topper
…and 7 (#141865)
Currently the loop vectorizer can only vectorize interleave groups for power-of-2 factors at scalable VFs by recursively interleaving [de]interleave2 intrinsics. However, after #124825 and #139893, we now have [de]interleave intrinsics for all factors up to 8, which is enough to support all types of segmented loads and stores on RISC-V.
Now that the interleaved access pass has been taught to lower these in #139373 and #141512, this patch teaches the loop vectorizer to emit these intrinsics for factors up to 8, which enables scalable vectorization for non-power-of-2 factors.
As far as I'm aware, no in-tree target will vectorize a scalable interleave group above factor 8 because the maximum interleave factor is capped at 4 on AArch64 and 8 on RISC-V, and the `-max-interleave-group-factor` CLI option defaults to 8, so the recursive [de]interleaving code has been removed for now. Factors of 3 with scalable VFs are also turned off on AArch64 since there's no lowering for [de]interleave3 just yet either.
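For reference, the recursive composition that this message describes can be sketched on plain arrays. The code below is only a scalar model: `deinterleave`, `deinterleave4Via2`, and `deinterleave6Via2And3` are hypothetical names, and the particular field ordering is the one that falls out of applying the factor-2 step first. It illustrates how factor 4 can be built from factor-2 steps, and factor 6 from one factor-2 step plus factor-3 steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Scalar model of a factor-N deinterleave: outs[k][i] = input[i * N + k].
std::vector<Vec> deinterleave(const Vec &input, std::size_t factor) {
  assert(factor != 0 && input.size() % factor == 0);
  const std::size_t len = input.size() / factor;
  std::vector<Vec> outs(factor, Vec(len));
  for (std::size_t k = 0; k < factor; ++k)
    for (std::size_t i = 0; i < len; ++i)
      outs[k][i] = input[i * factor + k];
  return outs;
}

// Factor 4 built from three factor-2 steps.
std::vector<Vec> deinterleave4Via2(const Vec &input) {
  const std::vector<Vec> halves = deinterleave(input, 2);   // even / odd slots
  const std::vector<Vec> even = deinterleave(halves[0], 2); // fields 0 and 2
  const std::vector<Vec> odd = deinterleave(halves[1], 2);  // fields 1 and 3
  return {even[0], odd[0], even[1], odd[1]};
}

// Factor 6 built from one factor-2 step followed by two factor-3 steps.
std::vector<Vec> deinterleave6Via2And3(const Vec &input) {
  const std::vector<Vec> halves = deinterleave(input, 2);
  const std::vector<Vec> even = deinterleave(halves[0], 3); // fields 0, 2, 4
  const std::vector<Vec> odd = deinterleave(halves[1], 3);  // fields 1, 3, 5
  return {even[0], odd[0], even[1], odd[1], even[2], odd[2]};
}
```

In this model, both composed versions agree with a direct `deinterleave` at factor 4 or 6. Factors 3, 5, and 7 are prime, so they cannot be built this way, which is why they need dedicated intrinsics.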