[mlir][VectorOps] Add vector.interleave operation #80315
Conversation
@llvm/pr-subscribers-mlir-vector @llvm/pr-subscribers-mlir

Author: Benjamin Maxwell (MacDue)

Changes

The interleave operation constructs a new vector by interleaving the elements from the trailing (or final) dimension of two input vectors, returning a new vector where the trailing dimension is twice the size.

Note that for the n-D case this differs from the interleaving possible with `vector.shuffle`, which would only operate on the leading dimension.

Another key difference is this operation supports scalable vectors, though currently a general LLVM lowering is limited to the case where only the trailing dimension is scalable.

Example:

```mlir
%0 = vector.interleave %a, %b
           : vector<[4]xi32> ; yields vector<[8]xi32>
%1 = vector.interleave %c, %d
           : vector<8xi8> ; yields vector<16xi8>
%2 = vector.interleave %e, %f
           : vector<f16> ; yields vector<2xf16>
%3 = vector.interleave %g, %h
           : vector<2x4x[2]xf64> ; yields vector<2x4x[4]xf64>
%4 = vector.interleave %i, %j
           : vector<6x3xf32> ; yields vector<6x6xf32>
```
Full diff: https://github.com/llvm/llvm-project/pull/80315.diff

7 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index fdf51f0173511..c891c1b978b54 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -478,6 +478,71 @@ def Vector_ShuffleOp :
let hasCanonicalizer = 1;
}
+def Vector_InterleaveOp :
+ Vector_Op<"interleave", [Pure,
+ AllTypesMatch<["lhs", "rhs"]>,
+ TypesMatchWith<
+ "type of 'result' is double the width of the inputs",
+ "lhs", "result",
+ [{
+ [&]() -> ::mlir::VectorType {
+ auto vectorType = ::llvm::cast<mlir::VectorType>($_self);
+ ::mlir::VectorType::Builder builder(vectorType);
+ if (vectorType.getRank() == 0) {
+ static constexpr int64_t v2xty_shape[] = { 2 };
+ return builder.setShape(v2xty_shape);
+ }
+ auto lastDim = vectorType.getRank() - 1;
+ return builder.setDim(lastDim, vectorType.getDimSize(lastDim) * 2);
+ }()
+ }]>]> {
+ let summary = "constructs a vector by interleaving two input vectors";
+ let description = [{
+ The interleave operation constructs a new vector by interleaving the
+ elements from the trailing (or final) dimension of two input vectors,
+ returning a new vector where the trailing dimension is twice the size.
+
+ Note that for the n-D case this differs from the interleaving possible with
+ `vector.shuffle`, which would only operate on the leading dimension.
+
+ Another key difference is this operation supports scalable vectors, though
+ currently a general LLVM lowering is limited to the case where only the
+ trailing dimension is scalable.
+
+ Example:
+ ```mlir
+ %0 = vector.interleave %a, %b
+ : vector<[4]xi32> ; yields vector<[8]xi32>
+ %1 = vector.interleave %c, %d
+ : vector<8xi8> ; yields vector<16xi8>
+ %2 = vector.interleave %e, %f
+ : vector<f16> ; yields vector<2xf16>
+ %3 = vector.interleave %g, %h
+ : vector<2x4x[2]xf64> ; yields vector<2x4x[4]xf64>
+ %4 = vector.interleave %i, %j
+ : vector<6x3xf32> ; yields vector<6x6xf32>
+ ```
+ }];
+
+ let arguments = (ins AnyVectorOfAnyRank:$lhs, AnyVectorOfAnyRank:$rhs);
+ let results = (outs AnyVector:$result);
+
+ let assemblyFormat = [{
+ $lhs `,` $rhs attr-dict `:` type($lhs)
+ }];
+
+ let extraClassDeclaration = [{
+ VectorType getSourceVectorType() {
+ return ::llvm::cast<VectorType>(getLhs().getType());
+ }
+ VectorType getResultVectorType() {
+ return ::llvm::cast<VectorType>(getResult().getType());
+ }
+ }];
+
+ let hasCanonicalizer = 1;
+}
+
def Vector_ExtractElementOp :
Vector_Op<"extractelement", [Pure,
TypesMatchWith<"result type matches element type of vector operand",
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index b66b55ae8d57f..4dc62608d1b92 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1734,6 +1734,70 @@ struct VectorSplatNdOpLowering : public ConvertOpToLLVMPattern<SplatOp> {
}
};
+struct VectorInterleaveOpLowering
+ : public ConvertOpToLLVMPattern<vector::InterleaveOp> {
+ using ConvertOpToLLVMPattern::ConvertOpToLLVMPattern;
+
+ void initialize() {
+ // This pattern recursively unpacks one dimension at a time. The recursion
+ // is bounded as the rank is strictly decreasing.
+ setHasBoundedRewriteRecursion();
+ }
+
+ LogicalResult
+ matchAndRewrite(vector::InterleaveOp interleaveOp, OpAdaptor adaptor,
+ ConversionPatternRewriter &rewriter) const override {
+ VectorType resultType = interleaveOp.getResultVectorType();
+
+ // If the result is rank 1, then this directly maps to LLVM.
+ if (resultType.getRank() == 1) {
+ if (resultType.isScalable()) {
+ rewriter.replaceOpWithNewOp<LLVM::experimental_vector_interleave2>(
+ interleaveOp, typeConverter->convertType(resultType),
+ adaptor.getLhs(), adaptor.getRhs());
+ return success();
+ }
+ // Lower fixed-size interleaves to a shufflevector. While the
+ // vector.interleave2 intrinsic supports fixed and scalable vectors, the
+ // LangRef still recommends that fixed-size vectors use shufflevector, see:
+ // https://llvm.org/docs/LangRef.html#id876.
+ int64_t resultVectorSize = resultType.getNumElements();
+ SmallVector<int32_t> interleaveShuffleMask;
+ interleaveShuffleMask.reserve(resultVectorSize);
+ for (int i = 0; i < resultVectorSize / 2; i++) {
+ interleaveShuffleMask.push_back(i);
+ interleaveShuffleMask.push_back((resultVectorSize / 2) + i);
+ }
+ rewriter.replaceOpWithNewOp<LLVM::ShuffleVectorOp>(
+ interleaveOp, adaptor.getLhs(), adaptor.getRhs(),
+ interleaveShuffleMask);
+ return success();
+ }
+
+ // It's not possible to unroll a scalable dimension.
+ if (resultType.getScalableDims().front())
+ return failure();
+
+ // n-D case: Unroll the leading dimension.
+ // This eventually converges to an LLVM lowering.
+ auto loc = interleaveOp.getLoc();
+ Value result = rewriter.create<arith::ConstantOp>(
+ loc, resultType, rewriter.getZeroAttr(resultType));
+ for (int d = 0; d < resultType.getDimSize(0); d++) {
+ Value extractLhs =
+ rewriter.create<ExtractOp>(loc, interleaveOp.getLhs(), d);
+ Value extractRhs =
+ rewriter.create<ExtractOp>(loc, interleaveOp.getRhs(), d);
+ Value dimInterleave =
+ rewriter.create<InterleaveOp>(loc, extractLhs, extractRhs);
+ result = rewriter.create<InsertOp>(loc, dimInterleave, result, d);
+ }
+
+ rewriter.replaceOp(interleaveOp, result);
+ return success();
+ }
+};
+
} // namespace
/// Populate the given list with patterns that convert from Vector to LLVM.
@@ -1758,7 +1822,8 @@ void mlir::populateVectorToLLVMConversionPatterns(
VectorExpandLoadOpConversion, VectorCompressStoreOpConversion,
VectorSplatOpLowering, VectorSplatNdOpLowering,
VectorScalableInsertOpLowering, VectorScalableExtractOpLowering,
- MaskedReductionOpConversion>(converter);
+ MaskedReductionOpConversion, VectorInterleaveOpLowering>(
+ converter);
// Transfer ops with rank > 1 are handled by VectorToSCF.
populateVectorTransferLoweringPatterns(patterns, /*maxTransferRank=*/1);
}
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 452354413e883..8aabc35f4c265 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -6308,6 +6308,48 @@ bool WarpExecuteOnLane0Op::areTypesCompatible(Type lhs, Type rhs) {
verifyDistributedType(lhs, rhs, getWarpSize(), getOperation()));
}
+//===----------------------------------------------------------------------===//
+// InterleaveOp
+//===----------------------------------------------------------------------===//
+
+// The rank 1 case of vector.interleave on fixed-size vectors is equivalent to a
+// vector.shuffle, which (as an older op) is more likely to be matched by
+// existing pipelines.
+struct FoldRank1FixedSizeInterleaveOp : public OpRewritePattern<InterleaveOp> {
+ using OpRewritePattern::OpRewritePattern;
+
+ LogicalResult matchAndRewrite(InterleaveOp interleaveOp,
+ PatternRewriter &rewriter) const override {
+ auto resultType = interleaveOp.getResultVectorType();
+ if (resultType.getRank() != 1)
+ return rewriter.notifyMatchFailure(
+ interleaveOp, "cannot fold interleave with result rank > 1");
+
+ if (resultType.isScalable())
+ return rewriter.notifyMatchFailure(
+ interleaveOp, "cannot fold interleave of scalable vectors");
+
+ int64_t resultVectorSize = resultType.getNumElements();
+ SmallVector<int64_t> interleaveShuffleMask;
+ interleaveShuffleMask.reserve(resultVectorSize);
+ for (int i = 0; i < resultVectorSize / 2; i++) {
+ interleaveShuffleMask.push_back(i);
+ interleaveShuffleMask.push_back((resultVectorSize / 2) + i);
+ }
+
+ rewriter.replaceOpWithNewOp<ShuffleOp>(interleaveOp, interleaveOp.getLhs(),
+ interleaveOp.getRhs(),
+ interleaveShuffleMask);
+
+ return success();
+ }
+};
+
+void InterleaveOp::getCanonicalizationPatterns(RewritePatternSet &results,
+ MLIRContext *context) {
+ results.add<FoldRank1FixedSizeInterleaveOp>(context);
+}
+
Value mlir::vector::makeArithReduction(OpBuilder &b, Location loc,
CombiningKind kind, Value v1, Value acc,
arith::FastMathFlagsAttr fastmath,
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
index 1c13b16dfd9af..3cbca65472fb6 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -2460,3 +2460,88 @@ func.func @make_fixed_vector_of_scalable_vector(%f : f64) -> vector<3x[2]xf64>
%res = vector.broadcast %f : f64 to vector<3x[2]xf64>
return %res : vector<3x[2]xf64>
}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_0d
+// CHECK-SAME: %[[LHS:.*]]: vector<i8>, %[[RHS:.*]]: vector<i8>)
+func.func @vector_interleave_0d(%a: vector<i8>, %b: vector<i8>) -> vector<2xi8> {
+ // CHECK: %[[LHS_RANK1:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<i8> to vector<1xi8>
+ // CHECK: %[[RHS_RANK1:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<i8> to vector<1xi8>
+ // CHECK: %[[ZIP:.*]] = llvm.shufflevector %[[LHS_RANK1]], %[[RHS_RANK1]] [0, 1] : vector<1xi8>
+ // CHECK: return %[[ZIP]]
+ %0 = vector.interleave %a, %b : vector<i8>
+ return %0 : vector<2xi8>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_1d
+// CHECK-SAME: %[[LHS:.*]]: vector<8xf32>, %[[RHS:.*]]: vector<8xf32>)
+func.func @vector_interleave_1d(%a: vector<8xf32>, %b: vector<8xf32>) -> vector<16xf32>
+{
+ // CHECK: %[[ZIP:.*]] = llvm.shufflevector %[[LHS]], %[[RHS]] [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>
+ // CHECK: return %[[ZIP]]
+ %0 = vector.interleave %a, %b : vector<8xf32>
+ return %0 : vector<16xf32>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_1d_scalable
+// CHECK-SAME: %[[LHS:.*]]: vector<[4]xi32>, %[[RHS:.*]]: vector<[4]xi32>)
+func.func @vector_interleave_1d_scalable(%a: vector<[4]xi32>, %b: vector<[4]xi32>) -> vector<[8]xi32>
+{
+ // CHECK: %[[ZIP:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS]], %[[RHS]]) : (vector<[4]xi32>, vector<[4]xi32>) -> vector<[8]xi32>
+ // CHECK: return %[[ZIP]]
+ %0 = vector.interleave %a, %b : vector<[4]xi32>
+ return %0 : vector<[8]xi32>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_2d
+// CHECK-SAME: %[[LHS:.*]]: vector<2x3xi8>, %[[RHS:.*]]: vector<2x3xi8>)
+func.func @vector_interleave_2d(%a: vector<2x3xi8>, %b: vector<2x3xi8>) -> vector<2x6xi8>
+{
+ // CHECK: %[[LHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<2x3xi8> to !llvm.array<2 x vector<3xi8>>
+ // CHECK: %[[RHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<2x3xi8> to !llvm.array<2 x vector<3xi8>>
+ // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<2x6xi8>
+ // CHECK: %[[CST_LLVM:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<2x6xi8> to !llvm.array<2 x vector<6xi8>>
+ // CHECK: %[[LHS_DIM_0:.*]] = llvm.extractvalue %[[LHS_LLVM]][0] : !llvm.array<2 x vector<3xi8>>
+ // CHECK: %[[RHS_DIM_0:.*]] = llvm.extractvalue %[[RHS_LLVM]][0] : !llvm.array<2 x vector<3xi8>>
+ // CHECK: %[[ZIP_DIM_0:.*]] = llvm.shufflevector %[[LHS_DIM_0]], %[[RHS_DIM_0]] [0, 3, 1, 4, 2, 5] : vector<3xi8>
+ // CHECK: %[[RES_0:.*]] = llvm.insertvalue %[[ZIP_DIM_0]], %[[CST_LLVM]][0] : !llvm.array<2 x vector<6xi8>>
+ // CHECK: %[[LHS_DIM_1:.*]] = llvm.extractvalue %[[LHS_LLVM]][1] : !llvm.array<2 x vector<3xi8>>
+ // CHECK: %[[RHS_DIM_1:.*]] = llvm.extractvalue %[[RHS_LLVM]][1] : !llvm.array<2 x vector<3xi8>>
+ // CHECK: %[[ZIP_DIM_1:.*]] = llvm.shufflevector %[[LHS_DIM_1]], %[[RHS_DIM_1]] [0, 3, 1, 4, 2, 5] : vector<3xi8>
+ // CHECK: %[[RES_1:.*]] = llvm.insertvalue %[[ZIP_DIM_1]], %[[RES_0]][1] : !llvm.array<2 x vector<6xi8>>
+ // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x vector<6xi8>> to vector<2x6xi8>
+ // CHECK: return %[[RES]]
+ %0 = vector.interleave %a, %b : vector<2x3xi8>
+ return %0 : vector<2x6xi8>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_2d_scalable
+// CHECK-SAME: %[[LHS:.*]]: vector<2x[8]xi16>, %[[RHS:.*]]: vector<2x[8]xi16>)
+func.func @vector_interleave_2d_scalable(%a: vector<2x[8]xi16>, %b: vector<2x[8]xi16>) -> vector<2x[16]xi16>
+{
+ // CHECK: %[[LHS_LLVM:.*]] = builtin.unrealized_conversion_cast %arg0 : vector<2x[8]xi16> to !llvm.array<2 x vector<[8]xi16>>
+ // CHECK: %[[RHS_LLVM:.*]] = builtin.unrealized_conversion_cast %arg1 : vector<2x[8]xi16> to !llvm.array<2 x vector<[8]xi16>>
+ // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<2x[16]xi16>
+ // CHECK: %[[CST_LLVM:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<2x[16]xi16> to !llvm.array<2 x vector<[16]xi16>>
+ // CHECK: %[[LHS_DIM_0:.*]] = llvm.extractvalue %[[LHS_LLVM]][0] : !llvm.array<2 x vector<[8]xi16>>
+ // CHECK: %[[RHS_DIM_0:.*]] = llvm.extractvalue %[[RHS_LLVM]][0] : !llvm.array<2 x vector<[8]xi16>>
+ // CHECK: %[[ZIP_DIM_0:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS_DIM_0]], %[[RHS_DIM_0]]) : (vector<[8]xi16>, vector<[8]xi16>) -> vector<[16]xi16>
+ // CHECK: %[[RES_0:.*]] = llvm.insertvalue %[[ZIP_DIM_0]], %[[CST_LLVM]][0] : !llvm.array<2 x vector<[16]xi16>>
+ // CHECK: %[[LHS_DIM_1:.*]] = llvm.extractvalue %0[1] : !llvm.array<2 x vector<[8]xi16>>
+ // CHECK: %[[RHS_DIM_1:.*]] = llvm.extractvalue %1[1] : !llvm.array<2 x vector<[8]xi16>>
+ // CHECK: %[[ZIP_DIM_1:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS_DIM_1]], %[[RHS_DIM_1]]) : (vector<[8]xi16>, vector<[8]xi16>) -> vector<[16]xi16>
+ // CHECK: %[[RES_1:.*]] = llvm.insertvalue %[[ZIP_DIM_1]], %[[RES_0]][1] : !llvm.array<2 x vector<[16]xi16>>
+ // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x vector<[16]xi16>> to vector<2x[16]xi16>
+ // CHECK: return %[[RES]]
+ %0 = vector.interleave %a, %b : vector<2x[8]xi16>
+ return %0 : vector<2x[16]xi16>
+}
diff --git a/mlir/test/Dialect/Vector/canonicalize.mlir b/mlir/test/Dialect/Vector/canonicalize.mlir
index e6f045e12e519..490ee6a462c6a 100644
--- a/mlir/test/Dialect/Vector/canonicalize.mlir
+++ b/mlir/test/Dialect/Vector/canonicalize.mlir
@@ -2567,3 +2567,26 @@ func.func @load_store_forwarding_rank_mismatch(%v0: vector<4x1x1xf32>, %arg0: te
tensor<4x4x4xf32>, vector<1x100x4x5xf32>
return %r : vector<1x100x4x5xf32>
}
+
+// -----
+
+// CHECK-LABEL: func.func @fold_rank_1_vector_interleave(
+// CHECK-SAME: %[[LHS:.*]]: vector<6xi32>, %[[RHS:.*]]: vector<6xi32>)
+func.func @fold_rank_1_vector_interleave(%arg0: vector<6xi32>, %arg1: vector<6xi32>) -> vector<12xi32> {
+ // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11] : vector<6xi32>, vector<6xi32>
+ // CHECK: return %[[ZIP]] : vector<12xi32>
+ %0 = vector.interleave %arg0, %arg1 : vector<6xi32>
+ return %0 : vector<12xi32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @fold_rank_0_vector_interleave(
+// CHECK-SAME: %[[LHS:.*]]: vector<f64>, %[[RHS:.*]]: vector<f64>)
+func.func @fold_rank_0_vector_interleave(%arg0: vector<f64>, %arg1: vector<f64>) -> vector<2xf64>
+{
+ // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 1] : vector<f64>, vector<f64>
+ // CHECK: return %[[ZIP]] : vector<2xf64>
+ %0 = vector.interleave %arg0, %arg1 : vector<f64>
+ return %0 : vector<2xf64>
+}
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir
new file mode 100644
index 0000000000000..58dd3d700beff
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir
@@ -0,0 +1,25 @@
+// RUN: mlir-opt %s -test-lower-to-llvm | \
+// RUN: %mcr_aarch64_cmd -e entry -entry-point-result=void \
+// RUN: -shared-libs=%mlir_c_runner_utils,%mlir_arm_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @entry() {
+ %f1 = arith.constant 1.0: f32
+ %f2 = arith.constant 2.0: f32
+ %v1 = vector.splat %f1 : vector<[4]xf32>
+ %v2 = vector.splat %f2 : vector<[4]xf32>
+ vector.print %v1 : vector<[4]xf32>
+ vector.print %v2 : vector<[4]xf32>
+ //
+ // Test vectors:
+ //
+ // CHECK: ( 1, 1, 1, 1
+ // CHECK: ( 2, 2, 2, 2
+
+ %v3 = vector.interleave %v1, %v2 : vector<[4]xf32>
+ vector.print %v3 : vector<[8]xf32>
+ // CHECK: ( 1, 2, 1, 2, 1, 2, 1, 2
+
+ return
+}
+
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir b/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir
new file mode 100644
index 0000000000000..c6dd6287208d4
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir
@@ -0,0 +1,24 @@
+// RUN: mlir-opt %s -test-lower-to-llvm | \
+// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
+// RUN: -shared-libs=%mlir_c_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @entry() {
+ %f1 = arith.constant 1.0: f32
+ %f2 = arith.constant 2.0: f32
+ %v1 = vector.splat %f1 : vector<2x4xf32>
+ %v2 = vector.splat %f2 : vector<2x4xf32>
+ vector.print %v1 : vector<2x4xf32>
+ vector.print %v2 : vector<2x4xf32>
+ //
+ // Test vectors:
+ //
+ // CHECK: ( ( 1, 1, 1, 1 ), ( 1, 1, 1, 1 ) )
+ // CHECK: ( ( 2, 2, 2, 2 ), ( 2, 2, 2, 2 ) )
+
+ %v3 = vector.interleave %v1, %v2 : vector<2x4xf32>
+ vector.print %v3 : vector<2x8xf32>
+ // CHECK: ( ( 1, 2, 1, 2, 1, 2, 1, 2 ), ( 1, 2, 1, 2, 1, 2, 1, 2 ) )
+
+ return
+}
Thanks! It looks great!
I was thinking that we could return two vector results, one for the high and one for the low part of the interleave. However, I think keeping them together makes more sense at this level of abstraction.
It looks great! A few comments
> returning a new vector where the trailing dimension is twice the size.
>
> Note that for the n-D case this differs from the interleaving possible with
> `vector.shuffle`, which would only operate on the leading dimension.
I thought we didn't even support n-D cases in vector.shuffle
It does (as can be seen in the examples at: https://mlir.llvm.org/docs/Dialects/Vector/#vectorshuffle-vectorshuffleop), it just does not do what you'd want 😔
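To illustrate (a hypothetical example, not taken from the patch): on n-D inputs, the `vector.shuffle` mask indices select whole subvectors along the leading dimension of the concatenated operands, so the closest it gets to interleaving is zipping rows:

```mlir
// Interleaves the rows of %a and %b, not their trailing elements:
%0 = vector.shuffle %a, %b [0, 2, 1, 3] : vector<2x3xf32>, vector<2x3xf32>
// yields vector<4x3xf32> with rows a[0], b[0], a[1], b[1]
```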
> // It's not possible to unroll a scalable dimension.
> if (resultType.getScalableDims().front())
is this check correct? If so, could we make it easier to understand?
This is correct :)
This just looks at the scalable flag for the front dimension (which is the dimension that is unrolled below), and checks if it is true (i.e. scalable). If it is scalable it can't be unrolled.
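For context, a minimal sketch (not from the patch) of how that flag is read via MLIR's C++ API:

```c++
#include "mlir/IR/BuiltinTypes.h"

// getScalableDims() holds one bool per dimension; front() is the flag
// for the leading dimension, which is the one this pattern unrolls.
// A scalable leading dimension has no static extent, so it cannot be
// unrolled.
bool canUnrollLeadingDim(mlir::VectorType type) {
  return !type.getScalableDims().front();
}
```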
It would help if we added a comment here saying that we are unrolling only the leading dimension because this pattern is applied recursively. Even better, we could unroll all the dimensions at once, to avoid recursion and to avoid applying a partial unrolling when the rewrite may eventually fail. We could separate the base case into one pattern and the unrolling into another pattern if that makes it clearer. We follow that approach often.
I've split out the unrolling to `LowerVectorInterleave`. Note: an incomplete unrolling is not a bug. For example, if `vector.interleave %a, %b : vector<2x[4]x[4]xf32>` unrolled to `vector.interleave %a, %b : vector<[4]x[4]xf32>`, the general LLVM lowering can't handle that, but something like ArmSME could then lower it to something architecture-specific (see the sketch below). The current ArmSME lowerings rely on this quite a bit.
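A rough sketch of that scenario (hypothetical IR, assuming the current `vector.extract`/`vector.insert` assembly formats): unrolling the fixed leading dimension of the mixed case one step leaves rank-2 scalable interleaves behind:

```mlir
// One unrolling step of vector.interleave %a, %b : vector<2x[4]x[4]xf32>:
%cst = arith.constant dense<0.000000e+00> : vector<2x[4]x[8]xf32>
%a0 = vector.extract %a[0] : vector<[4]x[4]xf32> from vector<2x[4]x[4]xf32>
%b0 = vector.extract %b[0] : vector<[4]x[4]xf32> from vector<2x[4]x[4]xf32>
// This interleave has a scalable leading dim, so the generic LLVM
// lowering bails; a target such as ArmSME could lower it instead.
%i0 = vector.interleave %a0, %b0 : vector<[4]x[4]xf32>
%r0 = vector.insert %i0, %cst [0] : vector<[4]x[8]xf32> into vector<2x[4]x[8]xf32>
// ...and likewise for index 1.
```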
> func.func @fold_rank_1_vector_interleave(%arg0: vector<6xi32>, %arg1: vector<6xi32>) -> vector<12xi32> {
>   // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11] : vector<6xi32>, vector<6xi32>
>   // CHECK: return %[[ZIP]] : vector<12xi32>
>   %0 = vector.interleave %arg0, %arg1 : vector<6xi32>
Perhaps we should print the result type somehow? I know it can be inferred but some people prefer to make it easier to read and this op is not particularly verbose. WDYT?
> Perhaps we should print the result type somehow? I know it can be inferred but some people prefer to make it easier to read and this op is not particularly verbose. WDYT?
I think inference is nice from a builder perspective where building an op requires minimal info, but I do prefer if types are explicitly spelled out in the IR.
I personally find showing all the types here very repetitive, and less consistent with `vector.shuffle`, which also infers the result type.
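For comparison, a hypothetical explicit-result-type spelling (not what this PR implements) would read:

```mlir
%0 = vector.interleave %arg0, %arg1 : vector<6xi32> -> vector<12xi32>
```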
(Branch force-pushed from 3cc2fb2 to 795f4b8.)
I think this PR is not just adding the `vector.interleave` operation. From my point of view, it has three changes:

- Add the `vector.interleave` op.
- Add support for converting `vector.interleave` to the LLVM dialect.
- Add support for unrolling the `vector.interleave` op.

Ideally we should split it into three PRs, which would be easier for reviewers. It's okay for this one because it's already reviewed. Can you add more context to the PR description?
@MacDue I would also find it easier to review if this was split :) (sorry, been away and smaller patches just make it easier to catch up)
Would you both be okay with me just splitting this into multiple commits within this PR? I find it becomes much more of a hassle when I've got 3+ branches and PRs to manage.
New PRs:
I've left the later PRs as drafts as it's easier to review/land them in order :)
I'm not sure if splitting the PR made the review simpler or harder, given how badly GitHub deals with stacked PRs :_(