
[mlir][VectorOps] Add vector.interleave operation #80315


Closed
MacDue wants to merge 2 commits into main from add_vector.interleave

Conversation

MacDue (Member) commented Feb 1, 2024

The interleave operation constructs a new vector by interleaving the elements from the trailing (or final) dimension of two input vectors, returning a new vector where the trailing dimension is twice the size.

Note that for the n-D case this differs from the interleaving possible with vector.shuffle, which would only operate on the leading dimension.

Another key difference is this operation supports scalable vectors, though currently a general LLVM lowering is limited to the case where only the trailing dimension is scalable.

Example:

%0 = vector.interleave %a, %b
            : vector<[4]xi32>     ; yields vector<[8]xi32>
%1 = vector.interleave %c, %d
            : vector<8xi8>        ; yields vector<16xi8>
%2 = vector.interleave %e, %f
            : vector<f16>         ; yields vector<2xf16>
%3 = vector.interleave %g, %h
            : vector<2x4x[2]xf64> ; yields vector<2x4x[4]xf64>
%4 = vector.interleave %i, %j
            : vector<6x3xf32>     ; yields vector<6x6xf32>
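
For the rank-1 fixed-size case, the result is exactly expressible as a `vector.shuffle` with an alternating mask (the canonicalization in this patch produces that form); a minimal sketch for `vector<4xi32>` inputs:

```mlir
%0 = vector.interleave %a, %b : vector<4xi32>  // yields vector<8xi32>
// Equivalent shuffle: indices 0-3 select from %a, indices 4-7 from %b.
%1 = vector.shuffle %a, %b [0, 4, 1, 5, 2, 6, 3, 7]
         : vector<4xi32>, vector<4xi32>        // yields vector<8xi32>
```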

llvmbot (Member) commented Feb 1, 2024

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-sve

@llvm/pr-subscribers-mlir

Author: Benjamin Maxwell (MacDue)

Full diff: https://github.com/llvm/llvm-project/pull/80315.diff

7 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Vector/IR/VectorOps.td (+65)
  • (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (+66-1)
  • (modified) mlir/lib/Dialect/Vector/IR/VectorOps.cpp (+42)
  • (modified) mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir (+85)
  • (modified) mlir/test/Dialect/Vector/canonicalize.mlir (+23)
  • (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir (+25)
  • (added) mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir (+24)
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index fdf51f0173511..c891c1b978b54 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -478,6 +478,71 @@ def Vector_ShuffleOp :
   let hasCanonicalizer = 1;
 }
 
+def Vector_InterleaveOp :
+  Vector_Op<"interleave", [Pure,
+    AllTypesMatch<["lhs", "rhs"]>,
+    TypesMatchWith<
+    "type of 'result' is double the width of the inputs",
+    "lhs", "result",
+    [{
+      [&]() -> ::mlir::VectorType {
+        auto vectorType = ::llvm::cast<mlir::VectorType>($_self);
+        ::mlir::VectorType::Builder builder(vectorType);
+        if (vectorType.getRank() == 0) {
+          static constexpr int64_t v2xty_shape[] = { 2 };
+          return builder.setShape(v2xty_shape);
+        }
+        auto lastDim = vectorType.getRank() - 1;
+        return builder.setDim(lastDim, vectorType.getDimSize(lastDim) * 2);
+      }()
+    }]>]> {
+  let summary = "constructs a vector by interleaving two input vectors";
+  let description = [{
+    The interleave operation constructs a new vector by interleaving the
+    elements from the trailing (or final) dimension of two input vectors,
+    returning a new vector where the trailing dimension is twice the size.
+
+    Note that for the n-D case this differs from the interleaving possible with
+    `vector.shuffle`, which would only operate on the leading dimension.
+
+    Another key difference is this operation supports scalable vectors, though
+    currently a general LLVM lowering is limited to the case where only the
+    trailing dimension is scalable.
+
+    Example:
+    ```mlir
+    %0 = vector.interleave %a, %b
+               : vector<[4]xi32>     ; yields vector<[8]xi32>
+    %1 = vector.interleave %c, %d
+               : vector<8xi8>        ; yields vector<16xi8>
+    %2 = vector.interleave %e, %f
+               : vector<f16>         ; yields vector<2xf16>
+    %3 = vector.interleave %g, %h
+               : vector<2x4x[2]xf64> ; yields vector<2x4x[4]xf64>
+    %4 = vector.interleave %i, %j
+               : vector<6x3xf32>     ; yields vector<6x6xf32>
+    ```
+  }];
+
+  let arguments = (ins AnyVectorOfAnyRank:$lhs, AnyVectorOfAnyRank:$rhs);
+  let results = (outs AnyVector:$result);
+
+  let assemblyFormat = [{
+    $lhs `,` $rhs  attr-dict `:` type($lhs)
+  }];
+
+  let extraClassDeclaration = [{
+    VectorType getSourceVectorType() {
+      return ::llvm::cast<VectorType>(getLhs().getType());
+    }
+    VectorType getResultVectorType() {
+      return ::llvm::cast<VectorType>(getResult().getType());
+    }
+  }];
+
+  let hasCanonicalizer = 1;
+}
+
 def Vector_ExtractElementOp :
   Vector_Op<"extractelement", [Pure,
      TypesMatchWith<"result type matches element type of vector operand",
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index b66b55ae8d57f..4dc62608d1b92 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1734,6 +1734,70 @@ struct VectorSplatNdOpLowering : public ConvertOpToLLVMPattern<SplatOp> {
   }
 };
 
+struct VectorInterleaveOpLowering
+    : public ConvertOpToLLVMPattern<vector::InterleaveOp> {
+  using ConvertOpToLLVMPattern::ConvertOpToLLVMPattern;
+
+  void initialize() {
+    // This pattern recursively unpacks one dimension at a time. The recursion
+    // is bounded as the rank is strictly decreasing.
+    setHasBoundedRewriteRecursion();
+  }
+
+  LogicalResult
+  matchAndRewrite(vector::InterleaveOp interleaveOp, OpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    VectorType resultType = interleaveOp.getResultVectorType();
+
+    // If the result is rank 1, then this directly maps to LLVM.
+    if (resultType.getRank() == 1) {
+      if (resultType.isScalable()) {
+        rewriter.replaceOpWithNewOp<LLVM::experimental_vector_interleave2>(
+            interleaveOp, typeConverter->convertType(resultType),
+            adaptor.getLhs(), adaptor.getRhs());
+        return success();
+      }
+      // Lower fixed-size interleaves to a shufflevector. While the
+      // vector.interleave2 intrinsic supports fixed and scalable vectors, the
+      // LangRef still recommends that fixed vectors use shufflevector, see:
+      // https://llvm.org/docs/LangRef.html#id876.
+      int64_t resultVectorSize = resultType.getNumElements();
+      SmallVector<int32_t> interleaveShuffleMask;
+      interleaveShuffleMask.reserve(resultVectorSize);
+      for (int i = 0; i < resultVectorSize / 2; i++) {
+        interleaveShuffleMask.push_back(i);
+        interleaveShuffleMask.push_back((resultVectorSize / 2) + i);
+      }
+      rewriter.replaceOpWithNewOp<LLVM::ShuffleVectorOp>(
+          interleaveOp, adaptor.getLhs(), adaptor.getRhs(),
+          interleaveShuffleMask);
+      return success();
+    }
+
+    // It's not possible to unroll a scalable dimension.
+    if (resultType.getScalableDims().front())
+      return failure();
+
+    // n-D case: Unroll the leading dimension.
+    // This eventually converges to an LLVM lowering.
+    auto loc = interleaveOp.getLoc();
+    Value result = rewriter.create<arith::ConstantOp>(
+        loc, resultType, rewriter.getZeroAttr(resultType));
+    for (int d = 0; d < resultType.getDimSize(0); d++) {
+      Value extractLhs =
+          rewriter.create<ExtractOp>(loc, interleaveOp.getLhs(), d);
+      Value extractRhs =
+          rewriter.create<ExtractOp>(loc, interleaveOp.getRhs(), d);
+      Value dimInterleave =
+          rewriter.create<InterleaveOp>(loc, extractLhs, extractRhs);
+      result = rewriter.create<InsertOp>(loc, dimInterleave, result, d);
+    }
+
+    rewriter.replaceOp(interleaveOp, result);
+    return success();
+  }
+};
+
 } // namespace
 
 /// Populate the given list with patterns that convert from Vector to LLVM.
@@ -1758,7 +1822,8 @@ void mlir::populateVectorToLLVMConversionPatterns(
                VectorExpandLoadOpConversion, VectorCompressStoreOpConversion,
                VectorSplatOpLowering, VectorSplatNdOpLowering,
                VectorScalableInsertOpLowering, VectorScalableExtractOpLowering,
-               MaskedReductionOpConversion>(converter);
+               MaskedReductionOpConversion, VectorInterleaveOpLowering>(
+      converter);
   // Transfer ops with rank > 1 are handled by VectorToSCF.
   populateVectorTransferLoweringPatterns(patterns, /*maxTransferRank=*/1);
 }
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 452354413e883..8aabc35f4c265 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -6308,6 +6308,48 @@ bool WarpExecuteOnLane0Op::areTypesCompatible(Type lhs, Type rhs) {
       verifyDistributedType(lhs, rhs, getWarpSize(), getOperation()));
 }
 
+//===----------------------------------------------------------------------===//
+// InterleaveOp
+//===----------------------------------------------------------------------===//
+
+// The rank 1 case of vector.interleave on fixed-size vectors is equivalent to a
+// vector.shuffle, which (as an older op) is more likely to be matched by
+// existing pipelines.
+struct FoldRank1FixedSizeInterleaveOp : public OpRewritePattern<InterleaveOp> {
+  using OpRewritePattern::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(InterleaveOp interleaveOp,
+                                PatternRewriter &rewriter) const override {
+    auto resultType = interleaveOp.getResultVectorType();
+    if (resultType.getRank() != 1)
+      return rewriter.notifyMatchFailure(
+          interleaveOp, "cannot fold interleave with result rank > 1");
+
+    if (resultType.isScalable())
+      return rewriter.notifyMatchFailure(
+          interleaveOp, "cannot fold interleave of scalable vectors");
+
+    int64_t resultVectorSize = resultType.getNumElements();
+    SmallVector<int64_t> interleaveShuffleMask;
+    interleaveShuffleMask.reserve(resultVectorSize);
+    for (int i = 0; i < resultVectorSize / 2; i++) {
+      interleaveShuffleMask.push_back(i);
+      interleaveShuffleMask.push_back((resultVectorSize / 2) + i);
+    }
+
+    rewriter.replaceOpWithNewOp<ShuffleOp>(interleaveOp, interleaveOp.getLhs(),
+                                           interleaveOp.getRhs(),
+                                           interleaveShuffleMask);
+
+    return success();
+  }
+};
+
+void InterleaveOp::getCanonicalizationPatterns(RewritePatternSet &results,
+                                               MLIRContext *context) {
+  results.add<FoldRank1FixedSizeInterleaveOp>(context);
+}
+
 Value mlir::vector::makeArithReduction(OpBuilder &b, Location loc,
                                        CombiningKind kind, Value v1, Value acc,
                                        arith::FastMathFlagsAttr fastmath,
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
index 1c13b16dfd9af..3cbca65472fb6 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -2460,3 +2460,88 @@ func.func @make_fixed_vector_of_scalable_vector(%f : f64) -> vector<3x[2]xf64>
   %res = vector.broadcast %f : f64 to vector<3x[2]xf64>
   return %res : vector<3x[2]xf64>
 }
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_0d
+//  CHECK-SAME:     %[[LHS:.*]]: vector<i8>, %[[RHS:.*]]: vector<i8>)
+func.func @vector_interleave_0d(%a: vector<i8>, %b: vector<i8>) -> vector<2xi8> {
+  // CHECK: %[[LHS_RANK1:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<i8> to vector<1xi8>
+  // CHECK: %[[RHS_RANK1:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<i8> to vector<1xi8>
+  // CHECK: %[[ZIP:.*]] = llvm.shufflevector %[[LHS_RANK1]], %[[RHS_RANK1]] [0, 1] : vector<1xi8>
+  // CHECK: return %[[ZIP]]
+  %0 = vector.interleave %a, %b : vector<i8>
+  return %0 : vector<2xi8>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_1d
+//  CHECK-SAME:     %[[LHS:.*]]: vector<8xf32>, %[[RHS:.*]]: vector<8xf32>)
+func.func @vector_interleave_1d(%a: vector<8xf32>, %b: vector<8xf32>) -> vector<16xf32>
+{
+  // CHECK: %[[ZIP:.*]] = llvm.shufflevector %[[LHS]], %[[RHS]] [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>
+  // CHECK: return %[[ZIP]]
+  %0 = vector.interleave %a, %b : vector<8xf32>
+  return %0 : vector<16xf32>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_1d_scalable
+//  CHECK-SAME:     %[[LHS:.*]]: vector<[4]xi32>, %[[RHS:.*]]: vector<[4]xi32>)
+func.func @vector_interleave_1d_scalable(%a: vector<[4]xi32>, %b: vector<[4]xi32>) -> vector<[8]xi32>
+{
+  // CHECK: %[[ZIP:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS]], %[[RHS]]) : (vector<[4]xi32>, vector<[4]xi32>) -> vector<[8]xi32>
+  // CHECK: return %[[ZIP]]
+  %0 = vector.interleave %a, %b : vector<[4]xi32>
+  return %0 : vector<[8]xi32>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_2d
+//  CHECK-SAME:     %[[LHS:.*]]: vector<2x3xi8>, %[[RHS:.*]]: vector<2x3xi8>)
+func.func @vector_interleave_2d(%a: vector<2x3xi8>, %b: vector<2x3xi8>) -> vector<2x6xi8>
+{
+  // CHECK: %[[LHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<2x3xi8> to !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[RHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<2x3xi8> to !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<2x6xi8>
+  // CHECK: %[[CST_LLVM:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<2x6xi8> to !llvm.array<2 x vector<6xi8>>
+  // CHECK: %[[LHS_DIM_0:.*]] = llvm.extractvalue %[[LHS_LLVM]][0] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[RHS_DIM_0:.*]] = llvm.extractvalue %[[RHS_LLVM]][0] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[ZIP_DIM_0:.*]] = llvm.shufflevector %[[LHS_DIM_0]], %[[RHS_DIM_0]] [0, 3, 1, 4, 2, 5] : vector<3xi8>
+  // CHECK: %[[RES_0:.*]] = llvm.insertvalue %[[ZIP_DIM_0]], %[[CST_LLVM]][0] : !llvm.array<2 x vector<6xi8>>
+  // CHECK: %[[LHS_DIM_1:.*]] = llvm.extractvalue %[[LHS_LLVM]][1] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[RHS_DIM_1:.*]] = llvm.extractvalue %[[RHS_LLVM]][1] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[ZIP_DIM_1:.*]] = llvm.shufflevector %[[LHS_DIM_1]], %[[RHS_DIM_1]] [0, 3, 1, 4, 2, 5] : vector<3xi8>
+  // CHECK: %[[RES_1:.*]] = llvm.insertvalue %[[ZIP_DIM_1]], %[[RES_0]][1] : !llvm.array<2 x vector<6xi8>>
+  // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x vector<6xi8>> to vector<2x6xi8>
+  // CHECK: return %[[RES]]
+  %0 = vector.interleave %a, %b : vector<2x3xi8>
+  return %0 : vector<2x6xi8>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_2d_scalable
+//  CHECK-SAME:     %[[LHS:.*]]: vector<2x[8]xi16>, %[[RHS:.*]]: vector<2x[8]xi16>)
+func.func @vector_interleave_2d_scalable(%a: vector<2x[8]xi16>, %b: vector<2x[8]xi16>) -> vector<2x[16]xi16>
+{
+  // CHECK: %[[LHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<2x[8]xi16> to !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[RHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<2x[8]xi16> to !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<2x[16]xi16>
+  // CHECK: %[[CST_LLVM:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<2x[16]xi16> to !llvm.array<2 x vector<[16]xi16>>
+  // CHECK: %[[LHS_DIM_0:.*]] = llvm.extractvalue %[[LHS_LLVM]][0] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[RHS_DIM_0:.*]] = llvm.extractvalue %[[RHS_LLVM]][0] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[ZIP_DIM_0:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS_DIM_0]], %[[RHS_DIM_0]]) : (vector<[8]xi16>, vector<[8]xi16>) -> vector<[16]xi16>
+  // CHECK: %[[RES_0:.*]] = llvm.insertvalue %[[ZIP_DIM_0]], %[[CST_LLVM]][0] : !llvm.array<2 x vector<[16]xi16>>
+  // CHECK: %[[LHS_DIM_1:.*]] = llvm.extractvalue %[[LHS_LLVM]][1] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[RHS_DIM_1:.*]] = llvm.extractvalue %[[RHS_LLVM]][1] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[ZIP_DIM_1:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS_DIM_1]], %[[RHS_DIM_1]]) : (vector<[8]xi16>, vector<[8]xi16>) -> vector<[16]xi16>
+  // CHECK: %[[RES_1:.*]] = llvm.insertvalue %[[ZIP_DIM_1]], %[[RES_0]][1] : !llvm.array<2 x vector<[16]xi16>>
+  // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x vector<[16]xi16>> to vector<2x[16]xi16>
+  // CHECK: return %[[RES]]
+  %0 = vector.interleave %a, %b : vector<2x[8]xi16>
+  return %0 : vector<2x[16]xi16>
+}
diff --git a/mlir/test/Dialect/Vector/canonicalize.mlir b/mlir/test/Dialect/Vector/canonicalize.mlir
index e6f045e12e519..490ee6a462c6a 100644
--- a/mlir/test/Dialect/Vector/canonicalize.mlir
+++ b/mlir/test/Dialect/Vector/canonicalize.mlir
@@ -2567,3 +2567,26 @@ func.func @load_store_forwarding_rank_mismatch(%v0: vector<4x1x1xf32>, %arg0: te
       tensor<4x4x4xf32>, vector<1x100x4x5xf32>
   return %r : vector<1x100x4x5xf32>
 }
+
+// -----
+
+// CHECK-LABEL: func.func @fold_rank_1_vector_interleave(
+//  CHECK-SAME:     %[[LHS:.*]]: vector<6xi32>, %[[RHS:.*]]: vector<6xi32>)
+func.func @fold_rank_1_vector_interleave(%arg0: vector<6xi32>, %arg1: vector<6xi32>) -> vector<12xi32> {
+  // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11] : vector<6xi32>, vector<6xi32>
+  // CHECK: return %[[ZIP]] : vector<12xi32>
+  %0 = vector.interleave %arg0, %arg1 : vector<6xi32>
+  return %0 : vector<12xi32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @fold_rank_0_vector_interleave(
+//  CHECK-SAME:     %[[LHS:.*]]: vector<f64>, %[[RHS:.*]]: vector<f64>)
+func.func @fold_rank_0_vector_interleave(%arg0: vector<f64>, %arg1: vector<f64>) -> vector<2xf64>
+{
+  // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 1] : vector<f64>, vector<f64>
+  // CHECK: return %[[ZIP]] : vector<2xf64>
+  %0 = vector.interleave %arg0, %arg1 : vector<f64>
+  return %0 : vector<2xf64>
+}
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir
new file mode 100644
index 0000000000000..58dd3d700beff
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir
@@ -0,0 +1,25 @@
+// RUN: mlir-opt %s -test-lower-to-llvm | \
+// RUN: %mcr_aarch64_cmd -e entry -entry-point-result=void  \
+// RUN:   -shared-libs=%mlir_c_runner_utils,%mlir_arm_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @entry() {
+  %f1 = arith.constant 1.0: f32
+  %f2 = arith.constant 2.0: f32
+  %v1 = vector.splat %f1 : vector<[4]xf32>
+  %v2 = vector.splat %f2 :  vector<[4]xf32>
+  vector.print %v1 : vector<[4]xf32>
+  vector.print %v2 : vector<[4]xf32>
+  //
+  // Test vectors:
+  //
+  // CHECK: ( 1, 1, 1, 1
+  // CHECK: ( 2, 2, 2, 2
+
+  %v3 = vector.interleave %v1, %v2 : vector<[4]xf32>
+  vector.print %v3 : vector<[8]xf32>
+  // CHECK: ( 1, 2, 1, 2, 1, 2, 1, 2
+
+  return
+}
+
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir b/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir
new file mode 100644
index 0000000000000..c6dd6287208d4
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir
@@ -0,0 +1,24 @@
+// RUN: mlir-opt %s -test-lower-to-llvm | \
+// RUN: mlir-cpu-runner -e entry -entry-point-result=void  \
+// RUN:   -shared-libs=%mlir_c_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @entry() {
+  %f1 = arith.constant 1.0: f32
+  %f2 = arith.constant 2.0: f32
+  %v1 = vector.splat %f1 : vector<2x4xf32>
+  %v2 = vector.splat %f2 :  vector<2x4xf32>
+  vector.print %v1 : vector<2x4xf32>
+  vector.print %v2 : vector<2x4xf32>
+  //
+  // Test vectors:
+  //
+  // CHECK: ( ( 1, 1, 1, 1 ), ( 1, 1, 1, 1 ) )
+  // CHECK: ( ( 2, 2, 2, 2 ), ( 2, 2, 2, 2 ) )
+
+  %v3 = vector.interleave %v1, %v2 : vector<2x4xf32>
+  vector.print %v3 : vector<2x8xf32>
+  // CHECK: ( ( 1, 2, 1, 2, 1, 2, 1, 2 ), ( 1, 2, 1, 2, 1, 2, 1, 2 ) )
+
+  return
+}

dcaballe (Contributor) commented Feb 1, 2024

Thanks! It looks great!

%0 = vector.interleave %a, %b
: vector<[4]xi32> ; yields vector<[8]xi32>

I was thinking that we could return two vector results, one for the high and one for the low part of the interleave. However, I think keeping them together makes more sense at this level of abstraction.

dcaballe (Contributor) left a comment

It looks great! A few comments

returning a new vector where the trailing dimension is twice the size.

Note that for the n-D case this differs from the interleaving possible with
`vector.shuffle`, which would only operate on the leading dimension.
Contributor

I thought we didn't even support n-D cases in vector.shuffle

MacDue (Member Author)

It does (see the examples at https://mlir.llvm.org/docs/Dialects/Vector/#vectorshuffle-vectorshuffleop); it just does not do what you'd want 😔

}

// It's not possible to unroll a scalable dimension.
if (resultType.getScalableDims().front())
Contributor

is this check correct? If so, could we make it easier to understand?

MacDue (Member Author)

This is correct :)
It just looks at the scalable flag of the front dimension (the dimension that is unrolled below) and checks whether it is true (i.e. scalable). If it is scalable, it can't be unrolled.
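
To make that concrete, a sketch (types invented for illustration) of which cases the unrolling step accepts:

```mlir
// Fixed leading dim: can be unrolled into 2 rank-1 interleaves.
%0 = vector.interleave %a, %b : vector<2x[4]xf32>  // yields vector<2x[8]xf32>
// Scalable leading dim: getScalableDims().front() is true, so the
// conversion pattern returns failure rather than unrolling.
%1 = vector.interleave %c, %d : vector<[2]x4xf32>  // yields vector<[2]x8xf32>
```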

Contributor

It would help to add a comment here saying that we are unrolling only the leading dimension because this pattern is applied recursively. Even better, we could unroll all the dimensions at once, avoiding both the recursion and partial unrolling when the rewrite may eventually fail. We could separate the base case into one pattern and the unrolling into another if that makes it clearer; we follow that approach often.

MacDue (Member Author) commented Feb 6, 2024

I've split out the unrolling to LowerVectorInterleave. Note: an incomplete unrolling is not a bug; e.g. if vector.interleave %a, %b : vector<2x[4]x[4]xf32> unrolled to vector.interleave %a, %b : vector<[4]x[4]xf32>, the general LLVM lowering can't handle that, but something like ArmSME could then lower it to something architecture-specific. The current ArmSME lowerings rely on this quite a bit.
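
A sketch of the first unrolling step for that example (illustrative IR, assuming the current `vector.extract`/`vector.insert` syntax; `%acc` stands for the zero-initialized accumulator the pattern creates):

```mlir
// Unrolling vector.interleave %a, %b : vector<2x[4]x[4]xf32> peels the
// fixed leading dim, leaving rank-2 interleaves for a later pattern
// (e.g. an ArmSME lowering) to handle:
%a0 = vector.extract %a[0] : vector<[4]x[4]xf32> from vector<2x[4]x[4]xf32>
%b0 = vector.extract %b[0] : vector<[4]x[4]xf32> from vector<2x[4]x[4]xf32>
%i0 = vector.interleave %a0, %b0 : vector<[4]x[4]xf32>  // yields vector<[4]x[8]xf32>
%r0 = vector.insert %i0, %acc [0] : vector<[4]x[8]xf32> into vector<2x[4]x[8]xf32>
// ...and likewise for index 1.
```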

func.func @fold_rank_1_vector_interleave(%arg0: vector<6xi32>, %arg1: vector<6xi32>) -> vector<12xi32> {
// CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11] : vector<6xi32>, vector<6xi32>
// CHECK: return %[[ZIP]] : vector<12xi32>
%0 = vector.interleave %arg0, %arg1 : vector<6xi32>
Contributor

Perhaps we should print the result type somehow? I know it can be inferred but some people prefer to make it easier to read and this op is not particularly verbose. WDYT?

Collaborator

Perhaps we should print the result type somehow? I know it can be inferred but some people prefer to make it easier to read and this op is not particularly verbose. WDYT?

I think inference is nice from a builder perspective where building an op requires minimal info, but I do prefer if types are explicitly spelled out in the IR.

MacDue (Member Author)

I personally find showing all the types here very repetitive, and less consistent with vector.shuffle which also infers the result.
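
For reference, the two spellings under discussion (the second is hypothetical; only the first is valid with this patch):

```mlir
// Inferred result type, as implemented (matching vector.shuffle's style):
%0 = vector.interleave %arg0, %arg1 : vector<6xi32>
// Hypothetical explicit form proposed above:
%1 = vector.interleave %arg0, %arg1 : vector<6xi32> -> vector<12xi32>
```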


- Remove vector.interleave -> vector.shuffle canonicalization
- Add vector.shuffle -> vector.interleave canonicalization
- Split vector.interleave unrolling and LLVM lowering
  - Unrolling now done in LowerVectorInterleave.cpp
- Add missing tests to vector ops.mlir
- Fixed a few nits
MacDue force-pushed the add_vector.interleave branch from 3cc2fb2 to 795f4b8 on February 6, 2024 12:34
MacDue requested a review from hanhanW as a code owner on February 6, 2024 12:34
hanhanW (Contributor) left a comment

I think this PR is not just adding the vector.interleave operation. From my point of view, it has three changes:

  1. Add the vector.interleave op.
  2. Add support for converting vector.interleave to the LLVM dialect.
  3. Add support for unrolling the vector.interleave op.

Ideally we should split it into three PRs, which would be easier for reviewers. It's okay for this one because it's already been reviewed. Can you add more context to the PR description?

banach-space (Contributor)

I think this PR is not just adding the vector.interleave operation. From my point of view, it has three changes:

  1. Add the vector.interleave op.
  2. Add support for converting vector.interleave to the LLVM dialect.
  3. Add support for unrolling the vector.interleave op.

Ideally we should split it into three PRs, which would be easier for reviewers. It's okay for this one because it's already been reviewed. Can you add more context to the PR description?

@MacDue I would also find it easier to review if this was split :) (sorry, I've been away, and smaller patches just make it easier to catch up)

MacDue (Member Author) commented Feb 6, 2024

Would you both be okay with me just splitting this into multiple commits within this PR? I find it becomes much more of a hassle when I've got 3+ branches and PRs to manage.

MacDue closed this Feb 7, 2024
MacDue deleted the add_vector.interleave branch on February 7, 2024 10:28
dcaballe (Contributor) left a comment

I'm not sure if splitting the PR made the review simpler or harder, given how badly GitHub deals with stacked PRs :_(
