
[mlir][VectorOps] Add vector.interleave operation #80315


Closed
MacDue wants to merge 2 commits into main from add_vector.interleave

Conversation

MacDue (Member) commented Feb 1, 2024

The interleave operation constructs a new vector by interleaving the elements from the trailing (or final) dimension of two input vectors, returning a new vector where the trailing dimension is twice the size.

Note that for the n-D case this differs from the interleaving possible with vector.shuffle, which would only operate on the leading dimension.

Another key difference is this operation supports scalable vectors, though currently a general LLVM lowering is limited to the case where only the trailing dimension is scalable.

Example:

%0 = vector.interleave %a, %b
            : vector<[4]xi32>     ; yields vector<[8]xi32>
%1 = vector.interleave %c, %d
            : vector<8xi8>        ; yields vector<16xi8>
%2 = vector.interleave %e, %f
            : vector<f16>         ; yields vector<2xf16>
%3 = vector.interleave %g, %h
            : vector<2x4x[2]xf64> ; yields vector<2x4x[4]xf64>
%4 = vector.interleave %i, %j
            : vector<6x3xf32>     ; yields vector<6x6xf32>
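
For the rank-1 fixed-size case, the result is exactly expressible as a `vector.shuffle` with an alternating mask (the canonicalization in this patch produces that form); a minimal sketch for `vector<4xi32>` inputs:

```mlir
%0 = vector.interleave %a, %b : vector<4xi32>  // yields vector<8xi32>
// Equivalent shuffle: indices 0-3 select from %a, indices 4-7 from %b.
%1 = vector.shuffle %a, %b [0, 4, 1, 5, 2, 6, 3, 7]
         : vector<4xi32>, vector<4xi32>        // yields vector<8xi32>
```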

llvmbot (Member) commented Feb 1, 2024

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-sve

@llvm/pr-subscribers-mlir

Author: Benjamin Maxwell (MacDue)

Full diff: https://github.com/llvm/llvm-project/pull/80315.diff

7 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Vector/IR/VectorOps.td (+65)
  • (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (+66-1)
  • (modified) mlir/lib/Dialect/Vector/IR/VectorOps.cpp (+42)
  • (modified) mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir (+85)
  • (modified) mlir/test/Dialect/Vector/canonicalize.mlir (+23)
  • (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir (+25)
  • (added) mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir (+24)
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index fdf51f0173511..c891c1b978b54 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -478,6 +478,71 @@ def Vector_ShuffleOp :
   let hasCanonicalizer = 1;
 }
 
+def Vector_InterleaveOp :
+  Vector_Op<"interleave", [Pure,
+    AllTypesMatch<["lhs", "rhs"]>,
+    TypesMatchWith<
+    "type of 'result' is double the width of the inputs",
+    "lhs", "result",
+    [{
+      [&]() -> ::mlir::VectorType {
+        auto vectorType = ::llvm::cast<mlir::VectorType>($_self);
+        ::mlir::VectorType::Builder builder(vectorType);
+        if (vectorType.getRank() == 0) {
+          static constexpr int64_t v2xty_shape[] = { 2 };
+          return builder.setShape(v2xty_shape);
+        }
+        auto lastDim = vectorType.getRank() - 1;
+        return builder.setDim(lastDim, vectorType.getDimSize(lastDim) * 2);
+      }()
+    }]>]> {
+  let summary = "constructs a vector by interleaving two input vectors";
+  let description = [{
+    The interleave operation constructs a new vector by interleaving the
+    elements from the trailing (or final) dimension of two input vectors,
+    returning a new vector where the trailing dimension is twice the size.
+
+    Note that for the n-D case this differs from the interleaving possible with
+    `vector.shuffle`, which would only operate on the leading dimension.
+
+    Another key difference is this operation supports scalable vectors, though
+    currently a general LLVM lowering is limited to the case where only the
+    trailing dimension is scalable.
+
+    Example:
+    ```mlir
+    %0 = vector.interleave %a, %b
+               : vector<[4]xi32>     ; yields vector<[8]xi32>
+    %1 = vector.interleave %c, %d
+               : vector<8xi8>        ; yields vector<16xi8>
+    %2 = vector.interleave %e, %f
+               : vector<f16>         ; yields vector<2xf16>
+    %3 = vector.interleave %g, %h
+               : vector<2x4x[2]xf64> ; yields vector<2x4x[4]xf64>
+    %4 = vector.interleave %i, %j
+               : vector<6x3xf32>     ; yields vector<6x6xf32>
+    ```
+  }];
+
+  let arguments = (ins AnyVectorOfAnyRank:$lhs, AnyVectorOfAnyRank:$rhs);
+  let results = (outs AnyVector:$result);
+
+  let assemblyFormat = [{
+    $lhs `,` $rhs  attr-dict `:` type($lhs)
+  }];
+
+  let extraClassDeclaration = [{
+    VectorType getSourceVectorType() {
+      return ::llvm::cast<VectorType>(getLhs().getType());
+    }
+    VectorType getResultVectorType() {
+      return ::llvm::cast<VectorType>(getResult().getType());
+    }
+  }];
+
+  let hasCanonicalizer = 1;
+}
+
 def Vector_ExtractElementOp :
   Vector_Op<"extractelement", [Pure,
      TypesMatchWith<"result type matches element type of vector operand",
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index b66b55ae8d57f..4dc62608d1b92 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1734,6 +1734,70 @@ struct VectorSplatNdOpLowering : public ConvertOpToLLVMPattern<SplatOp> {
   }
 };
 
+struct VectorInterleaveOpLowering
+    : public ConvertOpToLLVMPattern<vector::InterleaveOp> {
+  using ConvertOpToLLVMPattern::ConvertOpToLLVMPattern;
+
+  void initialize() {
+    // This pattern recursively unpacks one dimension at a time. The recursion
+    // is bounded as the rank is strictly decreasing.
+    setHasBoundedRewriteRecursion();
+  }
+
+  LogicalResult
+  matchAndRewrite(vector::InterleaveOp interleaveOp, OpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    VectorType resultType = interleaveOp.getResultVectorType();
+
+    // If the result is rank 1, then this directly maps to LLVM.
+    if (resultType.getRank() == 1) {
+      if (resultType.isScalable()) {
+        rewriter.replaceOpWithNewOp<LLVM::experimental_vector_interleave2>(
+            interleaveOp, typeConverter->convertType(resultType),
+            adaptor.getLhs(), adaptor.getRhs());
+        return success();
+      }
+      // Lower fixed-size interleaves to a shufflevector. While the
+      // vector.interleave2 intrinsic supports fixed and scalable vectors, the
+      // LangRef still recommends that fixed vectors use shufflevector, see:
+      // https://llvm.org/docs/LangRef.html#id876.
+      int64_t resultVectorSize = resultType.getNumElements();
+      SmallVector<int32_t> interleaveShuffleMask;
+      interleaveShuffleMask.reserve(resultVectorSize);
+      for (int i = 0; i < resultVectorSize / 2; i++) {
+        interleaveShuffleMask.push_back(i);
+        interleaveShuffleMask.push_back((resultVectorSize / 2) + i);
+      }
+      rewriter.replaceOpWithNewOp<LLVM::ShuffleVectorOp>(
+          interleaveOp, adaptor.getLhs(), adaptor.getRhs(),
+          interleaveShuffleMask);
+      return success();
+    }
+
+    // It's not possible to unroll a scalable dimension.
+    if (resultType.getScalableDims().front())
+      return failure();
+
+    // n-D case: Unroll the leading dimension.
+    // This eventually converges to an LLVM lowering.
+    auto loc = interleaveOp.getLoc();
+    Value result = rewriter.create<arith::ConstantOp>(
+        loc, resultType, rewriter.getZeroAttr(resultType));
+    for (int d = 0; d < resultType.getDimSize(0); d++) {
+      Value extractLhs =
+          rewriter.create<ExtractOp>(loc, interleaveOp.getLhs(), d);
+      Value extractRhs =
+          rewriter.create<ExtractOp>(loc, interleaveOp.getRhs(), d);
+      Value dimInterleave =
+          rewriter.create<InterleaveOp>(loc, extractLhs, extractRhs);
+      result = rewriter.create<InsertOp>(loc, dimInterleave, result, d);
+    }
+
+    rewriter.replaceOp(interleaveOp, result);
+    return success();
+  }
+};
+
 } // namespace
 
 /// Populate the given list with patterns that convert from Vector to LLVM.
@@ -1758,7 +1822,8 @@ void mlir::populateVectorToLLVMConversionPatterns(
                VectorExpandLoadOpConversion, VectorCompressStoreOpConversion,
                VectorSplatOpLowering, VectorSplatNdOpLowering,
                VectorScalableInsertOpLowering, VectorScalableExtractOpLowering,
-               MaskedReductionOpConversion>(converter);
+               MaskedReductionOpConversion, VectorInterleaveOpLowering>(
+      converter);
   // Transfer ops with rank > 1 are handled by VectorToSCF.
   populateVectorTransferLoweringPatterns(patterns, /*maxTransferRank=*/1);
 }
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index 452354413e883..8aabc35f4c265 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -6308,6 +6308,48 @@ bool WarpExecuteOnLane0Op::areTypesCompatible(Type lhs, Type rhs) {
       verifyDistributedType(lhs, rhs, getWarpSize(), getOperation()));
 }
 
+//===----------------------------------------------------------------------===//
+// InterleaveOp
+//===----------------------------------------------------------------------===//
+
+// The rank 1 case of vector.interleave on fixed-size vectors is equivalent to a
+// vector.shuffle, which (as an older op) is more likely to be matched by
+// existing pipelines.
+struct FoldRank1FixedSizeInterleaveOp : public OpRewritePattern<InterleaveOp> {
+  using OpRewritePattern::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(InterleaveOp interleaveOp,
+                                PatternRewriter &rewriter) const override {
+    auto resultType = interleaveOp.getResultVectorType();
+    if (resultType.getRank() != 1)
+      return rewriter.notifyMatchFailure(
+          interleaveOp, "cannot fold interleave with result rank > 1");
+
+    if (resultType.isScalable())
+      return rewriter.notifyMatchFailure(
+          interleaveOp, "cannot fold interleave of scalable vectors");
+
+    int64_t resultVectorSize = resultType.getNumElements();
+    SmallVector<int64_t> interleaveShuffleMask;
+    interleaveShuffleMask.reserve(resultVectorSize);
+    for (int i = 0; i < resultVectorSize / 2; i++) {
+      interleaveShuffleMask.push_back(i);
+      interleaveShuffleMask.push_back((resultVectorSize / 2) + i);
+    }
+
+    rewriter.replaceOpWithNewOp<ShuffleOp>(interleaveOp, interleaveOp.getLhs(),
+                                           interleaveOp.getRhs(),
+                                           interleaveShuffleMask);
+
+    return success();
+  }
+};
+
+void InterleaveOp::getCanonicalizationPatterns(RewritePatternSet &results,
+                                               MLIRContext *context) {
+  results.add<FoldRank1FixedSizeInterleaveOp>(context);
+}
+
 Value mlir::vector::makeArithReduction(OpBuilder &b, Location loc,
                                        CombiningKind kind, Value v1, Value acc,
                                        arith::FastMathFlagsAttr fastmath,
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
index 1c13b16dfd9af..3cbca65472fb6 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -2460,3 +2460,88 @@ func.func @make_fixed_vector_of_scalable_vector(%f : f64) -> vector<3x[2]xf64>
   %res = vector.broadcast %f : f64 to vector<3x[2]xf64>
   return %res : vector<3x[2]xf64>
 }
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_0d
+//  CHECK-SAME:     %[[LHS:.*]]: vector<i8>, %[[RHS:.*]]: vector<i8>)
+func.func @vector_interleave_0d(%a: vector<i8>, %b: vector<i8>) -> vector<2xi8> {
+  // CHECK: %[[LHS_RANK1:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<i8> to vector<1xi8>
+  // CHECK: %[[RHS_RANK1:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<i8> to vector<1xi8>
+  // CHECK: %[[ZIP:.*]] = llvm.shufflevector %[[LHS_RANK1]], %[[RHS_RANK1]] [0, 1] : vector<1xi8>
+  // CHECK: return %[[ZIP]]
+  %0 = vector.interleave %a, %b : vector<i8>
+  return %0 : vector<2xi8>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_1d
+//  CHECK-SAME:     %[[LHS:.*]]: vector<8xf32>, %[[RHS:.*]]: vector<8xf32>)
+func.func @vector_interleave_1d(%a: vector<8xf32>, %b: vector<8xf32>) -> vector<16xf32>
+{
+  // CHECK: %[[ZIP:.*]] = llvm.shufflevector %[[LHS]], %[[RHS]] [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15] : vector<8xf32>
+  // CHECK: return %[[ZIP]]
+  %0 = vector.interleave %a, %b : vector<8xf32>
+  return %0 : vector<16xf32>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_1d_scalable
+//  CHECK-SAME:     %[[LHS:.*]]: vector<[4]xi32>, %[[RHS:.*]]: vector<[4]xi32>)
+func.func @vector_interleave_1d_scalable(%a: vector<[4]xi32>, %b: vector<[4]xi32>) -> vector<[8]xi32>
+{
+  // CHECK: %[[ZIP:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS]], %[[RHS]]) : (vector<[4]xi32>, vector<[4]xi32>) -> vector<[8]xi32>
+  // CHECK: return %[[ZIP]]
+  %0 = vector.interleave %a, %b : vector<[4]xi32>
+  return %0 : vector<[8]xi32>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_2d
+//  CHECK-SAME:     %[[LHS:.*]]: vector<2x3xi8>, %[[RHS:.*]]: vector<2x3xi8>)
+func.func @vector_interleave_2d(%a: vector<2x3xi8>, %b: vector<2x3xi8>) -> vector<2x6xi8>
+{
+  // CHECK: %[[LHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<2x3xi8> to !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[RHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<2x3xi8> to !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<2x6xi8>
+  // CHECK: %[[CST_LLVM:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<2x6xi8> to !llvm.array<2 x vector<6xi8>>
+  // CHECK: %[[LHS_DIM_0:.*]] = llvm.extractvalue %[[LHS_LLVM]][0] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[RHS_DIM_0:.*]] = llvm.extractvalue %[[RHS_LLVM]][0] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[ZIP_DIM_0:.*]] = llvm.shufflevector %[[LHS_DIM_0]], %[[RHS_DIM_0]] [0, 3, 1, 4, 2, 5] : vector<3xi8>
+  // CHECK: %[[RES_0:.*]] = llvm.insertvalue %[[ZIP_DIM_0]], %[[CST_LLVM]][0] : !llvm.array<2 x vector<6xi8>>
+  // CHECK: %[[LHS_DIM_1:.*]] = llvm.extractvalue %[[LHS_LLVM]][1] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[RHS_DIM_1:.*]] = llvm.extractvalue %[[RHS_LLVM]][1] : !llvm.array<2 x vector<3xi8>>
+  // CHECK: %[[ZIP_DIM_1:.*]] = llvm.shufflevector %[[LHS_DIM_1]], %[[RHS_DIM_1]] [0, 3, 1, 4, 2, 5] : vector<3xi8>
+  // CHECK: %[[RES_1:.*]] = llvm.insertvalue %[[ZIP_DIM_1]], %[[RES_0]][1] : !llvm.array<2 x vector<6xi8>>
+  // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x vector<6xi8>> to vector<2x6xi8>
+  // CHECK: return %[[RES]]
+  %0 = vector.interleave %a, %b : vector<2x3xi8>
+  return %0 : vector<2x6xi8>
+}
+
+// -----
+
+// CHECK-LABEL: @vector_interleave_2d_scalable
+//  CHECK-SAME:     %[[LHS:.*]]: vector<2x[8]xi16>, %[[RHS:.*]]: vector<2x[8]xi16>)
+func.func @vector_interleave_2d_scalable(%a: vector<2x[8]xi16>, %b: vector<2x[8]xi16>) -> vector<2x[16]xi16>
+{
+  // CHECK: %[[LHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[LHS]] : vector<2x[8]xi16> to !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[RHS_LLVM:.*]] = builtin.unrealized_conversion_cast %[[RHS]] : vector<2x[8]xi16> to !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[CST:.*]] = arith.constant dense<0> : vector<2x[16]xi16>
+  // CHECK: %[[CST_LLVM:.*]] = builtin.unrealized_conversion_cast %[[CST]] : vector<2x[16]xi16> to !llvm.array<2 x vector<[16]xi16>>
+  // CHECK: %[[LHS_DIM_0:.*]] = llvm.extractvalue %[[LHS_LLVM]][0] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[RHS_DIM_0:.*]] = llvm.extractvalue %[[RHS_LLVM]][0] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[ZIP_DIM_0:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS_DIM_0]], %[[RHS_DIM_0]]) : (vector<[8]xi16>, vector<[8]xi16>) -> vector<[16]xi16>
+  // CHECK: %[[RES_0:.*]] = llvm.insertvalue %[[ZIP_DIM_0]], %[[CST_LLVM]][0] : !llvm.array<2 x vector<[16]xi16>>
+  // CHECK: %[[LHS_DIM_1:.*]] = llvm.extractvalue %[[LHS_LLVM]][1] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[RHS_DIM_1:.*]] = llvm.extractvalue %[[RHS_LLVM]][1] : !llvm.array<2 x vector<[8]xi16>>
+  // CHECK: %[[ZIP_DIM_1:.*]] = "llvm.intr.experimental.vector.interleave2"(%[[LHS_DIM_1]], %[[RHS_DIM_1]]) : (vector<[8]xi16>, vector<[8]xi16>) -> vector<[16]xi16>
+  // CHECK: %[[RES_1:.*]] = llvm.insertvalue %[[ZIP_DIM_1]], %[[RES_0]][1] : !llvm.array<2 x vector<[16]xi16>>
+  // CHECK: %[[RES:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x vector<[16]xi16>> to vector<2x[16]xi16>
+  // CHECK: return %[[RES]]
+  %0 = vector.interleave %a, %b : vector<2x[8]xi16>
+  return %0 : vector<2x[16]xi16>
+}
diff --git a/mlir/test/Dialect/Vector/canonicalize.mlir b/mlir/test/Dialect/Vector/canonicalize.mlir
index e6f045e12e519..490ee6a462c6a 100644
--- a/mlir/test/Dialect/Vector/canonicalize.mlir
+++ b/mlir/test/Dialect/Vector/canonicalize.mlir
@@ -2567,3 +2567,26 @@ func.func @load_store_forwarding_rank_mismatch(%v0: vector<4x1x1xf32>, %arg0: te
       tensor<4x4x4xf32>, vector<1x100x4x5xf32>
   return %r : vector<1x100x4x5xf32>
 }
+
+// -----
+
+// CHECK-LABEL: func.func @fold_rank_1_vector_interleave(
+//  CHECK-SAME:     %[[LHS:.*]]: vector<6xi32>, %[[RHS:.*]]: vector<6xi32>)
+func.func @fold_rank_1_vector_interleave(%arg0: vector<6xi32>, %arg1: vector<6xi32>) -> vector<12xi32> {
+  // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11] : vector<6xi32>, vector<6xi32>
+  // CHECK: return %[[ZIP]] : vector<12xi32>
+  %0 = vector.interleave %arg0, %arg1 : vector<6xi32>
+  return %0 : vector<12xi32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @fold_rank_0_vector_interleave(
+//  CHECK-SAME:     %[[LHS:.*]]: vector<f64>, %[[RHS:.*]]: vector<f64>)
+func.func @fold_rank_0_vector_interleave(%arg0: vector<f64>, %arg1: vector<f64>) -> vector<2xf64>
+{
+  // CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 1] : vector<f64>, vector<f64>
+  // CHECK: return %[[ZIP]] : vector<2xf64>
+  %0 = vector.interleave %arg0, %arg1 : vector<f64>
+  return %0 : vector<2xf64>
+}
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir b/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir
new file mode 100644
index 0000000000000..58dd3d700beff
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/test-scalable-interleave.mlir
@@ -0,0 +1,25 @@
+// RUN: mlir-opt %s -test-lower-to-llvm | \
+// RUN: %mcr_aarch64_cmd -e entry -entry-point-result=void  \
+// RUN:   -shared-libs=%mlir_c_runner_utils,%mlir_arm_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @entry() {
+  %f1 = arith.constant 1.0: f32
+  %f2 = arith.constant 2.0: f32
+  %v1 = vector.splat %f1 : vector<[4]xf32>
+  %v2 = vector.splat %f2 :  vector<[4]xf32>
+  vector.print %v1 : vector<[4]xf32>
+  vector.print %v2 : vector<[4]xf32>
+  //
+  // Test vectors:
+  //
+  // CHECK: ( 1, 1, 1, 1
+  // CHECK: ( 2, 2, 2, 2
+
+  %v3 = vector.interleave %v1, %v2 : vector<[4]xf32>
+  vector.print %v3 : vector<[8]xf32>
+  // CHECK: ( 1, 2, 1, 2, 1, 2, 1, 2
+
+  return
+}
+
diff --git a/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir b/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir
new file mode 100644
index 0000000000000..c6dd6287208d4
--- /dev/null
+++ b/mlir/test/Integration/Dialect/Vector/CPU/test-interleave.mlir
@@ -0,0 +1,24 @@
+// RUN: mlir-opt %s -test-lower-to-llvm | \
+// RUN: mlir-cpu-runner -e entry -entry-point-result=void  \
+// RUN:   -shared-libs=%mlir_c_runner_utils | \
+// RUN: FileCheck %s
+
+func.func @entry() {
+  %f1 = arith.constant 1.0: f32
+  %f2 = arith.constant 2.0: f32
+  %v1 = vector.splat %f1 : vector<2x4xf32>
+  %v2 = vector.splat %f2 :  vector<2x4xf32>
+  vector.print %v1 : vector<2x4xf32>
+  vector.print %v2 : vector<2x4xf32>
+  //
+  // Test vectors:
+  //
+  // CHECK: ( ( 1, 1, 1, 1 ), ( 1, 1, 1, 1 ) )
+  // CHECK: ( ( 2, 2, 2, 2 ), ( 2, 2, 2, 2 ) )
+
+  %v3 = vector.interleave %v1, %v2 : vector<2x4xf32>
+  vector.print %v3 : vector<2x8xf32>
+  // CHECK: ( ( 1, 2, 1, 2, 1, 2, 1, 2 ), ( 1, 2, 1, 2, 1, 2, 1, 2 ) )
+
+  return
+}

dcaballe (Contributor) commented Feb 1, 2024

Thanks! It looks great!

%0 = vector.interleave %a, %b
: vector<[4]xi32> ; yields vector<[8]xi32>

I was thinking that we could return two vector results, one for the high and one for the low part of the interleave. However, I think keeping them together makes more sense at this level of abstraction.

dcaballe (Contributor) left a comment

It looks great! A few comments

returning a new vector where the trailing dimension is twice the size.

Note that for the n-D case this differs from the interleaving possible with
`vector.shuffle`, which would only operate on the leading dimension.
Contributor

I thought we didn't even support n-D cases in vector.shuffle

MacDue (Member Author)

It does (see the examples at https://mlir.llvm.org/docs/Dialects/Vector/#vectorshuffle-vectorshuffleop); it just does not do what you'd want 😔

}

// It's not possible to unroll a scalable dimension.
if (resultType.getScalableDims().front())
Contributor

is this check correct? If so, could we make it easier to understand?

MacDue (Member Author)

This is correct :)
It just looks at the scalable flag of the front dimension (the dimension that is unrolled below) and checks whether it is true (i.e. scalable). If it is scalable, it can't be unrolled.
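
To make that concrete, a sketch (types invented for illustration) of which cases the unrolling step accepts:

```mlir
// Fixed leading dim: can be unrolled into 2 rank-1 interleaves.
%0 = vector.interleave %a, %b : vector<2x[4]xf32>  // yields vector<2x[8]xf32>
// Scalable leading dim: getScalableDims().front() is true, so the
// conversion pattern returns failure rather than unrolling.
%1 = vector.interleave %c, %d : vector<[2]x4xf32>  // yields vector<[2]x8xf32>
```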

Contributor

It would help to add a comment here saying that we are unrolling only the leading dimension because this pattern is applied recursively. Even better, we could unroll all the dimensions at once, avoiding both the recursion and partial unrolling when the rewrite may eventually fail. We could separate the base case into one pattern and the unrolling into another if that makes it clearer; we follow that approach often.

MacDue (Member Author) commented Feb 6, 2024

I've split out the unrolling to LowerVectorInterleave. Note: an incomplete unrolling is not a bug; e.g. if vector.interleave %a, %b : vector<2x[4]x[4]xf32> unrolled to vector.interleave %a, %b : vector<[4]x[4]xf32>, the general LLVM lowering can't handle that, but something like ArmSME could then lower it to something architecture-specific. The current ArmSME lowerings rely on this quite a bit.
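
A sketch of the first unrolling step for that example (illustrative IR, assuming the current `vector.extract`/`vector.insert` syntax; `%acc` stands for the zero-initialized accumulator the pattern creates):

```mlir
// Unrolling vector.interleave %a, %b : vector<2x[4]x[4]xf32> peels the
// fixed leading dim, leaving rank-2 interleaves for a later pattern
// (e.g. an ArmSME lowering) to handle:
%a0 = vector.extract %a[0] : vector<[4]x[4]xf32> from vector<2x[4]x[4]xf32>
%b0 = vector.extract %b[0] : vector<[4]x[4]xf32> from vector<2x[4]x[4]xf32>
%i0 = vector.interleave %a0, %b0 : vector<[4]x[4]xf32>  // yields vector<[4]x[8]xf32>
%r0 = vector.insert %i0, %acc [0] : vector<[4]x[8]xf32> into vector<2x[4]x[8]xf32>
// ...and likewise for index 1.
```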

func.func @fold_rank_1_vector_interleave(%arg0: vector<6xi32>, %arg1: vector<6xi32>) -> vector<12xi32> {
// CHECK: %[[ZIP:.*]] = vector.shuffle %[[LHS]], %[[RHS]] [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11] : vector<6xi32>, vector<6xi32>
// CHECK: return %[[ZIP]] : vector<12xi32>
%0 = vector.interleave %arg0, %arg1 : vector<6xi32>
Contributor

Perhaps we should print the result type somehow? I know it can be inferred but some people prefer to make it easier to read and this op is not particularly verbose. WDYT?

Collaborator

Perhaps we should print the result type somehow? I know it can be inferred but some people prefer to make it easier to read and this op is not particularly verbose. WDYT?

I think inference is nice from a builder perspective where building an op requires minimal info, but I do prefer if types are explicitly spelled out in the IR.

MacDue (Member Author)

I personally find showing all the types here very repetitive, and less consistent with vector.shuffle which also infers the result.
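
For reference, the two spellings under discussion (the second is hypothetical; only the first is valid with this patch):

```mlir
// Inferred result type, as implemented (matching vector.shuffle's style):
%0 = vector.interleave %arg0, %arg1 : vector<6xi32>
// Hypothetical explicit form proposed above:
%1 = vector.interleave %arg0, %arg1 : vector<6xi32> -> vector<12xi32>
```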


- Remove vector.interleave -> vector.shuffle canonicalization
- Add vector.shuffle -> vector.interleave canonicalization
- Split vector.interleave unrolling and LLVM lowering
  - Unrolling now done in LowerVectorInterleave.cpp
- Add missing tests to vector ops.mlir
- Fixed a few nits
MacDue force-pushed the add_vector.interleave branch from 3cc2fb2 to 795f4b8 on February 6, 2024 12:34
MacDue requested a review from hanhanW as a code owner on February 6, 2024 12:34
hanhanW (Contributor) left a comment

I think this PR is not just adding the vector.interleave operation. From my point of view, it has three changes:

  1. Add the vector.interleave op.
  2. Add support for converting vector.interleave to the LLVM dialect.
  3. Add support for unrolling the vector.interleave op.

Ideally we should split it into three PRs, which would be easier for reviewers. It's okay for this one because it's already been reviewed. Can you add more context to the PR description?

banach-space (Contributor)

I think this PR is not just adding the vector.interleave operation. From my point of view, it has three changes:

  1. Add the vector.interleave op.
  2. Add support for converting vector.interleave to the LLVM dialect.
  3. Add support for unrolling the vector.interleave op.

Ideally we should split it into three PRs, which would be easier for reviewers. It's okay for this one because it's already been reviewed. Can you add more context to the PR description?

@MacDue I would also find it easier to review if this was split :) (sorry, I've been away, and smaller patches just make it easier to catch up)

MacDue (Member Author) commented Feb 6, 2024

Would you both be okay with me just splitting this into multiple commits within this PR? I find it becomes much more of a hassle when I've got 3+ branches and PRs to manage.

MacDue closed this Feb 7, 2024
MacDue deleted the add_vector.interleave branch on February 7, 2024 10:28
dcaballe (Contributor) left a comment

I'm not sure if splitting the PR made the review simpler or harder, given how badly GitHub deals with stacked PRs :_(
