[mlir][linalg] Add support for masked vectorization of tensor.insert_slice (1/N) #122927


Merged

Conversation

@banach-space (Contributor) commented Jan 14, 2025

For context, tensor.insert_slice is vectorized using a
vector.transfer_read + vector.transfer_write pair.
An unmasked example is shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
  %slice into %dest[0, %c2]
  [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad
  : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
  : vector<8x1xi32>, tensor<5x3xi32>
```

This PR refactors InsertSliceVectorizePattern (which is used to
vectorize tensor.insert_slice) to enable masked vectorization. ATM,
only vector.transfer_read is masked. If vector.transfer_write also
requires masking, the vectorizer will bail out. This will be addressed
in a subsequent PR.

Summary of changes:

  • Added an argument to specify vector sizes (behavior remains
    unchanged if vector sizes are not specified).
  • Renamed InsertSliceVectorizePattern to vectorizeAsInsertSliceOp
    and integrated it into linalg::vectorize (alongside the other
    vectorization hooks).
  • Removed populateInsertSliceVectorizationPatterns, as
    InsertSliceVectorizePattern was its only pattern.
  • Updated vectorizeAsInsertSliceOp to support masking for the
    "read" operation.
  • Updated @pad_and_insert_slice_dest in
    "vectorization-pad-patterns.mlir" to reflect the removal of
    populateInsertSliceVectorizationPatterns from
    ApplyPadVectorizationPatternsOps.
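
For reference, masked vectorization is driven by supplying explicit vector sizes. A minimal transform-dialect sketch (in the style of the tests touched by this PR; the handle names and the [8, 1] sizes are illustrative):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    // Match the op and supply explicit vector sizes; the vectorizer then
    // masks the generated vector.transfer_read accordingly.
    %0 = transform.structured.match ops{["tensor.insert_slice"]} in %arg0
      : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 vector_sizes [8, 1] : !transform.any_op
    transform.yield
  }
}
```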

@llvmbot (Member) commented Jan 14, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Andrzej Warzyński (banach-space)

Changes

This PR refactors the InsertSliceVectorizePattern to enable masked
vectorization of tensor.insert_slice.

Note, tensor.insert_slice is vectorised using the
vector.transfer_read + vector.transfer_write pair. ATM, only
vector.transfer_read is masked. If vector.transfer_write also
requires masking, the vectorizer will bail out. This will be addressed
in a subsequent PR.

Summary of changes:

  • Added an argument to specify vector sizes (behavior remains
    unchanged if vector sizes are not specified).
  • Renamed InsertSliceVectorizePattern to vectorizeAsInsertSliceOp
    and integrated it into linalg::vectorize (alongside the other
    vectorization hooks).
  • Removed populateInsertSliceVectorizationPatterns, as
    InsertSliceVectorizePattern was its only pattern.
  • Updated vectorizeAsInsertSliceOp to support masking for the
    "read" operation.
  • Updated @pad_and_insert_slice_dest in
    "vectorization-pad-patterns.mlir" to reflect the removal of
    populateInsertSliceVectorizationPatterns from
    ApplyPadVectorizationPatternsOps.

Patch is 32.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122927.diff

6 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (-5)
  • (modified) mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp (-4)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+168-103)
  • (modified) mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir (+6-26)
  • (modified) mlir/test/Dialect/Linalg/vectorization-unsupported.mlir (+23)
  • (modified) mlir/test/Dialect/Linalg/vectorization.mlir (+88-12)
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 1dc700f22c2027..726ce22ac70dc3 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -1723,11 +1723,6 @@ void populateDecomposePadPatterns(RewritePatternSet &patterns);
 /// \see rewriteInIm2Col for more details.
 void populateConvertConv2DToImg2ColPatterns(RewritePatternSet &patterns);
 
-/// Populates `patterns` with vectorisation patterns for tensor.insert_slice.
-/// TODO: Avoid having a dedicated `populate{}` for one pattern. Instead, either
-/// expand or merge with other `populate{}`.
-void populateInsertSliceVectorizationPatterns(RewritePatternSet &patterns);
-
 /// Populates `patterns` with patterns that vectorize tensor.pad.
 /// These patterns are meant to apply in a complementary fashion. Benefits
 /// are used to encode a certain ordering of pattern application. To avoid
diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
index 67dd21aafe4fe0..73a52ebc46f15b 100644
--- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
+++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
@@ -265,7 +265,6 @@ void transform::ApplyFoldAddIntoDestPatternsOp::populatePatterns(
 void transform::ApplyPadVectorizationPatternsOp::populatePatterns(
     RewritePatternSet &patterns) {
   linalg::populatePadOpVectorizationPatterns(patterns);
-  linalg::populateInsertSliceVectorizationPatterns(patterns);
 }
 
 //===----------------------------------------------------------------------===//
@@ -3504,9 +3503,6 @@ transform::VectorizeChildrenAndApplyPatternsOp::applyToOne(
 
   patterns.add<CopyVectorizationPattern>(ctx);
 
-  // Add misc. vectorization patterns (e.g. for tensor.insert_slice)
-  linalg::populateInsertSliceVectorizationPatterns(patterns);
-
   if (getVectorizePadding()) {
     linalg::populatePadOpVectorizationPatterns(patterns);
     // This creates an alternative path for lowering tensor.pad - by
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 863f2280e46ce6..c87d9b0c388042 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -59,6 +59,30 @@ vectorizeConvolution(RewriterBase &rewriter, LinalgOp convOp,
                      ArrayRef<bool> inputVecScalableFlags = {},
                      bool flatten1DDepthwiseConv = false);
 
+/// Vectorize tensor::InsertSliceOp with:
+///   * vector::TransferReadOp + vector::TransferWriteOp
+/// The vector sizes are either:
+///   * user-provided in `inputVectorSizes`, or
+///   * inferred from the static dims in the input and output tensors.
+/// Bails out if:
+///   * vector sizes are not user-provided, and
+///   * at least one dim is dynamic (in both the input and output tensors).
+///
+/// Before:
+///     !t_in_type = tensor<1x2x3xf32>
+///     !t_out_type = tensor<9x8x7x1x2x3xf32>
+///     !v_type = vector<1x2x3xf32>
+///     %inserted_slice = tensor.insert_slice %src into %dest ... : !t_in_type
+///     into !t_out_type
+/// After:
+///     %read = vector.transfer_read %src[...], %pad ... : !t_in_type, !v_type
+///     %write = vector.transfer_write %read, %dest ... : !v_type, !t_out_type
+static LogicalResult
+vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
+                         ArrayRef<int64_t> inputVectorSizes,
+                         SmallVectorImpl<Value> &newResults);
+
 /// Return the unique instance of OpType in `block` if it is indeed unique.
 /// Return null if none or more than 1 instances exist.
 template <typename OpType>
@@ -1557,6 +1581,7 @@ static LogicalResult
 vectorizeAsTensorPackOp(RewriterBase &rewriter, tensor::PackOp packOp,
                         ArrayRef<int64_t> inputVectorSizes,
                         SmallVectorImpl<Value> &newResults) {
+  // TODO: Introduce a parent class that will handle the insertion point update.
   OpBuilder::InsertionGuard g(rewriter);
   rewriter.setInsertionPoint(packOp);
 
@@ -1633,6 +1658,7 @@ vectorizeAsTensorUnpackOp(RewriterBase &rewriter, tensor::UnPackOp unpackOp,
                           ArrayRef<int64_t> inputVectorSizes,
                           SmallVectorImpl<Value> &newResults) {
 
+  // TODO: Introduce a parent class that will handle the insertion point update.
   OpBuilder::InsertionGuard g(rewriter);
   rewriter.setInsertionPoint(unpackOp);
 
@@ -1763,7 +1789,7 @@ vectorizeAsTensorPadOp(RewriterBase &rewriter, tensor::PadOp padOp,
   auto padValue = padOp.getConstantPaddingValue();
   Location loc = padOp.getLoc();
 
-  // transfer_write_in_bounds(transfer_read_masked(pad_source, pad_value))
+  // TODO: Introduce a parent class that will handle the insertion point update.
   OpBuilder::InsertionGuard g(rewriter);
   rewriter.setInsertionPoint(padOp);
 
@@ -1874,6 +1900,15 @@ vectorizeUnPackOpPrecondition(tensor::UnPackOp unpackOp,
   return success();
 }
 
+/// Need to check if the inner-tiles are static/constant.
+static LogicalResult
+vectorizeInsertSliceOpPrecondition(tensor::InsertSliceOp sliceOp,
+                                   ArrayRef<int64_t> inputVectorSizes) {
+
+  // TODO: Move pre-conditions from the vectorization logic
+  return success();
+}
+
 static LogicalResult vectorizeLinalgOpPrecondition(
     LinalgOp linalgOp, ArrayRef<int64_t> inputVectorSizes,
     bool vectorizeNDExtract, bool flatten1DDepthwiseConv) {
@@ -2144,6 +2179,9 @@ LogicalResult mlir::linalg::vectorizeOpPrecondition(
       .Case<tensor::UnPackOp>([&](auto unpackOp) {
         return vectorizeUnPackOpPrecondition(unpackOp, inputVectorSizes);
       })
+      .Case<tensor::InsertSliceOp>([&](auto sliceOp) {
+        return vectorizeInsertSliceOpPrecondition(sliceOp, inputVectorSizes);
+      })
       .Default([](auto) { return failure(); });
 }
 
@@ -2163,8 +2201,8 @@ static void convertAffineApply(RewriterBase &rewriter, LinalgOp linalgOp) {
 }
 
 bool mlir::linalg::hasVectorizationImpl(Operation *op) {
-  return isa<linalg::LinalgOp, tensor::PadOp, tensor::PackOp, tensor::UnPackOp>(
-      op);
+  return isa<linalg::LinalgOp, tensor::PadOp, tensor::PackOp, tensor::UnPackOp,
+             tensor::InsertSliceOp>(op);
 }
 
 /// Emit a suitable vector form for an operation. If provided,
@@ -2178,6 +2216,7 @@ LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
                                       ArrayRef<bool> inputScalableVecDims,
                                       bool vectorizeNDExtract,
                                       bool flatten1DDepthwiseConv) {
+  rewriter.getInsertionPoint();
   LDBG("Attempting to vectorize:\n" << *op << "\n");
   LDBG("Input vector sizes: ");
   LLVM_DEBUG(llvm::interleaveComma(inputVectorSizes, llvm::dbgs()));
@@ -2244,6 +2283,10 @@ LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
             return vectorizeAsTensorPackOp(rewriter, packOp, inputVectorSizes,
                                            results);
           })
+          .Case<tensor::InsertSliceOp>([&](auto sliceOp) {
+            return vectorizeAsInsertSliceOp(rewriter, sliceOp, inputVectorSizes,
+                                            results);
+          })
           .Case<tensor::UnPackOp>([&](auto unpackOp) {
             return vectorizeAsTensorUnpackOp(rewriter, unpackOp,
                                              inputVectorSizes, results);
@@ -2534,6 +2577,7 @@ struct PadOpVectorizationWithTransferWritePattern
   }
 };
 
+
 /// Returns the effective Pad value for the input op, provided it's a scalar.
 ///
 /// Many Ops exhibit pad-like behaviour, but this isn't always explicit. If
@@ -2583,113 +2627,139 @@ static Value getStaticPadVal(Operation *op) {
   return {};
 }
 
-/// Rewrite tensor.insert.slice as a vector.transfer_read +
-/// vector.transfer_write pair. The vector size is inferred from the static
-/// dims in the input and output tensors. If a dim is dynamic in both the input
-/// and output tensors, bails out.
-///
-/// Before:
-///     !t_in_type = tensor<1x2x3xf32>
-///     !t_out_type = tensor<9x8x7x1x2x3xf32>
-///     !v_type = vector<1x2x3xf32>
-///     %inserted_slice = tensor.insert_slice %src into %dest ... : !t_in_type
-///     into !t_out_type
-/// After:
-///     %read = vector.transfer_read %src[...], %pad ... : !t_in_type, !v_type
-///     %write = vector.transfer_write %read, %dest ... : !v_type, !t_out_type
-///
-/// TODO: Support masking
-struct InsertSliceVectorizePattern
-    : public OpRewritePattern<tensor::InsertSliceOp> {
-  using OpRewritePattern<tensor::InsertSliceOp>::OpRewritePattern;
+static LogicalResult
+vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
+                         ArrayRef<int64_t> inputVectorSizes,
+                         SmallVectorImpl<Value> &newResults) {
+  // TODO: Introduce a parent class that will handle the insertion point update.
+  OpBuilder::InsertionGuard g(rewriter);
+  rewriter.setInsertionPoint(sliceOp);
 
-  LogicalResult matchAndRewrite(tensor::InsertSliceOp sliceOp,
-                                PatternRewriter &rewriter) const final {
-    auto sourceType = sliceOp.getSource().getType();
-    if (!VectorType::isValidElementType(sourceType.getElementType()))
-      return failure();
+  TypedValue<RankedTensorType> source = sliceOp.getSource();
+  auto sourceType = source.getType();
+  if (!VectorType::isValidElementType(sourceType.getElementType()))
+    return failure();
+
+  auto resultType = sliceOp.getResultType();
+
+  // 1. Get the pad value.
+  // TransferReadOp requires a scalar padding value. Note that:
+  //    * for in-bounds access, the value is actually irrelevant.
+  // There are 2 cases in which xfer.read accesses are known to be in-bounds:
+  //  1. The source shape is static (output vector sizes would be based on
+  //     the source shape and hence all memory accesses would be in-bounds),
+  //  2. Masking is used (output vector sizes would be user-provided, in which
+  //     case it is assumed that all memory accesses are in-bounds). This
+  //     remains a TODO.
+  //
+  // When the value is not known and not needed, use 0. Otherwise, bail out.
+  Value padValue = getStaticPadVal(sliceOp);
+  bool isOutOfBoundsRead =
+      !sourceType.hasStaticShape() && inputVectorSizes.empty();
 
-    auto resultType = sliceOp.getResultType();
-
-    // 1. Get the pad value.
-    // TransferReadOp requires a scalar padding value. Note that:
-    //    * for in-bounds access, the value is actually irrelevant.
-    //  There are 2 cases in which xfer.read accesses are known to be in-bounds:
-    //  1. The source shape is static (output vector sizes would be based on
-    //     the source shape and hence all memory accesses would be in-bounds),
-    //  2. Masking is used (output vector sizes would be user-provided, in which
-    //     case it is assumed that all memory accesses are in-bounds). This
-    //     remains a TODO.
-    //
-    // When the value is not known and not needed, use 0. Otherwise, bail out.
-    Value padValue = getStaticPadVal(sliceOp);
-    bool isOutOfBoundsRead = !sourceType.hasStaticShape();
-
-    if (!padValue && isOutOfBoundsRead) {
-      LDBG("Failed to get a pad value for out-of-bounds read access\n");
+  if (!padValue && isOutOfBoundsRead) {
+    LDBG("Failed to get a pad value for out-of-bounds read access\n");
+    return failure();
+  }
+
+  if (!padValue) {
+    auto elemType = sourceType.getElementType();
+    padValue = rewriter.create<arith::ConstantOp>(
+        sliceOp.getLoc(), elemType, rewriter.getZeroAttr(elemType));
+  }
+
+  // 2. Get the vector shape and in-bounds attributes
+  SmallVector<int64_t> vecShape;
+  SmallVector<bool> readInBounds;
+  SmallVector<bool> writeInBounds;
+  size_t rankDiff = resultType.getRank() - sourceType.getRank();
+  for (unsigned i = 0; i < sourceType.getRank(); ++i) {
+    if (!inputVectorSizes.empty()) {
+      vecShape.push_back(inputVectorSizes[i]);
+      readInBounds.push_back(false);
+      writeInBounds.push_back(false);
+    } else if (!sourceType.isDynamicDim(i)) {
+      vecShape.push_back(sourceType.getDimSize(i));
+      // Source shape is statically known: Neither read nor write are
+      // out-of-bounds.
+      readInBounds.push_back(true);
+      writeInBounds.push_back(true);
+    } else if (!resultType.isDynamicDim(i)) {
+      // Source shape is not statically known, but result shape is.
+      // Vectorize with size of result shape. This may be larger than the
+      // source size.
+      // FIXME: Using rankDiff implies that the source tensor is inserted at
+      // the end of the destination tensor. However, that's not required.
+      vecShape.push_back(resultType.getDimSize(rankDiff + i));
+      // Read may be out-of-bounds because the result size could be larger
+      // than the source size.
+      readInBounds.push_back(false);
+      // Write will be in-bounds provided that the corresponding write idx is 0.
+      // To keep this logic simple, conservatively mark as out-of-bounds.
+      writeInBounds.push_back(false);
+    } else {
+      // Neither source nor result dim of padOp is static. Cannot vectorize
+      // the copy.
+      // TODO: Add support for masking
       return failure();
     }
+  }
+  auto vecType = VectorType::get(vecShape, sourceType.getElementType());
 
-    if (!padValue) {
-      auto elemType = sourceType.getElementType();
-      padValue = rewriter.create<arith::ConstantOp>(
-          sliceOp.getLoc(), elemType, rewriter.getZeroAttr(elemType));
-    }
+  // 3. Generate TransferReadOp.
+  SmallVector<Value> readIndices(
+      vecType.getRank(),
+      rewriter.create<arith::ConstantIndexOp>(sliceOp.getLoc(), 0));
+  Operation *read = rewriter.create<vector::TransferReadOp>(
+      sliceOp.getLoc(), vecType, source, readIndices, padValue,
+      ArrayRef<bool>{readInBounds});
 
-    // 2. Get the vector shape and in-bounds attributes
-    SmallVector<int64_t> vecShape;
-    SmallVector<bool> readInBounds;
-    SmallVector<bool> writeInBounds;
-    size_t rankDiff = resultType.getRank() - sourceType.getRank();
-    for (unsigned i = 0; i < sourceType.getRank(); ++i) {
-      if (!sourceType.isDynamicDim(i)) {
-        vecShape.push_back(sourceType.getDimSize(i));
-        // Source shape is statically known: Neither read nor write are
-        // out-of-bounds.
-        readInBounds.push_back(true);
-        writeInBounds.push_back(true);
-      } else if (!resultType.isDynamicDim(i)) {
-        // Source shape is not statically known, but result shape is.
-        // Vectorize with size of result shape. This may be larger than the
-        // source size.
-        // FIXME: Using rankDiff implies that the source tensor is inserted at
-        // the end of the destination tensor. However, that's not required.
-        vecShape.push_back(resultType.getDimSize(rankDiff + i));
-        // Read may be out-of-bounds because the result size could be larger
-        // than the source size.
-        readInBounds.push_back(false);
-        // Write will in-bounds provided that the corresponding write idx is 0.
-        // To keep this logic simple, conservatively mark as out-of-bounds.
-        writeInBounds.push_back(false);
-      } else {
-        // Neither source nor result dim of padOp is static. Cannot vectorize
-        // the copy.
-        // TODO: Add support for masking
-        return failure();
-      }
+  // If vector sizes are user provided, make sure to mask xfer_read.
+  if (!inputVectorSizes.empty()) {
+    auto *srcDefOp = source.getDefiningOp();
+    if (!srcDefOp) {
+      LDBG("Unable to get the defining Op of " << sliceOp);
+      return failure();
     }
-    auto vecType = VectorType::get(vecShape, sourceType.getElementType());
 
-    // 3. Generate TransferReadOp.
-    SmallVector<Value> readIndices(
-        vecType.getRank(),
-        rewriter.create<arith::ConstantIndexOp>(sliceOp.getLoc(), 0));
-    auto read = rewriter.create<vector::TransferReadOp>(
-        sliceOp.getLoc(), vecType, sliceOp.getSource(), readIndices, padValue,
-        ArrayRef<bool>{readInBounds});
+    ReifiedRankedShapedTypeDims reifiedSrcSizes;
+    LogicalResult status =
+        cast<ReifyRankedShapedTypeOpInterface>(source.getDefiningOp())
+            .reifyResultShapes(rewriter, reifiedSrcSizes);
+    if (status.failed()) {
+      LDBG("Unable to reify result shapes of " << sliceOp);
+      return failure();
+    }
 
-    // 4. Generate TransferWriteOp.
-    auto writeIndices = getValueOrCreateConstantIndexOp(
-        rewriter, sliceOp.getLoc(), sliceOp.getMixedOffsets());
+    // Create the mask
+    SmallVector<int64_t> readMaskShape(
+        sliceOp.getSource().getType().getShape());
+    auto readMaskType = VectorType::get(inputVectorSizes, rewriter.getI1Type());
+    Value maskOp = rewriter.create<vector::CreateMaskOp>(
+        sliceOp.getLoc(), readMaskType, reifiedSrcSizes[0]);
 
-    // 5. Finalize
-    rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
-        sliceOp, read, sliceOp.getDest(), writeIndices,
-        ArrayRef<bool>{writeInBounds});
+    // Mask the xfer_read Op
+    read = mlir::vector::maskOperation(rewriter, read, maskOp);
+  }
 
-    return success();
+  // 4. Generate TransferWriteOp.
+  if (!inputVectorSizes.empty() &&
+      ShapedType::isDynamicShape(resultType.getShape())) {
+    LDBG("TODO: Masking of xfer_write when vectorising " << sliceOp);
+    return failure();
   }
-};
+
+  auto writeIndices = getValueOrCreateConstantIndexOp(
+      rewriter, sliceOp.getLoc(), sliceOp.getMixedOffsets());
+
+  // 5. Finalize
+  Operation *write = rewriter.create<vector::TransferWriteOp>(
+      sliceOp.getLoc(), read->getResult(0), sliceOp.getDest(), writeIndices,
+      ArrayRef<bool>{writeInBounds});
+  newResults.push_back(write->getResult(0));
+
+  return success();
+}
 
 /// Rewrite use of tensor::PadOp result in InsertSliceOp. E.g.:
 /// ```
@@ -2778,11 +2848,6 @@ struct PadOpVectorizationWithInsertSlicePattern
   }
 };
 
-void mlir::linalg::populateInsertSliceVectorizationPatterns(
-    RewritePatternSet &patterns) {
-  patterns.add<InsertSliceVectorizePattern>(patterns.getContext());
-}
-
 void mlir::linalg::populatePadOpVectorizationPatterns(
     RewritePatternSet &patterns, PatternBenefit baseBenefit) {
   patterns.add<PadOpVectorizationWithTransferReadPattern,
diff --git a/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir b/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir
index 08a3bbbb301c87..747b6f6d90cc7f 100644
--- a/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir
@@ -224,34 +224,16 @@ module attributes {transform.with_named_sequence} {
   }
 }
 
-
 // -----
 
-///----------------------------------------------------------------------------------------
-/// tensor::PadOp -> tensor::EmptyOp + linalg::FillOp/tensor::GenerateOp + tensor::InsertSliceOp
-/// [Pattern: GenericPadOpVectorizationPattern + InsertSliceVectorizePattern]
-/// TODO: Split the test into two, one for each pattern.
-///----------------------------------------------------------------------------------------
-
 func.func private @make_vector() -> tensor<12x13xf32>
 
-// Same as @pad_and_insert_slice_dest in vectorization-with-patterns.mlir, but
-// over here linalg::fill is not vectorized (patterns for linalg.fill are not
-// included here)
-// CHECK-LABEL:   func.func @pad_and_insert_slice_dest(
-// CHECK-SAME:      %[[ARG_0:.*]]: tensor<1x5x6xf32>) -> tensor<1x12x13xf32> {
-//  CHECK-NOT:     tensor.pad
-//  CHECK-DAG:     %[[C0:.*]] = arith.constant 0 : index
-//  CHECK-DAG:     %[[PAD:.*]] = arith.constant 5.000000e+00 : f32
-//  CHECK-DAG:     %[[PAD_READ:.*]] = arith.constant 0.000000e+00 : f32
-//      CHECK:     %[[EMPTY:.*]] = tensor.empty() : tensor<1x12x13xf32>
-//      CHECK:     %[[FILL:.*]] = linalg.fill ins(%[[PAD]] : f32) outs(%[[EMPTY]] : tensor<1x12x13xf32>) -> tensor<1x12x13xf32>
-//      CHECK:     %[[READ_1:.*]] = vector.transfer_read %[[ARG_0]]{{\[}}%[[C0]], %[[C0]], %[[C0]]], %[[PAD]] {in_bounds = [true...
[truncated]

github-actions bot commented Jan 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@banach-space force-pushed the andrzej/refactor_insert_slice_vec branch 4 times, most recently from ca54c1f to 4213800 on January 15, 2025 at 09:25
@banach-space changed the title from "[mlir][linalg] Add support for masked vectorization of tensor.insert_slice" to "[mlir][linalg] Add support for masked vectorization of tensor.insert_slice (1/N)" on January 15, 2025
@banach-space force-pushed the andrzej/refactor_insert_slice_vec branch from 4213800 to 363cfc1 on January 15, 2025 at 09:36
banach-space added a commit to banach-space/llvm-project that referenced this pull request Jan 15, 2025
…_slice` (2/N)

For context, recall that `tensor.insert_slice` is vectorised using the
`vector.transfer_read` + `vector.transfer_write` pair. An unmasked
example is shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
  %slice into %dest[0, %c2]
  [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad
  : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
  : vector<8x1xi32>, tensor<5x3xi32>
```

This PR extends `vectorizeAsInsertSliceOp` to add masking support for
the `vector.transfer_write` operation. This complements the changes
in llvm#122927, which introduced masking for the `vector.transfer_read`.
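
For reference, the masked write that this follow-up aims to emit would look roughly as below (a sketch with the same hypothetical shapes, reusing the read's mask):

```mlir
// The same mask guards the write; vector.mask yields the updated tensor.
%res = vector.mask %mask {
  vector.transfer_write %read, %dest[%c0, %c2]
    : vector<8x1xi32>, tensor<5x3xi32>
} : vector<8x1xi1> -> tensor<5x3xi32>
```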
banach-space added a commit to banach-space/llvm-project that referenced this pull request Jan 15, 2025
Following on from llvm#122927 + llvm#123031 that added support for masked
vectorization of `tensor.insert_slice`, this PR extends the e2e test for
`tensor.unpack` to leverage the new functionality.
@hanhanW (Contributor) left a comment:

LGTM, just one question.

@@ -1874,6 +1900,15 @@ vectorizeUnPackOpPrecondition(tensor::UnPackOp unpackOp,
return success();
}

/// Need to check if the inner-tiles are static/constant.

hanhanW (Contributor):

Is it a TODO? Do you mean to move the checks from vectorizeAsInsertSliceOp to here?

@dcaballe (Contributor) left a comment:

Cool! Thanks for integrating these patterns into the main flow. Really appreciated. Just a few minor comments.

vectorizeInsertSliceOpPrecondition(tensor::InsertSliceOp sliceOp,
ArrayRef<int64_t> inputVectorSizes) {

// TODO: Move pre-conditions from the vectorization logic, i.e.

dcaballe (Contributor):

Could we do that as part of this PR?


banach-space (Author):

Done.

@@ -2178,6 +2216,7 @@ LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
ArrayRef<bool> inputScalableVecDims,
bool vectorizeNDExtract,
bool flatten1DDepthwiseConv) {
rewriter.getInsertionPoint();

dcaballe (Contributor):

spurious?


banach-space (Author):

Thanks, removed.

vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
ArrayRef<int64_t> inputVectorSizes,
SmallVectorImpl<Value> &newResults) {
// TODO: Introduce a parent class that will handle the insertion point update.

dcaballe (Contributor):

What do you mean by a parent "class"?


banach-space (Author):

All these patterns do exactly the same thing:

We may as well just wrap this into a parent class.
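
For reference, the duplicated prologue in each of these hooks (taken verbatim from the diff above) is:

```cpp
// Save/restore the insertion point and anchor the rewriter at the op
// being vectorized - repeated at the start of every vectorizeAs* hook.
OpBuilder::InsertionGuard g(rewriter);
rewriter.setInsertionPoint(op);
```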

ArrayRef<int64_t> inputVectorSizes,
SmallVectorImpl<Value> &newResults) {
// TODO: Introduce a parent class that will handle the insertion point update.
OpBuilder::InsertionGuard g(rewriter);

dcaballe (Contributor):

nit: g -> guard?


banach-space (Author):

Agree with the suggestion, but I think that we should prioritise consistency and follow the existing style within the file:

😅

SmallVector<bool> readInBounds;
SmallVector<bool> writeInBounds;
size_t rankDiff = resultType.getRank() - sourceType.getRank();
for (unsigned i = 0; i < sourceType.getRank(); ++i) {

dcaballe (Contributor):

nit: end = ...; i < end


banach-space (Author):

Done.

@banach-space force-pushed the andrzej/refactor_insert_slice_vec branch from bb829f0 to 2b35fa6 on January 22, 2025 at 16:32
Comment on lines 62 to 79
/// Vectorize tensor::InsertSliceOp with:
/// * vector::TransferReadOp + vector::TransferWriteOp
/// The vector sizes are either:
/// * user-provided in `inputVectorSizes`, or
/// * inferred from the static dims in the input and output tensors.
/// Bails out if:
/// * vector sizes are not user-provided, and
/// * at least one dim is dynamic (in both the input and output tensors).
///
/// Before:
/// !t_in_type = tensor<1x2x3xf32>
/// !t_out_type = tensor<9x8x7x1x2x3xf32>
/// !v_type = vector<1x2x3xf32>
/// %inserted_slice = tensor.insert_slice %src into %dest ... : !t_in_type
/// into !t_out_type
/// After:
/// %read = vector.transfer_read %src[...], %pad ... : !t_in_type, !v_type
/// %write = vector.transfer_write %read, %dest ... : !v_type, !t_out_type

@hanhanW (Contributor) commented Jan 23, 2025:

Reading through the comment again, I can't follow why it becomes vector.transfer_read + write pair now. Can't it be a single vector.transfer_write %src, %dest ....?

The other question is about inputVectorSizes. My understanding is that the inputVectorSizes is somewhat tied to iteration domain, which is related to their TilingInterface implementation. This is very useful to me, because we can infer the vector sizes from the tiling artifacts IR (i.e., affine_min/apply ops) automatically. However, it looks like there are two masks in the vectorization result (which I don't understand now). I wonder this inputVectorSizes is for read mask or write mask, and the reason behind the decision.


banach-space (Author):

@hanhanW , these are great questions - those things made me wonder as well at some point :)

I can't follow why it becomes vector.transfer_read + write pair now. Can't it be a single vector.transfer_write %src, %dest ....?

The most intuitive way to see this is by noting that:

  • tensor.insert_slice operates on tensors (i.e. the Op requires two tensors), whereas
  • vector.transfer_write requires a vector to write (i.e. the Op requires one tensor + one vector).

Now, here's an important question:

  • where is the vector (that vector.transfer_write would "write") meant to come from?

Ah! From vector.transfer_read 😅

So, going back to the example from the comment, note that with tensor.insert_slice, all operands are TensorType. That's !t_in_type:

///     %inserted_slice = tensor.insert_slice %src into %dest ... : !t_in_type

However, in order to operate on vectors, we need to ... switch to VectorType. That's !v_type as the source/destination:

// Read from "source" tensor into a vector. Now we are in the "land of vectors" :)
%read = vector.transfer_read %src[...], %pad ... : !t_in_type, !v_type
// Write vector to the destination "tensor", i.e. switch back to the "land of tensors" :)
%write = vector.transfer_write %read, %dest ... : !v_type, !t_out_type

I think that this example also demonstrates quite nicely that indeed vector.transfer_read and vector.transfer_write are boundary Ops (as in, boundary between TensorType and VectorType).

I hope that this makes sense. Clearly the comment could benefit from an update.


2nd thread

My understanding is that the inputVectorSizes is somewhat tied to iteration domain, which is related to their TilingInterface implementation.

That's one option, yes. But what if you're attempting to vectorize something that e.g. has not been tiled? See e.g. examples for tensor.insert_slice added to this PR.

When building a full compilation flow, you'd want the tile sizes and vector sizes to match, yes. But that's not a requirement.

However, it looks like there are two masks in the vectorization result (which I don't understand now).

Note, this patch adds only one mask. The 2nd mask is added here:

(trying to split my PRs into smaller patches to make reviewing easier).

I wonder this inputVectorSizes is for read mask or write mask, and the reason behind the decision.

In fact, it's tied to tensor.insert_slice, so one mask is used for both. First, we assume that the underlying tensor.insert_slice Op is valid. So, if we calculate the mask based on the sizes of the value to be stored:

// Taken from the newly added mask calculation in vectorizeAsInsertSliceOp
LogicalResult status =
    cast<ReifyRankedShapedTypeOpInterface>(srcDefOp).reifyResultShapes(
        rewriter, reifiedSrcSizes);
// (...)
Value maskOp = rewriter.create<vector::CreateMaskOp>(
    sliceOp.getLoc(), readMaskType, reifiedSrcSizes[0]);

then, it should be safe to use it for both vector.transfer_read and vector.transfer_write.
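
For illustration, the masked read produced by this patch looks roughly as below (a sketch with hypothetical shapes and value names; %d0/%d1 stand for the reified source sizes):

```mlir
// Lanes beyond the reified source sizes are masked off and read as %pad.
%mask = vector.create_mask %d0, %d1 : vector<8x1xi1>
%read = vector.mask %mask {
  vector.transfer_read %src[%c0, %c0], %pad
    : tensor<?x?xi32>, vector<8x1xi32>
} : vector<8x1xi1> -> vector<8x1xi32>
```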

we can infer the vector sizes from the tiling artifacts IR (i.e., affine_min/apply ops) automatically

Going back to my point earlier: in theory, vector sizes can indeed be inferred from various artefacts, and that's what we should be aiming for. However, the vectorizer should work independently of those artefacts, right? Put differently,

  • If the vector sizes can be inferred, let's use that.
  • If not, we need a fallback version, i.e. the vectorizer should be happy with any arbitrary (within reason) sizes.

I hope that this makes sense 😅 Btw, perhaps it would help if I moved the mask calculation to a separate patch?

-Andrzej


hanhanW (Contributor):

Thanks, it makes a lot of sense to me now. Somehow I missed that vector.transfer_write ops write a vector today. 😅

RE inputVectorSizes: yes, those are the options. The point was that the input vector sizes should be inferable from the IR if possible. And I did not know that both read/write use the same mask! The second patch does answer my questions. Thanks for all the details! 🙏

@dcaballe (Contributor) left a comment:

LG, thanks! There was a lower-case replacement that you may want to fix before landing :)


// check-label: func.func @pad_and_insert_slice_dest(
// check-not: vector.transfer_read
// check-not: vector.transfer_write

dcaballe (Contributor):

Capitals

…_slice` (1/N)

For context, note that `tensor.insert_slice` is vectorised using the
`vector.transfer_read` + `vector.transfer_write` pair. An unmasked
example is shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
  %source_slice into %init[0, %c2]
  [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %extracted_slice[%c0, %c0], %c0_i32 : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %2, %0[%c0_0, %c2] : vector<8x1xi32>, tensor<5x3xi32>
```

This PR refactors `InsertSliceVectorizePattern` (which is used to
vectorize `tensor.insert_slice`) to enable masked vectorization. ATM,
only `vector.transfer_read` is masked. If `vector.transfer_write` also
requires masking, the vectorizer will bail out. This will be addressed
in a subsequent PR.

Summary of changes:
  * Added an argument to specify vector sizes (behavior remains
    unchanged if vector sizes are not specified).
  * Renamed `InsertSliceVectorizePattern` to `vectorizeAsInsertSliceOp`
    and integrated it into `linalg::vectorize` (alongside the other
    vectorization hooks).
  * Removed `populateInsertSliceVectorizationPatterns`, as
    `InsertSliceVectorizePattern` was its only pattern.
  * Updated `vectorizeAsInsertSliceOp` to support masking for the
    "read" operation.
  * Updated `@pad_and_insert_slice_dest` in
    "vectorization-pad-patterns.mlir" to reflect the removal of
    `populateInsertSliceVectorizationPatterns` from
    `ApplyPadVectorizationPatternsOps`.
…`tensor.insert_slice` (1/N)

Incorporate suggestions from Diego
…ion of `tensor.insert_slice` (1/N)

Update comments
@banach-space force-pushed the andrzej/refactor_insert_slice_vec branch from 2b35fa6 to 709dd82 on February 2, 2025 at 14:11
banach-space added a commit to banach-space/llvm-project that referenced this pull request Feb 2, 2025
…_slice` (2/N)

banach-space added a commit to banach-space/llvm-project that referenced this pull request Feb 2, 2025
@banach-space merged commit d68a4b9 into llvm:main on Feb 2, 2025
8 checks passed
@banach-space deleted the andrzej/refactor_insert_slice_vec branch on February 2, 2025 at 14:51
banach-space added a commit to banach-space/llvm-project that referenced this pull request Feb 2, 2025
…_slice` (2/N)

banach-space added a commit to banach-space/llvm-project that referenced this pull request Feb 2, 2025
banach-space added a commit to banach-space/llvm-project that referenced this pull request Feb 5, 2025
…_slice` (2/N)

banach-space added a commit that referenced this pull request Feb 7, 2025
…_slice` (2/N) (#123031)

banach-space added a commit to banach-space/llvm-project that referenced this pull request Feb 7, 2025
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…_slice` (2/N) (llvm#123031)

banach-space added a commit that referenced this pull request Feb 28, 2025
…123032)

jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…lvm#123032)

banach-space added a commit to banach-space/llvm-project that referenced this pull request May 22, 2025
This PR refactors the following two hooks in Vectorization.cpp:
  * `createWriteOrMaskedWrite` gains a parameter to specify write
    indices - aligning it with its "read" counterpart:
    `createReadOrMaskedRead`.
  * `vectorizeAsInsertSliceOp` is updated to re-use
    `createWriteOrMaskedWrite` + `createReadOrMaskedRead` (rather than
    re-implementing similar logic).

CONTEXT
-------
This change is effectively re-factoring the logic to vectorize
`tensor.insert_slice`. This logic was recently updated to support
masking:
  * llvm#122927
  * llvm#123031

I am mentioning those PRs for context - in the reviews we discussed
re-using `createWriteOrMaskedWrite` + `createReadOrMaskedRead` to
simplify the code, but at the time it wasn't possible (due to e.g.
missing argument for passing write indices or restrictive assumptions).

OVERVIEW OF CHANGES FOR `vectorizeAsInsertSliceOp`
--------------------------------------------------
* Introduce a clear distinction into the "destination tensor" and the
  "vector to store". This is achieved through meaningful variables:
  `destType`/`destShape`/`destRank` and
  `vecToStoreType`/`vecToStoreShape`/`vecToStoreRank`, respectively.
* Use the newly introduced variables to make sure that, e.g., the correct
  shape is used when computing attributes for the `xfer_write` Op. For
  example, use the source vector rank (rather than the destination tensor
  rank) to specify the size of the "in-bounds" attribute to use.
* Remove the assumption that `vecToStoreRank == destRank` - this does
  not hold in some existing `tensor.insert_slice` tests
  (and, in general, was an artificial limitation).
* Extract the missing mask dimension (partially specified via
  `inputVecSizesForLeadingDims`) from `vecToStoreShape` rather than
  `destShape`. Note: we ideally should not require
  `inputVecSizesForLeadingDims` at all - mask shape should be computed
  based on `vecToStoreType`.

OTHER CHANGES
-------------

A new helper hook is added: `xferReadNoMask`. This method is meant to
help identify cases where masking is redundant (to reduce our reliance
on canonicalization to remove them later). For the example below, we
know that we can safely skip masking (based on shapes and write
indices):

```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more
matching.

CHANGES IN TESTS
----------------
TODO

NOTE FOR REVIEWERS
------------------
TODO

(*) Note: this method doesn't really belong in Vectorization.cpp. It
will be moved to VectorUtils.cpp, alongside createReadOrMaskedRead, in a
follow-up PR.
banach-space added a commit that referenced this pull request May 23, 2025
This patch refactors two vectorization hooks in Vectorization.cpp:
 * `createWriteOrMaskedWrite` gains a new parameter for write indices,
   aligning it with its counterpart `createReadOrMaskedRead`.
 * `vectorizeAsInsertSliceOp` is updated to reuse both of the above
   hooks, rather than re-implementing similar logic.

CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
  * #122927
  * #123031

At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.

CHANGES IN `vectorizeAsInsertSliceOp`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
  vector to store, via named variables like `destType`/`vecToStoreType`,
  `destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
  in_bounds. For example, the size of the in_bounds array now matches
  the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` — this doesn't
  hold in many real examples.
*  Deduces mask dimensions from `vecToStoreShape` (vector) instead of
   `destShape` (tensor). (Eventually we should not require
   `inputVecSizesForLeadingDims` at all — mask shape should be inferred.)

NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.

Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more
matching.
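
A minimal sketch of the idea (a hypothetical helper, not the code merged in this patch): masking is trivially foldable when every write index is the constant 0 and the (fixed-size) vector provably fits within the statically shaped destination.

```cpp
// Sketch only - assumes the usual MLIR headers/namespaces are available.
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/Matchers.h"

using namespace mlir;

static bool isMaskTriviallyFoldableSketch(VectorType vecType,
                                          RankedTensorType destType,
                                          ArrayRef<Value> writeIndices) {
  // Scalable vectors and dynamically shaped destinations cannot be proven
  // in-bounds statically.
  if (vecType.isScalable() || !destType.hasStaticShape())
    return false;
  // Any non-zero (or non-constant) write index shifts the access window.
  for (Value idx : writeIndices)
    if (!matchPattern(idx, m_Zero()))
      return false;
  // Each vector dim must fit within the matching trailing destination dim.
  int64_t rankDiff = destType.getRank() - vecType.getRank();
  for (int64_t i = 0, e = vecType.getRank(); i < e; ++i)
    if (vecType.getDimSize(i) > destType.getDimSize(rankDiff + i))
      return false;
  return true;
}
```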

TEST CHANGES
-----------
This patch primarily affects vectorization of:
  * `tensor.insert_slice`, now refactored to use shared hooks.

`tensor.pad` vectorization patterns, which internally use
`tensor.insert_slice`, are also _effectively_ updated. Note, only
pad-with-patterns.mlir is affected.

Most test updates involve the insertion of masks that were previously
missing — this reflects a correctness fix, not a regression. In all
cases, the added masks are indeed required.

You’ll also notice more repeated constants (`arith.constant 0 : index`),
due to increased use of helper hooks. This will be cleaned up separately
via a constant cache (see #138265 for discussion).

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
  * Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
    VectorUtils.cpp)
  * Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.

(* This method will eventually be moved out of Vectorization.cpp, which isn't the right long-term home for it.)
banach-space added a commit to banach-space/llvm-project that referenced this pull request May 24, 2025
This patch refactors two vectorization hooks in Vectorization.cpp:
 * `createWriteOrMaskedWrite` gains a new parameter for write indices,
   aligning it with its counterpart `createReadOrMaskedRead`.
 * `vectorizeAsInsertSliceOp` is updated to reuse both of the above
   hooks, rather than re-implementing similar logic.

CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
  * llvm#122927
  * llvm#123031

At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.

CHANGES IN `vectorizeAsInsertSliceOp`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
  vector to store, via named variables like `destType`/`vecToStoreType`,
  `destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
  in_bounds. For example, the size of the in_bounds array now matches
  the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` — this doesn't
  hold in many real examples.
*  Deduces mask dimensions from `vecToStoreShape` (vector) instead of
   `destShape` (tensor). (Eventually we should not require
   `inputVecSizesForLeadingDims` at all — mask shape should be inferred.)

NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.

Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more
matching.

TEST CHANGES
-----------
This patch primarily affects vectorization of:
  * `tensor.insert_slice`, now refactored to use shared hooks.

`tensor.pad` vectorization patterns, which internally use
`tensor.insert_slice`, are also _effectively_ updated. Note, only
pad-with-patterns.mlir is affected.

Most test updates involve the insertion of masks that were previously
missing — this reflects a correctness fix, not a regression. In all
cases, the added masks are indeed required.

You’ll also notice more repeated constants (`arith.constant 0 : index`),
due to increased use of helper hooks. This will be cleaned up separately
via a constant cache (see llvm#138265 for discussion).

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
  * Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
    VectorUtils.cpp)
  * Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
  * Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
    updated test in transform-vector.mlir for an example that would
    benefit from this.

(* This method will eventually be moved out of Vectorization.cpp, which isn't the right long-term home for it.)
banach-space added a commit to banach-space/llvm-project that referenced this pull request May 29, 2025
banach-space added a commit to banach-space/llvm-project that referenced this pull request May 30, 2025
banach-space added a commit that referenced this pull request Jun 7, 2025
…41244)

This patch refactors two vectorization hooks in Vectorization.cpp:
 * `createWriteOrMaskedWrite` gains a new parameter for write indices,
   aligning it with its counterpart `createReadOrMaskedRead` (see the
   sketch below).
 * `vectorizeAsInsertSliceOp` is updated to reuse both of the above
   hooks, rather than re-implementing similar logic.
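
To ground the "write indices" point, a minimal sketch (value names,
offsets, and shapes are illustrative, not taken from the patch): the
write produced for an insert_slice has to land at the slice's offsets
within the destination, and those offsets are what the new parameter
threads through.
```mlir
// The write targets the insert offsets (%c2, %c4) in the destination,
// not the origin; these indices are now passed to the write hook.
%c2 = arith.constant 2 : index
%c4 = arith.constant 4 : index
%w = vector.transfer_write %vec, %dest[%c2, %c4]
    : vector<4x8xf32>, tensor<16x16xf32>
```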

CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
  * #122927
  * #123031

At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.

CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
  vector to store, via named variables like `destType`/`vecToStoreType`,
  `destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
  `in_bounds`. For example, the size of the `in_bounds` attr now matches
  the source vector rank, not the tensor rank (see the sketch below).
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't
  hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of
  `destShape` (tensor). (Eventually we should not require
  `inputVecSizesForLeadingDims` at all - mask shape should be inferred.)
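
To make the rank distinction concrete, here is a minimal sketch (the
function name, values, and shapes are illustrative, not taken from the
patch's tests): a rank-2 vector stored into a rank-4 tensor takes a
rank-2 mask and a two-entry `in_bounds` array.
```mlir
func.func @rank2_store_into_rank4(%vec: vector<4x8xf32>,
    %dest: tensor<2x2x4x8xf32>) -> tensor<2x2x4x8xf32> {
  %c0 = arith.constant 0 : index
  %c3 = arith.constant 3 : index
  %c5 = arith.constant 5 : index
  // Mask shape follows `vecToStoreShape` (4x8), not `destShape`.
  %mask = vector.create_mask %c3, %c5 : vector<4x8xi1>
  // `in_bounds` has one entry per vector dimension: two, not four.
  %res = vector.mask %mask {
    vector.transfer_write %vec, %dest[%c0, %c0, %c0, %c0]
        {in_bounds = [true, true]}
        : vector<4x8xf32>, tensor<2x2x4x8xf32>
  } : vector<4x8xi1> -> tensor<2x2x4x8xf32>
  return %res : tensor<2x2x4x8xf32>
}
```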

NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.

Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```
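
Assuming `%1` above is an all-true mask (which the matching shapes make
provable), folding leaves just the plain write, with nothing for
canonicalization to clean up:
```mlir
%2 = vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
```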

Also, without this hook, tests are more complicated and require more
matching.

VECTORIZATION BEHAVIOUR
-----------------------

This patch preserves the current behaviour around masking and the use of
the `in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are
  available (see the sketch below).
* The vectorizer continues to infer vector sizes where needed.
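
A minimal sketch of the first point (value names and shapes are assumed,
not from the patch's tests): with no input vector sizes, the vector
shape is taken from the static source shape, so the transfer is marked
in-bounds instead of being masked.
```mlir
// No user-provided vector sizes: the vector shape equals the static
// tensor shape, so the read is in-bounds by construction and no mask
// is generated.
%read = vector.transfer_read %src[%c0, %c0], %pad {in_bounds = [true, true]}
    : tensor<4x8xf32>, vector<4x8xf32>
```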

Note: the computation of the `in_bounds` attribute is not always correct.
That issue is tracked here:
* #142107

This will be addressed separately.

TEST CHANGES
------------
Only affects vectorization of:

* `tensor.insert_slice` (now refactored to use shared hooks)

Test diffs involve additional `arith.constant` Ops due to increased reuse
of shared helpers (which generate their own constants). This will be
cleaned up via constant caching (see #138265).
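
For instance (a hypothetical fragment, not copied from the actual test
diffs), two helpers each materialising their own zero index show up as:
```mlir
%c0 = arith.constant 0 : index    // emitted by the read helper
%c0_0 = arith.constant 0 : index  // emitted by the write helper
```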

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
  * Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
    VectorUtils.cpp)
  * Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
  * Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
    updated test in transform-vector.mlir for an example that would
    benefit from this, and the sketch below.
  * Address #142107
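
As a rough illustration of the value-bounds idea (hypothetical IR, not
from the patch; `%src` is an assumed function argument): even with a
dynamic mask operand, the mask is provably all-true, and hence foldable,
whenever the analysis can bound the operand from below.
```mlir
// If value-bounds analysis proves %d >= 4, this mask is all-true and
// foldable even though %d is dynamic.
%c0 = arith.constant 0 : index
%c8 = arith.constant 8 : index
%d = tensor.dim %src, %c0 : tensor<?x8xf32>
%mask = vector.create_mask %d, %c8 : vector<4x8xi1>
```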

(*) This method will eventually be moved out of Vectorization.cpp, which
isn't the right long-term home for it.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 7, 2025
…e reuse (#141244)

rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
…vm#141244)

DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
…vm#141244)

tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…vm#141244)
