[mlir][linalg] Add support for masked vectorization of tensor.insert_slice
(2/N)
#123031
Conversation
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)
Patch is 35.29 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/123031.diff

5 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 1dc700f22c2027..726ce22ac70dc3 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -1723,11 +1723,6 @@ void populateDecomposePadPatterns(RewritePatternSet &patterns);
/// \see rewriteInIm2Col for more details.
void populateConvertConv2DToImg2ColPatterns(RewritePatternSet &patterns);
-/// Populates `patterns` with vectorisation patterns for tensor.insert_slice.
-/// TODO: Avoid having a dedicated `populate{}` for one pattern. Instead, either
-/// expand or merge with other `populate{}`.
-void populateInsertSliceVectorizationPatterns(RewritePatternSet &patterns);
-
/// Populates `patterns` with patterns that vectorize tensor.pad.
/// These patterns are meant to apply in a complementary fashion. Benefits
/// are used to encode a certain ordering of pattern application. To avoid
diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
index 67dd21aafe4fe0..73a52ebc46f15b 100644
--- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
+++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
@@ -265,7 +265,6 @@ void transform::ApplyFoldAddIntoDestPatternsOp::populatePatterns(
void transform::ApplyPadVectorizationPatternsOp::populatePatterns(
RewritePatternSet &patterns) {
linalg::populatePadOpVectorizationPatterns(patterns);
- linalg::populateInsertSliceVectorizationPatterns(patterns);
}
//===----------------------------------------------------------------------===//
@@ -3504,9 +3503,6 @@ transform::VectorizeChildrenAndApplyPatternsOp::applyToOne(
patterns.add<CopyVectorizationPattern>(ctx);
- // Add misc. vectorization patterns (e.g. for tensor.insert_slice)
- linalg::populateInsertSliceVectorizationPatterns(patterns);
-
if (getVectorizePadding()) {
linalg::populatePadOpVectorizationPatterns(patterns);
// This creates an alternative path for lowering tensor.pad - by
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 863f2280e46ce6..3c59bcb8d6ecb3 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -59,6 +59,30 @@ vectorizeConvolution(RewriterBase &rewriter, LinalgOp convOp,
ArrayRef<bool> inputVecScalableFlags = {},
bool flatten1DDepthwiseConv = false);
+/// Vectorize tensor::InsertSliceOp with:
+/// * vector::TransferReadOp + vector::TransferWriteOp
+/// The vector sizes are either:
+/// * user-provided in `inputVectorSizes`, or
+/// * inferred from the static dims in the input and output tensors.
+/// Bails out if:
+/// * vector sizes are not user-provided, and
+/// * at least one dim is dynamic (in both the input and output tensors).
+///
+/// Before:
+/// !t_in_type = tensor<1x2x3xf32>
+/// !t_out_type = tensor<9x8x7x1x2x3xf32>
+/// !v_type = vector<1x2x3xf32>
+/// %inserted_slice = tensor.insert_slice %src into %dest ... : !t_in_type
+/// into !t_out_type
+/// After:
+/// %read = vector.transfer_read %src[...], %pad ... : !t_in_type, !v_type
+/// %write = vector.transfer_write %read, %dest ... : !v_type, !t_out_type
+static LogicalResult
+vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
+ ArrayRef<int64_t> inputVectorSizes,
+ SmallVectorImpl<Value> &newResults);
+
/// Return the unique instance of OpType in `block` if it is indeed unique.
/// Return null if none or more than 1 instances exist.
template <typename OpType>
@@ -1557,6 +1581,7 @@ static LogicalResult
vectorizeAsTensorPackOp(RewriterBase &rewriter, tensor::PackOp packOp,
ArrayRef<int64_t> inputVectorSizes,
SmallVectorImpl<Value> &newResults) {
+ // TODO: Introduce a parent class that will handle the insertion point update.
OpBuilder::InsertionGuard g(rewriter);
rewriter.setInsertionPoint(packOp);
@@ -1633,6 +1658,7 @@ vectorizeAsTensorUnpackOp(RewriterBase &rewriter, tensor::UnPackOp unpackOp,
ArrayRef<int64_t> inputVectorSizes,
SmallVectorImpl<Value> &newResults) {
+ // TODO: Introduce a parent class that will handle the insertion point update.
OpBuilder::InsertionGuard g(rewriter);
rewriter.setInsertionPoint(unpackOp);
@@ -1763,7 +1789,7 @@ vectorizeAsTensorPadOp(RewriterBase &rewriter, tensor::PadOp padOp,
auto padValue = padOp.getConstantPaddingValue();
Location loc = padOp.getLoc();
- // transfer_write_in_bounds(transfer_read_masked(pad_source, pad_value))
+ // TODO: Introduce a parent class that will handle the insertion point update.
OpBuilder::InsertionGuard g(rewriter);
rewriter.setInsertionPoint(padOp);
@@ -1874,6 +1900,15 @@ vectorizeUnPackOpPrecondition(tensor::UnPackOp unpackOp,
return success();
}
+/// Preconditions for vectorization of tensor::InsertSliceOp.
+static LogicalResult
+vectorizeInsertSliceOpPrecondition(tensor::InsertSliceOp sliceOp,
+ ArrayRef<int64_t> inputVectorSizes) {
+
+ // TODO: Move pre-conditions from the vectorization logic
+ return success();
+}
+
static LogicalResult vectorizeLinalgOpPrecondition(
LinalgOp linalgOp, ArrayRef<int64_t> inputVectorSizes,
bool vectorizeNDExtract, bool flatten1DDepthwiseConv) {
@@ -2144,6 +2179,9 @@ LogicalResult mlir::linalg::vectorizeOpPrecondition(
.Case<tensor::UnPackOp>([&](auto unpackOp) {
return vectorizeUnPackOpPrecondition(unpackOp, inputVectorSizes);
})
+ .Case<tensor::InsertSliceOp>([&](auto sliceOp) {
+ return vectorizeInsertSliceOpPrecondition(sliceOp, inputVectorSizes);
+ })
.Default([](auto) { return failure(); });
}
@@ -2163,8 +2201,8 @@ static void convertAffineApply(RewriterBase &rewriter, LinalgOp linalgOp) {
}
bool mlir::linalg::hasVectorizationImpl(Operation *op) {
- return isa<linalg::LinalgOp, tensor::PadOp, tensor::PackOp, tensor::UnPackOp>(
- op);
+ return isa<linalg::LinalgOp, tensor::PadOp, tensor::PackOp, tensor::UnPackOp,
+ tensor::InsertSliceOp>(op);
}
/// Emit a suitable vector form for an operation. If provided,
@@ -2178,6 +2216,7 @@ LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
ArrayRef<bool> inputScalableVecDims,
bool vectorizeNDExtract,
bool flatten1DDepthwiseConv) {
+ rewriter.getInsertionPoint();
LDBG("Attempting to vectorize:\n" << *op << "\n");
LDBG("Input vector sizes: ");
LLVM_DEBUG(llvm::interleaveComma(inputVectorSizes, llvm::dbgs()));
@@ -2244,6 +2283,10 @@ LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, Operation *op,
return vectorizeAsTensorPackOp(rewriter, packOp, inputVectorSizes,
results);
})
+ .Case<tensor::InsertSliceOp>([&](auto sliceOp) {
+ return vectorizeAsInsertSliceOp(rewriter, sliceOp, inputVectorSizes,
+ results);
+ })
.Case<tensor::UnPackOp>([&](auto unpackOp) {
return vectorizeAsTensorUnpackOp(rewriter, unpackOp,
inputVectorSizes, results);
@@ -2583,113 +2626,143 @@ static Value getStaticPadVal(Operation *op) {
return {};
}
-/// Rewrite tensor.insert.slice as a vector.transfer_read +
-/// vector.transfer_write pair. The vector size is inferred from the static
-/// dims in the input and output tensors. If a dim is dynamic in both the input
-/// and output tensors, bails out.
-///
-/// Before:
-/// !t_in_type = tensor<1x2x3xf32>
-/// !t_out_type = tensor<9x8x7x1x2x3xf32>
-/// !v_type = vector<1x2x3xf32>
-/// %inserted_slice = tensor.insert_slice %src into %dest ... : !t_in_type
-/// into !t_out_type
-/// After:
-/// %read = vector.transfer_read %src[...], %pad ... : !t_in_type, !v_type
-/// %write = vector.transfer_write %read, %dest ... : !v_type, !t_out_type
-///
-/// TODO: Support masking
-struct InsertSliceVectorizePattern
- : public OpRewritePattern<tensor::InsertSliceOp> {
- using OpRewritePattern<tensor::InsertSliceOp>::OpRewritePattern;
+static LogicalResult
+vectorizeAsInsertSliceOp(RewriterBase &rewriter, tensor::InsertSliceOp sliceOp,
+ ArrayRef<int64_t> inputVectorSizes,
+ SmallVectorImpl<Value> &newResults) {
+ // TODO: Introduce a parent class that will handle the insertion point update.
+ OpBuilder::InsertionGuard g(rewriter);
+ rewriter.setInsertionPoint(sliceOp);
- LogicalResult matchAndRewrite(tensor::InsertSliceOp sliceOp,
- PatternRewriter &rewriter) const final {
- auto sourceType = sliceOp.getSource().getType();
- if (!VectorType::isValidElementType(sourceType.getElementType()))
- return failure();
+ TypedValue<RankedTensorType> source = sliceOp.getSource();
+ auto sourceType = source.getType();
+ if (!VectorType::isValidElementType(sourceType.getElementType()))
+ return failure();
- auto resultType = sliceOp.getResultType();
-
- // 1. Get the pad value.
- // TransferReadOp requires a scalar padding value. Note that:
- // * for in-bounds access, the value is actually irrelevant.
- // There are 2 cases in which xfer.read accesses are known to be in-bounds:
- // 1. The source shape is static (output vector sizes would be based on
- // the source shape and hence all memory accesses would be in-bounds),
- // 2. Masking is used (output vector sizes would be user-provided, in which
- // case it is assumed that all memory accesses are in-bounds). This
- // remains a TODO.
- //
- // When the value is not known and not needed, use 0. Otherwise, bail out.
- Value padValue = getStaticPadVal(sliceOp);
- bool isOutOfBoundsRead = !sourceType.hasStaticShape();
-
- if (!padValue && isOutOfBoundsRead) {
- LDBG("Failed to get a pad value for out-of-bounds read access\n");
+ auto resultType = sliceOp.getResultType();
+
+ // 1. Get the pad value.
+ // TransferReadOp requires a scalar padding value. Note that:
+ // * for in-bounds access, the value is actually irrelevant.
+ // There are 2 cases in which xfer.read accesses are known to be in-bounds:
+ // 1. The source shape is static (output vector sizes would be based on
+ // the source shape and hence all memory accesses would be in-bounds),
+ // 2. Masking is used (output vector sizes would be user-provided, in which
+ // case it is assumed that all memory accesses are in-bounds). This
+ // remains a TODO.
+ //
+ // When the value is not known and not needed, use 0. Otherwise, bail out.
+ Value padValue = getStaticPadVal(sliceOp);
+ bool isOutOfBoundsRead =
+ !sourceType.hasStaticShape() && inputVectorSizes.empty();
+
+ if (!padValue && isOutOfBoundsRead) {
+ LDBG("Failed to get a pad value for out-of-bounds read access\n");
+ return failure();
+ }
+
+ if (!padValue) {
+ auto elemType = sourceType.getElementType();
+ padValue = rewriter.create<arith::ConstantOp>(
+ sliceOp.getLoc(), elemType, rewriter.getZeroAttr(elemType));
+ }
+
+ // 2. Get the vector shape and in-bounds attributes
+ SmallVector<int64_t> vecShape;
+ SmallVector<bool> readInBounds;
+ SmallVector<bool> writeInBounds;
+ size_t rankDiff = resultType.getRank() - sourceType.getRank();
+ for (unsigned i = 0; i < sourceType.getRank(); ++i) {
+ if (!inputVectorSizes.empty()) {
+ vecShape.push_back(inputVectorSizes[i]);
+ readInBounds.push_back(false);
+ writeInBounds.push_back(false);
+ } else if (!sourceType.isDynamicDim(i)) {
+ vecShape.push_back(sourceType.getDimSize(i));
+ // Source shape is statically known: Neither read nor write are
+ // out-of-bounds.
+ readInBounds.push_back(true);
+ writeInBounds.push_back(true);
+ } else if (!resultType.isDynamicDim(i)) {
+ // Source shape is not statically known, but result shape is.
+ // Vectorize with size of result shape. This may be larger than the
+ // source size.
+ // FIXME: Using rankDiff implies that the source tensor is inserted at
+ // the end of the destination tensor. However, that's not required.
+ vecShape.push_back(resultType.getDimSize(rankDiff + i));
+ // Read may be out-of-bounds because the result size could be larger
+ // than the source size.
+ readInBounds.push_back(false);
+ // Write will be in-bounds provided that the corresponding write idx is 0.
+ // To keep this logic simple, conservatively mark as out-of-bounds.
+ writeInBounds.push_back(false);
+ } else {
+ // Neither source nor result dim of padOp is static. Cannot vectorize
+ // the copy.
+ // TODO: Add support for masking
return failure();
}
+ }
+ auto vecType = VectorType::get(vecShape, sourceType.getElementType());
+
+ // 3. Generate TransferReadOp + TransferWriteOp
+ ReifiedRankedShapedTypeDims reifiedSrcSizes;
+ Value maskOp;
- if (!padValue) {
- auto elemType = sourceType.getElementType();
- padValue = rewriter.create<arith::ConstantOp>(
- sliceOp.getLoc(), elemType, rewriter.getZeroAttr(elemType));
+ // If vector sizes are user provided, make sure to mask. First, generate the
+ // mask.
+ if (!inputVectorSizes.empty()) {
+ auto *srcDefOp = source.getDefiningOp();
+ if (!srcDefOp) {
+ LDBG("Unable to get the defining Op of " << sliceOp);
+ return failure();
}
- // 2. Get the vector shape and in-bounds attributes
- SmallVector<int64_t> vecShape;
- SmallVector<bool> readInBounds;
- SmallVector<bool> writeInBounds;
- size_t rankDiff = resultType.getRank() - sourceType.getRank();
- for (unsigned i = 0; i < sourceType.getRank(); ++i) {
- if (!sourceType.isDynamicDim(i)) {
- vecShape.push_back(sourceType.getDimSize(i));
- // Source shape is statically known: Neither read nor write are
- // out-of-bounds.
- readInBounds.push_back(true);
- writeInBounds.push_back(true);
- } else if (!resultType.isDynamicDim(i)) {
- // Source shape is not statically known, but result shape is.
- // Vectorize with size of result shape. This may be larger than the
- // source size.
- // FIXME: Using rankDiff implies that the source tensor is inserted at
- // the end of the destination tensor. However, that's not required.
- vecShape.push_back(resultType.getDimSize(rankDiff + i));
- // Read may be out-of-bounds because the result size could be larger
- // than the source size.
- readInBounds.push_back(false);
- // Write will in-bounds provided that the corresponding write idx is 0.
- // To keep this logic simple, conservatively mark as out-of-bounds.
- writeInBounds.push_back(false);
- } else {
- // Neither source nor result dim of padOp is static. Cannot vectorize
- // the copy.
- // TODO: Add support for masking
- return failure();
- }
+ LogicalResult status =
+ cast<ReifyRankedShapedTypeOpInterface>(srcDefOp).reifyResultShapes(
+ rewriter, reifiedSrcSizes);
+ if (status.failed()) {
+ LDBG("Unable to reify result shapes of " << srcDefOp);
+ return failure();
}
- auto vecType = VectorType::get(vecShape, sourceType.getElementType());
- // 3. Generate TransferReadOp.
- SmallVector<Value> readIndices(
- vecType.getRank(),
- rewriter.create<arith::ConstantIndexOp>(sliceOp.getLoc(), 0));
- auto read = rewriter.create<vector::TransferReadOp>(
- sliceOp.getLoc(), vecType, sliceOp.getSource(), readIndices, padValue,
- ArrayRef<bool>{readInBounds});
+ // Create the mask
+ auto readMaskType = VectorType::get(inputVectorSizes, rewriter.getI1Type());
+ maskOp = rewriter.create<vector::CreateMaskOp>(
+ sliceOp.getLoc(), readMaskType, reifiedSrcSizes[0]);
+ }
- // 4. Generate TransferWriteOp.
- auto writeIndices = getValueOrCreateConstantIndexOp(
- rewriter, sliceOp.getLoc(), sliceOp.getMixedOffsets());
+ // 3.a. TransferReadOp
+ SmallVector<Value> readIndices(
+ vecType.getRank(),
+ rewriter.create<arith::ConstantIndexOp>(sliceOp.getLoc(), 0));
+ Operation *read = rewriter.create<vector::TransferReadOp>(
+ sliceOp.getLoc(), vecType, source, readIndices, padValue,
+ ArrayRef<bool>{readInBounds});
- // 5. Finalize
- rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
- sliceOp, read, sliceOp.getDest(), writeIndices,
- ArrayRef<bool>{writeInBounds});
+ // Mask the xfer_read Op
+ if (!inputVectorSizes.empty()) {
+ read = mlir::vector::maskOperation(rewriter, read, maskOp);
+ }
- return success();
+ // 3.b. TransferWriteOp
+ auto writeIndices = getValueOrCreateConstantIndexOp(
+ rewriter, sliceOp.getLoc(), sliceOp.getMixedOffsets());
+
+ Operation *write = rewriter.create<vector::TransferWriteOp>(
+ sliceOp.getLoc(), read->getResult(0), sliceOp.getDest(), writeIndices,
+ ArrayRef<bool>{writeInBounds});
+
+ // Mask the xfer_write Op
+ if (!inputVectorSizes.empty()) {
+ write = mlir::vector::maskOperation(rewriter, write, maskOp);
}
-};
+
+ // 4. Finalize
+ newResults.push_back(write->getResult(0));
+
+ return success();
+}
/// Rewrite use of tensor::PadOp result in InsertSliceOp. E.g.:
/// ```
@@ -2778,11 +2851,6 @@ struct PadOpVectorizationWithInsertSlicePattern
}
};
-void mlir::linalg::populateInsertSliceVectorizationPatterns(
- RewritePatternSet &patterns) {
- patterns.add<InsertSliceVectorizePattern>(patterns.getContext());
-}
-
void mlir::linalg::populatePadOpVectorizationPatterns(
RewritePatternSet &patterns, PatternBenefit baseBenefit) {
patterns.add<PadOpVectorizationWithTransferReadPattern,
diff --git a/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir b/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir
index 08a3bbbb301c87..747b6f6d90cc7f 100644
--- a/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization-pad-patterns.mlir
@@ -224,34 +224,16 @@ module attributes {transform.with_named_sequence} {
}
}
-
// -----
-///----------------------------------------------------------------------------------------
-/// tensor::PadOp -> tensor::EmptyOp + linalg::FillOp/tensor::GenerateOp + tensor::InsertSliceOp
-/// [Pattern: GenericPadOpVectorizationPattern + InsertSliceVectorizePattern]
-/// TODO: Split the test into two, one for each pattern.
-///----------------------------------------------------------------------------------------
-
func.func private @make_vector() -> tensor<12x13xf32>
-// Same as @pad_and_insert_slice_dest in vectorization-with-patterns.mlir, but
-// over here linalg::fill is not vectorized (patterns for linalg.fill are not
-// included here)
-// CHECK-LABEL: func.func @pad_and_insert_slice_dest(
-// CHECK-SAME: %[[ARG_0:.*]]: tensor<1x5x6xf32>) -> tensor<1x12x13xf32> {
-// CHECK-NOT: tensor.pad
-// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
-// CHECK-DAG: %[[PAD:.*]] = arith.constant 5.000000e+00 : f32
-// CHECK-DAG: %[[PAD_READ:.*]] = arith.constant 0.000000e+00 : f32
-// CHECK: %[[EMPTY:.*]] = tensor.empty() : tensor<1x12x13xf32>
-// CHECK: %[[FILL:.*]] = linalg.fill ins(%[[PAD]] : f32) outs(%[[EMPTY]] : tensor<1x12x13xf32>) -> tensor<1x12x13xf32>
-// CHECK: %[[READ_1:.*]] = vector.transfer_read %[[ARG_0]]{{\[}}%[[C0]], %[[C0]], %[[C0]]], %[[PAD]] {in_bounds = [true, true, true]} : tensor<1x5x6xf32>, vector<1x5x6xf32>
-// CHECK: %[[WRITE_1:.*]] = vector.transfer_write %[[READ_1]], %[[FILL]]{{\[}}%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true]} : vector<1x5x6xf32>, tensor<1x12x13xf32>
-// CHECK: %[[VEC:.*]] = call @make_vector() : () -> ten...
[truncated]
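The truncated portion covers the new test coverage. For orientation, here is a minimal sketch of how masked vectorization is typically driven in such tests via the transform dialect; the vector sizes `[8, 1]` and the payload matching are illustrative assumptions, not the PR's exact tests:

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    // Match the payload op and request masked vectorization with
    // user-provided (illustrative) vector sizes.
    %0 = transform.structured.match ops{["tensor.insert_slice"]} in %arg0
      : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 vector_sizes [8, 1] : !transform.any_op
    transform.yield
  }
}
```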
Following on from llvm#122927 + llvm#123031 that added support for masked vectorization of `tensor.insert_slice`, this PR extends the e2e test for `tensor.unpack` to leverage the new functionality.
Force-pushed from acd4d90 to 9802803.
Force-pushed from 9802803 to 4e23f1b.
Overall looks good to me, just a few nits about the comments!
ShapedType::isDynamicShape(resultType.getShape())) {
  LDBG("TODO: Masking of xfer_write when vectorising " << sliceOp);
  return failure();
// 3.a. TransferReadOp
I personally do not like this kind of comment because it is redundant to me. Comments are important, for sure. They aim to describe what the code does and why. However, I believe that the best code is self-documenting in some cases. In this snippet, it is clear to me that we are generating a transfer_read op, and the comment does not explain anything beyond that. Thus, we do not need the comment, IMO.
https://llvm.org/docs/CodingStandards.html#commenting
(Here is the reference from Google c++ style guide, though we do not need to follow it in LLVM/MLIR. https://google.github.io/styleguide/cppguide.html#Comments)
sliceOp.getLoc(), vecType, source, readIndices, padValue,
ArrayRef<bool>{readInBounds});

// Mask the xfer_read Op
ditto: I think the line 2732 comment already documents it. IMO, they belong to the same function/snippet, so I think we do not need the comment.
// If vector sizes are user provided, make sure to mask. First, generate the
// mask.
ArrayRef<bool>{readInBounds});
// 3. Generate TransferReadOp + TransferWriteOp
ReifiedRankedShapedTypeDims reifiedSrcSizes;
Value maskOp;
Move the declaration to where it is initialized, i.e., l.2742?
Note that l.2742 sits within an `if` block and the generated mask is also used outside, e.g. l.2756.
This is roughly the structure:
Value maskOp;
if (!inputVectorSizes.empty()) {
// Generate the mask - this will make `if (maskOp)` below evaluate to TRUE
}
// Generate readOp
if (maskOp) {
// Mask the readOp
}
// Generate writeOp (depends on readOp)
if (maskOp) {
// Mask the writeOp
}
I have ideas how to improve this, but no spare cycles 😢 (there's `createWriteOrMaskedWrite` and `createReadOrMaskedRead` that we should re-use here, but that won't work as-is).
If that's OK, will add this to my TODO list?
Sorry that I missed it. I see, thanks!
…_slice` (2/N)

For context, recall that `tensor.insert_slice` is vectorised using the
`vector.transfer_read` + `vector.transfer_write` pair. An unmasked example is
shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice %slice into %dest[0, %c2] [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2] : vector<8x1xi32>, tensor<5x3xi32>
```

This PR extends `vectorizeAsInsertSliceOp` to add masking support for the
`vector.transfer_write` operation. This complements the changes in llvm#122927,
which introduced masking for the `vector.transfer_read`.
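A rough sketch of the masked form this produces when vector sizes are user-provided; the mask bounds `%c5` and `%c1` stand in for the reified source sizes and are illustrative, not taken from the PR's tests:

```mlir
// Mask built from the source sizes; the vector shape <8x1> is user-provided.
%mask = vector.create_mask %c5, %c1 : vector<8x1xi1>
// Masked read of the slice being inserted (added in llvm#122927).
%read = vector.mask %mask {
  vector.transfer_read %source[%c0, %c0], %pad : tensor<5x1xi32>, vector<8x1xi32>
} : vector<8x1xi1> -> vector<8x1xi32>
// Masked write into the destination (the part added by this PR).
%res = vector.mask %mask {
  vector.transfer_write %read, %dest[%c0, %c2] : vector<8x1xi32>, tensor<5x3xi32>
} : vector<8x1xi1> -> tensor<5x3xi32>
```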
….insert_slice` (2/N) Address comment from Hanhan
Force-pushed from 4e23f1b to 814379d.
Thanks @hanhanW - those were good suggestions 🙏🏻
Rebased, addressed comments + switched from `if (!inputVectorSizes.empty())` to `if (maskOp)` (where relevant).
Thanks for the review HanHan! I am a bit behind with my PRs, so will land it later today if there are no new comments. Just a small note for other "potential" reviewers :)
// If vector sizes are user provided, make sure to mask xfer_read.
// If vector sizes are user provided, make sure to mask. First, generate the
// mask.
Couldn't user-provided vector sizes lead to an unmasked scenario? We have a method that checks if a mask is needed here (can't remember the name right now). Couldn't we use it for this case?
Yup, that's `createWriteOrMaskedWrite` that I mentioned in my reply to HanHan :) I hit some issue with that, so will re-visit in a separate PR (I'm quite keen to progress this and to reduce my PR backlog).
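For illustration, here is the kind of trivially foldable case the question points at, reproduced from the follow-up refactoring commit below: the user-provided vector sizes match the static shapes, so the mask is all-true and redundant:

```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```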
// CHECK-LABEL: func.func private @insert_slice_dynamic_dest_dim(
// CHECK-SAME: %[[SRC:.*]]: tensor<?x3x?x1xi32>,
// CHECK-SAME: %[[size:.*]]: index) -> tensor<?x3xi32> {
capital?
…`tensor.insert_slice` (2/N) Fix capitalisation
…41244)

This patch refactors two vectorization hooks in Vectorization.cpp:

* `createWriteOrMaskedWrite` gains a new parameter for write indices, aligning it with its counterpart `createReadOrMaskedRead`.
* `vectorizeAsInsertSliceOp` is updated to reuse both of the above hooks, rather than re-implementing similar logic.

CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing `tensor.insert_slice`. Recent updates added masking support:

* #122927
* #123031

At the time, reuse of the shared `create*` hooks wasn't feasible due to missing parameters and overly rigid assumptions. This patch resolves that and moves us closer to a more maintainable structure.

CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the vector to store, via named variables like `destType`/`vecToStoreType`, `destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like `in_bounds`. For example, the size of the `in_bounds` attr now matches the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of `destShape` (tensor). (Eventually we should not require `inputVecSizesForLeadingDims` at all - mask shape should be inferred.)

NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids inserting redundant masks and reduces the burden on canonicalization to clean them up later. Example where masking is provably unnecessary:

```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more matching.

VECTORIZATION BEHAVIOUR
-----------------------
This patch preserves the current behaviour around masking and the use of the `in_bounds` attribute. Specifically:

* `useInBoundsInsteadOfMasking` is set when no input vector sizes are available.
* The vectorizer continues to infer vector sizes where needed.

Note: the computation of the `in_bounds` attribute is not always correct. That issue is tracked in #142107 and will be addressed separately.

TEST CHANGES
------------
Only affects vectorization of:

* `tensor.insert_slice` (now refactored to use shared hooks)

Test diffs involve additional `arith.constant` Ops due to increased reuse of shared helpers (which generate their own constants). This will be cleaned up via constant caching (see #138265).

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review `createWriteOrMaskedWrite` as a new method rather than diffing line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and `createReadOrMaskedRead`:

* Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in VectorUtils.cpp)
* Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
* Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the updated test in transform-vector.mlir for an example that would benefit from this.
* Address #142107

(*) This method will eventually be moved out of Vectorization.cpp, which isn't the right long-term home for it.
For context, recall that `tensor.insert_slice` is vectorised using the `vector.transfer_read` + `vector.transfer_write` pair. An unmasked example is shown below:
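```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice %slice into %dest[0, %c2] [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %source[%c0, %c0], %pad : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2] : vector<8x1xi32>, tensor<5x3xi32>
```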
This PR extends `vectorizeAsInsertSliceOp` to add masking support for the `vector.transfer_write` operation. This complements the changes in #122927, which introduced masking for the `vector.transfer_read`.