[mlir][tensor] Extend the logic to generalise tensor.pack #109815
Conversation
@llvm/pr-subscribers-mlir-tensor @llvm/pr-subscribers-mlir-linalg

Author: Andrzej Warzyński (banach-space)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/109815.diff

4 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Tensor/Utils/Utils.h b/mlir/include/mlir/Dialect/Tensor/Utils/Utils.h
index 84d06d456bb689..db5a15c9ec3550 100644
--- a/mlir/include/mlir/Dialect/Tensor/Utils/Utils.h
+++ b/mlir/include/mlir/Dialect/Tensor/Utils/Utils.h
@@ -14,12 +14,13 @@
namespace mlir {
namespace tensor {
-// Return a PadOp that pads `source` to `type` size where the static
-// sizes are assumed to be greater than the dynamic sizes. If `type` has dynamic
-// dimensions the padding width is set to zero. The op performs "high" padding
-// (i.e. it adds trailing padding values until the desired size is met).
+// Return a PadOp that pads `source` to `type` size. Output sizes (from `type`)
+// are assumed to be static and greater than the potentially dynamic input sizes
+// (from `source`). The op performs "high" padding (i.e. it adds trailing padding
+// values until the desired size is met).
PadOp createPadHighOp(RankedTensorType type, Value source, Value pad,
- bool nofold, Location loc, OpBuilder &builder);
+ bool nofold, Location loc, OpBuilder &builder,
+ std::optional<Value> dynOutDim = {});
// Creates dim ops for each dynamic dimension of the ranked tensor argument and
// returns these as values.
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index e0dea8e78d55c1..42389b431566eb 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -1021,8 +1021,16 @@ LogicalResult ExtractSliceOfPadTensorSwapPattern::matchAndRewrite(
return success();
}
-/// Returns a tensor.pad op if padding value is set. Otherwise, returns the
-/// source directly. The method assumes that the `packOp` has static shapes.
+/// If padding value is set, returns a tensor.pad Op for the source tensor,
+/// with the output shape matching the output of `packOp`. Otherwise, returns
+/// the source directly.
+///
+/// This method assumes that all outer dims for this pack Op are 1.
+///
+/// At most _one_ inner tile size can be _dynamic_, all other inner tiles are
+/// required to have static sizes. The inner tile that's dynamic must be a
+/// multiple of vector.vscale (to support scalable tile sizes). This condition
+/// can be relaxed in the future.
static Value getPackOpSourceOrPaddedSource(OpBuilder &builder,
tensor::PackOp packOp) {
Value input = packOp.getSource();
@@ -1038,26 +1046,50 @@ static Value getPackOpSourceOrPaddedSource(OpBuilder &builder,
ShapedType inputType = packOp.getSourceType();
int64_t inputRank = inputType.getRank();
- SmallVector<int64_t> paddedShape;
DenseMap<int64_t, OpFoldResult> tileAndPosMapping =
packOp.getDimAndTileMapping();
- for (int64_t dim = 0; dim < inputRank; ++dim) {
- int64_t size = inputType.getDimSize(dim);
- if (!tileAndPosMapping.count(dim)) {
- paddedShape.push_back(size);
+
+ // The size of a scalable tile (if present).
+ Value scalableSize;
+
+ // Collect dims for the padded shape.
+ SmallVector<int64_t> paddedShape;
+ for (int64_t dimIdx = 0; dimIdx < inputRank; ++dimIdx) {
+ int64_t inputDimSize = inputType.getDimSize(dimIdx);
+ // 1. Non-tiled outer dims.
+ // These dims should be 1 and we simply preserve them.
+ if (!tileAndPosMapping.count(dimIdx)) {
+ assert(inputDimSize == 1 &&
+ "with all outer dims == 1, this non-tiled input dim should be 1!");
+ paddedShape.push_back(inputDimSize);
+ continue;
+ }
+
+ // 2. Tiled outer dims
+ // As all outer dims == 1, it is safe to use the tile size for the padded
+ // shape.
+ OpFoldResult tileSizeForDim = tileAndPosMapping.lookup(dimIdx);
+
+ // 2.1 Static tile sizes
+ std::optional<int64_t> cstTileSize = getConstantIntValue(tileSizeForDim);
+ if (cstTileSize.has_value()) {
+ paddedShape.push_back(cstTileSize.value());
continue;
}
- // The size is less than or equal to tileSize because outer dims are all 1s.
- std::optional<int64_t> tileSize =
- getConstantIntValue(tileAndPosMapping.lookup(dim));
- assert(tileSize.has_value() && "dynamic inner tile size is not supported");
- paddedShape.push_back(tileSize.value());
+ // 2.2 Dynamic tile sizes
+ paddedShape.push_back(ShapedType::kDynamic);
+
+ // Get the value that holds the scalable size.
+ assert(!scalableSize && "Only one scalable size is supported ATM.");
+ scalableSize = llvm::dyn_cast_if_present<Value>(tileSizeForDim);
+ assert(vector::getConstantVscaleMultiplier(scalableSize) &&
+ "This dynamic shape is not a multiple of vscale, this !");
}
auto resultType =
RankedTensorType::get(paddedShape, inputType.getElementType());
return tensor::createPadHighOp(resultType, input, packOp.getPaddingValue(),
- /*nofold=*/false, loc, builder);
+ /*nofold=*/false, loc, builder, scalableSize);
}
// Normalizes a permutation on a higher rank space to its actual size, e.g.
@@ -1120,10 +1152,18 @@ getPackUnpackRankReducedPerm(ArrayRef<int64_t> shape,
LogicalResult GeneralizeOuterUnitDimsPackOpPattern::matchAndRewrite(
tensor::PackOp packOp, PatternRewriter &rewriter) const {
- if (llvm::any_of(packOp.getMixedTiles(),
- [](OpFoldResult tile) { return tile.is<Value>(); })) {
- return rewriter.notifyMatchFailure(packOp,
- "require inner tile sizes being static");
+ if (llvm::any_of(packOp.getMixedTiles(), [](OpFoldResult tile) {
+ return tile.is<Value>() && !vector::getConstantVscaleMultiplier(
+ llvm::dyn_cast<Value>(tile));
+ })) {
+ return rewriter.notifyMatchFailure(
+ packOp, "require inner tile sizes to be either static or a constant "
+ "multiple of vector.vscale");
+ }
+ if (llvm::count_if(packOp.getMixedTiles(),
+ [](OpFoldResult tile) { return tile.is<Value>(); }) > 1) {
+ return rewriter.notifyMatchFailure(
+ packOp, "at most one dynamic tile size is supported");
}
// TODO: support the case that outer dimensions are not all 1s. A
@@ -1181,7 +1221,23 @@ LogicalResult GeneralizeOuterUnitDimsPackOpPattern::matchAndRewrite(
SmallVector<int64_t> transpShape = readShape;
applyPermutationToVector<int64_t>(transpShape, perm);
- Value empty = rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType);
+ // If there's a tile with a scalable size, retrieve its size. ATM only 1
+ // scalable tile is allowed.
+ Value scalableSize;
+ for (auto tile : packOp.getMixedTiles()) {
+ if (tile.is<Value>()) {
+ assert(!scalableSize && "Only one scalable size is supported ATM.");
+ scalableSize = cast<Value>(tile);
+ assert(vector::getConstantVscaleMultiplier(scalableSize) &&
+ "This dynamic shape is not a multiple of vscale!");
+ }
+ }
+
+ Value empty =
+ ShapedType::isDynamicShape(cast<ShapedType>(input.getType()).getShape())
+ ? rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType,
+ scalableSize)
+ : rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType);
auto transposedOp =
rewriter.create<linalg::TransposeOp>(loc, tile, empty, perm);
diff --git a/mlir/lib/Dialect/Tensor/Utils/Utils.cpp b/mlir/lib/Dialect/Tensor/Utils/Utils.cpp
index a0d8a08fc6ba47..7b25d9747827e3 100644
--- a/mlir/lib/Dialect/Tensor/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Tensor/Utils/Utils.cpp
@@ -16,6 +16,7 @@
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Arith/Utils/Utils.h"
#include "mlir/Dialect/Utils/IndexingUtils.h"
+#include "mlir/Dialect/Vector/IR//VectorOps.h"
#include "mlir/Interfaces/ValueBoundsOpInterface.h"
using namespace mlir;
@@ -23,19 +24,40 @@ using namespace mlir::tensor;
PadOp mlir::tensor::createPadHighOp(RankedTensorType type, Value source,
Value pad, bool nofold, Location loc,
- OpBuilder &b) {
+ OpBuilder &b,
+ std::optional<Value> dynOutDim) {
+ assert(llvm::count_if(
+ type.getShape(),
+ [](int64_t dim) { return ShapedType::isDynamic(dim); }) <= 1 &&
+ "At most one output dim can be dynamic!");
+
+ // Init "low" and "high" padding values ("low" is kept as is, "high" is
+ // computed below).
SmallVector<OpFoldResult> low(type.getRank(), b.getIndexAttr(0));
SmallVector<OpFoldResult> high(type.getRank(), b.getIndexAttr(0));
+
for (const auto &en : enumerate(type.getShape())) {
- // Pad only the static dimensions of the result tensor type.
- if (ShapedType::isDynamic(en.value()))
- continue;
- // Compute the padding width.
- AffineExpr d0;
- bindDims(b.getContext(), d0);
- OpFoldResult sz = tensor::getMixedSize(b, loc, source, en.index());
- high[en.index()] =
- affine::makeComposedFoldedAffineApply(b, loc, en.value() - d0, {sz});
+ if (!ShapedType::isDynamic(en.value())) {
+ // Static sizes - the "high" value is computed based on the input and
+ // output dims. Compute the padding width.
+ AffineExpr d0;
+ bindDims(b.getContext(), d0);
+ OpFoldResult sz = tensor::getMixedSize(b, loc, source, en.index());
+ high[en.index()] =
+ affine::makeComposedFoldedAffineApply(b, loc, en.value() - d0, {sz});
+ } else {
+ // Dynamic sizes - the "high" value is computed based on the input dim
+ // and `dynOutDim`.
+ assert(dynOutDim.has_value() &&
+ "dynamic output dim requires dynOutDim to be set");
+
+ // Compute the padding width.
+ AffineExpr d0, d1;
+ auto dimVal = b.create<tensor::DimOp>(loc, source, en.index());
+ bindDims(b.getContext(), d0, d1);
+ high[en.index()] = affine::makeComposedFoldedAffineApply(
+ b, loc, d0 - d1, {dynOutDim.value(), dimVal.getResult()});
+ }
}
return b.create<PadOp>(loc, type, source, low, high, pad, nofold);
}
diff --git a/mlir/test/Dialect/Linalg/generalize-tensor-pack.mlir b/mlir/test/Dialect/Linalg/generalize-tensor-pack.mlir
index 7d87a0994004fe..66a220005ebf36 100644
--- a/mlir/test/Dialect/Linalg/generalize-tensor-pack.mlir
+++ b/mlir/test/Dialect/Linalg/generalize-tensor-pack.mlir
@@ -23,6 +23,8 @@ func.func @simple_pad_and_pack(%input: tensor<5x1xf32>, %output: tensor<1x1x8x2x
%0 = tensor.pack %input padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [8, 2] into %output : tensor<5x1xf32> -> tensor<1x1x8x2xf32>
return %0 : tensor<1x1x8x2xf32>
}
+// CHECK: #[[$ATTR_0:.+]] = affine_map<()[s0, s1] -> (s0 - s1)>
+
// CHECK-LABEL: func.func @simple_pad_and_pack
// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
@@ -34,6 +36,39 @@ func.func @simple_pad_and_pack(%input: tensor<5x1xf32>, %output: tensor<1x1x8x2x
// CHECK-SAME: [0, 0, 0, 0] [1, 1, 8, 2] [1, 1, 1, 1]
// CHECK: return %[[INSERT]]
+/// Same as example above, but with scalable sizes.
+
+/// NOTE: For this example to make sense in practice, the "?" in the output shape
+/// should effectively be 8 * vector.vscale (and that's what tensor.dim
+/// below should return).
+
+func.func @simple_pad_and_pack_scalable(%input: tensor<5x1xf32>, %output: tensor<1x1x?x2xf32>, %pad: f32) -> tensor<1x1x?x2xf32> {
+ %c8 = arith.constant 8 : index
+ %vscale = vector.vscale
+ %c8_vscale = arith.muli %vscale, %c8 : index
+ %0 = tensor.pack %input padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c8_vscale, 2] into %output : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
+ return %0 : tensor<1x1x?x2xf32>
+}
+
+
+// CHECK-LABEL: func.func @simple_pad_and_pack_scalable(
+// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]: tensor<5x1xf32>,
+// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]: tensor<1x1x?x2xf32>,
+// CHECK-SAME: %[[PAD_VAL:[a-zA-Z0-9]+]]: f32) -> tensor<1x1x?x2xf32> {
+// CHECK: %[[C2:.+]] = arith.constant 2 : index
+// CHECK: %[[C5:.+]] = arith.constant 5 : index
+// CHECK: %[[C8:.+]] = arith.constant 8 : index
+// CHECK: %[[VS:.+]] = vector.vscale
+// CHECK: %[[C8_VS:.+]] = arith.muli %[[VS]], %[[C8]] : index
+// CHECK: %[[PAD_HIGH:.+]] = affine.apply #[[$ATTR_0]](){{\[}}%[[C8_VS]], %[[C5]]]
+// CHECK: %[[PAD:.+]] = tensor.pad %[[SRC]] low[0, 0] high{{\[}}%[[PAD_HIGH]], 1] {
+// CHECK: tensor.yield %[[PAD_VAL]] : f32
+// CHECK-NOT: linalg.transpose
+// CHECK: %[[SLICE:.+]] = tensor.extract_slice %[[PAD:.+]][0, 0] {{\[}}%[[C8_VS]], 2] [1, 1] : tensor<?x2xf32> to tensor<?x2xf32>
+// CHECK: %[[DIM:.+]] = tensor.dim %[[DEST]], %[[C2]] : tensor<1x1x?x2xf32>
+// CHECK: %[[RES:.+]] = tensor.insert_slice %[[SLICE]] into %[[DEST]][0, 0, 0, 0] [1, 1, %[[DIM]], 2] [1, 1, 1, 1] : tensor<?x2xf32> into tensor<1x1x?x2xf32>
+// CHECK: return %[[RES]] : tensor<1x1x?x2xf32>
+
// -----
func.func @simple_NC_to_CNnc(%arg0: tensor<32x8xf32>, %arg1: tensor<1x1x32x8xf32>) -> tensor<1x1x32x8xf32>{
// output dims. Compute the padding width.
AffineExpr d0;
bindDims(b.getContext(), d0);
OpFoldResult sz = tensor::getMixedSize(b, loc, source, en.index());
We can hoist it out of the if-condition and use it in both branches. It creates the tensor::DimOp
when the shape is dynamic.
Thanks for the suggestion! This gave me an idea for even more re-use. Let me extend what you are proposing here :)
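For reference, a rough sketch of what that hoisting could look like (hypothetical; it reuses the names from the diff above, and the code that actually landed may differ):

```cpp
for (const auto &en : enumerate(type.getShape())) {
  // Hoisted out of the branch: the input size along this dim. getMixedSize
  // only materialises a tensor.dim op when the source dim is dynamic.
  OpFoldResult sourceSize = tensor::getMixedSize(b, loc, source, en.index());

  // The target size: the static output dim, or the caller-provided dynamic
  // size (`dynOutDim`).
  OpFoldResult targetSize = ShapedType::isDynamic(en.value())
                                ? OpFoldResult(dynOutDim.value())
                                : OpFoldResult(b.getIndexAttr(en.value()));

  // high = targetSize - sourceSize, folded where possible.
  AffineExpr d0, d1;
  bindDims(b.getContext(), d0, d1);
  high[en.index()] = affine::makeComposedFoldedAffineApply(
      b, loc, d0 - d1, {targetSize, sourceSize});
}
```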
SmallVector<int64_t> transpShape = readShape;
applyPermutationToVector<int64_t>(transpShape, perm);

Value empty = rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType);
// If there's a tile with a scalable size, retrieve its size. ATM only 1
// scalable tile is allowed.
Value scalableSize;
for (auto tile : packOp.getMixedTiles()) {
  if (tile.is<Value>()) {
    assert(!scalableSize && "Only one scalable size is supported ATM.");
    scalableSize = cast<Value>(tile);
    assert(vector::getConstantVscaleMultiplier(scalableSize) &&
           "This dynamic shape is not a multiple of vscale!");
  }
}

Value empty =
    ShapedType::isDynamicShape(cast<ShapedType>(input.getType()).getShape())
        ? rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType,
                                           scalableSize)
        : rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType);
Can we simplify it by using SmallVector<OpFoldResult>? I think the readSizes already has the shape (without transposition). So it can be something like:

SmallVector<OpFoldResult> transpShape = readSize;
applyPermutationToVector<OpFoldResult>(transpShape, perm);
Value empty = rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType);
You're probably right that this can be simplified, but it's not immediately obvious to me :) Note that for this example:

%0 = tensor.pack %arg0 inner_dims_pos = [3, 2] inner_tiles = [8, 32] into %arg1 : tensor<1x1x32x8xf32> -> tensor<1x1x1x1x8x32xf32>

readSizes and readShape are [1, 1, 32, 8] and [32, 8], respectively. I need to take another look 😅
oh I see, I misread the code. The unit dims are not inserted to readShape. Perhaps we can add SmallVector<OpFoldResult> transpShape to l.1190, and insert the OpFoldResult values to the vector and use it here.
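If it helps, here is a rough sketch of that idea (hypothetical; the final patch renames things and may collect the sizes differently):

```cpp
// Keep the (possibly dynamic) inner tile sizes as OpFoldResult: attributes
// for static sizes, Values for dynamic ones.
SmallVector<OpFoldResult> transpShape = packOp.getMixedTiles();
applyPermutationToVector<OpFoldResult>(transpShape, perm);

// The OpFoldResult-based tensor::EmptyOp builder only creates operands for
// the dynamic entries, so no separate static/dynamic cases are needed.
Value empty = rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType);
```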
Updated. There's quite a few vectors with shapes/offsets/etc. I've renamed two of them so that we have e.g. readShapeForExtractSlice rather than a bit abstract readShape. Let me know what you think :)
PadOp createPadHighOp(RankedTensorType type, Value source, Value pad,
                      bool nofold, Location loc, OpBuilder &builder);
                      bool nofold, Location loc, OpBuilder &builder,
                      std::optional<Value> dynOutDim = {});
We can extend it to multiple dynamic shapes later for sure, but why not we just support it in the patch?

std::optional<Value> dynOutDim = {});
std::optional<SmallVector<Value>> dynOutDim = {});
For purely pragmatic reasons, I'd like to stick to one dynamic shape for now :) Would extending this in a separate PR be OK?
Yes, it is okay. :)
Support for multiple dynamic dims is now included :)
Force-pushed from 3051ec9 to b3da00e.
Hey @hanhanW, would you be able to take another look? Here's an overview of changes since your last comment:

Thanks!
LGTM, just a few nits!
// If there's a tile with a dynamic size, retrieve its size. ATM only 1
// dynamic tile is allowed.
Value dynDimSize;
for (auto tile : packOp.getMixedTiles()) {
  if (tile.is<Value>()) {
    assert(!dynDimSize && "Only one scalable size is supported ATM.");
    dynDimSize = cast<Value>(tile);
  }
}
Nit: remove the dead code. We don't need this, right? There are no uses, and the function that should account for this is getPackOpSourceOrPaddedSource.
// Collect dims for the padded shape.
SmallVector<int64_t> paddedShape;
for (int64_t dimIdx = 0; dimIdx < inputRank; ++dimIdx) {
  int64_t inputDimSize = inputType.getDimSize(dimIdx);
nit: move it into the if-body because it is only used there.
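A minimal sketch of that nit against the loop above (hypothetical):

```cpp
for (int64_t dimIdx = 0; dimIdx < inputRank; ++dimIdx) {
  // 1. Non-tiled outer dims: the input size is only needed here, so query it
  // inside the branch. With all outer dims == 1 it is required to be 1.
  if (!tileAndPosMapping.count(dimIdx)) {
    int64_t inputDimSize = inputType.getDimSize(dimIdx);
    assert(inputDimSize == 1 &&
           "with all outer dims == 1, this non-tiled input dim should be 1!");
    paddedShape.push_back(inputDimSize);
    continue;
  }
  // 2. Tiled outer dims: handled as before (static tile size, or
  // ShapedType::kDynamic for the dynamic one).
}
```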
Value outDim = isDimDynamic
                   ? dynOutDims[outDimIdx++]
                   : b.create<arith::ConstantIndexOp>(loc, val).getResult();
I think it is easier if we use OpFoldResult here. You don't need to create the zero constant when it is not needed. The idea is to avoid creating IR if we don't need to. Can you give it a shot?

OpFoldResult outDim = isDimDynamic
                          ? dynOutDims[outDimIdx++]
                          : b.getIndexAttr(val);

or something like:

OpFoldResult outDim = b.getIndexAttr(val);
if (isDimDynamic)
  outDim = dynOutDims[outDimIdx++];
// CHECK: %[[C2:.+]] = arith.constant 2 : index
// CHECK: %[[C8:.+]] = arith.constant 8 : index
// CHECK: %[[VS:.+]] = vector.vscale
nit: use CHECK-DAG which is more robust in this case.
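For instance, the three checks above could be written as (sketch):

```mlir
// CHECK-DAG: %[[C2:.+]] = arith.constant 2 : index
// CHECK-DAG: %[[C8:.+]] = arith.constant 8 : index
// CHECK-DAG: %[[VS:.+]] = vector.vscale
```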
Thanks for addressing my comments, LGTM!
Extends the logic to generalise tensor.pack (into e.g. tensor.pad + tensor.transpose) so that it also works when one of the inner tile sizes is scalable (i.e. a multiple of `vector.vscale`). For example:

```mlir
%c8 = arith.constant 8 : index
%vscale = vector.vscale
%c8_vscale = arith.muli %vscale, %c8 : index
%0 = tensor.pack %input padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c8_vscale, 2] into %output : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
```

is generalised as:

```mlir
%c8 = arith.constant 8 : index
%vscale = vector.vscale
%c8_vscale = arith.muli %vscale, %c8 : index
%0 = affine.apply #map()[%c8_vscale, %c5]
%padded = tensor.pad %arg0 low[0, 0] high[%0, 1] {
^bb0(%arg3: index, %arg4: index):
  tensor.yield %arg2 : f32
} : tensor<5x1xf32> to tensor<?x2xf32>
```

At the Tensor level, we model scalability using dynamic shapes and this change basically extends the relevant logic so that it also works for dynamic shapes. However, rather than allowing arbitrary values and number of tile sizes to be dynamic, only _one_ tile size is allowed to be dynamic. In addition, it is required to be a constant multiple of `vector.vscale`. While the requirements above can be relaxed, I wanted to avoid full generality for now. Primarily to avoid complexity that's not yet needed and to make reviewing a bit easier.
Address PR comments from Han-Chung. Some clean-up and also relaxing the requirement that the dynamic dim has to be a constant multiple of vector.vscale.
* Use `OpFoldResult` for the output shape of tensor::EmptyOp (to simplify the invocation)
* Rename `readShape` as `readShapeForExtractSlice`
* Rename `transpShape` as `transShapeForEmpty`
Allow multiple dynamic dims and remove the dependency on llvm#109667
…nsor.pack Incorporating suggestions from @hanhanW
…lise tensor.pack Remove comment that's no longer valid
Force-pushed from 3e9521c to 2d362c8.
Make sure that the logic to generalize tensor.pack (into e.g. tensor.pad + tensor.transpose) does indeed allow multiple dynamic tile sizes. This was effectively already implemented in #109815 - in this PR I am merely removing one `if` condition and adding a test. I also took the liberty of renaming a few test functions - just to better highlight the differences between the old and the new tests. Follow-on for #109815.
Extends the logic to generalise tensor.pack (into e.g. tensor.pad + tensor.transpose) so that it also works when one of the inner tile sizes is scalable (i.e. a multiple of `vector.vscale`). For example:
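The snippets below are reproduced from the commit message above:

```mlir
%c8 = arith.constant 8 : index
%vscale = vector.vscale
%c8_vscale = arith.muli %vscale, %c8 : index
%0 = tensor.pack %input padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c8_vscale, 2] into %output : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
```

is generalised as:

```mlir
%c8 = arith.constant 8 : index
%vscale = vector.vscale
%c8_vscale = arith.muli %vscale, %c8 : index
%0 = affine.apply #map()[%c8_vscale, %c5]
%padded = tensor.pad %arg0 low[0, 0] high[%0, 1] {
^bb0(%arg3: index, %arg4: index):
  tensor.yield %arg2 : f32
} : tensor<5x1xf32> to tensor<?x2xf32>
```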
At the Tensor level, we model scalability using dynamic shapes and this
change basically extends the relevant logic so that it also works for
dynamic shapes.