[mlir][linalg] Fix and Refactor DecomposeOuterUnitDimsUnPackOpPattern #119379
Conversation
This PR fixes an issue identified in llvm#118786, where the following example triggers a verification error:

```mlir
func.func @foo(
    %src: tensor<1x1x2x?xf32>,
    %dest: tensor<5x1xf32>,
    %tile_dim: index) -> tensor<5x1xf32> {
  %0 = tensor.unpack %src
    inner_dims_pos = [1, 0]
    inner_tiles = [2, %tile_dim]
    into %dest : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
  return %0 : tensor<5x1xf32>
}
```

The error occurs because of faulty logic when computing dynamic sizes for `tensor::EmptyOp`, which initializes tensors for `linalg::transpose`. This specific example fails due to:

* a dynamic inner tile size, combined with
* a non-identity permutation.

For reference, here's the verification error:

```bash
error: 'tensor.empty' op incorrect number of dynamic sizes, has 0, expected 1
```

and here's the problematic `tensor.empty` (printed in generic form):

```mlir
%1 = "tensor.empty"() : () -> tensor<?x2xf32>
```

**Fix**

This PR refactors how dynamic sizes for `tensor::EmptyOp` are computed. Instead of generating a separate vector of dynamic sizes to complement the output shape, this PR adopts a common MLIR idiom: passing a vector of `OpFoldResult`s directly to the `EmptyOp` builder (see the sketch after this description). Previously, only dynamic sizes corresponding to outer dimensions were tracked (see `dynamicSizes`), while inner dimensions were skipped, causing issues in certain cases.

**Refactoring changes**

Variable names have been updated for better readability:

* `readShape` → `readShapeForExtractSlice`
* `readSizes` → `extractSliceSizes`
* `readStrides` → `extractSliceStrides`

Additional comments have been added for clarity.

**Remaining inconsistencies**

Patterns for `tensor::PackOp` and `tensor::UnpackOp` remain partially inconsistent: `DecomposeOuterUnitDimsPackOpPattern` enforces that all outer dimensions must be unit-sized, while `DecomposeOuterUnitDimsUnPackOpPattern` does not, and the two implementations have diverged in logic. I plan to address these inconsistencies in a follow-up PR to further unify these patterns.
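A minimal sketch of that idiom, for context (illustrative only: `mixedSizes` and `dimIdx` are made-up names, and the fragment assumes the pattern's existing `rewriter`, `loc`, `source`, and `elemType`):

```cpp
// Old style: a static shape plus a separate vector of dynamic-size Values.
// Every dynamic entry in the shape needs a matching Value in `dynamicSizes`;
// missing one triggers the verifier error quoted above.
//   rewriter.create<tensor::EmptyOp>(loc, staticShape, elemType, dynamicSizes);

// New style: collect every size as an OpFoldResult (an Attribute for static
// sizes, a Value for dynamic ones) and let the EmptyOp builder split them.
SmallVector<OpFoldResult> mixedSizes;
mixedSizes.push_back(rewriter.getIndexAttr(2));          // static size
mixedSizes.push_back(                                    // dynamic size
    rewriter.create<tensor::DimOp>(loc, source, dimIdx).getResult());
Value empty = rewriter.create<tensor::EmptyOp>(loc, mixedSizes, elemType);
```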
@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir

Author: Andrzej Warzyński (banach-space)

Changes: (same as the PR description above)

Full diff: https://github.com/llvm/llvm-project/pull/119379.diff

2 Files Affected:
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index eeaa70c0b65892..8f9211a26a4590 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -1252,64 +1252,88 @@ LogicalResult DecomposeOuterUnitDimsUnPackOpPattern::matchAndRewrite(
"require the tiled outer dimensions of the result are all 1s");
}
- // 1. Use rank-reduced tensor.extract_slice op to extract the tile.
+ // 1. Use rank-reduced tensor.extract_slice op to extract the tile:
+ // %extracted_tile = tensor.extract_slice(%unpack_op_input)
Location loc = unpackOp.getLoc();
Value source = unpackOp.getSource();
DenseMap<int64_t, OpFoldResult> dimAndTileMapping =
unpackOp.getDimAndTileMapping();
Attribute zeroIdxAttr = rewriter.getIndexAttr(0);
Attribute oneIdxAttr = rewriter.getIndexAttr(1);
- SmallVector<OpFoldResult> readOffsets(srcRank, zeroIdxAttr);
- SmallVector<OpFoldResult> readStrides(srcRank, oneIdxAttr);
- SmallVector<OpFoldResult> readSizes;
- SmallVector<int64_t> readShape;
- SmallVector<Value> dynamicDims;
+
+ // The sizes, offsets and strides attributes for ExtractSliceOp.
+ SmallVector<OpFoldResult> extractSliceSizes;
+ SmallVector<OpFoldResult> extractSliceOffsets(srcRank, zeroIdxAttr);
+ SmallVector<OpFoldResult> extractSliceStrides(srcRank, oneIdxAttr);
+ // The shape for ExtractSliceOp (due to rank-reducing, this is likely !=
+ // extractSliceSizes).
+ SmallVector<int64_t> readShapeForExtractSlice;
+
+ // Shape for EmptyOp that's used as the init value for TransposeOp below.
+ // This should match tile size + transposition.
+ SmallVector<OpFoldResult> shapeForEmptyOp;
+
for (auto i : llvm::seq<unsigned>(0, destRank)) {
+ // Given the assumption that all outer tiled dims are 1, the corresponding
+ // slice size to read is also 1. As this will be rank-reducing "extract
+ // slice" (i.e. the unit dims will be "collapsed"), there's no need to
+ // update:
+ // * the output shape for ExtractSliceOp, nor
+ // * the shape for EmptyOp.
if (dimAndTileMapping.count(i)) {
- readSizes.push_back(oneIdxAttr);
+ extractSliceSizes.push_back(oneIdxAttr);
continue;
}
+ // Compute sizes attribute for ExtractSliceOp + EmptyOp
if (ShapedType::isDynamic(srcShape[i])) {
- Value dynamicDim =
+ OpFoldResult dynamicDim =
rewriter.create<tensor::DimOp>(loc, source, i).getResult();
- readSizes.push_back(dynamicDim);
- dynamicDims.push_back(dynamicDim);
+ extractSliceSizes.push_back(dynamicDim);
+ shapeForEmptyOp.push_back(dynamicDim);
} else {
- readSizes.push_back(rewriter.getIndexAttr(srcShape[i]));
+ extractSliceSizes.push_back(rewriter.getIndexAttr(srcShape[i]));
+ if (srcShape[i] != 1)
+ shapeForEmptyOp.push_back(rewriter.getIndexAttr(srcShape[i]));
+ }
+ // Compute the output shape for ExtractSliceOp (take into account
+ // rank-reducing)
+ if (srcShape[i] != 1) {
+ readShapeForExtractSlice.push_back(srcShape[i]);
}
- if (srcShape[i] != 1)
- readShape.push_back(srcShape[i]);
}
auto mixedTiles = unpackOp.getMixedTiles();
- readSizes.append(mixedTiles.begin(), mixedTiles.end());
+ // TODO: This effectively assumes that the tile sizes match the trailing
+ // sizes for ExtractSliceOp and EmptyOp - document this.
+ extractSliceSizes.append(mixedTiles.begin(), mixedTiles.end());
+ shapeForEmptyOp.append(mixedTiles.begin(), mixedTiles.end());
// Explicitly create the type for extract_slice op because the inner tile
// size could be 1. We want to represent the whole inner tile in this case.
auto tileShape = srcShape.drop_front(destRank);
// Append the inner tile shape to the permuted and rank-reduced outer shape.
- readShape.append(tileShape.begin(), tileShape.end());
+ readShapeForExtractSlice.append(tileShape.begin(), tileShape.end());
Type elemType = unpackOp.getSourceType().getElementType();
- auto readType = RankedTensorType::get(readShape, elemType);
+ auto readType = RankedTensorType::get(readShapeForExtractSlice, elemType);
Value innerTile = rewriter.create<tensor::ExtractSliceOp>(
- loc, readType, unpackOp.getSource(), readOffsets, readSizes, readStrides);
+ loc, readType, unpackOp.getSource(), extractSliceOffsets,
+ extractSliceSizes, extractSliceStrides);
// 2. Transpose the tile to match the outer corresponding tile order.
SmallVector<int64_t> perm = getPackUnpackRankReducedPerm(
srcShape.take_front(destRank), innerDimsPos, unpackOp.getOuterDimsPerm());
// Unpack is a transition out of packed space so we invert the permutation.
perm = invertPermutationVector(perm);
- SmallVector<int64_t> transpShape(readShape);
- applyPermutationToVector<int64_t>(transpShape, perm);
+ applyPermutationToVector<OpFoldResult>(shapeForEmptyOp, perm);
Value empty =
- rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType, dynamicDims);
+ rewriter.create<tensor::EmptyOp>(loc, shapeForEmptyOp, elemType);
auto transposedOp =
rewriter.create<linalg::TransposeOp>(loc, innerTile, empty, perm);
// 3. Handle incomplete tiles if needed. It truncates trailing data from the
// transposed tile.
- int numLoops = transpShape.size();
+ int numLoops = shapeForEmptyOp.size();
SmallVector<OpFoldResult> tileStrides(numLoops, oneIdxAttr);
SmallVector<OpFoldResult> tileOffsets(numLoops, zeroIdxAttr);
SmallVector<OpFoldResult> tileSizes;
diff --git a/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir b/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
index a720c655e4be51..bd60504f533456 100644
--- a/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
+++ b/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
@@ -35,15 +35,15 @@ func.func @simple_unpack_static_tiles(%input: tensor<1x1x8x2xf32>, %output: tens
/// Same as example above, but with 1 dynamic tile size.
-func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tensor<5x1xf32>, %tile_dim_0: index) -> tensor<5x1xf32> {
- %0 = tensor.unpack %input inner_dims_pos = [0, 1] inner_tiles = [%tile_dim_0, 2] into %output : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
+func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tensor<5x1xf32>, %tile_dim: index) -> tensor<5x1xf32> {
+ %0 = tensor.unpack %input inner_dims_pos = [0, 1] inner_tiles = [%tile_dim, 2] into %output : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
return %0 : tensor<5x1xf32>
}
// CHECK-LABEL: func.func @simple_unpack_dynamic_tile
// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
-// CHECK-SAME: %[[TILE_DIM_1:[a-zA-Z0-9]+]]
-// CHECK: %[[TILE:.+]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, %[[TILE_DIM_1]], 2] [1, 1, 1, 1]
+// CHECK-SAME: %[[TILE_DIM:[a-zA-Z0-9]+]]
+// CHECK: %[[TILE:.+]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, %[[TILE_DIM]], 2] [1, 1, 1, 1]
// CHECK-NOT: linalg.transpose
// They have the same type, so the insert_slice op is folded
// away.
@@ -52,13 +52,23 @@ func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tens
/// Same as example above, but with 1 dynamic tile size and a transpose
-/// FIXME: This is currently broken:
-/// * 'tensor.empty' op incorrect number of dynamic sizes, has 0, expected 1
+func.func @simple_unpack_dynamic_tile_transpose(%src: tensor<1x1x2x?xf32>, %dest: tensor<5x1xf32>, %tile_dim: index) -> tensor<5x1xf32> {
+ %0 = tensor.unpack %src inner_dims_pos = [1, 0] inner_tiles = [2, %tile_dim] into %dest : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
+ return %0 : tensor<5x1xf32>
+}
+// CHECK-LABEL: func.func @simple_unpack_dynamic_tile_transpose
+// CHECK-SAME: %[[SRC:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
+// CHECK-SAME: %[[TILE_DIM:[a-zA-Z0-9]+]]
+// CHECK: %[[TILE:.*]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, 2, %[[TILE_DIM]]] [1, 1, 1, 1] : tensor<1x1x2x?xf32> to tensor<2x?xf32>
+// CHECK: %[[EMPTY:.*]] = tensor.empty(%[[TILE_DIM]]) : tensor<?x2xf32>
+// CHECK: %[[TRANSP:.*]] = linalg.transpose
+// CHECK-SAME: ins(%[[TILE]] : tensor<2x?xf32>)
+// CHECK-SAME: outs(%[[EMPTY]] : tensor<?x2xf32>)
+// CHECK-SAME: permutation = [1, 0]
+// CHECK: %[[SLICE:.*]] = tensor.extract_slice %[[TRANSP]][0, 0] [5, 1] [1, 1] : tensor<?x2xf32> to tensor<5x1xf32>
+// CHECK: return %[[SLICE]] : tensor<5x1xf32>
-//func.func @simple_unpack_dynamic_tile_transpose(%input: tensor<1x1x2x?xf32>, %output: tensor<5x1xf32>, %tile_dim_0: index) -> tensor<5x1xf32> {
-// %0 = tensor.unpack %input inner_dims_pos = [1, 0] inner_tiles = [2, %tile_dim_0] into %output : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
-// return %0 : tensor<5x1xf32>
-//}
/// Same as example above, but with 1 scalable tile size.
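With this fix, the `tensor.empty` for the motivating example carries its dynamic size as an operand, matching the CHECK lines above (the generic form is shown as a comment, for contrast with the broken op quoted in the description):

```mlir
%1 = tensor.empty(%tile_dim) : tensor<?x2xf32>
// Generic form: %1 = "tensor.empty"(%tile_dim) : (index) -> tensor<?x2xf32>
```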
This looks good to me, but best to give it some time for others to look at.
Some minor inline comments, but nothing critical.
for (auto i : llvm::seq<unsigned>(0, destRank)) {
  // Given the assumption that all outer tiled dims are 1, the corresponding
It's more than an assumption here, since we have the check above.
…Pattern Update comments
Thanks for the review, @rengolin! @hanhanW, @qedawkins, @Max191 - you reviewed my previous patches in this area. Please let me know if you'd like me to hold off on landing this. Otherwise, I plan to proceed. Just a heads-up: I’ll be traveling for the next ~2 weeks. This is important to me, but I won’t be sending updates until I return.
thanks
Apologies, this landed without the summary.
…#121557) Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes. While relatively simple (e.g., no vectorization), this example required a few fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before that): * #119379, #121393, #121400. The end goal for this test is to incrementally increase its complexity and to work towards scalable tile sizes. Note, this PR complements #115698 in which similar test for `tensor.pack` was added.
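For reference, a minimal sketch of the kind of IR such an end-to-end test exercises (shapes borrowed from the test file above; the actual test added in #121557 may differ):

```mlir
func.func @unpack_dynamic_tile(%src: tensor<1x1x?x2xf32>,
                               %dest: tensor<5x1xf32>,
                               %tile_dim: index) -> tensor<5x1xf32> {
  %0 = tensor.unpack %src inner_dims_pos = [0, 1] inner_tiles = [%tile_dim, 2]
      into %dest : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
  return %0 : tensor<5x1xf32>
}
```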