[mlir][linalg] Fix and Refactor DecomposeOuterUnitDimsUnPackOpPattern #119379


Merged

2 commits merged into llvm:main on Dec 16, 2024

Conversation

banach-space (Contributor)


This PR fixes an issue identified in llvm#118786, where the following
example triggers a verification error:

```mlir
func.func @foo(
    %src: tensor<1x1x2x?xf32>,
    %dest: tensor<5x1xf32>,
    %tile_dim: index) -> tensor<5x1xf32> {

  %0 = tensor.unpack %src
    inner_dims_pos = [1, 0]
    inner_tiles = [2, %tile_dim]
    into %dest : tensor<1x1x2x?xf32> -> tensor<5x1xf32>

  return %0 : tensor<5x1xf32>
}
```

The error occurs because of faulty logic when computing dynamic sizes
for `tensor::EmptyOp`, which initializes tensors for `linalg::transpose`.
This specific example fails due to:

  * Dynamic inner tile size, combined with
  * Non-identity permutation.

For reference, here's the verification error:

```bash
error: 'tensor.empty' op incorrect number of dynamic sizes, has 0, expected 1
```

and here's the problematic `tensor.empty` (printed in generic form):
```mlir
%1 = "tensor.empty"() : () -> tensor<?x2xf32>
```
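
For comparison, a well-formed op needs one `index` operand per dynamic
dimension of its result type. Hand-written for illustration, the valid form
would look like:

```mlir
%1 = "tensor.empty"(%tile_dim) : (index) -> tensor<?x2xf32>
```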

**Fix**
This PR refactors how dynamic sizes for `tensor::EmptyOp` are computed. Instead
of generating a separate vector of dynamic sizes to complement the output
shape, this PR adopts a common MLIR idiom: passing a vector of `OpFoldResult`s
directly to the `EmptyOp` builder. Previously, only dynamic sizes corresponding
to outer dimensions were tracked (see `dynamicDims` in the old code), while
dynamic inner tile sizes were skipped, causing the mismatch above.
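
As a rough sketch of the idiom (simplified from the actual pattern;
`rewriter`, `loc`, `elemType` and the dynamic tile size `tileDimSize` are
assumed to be in scope):

```cpp
// Mixed static/dynamic sizes as OpFoldResult: an Attribute for static
// dimensions, a Value for dynamic ones.
SmallVector<OpFoldResult> shapeForEmptyOp;
shapeForEmptyOp.push_back(tileDimSize);               // dynamic size (Value)
shapeForEmptyOp.push_back(rewriter.getIndexAttr(2));  // static size (Attribute)

// The EmptyOp builder itself splits the OpFoldResults into the static result
// shape and the dynamic size operands, so the two can never get out of sync.
Value empty = rewriter.create<tensor::EmptyOp>(loc, shapeForEmptyOp, elemType);
```

Because the inverted permutation is applied to this single vector as well,
dynamic entries stay attached to the dimensions they describe.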

**Refactoring changes**
Variable names have been updated for better readability:

* `readShape` → `readShapeForExtractSlice`
* `readSizes` → `extractSliceSizes`
* `readStrides` → `stridesForExtractSlice`

Additional comments have been added for clarity.

**Remaining inconsistencies**
Patterns for `tensor::PackOp` and `tensor::UnpackOp` remain partially inconsistent:

`DecomposeOuterUnitDimsPackOpPattern` enforces that all outer dimensions must be
unit-sized, while `DecomposeOuterUnitDimsUnPackOpPattern` does not. The two
implementations have diverged in logic. I plan to address these inconsistencies
in a follow-up PR to further unify these patterns.
llvmbot (Member) commented Dec 10, 2024

@llvm/pr-subscribers-mlir-linalg

Author: Andrzej Warzyński (banach-space)



Full diff: https://github.com/llvm/llvm-project/pull/119379.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+45-21)
  • (modified) mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir (+20-10)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index eeaa70c0b65892..8f9211a26a4590 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -1252,64 +1252,88 @@ LogicalResult DecomposeOuterUnitDimsUnPackOpPattern::matchAndRewrite(
         "require the tiled outer dimensions of the result are all 1s");
   }
 
-  // 1. Use rank-reduced tensor.extract_slice op to extract the tile.
+  // 1. Use rank-reduced tensor.extract_slice op to extract the tile:
+  //    %extracted_tile = tensor.extract_slice(%unpack_op_input)
   Location loc = unpackOp.getLoc();
   Value source = unpackOp.getSource();
   DenseMap<int64_t, OpFoldResult> dimAndTileMapping =
       unpackOp.getDimAndTileMapping();
   Attribute zeroIdxAttr = rewriter.getIndexAttr(0);
   Attribute oneIdxAttr = rewriter.getIndexAttr(1);
-  SmallVector<OpFoldResult> readOffsets(srcRank, zeroIdxAttr);
-  SmallVector<OpFoldResult> readStrides(srcRank, oneIdxAttr);
-  SmallVector<OpFoldResult> readSizes;
-  SmallVector<int64_t> readShape;
-  SmallVector<Value> dynamicDims;
+
+  // The sizes, offsets and strides attributes for ExtractSliceOp.
+  SmallVector<OpFoldResult> extractSliceSizes;
+  SmallVector<OpFoldResult> extractSliceOffsets(srcRank, zeroIdxAttr);
+  SmallVector<OpFoldResult> extractSliceStrides(srcRank, oneIdxAttr);
+  // The shape for ExtractSliceOp (due to rank-reducing, this is likely !=
+  // extractSliceSizes).
+  SmallVector<int64_t> readShapeForExtractSlice;
+
+  // Shape for EmptyOp that's used as the init value for TransposeOp below.
+  // This should match tile size + transposition.
+  SmallVector<OpFoldResult> shapeForEmptyOp;
+
   for (auto i : llvm::seq<unsigned>(0, destRank)) {
+    // Given the assumption that all outer tiled dims are 1, the corresponding
+    // slice size to read is also 1. As this will be rank-reducing "extract
+    // slice" (i.e. the unit dims will be "collapsed"), there's no need to
+    // update:
+    //  * the output shape for ExtractSliceOp, nor
+    //  * the shape for EmptyOp.
     if (dimAndTileMapping.count(i)) {
-      readSizes.push_back(oneIdxAttr);
+      extractSliceSizes.push_back(oneIdxAttr);
       continue;
     }
 
+    // Compute sizes attribute for ExtractSliceOp + EmptyOp
     if (ShapedType::isDynamic(srcShape[i])) {
-      Value dynamicDim =
+      OpFoldResult dynamicDim =
           rewriter.create<tensor::DimOp>(loc, source, i).getResult();
-      readSizes.push_back(dynamicDim);
-      dynamicDims.push_back(dynamicDim);
+      extractSliceSizes.push_back(dynamicDim);
+      shapeForEmptyOp.push_back(dynamicDim);
     } else {
-      readSizes.push_back(rewriter.getIndexAttr(srcShape[i]));
+      extractSliceSizes.push_back(rewriter.getIndexAttr(srcShape[i]));
+      if (srcShape[i] != 1)
+        shapeForEmptyOp.push_back(rewriter.getIndexAttr(srcShape[i]));
+    }
+    // Compute the output shape for ExtractSliceOp (take into account
+    // rank-reducing)
+    if (srcShape[i] != 1) {
+      readShapeForExtractSlice.push_back(srcShape[i]);
     }
-    if (srcShape[i] != 1)
-      readShape.push_back(srcShape[i]);
   }
   auto mixedTiles = unpackOp.getMixedTiles();
-  readSizes.append(mixedTiles.begin(), mixedTiles.end());
+  // TODO: This effectively assumes that the tile sizes match the trailing
+  // sizes for ExtractSliceOp and EmptyOp - document this.
+  extractSliceSizes.append(mixedTiles.begin(), mixedTiles.end());
+  shapeForEmptyOp.append(mixedTiles.begin(), mixedTiles.end());
 
   // Explicitly create the type for extract_slice op because the inner tile
   // size could be 1. We want to represent the whole inner tile in this case.
   auto tileShape = srcShape.drop_front(destRank);
   // Append the inner tile shape to the permuted and rank-reduced outer shape.
-  readShape.append(tileShape.begin(), tileShape.end());
+  readShapeForExtractSlice.append(tileShape.begin(), tileShape.end());
   Type elemType = unpackOp.getSourceType().getElementType();
-  auto readType = RankedTensorType::get(readShape, elemType);
+  auto readType = RankedTensorType::get(readShapeForExtractSlice, elemType);
   Value innerTile = rewriter.create<tensor::ExtractSliceOp>(
-      loc, readType, unpackOp.getSource(), readOffsets, readSizes, readStrides);
+      loc, readType, unpackOp.getSource(), extractSliceOffsets,
+      extractSliceSizes, extractSliceStrides);
 
   // 2. Transpose the tile to match the outer corresponding tile order.
   SmallVector<int64_t> perm = getPackUnpackRankReducedPerm(
       srcShape.take_front(destRank), innerDimsPos, unpackOp.getOuterDimsPerm());
   // Unpack is a transition out of packed space so we invert the permutation.
   perm = invertPermutationVector(perm);
-  SmallVector<int64_t> transpShape(readShape);
-  applyPermutationToVector<int64_t>(transpShape, perm);
+  applyPermutationToVector<OpFoldResult>(shapeForEmptyOp, perm);
 
   Value empty =
-      rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType, dynamicDims);
+      rewriter.create<tensor::EmptyOp>(loc, shapeForEmptyOp, elemType);
   auto transposedOp =
       rewriter.create<linalg::TransposeOp>(loc, innerTile, empty, perm);
 
   // 3. Handle in-complete tiles if needed. It truncates trailing data from the
   // transposed tile.
-  int numLoops = transpShape.size();
+  int numLoops = shapeForEmptyOp.size();
   SmallVector<OpFoldResult> tileStrides(numLoops, oneIdxAttr);
   SmallVector<OpFoldResult> tileOffsets(numLoops, zeroIdxAttr);
   SmallVector<OpFoldResult> tileSizes;
diff --git a/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir b/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
index a720c655e4be51..bd60504f533456 100644
--- a/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
+++ b/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
@@ -35,15 +35,15 @@ func.func @simple_unpack_static_tiles(%input: tensor<1x1x8x2xf32>, %output: tens
 
 /// Same as example above, but with 1 dynamic tile size.
 
-func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tensor<5x1xf32>, %tile_dim_0: index) -> tensor<5x1xf32> {
-  %0 = tensor.unpack %input inner_dims_pos = [0, 1] inner_tiles = [%tile_dim_0, 2] into %output : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
+func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tensor<5x1xf32>, %tile_dim: index) -> tensor<5x1xf32> {
+  %0 = tensor.unpack %input inner_dims_pos = [0, 1] inner_tiles = [%tile_dim, 2] into %output : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
   return %0 : tensor<5x1xf32>
 }
 // CHECK-LABEL: func.func @simple_unpack_dynamic_tile
 // CHECK-SAME:    %[[SRC:[a-zA-Z0-9]+]]
 // CHECK-SAME:    %[[DEST:[a-zA-Z0-9]+]]
-// CHECK-SAME:    %[[TILE_DIM_1:[a-zA-Z0-9]+]]
-// CHECK:         %[[TILE:.+]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, %[[TILE_DIM_1]], 2] [1, 1, 1, 1]
+// CHECK-SAME:    %[[TILE_DIM:[a-zA-Z0-9]+]]
+// CHECK:         %[[TILE:.+]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, %[[TILE_DIM]], 2] [1, 1, 1, 1]
 // CHECK-NOT:     linalg.transpose
 //                They have the same type, so the insert_slice op is folded
 //                away.
@@ -52,13 +52,23 @@ func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tens
 
 /// Same as example above, but with 1 dynamic tile size and a transpose
 
-/// FIXME: This is currently broken:
-///   * 'tensor.empty' op incorrect number of dynamic sizes, has 0, expected 1
+func.func @simple_unpack_dynamic_tile_transpose(%src: tensor<1x1x2x?xf32>, %dest: tensor<5x1xf32>, %tile_dim: index) -> tensor<5x1xf32> {
+  %0 = tensor.unpack %src inner_dims_pos = [1, 0] inner_tiles = [2, %tile_dim] into %dest : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
+  return %0 : tensor<5x1xf32>
+}
+// CHECK-LABEL:   func.func @simple_unpack_dynamic_tile_transpose
+// CHECK-SAME:    %[[SRC:[a-zA-Z0-9]+]]
+// CHECK-SAME:    %[[DEST:[a-zA-Z0-9]+]]
+// CHECK-SAME:    %[[TILE_DIM:[a-zA-Z0-9]+]]
+// CHECK:           %[[TILE:.*]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, 2, %[[TILE_DIM]]] [1, 1, 1, 1] : tensor<1x1x2x?xf32> to tensor<2x?xf32>
+// CHECK:           %[[EMPTY:.*]] = tensor.empty(%[[TILE_DIM]]) : tensor<?x2xf32>
+// CHECK:           %[[TRANSP:.*]] = linalg.transpose
+// CHECK-SAME:        ins(%[[TILE]] : tensor<2x?xf32>)
+// CHECK-SAME:        outs(%[[EMPTY]] : tensor<?x2xf32>)
+// CHECK-SAME:        permutation = [1, 0]
+// CHECK:           %[[SLICE:.*]] = tensor.extract_slice %[[TRANSP]][0, 0] [5, 1] [1, 1] : tensor<?x2xf32> to tensor<5x1xf32>
+// CHECK:           return %[[SLICE]] : tensor<5x1xf32>
 
-//func.func @simple_unpack_dynamic_tile_transpose(%input: tensor<1x1x2x?xf32>, %output: tensor<5x1xf32>, %tile_dim_0: index) -> tensor<5x1xf32> {
-//  %0 = tensor.unpack %input inner_dims_pos = [1, 0] inner_tiles = [2, %tile_dim_0] into %output : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
-//  return %0 : tensor<5x1xf32>
-//}
 
 /// Same as example above, but with 1 scalable tile size.
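
For readability, the new test's CHECK lines above correspond to the following
decomposed IR for the previously failing example (reconstructed from the
CHECK lines; SSA names are illustrative):

```mlir
%tile = tensor.extract_slice %src[0, 0, 0, 0] [1, 1, 2, %tile_dim] [1, 1, 1, 1]
    : tensor<1x1x2x?xf32> to tensor<2x?xf32>
%empty = tensor.empty(%tile_dim) : tensor<?x2xf32>
%transposed = linalg.transpose ins(%tile : tensor<2x?xf32>)
    outs(%empty : tensor<?x2xf32>) permutation = [1, 0]
%res = tensor.extract_slice %transposed[0, 0] [5, 1] [1, 1]
    : tensor<?x2xf32> to tensor<5x1xf32>
```

Note that `tensor.empty` now receives `%tile_dim` as its dynamic size,
matching the single `?` in its result type.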
 

llvmbot (Member) commented Dec 10, 2024

@llvm/pr-subscribers-mlir


rengolin (Member) left a comment:

This looks good to me, but best to give it some time for others to look at.

Some minor inline comments, but nothing critical.

for (auto i : llvm::seq<unsigned>(0, destRank)) {
// Given the assumption that all outer tiled dims are 1, the corresponding
rengolin (Member) commented on the lines above:
It's more than an assumption here, since we have the check above.

banach-space (Contributor, Author)

Thanks for the review, @rengolin!

@hanhanW, @qedawkins, @Max191 - you reviewed my previous patches in this area. Please let me know if you'd like me to hold off on landing this. Otherwise, I plan to proceed.

Just a heads-up: I’ll be traveling for the next ~2 weeks. This is important to me, but I won’t be sending updates until I return.

hanhanW (Contributor) left a comment:

thanks

banach-space merged commit 58da789 into llvm:main on Dec 16, 2024
8 checks passed
banach-space (Contributor, Author)

Apologies, this landed without the summary.

banach-space added a commit to banach-space/llvm-project that referenced this pull request Jan 3, 2025
Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required
a few fixes in handling `tensor.unpack` (and similar fixes for
`tensor.pack` before that):

* llvm#119379, llvm#121393, llvm#121400.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.

Note, this PR complements llvm#115698, in which a similar test for
`tensor.pack` was added.
banach-space added a commit that referenced this pull request Jan 9, 2025
…#121557)

Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required a few
fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before
that):

* #119379, #121393, #121400.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.

Note, this PR complements #115698, in which a similar test for
`tensor.pack` was added.