[mlir][linalg] Fix and Refactor DecomposeOuterUnitDimsUnPackOpPattern #119379


Merged

2 commits merged into llvm:main on Dec 16, 2024

Conversation

banach-space (Contributor)


This PR fixes an issue identified in llvm#118786, where the following
example triggers a verification error:

```mlir
func.func @foo(
    %src: tensor<1x1x2x?xf32>,
    %dest: tensor<5x1xf32>,
    %tile_dim: index) -> tensor<5x1xf32> {

  %0 = tensor.unpack %src
    inner_dims_pos = [1, 0]
    inner_tiles = [2, %tile_dim]
    into %dest : tensor<1x1x2x?xf32> -> tensor<5x1xf32>

  return %0 : tensor<5x1xf32>
}
```

The error occurs because of faulty logic when computing dynamic sizes
for `tensor::EmptyOp`, which initializes tensors for `linalg::transpose`.
This specific example fails due to:

  * Dynamic inner tile size, combined with
  * Non-identity permutation.

For reference, here's the verification error:

```bash
error: 'tensor.empty' op incorrect number of dynamic sizes, has 0, expected 1
```

and here's the problematic `tensor.empty` (printed in generic form):
```mlir
%1 = "tensor.empty"() : () -> tensor<?x2xf32>
```
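
For comparison, a well-formed op needs one `index` operand per dynamic
dimension of its result type. Hand-written for illustration, the valid form
would look like:

```mlir
%1 = "tensor.empty"(%tile_dim) : (index) -> tensor<?x2xf32>
```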

**Fix**
This PR refactors how dynamic sizes for `tensor::EmptyOp` are computed. Instead
of generating a separate vector of dynamic sizes to complement the output
shape, this PR adopts a common MLIR idiom: passing a vector of `OpFoldResult`s
directly to the `EmptyOp` builder. Previously, only dynamic sizes corresponding
to outer dimensions were tracked (see `dynamicDims` in the old code), while
dynamic inner tile sizes were skipped, causing the mismatch above.
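
As a rough sketch of the idiom (simplified from the actual pattern;
`rewriter`, `loc`, `elemType` and the dynamic tile size `tileDimSize` are
assumed to be in scope):

```cpp
// Mixed static/dynamic sizes as OpFoldResult: an Attribute for static
// dimensions, a Value for dynamic ones.
SmallVector<OpFoldResult> shapeForEmptyOp;
shapeForEmptyOp.push_back(tileDimSize);               // dynamic size (Value)
shapeForEmptyOp.push_back(rewriter.getIndexAttr(2));  // static size (Attribute)

// The EmptyOp builder itself splits the OpFoldResults into the static result
// shape and the dynamic size operands, so the two can never get out of sync.
Value empty = rewriter.create<tensor::EmptyOp>(loc, shapeForEmptyOp, elemType);
```

Because the inverted permutation is applied to this single vector as well,
dynamic entries stay attached to the dimensions they describe.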

**Refactoring changes**
Variable names have been updated for better readability:

* `readShape` → `readShapeForExtractSlice`
* `readSizes` → `extractSliceSizes`
* `readStrides` → `stridesForExtractSlice`

Additional comments have been added for clarity.

**Remaining inconsistencies**
Patterns for `tensor::PackOp` and `tensor::UnpackOp` remain partially inconsistent:

`DecomposeOuterUnitDimsPackOpPattern` enforces that all outer dimensions must be
unit-sized, while `DecomposeOuterUnitDimsUnPackOpPattern` does not. The two
implementations have diverged in logic. I plan to address these inconsistencies
in a follow-up PR to further unify these patterns.
llvmbot (Member) commented Dec 10, 2024

@llvm/pr-subscribers-mlir-linalg

Author: Andrzej Warzyński (banach-space)



Full diff: https://github.com/llvm/llvm-project/pull/119379.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp (+45-21)
  • (modified) mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir (+20-10)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
index eeaa70c0b65892..8f9211a26a4590 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
@@ -1252,64 +1252,88 @@ LogicalResult DecomposeOuterUnitDimsUnPackOpPattern::matchAndRewrite(
         "require the tiled outer dimensions of the result are all 1s");
   }
 
-  // 1. Use rank-reduced tensor.extract_slice op to extract the tile.
+  // 1. Use rank-reduced tensor.extract_slice op to extract the tile:
+  //    %extracted_tile = tensor.extract_slice(%unpack_op_input)
   Location loc = unpackOp.getLoc();
   Value source = unpackOp.getSource();
   DenseMap<int64_t, OpFoldResult> dimAndTileMapping =
       unpackOp.getDimAndTileMapping();
   Attribute zeroIdxAttr = rewriter.getIndexAttr(0);
   Attribute oneIdxAttr = rewriter.getIndexAttr(1);
-  SmallVector<OpFoldResult> readOffsets(srcRank, zeroIdxAttr);
-  SmallVector<OpFoldResult> readStrides(srcRank, oneIdxAttr);
-  SmallVector<OpFoldResult> readSizes;
-  SmallVector<int64_t> readShape;
-  SmallVector<Value> dynamicDims;
+
+  // The sizes, offsets and strides attributes for ExtractSliceOp.
+  SmallVector<OpFoldResult> extractSliceSizes;
+  SmallVector<OpFoldResult> extractSliceOffsets(srcRank, zeroIdxAttr);
+  SmallVector<OpFoldResult> extractSliceStrides(srcRank, oneIdxAttr);
+  // The shape for ExtractSliceOp (due to rank-reducing, this is likely !=
+  // extractSliceSizes).
+  SmallVector<int64_t> readShapeForExtractSlice;
+
+  // Shape for EmptyOp that's used as the init value for TransposeOp below.
+  // This should match tile size + transposition.
+  SmallVector<OpFoldResult> shapeForEmptyOp;
+
   for (auto i : llvm::seq<unsigned>(0, destRank)) {
+    // Given the assumption that all outer tiled dims are 1, the corresponding
+    // slice size to read is also 1. As this will be rank-reducing "extract
+    // slice" (i.e. the unit dims will be "collapsed"), there's no need to
+    // update:
+    //  * the output shape for ExtractSliceOp, nor
+    //  * the shape for EmptyOp.
     if (dimAndTileMapping.count(i)) {
-      readSizes.push_back(oneIdxAttr);
+      extractSliceSizes.push_back(oneIdxAttr);
       continue;
     }
 
+    // Compute sizes attribute for ExtractSliceOp + EmptyOp
     if (ShapedType::isDynamic(srcShape[i])) {
-      Value dynamicDim =
+      OpFoldResult dynamicDim =
           rewriter.create<tensor::DimOp>(loc, source, i).getResult();
-      readSizes.push_back(dynamicDim);
-      dynamicDims.push_back(dynamicDim);
+      extractSliceSizes.push_back(dynamicDim);
+      shapeForEmptyOp.push_back(dynamicDim);
     } else {
-      readSizes.push_back(rewriter.getIndexAttr(srcShape[i]));
+      extractSliceSizes.push_back(rewriter.getIndexAttr(srcShape[i]));
+      if (srcShape[i] != 1)
+        shapeForEmptyOp.push_back(rewriter.getIndexAttr(srcShape[i]));
+    }
+    // Compute the output shape for ExtractSliceOp (take into account
+    // rank-reducing)
+    if (srcShape[i] != 1) {
+      readShapeForExtractSlice.push_back(srcShape[i]);
     }
-    if (srcShape[i] != 1)
-      readShape.push_back(srcShape[i]);
   }
   auto mixedTiles = unpackOp.getMixedTiles();
-  readSizes.append(mixedTiles.begin(), mixedTiles.end());
+  // TODO: This effectively assumes that the tile sizes match the trailing
+  // sizes for ExtractSliceOp and EmptyOp - document this.
+  extractSliceSizes.append(mixedTiles.begin(), mixedTiles.end());
+  shapeForEmptyOp.append(mixedTiles.begin(), mixedTiles.end());
 
   // Explicitly create the type for extract_slice op because the inner tile
   // size could be 1. We want to represent the whole inner tile in this case.
   auto tileShape = srcShape.drop_front(destRank);
   // Append the inner tile shape to the permuted and rank-reduced outer shape.
-  readShape.append(tileShape.begin(), tileShape.end());
+  readShapeForExtractSlice.append(tileShape.begin(), tileShape.end());
   Type elemType = unpackOp.getSourceType().getElementType();
-  auto readType = RankedTensorType::get(readShape, elemType);
+  auto readType = RankedTensorType::get(readShapeForExtractSlice, elemType);
   Value innerTile = rewriter.create<tensor::ExtractSliceOp>(
-      loc, readType, unpackOp.getSource(), readOffsets, readSizes, readStrides);
+      loc, readType, unpackOp.getSource(), extractSliceOffsets,
+      extractSliceSizes, extractSliceStrides);
 
   // 2. Transpose the tile to match the outer corresponding tile order.
   SmallVector<int64_t> perm = getPackUnpackRankReducedPerm(
       srcShape.take_front(destRank), innerDimsPos, unpackOp.getOuterDimsPerm());
   // Unpack is a transition out of packed space so we invert the permutation.
   perm = invertPermutationVector(perm);
-  SmallVector<int64_t> transpShape(readShape);
-  applyPermutationToVector<int64_t>(transpShape, perm);
+  applyPermutationToVector<OpFoldResult>(shapeForEmptyOp, perm);
 
   Value empty =
-      rewriter.create<tensor::EmptyOp>(loc, transpShape, elemType, dynamicDims);
+      rewriter.create<tensor::EmptyOp>(loc, shapeForEmptyOp, elemType);
   auto transposedOp =
       rewriter.create<linalg::TransposeOp>(loc, innerTile, empty, perm);
 
   // 3. Handle in-complete tiles if needed. It truncates trailing data from the
   // transposed tile.
-  int numLoops = transpShape.size();
+  int numLoops = shapeForEmptyOp.size();
   SmallVector<OpFoldResult> tileStrides(numLoops, oneIdxAttr);
   SmallVector<OpFoldResult> tileOffsets(numLoops, zeroIdxAttr);
   SmallVector<OpFoldResult> tileSizes;
diff --git a/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir b/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
index a720c655e4be51..bd60504f533456 100644
--- a/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
+++ b/mlir/test/Dialect/Linalg/decompose-tensor-unpack.mlir
@@ -35,15 +35,15 @@ func.func @simple_unpack_static_tiles(%input: tensor<1x1x8x2xf32>, %output: tens
 
 /// Same as example above, but with 1 dynamic tile size.
 
-func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tensor<5x1xf32>, %tile_dim_0: index) -> tensor<5x1xf32> {
-  %0 = tensor.unpack %input inner_dims_pos = [0, 1] inner_tiles = [%tile_dim_0, 2] into %output : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
+func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tensor<5x1xf32>, %tile_dim: index) -> tensor<5x1xf32> {
+  %0 = tensor.unpack %input inner_dims_pos = [0, 1] inner_tiles = [%tile_dim, 2] into %output : tensor<1x1x?x2xf32> -> tensor<5x1xf32>
   return %0 : tensor<5x1xf32>
 }
 // CHECK-LABEL: func.func @simple_unpack_dynamic_tile
 // CHECK-SAME:    %[[SRC:[a-zA-Z0-9]+]]
 // CHECK-SAME:    %[[DEST:[a-zA-Z0-9]+]]
-// CHECK-SAME:    %[[TILE_DIM_1:[a-zA-Z0-9]+]]
-// CHECK:         %[[TILE:.+]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, %[[TILE_DIM_1]], 2] [1, 1, 1, 1]
+// CHECK-SAME:    %[[TILE_DIM:[a-zA-Z0-9]+]]
+// CHECK:         %[[TILE:.+]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, %[[TILE_DIM]], 2] [1, 1, 1, 1]
 // CHECK-NOT:     linalg.transpose
 //                They have the same type, so the insert_slice op is folded
 //                away.
@@ -52,13 +52,23 @@ func.func @simple_unpack_dynamic_tile(%input: tensor<1x1x?x2xf32>, %output: tens
 
 /// Same as example above, but with 1 dynamic tile size and a transpose
 
-/// FIXME: This is currently broken:
-///   * 'tensor.empty' op incorrect number of dynamic sizes, has 0, expected 1
+func.func @simple_unpack_dynamic_tile_transpose(%src: tensor<1x1x2x?xf32>, %dest: tensor<5x1xf32>, %tile_dim: index) -> tensor<5x1xf32> {
+  %0 = tensor.unpack %src inner_dims_pos = [1, 0] inner_tiles = [2, %tile_dim] into %dest : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
+  return %0 : tensor<5x1xf32>
+}
+// CHECK-LABEL:   func.func @simple_unpack_dynamic_tile_transpose
+// CHECK-SAME:    %[[SRC:[a-zA-Z0-9]+]]
+// CHECK-SAME:    %[[DEST:[a-zA-Z0-9]+]]
+// CHECK-SAME:    %[[TILE_DIM:[a-zA-Z0-9]+]]
+// CHECK:           %[[TILE:.*]] = tensor.extract_slice %[[SRC]][0, 0, 0, 0] [1, 1, 2, %[[TILE_DIM]]] [1, 1, 1, 1] : tensor<1x1x2x?xf32> to tensor<2x?xf32>
+// CHECK:           %[[EMPTY:.*]] = tensor.empty(%[[TILE_DIM]]) : tensor<?x2xf32>
+// CHECK:           %[[TRANSP:.*]] = linalg.transpose
+// CHECK-SAME:        ins(%[[TILE]] : tensor<2x?xf32>)
+// CHECK-SAME:        outs(%[[EMPTY]] : tensor<?x2xf32>)
+// CHECK-SAME:        permutation = [1, 0]
+// CHECK:           %[[SLICE:.*]] = tensor.extract_slice %[[TRANSP]][0, 0] [5, 1] [1, 1] : tensor<?x2xf32> to tensor<5x1xf32>
+// CHECK:           return %[[SLICE]] : tensor<5x1xf32>
 
-//func.func @simple_unpack_dynamic_tile_transpose(%input: tensor<1x1x2x?xf32>, %output: tensor<5x1xf32>, %tile_dim_0: index) -> tensor<5x1xf32> {
-//  %0 = tensor.unpack %input inner_dims_pos = [1, 0] inner_tiles = [2, %tile_dim_0] into %output : tensor<1x1x2x?xf32> -> tensor<5x1xf32>
-//  return %0 : tensor<5x1xf32>
-//}
 
 /// Same as example above, but with 1 scalable tile size.
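
For readability, the new test's CHECK lines above correspond to the following
decomposed IR for the previously failing example (reconstructed from the
CHECK lines; SSA names are illustrative):

```mlir
%tile = tensor.extract_slice %src[0, 0, 0, 0] [1, 1, 2, %tile_dim] [1, 1, 1, 1]
    : tensor<1x1x2x?xf32> to tensor<2x?xf32>
%empty = tensor.empty(%tile_dim) : tensor<?x2xf32>
%transposed = linalg.transpose ins(%tile : tensor<2x?xf32>)
    outs(%empty : tensor<?x2xf32>) permutation = [1, 0]
%res = tensor.extract_slice %transposed[0, 0] [5, 1] [1, 1]
    : tensor<?x2xf32> to tensor<5x1xf32>
```

Note that `tensor.empty` now receives `%tile_dim` as its dynamic size,
matching the single `?` in its result type.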
 

llvmbot (Member) commented Dec 10, 2024

@llvm/pr-subscribers-mlir


rengolin (Member) left a comment:

This looks good to me, but best to give it some time for others to look at.

Some minor inline comments, but nothing critical.

for (auto i : llvm::seq<unsigned>(0, destRank)) {
// Given the assumption that all outer tiled dims are 1, the corresponding
rengolin (Member) commented on the lines above:
It's more than an assumption here, since we have the check above.

banach-space (Contributor, Author)

Thanks for the review, @rengolin!

@hanhanW, @qedawkins, @Max191 - you reviewed my previous patches in this area. Please let me know if you'd like me to hold off on landing this. Otherwise, I plan to proceed.

Just a heads-up: I’ll be traveling for the next ~2 weeks. This is important to me, but I won’t be sending updates until I return.

hanhanW (Contributor) left a comment:

thanks

banach-space merged commit 58da789 into llvm:main on Dec 16, 2024
8 checks passed
banach-space (Contributor, Author)

Apologies, this landed without the summary.

banach-space added a commit to banach-space/llvm-project that referenced this pull request Jan 3, 2025
Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required
a few fixes in handling `tensor.unpack` (and similar fixes for
`tensor.pack` before that):

* llvm#119379, llvm#121393, llvm#121400.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.

Note, this PR complements llvm#115698, in which a similar test for
`tensor.pack` was added.
banach-space added a commit that referenced this pull request Jan 9, 2025
…#121557)

Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required a few
fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before
that):

* #119379, #121393, #121400.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.

Note, this PR complements #115698, in which a similar test for
`tensor.pack` was added.