[mlir][tensor] Loosen restrictions on folding dynamic reshapes #137963


Merged

merged 24 commits into llvm:main on Jun 3, 2025

Conversation

AGindinson
Contributor

@AGindinson AGindinson commented Apr 30, 2025

The main idea behind the change is to allow expand-of-collapse folds for reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the expand op must have a coherent index/affine expression specified in its `output_shape` argument (see the example below), and if it doesn't, the IR has already been invalidated at an earlier stage:

```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
	         : tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
		: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```

On the above assumption, adjust the routine in `getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of issues:

  1. n>2 consecutive dynamic dimensions in the source shape cannot be collapsed together into 1<k<n neighboring dynamic dimensions in the target shape, since there'd be more than one suitable reassociation (example: `?x?x10x?` into `?x?`)
  2. When figuring out static subshape reassociations based on products, there are cases where a static dimension is collapsed with a dynamic one, and should therefore be skipped when comparing products of source & target dimensions (e.g. `?x2x3x4` into `?x12`)

To address 1, we should detect such sequences in the target shape before assigning multiple dynamic dimensions into the same index set. For 2, we take note that a static target dimension was preceded by a dynamic one and allow an "offset" subshape of source static dimensions, as long as there's an exact sequence for the target size later in the source shape.
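
For illustration, here is a minimal unit-test-style sketch of the intended behavior. It reuses the utility's real signature from `ReshapeOpsUtils.cpp`, but the concrete expectations are my reading of rules 1 and 2 above, not tests taken from this PR:

```
#include "mlir/Dialect/Utils/ReshapeOpsUtils.h"
#include "gtest/gtest.h"
#include <optional>

using namespace mlir;

TEST(ReassociationForCollapseSketch, DynamicSubshapes) {
  // Rule 1: ?x?x10x? into ?x? is ambiguous -- {0}{1,2,3}, {0,1}{2,3}
  // and {0,1,2}{3} would all be suitable reassociations, so none may
  // be returned.
  EXPECT_EQ(getReassociationIndicesForCollapse(
                {ShapedType::kDynamic, ShapedType::kDynamic, 10,
                 ShapedType::kDynamic},
                {ShapedType::kDynamic, ShapedType::kDynamic}),
            std::nullopt);
  // Rule 2: ?x2x3x4 into ?x12 -- the static 2 is absorbed into the
  // dynamic target dimension (an "offset" subshape), after which 3*4
  // matches the target size 12 exactly.
  SmallVector<ReassociationIndices> expected = {{0, 1}, {2, 3}};
  EXPECT_EQ(getReassociationIndicesForCollapse(
                {ShapedType::kDynamic, 2, 3, 4}, {ShapedType::kDynamic, 12}),
            std::optional(expected));
}
```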

This PR aims to address all reshapes that can be determined based purely on shapes (and the original reassociation maps, as done in `ComposeExpandOfCollapseOp::findCollapsingReassociation`). It doesn't seem possible to fold all qualifying dynamic shape patterns in a deterministic way without also looking into the affine expressions; that would be difficult to maintain in a single general utility, so a path forward would be to provide dialect-specific implementations for Linalg/Tensor.

Signed-off-by: Artem Gindinson [email protected]

The main idea behind the change is to allow expand-of-collapse folds
for reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
	         : tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
		: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```

On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes
beyond just `?x..?x1x1x..x1` -> `?`.

Moreover, the reassociation util was refactored to clearly distinguish
between dynamic and static subshapes. A few known caveats were noted as
a comment; it doesn't seem possible to fold all qualifying dynamic shape
patterns in a deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility. Other implementation ideas/larger refactoring could include:
- abandoning the util usage in the `ComposeExpandOfCollapseOp` pattern,
  employing similar logic to `ComposeCollapseOfExpandOp`;
- providing dialect-specific implementations for Linalg/Tensor.

Signed-off-by: Artem Gindinson <[email protected]>

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Apr 30, 2025

@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir-tensor

@llvm/pr-subscribers-mlir

Author: Artem Gindinson (AGindinson)

Full diff: https://github.com/llvm/llvm-project/pull/137963.diff

3 Files Affected:

  • (modified) mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp (+57-46)
  • (modified) mlir/test/Dialect/Linalg/simplify-pack-unpack.mlir (+2-2)
  • (modified) mlir/test/Dialect/Tensor/canonicalize.mlir (+20-4)
```
diff --git a/mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp b/mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp
index ed40a080441bc..694783849198a 100644
--- a/mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp
+++ b/mlir/lib/Dialect/Utils/ReshapeOpsUtils.cpp
@@ -31,59 +31,70 @@ mlir::getReassociationIndicesForReshape(ShapedType sourceType,
 std::optional<SmallVector<ReassociationIndices>>
 mlir::getReassociationIndicesForCollapse(ArrayRef<int64_t> sourceShape,
                                          ArrayRef<int64_t> targetShape) {
-  if (sourceShape.size() <= targetShape.size())
+  unsigned numSourceDims = sourceShape.size(),
+           numTargetDims = targetShape.size();
+  if (numSourceDims <= numTargetDims)
     return std::nullopt;
-  unsigned sourceDim = 0;
-  SmallVector<ReassociationIndices> reassociationMap;
-  reassociationMap.reserve(targetShape.size());
-
-  ReassociationIndices currIndices;
-  int64_t prodOfCollapsedDims = 1;
-  while (sourceDim < sourceShape.size()) {
-    unsigned targetDim = reassociationMap.size();
-    // If we have mapped all the target dimensions stop and handle the remaining
-    // tail of size-1 dimensions explicitly.
-    if (targetDim == targetShape.size())
-      break;
+  SmallVector<ReassociationIndices, 4> reassociationMap;
+  reassociationMap.reserve(numTargetDims);
 
+  unsigned sourceDim = 0, targetDim = 0;
+  for (; targetDim < numTargetDims; ++targetDim) {
     int64_t currTargetShape = targetShape[targetDim];
-    while (sourceDim < (sourceShape.size() - 1) &&
-           sourceShape[sourceDim] != ShapedType::kDynamic &&
-           prodOfCollapsedDims * sourceShape[sourceDim] < currTargetShape) {
+    ReassociationIndices currIndices;
+    // 1. Target dimension is dynamic. Source shape should contain at least
+    // one dynamic dimension.
+    if (currTargetShape == ShapedType::kDynamic) {
+      // FIXME: We stop the search with the first dynamic dimension, while in
+      // fact, we can have a valid pattern like 2x?x?x4x8 -> ?x4x8. It becomes
+      // indeterministic altogether when we have neighboring dynamic dimensions
+      // in the target shape. Most of these patterns will be safely rejected,
+      // however we might achieve more correct folds by taking affine
+      // expressions into account, if these can be passed on by the call sites.
+      bool foundDynamic = false;
+      while (sourceDim < numSourceDims) {
+        currIndices.push_back(sourceDim);
+        if (sourceShape[sourceDim++] == ShapedType::kDynamic) {
+          foundDynamic = true;
+          break;
+        }
+      }
+      if (!foundDynamic)
+        return std::nullopt;
+
+      reassociationMap.push_back(currIndices);
+      continue;
+    }
+    // 2. Target dimension is static. The product of dimensions of the expanded
+    // shape should match the collapsed dimension shape.
+    int64_t prodOfCollapsedDims = 1;
+    bool reachedTargetDimSize = false;
+    while (sourceDim < numSourceDims) {
+      // Source shape cannot be dynamic if the target dim is static.
+      if (sourceShape[sourceDim] == ShapedType::kDynamic)
+        return std::nullopt;
       prodOfCollapsedDims *= sourceShape[sourceDim];
-      currIndices.push_back(sourceDim++);
+      if (prodOfCollapsedDims > currTargetShape)
+        break;
+      else if (prodOfCollapsedDims == currTargetShape) {
+        currIndices.push_back(sourceDim++);
+        reachedTargetDimSize = true;
+        break;
+      } else // prodOfCollapsedDims < currTargetShape
+        currIndices.push_back(sourceDim++);
     }
-
-    // If the current expanded dimension is dynamic, then the collapsed
-    // dimensions should also be dynamic and product of all previous unprocessed
-    // dimensions of the expanded shape should be 1.
-    if (sourceShape[sourceDim] == ShapedType::kDynamic &&
-        (currTargetShape != ShapedType::kDynamic || prodOfCollapsedDims != 1))
+    if (!reachedTargetDimSize)
       return std::nullopt;
-
-    // If the collapsed dim is dynamic, the current expanded dim should also
-    // be dynamic.
-    if (currTargetShape == ShapedType::kDynamic &&
-        sourceShape[sourceDim] != ShapedType::kDynamic)
-      return std::nullopt;
-
-    // For static shapes, if the product of dimensions of the expanded shape
-    // should match the collapsed dimension shape.
-    if (prodOfCollapsedDims * sourceShape[sourceDim] != currTargetShape)
-      return std::nullopt;
-
-    currIndices.push_back(sourceDim++);
-    reassociationMap.emplace_back(ReassociationIndices{});
-    std::swap(reassociationMap.back(), currIndices);
-    prodOfCollapsedDims = 1;
+    reassociationMap.push_back(currIndices);
   }
-  // All the dimensions in the target must have been processed.
-  if (reassociationMap.size() != targetShape.size())
-    return std::nullopt;
-  // Process any remaining entries in the source shape. They all need to be
-  // 1 or dynamic.
-  for (; sourceDim < sourceShape.size(); sourceDim++) {
-    if (sourceShape[sourceDim] != ShapedType::kDynamic &&
+  // Now that we've mapped all the target dimensions, process any remaining
+  // entries in the source shape explicitly. Either the last target dimension
+  // is dynamic, or all remaining source entries need to be 1 or dynamic. Same
+  // applies when target shape is empty (can be the case for subshape
+  // reassociations).
+  for (; sourceDim < numSourceDims; sourceDim++) {
+    if ((targetShape.empty() || targetShape.back() != ShapedType::kDynamic) &&
+        sourceShape[sourceDim] != ShapedType::kDynamic &&
         sourceShape[sourceDim] != 1)
       return std::nullopt;
     // The map is empty when the target type is a scalar.
diff --git a/mlir/test/Dialect/Linalg/simplify-pack-unpack.mlir b/mlir/test/Dialect/Linalg/simplify-pack-unpack.mlir
index 51350e5bc8498..6979770154bab 100644
--- a/mlir/test/Dialect/Linalg/simplify-pack-unpack.mlir
+++ b/mlir/test/Dialect/Linalg/simplify-pack-unpack.mlir
@@ -158,8 +158,8 @@ func.func @unpack_to_partial_slice(%arg0: tensor<8x32xf32>) -> tensor<255xf32> {
 // -----
 
 // CHECK-LABEL: func.func @unpack_dynamic
-// CHECK-NOT:     tensor.collapse
-// CHECK:         linalg.unpack
+// CHECK:     tensor.collapse
+// CHECK-NOT:         linalg.unpack
 func.func @unpack_dynamic(%arg0: tensor<?x32xf32>) -> tensor<?xf32> {
   %c32 = arith.constant 32 : index
   %c0 = arith.constant 0 : index
diff --git a/mlir/test/Dialect/Tensor/canonicalize.mlir b/mlir/test/Dialect/Tensor/canonicalize.mlir
index 85bf6fba52aa4..443f931745557 100644
--- a/mlir/test/Dialect/Tensor/canonicalize.mlir
+++ b/mlir/test/Dialect/Tensor/canonicalize.mlir
@@ -1068,7 +1068,7 @@ func.func @fold_expand_of_collapse(%arg0 : tensor<3x4x4xf32>) -> tensor<3x4x4xf3
 
 // -----
 
-func.func @fold_expand_of_collapse_dynamic(%arg0 : tensor<?x4x?xf32>, %arg1: index, %arg2: index)
+func.func @fold_expand_of_collapse_mixed_subshape(%arg0 : tensor<?x4x?xf32>, %arg1: index, %arg2: index)
     -> tensor<?x4x?xf32> {
   %0 = tensor.collapse_shape %arg0 [[0, 1], [2]]
       : tensor<?x4x?xf32> into tensor<?x?xf32>
@@ -1076,12 +1076,28 @@ func.func @fold_expand_of_collapse_dynamic(%arg0 : tensor<?x4x?xf32>, %arg1: ind
       : tensor<?x?xf32> into tensor<?x4x?xf32>
   return %1 : tensor<?x4x?xf32>
 }
-// CHECK-LABEL: @fold_expand_of_collapse_dynamic
+// CHECK-LABEL: @fold_expand_of_collapse_mixed_subshape
 //   CHECK-NOT:   tensor.{{.*}}_shape
 
 // -----
 
-func.func @no_fold_expand_of_collapse_dynamic(%arg0 : tensor<?x?x?xf32>, %arg1: index, %arg2: index, %arg3: index)
+func.func @fold_expand_of_collapse_mixed_target_subshape(%arg0 : tensor<?x4x?x2xf32>, %arg1: index, %arg2: index)
+    -> tensor<?x4x?xf32> {
+  %0 = tensor.collapse_shape %arg0 [[0, 1], [2, 3]]
+      : tensor<?x4x?x2xf32> into tensor<?x?xf32>
+  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%arg1, 4, %arg2]
+      : tensor<?x?xf32> into tensor<?x4x?xf32>
+  return %1 : tensor<?x4x?xf32>
+}
+// CHECK-LABEL: @fold_expand_of_collapse_mixed_target_subshape
+//   CHECK-NOT:   tensor.expand_shape
+//       CHECK:   %[[COLLAPSE:.+]] = tensor.collapse_shape %arg0 {{\[}}[0], [1], [2, 3]]
+//  CHECK-SAME:     : tensor<?x4x?x2xf32> into tensor<?x4x?xf32>
+//  CHECK-NEXT:   return %[[COLLAPSE]]
+
+// -----
+
+func.func @no_fold_expand_of_collapse_fully_dynamic(%arg0 : tensor<?x?x?xf32>, %arg1: index, %arg2: index, %arg3: index)
     -> tensor<?x?x?xf32> {
   %0 = tensor.collapse_shape %arg0 [[0, 1], [2]]
       : tensor<?x?x?xf32> into tensor<?x?xf32>
@@ -1089,7 +1105,7 @@ func.func @no_fold_expand_of_collapse_dynamic(%arg0 : tensor<?x?x?xf32>, %arg1:
       : tensor<?x?xf32> into tensor<?x?x?xf32>
   return %1 : tensor<?x?x?xf32>
 }
-// CHECK-LABEL: @no_fold_expand_of_collapse_dynamic
+// CHECK-LABEL: @no_fold_expand_of_collapse_fully_dynamic
 //       CHECK:   tensor.collapse_shape
 //       CHECK:   %[[EXPAND:.+]] = tensor.expand_shape
//       CHECK:   return %[[EXPAND]]
```

AGindinson added a commit to AGindinson/iree that referenced this pull request Apr 30, 2025
@AGindinson AGindinson requested a review from IanWood1 May 9, 2025 15:52
@MaheshRavishankar
Contributor

Just following along loosely. I think this is fairly involved and tricky. Marking as request changes since I intend to come back and review it in depth.

Contributor

@MaheshRavishankar MaheshRavishankar left a comment

(empty comment)

Contributor

@IanWood1 IanWood1 left a comment

I think there is a bug here. For example:

```
EXPECT_EQ(getReassociationIndicesForCollapse(
              {ShapedType::kDynamic, 10, 10, 10, ShapedType::kDynamic},
              {ShapedType::kDynamic, 10, ShapedType::kDynamic}),
          std::nullopt);
```

This fails with the output `{ { 0 }, { 1 }, { 2, 3, 4 } }` (I think std::nullopt is the correct result here).

Other examples:

```
EXPECT_EQ(getReassociationIndicesForCollapse(
              {ShapedType::kDynamic, 3, 4, 3, ShapedType::kDynamic},
              {ShapedType::kDynamic, 12, ShapedType::kDynamic}),
          std::nullopt);
EXPECT_EQ(getReassociationIndicesForCollapse(
              {ShapedType::kDynamic, 8, 4, 2, 16, ShapedType::kDynamic},
              {ShapedType::kDynamic, 32, ShapedType::kDynamic}),
          std::nullopt);
EXPECT_EQ(getReassociationIndicesForCollapse(
              {ShapedType::kDynamic, 2, 2, 2, ShapedType::kDynamic},
              {ShapedType::kDynamic, 2, 2, ShapedType::kDynamic}),
          std::nullopt);
```

I'm trying to think of a generalized rule for when it becomes ambiguous. One possible way might be to check against a reversed target/source shape:

```
getReassociationIndicesForCollapse(source, target) ==
	reverseReassociation(
		getReassociationIndicesForCollapse(reverse(source), reverse(target)))
```

The idea is that the current algorithm is "anti-greedy" and will try to minimize the size of the reassociations towards the beginning, so reversing the ordering will compare anti-greedy with greedy. The results will be the same if the reassociation is unambiguous.
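
A sketch of the helper the pseudocode assumes (`reverseReassociation` is hypothetical, not an existing MLIR utility): it maps groups computed on the reversed shapes back to the original dimension order.

```
// Hypothetical helper for the reversed-check idea above: given the
// reassociation computed for reverse(source) -> reverse(target), restore
// the original dimension numbering by reversing the group order and
// flipping each index i to numSourceDims - 1 - i.
static SmallVector<ReassociationIndices>
reverseReassociation(ArrayRef<ReassociationIndices> reversed,
                     unsigned numSourceDims) {
  SmallVector<ReassociationIndices> result;
  for (const ReassociationIndices &group : llvm::reverse(reversed)) {
    ReassociationIndices mapped;
    for (int64_t idx : llvm::reverse(group))
      mapped.push_back(numSourceDims - 1 - idx);
    result.push_back(mapped);
  }
  return result;
}
```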

@AGindinson
Contributor Author

AGindinson commented May 12, 2025

> I'm trying to think of a generalized rule for when it becomes ambiguous. One possible way might be to check against a reversed target/source shape:
> ...
> The idea is that the current algorithm is "anti-greedy" and will try to minimize the size of the reassociations towards the beginning. So reversing the ordering will compare anti-greedy with greedy. The results will be the same if the reassociation is unambiguous.

Thanks, great catch first and foremost!

This may be a good approach, but I'm not sure how to conveniently let the ambiguity for collapsed dimensions of 1 pass through. Either tweaking the reverse iteration logic to push the 1's over to the next iteration (assuming this wouldn't lead to more peculiar edge cases, it might clutter the readability further) or writing the final comparison by hand... I'll mull this over and submit a new version next week.

@AGindinson
Contributor Author

AGindinson commented May 20, 2025

@IanWood1, while making adjustments, I've realized:

  1. For a variety of cases, the non-greedy traversal only works in one direction. Take `2x2x?x4x5x?` into `2x?x20x?`: it's valid, but the reassociation would fail for R-to-L traversal without backtracking because of `?x2` into `?x2x2` at the end. `2x2x?x4x5x4x?` into `2x?x20x?` would be a corresponding ambiguous input that we wouldn't be able to validate.

    1b. I've meddled with a same-direction, greedy-ish algorithm as an alternative way to validate: only for subsequent dynamic shapes in the target, it would maximize the range towards the beginning. This allows excluding subexpressions like `?x2x?` into `?x?` and `?x?x?` into `?x?` for the same direction of traversal, but doesn't help with your examples above. Actual greedy matching requires full-on backtracking...

  2. ...which would essentially make this a regex match problem, because dynamic dimensions act as a `*` would. Best illustrated by palindrome cases like `2x1x2x...x?x...x2x1x2` into `2x?x2`, with the fine bonus of having to confirm the match as unambiguous (but allowing ambiguity for unit dimensions!) - per our discussions above.

For a full solution, I'm taking another quick look at an approach with slicing the shapes into mixed & static subshapes. IMO, full-on backtracking is overkill given the context. If not, let's just stick with 1 and accept the missed cases - we'll still get a good increase in coverage. I'll post what I have EO tomorrow.

@AGindinson
Contributor Author

AGindinson commented May 21, 2025

> Actual greedy matching requires full-on backtracking, which would essentially make this a regex match problem, <...>.

Too much drama, because:

> For a full solution, <...> an approach with slicing the shapes into mixed & static subshapes

This problem can be mathematically reduced to my current version. We just need to handle the static head and/or tail separately from the dynamic/mixed subshape in between. A few examples for success & failure: https://gist.github.com/AGindinson/2942c0d9667d62f367d8893918b45076.
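
To make the slicing idea concrete, here is a rough sketch of a hypothetical splitting helper (the name and API shape are my assumptions; this is not code from the PR or the gist):

```
// Split a shape into a fully static head, a dynamic/mixed middle and a
// fully static tail. The static ends can then reuse the product-based
// matching, and only the middle subshape needs dynamic-dimension
// handling.
static void splitByDynamicDims(ArrayRef<int64_t> shape,
                               SmallVectorImpl<int64_t> &staticHead,
                               SmallVectorImpl<int64_t> &mixedMiddle,
                               SmallVectorImpl<int64_t> &staticTail) {
  size_t firstDyn = shape.size(), lastDyn = 0;
  for (size_t idx = 0; idx < shape.size(); ++idx) {
    if (ShapedType::isDynamic(shape[idx])) {
      firstDyn = std::min(firstDyn, idx);
      lastDyn = idx;
    }
  }
  // No dynamic dimensions: the whole shape is the static head.
  if (firstDyn == shape.size()) {
    staticHead.assign(shape.begin(), shape.end());
    return;
  }
  staticHead.assign(shape.begin(), shape.begin() + firstDyn);
  mixedMiddle.assign(shape.begin() + firstDyn, shape.begin() + lastDyn + 1);
  staticTail.assign(shape.begin() + lastDyn + 1, shape.end());
}
```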

The logic from the current version would be used heavily, so a review iteration would be of great help at this point @IanWood1 @MaheshRavishankar. As you see fit, we can also push the current approach over the merge line, then I'll iterate further to a complete solution.

Contributor

@IanWood1 IanWood1 left a comment

The logic looks good from what I understand but is a bit complicated so I may need to reread to make sure I fully grasp what is going on.

@AGindinson
Contributor Author

AGindinson commented Jun 3, 2025

Many thanks Mahesh, Ian, for your reviews and overall help. I'd appreciate it if you could also help with merging this – it would be my 3rd PR to get in before I can apply for commit access :D

@IanWood1 IanWood1 merged commit cb4a407 into llvm:main Jun 3, 2025
11 checks passed

github-actions bot commented Jun 3, 2025

@AGindinson Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@llvm-ci
Collaborator

llvm-ci commented Jun 3, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building mlir at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/14038

Here is the relevant piece of the build log for reference:
```
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x5600d9e9a620 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x5600d9e9a620 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...
```

@IanWood1
Contributor

IanWood1 commented Jun 3, 2025

@MaheshRavishankar I'm looking into the failure. Should this be reverted immediately and then re-landed after triaged/fixed?

@MaheshRavishankar
Contributor

> @MaheshRavishankar I'm looking into the failure. Should this be reverted immediately and then re-landed after triaged/fixed?

The policy is to revert on failure and reland with fix.

@IanWood1
Contributor

IanWood1 commented Jun 3, 2025

I'm not sure if this is related to the failure seen here, but `ComposeExpandOfCollapseOp` needs to be updated/fixed. It uses `getReassociationIndicesForCollapse` but doesn't correctly bail on dynamic cases. For example:

```
func.func public @test(%arg0 : tensor<?x8x128x?xf16>, %arg1: index) -> tensor<?x128x?xf16> {
  %collapse = tensor.collapse_shape %arg0 [[0, 1, 2, 3]] : tensor<?x8x128x?xf16> into tensor<?xf16>
  %expanded_19 = tensor.expand_shape %collapse [[0, 1, 2]] output_shape [%arg1, 8, %arg1] : tensor<?xf16> into tensor<?x128x?xf16>
  return %expanded_19 : tensor<?x128x?xf16>
}
```

gets folded into

```
func.func public @test(%arg0: tensor<?x8x128x?xf16>, %arg1: index) -> tensor<?x128x?xf16> {
  %collapsed = tensor.collapse_shape %arg0 [[0, 1], [2], [3]] : tensor<?x8x128x?xf16> into tensor<?x128x?xf16>
  return %collapsed : tensor<?x128x?xf16>
}
```

This isn't valid. expand(collapse) can reshape to any valid output shape, but a single collapse can only collapse contiguous dims.

@AGindinson
Contributor Author

@IanWood1 If this is a dynamic size-specific issue, I'd say it's caused by my change either way. That said, I wouldn't be surprised if the same happened for 2x8x128x2 -> 16x256 -> 16x128x2 with the original approach.

Not sure if I understand correctly, what exactly goes against the contiguous rule in this example?

@IanWood1
Contributor

IanWood1 commented Jun 3, 2025

I think this is a static case that can't be modeled by a collapse: `2x10x5 -> 100 -> 10x10`.

In the case of `?x8x128x? -> ? -> ?x128x?` the final 2 dynamic dims could be anything (e.g. the left got 2x bigger and the right got 2x smaller)
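
For reference, here is a minimal sketch of the kind of guard the eventual follow-up fix describes: bail out of the rewrite when the collapse source has two or more dynamic dims. The variable names and surrounding pattern code are assumptions, not the literal patch:

```
// Sketch: inside the expand(collapse) folding pattern, refuse to rewrite
// when the collapse source has >= 2 dynamic dimensions, since the
// reassociation is then not uniquely determined by shapes alone.
RankedTensorType srcType = collapseOp.getSrcType();
unsigned numDynamicDims = llvm::count_if(
    srcType.getShape(), [](int64_t d) { return ShapedType::isDynamic(d); });
if (numDynamicDims >= 2)
  return std::nullopt; // cannot prove this simplifies to a single collapse
```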

IanWood1 added a commit that referenced this pull request Jun 3, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 3, 2025
@AGindinson
Contributor Author

AGindinson commented Jun 4, 2025

Ah, got it. Yeah, this probably only shows up with v1 of my change, as it's the one enabling collapses of non-unit dimensions into dynamic ones. On one hand, the ExpandOfCollapse pattern is the right place to fix this*, but this scenario could also alter (simplify?) my logic for single reassociation calculations.

*until we work out how to look into affinities

rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
IanWood1 added a commit that referenced this pull request Jun 11, 2025
Changes `findCollapsingReassociation` to return nullopt in all cases where the source shape has `>=2` dynamic dims. `expand(collapse)` can reshape to any valid output shape, but a collapse can only collapse contiguous dimensions. When there are `>=2` dynamic dimensions, it is impossible to determine whether it can be simplified to a collapse or whether it is performing a more advanced reassociation.


This problem was uncovered by
#137963

---------

Signed-off-by: Ian Wood <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 11, 2025
AGindinson added a commit that referenced this pull request Jun 12, 2025
…2827)

The original PR #137963 had an NVIDIA bot failure. This appears to be a flaky test, because rerunning the build was successful.

This change needs commit 6f2ba47 to fix incorrect usage of
`getReassociationIndicesForCollapse`.

Reverts #142639

Co-authored-by: Artem Gindinson <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 12, 2025
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025