
[MLIR][Tensor] Remove tensor.dim canonicalization patterns registered on tensor.expand_shape/tensor.collapse_shape #134219


Merged: 5 commits into llvm:main, Apr 11, 2025

Conversation

@vivekkhandelwal1 (Contributor) commented Apr 3, 2025

These patterns are problematic because the iterative application that locally resolves the tensor.dim operation introduces an intermediate floor_div, which loses the information about the exact division carried out in the original IR, so the iterative algorithm cannot converge towards the simplest form. Information loss is not acceptable for canonicalization.

Resolving the dimOp can instead be achieved through the resolve-ranked-shaped-type-result-dims and resolve-shaped-type-result-dims passes.

This commit removes the FoldDimOfExpandShape and FoldDimOfCollapseShape patterns from the list of ExpandShapeOp's canonicalization patterns.

These patterns caused a crash while folding the dims of an expanded tensor. The issue can be reproduced by undoing the changes in this commit and running the command:

```
mlir-opt --linalg-fuse-elementwise-ops repro.mlir
```

over the IR: https://gist.github.com/vivekkhandelwal1/56a1a398c21d739df77a67ce372b9366.
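
For illustration, here is a minimal sketch distilled from the canonicalization test updated in this PR (function and value names are illustrative):

```
func.func @dim_of_expand_shape(%t: tensor<?x?xf32>, %sz0: index, %sz1: index) -> index {
  %c2 = arith.constant 2 : index
  // Result dimension 2 is dynamic; its size is already given by %sz1 in output_shape.
  %expanded = tensor.expand_shape %t [[0], [1, 2, 3, 4, 5]] output_shape [%sz0, 1, %sz1, 5, 1, 8]
      : tensor<?x?xf32> into tensor<?x1x?x5x1x8xf32>
  %dim = tensor.dim %expanded, %c2 : tensor<?x1x?x5x1x8xf32>
  return %dim : index
}
```

The removed canonicalization rewrote %dim into `affine.apply affine_map<()[s0] -> (s0 floordiv 40)>()[%src]` with `%src = tensor.dim %t, %c1`, reintroducing a division even though the exact size is already carried by the output_shape operand.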

Signed-off-by: Vivek Khandelwal <[email protected]>
@llvmbot (Member) commented Apr 3, 2025

@llvm/pr-subscribers-mlir-linalg
@llvm/pr-subscribers-mlir-tensor

@llvm/pr-subscribers-mlir

Author: Vivek Khandelwal (vivekkhandelwal1)

Changes

This commit removes the FoldDimOfExpandShape and FoldDimOfCollapseShape patterns from the list of ExpandShapeOp's canonicalization patterns.

These patterns caused a crash while folding the dims of an expanded tensor. The issue can be reproduced by undoing the changes in this commit and running the command:

mlir-opt --linalg-fuse-elementwise-ops repro.mlir

over the IR: https://gist.github.com/vivekkhandelwal1/56a1a398c21d739df77a67ce372b9366.


Full diff: https://github.com/llvm/llvm-project/pull/134219.diff

1 File Affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+1-2)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index d589f627d896e..35265ac49fcbf 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -2158,8 +2158,7 @@ void ExpandShapeOp::getCanonicalizationPatterns(RewritePatternSet &results,
       ComposeExpandOfCollapseOp<ExpandShapeOp, CollapseShapeOp>,
       ConvertToStaticExpandShape, FoldReshapeWithConstant<ExpandShapeOp>,
       FoldReshapeWithSplat<ExpandShapeOp>,
-      FoldReshapeWithFromElements<ExpandShapeOp>, FoldDimOfExpandShape,
-      FoldDimOfCollapseShape>(context);
+      FoldReshapeWithFromElements<ExpandShapeOp>>(context);
 }
 
 void CollapseShapeOp::getCanonicalizationPatterns(RewritePatternSet &results,

@joker-eph (Collaborator) left a comment

I'm very confused that there is no test change? That points to a previous lack of coverage...
Edit: actually the CI is failing; there are test changes and we had coverage :)

Can you add a minimal test demonstrating the issue you're fixing?

@vivekkhandelwal1 (Contributor, Author):

I'm very confused that there is no test change? That points to a previous lack of coverage... Edit: actually the CI is failing, there are test changes and we had coverage :)

Can you add minimal test demonstrating the issue you're fixing?

I have fixed the failing regression tests. Sorry, I forgot to do that while creating the PR.

@joker-eph (Collaborator) commented Apr 3, 2025

We still need a test reproducing the crash you mentioned; right now it's not clear whether what you're doing in this PR is fundamentally the right fix or just a "big hammer" solution for the crash.

@vivekkhandelwal1 (Contributor, Author):

We still need a test reproducing the crash you mentioned, right now it's not clear if what you're doing in this PR is fundamentally the right fix here or just a "big hammer" solution for the crash.

Sure, I will add a test which would crash without these changes.

@vivekkhandelwal1 (Contributor, Author):

We still need a test reproducing the crash you mentioned, right now it's not clear if what you're doing in this PR is fundamentally the right fix here or just a "big hammer" solution for the crash.

Hi @joker-eph, I have added a test which will crash without the changes made in this PR.

@vivekkhandelwal1 (Contributor, Author):

Also, after the removal of those canonicalization patterns, the code below becomes dead code since it's not used anywhere else. Shall I remove that, too?

```
struct FoldDimOfExpandShape : public OpRewritePattern<DimOp> {
  using OpRewritePattern<DimOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(DimOp dimOp,
                                PatternRewriter &rewriter) const override {
    auto expandShapeOp = dimOp.getSource().getDefiningOp<ExpandShapeOp>();
    if (!expandShapeOp)
      return failure();
    // Only constant dimension values are supported.
    std::optional<int64_t> dim = dimOp.getConstantIndex();
    if (!dim.has_value())
      return failure();
    // Skip static dims. These are folded to constant ops.
    RankedTensorType resultType = expandShapeOp.getResultType();
    if (!resultType.isDynamicDim(*dim))
      return failure();
    // Find the reassociation group that contains this result dimension.
    int64_t srcDim = expandShapeOp.getCorrespondingSourceDim(*dim);
    // `dim` is the only dynamic dimension in `group`. (Otherwise, the
    // ExpandShapeOp would be ambiguous.)
    int64_t product = 1;
    ReassociationIndices grp = expandShapeOp.getReassociationIndices()[srcDim];
    for (int64_t d : grp) {
      if (d != dim) {
        assert(!resultType.isDynamicDim(d) && "expected static dim");
        product *= resultType.getDimSize(d);
      }
    }
    // result dim size = src dim size / (product(other dims in reassoc group))
    Value srcDimSz =
        rewriter.create<DimOp>(dimOp.getLoc(), expandShapeOp.getSrc(), srcDim);
    AffineExpr expr;
    bindSymbols(dimOp.getContext(), expr);
    rewriter.replaceOpWithNewOp<affine::AffineApplyOp>(
        dimOp, expr.floorDiv(product), srcDimSz);
    return success();
  }
};

struct FoldDimOfCollapseShape : public OpRewritePattern<DimOp> {
  using OpRewritePattern<DimOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(DimOp dimOp,
                                PatternRewriter &rewriter) const override {
    auto collapseShapeOp = dimOp.getSource().getDefiningOp<CollapseShapeOp>();
    if (!collapseShapeOp)
      return failure();
    // Only constant dimension values are supported.
    std::optional<int64_t> dim = dimOp.getConstantIndex();
    if (!dim.has_value() ||
        dim.value() >= collapseShapeOp.getResultType().getRank())
      return failure();
    // Skip static dims. These are folded to constant ops.
    RankedTensorType resultType = collapseShapeOp.getResultType();
    if (!resultType.isDynamicDim(*dim))
      return failure();
    // Get the reassociation group of the result dimension.
    ReassociationIndices group =
        collapseShapeOp.getReassociationIndices()[*dim];
    // result dim size = product(dims in reassoc group)
    SmallVector<Value> srcDimSizes;
    SmallVector<AffineExpr> syms;
    AffineExpr product;
    for (const auto &it : llvm::enumerate(group)) {
      srcDimSizes.push_back(rewriter.create<DimOp>(
          dimOp.getLoc(), collapseShapeOp.getSrc(), it.value()));
      syms.push_back(rewriter.getAffineSymbolExpr(it.index()));
      product = product ? product * syms.back() : syms.back();
    }
    rewriter.replaceOpWithNewOp<affine::AffineApplyOp>(dimOp, product,
                                                       srcDimSizes);
    return success();
  }
};
```

@Groverkss (Member) left a comment

Why should the canonicalization patterns be removed instead of fixing them? I don't understand the reasoning behind this PR.

@joker-eph (Collaborator):

Hi @joker-eph, I have added a test which will crash without the changes made in this PR.

Thanks! Is it really minimal?

Also, can you comment on my previous question:

right now it's not clear if what you're doing in this PR is fundamentally the right fix here or just a "big hammer" solution for the crash.

Why is removing these the right fix for the crash?

@qedawkins (Contributor) left a comment

Agree, dropping the canonicalizers is not the right fix here. The crash is because this comment is outdated:

// `dim` is the only dynamic dimension in `group`. (Otherwise, the
// ExpandShapeOp would be ambiguous.)

expand_shape takes a list of output sizes now, so we can simplify these canonicalizers to just forward the value in output_shape.
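
A minimal MLIR sketch of that observation (shapes and names here are illustrative, not taken from the PR):

```
func.func @forward_output_shape(%t: tensor<?x?xf32>, %d0: index, %d1: index) -> index {
  %c1 = arith.constant 1 : index
  // The dynamic result sizes are already operands of the op (output_shape).
  %e = tensor.expand_shape %t [[0], [1, 2]] output_shape [%d0, %d1, 8]
      : tensor<?x?xf32> into tensor<?x?x8xf32>
  // A fixed pattern could rewrite this tensor.dim directly to %d1, with no floordiv.
  %dim = tensor.dim %e, %c1 : tensor<?x?x8xf32>
  return %dim : index
}
```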

@MaheshRavishankar (Contributor):

Dropping these canonicalizations is the right thing. These patterns are redundant w.r.t. `mlir::tensor::registerInferTypeOpInterfaceExternalModels()`, which is the current way of resolving tensor.dim of results in terms of the dimensions of the operands. There are two reasons why this canonicalization is bad:

  1. In general, all tensor.dim resolutions have been moved to use the ReifyRankedShapedTypeOpInterface, which is wrapped in `populateResolveRankedShapedTypeResultDimsPatterns()`. These two patterns are outliers and redundant.
  2. None of the other tensor.dim resolution patterns are run during canonicalization. They have non-local effects that might be too hard a hammer for canonicalization, and they were removed from canonicalization a really long time ago.

@vivekkhandelwal1 (Contributor, Author):

Dropping these canonicalizations is the right thing. These patterns are redundant w.r.t. `mlir::tensor::registerInferTypeOpInterfaceExternalModels()`, which is the current way of resolving tensor.dim of results in terms of the dimensions of the operands. There are two reasons why this canonicalization is bad:

  1. In general, all tensor.dim resolutions have been moved to use the ReifyRankedShapedTypeOpInterface, which is wrapped in `populateResolveRankedShapedTypeResultDimsPatterns()`. These two patterns are outliers and redundant.
  2. None of the other tensor.dim resolution patterns are run during canonicalization. They have non-local effects that might be too hard a hammer for canonicalization, and they were removed from canonicalization a really long time ago.

Thanks @MaheshRavishankar for adding the rationale behind these changes.

I request all the existing reviewers to re-review this PR, since we have been blocked on the Torch-MLIR side for quite some time because of this.

CC: @joker-eph

@qedawkins (Contributor) left a comment

Ok if this is in line with canonicalization of tensor.dim w.r.t. other ops then this makes sense. This wasn't immediately clear from the initial PR description. I'll bow out of the review here.

@Groverkss (Member) left a comment

LGTM, can you just update the PR description to include:

This PR makes the tensor.dim folders for tensor.expand_shape/tensor.collapse_shape consistent with other tensor operations. In general:

1. All tensor.dim operation resolutions have been moved to use the ReifyRankedShapedTypeOpInterface.
2. None of the other tensor.dim resolution patterns are run during canonicalizations. They have non-local effects, which make them bad for canonicalizations, and were removed from other tensor operations some time ago.

@qedawkins dismissed their stale review on April 7, 2025 16:37 ("self dismissing review").

@vivekkhandelwal1 (Contributor, Author):

Hi, I don't have merge access. Can someone please merge this for me?

@joker-eph (Collaborator):

None of the other tensor.dim resolution patterns are run during canonicalizations. They have non-local effects that might be too hard a hammer for canonicalizations, and was removed from canonicalizations a really long time ago.

I don't see the non-local effects in the current test change; can you elaborate on this, please? Maybe an example showing the effect of this canonicalization pattern that shouldn't be a canonicalization?

(I'd rather see this resolved before anyone merges this)

@MaheshRavishankar (Contributor) commented Apr 8, 2025

None of the other tensor.dim resolution patterns are run during canonicalizations. They have non-local effects that might be too hard a hammer for canonicalizations, and was removed from canonicalizations a really long time ago.

I don't see the non-local effects on the current test change, can you elaborate on this please? Maybe an example showing the effect of this canonicalization pattern that shouldn't be a canonicalization?

(I'd rather see this resolved before anyone merges this)

Let me clarify. These patterns are the outliers: none of the other tensor.dim resolution patterns are run during canonicalization. These patterns are also wrong and redundant. I see no reason to fix them, because they are inconsistent and a duplicate of the tensor.dim resolution that happens through the interface. So the resolution here is to remove the patterns. The question of whether all tensor.dim resolution needs to happen during canonicalization or not is a separate issue that we can revisit.

It would be better not to block this change, because it is removing a redundant and wrong pattern.

@joker-eph (Collaborator) commented Apr 8, 2025

. These pattern are also wrong

Wrong how?

and redundant

It's a bit ambiguous what you call redundant: if you compare to "other things that are not part of canonicalization", then it's a bit dubious to claim it "redundant", as it won't be part of the canonicalization run.

So the resolution here is to remove this pattern. The question of whether all the tensor.dim resolution needs to happen during canonicalization or not is a separate issue that we can revisit. It would be better to not block this change cause this is removing a redundant and wrong pattern.

This is all an opinion, but I'd rather see you elaborate your thinking to enlighten everyone, instead of just making a strong claim that you're not trying to substantiate right now (other than with very unspecific terms like "wrong" and "redundant").
Because right now:

So the resolution here is to remove this pattern.

may very well be the absolutely right thing; however, it's important to be able to convey why it is ultimately the right thing.

@MaheshRavishankar (Contributor):

. These pattern are also wrong

Wrong how?

Wrong because they are crashing.

and redundant

It's a bit ambiguous what you call redundant, if you compare to "other things that are not part of canonicalization" then it's a bit dubious to claim it "redundant" as it won't be part of the canonicalization run.

Redundant because these patterns replace tensor.dim of a result in terms of tensor.dim of the operands, which is exactly what `populateResolveRankedShapedTypeResultDimsPatterns()` does, using the interface registered in `mlir::tensor::registerInferTypeOpInterfaceExternalModels()`. So it's literally the same thing.

This is all an opinion, but I'd rather see you elaborate you're thinking to enlighten everyone, instead of just making a strong claim that you're not trying to substantiate right now (other than very unspecific terms like "wrong" and "redundant"). Because right now:

It isn't a strong claim. They literally do the same thing that another pattern in the code base does (redundant), and they are crashing (wrong). I don't know what else would be needed to substantiate that this is wrong and redundant.

So the resolution here is to remove this pattern.

may very well be the absolute right thing, however it's important to be able to convey why this is the right ultimate thing.

I don't want to get dragged into the canonicalization vs. non-canonicalization discussion again and again. These two patterns are outliers since they resolve tensor.dim operations of a result in terms of tensor.dim operations of the operands. The way the rest of the in-tree operations handle that is through the interface (including the two ops here); no other ops have such patterns. So I think we don't even need to get into the canonicalization vs. not discussion. These patterns are outliers.

@Groverkss (Member):

OK, let's take a step back and see why this is a good patch. I looked through the history and the code, and here is the reasoning:

There are two questions we are answering here:

  1. Is the current implementation of this pattern wrong? Should we fix it?
  2. Should the correct pattern be a canonicalization?

The answer to Question 1 is that this pattern is wrong. It's trying to infer the output shape of the tensor, which is already provided in the operation! This pattern is from the time when expand_shape did not carry this information. The output shape was explicitly added to the operation in #90040, which explains why, for dynamic shapes, inferring the output shape is wrong.

So now we have two options: either fix the pattern or remove it. The thing is, the same pattern is implemented correctly through ReifyRankedShapedTypeOpInterface as a general pattern. So it is okay to remove the pattern: once fixed, it would be a duplicate of a more general pattern.

So now, to Question 2: is the correct pattern, in whatever shape or form, a good canonicalization? For static cases, expand_shape already has a folder that moves the output_shape field to a more canonical form (dynamic -> static). For dynamic cases, the answer to this question is no. The pattern is not a canonicalization for expand_shape at all! In its correct form, it doesn't change the representation of expand_shape in any way; instead, it uses information from expand_shape to bubble the tensor.dim operations upwards. What I understand by "non-local" effects here is that it's doing transformations around the operation that is meant to be canonicalized, but not doing anything to the operation itself. So it's "non-local" to the operation being canonicalized. This is a clear indication that this pattern is not really a canonicalization.

Finally, the pattern is implemented in transformations that refine shape information through ReifyRankedShapedTypeOpInterface, so we aren't losing any old transformations here; this patch addresses the questions explained above.

@joker-eph @MaheshRavishankar Hope this clears it up. I think this should provide a clear answer why this pattern is being removed as a canonicalization, and also being deleted at the same time. Let me know if you need more clarification.
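
As a rough sketch of the interface-based path (an illustrative expectation, not verified output): running the resolve-shaped-type-result-dims pass mentioned in the PR description over a dim-of-expand_shape like the example above should reify the result size from the op's own output_shape operand rather than recompute it:

```
// Input (same shape of example as in the PR description):
//   %dim = tensor.dim %expanded, %c2 : tensor<?x1x?x5x1x8xf32>
// Expected after `mlir-opt --resolve-shaped-type-result-dims` (illustrative):
//   %dim is replaced by %sz1, the value already supplied in output_shape;
//   no floor_div is introduced and no information is lost.
```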

Comment on lines -1108 to +1114
// CHECK: #[[$MAP:.*]] = affine_map<()[s0] -> (s0 * 64)>
// CHECK-LABEL: @compose_expand_of_collapse_last_two_dims
// CHECK-SAME: %[[ARG0:.+]]: tensor<?x64x1xf32>
// CHECK: %[[CONSTANT0:.+]] = arith.constant 0 : index
// CHECK: %[[CONSTANT384:.+]] = arith.constant 384 : index
// CHECK: %[[CONSTANT0:.+]] = arith.constant 0 : index
// CHECK: %[[COLLAPSE:.+]] = tensor.collapse_shape %[[ARG0]] {{\[}}[0, 1, 2]] : tensor<?x64x1xf32> into tensor<?xf32>
// CHECK: %[[DIM:.+]] = tensor.dim %[[ARG0]], %[[CONSTANT0]] : tensor<?x64x1xf32>
// CHECK: %[[AFFAPPLY:.+]] = affine.apply #[[$MAP]]()[%[[DIM]]]
// CHECK: %[[DIVUI:.+]] = arith.divui %[[AFFAPPLY]], %[[CONSTANT384]] : index
// CHECK: %[[DIM:.+]] = tensor.dim %[[COLLAPSE]], %[[CONSTANT0]] : tensor<?xf32>
// CHECK: %[[DIVUI:.+]] = arith.divui %[[DIM]], %[[CONSTANT384]] : index

@Groverkss (Member):

@joker-eph For an example of non-local effects: the previous canonicalization pattern for collapse_shape was not moving collapse_shape to a more canonical form. It's actually not touching the collapse_shape at all; it is doing things around it.

It's very hard to say what the canonical form of tensor.dim should be here. There is no reason for the tensor.dim, when propagated up, to be more canonical. It can possibly introduce two tensor.dim operations if there were two dynamic dimensions before the collapse_shape. If there is a cost to a tensor.dim operation, you now pay that cost twice. We cannot really say whether one form is more canonical than the other.

@joker-eph (Collaborator):

In the current example, simplified, we have:

```
func.func @dim_expand_shape(%arg0: tensor<?x64x1xf32>) -> index {
  %c0 = arith.constant 0 : index
  %collapsed = tensor.collapse_shape %arg0 [[0, 1, 2]] : tensor<?x64x1xf32> into tensor<?xf32>
  %dim = tensor.dim %collapsed, %c0 : tensor<?xf32>
  return %dim : index
}
```

If I want to know something about the dim, it's just the first dimension of the collapse_shape result, and I now have to reason about it.

After canonicalization we have:

```
#map = affine_map<()[s0] -> (s0 * 64)>
module {
  func.func @dim_expand_shape(%arg0: tensor<?x64x1xf32>) -> index {
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0 : tensor<?x64x1xf32>
    %0 = affine.apply #map()[%dim]
    return %0 : index
  }
}
```

The dimension computation is made explicit in terms of the input, and the tensor.collapse_shape even disappears.

There is no reason for the tensor.dim, when propagated up, to be more canonical.

It's not always obvious what the "best" canonical form should be, but that does not mean we can't pick one: often it comes down to picking one convention and sticking to it for consistency (I don't know if this is the right convention here, just saying).

It can possibly introduce 2 tensor.dim operations, if there were 2 dynamic dimensions before the collapse shape.

Here is the example you describe I believe:

```
func.func @dim_expand_shape(%arg0: tensor<?x?x64x1xf32>) -> index {
  %c0 = arith.constant 0 : index
  %collapsed = tensor.collapse_shape %arg0 [[0, 1, 2, 3]] : tensor<?x?x64x1xf32> into tensor<?xf32>
  %dim = tensor.dim %collapsed, %c0 : tensor<?xf32>
  return %dim : index
}
```

Becomes:

```
#map = affine_map<()[s0, s1] -> ((s0 * s1) * 64)>
module {
  func.func @dim_expand_shape(%arg0: tensor<?x?x64x1xf32>) -> index {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0 : tensor<?x?x64x1xf32>
    %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?x64x1xf32>
    %0 = affine.apply #map()[%dim, %dim_0]
    return %0 : index
  }
}
```

As mentioned elsewhere, the cost of "dim" does not seem meaningful to me here. The question is rather whether this IR is more amenable to reasoning about the values, or about equivalence between values, in further analysis. Any argument along this line of reasoning would be very valuable to make a statement about whether this clearly belongs / does not belong in canonicalization.

Comment on lines -2140 to +2143
// CHECK: #[[$map:.*]] = affine_map<()[s0] -> (s0 floordiv 40)>
// CHECK-LABEL: func @dim_of_expand_shape(
// CHECK-SAME: %[[t:.*]]: tensor<?x?xf32>
// CHECK: %[[c1:.*]] = arith.constant 1 : index
// CHECK: %[[dim:.*]] = tensor.dim %[[t]], %[[c1]] : tensor<?x?xf32>
// CHECK: %[[apply:.*]] = affine.apply #[[$map]]()[%[[dim]]]
// CHECK: return %[[apply]]
// CHECK: %[[c2:.*]] = arith.constant 2 : index
// CHECK: %[[expanded:.*]] = tensor.expand_shape %[[t]] {{\[\[}}0], [1, 2, 3, 4, 5]] output_shape [%arg1, 1, %arg2, 5, 1, 8] : tensor<?x?xf32> into tensor<?x1x?x5x1x8xf32>
// CHECK: %[[dim:.*]] = tensor.dim %[[expanded]], %[[c2]] : tensor<?x1x?x5x1x8xf32>
// CHECK: return %[[dim]]

@Groverkss (Member):

@joker-eph For an example of the old pattern being wrong, notice that we already know the output shape. The result should just be %arg2 here! But instead it was doing some magic with affine.apply, which is wrong.

@Groverkss (Member):

We could implement a folder for this for tensor.dim, but that doesn't change the fact that this pattern is wrong, and doesn't even belong as a canonicalization to expand_shape.

@Groverkss (Member):

I left some inline comments on the test changes to back my reasoning.

@joker-eph (Collaborator):

These pattern are also wrong
Wrong how?
Wrong cause they are crashing.

OK, we had a vocabulary mismatch here; "wrong" confused me. Thanks for clarifying what you meant.

So I think we dont even need to get into canonicalization vs. not discussion.

You want to remove something done in canonicalization to make it not part of canonicalization: so while I sympathize, I don't quite see how you can avoid the conversation.

@joker-eph (Collaborator):

Thanks for elaborating @Groverkss!

So now, to Question 2, is the correct pattern, in whatever shape or form, a good canonicalization? For static cases, expand_shape already has a folder to move to a more canonical form, where the output_shape field in the op moves to a more canonical form (dynamic -> static). For dynamic cases, the answer to this question is No. The pattern is not a canonicalization for expand_shape at all! In it's correct form, it doesn't change the representation of expand_shape in any form, instead it is using information from expand_shape to bubble up the tensor.dim operations upwards. What I understand by "non-local" effects here is that it's doing transformations around the operation that is meant to be canonicalized, but not doing anything to the operation itself. So it's "non-local" to the operation being canonicalized. This is a clear indication that this pattern is not really a canonicalization.

I see two things here:

  1. "a pattern not touching the operation itself its not a canonicalization", I'm not sure where is this coming from? Idon't quite see why this should be criteria? This is actually something that is often done: you know you can simplify an op from another dialect when it consumes results from your own dialect, so you just load such a pattern in your op that actually affect another one (cf below as well wrt layering).
  2. The pattern is actually a DimOp canonicalization, this is in the signature: LogicalResult matchAndRewrite(DimOp dimOp. It is meant to simplify the dim operation, not the expand_shape operation.
    Where is the pattern loaded isn't very important: I'm not sure why this pattern is loaded here instead of in the dimOp for example, in general when we do this is just a dialect layering question (one dialect loads canonicalization for an op in another dialect)

Finally, The pattern is implemented in transformations that refine shape information through ReifyRankedShapedTypeOpInterface

But that does not run in canonicalization, right? If you wanted to just eliminate a duplicate implementation, you could call the implementation in ReifyRankedShapedTypeOpInterface from here to delegate to it.

I'm surprised you didn't mention the use of affine.apply: are there other patterns in the Tensor dialect that introduce affine.apply (or affine dialect constructs) "out of thin air"? It seems like these are the only two I can spot.

Would we be able to completely remove the dependency from the Tensor dialect to the affine dialect if these patterns are removed, or are there other things left right now?

@MaheshRavishankar (Contributor):

These pattern are also wrong
Wrong how?
Wrong cause they are crashing.

OK, we had a vocabulary mismatch here, "wrong" confused me, thanks for clarifying what you meant.

So I think we dont even need to get into canonicalization vs. not discussion.

You want to remove something done in canonicalization to make it not part of canonicalization: so while I sympathize, I don't quite see how you can avoid the conversation.

Let me understand what you are suggesting. Basically this pattern should never have been part of canonicalization, but it was. It is an outlier w.r.t. the designated in-tree way of resolving tensor.dim operations, and you want to "fix" this pattern to do exactly what the other pattern does. I don't see how that is a reasonable outcome here. I suggest you provide concrete details of how you want to proceed. Blocking changes without providing guidance on next steps is not a very productive position to take.

@joker-eph (Collaborator) commented Apr 10, 2025

Basically this pattern should never have been part of canonicalization,

Can you elaborate on why? You're stating a fact, but as I mentioned before, while you may very well be right (I don't know, you still haven't explained), it would be helpful if you could make the reasoning behind this explicit.

. Blocking changes without providing guidance of next steps is not a very productive position to take.

Sorry if you feel that way; I believe I provided next steps quite a few times above, but you may have missed them.
For example, I wrote this, which you didn't follow up on (unless I missed it?):

Maybe an example showing the effect of this canonicalization pattern that shouldn't be a canonicalization?

@joker-eph (Collaborator) commented Apr 10, 2025

I chatted with @Groverkss and we identified the reason these can't work well as canonicalizations. It isn't visible in the collapse_shape example because that one introduces a multiplication; however, it becomes very visible for the expand_shape operation, which introduces a floor_div. The problem is that this is information loss that can't easily be recovered: the expand_shape guarantees an "exact division", which apparently can't be expressed with affine.apply. Losing information is never OK in canonicalization.
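
A small worked illustration of that information loss, reusing the factor 40 from the expand_shape test in this PR (illustrative):

```
// The expand_shape guarantees an exact division:
//   tensor.dim(%t, 1) == 40 * tensor.dim(%expanded, 2)
// The removed pattern only keeps
//   affine.apply affine_map<()[s0] -> (s0 floordiv 40)>()[%src]
// where the divisibility fact (s0 mod 40 == 0) is dropped. As a consequence,
//   affine_map<()[s0] -> ((s0 floordiv 40) * 40)>
// can no longer be simplified back to s0, so later rewrites cannot converge
// to the simpler form that the original IR expressed directly.
```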

Please update the PR description accordingly, @vivekkhandelwal1, to something like:

Remove tensor.dim canonicalization patterns registered on tensor.expand_shape/tensor.collapse_shape.

These are problematic because the iterative application that locally resolves the tensor.dim operation introduces intermediate floor_div operations, which lose the information about the exact division that was carried out in the original IR, so the iterative algorithm can't converge towards the simplest form. Information loss is not acceptable for a canonicalization.

Resolving the dimOp can be achieved through the resolve-ranked-shaped-type-result-dims and resolve-shaped-type-result-dims passes.

@vivekkhandelwal1 changed the title from "[MLIR][Tensor] Remove FoldDimOf[Expand|Collapse]Shape Pattern" to "[MLIR][Tensor] Remove tensor.dim canonicalization patterns registered on tensor.expand_shape/tensor.collapse_shape" on Apr 10, 2025.
@vivekkhandelwal1 (Contributor, Author):

I chatted with @Groverkss and we identified the reason these can't work well as canonicalization. It isn't visible on collapse_shape example because they trigger a multiplication operation, however it becomes very visible on the expand_shape operation which introduces a floor_div. The problem is that this is an information loss that can't be just recovered easily. The expand_shape guarantees an "exact division", which can't be expressed with affine_apply apparently. Losing information is obviously never OK in canonicalization.

Please update the PR description accordingly please @vivekkhandelwal1, something like:

Remove tensor.dim canonicalization patterns registered on tensor.expand_shape/tensor.collapse_shape.
These are problematic because the iterative application that locally resolves the tensor.dim operation introduces intermediate floor_div operations, which lose the information about the exact division that was carried out in the original IR, so the iterative algorithm can't converge towards the simplest form. Information loss is not acceptable for a canonicalization.
Resolving the dimOp can be achieved through the resolve-ranked-shaped-type-result-dims and resolve-shaped-type-result-dims passes.

@joker-eph I have made the changes as asked. Thanks!

@qedawkins merged commit e377a5d into llvm:main on Apr 11, 2025 (11 checks passed).
@vivekkhandelwal1 deleted the expandshape-op branch on April 12, 2025 at 04:33.
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
… on tensor.expand_shape/tensor.collapse_shape (llvm#134219)

These are problematic because the iterative application that locally resolves the tensor.dim operation introduces intermediate floor_div, which is losing the information about the exact division that was carried out in the original IR, and the iterative algorithm can't converge towards the simplest form. Information loss is not acceptable for canonicalization.

Resolving the dimOp can be achieved through resolve-ranked-shaped-type-result-dims and resolve-shaped-type-result-dims passes.

---------

Signed-off-by: Vivek Khandelwal <[email protected]>