[MLIR][Tensor] Remove tensor.dim canonicalization patterns registered on tensor.expand_shape/tensor.collapse_shape #134219
Conversation
This commit removes the FoldDimOfExpandShape and FoldDimOfCollapseShape patterns from the list of ExpandShapeOp's canonicalization patterns. These patterns were causing a crash while folding the dims of an expanded tensor. The issue can be reproduced by undoing the changes done in this commit and running the command: ``` mlir-opt --linalg-fuse-elementwise-ops repro.mlir ``` over the IR: https://gist.github.com/vivekkhandelwal1/56a1a398c21d739df77a67ce372b9366. Signed-off-by: Vivek Khandelwal <[email protected]>
@llvm/pr-subscribers-mlir-linalg @llvm/pr-subscribers-mlir
Author: Vivek Khandelwal (vivekkhandelwal1)
Full diff: https://github.com/llvm/llvm-project/pull/134219.diff — 1 file affected:
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index d589f627d896e..35265ac49fcbf 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -2158,8 +2158,7 @@ void ExpandShapeOp::getCanonicalizationPatterns(RewritePatternSet &results,
ComposeExpandOfCollapseOp<ExpandShapeOp, CollapseShapeOp>,
ConvertToStaticExpandShape, FoldReshapeWithConstant<ExpandShapeOp>,
FoldReshapeWithSplat<ExpandShapeOp>,
- FoldReshapeWithFromElements<ExpandShapeOp>, FoldDimOfExpandShape,
- FoldDimOfCollapseShape>(context);
+ FoldReshapeWithFromElements<ExpandShapeOp>>(context);
}
void CollapseShapeOp::getCanonicalizationPatterns(RewritePatternSet &results,
I'm very confused that there is no test change? That points to a previous lack of coverage...
Edit: actually the CI is failing, there are test changes and we had coverage :)
Can you add a minimal test demonstrating the issue you're fixing?
I have fixed the failing regression tests. Sorry, I forgot to do that while creating the PR.
We still need a test reproducing the crash you mentioned; right now it's not clear whether what you're doing in this PR is fundamentally the right fix or just a "big hammer" solution for the crash.
Sure, I will add a test that would crash without these changes.
(Force-pushed from 5c6b295 to 835a35c.)
Hi @joker-eph, I have added a test that will crash without the changes made in this PR.
Also, after the removal of those canonicalization patterns, the code below becomes dead code since it's not used anywhere else. Shall I remove that, too?
llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp, lines 1989 to 2071 at b61e387
Why should the canonicalization patterns be removed instead of fixing them? I don't understand the reasoning behind this PR.
Thanks! Is it really minimal? Also, can you comment on my previous question: why is removing these the right fix for the crash?
Agree, dropping the canonicalizers is not the right fix here. The crash is because this comment is outdated:
llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp, lines 2011 to 2012 at 835a35c:
// `dim` is the only dynamic dimension in `group`. (Otherwise, the
// ExpandShapeOp would be ambiguous.)
expand_shape takes a list of output sizes now, so we can simplify these canonicalizers to just forward the value in output_shape.
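To make that suggestion concrete, here is a hedged sketch (made-up function name and shapes) of what "forward the value in output_shape" would mean: the queried dynamic result dimension is already an SSA operand of the op, so no shape arithmetic is needed.

```mlir
func.func @dim_of_expand(%t: tensor<?x32xf32>, %d0: index, %d1: index) -> index {
  %c1 = arith.constant 1 : index
  // Result dimension 1 is dynamic and its size is given explicitly as %d1.
  %e = tensor.expand_shape %t [[0, 1], [2]] output_shape [%d0, %d1, 32]
      : tensor<?x32xf32> into tensor<?x?x32xf32>
  // A fixed canonicalizer could rewrite this dim to just %d1, with no affine
  // arithmetic involved.
  %dim = tensor.dim %e, %c1 : tensor<?x?x32xf32>
  return %dim : index
}
```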
Dropping these canonicalizations is the right thing. These patterns are redundant w.r.t. the ReifyRankedShapedTypeOpInterface-based resolution of tensor.dim of results in terms of the dimensions of the operands. There are two reasons why this canonicalization is bad: that resolution is already handled generically through the interface, and the rewrite has non-local effects.
Thanks @MaheshRavishankar for adding the rationale behind these changes. I request all the existing reviewers to re-review this PR, since we have been blocked on Torch-MLIR for quite some time because of this. CC: @joker-eph
Ok, if this is in line with the canonicalization of tensor.dim w.r.t. other ops, then this makes sense. This wasn't immediately clear from the initial PR description. I'll bow out of the review here.
LGTM, can you just update the PR description to include:
This PR makes the tensor.dim folders for tensor.expand_shape/tensor.collapse_shape consistent with other tensor operations. In general:
1. All tensor.dim operation resolutions have been moved to use the ReifyRankedShapedTypeOpInterface.
2. None of the other tensor.dim resolution patterns are run during canonicalizations. They have non-local effects, which make them bad for canonicalizations, and were removed from other tensor operations some time ago.
Hi, I don't have merge access. Can someone please merge this for me?
I don't see the non-local effects in the current test change; can you elaborate on this, please? Maybe an example showing the effect of this canonicalization pattern that shouldn't be a canonicalization? (I'd rather see this resolved before anyone merges this.)
Let me clarify. These patterns are the outliers; none of the other tensor ops resolve tensor.dim of their results as part of canonicalization. It would be better to not block this change, because it is removing a redundant and wrong pattern.
Wrong how?
It's a bit ambiguous what you call redundant: if you compare to "other things that are not part of canonicalization", then it's a bit dubious to claim it "redundant", as it won't be part of the canonicalization run.
This is all an opinion, but I'd rather see you elaborate your thinking to enlighten everyone, instead of just making a strong claim that you're not trying to substantiate right now (other than with very unspecific terms like "wrong" and "redundant").
Removing them may very well be the absolute right thing; however, it's important to be able to convey why this is ultimately the right thing.
Wrong because they are crashing.
Redundant because these patterns are replicating what another pattern in the code base (the ReifyRankedShapedTypeOpInterface-based resolution) already does.
It isn't a strong claim. They literally do the same thing that another pattern in the code base does (redundant) and are crashing (wrong). I don't know what else you think would be needed to substantiate that this is wrong and redundant.
I don't want to get dragged into the canonicalization vs. non-canonicalization discussion again and again. These two patterns are outliers since they are resolving tensor.dim of results as part of canonicalization, which no other tensor op does.
Ok, let's take a step back and see why this patch is a good patch. I looked through the history and code, and here is the reasoning. There are two questions we are answering here: (1) is this pattern correct, and (2) is a correct version of this pattern a good canonicalization?
The answer to Question 1 is that this pattern is wrong. It's trying to infer the output shape of the tensor, which is already provided in the operation! This pattern is from the time when expand_shape did not carry this information. The output shape was explicitly added to the operation in #90040, which also has the reasoning for why, for dynamic shapes, output shape inference is wrong. So now we have two options: either fix the pattern, or remove it. The thing is, the same pattern is implemented correctly through ReifyRankedShapedTypeOpInterface as a general pattern. So it is okay to remove the pattern, as once fixed, this pattern is a duplicate of a more general pattern.
So now, to Question 2: is the correct pattern, in whatever shape or form, a good canonicalization? For static cases, expand_shape already has a folder to move to a more canonical form.
Finally, the pattern is implemented in transformations that refine shape information through ReifyRankedShapedTypeOpInterface, so we aren't losing any old transformations here; this patch is fixing the questions I explained above.
@joker-eph @MaheshRavishankar Hope this clears it up. I think this should provide a clear answer why this pattern is being removed as a canonicalization, and also being deleted at the same time. Let me know if you need more clarification.
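For reference, a minimal sketch of the interface-based path described above (names and shapes are illustrative). Assuming reifyResultShapes for expand_shape forwards the output_shape operands, the generic dim-resolution pattern (run, for example, by the resolve-shaped-type-result-dims pass named later in this thread) is expected to rewrite %dim below in terms of %d1, without any expand_shape-specific canonicalizer:

```mlir
func.func @resolve_dim_via_interface(%t: tensor<?x16xf32>, %d0: index, %d1: index) -> index {
  %c1 = arith.constant 1 : index
  %e = tensor.expand_shape %t [[0, 1], [2]] output_shape [%d0, %d1, 16]
      : tensor<?x16xf32> into tensor<?x?x16xf32>
  // The generic pattern queries the producer's ReifyRankedShapedTypeOpInterface
  // implementation instead of a canonicalization registered on expand_shape.
  %dim = tensor.dim %e, %c1 : tensor<?x?x16xf32>
  return %dim : index
}
```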
// CHECK: #[[$MAP:.*]] = affine_map<()[s0] -> (s0 * 64)>
// CHECK-LABEL: @compose_expand_of_collapse_last_two_dims
// CHECK-SAME: %[[ARG0:.+]]: tensor<?x64x1xf32>
// CHECK: %[[CONSTANT0:.+]] = arith.constant 0 : index
// CHECK: %[[CONSTANT384:.+]] = arith.constant 384 : index
// CHECK: %[[CONSTANT0:.+]] = arith.constant 0 : index
// CHECK: %[[COLLAPSE:.+]] = tensor.collapse_shape %[[ARG0]] {{\[}}[0, 1, 2]] : tensor<?x64x1xf32> into tensor<?xf32>
// CHECK: %[[DIM:.+]] = tensor.dim %[[ARG0]], %[[CONSTANT0]] : tensor<?x64x1xf32>
// CHECK: %[[AFFAPPLY:.+]] = affine.apply #[[$MAP]]()[%[[DIM]]]
// CHECK: %[[DIVUI:.+]] = arith.divui %[[AFFAPPLY]], %[[CONSTANT384]] : index
// CHECK: %[[DIM:.+]] = tensor.dim %[[COLLAPSE]], %[[CONSTANT0]] : tensor<?xf32>
// CHECK: %[[DIVUI:.+]] = arith.divui %[[DIM]], %[[CONSTANT384]] : index
@joker-eph For an example of non-local effects: the previous canonicalization pattern for collapse_shape was not moving collapse_shape to a more canonical form. It's actually not touching the collapse_shape at all, and is doing things around it.
It's very hard to say what the canonical form is here, i.e., where the tensor.dim should be. There is no reason for the tensor.dim, when propagated up, to be more canonical. It can possibly introduce 2 tensor.dim operations, if there were 2 dynamic dimensions before the collapse shape. If there is a cost to a tensor.dim operation, you now pay the cost twice. We cannot really say if one form is more canonical than the other.
In the current example, simplified, we have:
func.func @dim_expand_shape(%arg0: tensor<?x64x1xf32>) -> index {
%c0 = arith.constant 0 : index
%collapsed = tensor.collapse_shape %arg0 [[0, 1, 2]] : tensor<?x64x1xf32> into tensor<?xf32>
%dim = tensor.dim %collapsed, %c0 : tensor<?xf32>
return %dim : index
}
If I want to know something about the dim, it's just the first dimension of the collapsed shape, and I now have to reason about it.
After canonicalization we have:
#map = affine_map<()[s0] -> (s0 * 64)>
module {
func.func @dim_expand_shape(%arg0: tensor<?x64x1xf32>) -> index {
%c0 = arith.constant 0 : index
%dim = tensor.dim %arg0, %c0 : tensor<?x64x1xf32>
%0 = affine.apply #map()[%dim]
return %0 : index
}
}
The dimension computation is made pretty explicit in terms of the input (the calculation is explicit), and the tensor.collapse_shape even disappears.
> There is no reason for the tensor.dim, when propagated up, to be more canonical.
It's not always obvious what the "best" canonical form should be, but that does not mean we can't pick one: often it comes down to picking one convention and sticking to it for consistency (I don't know if this is the right convention here, just saying).
> It can possibly introduce 2 tensor.dim operations, if there were 2 dynamic dimensions before the collapse shape.
Here is the example you describe, I believe:
func.func @dim_expand_shape(%arg0: tensor<?x?x64x1xf32>) -> index {
%c0 = arith.constant 0 : index
%collapsed = tensor.collapse_shape %arg0 [[0, 1, 2, 3]] : tensor<?x?x64x1xf32> into tensor<?xf32>
%dim = tensor.dim %collapsed, %c0 : tensor<?xf32>
return %dim : index
}
Becomes:
#map = affine_map<()[s0, s1] -> ((s0 * s1) * 64)>
module {
func.func @dim_expand_shape(%arg0: tensor<?x?x64x1xf32>) -> index {
%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index
%dim = tensor.dim %arg0, %c0 : tensor<?x?x64x1xf32>
%dim_0 = tensor.dim %arg0, %c1 : tensor<?x?x64x1xf32>
%0 = affine.apply #map()[%dim, %dim_0]
return %0 : index
}
}
As mentioned elsewhere, the cost of "dim" does not seem meaningful to me here. The question is rather whether this IR is more amenable to reasoning on the values, or to establishing equivalence between values in further analysis. Any argument along this line of reasoning can be very valuable for making a statement about whether it clearly belongs / does not belong to canonicalization.
// CHECK: #[[$map:.*]] = affine_map<()[s0] -> (s0 floordiv 40)>
// CHECK-LABEL: func @dim_of_expand_shape(
// CHECK-SAME: %[[t:.*]]: tensor<?x?xf32>
// CHECK: %[[c1:.*]] = arith.constant 1 : index
// CHECK: %[[dim:.*]] = tensor.dim %[[t]], %[[c1]] : tensor<?x?xf32>
// CHECK: %[[apply:.*]] = affine.apply #[[$map]]()[%[[dim]]]
// CHECK: return %[[apply]]
// CHECK: %[[c2:.*]] = arith.constant 2 : index
// CHECK: %[[expanded:.*]] = tensor.expand_shape %[[t]] {{\[\[}}0], [1, 2, 3, 4, 5]] output_shape [%arg1, 1, %arg2, 5, 1, 8] : tensor<?x?xf32> into tensor<?x1x?x5x1x8xf32>
// CHECK: %[[dim:.*]] = tensor.dim %[[expanded]], %[[c2]] : tensor<?x1x?x5x1x8xf32>
// CHECK: return %[[dim]]
@joker-eph For an example of the old pattern being wrong, notice that we already know the output shape. The result should just be %arg2 here! But instead it was doing some magic with affine.apply, which is wrong.
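A simplified, stand-alone version of the test above (argument names mirror the CHECK lines; the function body is illustrative): result dimension 2 is already provided as the output_shape operand %arg2, so the ideal rewrite is simply `return %arg2`, whereas the removed pattern re-derived it as a tensor.dim of the source followed by an affine.apply with `floordiv 40`.

```mlir
func.func @dim_of_expand_shape(%t: tensor<?x?xf32>, %arg1: index, %arg2: index) -> index {
  %c2 = arith.constant 2 : index
  %expanded = tensor.expand_shape %t [[0], [1, 2, 3, 4, 5]]
      output_shape [%arg1, 1, %arg2, 5, 1, 8]
      : tensor<?x?xf32> into tensor<?x1x?x5x1x8xf32>
  // Result dimension 2 is dynamic; its size is the operand %arg2.
  %dim = tensor.dim %expanded, %c2 : tensor<?x1x?x5x1x8xf32>
  return %dim : index
}
```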
We could implement a folder for this for tensor.dim, but that doesn't change the fact that this pattern is wrong, and doesn't even belong as a canonicalization to expand_shape.
> We could implement a folder for this for tensor.dim, but that doesn't change the fact that this pattern is wrong, and doesn't even belong as a canonicalization to expand_shape.
I left some inline comments on the test changes showcasing examples to back my reasoning.
OK, we had a vocabulary mismatch here; "wrong" confused me. Thanks for clarifying what you meant.
You want to remove something done in canonicalization to make it not part of canonicalization: so while I sympathize, I don't quite see how you can avoid the conversation.
Thanks for elaborating @Groverkss!
I see two things here:
But that does not run in canonicalization, right? If you wanted to just eliminate a duplicate implementation, you could call the implementation in ReifyRankedShapedTypeOpInterface from here to delegate it.
I'm surprised you didn't mention the use of affine here: would we be able to completely remove the dependency from the Tensor dialect to the affine dialect if these patterns are removed, or are there other things left right now?
Let me understand what you are suggesting. Basically, this pattern should never have been part of canonicalization, but it was. It is an outlier w.r.t. the designated in-tree way of resolving tensor.dim ops.
Can you elaborate on why? You're stating a fact, but as I mentioned before, while you may very well be right (I don't know, you still haven't explained), it would be helpful if you could make the reasoning behind this explicit.
Sorry if you feel that way; I believe I provided next steps quite a few times above, you just seem to have missed them, maybe?
I chatted with @Groverkss and we identified the reason these can't work well as canonicalization. It isn't visible in the collapse_shape example because it triggers a multiplication operation; however, it becomes very visible with the expand_shape operation, which introduces a floordiv.
Please update the PR description accordingly, @vivekkhandelwal1; something along the lines of the final description below.
@joker-eph I have made the changes as asked. Thanks!
… on tensor.expand_shape/tensor.collapse_shape (llvm#134219)
These are problematic because the iterative application that locally resolves the tensor.dim operation introduces an intermediate floor_div, which loses the information about the exact division that was carried out in the original IR, so the iterative algorithm can't converge towards the simplest form. Information loss is not acceptable for canonicalization.
Resolving the dim op can be achieved through the resolve-ranked-shaped-type-result-dims and resolve-shaped-type-result-dims passes.
Signed-off-by: Vivek Khandelwal <[email protected]>
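As an illustration of the information loss described in the final description, consider this hedged, made-up round-trip example (ignoring, for the sake of the argument, other patterns that might fold the expand/collapse pair itself):

```mlir
func.func @dim_roundtrip(%t: tensor<?xf32>, %d0: index) -> index {
  %c0 = arith.constant 0 : index
  %e = tensor.expand_shape %t [[0, 1]] output_shape [%d0, 8]
      : tensor<?xf32> into tensor<?x8xf32>
  %c = tensor.collapse_shape %e [[0, 1]] : tensor<?x8xf32> into tensor<?xf32>
  // %dim is exactly tensor.dim %t, %c0, since the collapse undoes the expand.
  %dim = tensor.dim %c, %c0 : tensor<?xf32>
  return %dim : index
}
```

Resolving the dims locally and iteratively rewrites the dim of the collapse_shape as (tensor.dim %e, %c0) * 8 and then the dim of the expand_shape as (tensor.dim %t, %c0) floordiv 8, giving ((tensor.dim %t, %c0) floordiv 8) * 8. Since the floordiv does not record that the division was exact, affine simplification cannot fold this back to tensor.dim %t, %c0, so canonicalization cannot converge to the simplest form.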