[mlir][docs] Update documentation for canonicalize. #99753


Merged 1 commit into llvm:main on Jul 22, 2024

Conversation

cxy-1993
Contributor

Update canonicalize docs.

@llvmbot
Member

llvmbot commented Jul 20, 2024

@llvm/pr-subscribers-mlir

Author: donald chen (cxy-1993)

Changes

Update canonicalize docs.


Full diff: https://github.com/llvm/llvm-project/pull/99753.diff

1 File Affected:

  • (modified) mlir/docs/Canonicalization.md (+60-1)
diff --git a/mlir/docs/Canonicalization.md b/mlir/docs/Canonicalization.md
index d1cba572af212..f4c4edcefb115 100644
--- a/mlir/docs/Canonicalization.md
+++ b/mlir/docs/Canonicalization.md
@@ -33,6 +33,10 @@ together.
 
 Some important things to think about w.r.t. canonicalization patterns:
 
+*   The goal of canonicalization is to make subsequent analyses and
+    optimizations more effective. Therefore, a canonicalization pattern does
+    not need to improve performance by itself.
+
 *   Pass pipelines should not rely on the canonicalizer pass for correctness.
     They should work correctly with all instances of the canonicalization pass
     removed.
@@ -51,6 +55,61 @@ Some important things to think about w.r.t. canonicalization patterns:
 *   It is always good to eliminate operations entirely when possible, e.g. by
     folding known identities (like "x + 0 = x").
 
+*   Canonicalization isn't a great place to put patterns with expensive running
+    time (i.e. those with O(n) complexity) or complicated cost models.
+
+*   Canonicalization shouldn't lose the semantics of the original operation:
+    the original information should always be recoverable from the transformed
+    IR.
+
+For example, a pattern that transforms
+
+```
+  %0 = tensor.insert_slice %slice into
+     %x[0, 0, 0, 0, 0][1, 1, 1, 16, 32][1, 1, 1, 1, 1] :
+     tensor<16x32xf32> into tensor<1x1x1x16x32xf32>
+```
+
+to
+
+```
+  %0 = tensor.expand_shape %slice[[0,1,2,3], [4]] :
+           tensor<16x32xf32> into tensor<1x1x1x16x32xf32>
+```
+
+is not a good canonicalization pattern because it loses the destination-style
+semantics.
+
+
+A pattern that transforms (where the linalg.transpose is the only use of
+%broadcast)
+
+```
+  %broadcast = linalg.broadcast
+      ins(%input : tensor<2x4x5xf32>)
+      outs(%init1 : tensor<1x2x3x4x5x6xf32>)
+      dimensions = [0, 2, 5]
+  %transpose = linalg.transpose
+      ins(%broadcast : tensor<1x2x3x4x5x6xf32>)
+      outs(%init2 : tensor<1x6x2x3x5x4xf32>)
+      permutation = [0, 5, 1, 2, 4, 3]
+```
+
+to
+
+```
+  %transpose = linalg.transpose
+      ins(%input : tensor<2x4x5xf32>)
+      outs(%tmp_init : tensor<2x5x4xf32>)
+      permutation = [0, 2, 1]
+  %broadcast = linalg.broadcast
+      ins(%transpose : tensor<2x5x4xf32>)
+      outs(%init2 : tensor<1x6x2x3x5x4xf32>)
+      dimensions = [0, 3, 1]
+```
+
+is a good canonicalization pattern because it always transforms the program
+towards reducing the amount of computed data while preserving the semantics of
+the original operations.
+
 ## Globally Applied Rules
 
 These transformations are applied to all levels of IR:
@@ -189,7 +248,7 @@ each of the operands, returning the corresponding constant attribute. These
 operands are those that implement the `ConstantLike` trait. If any of the
 operands are non-constant, a null `Attribute` value is provided instead. For
 example, if MyOp provides three operands [`a`, `b`, `c`], but only `b` is
-constant then `adaptor` will return Attribute() for `getA()` and `getC()`, 
+constant then `adaptor` will return Attribute() for `getA()` and `getC()`,
 and b-value for `getB()`.
 
 Also above, is the use of `OpFoldResult`. This class represents the possible

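The "x + 0 = x" identity folding the diff mentions can be illustrated with a toy rewrite outside MLIR. This is a hypothetical sketch over nested tuples, not MLIR's actual `Operation::fold`/`OpFoldResult` mechanism:

```python
# Toy expression folding: eliminate "x + 0" entirely, as the docs recommend.
# Expressions are nested tuples like ("add", lhs, rhs); leaves are names or ints.
# Hypothetical illustration only; MLIR's real hook is Operation::fold.

def fold_add_zero(expr):
    """Recursively fold ("add", e, 0) and ("add", 0, e) down to e."""
    if isinstance(expr, tuple) and expr[0] == "add":
        lhs = fold_add_zero(expr[1])
        rhs = fold_add_zero(expr[2])
        if rhs == 0:
            return lhs
        if lhs == 0:
            return rhs
        return ("add", lhs, rhs)
    return expr

print(fold_add_zero(("add", "x", 0)))              # x
print(fold_add_zero(("add", 0, ("add", "x", 0))))  # x
```

Note that the fold removes the operation entirely rather than replacing it with another operation, which is the strongest form of canonicalization the guidelines describe.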
@cxy-1993
Contributor Author

This PR is resubmitted from #97063 with the comments modified.

> is not a good canonicalize pattern because it lose the destination style
> semantic.
Collaborator

I don't know what "lose the destination style semantic" means, nor why it would be a bad canonicalization.

I think we should stick to very obvious examples here.

Member

This example was probably chosen because it could trip up the bufferization analysis (and cause less efficient codegen). Destination-style was originally introduced to help with bufferization and I think there is currently no consensus on whether such "hints"/IR structures should be preserved by canonicalization.

The reader of this section may not know about destination style. Maybe a better example: rewriting a multiplication into a sequence of additions?

> A pattern that transform (linalg.transpose is only use of %broadcast)
Collaborator

> (linalg.transpose is only use of %broadcast)

This seems to indicate that the canonicalization needs to take the number of users into account; however, this does not seem like a generally good guideline.

Again, I don't think this example is obvious enough to be in the general canonicalization documentation.

Member

This pattern would work even if %broadcast had multiple users, by creating a new broadcast instead of modifying the existing one in place. If the old broadcast actually had only a single user, it would then be DCE'd away. If not, we now have two broadcasts (which may not be a bad thing).

Swapping patterns/bubble-up patterns/bubble-down patterns are a common class of patterns. It would be nice to clarify whether they are suitable for canonicalization.
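The "create a new op instead of modifying in place" point can be sketched with a toy op representation. Everything here is hypothetical (ops are `(name, kind, operand)` tuples, not MLIR's API): the swap pattern builds a new transpose feeding a broadcast, leaves the matched broadcast untouched, and a trivial DCE pass removes it only when nothing else uses it:

```python
# Toy IR: each op is (result_name, kind, operand_name). Hypothetical sketch.

def swap_transpose_of_broadcast(ops, transpose_name):
    """Rewrite transpose(broadcast(x)) as broadcast(transpose(x)),
    creating new ops rather than mutating the matched ones."""
    defs = {name: (kind, operand) for name, kind, operand in ops}
    kind, operand = defs[transpose_name]
    if kind != "transpose" or defs[operand][0] != "broadcast":
        return ops  # pattern does not match
    source = defs[operand][1]  # the broadcast's input
    new_transpose = (transpose_name + "_new", "transpose", source)
    # Redefine the transpose's result as a broadcast of the new transpose;
    # the old broadcast op is deliberately left untouched.
    rewritten = [(transpose_name, "broadcast", new_transpose[0])
                 if name == transpose_name else (name, k, o)
                 for name, k, o in ops]
    return rewritten + [new_transpose]

def dce(ops, roots):
    """Drop ops whose results are never used, iterating to a fixed point."""
    while True:
        used = set(roots) | {operand for _, _, operand in ops}
        kept = [op for op in ops if op[0] in used]
        if len(kept) == len(ops):
            return kept
        ops = kept

# Single user: the old broadcast is dead after the swap and gets removed.
ir = [("b", "broadcast", "x"), ("t", "transpose", "b")]
ir = dce(swap_transpose_of_broadcast(ir, "t"), roots=["t"])
print(sorted(name for name, _, _ in ir))  # ['t', 't_new']

# Multiple users: the old broadcast is still used, so it survives.
ir2 = [("b", "broadcast", "x"), ("t", "transpose", "b"), ("u", "negate", "b")]
ir2 = dce(swap_transpose_of_broadcast(ir2, "t"), roots=["t", "u"])
print(sorted(name for name, _, _ in ir2))  # ['b', 't', 't_new', 'u']
```

This mirrors the reviewer's observation: the pattern itself never inspects the number of users; use counts only matter to the separate DCE step.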

Comment on lines 58 to 59:

> * Canonicalization isn't a great place to put pattens with expensive running
>   time (i.e. have O(n) complexity) or complicated cost models.
Collaborator

Suggested change:

- * Canonicalization isn't a great place to put pattens with expensive running
-   time (i.e. have O(n) complexity) or complicated cost models.
+ * Patterns with expensive running time (i.e. O(n) complexity) or complicated
+   cost models don't belong in canonicalization: since the algorithm is
+   executed iteratively until a fixed point, we want patterns that execute
+   quickly (in particular their matching phase).
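The fixed-point iteration this suggestion refers to can be sketched in a few lines. This is a hypothetical toy driver, not MLIR's actual `applyPatternsAndFoldGreedily`; it shows why every pattern's matching cost is paid repeatedly:

```python
# Minimal sketch of a greedy fixed-point rewrite driver (hypothetical).
# Every pattern is re-tried on every iteration until nothing changes, so a
# pattern with an expensive match makes the whole pass expensive.

def run_to_fixed_point(ops, patterns, max_iterations=1000):
    for _ in range(max_iterations):
        changed = False
        for pattern in patterns:
            rewritten = pattern(ops)
            if rewritten != ops:
                ops, changed = rewritten, True
        if not changed:
            return ops  # fixed point reached: no pattern applies anymore
    raise RuntimeError("pattern set did not converge")

def cancel_double_negate(ops):
    """Remove one adjacent ('negate', 'negate') pair per application."""
    for i in range(len(ops) - 1):
        if ops[i] == "negate" and ops[i + 1] == "negate":
            return ops[:i] + ops[i + 2:]
    return ops

print(run_to_fixed_point(["negate"] * 5 + ["add"], [cancel_double_negate]))
# ['negate', 'add']
```

Note that non-converging pattern sets (e.g. two patterns that undo each other) force the driver to hit its iteration cap, which is another reason canonicalization patterns must strictly move toward a normal form.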

@cxy-1993 force-pushed the update-canonicalize-doc branch from bc3fbd0 to 28926a3 on July 21, 2024
@cxy-1993
Contributor Author

Initially, I wanted to use some less clear examples to express our preferences. After all, it is natural to use patterns like x+0->x for canonicalization without the need for further emphasis. However, I also respect your opinion that such patterns fall within our gray area and should not be used as examples. I have made changes based on your suggestions, please review them again. @joker-eph @matthias-springer

@cxy-1993 merged commit 324fea9 into llvm:main on Jul 22, 2024
7 checks passed
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
4 participants