[mlir][scf] Add simple LICM pattern for scf.while #76370

Closed
wants to merge 1 commit

Conversation

Hardcode84
Contributor

Move non-side-effecting ops from the `before` block if all their args are defined outside the loop.
Doesn't visit nested regions.
This cleanup is needed for the `scf.while` -> `scf.for` uplifting #76108, as it expects the `before` block to consist of a single `cmp` op.

@llvmbot
Member

llvmbot commented Dec 25, 2023

@llvm/pr-subscribers-mlir-scf

@llvm/pr-subscribers-mlir

Author: Ivan Butygin (Hardcode84)

Changes

Move non-side-effecting ops from the `before` block if all their args are defined outside the loop.
Doesn't visit nested regions.
This cleanup is needed for the `scf.while` -> `scf.for` uplifting #76108, as it expects the `before` block to consist of a single `cmp` op.


Full diff: https://github.com/llvm/llvm-project/pull/76370.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/SCF/IR/SCF.cpp (+31-1)
  • (modified) mlir/test/Dialect/SCF/canonicalize.mlir (+26-3)
diff --git a/mlir/lib/Dialect/SCF/IR/SCF.cpp b/mlir/lib/Dialect/SCF/IR/SCF.cpp
index 5570c2ec688c8a..1f5c420e4775fa 100644
--- a/mlir/lib/Dialect/SCF/IR/SCF.cpp
+++ b/mlir/lib/Dialect/SCF/IR/SCF.cpp
@@ -3810,6 +3810,36 @@ struct WhileRemoveUnusedArgs : public OpRewritePattern<WhileOp> {
   }
 };
 
+/// Simple Loop Invariant Code Motion pattern for `scf.while` op.
+/// `scf.while` to `scf.for` uplifting expects `before` block consisting of
+/// single `cmp` op.
+/// Pattern moves ops from `before` block, doesn't visit nested regions.
+struct SCFWhileLICM : public OpRewritePattern<WhileOp> {
+  using OpRewritePattern::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(WhileOp loop,
+                                PatternRewriter &rewriter) const override {
+    bool changed = false;
+
+    DominanceInfo dom;
+    Block *body = loop.getBeforeBody();
+    for (Operation &op :
+         llvm::make_early_inc_range(body->without_terminator())) {
+      if (llvm::any_of(op.getOperands(), [&](Value arg) {
+            return !dom.properlyDominates(arg, loop);
+          }))
+        continue;
+
+      if (!isMemoryEffectFree(&op))
+        continue;
+
+      rewriter.updateRootInPlace(&op, [&]() { op.moveBefore(loop); });
+      changed = true;
+    }
+    return success(changed);
+  }
+};
+
 /// Remove duplicated ConditionOp args.
 struct WhileRemoveDuplicatedResults : public OpRewritePattern<WhileOp> {
   using OpRewritePattern::OpRewritePattern;
@@ -3879,7 +3909,7 @@ void WhileOp::getCanonicalizationPatterns(RewritePatternSet &results,
   results.add<RemoveLoopInvariantArgsFromBeforeBlock,
               RemoveLoopInvariantValueYielded, WhileConditionTruth,
               WhileCmpCond, WhileUnusedResult, WhileRemoveDuplicatedResults,
-              WhileRemoveUnusedArgs>(context);
+              SCFWhileLICM, WhileRemoveUnusedArgs>(context);
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/SCF/canonicalize.mlir b/mlir/test/Dialect/SCF/canonicalize.mlir
index 52e0fdfa36d6cd..f8f4d737b381df 100644
--- a/mlir/test/Dialect/SCF/canonicalize.mlir
+++ b/mlir/test/Dialect/SCF/canonicalize.mlir
@@ -1022,10 +1022,10 @@ func.func @while_loop_invariant_argument_different_order(%arg : tensor<i32>) ->
 // CHECK-SAME: (%[[ARG:.+]]: tensor<i32>)
 // CHECK:    %[[ZERO:.*]] = arith.constant dense<0>
 // CHECK:    %[[ONE:.*]] = arith.constant dense<1>
+// CHECK:    %[[COND:.*]] = arith.cmpi sgt, %[[ARG]], %[[ZERO]]
+// CHECK:    %[[COND1:.*]] = tensor.extract %[[COND]][]
 // CHECK:    %[[WHILE:.*]]:2 = scf.while (%[[ARG1:.*]] = %[[ONE]], %[[ARG4:.*]] = %[[ZERO]])
-// CHECK:       arith.cmpi sgt, %[[ARG]], %[[ZERO]]
-// CHECK:       tensor.extract %{{.*}}[]
-// CHECK:       scf.condition(%{{.*}}) %[[ARG1]], %[[ARG4]]
+// CHECK:       scf.condition(%[[COND1]]) %[[ARG1]], %[[ARG4]]
 // CHECK:    } do {
 // CHECK:     ^{{.*}}(%{{.*}}: tensor<i32>, %{{.*}}: tensor<i32>):
 // CHECK:       scf.yield %[[ZERO]], %[[ONE]]
@@ -1144,6 +1144,29 @@ func.func @while_duplicated_res() -> (i32, i32) {
 // CHECK:         }
 // CHECK:         return %[[RES]], %[[RES]] : i32, i32
 
+// -----
+
+func.func @while_licm(%arg1: i32, %arg2: i32, %arg3: i32) {
+  scf.while () : () -> () {
+    %val0 = arith.addi %arg1, %arg2 : i32
+    %val = arith.addi %val0, %arg3 : i32
+    %condition = "test.condition"(%val) : (i32) -> i1
+    scf.condition(%condition)
+  } do {
+  ^bb0():
+    "test.test"() : () -> ()
+    scf.yield
+  }
+  return
+}
+// CHECK-LABEL: @while_licm
+//  CHECK-SAME:  (%[[ARG1:.*]]: i32, %[[ARG2:.*]]: i32, %[[ARG3:.*]]: i32)
+//       CHECK:  %[[VAL0:.*]] = arith.addi %[[ARG1]], %[[ARG2]] : i32
+//       CHECK:  %[[VAL1:.*]] = arith.addi %[[VAL0]], %[[ARG3]] : i32
+//       CHECK:  scf.while
+//  CHECK-NEXT:  %[[COND:.*]] = "test.condition"(%[[VAL1]]) : (i32) -> i1
+//  CHECK-NEXT:  scf.condition(%[[COND]])
+
 
 // -----
 

Member

@matthias-springer left a comment


We already have a LICM pass that hoists ops from loops. (I recently extended it to support scf.while.) This change duplicates a part of the LICM pass in a canonicalization pattern.

Your scf.while uplifting pattern is probably just one example of a pattern that is enabled by LICM, so I'd be in favor of handling this in a more generic way: we could apply LICM as part of every greedy pattern rewrite, before or after each greedy pattern rewriter iteration (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp#L763). We already perform various IR optimizations in there: region simplification and constant de-duplication. In practice, the latter is often a form of LICM (when CSE'ing a constant outside of a loop with a constant inside of a loop), so there is some precedent.
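
For illustration, here is an editor's sketch (not code from this PR or from the discussion) of what "running LICM between greedy iterations" could look like, reusing the existing upstream helper moveLoopInvariantCode from mlir/Transforms/LoopInvariantCodeMotionUtils.h, which the standalone LICM pass drives today; the helper function below and its wiring into the driver are assumptions.

#include "mlir/Interfaces/LoopLikeInterface.h"
#include "mlir/Transforms/LoopInvariantCodeMotionUtils.h"

using namespace mlir;

// Hypothetical hook the greedy driver could invoke between iterations:
// hoist loop-invariant, memory-effect-free ops out of every loop-like op.
static bool hoistLoopInvariants(Operation *root) {
  bool changed = false;
  // The default post-order walk visits innermost loops first, so invariants
  // can bubble outwards across nested loops in a single traversal.
  root->walk([&](LoopLikeOpInterface loopLike) {
    changed |= moveLoopInvariantCode(loopLike) > 0;
  });
  return changed;
}

Such a call would sit next to the region simplification and constant de-duplication steps mentioned above.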

@ThomasRaoux
Contributor

I don't think LICM should belong to canonicalization. LICM is not always good for code quality as it may increase register pressure so applying it blindly during canonicalization seems dangerous.
Why not keep it separate, as is currently done (as pointed out by @matthias-springer)?

@Hardcode84
Contributor Author

Note: I'm only doing LICM from the `before` block, which is always executed at least once

@Hardcode84
Contributor Author

I don't think LICM should belong to canonicalization. LICM is not always good for code quality as it may increase register pressure so applying it blindly during canonicalization seems dangerous.

This is true for general loops, but for the `before` block it should be OK IMO, as it is always executed at least once.

@joker-eph
Collaborator

I don't think LICM should belong to canonicalization. LICM is not always good for code quality as it may increase register pressure so applying it blindly during canonicalization seems dangerous.

I don't see a contradiction between what you're saying and the canonicalization.

The process of canonicalization isn't to make the code faster; it is to make two equivalent input programs converge to the same IR.

It is expected that a cost model and target-specific lowering have to do some scheduling, and likely undo some potential canonicalizations.

@Hardcode84
Contributor Author

After some thought, I actually like the idea of making general LICM part of greedyRewriter/canonicalizer.
The plan will be:

  • Add a doLICM option to GreedyRewriteConfig (false by default)
  • Have it enabled in the canonicalizer pass (we can add a pass option to control it, if we really want it); a rough sketch of what this could look like follows below
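
A rough sketch of that knob (editor's illustration; GreedyRewriteConfig and applyPatternsAndFoldGreedily exist upstream, but the doLICM field and the enableLICM pass option below are hypothetical):

// Hypothetical addition to GreedyRewriteConfig:
struct GreedyRewriteConfig {
  // ... existing knobs (maxIterations, useTopDownTraversal, ...) unchanged ...

  /// If set, hoist loop-invariant, side-effect-free ops out of
  /// LoopLikeOpInterface ops after each greedy rewrite iteration.
  bool doLICM = false; // off by default, as proposed above
};

// The canonicalizer pass could then forward a pass option into the config:
//   GreedyRewriteConfig config;
//   config.doLICM = enableLICM; // hypothetical pass option
//   (void)applyPatternsAndFoldGreedily(getOperation(), std::move(patterns), config);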

@ThomasRaoux
Contributor

I don't think LICM should belong to canonicalization. LICM is not always good for code quality as it may increase register pressure so applying it blindly during canonicalization seems dangerous.

I don't see a contradiction between what you're saying and the canonicalization.

The process of canonicalization isn't to make the code faster; it is to make two equivalent input programs converge to the same IR.

By that definition many optimizations can be considered canonicalizations, yet I don't think we would want to do them as part of canonicalization. There are precedents where we decided not to do some transformations as canonicalizations because they are too opinionated. I think this is one of those cases.

It is expected that a cost model and target-specific lowering have to do some scheduling, and likely undo some potential canonicalizations.

My concern is that it won't be possible to consider the initial IR to decide whether some IR should be inside or outside the loop. It seems common for compilers to have some kind of cost model for LICM (or other code motion) and this will prevent that for all users.

@ThomasRaoux
Contributor

After some thought, I actually like the idea of making general LICM part of greedyRewriter/canonicalizer. The plan will be:

  • Add a doLICM option to GreedyRewriteConfig (false by default)
  • Have it enabled in the canonicalizer pass (we can add a pass option to control it, if we really want it)

What is the advantage of doing that as part of the rewriter compared to including the LICM pattern explicitly when needed? LICM is kind of SCF-specific in this case; is the idea to add a LICM interface that dialects can register?

@matthias-springer
Member

I’ve seen cases where it was beneficial to interleave CSE with pattern applications until a fixed point is reached. CSE and patterns enabled each other and had to be run multiple times in a loop. (Whether that is a good design or not is a different question.) I had similar cases in mind with LICM and pattern applications.

Contributor

@Mogball left a comment


This shouldn't be a canonicalization pattern for a variety of reasons. Some others have already pointed out that this is not an "always good" optimization and should not be enabled by default.

An additional problem is that any canonicalization pattern that walks the body of an operation risks O(N^2) blowup in the pattern rewriter: each time an op is modified in the body of the while, the pattern is triggered again, causing another walk even if nothing changes.

@joker-eph
Collaborator

By that definition many optimizations can be considered canonicalizations, yet I don't think we would want to do them as part of canonicalization.

Like what?

There are precedents where we decided not to do some transformations as canonicalizations because they are too opinionated. I think this is one of those cases.

Please elaborate: I'm on the opposite side and I don't quite perceive the reason here.
Not doing a canonicalization should be exceptional and very well motivated IMO. I'm wary of us becoming afraid of adding more canonicalizations.

My concern is that it won't be possible to consider the initial IR to decide whether some IR should be inside or outside the loop. It seems common for compilers to have some kind of cost model for LICM (or other code motion) and this will prevent that for all users.

I don't understand what you're trying to say here. What does "initial IR" mean?

To me the input program to the compiler is "arbitrary"; I don't put any value on the "initial IR" in terms of being more "optimal". Doing so would break composability and goes entirely against the concept of canonicalization as I understand it.

@joker-eph
Collaborator

This shouldn't be a canonicalization pattern for a variety of reasons. Some others have already pointed out that this is not an "always good" optimization and should not be enabled by default.

This is not the definition of a canonicalization: I strongly oppose this argument right now.
I argue that if you don't want this kind of transformation, then you don't want canonicalization on principle.

An additional problem is that any canonicalization pattern that walks the body of an operation risks O(N^2) blowup in the pattern rewriter

This is a better argument to me, to be studied in detail though.

@zero9178
Member

zero9178 commented Jan 6, 2024

+1 to what mehdi said.

Canonical to me doesn't have to mean more optimised, but rather that the IR is in a shape more useful for future transformations. LICM is a transformation helping in that regard, as it makes loops more canonical (by getting closer to a loop only containing loop-variant operations).
The fact that it increases register pressure is a backend detail. Since the input program may already contain a loop in "post-LICM" form, you'll want a transformation sinking an operation into the loop either way (regardless of whether LICM is part of the pass pipeline or not).

That said, if it has quadratic complexity it should probably not be enabled in the canonicalize pass. I see canonicalize being used analogously to something like InstSimplify in LLVM and applied very often to ensure canonical IR throughout the pass pipeline.

@ThomasRaoux
Contributor

By that definition many optimizations can be considered canonicalizations, yet I don't think we would want to do them as part of canonicalization.

Like what?

There are precedents where we decided not to do some transformations as canonicalizations because they are too opinionated. I think this is one of those cases.

Please elaborate: I'm on the opposite side and I don't quite perceive the reason here. Not doing a canonicalization should be exceptional and very well motivated IMO. I'm wary of us becoming afraid of adding more canonicalizations.

Here are a few examples of transformations that were moved out of canonicalization over time because having more control over when transformations are applied was necessary:
01055ed
da8a8e9
4142932

Those are just some examples from a quick search, but I recall there were more.

My concern is that it won't be possible to consider the initial IR to decide whether some IR should be inside or outside the loop. It seems common for compilers to have some kind of cost model for LICM (or other code motion) and this will prevent that for all users.

I don't understand what you're trying to say here. What does "initial IR" mean?

To me the input program to the compiler is "arbitrary"; I don't put any value on the "initial IR" in terms of being more "optimal". Doing so would break composability and goes entirely against the concept of canonicalization as I understand it.

I understand the reasoning, but in practice it is common for compilers to rely on the original program as a fallback when a transformation is not obviously a win.

I don't understand where the line is with regard to canonicalization if we go that route. What kind of transformation would not be a canonicalization then? Would function inlining be a canonicalization, for instance? And if not, why?

My interpretation (admittedly not based on docs but inferred from use cases) was that canonicalizations were a set of always-good simplifications that made it simpler for other transformations in the rest of the compiler to decide which form of IR to expect.

My concern is that this could significantly reduce user control over what transformations are applied, as canonicalization has to be applied all or nothing and those patterns are usually not exposed otherwise, preventing any alternative solution for users.

@rengolin
Member

rengolin commented Jan 6, 2024

My interpretation (admittedly not based on docs but inferred from use cases) was that canonicalizations were a set of always-good simplifications that made it simpler for other transformations in the rest of the compiler to decide which form of IR to expect.

Not just "always good", necessary. Without canonicalization, passes down the pipeline cannot assume anything and the complexity of pattern matching expressions can go combinatorial.

My concern is that this could significantly reduce user control over what transformations are applied, as canonicalization has to be applied all or nothing and those patterns are usually not exposed otherwise, preventing any alternative solution for users.

I think people are mixing two different things:

  • Canonicalization between "independent stages" of the pipeline, where one or more passes work in unison. For example, one-shot bufferization and follow-up cleanups, tile-and-fuse, lowering dialects, etc. Canonicalization between these stages is necessary to make sure that the stages have wiggle room to improve without breaking following stages.
  • Canonicalization between individual passes in the same "stage". For example, running ownership-based deallocation right after bufferization. Canonicalization patterns can destroy the interim form of IR operations and make it harder for the second pass to find knowledge (without additional analysis passes).

Questions about the validity of keeping interim information that is prone to cleanups as a dependency between transformation passes aside, there is always some information left by analysis passes for following transformation passes that needs to be kept. So it's not a clear-cut answer here.

Other artefacts (like annotations, properties, metadata) could be created and managed for different purposes, but having IR ops in a "known but not canonical" shape between two specific passes isn't unheard of and may be beneficial in some cases.

Now, back to this PR, I agree with @joker-eph here: LICM, even a simplified version, isn't canonicalization, it's a transformation.

@Hardcode84 if you add this pattern to the LICM pass and run it before your conversion, would that solve your problem?

@joker-eph
Collaborator

joker-eph commented Jan 6, 2024

Now, back to this PR, I agree with @joker-eph here: LICM, even a simplified version, isn't canonicalization, it's a transformation.

Actually you're saying the opposite of me :)

I'm saying that:

for (auto value : vector) {
  int a = f(x, y, z); // loop invariant
  g(value, a);
}

and:

int a = f(x, y, z); // loop invariant
for (auto value : vector) {
  g(value, a);
}

Are equivalent programs, and the argument that the second one "may increase register pressure" (or, for that matter, any kind of target-specific optimization argument) is not a valid objection to canonicalization.
(and I would want my compiler to treat these two programs identically)

@joker-eph
Collaborator

Here are a few examples of transformations that were moved out of canonicalization over time because having more control over when transformations are applied was necessary:

Some of these aren't motivated, so it's hard to form an opinion. But when I read:

The patterns to remove dead arguments and results of linalg.generic operations are not necessarily canonicalizations.

That seems completely bogus to me. If obvious dead code removal can't be a canonicalization, then we could just as well remove entirely the concept of canonicalization.

@joker-eph
Collaborator

I understand the reasoning, but in practice it is common for compilers to rely on the original program as a fallback when a transformation is not obviously a win.

I see this as a very fragile way to design compilers; I would say that if you want to design your compiler that way, don't include canonicalization inside your pass pipeline.

I don't understand where the line is with regard to canonicalization if we go that route. What kind of transformation would not be a canonicalization then? Would function inlining be a canonicalization, for instance? And if not, why?

Sometimes "it's ambiguous", but "inlining" is covered here: https://sunfishcode.github.io/blog/2018/10/22/Canonicalization.html

The way I see it is "how hard is it to reverse a transformation later"? In the case of LICM it is a local transformation, and one that a compiler must be able to "undo" (because often the input program will come in both forms and the compiler should optimize it!).

Inlining is a global program transformation instead, and we don't know how to manage this efficiently (that is: it's still in the realm of compiler research, even though there are some implementations around PGO + outlining of common sequences in cold regions, for example).

My concern is that this could significantly reduce user control over what transformations are applied as canonicalization

How is there a "reduction" of control when there is no control already? That is: we don't have canonicalization knobs, almost by design, we just let people either use canonicalization or don't use it.

@ThomasRaoux
Contributor

I understand the reasoning, but in practice it is common for compilers to rely on the original program as a fallback when a transformation is not obviously a win.

I see this as a very fragile way to design compilers; I would say that if you want to design your compiler that way, don't include canonicalization inside your pass pipeline.

But then I cannot access the existing canonicalization patterns, so I would have to rewrite them, which is obviously not practical.

My concern is that this could significantly reduce user control over what transformations are applied as canonicalization

How is there a "reduction" of control when there is no control already? That is: we don't have canonicalization knobs, almost by design, we just let people either use canonicalization or don't use it.

Because canonicalization doesn't have knobs, if we widen what we consider to be canonicalizations we lose control.

This reduces the modularity of MLIR patterns, which I think is going to be a downside for users.

@rengolin
Member

rengolin commented Jan 6, 2024

Actually you're saying the opposite of me :)

😅

It's confusing, because I agree with your statement and still think LICM should be LICM, not canonicalization. Not because the two snippets you show are different (they're not), but because these are well known passes (like CSE, DCE) which are arguably all forms of canonicalization.

So, we either call every transform pass that doesn't change program semantics "canonicalization", or we keep them in separate passes because some canonicalizations are different from others. I strongly prefer the latter, because of my second point earlier: the control that some bundle-of-passes can have on which types of canonicalization can happen in between its inner transformations.

the argument that the second one "may increase register pressure" (for example, use any kind of target-specific optimization) is not a valid objection to canonicalization. (and I would want my compiler to treat these two program identically)

Indeed. Register/memory allocators should operate on canonical forms and do their most optimal work in those forms.

However, the argument itself is still valid. If you interpret this as "the canonical form is changing, and the following passes were coded with the previous canonical form in mind", then my argument about isolating following passes is broken and we need to think about the consequences.

But it feels to me that we're drifting away from the initial aims of this PR. For the code here, it shouldn't matter whether this transform is in pass A or B; as long as the pass runs before the while-to-for conversion, it should be fine.

@joker-eph
Collaborator

But then I cannot access the existing canonicalization patterns, so I would have to rewrite them, which is obviously not practical.

But that's already the case, I'm saying: nothing new here.

Because canonicalization doesn't have knobs, if we widen what we consider to be canonicalizations we lose control.

This comment seems to me to apply to any addition of a canonicalization pattern from day-1...

This reduces the modularity of MLIR patterns, which I think is going to be a downside for users.

I feel you're mixing up the "addition of a new pattern you don't like" with the fundamental design that "canonicalization patterns are an opaque bag and you don't get to choose".

That is: there is no design change here. You just happen to realize that canonicalization patterns are a growing set of transformations and you can't cherry-pick.

Unless we freeze the set of patterns, we're necessarily in the situation you're describing, as far as I understand.

It's confusing, because I agree with your statement and still think LICM should be LICM, not canonicalization. Not because the two snippets you show are different (they're not), but because these are well known passes (like CSE, DCE) which are arguably all forms of canonicalization.

Yes and we run DCE during canonicalization :p
(actually we don't even have a DCE pass in MLIR I think? But maybe we should just for specific "quick cleanups"...)
The fact that these transformations have historically got a name does not make them out-of-scope of canonicalization in nature IMO. The name just refers to the technique itself, not much more.
The reason we don't do CSE during canonicalization, for example, has more to do with "we don't know how to do it efficiently" than anything else; if CSE were O(1) it would likely be part of the canonicalization pass, I believe.

So for canonicalization candidates I'd rather look at:

  • is the transformation performed with only local information?
  • is the transformation "cheap" enough or does it require extensive computation?
  • is the transformation undoable easily (with similar local information), that is "no information is lost".

Part of the impact of keeping a transformation within canonicalization is the cross-transformation interplay: you need to perform these interleaved in the same IR traversal to converge. You can often craft input IR where performing one before the other does not lead to the desired simplification.

@Hardcode84
Contributor Author

if you add this pattern to the LICM pass and run it before your conversion, would that solve your problem?

As was mentioned earlier in the thread, canonicalizations can open new opportunities for CSE and LICM and vice versa, so you may need to run this sequence of passes an arbitrary number of times. I can live without this pattern upstream; internally we have a facility to run a sequence of passes until a fixed point is reached. I can work on upstreaming that instead.

(How is this handled in LLVM proper, BTW?)
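
(Editor's sketch of such a fixed-point facility, under assumptions: a pass that re-runs a nested pipeline until an operation fingerprint stops changing. The class and its fields are made up for illustration and this is not the downstream implementation mentioned later in the thread; OpPassManager, OperationFingerPrint, and Pass::runPipeline are existing MLIR APIs.)

#include "mlir/IR/OperationSupport.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"

using namespace mlir;

// Hypothetical composite pass: re-run a nested pipeline (e.g. canonicalize,
// CSE, LICM) until the IR stops changing or an iteration limit is reached.
struct RunUntilFixedPointPass
    : public PassWrapper<RunUntilFixedPointPass, OperationPass<>> {
  OpPassManager pipeline;      // populated by whoever constructs the pass
  unsigned maxIterations = 10; // safety valve against non-convergence

  void runOnOperation() override {
    for (unsigned i = 0; i < maxIterations; ++i) {
      OperationFingerPrint before(getOperation());
      if (failed(runPipeline(pipeline, getOperation())))
        return signalPassFailure();
      // Stop once a whole pipeline run leaves the IR unchanged.
      if (OperationFingerPrint(getOperation()) == before)
        break;
    }
  }
};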

@rengolin
Member

rengolin commented Jan 6, 2024

So for canonicalization candidates I'd rather look at:

  • is the transformation performed with only local information?
  • is the transformation "cheap" enough or does it require extensive computation?
  • is the transformation undoable easily (with similar local information), that is "no information is lost".

To me, the reversibility is the key. No loss of information, not just program interpretation (which could say undefined behavior is fair game).

For example, what constitutes "local" can be a matter of interpretation. Is it op-local, bb-local, function-local, or module-local? And what even is cheap? Maybe an O(n^2) canonicalization can avoid an O(n^3) matching later, and is therefore cheaper than not running it.

It can be argued that LICM uses non-local information (op + loop + loop dependency analysis looking at the context, not just the op itself). The same can be said for CSE, since it looks at multiple expressions, and even for simpler things like expression order canonicalization (x + 10 + y + 5 -> x + y + 10 + 5 -> x + y + 15), because there are multiple add ops you need to look through to determine what constitutes an expression.

So, I try not to argue too strongly about which passes are canonicalizations and which are not. My suggestion to have them separate is not to determine what is what, but to allow interim pass bundles to control which types of canonicalizations (reversible changes into canonical shapes) we apply within them. But in between these "bundles", we should always run all of them until a fixed point.

What determines a "bundle" isn't always clear, so don't take my suggestion as concrete, just as a design goal for an upstream pipeline that we all use, so we can collaborate upstream without reinventing the wheel downstream.

@rengolin
Member

rengolin commented Jan 6, 2024

Unless we freeze the set of patterns, we're necessarily in the situation you're describing, as far as I understand.

This is key: the canonical form isn't a static form, but an agreed-upon form that the following passes agree to operate on. This form can change with time and passes need to adapt to it, but this change can be expensive and needs to be agreed upon again.

In our project we use a lot of upstream passes. We used to have trouble with the shapes that the passes leave in the IR, but after we started running canonicalization in between every "bundle", the changes to the canonical shapes became minimal and easily treatable, without loss of information.

We may not even agree on what a "bundle" looks like across different projects, but if we run canonicalization to a fixed point in between them locally, the likelihood of losing information is very low, provided we agree on what a "canonical form" is.

@joker-eph
Collaborator

Chatted with @ThomasRaoux and this topic came up; I felt like reporting some more on some canonicalization aspects. The argument against this as a canonicalization may come down to the definition and purpose of scf.for. There is a case to be made that scf.for is "unopinionated" and generic, intended to structurally represent a loop at multiple levels of abstraction. So for composability purposes, you want to be able to run canonicalization at your abstraction level without incurring drastic changes to the structure of the loop. Making LICM part of the scf.for canonical form, on the other hand, makes scf.for opinionated in terms of representing an overly abstract loop form, reducing its applicability. The motion of operations from arbitrary dialects across execution paths may be a step too far for something structural like scf.for.

@Hardcode84
Contributor Author

Hardcode84 commented Jan 19, 2024

Making LICM part of the scf.for canonical form, on the other hand, makes scf.for opinionated in terms of representing an overly abstract loop form, reducing its applicability.

Side comment: doesn't this argument also apply to the entire concept of uplifting scf.while to scf.for as part of canonicalization?

As I said previously, I can live with LICM being a separate pass; the big limitation of this approach is that upstream is currently missing a facility to run a sequence of passes until a fixed point is reached.

@joker-eph
Collaborator

Side comment: doesn't this argument also apply to the entire concept of uplifting scf.while to scf.for as part of canonicalization?

I am not so sure: it won't move operations from other dialects, or cause operations to be executed that wouldn't be executed otherwise (like LICM on a zero-trip loop). It is purely a transformation from an scf operation to another, equivalent scf operation.

@joker-eph
Collaborator

As I said previously, I can live with LICM being a separate pass; the big limitation of this approach is that upstream is currently missing a facility to run a sequence of passes until a fixed point is reached.

The implementation of the canonicalization pass is ~20 lines of code; it should be fairly trivial to write another one downstream that adds more patterns, or even (for example) interleave CSE and canonicalization until a fixed point?

@Hardcode84
Contributor Author

or even (for example) interleave CSE and canonicalization until a fixed point?

Yes, and we did it downstream in a kind of generic way (https://github.com/numba/numba-mlir/blob/main/mlir/lib/Transforms/CompositePass.cpp), but there is nothing like this upstream :)

Anyway, I can close this PR and continue with the main PR (#76108), making it a test pass first, and then making it a canonicalization in a separate PR.

Also, I'll need to add another pattern, which most likely doesn't qualify as a canonicalization, transforming:

%res = scf.while() {
before:
  %1 = foo.bar()
  condition %cond, %1
after(%arg):
  use %arg
}
use %res

to

scf.while() {
before:
  condition %cond
after():
  %arg = foo.bar()
  use %arg
}
%res = foo.bar()
use %res

I initially wanted to add it to the uplifting pass itself, but if that's gone, we'll probably need another separate pass.
