Skip to content

[MLIR] Fuse parent region location when hoisting constants #75258

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

zyx-billy
Copy link
Contributor

@zyx-billy zyx-billy commented Dec 13, 2023

Followup on #74670.

When operation folder hoists constants, the constant op that is moved (or materialized elsewhere) should have its location fused with the location of the parent region of its insertion point. This is to reflect the new meaning of this constant op, which is that it is created at the front of the region to be used by any op in the body if it needs such a constant.

This is motivated by the same reason as a similar change in LLVM (https://reviews.llvm.org/D38088): Keeping the original location will cause stepping in a debugger to be out of order. If we use the following psuedocode as example, after op folding today, the stepping behavior will be in the order of:

  • Line 2: initialization instructions for the constant 8 (assume non-trivial)
  • Line 1: instructions for printing arg
  • Line 2: instructions for printing 8
void example(int arg) {
    print(arg)    # Line 1
    print(8)      # Line 2
    ...
}

This implies that whatever code that initializes the constant at the front of this function cannot be considered solely to be the result of line 2. In LLVM, the solution was to merge the location of the insertion point instruction. In this case, it'll return a location corresponding to the parent scope (which is this function). For us, since MLIR allows carrying around multiple locations as a result of optimizations, we can take advantage of that and just fuse the location of the parent region. This will make the hoisted ops look similar to other source-dependent setup code that may exist in the front of a function already, in that they also carry the location of the function. Backends will then be able to differentiate what belongs in the prologue of a function and instruct the debugger accordingly.

Please LMK if anything about this isn't clear. Thanks!

@zyx-billy zyx-billy force-pushed the mlir/hoist_constant_fuse_parent_loc branch 2 times, most recently from 3bd3ea0 to b0b8e7c Compare December 13, 2023 00:22
@zyx-billy zyx-billy force-pushed the mlir/hoist_constant_fuse_parent_loc branch from b0b8e7c to 8abe24b Compare December 13, 2023 00:23
@zyx-billy zyx-billy marked this pull request as ready for review December 13, 2023 01:10
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Dec 13, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 13, 2023

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Billy Zhu (zyx-billy)

Changes

Followup on #74670.

When operation folder hoists constants, the constant op that is moved (or materialized elsewhere) should have its location fused with the location of the parent region of its insertion point. This is to reflect the new meaning of this constant op, which is that it is created at the front of the region to be used by any op in the body if it needs such a constant.

This is motivated by the same reason as a similar change in LLVM (https://reviews.llvm.org/D38088): Keeping the original location will cause stepping in a debugger to be out of order. If we use the following psuedocode as example, after op folding today, the stepping behavior will be in the order of:

  • Line 2: initialization instructions for the constant 8 (assume non-trivial)
  • Line 1: instructions for printing arg
  • Line 2: instructions for printing 8
void example(int arg) {
    print(arg)    # Line 1
    print(8)      # Line 2
    ...
}

This implies that whatever code that initializes the constant at the front of this function cannot be considered solely to be the result of line 2. In LLVM, the solution was to merge the location of the insertion point instruction. In this case, it'll return a location corresponding to the parent scope (which is this function). For us, since MLIR allows carrying around multiple locations as a result of optimizations, we can take advantage of that and just fuse the location of the parent region. This will make the hoisted ops look similar to other source-dependent setup code that may exist in the front of a function already, in that they also carry the location of the function. Backends will then be able to differentiate what belongs in the prologue of a function and instruct the debugger accordingly.

Please LMK if anything about this isn't clear. Thanks!


Full diff: https://github.com/llvm/llvm-project/pull/75258.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Transforms/FoldUtils.h (+6-4)
  • (modified) mlir/lib/Transforms/Utils/FoldUtils.cpp (+24-16)
  • (modified) mlir/test/Transforms/canonicalize-debuginfo.mlir (+24-2)
  • (modified) mlir/test/Transforms/constant-fold-debuginfo.mlir (+28-4)
diff --git a/mlir/include/mlir/Transforms/FoldUtils.h b/mlir/include/mlir/Transforms/FoldUtils.h
index 28fa18cf942de4..999b4774eae996 100644
--- a/mlir/include/mlir/Transforms/FoldUtils.h
+++ b/mlir/include/mlir/Transforms/FoldUtils.h
@@ -96,10 +96,12 @@ class OperationFolder {
                                     Dialect *dialect, Attribute value,
                                     Type type, Location loc);
 
-  // Fuse `foldedLocation` into the Location of `retainedOp`. This will result
-  // in `retainedOp` having a FusedLoc with `fusedLocationTag` to help trace the
-  // source of the fusion. If `retainedOp` already had a FusedLoc with the same
-  // tag, `foldedLocation` will simply be appended to it.
+  // Fuse `foldedLocation` into `originalLocation`. This will result in a
+  // FusedLoc with `fusedLocationTag` to help trace the source of the fusion.
+  // If `originalLocation` already had a FusedLoc with the same tag,
+  // `foldedLocation` will simply be appended to it.
+  Location getFusedLocation(Location originalLocation, Location foldedLocation);
+  // Update the location of `retainedOp` by applying `getFusedLocation`.
   void appendFoldedLocation(Operation *retainedOp, Location foldedLocation);
 
   /// Tag for annotating fused locations as a result of merging constants.
diff --git a/mlir/lib/Transforms/Utils/FoldUtils.cpp b/mlir/lib/Transforms/Utils/FoldUtils.cpp
index 056a681718e121..b2afe08430fc4d 100644
--- a/mlir/lib/Transforms/Utils/FoldUtils.cpp
+++ b/mlir/lib/Transforms/Utils/FoldUtils.cpp
@@ -152,8 +152,10 @@ bool OperationFolder::insertKnownConstant(Operation *op, Attribute constValue) {
   // anything. Otherwise, we move the constant to the insertion block.
   Block *insertBlock = &insertRegion->front();
   if (opBlock != insertBlock || (&insertBlock->front() != op &&
-                                 !isFolderOwnedConstant(op->getPrevNode())))
+                                 !isFolderOwnedConstant(op->getPrevNode()))) {
     op->moveBefore(&insertBlock->front());
+    appendFoldedLocation(op, insertBlock->getParent()->getLoc());
+  }
 
   folderConstOp = op;
   referencedDialects[op].push_back(op->getDialect());
@@ -237,6 +239,7 @@ OperationFolder::processFoldResults(Operation *op,
   auto *insertRegion = getInsertionRegion(interfaces, op->getBlock());
   auto &entry = insertRegion->front();
   rewriter.setInsertionPoint(&entry, entry.begin());
+  Location loc = getFusedLocation(op->getLoc(), insertRegion->getLoc());
 
   // Get the constant map for the insertion region of this operation.
   auto &uniquedConstants = foldScopes[insertRegion];
@@ -259,8 +262,8 @@ OperationFolder::processFoldResults(Operation *op,
     // Check to see if there is a canonicalized version of this constant.
     auto res = op->getResult(i);
     Attribute attrRepl = foldResults[i].get<Attribute>();
-    if (auto *constOp = tryGetOrCreateConstant(
-            uniquedConstants, dialect, attrRepl, res.getType(), op->getLoc())) {
+    if (auto *constOp = tryGetOrCreateConstant(uniquedConstants, dialect,
+                                               attrRepl, res.getType(), loc)) {
       // Ensure that this constant dominates the operation we are replacing it
       // with. This may not automatically happen if the operation being folded
       // was inserted before the constant within the insertion block.
@@ -364,31 +367,36 @@ static Location FlattenFusedLocationRecursively(const Location loc) {
   return loc;
 }
 
-void OperationFolder::appendFoldedLocation(Operation *retainedOp,
+Location OperationFolder::getFusedLocation(Location originalLocation,
                                            Location foldedLocation) {
+  // If they're already equal, no need to fuse.
+  if (originalLocation == foldedLocation)
+    return originalLocation;
+
   // Append into existing fused location if it has the same tag.
   if (auto existingFusedLoc =
-          dyn_cast<FusedLocWith<StringAttr>>(retainedOp->getLoc())) {
+          dyn_cast<FusedLocWith<StringAttr>>(originalLocation)) {
     StringAttr existingMetadata = existingFusedLoc.getMetadata();
     if (existingMetadata == fusedLocationTag) {
       ArrayRef<Location> existingLocations = existingFusedLoc.getLocations();
       SetVector<Location> locations(existingLocations.begin(),
                                     existingLocations.end());
       locations.insert(foldedLocation);
-      Location newFusedLoc = FusedLoc::get(
-          retainedOp->getContext(), locations.takeVector(), existingMetadata);
-      retainedOp->setLoc(FlattenFusedLocationRecursively(newFusedLoc));
-      return;
+      Location newFusedLoc =
+          FusedLoc::get(originalLocation->getContext(), locations.takeVector(),
+                        existingMetadata);
+      return FlattenFusedLocationRecursively(newFusedLoc);
     }
   }
 
   // Create a new fusedloc with retainedOp's loc and foldedLocation.
-  // If they're already equal, no need to fuse.
-  if (retainedOp->getLoc() == foldedLocation)
-    return;
-
   Location newFusedLoc =
-      FusedLoc::get(retainedOp->getContext(),
-                    {retainedOp->getLoc(), foldedLocation}, fusedLocationTag);
-  retainedOp->setLoc(FlattenFusedLocationRecursively(newFusedLoc));
+      FusedLoc::get(originalLocation->getContext(),
+                    {originalLocation, foldedLocation}, fusedLocationTag);
+  return FlattenFusedLocationRecursively(newFusedLoc);
+}
+
+void OperationFolder::appendFoldedLocation(Operation *retainedOp,
+                                           Location foldedLocation) {
+  retainedOp->setLoc(getFusedLocation(retainedOp->getLoc(), foldedLocation));
 }
diff --git a/mlir/test/Transforms/canonicalize-debuginfo.mlir b/mlir/test/Transforms/canonicalize-debuginfo.mlir
index 217cc29c0095e2..8b726eabb4c2e8 100644
--- a/mlir/test/Transforms/canonicalize-debuginfo.mlir
+++ b/mlir/test/Transforms/canonicalize-debuginfo.mlir
@@ -33,9 +33,31 @@ func.func @hoist_constant(%arg0: memref<8xi32>) {
     memref.store %0, %arg0[%arg1] : memref<8xi32>
     memref.store %1, %arg0[%arg1] : memref<8xi32>
   }
+  // CHECK: return
   return
-}
+// CHECK-NEXT: } loc(#[[LocFunc:.*]])
+} loc("hoist_constant":2:0)
 
 // CHECK-DAG: #[[LocConst0:.*]] = loc("hoist_constant":0:0)
 // CHECK-DAG: #[[LocConst1:.*]] = loc("hoist_constant":1:0)
-// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst0]], #[[LocConst1]]])
+// CHECK-DAG: #[[LocFunc]] = loc("hoist_constant":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst0]], #[[LocFunc]], #[[LocConst1]]])
+
+// -----
+
+// CHECK-LABEL: func @hoist_constant_simple
+func.func @hoist_constant_simple(%arg0: memref<8xi32>) -> i32 {
+  // CHECK-NEXT: arith.constant 88 : i32 loc(#[[FusedLoc:.*]])
+  %0 = arith.constant 42 : i32 loc("hoist_constant_simple":0:0)
+  %1 = arith.constant 0 : index
+  memref.store %0, %arg0[%1] : memref<8xi32>
+
+  %2 = arith.constant 88 : i32 loc("hoist_constant_simple":1:0)
+
+  return %2 : i32
+// CHECK: } loc(#[[LocFunc:.*]])
+} loc("hoist_constant_simple":2:0)
+
+// CHECK-DAG: #[[LocConst1:.*]] = loc("hoist_constant_simple":1:0)
+// CHECK-DAG: #[[LocFunc]] = loc("hoist_constant_simple":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocFunc]]])
diff --git a/mlir/test/Transforms/constant-fold-debuginfo.mlir b/mlir/test/Transforms/constant-fold-debuginfo.mlir
index 79a25f860a4841..7e5ffc3c4d5b13 100644
--- a/mlir/test/Transforms/constant-fold-debuginfo.mlir
+++ b/mlir/test/Transforms/constant-fold-debuginfo.mlir
@@ -11,11 +11,13 @@ func.func @fold_and_merge() -> (i32, i32) {
   %3 = arith.constant 6 : i32 loc("fold_and_merge":1:0)
 
   return %2, %3: i32, i32
-}
+// CHECK: } loc(#[[LocFunc:.*]])
+} loc("fold_and_merge":2:0)
 
 // CHECK-DAG: #[[LocConst0:.*]] = loc("fold_and_merge":0:0)
 // CHECK-DAG: #[[LocConst1:.*]] = loc("fold_and_merge":1:0)
-// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocConst0]]])
+// CHECK-DAG: #[[LocFunc]] = loc("fold_and_merge":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocFunc]], #[[LocConst0]]])
 
 // -----
 
@@ -27,8 +29,30 @@ func.func @materialize_different_dialect() -> (f32, f32) {
   %2 = arith.constant 1.0 : f32 loc("materialize_different_dialect":1:0)
 
   return %1, %2: f32, f32
-}
+// CHECK: } loc(#[[LocFunc:.*]])
+} loc("materialize_different_dialect":2:0)
 
 // CHECK-DAG: #[[LocConst0:.*]] = loc("materialize_different_dialect":0:0)
 // CHECK-DAG: #[[LocConst1:.*]] = loc("materialize_different_dialect":1:0)
-// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocConst0]]])
+// CHECK-DAG: #[[LocFunc]] = loc("materialize_different_dialect":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocFunc]], #[[LocConst0]]])
+
+// -----
+
+// CHECK-LABEL: func @materialize_in_front
+func.func @materialize_in_front(%arg0: memref<8xi32>) {
+  // CHECK-NEXT: arith.constant 6 : i32 loc(#[[FusedLoc:.*]])
+  affine.for %arg1 = 0 to 8 {
+    %1 = arith.constant 1 : i32
+    %2 = arith.constant 5 : i32
+    %3 = arith.addi %1, %2 : i32 loc("materialize_in_front":0:0)
+    memref.store %3, %arg0[%arg1] : memref<8xi32>
+  }
+  // CHECK: return
+  return
+// CHECK-NEXT: } loc(#[[LocFunc:.*]])
+} loc("materialize_in_front":1:0)
+
+// CHECK-DAG: #[[LocConst0:.*]] = loc("materialize_in_front":0:0)
+// CHECK-DAG: #[[LocFunc]] = loc("materialize_in_front":1:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst0]], #[[LocFunc]]])

@joker-eph
Copy link
Collaborator

joker-eph commented Dec 13, 2023

Why is this different from other CSE cases (or other code motion)?

the constant op that is moved (or materialized elsewhere) should have its location fused with the location of the parent region of its insertion point.

Is this description actually correct? Or is it taking advantage that we always materialize at the beginning of the entry block of a region?

Another question: what if there are block-arguments with locations that have themselves a lowering that will insert operations at the beginning of the block? Aren't you creating a worsen problem?

@zyx-billy
Copy link
Contributor Author

Why is this different from other CSE cases (or other code motion)?

Oh do you mean other CSE or code motion in general? I don't think this one's special. I'm just starting with this case, and if we get to an agreed practice for managing locations during code motion & CSE, we can apply it to other optimizations.

Compared to the previous PR, the difference is this handles moving & materializing ops, while the previous one only handled deduplicating ops. It's likely these happen together, but they can also occur individually (the tests added in this PR checks that).

the constant op that is moved (or materialized elsewhere) should have its location fused with the location of the parent region of its insertion point.

Is this description actually correct? Or is it taking advantage that we always materialize at the beginning of the entry block of a region?

Right, this is only correct because we move to / materialize at the beginning of the entry block in this optimization. Other patterns would need to be considered separately.

what if there are block-arguments with locations that have themselves a lowering that will insert operations at the beginning of the block? Aren't you creating a worsen problem?

If we settle on the convention that a "prologue" of a region (all the user-invisible/compiler-generated setup instructions) should carry the location of the parent region, then the lowerings for the block arguments should also make sure to use the parent region location (if the arguments don't already carry the parent location).

@zyx-billy
Copy link
Contributor Author

So I saw this comment from the previous PR warning about OOM / timeouts, and I wonder if we need to make a tradeoff between tracking everything & efficiency.

Would you be open to doing something lossy, like what was done in LLVM, where whenever we move / materialize / deduplicate constants, we just set the location to the parent location (and throw away the original locations). That achieves the same ultimate effect (clearer op location meaning & no out-of-order stepping) without dragging all the original source locations along. While it may feel weird throwing away locations, that's essentially what was done before the previous PR (except for an arbitrary one that was kept).

@joker-eph
Copy link
Collaborator

If we get into such tradeoff, we could just as well be dropping the location entirely from what I can tell.

@zyx-billy
Copy link
Contributor Author

If we get into such tradeoff, we could just as well be dropping the location entirely from what I can tell.

Actually... yeah that works too 😂 what do you say we just drop locations entirely then? That seems like the cleanest solution so far.

@joker-eph
Copy link
Collaborator

Seems like a reasonable tradeoff all things considered? In particular for constants...

@zyx-billy
Copy link
Contributor Author

OK. Will open a new PR with the new solution 🙏 .

@zyx-billy zyx-billy closed this Dec 13, 2023
zyx-billy added a commit that referenced this pull request Dec 21, 2023
Follow up to the discussion from #75258, and serves as an alternate
solution for #74670.

Set the location to Unknown for deduplicated / moved / materialized
constants by OperationFolder. This makes sure that the folded constants
don't end up with an arbitrary location of one of the original ops that
became it, and that hoisted ops don't confuse the stepping order.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:core MLIR Core Infrastructure mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants