Skip to content

[MLIR] Fuse locations of merged constants #74670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 12, 2023

Conversation

zyx-billy
Copy link
Contributor

@zyx-billy zyx-billy commented Dec 6, 2023

When merging constants by the operation folder, the location of the op that remains should be updated to track the new meaning of this op. This way we do not lose track of all possible source locations that the constant op came from, and the final location of the op is less reliant on the order of folding. This will also help debuggers understand how to step these instructions.

This PR introduces a helper for operation folder to fuse another location into the location of an op. When an op is deduplicated, fuse the location of the op to be removed into the op that is retained. The retained op now represents both original ops.

The FusedLoc will have a string metadata to help understand the reason for the location fusion (motivated by the example in the docstring of FusedLoc).

@zyx-billy zyx-billy force-pushed the mlir/hoist_constant_loc branch from 8402239 to 90c1e7b Compare December 6, 2023 22:43
@zyx-billy zyx-billy marked this pull request as ready for review December 7, 2023 00:46
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Dec 7, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Billy Zhu (zyx-billy)

Changes

When hoisting & merging constants by the operation folder, the location of the op that remains should be updated to track the new meaning of this op. This way we do not lose track of all possible source locations that the constant op came from, and the final location of the op is less reliant on the order of folding. This will also help debuggers understand how to step these instructions.

This PR introduces a helper for operation folder to fuse another location into the location of an op. There are two ways this is used now:

  • When an op is deduplicated, fuse the location of the op to be removed into the op that is retained. The retained op now represents both original ops.
  • When an op is hoisted to the front/back of a block, fuse the location of the parent region of the block into the hoisted op. The hoisted op is now considered part of the setup of the block.

The FusedLoc will have a string metadata to help understand the reason for the location fusion (motivated by the example in the docstring of FusedLoc).


Full diff: https://github.com/llvm/llvm-project/pull/74670.diff

2 Files Affected:

  • (modified) mlir/lib/Transforms/Utils/FoldUtils.cpp (+49-5)
  • (added) mlir/test/Transforms/canonicalize-debuginfo.mlir (+42)
diff --git a/mlir/lib/Transforms/Utils/FoldUtils.cpp b/mlir/lib/Transforms/Utils/FoldUtils.cpp
index 90ee5ba51de3ad..23fcb4796892d6 100644
--- a/mlir/lib/Transforms/Utils/FoldUtils.cpp
+++ b/mlir/lib/Transforms/Utils/FoldUtils.cpp
@@ -19,6 +19,38 @@
 
 using namespace mlir;
 
+// Fuse `foldedLocation` into the Location of `retainedOp`.
+// This will result in `retainedOp` having a FusedLoc with a StringAttr tag
+// "OpFold" to help trace the source of the fusion. If `retainedOp` already had
+// a FusedLoc with the same tag, `foldedLocation` will simply be appended to it.
+// Usage:
+// - When an op is deduplicated, fuse the location of the op to be removed into
+//   the op that is retained.
+// - When an op is hoisted to the front/back of a block, fuse the location of
+//   the parent region of the block into the hoisted op.
+static void appendFoldedLocation(Operation *retainedOp,
+                                 Location foldedLocation) {
+  constexpr std::string_view tag = "OpFold";
+  // Append into existing fused location if it has the same tag.
+  if (auto existingFusedLoc =
+          retainedOp->getLoc().dyn_cast<FusedLocWith<StringAttr>>()) {
+    StringAttr existingMetadata = existingFusedLoc.getMetadata();
+    if (existingMetadata.strref().equals(tag)) {
+      SmallVector<Location> locations(existingFusedLoc.getLocations());
+      locations.push_back(foldedLocation);
+      Location newFusedLoc =
+          FusedLoc::get(retainedOp->getContext(), locations, existingMetadata);
+      retainedOp->setLoc(newFusedLoc);
+      return;
+    }
+  }
+  // Create a new fusedloc with retainedOp's loc and foldedLocation.
+  Location newFusedLoc = FusedLoc::get(
+      retainedOp->getContext(), {retainedOp->getLoc(), foldedLocation},
+      StringAttr::get(retainedOp->getContext(), tag));
+  retainedOp->setLoc(newFusedLoc);
+}
+
 /// Given an operation, find the parent region that folded constants should be
 /// inserted into.
 static Region *
@@ -77,8 +109,10 @@ LogicalResult OperationFolder::tryToFold(Operation *op, bool *inPlaceUpdate) {
     // Check to see if we should rehoist, i.e. if a non-constant operation was
     // inserted before this one.
     Block *opBlock = op->getBlock();
-    if (&opBlock->front() != op && !isFolderOwnedConstant(op->getPrevNode()))
+    if (&opBlock->front() != op && !isFolderOwnedConstant(op->getPrevNode())) {
       op->moveBefore(&opBlock->front());
+      appendFoldedLocation(op, opBlock->getParent()->getLoc());
+    }
     return failure();
   }
 
@@ -112,8 +146,10 @@ bool OperationFolder::insertKnownConstant(Operation *op, Attribute constValue) {
   // If this is a constant we unique'd, we don't need to insert, but we can
   // check to see if we should rehoist it.
   if (isFolderOwnedConstant(op)) {
-    if (&opBlock->front() != op && !isFolderOwnedConstant(op->getPrevNode()))
+    if (&opBlock->front() != op && !isFolderOwnedConstant(op->getPrevNode())) {
       op->moveBefore(&opBlock->front());
+      appendFoldedLocation(op, opBlock->getParent()->getLoc());
+    }
     return true;
   }
 
@@ -141,6 +177,7 @@ bool OperationFolder::insertKnownConstant(Operation *op, Attribute constValue) {
   // If there is an existing constant, replace `op`.
   if (folderConstOp) {
     notifyRemoval(op);
+    appendFoldedLocation(folderConstOp, op->getLoc());
     rewriter.replaceOp(op, folderConstOp->getResults());
     return false;
   }
@@ -151,8 +188,10 @@ bool OperationFolder::insertKnownConstant(Operation *op, Attribute constValue) {
   // anything. Otherwise, we move the constant to the insertion block.
   Block *insertBlock = &insertRegion->front();
   if (opBlock != insertBlock || (&insertBlock->front() != op &&
-                                 !isFolderOwnedConstant(op->getPrevNode())))
+                                 !isFolderOwnedConstant(op->getPrevNode()))) {
     op->moveBefore(&insertBlock->front());
+    appendFoldedLocation(op, insertBlock->getParent()->getLoc());
+  }
 
   folderConstOp = op;
   referencedDialects[op].push_back(op->getDialect());
@@ -264,8 +303,10 @@ OperationFolder::processFoldResults(Operation *op,
       // with. This may not automatically happen if the operation being folded
       // was inserted before the constant within the insertion block.
       Block *opBlock = op->getBlock();
-      if (opBlock == constOp->getBlock() && &opBlock->front() != constOp)
+      if (opBlock == constOp->getBlock() && &opBlock->front() != constOp) {
         constOp->moveBefore(&opBlock->front());
+        appendFoldedLocation(constOp, opBlock->getParent()->getLoc());
+      }
 
       results.push_back(constOp->getResult(0));
       continue;
@@ -294,8 +335,10 @@ OperationFolder::tryGetOrCreateConstant(ConstantMap &uniquedConstants,
   // Check if an existing mapping already exists.
   auto constKey = std::make_tuple(dialect, value, type);
   Operation *&constOp = uniquedConstants[constKey];
-  if (constOp)
+  if (constOp) {
+    appendFoldedLocation(constOp, loc);
     return constOp;
+  }
 
   // If one doesn't exist, try to materialize one.
   if (!(constOp = materializeConstant(dialect, rewriter, value, type, loc)))
@@ -316,6 +359,7 @@ OperationFolder::tryGetOrCreateConstant(ConstantMap &uniquedConstants,
   // materialized operation in favor of the existing one.
   if (auto *existingOp = uniquedConstants.lookup(newKey)) {
     notifyRemoval(constOp);
+    appendFoldedLocation(existingOp, constOp->getLoc());
     rewriter.eraseOp(constOp);
     referencedDialects[existingOp].push_back(dialect);
     return constOp = existingOp;
diff --git a/mlir/test/Transforms/canonicalize-debuginfo.mlir b/mlir/test/Transforms/canonicalize-debuginfo.mlir
new file mode 100644
index 00000000000000..9cee2fec83bad6
--- /dev/null
+++ b/mlir/test/Transforms/canonicalize-debuginfo.mlir
@@ -0,0 +1,42 @@
+// RUN: mlir-opt -allow-unregistered-dialect %s -pass-pipeline='builtin.module(func.func(canonicalize{test-convergence}))' -split-input-file -mlir-print-debuginfo | FileCheck %s
+
+// CHECK-LABEL: func @merge_constants
+func.func @merge_constants() -> (index, index, index) {
+  // CHECK-NEXT: arith.constant 42 : index loc(#[[FusedLoc:.*]])
+  %0 = arith.constant 42 : index loc("merge_constants":0:0)
+  %1 = arith.constant 42 : index loc("merge_constants":1:0)
+  %2 = arith.constant 42 : index loc("merge_constants":2:0)
+  return %0, %1, %2: index, index, index
+}
+
+// CHECK-DAG: #[[LocConst0:.*]] = loc("merge_constants":0:0)
+// CHECK-DAG: #[[LocConst1:.*]] = loc("merge_constants":1:0)
+// CHECK-DAG: #[[LocConst2:.*]] = loc("merge_constants":2:0)
+
+// CHECK: #[[FusedLoc]] = loc(fused<"OpFold">[
+// CHECK-SAME: #[[LocConst0]]
+// CHECK-SAME: #[[LocConst1]]
+// CHECK-SAME: #[[LocConst2]]
+
+// -----
+
+// CHECK-LABEL: func @hoist_constant
+func.func @hoist_constant(%arg0: memref<8xi32>) {
+  // CHECK-NEXT: arith.constant 42 : i32 loc(#[[FusedWithFunction:.*]])
+  affine.for %arg1 = 0 to 8 {
+    %0 = arith.constant 42 : i32 loc("hoist_constant":0:0)
+    %1 = arith.constant 42 : i32 loc("hoist_constant":1:0)
+    memref.store %0, %arg0[%arg1] : memref<8xi32>
+    memref.store %1, %arg0[%arg1] : memref<8xi32>
+  }
+  return
+} loc("hoist_constant":2:0)
+
+// CHECK-DAG: #[[LocConst0:.*]] = loc("hoist_constant":0:0)
+// CHECK-DAG: #[[LocConst1:.*]] = loc("hoist_constant":1:0)
+// CHECK-DAG: #[[LocFunc:.*]] = loc("hoist_constant":2:0)
+
+// CHECK: #[[FusedWithFunction]] = loc(fused<"OpFold">[
+// CHECK-SAME: #[[LocConst0]]
+// CHECK-SAME: #[[LocFunc]]
+// CHECK-SAME: #[[LocConst1]]

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea!

I'm not convinced about the block parent location part, we should probably move this to a separate PR and discuss a bit more what we'd be trying to achieve exactly.

// the parent region of the block into the hoisted op.
static void appendFoldedLocation(Operation *retainedOp,
Location foldedLocation) {
constexpr std::string_view tag = "OpFold";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we usually use StringLiteral in LLVM: has there been a migration to std::string_view?

But even better, could you make this a StringAttr member of the OperationFolder class? (this could be a private method, or just take it an extra argument).
That way we'd avoid the repeated StringAttr creation in this function, and the string comparison line 38 would become a pointer comparison instead.

For the naming, we already have a "CSE" tag I think, it may be useful to reuse the naming here instead?

retainedOp->getLoc().dyn_cast<FusedLocWith<StringAttr>>()) {
StringAttr existingMetadata = existingFusedLoc.getMetadata();
if (existingMetadata.strref().equals(tag)) {
SmallVector<Location> locations(existingFusedLoc.getLocations());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It this possible we'd fuse the same location over and over? Should this be a SmallPtrSet to remove the duplication?

Please add tests for this.

op->moveBefore(&opBlock->front());
appendFoldedLocation(op, opBlock->getParent()->getLoc());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that makes sense to me: this is already a "folderOwnedConstant" and we just moved it up in the block

op->moveBefore(&opBlock->front());
appendFoldedLocation(op, opBlock->getParent()->getLoc());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

return;
}
}
// Create a new fusedloc with retainedOp's loc and foldedLocation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably check that the new loc isn't identical to the old one first.

constOp->moveBefore(&opBlock->front());
appendFoldedLocation(constOp, opBlock->getParent()->getLoc());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before, I don't think this makes sense to me.

@@ -0,0 +1,42 @@
// RUN: mlir-opt -allow-unregistered-dialect %s -pass-pipeline='builtin.module(func.func(canonicalize{test-convergence}))' -split-input-file -mlir-print-debuginfo | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove -allow-unregistered-dialect

// CHECK: #[[FusedWithFunction]] = loc(fused<"OpFold">[
// CHECK-SAME: #[[LocConst0]]
// CHECK-SAME: #[[LocFunc]]
// CHECK-SAME: #[[LocConst1]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have 7 invocations of appendFoldedLocation, I would think there would be 7 tests?

@@ -19,6 +19,38 @@

using namespace mlir;

// Fuse `foldedLocation` into the Location of `retainedOp`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: please use /// for docstrings

constexpr std::string_view tag = "OpFold";
// Append into existing fused location if it has the same tag.
if (auto existingFusedLoc =
retainedOp->getLoc().dyn_cast<FusedLocWith<StringAttr>>()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
retainedOp->getLoc().dyn_cast<FusedLocWith<StringAttr>>()) {
dyn_cast<FusedLocWith<StringAttr>>(retainedOp->getLoc())) {

@Mogball
Copy link
Contributor

Mogball commented Dec 7, 2023

Concretely, the problem with hoisting constants to the top of basic blocks is that if those constant ops don't become "trivial" (e.g. they don't lower directly to LLVM constants and have some codegen), debuggers will pick them up when stepping into a function. E.g.

func @my_function() {
  "some.op"() : () -> ()
  %c0 = some.complex.constant = [1, 2, 3]
  return
}

If the constant gets hoisted and lowers to non-trivial code:

llvm.func @my_function() {
  %0 = llvm.mlir.undef : ...
  // bunch of code constructing the "constant"
  "some.op"() :() -> ()
}

The debugger will jump around because the source locations are messed up.

@Mogball
Copy link
Contributor

Mogball commented Dec 7, 2023

I do agree that fusing with the block location is more controversial than fusing constant op locations together. Perhaps the policy should lie with the constant ops themselves

@joker-eph
Copy link
Collaborator

I do agree that fusing with the block location is more controversial than fusing constant op locations together. Perhaps the policy should lie with the constant ops themselves

It's not clear to me why one constant vs another would treat it differently, I think it's best to have a separate PR just so that we can see examples of what we're trying to address exactly with this.

@zyx-billy
Copy link
Contributor Author

Thanks for the reviews! Sounds like I can probably split up constant op location fusing & block parent location fusing, but let me elaborate on the motivation since they both are for the same cause.

Like @Mogball said, these locations end up informing the debugger which line of source code is being stepped on, which is a nice proxy for testing whether the location of an op still makes sense*. Any time we do these constant merging, even though procedurally we're erasing ops & moving ops to the front, semantically it's more like saying we're eliminating all the same constants in the block, and creating a unified one at the front of the block to be shared by all their users.

The reason we have this new constant op in the front of the block, is because of all the source lines that contributed to the original constants in this block. That's why I think both "the original constants" and "this block" are equally important in the creation of the new constant (after all, the block dictated the scope & position of this shared constant in the IR). At this point it's basically common setup code for the block (just like you might have language-dependent function setup code at the front of a function, and those carry the location of the function definition).

The parallel optimization in LLVM actually does the same thing (though it achieves it in a different way, perhaps a bit more weird?) for the same reason (https://reviews.llvm.org/D38088):

  • When merging constants, it does the same thing by merging locations of the constants.
  • When hoisting constants, it merges the location of the insertion point instruction.

Location merging in LLVM returns the location of the common parent scope, which will result in a similar behavior as us. Unlike us, it will unfortunately lose the location of the original constant, whereas we can hold both.

* I'm not sure if this represents a slight semantic shift for what locations mean (I'm hoping it's captured by the original definition), but if it is, I'm also open to considering new concepts for capturing a "hoisted location", or perhaps more broadly a debug location vs. source location distinction if we want to go down that route. LMK what you think. Happy to discuss more!

@joker-eph
Copy link
Collaborator

I looked at the LLVM revision, but I don't find the "parent op location" equivalent in MLIR here?
(I'm not sure I totally understood it though, their tests are... cryptic to me).

I actually may start to understand somehow: they attach the fused location of the insertion point somehow (for PGO reasons apparently), and you're trying to translate it to the "parent op" because you think that in the entry point it'll translate to the closest to this?

It'll probably help to look at a dedicated revision with dedicated examples crafted to showcase this.

@zyx-billy
Copy link
Contributor Author

I looked at the LLVM revision, but I don't find the "parent op location" equivalent in MLIR here?

Ah, this happens inside the logic for DILocation::getMergedLocation. It returns a location with the nearest common scope of two locations if they share a common scope. For us, we don't have to do this immediately since we can use FusedLoc to keep track of both locations (at least, until when we have to lower into more DWARF-like representations, at which point we can do the same thing as LLVM here and extract the parent).

For hoisting too, instead of following LLVM and just using the location of the insertion point instruction, we can also make things clearer by using the location of the parent scope directly, since we always move to the front of the block. This way we can later tell that this op is part of the common setup of the parent scope. DWARF emitters can then use this to mark this op as part of the prologue if appropriate.

It'll probably help to look at a dedicated revision with dedicated examples crafted to showcase this.

Sounds good. Let me split up the PR so it can be considered separately.

@zyx-billy zyx-billy changed the title [MLIR] Fuse locations of hoisted / merged constants [MLIR] Fuse locations of merged constants Dec 8, 2023
@zyx-billy
Copy link
Contributor Author

zyx-billy commented Dec 8, 2023

OK, this PR is now updated to only include changes for fusing merged constants. Also updated tests to check every case where deduplication happens. PTAL when you get a chance 🙏

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks! Looking forward for the follow up on blocks ;)

@zyx-billy
Copy link
Contributor Author

Thank you @joker-eph! Do you mind clicking merge for me? I don't yet have commit access 🙏

@joker-eph joker-eph merged commit 87e2e89 into llvm:main Dec 12, 2023
bchetioui added a commit to bchetioui/llvm-project that referenced this pull request Dec 12, 2023
[PR 74670](llvm#74670) added
support for merging locations at constant folding time. We have
discovered that in some cases, the number of locations grows so big as
to cause a compilation process to OOM. In that case, many of the
locations end up appearing several times in nested fused locations.

We add here a helper that always flattens fused locations in order to
eliminate duplicates in the case of nested fused locations. We only
allow flattening nested fused locations when the inner fused location
has no metadata, or has the same metadata as the outer fused location.
bchetioui added a commit that referenced this pull request Dec 12, 2023
[PR 74670](#74670) added
support for merging locations at constant folding time. We have
discovered that in some cases, the number of locations grows so big as
to cause a compilation process to OOM. In that case, many of the
locations end up appearing several times in nested fused locations.

We add here a helper that always flattens fused locations in order to
eliminate duplicates in the case of nested fused locations.
@MaskRay
Copy link
Member

MaskRay commented Dec 13, 2023

A heads-up that this caused a lot of OOM and timeout failures to our internal users and the follow-up mitigation #75218 doesn't fully address. @bchetioui

MaskRay added a commit to MaskRay/llvm-project that referenced this pull request Dec 13, 2023
This reverts commit 87e2e89.
and its follow-ups 0d1490f (llvm#75218)
and 6fe3cd5 (llvm#75312).

We observed significant OOM/timeout issues due to llvm#74670 to quite a few
services including google-research/swirl-lm. The follow-up llvm#75218 and
 llvm#75312 do not address the issue. Perhaps this is worth more
investigation.
MaskRay added a commit that referenced this pull request Dec 13, 2023
This reverts commit 87e2e89.
and its follow-ups 0d1490f (#75218)
and 6fe3cd5 (#75312).

We observed significant OOM/timeout issues due to #74670 to quite a few
services including google-research/swirl-lm. The follow-up #75218 and
 #75312 do not address the issue. Perhaps this is worth more
investigation.
zyx-billy added a commit that referenced this pull request Dec 21, 2023
Follow up to the discussion from #75258, and serves as an alternate
solution for #74670.

Set the location to Unknown for deduplicated / moved / materialized
constants by OperationFolder. This makes sure that the folded constants
don't end up with an arbitrary location of one of the original ops that
became it, and that hoisted ops don't confuse the stepping order.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:core MLIR Core Infrastructure mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants