[MLIR] Fuse parent region location when hoisting constants #75258
@llvm/pr-subscribers-mlir-core @llvm/pr-subscribers-mlir

Author: Billy Zhu (zyx-billy)

Changes

Followup on #74670. When the operation folder hoists constants, the constant op that is moved (or materialized elsewhere) should have its location fused with the location of the parent region of its insertion point. This reflects the new meaning of the constant op: it is created at the front of the region so that any op in the body can use it if it needs such a constant. This is motivated by the same reason as a similar change in LLVM (https://reviews.llvm.org/D38088): keeping the original location will cause stepping in a debugger to be out of order, because the hoisted constant is stepped onto first, at its original source line, before the source lines that precede it in the function.
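A minimal hypothetical sketch of the situation (the function name, values, and locations below are illustrative only, not taken from the patch): a constant defined inside a loop body is hoisted to the front of the function's entry block.

```mlir
// Illustrative input: the constant nominally lives on "line 2", inside the loop.
func.func @foo(%arg0: memref<8xi32>) {
  affine.for %i = 0 to 8 {
    %c42 = arith.constant 42 : i32 loc("foo":2:0)
    memref.store %c42, %arg0[%i] : memref<8xi32> loc("foo":3:0)
  } loc("foo":1:0)
  return
} loc("foo":0:0)

// After hoisting, the constant sits at the top of the entry block. If it kept
// loc("foo":2:0), a debugger would step onto line 2 before line 1. With this
// change its location is instead fused with the function location:
//   %c42 = arith.constant 42 : i32 loc(fused<"CSE">["foo":2:0, "foo":0:0])
```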
This implies that whatever code initializes the constant at the front of this function cannot be considered solely the result of the constant's original source line. In LLVM, the solution was to merge in the location of the insertion-point instruction; in this case, it returns a location corresponding to the parent scope (which is this function). For us, since MLIR allows carrying around multiple locations as a result of optimizations, we can take advantage of that and just fuse in the location of the parent region. This will make the hoisted ops look similar to other source-dependent setup code that may already exist at the front of a function, in that they also carry the location of the function. Backends will then be able to differentiate what belongs in the prologue of a function and instruct the debugger accordingly. Please LMK if anything about this isn't clear. Thanks!

Full diff: https://github.com/llvm/llvm-project/pull/75258.diff

4 Files Affected:
diff --git a/mlir/include/mlir/Transforms/FoldUtils.h b/mlir/include/mlir/Transforms/FoldUtils.h
index 28fa18cf942de4..999b4774eae996 100644
--- a/mlir/include/mlir/Transforms/FoldUtils.h
+++ b/mlir/include/mlir/Transforms/FoldUtils.h
@@ -96,10 +96,12 @@ class OperationFolder {
Dialect *dialect, Attribute value,
Type type, Location loc);
- // Fuse `foldedLocation` into the Location of `retainedOp`. This will result
- // in `retainedOp` having a FusedLoc with `fusedLocationTag` to help trace the
- // source of the fusion. If `retainedOp` already had a FusedLoc with the same
- // tag, `foldedLocation` will simply be appended to it.
+ // Fuse `foldedLocation` into `originalLocation`. This will result in a
+ // FusedLoc with `fusedLocationTag` to help trace the source of the fusion.
+ // If `originalLocation` already had a FusedLoc with the same tag,
+ // `foldedLocation` will simply be appended to it.
+ Location getFusedLocation(Location originalLocation, Location foldedLocation);
+ // Update the location of `retainedOp` by applying `getFusedLocation`.
void appendFoldedLocation(Operation *retainedOp, Location foldedLocation);
/// Tag for annotating fused locations as a result of merging constants.
diff --git a/mlir/lib/Transforms/Utils/FoldUtils.cpp b/mlir/lib/Transforms/Utils/FoldUtils.cpp
index 056a681718e121..b2afe08430fc4d 100644
--- a/mlir/lib/Transforms/Utils/FoldUtils.cpp
+++ b/mlir/lib/Transforms/Utils/FoldUtils.cpp
@@ -152,8 +152,10 @@ bool OperationFolder::insertKnownConstant(Operation *op, Attribute constValue) {
// anything. Otherwise, we move the constant to the insertion block.
Block *insertBlock = &insertRegion->front();
if (opBlock != insertBlock || (&insertBlock->front() != op &&
- !isFolderOwnedConstant(op->getPrevNode())))
+ !isFolderOwnedConstant(op->getPrevNode()))) {
op->moveBefore(&insertBlock->front());
+ appendFoldedLocation(op, insertBlock->getParent()->getLoc());
+ }
folderConstOp = op;
referencedDialects[op].push_back(op->getDialect());
@@ -237,6 +239,7 @@ OperationFolder::processFoldResults(Operation *op,
auto *insertRegion = getInsertionRegion(interfaces, op->getBlock());
auto &entry = insertRegion->front();
rewriter.setInsertionPoint(&entry, entry.begin());
+ Location loc = getFusedLocation(op->getLoc(), insertRegion->getLoc());
// Get the constant map for the insertion region of this operation.
auto &uniquedConstants = foldScopes[insertRegion];
@@ -259,8 +262,8 @@ OperationFolder::processFoldResults(Operation *op,
// Check to see if there is a canonicalized version of this constant.
auto res = op->getResult(i);
Attribute attrRepl = foldResults[i].get<Attribute>();
- if (auto *constOp = tryGetOrCreateConstant(
- uniquedConstants, dialect, attrRepl, res.getType(), op->getLoc())) {
+ if (auto *constOp = tryGetOrCreateConstant(uniquedConstants, dialect,
+ attrRepl, res.getType(), loc)) {
// Ensure that this constant dominates the operation we are replacing it
// with. This may not automatically happen if the operation being folded
// was inserted before the constant within the insertion block.
@@ -364,31 +367,36 @@ static Location FlattenFusedLocationRecursively(const Location loc) {
return loc;
}
-void OperationFolder::appendFoldedLocation(Operation *retainedOp,
+Location OperationFolder::getFusedLocation(Location originalLocation,
Location foldedLocation) {
+ // If they're already equal, no need to fuse.
+ if (originalLocation == foldedLocation)
+ return originalLocation;
+
// Append into existing fused location if it has the same tag.
if (auto existingFusedLoc =
- dyn_cast<FusedLocWith<StringAttr>>(retainedOp->getLoc())) {
+ dyn_cast<FusedLocWith<StringAttr>>(originalLocation)) {
StringAttr existingMetadata = existingFusedLoc.getMetadata();
if (existingMetadata == fusedLocationTag) {
ArrayRef<Location> existingLocations = existingFusedLoc.getLocations();
SetVector<Location> locations(existingLocations.begin(),
existingLocations.end());
locations.insert(foldedLocation);
- Location newFusedLoc = FusedLoc::get(
- retainedOp->getContext(), locations.takeVector(), existingMetadata);
- retainedOp->setLoc(FlattenFusedLocationRecursively(newFusedLoc));
- return;
+ Location newFusedLoc =
+ FusedLoc::get(originalLocation->getContext(), locations.takeVector(),
+ existingMetadata);
+ return FlattenFusedLocationRecursively(newFusedLoc);
}
}
// Create a new fusedloc with retainedOp's loc and foldedLocation.
- // If they're already equal, no need to fuse.
- if (retainedOp->getLoc() == foldedLocation)
- return;
-
Location newFusedLoc =
- FusedLoc::get(retainedOp->getContext(),
- {retainedOp->getLoc(), foldedLocation}, fusedLocationTag);
- retainedOp->setLoc(FlattenFusedLocationRecursively(newFusedLoc));
+ FusedLoc::get(originalLocation->getContext(),
+ {originalLocation, foldedLocation}, fusedLocationTag);
+ return FlattenFusedLocationRecursively(newFusedLoc);
+}
+
+void OperationFolder::appendFoldedLocation(Operation *retainedOp,
+ Location foldedLocation) {
+ retainedOp->setLoc(getFusedLocation(retainedOp->getLoc(), foldedLocation));
}
diff --git a/mlir/test/Transforms/canonicalize-debuginfo.mlir b/mlir/test/Transforms/canonicalize-debuginfo.mlir
index 217cc29c0095e2..8b726eabb4c2e8 100644
--- a/mlir/test/Transforms/canonicalize-debuginfo.mlir
+++ b/mlir/test/Transforms/canonicalize-debuginfo.mlir
@@ -33,9 +33,31 @@ func.func @hoist_constant(%arg0: memref<8xi32>) {
memref.store %0, %arg0[%arg1] : memref<8xi32>
memref.store %1, %arg0[%arg1] : memref<8xi32>
}
+ // CHECK: return
return
-}
+// CHECK-NEXT: } loc(#[[LocFunc:.*]])
+} loc("hoist_constant":2:0)
// CHECK-DAG: #[[LocConst0:.*]] = loc("hoist_constant":0:0)
// CHECK-DAG: #[[LocConst1:.*]] = loc("hoist_constant":1:0)
-// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst0]], #[[LocConst1]]])
+// CHECK-DAG: #[[LocFunc]] = loc("hoist_constant":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst0]], #[[LocFunc]], #[[LocConst1]]])
+
+// -----
+
+// CHECK-LABEL: func @hoist_constant_simple
+func.func @hoist_constant_simple(%arg0: memref<8xi32>) -> i32 {
+ // CHECK-NEXT: arith.constant 88 : i32 loc(#[[FusedLoc:.*]])
+ %0 = arith.constant 42 : i32 loc("hoist_constant_simple":0:0)
+ %1 = arith.constant 0 : index
+ memref.store %0, %arg0[%1] : memref<8xi32>
+
+ %2 = arith.constant 88 : i32 loc("hoist_constant_simple":1:0)
+
+ return %2 : i32
+// CHECK: } loc(#[[LocFunc:.*]])
+} loc("hoist_constant_simple":2:0)
+
+// CHECK-DAG: #[[LocConst1:.*]] = loc("hoist_constant_simple":1:0)
+// CHECK-DAG: #[[LocFunc]] = loc("hoist_constant_simple":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocFunc]]])
diff --git a/mlir/test/Transforms/constant-fold-debuginfo.mlir b/mlir/test/Transforms/constant-fold-debuginfo.mlir
index 79a25f860a4841..7e5ffc3c4d5b13 100644
--- a/mlir/test/Transforms/constant-fold-debuginfo.mlir
+++ b/mlir/test/Transforms/constant-fold-debuginfo.mlir
@@ -11,11 +11,13 @@ func.func @fold_and_merge() -> (i32, i32) {
%3 = arith.constant 6 : i32 loc("fold_and_merge":1:0)
return %2, %3: i32, i32
-}
+// CHECK: } loc(#[[LocFunc:.*]])
+} loc("fold_and_merge":2:0)
// CHECK-DAG: #[[LocConst0:.*]] = loc("fold_and_merge":0:0)
// CHECK-DAG: #[[LocConst1:.*]] = loc("fold_and_merge":1:0)
-// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocConst0]]])
+// CHECK-DAG: #[[LocFunc]] = loc("fold_and_merge":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocFunc]], #[[LocConst0]]])
// -----
@@ -27,8 +29,30 @@ func.func @materialize_different_dialect() -> (f32, f32) {
%2 = arith.constant 1.0 : f32 loc("materialize_different_dialect":1:0)
return %1, %2: f32, f32
-}
+// CHECK: } loc(#[[LocFunc:.*]])
+} loc("materialize_different_dialect":2:0)
// CHECK-DAG: #[[LocConst0:.*]] = loc("materialize_different_dialect":0:0)
// CHECK-DAG: #[[LocConst1:.*]] = loc("materialize_different_dialect":1:0)
-// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocConst0]]])
+// CHECK-DAG: #[[LocFunc]] = loc("materialize_different_dialect":2:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst1]], #[[LocFunc]], #[[LocConst0]]])
+
+// -----
+
+// CHECK-LABEL: func @materialize_in_front
+func.func @materialize_in_front(%arg0: memref<8xi32>) {
+ // CHECK-NEXT: arith.constant 6 : i32 loc(#[[FusedLoc:.*]])
+ affine.for %arg1 = 0 to 8 {
+ %1 = arith.constant 1 : i32
+ %2 = arith.constant 5 : i32
+ %3 = arith.addi %1, %2 : i32 loc("materialize_in_front":0:0)
+ memref.store %3, %arg0[%arg1] : memref<8xi32>
+ }
+ // CHECK: return
+ return
+// CHECK-NEXT: } loc(#[[LocFunc:.*]])
+} loc("materialize_in_front":1:0)
+
+// CHECK-DAG: #[[LocConst0:.*]] = loc("materialize_in_front":0:0)
+// CHECK-DAG: #[[LocFunc]] = loc("materialize_in_front":1:0)
+// CHECK: #[[FusedLoc]] = loc(fused<"CSE">[#[[LocConst0]], #[[LocFunc]]])
Why is this different from other CSE cases (or other code motion)?
Is this description actually correct? Or is it taking advantage of the fact that we always materialize at the beginning of the entry block of a region? Another question: what if there are block arguments with locations that themselves have a lowering that will insert operations at the beginning of the block? Aren't you creating a worse problem?
Oh, do you mean other CSE or code motion in general? I don't think this one's special. I'm just starting with this case, and if we get to an agreed practice for managing locations during code motion & CSE, we can apply it to other optimizations. Compared to the previous PR, the difference is that this one handles moving & materializing ops, while the previous one only handled deduplicating ops. It's likely these happen together, but they can also occur individually (the tests added in this PR check that).
Right, this is only correct because we move to / materialize at the beginning of the entry block in this optimization. Other patterns would need to be considered separately.
If we settle on the convention that a "prologue" of a region (all the user-invisible/compiler-generated setup instructions) should carry the location of the parent region, then the lowerings for the block arguments should also make sure to use the parent region location (if the arguments don't already carry the parent location).
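A hypothetical sketch of that convention (names and locations are illustrative, and no specific lowering is implied): the block arguments keep their own source locations, while any setup ops a lowering inserts at the front of the block would take the parent region's location.

```mlir
func.func @callee(%arg0: i32 loc("caller":3:4), %arg1: i32 loc("caller":3:9)) -> i32 {
  // Any compiler-generated setup ops materialized here for the block arguments
  // would carry the function location "callee":1:0 rather than the argument
  // locations, so they read as prologue code.
  %sum = arith.addi %arg0, %arg1 : i32 loc("callee":2:0)
  return %sum : i32 loc("callee":3:0)
} loc("callee":1:0)
```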
So I saw this comment from the previous PR warning about OOM / timeouts, and I wonder if we need to make a tradeoff between tracking everything & efficiency. Would you be open to doing something lossy, like what was done in LLVM, where whenever we move / materialize / deduplicate constants, we just set the location to the parent location (and throw away the original locations)? That achieves the same ultimate effect (clearer op location meaning & no out-of-order stepping) without dragging all the original source locations along. While it may feel weird throwing away locations, that's essentially what was done before the previous PR (except for an arbitrary one that was kept).
If we get into such a tradeoff, we could just as well be dropping the location entirely, from what I can tell.
Actually... yeah, that works too 😂. What do you say we just drop locations entirely then? That seems like the cleanest solution so far.
Seems like a reasonable tradeoff all things considered? In particular for constants... |
OK. Will open a new PR with the new solution 🙏.
Follow-up to the discussion from #75258, and serves as an alternate solution for #74670. Set the location to Unknown for constants deduplicated, moved, or materialized by OperationFolder. This makes sure that folded constants don't end up with the arbitrary location of one of the original ops that became them, and that hoisted ops don't confuse the stepping order.
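A minimal hypothetical sketch of that alternate solution (the function name, values, and locations are illustrative only): two identical constants are deduplicated, and the retained op is given an unknown location instead of a fused one.

```mlir
// Illustrative input: two identical constants with distinct source locations.
func.func @f() -> (i32, i32) {
  %0 = arith.constant 42 : i32 loc("f":1:0)
  %1 = arith.constant 42 : i32 loc("f":2:0)
  return %0, %1 : i32, i32
} loc("f":0:0)

// Under the follow-up approach, the single retained constant carries no source
// location at all, rather than fused<"CSE">["f":1:0, "f":2:0]:
//   %c42 = arith.constant 42 : i32 loc(unknown)
```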