[mlir][bufferization] Add an ownership based buffer deallocation pass #66337

Merged
merged 1 commit into from
Sep 14, 2023

604 changes: 604 additions & 0 deletions mlir/docs/Bufferization.md

Large diffs are not rendered by default.

@@ -121,6 +121,14 @@ class BufferPlacementTransformationBase {
Liveness liveness;
};

/// Compare two SSA values in a deterministic manner. Two block arguments are
/// ordered by argument number, block arguments are always less than operation
/// results, and operation results are ordered by the `isBeforeInBlock` order of
/// their defining operation.
struct ValueComparator {
bool operator()(const Value &lhs, const Value &rhs) const;
};

// Create a global op for the given tensor-valued constant in the program.
// Globals are created lazily at the top of the enclosing ModuleOp with pretty
// names. Duplicates are avoided.
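
The ordering rules documented for `ValueComparator` can be illustrated without MLIR. The following is a minimal sketch, assuming all values live in a single block; `ToyValue`, `ToyValueComparator`, and `sortedValues` are hypothetical names introduced only for illustration and are not part of this change.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for mlir::Value: either a block argument (identified
// by its argument number) or an operation result (identified by the position
// of its defining op in the block and by its result number).
struct ToyValue {
  bool isBlockArgument;
  unsigned argNumber;    // meaningful only for block arguments
  unsigned opPosition;   // meaningful only for operation results
  unsigned resultNumber; // meaningful only for operation results
};

// Toy comparator following the documented rules for values in one block.
struct ToyValueComparator {
  bool operator()(const ToyValue &lhs, const ToyValue &rhs) const {
    // Block arguments are always less than operation results.
    if (lhs.isBlockArgument != rhs.isBlockArgument)
      return lhs.isBlockArgument;
    // Two block arguments are ordered by argument number.
    if (lhs.isBlockArgument)
      return lhs.argNumber < rhs.argNumber;
    // Results of different ops follow the order of their defining ops.
    if (lhs.opPosition != rhs.opPosition)
      return lhs.opPosition < rhs.opPosition;
    // Results of the same op are ordered by result number.
    return lhs.resultNumber < rhs.resultNumber;
  }
};

// Returns the values in comparator order, independent of insertion order.
std::vector<ToyValue> sortedValues(const std::vector<ToyValue> &values) {
  std::set<ToyValue, ToyValueComparator> ordered(values.begin(), values.end());
  return std::vector<ToyValue>(ordered.begin(), ordered.end());
}
```

Using the comparator as the ordering of a sorted container is one way to get an iteration order that does not depend on pointer values. Whether the pass uses it this way is not shown in this excerpt.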
9 changes: 9 additions & 0 deletions mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h
@@ -4,6 +4,7 @@
#include "mlir/Pass/Pass.h"

namespace mlir {
class FunctionOpInterface;
class ModuleOp;
class RewritePatternSet;
class OpBuilder;
@@ -27,6 +28,10 @@ struct OneShotBufferizationOptions;
/// buffers.
std::unique_ptr<Pass> createBufferDeallocationPass();

/// Creates an instance of the OwnershipBasedBufferDeallocation pass to free all
/// allocated buffers.
std::unique_ptr<Pass> createOwnershipBasedBufferDeallocationPass();

/// Creates a pass that optimizes `bufferization.dealloc` operations. For
/// example, it reduces the number of alias checks needed at runtime using
/// static alias analysis.
@@ -127,6 +132,10 @@ func::FuncOp buildDeallocationLibraryFunction(OpBuilder &builder, Location loc,
/// Run buffer deallocation.
LogicalResult deallocateBuffers(Operation *op);

/// Run ownership-based buffer deallocation.
LogicalResult deallocateBuffersOwnershipBased(FunctionOpInterface op,
bool privateFuncDynamicOwnership);

/// Creates a pass that moves allocations upwards to reduce the number of
/// required copies that are inserted during the BufferDeallocation pass.
std::unique_ptr<Pass> createBufferHoistingPass();
144 changes: 144 additions & 0 deletions mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
@@ -88,6 +88,150 @@ def BufferDeallocation : Pass<"buffer-deallocation", "func::FuncOp"> {
let constructor = "mlir::bufferization::createBufferDeallocationPass()";
}

def OwnershipBasedBufferDeallocation : Pass<
"ownership-based-buffer-deallocation", "func::FuncOp"> {
let summary = "Adds all required dealloc operations for all allocations in "
"the input program";
let description = [{
This pass implements an algorithm to automatically introduce all required
deallocation operations for all buffers in the input program. This ensures
that the resulting program does not have any memory leaks.

This pass operates on the level of operations implementing the
FunctionOpInterface. Such operations can take MemRefs as arguments and can
also return them. To ensure compatibility among all functions (including
external ones), the function boundary ABI listed below has to be enforced.
External functions are simply assumed to adhere to it. Functions whose
definition is available should ideally adhere to it already. If they do
not, all MemRef write operations in the input IR must dominate all MemRef
read operations in the input IR; the pass may then modify the input IR by
inserting `bufferization.clone` operations such that the output IR adheres
to the function boundary ABI:
* When a MemRef is passed as a function argument, ownership is never
acquired. It is always the caller's responsibility to deallocate such
MemRefs.
* Returning a MemRef from a function always passes ownership to the caller,
i.e., it is also the caller's responsibility to deallocate MemRefs
returned from a called function.
* A function must not return a MemRef with the same allocated base buffer as
one of its arguments (in this case a copy has to be created). Note that in
this context two subviews of the same buffer that don't overlap are also
considered to alias.

It is recommended to bufferize all operations first such that no tensor
values remain in the IR once this pass is applied. That way all allocated
MemRefs will be properly deallocated without any additional manual work.
Otherwise, the pass that bufferizes the remaining tensors is responsible for
adding the corresponding deallocation operations. Note that this pass does not
consider any values of tensor type and assumes that MemRef values defined by
`bufferization.to_memref` do not return ownership and do not have to be
deallocated. `bufferization.to_tensor` operations are handled similarly to
`bufferization.clone` operations with the exception that the result value is
not handled because it's a tensor (not a MemRef).

Input

```mlir
#map0 = affine_map<(d0) -> (d0)>
module {
func.func @condBranch(%arg0: i1,
%arg1: memref<2xf32>,
%arg2: memref<2xf32>) {
cf.cond_br %arg0, ^bb1, ^bb2
^bb1:
cf.br ^bb3(%arg1 : memref<2xf32>)
^bb2:
%0 = memref.alloc() : memref<2xf32>
linalg.generic {
args_in = 1 : i64,
args_out = 1 : i64,
indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]}
outs(%arg1, %0 : memref<2xf32>, memref<2xf32>) {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = math.exp %gen1_arg0 : f32
linalg.yield %tmp1, %gen1_arg1 : f32, f32
}
cf.br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):
"memref.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return
}
}
```

Output

```mlir
#map = affine_map<(d0) -> (d0)>
module {
func.func @condBranch(%arg0: i1,
%arg1: memref<2xf32>,
%arg2: memref<2xf32>) {
%false = arith.constant false
%true = arith.constant true
cf.cond_br %arg0, ^bb1, ^bb2
^bb1: // pred: ^bb0
cf.br ^bb3(%arg1, %false : memref<2xf32>, i1)
^bb2: // pred: ^bb0
%alloc = memref.alloc() : memref<2xf32>
linalg.generic {
indexing_maps = [#map, #map],
iterator_types = ["parallel"]}
outs(%arg1, %alloc : memref<2xf32>, memref<2xf32>)
attrs = {args_in = 1 : i64, args_out = 1 : i64} {
^bb0(%out: f32, %out_0: f32):
%2 = math.exp %out : f32
linalg.yield %2, %out_0 : f32, f32
}
cf.br ^bb3(%alloc, %true : memref<2xf32>, i1)
^bb3(%0: memref<2xf32>, %1: i1): // 2 preds: ^bb1, ^bb2
memref.copy %0, %arg2 : memref<2xf32> to memref<2xf32>
%base_buffer, %offset, %sizes, %strides =
memref.extract_strided_metadata %0 :
memref<2xf32> -> memref<f32>, index, index, index
bufferization.dealloc (%base_buffer : memref<f32>) if (%1)
return
}
}
```

The `private-function-dynamic-ownership` pass option allows the pass to add
additional arguments to private functions to dynamically give ownership of
MemRefs to callees. This can enable earlier deallocations and allows the
pass to bypass the function boundary ABI, potentially leading to fewer
MemRef clones being inserted. For example, the private function
```mlir
func.func private @passthrough(%memref: memref<2xi32>) -> memref<2xi32> {
return %memref : memref<2xi32>
}
```
would be converted to
```mlir
func.func private @passthrough(%memref: memref<2xi32>,
%ownership: i1) -> (memref<2xi32>, i1) {
return %memref, %ownership : memref<2xi32>, i1
}
```
which allows the returned MemRef to alias the MemRef passed as an argument
(this would otherwise be forbidden by the function boundary ABI).
}];
let options = [
Option<"privateFuncDynamicOwnership", "private-function-dynamic-ownership",
"bool", /*default=*/"false",
"Allows the pass to add additional arguments to private functions to "
"dynamically pass ownership of memrefs to callees. This can enable "
"earlier deallocations.">,
];
let constructor = "mlir::bufferization::createOwnershipBasedBufferDeallocationPass()";

let dependentDialects = [
"mlir::bufferization::BufferizationDialect", "mlir::arith::ArithDialect",
"mlir::memref::MemRefDialect", "mlir::scf::SCFDialect"
];
}
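
The Input/Output example in the pass description pairs each forwarded MemRef with an `i1` ownership indicator and deallocates only if that indicator is true. A minimal sketch of the same control flow in plain C++, assuming a simplified body for the `linalg.generic` op; `ToyHeap`, `OwnedRef`, and this `condBranch` function are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the heap; counts allocations so the result can be
// checked.
struct ToyHeap {
  int liveAllocations = 0;
  int totalAllocations = 0;
  float *alloc() {
    ++liveAllocations;
    ++totalAllocations;
    return new float[2];
  }
  void dealloc(float *buffer) {
    --liveAllocations;
    delete[] buffer;
  }
};

// Hypothetical pair modeling the (memref, i1) block arguments of ^bb3.
struct OwnedRef {
  float *buffer;
  bool owned; // true: the current function is responsible for freeing
};

void condBranch(bool cond, float *arg1, float *arg2, ToyHeap &heap) {
  OwnedRef ref{};
  if (cond) {
    // ^bb1: forward the caller-owned argument with ownership `false`.
    ref = OwnedRef{arg1, false};
  } else {
    // ^bb2: forward a fresh allocation with ownership `true`. Filling it with
    // exp(arg1) is a simplification of the linalg.generic body.
    float *fresh = heap.alloc();
    for (int i = 0; i < 2; ++i)
      fresh[i] = std::exp(arg1[i]);
    ref = OwnedRef{fresh, true};
  }
  assert(ref.buffer != nullptr);
  // ^bb3: models `memref.copy %0, %arg2`.
  for (int i = 0; i < 2; ++i)
    arg2[i] = ref.buffer[i];
  // ^bb3: models `bufferization.dealloc (...) if (%1)`.
  if (ref.owned)
    heap.dealloc(ref.buffer);
}
```

On either path no allocation outlives the function and the caller's buffers are never freed, which is the property the ownership indicator is there to guarantee.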

def BufferDeallocationSimplification :
Pass<"buffer-deallocation-simplification", "func::FuncOp"> {
let summary = "Optimizes `bufferization.dealloc` operation for more "
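
The effect of the `private-function-dynamic-ownership` option on the `@passthrough` example can be modeled in plain C++. This is a sketch under the assumption that a clone is a simple element-wise copy; `OwnedRef`, `passthroughWithAbi`, `passthroughDynamic`, and `release` are hypothetical names and not part of this change.

```cpp
#include <cassert>

// Hypothetical (memref, i1) pair returned by the toy functions below.
struct OwnedRef {
  int *buffer;
  bool owned; // true: whoever holds this value must free the buffer
};

// Without the option: the function boundary ABI forbids returning a buffer
// that aliases an argument, so a copy (modeling `bufferization.clone`) is
// returned and the caller owns it.
OwnedRef passthroughWithAbi(int *memref) {
  assert(memref != nullptr);
  int *clone = new int[2]{memref[0], memref[1]};
  return OwnedRef{clone, true};
}

// With the option: the ownership flag travels with the buffer, so the
// argument can be returned as-is without a clone.
OwnedRef passthroughDynamic(int *memref, bool ownership) {
  return OwnedRef{memref, ownership};
}

// Frees the buffer only if the flag says the holder owns it.
void release(OwnedRef ref) {
  if (ref.owned)
    delete[] ref.buffer;
}
```

The dynamic variant avoids the copy at the cost of one extra boolean argument and result, which matches the trade-off the option description mentions.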
59 changes: 59 additions & 0 deletions mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp
@@ -202,3 +202,62 @@ bufferization::getGlobalFor(arith::ConstantOp constantOp, uint64_t alignment,
global->moveBefore(&moduleOp.front());
return global;
}

//===----------------------------------------------------------------------===//
// ValueComparator
//===----------------------------------------------------------------------===//

bool ValueComparator::operator()(const Value &lhs, const Value &rhs) const {
if (lhs == rhs)
return false;

// Block arguments are less than results.
bool lhsIsBBArg = llvm::isa<BlockArgument>(lhs);
if (lhsIsBBArg != llvm::isa<BlockArgument>(rhs)) {
return lhsIsBBArg;
}

Region *lhsRegion;
Region *rhsRegion;
if (lhsIsBBArg) {
auto lhsBBArg = llvm::cast<BlockArgument>(lhs);
auto rhsBBArg = llvm::cast<BlockArgument>(rhs);
if (lhsBBArg.getArgNumber() != rhsBBArg.getArgNumber()) {
return lhsBBArg.getArgNumber() < rhsBBArg.getArgNumber();
}
lhsRegion = lhsBBArg.getParentRegion();
rhsRegion = rhsBBArg.getParentRegion();
assert(lhsRegion != rhsRegion &&
"lhsRegion == rhsRegion implies lhs == rhs");
} else if (lhs.getDefiningOp() == rhs.getDefiningOp()) {
return llvm::cast<OpResult>(lhs).getResultNumber() <
llvm::cast<OpResult>(rhs).getResultNumber();
} else {
lhsRegion = lhs.getDefiningOp()->getParentRegion();
rhsRegion = rhs.getDefiningOp()->getParentRegion();
if (lhsRegion == rhsRegion) {
return lhs.getDefiningOp()->isBeforeInBlock(rhs.getDefiningOp());
}
}

// lhsRegion != rhsRegion, so if we look at their ancestor chain, they
// - have different heights
// - or there's a spot where their region numbers differ
// - or their parent regions are the same and their parent ops are
// different.
while (lhsRegion && rhsRegion) {
if (lhsRegion->getRegionNumber() != rhsRegion->getRegionNumber()) {
return lhsRegion->getRegionNumber() < rhsRegion->getRegionNumber();
}
if (lhsRegion->getParentRegion() == rhsRegion->getParentRegion()) {
return lhsRegion->getParentOp()->isBeforeInBlock(
rhsRegion->getParentOp());
}
lhsRegion = lhsRegion->getParentRegion();
rhsRegion = rhsRegion->getParentRegion();
}
if (rhsRegion)
return true;
assert(lhsRegion && "this should only happen if lhs == rhs");
return false;
}
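
The final loop of `ValueComparator::operator()` walks the ancestor chains of two distinct regions in lockstep. A toy version of that walk, assuming parent ops of sibling regions sit in the same block and can be compared by an integer position; `ToyRegion` and `regionLess` are hypothetical names used only for illustration:

```cpp
#include <cassert>

// Hypothetical stand-in for mlir::Region.
struct ToyRegion {
  const ToyRegion *parent;   // enclosing region, nullptr at the top level
  unsigned regionNumber;     // index of this region within its parent op
  unsigned parentOpPosition; // position of the parent op within its block
};

// Orders two distinct regions by walking both ancestor chains in lockstep.
bool regionLess(const ToyRegion *lhs, const ToyRegion *rhs) {
  assert(lhs != rhs && "expects two distinct regions");
  while (lhs && rhs) {
    // Different region numbers decide the order immediately.
    if (lhs->regionNumber != rhs->regionNumber)
      return lhs->regionNumber < rhs->regionNumber;
    // Same enclosing region: order by the position of the parent ops.
    if (lhs->parent == rhs->parent)
      return lhs->parentOpPosition < rhs->parentOpPosition;
    lhs = lhs->parent;
    rhs = rhs->parent;
  }
  // If only the lhs chain ran out, lhs is the shallower region and sorts
  // first.
  return rhs != nullptr;
}
```

For example, with two regions of the same op, the one with the lower region number sorts first; with regions at different nesting depths and otherwise equal keys, the shallower one sorts first.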
2 changes: 2 additions & 0 deletions mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt
@@ -13,6 +13,7 @@ add_mlir_dialect_library(MLIRBufferizationTransforms
LowerDeallocations.cpp
OneShotAnalysis.cpp
OneShotModuleBufferize.cpp
OwnershipBasedBufferDeallocation.cpp
TensorCopyInsertion.cpp

ADDITIONAL_HEADER_DIRS
@@ -34,6 +35,7 @@ add_mlir_dialect_library(MLIRBufferizationTransforms
MLIRPass
MLIRTensorDialect
MLIRSCFDialect
MLIRControlFlowDialect
MLIRSideEffectInterfaces
MLIRTransforms
MLIRViewLikeInterface