[mlir][func] Remove func-bufferize pass #114152

Merged
270 changes: 2 additions & 268 deletions mlir/docs/Bufferization.md
@@ -23,11 +23,6 @@ the resulting `memref` IR has no memory leaks.

## Deprecated Passes

The old dialect conversion-based bufferization passes have been deprecated and
should not be used anymore. Most of those passes have already been removed from
MLIR. One-Shot Bufferize produces better bufferization results with fewer
memory allocations and buffer copies.

The buffer deallocation pass has been deprecated in favor of the ownership-based
buffer deallocation pipeline. The deprecated pass has some limitations that may
cause memory leaks in the resulting IR.
@@ -276,18 +271,13 @@ semantics (i.e., tensor result or tensor operand) that is not bufferizable
`to_memref`/`to_tensor` ops around the bufferization boundary.

One-Shot Bufferize can be configured to bufferize only ops from a set of
dialects with `dialect-filter`. This can be useful for gradually migrating from
dialect conversion-based bufferization to One-Shot Bufferize. One-Shot Bufferize
must run first in such a case, because dialect conversion-based bufferization
generates `to_tensor` ops without the `restrict` unit attribute, which One-Shot
Bufferize cannot analyze.
dialects with `dialect-filter`.

One-Shot Bufferize can also be called programmatically with
[`bufferization::runOneShotBufferize`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h#L167).
Alternatively,
[`bufferization::bufferizeOp`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/Bufferize.h#L78)
skips the analysis and inserts a copy on every buffer write, just like the
dialect conversion-based bufferization.
skips the analysis and inserts a copy on every buffer write.
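
The difference between copying on every write and copying only when the analysis finds a conflict can be illustrated with a toy model. The sketch below is not MLIR code: `ToyTensorOp`, `copiesWithoutAnalysis`, and `copiesWithAnalysis` are hypothetical names, and the conflict rule (a write needs a copy only if its operand is used again later) is a deliberately simplified assumption, not the actual One-Shot Bufferize analysis.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an op in a straight-line program (hypothetical, not MLIR).
struct ToyTensorOp {
  bool isWrite; // true: updates `operand`; false: only reads it
  int operand;  // id of the tensor value the op consumes
};

// Without analysis: every write conservatively copies its operand first.
int copiesWithoutAnalysis(const std::vector<ToyTensorOp> &ops) {
  int copies = 0;
  for (const ToyTensorOp &op : ops)
    if (op.isWrite)
      ++copies;
  return copies;
}

// With (simplified) analysis: a write copies only if the value it overwrites
// is still used by a later op; otherwise it can reuse the buffer in place.
int copiesWithAnalysis(const std::vector<ToyTensorOp> &ops) {
  int copies = 0;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (!ops[i].isWrite)
      continue;
    for (std::size_t j = i + 1; j < ops.size(); ++j) {
      if (ops[j].operand == ops[i].operand) {
        ++copies; // the old value is needed later, so it must be preserved
        break;
      }
    }
  }
  return copies;
}
```

In this model, a program with three writes where only the first overwritten value is read again needs three copies without analysis and one copy with it.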

By default, function boundaries are not bufferized. This is because there are
currently limitations around function graph bufferization: recursive
@@ -484,259 +474,3 @@ conflict detection algorithm, interested users may want to refer to:
* [Original design document](https://discourse.llvm.org/uploads/short-url/5kckJ3DftYwQokG252teFgw3sYa.pdf)
* [ODM talk](https://youtu.be/TXEo59CYS9A), ([slides](https://mlir.llvm.org/OpenMeetings/2022-01-13-One-Shot-Bufferization.pdf)).
* [LLVM Dev Meeting 2023 tutorial slides](https://m-sp.org/downloads/llvm_dev_2023.pdf)

## Migrating from Dialect Conversion-based Bufferization

Both dialect conversion-based bufferization and One-Shot Bufferize generate
`to_tensor`/`to_memref` ops at the bufferization boundary (when run with
`allow-unknown-ops`). They can be combined and run in sequence. However,
One-Shot Bufferize must run first because it cannot analyze those boundary ops.
To update existing code step-by-step, it may be useful to specify a dialect
filter for One-Shot Bufferize, so that dialects can be switched over one-by-one.

## Dialect Conversion-based Bufferization

Disclaimer: Most dialect conversion-based bufferization has been migrated to
One-Shot Bufferize. New users should use One-Shot Bufferize (with or without
analysis). The following documentation is only for existing users of dialect
conversion-based bufferization.

This system is a simple application of MLIR's dialect conversion infrastructure.
The bulk of the code related to bufferization is a set of ordinary
`ConversionPattern`'s that dialect authors write for converting ops that operate
on `tensor`'s to ops that operate on `memref`'s. A set of conventions and best
practices are followed that allow these patterns to be run across multiple
independent passes (rather than requiring a single huge atomic conversion pass),
which makes the compilation pipelines scalable, robust, and easy to debug.

This document is targeted at people looking to utilize MLIR's bufferization
functionality, along with people who want to extend it to cover their own ops.

<a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
Infrastructure"
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
That talk gives a high-level overview of the bufferization infrastructure and
important conceptual details related to using the MLIR dialect conversion
infrastructure.

### Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated,
nor does it do anything particularly intelligent with the placement of buffers
w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
of:

1. Bufferization
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
`promote-buffers-to-stack`, which do optimizations that are only exposed
after bufferization.
1. Finally, running the [ownership-based buffer deallocation](OwnershipBasedBufferDeallocation.md)
pass.

After buffer deallocation has been completed, the program will be quite
difficult to transform due to the presence of the deallocation ops. Thus, other
optimizations such as linalg fusion on memrefs should be done before that stage.

### General structure of the bufferization process

Bufferization consists of running multiple *partial* bufferization passes,
followed by one *finalizing* bufferization pass.

There is typically one partial bufferization pass per dialect (though other
subdivisions are possible). For example, for a dialect `X` there will typically
be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
By running pass `X-bufferize` for each dialect `X` in the program, all the ops
in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been
bufferized. These passes will create *materializations* (also sometimes called
"casts") that convert between the `tensor` and `memref` type, which allows
bridging between ops that have been bufferized and ops that have not yet been
bufferized.

Finalizing bufferizations complete the bufferization process, and guarantee that
there are no tensors remaining in the program. This involves eliminating the
materializations. The pass `finalizing-bufferize` provides a minimal pass that
only eliminates materializations and issues an error if any unbufferized ops
exist in the program.

However, it is possible for a finalizing bufferization to do more than just
eliminate materializations. By adding patterns (just as a partial bufferization
would), it is possible for a finalizing bufferization pass to simultaneously
bufferize ops and eliminate materializations. This has a number of disadvantages
discussed in the talk and should generally be avoided.
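
The division of labor between partial and finalizing passes can be sketched with a toy model. This is not MLIR code: `ToyOp`, `partialBufferize`, and `finalizingBufferize` are hypothetical names, and the model leaves out materializations entirely. It only shows that each partial pass handles one dialect, and that the finalizing step reports the first op that no partial pass converted.

```c++
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an op (hypothetical, not an MLIR type).
struct ToyOp {
  std::string dialect; // dialect that owns the op
  std::string name;    // op name within the dialect
  bool bufferized;     // false while the op still operates on tensors
};

// Partial bufferization: convert only the ops of one dialect and leave all
// other ops untouched. Returns the number of ops converted.
int partialBufferize(std::vector<ToyOp> &program, const std::string &dialect) {
  int converted = 0;
  for (ToyOp &op : program) {
    if (op.dialect == dialect && !op.bufferized) {
      op.bufferized = true;
      ++converted;
    }
  }
  return converted;
}

// Finalizing bufferization: succeeds only if no tensor op remains.
// Returns an empty string on success, otherwise the qualified name of the
// first unbufferized op so it can be reported as an error.
std::string finalizingBufferize(const std::vector<ToyOp> &program) {
  for (const ToyOp &op : program)
    if (!op.bufferized)
      return op.dialect + "." + op.name;
  return "";
}
```

Running `partialBufferize` for `tensor` and `linalg` on a program that also contains a `tcp` op leaves `finalizingBufferize` reporting that `tcp` op, which mirrors why every dialect in the program needs its own partial pass before finalization.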

### Example

As a concrete example, we will look at the bufferization pipeline from the
`mlir-npcomp` reference backend
([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
The code, slightly simplified and annotated, is reproduced here:

```c++
// Partial bufferization passes.
pm.addPass(createTensorConstantBufferizePass());
pm.addNestedPass<func::FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
pm.addNestedPass<func::FuncOp>(createLinalgBufferizePass());
pm.addNestedPass<func::FuncOp>(createTensorBufferizePass());
pm.addPass(createFuncBufferizePass());

// Finalizing bufferization pass.
pm.addNestedPass<func::FuncOp>(createFinalizingBufferizePass());
```

Looking first at the partial bufferization passes, we see that there are a
sequence of `FuncOp` passes (which run in parallel on functions). These function
passes are bracketed by `arith-bufferize` and `func-bufferize`, which are module
passes (and thus serialize the parallel compilation process). These two passes
must be module passes because they make changes to the top-level module.

The bulk of the bufferization work is done by the function passes. Most of these
passes are provided as part of the upstream MLIR distribution and bufferize
their respective dialects (e.g. `abc-bufferize` bufferizes the `abc` dialect).
The `tcp-bufferize` pass is an exception -- it is a partial bufferization pass
used to bufferize the downstream `tcp` dialect, and fits in perfectly with all
the other passes provided upstream.

The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
backend has arranged that all ops are bufferized by partial bufferizations, so
that the upstream `finalizing-bufferize` pass can be used as the finalizing
bufferization pass. This gives excellent diagnostics when something goes wrong
with the bufferization process, such as due to an op that wasn't handled by any
pattern.

### How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds
of ops, customizable by a ConversionTarget) get bufferized.

A partial bufferization pass is just a pass that uses the
[dialect conversion](DialectConversion.md) framework to apply
`ConversionPattern`s with a `tensor` to `memref` type conversion.

To describe how to write such a pass, we will walk through an example, the
`tensor-bufferize` pass
([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
[test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
that bufferizes the `tensor` dialect. Note that these passes have been replaced
with a `BufferizableOpInterface`-based implementation in the meantime, so we
have to take a look at an older version of the code.

The bulk of the code in the pass will be a set of conversion patterns, with a
simple example being
[BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).

```c++
class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, OpAdaptor adaptor,
                  ConversionPatternRewriter &rewriter) const override {
    auto resultType = getTypeConverter()->convertType(op.getType());
    rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, adaptor.source());
    return success();
  }
};
```

See [the talk](#the-talk) for more details on how to write these patterns.

The
[pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
is very small, and follows the basic pattern of any dialect conversion pass.

```c++
void mlir::populateTensorBufferizePatterns(
    const BufferizeTypeConverter &typeConverter, RewritePatternSet &patterns) {
  patterns.add<BufferizeCastOp, BufferizeExtractOp>(typeConverter,
                                                    patterns.getContext());
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnOperation() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    RewritePatternSet patterns(context);
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(typeConverter, patterns);
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    target.addLegalDialect<func::FuncDialect>();

    if (failed(
            applyPartialConversion(getOperation(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```

The pass has all the hallmarks of a dialect conversion pass that does type
conversions: a `TypeConverter`, a `RewritePatternSet`, and a `ConversionTarget`,
and a call to `applyPartialConversion`. Note that a function
`populateTensorBufferizePatterns` is separated, so that power users can use the
patterns independently, if necessary (such as to combine multiple sets of
conversion patterns into a single conversion call, for performance).

One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

In this case, the `BufferizationOpsDialect` is marked as legal, so the
`bufferization.to_tensor` and `bufferization.to_memref` ops, which are inserted
automatically by the dialect conversion framework as materializations, are
legal. There is a helper `populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
which helps with this in general.

### Other partial bufferization examples

- `func-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/2f5715dc78328215d51d5664c72c632a6dac1046/mlir/lib/Dialect/Func/Transforms/FuncBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/2f5715dc78328215d51d5664c72c632a6dac1046/mlir/test/Dialect/Func/func-bufferize.mlir#L1))

  - Bufferizes `func`, `call`, and `BranchOpInterface` ops.
  - This is an example of how to bufferize ops that have multi-block
    regions.
  - This is an example of a pass that is not split along dialect
    subdivisions.

### How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone
from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the
`bufferization.to_tensor` / `bufferization.to_memref` materialization ops
inserted by partial bufferization passes and emits an error if that is not
sufficient to remove all tensors from the program.

This pass is sufficient when partial bufferization passes have bufferized all
the ops in the program, leaving behind only the materializations. When possible,
it is recommended to structure your pass pipeline this way, as this has the
significant advantage that if an op does not get bufferized (due to a missing
pattern, bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
error, and the IR seen by `finalizing-bufferize` will contain only one
unbufferized op.

However, before the current bufferization infrastructure was put in place,
bufferization could only be done as a single finalizing bufferization mega-pass
that used the `populate*BufferizePatterns` functions from multiple dialects to
simultaneously bufferize everything at once. Thus, one might see code in
downstream projects structured this way. This structure is not recommended in
new code. A helper, `populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58))
is available for such passes to provide patterns that eliminate
`bufferization.to_tensor` and `bufferization.to_memref`.

### Changes since [the talk](#the-talk)

- `func-bufferize` was changed to be a partial conversion pass, and there is a
new `finalizing-bufferize` which serves as a general finalizing
bufferization pass.
- Most partial bufferization passes have been reimplemented in terms of
`BufferizableOpInterface`. New users should use One-Shot Bufferize instead
of dialect conversion-based bufferization.
3 changes: 0 additions & 3 deletions mlir/include/mlir/Dialect/Func/Transforms/Passes.h
@@ -29,9 +29,6 @@ namespace func {
#define GEN_PASS_DECL
#include "mlir/Dialect/Func/Transforms/Passes.h.inc"

/// Creates an instance of func bufferization pass.
std::unique_ptr<Pass> createFuncBufferizePass();

/// Pass to deduplicate functions.
std::unique_ptr<Pass> createDuplicateFunctionEliminationPass();

29 changes: 0 additions & 29 deletions mlir/include/mlir/Dialect/Func/Transforms/Passes.td
@@ -11,35 +11,6 @@

include "mlir/Pass/PassBase.td"

def FuncBufferize : Pass<"func-bufferize", "ModuleOp"> {
let summary = "Bufferize func/call/return ops";
let description = [{
A bufferize pass that bufferizes func.func and func.call ops.

Because this pass updates func.func ops, it must be a module pass. It is
useful to keep this pass separate from other bufferizations so that the
other ones can be run at function-level in parallel.

This pass must be done atomically because it changes func op signatures,
which requires atomically updating calls as well throughout the entire
module.

This pass also changes the type of block arguments, which requires that all
successor arguments of predecessors be converted. This is achieved by
rewriting terminators based on the information provided by the
`BranchOpInterface`.
As this pass rewrites function operations, it also rewrites the
corresponding return operations. Other return-like operations that
implement the `ReturnLike` trait are not rewritten in general, as they
require that the corresponding parent operation is also rewritten.
Finally, this pass fails for unknown terminators, as we cannot decide
whether they need rewriting.
}];
let constructor = "mlir::func::createFuncBufferizePass()";
let dependentDialects = ["bufferization::BufferizationDialect",
"memref::MemRefDialect"];
}

def DuplicateFunctionEliminationPass : Pass<"duplicate-function-elimination",
"ModuleOp"> {
let summary = "Deduplicate functions";
3 changes: 0 additions & 3 deletions mlir/lib/Dialect/Func/Transforms/CMakeLists.txt
@@ -1,7 +1,6 @@
add_mlir_dialect_library(MLIRFuncTransforms
DecomposeCallGraphTypes.cpp
DuplicateFunctionElimination.cpp
FuncBufferize.cpp
FuncConversions.cpp
OneToNFuncConversions.cpp

@@ -12,8 +11,6 @@ add_mlir_dialect_library(MLIRFuncTransforms
MLIRFuncTransformsIncGen

LINK_LIBS PUBLIC
MLIRBufferizationDialect
MLIRBufferizationTransforms
MLIRFuncDialect
MLIRIR
MLIRMemRefDialect