[mlir][doc] Improve Destination-passing-style documentation #70283

Merged (4 commits) on Oct 26, 2023
46 changes: 32 additions & 14 deletions mlir/docs/Bufferization.md
bufferization strategy would be unacceptable for high-performance codegen. When
choosing an already existing buffer, we must be careful not to accidentally
overwrite data that is still needed later in the program.

To simplify this problem, One-Shot Bufferize was designed to take advantage of
*destination-passing style*. This form exists independently of bufferization
and is tied to SSA semantics: many ops "update" part of an input SSA value.
For example, the LLVM instruction
[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
inserts an element into a vector. Since SSA values are immutable, the
operation returns a copy of the input vector with the element inserted.
Another example in MLIR is `linalg.generic`, which always has an extra `outs`
operand that provides the initial values to update (for example, when the
operation performs a reduction).

> **Member:** I think we don't currently generate web docs for this, but we
> could link to this:
> https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Interfaces/DestinationStyleOpInterface.td
>
> **Collaborator (Author):** Indeed!
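To make the `outs` mechanism concrete, here is a minimal sketch (not taken
from the PR; `%t`, `%init`, and the indexing maps are illustrative) of a
`linalg.generic` reduction whose `outs` operand carries the initial value:

```mlir
// Sum a 1-D tensor into a rank-0 accumulator. The `outs` operand %init
// supplies the initial value of the reduction; since tensors are SSA
// values, the op returns a new tensor instead of mutating %init.
#id = affine_map<(i) -> (i)>
#scalar = affine_map<(i) -> ()>
%sum = linalg.generic {indexing_maps = [#id, #scalar],
                       iterator_types = ["reduction"]}
    ins(%t : tensor<?xf32>) outs(%init : tensor<f32>) {
  ^bb0(%x: f32, %acc: f32):
    %0 = arith.addf %x, %acc : f32
    linalg.yield %0 : f32
} -> tensor<f32>
```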

In the following, this input is referred to as the "destination" (the quotes
are important: this operand is not modified in place but copied). The
"destination" comes into play during bufferization as a possible "anchor" for
the bufferization algorithm: by carefully choosing the SSA value used as the
"destination", the user can shape the input into a form that guarantees a
close-to-optimal bufferization result.

For every tensor result, a destination-passing style op has a corresponding
tensor operand. If there are no other uses of this tensor, the bufferization
can alias it with the op result and perform the operation "in-place" by
reusing the buffer allocated for this "destination" input.

As an example, consider the following op: `%0 = tensor.insert %cst into
%t[%idx] : tensor<?xf32>`

`%t` is the "destination" in this example. When choosing a buffer for the result
`%0`, denoted as `buffer(%0)`, One-Shot Bufferize considers only two options:

1. `buffer(%0) = buffer(%t)`: alias the "destination" tensor with the
   result and perform the operation in-place, or
2. `buffer(%0)` is a newly allocated buffer (both options are sketched
   below).
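As a rough illustration (assumed, not from the document; `%t` here stands for
`buffer(%t)` after bufferization), the two options could look like this:

```mlir
// Option 1: in-place. buffer(%0) = buffer(%t), so the tensor.insert
// becomes a plain store into the existing buffer.
memref.store %cst, %t[%idx] : memref<?xf32>

// Option 2: out-of-place. A fresh buffer is allocated, the contents of
// %t are copied over, and the store goes into the copy.
%c0 = arith.constant 0 : index
%sz = memref.dim %t, %c0 : memref<?xf32>
%alloc = memref.alloc(%sz) : memref<?xf32>
memref.copy %t, %alloc : memref<?xf32> to memref<?xf32>
memref.store %cst, %alloc[%idx] : memref<?xf32>
```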

There may be other buffers in the same function that could potentially be used
for `buffer(%0)`, but those are not considered by One-Shot Bufferize to keep the
bufferization simple. One-Shot Bufferize could be extended to consider such
buffers in the future to achieve a better quality of bufferization.

Tensor ops that are not in destination-passing style always bufferize to a
memory allocation. E.g.:

```mlir
%0 = tensor.generate %sz {
^bb0(%i : index):
  %cst = arith.constant 0.0 : f32
  tensor.yield %cst : f32
} : tensor<?xf32>
```

The result of `tensor.generate` does not have a "destination" operand, so
bufferization allocates a new buffer. This could be avoided by choosing an
op such as `linalg.generic`, which can express the same computation with a
"destination" operand, as specified behind outputs (`outs`):

```mlir
#map = affine_map<(i) -> (i)>
%0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]}
                    outs(%t : tensor<?xf32>) {
  ^bb0(%arg0 : f32):
    %cst = arith.constant 0.0 : f32
    linalg.yield %cst : f32
} -> tensor<?xf32>
```

At first glance, the above `linalg.generic` op may not seem very useful because
the output tensor `%t` is entirely overwritten. Why pass the tensor `%t` as an
operand in the first place? As an example, this can be useful for overwriting a
slice of a tensor:

```mlir
%t = tensor.extract_slice %s [%idx] [%sz] [1] : tensor<?xf32> to tensor<?xf32>
%0 = linalg.generic ... outs(%t) { ... } -> tensor<?xf32>
%1 = tensor.insert_slice %0 into %s [%idx] [%sz] [1]
    : tensor<?xf32> into tensor<?xf32>
```

The above example bufferizes to a `memref.subview`, followed by a
"`linalg.generic` on memrefs" that overwrites the memory of the subview. The
`tensor.insert_slice` bufferizes to a no-op (in the absence of RaW conflicts
such as a subsequent read of `%s`).
"`linalg.generic` on memrefs" that overwrites the memory of the subview, assuming
that the slice `%t` has no other user. The `tensor.insert_slice` then bufferizes
to a no-op (in the absence of RaW conflicts such as a subsequent read of `%s`).
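A hedged sketch (names and the exact strided layout are illustrative, not
from the document) of what this could bufferize to:

```mlir
// The extract_slice becomes a subview: a view into %s, no data is copied.
%v = memref.subview %s[%idx] [%sz] [1]
    : memref<?xf32> to memref<?xf32, strided<[1], offset: ?>>
// The linalg.generic writes directly through the view into %s's memory.
linalg.generic ... outs(%v : memref<?xf32, strided<[1], offset: ?>>) { ... }
// The tensor.insert_slice disappears: the data is already in place.
```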

RaW conflicts are detected with an analysis of SSA use-def chains (details
later). One-Shot Bufferize works best if there is a single SSA use-def chain,
where the result of a tensor op is the operand of the next tensor ops, e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
```
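For contrast, a minimal sketch (assumed ops and names, not from the document)
of the kind of RaW conflict the analysis detects; because `%t` is read after
serving as a "destination", the write cannot happen in place:

```mlir
// %t is used as the "destination" of the insert...
%0 = tensor.insert %cst into %t[%idx] : tensor<?xf32>
// ...but %t is read afterwards, so bufferizing the insert in place would
// clobber the value this read expects. One-Shot Bufferize allocates a
// copy for %0 instead.
%r = tensor.extract %t[%idx2] : tensor<?xf32>
```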