
Commit 76dea22

[mlir][doc] Improve Destination-passing-style documentation (#70283)
Trying to help with confusion, like here: https://discourse.llvm.org/t/74396
1 parent f40ed13 commit 76dea22


mlir/docs/Bufferization.md

Lines changed: 32 additions & 14 deletions
@@ -101,26 +101,45 @@ bufferization strategy would be unacceptable for high-performance codegen. When
 choosing an already existing buffer, we must be careful not to accidentally
 overwrite data that is still needed later in the program.
 
-To simplify this problem, One-Shot Bufferize was designed for ops that are in
-*destination-passing style*. For every tensor result, such ops have a tensor
-operand, whose buffer could be utilized for storing the result of the op in the
-absence of other conflicts. We call such tensor operands the *destination*.
+To simplify this problem, One-Shot Bufferize was designed to take advantage of
+*destination-passing style*. This form exists independently of bufferization
+and is tied to SSA semantics: many ops "update" part of an input SSA value.
+For example, the LLVM instruction
+[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
+inserts an element into a vector. Since SSA values are immutable, the
+operation returns a copy of the input vector with the element inserted.
+Another example in MLIR is `linalg.generic`, which always has an extra `outs`
+operand that provides the initial values to update (for example, when the
+operation performs a reduction).
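+
+For illustration, a reduction written with `linalg.generic` might look as
+follows (a sketch, not part of the original text; the value names `%t` and
+`%init` are assumed):
+
+```mlir
+// Sum the elements of %t into the scalar tensor %init. The extra `outs`
+// operand %init supplies the initial value of the accumulator; the op
+// returns an updated copy of it as a new SSA value.
+%sum = linalg.generic
+    {indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> ()>],
+     iterator_types = ["reduction"]}
+    ins(%t : tensor<?xf32>) outs(%init : tensor<f32>) {
+  ^bb0(%in: f32, %acc: f32):
+    %0 = arith.addf %acc, %in : f32
+    linalg.yield %0 : f32
+} -> tensor<f32>
+```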
+
+This input is referred to as the "destination" in the following (the quotes
+matter: this operand is not modified in place but copied). It comes into play
+during bufferization as a possible "anchor" for the bufferization algorithm:
+by carefully choosing the SSA value used as the "destination", the user can
+shape the input in a form that guarantees a close-to-optimal bufferization
+result.
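+
+For example, whether an existing buffer can later be reused depends directly
+on which SSA value is chosen as the "destination" (a sketch; `%sz` and the
+other value names are assumed):
+
+```mlir
+// %t as "destination": if %t has no other uses, the result can reuse
+// buffer(%t) and no new allocation is needed.
+%x = tensor.insert %cst into %t[%idx] : tensor<?xf32>
+
+// A fresh tensor as "destination": bufferization has to materialize a new
+// buffer for %empty, so no existing buffer is reused.
+%empty = tensor.empty(%sz) : tensor<?xf32>
+%y = tensor.insert %cst into %empty[%idx] : tensor<?xf32>
+```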
+
+For every tensor result, a destination-passing style op has a corresponding
+tensor operand. If there are no other uses of this tensor, the bufferization
+can alias it with the op result and perform the operation "in-place" by
+reusing the buffer allocated for this "destination" input.
 
 As an example, consider the following op: `%0 = tensor.insert %cst into
 %t[%idx] : tensor<?xf32>`
 
-`%t` is the destination in this example. When choosing a buffer for the result
+`%t` is the "destination" in this example. When choosing a buffer for the result
 `%0`, denoted as `buffer(%0)`, One-Shot Bufferize considers only two options:
 
-1. `buffer(%0) = buffer(%t)`, or
+1. `buffer(%0) = buffer(%t)`: alias the "destination" tensor with the
+   result and perform the operation in-place.
 2. `buffer(%0)` is a newly allocated buffer.
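+
+For the `tensor.insert` example above, the two options might bufferize as
+follows (a sketch; the buffer names `%t_buf`, `%alloc` and the size `%sz` are
+assumed):
+
+```mlir
+// Option 1: in-place. buffer(%0) = buffer(%t), so the store writes directly
+// into the existing buffer.
+memref.store %cst, %t_buf[%idx] : memref<?xf32>
+
+// Option 2: out-of-place. A new buffer is allocated, the "destination" is
+// copied into it, and the store writes into the copy.
+%alloc = memref.alloc(%sz) : memref<?xf32>
+memref.copy %t_buf, %alloc : memref<?xf32> to memref<?xf32>
+memref.store %cst, %alloc[%idx] : memref<?xf32>
+```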
 
 There may be other buffers in the same function that could potentially be used
 for `buffer(%0)`, but those are not considered by One-Shot Bufferize to keep the
 bufferization simple. One-Shot Bufferize could be extended to consider such
 buffers in the future to achieve a better quality of bufferization.
 
 Tensor ops that are not in destination-passing style always bufferize to a
 memory allocation. E.g.:
 
 ```mlir
@@ -131,10 +150,10 @@ memory allocation. E.g.:
 } : tensor<?xf32>
 ```
 
-The result of `tensor.generate` does not have a destination operand, so
+The result of `tensor.generate` does not have a "destination" operand, so
 bufferization allocates a new buffer. This could be avoided by choosing an
 op such as `linalg.generic`, which can express the same computation with a
-destination operand, as specified behind outputs (`outs`):
+"destination" operand, as specified behind outputs (`outs`):
 
 ```mlir
 #map = affine_map<(i) -> (i)>
@@ -159,14 +178,13 @@ slice of a tensor:
 ```
 
 The above example bufferizes to a `memref.subview`, followed by a
-"`linalg.generic` on memrefs" that overwrites the memory of the subview. The
-`tensor.insert_slice` bufferizes to a no-op (in the absence of RaW conflicts
-such as a subsequent read of `%s`).
+"`linalg.generic` on memrefs" that overwrites the memory of the subview,
+assuming that the slice `%t` has no other users. The `tensor.insert_slice` then
+bufferizes to a no-op (in the absence of RaW conflicts such as a subsequent
+read of `%s`).
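+
+The bufferized IR might look roughly as follows (a sketch with assumed value
+names and static sizes, not part of the original text):
+
+```mlir
+// The extracted slice becomes a view into the original buffer ...
+%sv = memref.subview %t_buf[2] [4] [1]
+    : memref<10xf32> to memref<4xf32, strided<[1], offset: 2>>
+// ... which the computation overwrites directly.
+linalg.generic {indexing_maps = [affine_map<(i) -> (i)>],
+                iterator_types = ["parallel"]}
+    outs(%sv : memref<4xf32, strided<[1], offset: 2>>) {
+  ^bb0(%out: f32):
+    %0 = arith.addf %out, %out : f32
+    linalg.yield %0 : f32
+}
+// No code is emitted for tensor.insert_slice: the data is already in %t_buf.
+```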
 
 RaW conflicts are detected with an analysis of SSA use-def chains (details
 later). One-Shot Bufferize works best if there is a single SSA use-def chain,
-where the result of a tensor op is the destination operand of the next tensor
-ops, e.g.:
+where the result of a tensor op is the operand of the next tensor op, e.g.:
 
 ```mlir
 %0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
