@@ -101,26 +101,45 @@ bufferization strategy would be unacceptable for high-performance codegen. When
choosing an already existing buffer, we must be careful not to accidentally
overwrite data that is still needed later in the program.

- To simplify this problem, One-Shot Bufferize was designed for ops that are in
- *destination-passing style*. For every tensor result, such ops have a tensor
- operand, whose buffer could be utilized for storing the result of the op in the
- absence of other conflicts. We call such tensor operands the *destination*.
+ To simplify this problem, One-Shot Bufferize was designed to take advantage of
+ *destination-passing style*. This form exists independently of bufferization
+ and is tied to SSA semantics: many ops "update" part of an input SSA value.
+ For example, the LLVM instruction
+ [`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
+ inserts an element into a vector. Since SSA values are immutable, the
+ operation returns a copy of the input vector with the element inserted.
+ Another example in MLIR is `linalg.generic`, which always has an extra `outs`
+ operand that provides the initial values to update (for example, when the
+ operation performs a reduction).
+
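+ For illustration, a sum reduction in this style could look as follows. This is
+ only a sketch: the operand names `%in` and `%init` and the shapes are made up
+ for this example. The `outs` operand `%init` carries the initial value of the
+ accumulation:
+
+ ```mlir
+ #id     = affine_map<(i) -> (i)>
+ #scalar = affine_map<(i) -> ()>
+ // %init provides the initial value of the reduction. Since tensors are
+ // SSA values, the op returns an updated copy instead of mutating %init.
+ %sum = linalg.generic
+     {indexing_maps = [#id, #scalar], iterator_types = ["reduction"]}
+     ins(%in : tensor<?xf32>) outs(%init : tensor<f32>) {
+   ^bb0(%x: f32, %acc: f32):
+     %0 = arith.addf %x, %acc : f32
+     linalg.yield %0 : f32
+ } -> tensor<f32>
+ ```
+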
+ This `outs` input is referred to as the "destination" in what follows (the
+ quotes are important, as this operand is not modified in place but copied). It
+ comes into play during bufferization as a possible "anchor" for the
+ bufferization algorithm: by carefully choosing the SSA value used as the
+ "destination", the user can shape the input IR into a form that guarantees a
+ close-to-optimal bufferization result.
+
+ For every tensor result, a destination-passing style op has a corresponding
+ tensor operand. If there are no other uses of this tensor, the bufferization
+ can alias it with the op result and perform the operation "in-place" by reusing
+ the buffer allocated for this "destination" input.
As an example, consider the following op: `%0 = tensor.insert %cst into
%t[%idx] : tensor<?xf32>`

- `%t` is the destination in this example. When choosing a buffer for the result
+ `%t` is the "destination" in this example. When choosing a buffer for the result
`%0`, denoted as `buffer(%0)`, One-Shot Bufferize considers only two options:

- 1. `buffer(%0) = buffer(%t)`, or
+ 1. `buffer(%0) = buffer(%t)`: alias the "destination" tensor with the
+    result and perform the operation in-place.
2. `buffer(%0)` is a newly allocated buffer.

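+ For instance, the `tensor.insert` above could bufferize to either of the
+ following. This is only a sketch of possible bufferization output, where `%t`
+ now stands for `buffer(%t)` and `%sz` is an assumed SSA value holding the
+ dynamic size of `%t`:
+
+ ```mlir
+ // Option 1: in-place. buffer(%0) = buffer(%t); the store mutates the
+ // "destination" buffer directly.
+ memref.store %cst, %t[%idx] : memref<?xf32>
+
+ // Option 2: out-of-place. A new buffer is allocated and the "destination"
+ // is copied first, so buffer(%t) is left untouched.
+ %alloc = memref.alloc(%sz) : memref<?xf32>
+ memref.copy %t, %alloc : memref<?xf32> to memref<?xf32>
+ memref.store %cst, %alloc[%idx] : memref<?xf32>
+ ```
+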
There may be other buffers in the same function that could potentially be used
for `buffer(%0)`, but those are not considered by One-Shot Bufferize to keep the
bufferization simple. One-Shot Bufferize could be extended to consider such
buffers in the future to achieve a better quality of bufferization.

Tensor ops that are not in destination-passing style always bufferize to a
memory allocation. E.g.:

```mlir
@@ -131,10 +150,10 @@ memory allocation. E.g.:
} : tensor<?xf32>
```

- The result of `tensor.generate` does not have a destination operand, so
+ The result of `tensor.generate` does not have a "destination" operand, so
bufferization allocates a new buffer. This could be avoided by choosing an
op such as `linalg.generic`, which can express the same computation with a
- destination operand, as specified behind outputs (`outs`):
+ "destination" operand, as specified behind outputs (`outs`):

```mlir
#map = affine_map<(i) -> (i)>
@@ -159,14 +178,13 @@ slice of a tensor:
```

The above example bufferizes to a `memref.subview`, followed by a
- "`linalg.generic` on memrefs" that overwrites the memory of the subview. The
- `tensor.insert_slice` bufferizes to a no-op (in the absence of RaW conflicts
- such as a subsequent read of `%s`).
+ "`linalg.generic` on memrefs" that overwrites the memory of the subview, assuming
+ that the slice `%t` has no other user. The `tensor.insert_slice` then bufferizes
+ to a no-op (in the absence of RaW conflicts such as a subsequent read of `%s`).

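+ For illustration, the bufferized form could look roughly like this. This is
+ only a sketch: the subview sizes, the strided layout, and the name `%sv` are
+ assumptions, since they depend on the slice example above:
+
+ ```mlir
+ %sv = memref.subview %t[%idx] [5] [1]
+     : memref<?xf32> to memref<5xf32, strided<[1], offset: ?>>
+ // The "linalg.generic on memrefs" writes directly into the subview, which
+ // aliases buffer(%t); no counterpart of the tensor.insert_slice remains.
+ linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]}
+     outs(%sv : memref<5xf32, strided<[1], offset: ?>>) {
+   ^bb0(%out: f32):
+     %cst = arith.constant 0.0 : f32
+     linalg.yield %cst : f32
+ }
+ ```
+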
RaW conflicts are detected with an analysis of SSA use-def chains (details
later). One-Shot Bufferize works best if there is a single SSA use-def chain,
- where the result of a tensor op is the destination operand of the next tensor
- ops, e.g.:
+ where the result of a tensor op is the operand of the next tensor op, e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)