@@ -227,16 +227,17 @@ def LoopLikeOpInterface : OpInterface<"LoopLikeOpInterface"> {
  this loop is moved over to the new loop.

  This method is similar to `replaceWithAdditionalYields` but instead of
- returning the value that is actually yielded, this returns the tiles of
- the values that are yielded. This allows for unified handling of opreations
- like `scf.forall` which dont yield a value from the loop, but instead
- the terminator specifies where to insert the tile yielded by the body of
+ yielding a value from within the loop, it allows each loop construct
+ implementing this method to handle the result of each iteration
+ appropriately. This allows for unified handling of operations
+ like `scf.forall` which don't yield a value from the loop, but instead
+ the terminator specifies where to insert the tile computed by the body of
  the loop. For example,

  ```mlir
  %0 = scf.forall ... shared_outs(%arg0 = %arg1) {
  ...
- %tiled_value
+ %tiled_value = ...
  scf.forall.in_parallel {
  tensor.parallel_insert_slice %tiled_value into %arg0[%o1, %o2]...
  }
@@ -247,13 +248,13 @@ def LoopLikeOpInterface : OpInterface<"LoopLikeOpInterface"> {
  ```mlir
  %0 = scf.for ... iter_args(%arg0 = %arg1) {
  ...
- %tiled_value
+ %tiled_value = ...
  %insert = tensor.insert_slice %tiled_value into %arg0[%o1, %o2]...
  scf.yield %insert
  }
  ```

- So for the caller, the tiled value (`%tiled_values `) and the offsets
+ So for the caller, the tiled value (`%tiled_value `) and the offsets
  `(%o1, %o2)` and sizes (not shown) are generated the same way, but
  the implementation method for the different loop constructs handles
  the difference in representation.
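
To illustrate the point of the updated comment, here is a minimal toy model in Python, not MLIR and not the interface's actual API. It sketches how a caller could produce the tile, offset, and size the same way for both loop kinds while each loop construct decides how to place the tile. All names (`tile_fn`, `run_for_like`, `run_forall_like`, `insert_slice`) are hypothetical and exist only for this sketch; 1-D Python lists stand in for tensors.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: for iteration i, return (tiled_value, offset, size).
# This plays the role of the caller-side logic that is identical for both loops.
TileFn = Callable[[int], Tuple[List[int], int, int]]


def insert_slice(dest: List[int], tile: List[int], offset: int, size: int) -> List[int]:
    """Return a copy of dest with tile written at [offset, offset + size)."""
    assert len(tile) == size
    out = list(dest)
    out[offset:offset + size] = tile
    return out


def run_for_like(init: List[int], num_iters: int, tile_fn: TileFn) -> List[int]:
    """Models the scf.for shape: insert the tile, then yield the result
    so it becomes the iter_arg of the next iteration."""
    acc = list(init)
    for i in range(num_iters):
        tile, offset, size = tile_fn(i)
        acc = insert_slice(acc, tile, offset, size)  # insert + yield
    return acc


def run_forall_like(init: List[int], num_iters: int, tile_fn: TileFn) -> List[int]:
    """Models the scf.forall shape: bodies compute tiles independently and
    the terminator writes each tile into the shared output."""
    shared = list(init)
    pending = [tile_fn(i) for i in range(num_iters)]  # no value is yielded
    for tile, offset, size in pending:
        shared[offset:offset + size] = tile  # in-place parallel-style insert
    return shared


def tile_fn(i: int) -> Tuple[List[int], int, int]:
    """Example tile generator: two elements per iteration, non-overlapping."""
    return [10 * i, 10 * i + 1], 2 * i, 2


init = [0] * 6
for_result = run_for_like(init, 3, tile_fn)
forall_result = run_forall_like(init, 3, tile_fn)
```

Both runners receive the same `tile_fn`, so the caller never needs to know whether the result is threaded through a yield or written into a shared output. That is the only property this sketch is meant to show.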