## Usage

In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
- compiler flag: `-fdo-concurrent-to-openmp`. This flags has 3 possible values:
+ compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
This maps such loops to the equivalent of `omp parallel do`.
- 2. `device`: this maps `do concurent` loops to run in parallel on a device
- (GPU). This maps such loops to the equivalent of `omp target teams
- distribute parallel do`.
- 3. `none`: this disables `do concurrent` mapping altogether. In such case, such
+ 2. `device`: this maps `do concurrent` loops to run in parallel on a target device.
+ This maps such loops to the equivalent of
+ `omp target teams distribute parallel do`.
+ 3. `none`: this disables `do concurrent` mapping altogether. In that case, such
loops are emitted as sequential loops.
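
As a rough illustration of the three values listed above, consider the small program below. It is only a sketch: the program and variable names are hypothetical, and the OpenMP directives in the comments are hand-written approximations of what each value is described as being equivalent to, not actual compiler output.

```fortran
! Hypothetical example; all names are illustrative only.
program dc_mapping_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n)
  integer :: i

  a = 1.0

  ! Source loop. With `none`, it is emitted as a sequential loop.
  do concurrent (i = 1:n)
    a(i) = 2.0 * a(i)
  end do

  ! With `host`, the loop behaves roughly as if it had been written:
  !   !$omp parallel do
  !   do i = 1, n
  !     a(i) = 2.0 * a(i)
  !   end do
  !   !$omp end parallel do
  !
  ! With `device`, roughly as if it had been written:
  !   !$omp target teams distribute parallel do
  !   do i = 1, n
  !     a(i) = 2.0 * a(i)
  !   end do
  !   !$omp end target teams distribute parallel do

  print *, a
end program dc_mapping_sketch
```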

- The above compiler switch is currently avaialble only when OpenMP is also
+ The above compiler switch is currently available only when OpenMP is also
enabled. So you need to provide the following options to flang in order to
enable it:
```
@@ -54,13 +54,13 @@ that:
To describe the current status in more detail, the following is a description of how
the pass currently behaves for single-range loops and then for multi-range
loops. The following sub-sections describe the status of the downstream
- implementation on the AMD's ROCm fork(*). We are working on upstreaming the
+ implementation on the AMD's ROCm fork[^1]. We are working on upstreaming the
downstream implementation gradually and this document will be updated to reflect
such upstreaming process. Example LIT tests referenced below might also only
be available in the ROCm fork and will be upstreamed with the relevant parts of the
code.

- (*) https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+ [^1]: https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp

### Single-range loops

@@ -211,8 +211,8 @@ loops and map them as "collapsed" loops in OpenMP.
Loop-nest detection is currently limited to the scenario described in the previous
section. However, this is quite limited and can be extended in the future to cover
- more cases. For example, for the following loop nest, even thought, both loops are
- perfectly nested; at the moment, only the outer loop is parallized:
+ more cases. For example, for the following loop nest, even though both loops are
+ perfectly nested, at the moment only the outer loop is parallelized:
```fortran
do concurrent(i=1:n)
  do concurrent(j=1:m)
@@ -221,9 +221,9 @@ do concurrent(i=1:n)
end do
```

- Similary for the following loop nest, even though the intervening statement `x = 41`
- does not have any memory effects that would affect parallization, this nest is
- not parallized as well (only the outer loop is).
+ Similarly, for the following loop nest, even though the intervening statement `x = 41`
+ does not have any memory effects that would affect parallelization, this nest is
+ not parallelized either (only the outer loop is).

```fortran
do concurrent(i=1:n)
@@ -244,7 +244,7 @@ of what is and is not detected as a perfect loop nest.
### Data environment

- By default, variables that are used inside a `do concurernt` loop nest are
+ By default, variables that are used inside a `do concurrent` loop nest are
either treated as `shared` in case of mapping to `host`, or mapped into the
`target` region using a `map` clause in case of mapping to `device`. The only
exceptions to this are:
@@ -253,20 +253,20 @@ exceptions to this are:
examples above.
1. any values that are from allocations outside the loop nest and used
exclusively inside of it. In such cases, a local privatized
- value is created in the OpenMP region to prevent multiple teams of threads
- from accessing and destroying the same memory block which causes runtime
+ copy is created in the OpenMP region to prevent multiple teams of threads
+ from accessing and destroying the same memory block, which causes runtime
issues. For an example of such cases, see
`flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.

- Implicit mapping detection (for mapping to the GPU) is still quite limited and
- work to make it smarter is underway for both OpenMP in general and `do concurrent`
- mapping.
+ Implicit mapping detection (for mapping to the target device) is still quite
+ limited and work to make it smarter is underway for both OpenMP in general
+ and `do concurrent` mapping.
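
The default treatment described above can be pictured with a loop that uses two arrays declared outside of it. This is a minimal sketch with hypothetical names; the clauses in the comments are hand-written approximations, and the `tofrom` map type in particular is an assumption, since the text above only says that a `map` clause is used.

```fortran
! Hypothetical example; all names are illustrative only.
program dc_data_env_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  integer :: i

  b = 1.0

  ! `a` and `b` are declared outside the loop and used inside it.
  do concurrent (i = 1:n)
    a(i) = b(i) + 1.0
  end do

  ! When mapping to `host`, `a` and `b` would be treated roughly like:
  !   !$omp parallel do shared(a, b)
  !
  ! When mapping to `device`, roughly like (map type is an assumption):
  !   !$omp target teams distribute parallel do map(tofrom: a, b)

  print *, a
end program dc_data_env_sketch
```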

#### Non-perfectly-nested loops' IVs

For non-perfectly-nested loops, the IVs are still treated as `shared` or
`map` entries as pointed out above. This **might not** be consistent with what
- the Fortran specficiation tells us. In particular, taking the following
+ the Fortran specification tells us. In particular, taking the following
snippets from the spec (version 2023) into account:

> § 3.35
@@ -277,9 +277,9 @@ snippets from the spec (version 2023) into account:
> § 19.4
> ------
> A variable that appears as an index-name in a FORALL or DO CONCURRENT
- > construct, or ... is a construct entity. A variable that has LOCAL or
+ > construct [...] is a construct entity. A variable that has LOCAL or
> LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
- > ...
+ > [...]
> The name of a variable that appears as an index-name in a DO CONCURRENT
> construct, FORALL statement, or FORALL construct has a scope of the statement
> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
@@ -288,7 +288,7 @@ snippets from the spec (version 2023) into account:
From the above quotes, it seems there is an equivalence between the IV of a `do
concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
to OpenMP's `private` clause), which means that we should probably
- localize/privatize a `do concurernt` loop's IV even if it is not perfectly
+ localize/privatize a `do concurrent` loop's IV even if it is not perfectly
nested in the nest we are parallelizing. For now, however, we **do not** do
that as pointed out previously. In the near future, we propose a middle-ground
solution (see the Next steps section for more details).
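
To make the `LOCAL`/`private` analogy above concrete, the sketch below places a loop with a `LOCAL` locality specifier next to a hand-written OpenMP loop that uses `private`. The program and variable names are hypothetical, and the OpenMP loop is only a comparable formulation written by hand, not what the pass emits.

```fortran
! Hypothetical example; all names are illustrative only.
program dc_local_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), tmp
  integer :: i

  a = 1.0

  ! `tmp` has LOCAL locality: each iteration works on its own copy.
  ! The index `i` is likewise a construct entity of this loop.
  do concurrent (i = 1:n) local(tmp)
    tmp = 2.0 * a(i)
    a(i) = tmp + 1.0
  end do

  ! Comparable hand-written OpenMP form, using `private` for `tmp`.
  ! Without OpenMP enabled, the directives are plain comments.
  !$omp parallel do private(tmp)
  do i = 1, n
    tmp = 2.0 * a(i)
    a(i) = tmp + 1.0
  end do
  !$omp end parallel do

  print *, a
end program dc_local_sketch
```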
@@ -327,8 +327,8 @@ At the moment, the FIR dialect does not have a way to model locality specifiers
on the IR level. Instead, something similar to early/eager privatization in OpenMP
is done for the locality specifiers in `fir.do_loop` ops. Having locality specifiers
modelled in a way similar to delayed privatization (i.e. the `omp.private` op) and
- reductions (i.e. the `omp.delcare_reduction` op) can make mapping `do concurrent`
- to OpenMP (and other parallization models) much easier.
+ reductions (i.e. the `omp.declare_reduction` op) can make mapping `do concurrent`
+ to OpenMP (and other parallel programming models) much easier.

Therefore, one way to approach this problem is to extract the TableGen records
for relevant OpenMP clauses in a shared dialect for "data environment management"
@@ -345,7 +345,7 @@ logic of loop nests needs to be implemented.
### Data-dependence analysis

Right now, we map loop nests without analysing whether such mapping is safe to
- do or not. We probalby need to at least warn the use of unsafe loop nests due
+ do or not. We probably need to at least warn about the use of unsafe loop nests due
to loop-carried dependencies.
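
As an illustration of the kind of loop-carried dependency mentioned above, the sketch below uses a running sum. The names are hypothetical. The loop is written as an ordinary `do` loop so that the program stays conforming; the comment shows the `do concurrent` form that would be unsafe to map.

```fortran
! Hypothetical example; all names are illustrative only.
program dc_dependence_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: a(n), i

  a = 1

  ! Iteration i reads a(i-1), which iteration i-1 has just written,
  ! so the iterations must run in order to produce a running sum.
  do i = 2, n
    a(i) = a(i) + a(i-1)
  end do

  ! Writing the same body as
  !   do concurrent (i = 2:n)
  !     a(i) = a(i) + a(i-1)
  !   end do
  ! would carry the same dependency. Mapping such a loop to a parallel
  ! construct without any analysis could produce incorrect results.

  print *, a
end program dc_dependence_sketch
```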

### Non-rectangular loop nests
@@ -362,7 +362,7 @@ end do
We defer this to the (hopefully) near future when we get the conversion in a
good shape for the samples/projects at hand.

- ### Generalizing the pass to other parallization models
+ ### Generalizing the pass to other parallel programming models

Once we have a stable and capable `do concurrent` to OpenMP mapping, we can take
this in a more generalized direction and allow the pass to target other models;