-->

- # `DO CONCURENT` mapping to OpenMP
+ # `DO CONCURRENT` mapping to OpenMP

``` {contents}
---
@@ -17,20 +17,20 @@ local:
This document seeks to describe the effort to parallelize `do concurrent` loops
by mapping them to OpenMP worksharing constructs. The goals of this document
are:
- * Describing how to instruct `flang` to map `DO CONCURENT` loops to OpenMP
+ * Describing how to instruct `flang` to map `DO CONCURRENT` loops to OpenMP
constructs.
* Tracking the current status of such mapping.
- * Describing the limitations of the current implmenentation.
+ * Describing the limitations of the current implementation.
* Describing next steps.
* Tracking the current upstreaming status (from the AMD ROCm fork).

## Usage

In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
- 1. `host`: this maps `do concurent` loops to run in parallel on the host CPU.
+ 1. `host`: this maps `do concurrent` loops to run in parallel on the host CPU.
This maps such loops to the equivalent of `omp parallel do`.
- 2. `device`: this maps `do concurent` loops to run in parallel on a target device.
+ 2. `device`: this maps `do concurrent` loops to run in parallel on a target device.
This maps such loops to the equivalent of
`omp target teams distribute parallel do`.
3. `none`: this disables `do concurrent` mapping altogether. In that case, such
@@ -42,6 +42,8 @@ enable it:
```
flang ... -fopenmp -fdo-concurrent-to-openmp=[host|device|none] ...
```
+ For mapping to device, the target device architecture must be specified as well.
+ See `-fopenmp-targets` and `-foffload-arch` for more info.
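
As a rough illustration of the `host` mapping described in the Usage section, the sketch below places a simple `do concurrent` loop next to a hand-written `omp parallel do` loop that is conceptually similar. The program name, array names, and sizes are hypothetical, and the OpenMP loop only approximates what the compiler generates; it is not the actual output of the conversion.

```fortran
program dc_host_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical problem size
  integer :: i
  real :: a(n), b(n)

  b = 1.0

  ! Original loop: the programmer asserts that iterations are independent.
  do concurrent (i = 1:n)
    a(i) = 2.0 * b(i)
  end do

  ! Conceptually similar hand-written OpenMP loop. This approximates the
  ! effect of -fdo-concurrent-to-openmp=host; it is not the exact lowering.
  !$omp parallel do
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do
  !$omp end parallel do

  print *, a(1), a(n)
end program dc_host_sketch
```

Assuming the command line shown above, building this with `flang -fopenmp -fdo-concurrent-to-openmp=host` would be expected to run both loops in parallel on the host, while `-fdo-concurrent-to-openmp=none` would leave the first loop unmapped.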
## Current status
@@ -249,7 +251,7 @@ either treated as `shared` in case of mapping to `host`, or mapped into the
`target` region using a `map` clause in case of mapping to `device`. The only
exceptions to this are:
1. the loop's iteration variable(s) (IV) of **perfect** loop nests. In that
- case, for each IV, we allocate a local copy as shown the by the mapping
+ case, for each IV, we allocate a local copy as shown by the mapping
examples above.
1. any values that are from allocations outside the loop nest and used
exclusively inside of it. In such cases, a local privatized