
Commit e24b6d2

Handle some review comments
1 parent d1ed094 commit e24b6d2

File tree

7 files changed: +44 additions, -38 deletions


clang/include/clang/Driver/Options.td

Lines changed: 1 addition & 1 deletion
@@ -6920,7 +6920,7 @@ defm loop_versioning : BoolOptionWithoutMarshalling<"f", "version-loops-for-stri
 def fhermetic_module_files : Flag<["-"], "fhermetic-module-files">, Group<f_Group>,
   HelpText<"Emit hermetic module files (no nested USE association)">;
 
-def do_concurrent_parallel_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
+def do_concurrent_to_openmp_EQ : Joined<["-"], "fdo-concurrent-to-openmp=">,
   HelpText<"Try to map `do concurrent` loops to OpenMP [none|host|device]">,
   Values<"none,host,device">;
 } // let Visibility = [FC1Option, FlangOption]

clang/lib/Driver/ToolChains/Flang.cpp

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ void Flang::addCodegenOptions(const ArgList &Args,
     CmdArgs.push_back("-fversion-loops-for-stride");
 
   Args.addAllArgs(CmdArgs,
-                  {options::OPT_do_concurrent_parallel_EQ,
+                  {options::OPT_do_concurrent_to_openmp_EQ,
                    options::OPT_flang_experimental_hlfir,
                    options::OPT_flang_deprecated_no_hlfir,
                    options::OPT_fno_ppc_native_vec_elem_order,

flang/docs/DoConcurrentConversionToOpenMP.md

Lines changed: 27 additions & 27 deletions
@@ -27,16 +27,16 @@ are:
 ## Usage
 
 In order to enable `do concurrent` to OpenMP mapping, `flang` adds a new
-compiler flag: `-fdo-concurrent-to-openmp`. This flags has 3 possible values:
+compiler flag: `-fdo-concurrent-to-openmp`. This flag has 3 possible values:
 1. `host`: this maps `do concurent` loops to run in parallel on the host CPU.
    This maps such loops to the equivalent of `omp parallel do`.
-2. `device`: this maps `do concurent` loops to run in parallel on a device
-   (GPU). This maps such loops to the equivalent of `omp target teams
-   distribute parallel do`.
-3. `none`: this disables `do concurrent` mapping altogether. In such case, such
+2. `device`: this maps `do concurent` loops to run in parallel on a target device.
+   This maps such loops to the equivalent of
+   `omp target teams distribute parallel do`.
+3. `none`: this disables `do concurrent` mapping altogether. In that case, such
    loops are emitted as sequential loops.
 
-The above compiler switch is currently avaialble only when OpenMP is also
+The above compiler switch is currently available only when OpenMP is also
 enabled. So you need to provide the following options to flang in order to
 enable it:
 ```
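To make the `host` and `device` mappings described above concrete, here is a minimal illustrative sketch; the subroutine and variable names are hypothetical, and the OpenMP forms are only the rough equivalents named in the documentation text, not actual output of the pass:

```fortran
! A simple single-range `do concurrent` loop (hypothetical names).
subroutine saxpy(n, a, x, y)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i

  do concurrent (i = 1:n)
    y(i) = a * x(i) + y(i)
  end do

  ! Roughly what `-fdo-concurrent-to-openmp=host` maps the loop to:
  !$omp parallel do
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
  !$omp end parallel do

  ! Roughly what `-fdo-concurrent-to-openmp=device` maps the loop to:
  !$omp target teams distribute parallel do
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
  !$omp end target teams distribute parallel do
end subroutine saxpy
```

With `none`, the first loop above is simply emitted as a sequential loop.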
@@ -54,13 +54,13 @@ that:
 To describe current status in more detail, following is a description of how
 the pass currently behaves for single-range loops and then for multi-range
 loops. The following sub-sections describe the status of the downstream
-implementation on the AMD's ROCm fork(*). We are working on upstreaming the
+implementation on the AMD's ROCm fork[^1]. We are working on upstreaming the
 downstream implementation gradually and this document will be updated to reflect
 such upstreaming process. Example LIT tests referenced below might also be only
 be available in the ROCm fork and will upstream with the relevant parts of the
 code.
 
-(*) https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+[^1]: https://github.com/ROCm/llvm-project/blob/amd-staging/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
 
 ### Single-range loops

@@ -211,8 +211,8 @@ loops and map them as "collapsed" loops in OpenMP.
 
 Loop-nest detection is currently limited to the scenario described in the previous
 section. However, this is quite limited and can be extended in the future to cover
-more cases. For example, for the following loop nest, even thought, both loops are
-perfectly nested; at the moment, only the outer loop is parallized:
+more cases. For example, for the following loop nest, even though, both loops are
+perfectly nested; at the moment, only the outer loop is parallelized:
 ```fortran
 do concurrent(i=1:n)
   do concurrent(j=1:m)
@@ -221,9 +221,9 @@ do concurrent(i=1:n)
 end do
 ```
 
-Similary for the following loop nest, even though the intervening statement `x = 41`
-does not have any memory effects that would affect parallization, this nest is
-not parallized as well (only the outer loop is).
+Similarly, for the following loop nest, even though the intervening statement `x = 41`
+does not have any memory effects that would affect parallelization, this nest is
+not parallelized as well (only the outer loop is).
 
 ```fortran
 do concurrent(i=1:n)
@@ -244,7 +244,7 @@ of what is and is not detected as a perfect loop nest.
 
 ### Data environment
 
-By default, variables that are used inside a `do concurernt` loop nest are
+By default, variables that are used inside a `do concurrent` loop nest are
 either treated as `shared` in case of mapping to `host`, or mapped into the
 `target` region using a `map` clause in case of mapping to `device`. The only
 exceptions to this are:
@@ -253,20 +253,20 @@ exceptions to this are:
    examples above.
 1. any values that are from allocations outside the loop nest and used
    exclusively inside of it. In such cases, a local privatized
-   value is created in the OpenMP region to prevent multiple teams of threads
-   from accessing and destroying the same memory block which causes runtime
+   copy is created in the OpenMP region to prevent multiple teams of threads
+   from accessing and destroying the same memory block, which causes runtime
    issues. For an example of such cases, see
    `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
 
-Implicit mapping detection (for mapping to the GPU) is still quite limited and
-work to make it smarter is underway for both OpenMP in general and `do concurrent`
-mapping.
+Implicit mapping detection (for mapping to the target device) is still quite
+limited and work to make it smarter is underway for both OpenMP in general
+and `do concurrent` mapping.
 
 #### Non-perfectly-nested loops' IVs
 
 For non-perfectly-nested loops, the IVs are still treated as `shared` or
 `map` entries as pointed out above. This **might not** be consistent with what
-the Fortran specficiation tells us. In particular, taking the following
+the Fortran specification tells us. In particular, taking the following
 snippets from the spec (version 2023) into account:
 
 > § 3.35
@@ -277,9 +277,9 @@ snippets from the spec (version 2023) into account:
 > § 19.4
 > ------
 > A variable that appears as an index-name in a FORALL or DO CONCURRENT
-> construct, or ... is a construct entity. A variable that has LOCAL or
+> construct [...] is a construct entity. A variable that has LOCAL or
 > LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
-> ...
+> [...]
 > The name of a variable that appears as an index-name in a DO CONCURRENT
 > construct, FORALL statement, or FORALL construct has a scope of the statement
 > or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
@@ -288,7 +288,7 @@ snippets from the spec (version 2023) into account:
 From the above quotes, it seems there is an equivalence between the IV of a `do
 concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
 to OpenMP's `private` clause). Which means that we should probably
-localize/privatize a `do concurernt` loop's IV even if it is not perfectly
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
 nested in the nest we are parallelizing. For now, however, we **do not** do
 that as pointed out previously. In the near future, we propose a middle-ground
 solution (see the Next steps section for more details).
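As an illustrative sketch of the IV question above (hypothetical bounds and array names, mirroring the intervening-statement example earlier in the document): per the quoted spec text, `j` below is a construct entity, so by the `LOCAL`/`private` analogy it could be privatized; with the current behaviour it is instead treated as a `shared`/`map`ped entry because its loop is not perfectly nested inside the mapped nest.

```fortran
do concurrent(i=1:n)
  x = 41                   ! intervening statement: only the `i` loop is mapped
  do concurrent(j=1:m)     ! `j` is an index-name, i.e. a construct entity
    a(i,j) = x             ! spec-wise it acts like LOCAL (OpenMP `private`),
  end do                   ! but it is currently treated as `shared`/`map`ped
end do
```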
@@ -327,8 +327,8 @@ At the moment, the FIR dialect does not have a way to model locality specifiers
 on the IR level. Instead, something similar to early/eager privatization in OpenMP
 is done for the locality specifiers in `fir.do_loop` ops. Having locality specifier
 modelled in a way similar to delayed privatization (i.e. the `omp.private` op) and
-reductions (i.e. the `omp.delcare_reduction` op) can make mapping `do concurrent`
-to OpenMP (and other parallization models) much easier.
+reductions (i.e. the `omp.declare_reduction` op) can make mapping `do concurrent`
+to OpenMP (and other parallel programming models) much easier.
 
 Therefore, one way to approach this problem is to extract the TableGen records
 for relevant OpenMP clauses in a shared dialect for "data environment management"
@@ -345,7 +345,7 @@ logic of loop nests needs to be implemented.
 ### Data-dependence analysis
 
 Right now, we map loop nests without analysing whether such mapping is safe to
-do or not. We probalby need to at least warn the use of unsafe loop nests due
+do or not. We probably need to at least warn the use of unsafe loop nests due
 to loop-carried dependencies.
 
 ### Non-rectangular loop nests
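To illustrate the loop-carried-dependence concern from the data-dependence analysis hunk just above, here is a hypothetical, non-conforming snippet (not taken from the pass's tests) that the pass would currently map to OpenMP without emitting any diagnostic:

```fortran
do concurrent(i=2:n)
  a(i) = a(i-1) + 1   ! iteration `i` reads the value written by iteration `i-1`
end do
```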
@@ -362,7 +362,7 @@ end do
 We defer this to the (hopefully) near future when we get the conversion in a
 good share for the samples/projects at hand.
 
-### Generalizing the pass to other parallization models
+### Generalizing the pass to other parallel programming models
 
 Once we have a stable and capable `do concurrent` to OpenMP mapping, we can take
 this in a more generalized direction and allow the pass to target other models;

flang/include/flang/Optimizer/OpenMP/Passes.td

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ def FunctionFilteringPass : Pass<"omp-function-filtering"> {
   ];
 }
 
-def DoConcurrentConversionPass : Pass<"fopenmp-do-concurrent-conversion", "mlir::func::FuncOp"> {
+def DoConcurrentConversionPass : Pass<"omp-do-concurrent-conversion", "mlir::func::FuncOp"> {
   let summary = "Map `DO CONCURRENT` loops to OpenMP worksharing loops.";
 
   let description = [{ This is an experimental pass to map `DO CONCURRENT` loops
@@ -59,7 +59,7 @@ def DoConcurrentConversionPass : Pass<"fopenmp-do-concurrent-conversion", "mlir:
     For now the following is supported:
       - Mapping simple loops to `parallel do`.
 
-    Still to TODO:
+    Still TODO:
       - More extensive testing.
   }];

flang/include/flang/Optimizer/Passes/Pipelines.h

Lines changed: 9 additions & 6 deletions
@@ -128,12 +128,15 @@ void createHLFIRToFIRPassPipeline(
     mlir::PassManager &pm, bool enableOpenMP,
     llvm::OptimizationLevel optLevel = defaultOptLevel);
 
-using DoConcurrentMappingKind =
-    Fortran::frontend::CodeGenOptions::DoConcurrentMappingKind;
-
 struct OpenMPFIRPassPipelineOpts {
+  /// Whether code is being generated for a target device rather than the host
+  /// device
   bool isTargetDevice;
-  DoConcurrentMappingKind doConcurrentMappingKind;
+
+  /// Controls how to map `do concurrent` loops; to device, host, or none at
+  /// all.
+  Fortran::frontend::CodeGenOptions::DoConcurrentMappingKind
+      doConcurrentMappingKind;
 };
 
 /// Create a pass pipeline for handling certain OpenMP transformations needed
@@ -143,8 +146,8 @@ struct OpenMPFIRPassPipelineOpts {
 /// that the FIR is correct with respect to OpenMP operations/attributes.
 ///
 /// \param pm - MLIR pass manager that will hold the pipeline definition.
-/// \param isTargetDevice - Whether code is being generated for a target device
-/// rather than the host device.
+/// \param opts - options to control OpenMP code-gen; see struct docs for more
+/// details.
 void createOpenMPFIRPassPipeline(mlir::PassManager &pm,
                                  OpenMPFIRPassPipelineOpts opts);

flang/lib/Frontend/CompilerInvocation.cpp

Lines changed: 1 addition & 1 deletion
@@ -161,7 +161,7 @@ static bool parseDoConcurrentMapping(Fortran::frontend::CodeGenOptions &opts,
                                      llvm::opt::ArgList &args,
                                      clang::DiagnosticsEngine &diags) {
   llvm::opt::Arg *arg =
-      args.getLastArg(clang::driver::options::OPT_do_concurrent_parallel_EQ);
+      args.getLastArg(clang::driver::options::OPT_do_concurrent_to_openmp_EQ);
   if (!arg)
     return true;

flang/lib/Optimizer/Passes/Pipelines.cpp

Lines changed: 3 additions & 0 deletions
@@ -280,6 +280,9 @@ void createHLFIRToFIRPassPipeline(mlir::PassManager &pm, bool enableOpenMP,
 /// rather than the host device.
 void createOpenMPFIRPassPipeline(mlir::PassManager &pm,
                                  OpenMPFIRPassPipelineOpts opts) {
+  using DoConcurrentMappingKind =
+      Fortran::frontend::CodeGenOptions::DoConcurrentMappingKind;
+
   if (opts.doConcurrentMappingKind != DoConcurrentMappingKind::DCMK_None)
     pm.addPass(flangomp::createDoConcurrentConversionPass(
         opts.doConcurrentMappingKind == DoConcurrentMappingKind::DCMK_Device));
