Commit afb05cd

[Flang][MLIR][OpenMP] Create a deferred declare target marking process for Bridge.cpp (#78502)
This patch seeks to create a process that happens on module finalization for OpenMP, in which a list of operations that had declare target directives applied to them, but were not yet generated when the original declare target directive was processed, is re-checked so that the appropriate declare target semantics can be applied.

This works by maintaining a vector of declare target related data inside the FIR converter, in this case the symbol and the two relevant unsigned integers representing the enumerators. This vector is added to via a new function called from Bridge.cpp, insertDeferredDeclareTargets, which runs prior to the processing of the directive (similarly to getDeclareTargetFunctionDevice currently for requires). It checks whether the operation the declare target directive is applied to currently exists, and if it doesn't, it appends it to the vector. This is a separate function from the processing of the declare target via the overloaded genOMP, because we unfortunately do not have access to the list without passing it through every call: the AbstractConverter we pass does not allow access to it (I've seen no other cases of casting it to a FirConverter, so I opted not to do that).

The list is then processed at the end of the module in the finalizeOpenMPLowering function in Bridge by calling a new function, markDelayedDeclareTargetFunctions, which marks the operations that were generated late. In certain cases, some still will not be generated. For example, if an interface is defined and marked as declare target but has no definition or usage in the module, it will not be emitted to the module. Due to these cases, we must silently ignore an operation that has not been found via its symbol.

The main use case for this (although I imagine there are others) is processing interfaces that have been declared in a module with a declare target directive but do not have their implementation defined in the same module, for example, inside a separate C++ module that will be linked in. In cases where the interface is called inside a target region, it'll be marked as used on the device appropriately (although, realistically, a user should explicitly mark it to match the corresponding definition). However, in cases where it's used in a non-obvious manner, e.g. through a function pointer passed to an external call, we require this explicit marking, which this patch adds support for (currently this causes the compiler to crash).

This patch also adds documentation on the declare target process and the mechanisms currently within the compiler.
1 parent 3b84b6f commit afb05cd

7 files changed, +476 -41 lines changed

flang/docs/OpenMP-declare-target.md

Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
<!--===- docs/OpenMP-declare-target.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM
   Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Introduction to Declare Target

In OpenMP, `declare target` is a directive that can be applied to a function or
variable (primarily global) to indicate to the compiler that it should be
generated in a particular device's environment; in essence, whether something
should be emitted for the host, the device, or both. Examples of its usage for
both data and functions can be seen below.

```Fortran
module test_0
  integer :: sp = 0
  !$omp declare target link(sp)
end module test_0

program main
  use test_0
  !$omp target map(tofrom:sp)
  sp = 1
  !$omp end target
end program
```

In the above example, we create a variable in a separate module, mark it
as `declare target` and then map it, embedding it into the device IR and
assigning to it.

```Fortran
subroutine func_t_device()
  !$omp declare target to(func_t_device) device_type(nohost)
  integer :: i
  i = 1
end subroutine func_t_device

program main
  !$omp target
  call func_t_device()
  !$omp end target
end program
```

In the above example, we are stating via `declare target` that a function is
required on the device and that we will not be utilising it on the host, so we
are in theory free to remove or ignore it there. In this case, a user could also
leave off the `declare target` from the function, and it would be implicitly
marked `declare target any` (for both host and device), as it's been utilised
within a target region.

# Declare Target as represented in the OpenMP Dialect

In the OpenMP dialect, `declare target` is not represented by a specific
`operation`. Instead, it's an OpenMP dialect specific `attribute` that can be
applied to any operation in any dialect, which helps to simplify its use.
Rather than replacing or modifying existing global or function `operations` in
a dialect, it is attached to them as extra metadata that the lowering can use
in different ways as necessary.

The `attribute` is composed of multiple fields representing the clauses you
would find on the `declare target` directive, i.e. the device type (`nohost`,
`any`, `host`) and the capture clause (`link` or `to`). A small example of
`declare target` applied to a Fortran `real` can be found below:

```
fir.global internal @_QFEi {omp.declare_target =
    #omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
  %0 = fir.undefined f32
  fir.has_value %0 : f32
}
```

This would look similar for function style `operations`.

The application and access of this attribute is aided by an OpenMP dialect
MLIR interface named `DeclareTargetInterface`, which can be utilised on
operations to access the appropriate interface functions, e.g.:
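
```C++
// Op may be a global, a function or any other operation; dyn_cast yields a
// null interface when the operation does not implement DeclareTargetInterface.
if (auto declareTargetGlobal =
        llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation()))
  if (declareTargetGlobal.isDeclareTarget()) {
    // The operation carries the declare target attribute.
  }
```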

# Declare Target Fortran OpenMP Lowering

The initial lowering of `declare target` to MLIR for both use cases (data and
functions) is done inside the usual OpenMP lowering in
flang/lib/Lower/OpenMP.cpp. However, some direct calls to `declare target`
related functions are also made from Flang's lowering bridge in
flang/lib/Lower/Bridge.cpp.

The marking of operations with the declare target attribute happens in two
phases. The second one is optional and only needed when the first cannot find
an operation to mark. The initial phase happens when the declare target
directive and its clauses are first processed, with the primary data gathering
for the directive and clauses happening in a function called
`getDeclareTargetInfo`. This is then used to feed the `markDeclareTarget`
function, which does the actual marking utilising the `DeclareTargetInterface`.
If it encounters a variable or function that has been marked by multiple
directives with differing device types (e.g. `host` and `nohost`), it changes
the device type to `any`.
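The merge rule for repeated markings can be illustrated with a small standalone C++ sketch. It is not Flang's `markDeclareTarget`. The `DeviceType` enum below is a hypothetical stand-in for `mlir::omp::DeclareTargetDeviceType`, and capture clauses are ignored.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for mlir::omp::DeclareTargetDeviceType.
enum class DeviceType { host, nohost, any };

// Returns the device type to record when a declare target directive is
// applied. `existing` is std::nullopt when the operation is not yet marked.
DeviceType mergeDeviceType(std::optional<DeviceType> existing,
                           DeviceType requested) {
  if (!existing)
    return requested; // first marking: take the clause as written
  if (*existing != requested)
    return DeviceType::any; // conflicting markings: needed on host and device
  return *existing; // identical marking repeated: nothing changes
}
```

For example, a symbol marked `host` by one directive and `nohost` by another ends up as `any`. Repeating the same marking leaves it unchanged.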

Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from the
lowering bridge, we also invoke another function called
`gatherOpenMPDeferredDeclareTargets`, which gathers information relevant to the
application of the `declare target` attribute. This information includes the
symbol that it should be applied to, the device type clause and the capture
clause. It is stored in a vector that is part of the lowering bridge's
instantiation of the `AbstractConverter`, and only if we encounter a function
or variable symbol that does not have an operation instantiated for it yet.
This cannot happen as part of the initial marking, because we must store this
data in the lowering bridge, and the OpenMP lowering only has access to the
abstract version of the converter.
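A minimal sketch of this first-phase bookkeeping makes some simplifying assumptions. Symbols are plain strings rather than `Fortran::semantics::Symbol` references, and the module is modelled as a set of names for which operations already exist. `DeferredInfo`, `Module` and `gatherDeferred` are hypothetical names. They only mirror the shape of `OMPDeferredDeclareTargetInfo` and `gatherOpenMPDeferredDeclareTargets`.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the OpenMP dialect enums.
enum class DeviceType { host, nohost, any };
enum class CaptureClause { to, link };

// Hypothetical mirror of OMPDeferredDeclareTargetInfo, keyed by symbol name
// instead of holding a Fortran::semantics::Symbol reference.
struct DeferredInfo {
  CaptureClause capture;
  DeviceType device;
  std::string symbol;
};

// Hypothetical module model: names of the operations emitted so far.
using Module = std::unordered_set<std::string>;

// Record a declare target symbol for the deferred phase only when no
// operation exists for it yet; otherwise the first phase marks it directly.
void gatherDeferred(const Module &mod, const std::string &symbol,
                    DeviceType device, CaptureClause capture,
                    std::vector<DeferredInfo> &deferred) {
  if (mod.count(symbol) == 0)
    deferred.push_back({capture, device, symbol});
}
```

With a module that already contains `defined`, gathering `defined` and an interface `iface` leaves only `iface` in the deferred list.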

The information produced by the first phase is used in the second phase,
which is a form of deferred processing of the `declare target` marked
operations that are generated late and cannot be processed in the first
phase. The main case where this currently occurs is when a Fortran function
interface has been marked. This is done via the function
`markOpenMPDeferredDeclareTargetFunctions`, which is called from the lowering
bridge at the end of the lowering process, allowing us to mark those where
possible. It iterates over the data previously gathered by
`gatherOpenMPDeferredDeclareTargets`, checks whether any of the recorded
symbols have now had their corresponding operations instantiated, and applies
the declare target attribute where possible utilising `markDeclareTarget`.
However, it is still possible for operations not to be generated for certain
symbols, in particular function interfaces that are not directly used or
defined within the current module. This means we cannot emit errors for
left-over unmarked symbols; such errors must (and should) be caught by the
initial semantic analysis.
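The second phase can be sketched in the same simplified model. The lookup-then-skip structure follows the description above. The boolean result is an assumption, modelled on how Bridge.cpp ORs the return value of `markOpenMPDeferredDeclareTargetFunctions` into `ompDeviceCodeFound`. Here, any non-`host` marking is treated as device code. All names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for mlir::omp::DeclareTargetDeviceType.
enum class DeviceType { host, nohost, any };

// Hypothetical deferred record (capture clause omitted for brevity).
struct DeferredInfo {
  DeviceType device;
  std::string symbol;
};

// Hypothetical module: symbol name of each emitted operation -> its declare
// target marking, or std::nullopt if it is not marked.
using Module = std::unordered_map<std::string, std::optional<DeviceType>>;

// Retry every deferred marking once lowering has finished. Returns true if
// any operation that was found is needed on the device (not host-only).
bool markDeferred(Module &mod, const std::vector<DeferredInfo> &deferred) {
  bool deviceCodeFound = false;
  for (const DeferredInfo &info : deferred) {
    auto it = mod.find(info.symbol);
    if (it == mod.end())
      continue; // still not emitted (e.g. an unused interface): skip silently
    if (info.device != DeviceType::host)
      deviceCodeFound = true;
    std::optional<DeviceType> &marking = it->second;
    // Same merge rule as the first phase: differing device types become any.
    marking = (!marking || *marking == info.device) ? info.device
                                                    : DeviceType::any;
  }
  return deviceCodeFound;
}
```

A symbol that was never emitted is simply skipped. This mirrors the point above that left-over unmarked symbols cannot be reported as errors at this stage.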

NOTE: `declare target` can be applied to implicitly `SAVE` attributed variables.
However, by default Flang does not represent these as `GlobalOp`'s, which means
we cannot tag and lower them as `declare target` normally. Instead, similarly
to the way `threadprivate` handles these cases, we raise and initialize the
variable as an internal `GlobalOp` and apply the attribute. This occurs in the
flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.

# Declare Target Transformation Passes for Flang

There are currently two passes within Flang that are related to the processing
of `declare target`:
* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured
  (called) in `target` regions or in other `declare target` marked functions as
  `declare target`. It does so recursively, i.e. nested calls will also be
  implicitly marked. It currently tries to mark things as conservatively as
  possible. For example, if a function is captured in a `target` region, it
  applies `nohost`, unless it encounters a `host` `declare target`, in which
  case it applies the `any` device type. Functions are handled similarly,
  except that we utilise the parent's device type where possible.
* `OMPFunctionFiltering` - This is executed after the `OMPMarkDeclareTarget`
  pass. Its job is to conservatively remove host functions from the module
  where possible when compiling for the device. This helps make sure that most
  host-only code, which may be incompatible with the device, is not lowered for
  the device. Host functions with `target` regions in them need to be preserved
  (e.g. for lowering the `target` region(s) inside). Otherwise, it removes any
  function marked as a `declare target host` function, and any uses are
  replaced with `undef`'s so that the remaining host code doesn't become
  broken. Host functions with `target` regions are marked with a
  `declare target host` attribute so they will be removed after outlining the
  target regions contained inside.
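The propagation performed by `OMPMarkDeclareTarget` can be approximated over a plain call graph. This is a hypothetical model, not the pass itself. Target regions are represented as pseudo-callers whose callees receive `nohost`. Callees of explicitly marked functions inherit the caller's device type, and conflicting markings are merged to `any`. Recursion stops once a function's marking no longer changes, so recursive calls terminate.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for mlir::omp::DeclareTargetDeviceType.
enum class DeviceType { host, nohost, any };

// Hypothetical call graph: caller name -> names of the functions it calls.
// Target regions appear as pseudo-callers.
using CallGraph = std::map<std::string, std::vector<std::string>>;
// Declare target markings keyed by function name.
using Markings = std::map<std::string, DeviceType>;

void markCallees(const CallGraph &graph, Markings &marks,
                 const std::string &caller, DeviceType device);

// Apply `device` to `fn`, merging conflicting markings to `any`, and
// propagate the resulting device type to everything `fn` calls.
void markFunction(const CallGraph &graph, Markings &marks,
                  const std::string &fn, DeviceType device) {
  auto [it, inserted] = marks.try_emplace(fn, device);
  if (!inserted) {
    DeviceType merged = it->second == device ? device : DeviceType::any;
    if (merged == it->second)
      return; // marking unchanged, so its callees were already handled
    it->second = merged;
  }
  markCallees(graph, marks, fn, it->second);
}

void markCallees(const CallGraph &graph, Markings &marks,
                 const std::string &caller, DeviceType device) {
  auto calls = graph.find(caller);
  if (calls == graph.end())
    return;
  for (const std::string &callee : calls->second)
    markFunction(graph, marks, callee, device);
}

// Callees of target regions start as `nohost`; explicitly marked functions
// then propagate their own (possibly merged) device type to their callees.
void runImplicitMarking(const CallGraph &graph, Markings &marks,
                        const std::vector<std::string> &targetRegions) {
  const Markings explicitMarks = marks; // snapshot before implicit marking
  for (const std::string &region : targetRegions)
    markCallees(graph, marks, region, DeviceType::nohost);
  for (const auto &entry : explicitMarks)
    markCallees(graph, marks, entry.first, marks.at(entry.first));
}
```

Consider a function `a` that is called from a target region and also from an explicitly `host` marked function. In this model, `a` ends up as `any`, and its callees inherit `any` as well.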

While this infrastructure could be generally applicable to more than just Flang,
it is only utilised in the Flang frontend, so it resides there rather than in
the OpenMP dialect codebase.

# Declare Target OpenMP Dialect To LLVM-IR Lowering

The OpenMP dialect lowering of `declare target` is done through the
`amendOperation` flow, as it's not an `operation` but rather an
`attribute`. This is triggered immediately after the corresponding
operation has been lowered to LLVM-IR. As it is applicable to
different types of operations, we must specialise this function for
each operation type that we may encounter. Currently, these are
`GlobalOp`'s and `FuncOp`'s.

`FuncOp` processing is fairly simple. When compiling for the device,
`host` marked functions are removed, including those that could not
be removed earlier due to having `target` directives within. This
leaves `any`, `nohost` and indeterminable functions in the module to be
lowered further. When compiling for the host, no filtering is done,
because `nohost` functions must be available as a fallback
implementation.
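The filtering decision described above reduces to a small predicate. The following is a hypothetical sketch in which `std::nullopt` stands for a function without declare target information, the indeterminable case.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for mlir::omp::DeclareTargetDeviceType.
enum class DeviceType { host, nohost, any };

// Should a lowered function be kept? `marking` is std::nullopt for functions
// without declare target information (the "indeterminable" case).
bool keepFunction(bool compilingForDevice, std::optional<DeviceType> marking) {
  if (!compilingForDevice)
    return true; // host: keep everything, nohost functions act as a fallback
  // device: drop only host-marked functions
  return !(marking && *marking == DeviceType::host);
}
```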

For `GlobalOp`'s, the processing is a little more complex. We
currently leverage the `registerTargetGlobalVariable` and
`getAddrOfDeclareTargetVar` `OMPIRBuilder` functions shared with Clang.
These two functions invoke each other depending on the clauses and options
provided to the `OMPIRBuilder` (in particular, unified shared memory). Their
main purposes are the generation of a new global device pointer with a
"ref_" prefix on the device and enqueuing metadata generation by the
`OMPIRBuilder` to be produced at module finalization time. This is done
for both host and device, and it links the newly generated device global
pointer and the host pointer together across the two modules.

Similarly to other metadata (e.g. for `TargetOp`) that is shared across
both host and device modules, processing of `GlobalOp`'s in the device
needs access to the previously generated host IR file, which is done
through another `attribute` applied to the `ModuleOp` by the compiler
frontend. The file is loaded in and consumed by the `OMPIRBuilder` to
populate its `OffloadInfoManager` data structures, keeping host and
device appropriately synchronised.

The second (and more important) point to remember is that, as we effectively
replace the original LLVM-IR generated for the `declare target` marked
`GlobalOp`, we have some corrections to make for `TargetOp`'s (or other region
operations that use them directly) that still refer to the original lowered
global operation. This is done via `handleDeclareTargetMapVar`, which is invoked
as the final function and alteration applied to the lowered `target` region. It
is only invoked for the device, as it's only required when we have emitted the
"ref" pointer, and it effectively replaces all uses of the originally lowered
global symbol with our new global ref pointer's symbol. Currently we do not
remove or delete the old symbol, because the same symbol can be utilised across
multiple target regions; if we removed it, we would risk breaking the lowering
of target regions that are processed at a later time. To appropriately delete
these no longer necessary symbols, we would need a deferred removal process at
the end of the module, which is currently not in place. It may be possible to
store this information in the OMPIRBuilder and then perform this cleanup
process on finalization, but this is still open for discussion and
implementation.
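The following is a deliberately simplified model of the use replacement performed by `handleDeclareTargetMapVar`. Instructions are modelled as an opcode plus the symbol names they reference. Only the uses within one target region are rewritten, and the original global is left in place. The types, the function name and the `ref_sp` symbol are hypothetical. The sketch also ignores how the real LLVM-IR accesses the data through the ref pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction model: an opcode plus the symbols it references.
struct Instruction {
  std::string opcode;
  std::vector<std::string> operands;
};

// Redirect every use of `original` in one lowered target region to `refPtr`.
// The original global itself is not deleted, since target regions lowered
// later may still refer to it.
void replaceUsesInRegion(std::vector<Instruction> &region,
                         const std::string &original,
                         const std::string &refPtr) {
  for (Instruction &inst : region)
    for (std::string &operand : inst.operands)
      if (operand == original)
        operand = refPtr;
}
```

Keeping the rewrite local to a single region is what makes the deferred cleanup discussed above necessary. The stale global can only be deleted safely once every region has been processed.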

# Current Support

For the moment, `declare target` should work for:
* Marking functions/subroutines and function/subroutine interfaces for
  generation on host, device or both.
* Implicit function/subroutine capture for calls emitted in a `target` region
  or in an explicitly marked `declare target` function/subroutine. Note: calls
  made via arguments passed to other functions must still be marked
  `declare target` themselves. For example, when passing a `C` function pointer
  and invoking it, both the interface and the `C` function in the other module
  must be marked `declare target`, with the same type of marking, as indicated
  by the specification.
* Marking global variables with `declare target`'s `link` clause and mapping
  the data to the device data environment utilising `declare target`. This may
  not work for all types yet, but it should work for scalars and arrays of
  scalars.

Doesn't work for, or needs further testing for:
* Marking the following types with `declare target link` (needs further
  testing):
  * Descriptor based types, e.g. pointers/allocatables.
  * Derived types.
  * Members of derived types (use-case needs legality checking with OpenMP
    specification).
* Marking global variables with `declare target`'s `to` clause. A lot of the
  lowering should exist, but it needs further testing and likely some further
  changes to fully function.

flang/docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ on how to get in touch with us and to learn more about the current status.
 OpenACC
 OpenACC-descriptor-management.md
 OpenMP-4.5-grammar.md
+OpenMP-declare-target
 OpenMP-descriptor-management
 OpenMP-semantics
 OptionComparison

flang/include/flang/Lower/OpenMP.h

Lines changed: 21 additions & 0 deletions
@@ -13,12 +13,19 @@
 #ifndef FORTRAN_LOWER_OPENMP_H
 #define FORTRAN_LOWER_OPENMP_H
 
+#include "llvm/ADT/SmallVector.h"
+
 #include <cinttypes>
+#include <utility>
 
 namespace mlir {
 class Value;
 class Operation;
 class Location;
+namespace omp {
+enum class DeclareTargetDeviceType : uint32_t;
+enum class DeclareTargetCaptureClause : uint32_t;
+} // namespace omp
 } // namespace mlir
 
 namespace fir {
@@ -49,6 +56,12 @@ struct Evaluation;
 struct Variable;
 } // namespace pft
 
+struct OMPDeferredDeclareTargetInfo {
+  mlir::omp::DeclareTargetCaptureClause declareTargetCaptureClause;
+  mlir::omp::DeclareTargetDeviceType declareTargetDeviceType;
+  const Fortran::semantics::Symbol &sym;
+};
+
 // Generate the OpenMP terminator for Operation at Location.
 mlir::Operation *genOpenMPTerminator(fir::FirOpBuilder &, mlir::Operation *,
                                      mlir::Location);
@@ -86,6 +99,14 @@ bool isOpenMPDeviceDeclareTarget(Fortran::lower::AbstractConverter &,
                                  Fortran::semantics::SemanticsContext &,
                                  Fortran::lower::pft::Evaluation &,
                                  const parser::OpenMPDeclarativeConstruct &);
+void gatherOpenMPDeferredDeclareTargets(
+    Fortran::lower::AbstractConverter &, Fortran::semantics::SemanticsContext &,
+    Fortran::lower::pft::Evaluation &,
+    const parser::OpenMPDeclarativeConstruct &,
+    llvm::SmallVectorImpl<OMPDeferredDeclareTargetInfo> &);
+bool markOpenMPDeferredDeclareTargetFunctions(
+    mlir::Operation *, llvm::SmallVectorImpl<OMPDeferredDeclareTargetInfo> &,
+    AbstractConverter &);
 void genOpenMPRequires(mlir::Operation *, const Fortran::semantics::Symbol *);
 
 } // namespace lower

flang/lib/Lower/Bridge.cpp

Lines changed: 17 additions & 0 deletions
@@ -2633,6 +2633,9 @@ class FirConverter : public Fortran::lower::AbstractConverter {
         ompDeviceCodeFound ||
         Fortran::lower::isOpenMPDeviceDeclareTarget(
             *this, bridge.getSemanticsContext(), getEval(), ompDecl);
+    Fortran::lower::gatherOpenMPDeferredDeclareTargets(
+        *this, bridge.getSemanticsContext(), getEval(), ompDecl,
+        ompDeferredDeclareTarget);
     genOpenMPDeclarativeConstruct(
         *this, localSymbols, bridge.getSemanticsContext(), getEval(), ompDecl);
     builder->restoreInsertionPoint(insertPt);
@@ -5171,6 +5174,13 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// lowering.
   void finalizeOpenMPLowering(
       const Fortran::semantics::Symbol *globalOmpRequiresSymbol) {
+    if (!ompDeferredDeclareTarget.empty()) {
+      bool deferredDeviceFuncFound =
+          Fortran::lower::markOpenMPDeferredDeclareTargetFunctions(
+              getModuleOp().getOperation(), ompDeferredDeclareTarget, *this);
+      ompDeviceCodeFound = ompDeviceCodeFound || deferredDeviceFuncFound;
+    }
+
     // Set the module attribute related to OpenMP requires directives
     if (ompDeviceCodeFound)
       Fortran::lower::genOpenMPRequires(getModuleOp().getOperation(),
@@ -5227,6 +5237,13 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   /// intended for device offloading has been detected
   bool ompDeviceCodeFound = false;
 
+  /// Keeps track of symbols defined as declare target that could not be
+  /// processed at the time of lowering the declare target construct, such
+  /// as certain cases where interfaces are declared but not defined within
+  /// a module.
+  llvm::SmallVector<Fortran::lower::OMPDeferredDeclareTargetInfo>
+      ompDeferredDeclareTarget;
+
   const Fortran::lower::ExprToValueMap *exprValueOverrides{nullptr};
 
   /// Stack of derived type under construction to avoid infinite loops when
