|
| 1 | +<!--===- docs/OpenMP-declare-target.md |
| 2 | +
|
| 3 | + Part of the LLVM Project, under the Apache License v2.0 with LLVM |
| 4 | + Exceptions. |
| 5 | + See https://llvm.org/LICENSE.txt for license information. |
| 6 | + SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| 7 | +
|
| 8 | +--> |
| 9 | + |
| 10 | +# Introduction to Declare Target |
| 11 | + |
| 12 | +In OpenMP, `declare target` is a directive that can be applied to a function or |
| 13 | +variable (primarily global) to indicate to the compiler that it should be |
| 14 | +generated in a particular device's environment, i.e. whether it should be |
| 15 | +emitted for the host, the device, or both. An example of its usage for |
| 16 | +both data and functions can be seen below. |
| 17 | + |
| 18 | +```Fortran |
| 19 | +module test_0 |
| 20 | + integer :: sp = 0 |
| 21 | +!$omp declare target link(sp) |
| 22 | +end module test_0 |
| 23 | +
|
| 24 | +program main |
| 25 | + use test_0 |
| 26 | +!$omp target map(tofrom:sp) |
| 27 | + sp = 1 |
| 28 | +!$omp end target |
| 29 | +end program |
| 30 | +``` |
| 31 | + |
| 32 | +In the above example, we create a variable in a separate module, mark it |
| 33 | +as `declare target` and then map it, embedding it into the device IR and |
| 34 | +assigning to it. |
| 35 | + |
| 36 | + |
| 37 | +```Fortran |
| 38 | +function func_t_device() result(i) |
| 39 | + !$omp declare target to(func_t_device) device_type(nohost) |
| 40 | + INTEGER :: I |
| 41 | + I = 1 |
| 42 | +end function func_t_device |
| 43 | +
|
| 44 | +program main |
| 45 | +!$omp target |
| 46 | + call func_t_device() |
| 47 | +!$omp end target |
| 48 | +end program |
| 49 | +``` |
| 50 | + |
| 51 | +In the above example, we use `declare target` to state that a function is |
| 52 | +required on the device and that it will not be used on the host, so we are |
| 53 | +in theory free to remove or ignore it there. In this case, a user could also |
| 54 | +leave the `declare target` off the function, and it would be implicitly |
| 55 | +marked `declare target any` (for both host and device), as it has been used |
| 56 | +within a target region. |
| 57 | + |
| 58 | +# Declare Target as represented in the OpenMP Dialect |
| 59 | + |
| 60 | +In the OpenMP Dialect, `declare target` is not represented by a specific |
| 61 | +`operation`. Instead, it is an OpenMP dialect specific `attribute` that can be |
| 62 | +applied to any operation in any dialect, which helps to simplify its use. |
| 63 | +Rather than replacing or modifying existing global or function `operations` |
| 64 | +in a dialect, it is attached to them as extra metadata that the lowering can |
| 65 | +use in different ways as necessary. |
| 66 | + |
| 67 | +The `attribute` is composed of multiple fields representing the clauses you |
| 68 | +would find on the `declare target` directive, i.e. the device type (`nohost`, |
| 69 | +`any`, `host`) and the capture clause (`link` or `to`). A small example of |
| 70 | +`declare target` applied to a Fortran `real` can be found below: |
| 71 | + |
| 72 | +``` |
| 73 | +fir.global internal @_QFEi {omp.declare_target = |
| 74 | +#omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 { |
| 75 | + %0 = fir.undefined f32 |
| 76 | + fir.has_value %0 : f32 |
| 77 | +} |
| 78 | +``` |
| 79 | + |
| 80 | +This would look similar for function style `operations`. |
| 81 | + |
| 82 | +The application and access of this attribute is aided by an OpenMP Dialect |
| 83 | +MLIR Interface named `DeclareTargetInterface`, which can be utilised on |
| 84 | +operations to access the appropriate interface functions, e.g.: |
| 85 | + |
| 86 | +```C++ |
| 87 | +auto declareTargetGlobal = // null if Op does not implement the interface |
| 88 | +    llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation()); |
| 89 | +bool isMarked = declareTargetGlobal && declareTargetGlobal.isDeclareTarget(); |
| 90 | +``` |
| 91 | + |
| 92 | +# Declare Target Fortran OpenMP Lowering |
| 93 | + |
| 94 | +The initial lowering of `declare target` to MLIR for both use-cases is done |
| 95 | +inside of the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However, |
| 96 | +Flang's lowering bridge in flang/lib/Lower/Bridge.cpp also makes some direct |
| 97 | +calls to `declare target` related functions. |
| 98 | + |
| 99 | +The marking of operations with the declare target attribute happens in two |
| 100 | +phases, the second one optional and contingent on the first failing. The |
| 101 | +initial phase happens when the declare target directive and its clauses |
| 102 | +are initially processed, with the primary data gathering for the directive and |
| 103 | +clause happening in a function called `getDeclareTargetInfo`. This is then used |
| 104 | +to feed the `markDeclareTarget` function, which does the actual marking |
| 105 | +utilising the `DeclareTargetInterface`. If it encounters a variable or function |
| 106 | +that has been marked twice over multiple directives with two differing device |
| 107 | +types (e.g. `host`, `nohost`), then it will swap the device type to `any`. |
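
The device type widening rule described above can be illustrated in isolation. The sketch below is a minimal, self-contained model and not Flang's implementation: the `DeviceType` enum and the `mergeDeviceType` and `exampleUsage` helpers are hypothetical names that only mirror the `host`, `nohost` and `any` values of the attribute.

```C++
#include <cassert>

// Hypothetical stand-in for the device type field of the attribute.
enum class DeviceType { Host, NoHost, Any };

// Returns the device type a symbol should carry after a second directive
// marks it. Assumption: any disagreement between the two markings widens
// the result to Any, as described for host/nohost in the text above.
DeviceType mergeDeviceType(DeviceType existing, DeviceType incoming) {
  // The same marking twice changes nothing.
  if (existing == incoming)
    return existing;
  // Differing device types (e.g. host and nohost) are widened to any.
  return DeviceType::Any;
}

// Small usage check: host followed by nohost ends up as any.
void exampleUsage() {
  assert(mergeDeviceType(DeviceType::Host, DeviceType::NoHost) ==
         DeviceType::Any);
}
```

In this model the merge is symmetric, so the order in which the two directives are encountered does not affect the final device type.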
| 108 | + |
| 109 | +Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from the |
| 110 | +lowering bridge, we are also invoking another function called |
| 111 | +`gatherOpenMPDeferredDeclareTargets`, which gathers information relevant to the |
| 112 | +application of the `declare target` attribute. This information |
| 113 | +includes the symbol that it should be applied to, device type clause, |
| 114 | +and capture clause, and it is stored in a vector that is part of the lowering |
| 115 | +bridge's instantiation of the `AbstractConverter`. It is only stored if we |
| 116 | +encounter a function or variable symbol that does not have an operation |
| 117 | +instantiated for it yet. This cannot happen as part of the |
| 118 | +initial marking as we must store this data in the lowering bridge and we |
| 119 | +only have access to the abstract version of the converter via the OpenMP |
| 120 | +lowering. |
| 121 | + |
| 122 | +The information produced by the first phase is used in the second phase, |
| 123 | +which is a form of deferred processing of the `declare target` marked |
| 124 | +operations that have delayed generation and cannot be processed in the |
| 125 | +first phase. Currently, the main case where this occurs is when a |
| 126 | +Fortran function interface has been marked. The second phase is |
| 127 | +carried out by the function |
| 128 | +`markOpenMPDeferredDeclareTargetFunctions`, which is called from the lowering |
| 129 | +bridge at the end of the lowering process, allowing us to mark those where |
| 130 | +possible. It iterates over the data previously gathered by |
| 131 | +`gatherOpenMPDeferredDeclareTargets` |
| 132 | +checking if any of the recorded symbols have now had their corresponding |
| 133 | +operations instantiated and applying the declare target attribute where |
| 134 | +possible utilising `markDeclareTarget`. However, it must be noted that it |
| 135 | +is still possible for operations not to be generated for certain symbols, |
| 136 | +in particular the case of function interfaces that are not directly used |
| 137 | +or defined within the current module. This means we cannot emit errors in |
| 138 | +the case of left-over unmarked symbols. These must (and should) be caught |
| 139 | +by the initial semantic analysis. |
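
The two phases described above can be modelled with plain containers. This is a hedged sketch under stated assumptions rather than the code in the lowering bridge: `ModelOp`, `ModelOps`, `DeferredDeclareTarget`, `applyMarking`, `markOrDefer` and `markDeferred` are hypothetical names, and a `std::map` stands in for the set of operations that have been instantiated so far.

```C++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };
enum class CaptureClause { To, Link };

// Hypothetical record of what is stored for a symbol whose operation
// does not exist yet: the symbol, device type and capture clause.
struct DeferredDeclareTarget {
  std::string symbol;
  DeviceType deviceType;
  CaptureClause captureClause;
};

// Hypothetical model of an operation that can carry the attribute.
struct ModelOp {
  bool marked = false;
  DeviceType deviceType = DeviceType::Any;
  CaptureClause captureClause = CaptureClause::To;
};

// Symbol name -> operation, for operations instantiated so far.
using ModelOps = std::map<std::string, ModelOp>;

static void applyMarking(ModelOp &op, const DeferredDeclareTarget &info) {
  op.marked = true;
  op.deviceType = info.deviceType;
  op.captureClause = info.captureClause;
}

// Phase one: mark the operation if it exists, otherwise record the
// information so that it can be retried later.
void markOrDefer(ModelOps &ops, std::vector<DeferredDeclareTarget> &deferred,
                 const DeferredDeclareTarget &info) {
  assert(!info.symbol.empty() && "a deferred entry needs a symbol name");
  auto it = ops.find(info.symbol);
  if (it == ops.end()) {
    deferred.push_back(info);
    return;
  }
  applyMarking(it->second, info);
}

// Phase two: retry every deferred entry. Returns how many symbols still
// have no operation; these are skipped silently instead of being reported
// as errors, since an unused interface may never get an operation.
std::size_t markDeferred(ModelOps &ops,
                         const std::vector<DeferredDeclareTarget> &deferred) {
  std::size_t unresolved = 0;
  for (const DeferredDeclareTarget &info : deferred) {
    auto it = ops.find(info.symbol);
    if (it == ops.end()) {
      ++unresolved;
      continue;
    }
    applyMarking(it->second, info);
  }
  return unresolved;
}
```

In this model, calling `markDeferred` once at the end corresponds to the single deferred pass made at the end of lowering, and a non-zero return value is not treated as a failure.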
| 140 | + |
| 141 | +NOTE: `declare target` can be applied to implicit `SAVE` attributed variables. |
| 142 | +However, by default Flang does not represent these as `GlobalOp`'s, which means |
| 143 | +we cannot tag and lower them as `declare target` normally. Instead, similarly |
| 144 | +to the way `threadprivate` handles these cases, we raise and initialize the |
| 145 | +variable as an internal `GlobalOp` and apply the attribute. This occurs in the |
| 146 | +flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`. |
| 147 | + |
| 148 | +# Declare Target Transformation Passes for Flang |
| 149 | + |
| 150 | +There are currently two passes within Flang that are related to the processing |
| 151 | +of `declare target`: |
| 152 | +* `OMPMarkDeclareTarget` - This pass is in charge of marking functions captured |
| 153 | +in (called from) `target` regions or other `declare target` marked functions as |
| 154 | +`declare target`. It does so recursively, i.e. nested calls will also be |
| 155 | +implicitly marked. It currently will try to mark things as conservatively as |
| 156 | +possible, e.g. if captured in a `target` region it will apply `nohost`, unless |
| 157 | +it encounters a `host` `declare target` in which case it will apply the `any` |
| 158 | +device type. Functions are handled similarly, except we utilise the parent's |
| 159 | +device type where possible. |
| 160 | +* `OMPFunctionFiltering` - This is executed after the `OMPMarkDeclareTarget` |
| 161 | +pass, and its job is to conservatively remove host functions from |
| 162 | +the module where possible when compiling for the device. This helps make |
| 163 | +sure that most incompatible code for the host is not lowered for the |
| 164 | +device. Host functions with `target` regions in them need to be preserved |
| 165 | +(e.g. for lowering the `target` region(s) inside). Otherwise, it removes |
| 166 | +any function marked as a `declare target host` function, and any uses are |
| 167 | +replaced with `undef`s so that the remaining host code doesn't become broken. |
| 168 | +Host functions with `target` regions are marked with a `declare target host` |
| 169 | +attribute so they will be removed after outlining the target regions contained |
| 170 | +inside. |
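
The recursive implicit marking performed by `OMPMarkDeclareTarget` can be sketched over a simple call graph. The following is an illustrative model only and does not reflect the pass's actual code: `CallGraph`, `Markings`, `markReachable` and `markCalledFromTarget` are hypothetical names, functions are identified by plain strings, and a conflict between an existing marking and the parent's device type is assumed to widen to `any`.

```C++
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class DeviceType { Host, NoHost, Any };

// Hypothetical call graph: caller name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;
// Function name -> device type of its declare target marking, if any.
using Markings = std::map<std::string, DeviceType>;

// Marks `fn` with the parent's device type and then visits its callees.
// An existing, conflicting marking is widened to Any. A function whose
// marking already covers the parent's type is not revisited, which also
// terminates the recursion on call cycles.
void markReachable(const CallGraph &graph, const std::string &fn,
                   DeviceType parentType, Markings &marks) {
  DeviceType merged = parentType;
  auto existing = marks.find(fn);
  if (existing != marks.end()) {
    if (existing->second == parentType ||
        existing->second == DeviceType::Any)
      return;
    merged = DeviceType::Any;
  }
  marks[fn] = merged;

  auto callees = graph.find(fn);
  if (callees == graph.end())
    return;
  for (const std::string &callee : callees->second)
    markReachable(graph, callee, merged, marks);
}

// Functions called directly from a target region start out as nohost.
void markCalledFromTarget(const CallGraph &graph,
                          const std::vector<std::string> &calledInTarget,
                          Markings &marks) {
  for (const std::string &callee : calledInTarget) {
    markReachable(graph, callee, DeviceType::NoHost, marks);
    assert(marks.count(callee) == 1 && "direct callees are always marked");
  }
}
```

For example, if a `target` region calls `a`, `a` calls `b`, and `b` calls a function already marked `host`, this model marks `a` and `b` as `nohost` and widens the `host` function to `any`.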
| 171 | + |
| 172 | +While this infrastructure could be generally applicable to more than just Flang, |
| 173 | +it is only utilised in the Flang frontend, so it resides there rather than in |
| 174 | +the OpenMP dialect codebase. |
| 175 | + |
| 176 | +# Declare Target OpenMP Dialect To LLVM-IR Lowering |
| 177 | + |
| 178 | +The OpenMP dialect lowering of `declare target` is done through the |
| 179 | +`amendOperation` flow, as it's not an `operation` but rather an |
| 180 | +`attribute`. This is triggered immediately after the corresponding |
| 181 | +operation has been lowered to LLVM-IR. As it is applicable to |
| 182 | +different types of operations, we must specialise this function for |
| 183 | +each operation type that we may encounter. Currently, this is |
| 184 | +`GlobalOp`'s and `FuncOp`'s. |
| 185 | + |
| 186 | +`FuncOp` processing is fairly simple. When compiling for the device, |
| 187 | +`host` marked functions are removed, including those that could not |
| 188 | +be removed earlier due to having `target` directives within. This |
| 189 | +leaves `any`, `nohost` or indeterminable functions in the |
| 190 | +module to lower further. When compiling for the host, no filtering is |
| 191 | +done because `nohost` functions must be available as a fallback |
| 192 | +implementation. |
| 193 | + |
| 194 | +For `GlobalOp`'s, the processing is a little more complex. We |
| 195 | +currently leverage the `registerTargetGlobalVariable` and |
| 196 | +`getAddrOfDeclareTargetVar` `OMPIRBuilder` functions shared with Clang. |
| 197 | +These two functions invoke each other depending on the clauses and options |
| 198 | +provided to the `OMPIRBuilder` (in particular, unified shared memory). Their |
| 199 | +main purposes are the generation of a new global device pointer with a |
| 200 | +"ref_" prefix on the device and enqueuing metadata generation by the |
| 201 | +`OMPIRBuilder` to be produced at module finalization time. This is done |
| 202 | +for both host and device and it links the newly generated device global |
| 203 | +pointer and the host pointer together across the two modules. |
| 204 | + |
| 205 | +Similarly to other metadata (e.g. for `TargetOp`) that is shared across |
| 206 | +both host and device modules, processing of `GlobalOp`'s in the device |
| 207 | +needs access to the previously generated host IR file, which is done |
| 208 | +through another `attribute` applied to the `ModuleOp` by the compiler |
| 209 | +frontend. The file is loaded in and consumed by the `OMPIRBuilder` to |
| 210 | +populate its `OffloadInfoManager` data structures, keeping host and |
| 211 | +device appropriately synchronised. |
| 212 | + |
| 213 | +Another point to remember (and the more important one) is that, as we |
| 214 | +effectively replace the original LLVM-IR generated for the `declare target` |
| 215 | +marked `GlobalOp`, we need to correct any `TargetOp`s (or other region |
| 216 | +operations that use it directly) which still refer to the originally |
| 217 | +lowered global. This is done via `handleDeclareTargetMapVar`, which is |
| 218 | +invoked as the final alteration to the lowered `target` region. It is only |
| 219 | +invoked for the device, as it is only required when we have emitted the |
| 220 | +"ref" pointer, and it effectively replaces all uses of the originally |
| 221 | +lowered global symbol with our new global ref pointer's symbol. Currently, |
| 222 | +we do not remove or delete the old symbol, because the same symbol can be |
| 223 | +used across multiple target regions; if we removed it, we would risk |
| 224 | +breaking the lowering of target regions that are processed later. To |
| 225 | +appropriately delete these no longer necessary symbols, we would need a |
| 226 | +deferred removal process at the end of the module, which is currently not |
| 227 | +in place. It may be possible to store this information in the |
| 228 | +`OMPIRBuilder` and then perform this cleanup on finalization, but this is |
| 229 | +still open for discussion and implementation. |
| 230 | + |
| 231 | +# Current Support |
| 232 | + |
| 233 | +For the moment, `declare target` should work for: |
| 234 | +* Marking functions/subroutines and function/subroutine interfaces for |
| 235 | +generation on host, device or both. |
| 236 | +* Implicit function/subroutine capture for calls emitted in a `target` region |
| 237 | +or explicitly marked `declare target` function/subroutine. Note: Functions |
| 238 | +called via arguments passed to other functions must still themselves be marked |
| 239 | +`declare target`, e.g. when passing a `C` function pointer and invoking it, |
| 240 | +the interface and the `C` function in the other module must both be marked |
| 241 | +`declare target`, with the same type of marking as indicated by the |
| 242 | +specification. |
| 243 | +* Marking global variables with `declare target`'s `link` clause and mapping |
| 244 | +the data to the device data environment utilising `declare target`. This may |
| 245 | +not work for all types yet, but for scalars and arrays of scalars, it |
| 246 | +should. |
| 247 | + |
| 248 | +Doesn't work for, or needs further testing for: |
| 249 | +* Marking the following types with `declare target link` (needs further |
| 250 | +testing): |
| 251 | + * Descriptor based types, e.g. pointers/allocatables. |
| 252 | + * Derived types. |
| 253 | + * Members of derived types (use-case needs legality checking with OpenMP |
| 254 | +specification). |
| 255 | +* Marking global variables with `declare target`'s `to` clause. A lot of the |
| 256 | +lowering should exist, but it needs further testing and likely some further |
| 257 | +changes to fully function. |