@@ -92,104 +92,46 @@ For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
92
92
93
93
Q: How to build an OpenMP Nvidia offload capable compiler?
94
94
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
95
- The Cuda SDK is required on the machine that will execute the openmp application.
96
-
97
- If your build machine is not the target machine or automatic detection of the
98
- available GPUs failed, you should also set:
99
-
100
- - ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_<xy>;...' `` where ``<xy> `` is the numeric
101
- compute capability of your GPU. For instance, set
102
- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_70;sm_80' `` to target the Nvidia Volta
103
- and Ampere architectures.
104
-
95
+ The CUDA SDK is required on the machine that will build and execute the
96
+ offloading application. Normally this is only required at runtime by dynamically
97
+ opening the CUDA driver API. This can be disabled in the build by omitting
98
+ ``cuda`` from the ``LIBOMPTARGET_DLOPEN_PLUGINS`` list, where it is present by
99
+ default. With this setting we will instead find the CUDA library at LLVM build
100
+ time and link against it directly.
105
101
106
102
.. _build_amdgpu_offload_capable_compiler:
107
103
108
104
Q: How to build an OpenMP AMDGPU offload capable compiler?
109
105
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
110
- A subset of the `ROCm <https://github.com/radeonopencompute >`_ toolchain is
111
- required to build the LLVM toolchain and to execute the openmp application.
112
- Either install ROCm somewhere that cmake's find_package can locate it, or
113
- build the required subcomponents ROCt and ROCr from source.
114
-
115
- The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime, rocr.
116
- Roct is the userspace part of the linux driver. It calls into the driver which
117
- ships with the linux kernel. It is an implementation detail of Rocr from
118
- OpenMP's perspective. Rocr is an implementation of `HSA
119
- <http://www.hsafoundation.com> `_.
120
-
121
- .. code-block :: text
122
-
123
- SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
124
- BUILD_DIR=somewhere
125
- INSTALL_PREFIX=same-as-llvm-install
126
-
127
- cd $SOURCE_DIR
128
- git clone [email protected] :RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
129
- --single-branch
130
- git clone [email protected] :RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
131
- --single-branch
132
-
133
- cd $BUILD_DIR && mkdir roct && cd roct
134
- cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
135
- -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
136
- make && make install
137
-
138
- cd $BUILD_DIR && mkdir rocr && cd rocr
139
- cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
140
- -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
141
- -DBUILD_SHARED_LIBS=ON
142
- make && make install
143
-
144
- ``IMAGE_SUPPORT `` requires building rocr with clang and is not used by openmp.
145
-
146
- Provided cmake's find_package can find the ROCR-Runtime package, LLVM will
147
- build a tool ``bin/amdgpu-arch `` which will print a string like ``gfx906 `` when
148
- run if it recognises a GPU on the local system. LLVM will also build a shared
149
- library, libomptarget.rtl.amdgpu.so, which is linked against rocr.
150
-
151
- With those libraries installed, then LLVM build and installed, try:
152
-
153
- .. code-block :: shell
154
-
155
- clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
156
106
157
- If your build machine is not the target machine or automatic detection of the
158
- available GPUs failed, you should also set:
159
-
160
- - ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx<xyz>;...' `` where ``<xyz> `` is the
161
- shader core instruction set architecture. For instance, set
162
- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx906;gfx90a' `` to target AMD GCN5
163
- and CDNA2 devices.
107
+ The OpenMP AMDGPU offloading support depends on the ROCm math libraries and the
108
+ HSA ROCr / ROCt runtimes. These are normally provided by a standard ROCm
109
+ installation, but can be built and used independently if desired. Building the
110
+ libraries does not require them by default, because the HSA runtime is loaded
111
+ dynamically at program execution. As in the CUDA case, this can be changed by
112
+ omitting ``amdgpu`` from the ``LIBOMPTARGET_DLOPEN_PLUGINS`` list.
164
113
165
114
Q: What are the known limitations of OpenMP AMDGPU offload?
166
115
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
167
- LD_LIBRARY_PATH or rpath/runpath are required to find libomp.so and libomptarget.so
168
116
169
- There is no libc. That is, malloc and printf do not exist. Libm is implemented in terms
170
- of the rocm device library, which will be searched for if linking with '-lm'.
117
+ LD_LIBRARY_PATH or rpath/runpath are required to find libomp.so and
118
+ libomptarget.so correctly. The recommended way to configure this is with the
119
+ ``-frtlib-add-rpath`` option. Alternatively, set the ``LD_LIBRARY_PATH``
120
+ environment variable to point to the installation. Normally, these libraries are
121
+ installed in the target specific runtime directory. For example, a typical
122
+ installation will have
123
+ ``<install>/lib/x86_64-unknown-linux-gnu/libomptarget.so``.
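
Reviewer note: the ``LD_LIBRARY_PATH`` alternative described in this paragraph could be sketched as follows. This is a minimal illustration, not part of the commit; the install prefix ``/opt/llvm`` and the variable names are hypothetical, and the target triple is the one used in the example path above.

```shell
# Hypothetical install prefix; replace with your actual LLVM installation.
LLVM_INSTALL=/opt/llvm
# Target triple taken from the example path in the text; yours may differ.
TARGET_TRIPLE=x86_64-unknown-linux-gnu

# Target-specific runtime directory that holds libomp.so and libomptarget.so.
OMP_LIBDIR="$LLVM_INSTALL/lib/$TARGET_TRIPLE"

# Prepend the directory, keeping any existing entries after it.
export LD_LIBRARY_PATH="$OMP_LIBDIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"

# Recommended alternative from the text (not executed here): embed the path
# at link time instead of relying on the environment, e.g.
#   clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -frtlib-add-rpath
```

The ``${LD_LIBRARY_PATH:+:...}`` expansion avoids leaving a trailing colon when the variable was previously unset, which would otherwise add the current directory to the search path.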
171
124
172
125
Some versions of the driver for the radeon vii (gfx906) will error unless the
173
126
environment variable 'export HSA_IGNORE_SRAMECC_MISREPORT=1' is set.
174
127
175
- It is a recent addition to LLVM and the implementation differs from that which
176
- has been shipping in ROCm and AOMP for some time. Early adopters will encounter
177
- bugs.
178
-
179
128
Q: What are the LLVM components used in offloading and how are they found?
180
129
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
181
130
The libraries used by an executable compiled for target offloading are:
182
131
183
132
- ``libomp.so`` (or similar), the host openmp runtime
184
133
- ``libomptarget.so``, the target-agnostic target offloading openmp runtime
185
- - plugins loaded by libomptarget.so:
186
-
187
- - ``libomptarget.rtl.amdgpu.so ``
188
- - ``libomptarget.rtl.cuda.so ``
189
- - ``libomptarget.rtl.x86_64.so ``
190
- - ``libomptarget.rtl.ve.so ``
191
- - and others
192
-
134
+ - ``libompdevice.a``, the device-side OpenMP runtime.
193
135
- dependencies of those plugins, e.g. cuda/rocr for nvptx/amdgpu
194
136
195
137
The compiled executable is dynamically linked against a host runtime, e.g.
@@ -245,7 +187,6 @@ Q: Does OpenMP offloading support work in packages distributed as part of my OS?
245
187
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
246
188
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.
247
189
248
-
249
190
.. _math_and_complex_in_target_regions:
250
191
251
192
Q: Does Clang support `<math.h>` and `<complex.h>` operations in OpenMP target on GPUs?
@@ -274,21 +215,13 @@ through a similar mechanism. It is worth noting that this support requires
274
215
<https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
275
216
that are exposed through LLVM/Clang to the user as well.
276
217
277
- Q: What is a way to debug errors from mapping memory to a target device?
278
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
279
-
280
- An experimental way to debug these errors is to use :ref: `remote process
281
- offloading <remote_offloading_plugin>`.
282
- By using ``libomptarget.rtl.rpc.so `` and ``openmp-offloading-server ``, it is
283
- possible to explicitly perform memory transfers between processes on the host
284
- CPU and run sanitizers while doing so in order to catch these errors.
285
-
286
218
Q: Can I use dynamically linked libraries with OpenMP offloading?
287
219
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
288
220
289
- Dynamically linked libraries can be only used if there is no device code split
221
+ Dynamically linked libraries can be used if there is no device code shared
290
222
between the library and application. Anything declared on the device inside the
291
- shared library will not be visible to the application when it's linked.
223
+ shared library will not be visible to the application when it's linked. This is
224
+ because device code only supports static linking.
292
225
293
226
Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
294
227
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -303,38 +236,6 @@ For example, if your system-wide GCC installation is too old to build LLVM and
303
236
you would like to use a newer GCC, set ``--gcc-install-dir=``
304
237
to inform clang of the GCC installation you would like to use in the second stage.
305
238
306
- Q: How can I include OpenMP offloading support in my CMake project?
307
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
308
-
309
- Currently, there is an experimental CMake find module for OpenMP target
310
- offloading provided by LLVM. It will attempt to find OpenMP target offloading
311
- support for your compiler. The flags necessary for OpenMP target offloading will
312
- be loaded into the ``OpenMPTarget::OpenMPTarget_<device> `` target or the
313
- ``OpenMPTarget_<device>_FLAGS `` variable if successful. Currently supported
314
- devices are ``AMDGPU `` and ``NVPTX ``.
315
-
316
- To use this module, simply add the path to CMake's current module path and call
317
- ``find_package ``. The module will be installed with your OpenMP installation by
318
- default. Including OpenMP offloading support in an application should now only
319
- require a few additions.
320
-
321
- .. code-block :: cmake
322
-
323
- cmake_minimum_required(VERSION 3.20.0)
324
- project(offloadTest VERSION 1.0 LANGUAGES CXX)
325
-
326
- list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")
327
-
328
- find_package(OpenMPTarget REQUIRED NVPTX)
329
-
330
- add_executable(offload)
331
- target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
332
- target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)
333
-
334
- Using this module requires at least CMake version 3.20.0. Supported languages
335
- are C and C++ with Fortran support planned in the future. Compiler support is
336
- best for Clang but this module should work for other compiler vendors such as
337
- IBM, GNU.
338
239
339
240
Q: What does 'Stack size for entry function cannot be statically determined' mean?
340
241
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -360,11 +261,11 @@ will only extract archive members if an architecture is used, allowing users to
360
261
create generic libraries.
361
262
362
263
The architecture can either be specified manually using ``--offload-arch=``. If
363
- ``--offload-arch= `` is present no ``-fopenmp-targets= `` flag is present then the
364
- targets will be inferred from the architectures. Conversely, if
264
+ ``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is present then
265
+ the targets will be inferred from the architectures. Conversely, if
365
266
``-fopenmp-targets=`` is present with no ``--offload-arch`` then the target
366
267
architecture will be set to a default value, usually the architecture supported
367
- by the system LLVM was built on.
268
+ by the system LLVM was built on, as determined by executing the ``offload-arch`` utility.
368
269
369
270
For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
370
271
given that the necessary build tools are installed for both.
@@ -434,7 +335,7 @@ linkable device image.
434
335
clang++ openmp.o cuda.o --offload-link -o app
435
336
436
337
Q: Are libomptarget and plugins backward compatible?
437
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
338
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
438
339
439
340
No. libomptarget and plugins are now built as LLVM libraries starting from LLVM
440
341
15. Because LLVM libraries are not backward compatible, libomptarget and plugins
@@ -460,7 +361,7 @@ with OpenMP.
460
361
461
362
.. code-block:: shell
462
363
463
- clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -lcgpu
364
+ clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -Xoffload-linker -lc
464
365
465
366
For more information on how this is implemented in LLVM/OpenMP's offloading
466
367
runtime, refer to the `runtime documentation <libomptarget_libc>`_.