
Commit 4166ff6

[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ
I have added a few things to the OpenMP FAQ which I think were missing. Feel free to suggest some changes. Are there missing options in the offloading command line reference? And what do you think about the section "Q: Why is my build taking a long time"?

Differential Revision: https://reviews.llvm.org/D156387
1 parent c956f91 commit 4166ff6


1 file changed

+123
-9
lines changed


openmp/docs/SupportAndFAQ.rst

Lines changed: 123 additions & 9 deletions
@@ -52,13 +52,15 @@ All patches go through the regular `LLVM review process
5252
Q: How to build an OpenMP GPU offload capable compiler?
5353
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5454
To build an *effective* OpenMP offload capable compiler, only one extra CMake
55-
option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (Generic
55+
option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (Generic
5656
information about building LLVM is available `here
57-
<https://llvm.org/docs/GettingStarted.html>`__.). Make sure all backends that
58-
are targeted by OpenMP to be enabled. By default, Clang will be built with all
59-
backends enabled. When building with `LLVM_ENABLE_RUNTIMES="openmp"` OpenMP
60-
should not be enabled in `LLVM_ENABLE_PROJECTS` because it is enabled by
61-
default.
57+
<https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
58+
are targeted by OpenMP are enabled. That can be done by adjusting the CMake
59+
option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
60+
and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
61+
Clang will be built with all backends enabled. When building with
62+
``LLVM_ENABLE_RUNTIMES="openmp"`` OpenMP should not be enabled in
63+
``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
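The configuration described above can be sketched as a single CMake invocation. This is a minimal sketch, not the project's official recipe: the source and build paths are hypothetical, ``X86`` is assumed as the host backend, and the ``DRY_RUN`` helper is only an illustration device so that the command is printed instead of executed by default.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Export DRY_RUN= (empty) to actually invoke cmake.
DRY_RUN=${DRY_RUN-echo}

# Hypothetical paths; point them at your llvm-project checkout and build directory.
LLVM_SRC=${LLVM_SRC:-llvm-project/llvm}
BUILD_DIR=${BUILD_DIR:-build}

configure_llvm() {
  # X86 is an assumed host backend; AMDGPU and NVPTX are the offload backends.
  # OpenMP is listed in LLVM_ENABLE_RUNTIMES only, not in LLVM_ENABLE_PROJECTS.
  $DRY_RUN cmake -S "$LLVM_SRC" -B "$BUILD_DIR" \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="openmp" \
    -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU;NVPTX"
}

configure_llvm
```

Dropping ``LLVM_TARGETS_TO_BUILD`` altogether should also work, since all backends are enabled by default; listing only the needed targets mainly shortens the build.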
6264

6365
For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
6466
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
@@ -72,14 +74,14 @@ For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
7274

7375
.. _build_nvidia_offload_capable_compiler:
7476

75-
Q: How to build an OpenMP NVidia offload capable compiler?
77+
Q: How to build an OpenMP Nvidia offload capable compiler?
7678
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7779
The CUDA SDK is required on the machine that will execute the OpenMP application.
7880

7981
If your build machine is not the target machine or automatic detection of the
8082
available GPUs failed, you should also set:
8183

82-
- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
84+
- ``LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY`` where ``YY`` is the numeric compute capability of your GPU, e.g., 75.
8385

8486

8587
.. _build_amdgpu_offload_capable_compiler:
@@ -349,7 +351,7 @@ create generic libraries.
349351
The architecture can be specified manually using ``--offload-arch=``. If
350352
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is present then the
351353
targets will be inferred from the architectures. Conversely, if
352-
``--fopenmp-targets=`` is present with no ``--offload-arch`` then the target
354+
``-fopenmp-targets=`` is present with no ``--offload-arch`` then the target
353355
architecture will be set to a default value, usually the architecture supported
354356
by the system LLVM was built on.
355357

@@ -451,3 +453,115 @@ with OpenMP.
451453
452454
For more information on how this is implemented in LLVM/OpenMP's offloading
453455
runtime, refer to the `runtime documentation <libomptarget_libc>`_.
456+
457+
Q: What command line options can I use for OpenMP offloading?
458+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
459+
``-fopenmp-targets``
460+
""""""""""""""""""""
461+
Specify which OpenMP offloading targets should be supported. For example, you
462+
may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx-none``.
463+
464+
``--offload-arch``
465+
""""""""""""""""""
466+
Specify the device architecture for OpenMP offloading. For instance
467+
``--offload-arch=sm_80`` to target an Nvidia Tesla A100 or
468+
``--offload-arch=gfx90a`` to target an AMD Instinct MI250X.
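As a rough illustration of ``--offload-arch``, the sketch below prints one compile command per device architecture. The source file ``example.c`` and the output names are hypothetical, and the ``DRY_RUN`` helper is only there so that nothing is compiled unless you ask for it.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export DRY_RUN= (empty) to actually invoke clang.
DRY_RUN=${DRY_RUN-echo}

# Hypothetical source file containing OpenMP target regions.
SRC=${SRC:-example.c}

compile_offload() {
  # $1: device architecture, e.g. sm_80 (Nvidia) or gfx90a (AMD).
  # No -fopenmp-targets is given, so the target is inferred from the architecture.
  $DRY_RUN clang -fopenmp --offload-arch="$1" "$SRC" -o "example_$1"
}

compile_offload sm_80
compile_offload gfx90a
```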
469+
470+
``--offload-device-only``
471+
"""""""""""""""""""""""""
472+
Compile the target regions for the device only. All target regions will be
473+
compiled for both host and device if not specified.
474+
475+
``--offload-host-device`` or ``--offload-host-only``
476+
""""""""""""""""""""""""""""""""""""""""""""""""""""
477+
``--offload-host-only`` compiles the target regions for the host only.
478+
``--offload-host-device`` compiles them for both host and device, which is the default.
479+
480+
``-Xopenmp-target <arg>``
481+
"""""""""""""""""""""""""
482+
Pass an argument to the offloading toolchain, for instance
483+
``-Xopenmp-target -march=sm_80``.
484+
485+
``-Xopenmp-target=<triple> <arg>``
486+
""""""""""""""""""""""""""""""""""
487+
Pass an argument to the offloading toolchain for the triple. That is especially
488+
useful when an argument must differ for each triple. For instance
489+
``-Xopenmp-target=nvptx64 --offload-arch=sm_80
490+
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
491+
architecture.
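A per-triple invocation might look like the sketch below. It is an assumption-laden illustration: it spells out the full triples ``amdgcn-amd-amdhsa`` and ``nvptx64-nvidia-cuda`` rather than the abbreviated ones used above, passes ``-march`` as the per-triple argument (the form shown for ``-Xopenmp-target <arg>``), and uses a hypothetical source file ``example.c``.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Export DRY_RUN= (empty) to actually invoke clang.
DRY_RUN=${DRY_RUN-echo}

# Hypothetical source file containing OpenMP target regions.
SRC=${SRC:-example.c}

compile_multi_target() {
  # Each -Xopenmp-target=<triple> forwards the following argument only to
  # the toolchain of that triple, so the two devices get different -march values.
  $DRY_RUN clang -fopenmp \
    -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
    -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a \
    -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
    "$SRC" -o example
}

compile_multi_target
```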
492+
493+
``-Xoffload-linker<triple> <arg>``
494+
""""""""""""""""""""""""""""""""""
495+
Pass an argument ``<arg>`` to the offloading linker for the target specified in
496+
``<triple>``.
497+
498+
``-foffload-lto=<arg>``
499+
"""""""""""""""""""""""
500+
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
501+
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
502+
less time while still achieving some performance gains.
503+
504+
``-foffload-lto``
505+
"""""""""""""""""
506+
Enable ``full`` link time optimization on the device. This option is equivalent to
507+
``-foffload-lto=full``.
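To make the two LTO modes concrete, here is a small sketch that prints a compile command for each. The architecture ``sm_80`` and the file ``example.c`` are placeholders chosen for illustration, and the ``DRY_RUN`` helper keeps the script from compiling anything by default.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export DRY_RUN= (empty) to actually invoke clang.
DRY_RUN=${DRY_RUN-echo}

# Hypothetical source file containing OpenMP target regions.
SRC=${SRC:-example.c}

compile_with_lto() {
  # $1: LTO mode, either "thin" or "full".
  $DRY_RUN clang -fopenmp --offload-arch=sm_80 -foffload-lto="$1" "$SRC" -o example
}

compile_with_lto thin   # shorter link time, some of the performance gains
compile_with_lto full   # same effect as passing plain -foffload-lto
```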
508+
509+
``-fopenmp-offload-mandatory``
510+
""""""""""""""""""""""""""""""
511+
With this option enabled, a host fallback will not be created for a situation
512+
when offloading to the device fails. An example use case of this option is to
513+
verify that code is being offloaded to the device.
514+
515+
``-fopenmp-target-debug``
516+
"""""""""""""""""""""""""
517+
Enable debugging in the device runtime library (RTL).
518+
519+
``-fno-openmp-target-debug``
520+
""""""""""""""""""""""""""""
521+
Disable debugging in the device RTL.
522+
523+
``-fopenmp-target-jit``
524+
"""""""""""""""""""""""
525+
Emit code that can be Just-in-Time (JIT) compiled for OpenMP offloading.
526+
527+
``--offload-new-driver``
528+
""""""""""""""""""""""""
529+
Use the new driver for offloading compilation. OpenMP offloading can be
530+
experimentally linked with CUDA and HIP files. That requires using the new
531+
offloading driver.
532+
533+
``--no-offload-new-driver``
534+
"""""""""""""""""""""""""""
535+
Do not use the new driver for offloading compilation.
536+
537+
``--offload-link``
538+
""""""""""""""""""
539+
Use the new offloading linker to perform the link job. OpenMP offloading can be
540+
experimentally linked with CUDA and HIP files. The new offloading linker must be
541+
used when linking with CUDA or HIP files.
542+
543+
``-nogpulib``
544+
"""""""""""""
545+
Do not link the device library for CUDA or HIP device compilation.
546+
547+
``-nogpuinc``
548+
"""""""""""""
549+
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
550+
include paths.
551+
552+
Q: Why is my build taking a long time?
553+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
554+
When installing OpenMP and other LLVM components, the build time on multicore
555+
systems can be significantly reduced with parallel build jobs. As suggested in
556+
*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the
557+
generator. This can be done by configuring with ``cmake -G Ninja``. Afterward,
558+
use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build
559+
time can also be reduced by setting the build type to ``Release`` with the
560+
``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous
561+
compilations. Consider enabling ``ccache`` with
562+
``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.
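Put together, the suggestions above could look like the following sketch. The paths and the job count are hypothetical defaults, and ``CMAKE_C_COMPILER_LAUNCHER`` is an addition not mentioned in the text (it applies the same caching to C sources). Commands are only printed unless ``DRY_RUN`` is set to the empty string.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export DRY_RUN= (empty) to actually configure and build.
DRY_RUN=${DRY_RUN-echo}

# Hypothetical defaults; adjust to your checkout and machine.
LLVM_SRC=${LLVM_SRC:-llvm-project/llvm}
BUILD_DIR=${BUILD_DIR:-build}
JOBS=${JOBS:-8}

configure_fast() {
  # Ninja generator, Release build type, and ccache as the compiler launcher.
  $DRY_RUN cmake -G Ninja -S "$LLVM_SRC" -B "$BUILD_DIR" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache \
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="openmp"
}

build_fast() {
  # -j sets the number of parallel build jobs.
  $DRY_RUN ninja -C "$BUILD_DIR" -j "$JOBS" install
}

configure_fast
build_fast
```

Ninja already picks a parallel job count based on the number of cores, so ``-j`` is mostly useful for lowering it when memory is tight during linking.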
563+
564+
Q: Did this FAQ not answer your question?
565+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
566+
Feel free to post questions or browse old threads at
567+
`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.
