Commit 15e6206

[Clang][Docs] Update information on the new driver now that it's default
Summary: This patch updates some of the documentation on the new driver now that it's the default. Also the ABI for embedding these images changed.
1 parent ae23be8 commit 15e6206

3 files changed: 21 additions, 25 deletions

clang/docs/ClangCommandLineReference.rst

Lines changed: 1 addition & 1 deletion
@@ -801,7 +801,7 @@ Generate Interface Stub Files, emit merged text not binary.
 
 Extract API information
 
-.. option:: -fopenmp-new-driver
+.. option:: -fopenmp-new-driver, -fno-openmp-new-driver
 
 Use the new driver for OpenMP offloading.
 
clang/docs/OffloadingDesign.rst

Lines changed: 20 additions & 21 deletions
@@ -17,11 +17,6 @@ application using Clang.
 OpenMP Offloading
 =================
 
-.. note::
-This documentation describes Clang's behavior using the new offloading
-driver. This currently must be enabled manually using
-``-fopenmp-new-driver``.
-
 Clang supports OpenMP target offloading to several different architectures such
 as NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by
 Clang and then executed using the ``libomptarget`` runtime and the associated
@@ -226,15 +221,15 @@ A fat binary is a binary file that contains information intended for another
 device. We create a fat object by embedding the output of the device compilation
 stage into the host as a named section. The output from the device compilation
 is passed to the host backend using the ``-fembed-offload-object`` flag. This
-inserts the object as a global in the host's IR. The section name contains the
-target triple and architecture that the data corresponds to for later use.
-Typically we will also add an extra string to the section name to prevent it
-from being merged with other sections if the user performs relocatable linking
-on the object.
+embeds the device image into the ``.llvm.offloading`` section using a special
+binary format that behaves like a string map. This binary format is used to
+bundle metadata about the image so the linker can associate the proper device
+linking action with the image. Each device image will start with the magic bytes
+``0x10FF10AD``.
 
 .. code-block:: llvm
 
-@llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading.nvptx64.sm_70."
+@llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading"
 
 The device code will then be placed in the corresponding section once the backend
 is run on the host, creating a fat object. Using fat objects allows us to treat
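
Note on the hunk above: the new text says each embedded device image starts with the magic bytes ``0x10FF10AD``. The sketch below is an illustration only, not the linker wrapper's actual logic. It assumes the four bytes appear in the section in the order written (10 FF 10 AD), and ``find_offload_images`` and ``fake_section`` are hypothetical names made up for this example. A naive scan like this can also match payload bytes that happen to equal the magic, so a real reader would parse the image header rather than search for it.

```python
# Magic bytes quoted in the documentation; the byte order is an assumption.
OFFLOAD_MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])


def find_offload_images(section):
    """Return every offset in `section` (bytes) where the magic bytes occur."""
    offsets = []
    pos = section.find(OFFLOAD_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = section.find(OFFLOAD_MAGIC, pos + 1)
    return offsets


# Hypothetical section contents: two fake images with 4-byte payloads.
fake_section = OFFLOAD_MAGIC + b"AAAA" + OFFLOAD_MAGIC + b"BBBB"
print(find_offload_images(fake_section))  # [0, 8]
```
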
@@ -250,7 +245,7 @@ will use this information when :ref:`Device Linking`.
 +==================================+====================================================================+
 | omp_offloading_entries           | Offloading entry information (see :ref:`table-tgt_offload_entry`)  |
 +----------------------------------+--------------------------------------------------------------------+
-| .llvm.offloading.<triple>.<arch> | Embedded device object file for the target device and architecture |
+| .llvm.offloading                 | Embedded device object file for the target device and architecture |
 +----------------------------------+--------------------------------------------------------------------+
 
 .. _Device Linking:
@@ -262,9 +257,10 @@ Objects containing :ref:`table-offloading_sections` require special handling to
 create an executable device image. This is done using a Clang tool, see
 :doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper
 over the host linking job. It scans the input object files for the offloading
-sections and runs the appropriate device linking action. The linked device image
-is then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load the
-device image and link it with the host.
+section ``.llvm.offloading``. The device files stored in this section are then
+extracted and passed to the appropriate linking job. The linked device image is
+then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load
+the device image and link it with the host.
 
 The linker wrapper tool supports linking bitcode files through link time
 optimization (LTO). This is used whenever the object files embedded in the host
@@ -438,19 +434,22 @@ This code is compiled using the following Clang flags.
 
 $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c
 
-The output section in the object file can be seen using the ``readelf`` utility
+The output section in the object file can be seen using the ``readelf`` utility.
+The ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be
+removed from the final executable or shared library by the linker.
 
 .. code-block:: text
 
 $ llvm-readelf -WS zaxpy.o
-[Nr] Name Type
-...
-[34] omp_offloading_entries PROGBITS
-[35] .llvm.offloading.nvptx64-nvidia-cuda.sm_70 PROGBITS
+Section Headers:
+[Nr] Name                   Type     Address          Off    Size   ES Flg Lk Inf Al
+[11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00   A  0   0  1
+[12] .llvm.offloading       PROGBITS 0000000000000000 000260 030950 00   E  0   0  8
+
 
 Compiling this file again will invoke the ``clang-linker-wrapper`` utility to
 extract and link the device code stored at the section named
-``.llvm.offloading.nvptx64-nvidia-cuda.sm_70`` and then use entries stored in
+``.llvm.offloading`` and then use entries stored in
 the section named ``omp_offloading_entries`` to create the symbols necessary for
 ``libomptarget`` to register the device image and call the entry function.
 
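
Note on the hunk above: the ``Flg`` column in the new ``llvm-readelf`` output shows ``A`` for ``omp_offloading_entries`` and ``E`` for ``.llvm.offloading``. As a rough sketch of how those letters relate to the ``sh_flags`` field of an ELF section header, the snippet below decodes a small subset of flags. The numeric values are the standard ELF constants; ``flag_letters`` and ``is_excluded`` are hypothetical helpers written for this illustration and do not reproduce readelf's full flag key.

```python
# Standard ELF section flag bits (a subset only).
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_EXCLUDE = 0x80000000

# Letters readelf prints for the flags above.
FLAG_LETTERS = [
    (SHF_WRITE, "W"),
    (SHF_ALLOC, "A"),
    (SHF_EXECINSTR, "X"),
    (SHF_EXCLUDE, "E"),
]


def flag_letters(sh_flags):
    """Render the known bits of `sh_flags` as readelf-style letters."""
    return "".join(letter for bit, letter in FLAG_LETTERS if sh_flags & bit)


def is_excluded(sh_flags):
    """True if the linker is asked to drop the section from its output."""
    return bool(sh_flags & SHF_EXCLUDE)


print(flag_letters(SHF_ALLOC))    # A
print(flag_letters(SHF_EXCLUDE))  # E
print(is_excluded(SHF_EXCLUDE))   # True
```
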
clang/docs/OpenMPSupport.rst

Lines changed: 0 additions & 3 deletions
@@ -95,9 +95,6 @@ Features not supported or with limited support for Cuda devices
 
 - Nested parallelism: inner parallel regions are executed sequentially.
 
-- Static linking of libraries containing device code is not supported without
-explicitly using ``-fopenmp-new-driver``.
-
 - Automatic translation of math functions in target regions to device-specific
 math functions is not implemented yet.
 