Commit 0cbbcf1

[libc] Update GPU documentation pages (#84076)
Summary: After the overhaul of the GPU build, the documentation pages were a little stale. This updates them with more in-depth information on building the GPU runtimes and using them. Specifically, the usage documentation now goes through the differences between the offloading and direct compilation modes.
1 parent c161720 commit 0cbbcf1

File tree

5 files changed, +473 -50 lines changed

libc/docs/full_cross_build.rst

Lines changed: 11 additions & 0 deletions
@@ -94,6 +94,8 @@ The above ``ninja`` command will build the libc static archives ``libc.a`` and
 ``libm.a`` for the target specified with ``-DLIBC_TARGET_TRIPLE`` in the CMake
 configure step.
 
+.. _runtimes_cross_build:
+
 Runtimes cross build
 ====================
 
@@ -230,3 +232,12 @@ component of the target triple as ``none``. For example, to build for a
 32-bit arm target on bare metal, one can use a target triple like
 ``arm-none-eabi``. Other than that, the libc for a bare metal target can be
 built using any of the three recipes described above.
+
+Building for the GPU
+====================
+
+To build for a GPU architecture, it should only be necessary to specify the
+target triple as one of the supported GPU targets. Currently, this is either
+``nvptx64-nvidia-cuda`` for NVIDIA GPUs or ``amdgcn-amd-amdhsa`` for AMD GPUs.
+More detailed information is provided in the :ref:`GPU
+documentation<libc_gpu_building>`.
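
The following is a minimal configure sketch for the GPU case, mirroring the
standalone recipe added later in this commit; it assumes the ``clang`` and
``clang++`` found on the path are recent trunk builds and that the other
recipe steps are unchanged.

.. code-block:: sh

  $> cmake ../llvm -G Ninja \
     -DLLVM_ENABLE_PROJECTS=libc \
     -DCMAKE_C_COMPILER=clang \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DLLVM_LIBC_FULL_BUILD=ON \
     -DLIBC_TARGET_TRIPLE=amdgcn-amd-amdhsa \
     -DCMAKE_BUILD_TYPE=Release
  $> ninja install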

libc/docs/gpu/building.rst

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
.. _libc_gpu_building:

======================
Building libs for GPUs
======================

.. contents:: Table of Contents
  :depth: 4
  :local:

Building the GPU C library
==========================

This document will present recipes to build the LLVM C library targeting a GPU
architecture. The GPU build uses the same :ref:`cross build<full_cross_build>`
support as the other targets. However, the GPU target has the restriction that
it *must* be built with an up-to-date ``clang`` compiler. This is because the
GPU target uses several compiler extensions to target GPU architectures.

The LLVM C library currently supports two GPU targets: ``nvptx64-nvidia-cuda``
for NVIDIA GPUs and ``amdgcn-amd-amdhsa`` for AMD GPUs. Targeting these
architectures is done through ``clang``'s cross-compiling support using the
``--target=<triple>`` flag. The following sections will describe how to build
the GPU support specifically.

Once you have finished building, refer to :ref:`libc_gpu_usage` to get started
with the newly built C library.

Standard runtimes build
-----------------------

The simplest way to build the GPU libc is to use the existing LLVM runtimes
support. This will automatically handle bootstrapping an up-to-date ``clang``
compiler and using it to build the C library. The following CMake invocation
will instruct it to build the ``libc`` runtime targeting both AMD and NVIDIA
GPUs.

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build
  $> cd build
  $> cmake ../llvm -G Ninja \
     -DLLVM_ENABLE_PROJECTS="clang;lld" \
     -DLLVM_ENABLE_RUNTIMES="openmp" \
     -DCMAKE_BUILD_TYPE=<Debug|Release> \  # Select build type
     -DCMAKE_INSTALL_PREFIX=<PATH> \       # Where the libraries will live
     -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libc \
     -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=libc \
     -DLLVM_RUNTIME_TARGETS="default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda"
  $> ninja install

We need ``clang`` to build the GPU C library and ``lld`` to link AMDGPU
executables, so we enable them in ``LLVM_ENABLE_PROJECTS``. We add ``openmp`` to
``LLVM_ENABLE_RUNTIMES`` so it is built for the default target and provides
OpenMP support. We then set ``RUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES`` to enable
``libc`` for the GPU targets. ``LLVM_RUNTIME_TARGETS`` sets the targets to
build; in this case we want the default host target and the two GPU targets.
Note that if ``libc`` were included in ``LLVM_ENABLE_RUNTIMES`` it would also be
built for the default host environment.

Runtimes cross build
--------------------

For users wanting more direct control over the build process, the build steps
can be done manually instead. This build closely follows the instructions in the
:ref:`main documentation<runtimes_cross_build>` but is specialized for the GPU
build. We follow the same steps to first build the libc tools and a suitable
compiler. These tools must all be up-to-date with the libc source.

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build-libc-tools  # A different build directory for the build tools
  $> cd build-libc-tools
  $> HOST_C_COMPILER=<C compiler for the host>      # For example "clang"
  $> HOST_CXX_COMPILER=<C++ compiler for the host>  # For example "clang++"
  $> cmake ../llvm \
     -G Ninja \
     -DLLVM_ENABLE_PROJECTS="clang;libc" \
     -DCMAKE_C_COMPILER=$HOST_C_COMPILER \
     -DCMAKE_CXX_COMPILER=$HOST_CXX_COMPILER \
     -DLLVM_LIBC_FULL_BUILD=ON \
     -DLIBC_HDRGEN_ONLY=ON \        # Only build the 'libc-hdrgen' tool
     -DCMAKE_BUILD_TYPE=Release     # Release suggested to make "clang" fast
  $> ninja              # Build the 'clang' compiler
  $> ninja libc-hdrgen  # Build the 'libc-hdrgen' tool

Once this has finished, the build directory should contain the ``clang``
compiler and the ``libc-hdrgen`` executable. We will use the ``clang`` compiler
to build the GPU code and the ``libc-hdrgen`` tool to create the necessary
headers. We then use these tools to bootstrap the build out of the runtimes
directory, targeting a GPU architecture.

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build      # A separate build directory for the GPU runtimes
  $> cd build
  $> TARGET_TRIPLE=<amdgcn-amd-amdhsa or nvptx64-nvidia-cuda>
  $> TARGET_C_COMPILER=</path/to/clang>
  $> TARGET_CXX_COMPILER=</path/to/clang++>
  $> HDRGEN=</path/to/libc-hdrgen>
  $> cmake ../runtimes \  # Point to the runtimes build
     -G Ninja \
     -DLLVM_ENABLE_RUNTIMES=libc \
     -DCMAKE_C_COMPILER=$TARGET_C_COMPILER \
     -DCMAKE_CXX_COMPILER=$TARGET_CXX_COMPILER \
     -DLLVM_LIBC_FULL_BUILD=ON \
     -DLLVM_RUNTIMES_TARGET=$TARGET_TRIPLE \
     -DLIBC_HDRGEN_EXE=$HDRGEN \
     -DCMAKE_BUILD_TYPE=Release
  $> ninja install

The above steps will result in a build targeting one of the supported GPU
architectures. Building for multiple targets requires separate CMake
invocations, as sketched below.
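
Since each invocation targets a single triple, building both GPU targets this
way means configuring and building twice. A small shell sketch, assuming the
``TARGET_C_COMPILER``, ``TARGET_CXX_COMPILER``, and ``HDRGEN`` variables from
the previous step are still set:

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> for TARGET_TRIPLE in amdgcn-amd-amdhsa nvptx64-nvidia-cuda; do
       cmake -S runtimes -B build-$TARGET_TRIPLE -G Ninja \
         -DLLVM_ENABLE_RUNTIMES=libc \
         -DCMAKE_C_COMPILER=$TARGET_C_COMPILER \
         -DCMAKE_CXX_COMPILER=$TARGET_CXX_COMPILER \
         -DLLVM_LIBC_FULL_BUILD=ON \
         -DLLVM_RUNTIMES_TARGET=$TARGET_TRIPLE \
         -DLIBC_HDRGEN_EXE=$HDRGEN \
         -DCMAKE_BUILD_TYPE=Release
       ninja -C build-$TARGET_TRIPLE install
     done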

Standalone cross build
----------------------

The GPU build can also be targeted directly, as long as the compiler used is a
supported ``clang`` compiler. This method is generally not recommended as it can
only target a single GPU architecture.

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build      # A separate build directory for the GPU libc
  $> cd build
  $> CLANG_C_COMPILER=</path/to/clang>      # Must be a trunk build
  $> CLANG_CXX_COMPILER=</path/to/clang++>  # Must be a trunk build
  $> TARGET_TRIPLE=<amdgcn-amd-amdhsa or nvptx64-nvidia-cuda>
  $> cmake ../llvm \  # Point to the llvm directory
     -G Ninja \
     -DLLVM_ENABLE_PROJECTS=libc \
     -DCMAKE_C_COMPILER=$CLANG_C_COMPILER \
     -DCMAKE_CXX_COMPILER=$CLANG_CXX_COMPILER \
     -DLLVM_LIBC_FULL_BUILD=ON \
     -DLIBC_TARGET_TRIPLE=$TARGET_TRIPLE \
     -DCMAKE_BUILD_TYPE=Release
  $> ninja install

This will build and install the GPU C library along with all the other LLVM
libraries.

Build overview
==============

Once installed, the GPU build will create several files used for different
targets. This section will briefly describe their purpose. A short linking
example using the offloading libraries follows this list.

**lib/<host-triple>/libcgpu-amdgpu.a or lib/libcgpu-amdgpu.a**
  A static library containing fat binaries supporting AMD GPUs. These are built
  using the support described in the `clang documentation
  <https://clang.llvm.org/docs/OffloadingDesign.html>`_. These are intended to
  be static libraries included natively for offloading languages like CUDA, HIP,
  or OpenMP. This implements the standard C library.

**lib/<host-triple>/libmgpu-amdgpu.a or lib/libmgpu-amdgpu.a**
  A static library containing fat binaries that implement the standard math
  library for AMD GPUs.

**lib/<host-triple>/libcgpu-nvptx.a or lib/libcgpu-nvptx.a**
  A static library containing fat binaries that implement the standard C
  library for NVIDIA GPUs.

**lib/<host-triple>/libmgpu-nvptx.a or lib/libmgpu-nvptx.a**
  A static library containing fat binaries that implement the standard math
  library for NVIDIA GPUs.

**include/<target-triple>**
  The include directory where all of the generated headers for the target will
  go. These definitions are strictly for the GPU when being targeted directly.

**lib/clang/<llvm-major-version>/include/llvm-libc-wrappers/llvm-libc-decls**
  These are wrapper headers created for offloading languages like CUDA, HIP, or
  OpenMP. They contain functions supported in the GPU libc along with attributes
  and metadata that declare them on the target device and make them compatible
  with the host headers.

**lib/<target-triple>/libc.a**
  The main C library static archive containing LLVM-IR targeting the given GPU.
  It can be linked directly or inspected depending on the target support.

**lib/<target-triple>/libm.a**
  The C library static archive providing implementations of the standard math
  functions.

**lib/<target-triple>/libc.bc**
  An alternate form of the library provided as a single LLVM-IR bitcode blob.
  This can be used similarly to NVIDIA's or AMD's device libraries.

**lib/<target-triple>/libm.bc**
  An alternate form of the library provided as a single LLVM-IR bitcode blob
  containing the standard math functions.

**lib/<target-triple>/crt1.o**
  An LLVM-IR file containing startup code to call the ``main`` function on the
  GPU. This is used similarly to the standard C library startup object.

**bin/amdhsa-loader**
  A binary utility used to launch executables compiled targeting the AMD GPU.
  This will be included if the build system found the ``hsa-runtime64`` library
  either in ``/opt/rocm`` or the current CMake installation directory. This is
  required to build the GPU tests. See the :ref:`libc GPU usage<libc_gpu_usage>`
  documentation for more information.

**bin/nvptx-loader**
  A binary utility used to launch executables compiled targeting the NVIDIA GPU.
  This will be included if the build system found the CUDA driver API. This is
  required for building tests.

**include/llvm-libc-rpc-server.h**
  A header file containing definitions that can be used to interface with the
  :ref:`RPC server<libc_gpu_rpc>`.

**lib/libllvmlibc_rpc_server.a**
  The static library containing the implementation of the RPC server. This can
  be used to enable host services for anyone looking to interface with the
  :ref:`RPC client<libc_gpu_rpc>`.
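
To illustrate how the fat binary libraries above are consumed, the following is
a hypothetical link step for an OpenMP offloading program built for an AMD GPU.
It assumes the install prefix's ``lib`` directory is on the library search path
and that the application offloads to ``gfx90a``; the exact flags for each
offloading language are covered in the usage documentation.

.. code-block:: sh

  $> clang app.c -fopenmp --offload-arch=gfx90a \
       -L<PATH>/lib -lcgpu-amdgpu -lmgpu-amdgpu -o app
  $> ./app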

CMake options
=============

This section briefly lists a few of the CMake variables that specifically
control the GPU build of the C library. A configure sketch using them follows
the list.

**LLVM_LIBC_FULL_BUILD**:BOOL
  This flag controls whether or not the libc build will generate its own
  headers. This must always be on when targeting the GPU.

**LIBC_GPU_TEST_ARCHITECTURE**:STRING
  Sets the architecture to build the GPU tests for, such as ``gfx90a`` or
  ``sm_80`` for AMD and NVIDIA GPUs respectively. The default behavior is to
  detect the system's GPU architecture using the ``native`` option. If this
  option is not set and a GPU was not detected, the tests will not be built.

**LIBC_GPU_TEST_JOBS**:STRING
  Sets the number of threads used to run GPU tests. The GPU test suite will
  commonly run out of resources if this is not constrained, so it is
  recommended to keep it low. The default value is a single thread.

**LIBC_GPU_LOADER_EXECUTABLE**:STRING
  Overrides the default loader used for running GPU tests. If this is not
  provided, the standard one will be built.
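
As an illustration, these options can be appended to any of the configure
commands above. The values here are only an example, assuming a machine with a
``gfx90a`` GPU and the AMD loader built earlier.

.. code-block:: sh

  $> cmake ../runtimes -G Ninja \
     <other options from the recipes above> \
     -DLIBC_GPU_TEST_ARCHITECTURE=gfx90a \
     -DLIBC_GPU_TEST_JOBS=1 \
     -DLIBC_GPU_LOADER_EXECUTABLE=</path/to/amdhsa-loader>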

libc/docs/gpu/index.rst

Lines changed: 2 additions & 1 deletion
@@ -12,8 +12,9 @@ learn more about this project.
 
 .. toctree::
 
+   building
    using
    support
-   testing
    rpc
+   testing
    motivation

libc/docs/gpu/rpc.rst

Lines changed: 2 additions & 0 deletions
@@ -188,6 +188,8 @@ in the GPU executable as an indicator for whether or not the server can be
 checked. These details should ideally be handled by the GPU language runtime,
 but the following example shows how it can be used by a standard user.
 
+.. _libc_gpu_cuda_server:
+
 .. code-block:: cuda
 
    #include <cstdio>
