Commit 6818c7b

[libc] Update GPU testing documentation (#85459)
Summary: This documentation was lagging reality and didn't contain much. Update it with some more information now that it's more mature.
1 parent 01fa550 commit 6818c7b

3 files changed: 141 additions & 23 deletions

libc/docs/gpu/building.rst

Lines changed: 5 additions & 1 deletion
@@ -220,11 +220,15 @@ targets. This section will briefly describe their purpose.
   be used to enable host services for anyone looking to interface with the
   :ref:`RPC client<libc_gpu_rpc>`.

+.. _gpu_cmake_options:
+
 CMake options
 =============

 This section briefly lists a few of the CMake variables that specifically
-control the GPU build of the C library.
+control the GPU build of the C library. These options can be passed individually
+to each target using ``-DRUNTIMES_<target>_<variable>=<value>`` when using a
+standard runtime build.

 **LLVM_LIBC_FULL_BUILD**:BOOL
   This flag controls whether or not the libc build will generate its own
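
For illustration, a per-target option such as the ``LLVM_LIBC_FULL_BUILD`` flag listed
in this section could be passed to a standard runtimes configure roughly as follows.
This is only a sketch of the ``-DRUNTIMES_<target>_<variable>=<value>`` form described
above; the surrounding ``cmake`` invocation is assumed and unrelated options are elided.

.. code-block:: sh

  # Pass LLVM_LIBC_FULL_BUILD only to the AMDGPU and NVPTX runtimes targets.
  $> cmake ../llvm -G Ninja <other options...> \
       -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_LIBC_FULL_BUILD=ON \
       -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_LIBC_FULL_BUILD=ON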

libc/docs/gpu/testing.rst

Lines changed: 127 additions & 17 deletions
@@ -1,9 +1,9 @@
 .. _libc_gpu_testing:


-============================
-Testing the GPU libc library
-============================
+=========================
+Testing the GPU C library
+=========================

 .. note::
    Running GPU tests with high parallelism is likely to cause spurious failures,
@@ -14,24 +14,134 @@ Testing the GPU libc library
   :depth: 4
   :local:

-Testing Infrastructure
+Testing infrastructure
 ======================

-The testing support in LLVM's libc implementation for GPUs is designed to mimic
-the standard unit tests as much as possible. We use the :ref:`libc_gpu_rpc`
-support to provide the necessary utilities like printing from the GPU. Execution
-is performed by emitting a ``_start`` kernel from the GPU
-that is then called by an external loader utility. This is an example of how
-this can be done manually:
+The LLVM C library supports different kinds of :ref:`tests <build_and_test>`
+depending on the build configuration. The GPU target is considered a full build
+and therefore provides all of its own utilities to build and run the generated
+tests. Currently the GPU supports two kinds of tests.
+
+#. **Hermetic tests** - These are unit tests built with a test suite similar to
+   Google's ``gtest`` infrastructure. These use the same infrastructure as unit
+   tests except that the entire environment is self-hosted. This allows us to
+   run them on the GPU using our custom utilities. These are used to test the
+   majority of functional implementations.
+
+#. **Integration tests** - These are lightweight tests that simply call a
+   ``main`` function and check if it returns non-zero. These are primarily used
+   to test interfaces that are sensitive to threading.
+
+The GPU uses the same testing infrastructure as the other supported ``libc``
+targets. We do this by treating the GPU as a standard hosted environment capable
+of launching a ``main`` function. Effectively, this means building our own
+startup libraries and loader.
+
+Testing utilities
+=================
+
+We provide two utilities to execute arbitrary programs on the GPU: the
+``loader`` and the ``start`` object.
+
+Startup object
+--------------
+
+This object mimics the standard object used by existing C library
+implementations. Its job is to perform the necessary setup prior to calling the
+``main`` function. In the GPU case, this means exporting GPU kernels that will
+perform the necessary operations. Here we use ``_begin`` and ``_end`` to handle
+calling global constructors and destructors while ``_start`` begins the standard
+execution. The following code block shows the implementation for AMDGPU
+architectures.
+
+.. code-block:: c++
+
+   extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+   _begin(int argc, char **argv, char **env) {
+     LIBC_NAMESPACE::atexit(&LIBC_NAMESPACE::call_fini_array_callbacks);
+     LIBC_NAMESPACE::call_init_array_callbacks(argc, argv, env);
+   }
+
+   extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+   _start(int argc, char **argv, char **envp, int *ret) {
+     __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);
+   }
+
+   extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+   _end(int retval) {
+     LIBC_NAMESPACE::exit(retval);
+   }
+
+Loader runtime
+--------------
+
+The startup object provides a GPU executable with callable kernels for the
+respective runtime. We can then define a minimal runtime that will launch these
+kernels on the given device. Currently we provide the ``amdhsa-loader`` and
+``nvptx-loader`` targeting the AMD HSA runtime and CUDA driver runtime
+respectively. By default these will launch with a single thread on the GPU.

 .. code-block:: sh

-  $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
-  $> ./amdhsa_loader --threads 1 --blocks 1 a.out
+  $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
+  $> amdhsa_loader --threads 1 --blocks 1 ./a.out
   Test Passed!

-Unlike the exported ``libcgpu.a``, the testing architecture can only support a
-single architecture at a time. This is either detected automatically, or set
-manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
-in cases where the user does not build LLVM's libc on machine with the GPU to
-use for testing.
+The loader utility will forward any arguments passed after the executable image
+to the program on the GPU, as well as any set environment variables. The number
+of threads and blocks can be controlled with ``--threads`` and ``--blocks``.
+These also accept additional ``x``, ``y``, ``z`` variants for multidimensional
+grids.
+
+Running tests
+=============
+
+Tests will only be built and run if a GPU target architecture is set and the
+corresponding loader utility was built. These can be overridden with the
+``LIBC_GPU_TEST_ARCHITECTURE`` and ``LIBC_GPU_LOADER_EXECUTABLE`` :ref:`CMake
+options <gpu_cmake_options>`. Once built, they can be run like any other tests.
+The CMake target depends on how the library was built.
+
+#. **Cross build** - If the C library was built using ``LLVM_ENABLE_PROJECTS``
+   or a runtimes cross build, then the standard targets will be present in the
+   base CMake build directory.
+
+   #. All tests - You can run all supported tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja check-libc
+
+   #. Hermetic tests - You can run hermetic tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja libc-hermetic-tests
+
+   #. Integration tests - You can run integration tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja libc-integration-tests
+
+#. **Runtimes build** - If the library was built using ``LLVM_ENABLE_RUNTIMES``
+   then the actual ``libc`` build will be in a separate directory.
+
+   #. All tests - You can run all supported tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja check-libc-amdgcn-amd-amdhsa
+        $> ninja check-libc-nvptx64-nvidia-cuda
+
+   #. Specific tests - You can use the same targets as above by entering the
+      runtimes build directory.
+
+      .. code-block:: sh
+
+        $> ninja -C runtimes/runtimes-amdgcn-amd-amdhsa-bins check-libc
+        $> ninja -C runtimes/runtimes-nvptx64-nvidia-cuda-bins check-libc
+        $> cd runtimes/runtimes-amdgcn-amd-amdhsa-bins && ninja check-libc
+        $> cd runtimes/runtimes-nvptx64-nvidia-cuda-bins && ninja check-libc
+
+Tests can also be built and run manually using the respective loader utility.
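
As a rough illustration of that last point, a single test could be compiled against the
startup object and launched by hand, based on the loader example earlier in this file.
The source file name, the forwarded argument and environment variable, and the
``--threads-x``/``--blocks-x`` spellings of the dimensional variants are assumptions for
this sketch, not values taken from the commit.

.. code-block:: sh

  # Build the test against the GPU startup object (crt1.o), as in the loader
  # example above.
  $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
  # Launch with 64 threads per block and 2 blocks along x; arguments after the
  # image and exported environment variables are forwarded to the GPU program.
  $> FOO=bar amdhsa_loader --threads-x 64 --blocks-x 2 ./a.out arg1 arg2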

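Similarly, the ``LIBC_GPU_TEST_ARCHITECTURE`` and ``LIBC_GPU_LOADER_EXECUTABLE``
overrides mentioned in the hunk above would be passed through the same per-target
``RUNTIMES_<target>_<variable>`` mechanism documented in ``building.rst``; the
architecture string and loader path below are placeholders for the sketch.

.. code-block:: sh

  # Pin the test architecture instead of relying on autodetection and point the
  # build at a prebuilt loader utility (placeholder values).
  $> cmake ../llvm -G Ninja <other options...> \
       -DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_ARCHITECTURE=gfx90a \
       -DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_LOADER_EXECUTABLE=/path/to/amdhsa-loader
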
libc/docs/gpu/using.rst

Lines changed: 9 additions & 5 deletions
@@ -159,17 +159,21 @@ GPUs.
   }

 We can then compile this for both NVPTX and AMDGPU into LLVM-IR using the
-following commands.
+following commands. This will yield valid LLVM-IR for the given target just like
+if we were using CUDA, OpenCL, or OpenMP.

 .. code-block:: sh

   $> clang id.c --target=amdgcn-amd-amdhsa -mcpu=native -nogpulib -flto -c
   $> clang id.c --target=nvptx64-nvidia-cuda -march=native -nogpulib -flto -c

-We use this support to treat the GPU as a hosted environment by providing a C
-library and startup object just like a standard C library running on the host
-machine. Then, in order to execute these programs, we provide a loader utility
-to launch the executable on the GPU similar to a cross-compiling emulator.
+We can also use this support to treat the GPU as a hosted environment by
+providing a C library and startup object just like a standard C library running
+on the host machine. Then, in order to execute these programs, we provide a
+loader utility to launch the executable on the GPU similar to a cross-compiling
+emulator. This is how we run :ref:`unit tests <libc_gpu_testing>` targeting the
+GPU. This is clearly not the most efficient way to use a GPU, but it provides a
+simple method to test execution on a GPU for debugging or development.

 Building for AMDGPU targets
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
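
To make the hosted-environment description above concrete, an ordinary program with a
``main`` function can be compiled against the GPU C library's startup object and run
through the loader, mirroring the commands shown in ``testing.rst``. The ``hello.cpp``
name is assumed for this sketch.

.. code-block:: sh

  # Compile and link an ordinary C++ program for the AMDGPU target using the
  # GPU C library's startup object.
  $> clang++ crt1.o hello.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
  # Run it on the device with a single thread, much like a cross-compiling
  # emulator would.
  $> amdhsa_loader --threads 1 --blocks 1 ./a.out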
