.. _libc_gpu_testing:

- ============================
- Testing the GPU libc library
- ============================
+ =========================
+ Testing the GPU C library
+ =========================

.. note::
   Running GPU tests with high parallelism is likely to cause spurious failures,
@@ -14,24 +14,134 @@ Testing the GPU libc library
   :depth: 4
   :local:

- Testing Infrastructure
+ Testing infrastructure
======================

- The testing support in LLVM's libc implementation for GPUs is designed to mimic
- the standard unit tests as much as possible. We use the :ref:`libc_gpu_rpc`
- support to provide the necessary utilities like printing from the GPU. Execution
- is performed by emitting a ``_start`` kernel from the GPU
- that is then called by an external loader utility. This is an example of how
- this can be done manually:
+ The LLVM C library supports different kinds of :ref:`tests <build_and_test>`
+ depending on the build configuration. The GPU target is considered a full build
+ and therefore provides all of its own utilities to build and run the generated
+ tests. Currently the GPU supports two kinds of tests.
+
+ #. **Hermetic tests** - These are unit tests built with a test suite similar to
+    Google's ``gtest`` infrastructure. These use the same infrastructure as unit
+    tests except that the entire environment is self-hosted. This allows us to
+    run them on the GPU using our custom utilities. These are used to test the
+    majority of functional implementations. A minimal sketch of such a test is
+    shown after this list.
+
+ #. **Integration tests** - These are lightweight tests that simply call a
+    ``main`` function and check if it returns non-zero. These are primarily used
+    to test interfaces that are sensitive to threading.
+
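+ As a rough illustration, a hermetic test is an ordinary unit test source file.
+ The sketch below assumes the ``TEST`` and ``ASSERT_EQ`` macros and header paths
+ of the ``libc`` unit test framework; it is not taken from the test suite itself.
+
+ .. code-block:: c++
+
+    // Minimal sketch of a hermetic test. The whole suite, including the test
+    // runner, is compiled for and executed on the GPU.
+    #include "src/string/strlen.h"
+    #include "test/UnitTest/Test.h"
+
+    TEST(LlvmLibcStrLenTest, EmptyString) {
+      // Results are reported back to the host through the RPC utilities.
+      ASSERT_EQ(size_t(0), LIBC_NAMESPACE::strlen(""));
+    }
+
+    TEST(LlvmLibcStrLenTest, AnyString) {
+      ASSERT_EQ(size_t(3), LIBC_NAMESPACE::strlen("abc"));
+    }
+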
+ The GPU uses the same testing infrastructure as the other supported ``libc``
+ targets. We do this by treating the GPU as a standard hosted environment capable
+ of launching a ``main`` function. Effectively, this means building our own
+ startup libraries and loader.
+
+ Testing utilities
+ =================
+
+ We provide two utilities to execute arbitrary programs on the GPU: the
+ ``loader`` and the ``start`` object.
+
+ Startup object
+ --------------
+
+ This object mimics the standard startup object used by existing C library
+ implementations. Its job is to perform the necessary setup prior to calling the
+ ``main`` function. In the GPU case, this means exporting GPU kernels that will
+ perform the necessary operations. Here we use ``_begin`` and ``_end`` to handle
+ calling global constructors and destructors while ``_start`` begins the standard
+ execution. The following code block shows the implementation for AMDGPU
+ architectures.
+
+ .. code-block:: c++
+
+    extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+    _begin(int argc, char **argv, char **env) {
+      LIBC_NAMESPACE::atexit(&LIBC_NAMESPACE::call_fini_array_callbacks);
+      LIBC_NAMESPACE::call_init_array_callbacks(argc, argv, env);
+    }
+
+    extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+    _start(int argc, char **argv, char **envp, int *ret) {
+      __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);
+    }
+
+    extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+    _end(int retval) {
+      LIBC_NAMESPACE::exit(retval);
+    }
+
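+ The NVPTX startup object follows the same pattern. As a sketch, and assuming
+ the ``clang::nvptx_kernel`` attribute is what marks exported kernels for that
+ target, only the kernel attribute changes:
+
+ .. code-block:: c++
+
+    // Sketch of the NVPTX equivalent of _start; _begin and _end differ from
+    // the AMDGPU versions in the same way.
+    extern "C" [[gnu::visibility("protected"), clang::nvptx_kernel]] void
+    _start(int argc, char **argv, char **envp, int *ret) {
+      __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);
+    }
+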
75
+ Loader runtime
76
+ --------------
77
+
78
+ The startup object provides a GPU executable with callable kernels for the
79
+ respective runtime. We can then define a minimal runtime that will launch these
80
+ kernels on the given device. Currently we provide the ``amdhsa-loader `` and
81
+ ``nvptx-loader `` targeting the AMD HSA runtime and CUDA driver runtime
82
+ respectively. By default these will launch with a single thread on the GPU.
26
83
27
84
.. code-block:: sh

-   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
-   $> ./amdhsa_loader --threads 1 --blocks 1 a.out
+   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
+   $> amdhsa_loader --threads 1 --blocks 1 ./a.out
   Test Passed!

- Unlike the exported ``libcgpu.a``, the testing architecture can only support a
- single architecture at a time. This is either detected automatically, or set
- manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
- in cases where the user does not build LLVM's libc on machine with the GPU to
- use for testing.
+ The loader utility will forward any arguments passed after the executable image
+ to the program on the GPU, as well as any set environment variables. The number
+ of threads and blocks can be controlled with ``--threads`` and
+ ``--blocks``. These also accept additional ``x``, ``y``, ``z`` variants for
+ multidimensional grids.
+
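+ To make the loader's job concrete, the following is a heavily simplified,
+ hypothetical sketch of what such a tool does with the CUDA driver API: load the
+ linked GPU executable, resolve the kernels exported by the startup object, and
+ launch them in order. The real ``nvptx-loader`` additionally copies ``argv`` and
+ the environment to the device and services RPC requests while the kernels
+ execute; all of that and all error handling is omitted here.
+
+ .. code-block:: c++
+
+    #include <cuda.h>
+
+    int main(int argc, char **argv) {
+      cuInit(0);
+      CUdevice device;
+      cuDeviceGet(&device, 0);
+      CUcontext context;
+      cuCtxCreate(&context, 0, device);
+
+      // Load the image produced by linking the test against crt1.o.
+      CUmodule module;
+      cuModuleLoad(&module, argv[1]);
+
+      CUfunction begin, start, end;
+      cuModuleGetFunction(&begin, module, "_begin");
+      cuModuleGetFunction(&start, module, "_start");
+      cuModuleGetFunction(&end, module, "_end");
+
+      // Return value shared with the _start kernel (argv/envp omitted).
+      CUdeviceptr dev_ret;
+      cuMemAlloc(&dev_ret, sizeof(int));
+      int zero = 0;
+      cuMemcpyHtoD(dev_ret, &zero, sizeof(int));
+
+      int kern_argc = argc - 1;
+      CUdeviceptr dev_argv = 0, dev_envp = 0;
+
+      // _begin runs global constructors with a single thread.
+      void *begin_args[] = {&kern_argc, &dev_argv, &dev_envp};
+      cuLaunchKernel(begin, 1, 1, 1, 1, 1, 1, 0, nullptr, begin_args, nullptr);
+
+      // _start calls main() on the requested grid, a 1x1x1 launch by default.
+      void *start_args[] = {&kern_argc, &dev_argv, &dev_envp, &dev_ret};
+      cuLaunchKernel(start, 1, 1, 1, 1, 1, 1, 0, nullptr, start_args, nullptr);
+      cuCtxSynchronize();
+
+      int ret = 0;
+      cuMemcpyDtoH(&ret, dev_ret, sizeof(int));
+
+      // _end runs global destructors and exits with main's return value.
+      void *end_args[] = {&ret};
+      cuLaunchKernel(end, 1, 1, 1, 1, 1, 1, 0, nullptr, end_args, nullptr);
+      cuCtxSynchronize();
+
+      cuModuleUnload(module);
+      cuCtxDestroy(context);
+      return ret;
+    }
+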
+ Running tests
+ =============
+
+ Tests will only be built and run if a GPU target architecture is set and the
+ corresponding loader utility was built. These can be overridden with the
+ ``LIBC_GPU_TEST_ARCHITECTURE`` and ``LIBC_GPU_LOADER_EXECUTABLE`` :ref:`CMake
+ options <gpu_cmake_options>`. Once built, they can be run like any other tests.
+ The CMake target depends on how the library was built.
+
+ #. **Cross build** - If the C library was built using ``LLVM_ENABLE_PROJECTS``
+    or a runtimes cross build, then the standard targets will be present in the
+    base CMake build directory.
+
+    #. All tests - You can run all supported tests with the command:
+
+       .. code-block:: sh
+
+          $> ninja check-libc
+
+    #. Hermetic tests - You can run hermetic tests with the command:
+
+       .. code-block:: sh
+
+          $> ninja libc-hermetic-tests
+
+    #. Integration tests - You can run integration tests with the command:
+
+       .. code-block:: sh
+
+          $> ninja libc-integration-tests
+
+ #. **Runtimes build** - If the library was built using ``LLVM_ENABLE_RUNTIMES``
+    then the actual ``libc`` build will be in a separate directory.
+
+    #. All tests - You can run all supported tests with the command:
+
+       .. code-block:: sh
+
+          $> ninja check-libc-amdgcn-amd-amdhsa
+          $> ninja check-libc-nvptx64-nvidia-cuda
+
+    #. Specific tests - You can use the same targets as above by entering the
+       runtimes build directory.
+
+       .. code-block:: sh
+
+          $> ninja -C runtimes/runtimes-amdgcn-amd-amdhsa-bins check-libc
+          $> ninja -C runtimes/runtimes-nvptx64-nvidia-cuda-bins check-libc
+          $> cd runtimes/runtimes-amdgcn-amd-amdhsa-bins && ninja check-libc
+          $> cd runtimes/runtimes-nvptx64-nvidia-cuda-bins && ninja check-libc
+
+ Tests can also be built and run manually using the respective loader utility.