
Commit 545a243

[Doc] Update SYCL CUDA documentation (#4214)
Update the documentation regarding the compilation process for CUDA targets to reflect the module splitting support.

Signed-off-by: Victor Lomuller <[email protected]>
1 parent d36ecab commit 545a243

4 files changed, +536 -286 lines changed

clang/include/clang/Driver/Action.h

Lines changed: 1 addition & 1 deletion
@@ -738,7 +738,7 @@ class SYCLPostLinkJobAction : public JobAction {
 void anchor() override;
 
 public:
-// The tempfiletable management relies on a shadowing the main file type by
+// The tempfiletable management relies on shadowing the main file type by
 // types::TY_Tempfiletable. The problem of shadowing is it prevents its
 // integration with clang tools that relies on the file type to properly set
 // args.

sycl/doc/CompilerAndRuntimeDesign.md

Lines changed: 9 additions & 7 deletions
@@ -548,13 +548,15 @@ down to the NVPTX Back End. All produced bitcode depends on two libraries,
 
 During the "PTX target processing" in the device linking step [Device
 code post-link step](#device-code-post-link-step), the llvm bitcode
-objects for the CUDA target are linked together alongside
-`libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc`, compiled to PTX
-using the NVPTX backend and assembled into a cubin using the `ptxas`
-tool (part of the CUDA SDK). The PTX file and cubin are assembled
-together using `fatbinary` to produce a CUDA fatbin. The CUDA fatbin
-then replaces the llvm bitcode file in the file table generated by
-`sycl-post-link`. The resulting table is passed to the offload wrapper tool.
+objects for the CUDA target are linked together during the common
+`llvm-link` step and then split using the `sycl-post-link` tool.
+For each temporary bitcode file, clang is invoked for the temporary file to link
+`libspirv-nvptx64--nvidiacl.bc` and `libdevice.bc` and compile the resulting
+module to PTX using the NVPTX backend. The resulting PTX file is assembled
+into a cubin using the `ptxas` tool (part of the CUDA SDK). The PTX file and
+cubin are assembled together using `fatbinary` to produce a CUDA fatbin.
+The produced CUDA fatbins then replace the llvm bitcode files in the file table generated
+by `sycl-post-link`. The resulting table is passed to the offload wrapper tool.
 
 ![NVPTX AOT build](images/DevicePTXProcessing.svg)
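
Reviewer note: the per-module flow described in the updated paragraph can be modelled with a small Python sketch. It is an illustration under stated assumptions, not the driver's implementation. It assumes the `sycl-post-link` file table is a text file whose first line is a bracketed, pipe-separated header such as `[Code|Properties|Symbols]`. The function names (`per_module_steps`, `replace_code_column`) and the file names (`kernel_0.bc`, the `.s`, `.cubin` and `.fatbin` suffixes) are hypothetical. No tool is invoked and no command-line flags are shown; only the order of the steps and the table rewrite are modelled.

```python
from pathlib import PurePosixPath


def per_module_steps(bitcode):
    """List the (tool, inputs, output) steps for one split bitcode module.

    The tool names come from the documentation text; the output file
    names are hypothetical and chosen only for illustration.
    """
    stem = PurePosixPath(bitcode).with_suffix("")
    ptx = f"{stem}.s"
    cubin = f"{stem}.cubin"
    fatbin = f"{stem}.fatbin"
    return [
        # clang links the two device libraries and lowers the module to PTX.
        ("clang", [bitcode], ptx),
        # ptxas assembles the PTX file into a cubin.
        ("ptxas", [ptx], cubin),
        # fatbinary bundles the PTX file and the cubin into a fatbin.
        ("fatbinary", [ptx, cubin], fatbin),
    ]


def replace_code_column(table_text, new_suffix=".fatbin"):
    """Replace each bitcode entry in the 'Code' column with a fatbin name.

    Assumes a header line like "[Code|Properties|Symbols]" followed by
    one pipe-separated row per split module.
    """
    lines = table_text.strip().splitlines()
    header = lines[0]
    columns = header.strip("[]").split("|")
    code_idx = columns.index("Code")
    out = [header]
    for row in lines[1:]:
        cells = row.split("|")
        cells[code_idx] = str(PurePosixPath(cells[code_idx]).with_suffix(new_suffix))
        out.append("|".join(cells))
    return "\n".join(out) + "\n"
```

Calling `replace_code_column` on a two-row table leaves the properties and symbols columns untouched and rewrites only the code column, which mirrors the statement that the fatbins replace the bitcode files before the table reaches the offload wrapper.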
