Commit bc32fad

Document CUDA kernel fusion in design documentation
Signed-off-by: Lukas Sommer <[email protected]>
1 parent f7df423 commit bc32fad

2 files changed: +124 -127 lines changed

sycl/doc/design/CompilerAndRuntimeDesign.md

Lines changed: 23 additions & 0 deletions
@@ -758,6 +758,29 @@ entry:
 
 Note: Kernel naming is not fully stable for now.
 
+##### Kernel Fusion Support
+
+The [experimental kernel fusion
+extension](../extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc)
+also supports the CUDA backend. However, as neither CUBIN nor PTX is a suitable
+input format for the [kernel fusion JIT compiler](KernelFusionJIT.md), a
+suitable IR has to be added as an additional device binary.
+
+Therefore, to perform kernel fusion for the CUDA backend, the
+user needs to specify the additional flag `-fsycl-embed-ir` during compilation,
+to add LLVM IR as an additional device binary. When the flag `-fsycl-embed-ir`
+is specified, the LLVM IR produced by Clang for the CUDA backend device
+compilation is added to the fat binary file. To this end, the resulting
+file-table from `sycl-post-link` is additionally passed to the
+`clang-offload-wrapper`, creating a wrapper object with target `llvm_nvptx64`.
+
+This device binary in LLVM IR format can be retrieved by the SYCL runtime and
+used by the kernel fusion JIT compiler. The resulting fused kernel is compiled
+to PTX assembly by the kernel fusion JIT compiler at runtime.
+
+Note that the device binary in LLVM IR does not replace the device binary in
+CUBIN/PTX format, but is embedded in addition to it.
+
 ### Integration with SPIR-V format
 
 This section explains how to generate SPIR-V specific types and operations from
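
The compilation step described in the added "Kernel Fusion Support" section can be sketched as a small shell snippet. This is a minimal illustration, not the project's documented invocation: only `-fsycl-embed-ir` comes from the text above. The `clang++` driver name, the `-fsycl-targets=nvptx64-nvidia-cuda` target selection, and the file names `app.cpp` and `app` are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch: build a SYCL application for the CUDA backend with LLVM IR embedded
# alongside the regular CUBIN/PTX device binary, so that the kernel fusion
# JIT compiler has a suitable input format at runtime.
#
# Assumptions (not taken from the design document): the driver is invoked as
# `clang++`, the CUDA target triple is nvptx64-nvidia-cuda, and the file
# names app.cpp / app are placeholders.
CXX="${CXX:-clang++}"
SRC="${SRC:-app.cpp}"
OUT="${OUT:-app}"

# -fsycl-embed-ir is the flag described in the section above; the remaining
# flags select SYCL compilation for the CUDA backend.
CMD="$CXX -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fsycl-embed-ir $SRC -o $OUT"

# Always print the command; run it only if the compiler and source exist.
echo "$CMD"
if command -v "$CXX" >/dev/null 2>&1 && [ -f "$SRC" ]; then
  $CMD
fi
```

Without `-fsycl-embed-ir`, the same command would still produce a working CUDA binary, but, per the section above, the fat binary would carry no LLVM IR for the kernel fusion JIT compiler to use.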
