@@ -758,6 +758,29 @@ entry:
Note: Kernel naming is not fully stable for now.
+ ##### Kernel Fusion Support
+
+ The [experimental kernel fusion
+ extension](../extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc)
+ also supports the CUDA backend. However, as neither CUBIN nor PTX is a suitable
+ input format for the [kernel fusion JIT compiler](KernelFusionJIT.md), a
+ suitable IR has to be added as an additional device binary.
+
+ Therefore, to perform kernel fusion on the CUDA backend, the user must pass
+ the additional flag `-fsycl-embed-ir` during compilation to add LLVM IR as an
+ additional device binary. When `-fsycl-embed-ir` is specified, the LLVM IR
+ produced by Clang for the CUDA backend device compilation is added to the fat
+ binary file. To this end, the resulting file-table from `sycl-post-link` is
+ additionally passed to the `clang-offload-wrapper`, creating a wrapper object
+ with target `llvm_nvptx64`.
+
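As a rough sketch, a compile line that enables the embedded IR might look as follows. Only `-fsycl-embed-ir` comes from the text above; the target triple `nvptx64-nvidia-cuda`, the file names `app.cpp` and `app`, and a DPC++ `clang++` on the `PATH` are assumptions made for illustration.

```shell
# Assumed: DPC++ clang++ is on PATH; app.cpp and app are placeholder names.
# -fsycl-embed-ir adds the LLVM IR as an additional device binary.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fsycl-embed-ir app.cpp -o app
```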
+ This device binary in LLVM IR format can be retrieved by the SYCL runtime and
+ used by the kernel fusion JIT compiler. The resulting fused kernel is compiled
+ to PTX assembly by the kernel fusion JIT compiler at runtime.
+
+ Note that the device binary in LLVM IR does not replace the device binary in
+ CUBIN/PTX format, but is embedded in addition to it.
+
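The "in addition to" relationship can be pictured with a small, purely illustrative model. This is not the SYCL runtime's actual data structure or lookup code: `DeviceImage`, `makeFatBinary`, and `findImage` are hypothetical names, and the target string `nvptx64` for the CUBIN/PTX image is an assumption. Only `llvm_nvptx64` is taken from the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one device binary in the fat binary.
struct DeviceImage {
  std::string target;  // wrapper target string, e.g. "llvm_nvptx64"
  std::string payload; // placeholder for the binary contents
};

// Model of the fat binary contents with and without -fsycl-embed-ir.
// The "nvptx64" target name for the CUBIN/PTX image is an assumption.
std::vector<DeviceImage> makeFatBinary(bool embedIR) {
  std::vector<DeviceImage> images = {{"nvptx64", "<CUBIN or PTX>"}};
  if (embedIR)
    images.push_back({"llvm_nvptx64", "<LLVM IR>"}); // added, not replacing
  return images;
}

// Return a pointer to the first image with the given target, or nullptr.
const DeviceImage *findImage(const std::vector<DeviceImage> &images,
                             const std::string &target) {
  for (const DeviceImage &img : images)
    if (img.target == target)
      return &img;
  return nullptr;
}
```

In this model, a lookup for `llvm_nvptx64` only succeeds when the flag was given, while the lookup for the native image succeeds either way, which mirrors why regular (non-fused) execution is unaffected by the extra binary.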
### Integration with SPIR-V format
This section explains how to generate SPIR-V specific types and operations from