Commit b4d3968

Update kernel fusion design document
Signed-off-by: Lukas Sommer <[email protected]>
1 parent bc32fad commit b4d3968

1 file changed: 14 additions, 5 deletions

sycl/doc/design/KernelFusionJIT.md

Lines changed: 14 additions & 5 deletions
@@ -162,11 +162,20 @@ The metadata is attached to a function that will become the fused kernel:
 
 ### Support for non SPIR-V targets
 
-Non SPIR-V targets (NVPTX / AMDGCN) are not supported at the moment as they cannot ingest a SPIR-V module. However, we are looking into adding support for these targets once the initial SPIR-V based path is operational.
+Fusion is currently supported for the NVPTX/CUDA backend.
 
-In this scenario, two options are possible to add JIT support:
+As this backend cannot ingest a SPIR-V module, additional changes to the
+compilation flow are necessary. During static compilation the LLVM module for
+this backend is stored in addition to the finalized binary.
 
-- During static compilation we store the LLVM module on top of the finalized binary. This behavior could be controlled by a flag to avoid a too important binary inflation. Then, during the fusion process, the JIT will load that LLVM IR and finalize the fused kernel to the final target as driven by the PI plugin.
-- SPIR-V ingestion support is added for these targets. The module to be loaded could then be the generic SPIR-V module. This path would however exclude target specific optimizations written in user's code. The current state of the SPIR-V translator does not allow this at the moment and significant work is needed to add this support.
+This behavior is controlled by the `-fsycl-embed-ir` flag to avoid binary
+inflation in case kernel fusion is not used. If users want to use kernel fusion
+at runtime on the NVPTX/CUDA backend, they need to pass the `-fsycl-embed-ir`
+flag during static compilation.
 
-In these cases, PI will need to be extended to allow to somehow drive the JIT process, so it is tailored to the plugin target needs.
+During the fusion process at runtime, the JIT will load the LLVM IR and
+finalize the fused kernel to the final target. More information is available
+[here](./CompilerAndRuntimeDesign.md#kernel-fusion-support).
+
+Support for the AMD GPU/HIP/AMDGCN backend is not yet implemented, but could
+follow an approach similar to the NVPTX/CUDA backend.
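The opt-in flow described in the updated section (LLVM IR stored next to the finalized binary only when `-fsycl-embed-ir` is passed, then loaded by the JIT at fusion time) can be pictured with the following toy C++ model. It is a sketch under stated assumptions, not the DPC++ implementation: `DeviceImage`, `compileForCUDA` and `jitFuse` are hypothetical names, strings stand in for real LLVM modules and device binaries, and returning `std::nullopt` when IR is missing (so a caller could run the kernels unfused instead) is an assumption rather than documented runtime behavior. On a real toolchain the flag would be given at static compile time, e.g. something along the lines of `clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -fsycl-embed-ir app.cpp`.

```cpp
#include <optional>
#include <string>
#include <vector>

// HYPOTHETICAL model, not the actual DPC++ compiler/runtime types.
// A device image produced by static compilation for the NVPTX/CUDA backend.
struct DeviceImage {
  std::string finalizedBinary;       // always present (stands in for PTX/cubin)
  std::optional<std::string> llvmIR; // present only if IR embedding was requested
};

// Static compilation: the LLVM module is kept next to the finalized binary
// only on request (modeling -fsycl-embed-ir), so applications that never
// fuse kernels do not pay for a larger binary.
DeviceImage compileForCUDA(const std::string &kernelIR, bool embedIR) {
  DeviceImage img;
  img.finalizedBinary = "binary(" + kernelIR + ")";
  if (embedIR)
    img.llvmIR = kernelIR;
  return img;
}

// Runtime fusion: this backend cannot ingest SPIR-V, so the JIT needs the
// embedded LLVM IR of every kernel. If any IR is missing, fusion is not
// possible and std::nullopt is returned (assumed fallback: run unfused).
std::optional<std::string>
jitFuse(const std::vector<DeviceImage> &kernels) {
  std::string fusedIR;
  for (const auto &k : kernels) {
    if (!k.llvmIR)
      return std::nullopt;
    fusedIR += *k.llvmIR + ";";
  }
  // "Finalize" the fused module to the target (modeled as a string wrap).
  return "binary(fused:" + fusedIR + ")";
}
```

Keeping `llvmIR` optional mirrors the trade-off stated in the text: images built without the flag stay as small as before, at the cost of being ineligible for fusion on this backend.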
