sycl/doc/design/KernelFusionJIT.md: 14 additions & 5 deletions
@@ -162,11 +162,20 @@ The metadata is attached to a function that will become the fused kernel:

 ### Support for non SPIR-V targets

-Non SPIR-V targets (NVPTX / AMDGCN) are not supported at the moment as they cannot ingest a SPIR-V module. However, we are looking into adding support for these targets once the initial SPIR-V based path is operational.
+Fusion is currently supported for the NVPTX/CUDA backend.

-In this scenario, two options are possible to add JIT support:
+As this backend cannot ingest a SPIR-V module, additional changes to the
+compilation flow are necessary. During static compilation the LLVM module for
+this backend is stored in addition to the finalized binary.

-- During static compilation we store the LLVM module on top of the finalized binary. This behavior could be controlled by a flag to avoid a too important binary inflation. Then, during the fusion process, the JIT will load that LLVM IR and finalize the fused kernel to the final target as driven by the PI plugin.
-- SPIR-V ingestion support is added for these targets. The module to be loaded could then be the generic SPIR-V module. This path would however exclude target specific optimizations written in user's code. The current state of the SPIR-V translator does not allow this at the moment and significant work is needed to add this support.
+This behavior is controlled by the `-fsycl-embed-ir` flag to avoid binary
+inflation in case kernel fusion is not used. If users want to use kernel fusion
+at runtime on the NVPTX/CUDA backend, they need to pass the `-fsycl-embed-ir`
+flag during static compilation.

-In these cases, PI will need to be extended to allow to somehow drive the JIT process, so it is tailored to the plugin target needs.
+During the fusion process at runtime, the JIT will load the LLVM IR and
+finalize the fused kernel to the final target. More information is available
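
The `-fsycl-embed-ir` requirement described in the added lines can be illustrated with a build invocation. This is a minimal sketch, not taken from the change itself: it assumes the open-source DPC++ `clang++` driver and the `nvptx64-nvidia-cuda` target triple, and the file names `fused_app.cpp` and `fused_app` are hypothetical. The command is printed rather than executed, so the snippet can be tried without a SYCL toolchain installed.

```shell
# Hypothetical file names, used only for illustration.
SRC=fused_app.cpp
OUT=fused_app

# -fsycl-targets selects the NVPTX/CUDA backend (assumed triple).
# -fsycl-embed-ir is the flag named in the design document: it stores the
# LLVM IR alongside the finalized binary so the JIT can load it for fusion.
SYCL_FLAGS="-fsycl -fsycl-targets=nvptx64-nvidia-cuda -fsycl-embed-ir"

# Print the full compile command instead of running it.
echo clang++ $SYCL_FLAGS "$SRC" -o "$OUT"
```

Dropping `-fsycl-embed-ir` from `SYCL_FLAGS` would, per the text above, keep the binary smaller but leave no embedded IR for the JIT to fuse on this backend.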