Commit 3d37c91

Fznamznon authored and romanovvlad committed
[SYCL][Doc] Add device code split feature design (#631)
Signed-off-by: Mariya Podchishchaeva <[email protected]>
1 parent b06fc66 commit 3d37c91

2 files changed: +1351 -0 lines changed

sycl/doc/SYCLCompilerAndRuntimeDesign.md

Lines changed: 45 additions & 0 deletions
@@ -394,6 +394,51 @@ llvm-no-spir-kernel host.bc

It returns 0 if no kernels are present and 1 otherwise.

#### Device code split

Putting all device code into a single SPIR-V module does not work well in the
following cases:
1. There are thousands of kernels defined and only a small part of them is used at
run-time. Having them all in one SPIR-V module significantly increases JIT time.
2. Device code can be specialized for different devices. For example, kernels
that are supposed to be executed only on FPGA can use extensions available for
FPGA only. This causes JIT compilation failure on other devices even if these
particular kernels are never called on them.

To resolve these problems the compiler can split a single module into smaller
ones. The following features are supported:
* Emitting a separate module for each source (translation unit)
* Emitting a separate module for each kernel

The current approach is:
* Generate special metadata with a translation unit ID for each kernel in the SYCL
front-end. This ID is used to group kernels on a per-translation-unit basis
* Link all device LLVM modules using llvm-link
* Perform the split on the fully linked module
* Generate a symbol table (list of kernels) for each produced device module for
proper module selection at run-time
* Perform SPIR-V translation and AOT compilation (if requested) on each produced
module separately
* Add information about the kernels present to the wrapping object for each device
image

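The grouping step can be illustrated with a small model. This is only a sketch in Python, not the code of the actual tool: the `Kernel` record, the `split_kernels` function, and the use of a file name as the translation unit ID are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str   # kernel entry point name
    tu_id: str  # translation unit ID attached by the front-end (format is hypothetical)


def split_kernels(kernels, mode):
    """Return one symbol table (list of kernel names) per produced module."""
    if mode == "off":
        # No split: every kernel stays in a single module.
        return [[k.name for k in kernels]]
    if mode == "per_kernel":
        # One module per kernel.
        return [[k.name] for k in kernels]
    if mode == "per_source":
        # Group kernels that share a translation unit ID.
        groups = defaultdict(list)
        for k in kernels:
            groups[k.tu_id].append(k.name)
        return list(groups.values())
    raise ValueError(f"unknown split mode: {mode}")


# Kernels as they might appear in the fully linked module.
linked = [Kernel("foo", "a.cpp"), Kernel("bar", "a.cpp"), Kernel("baz", "b.cpp")]
per_source_tables = split_kernels(linked, "per_source")
```

With this input, the per-source mode keeps `foo` and `bar` together because they carry the same translation unit ID, while `baz` ends up in its own module.
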
Device code splitting process:
![Device code splitting](images/DeviceCodeSplit.svg)

The "split" box is implemented as functionality of the dedicated tool
`sycl-post-link`. The tool runs a set of LLVM passes to split the input module and
generates a symbol table (list of kernels) for each produced device module.

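The per-module symbol tables exist so that the right module can be picked for a requested kernel. A minimal sketch of such a lookup, in Python, is shown here; the `select_image` function, the image names, and the dictionary layout are hypothetical and do not describe the real runtime data structures.

```python
def select_image(images, kernel_name):
    """Return the ID of the first device image whose symbol table lists the kernel."""
    for image_id, symbols in images.items():
        if kernel_name in symbols:
            return image_id
    raise KeyError(f"no device image contains kernel {kernel_name!r}")


# Hypothetical symbol tables, one per produced device module.
images = {
    "image0": {"foo", "bar"},
    "image1": {"baz"},
}
chosen = select_image(images, "baz")
```

Only the selected image would then need JIT compilation, which is the point of splitting when most kernels are never used at run-time.
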
To enable device code split, a special option must be passed to the clang
driver:

`-fsycl-device-code-split=<value>`

There are three possible values for this option:
* `per_source` - enables emitting a separate module for each source (translation
unit)
* `per_kernel` - enables emitting a separate module for each kernel
* `off` - disables device code split

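As an illustration of how a build script might pass this option, the following Python sketch assembles a driver command line. The `driver_args` helper and the source file names are hypothetical; only the `-fsycl-device-code-split=<value>` option and its three values come from the text above, and `-fsycl` is assumed to be the flag that enables SYCL compilation.

```python
SPLIT_MODES = ("per_source", "per_kernel", "off")


def driver_args(sources, split):
    """Build a clang++ argument list with the requested device code split mode."""
    if split not in SPLIT_MODES:
        raise ValueError(f"unsupported split mode: {split}")
    return ["clang++", "-fsycl", f"-fsycl-device-code-split={split}", *sources]


# Example: request one device module per kernel for two hypothetical sources.
args = driver_args(["a.cpp", "b.cpp"], "per_kernel")
```

The resulting list can be handed to `subprocess.run` on a machine where a SYCL-enabled clang++ is installed.
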
### Integration with SPIR-V format