@@ -394,6 +394,51 @@ llvm-no-spir-kernel host.bc
It returns 0 if no kernels are present and 1 otherwise.
+ #### Device code split
+
+ Putting all device code into a single SPIR-V module does not work well in the
+ following cases:
+ 1. Thousands of kernels are defined, but only a small part of them is used at
+ run-time. Having them all in one SPIR-V module significantly increases JIT time.
+ 2. Device code can be specialized for different devices. For example, kernels
+ that are supposed to be executed only on FPGA can use extensions available for
+ FPGA only. This will cause JIT compilation failure on other devices even if this
+ particular kernel is never called on them.
+
+ To resolve these problems, the compiler can split a single module into smaller
+ ones. The following features are supported:
+ * Emitting a separate module for each source (translation unit)
+ * Emitting a separate module for each kernel
+
+ The current approach is:
+ * Generate special metadata with a translation unit ID for each kernel in the
+ SYCL front-end. This ID is used to group kernels on a per-translation-unit basis
+ * Link all device LLVM modules using llvm-link
+ * Perform the split on the fully linked module
+ * Generate a symbol table (list of kernels) for each produced device module for
+ proper module selection at runtime
+ * Perform SPIR-V translation and AOT compilation (if requested) on each produced
+ module separately
+ * Add information about the kernels present to the wrapping object for each
+ device image
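
The role of the per-module symbol table in the steps above can be illustrated with a small, language-neutral model. This is only a sketch, not the runtime's actual implementation: `DeviceImage`, `build_kernel_index`, `select_image`, and the image and kernel names are all hypothetical, and the sketch assumes a single target, so each kernel lives in exactly one image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceImage:
    """Hypothetical stand-in for one device image and its symbol table."""
    name: str            # illustrative label, not a real file name
    symbols: frozenset   # kernel names listed in this image's symbol table


def build_kernel_index(images):
    """Map each kernel name to the image whose symbol table lists it."""
    index = {}
    for image in images:
        for kernel in image.symbols:
            # Assumption of this sketch: a kernel belongs to one image only.
            if kernel in index:
                raise ValueError(f"kernel {kernel!r} appears in more than one image")
            index[kernel] = image
    return index


def select_image(index, kernel):
    """Return the only image that has to be compiled for the requested kernel."""
    try:
        return index[kernel]
    except KeyError:
        raise LookupError(f"no device image contains kernel {kernel!r}") from None


images = [
    DeviceImage("image_0", frozenset({"kernel_a", "kernel_b"})),
    DeviceImage("image_1", frozenset({"kernel_c"})),
]
index = build_kernel_index(images)
```

With such an index, a request for `kernel_c` touches only `image_1`, so the kernels in `image_0` are never JIT-compiled on a device that does not use them.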
+
+ Device code splitting process:
+ 
+
+ The "split" box is implemented by the dedicated tool `sycl-post-link`. The tool
+ runs a set of LLVM passes to split the input module and generates a symbol table
+ (list of kernels) for each produced device module.
+
+ To enable device code split, pass the following option to the clang driver:
+
+ `-fsycl-device-code-split=<value>`
+
+ There are three possible values for this option:
+ * `per_source` - emits a separate module for each source (translation unit)
+ * `per_kernel` - emits a separate module for each kernel
+ * `off` - disables device code split
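
The effect of the three values can be sketched as a choice of grouping key over (kernel, translation unit ID) pairs. This is a toy model rather than the `sycl-post-link` implementation: `split_kernels`, `GROUP_KEYS`, and the kernel and translation unit names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical input: (kernel name, translation unit ID) pairs, standing in
# for the metadata that the front-end attaches to each kernel.
KERNELS = [
    ("kernel_a", "tu_0"),
    ("kernel_b", "tu_0"),
    ("kernel_c", "tu_1"),
]

# Each option value picks the key that decides which module a kernel joins.
GROUP_KEYS = {
    "off": lambda name, tu: "all",        # everything stays in one module
    "per_source": lambda name, tu: tu,    # one module per translation unit
    "per_kernel": lambda name, tu: name,  # one module per kernel
}


def split_kernels(kernels, mode):
    """Return one symbol table (sorted kernel list) per produced module."""
    try:
        key = GROUP_KEYS[mode]
    except KeyError:
        raise ValueError(f"unknown split mode: {mode!r}") from None

    groups = defaultdict(list)
    for name, tu in kernels:
        groups[key(name, tu)].append(name)
    return [sorted(names) for names in groups.values()]
```

For the three kernels above, `off` yields one module, `per_source` yields two, and `per_kernel` yields three, which shows the trade-off between the number of modules and how much unused code each module carries.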
### Integration with SPIR-V format