Commit a972d5b

Document how the optimization works
1 parent 001892b commit a972d5b

1 file changed: +28 -0 lines changed

sycl/doc/design/SYCL2020-SpecializationConstants.md

Lines changed: 28 additions & 0 deletions
@@ -1058,3 +1058,31 @@ the translator will generate `OpSpecConstant` SPIR-V instructions with proper
OpReturnValue %struct
OpFunctionEnd
```

### Specialization constant to CUDA symbol optimization

The CUDA backend uses a hybrid approach: the specialization constants are
still bundled into one memory buffer (as in emulated support), but the
implicit kernel argument is replaced by a global symbol.

#### Compiler support

`sycl-post-link` detects whether the compilation targets the CUDA backend and
whether the kernel uses specialization constants. If both hold, it adds
`CUDASpecConstantToSymbolPass` to the optimization pipeline. The purpose of the
pass is:

* to allocate a global symbol whose size equals the accumulated size of all the specialization constants,
* to rewrite the kernel signature to remove the implicit argument,
* to replace all uses of the implicit kernel argument with corresponding uses of the global variable.

The global variable allocated by the pass is named
`sycl_specialization_constants_kernel_` followed by the kernel name. The
convention matters because it lets the runtime look up the symbol when it sets
the symbol's value.
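
As a rough illustration of the two quantities the pass has to agree on with the runtime, the sketch below derives the symbol name and the buffer size for one kernel. It is not the pass itself: `specConstSymbolName` and `accumulatedSize` are hypothetical helpers, and the sizing assumes the constants are packed back to back with no alignment padding, which the text above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Prefix taken from the naming convention described above.
static const char kSymbolPrefix[] = "sycl_specialization_constants_kernel_";

// Hypothetical helper: name of the global symbol backing one kernel's
// specialization constants (prefix + kernel name).
std::string specConstSymbolName(const std::string &kernelName) {
  assert(!kernelName.empty() && "kernel name must not be empty");
  return std::string(kSymbolPrefix) + kernelName;
}

// Hypothetical helper: accumulated size in bytes of all specialization
// constants. ASSUMPTION: values are packed contiguously with no padding.
std::size_t accumulatedSize(const std::vector<std::size_t> &sizes) {
  return std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
}
```

For a kernel named `foo` with constants of 4, 8 and 1 bytes, these helpers give the symbol `sycl_specialization_constants_kernel_foo` and, under the no-padding assumption, a 13-byte global.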

#### RT symbol setup

The PI CUDA plugin implements the `piextProgramSetSpecializationConstant` API
entry, in which it uses the `kernel` argument to construct the name of the
pre-allocated global variable. The name is passed to `cuModuleGetGlobal`, which
retrieves the address of the global. Finally, a call to `cuMemcpyHtoD` transfers
the data corresponding to the bundled specialization constants.
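
The lookup-then-copy sequence can be modelled on the host without a CUDA device. The sketch below is only a stand-in: `FakeModule` and `setSpecConstants` are hypothetical names, a `std::map` lookup takes the place of `cuModuleGetGlobal`, and `std::memcpy` takes the place of `cuMemcpyHtoD`. The size check is an assumption about sensible behaviour, not something the plugin is documented to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded CUDA module: maps a global symbol name
// to its storage. The real plugin resolves the name with cuModuleGetGlobal.
struct FakeModule {
  std::map<std::string, std::vector<unsigned char>> globals;
};

// Hypothetical model of the runtime step: build the symbol name from the
// kernel name, look the symbol up, then copy the bundled constants into it.
// Returns false if the symbol is missing or the data does not fit.
bool setSpecConstants(FakeModule &module, const std::string &kernelName,
                      const void *data, std::size_t size) {
  assert(data != nullptr || size == 0);
  const std::string name =
      "sycl_specialization_constants_kernel_" + kernelName;

  auto it = module.globals.find(name);  // models cuModuleGetGlobal
  if (it == module.globals.end())
    return false;
  if (size > it->second.size())  // ASSUMPTION: reject oversized payloads
    return false;

  if (size != 0)
    std::memcpy(it->second.data(), data, size);  // models cuMemcpyHtoD
  return true;
}
```

Because both sides derive the symbol name from the same convention, the runtime needs no extra metadata to find the buffer that the compiler pass allocated.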
