@@ -1058,3 +1058,31 @@ the translator will generate `OpSpecConstant` SPIR-V instructions with proper
OpReturnValue %struct
OpFunctionEnd
```
+
+ ### Specialization constant to CUDA symbol optimization
+
+ The CUDA backend uses a hybrid approach: the specialization constants are
+ still bundled into one memory buffer (as in the emulated support), but the
+ implicit kernel argument is replaced by a global symbol.
+
+ #### Compiler support
+
+ `sycl-post-link` detects whether the compilation targets the CUDA backend and
+ whether the kernel uses specialization constants. If both hold, it adds
+ `CUDASpecConstantToSymbolPass` to the optimization pipeline. The pass:
+ * allocates a global symbol whose size equals the accumulated size of all the specialization constants,
+ * rewrites the kernel signature to remove the implicit argument,
+ * replaces all uses of the implicit kernel argument with corresponding uses of the global variable.
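The effect of the three steps can be illustrated with a host-side analogy in plain C++. This is only a sketch: the real pass operates on device code, and the two bundled constants (`Scale`, `Bias`), the kernel name `example`, and the function names `kernelBefore` / `kernelAfter` are hypothetical choices made for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical bundle: one int and one float specialization constant,
// stored back to back in a single buffer.
constexpr std::size_t kBundleSize = sizeof(int) + sizeof(float);

// Before the pass: the bundled buffer arrives as an implicit trailing argument.
inline float kernelBefore(float X, const unsigned char *SpecConstBuffer) {
  int Scale;
  float Bias;
  std::memcpy(&Scale, SpecConstBuffer, sizeof(Scale));
  std::memcpy(&Bias, SpecConstBuffer + sizeof(Scale), sizeof(Bias));
  return X * Scale + Bias;
}

// After the pass: a global of the accumulated size replaces the argument.
// On the device this would be a device global; here it is an ordinary host
// global, named as prefix + kernel name (kernel name "example" is assumed).
unsigned char sycl_specialization_constants_kernel_example[kBundleSize];

// The signature no longer carries the implicit argument; every former use of
// it now reads from the global instead.
inline float kernelAfter(float X) {
  const unsigned char *Buf = sycl_specialization_constants_kernel_example;
  int Scale;
  float Bias;
  std::memcpy(&Scale, Buf, sizeof(Scale));
  std::memcpy(&Bias, Buf + sizeof(Scale), sizeof(Bias));
  return X * Scale + Bias;
}
```

Both functions compute the same result once the global holds the same bytes that would otherwise have been passed as the argument; only the way the buffer reaches the kernel changes.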
+
+ The global variable allocated by the pass is named
+ `sycl_specialization_constants_kernel_` followed by the kernel name. The
+ convention matters because it lets the runtime look the symbol up by name
+ when setting its value.
+
+ #### RT symbol setup
+
+ The PI CUDA plugin implements the `piextProgramSetSpecializationConstant` API
+ entry point, in which it uses the `kernel` argument to construct the name of the
+ pre-allocated global variable. The name is passed to `cuModuleGetGlobal`, which
+ retrieves the address of the global. Finally, a call to `cuMemcpyHtoD`
+ transfers the bundled specialization constant data to that address.