-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL] Implement specialization constants. #1356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review for post-link part.
Waiting for tests for new functionality.
2223f8c
to
dc40485
Compare
|
4e0b174
to
316a855
Compare
Dear reviewers - @AGindinson, @erichkeane, @AlexeySachkov, @Fznamznon, @sndmitriev, @smaslov-intel, @romanovvlad, @mdtoguchi, @sergey-semenov - please take another look. All prior comments have been addressed to the best of my knowledge. |
rebase, squashed deploy dependence commit, replaced first three commits with updated after code review. 2 review comments from @erichkeane to be fixed in FE |
316a855
to
8766334
Compare
two review comments in FE (SemaSYCL.cpp) addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small comments on the CFE parts, otherwise LGTM. Looks like others will have to approve as most of this is not CFE.
49fa9bc
to
3324fb3
Compare
addressed review comments from @erichkeane , @Fznamznon , @mdtoguchi |
@bader, when merging, please merge these first: let me know if you need updates from me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CFE Changes LGTM.
eeac8c3
to
5ef5cae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl-post-link part LGTM
BTW, It would be nice to have spec constants design explanation somewhere in this doc https://github.com/intel/llvm/blob/sycl/sycl/doc/CompilerAndRuntimeDesign.md someday. |
@Fznamznon, @sndmitriev, @smaslov-intel, @romanovvlad, @mdtoguchi - please approve post-link, offload wrapper, PI, RT and driver changes if you are OK with current code |
Post link part is approved #1356 (review) |
LGTM for Driver. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-offload-wrapper changes look okay,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PI-related changes look good to me.
yes, agree. But let's do this after the API and the implementation is finalized. |
5ef5cae
to
eb382f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Minor comment.
Add a spec constant lowering pass to sycl-post-link tool. Support file table output format. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
New PI API added: pi_result piProgramSetSpecializationConstant(pi_program prog, pi_uint32 spec_id, size_t spec_size, const void *spec_value); Signed-off-by: Konstantin S Bobrovsky <[email protected]>
…I version. This change breaks backward binary compatibility of device binary image descriptors. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
…ects. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
…d wrapper. 1. New option - "-properties=<file>". <file> must be a property set registry file, as defined by llvm/Support/PropertySetIO.h. The wrapper will add the property sets to the binary image descriptor and the them available to the runtime. 2. New options - "-batch". With this option the only input can be a file table, as defined by llvm/Support/SimpleTable.h. Column names are a part of interface between this tool and the sycl-post-link, which produces the file table. 3. Binary image descriptor LLVM type updated to resemble changes in Plugin Interface v1.2. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
1. Detect kernel lambda object captures corresponding to specialization constants and (a) don't create kernel arguments for them (b) generate specializations of the SpecConstantInfo structure into the integration header. 2. Recognize the __sycl_fe_getStableUniqueTypeName intrinsic and replace it with a string literal uniquely identifying the type of the typename template parameter to this intrinsic. 3. FE-related changes in the runtime: - new SpecConstantInfo templated struct for type->name translation for specialization constants used by integration header - define the __sycl_fe_getStableUniqueTypeName intrinsic Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Based on https://github.com/codeplaysoftware/standards-proposals/blob/master/spec-constant/index.md 1. Define SYCL API (sycl/include/CL/sycl/experimental/spec_constant.hpp) 2. Add convenience C++ wrappers for PI device binary structures and refactor runtime to use the wrappers. Get rid of custom deleters for binary images. 3. Implement SYCL spec constant APIs in program an program manager. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Clang driver's design can't handily model (1) multiple inputs/outputs in the action graph. Because of that, for example, sycl-post-link tool is invoked twice - once to to split the code and produce multiple bitcode files, and secondly - to generate symbol files for the split modules. (2) "Clusters" of inputs/outputs, when subsets of inputs/outputs are associated and describe different aspects of the same data. Example of such clustering is the split module + its symbol file above. Clustering would require support both in the driver and the tools invoked in response to actions. This commit moves SYCL offload processing to the "file table concept." sycl-post-link instead of (1) being invoked n times, once per each output type requested (once for device split and once for symbol file generation) (2) outputting multiple file lists each listing outputs from the corresponding invocation above is now invoked once and produces single file table output. E.g. [Code|Symbols|Properties] a_0.bc|a_0.sym|a_0.props a_1.bc|a_1.sym|a_1.props This solves both problems - multiple input/output and clustering. Combined with the file-table-tform tool, this allows for efficent handling of multiple clusters of files (each represented as a row in the table file) in the clang driver infrastructure. For example, there is a real offload processing problem: step1. sycl-post-link outputs N clusters of files step2. "Code" file of each cluster resuilting from step1 ({a_0.bc, a_1.bc} in the example above) must undergo further transformations - translation to SPIRV and optional ahead-of-time compilation. step3. In each cluster resulting from step1 the "Code" file needs to be replaced with the result of step2 step4. All the clusters are processed by the ClangOffloadWrapper tool, which needs to know how files are distributed into clusters and what is the roles of each file in a cluster - whether it is "Code", "Symbol" or "Properties". To solve this, the following action graph is constructed in the clang driver: column:"Code" t1 -> [file-table-tform:extract column] -> t1a -> [for-each:] -> t1b llvm-spirv aot-comp t1 \ column:"Code" [file-table-tform:replace column] -> t2 -> [ClangOffloadWrapper] / t1b where t1b is ["Code"] and t2 is [Code|Symbols|Properties] a_0.bin a_0.bin|a_0.sym|a_0.props a_1.bin a_1.bin|a_1.sym|a_1.props Note that the graph does not change with growing number of clusters, neither it changes when more files are added to each cluster (e.g. a "Manifest" file). Signed-off-by: Konstantin S Bobrovsky <[email protected]>
CPU OpenCL Runtime on build machines is not updated yet. Signed-off-by: Konstantin S Bobrovsky <[email protected]>
eb382f7
to
3ede529
Compare
@bader, I've got approvals from component owners - please take a look and merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced with @kbobrovs. CPU test disabled because requires CPU RT uplift required. He will enable it with followup commit. Other approvals taken.
Implementation is done according to:
the language-level spec draft from Codeplay (to be substituted by a new one soon, though, but a lot of implementation will remain):
https://github.com/codeplaysoftware/standards-proposals/blob/master/spec-constant/index.md
SPIRV:
https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_a_id_specializationsection_a_specialization
OpenCL API:
https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clSetProgramSpecializationConstant.html
Known deficiencies:
The first three commits are not really SYCL specific and separate PRs will be created. Those are also planned to go to llvm upstream.
property I/O: #1357
table library: #1358
file-table-tform: #1359
A lot of complexity stems from the fact that SPIRV identifies specialization constants by integer ID. It is impossible to assign integer IDs at translation unit compilation time because of separate compilation and potential ID conflict between TUs. So, clang instead generates special intrinsic aka get_spec_constant_value_by_name("spec constant name generated from its type ID into int header") .
Then after linkage, the sycl-post-link tool
(1) runs an LLVMIR pass which assigns integer ID to each spec constant and replaces the get_spec_constant_value_by_name intrinsic with a standard SPIRV intrinsic to get spec constant value with integer ID
(2) saves a map "name"->"int ID" into a file which is later inserted by the clang offload wrapper into the device binary descriptor created for device code (SPIRV module).
At runtime, SYCL runtime receives a request to set a spec constant with given name to some value , uses the "name"->ID map to resolve it to an integer and call piProgramSetSpecializationConstant with the corresponding int ID.
To store the "name"->ID map a "property set" library is developed, which can store arbitrary sets of properties. ClangOffloadWrapper is able to parse them and insert into the wrapper object. Later entries, manifest, build options, etc (as well as future information accompanying the device code) can move to this mechanism instead of having separate fields in the binary descriptor.
Another complexity is clang driver's inability to conveniently model multiple dynamic inputs/outputs in the action graph. The file-table-tform tool fills the gap.
See also comments in the c9a794c.