-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Use CUDA symbol for specialization constants #7946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl-post-link
part feedback
8398740
to
484d770
Compare
@jchlanda, just in case you are waiting for pre-commit test results on NVIDIA GPU. Please, merge with the head of the |
4f91cb8
to
91cba2f
Compare
357b1c9
to
cb76a89
Compare
cb76a89
to
53c2168
Compare
f8f1e26
to
158ec58
Compare
The issue oneapi-src#269, and the PR intel/llvm#7946, outline a desire to enable adapters to opt-in to setting specialization constants on a kernel object rather than a program object. On these adapters it is not required to perform compilation or linking when setting specialization constants and as such they can be set immediately prior to `urEnqueueKernelLaunch` with very low overhead. This patch: * Adds the `UR_DEVICE_INFO_KERNEL_SET_SPECIALIZATION_CONSTANTS` device query to determine if `urKernelSetSpecializationConstants` is supported by the device. * Adds `ur_specialization_constant_info_t` to represent the data required to set an individual specialization constant * Adds `urKernelSetSpecializationConstants` entry point, which adapters can opt-in to supporting, which enables setting multiple specialization constants on a kernel object. * Replaces the singular `urProgramSetSpecializationConstant` with the plural `urProgramSetSpecializationConstants` to enable setting multiple specialization constants at once on a program object and have a consistent API between the program and kernel variations.
The issue oneapi-src#269, and the PR intel/llvm#7946, outline a desire to enable adapters to opt-in to setting specialization constants on a kernel object rather than a program object. On these adapters it is not required to perform compilation or linking when setting specialization constants and as such they can be set immediately prior to `urEnqueueKernelLaunch` with very low overhead. This patch: * Adds the `UR_DEVICE_INFO_KERNEL_SET_SPECIALIZATION_CONSTANTS` device query to determine if `urKernelSetSpecializationConstants` is supported by the device. * Adds `ur_specialization_constant_info_t` to represent the data required to set an individual specialization constant * Adds `urKernelSetSpecializationConstants` entry point, which adapters can opt-in to supporting, which enables setting multiple specialization constants on a kernel object. * Replaces the singular `urProgramSetSpecializationConstant` with the plural `urProgramSetSpecializationConstants` to enable setting multiple specialization constants at once on a program object and have a consistent API between the program and kernel variations.
The issue oneapi-src#269, and the PR intel/llvm#7946, outline a desire to enable adapters to opt-in to setting specialization constants on a kernel object rather than a program object. On these adapters it is not required to perform compilation or linking when setting specialization constants and as such they can be set immediately prior to `urEnqueueKernelLaunch` with very low overhead. This patch: * Adds the `UR_DEVICE_INFO_KERNEL_SET_SPECIALIZATION_CONSTANTS` device query to determine if `urKernelSetSpecializationConstants` is supported by the device. * Adds `ur_specialization_constant_info_t` to represent the data required to set an individual specialization constant * Adds `urKernelSetSpecializationConstants` entry point, which adapters can opt-in to supporting, which enables setting multiple specialization constants on a kernel object. * Replaces the singular `urProgramSetSpecializationConstant` with the plural `urProgramSetSpecializationConstants` to enable setting multiple specialization constants at once on a program object and have a consistent API between the program and kernel variations.
The issue #269, and the PR intel/llvm#7946, outline a desire to enable adapters to opt-in to setting specialization constants on a kernel object rather than a program object. On these adapters it is not required to perform compilation or linking when setting specialization constants and as such they can be set immediately prior to `urEnqueueKernelLaunch` with very low overhead. This patch: * Adds the `UR_DEVICE_INFO_KERNEL_SET_SPECIALIZATION_CONSTANTS` device query to determine if `urKernelSetSpecializationConstants` is supported by the device. * Adds `ur_specialization_constant_info_t` to represent the data required to set an individual specialization constant * Adds `urKernelSetSpecializationConstants` entry point, which adapters can opt-in to supporting, which enables setting multiple specialization constants on a kernel object. * Replaces the singular `urProgramSetSpecializationConstant` with the plural `urProgramSetSpecializationConstants` to enable setting multiple specialization constants at once on a program object and have a consistent API between the program and kernel variations.
a34a518
to
b882437
Compare
@intel/llvm-reviewers-runtime ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime and CUDA implementation looks okay to me, though I share @gmlueck's concerns about the API. It seems like adding a new API would avoid both the overloading of the behavior and the PI ABI break. However, based the latest information in both of the related discussions I don't think it has to be a blocker. @smaslov-intel - Feel free to overrule if you feel strong about it.
We are in agreement that we do allow PI API/ABI break now, so let's go ahead here. The better interfaces are being added to UR. |
For each kernel using specialization constants copy the data from the implicit argument to a global symbol, then fix up the uses of that argument accordingly, and remove the arg.
Co-authored-by: Alexey Sachkov <[email protected]>
Co-authored-by: Alexey Sachkov <[email protected]>
Switching to draft, as the faith of cuda symbols for spec constants is somehow uncertain. |
b882437
to
22ad6c8
Compare
ba56dc9
to
fc97c36
Compare
This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days. |
This pull request was closed because it has been stalled for 30 days with no activity. |
This patch improves support for CUDA specialization constants. The approach is hybrid, where we make use of spec constant emulation (as already present in oneAPI), which bundles all spec constants together and passes them to the kernel as an additional, implicit argument; as well as special handling in the CUDA pi plugin alongside a custom pass.
The pass creates a global entry that corresponds to the implicit argument and rewrites the kernel to drop that additional implicit argument, furthermore it changes the accesses from implicit argument to the global value. At the runtime side, CUDA pi plugin is responsible for retrieving the symbol (using
cuModuleGetGlobal
) and setting it up (usingcuMemcpyHtoD
).The patch requires an update to the PI interface, extending the list of arguments of
piextProgramSetSpecializationConstant
with api_kernel
. This is needed, as emulated specialization constants rely on extra kernel argument, for this patch the argument is replaced with a CUDA symbol.