[SYCL] Use CUDA symbol for specialization constants #7946


Closed
wants to merge 8 commits into sycl from jakub/cuda_native_symbols

Conversation


@jchlanda jchlanda commented Jan 6, 2023

This patch improves support for CUDA specialization constants. The approach is hybrid: it reuses the spec constant emulation already present in oneAPI, which bundles all spec constants together and passes them to the kernel as an additional, implicit argument, and combines it with a custom pass and special handling in the CUDA PI plugin.
The pass creates a global variable that corresponds to the implicit argument, rewrites the kernel to drop that argument, and redirects accesses from the implicit argument to the global value. On the runtime side, the CUDA PI plugin is responsible for retrieving the symbol (using cuModuleGetGlobal) and initializing it (using cuMemcpyHtoD).

The patch requires an update to the PI interface, extending the argument list of piextProgramSetSpecializationConstant with a pi_kernel. This is needed because emulated specialization constants rely on an extra kernel argument, which this patch replaces with a CUDA symbol.
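
As a rough sketch of the runtime side (not the plugin's actual code), the snippet below shows how a plugin could locate such a symbol in a loaded CUmodule and copy the bundled spec constant data into it with the two Driver API calls mentioned above; the symbol name, blob layout, and error handling are placeholder assumptions.

```cpp
#include <cuda.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: copy the bundled spec constant data into the
// module-level symbol that the custom pass emitted for this kernel.
// The symbol name "_kernel_spec_const_buf" is a placeholder, not the
// real naming scheme used by the patch.
bool setSpecConstants(CUmodule module, const std::vector<char> &blob) {
  CUdeviceptr symbolPtr = 0;
  size_t symbolSize = 0;

  // Look up the global variable emitted by the pass.
  if (cuModuleGetGlobal(&symbolPtr, &symbolSize, module,
                        "_kernel_spec_const_buf") != CUDA_SUCCESS) {
    std::fprintf(stderr, "spec constant symbol not found\n");
    return false;
  }
  if (blob.size() > symbolSize) {
    std::fprintf(stderr, "spec constant blob larger than symbol\n");
    return false;
  }

  // Initialize the symbol with the host-side spec constant values.
  return cuMemcpyHtoD(symbolPtr, blob.data(), blob.size()) == CUDA_SUCCESS;
}
```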

@jchlanda jchlanda requested review from a team as code owners January 6, 2023 13:47
@jchlanda jchlanda marked this pull request as draft January 6, 2023 13:47

@AlexeySachkov AlexeySachkov left a comment


sycl-post-link part feedback

@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from 8398740 to 484d770 on January 11, 2023 10:29
@bader bader commented Jan 13, 2023

@jchlanda, just in case you are waiting for pre-commit test results on the NVIDIA GPU: please merge with the head of the sycl branch to pull in the fix for the CI scripts. There was a bug which hung the validation.

@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from 4f91cb8 to 91cba2f on January 18, 2023 13:57
@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from 357b1c9 to cb76a89 on January 19, 2023 14:35
@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from cb76a89 to 53c2168 on January 20, 2023 09:13
@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from f8f1e26 to 158ec58 on January 20, 2023 14:15
@jchlanda jchlanda changed the title [SYCL] Add support for CUDA native spec constants [SYCL] Use CUDA symbol for specialization constants Jan 20, 2023
kbenzie added a commit to oneapi-src/unified-runtime that referenced this pull request Mar 8, 2023
The issue #269, and the PR intel/llvm#7946, outline a desire to enable
adapters to opt in to setting specialization constants on a kernel
object rather than a program object. On these adapters it is not
required to perform compilation or linking when setting specialization
constants, and as such they can be set immediately prior to
`urEnqueueKernelLaunch` with very low overhead.

This patch:

* Adds the `UR_DEVICE_INFO_KERNEL_SET_SPECIALIZATION_CONSTANTS` device
  query to determine if `urKernelSetSpecializationConstants` is
  supported by the device.
* Adds `ur_specialization_constant_info_t` to represent the data
  required to set an individual specialization constant.
* Adds the `urKernelSetSpecializationConstants` entry point, which adapters
  can opt in to supporting, and which enables setting multiple
  specialization constants on a kernel object.
* Replaces the singular `urProgramSetSpecializationConstant` with the
  plural `urProgramSetSpecializationConstants` to enable setting
  multiple specialization constants at once on a program object and have
  a consistent API between the program and kernel variations.
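
For a sense of how these entry points fit together, here is a hedged sketch of application-side usage, assuming the Unified Runtime handle types and an id/size/pValue field order for ur_specialization_constant_info_t; the device/kernel handles, constant IDs, and values are placeholders.

```cpp
#include <ur_api.h>

// Hypothetical usage of the kernel-level spec constant API described above:
// query support, then set two constants on a kernel object.
// hDevice and hKernel are assumed to be valid handles obtained elsewhere.
ur_result_t setKernelSpecConstants(ur_device_handle_t hDevice,
                                   ur_kernel_handle_t hKernel) {
  // Does this adapter support setting spec constants on a kernel object?
  ur_bool_t supported = false;
  ur_result_t res = urDeviceGetInfo(
      hDevice, UR_DEVICE_INFO_KERNEL_SET_SPECIALIZATION_CONSTANTS,
      sizeof(supported), &supported, nullptr);
  if (res != UR_RESULT_SUCCESS || !supported)
    return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;

  // Example values for two spec constants; IDs and types are placeholders.
  const uint32_t c0 = 16;
  const float c1 = 0.5f;
  ur_specialization_constant_info_t specConstants[] = {
      {/*id=*/0, /*size=*/sizeof(c0), /*pValue=*/&c0},
      {/*id=*/1, /*size=*/sizeof(c1), /*pValue=*/&c1},
  };

  // Set both constants on the kernel right before enqueueing the launch.
  return urKernelSetSpecializationConstants(hKernel, 2, specConstants);
}
```

Setting the constants on the kernel object immediately before launch is what avoids the compile/link step on adapters that report support for this query.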
@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from a34a518 to b882437 on March 10, 2023 07:49
@jchlanda

@intel/llvm-reviewers-runtime ping


@steffenlarsen steffenlarsen left a comment


The runtime and CUDA implementation look okay to me, though I share @gmlueck's concerns about the API. It seems like adding a new API would avoid both overloading the behavior and the PI ABI break. However, based on the latest information in both of the related discussions, I don't think it has to be a blocker. @smaslov-intel - Feel free to overrule if you feel strongly about it.

@smaslov-intel

> The runtime and CUDA implementation look okay to me, though I share @gmlueck's concerns about the API. It seems like adding a new API would avoid both overloading the behavior and the PI ABI break. However, based on the latest information in both of the related discussions, I don't think it has to be a blocker. @smaslov-intel - Feel free to overrule if you feel strongly about it.

We are in agreement that we do allow a PI API/ABI break now, so let's go ahead here. The better interfaces are being added to UR.

jchlanda and others added 7 commits July 12, 2023 05:38
For each kernel using specialization constants, copy the data from the
implicit argument to a global symbol, then fix up the uses of that
argument accordingly, and remove the argument.
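
A minimal, hypothetical sketch of the shape such a pass could take, using the LLVM new pass manager: it only creates the global and redirects uses of an assumed trailing pointer argument, and omits the signature rewrite that actually removes the argument. The symbol name, the fixed buffer size, the PTX_Kernel check, and the "last argument" assumption are illustrative only.

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Illustrative module pass: for every PTX kernel, materialize a global buffer
// for the emulated spec constants and point the kernel's accesses at it.
struct SpecConstToGlobalSketch : PassInfoMixin<SpecConstToGlobalSketch> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    for (Function &F : M) {
      if (F.isDeclaration() || F.getCallingConv() != CallingConv::PTX_Kernel ||
          F.arg_empty())
        continue;
      // Assume the implicit spec constant buffer is the last pointer argument.
      Argument *ImplicitArg = F.getArg(F.arg_size() - 1);
      if (!ImplicitArg->getType()->isPointerTy())
        continue;
      // Emit a module-level symbol the runtime can find via cuModuleGetGlobal;
      // the 256-byte size is a placeholder.
      auto *BufTy = ArrayType::get(Type::getInt8Ty(M.getContext()), 256);
      auto *GV = new GlobalVariable(M, BufTy, /*isConstant=*/false,
                                    GlobalValue::ExternalLinkage,
                                    Constant::getNullValue(BufTy),
                                    F.getName() + "_spec_const_buf");
      // Redirect in-kernel accesses from the implicit argument to the global.
      // (The real patch also drops the argument from the kernel signature.)
      IRBuilder<> B(&*F.getEntryBlock().getFirstInsertionPt());
      Value *Repl =
          B.CreatePointerBitCastOrAddrSpaceCast(GV, ImplicitArg->getType());
      ImplicitArg->replaceAllUsesWith(Repl);
    }
    return PreservedAnalyses::none();
  }
};
```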
@jchlanda jchlanda marked this pull request as draft July 13, 2023 06:30
@jchlanda

Switching to draft, as the fate of CUDA symbols for spec constants is somewhat uncertain.

@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from b882437 to 22ad6c8 on July 13, 2023 06:42
@jchlanda jchlanda force-pushed the jakub/cuda_native_symbols branch from ba56dc9 to fc97c36 on July 13, 2023 06:52

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Sep 16, 2024

This pull request was closed because it has been stalled for 30 days with no activity.

@github-actions github-actions bot closed this Oct 16, 2024