[CUDA][HIP] Fix host task mem migration and add pi entry point for urEnqueueNativeCommandExp #14353
Conversation
Native CPU LGTM, thank you
Ping @cperkinsintel @intel/unified-runtime-reviewers this is very high priority. Would be great to get reviews ASAP, thanks.
Overall I think it looks good.
```cpp
HostTask.MHostTask->call(MThisCmd->MEvent->getHostProfilingInfo(), IH);
if (IH.get_backend() == backend::ext_oneapi_cuda ||
    IH.get_backend() == backend::ext_oneapi_hip) {
```
I would have preferred that the UR adapter tell us whether we can use this path, instead of having the runtime check the backend type. That said, I can live with this as a stop-gap solution. Is the plan for the other backends to support this UR functionality as well?
We can actually query this info from UR, but I think it's better to use this entry point only when it's needed for correctness, rather than whenever it is supported. That said, I don't feel too strongly about this either way; I can change this to query whether the device supports the experimental UR entry point.
There are indeed plans for other backends to support this, although we are focusing our energy on CUDA and HIP for now, since interop is a big perf limiter for these backends. Interop isn't a big issue for the other backends at the moment, but I think this will soon be supported there as well.
Maybe as an interim solution I'll add a TODO in the code asking whether this entry point should be used for all the backends that support it. This is an open question.
I am fine with a TODO, but in the end I would still prefer that we rely on the support query for this. The hope would be that any backend reporting support for it can do it efficiently and correctly, as I assume we don't want different behavior for the API.
Maybe it means we can eventually always use this path and scrap some of the implementation in our library. 😉
Yeah, I think that's a good idea! I will actually change it to use the query, if that is the preferable path.
I have updated the check to query `piDeviceGetInfo` to see if the entry point is supported.
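For context, a minimal sketch of what such a support check might look like in the SYCL RT. The enum name `PI_EXT_ONEAPI_DEVICE_INFO_ENQUEUE_NATIVE_COMMAND_SUPPORT` and the plugin call wrapper below are assumptions for illustration, not necessarily the exact identifiers this patch adds:

```cpp
// Sketch only: the device-info enum name is a placeholder, not
// necessarily the one added by this patch.
pi_bool NativeCommandSupport = false;
Plugin->call<PiApiKind::piDeviceGetInfo>(
    Device, PI_EXT_ONEAPI_DEVICE_INFO_ENQUEUE_NATIVE_COMMAND_SUPPORT,
    sizeof(pi_bool), &NativeCommandSupport, /*param_value_size_ret=*/nullptr);
if (NativeCommandSupport) {
  // Hand the host-task lambda off to the plugin, which performs any
  // required memory migration before invoking it.
}
```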
The SYCL RT assumes that, for devices in the same context, no mem migration needs to occur across devices for a kernel launch or host task. However, a CUdeviceptr is relevant to a specific device, so mem migration must occur between devices in a context. When this assumption the SYCL RT makes, that native mems are accessible to all devices in a context, does not hold, the RT must hand off the HT lambda to the plugin, so that the plugin can handle the necessary mem migration. This patch uses the new urEnqueueNativeCommandExp to execute the HT lambda, which takes care of mem migration implicitly in the plugin.
Co-authored-by: Steffen Larsen <[email protected]>
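To illustrate why the migration matters: a host task that extracts a native pointer from an accessor through the interop handle gets a handle to one specific device's allocation. A minimal sketch, assuming the DPC++ CUDA backend; the function name and queue/buffer setup are illustrative, not from this patch:

```cpp
#include <sycl/sycl.hpp>

void use_native_mem(sycl::queue &Q, sycl::buffer<int, 1> &Buf) {
  Q.submit([&](sycl::handler &CGH) {
    sycl::accessor Acc{Buf, CGH, sycl::read_write};
    CGH.host_task([=](sycl::interop_handle IH) {
      // On the CUDA backend this yields a CUdeviceptr (assumed here),
      // which refers to the allocation on one specific device. If Buf
      // was last used on a different device in the same context, the
      // plugin must migrate the memory first; routing the host task
      // through urEnqueueNativeCommandExp lets it do so implicitly.
      auto Ptr = IH.get_native_mem<sycl::backend::ext_oneapi_cuda>(Acc);
      // ... use Ptr with native CUDA driver APIs ...
      (void)Ptr;
    });
  });
}
```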
Don't branch on backend; use the UR entry point if it is supported. Also add an equivalent enum to PI.
Ping @intel/llvm-gatekeepers this can be merged. Thanks
This should address the post-commit failure introduced in #14353. Signed-off-by: Larsen, Steffen <[email protected]>