Skip to content

[SYCL] Support sycl::kernel_bundle for multi-device scenario #15546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 23, 2024

Conversation

againull
Copy link
Contributor

@againull againull commented Sep 28, 2024

This PR includes:

  • Changes in the program manager methods to be able to properly create/build UR program for multiple devices. So far, we were mostly using only the first device in the vector to create/build UR program which made UR program unusable on other devices.

  • UR tag update brings the version of urProgramCreateWithBinary which allows
    to create UR program from multiple device binaries.

  • Our program cache key allowed only a single device. I have changed it to contain a set of devices. If UR program is created and built for a set of devices then the same UR program is usable whenver we have any subset of this set. That's why if we have a program built for a set of devices then add all subsets to the cache. Before we were adding a record to the cache for each device from the set which is incorrect. For example, if someone requests a UR program for {dev2, dev3} from the cache then it is expected that this UR progam must be usable to submit a kernel to dev3. But we could get a program for {dev1, dev2} from the cache which is unusable on dev3.

@againull againull force-pushed the sycl_bundle_multi_device branch from 464b6dc to 02a5333 Compare September 28, 2024 00:47
@againull againull force-pushed the sycl_bundle_multi_device branch from 02a5333 to 5b5df33 Compare September 30, 2024 05:13
@againull againull force-pushed the sycl_bundle_multi_device branch from 5b5df33 to ac8444b Compare September 30, 2024 05:36
@againull againull force-pushed the sycl_bundle_multi_device branch 2 times, most recently from 3cfe9b7 to 5e4b86e Compare September 30, 2024 20:18
@againull againull force-pushed the sycl_bundle_multi_device branch from 5e4b86e to 2b387e4 Compare September 30, 2024 20:22
@againull againull force-pushed the sycl_bundle_multi_device branch from 2b387e4 to b8e0652 Compare October 1, 2024 07:26
@againull againull force-pushed the sycl_bundle_multi_device branch from b8e0652 to 080bd06 Compare October 1, 2024 07:28
@againull againull force-pushed the sycl_bundle_multi_device branch from 080bd06 to 9f17f3f Compare October 1, 2024 07:37
@againull againull force-pushed the sycl_bundle_multi_device branch from 9f17f3f to 687d688 Compare October 1, 2024 08:05
@againull againull marked this pull request as ready for review October 1, 2024 08:06
@againull againull requested review from a team as code owners October 1, 2024 08:06
This PR includes:
* Changes in the program manager methods to be able to properly
  create/build UR program for multiple devices. So far, we were mostly
  using only the first device in the vector to create/build UR program
  which made UR program unusable on other devices.

* UR tag update brings the version of urProgramCreateWithBinary which allows
  to create UR program from multiple device binaries.

* Our program cache key allowed only a single device. I have changed it
  to contain a set of devices. If UR program is created and built for a
  set of devices then the same UR program is usable whenver we have any
  subset of this set. That's why if we have a program built for a set of
  devices then add all subsets to the cache. Before we were adding a
  record to the cache for each device from the set which is incorrect.
  For example, if someone requests a UR program for {dev2, dev3} from
  the cache then it is expected that this UR progam must be usable to
  submit a kernel to dev3. But we could get a program for
  {dev1, dev2} from the cache which is unusable on dev3.
@againull againull force-pushed the sycl_bundle_multi_device branch from 687d688 to cce501a Compare October 1, 2024 08:50
Copy link
Contributor

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#15773 needs to be merged before this to pull in sanitizer changes that went into ur prior to this.

Copy link
Contributor

@nrspruit nrspruit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge to fix the build with intel/llvm and latest Unified Runtime mainline.

@againull
Copy link
Contributor Author

Failures:
SYCL :: AOT/double.cpp
SYCL :: AOT/half.cpp
SYCL :: AOT/reqd-sg-size.cpp
are unrelated and going to be fixed by #15823

@againull againull merged commit c498567 into intel:sycl Oct 23, 2024
11 of 12 checks passed
// RUN: env SYCL_CACHE_IN_MEM=0 NEOReadDebugKeys=1 CreateMultipleRootDevices=4 %{run} %t.out

// Test AOT next.
// RUN: %{build} -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device *" -o %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@againull, ocloc requirement is missing for this test, making it fail in some environments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, missed that, will fix the test.

againull added a commit that referenced this pull request Nov 26, 2024
…f devices (#16175)

This test case didn't work properly (incorrect number of
retains/releases) before the fix for multi-device case:
#15546
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants