[SYCL] Support sycl::kernel_bundle for multi-device scenario #15546
This PR includes:
* Changes to the program manager methods so that a UR program can be properly created and built for multiple devices. Until now we mostly used only the first device in the vector to create and build the UR program, which made the program unusable on the other devices.
* A UR tag update that brings in the version of urProgramCreateWithBinary that allows creating a UR program from multiple device binaries.
* A change to the program cache key. It allowed only a single device; I have changed it to contain a set of devices. If a UR program is created and built for a set of devices, the same program is usable whenever we have any subset of that set. That is why, when a program is built for a set of devices, we add all subsets to the cache. Previously we added a cache record for each device in the set, which is incorrect. For example, if someone requests a UR program for {dev2, dev3} from the cache, that program is expected to be usable to submit a kernel to dev3. But we could get a program for {dev1, dev2} from the cache, which is unusable on dev3.
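The subset rule for the cache key can be illustrated with a small standalone sketch. This is not the code from the PR: `DeviceId`, `ProgramId`, `ProgramCache`, `insertBuilt` and `lookup` are hypothetical names, plain integers stand in for UR device and program handles, and the 16-device guard is an arbitrary assumption to keep the toy example bounded.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins: real code would use UR device and program handles.
using DeviceId = int;
using ProgramId = int;
using DeviceSet = std::set<DeviceId>;

// Toy program cache keyed by a set of devices instead of a single device.
class ProgramCache {
public:
  // Registers a program built for `devices` under every non-empty subset,
  // because a program built for a set is usable on any subset of that set.
  void insertBuilt(const std::vector<DeviceId> &devices, ProgramId program) {
    assert(devices.size() < 16 && "the number of subsets grows as 2^n");
    const std::size_t n = devices.size();
    for (std::size_t mask = 1; mask < (std::size_t{1} << n); ++mask) {
      DeviceSet subset;
      for (std::size_t i = 0; i < n; ++i)
        if (mask & (std::size_t{1} << i))
          subset.insert(devices[i]);
      // emplace does not overwrite: an entry that already exists is kept.
      cache_.emplace(std::move(subset), program);
    }
  }

  // Returns the program cached for exactly this device set, or -1 if absent.
  ProgramId lookup(const DeviceSet &devices) const {
    auto it = cache_.find(devices);
    return it == cache_.end() ? -1 : it->second;
  }

  std::size_t size() const { return cache_.size(); }

private:
  std::map<DeviceSet, ProgramId> cache_;
};
```

With this keying, a program built for {dev1, dev2} is found for {dev1}, {dev2} and {dev1, dev2}, but a request for {dev2, dev3} misses instead of returning a program that cannot run on dev3.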
#15773 needs to be merged before this one to pull in the sanitizer changes that went into UR earlier.
Please merge to fix the build with intel/llvm and latest Unified Runtime mainline.
Failures:
// RUN: env SYCL_CACHE_IN_MEM=0 NEOReadDebugKeys=1 CreateMultipleRootDevices=4 %{run} %t.out

// Test AOT next.
// RUN: %{build} -fsycl-targets=spir64_gen -Xsycl-target-backend=spir64_gen "-device *" -o %t.out
@againull, the ocloc requirement is missing for this test, which makes it fail in some environments.
Thank you, missed that, will fix the test.