[SYCL][CUDA][HIP] Support zero range kernel for cuda and hip backends. #7044
Merged
8 commits
719c6b8 [SYCL][CUDA][HIP] Suppor zero range kernel for cuda and hip backends. (mmoadeli)
1eabb73 Merge branch 'sycl' into zero_range (mmoadeli)
70e06e3 [SYCL][Test] Address the event handling issue for cuda and hip range … (mmoadeli)
6678c49 Merge remote-tracking branch 'upstream/sycl' into zero_range (mmoadeli)
e7a69d8 Merge branch 'zero_range' of github.com:mmoadeli/llvm into zero_range (mmoadeli)
a76ac27 Update sycl/plugins/hip/pi_hip.cpp (mmoadeli)
02bf9ae Update sycl/plugins/cuda/pi_cuda.cpp (mmoadeli)
207f2b2 [SYCL] Fix style issues. (mmoadeli)
Doing an early exit here means that we do not create an event. I fear that could cause unexpected problems. Same goes for HIP.
@steffenlarsen I agree with you regarding the event.
If I move the condition (if (*global_work_size != 0)) so that it guards only the call to PI_CHECK_ERROR(cuLaunchKernel(, it will preserve the functionality related to event handling. Do you have any reservations about doing that?
I have not tried L0, but the opencl:cpu and esimd backends (and host in an earlier version of DPC++) have been tried and work without any modifications.
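The idea of guarding only the launch can be sketched in plain C++ without CUDA. This is a minimal mock, not the plugin code: MockEvent, MockQueue, isZeroRange and enqueueKernel are hypothetical names, and checking every dimension of the range (rather than only the first element, as in the condition quoted above) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the plugin's types; not the real PI or CUDA API.
struct MockEvent {
  bool recorded = false;
};

struct MockQueue {
  int launches = 0; // counts how many times the "kernel" was launched
};

// Returns true if any dimension of the global range is zero.
// Assumption: the quoted condition only inspects the first dimension.
bool isZeroRange(const std::size_t *global_work_size, unsigned work_dim) {
  for (unsigned i = 0; i < work_dim; ++i)
    if (global_work_size[i] == 0)
      return true;
  return false;
}

// Only the launch is guarded; the event is still created and recorded,
// so callers always get an event back, even for a zero-range submission.
MockEvent enqueueKernel(MockQueue &queue, const std::size_t *global_work_size,
                        unsigned work_dim) {
  MockEvent event;
  if (!isZeroRange(global_work_size, work_dim))
    ++queue.launches;    // stands in for cuLaunchKernel(...)
  event.recorded = true; // stands in for recording the event on the stream
  return event;
}
```

With this shape, a zero-range submission leaves the launch count unchanged but still returns a recorded event, which is the property the early exit would have lost.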
Either that or you could copy the event creation to here, something like:
I'm not sure that would be a correct implementation. While a call with NDRange == 0 should do nothing in terms of running the kernel, the event produced should behave as if we had run a kernel with respect to event_wait_list, including when it appears in the event_wait_list of a subsequent enqueue. So, I think the more correct implementation would be to just call cuda_piEnqueueEventsWaitWithBarrier if NDRange == 0.
That is a good point. Now that there are multiple queues, recording an event is not enough to act like a barrier.
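A toy model may help show why an event recorded on one stream is weaker than a barrier once a queue owns several streams. Everything here is hypothetical (MockStream, MockQueue, recordOnStream, barrier, coversAllPriorWork); it does not reflect how the CUDA plugin or cuda_piEnqueueEventsWaitWithBarrier is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a stream is just a count of commands submitted so far.
struct MockStream {
  std::size_t submitted = 0;
};

// An event remembers, per stream, how many commands it waits for.
struct MockEvent {
  std::vector<std::size_t> waits_for;
};

struct MockQueue {
  std::vector<MockStream> streams;
  explicit MockQueue(std::size_t n) : streams(n) {}
};

// Recording on a single stream covers only that stream's earlier commands.
MockEvent recordOnStream(const MockQueue &q, std::size_t stream) {
  MockEvent e;
  e.waits_for.assign(q.streams.size(), 0);
  e.waits_for[stream] = q.streams[stream].submitted;
  return e;
}

// A barrier covers the earlier commands of every stream in the queue.
MockEvent barrier(const MockQueue &q) {
  MockEvent e;
  for (const MockStream &s : q.streams)
    e.waits_for.push_back(s.submitted);
  return e;
}

// True if the event covers all work submitted to the queue so far.
bool coversAllPriorWork(const MockEvent &e, const MockQueue &q) {
  for (std::size_t i = 0; i < q.streams.size(); ++i)
    if (e.waits_for[i] < q.streams[i].submitted)
      return false;
  return true;
}
```

In this model, with work pending on two streams, an event recorded on the first stream does not cover the second, while the barrier event covers both.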