-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Doc] Expand device_global related PI API #5906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Expand device_global related PI API #5906
Conversation
This commit changes the PI API introduced by the device_global design document, aligning it with the corresponding OpenCL API and the rest of PI API. Additionally, it changes the references to relevant CUDA API functions from referring to the CUDA runtime API to refer to the CUDA driver API, which is the API used by the CUDA backend. Signed-off-by: Steffen Larsen <[email protected]>
size_t Offset, const void *Src, | ||
pi_uint32 NumEventsInWaitList, | ||
const pi_event *EventsWaitList, | ||
pi_event *Event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to this, but I'm curious. Does the DPC++ runtime really need all this flexibility? For example, will it ever use this API with BlockingRead
set to false
? If not, why expose all this in the PI interface?
This is one of the criticisms I have about the PI interface in general. It just duplicates much of the OpenCL API even if the DPC++ runtime doesn't need all that flexibility. This makes the API more difficult to port to new backends because you need to implement stuff that is allowed in the PI API even though it's never used by the runtime.
In reality, backend ports often don't implement the PI interfaces fully because they know the runtime doesn't really needs all that functionality. This creates a sort of double standard. We have the "official" PI interface that no one really implements, and the "reality" version of the interface that isn't documented anywhere.
It seems like much of this would be improved if the PI interface exposed only the functionality that DPC++ really needs. So, does the DPC++ runtime really need the extra parameters you added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't disagree with most of your sentiments. However, the feature generally boils down to getting the pointer to the device variable and then a generic copy to or from it, so I think it makes sense to have it in line with the other memory operations, which have similar parameters to this.
Additionally, I expect to use all parameters except for blocking_read
and blocking_write
. Note also that the first parameter has been changed from pi_device
to pi_queue
as it is an enqueued operation and we can get the device from the queue but not vice versa.
@smaslov-intel, ping. |
Signed-off-by: Larsen, Steffen <[email protected]>
This commit changes the PI API introduced by the device_global design document, aligning it with the corresponding OpenCL API and the rest of PI API.
Additionally, it changes the references to relevant CUDA API functions from referring to the CUDA runtime API to refer to the CUDA driver API, which is the API used by the CUDA backend.