Skip to content

[SYCL][Doc] Expand device_global related PI API #5906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

steffenlarsen
Copy link
Contributor

This commit changes the PI API introduced by the device_global design document, aligning it with the corresponding OpenCL API and the rest of PI API.

Additionally, it changes the references to relevant CUDA API functions from referring to the CUDA runtime API to refer to the CUDA driver API, which is the API used by the CUDA backend.

This commit changes the PI API introduced by the device_global design
document, aligning it with the corresponding OpenCL API and the rest of
PI API.

Additionally, it changes the references to relevant CUDA API functions
from referring to the CUDA runtime API to refer to the CUDA driver API,
which is the API used by the CUDA backend.

Signed-off-by: Steffen Larsen <[email protected]>
@steffenlarsen steffenlarsen requested a review from a team as a code owner March 28, 2022 15:48
size_t Offset, const void *Src,
pi_uint32 NumEventsInWaitList,
const pi_event *EventsWaitList,
pi_event *Event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to this, but I'm curious. Does the DPC++ runtime really need all this flexibility? For example, will it ever use this API with BlockingRead set to false? If not, why expose all this in the PI interface?

This is one of the criticisms I have about the PI interface in general. It just duplicates much of the OpenCL API even if the DPC++ runtime doesn't need all that flexibility. This makes the API more difficult to port to new backends because you need to implement stuff that is allowed in the PI API even though it's never used by the runtime.

In reality, backend ports often don't implement the PI interfaces fully because they know the runtime doesn't really needs all that functionality. This creates a sort of double standard. We have the "official" PI interface that no one really implements, and the "reality" version of the interface that isn't documented anywhere.

It seems like much of this would be improved if the PI interface exposed only the functionality that DPC++ really needs. So, does the DPC++ runtime really need the extra parameters you added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't disagree with most of your sentiments. However, the feature generally boils down to getting the pointer to the device variable and then a generic copy to or from it, so I think it makes sense to have it in line with the other memory operations, which have similar parameters to this.

Additionally, I expect to use all parameters except for blocking_read and blocking_write. Note also that the first parameter has been changed from pi_device to pi_queue as it is an enqueued operation and we can get the device from the queue but not vice versa.

gmlueck
gmlueck previously approved these changes Mar 28, 2022
@bader bader changed the title [SYCL][doc] Expand device_global related PI API [SYCL][Doc] Expand device_global related PI API Mar 29, 2022
@bader
Copy link
Contributor

bader commented Apr 6, 2022

@smaslov-intel, ping.

@steffenlarsen steffenlarsen requested a review from gmlueck April 8, 2022 08:26
@steffenlarsen steffenlarsen merged commit b01d820 into intel:sycl Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants