-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL][Doc] Expand device_global related PI API #5906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -563,15 +563,37 @@ instance of a device global variable in a `pi_program`. This functionality is | |
exposed as two new PI interfaces: | ||
|
||
``` | ||
pi_result piextCopyToDeviceVariable(pi_device Device, pi_program Program, | ||
const char *name, const void *src, size_t count, size_t offset); | ||
|
||
pi_result piextCopyFromDeviceVariable(pi_device Device, pi_program Program, | ||
const char *name, void *dst, size_t count, size_t offset); | ||
pi_result piextEnqueueDeviceVariableRead(pi_queue Queue, pi_program Program, | ||
const char *Name, pi_bool BlockingRead, | ||
size_t Count, size_t Offset, void *Dst, | ||
pi_uint32 NumEventsInWaitList, | ||
const pi_event *EventsWaitList, | ||
pi_event *Event); | ||
|
||
pi_result piextEnqueueDeviceVariableWrite(pi_queue Queue, pi_program Program, | ||
const char *Name, | ||
pi_bool BlockingWrite, size_t Count, | ||
size_t Offset, const void *Src, | ||
pi_uint32 NumEventsInWaitList, | ||
const pi_event *EventsWaitList, | ||
pi_event *Event); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not opposed to this, but I'm curious. Does the DPC++ runtime really need all this flexibility? For example, will it ever use this API with This is one of the criticisms I have about the PI interface in general. It just duplicates much of the OpenCL API even if the DPC++ runtime doesn't need all that flexibility. This makes the API more difficult to port to new backends because you need to implement stuff that is allowed in the PI API even though it's never used by the runtime. In reality, backend ports often don't implement the PI interfaces fully because they know the runtime doesn't really needs all that functionality. This creates a sort of double standard. We have the "official" PI interface that no one really implements, and the "reality" version of the interface that isn't documented anywhere. It seems like much of this would be improved if the PI interface exposed only the functionality that DPC++ really needs. So, does the DPC++ runtime really need the extra parameters you added? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FWIW, I don't disagree with most of your sentiments. However, the feature generally boils down to getting the pointer to the device variable and then a generic copy to or from it, so I think it makes sense to have it in line with the other memory operations, which have similar parameters to this. Additionally, I expect to use all parameters except for |
||
``` | ||
|
||
In both cases the `name` parameter is the same as the `sycl-unique-id` string | ||
that is associated with the device global variable. | ||
The `piextEnqueueDeviceVariableRead` function reads `Count` bytes at byte-offset | ||
`Offset` from a device global variable in `Program` identified by the name | ||
`Name`. The read data is stored in `Dst`. Likewise, the | ||
`piextEnqueueDeviceVariableWrite` function reads `Count` bytes from `Dst` and | ||
stores them at byte-offset `Offset` in the device global variable in `Program` | ||
identified by the name `Name`. | ||
|
||
Both functions will enqueue the associated memory command on `Queue` where it | ||
will first wait for `NumEventsInWaitList` events in `EventsWaitList` to finish. | ||
`Event` will be populated with the event associated with resulting enqueued | ||
command. If either `BlockingRead` or `BlockingWrite` is `true` the call will | ||
block on the host until the enqueued command finishes execution. | ||
|
||
For `device_global` variables the `Name` parameter in calls to these functions | ||
is the same as the associated `sycl-unique-id` string. | ||
|
||
The Level Zero backend has existing APIs that can implement these PI | ||
interfaces. The plugin first calls [`zeModuleGetGlobalPointer()`][8] to get a | ||
|
@@ -616,8 +638,8 @@ depends upon implementation of that OpenCL extension. | |
|
||
[10]: <opencl-extensions/cl_intel_global_variable_access.asciidoc> | ||
|
||
The CUDA backend has existing APIs `cudaMemcpyToSymbol()` and | ||
`cudaMemcpyFromSymbol()` which can be used to implement these PI interfaces. | ||
The CUDA backend has existing APIs `cuModuleGetGlobal()` and `cuMemcpyAsync()` | ||
which can be used to implement these PI interfaces. | ||
|
||
|
||
## Design choices | ||
|
Uh oh!
There was an error while loading. Please reload this page.