-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Implement device_global host-side memory operations #8022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Implement device_global host-side memory operations #8022
Conversation
This commit implements the copy and memcpy operations to and from device_global. If the device_global does not have device_image_scope the memory operation will be on the underlying USM memory, while if the operation is on a device_global with device_image_scope the runtime will try to find a suitable program in the program cache or build a new program using the image using it. Signed-off-by: Steffen Larsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some early feedback after a brief read.
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
// CacheKey is captured by reference so when we overwrite it later we can | ||
// reuse this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to modify the comment to point where such an overwrite is happening? Or would it become stale too soon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry it will be outdated too easily if we are specific here, but there is a comment with
// Change device in the cache key to reduce copying of spec const data.
the place where the key is mutated.
DepEvents.push_back(*ZIEvent); | ||
|
||
MemoryManager::copy_usm(Src, Queue, NumBytes, | ||
reinterpret_cast<char *>(Dest) + Offset, DepEvents, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this accepts DepEvents
by ref, because otherwise the same lifetime/retain concerns...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not seem to, but since the events are just handles it should not be a problem. Since we are now keeping the event from being released for the duration of this call we do not have to worry about it dying while inside copy_usm
. If copy_usm
needs it to stay alive for longer than that it will have to maintain the lifetime itself.
That said, there could be an argument for changing the DepEvents
arguments of these to references. I have done this for the new interfaces, but the rest should be done separately. Note that changing them may require an ABI break.
This commit introduces a new RAII struct for guarding the lifetime of the zero-initialization event. This is used to prevent from being released while it is in use. Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
This commit adds behavior for reading and writing to memory related to device_globals in the corresponding unittests. This allows the tests to check that the behavior of the memory operations are as expected. Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
#8104 is the subset of this patch that refactors build and kernel cache to allow the memory operations to look up existing programs in the cache for device_globals with device_image_scope. @romanovvlad | @sergey-semenov - Could you please have a look at that patch? |
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
@@ -47,13 +47,12 @@ DeviceGlobalUSMMem::getZeroInitEvent(const plugin &Plugin) { | |||
*MZeroInitEvent, Plugin) == info::event_command_status::complete) { | |||
Plugin.call<PiApiKind::piEventRelease>(*MZeroInitEvent); | |||
MZeroInitEvent = {}; | |||
return std::nullopt; | |||
return OwnedPiEvent(Plugin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Another option is to change reference to a pointer and default to nullptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference seems to be in line with how it's normally passed around, so since we have the plugin available here we may as well keep it to the norm. It also means we could have a variant that gets an event after the fact, though that's just a sidenote.
Signed-off-by: Larsen, Steffen <[email protected]>
MZeroInitEvent = {}; | ||
return OwnedPiEvent(Plugin); | ||
} else { | ||
return std::move(OwnedPiEvent(*MZeroInitEvent, Plugin)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code doesn't compile with clang.
https://github.com/intel/llvm/actions/runs/4244006981/jobs/7377519363
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit implements the copy and memcpy operations to and from device_global. If the device_global does not have device_image_scope the memory operation will be on the underlying USM memory, while if the operation is on a device_global with device_image_scope the runtime will try to find a suitable program in the program cache or build a new program using the image using it.