The CUDA and HIP adapters both use a nearly identical, complicated
queue that handles creating an out-of-order UR queue from in-order
CUDA/HIP streams.
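The core idea of building an out-of-order queue on top of in-order streams can be sketched as a pool of streams that commands are spread over, so independent commands may run concurrently while each stream stays ordered. This is a minimal sketch assuming plain round-robin selection; `InOrderStream` and `RoundRobinQueue` are hypothetical stand-ins, not the adapters' actual types, and no CUDA/HIP API is called.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a native in-order stream (e.g. CUstream/hipStream_t).
struct InOrderStream {
  int id;
  std::vector<int> work;  // commands in submission order; the stream is in-order
};

// Sketch: an out-of-order queue built from a fixed pool of in-order streams.
// Commands are spread round-robin over the pool; commands that land on the
// same stream still execute in submission order.
class RoundRobinQueue {
public:
  explicit RoundRobinQueue(std::size_t numStreams) {
    assert(numStreams > 0 && "need at least one stream");
    for (std::size_t i = 0; i < numStreams; ++i)
      streams_.push_back(InOrderStream{static_cast<int>(i), {}});
  }

  // Pick the next stream in round-robin order.
  InOrderStream &getNextStream() {
    InOrderStream &s = streams_[next_ % streams_.size()];
    ++next_;
    return s;
  }

  // Enqueue a command on the next stream and report which stream got it.
  int enqueue(int command) {
    InOrderStream &s = getNextStream();
    s.work.push_back(command);
    return s.id;
  }

  const std::vector<InOrderStream> &streams() const { return streams_; }

private:
  std::vector<InOrderStream> streams_;
  std::size_t next_ = 0;
};
```

With three streams, the fourth command wraps back to stream 0, behind the first one.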
This patch extracts all of the queue logic into a separate templated
class that can be used by both adapters. Beyond removing a lot of
duplicated code, this also makes the queue much easier to maintain.
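Sharing one templated class between the adapters roughly means writing the queue logic once and parameterizing it on whatever differs per backend. The sketch below assumes a traits-style template parameter; `GenericStreamQueue`, `FakeCudaTraits` and `FakeHipTraits` are hypothetical names, and the fake streams only count pending work instead of calling the real CUDA/HIP runtimes. The actual `stream_queue.hpp` may be parameterized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical backend traits: each adapter supplies its native stream type
// and the few calls the shared queue needs. Real adapters would wrap
// cudaStream_t/hipStream_t; these are plain structs so the sketch runs.
struct FakeCudaTraits {
  struct Stream { int pending = 0; };
  static const char *name() { return "cuda"; }
  static void synchronize(Stream &s) { s.pending = 0; }
};

struct FakeHipTraits {
  struct Stream { int pending = 0; };
  static const char *name() { return "hip"; }
  static void synchronize(Stream &s) { s.pending = 0; }
};

// Shared queue logic, written once and instantiated per backend.
template <typename Traits>
class GenericStreamQueue {
public:
  using Stream = typename Traits::Stream;

  explicit GenericStreamQueue(std::size_t n) : streams_(n) {
    assert(n > 0 && "need at least one stream");
  }

  // Round-robin stream selection, identical for every backend.
  Stream &getNextStream() { return streams_[next_++ % streams_.size()]; }

  void enqueue() { getNextStream().pending++; }

  // Wait for all streams through the backend-specific call.
  void finish() {
    for (Stream &s : streams_) Traits::synchronize(s);
  }

  int pendingWork() const {
    int total = 0;
    for (const Stream &s : streams_) total += s.pending;
    return total;
  }

  std::string backend() const { return Traits::name(); }

private:
  std::vector<Stream> streams_;
  std::size_t next_ = 0;
};

// Each adapter then only needs an alias over the shared template.
using CudaQueue = GenericStreamQueue<FakeCudaTraits>;
using HipQueue = GenericStreamQueue<FakeHipTraits>;
```

A fix made in `GenericStreamQueue` applies to both aliases at once, which is the maintenance benefit described above.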
There were a few functional differences between the queues in the two
adapters, mostly due to fixes made in the CUDA adapter that were never
ported to the HIP adapter. There may be more, but I found at least one
race condition (intel/llvm#15100) and one
performance issue (intel/llvm#6333) that weren't
fixed in the HIP adapter.
This patch uses the CUDA version of the queue as the base for the generic
queue, and therefore also fixes the race condition and performance
issue mentioned above in the HIP adapter.
This code is quite complex, so this patch also aims to minimize any
changes beyond the structural ones needed to share the code.
However, it does make the following changes in the two adapters:
`stream_queue.hpp`:
* Remove the `urDeviceRetain`/`urDeviceRelease` calls, which were essentially no-ops
CUDA:
* Rename `ur_stream_guard_` to `ur_stream_guard`
* Rename `getNextEventID` to `getNextEventId`
* Remove duplicate `get_device` getter, use `getDevice` instead
HIP:
* Fix queue finish so it doesn't fail when no streams need to be
synchronized
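The HIP queue-finish fix can be illustrated with a finish routine that only waits on streams with pending work and reports success when there is nothing to wait for. This is a hedged sketch: `Result`, `FakeStream` and `queueFinish` are hypothetical stand-ins for the UR result codes, native streams and the adapter's finish entry point, and it does not claim to reproduce the original HIP bug exactly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result codes standing in for ur_result_t values.
enum class Result { Success, ErrorSyncFailed };

// Hypothetical stream state: whether work is outstanding and whether
// synchronizing it should (for testing) fail.
struct FakeStream {
  bool hasPendingWork = false;
  bool failOnSync = false;
};

// Sketch of a finish() that synchronizes only the streams with pending work.
// An idle queue (no streams to synchronize) falls through the loop and
// returns Success rather than an error.
Result queueFinish(std::vector<FakeStream> &streams) {
  for (FakeStream &s : streams) {
    if (!s.hasPendingWork)
      continue;  // nothing to wait for on this stream
    if (s.failOnSync)
      return Result::ErrorSyncFailed;  // propagate a real sync failure
    s.hasPendingWork = false;          // pretend we waited for the stream
  }
  return Result::Success;
}
```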