Skip to content

[SYCL] Keep track of signaled fences to properly recycle associated command-lists. #3215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 18, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 40 additions & 24 deletions sycl/plugins/level_zero/pi_level_zero.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -521,17 +521,19 @@ pi_result _pi_context::finalize() {
return PI_SUCCESS;
}

pi_result
_pi_queue::resetCommandListFenceEntry(ze_command_list_handle_t ZeCommandList,
bool MakeAvailable) {
// Event has been signalled: If the fence for the associated command list
// is signalled, then reset the fence and command list and add them to the
// available list for ruse in PI calls.
ZE_CALL(zeFenceReset(this->ZeCommandListFenceMap[ZeCommandList]));
ZE_CALL(zeCommandListReset(ZeCommandList));
pi_result _pi_queue::resetCommandListFenceEntry(
_pi_queue::command_list_fence_map_t::value_type &MapEntry,
bool MakeAvailable) {
// Fence had been signalled meaning the associated command-list completed.
// Reset the fence and put the command list into a cache for reuse in PI
// calls.
ZE_CALL(zeFenceReset(MapEntry.second.ZeFence));
ZE_CALL(zeCommandListReset(MapEntry.first));
Comment on lines +530 to +531
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, here we call zeFenceReset first and then zeCommandListReset.
But, when we are about to release pi_queue, the command list was successfully reset, but fence is still in use (not reset)?
Did I misunderstand this situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the fence and the command-list it is fencing are not reset. These 2 always come together. The purpose of the fence is to tell if command-list is still in use.

MapEntry.second.InUse = false;

if (MakeAvailable) {
std::lock_guard<std::mutex> lock(this->Context->ZeCommandListCacheMutex);
this->Context->ZeCommandListCache.push_back(ZeCommandList);
this->Context->ZeCommandListCache.push_back(MapEntry.first);
}

return PI_SUCCESS;
Expand Down Expand Up @@ -597,14 +599,15 @@ pi_result _pi_context::getAvailableCommandList(
*ZeCommandList = Queue->Context->ZeCommandListCache.front();
auto it = Queue->ZeCommandListFenceMap.find(*ZeCommandList);
if (it != Queue->ZeCommandListFenceMap.end()) {
*ZeFence = it->second;
*ZeFence = it->second.ZeFence;
it->second.InUse = true;
} else {
// If there is a command list available on this device, but no
// fence yet associated, then we must create a fence/list
// reference for this Queue. This can happen if two Queues reuse
// a device which did not have the resources freed.
ZE_CALL(zeFenceCreate(Queue->ZeCommandQueue, &ZeFenceDesc, ZeFence));
Queue->ZeCommandListFenceMap[*ZeCommandList] = *ZeFence;
Queue->ZeCommandListFenceMap[*ZeCommandList] = {*ZeFence, true};
}
Queue->Context->ZeCommandListCache.pop_front();
return PI_SUCCESS;
Expand All @@ -617,12 +620,14 @@ pi_result _pi_context::getAvailableCommandList(
// if a command list has completed dispatch of its commands and is ready for
// reuse. If a command list is found to have been signalled, then the
// command list & fence are reset and we return.
for (const auto &MapEntry : Queue->ZeCommandListFenceMap) {
ze_result_t ZeResult = ZE_CALL_NOCHECK(zeFenceQueryStatus(MapEntry.second));
for (auto &MapEntry : Queue->ZeCommandListFenceMap) {
ze_result_t ZeResult =
ZE_CALL_NOCHECK(zeFenceQueryStatus(MapEntry.second.ZeFence));
if (ZeResult == ZE_RESULT_SUCCESS) {
Queue->resetCommandListFenceEntry(MapEntry.first, false);
Queue->resetCommandListFenceEntry(MapEntry, false);
*ZeCommandList = MapEntry.first;
*ZeFence = MapEntry.second;
*ZeFence = MapEntry.second.ZeFence;
MapEntry.second.InUse = true;
return PI_SUCCESS;
}
}
Expand All @@ -642,8 +647,8 @@ pi_result _pi_context::getAvailableCommandList(
Queue->Device->Platform->ZeGlobalCommandListCount++;
ZE_CALL(zeFenceCreate(Queue->ZeCommandQueue, &ZeFenceDesc, ZeFence));
Queue->ZeCommandListFenceMap.insert(
std::pair<ze_command_list_handle_t, ze_fence_handle_t>(*ZeCommandList,
*ZeFence));
std::pair<ze_command_list_handle_t, _pi_queue::command_list_fence_t>(
*ZeCommandList, {*ZeFence, false}));
pi_result = PI_SUCCESS;
}

Expand Down Expand Up @@ -2161,14 +2166,21 @@ pi_result piQueueRelease(pi_queue Queue) {

if (RefCountZero) {
// It is possible to get to here and still have an open command list
// if no wait or finish ever occurred for this queue. But still need
// // TODO: o make sure commands get executed.
// if no wait or finish ever occurred for this queue.
if (auto Res = Queue->executeOpenCommandList())
return Res;

// Make sure all commands get executed.
zeCommandQueueSynchronize(Queue->ZeCommandQueue, UINT64_MAX);

// Destroy all the fences created associated with this queue.
for (const auto &MapEntry : Queue->ZeCommandListFenceMap) {
ZE_CALL(zeFenceDestroy(MapEntry.second));
for (auto &MapEntry : Queue->ZeCommandListFenceMap) {
// This fence wasn't yet signalled when we polled it for recycling
// the command-list, so need to release the command-list too.
if (MapEntry.second.InUse) {
Queue->resetCommandListFenceEntry(MapEntry, true);
}
ZE_CALL(zeFenceDestroy(MapEntry.second.ZeFence));
}
Queue->ZeCommandListFenceMap.clear();
ZE_CALL(zeCommandQueueDestroy(Queue->ZeCommandQueue));
Expand Down Expand Up @@ -3793,10 +3805,14 @@ static pi_result cleanupAfterEvent(pi_event Event) {
// is signalled, then reset the fence and command list and add them to the
// available list for reuse in PI calls.
if (Queue->RefCount > 0) {
ze_result_t ZeResult = ZE_CALL_NOCHECK(
zeFenceQueryStatus(Queue->ZeCommandListFenceMap[EventCommandList]));
auto it = Queue->ZeCommandListFenceMap.find(EventCommandList);
if (it == Queue->ZeCommandListFenceMap.end()) {
die("Missing command-list completition fence");
}
ze_result_t ZeResult =
ZE_CALL_NOCHECK(zeFenceQueryStatus(it->second.ZeFence));
if (ZeResult == ZE_RESULT_SUCCESS) {
Queue->resetCommandListFenceEntry(EventCommandList, true);
Queue->resetCommandListFenceEntry(*it, true);
Event->ZeCommandList = nullptr;
}
}
Expand Down
21 changes: 18 additions & 3 deletions sycl/plugins/level_zero/pi_level_zero.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -335,9 +335,24 @@ struct _pi_queue : _pi_object {
pi_uint32 NumTimesClosedEarly = {0};
pi_uint32 NumTimesClosedFull = {0};

// Structure describing the fence used to track command-list completion.
typedef struct {
// The Level-Zero fence that will be signalled at completion.
ze_fence_handle_t ZeFence;
// Record if the fence is in use by any command-list.
// This is needed to avoid leak of the tracked command-list if the fence
// was not yet signaled at the time all events in that list were already
// completed (we are polling the fence at events completion). The fence
// may be still "in-use" due to sporadic delay in HW.
//
bool InUse;
} command_list_fence_t;

// Map of all Command lists created with their associated Fence used for
// tracking when the command list is available for use again.
std::map<ze_command_list_handle_t, ze_fence_handle_t> ZeCommandListFenceMap;
typedef std::map<ze_command_list_handle_t, command_list_fence_t>
command_list_fence_map_t;
command_list_fence_map_t ZeCommandListFenceMap;

// Returns true if any commands for this queue are allowed to
// be batched together.
Expand All @@ -356,8 +371,8 @@ struct _pi_queue : _pi_object {
// If the reset command list should be made available, then MakeAvailable
// needs to be set to true. The caller must verify that this command list and
// fence have been signalled.
pi_result resetCommandListFenceEntry(ze_command_list_handle_t ZeCommandList,
bool MakeAvailable);
pi_result resetCommandListFenceEntry(
command_list_fence_map_t::value_type &ZeCommandList, bool MakeAvailable);

// Attach a command list to this queue, close, and execute it.
// Note that this command list cannot be appended to after this.
Expand Down