Skip to content

Commit ce69d69

Browse files
[SYCL][L0] Heuristically reduce overhead from immediate command-list cleanup (#9052)
Signed-off-by: Sergey V Maslov <[email protected]>
1 parent 0794dff commit ce69d69

File tree

2 files changed

+34
-11
lines changed

2 files changed

+34
-11
lines changed

sycl/doc/EnvironmentVariables.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -258,7 +258,7 @@ variables in production code.</span>
258258
| `SYCL_PI_LEVEL_ZERO_REUSE_DISCARDED_EVENTS` | Integer | When set to a positive value enables the mode when discarded Level Zero events are reset and reused in scope of the same in-order queue based on the dependency chain between commands. Default is 1. |
259259
| `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. |
260260
| `SYCL_PI_LEVEL_ZERO_COMMANDLISTS_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of command lists in a queue exceeds this threshold, an attempt is made to cleanup completed command lists for their subsequent reuse. The default is 20. |
261-
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 20. |
261+
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 1000. |
262262
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Controls if/where to make USM allocations resident at the time of allocation. If set to 0 (default) then no special residency is forced. If set to 1 then allocation (device or shared) is made resident at the device of allocation. If set to 2 then allocation (device or shared) is made resident on all devices in the context of allocation that have P2P access to the device of allocation. For host allocation, any non-0 setting forces the allocation resident on all devices in the context. |
263263
| `SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D` | Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |
264264

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 33 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -475,7 +475,7 @@ pi_result _pi_queue::addEventToQueueCache(pi_event Event) {
475475
static const size_t ImmCmdListsEventCleanupThreshold = [] {
476476
const char *ImmCmdListsEventCleanupThresholdStr = std::getenv(
477477
"SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD");
478-
static constexpr int Default = 20;
478+
static constexpr int Default = 1000;
479479
if (!ImmCmdListsEventCleanupThresholdStr)
480480
return Default;
481481

@@ -739,26 +739,49 @@ pi_result _pi_queue::resetCommandList(pi_command_list_ptr_t CommandList,
739739
std::back_inserter(EventListToCleanup));
740740
EventList.clear();
741741
} else if (!isDiscardEvents()) {
742-
// For immediate commandlist reset only those events that have signalled.
743742
// If events in the queue are discarded then we can't check their status.
744-
for (auto it = EventList.begin(); it != EventList.end();) {
745-
std::scoped_lock<ur_shared_mutex> EventLock((*it)->Mutex);
743+
// Helper for checking of event completion
744+
auto EventCompleted = [](pi_event Event) -> bool {
745+
std::scoped_lock<ur_shared_mutex> EventLock(Event->Mutex);
746746
ze_result_t ZeResult =
747-
(*it)->Completed
747+
Event->Completed
748748
? ZE_RESULT_SUCCESS
749-
: ZE_CALL_NOCHECK(zeEventQueryStatus, ((*it)->ZeEvent));
749+
: ZE_CALL_NOCHECK(zeEventQueryStatus, (Event->ZeEvent));
750+
return ZeResult == ZE_RESULT_SUCCESS;
751+
};
752+
// Handle in-order specially as we can just in few checks (with binary
753+
// search) a completed event and then all events before it are also
754+
// done.
755+
if (isInOrderQueue()) {
756+
size_t Bisect = EventList.size();
757+
size_t Iter = 0;
758+
for (auto it = EventList.rbegin(); it != EventList.rend(); ++Iter) {
759+
if (!EventCompleted(*it)) {
760+
if (Bisect > 1 && Iter < 3) { // Heuristically limit by 3 checks
761+
Bisect >>= 1;
762+
it += Bisect;
763+
continue;
764+
}
765+
break;
766+
}
767+
// Bulk move of event up to "it" to the list ready for cleanup
768+
std::move(it, EventList.rend(), std::back_inserter(EventListToCleanup));
769+
EventList.erase(EventList.begin(), it.base());
770+
break;
771+
}
772+
return PI_SUCCESS;
773+
}
774+
// For immediate commandlist reset only those events that have signalled.
775+
for (auto it = EventList.begin(); it != EventList.end();) {
750776
// Break early as soon as we found first incomplete event because next
751777
// events are submitted even later. We are not trying to find all
752778
// completed events here because it may be costly. I.e. we are checking
753779
// only elements which are most likely completed because they were
754780
// submitted earlier. It is guaranteed that all events will be eventually
755781
// cleaned up at queue sync/release.
756-
if (ZeResult == ZE_RESULT_NOT_READY)
782+
if (!EventCompleted(*it))
757783
break;
758784

759-
if (ZeResult != ZE_RESULT_SUCCESS)
760-
return mapError(ZeResult);
761-
762785
EventListToCleanup.push_back(std::move((*it)));
763786
it = EventList.erase(it);
764787
}

0 commit comments

Comments
 (0)