Skip to content

Commit ca54ea3

Browse files
authored
[SYCL] Start events cleanup for immediate command lists based on threshold (#7773)
For regular command lists we check the status of the command list fence quite often - every time when command list is requested and there is no available command list. When fence is completed we cleanup events from the command list. Currently for immediate command lists events get cleaned up only at points of syncrhonization which may be too rare for some applications. So, make the cleanup process to be started if number of events in the immediate command list exceeds threshold. Also introduce a switch which allows to control/tune. This results in better recycling of events for immediate command lists and improves performance.
1 parent aca04e7 commit ca54ea3

File tree

2 files changed

+26
-0
lines changed

2 files changed

+26
-0
lines changed

sycl/doc/EnvironmentVariables.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -250,6 +250,7 @@ variables in production code.</span>
250250
| `SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION` | Integer | When set to "0" tells to use single root-device allocation for all devices in a context where all devices have same root. Otherwise performs regular buffer migration. Default is 1. |
251251
| `SYCL_PI_LEVEL_ZERO_REUSE_DISCARDED_EVENTS` | Integer | When set to a positive value enables the mode when discarded Level Zero events are reset and reused in scope of the same in-order queue based on the dependency chain between commands. Default is 1. |
252252
| `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. |
253+
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 20. |
253254

254255
## Debugging variables for CUDA Plugin
255256

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -845,6 +845,25 @@ static const int ImmediateCommandlistsSetting = [] {
845845
return std::stoi(ImmediateCommandlistsSettingStr);
846846
}();
847847

848+
// Get value of the threshold for number of events in immediate command lists.
849+
// If number of events in the immediate command list exceeds this threshold then
850+
// cleanup process for those events is executed.
851+
static const size_t ImmCmdListsEventCleanupThreshold = [] {
852+
const char *ImmCmdListsEventCleanupThresholdStr = std::getenv(
853+
"SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD");
854+
static constexpr int Default = 20;
855+
if (!ImmCmdListsEventCleanupThresholdStr)
856+
return Default;
857+
858+
int Threshold = std::atoi(ImmCmdListsEventCleanupThresholdStr);
859+
860+
// Basically disable threshold if negative value is provided.
861+
if (Threshold < 0)
862+
return INT_MAX;
863+
864+
return Threshold;
865+
}();
866+
848867
// Whether immediate commandlists will be used for kernel launches and copies.
849868
// The default is standard commandlists. Setting a value >=1 specifies use of
850869
// immediate commandlists. Note: when immediate commandlists are used then
@@ -1439,6 +1458,12 @@ pi_result _pi_context::getAvailableCommandList(
14391458
// Immediate commandlists have been pre-allocated and are always available.
14401459
if (Queue->Device->useImmediateCommandLists()) {
14411460
CommandList = Queue->getQueueGroup(UseCopyEngine).getImmCmdList();
1461+
if (CommandList->second.EventList.size() >
1462+
ImmCmdListsEventCleanupThreshold) {
1463+
std::vector<pi_event> EventListToCleanup;
1464+
Queue->resetCommandList(CommandList, false, EventListToCleanup);
1465+
CleanupEventListFromResetCmdList(EventListToCleanup, true);
1466+
}
14421467
PI_CALL(Queue->insertStartBarrierIfDiscardEventsMode(CommandList));
14431468
if (auto Res = Queue->insertActiveBarriers(CommandList, UseCopyEngine))
14441469
return Res;

0 commit comments

Comments
 (0)