-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL] Start events cleanup for immediate command lists based on threshold #7773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…shold For regular command lists we check the status of the command list fence quite often - every time when command list is requested and there is no available command list. When fence is completed we cleanup events from the command list. Currently for immediate command lists events get cleaned up only at points of syncrhonization which may be too rare for some applications. So, make the cleanup process to be started if number of events in the immediate command list exceeds threshold. Also introduce a switch which allows to control/tune. This results in better recycling of events for immediate command lists and improves performance.
static const size_t ImmCmdListsEventCleanupThreshold = [] { | ||
const char *ImmCmdListsEventCleanupThresholdStr = std::getenv( | ||
"SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD"); | ||
static constexpr int Default = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What made you choose 100? Can it be lower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be lower, what number do you think would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with Rajiv offline, agreed on 20 for now. It can be changed after benchmarking.
sycl/doc/EnvironmentVariables.md
Outdated
@@ -250,6 +250,7 @@ variables in production code.</span> | |||
| `SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION` | Integer | When set to "0" tells to use single root-device allocation for all devices in a context where all devices have same root. Otherwise performs regular buffer migration. Default is 1. | | |||
| `SYCL_PI_LEVEL_ZERO_REUSE_DISCARDED_EVENTS` | Integer | When set to a positive value enables the mode when discarded Level Zero events are reset and reused in scope of the same in-order queue based on the dependency chain between commands. Default is 1. | | |||
| `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. | | |||
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If a non-negative value then the threshold becomes equal to this value. If a negative value then the threshold is INT_MAX. If number of events in the immediate command list exceeds this threshold then cleanup process for those events is executed. It allows to control recycling of events. When this threshold is low then status of the events is checked more often which may allow to recycle events faster, on the other hand makind it too low may result in unnecessary status checks. Default is 100. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If a non-negative value then the threshold becomes equal to this value. If a negative value then the threshold is INT_MAX. If number of events in the immediate command list exceeds this threshold then cleanup process for those events is executed. It allows to control recycling of events. When this threshold is low then status of the events is checked more often which may allow to recycle events faster, on the other hand makind it too low may result in unnecessary status checks. Default is 100. | | |
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If a non-negative value then the threshold becomes equal to this value. If a negative value then the threshold is INT_MAX. If number of events in the immediate command list exceeds this threshold then cleanup process for those events is executed. It allows to control recycling of events. When this threshold is low then status of the events is checked more often which may allow to recycle events faster, on the other hand makind it too low may result in unnecessary status checks. Default is 20. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 20.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thank you!
if (!ImmCmdListsEventCleanupThresholdStr) | ||
return Default; | ||
|
||
int Threshold = std::stoi(ImmCmdListsEventCleanupThresholdStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atoi doesn't throw
int Threshold = std::stoi(ImmCmdListsEventCleanupThresholdStr); | |
int Threshold = std::atoi(ImmCmdListsEventCleanupThresholdStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change, for some reasons we have mix of both in the plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel free to fix other uses of "stoi", change to "atoi" or try-catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to atoi.
if (CommandList->second.EventList.size() > | ||
ImmCmdListsEventCleanupThreshold) { | ||
std::vector<pi_event> EventListToCleanup; | ||
Queue->resetCommandList(CommandList, false, EventListToCleanup); | ||
CleanupEventListFromResetCmdList(EventListToCleanup, true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you find it desired to run more cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because currently for immediate command lists we cleanup only at point of synchronization. If application doesn't have any points of synchronization then we keep creating new events for immediate lists. This change results in better recycling of events and I see significant performance improvement for one of the sycl benchmarks (didn't check other)
Also I provided some details in the description of PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs look good.
For regular command lists we check the status of the command list fence quite often - every time when command list is requested and there is no available command list. When fence is completed we cleanup events from the command list.
Currently for immediate command lists events get cleaned up only at points of syncrhonization which may be too rare for some applications. So, make the cleanup process to be started if number of events in the immediate command list exceeds threshold. Also introduce a switch which allows to control/tune.
This results in better recycling of events for immediate command lists and improves performance.