Skip to content

[SYCL] Start events cleanup for immediate command lists based on threshold #7773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 14, 2022

Conversation

againull
Copy link
Contributor

For regular command lists we check the status of the command list fence quite often - every time when command list is requested and there is no available command list. When fence is completed we cleanup events from the command list.

Currently for immediate command lists events get cleaned up only at points of syncrhonization which may be too rare for some applications. So, make the cleanup process to be started if number of events in the immediate command list exceeds threshold. Also introduce a switch which allows to control/tune.

This results in better recycling of events for immediate command lists and improves performance.

…shold

For regular command lists we check the status of the command list fence
quite often - every time when command list is requested and there is no
available command list. When fence is completed we cleanup events from
the command list.

Currently for immediate command lists events get cleaned up only at
points of syncrhonization which may be too rare for some applications.
So, make the cleanup process to be started if number of events in the
immediate command list exceeds threshold. Also introduce a switch which
allows to control/tune.

This results in better recycling of events for immediate command lists
and improves performance.
@againull againull requested review from a team as code owners December 14, 2022 00:03
static const size_t ImmCmdListsEventCleanupThreshold = [] {
const char *ImmCmdListsEventCleanupThresholdStr = std::getenv(
"SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD");
static constexpr int Default = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What made you choose 100? Can it be lower?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be lower, what number do you think would be better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Rajiv offline, agreed on 20 for now. It can be changed after benchmarking.

@@ -250,6 +250,7 @@ variables in production code.</span>
| `SYCL_PI_LEVEL_ZERO_SINGLE_ROOT_DEVICE_BUFFER_MIGRATION` | Integer | When set to "0" tells to use single root-device allocation for all devices in a context where all devices have same root. Otherwise performs regular buffer migration. Default is 1. |
| `SYCL_PI_LEVEL_ZERO_REUSE_DISCARDED_EVENTS` | Integer | When set to a positive value enables the mode when discarded Level Zero events are reset and reused in scope of the same in-order queue based on the dependency chain between commands. Default is 1. |
| `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. |
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If a non-negative value then the threshold becomes equal to this value. If a negative value then the threshold is INT_MAX. If number of events in the immediate command list exceeds this threshold then cleanup process for those events is executed. It allows to control recycling of events. When this threshold is low then status of the events is checked more often which may allow to recycle events faster, on the other hand makind it too low may result in unnecessary status checks. Default is 100. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If a non-negative value then the threshold becomes equal to this value. If a negative value then the threshold is INT_MAX. If number of events in the immediate command list exceeds this threshold then cleanup process for those events is executed. It allows to control recycling of events. When this threshold is low then status of the events is checked more often which may allow to recycle events faster, on the other hand makind it too low may result in unnecessary status checks. Default is 100. |
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If a non-negative value then the threshold becomes equal to this value. If a negative value then the threshold is INT_MAX. If number of events in the immediate command list exceeds this threshold then cleanup process for those events is executed. It allows to control recycling of events. When this threshold is low then status of the events is checked more often which may allow to recycle events faster, on the other hand makind it too low may result in unnecessary status checks. Default is 20. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry, fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 20.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thank you!

if (!ImmCmdListsEventCleanupThresholdStr)
return Default;

int Threshold = std::stoi(ImmCmdListsEventCleanupThresholdStr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atoi doesn't throw

Suggested change
int Threshold = std::stoi(ImmCmdListsEventCleanupThresholdStr);
int Threshold = std::atoi(ImmCmdListsEventCleanupThresholdStr);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change, for some reasons we have mix of both in the plugin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to fix other uses of "stoi", change to "atoi" or try-catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to atoi.

Comment on lines +1461 to +1466
if (CommandList->second.EventList.size() >
ImmCmdListsEventCleanupThreshold) {
std::vector<pi_event> EventListToCleanup;
Queue->resetCommandList(CommandList, false, EventListToCleanup);
CleanupEventListFromResetCmdList(EventListToCleanup, true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you find it desired to run more cleanup?

Copy link
Contributor Author

@againull againull Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because currently for immediate command lists we cleanup only at point of synchronization. If application doesn't have any points of synchronization then we keep creating new events for immediate lists. This change results in better recycling of events and I see significant performance improvement for one of the sycl benchmarks (didn't check other)

Also I provided some details in the description of PR.

Copy link
Contributor

@rdeodhar rdeodhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs look good.

@steffenlarsen steffenlarsen merged commit ca54ea3 into intel:sycl Dec 14, 2022
@againull againull deleted the cleanup_on_threshold branch December 14, 2022 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants