[SYCL][L0][PI] Modify SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE environment variable to allow user a finer control over copy engines #4333

Merged
merged 9 commits into from
Aug 18, 2021
2 changes: 1 addition & 1 deletion sycl/doc/EnvironmentVariables.md
@@ -31,7 +31,7 @@ subject to change. Do not rely on these variables in production code.
| `SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR` | MaxPoolableSize,Capacity,MaxPoolSize | Values specified as positive integers. Defaults are 1, 4, 256. MaxPoolableSize is the maximum allocation size in MB that may be pooled. Capacity is the number of allocations in each size range that are freed by the program but retained in the pool for reallocation. Size ranges follow this pattern: 32, 48, 64, 96, 128, 192, and so on, i.e., powers of 2, with one range in between. MaxPoolSize is the maximum size of the pool in MB. |
| `SYCL_PI_LEVEL_ZERO_BATCH_SIZE` | Integer | Sets a preferred number of commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST` | Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | Controls the use of copy engines for copy operations in the Level Zero plugin. If the value is a single integer, 0 disables copy engines and any nonzero value allows all copy engines available in the device to be used to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value can also be a pair of the form "lower_index:upper_index", where the indices refer to positions in the list of all available copy engines and only the engines in that range are used. The default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
| `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero plugin. |
| `SYCL_PARALLEL_FOR_RANGE_ROUNDING_TRACE` | Any(\*) | Enables tracing of `parallel_for` invocations with rounded-up ranges. |
99 changes: 77 additions & 22 deletions sycl/plugins/level_zero/pi_level_zero.cpp
@@ -60,12 +60,6 @@ static const bool UseCopyEngineForD2DCopy = [] {
return (CopyEngineForD2DCopy && (std::stoi(CopyEngineForD2DCopy) != 0));
}();

static const bool CopyEngineRequested = [] {
const char *CopyEngine = std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE");
bool UseCopyEngine = (!CopyEngine || (std::stoi(CopyEngine) != 0));
return UseCopyEngine;
}();

// This class encapsulates actions taken along with a call to Level Zero API.
class ZeCall {
private:
@@ -291,6 +285,48 @@ class ReturnHelper {

} // anonymous namespace

// SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE can be set to an integer value, or
// a pair of integer values of the form "lower_index:upper_index".
// Here, the indices point to copy engines in a list of all available copy
// engines.
// This function returns this pair of indices.
// If the user specifies only a single integer, a value of 0 indicates that
// the copy engines will not be used at all. A value of 1 indicates that all
// available copy engines can be used.
static const std::pair<int, int> getRangeOfAllowedCopyEngines = [] {
const char *EnvVar = std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE");
// If the environment variable is not set, all available copy engines can be
// used.
if (!EnvVar)
return std::pair<int, int>(0, INT_MAX);
std::string CopyEngineRange = EnvVar;
// Environment variable can be a single integer or a pair of integers
// separated by ":"
auto pos = CopyEngineRange.find(":");
if (pos == std::string::npos) {
bool UseCopyEngine = (std::stoi(CopyEngineRange) != 0);
if (UseCopyEngine)
return std::pair<int, int>(0, INT_MAX); // All copy engines can be used.
return std::pair<int, int>(-1, -1); // No copy engines will be used.
}
int LowerCopyEngineIndex = std::stoi(CopyEngineRange.substr(0, pos));
int UpperCopyEngineIndex = std::stoi(CopyEngineRange.substr(pos + 1));
if ((LowerCopyEngineIndex > UpperCopyEngineIndex) ||
(LowerCopyEngineIndex < -1) || (UpperCopyEngineIndex < -1)) {
zePrint("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE: invalid value provided, "
"default set.\n");
LowerCopyEngineIndex = 0;
UpperCopyEngineIndex = INT_MAX;
}
return std::pair<int, int>(LowerCopyEngineIndex, UpperCopyEngineIndex);
}();

static const bool CopyEngineRequested = [] {
int LowerCopyQueueIndex = getRangeOfAllowedCopyEngines.first;
int UpperCopyQueueIndex = getRangeOfAllowedCopyEngines.second;
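// Copy engines are in use only if neither index is -1, matching the check
// in getZeCopyCommandQueue.
return ((LowerCopyQueueIndex != -1) && (UpperCopyQueueIndex != -1));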
}();

// Global variables used in PI_Level_Zero
// Note we only create simple pointer variables so that the C++ RT won't
// deallocate them automatically at the end of the main program.
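
The parsing rules introduced in this hunk for `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` can be illustrated in isolation. The following is a minimal sketch, not the plugin's code: `parseCopyEngineRange` is a hypothetical name, it takes the raw value as an argument instead of calling `std::getenv`, and it omits the `zePrint` diagnostic. Like the original, it relies on `std::stoi`, which throws on non-numeric input such as `"1:"`.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <utility>

// Hypothetical helper (not part of the plugin): maps the raw value of
// SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE to an inclusive index range.
// {-1, -1} means "use no copy engines"; {0, INT_MAX} means "use all".
std::pair<int, int> parseCopyEngineRange(const char *Value) {
  const std::pair<int, int> All(0, INT_MAX);
  const std::pair<int, int> None(-1, -1);
  if (!Value)
    return All; // Variable not set: all copy engines may be used.
  std::string Text = Value;
  auto Pos = Text.find(':');
  if (Pos == std::string::npos) // Single integer: 0 disables, nonzero enables.
    return std::stoi(Text) != 0 ? All : None;
  int Lower = std::stoi(Text.substr(0, Pos));
  int Upper = std::stoi(Text.substr(Pos + 1));
  // Reversed ranges or indices below -1 fall back to the default.
  if (Lower > Upper || Lower < -1 || Upper < -1)
    return All;
  assert(Lower <= Upper); // Holds for every range that reaches this point.
  return {Lower, Upper};
}
```

Under these assumptions, `"0"` yields `{-1, -1}`, `"1"` and an unset variable both yield `{0, INT_MAX}`, and `"1:3"` yields `{1, 3}`.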
@@ -1056,13 +1092,21 @@ _pi_queue::getZeCopyCommandQueue(int *CopyQueueIndex,
int *CopyQueueGroupIndex) {
assert(CopyQueueIndex);
int n = ZeCopyCommandQueues.size();
// Return nullptr when no copy command queues are available
if (n == 0) {
int LowerCopyQueueIndex = getRangeOfAllowedCopyEngines.first;
int UpperCopyQueueIndex = getRangeOfAllowedCopyEngines.second;

// Return nullptr when no copy command queues are allowed to be used or if
// no copy command queues are available.
if ((LowerCopyQueueIndex == -1) || (UpperCopyQueueIndex == -1) || (n == 0)) {
if (CopyQueueGroupIndex)
*CopyQueueGroupIndex = -1;
*CopyQueueIndex = -1;
return nullptr;
}

LowerCopyQueueIndex = std::max(0, LowerCopyQueueIndex);
UpperCopyQueueIndex = std::min(UpperCopyQueueIndex, n - 1);

// If there is only one copy queue, it is the main copy queue, which is the
// first, and only entry in ZeCopyCommandQueues.
if (n == 1) {
@@ -1075,23 +1119,27 @@ _pi_queue::getZeCopyCommandQueue(int *CopyQueueIndex,

// Round robin logic is used here to access copy command queues.
// Initial value of LastUsedCopyCommandQueueIndex is -1.
// So, the round robin logic will start its access at 0th queue.
// So, the round robin logic will start its access at 'LowerCopyQueueIndex'
// queue.
// TODO: In this implementation, all the copy engines (main and link)
// have equal priority. It is expected that main copy engine will be
// advantageous for H2D and D2H copies, whereas the link copy engines will
// be advantageous for D2D. We will perform experiments and then assign
// priority to different copy engines for different types of copy operations.
if (LastUsedCopyCommandQueueIndex == (n - 1))
*CopyQueueIndex = 0;
if ((LastUsedCopyCommandQueueIndex == -1) ||
(LastUsedCopyCommandQueueIndex == UpperCopyQueueIndex))
*CopyQueueIndex = LowerCopyQueueIndex;
else
*CopyQueueIndex = LastUsedCopyCommandQueueIndex + 1;
LastUsedCopyCommandQueueIndex = *CopyQueueIndex;
zePrint("Note: CopyQueueIndex = %d\n", *CopyQueueIndex);
if (CopyQueueGroupIndex)
// Last queue in the vector of copy queues is the main copy queue.
*CopyQueueGroupIndex = (*CopyQueueIndex == (n - 1))
? Device->ZeMainCopyQueueGroupIndex
: Device->ZeLinkCopyQueueGroupIndex;
// First queue in the vector of copy queues is the main copy queue,
// if available. Otherwise it's a link copy queue.
*CopyQueueGroupIndex =
((*CopyQueueIndex == 0) && Device->hasMainCopyEngine())
? Device->ZeMainCopyQueueGroupIndex
: Device->ZeLinkCopyQueueGroupIndex;
return ZeCopyCommandQueues[*CopyQueueIndex];
}
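
The round-robin selection in `getZeCopyCommandQueue` can be sketched as a pure function over the clamped index range. This is an illustration under stated assumptions rather than the plugin's implementation: `nextCopyQueueIndex` is a hypothetical name, the single-queue special case is left out, and the check for a lower index beyond the last queue is an addition that the visible hunk does not appear to contain.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: returns the index of the next copy queue to use, or
// -1 when no copy queue is allowed or available.
// LastUsed is -1 before the first selection, as in the plugin.
int nextCopyQueueIndex(int LastUsed, int Lower, int Upper, int NumQueues) {
  assert(NumQueues >= 0);
  if (Lower == -1 || Upper == -1 || NumQueues == 0)
    return -1;
  // Clamp the requested range to the queues that actually exist.
  Lower = std::max(0, Lower);
  Upper = std::min(Upper, NumQueues - 1);
  // Assumption (not in the visible diff): an empty range after clamping,
  // e.g. "5:7" with three queues, selects nothing.
  if (Lower > Upper)
    return -1;
  // Start at Lower, and wrap back to Lower after reaching Upper.
  if (LastUsed == -1 || LastUsed == Upper)
    return Lower;
  return LastUsed + 1;
}
```

With a range of `1:2` and four queues, successive calls that feed the previous result back in as `LastUsed` produce 1, 2, 1, 2, and so on.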

@@ -2572,24 +2620,34 @@ pi_result piQueueCreate(pi_context Context, pi_device Device,
&ZeCommandQueueDesc, // TODO: translate properties
&ZeComputeCommandQueue));

// Create second queue to main copy engine
std::vector<ze_command_queue_handle_t> ZeCopyCommandQueues;

// Create queue to main copy engine
ze_command_queue_handle_t ZeMainCopyCommandQueue = nullptr;
if (Device->hasCopyEngine()) {
if (Device->hasMainCopyEngine()) {
zePrint("NOTE: Main Copy Engine ZeCommandQueueDesc.ordinal = %d, "
"ZeCommandQueueDesc.index = %d\n",
Device->ZeMainCopyQueueGroupIndex, 0);
ZeCommandQueueDesc.ordinal = Device->ZeMainCopyQueueGroupIndex;
ZeCommandQueueDesc.index = 0;
ZE_CALL(zeCommandQueueCreate,
(Context->ZeContext, ZeDevice,
&ZeCommandQueueDesc, // TODO: translate properties
&ZeMainCopyCommandQueue));
// Main Copy Command Queue is pushed at start of ZeCopyCommandQueues
// vector.
ZeCopyCommandQueues.push_back(ZeMainCopyCommandQueue);
}
PI_ASSERT(Queue, PI_INVALID_QUEUE);

// Create additional queues to link copy engines and push them into
// ZeCopyCommandQueues vector.
std::vector<ze_command_queue_handle_t> ZeCopyCommandQueues;
if (Device->hasCopyEngine()) {
if (Device->hasLinkCopyEngine()) {
auto ZeNumLinkCopyQueues = Device->ZeLinkCopyQueueGroupProperties.numQueues;
for (uint32_t i = 0; i < ZeNumLinkCopyQueues; ++i) {
zePrint("NOTE: Link Copy Engine ZeCommandQueueDesc.ordinal = %d, "
"ZeCommandQueueDesc.index = %d\n",
Device->ZeLinkCopyQueueGroupIndex, i);
ze_command_queue_handle_t ZeLinkCopyCommandQueue = nullptr;
ZeCommandQueueDesc.ordinal = Device->ZeLinkCopyQueueGroupIndex;
ZeCommandQueueDesc.index = i;
@@ -2599,9 +2657,6 @@ pi_result piQueueCreate(pi_context Context, pi_device Device,
&ZeLinkCopyCommandQueue));
ZeCopyCommandQueues.push_back(ZeLinkCopyCommandQueue);
}
// Main Copy Command Queue is pushed at the end of ZeCopyCommandQueues
// vector.
ZeCopyCommandQueues.push_back(ZeMainCopyCommandQueue);
}
PI_ASSERT(Queue, PI_INVALID_QUEUE);
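
The new ordering of `ZeCopyCommandQueues` (main copy queue first, link copy queues after it) is what lets `getZeCopyCommandQueue` derive the queue group from the index alone. A minimal sketch of that relationship follows, using a hypothetical `CopyGroup` enum and helper names in place of the Level Zero handles and ordinals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two copy queue groups.
enum class CopyGroup { Main, Link };

// Builds the queue order used after this change: the main copy queue, if
// the device has one, is pushed first, followed by all link copy queues.
std::vector<CopyGroup> buildCopyQueueOrder(bool HasMain, unsigned NumLink) {
  std::vector<CopyGroup> Order;
  if (HasMain)
    Order.push_back(CopyGroup::Main);
  for (unsigned I = 0; I < NumLink; ++I)
    Order.push_back(CopyGroup::Link);
  return Order;
}

// Mirrors the group lookup in the diff: index 0 is the main copy queue only
// when the device has a main copy engine; every other index is a link queue.
CopyGroup groupForIndex(int Index, bool HasMain) {
  assert(Index >= 0);
  return (Index == 0 && HasMain) ? CopyGroup::Main : CopyGroup::Link;
}
```

For any valid index, `groupForIndex` agrees with the entry that `buildCopyQueueOrder` places at that position, including on devices that expose only link copy engines.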

4 changes: 2 additions & 2 deletions sycl/plugins/level_zero/pi_level_zero.hpp
@@ -559,8 +559,8 @@ struct _pi_queue : _pi_object {
// Vector of Level Zero copy command queue handles.
// Some (or all) of these handles may not be available depending on user
// preference and/or target device.
// In this vector, link copy engines, if available, come first followed by
// main copy engine, if available.
// In this vector, the main copy engine, if available, comes first, followed
// by the link copy engines, if available.
std::vector<ze_command_queue_handle_t> ZeCopyCommandQueues;

// One of the many available copy command queues will be used for