Commit 3828df5

[SYCL][L0][PI] Allow a finer control over copy engines with SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE (#4333)
Signed-off-by: Arvind Sudarsanam <[email protected]>
1 parent cb2265b commit 3828df5

3 files changed, +80 -25 lines changed

sycl/doc/EnvironmentVariables.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ subject to change. Do not rely on these variables in production code.
 | `SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR` | MaxPoolableSize,Capacity,MaxPoolSize | Values specified as positive integers. Defaults are 1, 4, 256. MaxPoolableSize is the maximum allocation size in MB that may be pooled. Capacity is the number of allocations in each size range that are freed by the program but retained in the pool for reallocation. Size ranges follow this pattern: 32, 48, 64, 96, 128, 192, and so on, i.e., powers of 2, with one range in between. MaxPoolSize is the maximum size of the pool in MB. |
 | `SYCL_PI_LEVEL_ZERO_BATCH_SIZE` | Integer | Sets a preferred number of commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
 | `SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST` | Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 1. |
-| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The default is 1. |
+| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 1. |
 | `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
 | `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero plugin. |
 | `SYCL_PARALLEL_FOR_RANGE_ROUNDING_TRACE` | Any(\*) | Enables tracing of `parallel_for` invocations with rounded-up ranges. |

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 77 additions & 22 deletions
@@ -60,12 +60,6 @@ static const bool UseCopyEngineForD2DCopy = [] {
   return (CopyEngineForD2DCopy && (std::stoi(CopyEngineForD2DCopy) != 0));
 }();
 
-static const bool CopyEngineRequested = [] {
-  const char *CopyEngine = std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE");
-  bool UseCopyEngine = (!CopyEngine || (std::stoi(CopyEngine) != 0));
-  return UseCopyEngine;
-}();
-
 // This class encapsulates actions taken along with a call to Level Zero API.
 class ZeCall {
 private:
@@ -291,6 +285,48 @@ class ReturnHelper {
 
 } // anonymous namespace
 
+// SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE can be set to an integer value, or
+// a pair of integer values of the form "lower_index:upper_index".
+// Here, the indices point to copy engines in a list of all available copy
+// engines.
+// This functions returns this pair of indices.
+// If the user specifies only a single integer, a value of 0 indicates that
+// the copy engines will not be used at all. A value of 1 indicates that all
+// available copy engines can be used.
+static const std::pair<int, int> getRangeOfAllowedCopyEngines = [] {
+  const char *EnvVar = std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE");
+  // If the environment variable is not set, all available copy engines can be
+  // used.
+  if (!EnvVar)
+    return std::pair<int, int>(0, INT_MAX);
+  std::string CopyEngineRange = EnvVar;
+  // Environment variable can be a single integer or a pair of integers
+  // separated by ":"
+  auto pos = CopyEngineRange.find(":");
+  if (pos == std::string::npos) {
+    bool UseCopyEngine = (std::stoi(CopyEngineRange) != 0);
+    if (UseCopyEngine)
+      return std::pair<int, int>(0, INT_MAX); // All copy engines can be used.
+    return std::pair<int, int>(-1, -1); // No copy engines will be used.
+  }
+  int LowerCopyEngineIndex = std::stoi(CopyEngineRange.substr(0, pos));
+  int UpperCopyEngineIndex = std::stoi(CopyEngineRange.substr(pos + 1));
+  if ((LowerCopyEngineIndex > UpperCopyEngineIndex) ||
+      (LowerCopyEngineIndex < -1) || (UpperCopyEngineIndex < -1)) {
+    zePrint("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE: invalid value provided, "
+            "default set.\n");
+    LowerCopyEngineIndex = 0;
+    UpperCopyEngineIndex = INT_MAX;
+  }
+  return std::pair<int, int>(LowerCopyEngineIndex, UpperCopyEngineIndex);
+}();
+
+static const bool CopyEngineRequested = [] {
+  int LowerCopyQueueIndex = getRangeOfAllowedCopyEngines.first;
+  int UpperCopyQueueIndex = getRangeOfAllowedCopyEngines.second;
+  return ((LowerCopyQueueIndex != -1) || (UpperCopyQueueIndex != -1));
+}();
+
 // Global variables used in PI_Level_Zero
 // Note we only create a simple pointer variables such that C++ RT won't
 // deallocate them automatically at the end of the main program.
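
The parsing rules in `getRangeOfAllowedCopyEngines` can be exercised in isolation. The sketch below is a hypothetical restatement, not code from the plugin: `parseCopyEngineRange` is an invented name, and it takes the value as an argument instead of calling `std::getenv` so that each case can be checked directly. Like the original, it lets `std::stoi` throw on non-numeric input.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <utility>

// Hypothetical helper (not part of the plugin). Applies the same rules as the
// lambda in the diff; a nullptr argument stands for "variable not set".
std::pair<int, int> parseCopyEngineRange(const char *Value) {
  const std::pair<int, int> All(0, INT_MAX); // every available copy engine
  const std::pair<int, int> None(-1, -1);    // copy engines disabled
  if (!Value)
    return All;
  std::string Range = Value;
  auto Pos = Range.find(':');
  if (Pos == std::string::npos) {
    // Single integer: 0 disables copy engines, anything else enables all.
    return std::stoi(Range) != 0 ? All : None;
  }
  int Lower = std::stoi(Range.substr(0, Pos));
  int Upper = std::stoi(Range.substr(Pos + 1));
  // Reversed bounds or values below -1 fall back to the default range.
  if (Lower > Upper || Lower < -1 || Upper < -1)
    return All;
  return std::pair<int, int>(Lower, Upper);
}
```

Under these rules `"0"` yields (-1, -1), `"1:2"` yields (1, 2), and `"3:1"` falls back to (0, INT_MAX). One case worth noting, judging only from the hunks shown here: a value such as `-1:2` passes validation and yields (-1, 2), so `CopyEngineRequested` evaluates to true, while `getZeCopyCommandQueue` returns `nullptr` whenever either bound is -1.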
@@ -1103,13 +1139,21 @@ _pi_queue::getZeCopyCommandQueue(int *CopyQueueIndex,
                                  int *CopyQueueGroupIndex) {
   assert(CopyQueueIndex);
   int n = ZeCopyCommandQueues.size();
-  // Return nullptr when no copy command queues are available
-  if (n == 0) {
+  int LowerCopyQueueIndex = getRangeOfAllowedCopyEngines.first;
+  int UpperCopyQueueIndex = getRangeOfAllowedCopyEngines.second;
+
+  // Return nullptr when no copy command queues are allowed to be used or if
+  // no copy command queues are available.
+  if ((LowerCopyQueueIndex == -1) || (UpperCopyQueueIndex == -1) || (n == 0)) {
     if (CopyQueueGroupIndex)
       *CopyQueueGroupIndex = -1;
     *CopyQueueIndex = -1;
     return nullptr;
   }
+
+  LowerCopyQueueIndex = std::max(0, LowerCopyQueueIndex);
+  UpperCopyQueueIndex = std::min(UpperCopyQueueIndex, n - 1);
+
   // If there is only one copy queue, it is the main copy queue, which is the
   // first, and only entry in ZeCopyCommandQueues.
   if (n == 1) {
@@ -1122,23 +1166,27 @@ _pi_queue::getZeCopyCommandQueue(int *CopyQueueIndex,
 
   // Round robin logic is used here to access copy command queues.
   // Initial value of LastUsedCopyCommandQueueIndex is -1.
-  // So, the round robin logic will start its access at 0th queue.
+  // So, the round robin logic will start its access at 'LowerCopyQueueIndex'
+  // queue.
   // TODO: In this implementation, all the copy engines (main and link)
   // have equal priority. It is expected that main copy engine will be
   // advantageous for H2D and D2H copies, whereas the link copy engines will
   // be advantageous for D2D. We will perform experiments and then assign
   // priority to different copy engines for different types of copy operations.
-  if (LastUsedCopyCommandQueueIndex == (n - 1))
-    *CopyQueueIndex = 0;
+  if ((LastUsedCopyCommandQueueIndex == -1) ||
+      (LastUsedCopyCommandQueueIndex == UpperCopyQueueIndex))
+    *CopyQueueIndex = LowerCopyQueueIndex;
   else
     *CopyQueueIndex = LastUsedCopyCommandQueueIndex + 1;
   LastUsedCopyCommandQueueIndex = *CopyQueueIndex;
   zePrint("Note: CopyQueueIndex = %d\n", *CopyQueueIndex);
   if (CopyQueueGroupIndex)
-    // Last queue in the vector of copy queues is the main copy queue.
-    *CopyQueueGroupIndex = (*CopyQueueIndex == (n - 1))
-                               ? Device->ZeMainCopyQueueGroupIndex
-                               : Device->ZeLinkCopyQueueGroupIndex;
+    // First queue in the vector of copy queues is the main copy queue,
+    // if available. Otherwise it's a link copy queue.
+    *CopyQueueGroupIndex =
+        ((*CopyQueueIndex == 0) && Device->hasMainCopyEngine())
+            ? Device->ZeMainCopyQueueGroupIndex
+            : Device->ZeLinkCopyQueueGroupIndex;
   return ZeCopyCommandQueues[*CopyQueueIndex];
 }
 
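
The clamped round-robin selection in `getZeCopyCommandQueue` can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the plugin's code: `CopyQueuePicker` and its members are hypothetical names, the separate `n == 1` branch is omitted, and the function returns an index (or -1) instead of a queue handle.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the selection state kept in _pi_queue.
struct CopyQueuePicker {
  int NumQueues; // plays the role of ZeCopyCommandQueues.size()
  int Lower;     // lower bound of the allowed range, -1 means disabled
  int Upper;     // upper bound of the allowed range, -1 means disabled
  int LastUsed;  // mirrors LastUsedCopyCommandQueueIndex, initially -1

  CopyQueuePicker(int N, int Lo, int Hi)
      : NumQueues(N), Lower(Lo), Upper(Hi), LastUsed(-1) {}

  // Returns the next copy queue index, or -1 when no copy queue may be used.
  int next() {
    if (Lower == -1 || Upper == -1 || NumQueues == 0)
      return -1;
    // Clamp the requested range to the queues that actually exist.
    int Lo = std::max(0, Lower);
    int Hi = std::min(Upper, NumQueues - 1);
    // Assumption added by this sketch, not present in the diff: treat a
    // range that lies entirely beyond the last queue as "no queue".
    if (Lo > Hi)
      return -1;
    // Start at Lo on first use and wrap back to Lo after reaching Hi.
    if (LastUsed == -1 || LastUsed == Hi)
      LastUsed = Lo;
    else
      LastUsed = LastUsed + 1;
    return LastUsed;
  }
};
```

With four queues and the range `1:2`, successive calls yield 1, 2, 1, and so on. The `Lo > Hi` check is an addition of this sketch. In the hunk as shown, a lower bound beyond the last queue (for example `5:7` with three queues) is not clamped, so the first selection would be index 5; whether code outside these hunks guards against that is not visible here.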

@@ -2591,24 +2639,34 @@ pi_result piQueueCreate(pi_context Context, pi_device Device,
            &ZeCommandQueueDesc, // TODO: translate properties
            &ZeComputeCommandQueue));
 
-  // Create second queue to main copy engine
+  std::vector<ze_command_queue_handle_t> ZeCopyCommandQueues;
+
+  // Create queue to main copy engine
   ze_command_queue_handle_t ZeMainCopyCommandQueue = nullptr;
-  if (Device->hasCopyEngine()) {
+  if (Device->hasMainCopyEngine()) {
+    zePrint("NOTE: Main Copy Engine ZeCommandQueueDesc.ordinal = %d, "
+            "ZeCommandQueueDesc.index = %d\n",
+            Device->ZeMainCopyQueueGroupIndex, 0);
     ZeCommandQueueDesc.ordinal = Device->ZeMainCopyQueueGroupIndex;
     ZeCommandQueueDesc.index = 0;
     ZE_CALL(zeCommandQueueCreate,
             (Context->ZeContext, ZeDevice,
              &ZeCommandQueueDesc, // TODO: translate properties
              &ZeMainCopyCommandQueue));
+    // Main Copy Command Queue is pushed at start of ZeCopyCommandQueues
+    // vector.
+    ZeCopyCommandQueues.push_back(ZeMainCopyCommandQueue);
   }
   PI_ASSERT(Queue, PI_INVALID_QUEUE);
 
   // Create additional queues to link copy engines and push them into
   // ZeCopyCommandQueues vector.
-  std::vector<ze_command_queue_handle_t> ZeCopyCommandQueues;
-  if (Device->hasCopyEngine()) {
+  if (Device->hasLinkCopyEngine()) {
     auto ZeNumLinkCopyQueues = Device->ZeLinkCopyQueueGroupProperties.numQueues;
     for (uint32_t i = 0; i < ZeNumLinkCopyQueues; ++i) {
+      zePrint("NOTE: Link Copy Engine ZeCommandQueueDesc.ordinal = %d, "
+              "ZeCommandQueueDesc.index = %d\n",
+              Device->ZeLinkCopyQueueGroupIndex, i);
       ze_command_queue_handle_t ZeLinkCopyCommandQueue = nullptr;
       ZeCommandQueueDesc.ordinal = Device->ZeLinkCopyQueueGroupIndex;
       ZeCommandQueueDesc.index = i;
@@ -2618,9 +2676,6 @@ pi_result piQueueCreate(pi_context Context, pi_device Device,
                &ZeLinkCopyCommandQueue));
       ZeCopyCommandQueues.push_back(ZeLinkCopyCommandQueue);
     }
-    // Main Copy Command Queue is pushed at the end of ZeCopyCommandQueues
-    // vector.
-    ZeCopyCommandQueues.push_back(ZeMainCopyCommandQueue);
   }
   PI_ASSERT(Queue, PI_INVALID_QUEUE);
 
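
After this change the main copy queue, when present, occupies slot 0 of `ZeCopyCommandQueues` and the link copy queues follow. The sketch below models that layout and the matching queue-group lookup without touching Level Zero. All names in it (`CopyQueueSlot`, `layoutCopyQueues`, `groupOrdinalFor`) are hypothetical, and it assumes a device has a link copy engine exactly when its link queue count is non-zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of what would be passed to zeCommandQueueCreate:
// the queue-group ordinal and the queue index within that group.
struct CopyQueueSlot {
  int Ordinal;
  uint32_t Index;
};

// Builds the slot order used after this commit: main first, then links.
std::vector<CopyQueueSlot> layoutCopyQueues(bool HasMain, int MainOrdinal,
                                            uint32_t NumLink, int LinkOrdinal) {
  std::vector<CopyQueueSlot> Slots;
  if (HasMain)
    Slots.push_back({MainOrdinal, 0}); // the main copy queue always uses index 0
  for (uint32_t I = 0; I < NumLink; ++I)
    Slots.push_back({LinkOrdinal, I});
  return Slots;
}

// Mirrors the ternary in getZeCopyCommandQueue: slot 0 belongs to the main
// group only if a main copy engine exists; every other slot is a link queue.
int groupOrdinalFor(int CopyQueueIndex, bool HasMain, int MainOrdinal,
                    int LinkOrdinal) {
  return (CopyQueueIndex == 0 && HasMain) ? MainOrdinal : LinkOrdinal;
}
```

Placing the main queue first keeps slot 0 meaningful on devices that have only link engines, which is why the lookup tests both the index and the presence of a main engine.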

sycl/plugins/level_zero/pi_level_zero.hpp

Lines changed: 2 additions & 2 deletions
@@ -609,8 +609,8 @@ struct _pi_queue : _pi_object {
   // Vector of Level Zero copy command command queue handles.
   // Some (or all) of these handles may not be available depending on user
   // preference and/or target device.
-  // In this vector, link copy engines, if available, come first followed by
-  // main copy engine, if available.
+  // In this vector, main copy engine, if available, come first followed by
+  // link copy engines, if available.
   std::vector<ze_command_queue_handle_t> ZeCopyCommandQueues;
 
   // One of the many available copy command queues will be used for
