
Commit 32be008

[SYCL][L0] Add temporary option to allow user to use copy engine for device to device copy (#4127)
This option has been added to enable users to analyze performance of device to device copy operations on the copy engine.

Signed-off-by: Arvind Sudarsanam <[email protected]>
1 parent 66ef4eb commit 32be008


2 files changed, 17 additions and 0 deletions


sycl/doc/EnvironmentVariables.md

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ subject to change. Do not rely on these variables in production code.
 | `SYCL_PI_LEVEL_ZERO_BATCH_SIZE` | Integer | Sets a preferred number of commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
 | `SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST` | Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 1. |
 | `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The default is 1. |
+| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
 | `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero plugin. |
 | `SYCL_PARALLEL_FOR_RANGE_ROUNDING_TRACE` | Any(\*) | Enables tracing of `parallel_for` invocations with rounded-up ranges. |
 | `SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING` | Any(\*) | Disables automatic rounding-up of `parallel_for` invocation ranges. |
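
The new table row says the variable is an integer that defaults to 0. As a rough illustration of how those values map to on/off, the sketch below mirrors the parsing done by the `UseCopyEngineForD2DCopy` lambda in this commit. The helper names `parseD2DCopyFlag` and `parseD2DCopyFlagOrDefault` are hypothetical and do not exist in the plugin; the second one is an assumed, more tolerant variant shown only for comparison.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the plugin): maps the raw text of
// SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY to a boolean using the
// same expression as the lambda added in this commit.
bool parseD2DCopyFlag(const char *Value) {
  // Variable not set: the documented default of 0 (disabled) applies.
  if (!Value)
    return false;
  // Any non-zero integer enables the option. std::stoi throws
  // std::invalid_argument for non-numeric text and std::out_of_range
  // for values that do not fit in an int.
  return std::stoi(Value) != 0;
}

// Hypothetical tolerant variant (an assumption, not what the commit does):
// treats text that cannot be parsed as "disabled" instead of throwing.
bool parseD2DCopyFlagOrDefault(const char *Value) {
  try {
    return parseD2DCopyFlag(Value);
  } catch (const std::logic_error &) {
    return false;
  }
}
```

Under these assumptions, an unset variable and `"0"` both leave the option off, while `"1"` or any other non-zero integer turns it on. A value such as `"yes"` would make the first helper throw, which suggests the variable is best set to a plain integer.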

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 16 additions & 0 deletions
@@ -46,6 +46,14 @@ static const pi_uint32 ZeSerialize = [] {
   return SerializeModeValue;
 }();
 
+// This is an experimental option to test performance of device to device copy
+// operations on copy engines (versus compute engine)
+static const bool UseCopyEngineForD2DCopy = [] {
+  const char *CopyEngineForD2DCopy =
+      std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY");
+  return (CopyEngineForD2DCopy && (std::stoi(CopyEngineForD2DCopy) != 0));
+}();
+
 static const bool CopyEngineRequested = [] {
   const char *CopyEngine = std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE");
   bool UseCopyEngine = (!CopyEngine || (std::stoi(CopyEngine) != 0));
@@ -5034,6 +5042,10 @@ pi_result piEnqueueMemBufferCopy(pi_queue Queue, pi_mem SrcBuffer,
   // Copy engine is preferred only for host to device transfer.
   // Device to device transfers run faster on compute engines.
   bool PreferCopyEngine = (SrcBuffer->OnHost || DstBuffer->OnHost);
+
+  // Temporary option added to use copy engine for D2D copy
+  PreferCopyEngine |= UseCopyEngineForD2DCopy;
+
   return enqueueMemCopyHelper(
       PI_COMMAND_TYPE_MEM_BUFFER_COPY, Queue,
       pi_cast<char *>(DstBuffer->getZeHandle()) + DstOffset,
@@ -6281,6 +6293,10 @@ pi_result piextUSMEnqueueMemcpy(pi_queue Queue, pi_bool Blocking, void *DstPtr,
   // (versus compute engine).
   bool PreferCopyEngine = !IsDevicePointer(Queue->Context, SrcPtr) ||
                           !IsDevicePointer(Queue->Context, DstPtr);
+
+  // Temporary option added to use copy engine for D2D copy
+  PreferCopyEngine |= UseCopyEngineForD2DCopy;
+
   return enqueueMemCopyHelper(
       // TODO: do we need a new command type for this?
       PI_COMMAND_TYPE_MEM_BUFFER_COPY, Queue, DstPtr, Blocking, Size, SrcPtr,
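
Both copy paths changed in `pi_level_zero.cpp` apply the same override: the existing preference (copy engine only when at least one side is on the host) is OR-ed with the new flag. The sketch below isolates that decision so its truth table is easy to check. The function `preferCopyEngine` and its boolean parameters are hypothetical stand-ins; the real code reads `SrcBuffer->OnHost`, `DstBuffer->OnHost`, or `IsDevicePointer(...)` instead.

```cpp
// Hypothetical stand-in for the PreferCopyEngine computation in this commit.
// SrcOnHost / DstOnHost model whether each side of the copy lives on the
// host; ForceD2D models the value of UseCopyEngineForD2DCopy.
bool preferCopyEngine(bool SrcOnHost, bool DstOnHost, bool ForceD2D) {
  // Existing rule: prefer the copy engine only when the host is involved.
  bool Prefer = SrcOnHost || DstOnHost;
  // New rule: the experimental flag widens the preference to device to
  // device copies as well. It can only turn the preference on, never off.
  Prefer |= ForceD2D;
  return Prefer;
}
```

With the flag off, a device to device copy (`preferCopyEngine(false, false, false)`) keeps the compute engine preference; with the flag on, every combination prefers the copy engine. Whether the copy engine is finally used still depends on the rest of the plugin, for example `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` and device support.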
