Commit e59098a

[SYCL][L0] Use compute engine for memory fill command (#6802)
E2E test in intel/llvm-test-suite#1273

Signed-off-by: Sergey V Maslov <[email protected]>
1 parent aa70922 commit e59098a

2 files changed: 9 additions and 6 deletions

sycl/doc/EnvironmentVariables.md

File mode changed: 100755 → 100644
Lines changed: 1 addition & 0 deletions
@@ -187,6 +187,7 @@ variables in production code.</span>
 | `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero plugin would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 2, meaning only the last event in a batch is host-visible. |
 | `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` | Integer | When set to a positive value enables use of Level Zero immediate commandlists, which means there is no batching and all commands are immediately submitted for execution. Default is 0. Note: When immediate commandlist usage is enabled it is necessary to also set SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS to either 0 or 1. |
 | `SYCL_PI_LEVEL_ZERO_USE_MULTIPLE_COMMANDLIST_BARRIERS` | Integer | When set to a positive value enables use of multiple Level Zero commandlists when submitting barriers. Default is 0. |
+| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL` | Integer | When set to a positive value enables use of a copy engine for memory fill operations. Default is 0. |
 
 ## Debugging variables for CUDA Plugin
 
sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 8 additions & 6 deletions
@@ -6660,7 +6660,7 @@ enqueueMemCopyHelper(pi_command_type CommandType, pi_queue Queue, void *Dst,
   printZeEventList(WaitList);
 
   ZE_CALL(zeCommandListAppendMemoryCopy,
-          (ZeCommandList, Dst, Src, Size, ZeEvent, 0, nullptr));
+          (ZeCommandList, Dst, Src, Size, ZeEvent, 0, nullptr));
 
   if (auto Res =
           Queue->executeCommandList(CommandList, BlockingWrite, OkToBatch))
@@ -6913,15 +6913,17 @@ enqueueMemFillHelper(pi_command_type CommandType, pi_queue Queue, void *Ptr,
 
   auto &Device = Queue->Device;
 
-  // Performance analysis on a simple SYCL data "fill" test shows copy engine
-  // is faster than compute engine for such operations.
-  //
-  bool PreferCopyEngine = true;
+  // Default to using compute engine for fill operation, but allow to
+  // override this with an environment variable.
+  const char *PreferCopyEngineEnv =
+      std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL");
+  bool PreferCopyEngine =
+      PreferCopyEngineEnv ? std::stoi(PreferCopyEngineEnv) != 0 : false;
 
   // Make sure that pattern size matches the capability of the copy queues.
   // Check both main and link groups as we don't known which one will be used.
   //
-  if (Device->hasCopyEngine()) {
+  if (PreferCopyEngine && Device->hasCopyEngine()) {
     if (Device->hasMainCopyEngine() &&
         Device->QueueGroup[_pi_device::queue_group_info_t::MainCopy]
                 .ZeProperties.maxMemoryFillPatternSize < PatternSize) {
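
Note on the hunk above: the engine choice now depends on two things, the `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL` flag and whether the copy queue groups can handle the fill pattern size. The following standalone sketch illustrates that decision outside the plugin. It is not the plugin's code: `parseFlag`, `CopyGroup`, `Engine`, and `chooseFillEngine` are hypothetical names, and falling back to the compute engine when a pattern is too large is an assumption, since the hunk is cut off before that branch. The sketch also swaps `std::stoi` for `std::strtol`, because `std::stoi` throws `std::invalid_argument` on non-numeric input such as `=on`.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: interpret an environment variable value as a flag.
// Unset, empty, or non-numeric values yield `Default` instead of throwing.
// Like the diff, any non-zero integer enables the flag.
inline bool parseFlag(const char *Value, bool Default = false) {
  if (!Value || *Value == '\0')
    return Default;
  char *End = nullptr;
  long Parsed = std::strtol(Value, &End, 10);
  if (End == Value)
    return Default; // no digits were consumed
  return Parsed != 0;
}

// Hypothetical stand-in for one copy queue group (main or link).
struct CopyGroup {
  bool Available;                 // does the device expose this group?
  std::size_t MaxFillPatternSize; // models maxMemoryFillPatternSize
};

enum class Engine { Compute, Copy };

// Pick the engine for a fill. The copy engine is used only when it is
// requested, at least one copy group exists, and every available group
// supports the pattern size (either group may end up being used).
inline Engine chooseFillEngine(bool PreferCopy, const CopyGroup &Main,
                               const CopyGroup &Link,
                               std::size_t PatternSize) {
  if (!PreferCopy)
    return Engine::Compute;
  if (!Main.Available && !Link.Available)
    return Engine::Compute;
  if (Main.Available && Main.MaxFillPatternSize < PatternSize)
    return Engine::Compute; // assumed fallback, not shown in the hunk
  if (Link.Available && Link.MaxFillPatternSize < PatternSize)
    return Engine::Compute; // assumed fallback, not shown in the hunk
  return Engine::Copy;
}
```

A caller would combine the two as `chooseFillEngine(parseFlag(std::getenv("SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL")), Main, Link, PatternSize)`. With the variable unset, this always selects the compute engine, matching the new default described in the commit.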
