Commit 2023e10

[SYCL] Add a workaround for a Level Zero batching issue (#4268)
Consider the following scenario on Level Zero:

1. Kernel A, which uses buffer A, is submitted to queue A.
2. Kernel B, which uses buffer B, is submitted to queue B.
3. queueA.wait().
4. queueB.wait().

The DPCPP runtime used to treat unmap/write commands for buffers A and B as host dependencies (i.e. they were waited for before enqueueing any command that depends on them). This allowed the Level Zero plugin to detect that each queue was idle at steps 1 and 2 and submit the command list right away. This is no longer the case: since these dependencies are now passed in an event waitlist, the Level Zero plugin attempts to batch the commands, so kernel B only starts executing at step 4. This workaround restores the old behavior in this case until the batching issue is resolved.
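The delay described above can be reproduced with a small host-only toy model. This is a hypothetical sketch, not the Level Zero plugin: ToyQueue and runScenario are invented names, and the model simply assumes that an idle queue submits a command immediately when the command has no event dependencies, and otherwise holds it in an open batch that is flushed by wait().

```cpp
// Hypothetical toy model of the scenario above; it is NOT the Level Zero
// plugin. Assumed rule: an idle queue submits a command with no event
// dependencies right away; anything else waits in an open batch until wait().
#include <cassert>
#include <string>
#include <vector>

struct ToyQueue {
  std::vector<std::string> OpenBatch; // commands held back for batching
  std::vector<std::string> *Log;      // shared record of starts and returns

  // HasEventDeps == true models passing the unmap/write dependencies in an
  // event waitlist; false models having waited for them on the host already.
  void submit(const std::string &Kernel, bool HasEventDeps) {
    if (OpenBatch.empty() && !HasEventDeps)
      Log->push_back(Kernel); // idle queue: submitted (and started) at once
    else
      OpenBatch.push_back(Kernel); // batched until the queue is flushed
  }

  // Flushes the open batch; in this model the batched kernels start here.
  void wait() {
    for (const std::string &Kernel : OpenBatch)
      Log->push_back(Kernel);
    OpenBatch.clear();
  }
};

// Runs steps 1-4 and returns the order in which kernels start and waits return.
std::vector<std::string> runScenario(bool DepsAsEvents) {
  std::vector<std::string> Log;
  ToyQueue QueueA{{}, &Log};
  ToyQueue QueueB{{}, &Log};
  QueueA.submit("kernel A", DepsAsEvents); // step 1
  QueueB.submit("kernel B", DepsAsEvents); // step 2
  QueueA.wait();                           // step 3
  Log.push_back("queueA.wait() returned");
  QueueB.wait();                           // step 4
  Log.push_back("queueB.wait() returned");
  return Log;
}
```

With runScenario(false), kernel B is recorded as starting at step 2; with runScenario(true), it starts only after queueA.wait() has returned, when queueB.wait() flushes the batch at step 4.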
1 parent 0c866e8 commit 2023e10

2 files changed (+41, -0 lines)

sycl/source/detail/scheduler/commands.cpp

Lines changed: 39 additions & 0 deletions
```diff
@@ -1091,6 +1091,25 @@ void UnMapMemObject::emitInstrumentationData() {
 #endif
 }
 
+bool UnMapMemObject::producesPiEvent() const {
+  // TODO remove this workaround once the batching issue is addressed in Level
+  // Zero plugin.
+  // Consider the following scenario on Level Zero:
+  // 1. Kernel A, which uses buffer A, is submitted to queue A.
+  // 2. Kernel B, which uses buffer B, is submitted to queue B.
+  // 3. queueA.wait().
+  // 4. queueB.wait().
+  // DPCPP runtime used to treat unmap/write commands for buffer A/B as host
+  // dependencies (i.e. they were waited for prior to enqueueing any command
+  // that's dependent on them). This allowed Level Zero plugin to detect that
+  // each queue is idle on steps 1/2 and submit the command list right away.
+  // This is no longer the case since we started passing these dependencies in
+  // an event waitlist and Level Zero plugin attempts to batch these commands,
+  // so the execution of kernel B starts only on step 4. This workaround
+  // restores the old behavior in this case until this is resolved.
+  return MQueue->getPlugin().getBackend() != backend::level_zero;
+}
+
 cl_int UnMapMemObject::enqueueImp() {
   waitForPreparedHostEvents();
   std::vector<EventImplPtr> EventImpls = MPreparedDepsEvents;
```
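The override only changes what producesPiEvent() reports; the effect comes from how the scheduler uses that answer. Going by the commit message, a dependency on a command that does not produce a PI event is waited for on the host before the dependent command is enqueued, while other dependencies go into the event waitlist. The sketch below models that split under this assumption; Backend, DepCommand, PreparedDeps and prepareDeps are hypothetical names, not DPC++ scheduler types.

```cpp
// Hypothetical sketch: Backend, DepCommand, PreparedDeps and prepareDeps are
// illustrative names, not DPC++ scheduler types.
#include <cassert>
#include <vector>

enum class Backend { opencl, level_zero };

struct DepCommand {
  Backend QueueBackend;
  // Same decision as UnMapMemObject::producesPiEvent() in this commit.
  bool producesPiEvent() const { return QueueBackend != Backend::level_zero; }
};

struct PreparedDeps {
  int HostWaits = 0;      // waited for on the host before enqueueing
  int WaitlistEvents = 0; // passed to the backend in an event waitlist
};

// Assumption: dependencies that produce no PI event become host dependencies.
PreparedDeps prepareDeps(const std::vector<DepCommand> &Deps) {
  PreparedDeps Result;
  for (const DepCommand &Dep : Deps) {
    if (Dep.producesPiEvent())
      ++Result.WaitlistEvents;
    else
      ++Result.HostWaits; // the old behavior, restored on Level Zero
  }
  return Result;
}
```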
```diff
@@ -1167,6 +1186,26 @@ const QueueImplPtr &MemCpyCommand::getWorkerQueue() const {
   return MQueue->is_host() ? MSrcQueue : MQueue;
 }
 
+bool MemCpyCommand::producesPiEvent() const {
+  // TODO remove this workaround once the batching issue is addressed in Level
+  // Zero plugin.
+  // Consider the following scenario on Level Zero:
+  // 1. Kernel A, which uses buffer A, is submitted to queue A.
+  // 2. Kernel B, which uses buffer B, is submitted to queue B.
+  // 3. queueA.wait().
+  // 4. queueB.wait().
+  // DPCPP runtime used to treat unmap/write commands for buffer A/B as host
+  // dependencies (i.e. they were waited for prior to enqueueing any command
+  // that's dependent on them). This allowed Level Zero plugin to detect that
+  // each queue is idle on steps 1/2 and submit the command list right away.
+  // This is no longer the case since we started passing these dependencies in
+  // an event waitlist and Level Zero plugin attempts to batch these commands,
+  // so the execution of kernel B starts only on step 4. This workaround
+  // restores the old behavior in this case until this is resolved.
+  return MQueue->is_host() ||
+         MQueue->getPlugin().getBackend() != backend::level_zero;
+}
+
 cl_int MemCpyCommand::enqueueImp() {
   waitForPreparedHostEvents();
   std::vector<EventImplPtr> EventImpls = MPreparedDepsEvents;
```
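MemCpyCommand::producesPiEvent() differs from the UnMapMemObject version in checking MQueue->is_host() first. Because || short-circuits, the plugin backend is only queried for non-host queues. The sketch below makes this observable with a hypothetical ToyWorkerQueue whose getBackend() counts its calls and throws for a host queue; the throwing behavior is only an illustration device, since this commit does not say what the real getPlugin() does for a host queue.

```cpp
// Hypothetical stand-in for a queue; not the DPC++ queue_impl class.
#include <cassert>
#include <stdexcept>

enum class Backend { opencl, level_zero };

struct ToyWorkerQueue {
  bool IsHost;
  Backend B;
  mutable int BackendQueries = 0; // how often getBackend() was called

  bool is_host() const { return IsHost; }

  Backend getBackend() const {
    ++BackendQueries;
    if (IsHost) // illustration only: pretend a host queue has no backend
      throw std::logic_error("no plugin backend for the host queue");
    return B;
  }
};

// Same shape as MemCpyCommand::producesPiEvent(): is_host() is evaluated
// first, and || short-circuits, so getBackend() is never reached for a host
// queue.
bool memCpyProducesPiEvent(const ToyWorkerQueue &Q) {
  return Q.is_host() || Q.getBackend() != Backend::level_zero;
}
```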

sycl/source/detail/scheduler/commands.hpp

Lines changed: 2 additions & 0 deletions
```diff
@@ -441,6 +441,7 @@ class UnMapMemObject : public Command {
   void printDot(std::ostream &Stream) const final;
   const Requirement *getRequirement() const final { return &MDstReq; }
   void emitInstrumentationData() override;
+  bool producesPiEvent() const final;
 
 private:
   cl_int enqueueImp() final;
@@ -463,6 +464,7 @@ class MemCpyCommand : public Command {
   void emitInstrumentationData() final;
   const ContextImplPtr &getWorkerContext() const final;
   const QueueImplPtr &getWorkerQueue() const final;
+  bool producesPiEvent() const final;
 
 private:
   cl_int enqueueImp() final;
```
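In the header, both overrides are declared final, which requires producesPiEvent() to be virtual in the Command base class. A minimal sketch of that pattern, assuming (this diff does not show it) that the base version returns true by default; ToyCommandBase, ToyUnMapCommand and ToyKernelCommand are hypothetical names.

```cpp
// Hypothetical hierarchy. Assumption: the base class's virtual
// producesPiEvent() returns true by default.
#include <cassert>
#include <memory>
#include <vector>

class ToyCommandBase {
public:
  virtual ~ToyCommandBase() = default;
  virtual bool producesPiEvent() const { return true; }
};

class ToyUnMapCommand : public ToyCommandBase {
public:
  explicit ToyUnMapCommand(bool OnLevelZero) : MOnLevelZero(OnLevelZero) {}
  // 'final' overrides the base version and forbids overriding it again in a
  // class derived from ToyUnMapCommand.
  bool producesPiEvent() const final { return !MOnLevelZero; }

private:
  bool MOnLevelZero;
};

// Does not override producesPiEvent(), so it keeps the base default.
class ToyKernelCommand : public ToyCommandBase {};
```

Calls through a base-class pointer dispatch to the override, so code holding only a ToyCommandBase pointer can ask any command whether it produces a PI event without knowing its concrete type.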
