Skip to content

[SYCL] Allow batching for wait with a barrier commands #5307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 14, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 20 additions & 2 deletions sycl/plugins/level_zero/pi_level_zero.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5379,6 +5379,22 @@ pi_result piEnqueueEventsWaitWithBarrier(pi_queue Queue,
PI_ASSERT(Queue, PI_INVALID_QUEUE);
PI_ASSERT(Event, PI_INVALID_EVENT);

// Submit dependent open command lists for execution, if any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify in the comment why we need to execute all command lists.
It doesn't look obvious why it is required.

Also this maybe not closely related to this PR but how we deal with the situation when events in the waitlist are associated with device(s) which are different from device where command list is created. It looks like it is illegal per L0 spec:
https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zecommandlistappendbarrier#zecommandlistappendbarrier

The application must ensure the events are accessible by the device on which the command list was created.

In opencl requirement is less strict and it is enough for queue and events to be in the same context:
https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clEnqueueBarrierWithWaitList.html

The context associated with events in event_wait_list and command_queue must be the same.

It doesn't look like we have a guarantee that all events in the list are associated with the same device or its subdevices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need to execute all command lists.

This is to ensure forward progress, and no deadlock.
If we don't submit commands being waited, then the wait may be infinite.

deal with the situation when events in the waitlist are associated with device(s) which are different from device where command list is created

That's the same as for all other commands, e.g. see zeCommandListAppendLaunchKernel:
https://spec.oneapi.io/level-zero/latest/core/api.html?highlight=zecommandlistappend#_CPPv431zeCommandListAppendLaunchKernel24ze_command_list_handle_t18ze_kernel_handle_tPK16ze_group_count_t17ze_event_handle_t8uint32_tP17ze_event_handle_t

If this is an issue then we'd address it separately, and I don't know how at this moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure forward progress, and no deadlock.
If we don't submit commands being waited, then the wait may be infinite.

Indeed, got it, thank you.

That's the same as for all other commands, e.g. see zeCommandListAppendLaunchKernel:

You are right, I didn't notice this until now.

// Only do it for queues other than the current, since the barrier
// will go into current queue submission together with the waited event.
for (uint32_t I = 0; I < NumEventsInWaitList; I++) {
auto EventQueue = EventWaitList[I]->Queue;
if (EventQueue && EventQueue != Queue) {
// Lock automatically releases when this goes out of scope.
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex);

if (EventQueue->RefCount > 0) {
if (auto Res = EventQueue->executeAllOpenCommandLists())
return Res;
}
}
}

// Lock automatically releases when this goes out of scope.
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex);

Expand All @@ -5388,8 +5404,10 @@ pi_result piEnqueueEventsWaitWithBarrier(pi_queue Queue,
return Res;

// Get a new command list to be used on this call
bool OkToBatch = true;
pi_command_list_ptr_t CommandList{};
if (auto Res = Queue->Context->getAvailableCommandList(Queue, CommandList))
if (auto Res = Queue->Context->getAvailableCommandList(
Queue, CommandList, false /*copy*/, OkToBatch))
return Res;

ze_event_handle_t ZeEvent = nullptr;
Expand All @@ -5406,7 +5424,7 @@ pi_result piEnqueueEventsWaitWithBarrier(pi_queue Queue,

// Execute command list asynchronously as the event will be used
// to track down its completion.
return Queue->executeCommandList(CommandList);
return Queue->executeCommandList(CommandList, false, OkToBatch);
}

pi_result piEnqueueMemBufferRead(pi_queue Queue, pi_mem Src,
Expand Down