Skip to content

[SYCL] Prework before protecting access to the event objects with a mutex #6179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jun 1, 2022

Conversation

againull
Copy link
Contributor

@againull againull commented May 20, 2022

  • Currently resetCommandList() calls cleanup() and vice versa which makes it difficult to guard access to pi_objects processed by these functions leading to locks of mutexes in different order (causing dead lock). So, don't call resetCommandList from event->cleanup(), we can call resetCommandLists() from piEventsWait() directly instead. Also, remove VM_BIND workaround which is incorrect because currently we perform batching of the copy command lists.
  • Remove pointer to command list (ZeCommandList) from pi_event. Command list references event as well and this introduces difficulty: we need to lock a queue and then an event to be able to process events in the command list. And vice versa to lock an event and then a queue to update ZeCommandList when cleaning up an event.
  • Perform cleanup and release of events outside of queue lock. Callers of cleanup and EventRelease must no lock any mutexes.
    As a related change remove LockedQueue parameter from cleanup functions. Currently LockedQueue parameter is used several functions for cleaning resources. It specifies which queue is locked in the caller of these functions to not lock it twice. Problem is that some of the functions internally process events in a loop which can be from different queues and these queues are locked as part of this processing. It leads to the problem that we can have nested locks of different queues and this is a classic deadlock situation.
  • Make cleanup() and resetCommandLists() functions free. The resetCommandLists() function performs release of events in command lists and we release the queue if event's refcount reaches 0, so it is possible that queue can be destructed inside this function. That's why it should be free function and not a method of the queue. Same for cleanup().

againull added 7 commits May 20, 2022 12:47
* Don't call resetCommandList from event->cleanup().
* Remove VM_BIND workaround which is incorrect because
currently we perform batching of the copy command lists.
This member is unnecessary and creates a headache when trying to guard
access to the queue and event objects, because queue indirectly
references events through commandlist.
Event which is not associated with queue/cmd list/kernel and
doesn't have dep events doesn't need a cleanup and can be released
right away.
Currently LockedQueue parameter is used several functions for cleaning
resources. It specifies which queue is locked in the caller of these
functions to not lock it twice. Problem is that some of the functions
internally process events in a loop which can be from different queues
and these queues are locked as part of this processing. It leads to the
problem that we can have nested locks of differnt queues and this is a
classic deadlock situation.
The resetCommandLists() function performs release of events in command lists
and we release the queue if event's refcount reaches 0, so it is possible
that queue can be destructed inside this function. That's why it should be
free function and not a method of the queue.

Same logic for the cleanup(), it calls EventRelease and event can be
deleted, so it must be free function.
@againull againull requested a review from a team as a code owner May 20, 2022 23:47
@againull againull requested a review from smaslov-intel May 20, 2022 23:53
@againull
Copy link
Contributor Author

Sergey, I've made changes per our discussion as a prework to protect access to events with a mutex.
Since they logically try to solve related problems and don't touch a lot of code, I've put them in one PR.

@againull againull requested a review from aelovikov-intel May 21, 2022 00:06
pi_result piEventRelease(pi_event Event) {
return EventRelease(Event, nullptr);
}
pi_result piEventRelease(pi_event Event) { return EventRelease(Event); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see need for having both EventRelease/piEventRelease anymore, but it's not unlikely I'm missing some knowledge here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, fixed.

auto &EventList = CommandList->second.EventList;
// We don't need to synchronize the events since the fence
// synchronized above already does that.
for (auto &Event : EventList) {
Event->Completed = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this moved to line 1156 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this can be moved, fixed.

static pi_result EventCreate(pi_context Context, pi_queue Queue,
bool HostVisible, pi_event *RetEvent);
static pi_result cleanup(pi_event Event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have cleanup outside "extern C" unless absolutely necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, fixed.

Comment on lines 1459 to 1462
// We know here that refcount is more than 2 so decrement refcount
// manually to avoid calling piEventRelease under queue lock.
assert(HostVisibleEvent->RefCount > 2);
HostVisibleEvent->RefCount -= 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make traces much harder to read as the decrement wouldn't be shown there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, fixed.

@againull againull requested a review from aelovikov-intel May 23, 2022 22:58
// Perform any necessary cleanup after an event has been signalled.
// This currently makes sure to release any kernel that may have been used by
// the event, updates the last command event in the queue and cleanes up all dep
// events of of the event.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo "of of"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 1169 to 1170
if (AssociatedQueue->LastCommandEvent == Event) {
AssociatedQueue->LastCommandEvent = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, it might be better to wrap LastCommandEvent into std::atomic and do compare_exchange_strong without mutex lock.

Copy link
Contributor

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any more comments. @smaslov-intel , I'd expect Arthur is awaiting your review here.

@againull againull requested a review from smaslov-intel May 24, 2022 19:36
smaslov-intel
smaslov-intel previously approved these changes Jun 1, 2022
@againull againull requested a review from smaslov-intel June 1, 2022 16:33
@@ -30,8 +30,7 @@

extern "C" {
// Forward declarartions.
static pi_result EventRelease(pi_event Event, pi_queue LockedQueue);
static pi_result piQueueReleaseInternal(pi_queue Queue, pi_queue LockedQueue);
static pi_result piQueueReleaseInternal(pi_queue Queue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, this will conflict with #6226

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants