-
Notifications
You must be signed in to change notification settings - Fork 787
[SYCL] Prework before protecting access to the event objects with a mutex #6179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Don't call resetCommandList from event->cleanup(). * Remove VM_BIND workaround which is incorrect because currently we perform batching of the copy command lists.
This member is unnecessary and creates a headache when trying to guard access to the queue and event objects, because queue indirectly references events through commandlist.
Event which is not associated with queue/cmd list/kernel and doesn't have dep events doesn't need a cleanup and can be released right away.
Currently LockedQueue parameter is used several functions for cleaning resources. It specifies which queue is locked in the caller of these functions to not lock it twice. Problem is that some of the functions internally process events in a loop which can be from different queues and these queues are locked as part of this processing. It leads to the problem that we can have nested locks of differnt queues and this is a classic deadlock situation.
The resetCommandLists() function performs release of events in command lists and we release the queue if event's refcount reaches 0, so it is possible that queue can be destructed inside this function. That's why it should be free function and not a method of the queue. Same logic for the cleanup(), it calls EventRelease and event can be deleted, so it must be free function.
Sergey, I've made changes per our discussion as a prework to protect access to events with a mutex. |
pi_result piEventRelease(pi_event Event) { | ||
return EventRelease(Event, nullptr); | ||
} | ||
pi_result piEventRelease(pi_event Event) { return EventRelease(Event); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see need for having both EventRelease
/piEventRelease
anymore, but it's not unlikely I'm missing some knowledge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, fixed.
auto &EventList = CommandList->second.EventList; | ||
// We don't need to synchronize the events since the fence | ||
// synchronized above already does that. | ||
for (auto &Event : EventList) { | ||
Event->Completed = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this moved to line 1156 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this can be moved, fixed.
static pi_result EventCreate(pi_context Context, pi_queue Queue, | ||
bool HostVisible, pi_event *RetEvent); | ||
static pi_result cleanup(pi_event Event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have cleanup
outside "extern C" unless absolutely necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, fixed.
// We know here that refcount is more than 2 so decrement refcount | ||
// manually to avoid calling piEventRelease under queue lock. | ||
assert(HostVisibleEvent->RefCount > 2); | ||
HostVisibleEvent->RefCount -= 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make traces much harder to read as the decrement wouldn't be shown there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, fixed.
// Perform any necessary cleanup after an event has been signalled. | ||
// This currently makes sure to release any kernel that may have been used by | ||
// the event, updates the last command event in the queue and cleanes up all dep | ||
// events of of the event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "of of"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
if (AssociatedQueue->LastCommandEvent == Event) { | ||
AssociatedQueue->LastCommandEvent = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, it might be better to wrap LastCommandEvent
into std::atomic
and do compare_exchange_strong
without mutex lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any more comments. @smaslov-intel , I'd expect Arthur is awaiting your review here.
@@ -30,8 +30,7 @@ | |||
|
|||
extern "C" { | |||
// Forward declarartions. | |||
static pi_result EventRelease(pi_event Event, pi_queue LockedQueue); | |||
static pi_result piQueueReleaseInternal(pi_queue Queue, pi_queue LockedQueue); | |||
static pi_result piQueueReleaseInternal(pi_queue Queue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, this will conflict with #6226
As a related change remove LockedQueue parameter from cleanup functions. Currently LockedQueue parameter is used several functions for cleaning resources. It specifies which queue is locked in the caller of these functions to not lock it twice. Problem is that some of the functions internally process events in a loop which can be from different queues and these queues are locked as part of this processing. It leads to the problem that we can have nested locks of different queues and this is a classic deadlock situation.