-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL][L0] Fix the memory leak in pi_queue #2899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1931,8 +1931,7 @@ pi_result piQueueRetain(pi_queue Queue) { | |
|
||
pi_result piQueueRelease(pi_queue Queue) { | ||
assert(Queue); | ||
// Lock automatically releases when this goes out of scope. | ||
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex); | ||
Queue->PiQueueMutex.lock(); | ||
|
||
if (--(Queue->RefCount) == 0) { | ||
// It is possible to get to here and still have an open command list | ||
|
@@ -1951,6 +1950,10 @@ pi_result piQueueRelease(pi_queue Queue) { | |
|
||
zePrint("piQueueRelease NumTimesClosedFull %d, NumTimesClosedEarly %d\n", | ||
Queue->NumTimesClosedFull, Queue->NumTimesClosedEarly); | ||
Queue->PiQueueMutex.unlock(); | ||
delete Queue; | ||
} else { | ||
Queue->PiQueueMutex.unlock(); | ||
} | ||
return PI_SUCCESS; | ||
} | ||
|
@@ -3497,24 +3500,25 @@ static void cleanupAfterEvent(pi_event Event) { | |
// any of the Event's data members that need to be read/reset as | ||
// part of the cleanup operations. | ||
auto Queue = Event->Queue; | ||
if (Queue) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We needed to guard the code by adding "if (Queue)" because when the event is released we don't need to reuse the fence and command list associated with queue (that is being deleted) |
||
// Lock automatically releases when this goes out of scope. | ||
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex); | ||
|
||
// Lock automatically releases when this goes out of scope. | ||
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex); | ||
|
||
// Cleanup the command list associated with the event if it hasn't | ||
// been cleaned up already. | ||
auto EventCommandList = Event->ZeCommandList; | ||
|
||
if (EventCommandList) { | ||
// Event has been signalled: If the fence for the associated command list | ||
// is signalled, then reset the fence and command list and add them to the | ||
// available list for reuse in PI calls. | ||
if (Queue->RefCount > 0) { | ||
ze_result_t ZeResult = ZE_CALL_NOCHECK( | ||
zeFenceQueryStatus(Queue->ZeCommandListFenceMap[EventCommandList])); | ||
if (ZeResult == ZE_RESULT_SUCCESS) { | ||
Queue->resetCommandListFenceEntry(EventCommandList, true); | ||
Event->ZeCommandList = nullptr; | ||
// Cleanup the command list associated with the event if it hasn't | ||
// been cleaned up already. | ||
auto EventCommandList = Event->ZeCommandList; | ||
|
||
if (EventCommandList) { | ||
// Event has been signalled: If the fence for the associated command list | ||
// is signalled, then reset the fence and command list and add them to the | ||
// available list for reuse in PI calls. | ||
if (Queue->RefCount > 0) { | ||
ze_result_t ZeResult = ZE_CALL_NOCHECK( | ||
zeFenceQueryStatus(Queue->ZeCommandListFenceMap[EventCommandList])); | ||
if (ZeResult == ZE_RESULT_SUCCESS) { | ||
Queue->resetCommandListFenceEntry(EventCommandList, true); | ||
Event->ZeCommandList = nullptr; | ||
} | ||
} | ||
} | ||
} | ||
|
@@ -3580,17 +3584,24 @@ pi_result piEventRetain(pi_event Event) { | |
pi_result piEventRelease(pi_event Event) { | ||
assert(Event); | ||
if (--(Event->RefCount) == 0) { | ||
// By default, cleanupAfterEvent will reuse the fence and the commandList | ||
// associated with event. This is a good idea if the event will be alive. | ||
// But, we are deallocating event in this function, so reusing the | ||
// associated fence and commandList will cause memory leak. | ||
// Note that we still need to release the kernel associated with the event. | ||
// So, before calling cleanupAfterEvent, we have to indicate that the event | ||
// is being removed by setting the Queue member to nullptr. | ||
Event->Queue = nullptr; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is practically killing command-list caching and you will see performance regressions. Also I don't understand why not to reclaim the command-list and fence to the cache just before destroying the event. The cache is in the Queue and should get properly destroyed at the time the queue is about to be deleted. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is that a pi_event has an associated data member pi_queue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We should continue to reuse command-list/fence from the events being destroyed, and thus need to find other ways for fixing the problem you are seeing.
There was an earlier proposal to retain queue when event is linking to a queue, and release queue when event is deleted. So, effectively the queue is going to be deleted after all events in it are deleted. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That won't solve the issue of pi_event being not notified of the pi_queue released. I already tried the "retain" approach in #2905, and it did not solve this issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What do you mean by "notified" and for which events? I believe the proper way is to use retain/release to identify the internal usage of objects. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just meant that I tried retain/release approach in PR #2905, but it did not work.
Please let me know if I am going to the wrong direction. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like #2905, please close this one. |
||
cleanupAfterEvent(Event); | ||
|
||
auto Context = Event->Context; | ||
if (Event->CommandType == PI_COMMAND_TYPE_MEM_BUFFER_UNMAP && | ||
Event->CommandData) { | ||
// Free the memory allocated in the piEnqueueMemBufferMap. | ||
ZE_CALL(zeMemFree(Event->Queue->Context->ZeContext, Event->CommandData)); | ||
ZE_CALL(zeMemFree(Context->ZeContext, Event->CommandData)); | ||
Event->CommandData = nullptr; | ||
} | ||
ZE_CALL(zeEventDestroy(Event->ZeEvent)); | ||
|
||
auto Context = Event->Context; | ||
ZE_CALL(Context->decrementAliveEventsInPool(Event->ZeEventPool)); | ||
|
||
delete Event; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was necessary because the previous code will cause accessing Queue->PiQueueMutex at the end of this function (out of scope).
But, I added "delete Queue" at line 1954, so lock_guard deallocation using Queue->PiQueueMutex causes accessing memory that is already removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be a way to still use scoped lock, please re-work
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the std::lock_guard constructor at line 1935, it uses (Queue->PiQueueMutex), and it needs Queue->PiQueueMutex again when it goes out of scope and destructor for std::lock_guard is called.
But, we already deallocated Queue by the time the control goes out of the scope, so it accesses the memory that is already removed (Queue->PiQueueMutex)
The only way I can fix this ordering issue is to call std::mutex.lock() before changing the ref count, and std::mutex.unlock() before "delete Queue".
If you can think of how you can still use the scoped lock that requires something already deallocated, please let me know.