-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL][L0] Fix the memory leak in pi_queue #2899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pi_queue was never deallocated before, so I added delete for it. It had some dependency on pi_event, and I removed those dependency. Signed-off-by: Byoungro So <[email protected]>
@@ -1931,8 +1931,7 @@ pi_result piQueueRetain(pi_queue Queue) { | |||
|
|||
pi_result piQueueRelease(pi_queue Queue) { | |||
assert(Queue); | |||
// Lock automatically releases when this goes out of scope. | |||
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex); | |||
Queue->PiQueueMutex.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was necessary because the previous code will cause accessing Queue->PiQueueMutex at the end of this function (out of scope).
But, I added "delete Queue" at line 1954, so lock_guard deallocation using Queue->PiQueueMutex causes accessing memory that is already removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be a way to still use scoped lock, please re-work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the std::lock_guard constructor at line 1935, it uses (Queue->PiQueueMutex), and it needs Queue->PiQueueMutex again when it goes out of scope and destructor for std::lock_guard is called.
But, we already deallocated Queue by the time the control goes out of the scope, so it accesses the memory that is already removed (Queue->PiQueueMutex)
The only way I can fix this ordering issue is to call std::mutex.lock() before changing the ref count, and std::mutex.unlock() before "delete Queue".
If you can think of how you can still use the scoped lock that requires something already deallocated, please let me know.
@@ -3497,24 +3500,25 @@ static void cleanupAfterEvent(pi_event Event) { | |||
// any of the Event's data members that need to be read/reset as | |||
// part of the cleanup operations. | |||
auto Queue = Event->Queue; | |||
if (Queue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We needed to guard the code by adding "if (Queue)" because when the event is released we don't need to reuse the fence and command list associated with queue (that is being deleted)
// Note that we still need to release the kernel associated with the event. | ||
// So, before calling cleanupAfterEvent, we have to indicate that the event | ||
// is being removed by setting the Queue member to nullptr. | ||
Event->Queue = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is practically killing command-list caching and you will see performance regressions. Also I don't understand why not to reclaim the command-list and fence to the cache just before destroying the event. The cache is in the Queue and should get properly destroyed at the time the queue is about to be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that a pi_event has an associated data member pi_queue.
But, if that pi_queue is already released, pi_event does not know if the associated pi_queue is already deallocated.
This will cause a seg fault because piEventRelease calls cleanupAfterEvent and it uses pi_queue in cleanupAfterEvent() function which is already deallocated.
Previously, this issue was hidden because we never run "delete Queue" in piQueueRelease.
It is not a matter of memory leak in command-list cache.
Without nullifying Event->Queue, cleanupAfterEvent() crash and cause a segmentation fault.
As I mentioned in the code comments, reusing command-list and fence is a good idea as long as the associated queue is still in use, but should avoid if the associated queue is already deallocated because it will cause a seg fault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is practically killing command-list caching
reusing command-list and fence is a good idea as long as the associated queue is still in use, but should avoid if the associated queue is already deallocated because it will cause a seg fault.
We should continue to reuse command-list/fence from the events being destroyed, and thus need to find other ways for fixing the problem you are seeing.
The problem is that a pi_event has an associated data member pi_queue.
But, if that pi_queue is already released, pi_event does not know if the associated pi_queue is already deallocated.
There was an earlier proposal to retain queue when event is linking to a queue, and release queue when event is deleted. So, effectively the queue is going to be deleted after all events in it are deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an earlier proposal to retain queue when event is linking to a queue, and release queue when event is deleted. So, effectively the queue is going to be deleted after all events in it are deleted.
That won't solve the issue of pi_event being not notified of the pi_queue released. I already tried the "retain" approach in #2905, and it did not solve this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't solve the issue of pi_event being not notified of the pi_queue released.
What do you mean by "notified" and for which events? I believe the proper way is to use retain/release to identify the internal usage of objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just meant that I tried retain/release approach in PR #2905, but it did not work.
Lots of LIT tests failed.
Next step:
- Investigate why LIT tests failed in PR [SYCL][L0] Fix memory leak in pi Queue #2905. It looks like there are still the case piEventRelease occurs after piQueueRelease.
- Investigate if I can keep the record of pi_queue without acutall "delete pi_queue", and deallocate them all at once in piTearDown.
Please let me know if I am going to the wrong direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like #2905, please close this one.
pi_queue was never deallocated before, so I added delete for it.
It had some dependency on pi_event, and I removed those dependency.
Signed-off-by: Byoungro So [email protected]