-
Notifications
You must be signed in to change notification settings - Fork 788
[SYCL] Fix memory leak in reduction resources #5653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Fix memory leak in reduction resources #5653
Conversation
Reductions that require additional resources, such as buffers, can currently create a circular dependency between the resources and the commands issued by the reductions. These changes clear up this dependence in a similar way to how streams are transferred by transferring ownership of the resources to the commands and ensuring release when cleaning up the commands. Signed-off-by: Steffen Larsen <[email protected]>
Solution has been discussed with @sergey-semenov offline (w.r.t #5162 (comment)) and given it is very similar to streams, it should not add much more work to making it support post-enqueue cleanup. |
/verify with intel/llvm-test-suite#624 |
Signed-off-by: Steffen Larsen <[email protected]>
Signed-off-by: Steffen Larsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Steffen Larsen <[email protected]>
@alexbatashev, it looks like GHA uses outdated driver version. Am I right? Can you help to fix this, please? |
Patch: #5679 |
@steffenlarsen, is it ready for merge? I'm not sure because I see failures in pre-commit validation and Sergey requested some additional changes to make in #5120. |
@bader This is ready for merge, but I think we may have to disable the failing tests temporarily. The precommit failures are caused by the compute-runtime and newer versions are believed to have fixed this issue. #5120 was reverted, so I will make the required common change in the re-opened PR for it (assuming this is merged first.) |
I suggest we update the compute-runtime and re-test this change to get the status for pre-commit tests. We should be able to merge #5742 in a few hours if tests pass. Based on updated results we either merge this PR as is or update with tests if needed. |
Yes, that sounds like the best solution to me, thanks! |
@steffenlarsen, #5742 is merged. To get updated test status, could you merge with the tip of the |
Signed-off-by: Steffen Larsen <[email protected]>
@bader Tests still ran with the old drivers. Do you know why? |
Sorry, my fault. #5742 updated the driver for Jenkins CI only. You can see that all Linux tests pass in Jenkins, but there are Windows driver yet, so we still have to update the test status on Windows. |
@steffenlarsen, GitHub Actions pre-commit passed. Could you create a PR for llvm-test-suite update on Windows, please? |
Failing precommit tests are on Windows because CI drivers for Windows haven't been uplifted yet. @bader do you prefer we disable the sporadic failures (majority of reduction tests) on Windows or wait for the Windows drivers to be there? |
AFAIK, @tfzhu is already updating Windows drivers. I think it should be done soon and we can pass pre-commit for this change. |
/verify with intel/llvm-test-suite#624 |
Signed-off-by: Steffen Larsen <[email protected]>
There was a bug in the reduction implementation where auxiliary resources would initialize with pointers to data kept in another (non-buffer) auxiliary resource but not prevent an unnecessary copy-back. With the leak fixed, the initializer resources would die before the buffer resource it was used to initialize, which would cause the copy-back to copy to unexpected memory. This is fixed by disabling the unnecessary copy-back on affected auxiliary reduction resources. |
/verify with intel/llvm-test-suite#624 |
1 similar comment
/verify with intel/llvm-test-suite#624 |
Test has been fixed. @sergey-semenov could you please have another look? |
@intel/llvm-gatekeepers - This should be ready for merge. |
PR intel/llvm#5653 fixed memory leaks when ZE_DEBUG=4 is set Signed-off-by: Byoungro So <[email protected]>
PR intel/llvm#5653 fixed memory leaks when ZE_DEBUG=4 is set Signed-off-by: Byoungro So <[email protected]>
…e#930) PR intel#5653 fixed memory leaks when ZE_DEBUG=4 is set Signed-off-by: Byoungro So <[email protected]>
Reductions that require additional resources, such as buffers, can currently create a circular dependency between the resources and the commands issued by the reductions. These changes clear up this dependence in a similar way to how streams are transferred by transferring ownership of the resources to the commands and ensuring release when cleaning up the commands.