Skip to content

[SYCL] Fix deadlock in ProgramManager class #2131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 28, 2020
Merged

[SYCL] Fix deadlock in ProgramManager class #2131

merged 10 commits into from
Jul 28, 2020

Conversation

dm-vodopyanov
Copy link
Contributor

This patch fixes deadlock in sycl::detail::ProgramManager class caused by data race in usage of KernelProgramCache::MBuildCV condition variable.
Even if BuildResult->State.load() is atomic, it must be modified under the mutex. The deadlock happens in situation when notification notify_all() is sent while MBuildCV is in the wait expression BUT not in the waiting state. The result is - the notification is lost. After that the thread goes to the waiting state and sleeps forever.
BuildResult is protected by mutex now, so the notification will be sent only when the other thread in the waiting state.

This patch fixes deadlock in sycl::detail::ProgramManager class caused
by data race in usage of KernelProgramCache::MBuildCV condition
variable.
Even if BuildResult->State.load() is atomic, it must be modified under
the mutex. The deadlock happends in situation when notification
notify_all() is sent while MBuildCV is in the wait expression BUT not
in the waiting state. The result is - the notification is lost. After
that the thread goes to the waiting state and sleeps forever.
BuildResult is protected by mutex now, so the notification will be
sent only when the other thread in the waiting state.
@dm-vodopyanov dm-vodopyanov requested review from kbobrovs and a team as code owners July 16, 2020 19:21
@dm-vodopyanov dm-vodopyanov requested a review from rbegam July 16, 2020 19:21
@dm-vodopyanov
Copy link
Contributor Author

@kbobrovs, @rbegam, ping.

Copy link
Contributor

@s-kanaev s-kanaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.
Though, a couple of comments to be resolved one way or the other.

kbobrovs
kbobrovs previously approved these changes Jul 27, 2020
Sergey Kanaev added 5 commits July 27, 2020 13:43
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
Signed-off-by: Sergey Kanaev <[email protected]>
@s-kanaev s-kanaev requested review from kbobrovs and a team July 27, 2020 13:25
Signed-off-by: Sergey Kanaev <[email protected]>
@s-kanaev
Copy link
Contributor

/summary:run

Copy link
Contributor

@rbegam rbegam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit a9c776f into intel:sycl Jul 28, 2020
jsji pushed a commit that referenced this pull request Aug 31, 2023
This is a bulk conversion done using the migration script.

There are more extension tests to be converted; however they will need
manual fixups, so leave them out of this bulk conversion.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@e6eae86
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
[EXP][CMDBUF] Make command handle behaviour consistent
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants