bpo-45274: Fix Thread._wait_for_tstate_lock() race condition #28532

Merged
merged 1 commit into from
Sep 27, 2021
21 changes: 17 additions & 4 deletions Lib/threading.py
@@ -1094,11 +1094,24 @@ def _wait_for_tstate_lock(self, block=True, timeout=-1):
         # If the lock is acquired, the C code is done, and self._stop() is
         # called.  That sets ._is_stopped to True, and ._tstate_lock to None.
         lock = self._tstate_lock
-        if lock is None:  # already determined that the C code is done
+        if lock is None:
+            # already determined that the C code is done
             assert self._is_stopped
-        elif lock.acquire(block, timeout):
-            lock.release()
-            self._stop()
+            return
+
+        try:
+            if lock.acquire(block, timeout):
+                lock.release()
+                self._stop()
+        except:
+            if lock.locked():
+                # bpo-45274: lock.acquire() acquired the lock, but the function
+                # was interrupted with an exception before reaching the
+                # lock.release(). It can happen if a signal handler raises an
+                # exception, like CTRL+C which raises KeyboardInterrupt.
+                lock.release()
Member Author

I don't think this code is 100% reliable if lock.release() gets interrupted by a second exception (e.g. one raised by a second signal: fatality!). Maybe a context manager could be used. But I'm exhausted; I will think about this code after a good night's sleep :-)

Member

I believe we've still got bugs in the context manager exit implementation, where even a `with` block can miss something in this scenario and not call __exit__: https://bugs.python.org/issue29988
This came up during the dev sprint in 2017 when working on the interpreter loop.

So this PR, while not technically a fix, is at least an improvement and should help the single Ctrl-C KeyboardInterrupt case.
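
The remaining gap discussed in this thread can be sketched with a simulated lock. This is only an illustration under stated assumptions: `DoubleInterruptLock` and `wait_patched` are hypothetical names, not part of `threading`, and the two `raise KeyboardInterrupt` statements stand in for two signals whose handlers raise. The function mirrors the shape of the patched code without `self._stop()`.

```python
import threading


class DoubleInterruptLock:
    """Hypothetical lock used only for this illustration.

    acquire() takes the lock and is then "interrupted"; the first call to
    release() is "interrupted" before the lock is actually released.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._release_calls = 0

    def acquire(self, block=True, timeout=-1):
        self._lock.acquire(block, timeout)
        raise KeyboardInterrupt  # first simulated signal, lock is now held

    def release(self):
        self._release_calls += 1
        if self._release_calls == 1:
            raise KeyboardInterrupt  # second simulated signal, lock still held
        self._lock.release()

    def locked(self):
        return self._lock.locked()


def wait_patched(lock):
    # Same shape as the patched code; BaseException is what a bare
    # "except:" catches.
    try:
        if lock.acquire():
            lock.release()
    except BaseException:
        if lock.locked():
            lock.release()  # interrupted by the second simulated signal
        raise


lock = DoubleInterruptLock()
try:
    wait_patched(lock)
except KeyboardInterrupt:
    pass

still_locked = lock.locked()
print(still_locked)  # prints: True
```

In this simulation the cleanup itself is interrupted, so the lock stays held. A single interruption is handled; a second one arriving during the recovery path is not.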

Member Author

The whole function should be reimplemented in C to better control how signals are handled.

In pure Python, I don't think it's possible to fully handle any possible exception at any line number.

@serhiy-storchaka proposed rewriting acquire()+release() in C to make sure that at least the lock remains consistent: https://bugs.python.org/issue45274#msg402532. That approach would not handle exceptions in the _stop() method, though.

Member Author

But I only wanted to enhance the code; I'm not interested in rewriting threading.Thread in C.

+                self._stop()
+            raise
 
     @property
     def name(self):
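
As a rough illustration of what this hunk changes, the sketch below simulates the race deterministically. `InterruptingLock`, `wait_old`, `wait_new`, and `run` are hypothetical names invented for the example, not part of `threading`. The wrapper raises `KeyboardInterrupt` right after acquiring, standing in for a signal handler that fires between `lock.acquire()` and `lock.release()`. The `self._stop()` call and the rest of the `Thread` state are left out.

```python
import threading


class InterruptingLock:
    """Hypothetical wrapper used only for this illustration.

    acquire() really takes the underlying lock, then raises, mimicking a
    signal handler that raises KeyboardInterrupt just after the acquire.
    """

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self, block=True, timeout=-1):
        self._lock.acquire(block, timeout)
        raise KeyboardInterrupt  # simulated Ctrl+C before release()

    def release(self):
        self._lock.release()

    def locked(self):
        return self._lock.locked()


def wait_old(lock):
    # Pre-patch shape: nothing cleans up if acquire() is interrupted.
    if lock.acquire():
        lock.release()


def wait_new(lock):
    # Post-patch shape: release the lock if the exception left it held.
    # BaseException is what a bare "except:" catches.
    try:
        if lock.acquire():
            lock.release()
    except BaseException:
        if lock.locked():
            lock.release()
        raise


def run(wait):
    """Call wait() on a fresh lock and report whether it was left held."""
    lock = InterruptingLock()
    try:
        wait(lock)
    except KeyboardInterrupt:
        pass
    return lock.locked()


old_left_locked = run(wait_old)
new_left_locked = run(wait_new)
print(old_left_locked, new_left_locked)  # prints: True False
```

With the pre-patch shape the lock stays held after the interrupt, which is the state that could later deadlock. With the patched shape it is released before the exception propagates.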
@@ -0,0 +1,5 @@
+Fix a race condition in the :meth:`Thread.join() <threading.Thread.join>`
+method of the :mod:`threading` module. If the function is interrupted by a
+signal and the signal handler raises an exception, make sure that the thread
+remains in a consistent state to prevent a deadlock. Patch by Victor
+Stinner.