Skip to content

bpo-41299: Fix EnterNonRecursiveMutex() #28548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 24, 2021
Merged

bpo-41299: Fix EnterNonRecursiveMutex() #28548

merged 1 commit into from
Sep 24, 2021

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Sep 24, 2021

Remove Py_FatalError() call: the code works even if now is negative.

https://bugs.python.org/issue41299

Remove Py_FatalError() call: the code works even if now is negative.
@vstinner
Copy link
Member Author

If _PyTime_GetPerfCounter() always return 0, PyCOND_TIMEDWAIT() (WaitForSingleObjectEx()) is called once, and then then the loop stops. I don't think that EnterNonRecursiveMutex() is the right place to check for _PyTime_GetPerfCounter() failure: check if QueryPerformanceFrequency() failed. We should ignore such error silently in EnterNonRecursiveMutex().

@lunixbochs
Copy link
Contributor

lunixbochs commented Sep 24, 2021

Thanks for taking this on, it's a much better approach than the patch I had in mind. Note this specific patch needs a fix or it will hang (never timeout) when now=0, which is why I had the original check. I think your described behavior is way better than a fatal exit, and it's very nice that we can avoid the PyTime startup checks as well.

It's a quick fix to remove the hang and still behave as you describe. As _PyTime_t is signed, if now <= 0, we can set target to _PyTime_MIN to force the loop to terminate after the first WaitForSingleObjectEx

Fix for hang that preserves your desired behavior:

diff --git a/Python/thread_nt.h b/Python/thread_nt.h
index e9dc880115..086b46cf49 100644
--- a/Python/thread_nt.h
+++ b/Python/thread_nt.h
@@ -78,7 +78,7 @@ EnterNonRecursiveMutex(PNRMUTEX mutex, DWORD milliseconds)
         /* wait at least until the target */
         _PyTime_t now = _PyTime_GetPerfCounter();
         _PyTime_t nanoseconds = _PyTime_FromNanoseconds((_PyTime_t)milliseconds * 1000000);
-        _PyTime_t target = now + nanoseconds;
+        _PyTime_t target = (now > 0) ? (now + nanoseconds) : (_PyTime_MIN);
         while (mutex->locked) {
             _PyTime_t microseconds = _PyTime_AsMicroseconds(nanoseconds, _PyTime_ROUND_TIMEOUT);
             if (PyCOND_TIMEDWAIT(&mutex->cv, &mutex->cs, microseconds) < 0) {

Test scenario for the hang:

diff --git a/Python/thread_nt.h b/Python/thread_nt.h
index e9dc880115..7480fe3ced 100644
--- a/Python/thread_nt.h
+++ b/Python/thread_nt.h
@@ -76,19 +76,25 @@ EnterNonRecursiveMutex(PNRMUTEX mutex, DWORD milliseconds)
         }
     } else if (milliseconds != 0) {
         /* wait at least until the target */
-        _PyTime_t now = _PyTime_GetPerfCounter();
+        _PyTime_t now = 0;
         _PyTime_t nanoseconds = _PyTime_FromNanoseconds((_PyTime_t)milliseconds * 1000000);
         _PyTime_t target = now + nanoseconds;
+        printf("now=%lld nanos=%lld target=%lld\n", now, nanoseconds, target);
+        int64_t counter = 0;
         while (mutex->locked) {
             _PyTime_t microseconds = _PyTime_AsMicroseconds(nanoseconds, _PyTime_ROUND_TIMEOUT);
             if (PyCOND_TIMEDWAIT(&mutex->cv, &mutex->cs, microseconds) < 0) {
+                printf("wait failed\n");
                 result = WAIT_FAILED;
                 break;
             }
-            now = _PyTime_GetPerfCounter();
-            if (target <= now)
+            now = 0;
+            if (target <= now) {
+                printf("target < now: %lld, %lld\n", target, now);
                 break;
+            }
             nanoseconds = target - now;
+            printf("loop %lld, nanoseconds=%lld\n", counter++, nanoseconds);
         }
     }
     if (!mutex->locked) {
>>> import threading
>>> lock = threading.Lock()
>>> lock.acquire()
True
>>> lock.acquire(timeout=0.1)
now=0 nanos=100000000 target=100000000
loop 0, nanoseconds=100000000
loop 1, nanoseconds=100000000
loop 2, nanoseconds=100000000
loop 3, nanoseconds=100000000
loop 4, nanoseconds=100000000
loop 5, nanoseconds=100000000

After my target = _PyTime_MIN fix:

>>> import threading
>>> lock = threading.Lock()
>>> lock.acquire()
True
>>> lock.acquire(timeout=0.1)
now=0 nanos=100000000 target=0
target < now: -9223372036854775808, 0
False
>>>

@vstinner
Copy link
Member Author

Since Windows XP, QueryPerformanceFrequency() cannot fail. It's writen in the Windows documentation:
https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency

On systems that run Windows XP or later, the function will always succeed and will thus never return zero.

On my Windows 10, QueryPerformanceFrequency() now returns a fixed value of 10,000,000 (10 MHz).

@lunixbochs
Copy link
Contributor

lunixbochs commented Sep 24, 2021

If _PyTime_GetPerfCounter() always return 0, PyCOND_TIMEDWAIT() (WaitForSingleObjectEx()) is called once, and then the loop stops

I'm referring to the above statement, which is incorrect with your patch, and is easily fixed by my suggestion to clamp the target time to _PyTime_MIN if now <= 0.

If it's definitely impossible for the QPC API to ever fail, I suggest simplifying the underlying QPC code so that's more obvious. Otherwise the time API has (probably) dead code that returns 0, and this code both doesn't handle that and doesn't have any comments about the assumptions made.

@vstinner
Copy link
Member Author

I created #28552 to make it more explicit.

@vstinner vstinner merged commit d639e31 into python:main Sep 24, 2021
@vstinner vstinner deleted the mutex_timeout branch September 24, 2021 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants