-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
bpo-31530: fix crash when multiple threads iterate over a file, round 2 #5060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… file on multiple threads. (#3672)"
Lib/test/test_file2k.py
Outdated
|
||
def test_iteration_seek(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this test is removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this test no longer raises any exceptions.
} | ||
if ((f->f_buf = (char *)PyMem_Malloc(bufsize)) == NULL) { | ||
if ((rab->buf = PyMem_MALLOC(bufsize)) == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is changing PyMem_Malloc to PyMem_MALLOC required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but I prefer them over the functions.
cache the file buffer state locally and only set it back on the file | ||
object when we're done. | ||
*/ | ||
readaheadbuffer rab = {f->f_buf, f->f_bufptr, f->f_bufend}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop_readaheadbuffer() frees rab->buf
which is f->f_buf
. If two threads call drop_readaheadbuffer() simultaneously f->f_buf
will be freed twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other threads can't access f->f_buf
because it's set to NULL
for the duration of the method.
@@ -0,0 +1 @@ | |||
Prevent crashes when multiple threads iterate of a file at once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "iterate of" correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, actually this file is supposed to be gone.
…s, round 2 Multiple threads iterating over a file can corrupt the file's internal readahead buffer resulting in crashes. To fix this, cache buffer state thread-locally for the duration of a file_iternext call and only update the file's internal state after reading completes. No attempt is made to define or provide "reasonable" semantics for iterating over a file on multiple threads. (Non-crashing) races are still present. Duplicated, corrupt, and missing data will happen. This was originally fixed by 6401e56, which raised an exception from seek() and next() when concurrent operations were detected. Alas, this simpler solution breaks legitimate use cases such as capturing the standard streams when multiple threads are logging.
0ddd5db
to
e2dcd38
Compare
This resurrects #3670 and mostly reverts 6401e56. See https://bugs.python.org/msg309265 for why.
https://bugs.python.org/issue31530