-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-44289: Keep argument file object's current position in tarfile.is_tarfile #26488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_tarifle function.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). Recognized GitHub usernameWe couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames: This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it! |
@amateja hi, you need to sign the CLA.. Other than that, I've looked at the patch and it looks good to me. I've also confirmed that the fix does what it's supposed to by running test with & w/out the fix. Since this is a (minor) backwards compatibility change, I'm not sure if it needs to also be added to "what's new", I hope @ethanfurman can comment. |
Hi @akulakov, thanks for reviewing. I believe I've already signed CLA. I confirmed that with https://check-python-cla.herokuapp.com/. Regarding "what's new" I entirely base on your judgment. Just let me know what should be done to unblock this change from being accepted. |
@amateja You are right, I somehow missed the CLA accepted message. You're good as far as that goes :-) |
I looked at it again and I think this should be fixed in Line 1616 in 9fe0de2
So the fix would be (a few lines above the linked line; I tested it & it works):
|
Note also that the fix I suggested covers more issues: it also covers calling
|
Here are my further suggestions:
Do you want me to write a test for calling |
Thanks for making the changes! I will do a bit of testing tomorrow and look over it again; I think it's enough to have the test that checks that open keeps the same position as you have it, because both bugs are effectively covered by this test. I will try to post an update tomorrow or in the next 2 days. |
I did some profiling to make sure an extra seek doesn't affect performance, I did it by opening 1000 small tar files; here's what I got: with patch: 415, 417, 418 msec per loop As expected, it doesn't make noticeable difference.
|
Please update the PR title and the news entry. For the unit tests, keep both of them for now. A core developer will need to review the PR, so I would wait for them to make this call. |
I'm not a bit concerned that users may be relying on the initial seek of |
I do think it's plausible that users may be relying on If users are relying on multiple opens to seek further, they are relying on buggy behaviour and IMO it's ok to fix that. Note that when open fails, we still always want to restore original position. It's only on success that special casing of 512 position comes into play. I did some testing and the following logic works:
It's a bit ugly to hardcode the initial seek position like this, it may be someone more familiar with module internals will suggest a better fix. |
Andrzej: I have some spare time right now, if you prefer I can finish the PR and list you as a co-author (because I would have to make it into a new PR). If you are still planning to finish this, no rush. |
I'm more than happy to work further on this case. I thought we are waiting for someone more familiar with tar internals to comment on your proposal. |
Hi @akulakov, I've read some GNU tar manual to understand your proposal better and followed your suggestion with some cosmetic tuning. Your proposal if fileobj is not None:
saved_pos = fileobj.tell()
try:
if fileobj is not None and saved_pos == BLOCKSIZE:
fileobj.seek(0)
return func(name, "r", fileobj, **kwargs) and mine if fileobj is not None:
saved_pos = fileobj.tell()
if saved_pos == BLOCKSIZE:
fileobj.seek(0)
try:
return func(name, "r", fileobj, **kwargs) Your fix is just great for uncompressed archives but it does not work for any supported compression methods. Each compression method changes Following case brought me here: >>> import tarfile
>>>
>>> with open('spam', 'rb') as fileobj:
... # here fileobj.tell() == 0
... if tarfile.is_tarfile(fileobj):
... # here fileobj.tell() != 0
... archive = tarfile.open(fileobj=fileobj)
... # therefore listing archive fails
... print(archive.getmembers())
...
[] and I have some gut feeling that we should not modify Please use my hacky test method for def test_is_tarfile_affects_getmembers(self):
with tarfile.open(name=self.tarname) as f:
refer = len(f.getmembers())
with open(self.tarname, "rb") as fileobj:
self.assertTrue(tarfile.is_tarfile(fileobj))
pos = fileobj.tell()
with tarfile.open(fileobj=fileobj) as tar:
num_files = len(tar.getmembers())
self.assertEqual(num_files, refer, pos) I'm open for some other proposals and really looking forward to your feedback. |
Andrzej: good catch about compressed files, my fix won't work for them. I think it's not clear if breaking backwards compatibility here is worth it. One other idea we can explore is setting the saved position on a fileobj and then checking for it on What do you think? |
Andrzej: can you also check if on subsequent If so, we can do logic like: |
Hi Andrei, Let me start with answering your question. It turns out that calling subsequently >>> import tarfile
>>> name = 'spam.tar.gz'
>>> f = open(name, 'rb')
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b8d4fd0>
>>> f.tell()
8202
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b9eb0a0>
>>> f.tell()
8714
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b8d4fd0>
>>> f.tell()
9226
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7bb86d60>
>>> f.tell()
9738
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b9eb220>
>>> f.tell()
10250
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b9d4fa0>
>>> f.tell()
10762
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b8d4fd0>
>>> f.tell()
11274
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7bb86d60>
>>> f.tell()
11786
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7b9eb0a0>
>>> f.tell()
12298
>>> tarfile.open(fileobj=f)
<tarfile.TarFile object at 0x7fae7bb86ac0>
>>> f.tell()
12810 So for now we have few options here:
I like option 4 and option 2 is not too shabby either. I would also back option 3 up if there is no other way and I'm not very fond of option 1 in it's current state. |
Oh wait... I did some more math. First |
In my testing the pointer doesn't always advance, with two tiny text files, gzip format, it advanced from 0 to 202 but with subsequent One additional option we have is to wait for more user feedback and for somebody to come up with a better fix. I will think more about this later tonight / tomorrow. |
Andrzej: after some discussion on discord, I think the option 2) (your original solution), is the best way to address this. |
c94f343
to
bdb1ff9
Compare
Aye aye. Please find recent commit reverted. |
@@ -0,0 +1 @@ | |||
Keep argument file object's current position in tarfile.is_tarifle method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
News entry can be more descriptive, so that users can easily tell if they need to update their codebase;
something along these lines:
Fix an issue with :meth:~tarfile.is_tarfile
method when using fileobj argument: position
in the fileobj was advanced forward which made it unreadable with :meth:tarfile.TarFile.open
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to come up with something for new entry but your version is just right. I hope you don't mind me using it, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, go ahead!
@@ -0,0 +1 @@ | |||
Fix an issue with :meth:`~tarfile.is_tarfile` method when using _fileobj_ argument: position in the _fileobj_ was advanced forward which made it unreadable with :meth:`tarfile.TarFile.open`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In documentation, bold highlight (with asterisks) is generally used for parameter names, for example see https://github.com/python/cpython/edit/main/Doc/whatsnew/3.10.rst line 913
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thanks. I've also noticed that here and there monospace is used and I'm not sure why.
https://github.com/python/cpython/blame/main/Doc/whatsnew/3.10.rst#L994
https://github.com/python/cpython/blame/main/Doc/whatsnew/3.10.rst#L1015
Fixed already. Happy to follow guidelines.
Aside from one small nitpick in the news file, LGTM! @gvanrossum As you looked at it a couple of days ago, maybe you can review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is what I expected it would look like. Thanks for the thorough tests! I will merge this now.
(Whoops, let's run the tests before merging rashly. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the PR title has a typo which should be fixed before merging.
(And the tests need to run one more time.) |
Many thanks to @akulakov for going together through this issue with me. Thank you @gvanrossum for reviewing and @kumaraditya303 for approving :) Looking forward to working with you again. |
Andrzej: you're very welcome, thanks for contributing! |
Since Python 3.9 tarfile.is_tarfile accepts not only paths but also files and file-like objects #18090.
Verification if a file or file-like object is a tar file modifies file object's current position.
https://bugs.python.org/issue44289
https://bugs.python.org/issue44289