Skip to content

gh-127350: Add Py_fopen() function #127821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jan 6, 2025
Merged

gh-127350: Add Py_fopen() function #127821

merged 20 commits into from
Jan 6, 2025

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Dec 11, 2024

Rename _Py_fopen_obj() to Py_fopen(). The function now also accepts bytes path on Windows.

Remove the private, undocumented, and untested function _Py_fopen_obj().


📚 Documentation preview 📚: https://cpython-previews--127821.org.readthedocs.build/

Rename _Py_fopen_obj() to Py_fopen(). The function now also accepts
bytes path on Windows.

Remove the private, undocumented, and untested function
_Py_fopen_obj().
@zooba
Copy link
Member

zooba commented Dec 11, 2024

Do we need Py_fclose as well?

@rruuaanng
Copy link
Contributor

Do we need Py_fclose as well?

fclose seems to be enough, we don't need to wrap it(Unless we use CloseHandle to close the file).

@vstinner
Copy link
Member Author

@zooba:

Do we need Py_fclose as well?

Py_fopen() adds values to fopen(): accept a Python object, make the file descriptor non-inheritable, raise an exception on error.

Py_fclose() would just call fclose() without any additional value? I would prefer to not add it unless we have to, and call directly fclose(). I asked the reporter if Py_fclose() would be needed.

To be honest, I don't understand well the Windows DLL issue. From what I understood, to use a FILE* in Python DLL, it should be open in the Python DLL. Otherwise, things can go wrong if the C library version is not exactly the same in two DLLs. But I don't know if it's safe to call fopen() / Py_fopen() in DLL A and call fclose() in DLL B.

@PlanetCNC
Copy link

PyRun_AnyFileExFlags has "closeit" parameter which I always used.
But you are correct. Py_fopen() needs Py_fclose().

Thank you guys for doing all this.

@vstinner
Copy link
Member Author

But you are correct. Py_fopen() needs Py_fclose().

Is Py_fclose() needed because of the DLL issue?

@zooba
Copy link
Member

zooba commented Dec 12, 2024

Is Py_fclose() needed because of the DLL issue?

Yes. When you call Py_fopen(), you will get a FILE * from whichever C runtime Python is linked to, which isn't necessarily the same as your own application. So you can only pass it back to the same C runtime - it must only be passed into Python APIs, essentially. So regular fclose might crash (or close a different file), and you need Py_fclose() to make sure it goes back to the right runtime.

@vstinner
Copy link
Member Author

Ok, I wanted to make sure that I get the rationale correctly.

I added Py_fclose() function and documented that files opened by Py_fopen() must only be closed by Py_fclose().

@zooba
Copy link
Member

zooba commented Dec 13, 2024

Your updated documentation seems a bit snarky. The Py_fclose may be needed on any platform depending on your static linking situation.

I see no reason not to just document it as "should be used to close files opened by Py_fopen" and if people manage to use fclose and it works then good for them.

@vstinner
Copy link
Member Author

Your updated documentation seems a bit snarky. The Py_fclose may be needed on any platform depending on your static linking situation.

It's not my intent to be snarky. I just tried to explain why calling fclose() directly can cause issues in practice.

I see no reason not to just document it as "should be used to close files opened by Py_fopen" and if people manage to use fclose and it works then good for them.

Ok, let's do that.

Co-authored-by: Steve Dower <[email protected]>
@vstinner
Copy link
Member Author

I created capi-workgroup/decisions#51: "Add public Py_fopen() and Py_fclose() functions, and remove private _Py_fopen_obj() function".

@serhiy-storchaka serhiy-storchaka self-requested a review December 31, 2024 15:16
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. I would like to see more tests. The support of the path-like protocol is inconsistent.

wmode, Py_ARRAY_LENGTH(wmode));
if (usize == 0) {
PyErr_SetFromWindowsErr(0);
wchar_t wmode[10];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add tests for mode: NULL, non-decodable (non-ASCII) mode, too long mode (e.g. "rt+, ccs=UTF-8").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for "too long mode" and non-ASCII mode.

Passing NULL does crash. I don't know how to modify the Argument Clinic code to accept NULL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use "z#" to pass arbitrary sequence of bytes or NULL. In Argument Clinic it is expressed as str(zeroes=True, accept={robuffer, str, NoneType}).

Use the NULLABLE macro for path.

If it crashes, just add a comment # CRASHES with specific call as in other tests as indication that we did not miss such case, but cannot test it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it crashes, just add a comment # CRASHES with specific call as in other tests as indication that we did not miss such case, but cannot test it.

Ok, I added a comment.

@kumaraditya303 kumaraditya303 removed their request for review January 3, 2025 10:17
@vstinner vstinner enabled auto-merge (squash) January 6, 2025 11:37
@vstinner vstinner disabled auto-merge January 6, 2025 11:37
@vstinner vstinner enabled auto-merge (squash) January 6, 2025 11:38

# non-ASCII mode failing with "Invalid argument"
with self.assertRaises(OSError):
_testcapi.py_fopen(__file__, "\xe9")
Copy link
Member

@serhiy-storchaka serhiy-storchaka Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"\xe9" is encoded to b'\xc3\xa9'. Please test also with non-UTF-8 bytes. You may get different error on Windows. Actually, it may depend on the locale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to pass non-UTF-8 bytes, PySys_Audit() decodes the mode from UTF-8 in strict mode:

    if (PySys_Audit("open", "Osi", path, mode, 0) < 0) {
        return NULL;
    }

I don't think that it's worth to "support" non-UTF-8 just for the test, whereas it's rejected anyway by fopen().

wmode, Py_ARRAY_LENGTH(wmode));
if (usize == 0) {
PyErr_SetFromWindowsErr(0);
wchar_t wmode[10];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use "z#" to pass arbitrary sequence of bytes or NULL. In Argument Clinic it is expressed as str(zeroes=True, accept={robuffer, str, NoneType}).

Use the NULLABLE macro for path.

If it crashes, just add a comment # CRASHES with specific call as in other tests as indication that we did not miss such case, but cannot test it.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet more test suggestions.

@vstinner vstinner disabled auto-merge January 6, 2025 11:54
@vstinner vstinner enabled auto-merge (squash) January 6, 2025 12:36
@vstinner
Copy link
Member Author

vstinner commented Jan 6, 2025

@serhiy-storchaka: I tried to implement all requested tests, but I skipped some of them for practical reasons.

On Windows, the file mode is limited to 10 characters. If it's a problem later, I can allocate memory on the heap and write more complicated code to handle longer modes. In practice, we should be fine with this arbitrary limit.

@vstinner vstinner merged commit f89e5e2 into python:main Jan 6, 2025
40 of 41 checks passed
@vstinner vstinner deleted the py_fopen branch January 6, 2025 12:43
@vstinner
Copy link
Member Author

vstinner commented Jan 6, 2025

PR merged, thanks for your reviews!

@serhiy-storchaka: You can write a follow-up PR if you want to extend the code coverage (add more tests).

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants