-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-37412: os.getcwdb() now uses UTF-8 on Windows #14396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Doc/library/os.rst
Outdated
@@ -1730,6 +1730,11 @@ features: | |||
|
|||
Return a bytestring representing the current working directory. | |||
|
|||
.. versionchanged:: 3.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not fix it in 3.8 as well? It's not a new feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a backward incompatible change.
The os.getcwdb() function now uses the UTF-8 encoding on Windows, rather than the ANSI code page: see PEP 529 for the rationale. The function is no longer deprecated on Windows. os.getcwd() and os.getcwdb() detect integer overflow on memory allocations. On Unix, these functions properly report MemoryError on memory allocation failure.
I retargeted my PR 14396 to Python 3.8.
Sorry, I was confused about 3.8 status. It's not released yet, I'm fine with targeting 3.8. It's better to get this bug fixed ASAP. |
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
GH-14399 is a backport of this pull request to the 3.8 branch. |
The os.getcwdb() function now uses the UTF-8 encoding on Windows, rather than the ANSI code page: see PEP 529 for the rationale. The function is no longer deprecated on Windows. os.getcwd() and os.getcwdb() now detect integer overflow on memory allocations. On Unix, these functions properly report MemoryError on memory allocation failure. (cherry picked from commit 689830e) Co-authored-by: Victor Stinner <[email protected]>
The os.getcwdb() function now uses the UTF-8 encoding on Windows, rather than the ANSI code page: see PEP 529 for the rationale. The function is no longer deprecated on Windows. os.getcwd() and os.getcwdb() now detect integer overflow on memory allocations. On Unix, these functions properly report MemoryError on memory allocation failure. (cherry picked from commit 689830e) Co-authored-by: Victor Stinner <[email protected]>
terminating \0. If the buffer is too small, len includes | ||
the space needed for the terminator. */ | ||
if (len >= Py_ARRAY_LENGTH(wbuf)) { | ||
if (len >= PY_SSIZE_T_MAX / sizeof(wchar_t)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyMem_RawMalloc
already checks this and returns NULL
if size > (size_t)PY_SSIZE_T_MAX
. Also, >=
is the wrong comparison operator, so this is causing a MemoryError
on long paths:
>>> p = 'C:/Temp/longpath' + ('/' + 'a' * 255) * 9
>>> os.chdir(p)
>>> len(os.getcwd())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX.
In CPython, we attempt to avoid undefined behaviors (UB). The C language does not define what happens in case of integer overflow. Maybe PyMem_RawMalloc() check will be enough, maybe not. In case of doubt, we prefer to check before the multiply.
I wrote PR #14424 to fix my regression and add a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I didn't think that through clearly. In this case it's academic, though. In practice the current directory is limited to 32,767 characters since the kernel uses counted strings with a USHORT
length in bytes.
The os.getcwdb() function now uses the UTF-8 encoding on Windows, rather than the ANSI code page: see PEP 529 for the rationale. The function is no longer deprecated on Windows. os.getcwd() and os.getcwdb() now detect integer overflow on memory allocations. On Unix, these functions properly report MemoryError on memory allocation failure.
The os.getcwdb() function now uses the UTF-8 encoding on Windows, rather than the ANSI code page: see PEP 529 for the rationale. The function is no longer deprecated on Windows. os.getcwd() and os.getcwdb() now detect integer overflow on memory allocations. On Unix, these functions properly report MemoryError on memory allocation failure.
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.
os.getcwd() and os.getcwdb() detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
https://bugs.python.org/issue37412