Skip to content

bpo-37412: os.getcwdb() now uses UTF-8 on Windows #14396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 26, 2019
Merged

bpo-37412: os.getcwdb() now uses UTF-8 on Windows #14396

merged 1 commit into from
Jun 26, 2019

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jun 26, 2019

The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.

os.getcwd() and os.getcwdb() detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.

https://bugs.python.org/issue37412

@@ -1730,6 +1730,11 @@ features:

Return a bytestring representing the current working directory.

.. versionchanged:: 3.9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not fix it in 3.8 as well? It's not a new feature

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a backward incompatible change.

The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.

os.getcwd() and os.getcwdb() detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
@vstinner
Copy link
Member Author

I retargeted my PR 14396 to Python 3.8.

Why not fix it in 3.8 as well? It's not a new feature

Sorry, I was confused about 3.8 status. It's not released yet, I'm fine with targeting 3.8. It's better to get this bug fixed ASAP.

@vstinner vstinner merged commit 689830e into python:master Jun 26, 2019
@vstinner vstinner deleted the getcwdb_windows branch June 26, 2019 15:31
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-14399 is a backport of this pull request to the 3.8 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 26, 2019
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.

os.getcwd() and os.getcwdb() now detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
(cherry picked from commit 689830e)

Co-authored-by: Victor Stinner <[email protected]>
miss-islington added a commit that referenced this pull request Jun 26, 2019
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.

os.getcwd() and os.getcwdb() now detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
(cherry picked from commit 689830e)

Co-authored-by: Victor Stinner <[email protected]>
terminating \0. If the buffer is too small, len includes
the space needed for the terminator. */
if (len >= Py_ARRAY_LENGTH(wbuf)) {
if (len >= PY_SSIZE_T_MAX / sizeof(wchar_t)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX. Also, >= is the wrong comparison operator, so this is causing a MemoryError on long paths:

>>> p = 'C:/Temp/longpath' + ('/' + 'a' * 255) * 9
>>> os.chdir(p)
>>> len(os.getcwd())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX.

In CPython, we attempt to avoid undefined behaviors (UB). The C language does not define what happens in case of integer overflow. Maybe PyMem_RawMalloc() check will be enough, maybe not. In case of doubt, we prefer to check before the multiply.

I wrote PR #14424 to fix my regression and add a test for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I didn't think that through clearly. In this case it's academic, though. In practice the current directory is limited to 32,767 characters since the kernel uses counted strings with a USHORT length in bytes.

lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.

os.getcwd() and os.getcwdb() now detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.

os.getcwd() and os.getcwdb() now detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants