bpo-37412: os.getcwdb() now uses UTF-8 on Windows #14396

vstinner · 2019-06-26T14:32:19Z

The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.
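
A minimal sketch of what this change means for callers (not taken from the PR): the bytes returned by os.getcwdb() should decode back to os.getcwd() through os.fsdecode(), which uses UTF-8 on Windows under PEP 529 unless the legacy filesystem encoding mode is enabled. The helper name `cwd_roundtrips` is hypothetical and exists only for illustration.

```python
import os
import sys


def cwd_roundtrips() -> bool:
    """Return True if os.getcwdb() decodes back to os.getcwd().

    os.fsdecode() uses the filesystem encoding: UTF-8 on Windows
    (PEP 529), and the locale-derived encoding on other platforms.
    """
    return os.fsdecode(os.getcwdb()) == os.getcwd()


if __name__ == "__main__":
    # Show the filesystem encoding in use and whether the round trip holds.
    print(sys.getfilesystemencoding(), cwd_roundtrips())
```

On Windows before this change, the same comparison could fail for a directory name containing characters outside the ANSI code page.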

os.getcwd() and os.getcwdb() detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.

https://bugs.python.org/issue37412

zooba · 2019-06-26T14:50:38Z

Doc/library/os.rst

@@ -1730,6 +1730,11 @@ features:

   Return a bytestring representing the current working directory.

+   .. versionchanged:: 3.9


Why not fix it in 3.8 as well? It's not a new feature

It's a backward incompatible change.


vstinner · 2019-06-26T15:03:57Z

I retargeted my PR 14396 to Python 3.8.

Why not fix it in 3.8 as well? It's not a new feature

Sorry, I was confused about the status of 3.8. It's not released yet, so I'm fine with targeting 3.8. It's better to get this bug fixed ASAP.

miss-islington · 2019-06-26T15:31:25Z

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

bedevere-bot · 2019-06-26T15:31:51Z

GH-14399 is a backport of this pull request to the 3.8 branch.

(cherry picked from commit 689830e)
Co-authored-by: Victor Stinner

eryksun · 2019-06-27T01:05:25Z

Modules/posixmodule.c

+       terminating \0. If the buffer is too small, len includes
+       the space needed for the terminator. */
+    if (len >= Py_ARRAY_LENGTH(wbuf)) {
+        if (len >= PY_SSIZE_T_MAX / sizeof(wchar_t)) {


PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX. Also, >= is the wrong comparison operator, so this is causing a MemoryError on long paths:

>>> p = 'C:/Temp/longpath' + ('/' + 'a' * 255) * 9
>>> os.chdir(p)
>>> len(os.getcwd())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX.

In CPython, we try to avoid undefined behavior (UB). The C language does not define what happens in case of integer overflow. Maybe the PyMem_RawMalloc() check will be enough, maybe not. When in doubt, we prefer to check before the multiplication.

I wrote PR #14424 to fix my regression and add a test for it.
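
The "check before the multiply" idea discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from this PR or from PR #14424: `checked_alloc_size` is a hypothetical helper, `sys.maxsize` stands in for PY_SSIZE_T_MAX, and `SIZEOF_WCHAR_T = 2` assumes the Windows size of wchar_t.

```python
import sys

# Stand-ins for the C constants (assumptions for illustration only).
PY_SSIZE_T_MAX = sys.maxsize  # sys.maxsize is the largest Py_ssize_t value
SIZEOF_WCHAR_T = 2            # wchar_t is 2 bytes on Windows


def checked_alloc_size(length: int) -> int:
    """Return length * SIZEOF_WCHAR_T, or raise MemoryError.

    The guard compares against a quotient, so in C the multiplication
    would only be performed when it cannot overflow. Note the direction
    of the comparison: the allocation is refused only when the length
    is too LARGE.
    """
    if length > PY_SSIZE_T_MAX // SIZEOF_WCHAR_T:
        raise MemoryError("allocation size would overflow")
    return length * SIZEOF_WCHAR_T
```

With this shape, an ordinary long path (a few thousand characters) passes the guard and only an absurdly large length is rejected.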

Right, I didn't think that through clearly. In this case it's academic, though. In practice the current directory is limited to 32,767 characters since the kernel uses counted strings with a USHORT length in bytes.
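
The 32,767 figure follows from simple arithmetic, sketched here for reference. The constant names are hypothetical, and the 2-byte character size assumes UTF-16 code units as used by Windows.

```python
import sys

USHORT_MAX = 0xFFFF   # largest value a 16-bit unsigned byte count can hold
SIZEOF_WCHAR_T = 2    # bytes per UTF-16 code unit on Windows

# A byte count capped at USHORT_MAX allows this many 2-byte characters.
max_chars = USHORT_MAX // SIZEOF_WCHAR_T
print(max_chars)  # 32767

# The corresponding buffer size is nowhere near the Py_ssize_t limit.
print(max_chars * SIZEOF_WCHAR_T < sys.maxsize)  # True
```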


the-knights-who-say-ni added the CLA signed label Jun 26, 2019

bedevere-bot added the awaiting core review label Jun 26, 2019

vstinner requested review from zooba and methane June 26, 2019 14:32

zooba approved these changes Jun 26, 2019


bedevere-bot added awaiting merge and removed awaiting core review labels Jun 26, 2019

vstinner merged commit 689830e into python:master Jun 26, 2019

bedevere-bot removed the awaiting merge label Jun 26, 2019

vstinner deleted the getcwdb_windows branch June 26, 2019 15:31

vstinner added the needs backport to 3.8 label Jun 26, 2019

bedevere-bot removed the needs backport to 3.8 label Jun 26, 2019

eryksun reviewed Jun 27, 2019
