Skip to content

bpo-37412: os.getcwdb() now uses UTF-8 on Windows #14396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 26, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Doc/library/os.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1730,6 +1730,11 @@ features:

Return a bytestring representing the current working directory.

.. versionchanged:: 3.8
The function now uses the UTF-8 encoding on Windows, rather than the ANSI
code page: see :pep:`529` for the rationale. The function is no longer
deprecated on Windows.


.. function:: lchflags(path, flags)

Expand Down
5 changes: 5 additions & 0 deletions Doc/whatsnew/3.8.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1231,6 +1231,11 @@ Changes in Python behavior
Changes in the Python API
-------------------------

* The :func:`os.getcwdb` function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see :pep:`529` for the rationale. The
function is no longer deprecated on Windows.
(Contributed by Victor Stinner in :issue:`37412`.)

* :class:`subprocess.Popen` can now use :func:`os.posix_spawn` in some cases
for better performance. On Windows Subsystem for Linux and QEMU User
Emulation, Popen constructor using :func:`os.posix_spawn` no longer raise an
Expand Down
11 changes: 11 additions & 0 deletions Lib/test/test_os.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,17 @@ def create_file(filename, content=b'content'):
fp.write(content)


class MiscTests(unittest.TestCase):
def test_getcwd(self):
cwd = os.getcwd()
self.assertIsInstance(cwd, str)

def test_getcwdb(self):
cwd = os.getcwdb()
self.assertIsInstance(cwd, bytes)
self.assertEqual(os.fsdecode(cwd), os.getcwd())


# Tests creating TESTFN
class FileTests(unittest.TestCase):
def setUp(self):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
The :func:`os.getcwdb` function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see :pep:`529` for the rationale. The function
is no longer deprecated on Windows.
123 changes: 64 additions & 59 deletions Modules/posixmodule.c
Original file line number Diff line number Diff line change
Expand Up @@ -506,17 +506,6 @@ void _Py_attribute_data_to_stat(BY_HANDLE_FILE_INFORMATION *,
ULONG, struct _Py_stat_struct *);
#endif

#ifdef MS_WINDOWS
static int
win32_warn_bytes_api()
{
return PyErr_WarnEx(PyExc_DeprecationWarning,
"The Windows bytes API has been deprecated, "
"use Unicode filenames instead",
1);
}
#endif


#ifndef MS_WINDOWS
PyObject *
Expand Down Expand Up @@ -3334,83 +3323,99 @@ os_lchown_impl(PyObject *module, path_t *path, uid_t uid, gid_t gid)
static PyObject *
posix_getcwd(int use_bytes)
{
char *buf, *tmpbuf;
char *cwd;
const size_t chunk = 1024;
size_t buflen = 0;
PyObject *obj;

#ifdef MS_WINDOWS
if (!use_bytes) {
wchar_t wbuf[MAXPATHLEN];
wchar_t *wbuf2 = wbuf;
PyObject *resobj;
DWORD len;
Py_BEGIN_ALLOW_THREADS
len = GetCurrentDirectoryW(Py_ARRAY_LENGTH(wbuf), wbuf);
/* If the buffer is large enough, len does not include the
terminating \0. If the buffer is too small, len includes
the space needed for the terminator. */
if (len >= Py_ARRAY_LENGTH(wbuf)) {
wchar_t wbuf[MAXPATHLEN];
wchar_t *wbuf2 = wbuf;
DWORD len;

Py_BEGIN_ALLOW_THREADS
len = GetCurrentDirectoryW(Py_ARRAY_LENGTH(wbuf), wbuf);
/* If the buffer is large enough, len does not include the
terminating \0. If the buffer is too small, len includes
the space needed for the terminator. */
if (len >= Py_ARRAY_LENGTH(wbuf)) {
if (len >= PY_SSIZE_T_MAX / sizeof(wchar_t)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX. Also, >= is the wrong comparison operator, so this is causing a MemoryError on long paths:

>>> p = 'C:/Temp/longpath' + ('/' + 'a' * 255) * 9
>>> os.chdir(p)
>>> len(os.getcwd())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_RawMalloc already checks this and returns NULL if size > (size_t)PY_SSIZE_T_MAX.

In CPython, we attempt to avoid undefined behaviors (UB). The C language does not define what happens in case of integer overflow. Maybe PyMem_RawMalloc() check will be enough, maybe not. In case of doubt, we prefer to check before the multiply.

I wrote PR #14424 to fix my regression and add a test for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I didn't think that through clearly. In this case it's academic, though. In practice the current directory is limited to 32,767 characters since the kernel uses counted strings with a USHORT length in bytes.

wbuf2 = PyMem_RawMalloc(len * sizeof(wchar_t));
if (wbuf2)
len = GetCurrentDirectoryW(len, wbuf2);
}
Py_END_ALLOW_THREADS
if (!wbuf2) {
PyErr_NoMemory();
return NULL;
else {
wbuf2 = NULL;
}
if (!len) {
if (wbuf2 != wbuf)
PyMem_RawFree(wbuf2);
return PyErr_SetFromWindowsErr(0);
if (wbuf2) {
len = GetCurrentDirectoryW(len, wbuf2);
}
resobj = PyUnicode_FromWideChar(wbuf2, len);
}
Py_END_ALLOW_THREADS

if (!wbuf2) {
PyErr_NoMemory();
return NULL;
}
if (!len) {
if (wbuf2 != wbuf)
PyMem_RawFree(wbuf2);
return resobj;
return PyErr_SetFromWindowsErr(0);
}

if (win32_warn_bytes_api())
return NULL;
#endif
PyObject *resobj = PyUnicode_FromWideChar(wbuf2, len);
if (wbuf2 != wbuf) {
PyMem_RawFree(wbuf2);
}

if (use_bytes) {
if (resobj == NULL) {
return NULL;
}
Py_SETREF(resobj, PyUnicode_EncodeFSDefault(resobj));
}

return resobj;
#else
const size_t chunk = 1024;

char *buf = NULL;
char *cwd = NULL;
size_t buflen = 0;

buf = cwd = NULL;
Py_BEGIN_ALLOW_THREADS
do {
buflen += chunk;
#ifdef MS_WINDOWS
if (buflen > INT_MAX) {
PyErr_NoMemory();
break;
char *newbuf;
if (buflen <= PY_SSIZE_T_MAX - chunk) {
buflen += chunk;
newbuf = PyMem_RawRealloc(buf, buflen);
}
#endif
tmpbuf = PyMem_RawRealloc(buf, buflen);
if (tmpbuf == NULL)
else {
newbuf = NULL;
}
if (newbuf == NULL) {
PyMem_RawFree(buf);
buf = NULL;
break;
}
buf = newbuf;

buf = tmpbuf;
#ifdef MS_WINDOWS
cwd = getcwd(buf, (int)buflen);
#else
cwd = getcwd(buf, buflen);
#endif
} while (cwd == NULL && errno == ERANGE);
Py_END_ALLOW_THREADS

if (buf == NULL) {
return PyErr_NoMemory();
}
if (cwd == NULL) {
PyMem_RawFree(buf);
return posix_error();
}

if (use_bytes)
PyObject *obj;
if (use_bytes) {
obj = PyBytes_FromStringAndSize(buf, strlen(buf));
else
}
else {
obj = PyUnicode_DecodeFSDefault(buf);
}
PyMem_RawFree(buf);

return obj;
#endif /* !MS_WINDOWS */
}


Expand Down