Commit 144f1e2

[3.7] bpo-34589: Add -X coerce_c_locale option; C locale coercion off by default (GH-9379)
* bpo-34589: Make _PyCoreConfig.coerce_c_locale private (GH-9371)

_PyCoreConfig:

* Rename coerce_c_locale to _coerce_c_locale
* Rename coerce_c_locale_warn to _coerce_c_locale_warn

These fields are now private (name prefixed by "_").

(cherry picked from commit 188ebfa)

* bpo-34589: C locale coercion off by default (GH-9073)

Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3").

test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent.

(cherry picked from commit 7a0791b)

* bpo-34589: Add -X coerce_c_locale command line option (GH-9378)

Add a new -X coerce_c_locale command line option to control C locale coercion (PEP 538).

(cherry picked from commit dbdee00)
1 parent 512d710 commit 144f1e2

15 files changed: 224 additions and 67 deletions

Doc/using/cmdline.rst

Lines changed: 15 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -438,10 +438,19 @@ Miscellaneous options
438438
* Set the :attr:`~sys.flags.dev_mode` attribute of :attr:`sys.flags` to
439439
``True``
440440

441-
* ``-X utf8`` enables UTF-8 mode for operating system interfaces, overriding
441+
* ``-X utf8`` enables UTF-8 mode (:pep:`540`) for operating system interfaces, overriding
442442
the default locale-aware mode. ``-X utf8=0`` explicitly disables UTF-8
443443
mode (even when it would otherwise activate automatically).
444444
See :envvar:`PYTHONUTF8` for more details.
445+
* ``-X coerce_c_locale`` or ``-X coerce_c_locale=1`` tries to coerce the C
446+
locale (:pep:`538`).
447+
``-X coerce_c_locale=0`` skips coercing the legacy ASCII-based C and POSIX
448+
locales to a more capable UTF-8 based alternative.
449+
``-X coerce_c_locale=warn`` will cause Python to emit warning messages on
450+
``stderr`` if either the locale coercion activates, or else if a locale
451+
that *would* have triggered coercion is still active when the Python
452+
runtime is initialized.
453+
See :envvar:`PYTHONCOERCECLOCALE` for more details.
445454

446455
It also allows passing arbitrary values and retrieving them through the
447456
:data:`sys._xoptions` dictionary.
@@ -461,6 +470,9 @@ Miscellaneous options
461470
.. versionadded:: 3.7
462471
The ``-X importtime``, ``-X dev`` and ``-X utf8`` options.
463472

473+
.. versionadded:: 3.7.1
474+
The ``-X coerce_c_locale`` option.
475+
464476

465477
Options you shouldn't use
466478
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -834,6 +846,8 @@ conflict.
834846
order to force the interpreter to use ``ASCII`` instead of ``UTF-8`` for
835847
system interfaces.
836848

849+
Also available as the :option:`-X` ``coerce_c_locale`` option.
850+
837851
Availability: \*nix
838852

839853
.. versionadded:: 3.7
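
The three documented values can be read back through `sys._xoptions`, which the text above describes as the place where arbitrary `-X` values are stored. The following is only a sketch of those documented semantics, not the interpreter's own parsing code: the helper name `interpret_coerce_c_locale` and the `ValueError` raised for undocumented values are assumptions made for illustration.

```python
from typing import Mapping, Optional, Tuple, Union


def interpret_coerce_c_locale(
    xoptions: Mapping[str, Union[str, bool]]
) -> Optional[Tuple[bool, bool]]:
    """Map a sys._xoptions-style dict to (coerce, warn).

    Returns None when the option was not given at all.
    Hypothetical helper: it mirrors the documentation text only.
    """
    if "coerce_c_locale" not in xoptions:
        return None
    value = xoptions["coerce_c_locale"]
    # "-X coerce_c_locale" (no value) is stored as True in sys._xoptions.
    if value is True or value == "1":
        return (True, False)
    if value == "0":
        return (False, False)
    if value == "warn":
        return (True, True)
    # Assumption: the documentation does not say what other values do.
    raise ValueError(f"undocumented coerce_c_locale value: {value!r}")


if __name__ == "__main__":
    import sys

    # On an interpreter started with, say, "-X coerce_c_locale=warn",
    # this prints the interpreted pair; otherwise it prints None.
    print(interpret_coerce_c_locale(sys._xoptions))
```

Because unknown `-X` options are simply recorded in `sys._xoptions`, the script can be run on interpreters that do not implement the option; in that case it only reports what was passed, and no locale coercion takes place.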

Doc/whatsnew/3.7.rst

Lines changed: 7 additions & 0 deletions
@@ -2494,3 +2494,10 @@ versions, it respected an ill-defined subset of those environment variables,
24942494
while in Python 3.7.0 it didn't read any of them due to :issue:`34247`). If
24952495
this behavior is unwanted, set :c:data:`Py_IgnoreEnvironmentFlag` to 1 before
24962496
calling :c:func:`Py_Initialize`.
2497+
2498+
:c:func:`Py_Initialize` and :c:func:`Py_Main` cannot enable the C locale
2499+
coercion (:pep:`538`) anymore: it is always disabled. It can now only be
2500+
enabled by the Python program ("python3").
2501+
2502+
New :option:`-X` ``coerce_c_locale`` command line option to control C locale
2503+
coercion (:pep:`538`).

Include/pylifecycle.h

Lines changed: 4 additions & 0 deletions
@@ -119,7 +119,11 @@ PyAPI_FUNC(int) Py_FdIsInteractive(FILE *, const char *);
119119
/* Bootstrap __main__ (defined in Modules/main.c) */
120120
PyAPI_FUNC(int) Py_Main(int argc, wchar_t **argv);
121121
#ifdef Py_BUILD_CORE
122+
# ifdef MS_WINDOWS
123+
PyAPI_FUNC(int) _Py_WindowsMain(int argc, wchar_t **argv);
124+
# else
122125
PyAPI_FUNC(int) _Py_UnixMain(int argc, char **argv);
126+
# endif
123127
#endif
124128

125129
/* In getpath.c */

Include/pystate.h

Lines changed: 4 additions & 3 deletions
@@ -41,8 +41,6 @@ typedef struct {
4141
int show_alloc_count; /* -X showalloccount */
4242
int dump_refs; /* PYTHONDUMPREFS */
4343
int malloc_stats; /* PYTHONMALLOCSTATS */
44-
int coerce_c_locale; /* PYTHONCOERCECLOCALE, -1 means unknown */
45-
int coerce_c_locale_warn; /* PYTHONCOERCECLOCALE=warn */
4644
int utf8_mode; /* PYTHONUTF8, -X utf8; -1 means unknown */
4745

4846
wchar_t *program_name; /* Program name, see also Py_GetProgramName() */
@@ -74,14 +72,17 @@ typedef struct {
7472

7573
/* Private fields */
7674
int _disable_importlib; /* Needed by freeze_importlib */
75+
int _coerce_c_locale; /* PYTHONCOERCECLOCALE, -1 means unknown */
76+
int _coerce_c_locale_warn; /* PYTHONCOERCECLOCALE=warn */
7777
} _PyCoreConfig;
7878

7979
#define _PyCoreConfig_INIT \
8080
(_PyCoreConfig){ \
8181
.install_signal_handlers = -1, \
8282
.ignore_environment = -1, \
8383
.use_hash_seed = -1, \
84-
.coerce_c_locale = -1, \
84+
._coerce_c_locale = 0, \
85+
._coerce_c_locale_warn = 0, \
8586
.faulthandler = -1, \
8687
.tracemalloc = -1, \
8788
.utf8_mode = -1, \

Lib/test/test_c_locale_coercion.py

Lines changed: 45 additions & 10 deletions
@@ -139,7 +139,7 @@ def _handle_output_variations(data):
139139
return data
140140

141141
@classmethod
142-
def get_child_details(cls, env_vars):
142+
def get_child_details(cls, env_vars, xoption=None):
143143
"""Retrieves fsencoding and standard stream details from a child process
144144
145145
Returns (encoding_details, stderr_lines):
@@ -150,10 +150,11 @@ def get_child_details(cls, env_vars):
150150
The child is run in isolated mode if the current interpreter supports
151151
that.
152152
"""
153-
result, py_cmd = run_python_until_end(
154-
"-X", "utf8=0", "-c", cls.CHILD_PROCESS_SCRIPT,
155-
**env_vars
156-
)
153+
args = []
154+
if xoption:
155+
args.extend(("-X", f"coerce_c_locale={xoption}"))
156+
args.extend(("-X", "utf8=0", "-c", cls.CHILD_PROCESS_SCRIPT))
157+
result, py_cmd = run_python_until_end(*args, **env_vars)
157158
if not result.rc == 0:
158159
result.fail(py_cmd)
159160
# All subprocess outputs in this test case should be pure ASCII
@@ -212,15 +213,16 @@ def _check_child_encoding_details(self,
212213
expected_fs_encoding,
213214
expected_stream_encoding,
214215
expected_warnings,
215-
coercion_expected):
216+
coercion_expected,
217+
xoption=None):
216218
"""Check the C locale handling for the given process environment
217219
218220
Parameters:
219221
expected_fs_encoding: expected sys.getfilesystemencoding() result
220222
expected_stream_encoding: expected encoding for standard streams
221223
expected_warning: stderr output to expect (if any)
222224
"""
223-
result = EncodingDetails.get_child_details(env_vars)
225+
result = EncodingDetails.get_child_details(env_vars, xoption)
224226
encoding_details, stderr_lines = result
225227
expected_details = EncodingDetails.get_expected_details(
226228
coercion_expected,
@@ -290,6 +292,7 @@ def _check_c_locale_coercion(self,
290292
coerce_c_locale,
291293
expected_warnings=None,
292294
coercion_expected=True,
295+
use_xoption=False,
293296
**extra_vars):
294297
"""Check the C locale handling for various configurations
295298
@@ -319,8 +322,12 @@ def _check_c_locale_coercion(self,
319322
"PYTHONCOERCECLOCALE": "",
320323
}
321324
base_var_dict.update(extra_vars)
325+
xoption = None
322326
if coerce_c_locale is not None:
323-
base_var_dict["PYTHONCOERCECLOCALE"] = coerce_c_locale
327+
if use_xoption:
328+
xoption = coerce_c_locale
329+
else:
330+
base_var_dict["PYTHONCOERCECLOCALE"] = coerce_c_locale
324331

325332
# Check behaviour for the default locale
326333
with self.subTest(default_locale=True,
@@ -342,7 +349,8 @@ def _check_c_locale_coercion(self,
342349
fs_encoding,
343350
stream_encoding,
344351
_expected_warnings,
345-
_coercion_expected)
352+
_coercion_expected,
353+
xoption=xoption)
346354

347355
# Check behaviour for explicitly configured locales
348356
for locale_to_set in EXPECTED_C_LOCALE_EQUIVALENTS:
@@ -357,7 +365,8 @@ def _check_c_locale_coercion(self,
357365
fs_encoding,
358366
stream_encoding,
359367
expected_warnings,
360-
coercion_expected)
368+
coercion_expected,
369+
xoption=xoption)
361370

362371
def test_PYTHONCOERCECLOCALE_not_set(self):
363372
# This should coerce to the first available target locale by default
@@ -404,6 +413,32 @@ def test_LC_ALL_set_to_C(self):
404413
expected_warnings=[LEGACY_LOCALE_WARNING],
405414
coercion_expected=False)
406415

416+
def test_xoption_set_to_1(self):
417+
self._check_c_locale_coercion("utf-8", "utf-8", coerce_c_locale="1",
418+
use_xoption=True)
419+
420+
def test_xoption_set_to_zero(self):
421+
# The setting "0" should result in the locale coercion being disabled
422+
self._check_c_locale_coercion(EXPECTED_C_LOCALE_FS_ENCODING,
423+
EXPECTED_C_LOCALE_STREAM_ENCODING,
424+
coerce_c_locale="0",
425+
coercion_expected=False,
426+
use_xoption=True)
427+
# Setting LC_ALL=C shouldn't make any difference to the behaviour
428+
self._check_c_locale_coercion(EXPECTED_C_LOCALE_FS_ENCODING,
429+
EXPECTED_C_LOCALE_STREAM_ENCODING,
430+
coerce_c_locale="0",
431+
LC_ALL="C",
432+
coercion_expected=False,
433+
use_xoption=True)
434+
435+
def test_xoption_set_to_warn(self):
436+
# -X coerce_c_locale=warn enables runtime warnings for legacy locales
437+
self._check_c_locale_coercion("utf-8", "utf-8",
438+
coerce_c_locale="warn",
439+
expected_warnings=[CLI_COERCION_WARNING],
440+
use_xoption=True)
441+
407442
def test_main():
408443
test.support.run_unittest(
409444
LocaleConfigurationTests,
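
The test changes above route the same setting either through the `PYTHONCOERCECLOCALE` environment variable or through `-X coerce_c_locale=...`, depending on `use_xoption`. A condensed sketch of that selection logic follows; the function name `plan_child_invocation` and its return shape are hypothetical, and the real helpers pass the result on to `run_python_until_end`, which is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple


def plan_child_invocation(
    script: str,
    coerce_c_locale: Optional[str] = None,
    use_xoption: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (interpreter arguments, environment overrides) for a child."""
    # The base environment clears PYTHONCOERCECLOCALE, as in the test module.
    env = {"PYTHONCOERCECLOCALE": ""}
    args: List[str] = []
    if coerce_c_locale is not None:
        if use_xoption:
            # Command-line route: the value travels as an -X option.
            args.extend(("-X", f"coerce_c_locale={coerce_c_locale}"))
        else:
            # Environment route: the value travels as an environment variable.
            env["PYTHONCOERCECLOCALE"] = coerce_c_locale
    # UTF-8 mode is always switched off so that it cannot mask the result.
    args.extend(("-X", "utf8=0", "-c", script))
    return args, env


if __name__ == "__main__":
    print(plan_child_invocation("pass", "warn", use_xoption=True))
    print(plan_child_invocation("pass", "warn", use_xoption=False))
```

Keeping the two routes in one helper is what lets the new `test_xoption_*` cases reuse the existing checks unchanged apart from the `use_xoption=True` argument.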

Lib/test/test_cmd_line.py

Lines changed: 5 additions & 2 deletions
@@ -159,13 +159,16 @@ def test_undecodable_code(self):
159159
env = os.environ.copy()
160160
# Use C locale to get ascii for the locale encoding
161161
env['LC_ALL'] = 'C'
162-
env['PYTHONCOERCECLOCALE'] = '0'
163162
code = (
164163
b'import locale; '
165164
b'print(ascii("' + undecodable + b'"), '
166165
b'locale.getpreferredencoding())')
167166
p = subprocess.Popen(
168-
[sys.executable, "-c", code],
167+
[sys.executable,
168+
# Disable C locale coercion and UTF-8 Mode to not use UTF-8
169+
"-X", "coerce_c_locale=0",
170+
"-X", "utf8=0",
171+
"-c", code],
169172
stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
170173
env=env)
171174
stdout, stderr = p.communicate()

Lib/test/test_embed.py

Lines changed: 2 additions & 3 deletions
@@ -267,9 +267,6 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
267267
'malloc_stats': 0,
268268
'utf8_mode': 0,
269269

270-
'coerce_c_locale': 0,
271-
'coerce_c_locale_warn': 0,
272-
273270
'program_name': './_testembed',
274271
'argc': 0,
275272
'argv': '[]',
@@ -290,6 +287,8 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):
290287

291288
'_disable_importlib': 0,
292289
'Py_FrozenFlag': 0,
290+
'_coerce_c_locale': 0,
291+
'_coerce_c_locale_warn': 0,
293292
}
294293

295294
def check_config(self, testname, expected):

Lib/test/test_sys.py

Lines changed: 5 additions & 3 deletions
@@ -656,9 +656,8 @@ def test_getfilesystemencoding(self):
656656

657657
def c_locale_get_error_handler(self, locale, isolated=False, encoding=None):
658658
# Force the POSIX locale
659-
env = os.environ.copy()
659+
env = dict(os.environ)
660660
env["LC_ALL"] = locale
661-
env["PYTHONCOERCECLOCALE"] = "0"
662661
code = '\n'.join((
663662
'import sys',
664663
'def dump(name):',
@@ -668,7 +667,10 @@ def c_locale_get_error_handler(self, locale, isolated=False, encoding=None):
668667
'dump("stdout")',
669668
'dump("stderr")',
670669
))
671-
args = [sys.executable, "-c", code]
670+
args = [sys.executable,
671+
"-X", "utf8=0",
672+
"-X", "coerce_c_locale=0",
673+
"-c", code]
672674
if isolated:
673675
args.append("-I")
674676
if encoding is not None:
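
The pattern used in `test_cmd_line.py` and `test_sys.py` above (copy the environment, force `LC_ALL`, and disable both UTF-8 mode and C locale coercion on the command line) can be tried outside the test suite with a sketch like the one below. The names `child_command`, `child_env`, `run_child` and `CHILD_CODE` are hypothetical. On an interpreter that does not implement `-X coerce_c_locale`, the option is assumed to be merely recorded in `sys._xoptions`, so the reported encodings may differ from what the tests expect.

```python
import json
import os
import subprocess
import sys

# Child script: report the encodings and the -X options it received.
CHILD_CODE = (
    "import json, sys; "
    "print(json.dumps({"
    "'fsencoding': sys.getfilesystemencoding(), "
    "'stdout_errors': sys.stdout.errors, "
    "'xoptions': dict(sys._xoptions)}))"
)


def child_command(code):
    """Build the command line, with both UTF-8 features switched off."""
    return [sys.executable,
            "-X", "utf8=0",
            "-X", "coerce_c_locale=0",
            "-c", code]


def child_env(locale_name="C"):
    """Copy the current environment and force the given locale."""
    env = dict(os.environ)  # a copy, so os.environ itself is not modified
    env["LC_ALL"] = locale_name
    return env


def run_child(code=CHILD_CODE, locale_name="C"):
    """Run the child and decode the JSON report it prints on stdout."""
    proc = subprocess.run(child_command(code),
                          env=child_env(locale_name),
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True,
                          check=True)
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(run_child())
```

Passing the settings as `-X` options rather than environment variables keeps them effective even if the child is later run in a mode that ignores `PYTHON*` environment variables.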

Lib/test/test_utf8_mode.py

Lines changed: 2 additions & 1 deletion
@@ -27,6 +27,8 @@ def posix_locale(self):
2727
return (loc in POSIX_LOCALES)
2828

2929
def get_output(self, *args, failure=False, **kw):
30+
# Always disable the C locale coercion (PEP 538)
31+
args = ('-X', 'coerce_c_locale=0', *args)
3032
kw = dict(self.DEFAULT_ENV, **kw)
3133
if failure:
3234
out = assert_python_failure(*args, **kw)
@@ -116,7 +118,6 @@ def test_filesystemencoding(self):
116118
# PYTHONLEGACYWINDOWSFSENCODING disables the UTF-8 mode
117119
# and has the priority over -X utf8 and PYTHONUTF8
118120
out = self.get_output('-X', 'utf8', '-c', code,
119-
PYTHONUTF8='strict',
120121
PYTHONLEGACYWINDOWSFSENCODING='1')
121122
self.assertEqual(out, 'mbcs/replace')
122123
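
The `get_output` change above relies on two small idioms: star-unpacking to prepend fixed interpreter options to the caller's arguments, and `dict(defaults, **overrides)` to layer per-call environment variables over defaults. A minimal sketch of both, in which the `DEFAULT_ENV` contents and the name `prepare_invocation` are assumptions made for illustration:

```python
# Hypothetical defaults; the real test class defines its own DEFAULT_ENV.
DEFAULT_ENV = {
    "PYTHONUTF8": "",
    "PYTHONLEGACYWINDOWSFSENCODING": "",
}


def prepare_invocation(*args, **kw):
    """Return (args, env) with C locale coercion always disabled."""
    # Prepend the fixed option so that it precedes every caller argument.
    args = ("-X", "coerce_c_locale=0", *args)
    # Keyword arguments override the defaults without mutating them.
    env = dict(DEFAULT_ENV, **kw)
    return args, env


if __name__ == "__main__":
    print(prepare_invocation("-X", "utf8", "-c", "pass",
                             PYTHONLEGACYWINDOWSFSENCODING="1"))
```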

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538)
2+
anymore: it is always disabled. It can now only be enabled by the Python
3+
program ("python3").
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
Add a new :option:`-X` ``coerce_c_locale`` command line option to control C
2+
locale coercion (:pep:`538`).
