Commit 29c89d0

[3.5] bpo-29755: Fixed the lgettext() family of functions in the gettext module. (GH-2266) (#2298)
They now always return bytes. Updated the gettext documentation. (cherry picked from commit 26cb465)
1 parent 4108606 commit 29c89d0

4 files changed: +229 -107 lines changed


Doc/library/gettext.rst

Lines changed: 80 additions & 81 deletions
@@ -48,9 +48,10 @@ class-based API instead.
 
 .. function:: bind_textdomain_codeset(domain, codeset=None)
 
-   Bind the *domain* to *codeset*, changing the encoding of strings returned by the
-   :func:`gettext` family of functions. If *codeset* is omitted, then the current
-   binding is returned.
+   Bind the *domain* to *codeset*, changing the encoding of byte strings
+   returned by the :func:`lgettext`, :func:`ldgettext`, :func:`lngettext`
+   and :func:`ldngettext` functions.
+   If *codeset* is omitted, then the current binding is returned.
 
 
 .. function:: textdomain(domain=None)
@@ -67,28 +68,14 @@ class-based API instead.
    :func:`_` in the local namespace (see examples below).
 
 
-.. function:: lgettext(message)
-
-   Equivalent to :func:`gettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
-   :func:`bind_textdomain_codeset`.
-
-
 .. function:: dgettext(domain, message)
 
-   Like :func:`gettext`, but look the message up in the specified *domain*.
-
-
-.. function:: ldgettext(domain, message)
-
-   Equivalent to :func:`dgettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
-   :func:`bind_textdomain_codeset`.
+   Like :func:`.gettext`, but look the message up in the specified *domain*.
 
 
 .. function:: ngettext(singular, plural, n)
 
-   Like :func:`gettext`, but consider plural forms. If a translation is found,
+   Like :func:`.gettext`, but consider plural forms. If a translation is found,
    apply the plural formula to *n*, and return the resulting message (some
    languages have more than two plural forms). If no translation is found, return
    *singular* if *n* is 1; return *plural* otherwise.
@@ -101,24 +88,33 @@ class-based API instead.
    formulas for a variety of languages.
 
 
-.. function:: lngettext(singular, plural, n)
-
-   Equivalent to :func:`ngettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
-   :func:`bind_textdomain_codeset`.
-
-
 .. function:: dngettext(domain, singular, plural, n)
 
    Like :func:`ngettext`, but look the message up in the specified *domain*.
 
 
+.. function:: lgettext(message)
+.. function:: ldgettext(domain, message)
+.. function:: lngettext(singular, plural, n)
 .. function:: ldngettext(domain, singular, plural, n)
 
-   Equivalent to :func:`dngettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
+   Equivalent to the corresponding functions without the ``l`` prefix
+   (:func:`.gettext`, :func:`dgettext`, :func:`ngettext` and :func:`dngettext`),
+   but the translation is returned as a byte string encoded in the preferred
+   system encoding if no other encoding was explicitly set with
    :func:`bind_textdomain_codeset`.
 
+   .. warning::
+
+      These functions should be avoided in Python 3, because they return
+      encoded bytes. It's much better to use alternatives which return
+      Unicode strings instead, since most Python applications will want to
+      manipulate human readable text as strings instead of bytes. Further,
+      it's possible that you may get unexpected Unicode-related exceptions
+      if there are encoding problems with the translated strings. It is
+      possible that the ``l*()`` functions will be deprecated in future Python
+      versions due to their inherent problems and limitations.
+
 
 Note that GNU :program:`gettext` also defines a :func:`dcgettext` method, but
 this was deemed not useful and so it is currently unimplemented.
@@ -179,8 +175,9 @@ class can also install themselves in the built-in namespace as the function
    names are cached. The actual class instantiated is either *class_* if
    provided, otherwise :class:`GNUTranslations`. The class's constructor must
    take a single :term:`file object` argument. If provided, *codeset* will change
-   the charset used to encode translated strings in the :meth:`lgettext` and
-   :meth:`lngettext` methods.
+   the charset used to encode translated strings in the
+   :meth:`~NullTranslations.lgettext` and :meth:`~NullTranslations.lngettext`
+   methods.
 
    If multiple files are found, later files are used as fallbacks for earlier ones.
    To allow setting the fallback, :func:`copy.copy` is used to clone each
@@ -250,26 +247,29 @@ are the methods of :class:`NullTranslations`:
 
    .. method:: gettext(message)
 
-      If a fallback has been set, forward :meth:`gettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
-
-
-   .. method:: lgettext(message)
-
-      If a fallback has been set, forward :meth:`lgettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
+      If a fallback has been set, forward :meth:`.gettext` to the fallback.
+      Otherwise, return *message*. Overridden in derived classes.
 
 
    .. method:: ngettext(singular, plural, n)
 
       If a fallback has been set, forward :meth:`ngettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
+      Otherwise, return *singular* if *n* is 1; return *plural* otherwise.
+      Overridden in derived classes.
 
 
+   .. method:: lgettext(message)
    .. method:: lngettext(singular, plural, n)
 
-      If a fallback has been set, forward :meth:`lngettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
+      Equivalent to :meth:`.gettext` and :meth:`ngettext`, but the translation
+      is returned as a byte string encoded in the preferred system encoding
+      if no encoding was explicitly set with :meth:`set_output_charset`.
+      Overridden in derived classes.
+
+      .. warning::
+
+         These methods should be avoided in Python 3. See the warning for the
+         :func:`lgettext` function.
 
 
    .. method:: info()
@@ -279,32 +279,28 @@ are the methods of :class:`NullTranslations`:
 
    .. method:: charset()
 
-      Return the "protected" :attr:`_charset` variable, which is the encoding of
-      the message catalog file.
+      Return the encoding of the message catalog file.
 
 
    .. method:: output_charset()
 
-      Return the "protected" :attr:`_output_charset` variable, which defines the
-      encoding used to return translated messages in :meth:`lgettext` and
-      :meth:`lngettext`.
+      Return the encoding used to return translated messages in :meth:`.lgettext`
+      and :meth:`.lngettext`.
 
 
    .. method:: set_output_charset(charset)
 
-      Change the "protected" :attr:`_output_charset` variable, which defines the
-      encoding used to return translated messages.
+      Change the encoding used to return translated messages.
 
 
    .. method:: install(names=None)
 
-      This method installs :meth:`self.gettext` into the built-in namespace,
+      This method installs :meth:`.gettext` into the built-in namespace,
       binding it to ``_``.
 
       If the *names* parameter is given, it must be a sequence containing the
       names of functions you want to install in the builtins namespace in
-      addition to :func:`_`. Supported names are ``'gettext'`` (bound to
-      :meth:`self.gettext`), ``'ngettext'`` (bound to :meth:`self.ngettext`),
+      addition to :func:`_`. Supported names are ``'gettext'``, ``'ngettext'``,
       ``'lgettext'`` and ``'lngettext'``.
 
       Note that this is only one way, albeit the most convenient way, to make
@@ -349,49 +345,52 @@ If the :file:`.mo` file's magic number is invalid, the major version number is
 unexpected, or if other problems occur while reading the file, instantiating a
 :class:`GNUTranslations` class can raise :exc:`OSError`.
 
-The following methods are overridden from the base class implementation:
-
+.. class:: GNUTranslations
 
-.. method:: GNUTranslations.gettext(message)
+   The following methods are overridden from the base class implementation:
 
-   Look up the *message* id in the catalog and return the corresponding message
-   string, as a Unicode string. If there is no entry in the catalog for the
-   *message* id, and a fallback has been set, the look up is forwarded to the
-   fallback's :meth:`gettext` method. Otherwise, the *message* id is returned.
+   .. method:: gettext(message)
 
+      Look up the *message* id in the catalog and return the corresponding message
+      string, as a Unicode string. If there is no entry in the catalog for the
+      *message* id, and a fallback has been set, the look up is forwarded to the
+      fallback's :meth:`~NullTranslations.gettext` method. Otherwise, the
+      *message* id is returned.
 
-.. method:: GNUTranslations.lgettext(message)
 
-   Equivalent to :meth:`gettext`, but the translation is returned as a
-   bytestring encoded in the selected output charset, or in the preferred system
-   encoding if no encoding was explicitly set with :meth:`set_output_charset`.
+   .. method:: ngettext(singular, plural, n)
 
+      Do a plural-forms lookup of a message id. *singular* is used as the message id
+      for purposes of lookup in the catalog, while *n* is used to determine which
+      plural form to use. The returned message string is a Unicode string.
 
-.. method:: GNUTranslations.ngettext(singular, plural, n)
+      If the message id is not found in the catalog, and a fallback is specified,
+      the request is forwarded to the fallback's :meth:`~NullTranslations.ngettext`
+      method. Otherwise, when *n* is 1 *singular* is returned, and *plural* is
+      returned in all other cases.
 
-   Do a plural-forms lookup of a message id. *singular* is used as the message id
-   for purposes of lookup in the catalog, while *n* is used to determine which
-   plural form to use. The returned message string is a Unicode string.
+      Here is an example::
 
-   If the message id is not found in the catalog, and a fallback is specified, the
-   request is forwarded to the fallback's :meth:`ngettext` method. Otherwise, when
-   *n* is 1 *singular* is returned, and *plural* is returned in all other cases.
+         n = len(os.listdir('.'))
+         cat = GNUTranslations(somefile)
+         message = cat.ngettext(
+             'There is %(num)d file in this directory',
+             'There are %(num)d files in this directory',
+             n) % {'num': n}
 
-   Here is an example::
 
-      n = len(os.listdir('.'))
-      cat = GNUTranslations(somefile)
-      message = cat.ngettext(
-          'There is %(num)d file in this directory',
-          'There are %(num)d files in this directory',
-          n) % {'num': n}
+   .. method:: lgettext(message)
+   .. method:: lngettext(singular, plural, n)
 
+      Equivalent to :meth:`.gettext` and :meth:`.ngettext`, but the translation
+      is returned as a byte string encoded in the preferred system encoding
+      if no encoding was explicitly set with
+      :meth:`~NullTranslations.set_output_charset`.
 
-.. method:: GNUTranslations.lngettext(singular, plural, n)
+      .. warning::
 
-   Equivalent to :meth:`gettext`, but the translation is returned as a
-   bytestring encoded in the selected output charset, or in the preferred system
-   encoding if no encoding was explicitly set with :meth:`set_output_charset`.
+         These methods should be avoided in Python 3. See the warning for the
+         :func:`lgettext` function.
 
 
 Solaris message catalog support
@@ -509,7 +508,7 @@ module::
 
    import gettext
    t = gettext.translation('spam', '/usr/share/locale')
-   _ = t.lgettext
+   _ = t.gettext
 
 
 Localizing your application
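
Reviewer note on the last hunk above: the documentation example now binds `_` to `t.gettext` rather than `t.lgettext`, so callers get `str` instead of encoded bytes. The snippet below is a minimal sketch of that recommended pattern, not part of the commit. It assumes a `NullTranslations` instance in place of the `'spam'` catalog so that it runs without any `.mo` file; the message strings and the explicit `utf-8` encode are illustrative choices.

```python
import gettext

# NullTranslations has no catalog, so every lookup falls back to the
# message id. A real application would use gettext.translation() instead.
t = gettext.NullTranslations()
_ = t.gettext

text = _('spam')
assert isinstance(text, str)  # Unicode string, as the warning recommends

# Plural lookup without a catalog: singular when n == 1, plural otherwise.
one = t.ngettext('%(num)d file', '%(num)d files', 1) % {'num': 1}
many = t.ngettext('%(num)d file', '%(num)d files', 3) % {'num': 3}

# If bytes are really needed, encode explicitly at the I/O boundary
# (the choice of UTF-8 here is an assumption for illustration).
raw = text.encode('utf-8')
```

Keeping text as `str` and encoding only on output avoids the Unicode-related exceptions that the new warning attributes to the `l*()` functions.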

Lib/gettext.py

Lines changed: 23 additions & 17 deletions
@@ -270,7 +270,9 @@ def gettext(self, message):
     def lgettext(self, message):
         if self._fallback:
             return self._fallback.lgettext(message)
-        return message
+        if self._output_charset:
+            return message.encode(self._output_charset)
+        return message.encode(locale.getpreferredencoding())
 
     def ngettext(self, msgid1, msgid2, n):
         if self._fallback:
@@ -284,9 +286,12 @@ def lngettext(self, msgid1, msgid2, n):
         if self._fallback:
             return self._fallback.lngettext(msgid1, msgid2, n)
         if n == 1:
-            return msgid1
+            tmsg = msgid1
         else:
-            return msgid2
+            tmsg = msgid2
+        if self._output_charset:
+            return tmsg.encode(self._output_charset)
+        return tmsg.encode(locale.getpreferredencoding())
 
     def info(self):
         return self._info
@@ -368,7 +373,7 @@ def _parse(self, fp):
             if mlen == 0:
                 # Catalog description
                 lastk = None
-                for b_item in tmsg.split('\n'.encode("ascii")):
+                for b_item in tmsg.split(b'\n'):
                     item = b_item.decode().strip()
                     if not item:
                         continue
@@ -416,24 +421,24 @@ def lgettext(self, message):
         if tmsg is missing:
             if self._fallback:
                 return self._fallback.lgettext(message)
-            return message
+            tmsg = message
         if self._output_charset:
             return tmsg.encode(self._output_charset)
         return tmsg.encode(locale.getpreferredencoding())
 
     def lngettext(self, msgid1, msgid2, n):
         try:
             tmsg = self._catalog[(msgid1, self.plural(n))]
-            if self._output_charset:
-                return tmsg.encode(self._output_charset)
-            return tmsg.encode(locale.getpreferredencoding())
         except KeyError:
             if self._fallback:
                 return self._fallback.lngettext(msgid1, msgid2, n)
             if n == 1:
-                return msgid1
+                tmsg = msgid1
             else:
-                return msgid2
+                tmsg = msgid2
+        if self._output_charset:
+            return tmsg.encode(self._output_charset)
+        return tmsg.encode(locale.getpreferredencoding())
 
     def gettext(self, message):
         missing = object()
@@ -573,11 +578,11 @@ def dgettext(domain, message):
     return t.gettext(message)
 
 def ldgettext(domain, message):
+    codeset = _localecodesets.get(domain)
     try:
-        t = translation(domain, _localedirs.get(domain, None),
-                        codeset=_localecodesets.get(domain))
+        t = translation(domain, _localedirs.get(domain, None), codeset=codeset)
     except OSError:
-        return message
+        return message.encode(codeset or locale.getpreferredencoding())
     return t.lgettext(message)
 
 def dngettext(domain, msgid1, msgid2, n):
@@ -592,14 +597,15 @@ def dngettext(domain, msgid1, msgid2, n):
     return t.ngettext(msgid1, msgid2, n)
 
 def ldngettext(domain, msgid1, msgid2, n):
+    codeset = _localecodesets.get(domain)
     try:
-        t = translation(domain, _localedirs.get(domain, None),
-                        codeset=_localecodesets.get(domain))
+        t = translation(domain, _localedirs.get(domain, None), codeset=codeset)
     except OSError:
         if n == 1:
-            return msgid1
+            tmsg = msgid1
         else:
-            return msgid2
+            tmsg = msgid2
+        return tmsg.encode(codeset or locale.getpreferredencoding())
     return t.lngettext(msgid1, msgid2, n)
 
 def gettext(message):
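
Reviewer note on the Lib/gettext.py hunks: every fallback path that used to return the untranslated `str` now encodes it, either with the configured charset or with `locale.getpreferredencoding()`. The standalone sketch below restates that rule outside the module so it can be run in isolation. The helper names `encode_translation` and `plural_fallback` are hypothetical and do not appear in the commit; only the two `encode(...)` expressions are taken from the diff.

```python
import locale


def encode_translation(tmsg, output_charset=None):
    """Hypothetical helper mirroring the tail of the patched l*() methods."""
    if output_charset:
        return tmsg.encode(output_charset)
    return tmsg.encode(locale.getpreferredencoding())


def plural_fallback(msgid1, msgid2, n, codeset=None):
    """Hypothetical helper mirroring the OSError branch of ldngettext()."""
    tmsg = msgid1 if n == 1 else msgid2
    return tmsg.encode(codeset or locale.getpreferredencoding())


# An untranslated message is still returned as bytes, never as str.
untranslated = encode_translation('spam', 'utf-8')
default = encode_translation('spam')  # uses the preferred system encoding

# The plural fallback picks singular or plural first, then encodes once.
singular = plural_fallback('file', 'files', 1, codeset='ascii')
plural = plural_fallback('file', 'files', 2, codeset='ascii')
```

Choosing the message first and encoding in a single place is what lets the patched methods return bytes consistently on both the found and not-found paths.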
