Commit 051b9d0

closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)
1 parent 76d5877 commit 051b9d0

11 files changed: +29772 -28737 lines changed

Doc/library/stdtypes.rst

Lines changed: 1 addition & 1 deletion
@@ -352,7 +352,7 @@ Notes:
 The numeric literals accepted include the digits ``0`` to ``9`` or any
 Unicode equivalent (code points with the ``Nd`` property).

-See http://www.unicode.org/Public/12.1.0/ucd/extracted/DerivedNumericType.txt
+See http://www.unicode.org/Public/13.0.0/ucd/extracted/DerivedNumericType.txt
 for a complete list of code points with the ``Nd`` property.

Doc/library/unicodedata.rst

Lines changed: 4 additions & 4 deletions
@@ -17,8 +17,8 @@

 This module provides access to the Unicode Character Database (UCD) which
 defines character properties for all Unicode characters. The data contained in
-this database is compiled from the `UCD version 12.1.0
-<http://www.unicode.org/Public/12.1.0/ucd>`_.
+this database is compiled from the `UCD version 13.0.0
+<http://www.unicode.org/Public/13.0.0/ucd>`_.

 The module uses the same names and symbols as defined by Unicode
 Standard Annex #44, `"Unicode Character Database"
@@ -175,6 +175,6 @@ Examples:

 .. rubric:: Footnotes

-.. [#] http://www.unicode.org/Public/12.1.0/ucd/NameAliases.txt
+.. [#] http://www.unicode.org/Public/13.0.0/ucd/NameAliases.txt

-.. [#] http://www.unicode.org/Public/12.1.0/ucd/NamedSequences.txt
+.. [#] http://www.unicode.org/Public/13.0.0/ucd/NamedSequences.txt

Doc/reference/lexical_analysis.rst

Lines changed: 1 addition & 1 deletion
@@ -316,7 +316,7 @@ The Unicode category codes mentioned above stand for:
 * *Nd* - decimal numbers
 * *Pc* - connector punctuations
 * *Other_ID_Start* - explicit list of characters in `PropList.txt
-  <http://www.unicode.org/Public/12.1.0/ucd/PropList.txt>`_ to support backwards
+  <http://www.unicode.org/Public/13.0.0/ucd/PropList.txt>`_ to support backwards
   compatibility
 * *Other_ID_Continue* - likewise

Doc/whatsnew/3.9.rst

Lines changed: 5 additions & 0 deletions
@@ -372,6 +372,11 @@ types with context-specific metadata and new ``include_extras`` parameter to
 :func:`typing.get_type_hints` to access the metadata at runtime. (Contributed
 by Till Varoquaux and Konstantin Kashin.)

+unicodedata
+-----------
+
+The Unicode database has been updated to version 13.0.0. (:issue:`39926`).
+
 venv
 ----
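As a rough illustration of what the whatsnew entry above means at runtime: `unicodedata.unidata_version` reports the UCD version compiled into the running interpreter. The helper name `ucd_version` below is hypothetical and is not part of this commit.

```python
import unicodedata


def ucd_version():
    """Return the interpreter's compiled-in UCD version as a tuple of ints."""
    # unidata_version is a dotted string such as "13.0.0".
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


if __name__ == "__main__":
    version = ucd_version()
    print("UCD version:", ".".join(str(part) for part in version))
    # Interpreters built with this commit should report at least 13.0.0.
    print("Has Unicode 13 data:", version >= (13, 0, 0))
```

Comparing tuples rather than strings avoids mis-ordering versions such as "9.0.0" and "13.0.0".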

Lib/test/test_ucn.py

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ def test_cjk_unified_ideographs(self):
         self.checkletter("CJK UNIFIED IDEOGRAPH-2B734", "\U0002B734")
         self.checkletter("CJK UNIFIED IDEOGRAPH-2B740", "\U0002B740")
         self.checkletter("CJK UNIFIED IDEOGRAPH-2B81D", "\U0002B81D")
+        self.checkletter("CJK UNIFIED IDEOGRAPH-3134A", "\U0003134A")

     def test_bmp_characters(self):
         for code in range(0x10000):
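The `checkletter` helper itself is not shown in this hunk. As a sketch of the kind of check the new test line performs, the snippet below uses `unicodedata.lookup` as a stand-in (an assumption; the real helper may work differently) and the hypothetical name `resolves`. On an interpreter whose database predates Unicode 13.0.0, the Extension G lookup is expected to fail rather than raise.

```python
import unicodedata


def resolves(name, expected_char):
    """Return True if `name` resolves to `expected_char` in this interpreter's UCD."""
    try:
        return unicodedata.lookup(name) == expected_char
    except KeyError:
        # lookup() raises KeyError when the name is unknown to the database.
        return False


if __name__ == "__main__":
    # Extension D code point, already covered by the existing tests.
    print(resolves("CJK UNIFIED IDEOGRAPH-2B81D", "\U0002B81D"))
    # Extension G code point added by this commit; needs Unicode 13.0.0 data.
    print(resolves("CJK UNIFIED IDEOGRAPH-3134A", "\U0003134A"))
```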
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Update Unicode database to Unicode version 13.0.0.

Modules/unicodedata.c

Lines changed: 5 additions & 4 deletions
@@ -1031,13 +1031,14 @@ static int
 is_unified_ideograph(Py_UCS4 code)
 {
     return
-        (0x3400 <= code && code <= 0x4DB5) || /* CJK Ideograph Extension A */
-        (0x4E00 <= code && code <= 0x9FEF) || /* CJK Ideograph */
-        (0x20000 <= code && code <= 0x2A6D6) || /* CJK Ideograph Extension B */
+        (0x3400 <= code && code <= 0x4DBF) || /* CJK Ideograph Extension A */
+        (0x4E00 <= code && code <= 0x9FFC) || /* CJK Ideograph */
+        (0x20000 <= code && code <= 0x2A6DD) || /* CJK Ideograph Extension B */
         (0x2A700 <= code && code <= 0x2B734) || /* CJK Ideograph Extension C */
         (0x2B740 <= code && code <= 0x2B81D) || /* CJK Ideograph Extension D */
         (0x2B820 <= code && code <= 0x2CEA1) || /* CJK Ideograph Extension E */
-        (0x2CEB0 <= code && code <= 0x2EBEF); /* CJK Ideograph Extension F */
+        (0x2CEB0 <= code && code <= 0x2EBE0) || /* CJK Ideograph Extension F */
+        (0x30000 <= code && code <= 0x3134A); /* CJK Ideograph Extension G */
 }

 /* macros used to determine if the given code point is in the PUA range that
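For readers who want to experiment with the updated ranges outside the C module, here is a minimal Python mirror of the post-commit `is_unified_ideograph` check. The range values are copied from the added lines of the hunk above; the list name `UNIFIED_IDEOGRAPH_RANGES` and the helper `ideograph_name` are hypothetical. The name format matches the names used in `Lib/test/test_ucn.py`.

```python
# Inclusive (first, last) ranges, copied from the post-commit C function.
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DBF),    # CJK Ideograph Extension A
    (0x4E00, 0x9FFC),    # CJK Ideograph
    (0x20000, 0x2A6DD),  # CJK Ideograph Extension B
    (0x2A700, 0x2B734),  # CJK Ideograph Extension C
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D
    (0x2B820, 0x2CEA1),  # CJK Ideograph Extension E
    (0x2CEB0, 0x2EBE0),  # CJK Ideograph Extension F
    (0x30000, 0x3134A),  # CJK Ideograph Extension G
]


def is_unified_ideograph(code):
    """Return True if the code point falls in one of the ranges above."""
    return any(first <= code <= last for first, last in UNIFIED_IDEOGRAPH_RANGES)


def ideograph_name(code):
    """Build the derived name for a unified ideograph, or None otherwise."""
    if not is_unified_ideograph(code):
        return None
    return "CJK UNIFIED IDEOGRAPH-%X" % code


if __name__ == "__main__":
    print(ideograph_name(0x3134A))        # last code point of Extension G
    print(is_unified_ideograph(0x2EBEF))  # beyond the corrected Extension F end
```

Note that the commit also tightens the Extension F upper bound from `0x2EBEF` to `0x2EBE0`, so code points `0x2EBE1` through `0x2EBEF` are no longer treated as unified ideographs.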

Modules/unicodedata_db.h

Lines changed: 2348 additions & 2247 deletions
Some generated files are not rendered by default.

Modules/unicodename_db.h

Lines changed: 26826 additions & 25958 deletions
Some generated files are not rendered by default.

Objects/unicodetype_db.h

Lines changed: 575 additions & 518 deletions
Some generated files are not rendered by default.

Tools/unicode/makeunicodedata.py

Lines changed: 5 additions & 4 deletions
@@ -44,7 +44,7 @@
 # * Doc/library/stdtypes.rst, and
 # * Doc/library/unicodedata.rst
 # * Doc/reference/lexical_analysis.rst (two occurrences)
-UNIDATA_VERSION = "12.1.0"
+UNIDATA_VERSION = "13.0.0"
 UNICODE_DATA = "UnicodeData%s.txt"
 COMPOSITION_EXCLUSIONS = "CompositionExclusions%s.txt"
 EASTASIAN_WIDTH = "EastAsianWidth%s.txt"
@@ -100,13 +100,14 @@

 # these ranges need to match unicodedata.c:is_unified_ideograph
 cjk_ranges = [
-    ('3400', '4DB5'),
-    ('4E00', '9FEF'),
-    ('20000', '2A6D6'),
+    ('3400', '4DBF'),
+    ('4E00', '9FFC'),
+    ('20000', '2A6DD'),
     ('2A700', '2B734'),
     ('2B740', '2B81D'),
     ('2B820', '2CEA1'),
     ('2CEB0', '2EBE0'),
+    ('30000', '3134A'),
 ]
