Commit fd1e477

closes gh-96734: Update to Unicode 15.0.0. (GH-96809)
1 parent 69d9a08 commit fd1e477

11 files changed: +27467 -27174 lines changed

Doc/library/stdtypes.rst

Lines changed: 1 addition & 1 deletion
@@ -353,7 +353,7 @@ Notes:
 The numeric literals accepted include the digits ``0`` to ``9`` or any
 Unicode equivalent (code points with the ``Nd`` property).

-See https://www.unicode.org/Public/14.0.0/ucd/extracted/DerivedNumericType.txt
+See https://www.unicode.org/Public/15.0.0/ucd/extracted/DerivedNumericType.txt
 for a complete list of code points with the ``Nd`` property.

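As a brief illustration of the note touched by this hunk, the sketch below parses a string of non-ASCII decimal digits. The sample string (ARABIC-INDIC DIGIT THREE and FOUR) is chosen here for illustration and is not part of the commit.

```python
import unicodedata

# Illustrative sample, not from the commit:
# ARABIC-INDIC DIGIT THREE followed by ARABIC-INDIC DIGIT FOUR.
sample = "\u0663\u0664"

# Each character carries the general category Nd (decimal number).
categories = [unicodedata.category(ch) for ch in sample]

# int() and float() accept Nd code points wherever they accept ASCII digits.
as_int = int(sample)
as_float = float(sample)

print(categories, as_int, as_float)
```

Which code points count as ``Nd`` depends on the Unicode version compiled into the interpreter, which is why the documentation link follows the database version.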

Doc/library/unicodedata.rst

Lines changed: 4 additions & 4 deletions
@@ -17,8 +17,8 @@

 This module provides access to the Unicode Character Database (UCD) which
 defines character properties for all Unicode characters. The data contained in
-this database is compiled from the `UCD version 14.0.0
-<https://www.unicode.org/Public/14.0.0/ucd>`_.
+this database is compiled from the `UCD version 15.0.0
+<https://www.unicode.org/Public/15.0.0/ucd>`_.

 The module uses the same names and symbols as defined by Unicode
 Standard Annex #44, `"Unicode Character Database"
@@ -175,6 +175,6 @@ Examples:

 .. rubric:: Footnotes

-.. [#] https://www.unicode.org/Public/14.0.0/ucd/NameAliases.txt
+.. [#] https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt

-.. [#] https://www.unicode.org/Public/14.0.0/ucd/NamedSequences.txt
+.. [#] https://www.unicode.org/Public/15.0.0/ucd/NamedSequences.txt
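
One way to confirm which UCD version a given interpreter was built with is to read ``unicodedata.unidata_version``. The following is a minimal sketch; the variable names are illustrative, and the value printed depends on the build (it should read 15.0.0 only on builds that include this commit).

```python
import unicodedata

# Version string of the Unicode Character Database compiled into this build.
version = unicodedata.unidata_version
major, minor, patch = (int(part) for part in version.split("."))

# ucd_3_2_0 is a second database object that stays frozen at Unicode 3.2.0,
# so it is unaffected by database updates such as this one.
frozen = unicodedata.ucd_3_2_0.unidata_version

print(version, (major, minor, patch), frozen)
```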

Doc/reference/lexical_analysis.rst

Lines changed: 4 additions & 4 deletions
@@ -315,16 +315,16 @@ The Unicode category codes mentioned above stand for:
 * *Nd* - decimal numbers
 * *Pc* - connector punctuations
 * *Other_ID_Start* - explicit list of characters in `PropList.txt
-<https://www.unicode.org/Public/14.0.0/ucd/PropList.txt>`_ to support backwards
+<https://www.unicode.org/Public/15.0.0/ucd/PropList.txt>`_ to support backwards
 compatibility
 * *Other_ID_Continue* - likewise

 All identifiers are converted into the normal form NFKC while parsing; comparison
 of identifiers is based on NFKC.

 A non-normative HTML file listing all valid identifier characters for Unicode
-14.0.0 can be found at
-https://www.unicode.org/Public/14.0.0/ucd/DerivedCoreProperties.txt
+15.0.0 can be found at
+https://www.unicode.org/Public/15.0.0/ucd/DerivedCoreProperties.txt


 .. _keywords:
@@ -1013,4 +1013,4 @@ occurrence outside string literals and comments is an unconditional error:

 .. rubric:: Footnotes

-.. [#] https://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt
+.. [#] https://www.unicode.org/Public/15.0.0/ucd/NameAliases.txt
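
The NFKC rule quoted in the context lines above can be observed directly. The sketch below is an illustration only: the identifier, spelled with U+FB01 LATIN SMALL LIGATURE FI, is a sample chosen here and does not appear in the commit.

```python
import unicodedata

# Illustrative identifier, not from the commit: "file" spelled with
# U+FB01 LATIN SMALL LIGATURE FI in place of the letters "f" and "i".
source_name = "\ufb01le"
normalized = unicodedata.normalize("NFKC", source_name)

# Assign through the ligature spelling; the parser stores the NFKC form,
# so the binding ends up under the plain ASCII name.
namespace = {}
exec(f"{source_name} = 1", namespace)

print(normalized, "file" in namespace, source_name in namespace)
```

Because the set of valid identifier characters is derived from the UCD, a database update can widen that set slightly; the normalization step itself is unchanged by this commit.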

Doc/whatsnew/3.12.rst

Lines changed: 6 additions & 70 deletions
@@ -1,74 +1,4 @@

-****************************
-What's New In Python 3.12
-****************************
-
-:Release: |release|
-:Date: |today|
-
-.. Rules for maintenance:
-
-* Anyone can add text to this document. Do not spend very much time
-on the wording of your changes, because your text will probably
-get rewritten to some degree.
-
-* The maintainer will go through Misc/NEWS periodically and add
-changes; it's therefore more important to add your changes to
-Misc/NEWS than to this file.
-
-* This is not a complete list of every single change; completeness
-is the purpose of Misc/NEWS. Some changes I consider too small
-or esoteric to include. If such a change is added to the text,
-I'll just remove it. (This is another reason you shouldn't spend
-too much time on writing your addition.)
-
-* If you want to draw your new text to the attention of the
-maintainer, add 'XXX' to the beginning of the paragraph or
-section.
-
-* It's OK to just add a fragmentary note about a change. For
-example: "XXX Describe the transmogrify() function added to the
-socket module." The maintainer will research the change and
-write the necessary text.
-
-* You can comment out your additions if you like, but it's not
-necessary (especially when a final release is some months away).
-
-* Credit the author of a patch or bugfix. Just the name is
-sufficient; the e-mail address isn't necessary.
-
-* It's helpful to add the bug/patch number as a comment:
-
-XXX Describe the transmogrify() function added to the socket
-module.
-(Contributed by P.Y. Developer in :issue:`12345`.)
-
-This saves the maintainer the effort of going through the Mercurial log
-when researching a change.
-
-This article explains the new features in Python 3.12, compared to 3.11.
-
-For full details, see the :ref:`changelog <changelog>`.
-
-.. note::
-
-Prerelease users should be aware that this document is currently in draft
-form. It will be updated substantially as Python 3.12 moves towards release,
-so it's worth checking back even after reading earlier versions.
-
-
-Summary -- Release highlights
-=============================
-
-.. This section singles out the most important changes in Python 3.12.
-Brevity is key.
-
-
-.. PEP-sized items next.
-
-Important deprecations, removals or restrictions:
-
-* :pep:`623`, Remove wstr from Unicode


 New Features
@@ -147,6 +77,12 @@ threading
 profiling functions in all running threads in addition to the calling one.
 (Contributed by Pablo Galindo in :gh:`93503`.)

+unicodedata
+-----------
+
+* The Unicode database has been updated to version 15.0.0. (Contributed by
+Benjamin Peterson in :gh:`96734`).
+

 Optimizations
 =============

Lib/test/test_unicodedata.py

Lines changed: 3 additions & 3 deletions
@@ -18,7 +18,7 @@
 class UnicodeMethodsTest(unittest.TestCase):

     # update this, if the database changes
-    expectedchecksum = '4739770dd4d0e5f1b1677accfc3552ed3c8ef326'
+    expectedchecksum = 'e708c31c0d51f758adf475cb7201cf80917362be'

     @requires_resource('cpu')
     def test_method_checksum(self):
@@ -71,7 +71,7 @@ class UnicodeFunctionsTest(UnicodeDatabaseTest):

     # Update this if the database changes. Make sure to do a full rebuild
     # (e.g. 'make distclean && make') to get the correct checksum.
-    expectedchecksum = '4975f3ec0acd4a62465d18c9bf8519b1964181f6'
+    expectedchecksum = '84b88a89f40aeae96852732f9dc0ee08be49780f'

     @requires_resource('cpu')
     def test_function_checksum(self):
@@ -224,7 +224,7 @@ def test_east_asian_width(self):
     def test_east_asian_width_unassigned(self):
         eaw = self.db.east_asian_width
         # unassigned
-        for char in '\u0530\u0ece\u10c6\u20fc\uaaca\U000107bd\U000115f2':
+        for char in '\u0530\u0ecf\u10c6\u20fc\uaaca\U000107bd\U000115f2':
             self.assertEqual(eaw(char), 'N')
             self.assertIs(self.db.name(char, None), None)


Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Update :mod:`unicodedata` database to Unicode 15.0.0.

Modules/unicodedata.c

Lines changed: 3 additions & 2 deletions
@@ -1046,11 +1046,12 @@ is_unified_ideograph(Py_UCS4 code)
         (0x3400 <= code && code <= 0x4DBF) || /* CJK Ideograph Extension A */
         (0x4E00 <= code && code <= 0x9FFF) || /* CJK Ideograph */
         (0x20000 <= code && code <= 0x2A6DF) || /* CJK Ideograph Extension B */
-        (0x2A700 <= code && code <= 0x2B738) || /* CJK Ideograph Extension C */
+        (0x2A700 <= code && code <= 0x2B739) || /* CJK Ideograph Extension C */
         (0x2B740 <= code && code <= 0x2B81D) || /* CJK Ideograph Extension D */
         (0x2B820 <= code && code <= 0x2CEA1) || /* CJK Ideograph Extension E */
         (0x2CEB0 <= code && code <= 0x2EBE0) || /* CJK Ideograph Extension F */
-        (0x30000 <= code && code <= 0x3134A); /* CJK Ideograph Extension G */
+        (0x30000 <= code && code <= 0x3134A) || /* CJK Ideograph Extension G */
+        (0x31350 <= code && code <= 0x323AF); /* CJK Ideograph Extension H */
 }

 /* macros used to determine if the given code point is in the PUA range that
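
For readers who want to check the new boundaries without rebuilding the interpreter, the updated condition can be mirrored in Python. This is only a sketch: the table name and the function are written for this illustration, with the ranges copied from the "+" side of the hunk above; it is not code from the commit.

```python
# Inclusive code point ranges copied from the updated C condition.
UNIFIED_IDEOGRAPH_RANGES = (
    (0x3400, 0x4DBF),    # CJK Ideograph Extension A
    (0x4E00, 0x9FFF),    # CJK Ideograph
    (0x20000, 0x2A6DF),  # CJK Ideograph Extension B
    (0x2A700, 0x2B739),  # CJK Ideograph Extension C (end moved from 0x2B738)
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D
    (0x2B820, 0x2CEA1),  # CJK Ideograph Extension E
    (0x2CEB0, 0x2EBE0),  # CJK Ideograph Extension F
    (0x30000, 0x3134A),  # CJK Ideograph Extension G
    (0x31350, 0x323AF),  # CJK Ideograph Extension H (new range)
)


def is_unified_ideograph(code):
    """Illustrative Python port of the C range check; takes an int."""
    return any(low <= code <= high for low, high in UNIFIED_IDEOGRAPH_RANGES)


# The two boundaries changed by the commit, plus the gap between G and H.
print(is_unified_ideograph(0x2B739))  # new last code point of Extension C
print(is_unified_ideograph(0x31350))  # first code point of Extension H
print(is_unified_ideograph(0x3134B))  # between Extension G and Extension H
```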
