Commit c6bcb03

Update Unicode generator data files to 16
1 parent 8f6ad4c commit c6bcb03

23 files changed, with 21,040 additions and 1,157 deletions.

utils/gen-unicode-data/Data/15/ScriptExtensions.txt

Lines changed: 0 additions & 628 deletions
This file was deleted.

utils/gen-unicode-data/Data/15/Apple/DerivedCoreProperties.txt renamed to utils/gen-unicode-data/Data/16/Apple/DerivedCoreProperties.txt

Lines changed: 898 additions & 111 deletions
Large diffs are not rendered by default.

utils/gen-unicode-data/Data/15/Apple/UnicodeData.txt renamed to utils/gen-unicode-data/Data/16/Apple/UnicodeData.txt

Lines changed: 5203 additions & 11 deletions
Large diffs are not rendered by default.

utils/gen-unicode-data/Data/15/CaseFolding.txt renamed to utils/gen-unicode-data/Data/16/CaseFolding.txt

Lines changed: 34 additions & 4 deletions
@@ -1,8 +1,8 @@
-# CaseFolding-15.0.0.txt
-# Date: 2022-02-02, 23:35:35 GMT
-# © 2022 Unicode®, Inc.
+# CaseFolding-16.0.0.txt
+# Date: 2024-04-30, 21:48:11 GMT
+# © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
-# For terms of use, see https://www.unicode.org/terms_of_use.html
+# For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
 # For documentation, see https://www.unicode.org/reports/tr44/
@@ -603,6 +603,7 @@
 1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN
 1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT
 1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK
+1C89; C; 1C8A; # CYRILLIC CAPITAL LETTER TJE
 1C90; C; 10D0; # GEORGIAN MTAVRULI CAPITAL LETTER AN
 1C91; C; 10D1; # GEORGIAN MTAVRULI CAPITAL LETTER BAN
 1C92; C; 10D2; # GEORGIAN MTAVRULI CAPITAL LETTER GAN
@@ -929,6 +930,7 @@
 1FCC; S; 1FC3; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
 1FD2; F; 03B9 0308 0300; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA
 1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
+1FD3; S; 0390; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
 1FD6; F; 03B9 0342; # GREEK SMALL LETTER IOTA WITH PERISPOMENI
 1FD7; F; 03B9 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI
 1FD8; C; 1FD0; # GREEK CAPITAL LETTER IOTA WITH VRACHY
@@ -937,6 +939,7 @@
 1FDB; C; 1F77; # GREEK CAPITAL LETTER IOTA WITH OXIA
 1FE2; F; 03C5 0308 0300; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA
 1FE3; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
+1FE3; S; 03B0; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
 1FE4; F; 03C1 0313; # GREEK SMALL LETTER RHO WITH PSILI
 1FE6; F; 03C5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI
 1FE7; F; 03C5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI
@@ -1238,9 +1241,13 @@ A7C5; C; 0282; # LATIN CAPITAL LETTER S WITH HOOK
 A7C6; C; 1D8E; # LATIN CAPITAL LETTER Z WITH PALATAL HOOK
 A7C7; C; A7C8; # LATIN CAPITAL LETTER D WITH SHORT STROKE OVERLAY
 A7C9; C; A7CA; # LATIN CAPITAL LETTER S WITH SHORT STROKE OVERLAY
+A7CB; C; 0264; # LATIN CAPITAL LETTER RAMS HORN
+A7CC; C; A7CD; # LATIN CAPITAL LETTER S WITH DIAGONAL STROKE
 A7D0; C; A7D1; # LATIN CAPITAL LETTER CLOSED INSULAR G
 A7D6; C; A7D7; # LATIN CAPITAL LETTER MIDDLE SCOTS S
 A7D8; C; A7D9; # LATIN CAPITAL LETTER SIGMOID S
+A7DA; C; A7DB; # LATIN CAPITAL LETTER LAMBDA
+A7DC; C; 019B; # LATIN CAPITAL LETTER LAMBDA WITH STROKE
 A7F5; C; A7F6; # LATIN CAPITAL LETTER REVERSED HALF H
 AB70; C; 13A0; # CHEROKEE SMALL LETTER A
 AB71; C; 13A1; # CHEROKEE SMALL LETTER E
@@ -1328,6 +1335,7 @@ FB02; F; 0066 006C; # LATIN SMALL LIGATURE FL
 FB03; F; 0066 0066 0069; # LATIN SMALL LIGATURE FFI
 FB04; F; 0066 0066 006C; # LATIN SMALL LIGATURE FFL
 FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
+FB05; S; FB06; # LATIN SMALL LIGATURE LONG S T
 FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
 FB13; F; 0574 0576; # ARMENIAN SMALL LIGATURE MEN NOW
 FB14; F; 0574 0565; # ARMENIAN SMALL LIGATURE MEN ECH
@@ -1522,6 +1530,28 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
 10CB0; C; 10CF0; # OLD HUNGARIAN CAPITAL LETTER EZS
 10CB1; C; 10CF1; # OLD HUNGARIAN CAPITAL LETTER ENT-SHAPED SIGN
 10CB2; C; 10CF2; # OLD HUNGARIAN CAPITAL LETTER US
+10D50; C; 10D70; # GARAY CAPITAL LETTER A
+10D51; C; 10D71; # GARAY CAPITAL LETTER CA
+10D52; C; 10D72; # GARAY CAPITAL LETTER MA
+10D53; C; 10D73; # GARAY CAPITAL LETTER KA
+10D54; C; 10D74; # GARAY CAPITAL LETTER BA
+10D55; C; 10D75; # GARAY CAPITAL LETTER JA
+10D56; C; 10D76; # GARAY CAPITAL LETTER SA
+10D57; C; 10D77; # GARAY CAPITAL LETTER WA
+10D58; C; 10D78; # GARAY CAPITAL LETTER LA
+10D59; C; 10D79; # GARAY CAPITAL LETTER GA
+10D5A; C; 10D7A; # GARAY CAPITAL LETTER DA
+10D5B; C; 10D7B; # GARAY CAPITAL LETTER XA
+10D5C; C; 10D7C; # GARAY CAPITAL LETTER YA
+10D5D; C; 10D7D; # GARAY CAPITAL LETTER TA
+10D5E; C; 10D7E; # GARAY CAPITAL LETTER RA
+10D5F; C; 10D7F; # GARAY CAPITAL LETTER NYA
+10D60; C; 10D80; # GARAY CAPITAL LETTER FA
+10D61; C; 10D81; # GARAY CAPITAL LETTER NA
+10D62; C; 10D82; # GARAY CAPITAL LETTER PA
+10D63; C; 10D83; # GARAY CAPITAL LETTER HA
+10D64; C; 10D84; # GARAY CAPITAL LETTER OLD KA
+10D65; C; 10D85; # GARAY CAPITAL LETTER OLD NA
 118A0; C; 118C0; # WARANG CITI CAPITAL LETTER NGAA
 118A1; C; 118C1; # WARANG CITI CAPITAL LETTER A
 118A2; C; 118C2; # WARANG CITI CAPITAL LETTER WI
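
Several of the added CaseFolding rows give a code point a second entry, for example `1FD3` and `FB05` now carry an `S` (simple) mapping alongside their existing `F` (full) mapping. The following is a minimal Python sketch, not the generator's actual code, of how a consumer might read these rows and build the two tables. It assumes the usual convention that simple folding uses statuses `C` and `S` while full folding uses `C` and `F`; the names `FoldEntry`, `parse_case_folding_line`, and `build_folding` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class FoldEntry(NamedTuple):
    code: int                 # source code point
    status: str               # "C", "F", "S", or "T"
    mapping: Tuple[int, ...]  # folded code point(s)
    name: str                 # character name taken from the trailing comment


def parse_case_folding_line(line: str) -> Optional[FoldEntry]:
    """Parse one CaseFolding.txt row; return None for comment or blank lines."""
    data, _, comment = line.partition("#")
    data = data.strip()
    if not data:
        return None
    # Rows look like "1C89; C; 1C8A; # NAME"; the trailing ';' yields an empty field.
    fields = [field.strip() for field in data.split(";")]
    code, status, mapping = fields[0], fields[1], fields[2]
    return FoldEntry(
        code=int(code, 16),
        status=status,
        mapping=tuple(int(cp, 16) for cp in mapping.split()),
        name=comment.strip(),
    )


def build_folding(lines: Iterable[str], statuses: Set[str]) -> Dict[int, Tuple[int, ...]]:
    """Collect the mappings whose status is in `statuses`."""
    table: Dict[int, Tuple[int, ...]] = {}
    for line in lines:
        entry = parse_case_folding_line(line)
        if entry is not None and entry.status in statuses:
            table[entry.code] = entry.mapping
    return table


# Rows copied from the diff above.
SAMPLE = [
    "# CaseFolding-16.0.0.txt",
    "1C89; C; 1C8A; # CYRILLIC CAPITAL LETTER TJE",
    "1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA",
    "1FD3; S; 0390; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA",
    "FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T",
    "FB05; S; FB06; # LATIN SMALL LIGATURE LONG S T",
]

simple = build_folding(SAMPLE, {"C", "S"})  # one code point maps to one code point
full = build_folding(SAMPLE, {"C", "F"})    # a mapping may expand to several code points
```

With this sample, `simple[0x1FD3]` is the single code point `0390`, while `full[0x1FD3]` is the three-code-point sequence `03B9 0308 0301`. A reader that kept only one entry per code point regardless of status would silently lose one of the two.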

utils/gen-unicode-data/Data/15/DerivedAge.txt renamed to utils/gen-unicode-data/Data/16/DerivedAge.txt

Lines changed: 72 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
-# DerivedAge-15.0.0.txt
-# Date: 2022-04-26, 23:14:23 GMT
-# © 2022 Unicode®, Inc.
+# DerivedAge-16.0.0.txt
+# Date: 2024-04-30, 21:48:12 GMT
+# © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
-# For terms of use, see https://www.unicode.org/terms_of_use.html
+# For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
 # For documentation, see https://www.unicode.org/reports/tr44/
@@ -1991,4 +1991,72 @@ FDFE..FDFF ; 14.0 # [2] ARABIC LIGATURE SUBHAANAHU WA TAAALAA..ARABIC LIGAT
 
 # Total code points: 4489
 
+# ================================================
+
+# Age=V15_1
+
+# Newly assigned in Unicode 15.1.0 (September, 2023)
+
+2FFC..2FFF ; 15.1 # [4] IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
+31EF ; 15.1 # IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
+2EBF0..2EE5D ; 15.1 # [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
+
+# Total code points: 627
+
+# ================================================
+
+# Age=V16_0
+
+# Newly assigned in Unicode 16.0.0 (September, 2024)
+
+0897 ; 16.0 # ARABIC PEPET
+1B4E..1B4F ; 16.0 # [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
+1B7F ; 16.0 # BALINESE PANTI BAWAK
+1C89..1C8A ; 16.0 # [2] CYRILLIC CAPITAL LETTER TJE..CYRILLIC SMALL LETTER TJE
+2427..2429 ; 16.0 # [3] SYMBOL FOR DELETE SQUARE CHECKER BOARD FORM..SYMBOL FOR DELETE MEDIUM SHADE FORM
+31E4..31E5 ; 16.0 # [2] CJK STROKE HXG..CJK STROKE SZP
+A7CB..A7CD ; 16.0 # [3] LATIN CAPITAL LETTER RAMS HORN..LATIN SMALL LETTER S WITH DIAGONAL STROKE
+A7DA..A7DC ; 16.0 # [3] LATIN CAPITAL LETTER LAMBDA..LATIN CAPITAL LETTER LAMBDA WITH STROKE
+105C0..105F3 ; 16.0 # [52] TODHRI LETTER A..TODHRI LETTER OO
+10D40..10D65 ; 16.0 # [38] GARAY DIGIT ZERO..GARAY CAPITAL LETTER OLD NA
+10D69..10D85 ; 16.0 # [29] GARAY VOWEL SIGN E..GARAY SMALL LETTER OLD NA
+10D8E..10D8F ; 16.0 # [2] GARAY PLUS SIGN..GARAY MINUS SIGN
+10EC2..10EC4 ; 16.0 # [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
+10EFC ; 16.0 # ARABIC COMBINING ALEF OVERLAY
+11380..11389 ; 16.0 # [10] TULU-TIGALARI LETTER A..TULU-TIGALARI LETTER VOCALIC LL
+1138B ; 16.0 # TULU-TIGALARI LETTER EE
+1138E ; 16.0 # TULU-TIGALARI LETTER AI
+11390..113B5 ; 16.0 # [38] TULU-TIGALARI LETTER OO..TULU-TIGALARI LETTER LLLA
+113B7..113C0 ; 16.0 # [10] TULU-TIGALARI SIGN AVAGRAHA..TULU-TIGALARI VOWEL SIGN VOCALIC LL
+113C2 ; 16.0 # TULU-TIGALARI VOWEL SIGN EE
+113C5 ; 16.0 # TULU-TIGALARI VOWEL SIGN AI
+113C7..113CA ; 16.0 # [4] TULU-TIGALARI VOWEL SIGN OO..TULU-TIGALARI SIGN CANDRA ANUNASIKA
+113CC..113D5 ; 16.0 # [10] TULU-TIGALARI SIGN ANUSVARA..TULU-TIGALARI DOUBLE DANDA
+113D7..113D8 ; 16.0 # [2] TULU-TIGALARI SIGN OM PUSHPIKA..TULU-TIGALARI SIGN SHRII PUSHPIKA
+113E1..113E2 ; 16.0 # [2] TULU-TIGALARI VEDIC TONE SVARITA..TULU-TIGALARI VEDIC TONE ANUDATTA
+116D0..116E3 ; 16.0 # [20] MYANMAR PAO DIGIT ZERO..MYANMAR EASTERN PWO KAREN DIGIT NINE
+11BC0..11BE1 ; 16.0 # [34] SUNUWAR LETTER DEVI..SUNUWAR SIGN PVO
+11BF0..11BF9 ; 16.0 # [10] SUNUWAR DIGIT ZERO..SUNUWAR DIGIT NINE
+11F5A ; 16.0 # KAWI SIGN NUKTA
+13460..143FA ; 16.0 # [3995] EGYPTIAN HIEROGLYPH-13460..EGYPTIAN HIEROGLYPH-143FA
+16100..16139 ; 16.0 # [58] GURUNG KHEMA LETTER A..GURUNG KHEMA DIGIT NINE
+16D40..16D79 ; 16.0 # [58] KIRAT RAI SIGN ANUSVARA..KIRAT RAI DIGIT NINE
+18CFF ; 16.0 # KHITAN SMALL SCRIPT CHARACTER-18CFF
+1CC00..1CCF9 ; 16.0 # [250] UP-POINTING GO-KART..OUTLINED DIGIT NINE
+1CD00..1CEB3 ; 16.0 # [436] BLOCK OCTANT-3..BLACK RIGHT TRIANGLE CARET
+1E5D0..1E5FA ; 16.0 # [43] OL ONAL LETTER O..OL ONAL DIGIT NINE
+1E5FF ; 16.0 # OL ONAL ABBREVIATION SIGN
+1F8B2..1F8BB ; 16.0 # [10] RIGHTWARDS ARROW WITH LOWER HOOK..SOUTH WEST ARROW FROM BAR
+1F8C0..1F8C1 ; 16.0 # [2] LEFTWARDS ARROW FROM DOWNWARDS ARROW..RIGHTWARDS ARROW FROM DOWNWARDS ARROW
+1FA89 ; 16.0 # HARP
+1FA8F ; 16.0 # SHOVEL
+1FABE ; 16.0 # LEAFLESS TREE
+1FAC6 ; 16.0 # FINGERPRINT
+1FADC ; 16.0 # ROOT VEGETABLE
+1FADF ; 16.0 # SPLATTER
+1FAE9 ; 16.0 # FACE WITH BAGS UNDER EYES
+1FBCB..1FBEF ; 16.0 # [37] WHITE CROSS MARK..TOP LEFT JUSTIFIED LOWER RIGHT QUARTER BLACK CIRCLE
+
+# Total code points: 5185
+
 # EOF
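
Each Age section ends with a `# Total code points:` comment that can be cross-checked against the ranges listed above it. The sketch below is one way to do that in Python, assuming the `first..last ; value # comment` row layout seen in this file; `parse_range_line` and `count_by_value` are hypothetical helper names and are not part of the generator.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def parse_range_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (first, last, value) for a data row, or None for comments and blanks."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    code_range, value = (field.strip() for field in data.split(";", 1))
    first, _, last = code_range.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # a single code point has no ".."
    return start, end, value


def count_by_value(lines: Iterable[str]) -> Counter:
    """Sum the number of code points assigned to each property value."""
    totals: Counter = Counter()
    for line in lines:
        parsed = parse_range_line(line)
        if parsed is None:
            continue
        start, end, value = parsed
        totals[value] += end - start + 1  # ranges are inclusive
    return totals


# The Age=V15_1 rows from the diff above.
V15_1_ROWS = [
    "# Age=V15_1",
    "2FFC..2FFF ; 15.1 # [4] IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION",
    "31EF ; 15.1 # IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION",
    "2EBF0..2EE5D ; 15.1 # [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D",
]

totals = count_by_value(V15_1_ROWS)
assert totals["15.1"] == 627  # 4 + 1 + 622, matching "# Total code points: 627"
```

Running the same function over the `16.0` rows gives a figure to compare against that section's stated total.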

utils/gen-unicode-data/Data/15/DerivedBinaryProperties.txt renamed to utils/gen-unicode-data/Data/16/DerivedBinaryProperties.txt

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
-# DerivedBinaryProperties-15.0.0.txt
-# Date: 2022-02-26, 00:38:29 GMT
-# © 2022 Unicode®, Inc.
+# DerivedBinaryProperties-16.0.0.txt
+# Date: 2024-04-30, 21:48:15 GMT
+# © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
-# For terms of use, see https://www.unicode.org/terms_of_use.html
+# For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
 # For documentation, see https://www.unicode.org/reports/tr44/
@@ -51,7 +51,7 @@
 225F..2260 ; Bidi_Mirrored # Sm [2] QUESTIONED EQUAL TO..NOT EQUAL TO
 2262 ; Bidi_Mirrored # Sm NOT IDENTICAL TO
 2264..226B ; Bidi_Mirrored # Sm [8] LESS-THAN OR EQUAL TO..MUCH GREATER-THAN
-226E..228C ; Bidi_Mirrored # Sm [31] NOT LESS-THAN..MULTISET
+226D..228C ; Bidi_Mirrored # Sm [32] NOT EQUIVALENT TO..MULTISET
 228F..2292 ; Bidi_Mirrored # Sm [4] SQUARE IMAGE OF..SQUARE ORIGINAL OF OR EQUAL TO
 2298 ; Bidi_Mirrored # Sm CIRCLED DIVISION SLASH
 22A2..22A3 ; Bidi_Mirrored # Sm [2] RIGHT TACK..LEFT TACK
@@ -236,6 +236,6 @@ FF63 ; Bidi_Mirrored # Pe HALFWIDTH RIGHT CORNER BRACKET
 1D789 ; Bidi_Mirrored # Sm MATHEMATICAL SANS-SERIF BOLD PARTIAL DIFFERENTIAL
 1D7C3 ; Bidi_Mirrored # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL
 
-# Total code points: 553
+# Total code points: 554
 
 # EOF
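
The only data change in this file is one widened range, and it accounts for the total moving from 553 to 554. As a rough illustration, the Python sketch below expands the old and new rows into sets and takes their difference. The regular expression is an assumption based on the row layout shown in this diff, and `code_points` is a hypothetical helper that also checks the bracketed `[n]` count against the range size.

```python
import re
from typing import Set

# Assumed row layout: "FIRST[..LAST] ; Property # Gc [count] NAME"
RANGE_LINE = re.compile(
    r"^(?P<first>[0-9A-F]+)(?:\.\.(?P<last>[0-9A-F]+))?\s*;\s*(?P<prop>\w+)"
    r"\s*#\s*(?P<gc>\w+)\s*(?:\[(?P<count>\d+)\])?"
)


def code_points(line: str) -> Set[int]:
    """Expand one property row into the set of code points it covers."""
    match = RANGE_LINE.match(line)
    if match is None:
        raise ValueError(f"not a data row: {line!r}")
    first = int(match["first"], 16)
    last = int(match["last"], 16) if match["last"] else first
    size = last - first + 1
    # When the comment carries a "[n]" count, it should agree with the range size.
    if match["count"] is not None and int(match["count"]) != size:
        raise ValueError(f"count mismatch in {line!r}: expected {size}")
    return set(range(first, last + 1))


old = code_points("226E..228C ; Bidi_Mirrored # Sm [31] NOT LESS-THAN..MULTISET")
new = code_points("226D..228C ; Bidi_Mirrored # Sm [32] NOT EQUIVALENT TO..MULTISET")

added = new - old    # code points that gained Bidi_Mirrored
removed = old - new  # code points that lost it

assert added == {0x226D}
assert removed == set()
```

The difference is the single code point `226D` (NOT EQUIVALENT TO), which is consistent with the total rising by exactly one.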
