Commit 2e07c37

Formalize Unicode block properties
Previously we only supported a subset of the Oniguruma spellings for these. Introduce them as an actual Unicode property with the key `blk` or `block`. Additionally, allow a non-Unicode shorthand syntax that uses the prefix `in`. This is supported by Oniguruma and Perl (though Perl discourages its usage). We may want to warn/error on it and suggest users switch to the more explicit form.
1 parent e8d780c commit 2e07c37
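
The two spellings described in the commit message can be sketched as follows. This is a minimal, self-contained illustration and not the code from CharacterPropertyClassification.swift (whose diff is not rendered on this page). The `Block` enum, `normalize`, `lookupBlock`, and `classifyBlock` are hypothetical names, the three listed blocks are a tiny sample, and the loose name matching (ignoring case, spaces, hyphens, and underscores) is an assumption modeled on Unicode's loose matching rules.

```swift
// Hypothetical sketch; none of these names are the _RegexParser API.

/// A tiny stand-in for a block type, listing only three blocks.
enum Block: String, CaseIterable {
  case basicLatin = "Basic_Latin"
  case greekAndCoptic = "Greek_and_Coptic"
  case cyrillic = "Cyrillic"
}

enum BlockSpellingError: Error, Equatable {
  case unrecognizedBlock(String)
}

/// Lowercases and drops spaces, hyphens, and underscores, so that
/// "Basic_Latin", "basic latin", and "BasicLatin" compare equal.
func normalize(_ s: String) -> String {
  s.lowercased().filter { $0 != " " && $0 != "-" && $0 != "_" }
}

/// Looks up a block by loosely matched name, throwing if none matches.
func lookupBlock(_ name: String) throws -> Block {
  let key = normalize(name)
  guard let block = Block.allCases.first(where: {
    normalize($0.rawValue) == key
  }) else {
    throw BlockSpellingError.unrecognizedBlock(name)
  }
  return block
}

/// Classifies the contents of a `\p{...}` property.
/// Returns nil when the spelling does not name a block.
func classifyBlock(_ contents: String) throws -> Block? {
  // Explicit form: `blk=Name` or `block=Name`.
  if let eq = contents.firstIndex(of: "=") {
    let key = normalize(String(contents[..<eq]))
    guard key == "blk" || key == "block" else { return nil }
    return try lookupBlock(String(contents[contents.index(after: eq)...]))
  }
  // Shorthand form: `InName`. A failed lookup returns nil rather than
  // throwing (an assumption of this sketch), so that a spelling such as
  // "Inherited" can still be tried as some other kind of property.
  let key = normalize(contents)
  if key.hasPrefix("in") {
    return try? lookupBlock(String(key.dropFirst(2)))
  }
  return nil
}
```

Under these assumptions, `classifyBlock("blk=Greek_and_Coptic")` and `classifyBlock("InCyrillic")` both resolve to a block, while `classifyBlock("blk=Nope")` throws `unrecognizedBlock("Nope")`. The ambiguity of the `In` prefix is one reason the explicit `blk=`/`block=` form is easier to diagnose.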

7 files changed: +1069 -347 lines changed

Sources/_RegexParser/Regex/AST/Atom.swift

Lines changed: 5 additions & 3 deletions
@@ -441,13 +441,15 @@ extension AST.Atom.CharacterProperty {
 
     /// Character age, as per UnicodeScalar.Properties.age.
     case age(major: Int, minor: Int)
-
+
+    /// A block property.
+    case block(Unicode.Block)
+
     case posix(Unicode.POSIXProperty)
 
     /// Some special properties implemented by PCRE and Oniguruma.
     case pcreSpecial(PCRESpecialCategory)
-    case onigurumaSpecial(OnigurumaSpecialProperty)
-
+
     public enum MapKind: Hashable {
       case lowercase
       case uppercase

Sources/_RegexParser/Regex/Parse/CharacterPropertyClassification.swift

Lines changed: 352 additions & 6 deletions
Large diffs are not rendered by default.

Sources/_RegexParser/Regex/Parse/Diagnostics.swift

Lines changed: 3 additions & 0 deletions
@@ -62,6 +62,7 @@ enum ParseError: Error, Hashable {
   case unknownProperty(key: String?, value: String)
   case unrecognizedScript(String)
   case unrecognizedCategory(String)
+  case unrecognizedBlock(String)
   case invalidAge(String)
   case invalidNumericValue(String)
   case unrecognizedNumericType(String)
@@ -195,6 +196,8 @@ extension ParseError: CustomStringConvertible {
       return "unrecognized script '\(value)'"
     case .unrecognizedCategory(let value):
       return "unrecognized category '\(value)'"
+    case .unrecognizedBlock(let value):
+      return "unrecognized block '\(value)'"
     case .unrecognizedNumericType(let value):
       return "unrecognized numeric type '\(value)'"
     case .invalidAge(let value):

Sources/_RegexParser/Regex/Parse/Sema.swift

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ extension RegexValidator {
       break
     case .pcreSpecial:
       throw error(.unsupported("PCRE property"), at: loc)
-    case .onigurumaSpecial:
+    case .block:
       throw error(.unsupported("Unicode block property"), at: loc)
     }
   }
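
The Sema.swift hunk means that a block property can be parsed but is still rejected at validation time. A rough model of that behavior, using hypothetical stand-in types (`PropertyKind`, `ValidationError`, `validate`) rather than the real `RegexValidator` and AST types, might look like this:

```swift
// Hypothetical stand-ins; the real types live in _RegexParser.
enum PropertyKind {
  case someSupportedProperty   // placeholder for any accepted kind (assumption)
  case pcreSpecial
  case block(String)           // the real case carries a Unicode.Block
}

enum ValidationError: Error, Equatable {
  case unsupported(String)
}

/// Mirrors the shape of the switch in the hunk above: accepted kinds
/// fall through, while PCRE and block properties are reported as unsupported.
func validate(_ kind: PropertyKind) throws {
  switch kind {
  case .someSupportedProperty:
    break
  case .pcreSpecial:
    throw ValidationError.unsupported("PCRE property")
  case .block:
    throw ValidationError.unsupported("Unicode block property")
  }
}
```

In this sketch, `try validate(.block("Cyrillic"))` throws `ValidationError.unsupported("Unicode block property")`. Recognizing the syntax in the parser while deferring matching support lets the diagnostic name the feature instead of reporting an unknown property.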
