-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Implement SE-0221: Character Properties #20520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please test |
Provide convenience properties on Character.
b02bc87
to
a73f49e
Compare
@swift-ci please test |
Build failed |
Build failed |
Build failed |
@swift-ci please test |
@airspeedswift The following changes were made between #15880 and #20520. stdlib/public/core/CharacterProperties.swift @inlinable
internal var _isSingleScalar: Bool {
- return self.unicodeScalars.count == 1
+ return self.unicodeScalars.index(
+ after: self.unicodeScalars.startIndex
+ ) == self.unicodeScalars.endIndex
}
- @inlinable
- static internal var _crlf: Character { return "\r\n" }
-
- @inlinable
- static internal var _lf: Character { return "\n" }
@inlinable
public var asciiValue: UInt8? {
- if _slowPath(self == ._crlf) { return 0x000A /* LINE FEED (LF) */ }
+ if _slowPath(self == "\r\n") { return 0x000A /* LINE FEED (LF) */ }
if _slowPath(!_isSingleScalar || _firstScalar.value >= 0x80) { return nil }
return UInt8(_firstScalar.value)
} test/stdlib/CharacterProperties.swift + expectTrue(Character("\r\n").isASCII)
+ expectEqual(Character("\n").asciiValue, Character("\r\n").asciiValue)
+ expectFalse(Character("⅚").isASCII)
+ expectFalse(Character("“").isASCII)
+ expectFalse(Character("e\u{301}").isASCII)
+ expectNil(Character("㊈").hexDigitValue)
+ expectNil(Character("Ⅴ").hexDigitValue)
+ expectNil(Character("7\u{301}").hexDigitValue)
+ expectFalse(Character("A").isNumber)
+ expectFalse(Character("Z").isNumber)
+ expectFalse(Character("7\u{301}").isWholeNumber)
+ expectEqual("ß", Character("ẞ").lowercased()) This addresses the previous review comments (except for benchmarks). |
The test case fails on my Red Hat Fedora Linux 29 box. Specifically, the following line:
Strangely, What can I do to help debug this? Also, isn't "ß" known to be an exception to the rule of uppercase/lowercase? This seems like more of a test of ICU than of the character properties, but I'm probably out of my depth. |
@davezarzycki I suggested that test in #15880 (comment). ẞ (U+1E9E LATIN CAPITAL LETTER SHARP S) was assigned ten years ago (in Unicode 5.1). Although it might not have been used officially until last year.
The test could either be removed, or conditionally compiled only for Apple platforms. I assume the latest Fedora Linux has a recent ICU library. SR-8876 builds ICU, but is that only for Ubuntu Linux? |
The latest Fedora 29 was released just a few weeks ago, and it uses ICU 62.1. Is that not new enough? In any case, it would be a shame if Swift couldn't use the system ICU just because of the Swift test suite. Is this specific test line actually necessary? |
ICU 62.1 is new enough, but there could be a problem with its data files. Or could the system fonts of Fedora Linux be displaying the wrong glyph?
"Ṿ" is U+1E7E (LATIN CAPITAL LETTER V WITH DOT BELOW). There are some Unicode scalars which can be mapped to lowercase by adding 0x20. But in this case, it is subtracting 0x20 from U+1E9E.
No, but you might have to wait until next week for someone to review a pull request. |
Via
|
Hi @milseman – Ping. So what's the right thing to do here? |
That seems like an actual bug in either String or ICU. @davezarzycki, what does the following program give you?
For both 4.2 and master on Darwin, I get the same result. Does the result differ between master and 4.2? |
As a preface, Safari/macOS 10.14.2 Beta (18C48a) doesn't like Nevertheless, a freshly built copy of master on Red Hat Fedora 29 generates I'm trying to figure out a way to test 4.2 on Fedora 29. The Ubuntu 18.04 package from swift.org doesn't work because it wants libedit 2.x, but Fedora 29 uses libedit 3.1. While I'm sure I could figure out how to checkout 4.2 and build it, I'm not sure I have the time given I'm also caring for a infant. Is there anything else I can do to help you debug this on top-of-tree? EDIT – With careful use of the Unicode character picker, I was able to update the comment. Whether SS or |
Frustrating. Ok, so lowering this down a level, can you try this: (swift) let lower = ("\u{1E9E}" as Unicode.Scalar).properties.lowercaseMapping
// lower : String = "ß"
(swift) lower.unicodeScalars.map { "0x\(String($0.value, radix: 16, uppercase: true))" }
// r0 : [String] = ["0xDF"] That's as close as public API gets to directly servicing ICU's answer. |
|
Alright, this seems like a clear-cut bug in ICU. It seems like 62.1 specifically had a number of case-conversion issues. Are you able to use a more recent ICU or do you need to use the one shipped with your system? If the latter, how long until it gets updated? We can figure out how we want to disable the test based on that. |
I'd like to keep using the system ICU for the time being and I don't know [Red Hat] Fedora's timetable for updating ICU. If I may, I don't think that the Swift test suite should duplicate ICU testing unless Swift is working around ICU bugs. Can Swift "#ifdef ICU > XX.Y" around the ß test, or just drop the ß test entirely? Why is the ß test needed? EDIT – By the way, Fedora Linux is fairly aggressive about updating packages. I'm surprised that ICU in Fedora 29, released just weeks ago isn't new enough. Nevertheless, it looks like the fix is in 62.2, so Fedora will probably pick it up soonish. I think my original point about duplicating ICU testing is still valid though. |
Regarding duplicating ICU's testing in general: we have fast-paths and other code in the way prior to consulting ICU and might be adding even more in the future, so testing these things is very useful. Also, we don't have to use ICU, and long-term I think the Swift project would be much better off without ICU as a dependency, using a more efficient native implementation based directly on the data tables. Having a history of regression tests of behavior is useful. For any specific test case, I'm indifferent. We should disable this test for you; I don't think we have availability based on ICU version number. We can also guard it for Darwin platforms only. |
Opened #20780. Would you want this cherry-picked into the 5.0 release? |
Provide convenience properties on Character.
Prior PR: #15880