[String] Add UTF-8 fast-paths for Foundation initializers #21959
@@ -178,6 +178,10 @@ extension String {
   /// Creates a string by copying the data from a given
   /// C array of UTF8-encoded bytes.
   public init?(utf8String bytes: UnsafePointer<CChar>) {
+    if let str = String(validatingUTF8: bytes) {
+      self = str
+      return
+    }
     if let ns = NSString(utf8String: bytes) {
       self = String._unconditionallyBridgeFromObjectiveC(ns)
     } else {
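The fast path in this hunk tries the standard library's validating C-string initializer before touching `NSString`. A minimal, stdlib-only sketch of that control flow follows; `makeString(utf8CString:fallback:)` and its `fallback` closure are hypothetical stand-ins for the initializer and its `NSString` slow path, not Foundation API.

```swift
// Hypothetical helper mirroring the shape of the patched init?(utf8String:):
// validate as UTF-8 first, and only take the slow path when validation fails.
func makeString(
    utf8CString bytes: UnsafePointer<CChar>,
    fallback: (UnsafePointer<CChar>) -> String?
) -> String? {
    // String(validatingUTF8:) reads up to the NUL terminator and returns nil
    // for ill-formed UTF-8 instead of repairing it.
    if let str = String(validatingUTF8: bytes) {
        return str  // fast path: no NSString allocation, no bridging
    }
    return fallback(bytes)  // slow path (NSString in the real initializer)
}

// NUL-terminated inputs; CChar(truncatingIfNeeded:) works whether CChar is signed or not.
let validBytes: [CChar] = [72, 105, 0]  // "Hi"
let invalidBytes: [CChar] = [CChar(truncatingIfNeeded: 0xFF), 0]  // 0xFF never appears in UTF-8

let fast = validBytes.withUnsafeBufferPointer {
    makeString(utf8CString: $0.baseAddress!, fallback: { _ in "slow path" })
}
let slow = invalidBytes.withUnsafeBufferPointer {
    makeString(utf8CString: $0.baseAddress!, fallback: { _ in "slow path" })
}
print(fast ?? "nil", slow ?? "nil")  // Hi slow path
```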
@@ -202,12 +206,18 @@ extension String {
   /// - Parameters:
   ///   - bytes: A sequence of bytes to interpret using `encoding`.
   ///   - encoding: The encoding to use to interpret `bytes`.
-  public init? <S: Sequence>(bytes: __shared S, encoding: Encoding)
-    where S.Iterator.Element == UInt8 {
+  public init?<S: Sequence>(bytes: __shared S, encoding: Encoding)
+    where S.Iterator.Element == UInt8 {
     let byteArray = Array(bytes)
+    if encoding == .utf8,
+       let str = byteArray.withUnsafeBufferPointer({ String._tryFromUTF8($0) })
+    {
+      self = str
+      return
+    }
 
     if let ns = NSString(
       bytes: byteArray, length: byteArray.count, encoding: encoding.rawValue) {
       self = String._unconditionallyBridgeFromObjectiveC(ns)
     } else {
       return nil
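`String._tryFromUTF8` is internal to the standard library, so code outside it cannot call it. The sketch below approximates the same fast path with public API only, under two stated assumptions: a hypothetical `tryFromUTF8(_:)` validates by decoding with repair and checking that nothing was repaired, and a plain `isUTF8` flag stands in for `String.Encoding`.

```swift
// Hypothetical public-API stand-in for the internal String._tryFromUTF8.
// String(decoding:as:) replaces ill-formed sequences with U+FFFD, so the
// input was valid exactly when the decoded UTF-8 matches it byte for byte.
func tryFromUTF8(_ bytes: [UInt8]) -> String? {
    let decoded = String(decoding: bytes, as: UTF8.self)
    return decoded.utf8.elementsEqual(bytes) ? decoded : nil
}

// Mirrors init?(bytes:encoding:): copy the sequence once, try UTF-8,
// otherwise hand the same array to the slow path.
func makeString<S: Sequence>(
    bytes: S,
    isUTF8: Bool,
    fallback: ([UInt8]) -> String?
) -> String? where S.Element == UInt8 {
    let byteArray = Array(bytes)
    if isUTF8, let str = tryFromUTF8(byteArray) {
        return str
    }
    return fallback(byteArray)
}

let accented = makeString(bytes: "héllo".utf8, isUTF8: true, fallback: { _ in nil })
let malformed: [UInt8] = [0xC3, 0x28]  // 0xC3 must be followed by a continuation byte
let rejected = makeString(bytes: malformed, isUTF8: true, fallback: { _ in "slow path" })
```

Building `byteArray` once and reusing it on both paths also avoids iterating `bytes` twice, which matters for single-pass sequences.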
@@ -365,6 +375,10 @@ extension String {
     cString: UnsafePointer<CChar>,
     encoding enc: Encoding
   ) {
+    if enc == .utf8, let str = String(validatingUTF8: cString) {
+      self = str
+      return
+    }
     if let ns = NSString(cString: cString, encoding: enc.rawValue) {
       self = String._unconditionallyBridgeFromObjectiveC(ns)
     } else {
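The `enc == .utf8` guard matters because the same bytes decode to different text under different encodings, so the UTF-8 validator must not short-circuit other encodings even when their bytes happen to be valid UTF-8. A sketch of that check, assuming a hypothetical two-case `ToyEncoding` in place of Foundation's `String.Encoding` and a hand-written ISO-8859-1 decoder in place of `NSString`:

```swift
// Hypothetical stand-in for String.Encoding, reduced to two cases.
enum ToyEncoding { case utf8, latin1 }

func decodeCString(_ cString: UnsafePointer<CChar>, encoding: ToyEncoding) -> String? {
    // The fast path applies only when the caller asked for UTF-8.
    if encoding == .utf8, let str = String(validatingUTF8: cString) {
        return str
    }
    switch encoding {
    case .utf8:
        return nil  // ill-formed UTF-8 (the real initializer defers to NSString here)
    case .latin1:
        // ISO-8859-1 maps each byte directly to U+0000...U+00FF.
        var scalars = String.UnicodeScalarView()
        var p = cString
        while p.pointee != 0 {
            scalars.append(Unicode.Scalar(UInt8(truncatingIfNeeded: p.pointee)))
            p += 1
        }
        return String(scalars)
    }
}

// "é" in UTF-8 is 0xC3 0xA9; read as Latin-1 the same bytes are "Ã©".
let raw: [UInt8] = [0xC3, 0xA9, 0]
let cBytes = raw.map { CChar(truncatingIfNeeded: $0) }
let asUTF8 = cBytes.withUnsafeBufferPointer { decodeCString($0.baseAddress!, encoding: .utf8) }
let asLatin1 = cBytes.withUnsafeBufferPointer { decodeCString($0.baseAddress!, encoding: .latin1) }
```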
@@ -381,6 +395,14 @@ extension String {
   /// Returns a `String` initialized by converting given `data` into
   /// Unicode characters using a given `encoding`.
   public init?(data: __shared Data, encoding: Encoding) {
+    if encoding == .utf8,
+       let str = data.withUnsafeBytes({
+         String._tryFromUTF8($0.bindMemory(to: UInt8.self))
+       }) {
+      self = str
+      return
+    }
+
     guard let s = NSString(data: data, encoding: encoding.rawValue) else { return nil }
     self = String._unconditionallyBridgeFromObjectiveC(s)
   }

Mm, in previous discussions, we've agreed that the safety of binding

Is it possible to express the UTF-8 validation in terms of raw buffer pointers instead of typed ones? (Or overload to make this possible safely?)

I don't know if this is something to be concerned with, but if it is: the fix would be for Data to provide a way to access its contents as a

I'm not terribly concerned (as we've previously discussed, we can't prevent someone from doing something like reading arbitrary data bound to some non-trivial type

FYI I encountered that issue with #22028

you will invoke a fast-path which calls

Any String utilities that we want to accept a raw byte stream, like Data, including validateUTF8 and _uncheckedFromUTF8, should take either a URBP (UnsafeRawBufferPointer) or some ContiguouslyStored protocol. Either will be source compatible with the existing APIs, so they can probably be fixed later.
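The review comments ask whether UTF-8 validation could work on raw buffer pointers instead of binding the `Data` bytes to `UInt8`. In Swift, `UnsafeRawBufferPointer` is already a `Collection` whose elements are `UInt8`, so a validator can read it without `bindMemory(to:)`. A sketch, with `tryFromUTF8(raw:)` as a hypothetical helper built on public API and an `[UInt8]` array standing in for `Data` to keep the example Foundation-free:

```swift
// Hypothetical validator that accepts raw bytes directly. No bindMemory(to:)
// is needed because UnsafeRawBufferPointer's elements are already UInt8.
func tryFromUTF8(raw: UnsafeRawBufferPointer) -> String? {
    let decoded = String(decoding: raw, as: UTF8.self)
    // Repair would have inserted U+FFFD, so equality means the input was valid.
    return decoded.utf8.elementsEqual(raw) ? decoded : nil
}

let payload: [UInt8] = Array("naïve".utf8)
let broken: [UInt8] = [0x6E, 0xFF]  // "n" followed by an invalid byte

let ok = payload.withUnsafeBytes { tryFromUTF8(raw: $0) }
let bad = broken.withUnsafeBytes { tryFromUTF8(raw: $0) }
```

With Foundation, the same helper could be called as `data.withUnsafeBytes { tryFromUTF8(raw: $0) }`, since `Data.withUnsafeBytes` has an overload that passes an `UnsafeRawBufferPointer`.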

Why not remove the NSString-and-bridge part of this entirely? Then we could also make it `@inlinable`, since it would trivially forward to the stdlib function.

I'm trying to preserve the existing behavior when the bytes are invalidly encoded. We should probably deprecate this initializer.
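This exchange turns on what happens with invalid input. Forwarding straight to the standard library's `String(decoding:as:)` would change the contract, because that initializer never fails and instead repairs ill-formed bytes with U+FFFD. A sketch contrasting the two behaviors; both function names are hypothetical:

```swift
// Forwarding-only variant: never returns nil, because decoding repairs errors.
func forwardingDecode(_ bytes: [UInt8]) -> String? {
    String(decoding: bytes, as: UTF8.self)
}

// Validating variant: returns nil for ill-formed UTF-8, keeping the failable contract.
func validatingDecode(_ bytes: [UInt8]) -> String? {
    let decoded = String(decoding: bytes, as: UTF8.self)
    return decoded.utf8.elementsEqual(bytes) ? decoded : nil
}

let input: [UInt8] = [0x61, 0xFF, 0x62]  // "a", an invalid byte, "b"
let repaired = forwardingDecode(input)   // "a\u{FFFD}b"
let refused = validatingDecode(input)    // nil
```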