-
Notifications
You must be signed in to change notification settings - Fork 1.2k
String bridging and internal NUL [SR-2225], take four #1253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,15 +29,20 @@ extension String : _ObjectTypeBridgeable { | |
result = source._storage | ||
} else if type(of: source) == _NSCFString.self { | ||
let cf = unsafeBitCast(source, to: CFString.self) | ||
let str = CFStringGetCStringPtr(cf, CFStringEncoding(kCFStringEncodingUTF8)) | ||
if str != nil { | ||
result = String(cString: str!) | ||
let length = CFStringGetLength(cf) // This value is not always the length in characters, in spite of what the documentation says | ||
|
||
// FIXME: If we had a reliable way to determine the length of `cf` | ||
// in bytes, then we wouldn't have to allocate a new buffer | ||
// if `CFStringGetCStringPtr(_:_:)` doesn't return nil. | ||
if let buffer = CFStringGetCharactersPtr(cf) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The recommended pattern is documented here: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Specifically:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I write above in the PR description, this recommended pattern cannot be used because |
||
result = String._fromCodeUnitSequence(UTF16.self, input: UnsafeBufferPointer(start: buffer, count: length)) | ||
} else { | ||
let length = CFStringGetLength(cf) | ||
let buffer = UnsafeMutablePointer<UniChar>.allocate(capacity: length) | ||
CFStringGetCharacters(cf, CFRangeMake(0, length), buffer) | ||
|
||
let str = String._fromCodeUnitSequence(UTF16.self, input: UnsafeBufferPointer(start: buffer, count: length)) | ||
// Retrieving Unicode characters is unreliable; instead retrieve UTF8-encoded bytes | ||
let max = CFStringGetMaximumSizeForEncoding(length, CFStringEncoding(kCFStringEncodingUTF8)) | ||
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: max) | ||
var count: CFIndex = -1 | ||
CFStringGetBytes(cf, CFRangeMake(0, length), CFStringEncoding(kCFStringEncodingUTF8), 0, false, buffer, max, &count) | ||
let str = String._fromCodeUnitSequence(UTF8.self, input: UnsafeBufferPointer(start: buffer, count: count)) | ||
buffer.deinitialize(count: length) | ||
buffer.deallocate(capacity: length) | ||
result = str | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1373,8 +1373,8 @@ extension TestJSONSerialization { | |
outputStream.open() | ||
let result = try JSONSerialization.writeJSONObject(dict, toStream: outputStream, options: []) | ||
outputStream.close() | ||
if(result > -1) { | ||
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}") | ||
if result > -1 { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's leave these diffs out of this PR for clarity.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The parentheses here are trivial, but the change in the subsequent line is required for the test to pass as the test itself is buggy. We discussed this in #604 a year ago. |
||
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}\0\0\0\0\0\0\0") | ||
} | ||
} catch { | ||
XCTFail("Error thrown: \(error)") | ||
|
@@ -1390,15 +1390,15 @@ extension TestJSONSerialization { | |
outputStream?.open() | ||
let result = try JSONSerialization.writeJSONObject(dict, toStream: outputStream!, options: []) | ||
outputStream?.close() | ||
if(result > -1) { | ||
if result > -1 { | ||
let fileStream: InputStream = InputStream(fileAtPath: filePath!)! | ||
var buffer = [UInt8](repeating: 0, count: 20) | ||
fileStream.open() | ||
if fileStream.hasBytesAvailable { | ||
let resultRead: Int = fileStream.read(&buffer, maxLength: buffer.count) | ||
fileStream.close() | ||
if(resultRead > -1){ | ||
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}") | ||
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}\0\0\0\0\0\0\0") | ||
} | ||
} | ||
removeTestFile(filePath!) | ||
|
@@ -1419,8 +1419,8 @@ extension TestJSONSerialization { | |
do { | ||
let result = try JSONSerialization.writeJSONObject(dict, toStream: outputStream, options: []) | ||
outputStream.close() | ||
if(result > -1) { | ||
XCTAssertNotEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}") | ||
if result > -1 { | ||
XCTAssertNotEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}\0\0\0\0\0\0\0") | ||
} | ||
} catch { | ||
XCTFail("Error occurred while writing to stream") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the disconnect here comes from the Foundation definition of characters (UTF16 code points) vs the one Swift uses (grapheme clusters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rebase of something I worked on over a year ago, so memories are no longer fresh, but here's what I recall from that time:
CFStringGetLength
is simply buggy on its own terms. That is, it's not a UTF16 vs. grapheme clusters issue. It's thatCFStringGetLength
returns length in bytes if the underlying storage is UTF8-based, which is plainly contrary to its documented behavior. This buggy behavior is convenient in that we could actually rely on it for a bridging fast path here that avoids allocating yet another buffer, but then we would be relying on the buggy behavior ofCFStringGetLength
never being fixed. That seems unwise to me, and I have avoided doing so here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not the behavior on Darwin, can you provide an example of how this is failing or behaving differently on Linux? if so it is definitely a more serious bug than you are attempting to address here and we should fix that first before hand imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's going to take me a bit of time to reconstruct the test cases that led to that conclusion when I last looked into this in 2016. However, it was very much the same behavior on Darwin and on Linux at the time. The gist of how I came to that conclusion was as follows.
The existing bridging code is:
(A) and (B) clearly blow away any internal NUL characters. By contrast, (C) does not. Removing (A) and (B) should result in inefficient but correct bridging.
However, removing (A) and (B), leaving (C) alone, actually caused a bunch of Foundation tests to fail, even on macOS. When I looked into the cause of the failures, I found that in certain circumstances when
CFStringGetCStringPtr
doesn't returnnil
,CFStringGetLength
andCFStringGetCharacters
return incorrect results that causeresult
to be a clobbered version ofsource
. Those circumstances only occurred whensource
had multibyte characters.