Skip to content

String bridging and internal NUL [SR-2225], take four #1253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 2 additions & 4 deletions Foundation/NSString.swift
Original file line number Diff line number Diff line change
Expand Up @@ -1188,10 +1188,8 @@ extension NSString {
if data.isEmpty {
self.init("")
} else {
guard let cf = data.withUnsafeBytes({ (bytes: UnsafePointer<UInt8>) -> CFString? in
return CFStringCreateWithBytes(kCFAllocatorDefault, bytes, data.count, CFStringConvertNSStringEncodingToEncoding(encoding), true)
}) else { return nil }

guard let cf = CFStringCreateFromExternalRepresentation(kCFAllocatorDefault, data._cfObject, CFStringConvertNSStringEncodingToEncoding(encoding)) else { return nil }

var str: String?
if String._conditionallyBridgeFromObjectiveC(cf._nsObject, result: &str) {
self.init(str!)
Expand Down
21 changes: 13 additions & 8 deletions Foundation/String.swift
Original file line number Diff line number Diff line change
Expand Up @@ -29,15 +29,20 @@ extension String : _ObjectTypeBridgeable {
result = source._storage
} else if type(of: source) == _NSCFString.self {
let cf = unsafeBitCast(source, to: CFString.self)
let str = CFStringGetCStringPtr(cf, CFStringEncoding(kCFStringEncodingUTF8))
if str != nil {
result = String(cString: str!)
let length = CFStringGetLength(cf) // This value is not always the length in characters, in spite of what the documentation says
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the disconnect here comes from the Foundation definition of characters (UTF16 code points) vs the one Swift uses (grapheme clusters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rebase of something I worked on over a year ago, so memories are no longer fresh, but here's what I recall from that time:

CFStringGetLength is simply buggy on its own terms. That is, it's not a UTF16 vs. grapheme clusters issue. It's that CFStringGetLength returns length in bytes if the underlying storage is UTF8-based, which is plainly contrary to its documented behavior. This buggy behavior is convenient in that we could actually rely on it for a bridging fast path here that avoids allocating yet another buffer, but then we would be relying on the buggy behavior of CFStringGetLength never being fixed. That seems unwise to me, and I have avoided doing so here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not the behavior on Darwin, can you provide an example of how this is failing or behaving differently on Linux? if so it is definitely a more serious bug than you are attempting to address here and we should fix that first before hand imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to take me a bit of time to reconstruct the test cases that led to that conclusion when I last looked into this in 2016. However, it was very much the same behavior on Darwin and on Linux at the time. The gist of how I came to that conclusion was as follows.

The existing bridging code is:

// ...
        else if type(of: source) == _NSCFString.self {
            let cf = unsafeBitCast(source, to: CFString.self)
            let str = CFStringGetCStringPtr(cf, CFStringEncoding(kCFStringEncodingUTF8)) // (A)
            if str != nil { // (B)
                result = String(cString: str!)
            } else { // (C)
                let length = CFStringGetLength(cf)
                let buffer = UnsafeMutablePointer<UniChar>.allocate(capacity: length)
                CFStringGetCharacters(cf, CFRangeMake(0, length), buffer)
                
                let str = String._fromCodeUnitSequence(UTF16.self, input: UnsafeBufferPointer(start: buffer, count: length))
                buffer.deinitialize(count: length)
                buffer.deallocate(capacity: length)
                result = str
            }
        }

(A) and (B) clearly blow away any internal NUL characters. By contrast, (C) does not. Removing (A) and (B) should result in inefficient but correct bridging.

However, removing (A) and (B), leaving (C) alone, actually caused a bunch of Foundation tests to fail, even on macOS. When I looked into the cause of the failures, I found that in certain circumstances when CFStringGetCStringPtr doesn't return nil, CFStringGetLength and CFStringGetCharacters return incorrect results that cause result to be a clobbered version of source. Those circumstances only occurred when source had multibyte characters.


// FIXME: If we had a reliable way to determine the length of `cf`
// in bytes, then we wouldn't have to allocate a new buffer
// if `CFStringGetCStringPtr(_:_:)` doesn't return nil.
if let buffer = CFStringGetCharactersPtr(cf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically:

CFStringRef str;
const UniChar *chars;
 
str = CFStringCreateWithCString(NULL, "Hello World", kCFStringEncodingMacRoman);
chars = CFStringGetCharactersPtr(str);
if (chars == NULL) {
    CFIndex length = CFStringGetLength(str);
    UniChar *buffer = malloc(length * sizeof(UniChar));
    CFStringGetCharacters(str, CFRangeMake(0, length), buffer);
    // Process the characters...
    free(buffer);
}

Copy link
Contributor Author

@xwu xwu Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I write above in the PR description, this recommended pattern cannot be used because CFStringGetCharacters is buggy and returns bytes instead of UTF16 characters if the underlying storage is UTF8-based; attempting to use the recommended pattern clobbers multibyte Unicode strings. We must here check if CFStringGetCharactersPtr returns NULL, and if so, retrieve UTF8-encoded bytes instead.

result = String._fromCodeUnitSequence(UTF16.self, input: UnsafeBufferPointer(start: buffer, count: length))
} else {
let length = CFStringGetLength(cf)
let buffer = UnsafeMutablePointer<UniChar>.allocate(capacity: length)
CFStringGetCharacters(cf, CFRangeMake(0, length), buffer)

let str = String._fromCodeUnitSequence(UTF16.self, input: UnsafeBufferPointer(start: buffer, count: length))
// Retrieving Unicode characters is unreliable; instead retrieve UTF8-encoded bytes
let max = CFStringGetMaximumSizeForEncoding(length, CFStringEncoding(kCFStringEncodingUTF8))
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: max)
var count: CFIndex = -1
CFStringGetBytes(cf, CFRangeMake(0, length), CFStringEncoding(kCFStringEncodingUTF8), 0, false, buffer, max, &count)
let str = String._fromCodeUnitSequence(UTF8.self, input: UnsafeBufferPointer(start: buffer, count: count))
buffer.deinitialize(count: length)
buffer.deallocate(capacity: length)
result = str
Expand Down
12 changes: 6 additions & 6 deletions TestFoundation/TestJSONSerialization.swift
Original file line number Diff line number Diff line change
Expand Up @@ -1373,8 +1373,8 @@ extension TestJSONSerialization {
outputStream.open()
let result = try JSONSerialization.writeJSONObject(dict, toStream: outputStream, options: [])
outputStream.close()
if(result > -1) {
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}")
if result > -1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave these diffs out of this PR for clarity.

Copy link
Contributor Author

@xwu xwu Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parentheses here are trivial, but the change in the subsequent line is required for the test to pass as the test itself is buggy. We discussed this in #604 a year ago.

XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}\0\0\0\0\0\0\0")
}
} catch {
XCTFail("Error thrown: \(error)")
Expand All @@ -1390,15 +1390,15 @@ extension TestJSONSerialization {
outputStream?.open()
let result = try JSONSerialization.writeJSONObject(dict, toStream: outputStream!, options: [])
outputStream?.close()
if(result > -1) {
if result > -1 {
let fileStream: InputStream = InputStream(fileAtPath: filePath!)!
var buffer = [UInt8](repeating: 0, count: 20)
fileStream.open()
if fileStream.hasBytesAvailable {
let resultRead: Int = fileStream.read(&buffer, maxLength: buffer.count)
fileStream.close()
if(resultRead > -1){
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}")
XCTAssertEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}\0\0\0\0\0\0\0")
}
}
removeTestFile(filePath!)
Expand All @@ -1419,8 +1419,8 @@ extension TestJSONSerialization {
do {
let result = try JSONSerialization.writeJSONObject(dict, toStream: outputStream, options: [])
outputStream.close()
if(result > -1) {
XCTAssertNotEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}")
if result > -1 {
XCTAssertNotEqual(NSString(bytes: buffer, length: buffer.count, encoding: String.Encoding.utf8.rawValue), "{\"a\":{\"b\":1}}\0\0\0\0\0\0\0")
}
} catch {
XCTFail("Error occurred while writing to stream")
Expand Down
14 changes: 11 additions & 3 deletions TestFoundation/TestNSString.swift
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ class TestNSString : XCTestCase {
("test_FromNullTerminatedCStringInUTF8", test_FromNullTerminatedCStringInUTF8 ),
// Swift3 updates broke the expectations of this test. disabling for now
// ("test_FromMalformedNullTerminatedCStringInUTF8", test_FromMalformedNullTerminatedCStringInUTF8 ),
("test_FromDataWithInternalNullCharacter", test_FromDataWithInternalNullCharacter ),
("test_uppercaseString", test_uppercaseString ),
("test_lowercaseString", test_lowercaseString ),
("test_capitalizedString", test_capitalizedString ),
Expand All @@ -75,7 +76,7 @@ class TestNSString : XCTestCase {
("test_FromContentOfFile",test_FromContentOfFile),
("test_swiftStringUTF16", test_swiftStringUTF16),
// This test takes forever on build servers; it has been seen up to 1852.084 seconds
// ("test_completePathIntoString", test_completePathIntoString),
// ("test_completePathIntoString", test_completePathIntoString),
("test_stringByTrimmingCharactersInSet", test_stringByTrimmingCharactersInSet),
("test_initializeWithFormat", test_initializeWithFormat),
("test_initializeWithFormat2", test_initializeWithFormat2),
Expand Down Expand Up @@ -285,6 +286,14 @@ class TestNSString : XCTestCase {
XCTAssertNil(string)
}

func test_FromDataWithInternalNullCharacter() {
let bytes = mockASCIIStringBytes + [0x00] + mockASCIIStringBytes
let length = mockASCIIStringBytes.count * 2 + 1
let data = Data(bytes: bytes, count: length)
let string = NSString(data: data, encoding: String.Encoding.ascii.rawValue)
XCTAssertTrue(string?.isEqual(to: mockASCIIString + "\0" + mockASCIIString) ?? false)
}

func test_FromContentsOfURL() {
guard let testFileURL = testBundle().url(forResource: "NSStringTestData", withExtension: "txt") else {
XCTFail("URL for NSStringTestData.txt is nil")
Expand Down Expand Up @@ -1215,8 +1224,7 @@ let comparisonTests = [
ComparisonTest("\r\n", "t"),
ComparisonTest("\r\n", "\n",
reason: "blocked on rdar://problem/19036555"),
ComparisonTest("\u{0}", "\u{0}\u{0}",
reason: "rdar://problem/19034601"),
ComparisonTest("\u{0}", "\u{0}\u{0}"),

// Whitespace
// U+000A LINE FEED (LF)
Expand Down
8 changes: 4 additions & 4 deletions TestFoundation/TestStream.swift
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ class TestStream : XCTestCase {
XCTAssertEqual(Stream.Status.notOpen, dataStream.streamStatus)
dataStream.open()
XCTAssertEqual(Stream.Status.open, dataStream.streamStatus)
var buffer = [UInt8](repeating: 0, count: 20)
var buffer = [UInt8](repeating: 0, count: 17)
if dataStream.hasBytesAvailable {
let result: Int = dataStream.read(&buffer, maxLength: buffer.count)
dataStream.close()
Expand All @@ -62,7 +62,7 @@ class TestStream : XCTestCase {
XCTAssertEqual(Stream.Status.notOpen, urlStream.streamStatus)
urlStream.open()
XCTAssertEqual(Stream.Status.open, urlStream.streamStatus)
var buffer = [UInt8](repeating: 0, count: 20)
var buffer = [UInt8](repeating: 0, count: 17)
if urlStream.hasBytesAvailable {
let result :Int = urlStream.read(&buffer, maxLength: buffer.count)
urlStream.close()
Expand All @@ -89,7 +89,7 @@ class TestStream : XCTestCase {
XCTAssertEqual(Stream.Status.notOpen, fileStream.streamStatus)
fileStream.open()
XCTAssertEqual(Stream.Status.open, fileStream.streamStatus)
var buffer = [UInt8](repeating: 0, count: 20)
var buffer = [UInt8](repeating: 0, count: 17)
if fileStream.hasBytesAvailable {
let result: Int = fileStream.read(&buffer, maxLength: buffer.count)
fileStream.close()
Expand All @@ -110,7 +110,7 @@ class TestStream : XCTestCase {
let message: NSString = "Hello, playground"
let messageData: Data = message.data(using: String.Encoding.utf8.rawValue)!
let stream: InputStream = InputStream(data: messageData)
var buffer = [UInt8](repeating: 0, count: 20)
var buffer = [UInt8](repeating: 0, count: 17)
stream.open()
XCTAssertTrue(stream.hasBytesAvailable)
_ = stream.read(&buffer, maxLength: buffer.count)
Expand Down