[SR-2225] String bridging and internal NUL (rebased) #604

xwu · 2016-08-29T01:06:59Z

(A rebased version of #549.)

What's going on in this PR

Existing bridging is buggy

The existing implementation of bridging from _NSCFString to String is broken in several ways:

The "fast path" blows away internal NUL characters because it uses facilities for C strings.
Merely removing the "fast path" causes tests to fail. The underlying reason for the failing tests is that the default "slow path" is also broken.
The "slow path" uses CFStringGetLength() and CFStringGetCharacters(). However, if the CFString in question has been initialized using bytes and contains multi-byte characters, neither of these CF functions behaves as documented:
- CFStringGetLength() returns the length in bytes, not the length in UTF16 code pairs as it's supposed to.
- CFStringGetCharacters() returns the bytes, not the UTF16 code pairs as it's supposed to.

As a consequence, if UTF8-encoded bytes that encode internal NUL characters are used to initialize an NSString, the "fast path" would clobber bridging in one way; but, if those bytes encode multi-byte characters, the "slow path" would clobber bridging in a different way.

Revised bridging

Since CF UTF16-related functions are unreliable, I allocate a buffer to store UTF8-encoded bytes. If some way can be found to determine (reliably, not relying on the result of a buggy implementation in CF) the length in bytes of a CFString that uses UTF8-encoded bytes for storage, then a fast path can be restored that doesn't require allocating a buffer.

Modified tests

After implementing this revised bridging, five tests began to fail. On further examination, these tests were found to be incorrect because they relied on multiple tandem NUL characters being lost on bridging. On a Mac, Darwin Foundation would fail these five tests as well. I have corrected the tests.

This PR also adds a test for constructing an NSString with data that contains an internal NUL character.

Miscellaneous changes

This PR rolls in the previously discussed (#496) change in init?(data:encoding:).
This PR also fixes two nits: a typo in a test name and inconsistent use of an unindented //.

xwu · 2016-09-20T00:55:47Z

@parkera Any post-Swift 3 thoughts on this issue?

parkera · 2016-09-22T22:19:07Z

I'll take a look at this soon. Thanks @xwu

phausler · 2016-10-13T17:09:08Z

Foundation/String.swift

+
+            // FIXME: If we had a reliable way to determine the length of `cf`
+            //        in bytes, then we wouldn't have to allocate a new buffer
+            //        if `CFStringGetCStringPtr(_:_:)` doesn't return nil.


The correct way to obtain the backing buffers is to expect nil and switch on either the 8 bit or 16 bit backing variants of CFString more like the previous code did, else you have a distinct chance that the CFStringGetBytes function will have to do a full walk of the unichars and convert each one.

This is roughly how a c implementation can be the fastest at converting to either a UTF8 or UTF16 buffer with an applier pseudo-lambda

void applyBuffer(CFStringRef str, void (*applier)(const void *buffer, CFStringEncoding)) { const char *cptr = CFStringGetCStringPtr(str, kCFStringEncodingUTF8); if (cptr != NULL) { applier(cptr, kCFStringEncodingUTF8); } else { const UniChar *uptr = CFStringGetCharactersPtr(str); if (uptr != NULL) { applier(uptr, kCFStringEncodingUTF16); } else { CFIndex length = CFStringGetLength(str); UniChar stack_characters[1024] = {0}; // adjust stack size according to taste UniChar *characters = &stack_characters[0]; if (length > sizeof(stack_characters) / sizeof(stack_characters[0])) { characters = malloc(sizeof(UniChar) * length); } CFStringGetCharacters(str, CFRangeMake(0, length), characters); applier(characters, kCFStringEncodingUTF16); if (characters != &stack_characters[0]) { free(characters); } } } }

CFString, NSString and struct String all have dual backing implementations that either are fast paths for 8 bit storage or 16 bit storage.

Neat. I will study this code and implement accordingly. Currently traveling so can't promise quick turnaround.

phausler · 2016-10-13T17:10:06Z

TestFoundation/TestNSStream.swift

@@ -33,7 +33,7 @@ class TestNSStream : XCTestCase {
    }

    func test_InputStreamWithData(){
-        let message: NSString = "Hello, playground"
+        let message: NSString = "Hello, playground\0\0\0"


it would be better to write it's own tests instead of hijacking other un-related unit tests (this applies to all of the unit test changes here)

Sorry, these are meant to be minimal changes to make these tests pass, not a way of slipping in additional tests; they relied on buggy behavior in their current form and must be changed once string bridging is fixed. The alternative is to alter some of the hardcoded buffer lengths and offsets, but it's potato po-tah-to IMO.

To be concrete, these tests expect NSStrings of 17 characters to roundtrip through a 20-character buffer. After string bridging is fixed, we must either make the string 20 characters long or the buffer 17 characters long. Not modifying the test is not an option.

hmm interesting, running this test on Darwin (in it's original form) fails.

This looks like the test is busted and should be fixed. Which I feel like the correct fix for the test is to have the buffer be 17 bytes and not add null bytes onto the NSString since that is definitely an uncommon edge case behavior.

Yes, sorry for not communicating that fact more clearly (the fact that these tests fail on Darwin). It's stuffed in the middle of my PR description. Happy to modify things the other way if you prefer; notably, however you fix it, the test passes with the currently busted string bridging, so I'm definitely not sneaking in any test cases under the radar :)

xwu · 2016-10-29T23:49:36Z

@phausler See if these latest changes adequately address your comments.

alblue · 2017-10-05T12:46:12Z

This pull request has gone stale, closing. Please rebase before re-opening.

xwu mentioned this pull request Aug 29, 2016

NSString bridging and internal NUL, take two [SR-2225] #549

Closed

parkera self-assigned this Sep 22, 2016

phausler reviewed Oct 13, 2016

View reviewed changes

xwu added 2 commits October 29, 2016 18:03

[SR-2225] String bridging and internal NUL (rebased)

ebf9a27

Address reviewer comments

b3de379

alblue closed this Oct 5, 2017

xwu mentioned this pull request Oct 8, 2017

String bridging and internal NUL [SR-2225], take four #1253

Closed

xwu mentioned this pull request Sep 2, 2018

SR-7455: Allow NUL in Strings to match Darwin #1677

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SR-2225] String bridging and internal NUL (rebased) #604

[SR-2225] String bridging and internal NUL (rebased) #604

Uh oh!

xwu commented Aug 29, 2016 •

edited

Loading

Uh oh!

xwu commented Sep 20, 2016

Uh oh!

parkera commented Sep 22, 2016

Uh oh!

phausler Oct 13, 2016

Uh oh!

xwu Oct 13, 2016

Uh oh!

phausler Oct 13, 2016

Uh oh!

xwu Oct 13, 2016

Uh oh!

xwu Oct 13, 2016

Uh oh!

phausler Oct 13, 2016

Uh oh!

xwu Oct 14, 2016

Uh oh!

xwu commented Oct 29, 2016

Uh oh!

alblue commented Oct 5, 2017

Uh oh!

Uh oh!

[SR-2225] String bridging and internal NUL (rebased) #604

[SR-2225] String bridging and internal NUL (rebased) #604

Uh oh!

Conversation

xwu commented Aug 29, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's going on in this PR

Existing bridging is buggy

Revised bridging

Modified tests

Miscellaneous changes

Uh oh!

xwu commented Sep 20, 2016

Uh oh!

parkera commented Sep 22, 2016

Uh oh!

phausler Oct 13, 2016

Choose a reason for hiding this comment

Uh oh!

xwu Oct 13, 2016

Choose a reason for hiding this comment

Uh oh!

phausler Oct 13, 2016

Choose a reason for hiding this comment

Uh oh!

xwu Oct 13, 2016

Choose a reason for hiding this comment

Uh oh!

xwu Oct 13, 2016

Choose a reason for hiding this comment

Uh oh!

phausler Oct 13, 2016

Choose a reason for hiding this comment

Uh oh!

xwu Oct 14, 2016

Choose a reason for hiding this comment

Uh oh!

xwu commented Oct 29, 2016

Uh oh!

alblue commented Oct 5, 2017

Uh oh!

Uh oh!

xwu commented Aug 29, 2016 •

edited

Loading