Skip to content

NSString bridging and internal NUL, take two [SR-2225] #549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

NSString bridging and internal NUL, take two [SR-2225] #549

wants to merge 2 commits into from

Conversation

xwu
Copy link
Contributor

@xwu xwu commented Aug 16, 2016

What's going on in this PR

Existing bridging is buggy

The existing implementation of bridging from _NSCFString to String was broken in several ways:

  1. The "fast path" blew away internal NUL characters because it used facilities for C strings.
  2. Merely removing the "fast path" strangely caused tests to fail. Debugging revealed that the reason for failing tests was that the default "slow path" was also broken.
  3. The "slow path" used CFStringGetLength() and CFStringGetCharacters(). However, if the CFString in question contains multi-byte characters and was initialized using bytes, then neither of these CF functions behaves as documented:
    • CFStringGetLength() seems to return the length in bytes, not the length in UTF16 code pairs.
    • CFStringGetCharacters() seems to return the bytes, not the UTF16 code pairs.

As a consequence, if UTF8-encoded bytes that encode internal NUL characters and multi-byte characters were used to initialize an NSString, the "fast path" would clobber bridging in one way, and the "slow path" would clobber bridging in a different way.

Revised bridging

Since CF UTF16-related functions are erratic, I use a buffer to store UTF8-encoded bytes. If some way can be found to determine (reliably, not relying on the result of a buggy implementation in CF) the length in bytes of a CFString that uses UTF8-encoded bytes for storage, then a fast path can be restored that doesn't require allocating a buffer.

Modified tests

After implementing this revised bridging, five tests began to fail. On further examination, these tests were found to be incorrect because they relied on multiple tandem NUL characters being lost on bridging. On a Mac, Darwin Foundation fails these five tests as well. I have corrected the tests.

This PR additionally adds a test for constructing an NSString with data that contains an internal NUL character.

Miscellaneous changes

This PR rolls in the previously discussed (#496) change in init?(data:encoding:).

This PR also fixes three nits: a typo in a test name; an alphabetization issue in the Xcode project for a test file; and inconsistent use of an unindented //.

@xwu
Copy link
Contributor Author

xwu commented Aug 17, 2016

@parkera I know you're probably super-busy at the moment. I've tried to document my thinking copiously to ease the burden for the reader. Hopefully, not too late for Swift 3.

@parkera
Copy link
Contributor

parkera commented Aug 19, 2016

@xwu sorry, we've been focusing on getting the other tasks done for swift 3. I hope to get back to this soon. It may be post swift-3.0 at this point.

@xwu
Copy link
Contributor Author

xwu commented Aug 19, 2016

@parkera I understand. Look forward to your comments when you get around to it.

@xwu
Copy link
Contributor Author

xwu commented Aug 29, 2016

Closed in favor of #604 (enough code has moved around that manually applying these changes is easier than resolving git rebase conflicts).

@xwu xwu closed this Aug 29, 2016
@xwu xwu deleted the nsstring-init-with-data branch August 29, 2016 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants