Skip to content

SR-7455: Allow NUL in Strings to match Darwin #1677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 10, 2018

Conversation

spevans
Copy link
Contributor

@spevans spevans commented Aug 29, 2018

  • String._conditionallyBridgeFromObjectiveC:
    Use String(decoding:as: UTF8.self) instead of String(cString:) to
    allow NULs in the decoded string.

  • Update tests which use buffers with trailing zeros to create strings,
    making the tests work correctly against Darwin's native Foundation.

@spevans
Copy link
Contributor Author

spevans commented Aug 29, 2018

@swift-ci test

@spevans spevans requested a review from parkera August 29, 2018 15:44
result = str.withMemoryRebound(to: UInt8.self, capacity: length) { ptr in
let buffer = UnsafeBufferPointer(start: ptr, count: length)
return String(decoding: buffer, as: UTF8.self)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. If you're open to a small addition to your fix, we may want to also consider trying to call CFStringGetCharactersPtr if CFStringGetCStringPtr fails. That one may return a pointer to a UTF16 encoded buffer (or NULL), which we can also use without re-allocating in the slow path below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/* These functions attempt to return in O(1) time the desired format for the string.
Note that although this means a pointer to the internal structure is being returned,
this can't always be counted on. Please see note at the top of the file for more
details.
*/

CF_EXPORT
const UniChar *CFStringGetCharactersPtr(CFStringRef theString);  /* May return NULL at any time; be prepared for NULL, if not now, in some other time or place. See discussion at top of this file. */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pakera I will look into making that change. BTW, is it correct to use CFStringGetLength to find the number of bytes in the UTF8 string or is there something better to use?

@spevans
Copy link
Contributor Author

spevans commented Aug 30, 2018

@swift-ci test

1 similar comment
@spevans
Copy link
Contributor Author

spevans commented Aug 30, 2018

@swift-ci test

@spevans
Copy link
Contributor Author

spevans commented Aug 30, 2018

@parkera does the 2nd commit look ok?

Copy link
Contributor

@millenomi millenomi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFStringGetLength returns the number of characters, which is not the number of UTF-8 bytes of their encoding. I’m pretty sure that’s not okay.

@millenomi
Copy link
Contributor

(Where ‘characters’ means ‘UTF-16 code points’.)

@spevans
Copy link
Contributor Author

spevans commented Aug 30, 2018

@millenomi Are CFStrings always UTF-16? Is there a way to get the string length in bytes?

@xwu
Copy link
Contributor

xwu commented Sep 2, 2018

I've made four attempts at this and no longer have the time to do so, so I'm glad you're taking a look. There are the four attempts, which hopefully will give some insight:

#1253
#604
#549
#496

It's been a while since I've thought about this, but I recall being greatly dismayed by the following:

If the CFString in question contains multi-byte characters and was initialized using bytes, then neither of these CF functions behaves as documented: CFStringGetLength() seems to return the length in bytes, not the length in UTF16 code points. CFStringGetCharacters() seems to return the bytes, not the UTF16 code points.

@spevans
Copy link
Contributor Author

spevans commented Sep 29, 2018

@swift-ci test

1 similar comment
@spevans
Copy link
Contributor Author

spevans commented Sep 29, 2018

@swift-ci test

@spevans
Copy link
Contributor Author

spevans commented Sep 29, 2018

@millenomi I think I have addressed the UTF-8 multibyte issue when using CFStringGetCStringPtr() by requesting kCFStringEncodingASCII instead of kCFStringEncodingUTF8 so the code point count from CFStringGetLength() should be the number of ASCII bytes.

Some testing actually revealed that constructing a multibyte UTF-8 CFString made CFStringGetCStringPtr(cfstr, kCFStringEncodingUTF8) return NULL anyway.

@spevans
Copy link
Contributor Author

spevans commented Oct 2, 2018

@millenomi Can you check if my last comment about only requesting an ASCII string when using CFStringGetCStringPtr() and CFStringGetLength() makes sense?

@millenomi
Copy link
Contributor

Yeah, in that encoding byte length is guaranteed to be equal to character length.

@spevans
Copy link
Contributor Author

spevans commented Oct 3, 2018

@swift-ci test and merge

1 similar comment
@spevans
Copy link
Contributor Author

spevans commented Oct 4, 2018

@swift-ci test and merge

@millenomi
Copy link
Contributor

@swift-ci please test and merge

1 similar comment
@spevans
Copy link
Contributor Author

spevans commented Oct 10, 2018

@swift-ci please test and merge

- String._conditionallyBridgeFromObjectiveC:
  Use String(decoding:as: UTF8.self) instead of String(cString:) to
  allow NULs in the decoded string.

- Update tests which use buffers with trailing zeros to create strings,
  making the tests work correctly against Darwin's native Foundation.
- If CFStringGetLength() returns 0 just create an empty string ("").

- Try CFStringGetCharactersPtr() before CFStringGetCharacters()
  to avoid allocating a buffer.
- Add tests for multibyte UTF-8/ UTF-16.
@millenomi
Copy link
Contributor

@spevans I missed the fact this has conflicts; even with them, tho, the build failure is worrying. Let's try again tomorrow.

@spevans
Copy link
Contributor Author

spevans commented Oct 10, 2018

@swift-ci test

@spevans
Copy link
Contributor Author

spevans commented Oct 10, 2018

The Mojave merge had some whitespace that caused a conflict, Ive just rebased.

But yes, that build error is odd and not one Ive seen before.

@spevans
Copy link
Contributor Author

spevans commented Oct 10, 2018

@swift-ci please test and merge

1 similar comment
@spevans
Copy link
Contributor Author

spevans commented Oct 10, 2018

@swift-ci please test and merge

@swift-ci swift-ci merged commit 74eb084 into swiftlang:master Oct 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants