Restore more-correct behavior of getting the full contents of bridged NSStrings containing invalid UTF-8 #26172
5.1 version of #26152
(cherry picked from commit d091ecb)
Fixes rdar://problem/53119693
Explanation: NSStrings can be created with unpaired UTF-16 surrogates, which then fail to transcode to UTF-8. We previously avoided this (and correctly repaired the invalid contents with the Unicode replacement character, U+FFFD) by asking for UTF-16 code units one at a time and doing our own conversion. In Swift 5.1 we switched to a bulk-access NSString API that does the transcoding for us, which caused invalid contents like this to hit a preconditionFailure instead of being repaired.
The fix is to add the old one-at-a-time code back as a recovery path.
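
As a rough illustration of the "bulk fast path plus one-at-a-time recovery path" idea, here is a minimal sketch in pure Swift. It is not the standard library's bridging code: `makeString(fromUTF16:)` is a hypothetical helper, and the strict `transcode` call merely stands in for the bulk NSString API, which is assumed here to fail on unpaired surrogates.

```swift
// Hypothetical helper for illustration only; not the stdlib implementation.
func makeString(fromUTF16 units: [UInt16]) -> String {
    // Fast path: strict bulk transcode to UTF-8 that stops at the first
    // invalid sequence (stand-in for the bulk-access NSString API).
    var utf8: [UInt8] = []
    utf8.reserveCapacity(units.count * 3)
    let hadError = transcode(
        units.makeIterator(),
        from: UTF16.self,
        to: UTF8.self,
        stoppingOnError: true
    ) { utf8.append($0) }
    if !hadError {
        return String(decoding: utf8, as: UTF8.self)
    }

    // Recovery path: decode one scalar at a time and substitute U+FFFD
    // for every ill-formed sequence instead of trapping.
    var result = ""
    var decoder = UTF16()
    var iterator = units.makeIterator()
    decoding: while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            result.unicodeScalars.append(scalar)
        case .error:
            result.unicodeScalars.append("\u{FFFD}")
        case .emptyInput:
            break decoding
        }
    }
    return result
}

// Usage: 0xD800 is a high surrogate with no matching low surrogate.
let repaired = makeString(fromUTF16: [0x0041, 0xD800, 0x0042])
print(repaired.unicodeScalars.map { String($0.value, radix: 16) })
```

The point of keeping both paths is that well-formed strings, which are the overwhelmingly common case, still get the bulk path, while the slower per-code-unit loop only runs once the fast path has already reported invalid contents.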
Scope: Swift Standard Library
Issue: rdar://problem/53119693
Risk: Low. This is a change of behavior, but it restores the Swift 4.2 behavior; additionally, it only comes into effect in edge cases, since unpaired surrogates are invalid.
Testing: Regular tests + new automated tests specifically for this issue
Reviewer: @milseman