Skip to content

[stdlib] Make String.Index(_:within:) initializers more permissive #42442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 20, 2022

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Apr 19, 2022

In Swift 5.6 and below, (broken) code that acquired indices from a UTF-16-encoded string bridged from Cocoa and kept using them after a makeContiguousUTF8 call (or other mutation) may have appeared to be working correctly as long as the string consisted of ASCII characters up to the given index.

import Foundation
var s = ("abcdef 😣" as NSString) as String
let i = s.index(s.startIndex, offsetBy: 2) // "c"
s.makeContiguousUTF8() // documented to invalidate all indices in `s`.
if let j = String.Index(i, within: s) { // illegal; `i` is no longer a valid index in any view of `s`.
  print(s[j])
} else {
  print("Invalid index")
}

Since #41417, the String(_:within:) initializers recognize miscoded indices and reject them by returning nil, so this code prints Invalid index. This is technically correct, but it unfortunately may be a binary compatibility issue, as in Swift 5.6 and below, this initializer used to return a non-nil value, and because the string starts with ASCII characters, the returned index happened to address the expected character. ('c' in this case.)

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string, transcoding their offsets as needed. (Attempting to use an UTF-8 index on a UTF-16 string is still rejected — we do not implicitly convert strings in that direction.) This restores Swift 5.6's behavior, as well as returning the expected value even in case the string's prefix includes non-ASCII scalars.

While we’re here, also fix a long-standing issue where the UTF16View overload of String.Index.init(_:within:) used to return nil for valid indices that happened to point to a trailing surrogate in a UTF-8 encoded string.

var s = "\u{10059}" // LINEAR B SYMBOL B079 (looks like a cute little bug: 𐁙)
let i = s.utf16.index(s.utf16.startIndex, offsetBy: 1) // trailing surrogate
print(String.Index(i, within: s.utf16) == nil) // true?!

rdar://89369680&91935537

In Swift 5.6 and below, (broken) code that acquired indices from a
UTF-16-encoded string bridged from Cocoa and kept using them after a
`makeContiguousUTF8` call (or other mutation) may have appeared to be
working correctly as long as the string was ASCII.

Since swiftlang#41417, the
`String(_:within:)` initializers recognize miscoded indices and reject
them by returning nil. This is technically correct, but it
unfortunately may be a binary compatibility issue, as these used to
return non-nil in previous versions.

Mitigate this issue by accepting UTF-16 indices on a UTF-8 string,
transcoding their offset as needed. (Attempting to use an UTF-8 index
on a UTF-16 string is still rejected — we do not implicitly convert
strings in that direction.)

rdar://89369680
…tring.Index(_:within:)`

Fix a long-standing issue where the UTF16View overload of
`String.Index.init(_:within:)` used to return nil for valid indices
that happened to point to a trailing surrogate in a UTF-8-encoded
string.

rdar://91935537
There is little point to having `isUTF16` properties when they simply
return `!isUTF8`; remove them.

Rename `String.Index._copyEncoding(from:)` to
`_copyingEncoding(from:)`.
@lorentey lorentey requested review from milseman and Azoy April 19, 2022 04:08
@lorentey
Copy link
Member Author

@swift-ci test

@lorentey
Copy link
Member Author

@swift-ci test

@lorentey
Copy link
Member Author

@swift-ci test Windows platform

@lorentey
Copy link
Member Author

Windows seems to be down with an unrelated failure, but it did seem to pass the new tests:

Failed Tests (1):
  Swift(windows-x86_64) :: SIL/clang-function-type-windows.swift

@lorentey lorentey merged commit 2574d78 into swiftlang:main Apr 20, 2022
@lorentey lorentey deleted the better-index-conversions branch April 20, 2022 03:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants