[stdlib] Switch to using unchecked buffer subscript in low-level Unicode helpers #59899

Merged
6 changes: 3 additions & 3 deletions stdlib/public/core/UnicodeHelpers.swift
@@ -64,7 +64,7 @@ internal func _decodeUTF8(
internal func _decodeScalar(
_ utf16: UnsafeBufferPointer<UInt16>, startingAt i: Int
) -> (Unicode.Scalar, scalarLength: Int) {
- let high = utf16[i]
+ let high = utf16[_unchecked: i]
if i + 1 >= utf16.count {
Contributor

If this was:

guard i < utf16.endIndex else { ... }

Then the index for low could be calculated without trapping on overflow (i &+ 1). I think, as it is written, the compiler must always evaluate i + 1 and check for overflow.

Member Author

I don't think LLVM has much trouble figuring out that i < utf16.count here. (So, for example, switching to the unchecked subscript probably changes nothing for the i+1 subscript invocation below.)

One way to verify this is to look at the generated code to see if we actually have an overflow branch for the addition. If we do, then switching to wrapping arithmetic could be worth investigating in a followup PR!
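
As a rough illustration of the suggestion discussed above, here is a minimal sketch of a decoder that checks the index once and then uses wrapping addition. It is not the standard library's code: `decodeScalarSketch` is a hypothetical name, it uses the public checked subscript because `_unchecked:` is internal to the stdlib, and the U+FFFD fallback for unpaired surrogates is an assumption (the real helper requires valid UTF-16 instead).

```swift
// Hypothetical stand-in for the stdlib-internal `_decodeScalar`; names and
// structure are illustrative, not the actual implementation.
func decodeScalarSketch(
  _ utf16: UnsafeBufferPointer<UInt16>, startingAt i: Int
) -> (Unicode.Scalar, scalarLength: Int) {
  // Establish the bound once. After this, i < utf16.count <= Int.max,
  // so `i &+ 1` cannot wrap and needs no overflow check.
  guard i >= 0, i < utf16.endIndex else {
    preconditionFailure("index out of bounds")
  }
  let high = utf16[i]
  let next = i &+ 1

  // Single code unit: not a lead surrogate, or nothing follows it.
  guard UTF16.isLeadSurrogate(high), next < utf16.endIndex else {
    // Assumption: substitute U+FFFD for an unpaired surrogate.
    return (Unicode.Scalar(UInt32(high)) ?? "\u{FFFD}", 1)
  }

  let low = utf16[next]
  guard UTF16.isTrailSurrogate(low) else {
    // Assumption: same U+FFFD fallback for a lead without a trail.
    return ("\u{FFFD}", 1)
  }

  // Combine the two 10-bit payloads into a supplementary-plane scalar.
  let value = 0x10000 + ((UInt32(high) & 0x3FF) << 10) + (UInt32(low) & 0x3FF)
  return (Unicode.Scalar(value)!, 2)
}
```

Whether the early `guard` actually removes an overflow branch would still need to be confirmed in the optimized output, for instance by compiling with `swiftc -O -emit-assembly` and inspecting the addition.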

Contributor

@karwa, Jul 6, 2022

Could it be that this function is not even used? I can't seem to find its caller.

Member Author

Quite possible! These functions have a precondition that the buffer they're given contains valid UTF-8/UTF-16 data. This precludes their use with NSString, and we no longer have a native string form that uses UTF-16.

_internalInvariant(!UTF16.isLeadSurrogate(high))
_internalInvariant(!UTF16.isTrailSurrogate(high))
@@ -76,7 +76,7 @@ internal func _decodeScalar(
return (Unicode.Scalar(_unchecked: UInt32(high)), 1)
}

- let low = utf16[i+1]
+ let low = utf16[_unchecked: i+1]
_internalInvariant(UTF16.isLeadSurrogate(high))
_internalInvariant(UTF16.isTrailSurrogate(low))
return (UTF16._decodeSurrogates(high, low), 2)
@@ -207,7 +207,7 @@ extension _StringGuts {
@inlinable
internal func fastUTF8ScalarLength(startingAt i: Int) -> Int {
_internalInvariant(isFastUTF8)
- let len = _utf8ScalarLength(self.withFastUTF8 { $0[i] })
+ let len = _utf8ScalarLength(self.withFastUTF8 { $0[_unchecked: i] })
_internalInvariant((1...4) ~= len)
return len
}
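
For readers unfamiliar with what the last hunk computes, the following is a minimal sketch of deriving a UTF-8 scalar's length from its lead byte. `utf8ScalarLengthSketch` is a hypothetical name, and the body is an illustration of the general technique rather than a copy of the stdlib's `_utf8ScalarLength`. It assumes the byte comes from valid UTF-8 and is not a continuation byte.

```swift
// Hypothetical helper: length in bytes of the UTF-8 scalar that starts
// with `leadByte`. Assumes valid UTF-8 input.
func utf8ScalarLengthSketch(_ leadByte: UInt8) -> Int {
  // Continuation bytes (10xxxxxx) never start a scalar.
  precondition((leadByte & 0xC0) != 0x80, "not a lead byte")

  // ASCII (0xxxxxxx) is always a single byte.
  if (leadByte & 0x80) == 0 { return 1 }

  // Multi-byte leads are 110xxxxx, 1110xxxx, or 11110xxx. The count of
  // leading one bits equals the length, so invert and count leading zeros.
  return (~leadByte).leadingZeroBitCount
}
```

The result should fall in `1...4` for any lead byte of valid UTF-8, which matches the invariant asserted after the call in the diff above.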