-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Switch to using unchecked buffer subscript in low-level Unicode helpers #59899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Switch to using unchecked buffer subscript in low-level Unicode helpers #59899
Conversation
…ode helpers We expect indices to be already validated by the time these are called, and UBP’s debug-mode checks are compiled into opaque parts of the stdlib. (The big exception is that Swift <5.7 used to not do proper index validation and allowed out-of-bounds accesses. We emulate this behavior for code that was compiled with such a release, and it turns out that these UnsafeBufferPointer checks are interfering with the intended (undefined) behavior.) rdar://93707276
@swift-ci test |
@swift-ci benchmark |
Yep, bounds checking things twice had some performance impact. |
@@ -64,7 +64,7 @@ internal func _decodeUTF8( | |||
internal func _decodeScalar( | |||
_ utf16: UnsafeBufferPointer<UInt16>, startingAt i: Int | |||
) -> (Unicode.Scalar, scalarLength: Int) { | |||
let high = utf16[i] | |||
let high = utf16[_unchecked: i] | |||
if i + 1 >= utf16.count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this was:
guard i < utf16.endIndex else { ... }
Then the index for low
could be calculated without trapping on overflow (i &+ 1
). I think, as it is written, the compiler must always evaluate i + 1
and check for overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think llvm has much trouble figuring out that i < utf16.count
here. (So e.g. switching to the unchecked subscript probably changes nothing for the i+1 subscript invocation below.)
One way to verify this is to look at the generated code to see if we actually have an overflow branch for the addition. If we do, then switching to wrapping arithmetic could be worth investigating in a followup PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be that this function is not even used? I can't seem to find its caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite possible! These functions have a precondition that the buffer they're given contains valid UTF-8/UTF-16 data. This precludes their use with NSString, and we no longer have a native string form that uses UTF-16.
We expect indices to be already validated by the time these are called, and UBP’s debug-mode checks are compiled into opaque parts of the stdlib.
The big exception is that Swift <5.7 used to not do proper index validation and allowed out-of-bounds accesses. We emulate this behavior for code that was compiled with such a release, and it turns out that these UnsafeBufferPointer checks are interfering with the intended (undefined) behavior in this case. Switching to unchecked accesses will speed up operations for newly built code that will have proper validation up front, while also allowing broken-but-not-trapping code built with Swift 5.6 or earlier to continue "working" as it did before.
rdar://93707276