Skip to content

[stdlib] Switch to using unchecked buffer subscript in low-level Unicode helpers #59899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Jul 6, 2022

We expect indices to be already validated by the time these are called, and UBP’s debug-mode checks are compiled into opaque parts of the stdlib.

The big exception is that Swift <5.7 used to not do proper index validation and allowed out-of-bounds accesses. We emulate this behavior for code that was compiled with such a release, and it turns out that these UnsafeBufferPointer checks are interfering with the intended (undefined) behavior in this case. Switching to unchecked accesses will speed up operations for newly built code that will have proper validation up front, while also allowing broken-but-not-trapping code built with Swift 5.6 or earlier to continue "working" as it did before.

rdar://93707276

…ode helpers

We expect indices to be already validated by the time these are
called, and UBP’s debug-mode checks are compiled into opaque parts
of the stdlib.

(The big exception is that Swift <5.7 used to not do proper index
validation and allowed out-of-bounds accesses. We emulate this
behavior for code that was compiled with such a release, and it turns
out that these UnsafeBufferPointer checks are interfering with the
intended (undefined) behavior.)

rdar://93707276
@lorentey
Copy link
Member Author

lorentey commented Jul 6, 2022

@swift-ci test

@lorentey
Copy link
Member Author

lorentey commented Jul 6, 2022

@swift-ci benchmark

@lorentey
Copy link
Member Author

lorentey commented Jul 6, 2022

------- Performance (x86_64): -O -------

REGRESSION                                  OLD    NEW    DELTA    RATIO    
DictionaryKeysContainsCocoa                 13     14     +7.7%    **0.93x (?)**

IMPROVEMENT                                 OLD    NEW    DELTA    RATIO    
NormalizedIterator_fastPrenormal            580    470    -19.0%   **1.23x (?)**
FlattenListFlatMap                          4237   3451   -18.6%   **1.23x (?)**
NormalizedIterator_latin1                   204    174    -14.7%   **1.17x (?)**
NormalizedIterator_emoji                    372    324    -12.9%   **1.15x**
FlattenListLoop                             1581   1386   -12.3%   **1.14x (?)**
Breadcrumbs.MutatedUTF16ToIdx.Mixed         206    181    -12.1%   **1.14x (?)**
Breadcrumbs.MutatedIdxToUTF16.Mixed         209    185    -11.5%   **1.13x (?)**
StringBuilderSmallReservingCapacity         226    208    -8.0%    **1.09x (?)**
ObjectiveCBridgeStubToNSDateRef             2300   2140   -7.0%    **1.07x (?)**
NormalizedIterator_nonBMPSlowestPrenormal   450    420    -6.7%    **1.07x (?)**
------- Performance (x86_64): -Osize -------

REGRESSION                            OLD    NEW    DELTA    RATIO    
PrefixAnySeqCntRange                  13     16     +23.1%   **0.81x (?)**
Chars2                                3150   3450   +9.5%    **0.91x (?)**
Calculator                            144    155    +7.6%    **0.93x (?)**

IMPROVEMENT                           OLD    NEW    DELTA    RATIO    
Breadcrumbs.MutatedUTF16ToIdx.Mixed   206    180    -12.6%   **1.14x (?)**
Breadcrumbs.MutatedIdxToUTF16.Mixed   209    184    -12.0%   **1.14x (?)**
NormalizedIterator_latin1             198    180    -9.1%    **1.10x (?)**
StringWalk                            1360   1240   -8.8%    **1.10x (?)**
------- Performance (x86_64): -Onone -------

IMPROVEMENT                           OLD    NEW    DELTA    RATIO    
SIMDReduce.Int8                       7175   6302   -12.2%   **1.14x (?)**
Breadcrumbs.MutatedUTF16ToIdx.Mixed   222    195    -12.2%   **1.14x (?)**
Breadcrumbs.MutatedIdxToUTF16.Mixed   226    200    -11.5%   **1.13x (?)**

Yep, bounds checking things twice had some performance impact.

@@ -64,7 +64,7 @@ internal func _decodeUTF8(
internal func _decodeScalar(
_ utf16: UnsafeBufferPointer<UInt16>, startingAt i: Int
) -> (Unicode.Scalar, scalarLength: Int) {
let high = utf16[i]
let high = utf16[_unchecked: i]
if i + 1 >= utf16.count {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was:

guard i < utf16.endIndex else { ... }

Then the index for low could be calculated without trapping on overflow (i &+ 1). I think, as it is written, the compiler must always evaluate i + 1 and check for overflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think llvm has much trouble figuring out that i < utf16.count here. (So e.g. switching to the unchecked subscript probably changes nothing for the i+1 subscript invocation below.)

One way to verify this is to look at the generated code to see if we actually have an overflow branch for the addition. If we do, then switching to wrapping arithmetic could be worth investigating in a followup PR!

Copy link
Contributor

@karwa karwa Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be that this function is not even used? I can't seem to find its caller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite possible! These functions have a precondition that the buffer they're given contains valid UTF-8/UTF-16 data. This precludes their use with NSString, and we no longer have a native string form that uses UTF-16.

@glessard glessard merged commit 02efc3a into swiftlang:main Jul 6, 2022
@lorentey lorentey deleted the unchecked-buffer-access-inside-strings branch July 6, 2022 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants