-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Vectorize UTF16 offset calculations #41866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorize UTF16 offset calculations #41866
Conversation
@swift-ci please smoke test |
@swift-ci Please test Windows platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff. Thanks.
@swift-ci please benchmark |
let fourBytes = U.zero.replacing(with: U.one, where: uValue .>= 0b11110000) | ||
let fourByteCount = Int(fourBytes.wrappedSum()) | ||
|
||
utf16Count &+= (U.scalarCount - continuationCount) + fourByteCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Horizontal operations (wrappedSum) are expensive; usually in SIMD code we keep the accumulation in vector until forced to map down to a scalar rather, than accumulating every loop iteration. It's a little bit of a pain with Int8 because you can only do 127 accumulations before you have to worry about overflow, but still doing one horizontal operation every N iterations is much better than 2/iteration.
The easy change here would be to do:
&+= U.scalarCount + (fourByteCount &- continuationCount).wrappedSum()
so that you only do half as many as currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll make a followup patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The optimizer might be managing to pull this apart for you already, but it would be nice to make explicit anyway.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm hoping once wider vectors are feasible that the common case here, post-breadcrumb offsets, will only ever iterate once because we can just do the entire 64 bytes at once)
Cleaned up PR and commit history for #41684