Skip to content

Vectorize UTF16 offset calculations #41866

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 18, 2022

Conversation

Catfish-Man
Copy link
Contributor

Cleaned up PR and commit history for #41684

@Catfish-Man Catfish-Man self-assigned this Mar 17, 2022
@Catfish-Man
Copy link
Contributor Author

@swift-ci please smoke test

@Catfish-Man
Copy link
Contributor Author

@swift-ci Please test Windows platform

Copy link
Contributor

@glessard glessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. Thanks.

@Catfish-Man Catfish-Man merged commit cb082e1 into swiftlang:main Mar 18, 2022
@milseman
Copy link
Member

@swift-ci please benchmark

let fourBytes = U.zero.replacing(with: U.one, where: uValue .>= 0b11110000)
let fourByteCount = Int(fourBytes.wrappedSum())

utf16Count &+= (U.scalarCount - continuationCount) + fourByteCount
Copy link
Contributor

@stephentyrone stephentyrone Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Horizontal operations (wrappedSum) are expensive; usually in SIMD code we keep the accumulation in vector until forced to map down to a scalar rather, than accumulating every loop iteration. It's a little bit of a pain with Int8 because you can only do 127 accumulations before you have to worry about overflow, but still doing one horizontal operation every N iterations is much better than 2/iteration.

The easy change here would be to do:

&+= U.scalarCount + (fourByteCount &- continuationCount).wrappedSum()

so that you only do half as many as currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll make a followup patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The optimizer might be managing to pull this apart for you already, but it would be nice to make explicit anyway.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm hoping once wider vectors are feasible that the common case here, post-breadcrumb offsets, will only ever iterate once because we can just do the entire 64 bytes at once)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants