-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Speed up Character.init significantly for small characters. #6850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Speed up Character.init significantly for small characters. #6850
Conversation
This optimization checks to see if a Character can fit in the 63-bit small representation instead of unconditionally constructing a String and paying the cost of that allocation. If it does, the small representation is computed directly from its UTF-8 code units. In optimized builds, this turns Character literals <= 8 UTF-8 code units long into single 64-bit integer loads -- a huge improvement.
@swift-ci Please test |
dest: UnsafeMutableRawPointer(Builtin.addressof(&buffer)), | ||
src: UnsafeMutableRawPointer(start), | ||
size: UInt(utf8CodeUnitCount)) | ||
let utf8Chunk = UInt64(littleEndian: buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: is this kosher for big-endian systems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .small
representation orders the packed code units from low to high bits. Since we're doing a memcpy
of code units directly into the memory occupied by a UInt64
, it's assuming little-endianness everywhere, and the UInt64.init(littleEndian:)
call is necessary to ensure the correct integer representation on big-endian systems. The same logic—which I adapted—already exists in the function called by Character.init(String:)
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Maybe some of the comments from the other place would be illuminating here too.
src: UnsafeMutableRawPointer(start), | ||
size: UInt(utf8CodeUnitCount)) | ||
let utf8Chunk = UInt64(littleEndian: buffer) | ||
let bits = MemoryLayout.size(ofValue: utf8Chunk) &* 8 &- 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just MemoryLayout<UInt64>.size
. But why can't you hardcode 63 instead of using bits
? (Edit: In fact, since you're explicitly invoking built-ins that expect 64 bits, this probably should just be 63
below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately kept my logic similar to what was already implemented elsewhere in the Character
type. Since I'm not the original author, I can only guess the reason it was written that way—I could imagine resilience being a motivation if the type used for .small
were to ever change. In that case, I could consider replacing the hardcoded 8 in the main conditional to something MemoryLayout
-based as well.
It would be great to add a new benchmark for this! |
One more Q: Now that you bypass the preconditions in |
@slavapestov I'll look into adding one! @xwu AFAICT, the compiler won't generate a call to this initializer for a literal that contains more than one grapheme cluster because it would be caught as an error before then. If you try to write |
@allevato So a 150x improvement for no trade-offs: nice! |
@swift-ci Please benchmark |
Definitely would be good to get some specific benchmarks on this change (if you do them in a separate PR, then merge it ahead of this one and rebase, we can see the results from CI on this PR) but just running a preliminary benchmark to see if anything shows up. |
Build comment file:Optimized (O) Regression (4)
Improvement (7)
No Changes (146)
Regression (5)
Improvement (4)
No Changes (148)
|
@airspeedswift Sounds good—I'll add some new benchmarks as soon as I have time to work on it again in the next couple of days and send a separate PR for those. |
Now that #6879 is merged, do I need to do anything extra to re-run the benchmarks? (When I tried running it on that PR, nothing happened—are the swift-ci commands only available to project collaborators?) |
@allevato Yes. Let me kick it off for you. I'll do a smoke test as well. |
@swift-ci Please benchmark |
@swift-ci Please smoke test |
@swift-ci Please smoke test linux platform |
Build comment file:Optimized (O) Regression (4)
Improvement (7)
No Changes (148)
Regression (7)
Improvement (4)
No Changes (148)
|
Wonderful! Those improvements jibe with what I observed locally. Are the regressions shown there something I should look into, or is it statistical noise/within tolerance? |
@swift-ci Please smoke test Linux |
@allevato We can always just run the benchmarks again :-) |
@swift-ci Please benchmark |
Build comment file:Optimized (O) Regression (4)
Improvement (7)
No Changes (148)
Regression (3)
Improvement (7)
No Changes (149)
|
Yeah the second bench run suggests the regressions we are seeing are just jitter, not caused by the change. I think we can merge this. @dabrahams do you agree? |
Thanks all! |
Directly computes the 63-bit representation for
Character
literals whose UTF-8 encoding will fit in the.small
representation, instead of unconditionally constructing aString
and paying the cost of that heap allocation. This speed-up is helpful when using character literals in parsing loops; if you have aswitch
statement over characters in a string and the cases are literals, you're currently constructing many temporary strings to perform those comparisons.In optimized builds, the compiler evaluates the whole computation and turns small characters into a single load operation. For example, these functions:
are compiled to the following x86_64 code, replacing string allocations with:
Large character literals also benefit from this change; we already know that it won't fit based on the UTF-8 count, so we can bypass the unnecessary UTF-8 encode that would happen if we passed it to
Character.init(String)
.Performance was tested on my late 2013 MacBook Pro 2.4GHz Intel Core i5 by creating characters of various sizes 1,000,000 times. Results:
So, anywhere from ~150–300x improvements for small characters and 2x for some large ones.