-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Some performance improvements in the stdlib, mainly for small string literals #16890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ull-terminated C string. And remove the now unused _SentinelCollection utility.
We can always eliminate ARC operations on the result of a value_to_bridge_object instruction. There is no need to restrict the operand to a specific SIL pattern.
…nd pointers to start-pointer + count. This saves a few instructions for some operations, like getting the count. Also, it avoids the check for unwrapping the optional end pointer. For example, iterating over an unsafe buffer now has no overhead. Also remove the _unboundedStartingAt initializer, which is not needed anymore.
Making sure that makeIterator is always inlined, enables devirutalization of the iterator calls. Inlining was not done with -Osize which resulted in pretty bad performance when iterating over an existential collection.
…o 2 constant loads. The main part of this is to rewrite the small string literal-constructor to work with values (= shifting bytes) instead of setting bytes in memory. This allows the compiler to fold away everything and end up with the optimal code for small string literals.
@swift-ci test |
@swift-ci smoke benchmark |
@swift-ci test compiler performance |
Build comment file:Build failed before running benchmark. |
@swift-ci smoke benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reviewing the small string changes for now.
CC @atrick for the UBP stuff
} | ||
_sanityCheck(self.count == 0, "overwrote count early?") | ||
self.count = count | ||
high |= (UInt(count) &<< (8*15)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead put this in the setter for count
, so that all other code paths can benefit?
Also, rather than have a naked 15
, you can use capacity
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the count setter would have to mask out the old value, which is not needed here.
Also, I followed the count getter+setter pattern which also hard-code the '15'.
self.count = count | ||
high |= (UInt(count) &<< (8*15)) | ||
let low = _bytesToUInt(addr, lowCount) | ||
_storage = (low, high) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code would be more clear if this was in the if-else blocks. We'd also not need to introduce so many in-scope variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to reduce the number of calls to _bytesToUInt. Although llvm can fold away the whole loop, this is not yet done on SIL level, so duplicating _bytes_ToUInt in the then- and else-branch generates an additional loop in SIL.
shift = shift &+ 8 | ||
} | ||
return r | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this whole function just a (potentially-unaligned) load and bitfield-insert mask? Can you verify what code is generated here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole code is folded away (that's the point of the change). I added a test for that: SILOptimizer/string_literals.swift
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(following up), this code is also called for many other dynamic String initializations, so we'd want efficient code even when it can't be constant folded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erik and I discussed changing the UBP representation. I couldn't think of any good reason to keep the (start, end] representation for the buffer, as long as the iterator keeps its representation.
Build comment file:Optimized (O)Regression (39)
Improvement (26)
No Changes (373)
Hardware Overview
|
For details see the commit list