-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Simplify buffer pointer initialization. #34879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci test |
CC @atrick and @gottesmm who are likely both interested here. To the two of you: I can get rid of the last two preconditions, but doing so removes some guard rails. The two preconditions come from The first precondition here is impossible, for the same reason we could do unchecked subtraction on startIndex and endIndex: However, users are capable of screwing this up because you can just whip up a |
Motivation: I recently discovered that UnsafeRawBufferPointer.init(rebasing:) is surprisingly expensive, with 7 traps and 11 branches. A simple replacement can make it a lot cheaper, down to two traps and four branches. This ends up having pretty drastic effects on ByteBuffer-heavy NIO code, which often outlines the call to that initializer and loses the ability to make a bunch of site-local optimisations. While this has been potentially fixed upstream with swiftlang/swift#34879, there is no good reason to wait until Swift 5.4 for this improvement. Due to the niche use-case, I didn't bother doing this for _every_ rebasing in the program. In particular, there is at least one UnsafeBufferPointer(rebasing:) that I didn't do this with, and there are uses in both NIOTLS and NIOHTTP1 that I didn't change. While we can fix those if we really need to, it would be nice to avoid this helper proliferating too far through our codebase. Modifications: - Replaced the use of URBP.init(rebasing:) with a custom hand-rolled version that avoids Slice.count. Result: Cheaper code. One NIOHTTP2 benchmark sees a 2.9% speedup from this change alone.
Worked around in SwiftNIO at apple/swift-nio#1696. |
UnsafeRawBufferPointer was not made self-slicing, but is instead sliced by Slice<UnsafeRawBufferPointer>. When working with UnsafeRawBufferPointer objects it is quite common to want to transform back into the underlying collection, which is why the UnsafeRawBufferPointer provides a constructor from its subsequence: UnsafeRawBufferPointer.init(rebasing:). Unfortunately for an initializer on this extremely low-level type, this initializer is surprisingly expensive. When disassembled on Linux it emits 7 separate traps and 11 branches. This relative heft means this method often gets outlined, which is a shame, as several of the branches could often be eliminated due to checks elsewhere in functions where this initializer is used. Almost all of these branches and almost all of the traps are unnecessary. We can remove them by noting that it is impossible to construct a Slice whose endIndex is earlier than its startIndex. All Slice inits are constructed with bounds expressed as Range, and Range enforces ordering on its endpoints. For this reason, we can do unchecked arithmetic on the start and end index of the slice, and remove 5 traps in one fell swoop. This greatly cheapens the cost of the initializer, improving its odds of being inlined and having even more of its branches optimised away. For what it's worth, I also considered trying to remove the last two preconditions. Unfortunately I concluded I couldn't confidently do that. I wanted to remove them based on the premise that a valid Slice must have indices that were in-bounds for its parent Collection, and so we could safely assume that if the parent base address was nil the count would have to be zero, and that adding start index to the base address would definitely not overflow. However, the existence of the Slice.init(base:bounds:) constructor led me to be a bit suspicions of removing those checks. Given that UnsafeRawBufferPointer.init(start:count:) enforces those invariants, I decided it was safest for us to continue to do that.
ed1d406
to
300f4da
Compare
@swift-ci test |
I'm trying to remove these in #34747 but it caused some benchmark regressions (e.g. SuffixAnySequence, +200% at -Osize is probably not noise). I haven't had a chance to investigate those in detail, but my hunch is that they allow other preconditions/overflow checks to be omitted later on. I don't think we have a way to tell the compiler "assume this statement is true, but don't trap if it isn't" (well... I kind of discovered a way, but it's awful). |
I’m not proposing to remove them in full generality, just in the specific constructor from |
@swift-ci benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -OsizePerformance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci test |
Build failed |
@swift-ci test |
Motivation: I recently discovered that UnsafeRawBufferPointer.init(rebasing:) is surprisingly expensive, with 7 traps and 11 branches. A simple replacement can make it a lot cheaper, down to two traps and four branches. This ends up having pretty drastic effects on ByteBuffer-heavy NIO code, which often outlines the call to that initializer and loses the ability to make a bunch of site-local optimisations. While this has been potentially fixed upstream with swiftlang/swift#34879, there is no good reason to wait until Swift 5.4 for this improvement. Due to the niche use-case, I didn't bother doing this for _every_ rebasing in the program. In particular, there is at least one UnsafeBufferPointer(rebasing:) that I didn't do this with, and there are uses in both NIOTLS and NIOHTTP1 that I didn't change. While we can fix those if we really need to, it would be nice to avoid this helper proliferating too far through our codebase. Modifications: - Replaced the use of URBP.init(rebasing:) with a custom hand-rolled version that avoids Slice.count. Result: Cheaper code. One NIOHTTP2 benchmark sees a 2.9% speedup from this change alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic stuff, Cory 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR itself is great. Thanks a lot! I'll add some comments to the discussion about the other possibilities.
@Lukasa @lorentey I'm going to invent some rationale for when we need preconditions, so I'd like to hear what you think about it too. I'll claim that we want preconditions the check consistency across user-provided arguments whenever we construct a value, regardless of whether the value's type is Unsafe. For example, UBP.init(start:count:) should always check count >= 0 and non-nil start for count > 0. When it comes to accessing that type, we may demote the argument check to a debugPrecondition, as in UBP.subscript(). On the other hand, when constructing Unsafe types, we can assume that any single argument value is already self-consistent. So, UBP.init(rebasing:) does not need to re-check the validity of Slice. After all, constructing an invalid Slice requires using Range.init(uncheckedBounds:). So what could we do better:
Are we confident enough that users won't misuse Slice.init(uncheckedBounds:) to do this? I did notice that Range.init(from decoder:) does not have any preconditions, which seems a little alarming. |
Disappointingly I think this isn’t true. This constructor allows an arbitrary Range and base instance to be provided for Slicing. While this constructor is primarily intended for users conforming to Collection, in principle it can be used arbitrarily. Because it does not validate that the bounds are valid for the base collection, nothing prevents users doing this:
I think it’s totally fine to assume that |
I think the theory with Supposing the following statements are true:
Then I think it's reasonable to say that in release builds, constructing a buffer pointer from a slice must do exactly as many checks as constructing one through (That said, we should definitely have If we had a time machine, I would go back to the evolution discussion & I would be arguing loudly to make these buffer types self-slicing, with pointers as their index type. Unfortunately we don't have one, so we are stuck with a one-size-fits-all SubSequence implementation on a very unusual collection type. |
Yes, definitely! But
That is very alarming. It should signal invalid data by throwing an error. Edit: This is a false alarm -- |
Right, I was suggesting that |
@lorentey URBP was originally self-slicing (and UBP must have been too), before DaveA pointed out that generic algorithms broke because of non-transferable indices. Yeah, it's really unfortunate they don't have pointer indices, and I don't know what the historical reason for that was. |
Yeah -- Slice.count gets the default @inlinable // unsafe-performance
public func distance(from start: Int, to end: Int) -> Int {
// NOTE: this is a manual specialization of index movement for a Strideable
// index that is required for UnsafeBufferPointer performance. The
// optimizer is not capable of creating partial specializations yet.
// NOTE: Range checks are not performed here, because it is done later by
// the subscript function.
return end - start
} Unfortunately, if we accept that calling |
We could replace public func distance(from start: Int, to end: Int) -> Int {
let result = end.subtractingReportingOverflow(start)
_debugPrecondition(!result.overflow)
return result.partialValue
} (On the theory that an overflow here can only ever occur if |
Interesting... and annoying that a valid Range has an invalid distance |
Hm, expanding on the legal theory above:
So, if I understand correctly:
So if all this is right, then we should probably change public init(rebasing slice: Slice<UnsafeMutableBufferPointer<Element>>) {
// We should probably not call _failEarlyRangeCheck here because range
// construction would likely emit unnecessary checks.
_debugPrecondition(
slice.startIndex >= 0 && slice.endIndex <= slice.base.count,
"Invalid slice")
// Note: this assumes https://github.com/apple/swift/pull/34951
// is eventually merged (which it should be). Otherwise this should
// initialize members directly, with an explicit check against `start == nil && count != 0`.
self.init(
start: slice.base.baseAddress?.advanced(by: slice.startIndex),
count: slice.count)
} |
That sounds reasonable to me. I’ll throw together a patch to that effect today. |
Ace, I love not doing work I don't have to! |
UnsafeRawBufferPointer was not made self-slicing, but is instead sliced
by Slice. When working with
UnsafeRawBufferPointer objects it is quite common to want to transform
back into the underlying collection, which is why the
UnsafeRawBufferPointer provides a constructor from its subsequence:
UnsafeRawBufferPointer.init(rebasing:).
Unfortunately for an initializer on this extremely low-level type, this
initializer is surprisingly expensive. When disassembled on Linux it
emits 7 separate traps and 11 branches. This relative heft means this
method often gets outlined, which is a shame, as several of the branches
could often be eliminated due to checks elsewhere in functions where
this initializer is used.
Almost all of these branches and almost all of the traps are
unnecessary. We can remove them by noting that it is impossible to
construct a Slice whose endIndex is earlier than its startIndex. All
Slice inits are constructed with bounds expressed as Range, and Range
enforces ordering on its endpoints.
For this reason, we can do unchecked arithmetic on the start and end
index of the slice, and remove 5 traps in one fell swoop. This greatly
cheapens the cost of the initializer, improving its odds of being
inlined and having even more of its branches optimised away.
For what it's worth, I also considered trying to remove the last two
preconditions. Unfortunately I concluded I couldn't confidently do that.
I wanted to remove them based on the premise that a valid Slice must
have indices that were in-bounds for its parent Collection, and so we
could safely assume that if the parent base address was nil the count
would have to be zero, and that adding start index to the base address
would definitely not overflow. However, the existence of the
Slice.init(base:bounds:) constructor led me to be a bit suspicions of
removing those checks. Given that
UnsafeRawBufferPointer.init(start:count:) enforces those invariants, I
decided it was safest for us to continue to do that.