-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Review and fix some problems with unsafe buffer and Range initialization #34961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Review and fix some problems with unsafe buffer and Range initialization #34961
Conversation
Every other Unsafe(Mutable)BufferPointer precondition in this file is debug-mode only. This information is already part of the documentation for this function, and it's really hard to get rid of these branches and traps otherwise.
…ckedBounds:) Unchecked APIs must still perform checks in debug builds to ensure that invariants aren’t violated. (I.e., these aren’t a license to perform invalid operations — they just let us get rid of the checks when we know for sure they are unnecessary.)
`Slice` does not technically guarantee that its indices are valid in its base, so these initializers accidentally allowed the creation of obviously out-of-bounds buffers.
Let's see how much havoc this will wreak in our benchmarks. 🙈 @swift-ci benchmark |
This comment has been minimized.
This comment has been minimized.
I assume it will wreak all the havoc. No benchmark will be safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shiny, I love this patch.
@swift-ci benchmark |
This comment has been minimized.
This comment has been minimized.
Build failed |
OK, the egregious code size regressions in #34951 are gone now. I'm assuming the time regressions are due to changes in inlining decisions; they don't seem too bad, surprisingly enough.
The |
macOS test failure is unrelated.
|
Fantastic, thanks for taking this on @lorentey. I wasn’t sure how to debug the benchmark regressions, so I was worried I wouldn’t be able to land the other patch on my own. I’ve heard that seemingly unrelated regressions can sometimes be down to code alignment, but I don’t know how to check that. Anyway, I’m really happy to see these improvements. |
This eliminates a couple of _debugPreconditions which seem to change inlining enough to interfere with some benchmarks.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Regarding that func flattenFlatMap(_ input: [(Int, Int, Int, Int)]) -> [Int] {
return input.flatMap { [$0.0, $0.1, $0.2, $0.3] }
} This is somehow triggered by the compiler losing evidence about public init(
@_nonEphemeral start: Unsafe${Mutable}Pointer<Element>?, count: Int
) {
_debugPrecondition(
count >= 0, "Unsafe${Mutable}BufferPointer with negative count")
_debugPrecondition(
count == 0 || start != nil,
"Unsafe${Mutable}BufferPointer has a nil start and nonzero count")
_position = start
self.count = count
} Replacing the assignment with |
@swift-ci benchmark |
@swift-ci test |
This comment has been minimized.
This comment has been minimized.
Build failed |
Remaining perf regressions seem to be due to The macOS test regression is another case of
The tiny addition of the |
@swift-ci test |
@swift-ci benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm in favor of this change! I would like to see the optimizer be smarter about this code, what's the best way to follow up on that?
_position = start | ||
self.count = count | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be @inlinable
? I don't think AEIC implies that, but I don't recall the details.
What is @_nonEphemeral
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@_aeic
does imply @inlinable
-- indeed, it is a stronger variant of it.
The @_nonEphemeral
attribute indicates that the associated function escapes the pointer. The compiler emits a warning when a temporary pointer (such as the ones resulting from the implicit inout-to-pointer conversion feature) is passed for a parameter that has this attribute. It's a nice (if limited) safety feature for a common issue.
@milseman I have not looked at the implementation of the optimizer, but based the behavior exhibited here and similar cases (as well as second/third-hand info), it seems that one major practical source of performance instability is that the presence/absence of a If this diagnosis is correct (which is a big if -- recall that I know very little about how the optimizer works), then one piece of duck tape we could apply is to explicitly tell the compiler about this fact. I.e., we could introduce a semantic attribute that entirely removes these calls at a very early stage in optimized mode (or just changes any code size metric to not count their projection). This would be an extremely limited half-solution, but it may still be worth it, given how much we rely on these assertions. @atrick @aschwaighofer @eeckstein @gottesmm What do you think? As a Swift engineer, I expect to be able to add |
Yep, #35065 demonstrates that @usableFromInline @_transparent
internal func _debugPrecondition(
_ condition: @autoclosure () -> Bool, _ message: StaticString = StaticString(),
file: StaticString = #file, line: UInt = #line
) {
// Only check in debug mode.
if _slowPath(_isDebugAssertConfiguration()) {
if !_fastPath(condition()) {
_fatalErrorMessage("Fatal error", message, file: file, line: line,
flags: _fatalErrorFlags())
}
} |
@swift-ci test |
@lorentey Unfortunately the situation is more complicated. We have to distinguish between debug/release build of the stdlib itself and debug/release build of client code.
Now, I'm not sure what you are asking for. In case you are looking for an "assert" which is removed in the release build of the stdlib: it's Or do you mean that specialized stdlib functions should result in the same code regardless if an All this is quite confusing and it always takes me a while until I fully understand it. |
That makes sense! However, at least some of the issues I've been fighting was about compiler behavior for code that isn't calling into the stdlib dylib -- e.g., the I also don't recall seeing any calls into the swift parts of libswiftCore in the FlattenListFlatMap hot loop that I needed to investigate above -- and that one is listed as (weakly) impacted in #35065. I can easily believe how the AnySequence/AnyCollection benchmarks (that got a 2x speedup in #35065) may call into libswiftCore though -- their code is fully inlinable but I don't expect their boxes would get specialized. The 2x factor seems a bit harsh; we should probably track down the exact cause and try to fix it somehow. |
Hm, I may be misremembering; I explicitly speculated on dylib changes triggering the issue above. I'll need to double check. |
Yeah, FlattenListFlatMap only calls runtime entry points and a specialized version of The FlattenListFlatMap benchmark results in #35065 seem to be entirely due to benchmark flakiness -- locally measured numbers seem to be roughly similar. Phew! 😌 |
func benchmarkSequenceAlgos<S: Sequence>(s: S, n: Int) where S.Element == Int {
CheckResults(s.reduce(0, &+) == (n*(n-1))/2)
let mn = s.min()
let mx = s.max()
CheckResults(mn == 0 && mx == n-1)
CheckResults(s.starts(with: s))
} This gets correctly specialized for I think the large speedups in #35065 are all because of this wrinkle. (I don't think they are cause for alarm, though. Who uses collection existentials on ranges in practice?) I have not looked into the This leaves the So I think I have convinced myself everything is "fine" -- Unfortunately we'll likely need to stop relying on the collection conformances of |
Precondition/debugPrecondition/internalInvariant is mentioned in https://github.com/apple/swift/blob/main/docs/StandardLibraryProgrammersManual.md#_precondition-_debugprecondition-and-_internalinvariant. |
The goal here is to make sure there is no performance argument against using
UnsafeBufferPointer
instead of passing around naked pointer/count values, while also making sure obvious errors are caught in debug builds at least.U*BP.init(start:count:)
only check for invalid input in debug builds (by @karwa in [stdlib] Make UBP nil/count checks debug-mode only #34747)[Closed]Range.init(uncheckedBounds:)
(from [stdlib] Add a missing debug precondition to Range.init(uncheckedBounds:) #34951)U*BP.init(rebasing:)
(followup to [stdlib] Simplify buffer pointer initialization. #34879 based on discussions therein)U[M]BP.distance(from:to:)
(ditto)U[M]RBP.count
(bonus change)Putting on my pedantic hat, the new
_debugPreconditions
are technically source-breaking changes. However, for point (2),Range.init(uncheckedBounds:)
was documented to strictly require these conditions, and ranges that violate the invariant aren't likely to be usable. (Code that consumes ranges invariably assumes thatlowerBound <= upperBound
.)As for point (3),
U*BP.init(rebasing:)
can only ever trap if someone explicitly invoked theSlice(base:bounds:)
initializer with a bogus index range -- which is highly unlikely, since passing that to.init(rebasing:)
is far more work than just calling.init(start:count:)
.ABI compatibility can only be an issue for binaries built in debug mode. (Which may be a significant problem.)
If compatibility concerns prevent us from adding (2) and (3) above, then we should remove these but keep the rest.