-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Add benchmarks that measure KeyPath read and write performance. #60383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci smoke test |
@swift-ci please benchmark |
@swift-ci please benchmark |
@swift-ci please benchmark |
1 similar comment
@swift-ci please benchmark |
… them above 20 us.
@swift-ci please benchmark |
@swift-ci please smoke test |
@eeckstein - You provided great feedback on the PR that was the predecessor to this one. Would you still be the right person to review these keypath benchmark additions? |
let singleHopKeyPath4: WritableKeyPath<E, Int> = \E.e | ||
|
||
// Used for run_KeyPathFixedSizeArrayAccess() and run_testDirectAccess(). | ||
struct FixedSizeArray100<Element>: Sequence, IteratorProtocol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of 100 elements if only a small subset is used in the benchmarks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, thanks for your feedback! We're wanting to introduce a suite of benchmarks that highlight some of the performance issues we've been experience with KeyPaths. What we've seen so far can be largely placed into three categories.
The first category involves a collection of KeyPaths not being inlined if they're spread too far apart in memory. The magic number, after some experimentation, happened to be 14. As soon as you have a 15th KeyPath, the inlining in release mode does not occur, and performance falls off a cliff -- being some 100 times worse or more. This is part of the reason why we chose a struct with 100 elements instead of just 10. (I do agree that asking the optimizer to inline KeyPaths pointing to an arbitrarily-wide range is wishful thinking, but perhaps the value of 14 is smaller than some might expect.) I've added a comment right above run_testKeyPathsInlined()
which explains this in a bit more detail.
The second category involves nested structs or classes. Any sort of KeyPath projection operation is O(N) in the depth of the object. The benchmark KeyPathNestedStructs
uses this.
The third category involves accessing a constant-sized value within structs of various sizes. Namely, performance decreases as more and more variables are added to the struct. I've added a benchmark called KeyPathsSmallStruct
to showcase the difference in performance between a struct with 10 elements and one with 100 (KeyPathReadPerformance)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use @inline(never)
to ensure you are testing the non-inlined case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need the 100 elements when using @inline(never)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep it if possible: I've added the @inline(never)
directive to both versions ofgetKeypathToElement()
-- for the 10-element struct and also for the 100-element struct -- and compared the times for each iteration. In both Debug and Release mode (i.e., after compiling via swift build
and swift build --configuration release
), the 100- element struct needs about 1.5X more time per iteration when compared with the 10-element struct.
tags: [.validation, .api], | ||
setUpFunction: setupSingleton | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be very useful to add benchmarks for other keypath variants, like getters+setters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering what is meant by this: does this mean adding benchmarks featuring ReferenceWritableKeyPaths, PartialKeyPaths, and others?
Or, benchmarks for KeyPath syntax like this?
get { ... #keyPath(...) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, this benchmark is already getting somewhat long (over 1300 lines). Would it be a good candidate for multi-source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example for a gettable property:
struct S {
var x: Int { return 27 }
}
let kp = \S.x
Would it be a good candidate for multi-source?
No, keeping it in a single file is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several kind of keypath components:
- stored property (what you are testing)
- gettable property (my previous example)
- settable property (the same, just for writing)
- tuple element
- optional chain
- optional force
- optional wrap
Those are handled differently in the runtime. Therefore it would be nice if the benchmarks would cover all of the keypath component types + combinations of them, e.g.
struct T {
var y: Int
}
struct S {
var x: (Int, T?) { return (27, T(y: 42)) }
}
let kp = \S.x.1?.y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see that they're handled differently, especially inside _projectReadOnly<CurValue, NewValue, LeafValue>(...)
in KeyPath.swift
.
I'll create a few new benchmarks and verify I get all the cases listed above (and that all the cases are being hit inside that function). These'll be coming soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added benchmarks that'll test the performance of each type of KeyPathComponent. There is now a MarkDown table near the top of the file that indicates which benchmark stresses which type of KeyPathComponent.
I did find another error in the CI output indicating that the setup time for KeyPathWritePerformance
was too high; I've moved its KeyPaths, and the ones for KeyPathReadPerformance
, into the singleton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the main message associated with the PR to discuss the benchmarks that have been added since then.
What is the convention for resolving conversations (via the Resolve conversation
button) during workflows like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you like. Just let me somehow know when the PR is ready for another round of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound good!
…. Removed benchmarks dealing with an inlining issue.
…p and documentation.
@swift-ci please benchmark |
…enamed GetSet to Getset.
@swift-ci please benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks already pretty good! I have a few more comments.
let singleHopKeyPath4: WritableKeyPath<E, Int> = \E.e | ||
|
||
// Used for run_KeyPathFixedSizeArrayAccess() and run_testDirectAccess(). | ||
struct FixedSizeArray100<Element>: Sequence, IteratorProtocol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need the 100 elements when using @inline(never)
var kp50: WritableKeyPath<FixedSizeArray100<ElementType>, ElementType> | ||
|
||
static let shared = FixedSizeArrayHolder() | ||
init() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the keypaths stored as class members?
If I understand correctly, for the purpose of this benchmark it would be sufficient to create the relevant keypaths in the benchmark functions - wrapped into identity()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've addressed the 100-element issue in another post -- it boils down to 100 elements needing 1.5X more time per iteration that 10, at least as far as reads are concerned.
I've moved these KeyPaths into the class due to a single CI run where it was complaining about a high setup time in KeyPathWritePerformance
. It was this one here:
https://ci.swift.org/job/swift-PR-macos-perf/260/console
The error in question was: ERROR runtime: 'KeyPathWritePerformance' has setup overhead of 68 μs (8.3%).
So the idea here was to move everything but the kitchen sink into the class.
tags: [.validation, .api], | ||
setUpFunction: setupSingleton | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you like. Just let me somehow know when the PR is ready for another round of review.
let elementCount = FixedSizeArrayHolder.shared.mainArrayForNestedStructs.count | ||
for _ in 1 ... iterationMultipier * n { | ||
let element = FixedSizeArrayHolder.shared.mainArrayForNestedStructs[index] | ||
sum += element[keyPath: identity(appendedKeyPath)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add identity
outside of the loop, because it adds considerable runtime overhead:
let appendedKeyPath = identity(singleHopKeyPath0.appending(path: singleHopKeyPath1)
.appending(path: singleHopKeyPath2).appending(path: singleHopKeyPath3)
.appending(path: singleHopKeyPath4))
Please also correct this in all other functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
@swift-ci please test |
@swift-ci please benchmark |
It looks like there's a CI failure caused by this test: |
… long setup overhead errors.
@swift-ci please benchmark |
…t() to try to reduce setup time errors.
@swift-ci please benchmark |
@swift-ci please smoke test |
@swift-ci please smoke test Windows platform |
1 similar comment
@swift-ci please smoke test Windows platform |
…tlang#60383) * Add benchmarks that measure KeyPath read and write performance. * Added setUpFunctions. Revised number of iterations per benchmark. * Include n as a factor in the number of iterations. * Increased number of iterations for KeyPathDirectAccess by a factor of 25. * One last tweak to the number of iterations on testDirectAccess to get them above 20 us. * Made revisions based on feedback. Added three new benchmarks. * Added benchmarks to exhaustively benchmark all KeyPathComponent types. Removed benchmarks dealing with an inlining issue. * Wrapped additional keypaths with identity() where needed. More cleanup and documentation. * Moved KeyPaths for KeyPathRead and Write into FixedSizeArrayHolder. Renamed GetSet to Getset. * Added inline(never) to both versions of getKeypathToElement(). * Moved identity() wraps so that they're called once per variable per benchmark. * Moving destinationKeyPaths into FixedSizeArrayHolder to try to reduce long setup overhead errors. * Additional moving of the identity() wrapping into the singleton's init() to try to reduce setup time errors.
This is the first in a series of pull requests aimed at improving the speed of read and write operations via KeyPaths.
Intended to supersede #36451.
Although this PR doesn't change anything related to KeyPath performance or functionality, it does feature four benchmarks that can be used to establish a baseline in terms of timing.
The benchmark called
run_testDirectAccess()
establishes a baseline for how quickly read and write operations can be performed with direct use of dot notation. (That said,n
would need to be equal inside the benchmark used for comparison.)The benchmark called
run_KeyPathNestedStructs()
measures the performance of read operations via KeyPaths that traverse through structs only; it is anticipated that the memory layout of such structs is predictable. One potential optimization in this case would be to jump to the finalValue
specified by the KeyPath without having to perform a full projection from theRoot
to theValue
. (Recalculation of the array offset would also need to be performed during any KeyPathappend()
operations, of course).The benchmarks called
run_testKeyPathReadPerformance()
andrun_testKeyPathWritePerformance()
have 25 keypath read and write operations, respectively, per iteration. Other elements are accessed via dot notation. This is done so that the the performance of reads can be tracked separately from writes. It also enables comparison of relative performance between read and write operations.The benchmark called
run_testKeyPathSmallStruct()
is the same asrun_testKeyPathReadPerformance()
, except the struct used has 10 elements in it instead of 100. It was found that reading a value of a given size from larger struct takes longer than reading the value from a smaller struct.The remainder of the benchmarks explicitly benchmark the types of
KeyPathComponents
not featured in any of the other benchmarks. The reason this is done is that the different components get handled slightly differently at runtime. SeeKeyPath.swift
.