Add benchmarks that measure KeyPath read and write performance. #60383

fibrechannelscsi · 2022-08-03T21:00:51Z

This is the first in a series of pull requests aimed at improving the speed of read and write operations via KeyPaths.

Intended to supersede #36451.

Although this PR doesn't change anything related to KeyPath performance or functionality, it does feature four benchmarks that can be used to establish a baseline in terms of timing.

The benchmark called run_testDirectAccess() establishes a baseline for how quickly read and write operations can be performed with direct use of dot notation. (That said, n would need to be equal inside the benchmark used for comparison.)

The benchmark called run_KeyPathNestedStructs() measures the performance of read operations via KeyPaths that traverse through structs only; it is anticipated that the memory layout of such structs is predictable. One potential optimization in this case would be to jump to the final Value specified by the KeyPath without having to perform a full projection from the Root to the Value. (Recalculation of the array offset would also need to be performed during any KeyPath append() operations, of course).

The benchmarks called run_testKeyPathReadPerformance() and run_testKeyPathWritePerformance() have 25 keypath read and write operations, respectively, per iteration. Other elements are accessed via dot notation. This is done so that the the performance of reads can be tracked separately from writes. It also enables comparison of relative performance between read and write operations.

The benchmark called run_testKeyPathSmallStruct() is the same as run_testKeyPathReadPerformance(), except the struct used has 10 elements in it instead of 100. It was found that reading a value of a given size from larger struct takes longer than reading the value from a smaller struct.

The remainder of the benchmarks explicitly benchmark the types of KeyPathComponents not featured in any of the other benchmarks. The reason this is done is that the different components get handled slightly differently at runtime. See KeyPath.swift.

fibrechannelscsi · 2022-08-03T21:01:23Z

@swift-ci smoke test

BradLarson · 2022-08-03T21:04:55Z

@swift-ci please benchmark

BradLarson · 2022-08-05T16:23:20Z

@swift-ci please benchmark

BradLarson · 2022-08-05T21:24:58Z

@swift-ci please benchmark

BradLarson · 2022-08-08T16:33:34Z

@swift-ci please benchmark

… 25.

… them above 20 us.

BradLarson · 2022-08-08T20:22:32Z

@swift-ci please benchmark

BradLarson · 2022-08-08T21:45:31Z

@swift-ci please smoke test

BradLarson · 2022-08-09T02:28:54Z

@eeckstein - You provided great feedback on the PR that was the predecessor to this one. Would you still be the right person to review these keypath benchmark additions?

benchmark/single-source/KeyPathPerformanceTests.swift

eeckstein · 2022-08-10T07:24:33Z

benchmark/single-source/KeyPathPerformanceTests.swift

+let singleHopKeyPath4: WritableKeyPath<E, Int> = \E.e
+
+// Used for run_KeyPathFixedSizeArrayAccess() and run_testDirectAccess().
+struct FixedSizeArray100<Element>: Sequence, IteratorProtocol {


What's the purpose of 100 elements if only a small subset is used in the benchmarks?

Hello, thanks for your feedback! We're wanting to introduce a suite of benchmarks that highlight some of the performance issues we've been experience with KeyPaths. What we've seen so far can be largely placed into three categories.

The first category involves a collection of KeyPaths not being inlined if they're spread too far apart in memory. The magic number, after some experimentation, happened to be 14. As soon as you have a 15th KeyPath, the inlining in release mode does not occur, and performance falls off a cliff -- being some 100 times worse or more. This is part of the reason why we chose a struct with 100 elements instead of just 10. (I do agree that asking the optimizer to inline KeyPaths pointing to an arbitrarily-wide range is wishful thinking, but perhaps the value of 14 is smaller than some might expect.) I've added a comment right above run_testKeyPathsInlined() which explains this in a bit more detail.

The second category involves nested structs or classes. Any sort of KeyPath projection operation is O(N) in the depth of the object. The benchmark KeyPathNestedStructs uses this.

The third category involves accessing a constant-sized value within structs of various sizes. Namely, performance decreases as more and more variables are added to the struct. I've added a benchmark called KeyPathsSmallStruct to showcase the difference in performance between a struct with 10 elements and one with 100 (KeyPathReadPerformance).

You can use @inline(never) to ensure you are testing the non-inlined case.

Do you still need the 100 elements when using @inline(never)

I'd like to keep it if possible: I've added the @inline(never) directive to both versions ofgetKeypathToElement() -- for the 10-element struct and also for the 100-element struct -- and compared the times for each iteration. In both Debug and Release mode (i.e., after compiling via swift build and swift build --configuration release), the 100- element struct needs about 1.5X more time per iteration when compared with the 10-element struct.

benchmark/single-source/KeyPathPerformanceTests.swift

eeckstein · 2022-08-10T07:36:50Z

benchmark/single-source/KeyPathPerformanceTests.swift

+    tags: [.validation, .api],
+    setUpFunction: setupSingleton
+  ),
+]


It would also be very useful to add benchmarks for other keypath variants, like getters+setters.

I'm wondering what is meant by this: does this mean adding benchmarks featuring ReferenceWritableKeyPaths, PartialKeyPaths, and others?
Or, benchmarks for KeyPath syntax like this?
get { ... #keyPath(...) }

By the way, this benchmark is already getting somewhat long (over 1300 lines). Would it be a good candidate for multi-source?

Example for a gettable property:

struct S { var x: Int { return 27 } } let kp = \S.x

Would it be a good candidate for multi-source?

No, keeping it in a single file is fine.

There are several kind of keypath components:

stored property (what you are testing)

gettable property (my previous example)

settable property (the same, just for writing)

tuple element

optional chain

optional force

optional wrap

Those are handled differently in the runtime. Therefore it would be nice if the benchmarks would cover all of the keypath component types + combinations of them, e.g.

struct T { var y: Int } struct S { var x: (Int, T?) { return (27, T(y: 42)) } } let kp = \S.x.1?.y

Ah yes, I see that they're handled differently, especially inside _projectReadOnly<CurValue, NewValue, LeafValue>(...) in KeyPath.swift.
I'll create a few new benchmarks and verify I get all the cases listed above (and that all the cases are being hit inside that function). These'll be coming soon.

I've added benchmarks that'll test the performance of each type of KeyPathComponent. There is now a MarkDown table near the top of the file that indicates which benchmark stresses which type of KeyPathComponent.
I did find another error in the CI output indicating that the setup time for KeyPathWritePerformance was too high; I've moved its KeyPaths, and the ones for KeyPathReadPerformance, into the singleton.

I've updated the main message associated with the PR to discuss the benchmarks that have been added since then.
What is the convention for resolving conversations (via the Resolve conversation button) during workflows like this?

As you like. Just let me somehow know when the PR is ready for another round of review.

Sound good!

benchmark/single-source/KeyPathPerformanceTests.swift

…. Removed benchmarks dealing with an inlining issue.

benchmark/single-source/KeyPathPerformanceTests.swift

…p and documentation.

BradLarson · 2022-08-16T18:18:58Z

@swift-ci please benchmark

…enamed GetSet to Getset.

BradLarson · 2022-08-17T16:59:38Z

@swift-ci please benchmark

eeckstein

Looks already pretty good! I have a few more comments.

eeckstein · 2022-08-18T06:45:30Z

benchmark/single-source/KeyPathPerformanceTests.swift

+let singleHopKeyPath4: WritableKeyPath<E, Int> = \E.e
+
+// Used for run_KeyPathFixedSizeArrayAccess() and run_testDirectAccess().
+struct FixedSizeArray100<Element>: Sequence, IteratorProtocol {


Do you still need the 100 elements when using @inline(never)

eeckstein · 2022-08-18T06:46:34Z

benchmark/single-source/KeyPathPerformanceTests.swift

+  var kp50: WritableKeyPath<FixedSizeArray100<ElementType>, ElementType>
+
+  static let shared = FixedSizeArrayHolder()
+  init() {


Why are the keypaths stored as class members?
If I understand correctly, for the purpose of this benchmark it would be sufficient to create the relevant keypaths in the benchmark functions - wrapped into identity()

I've addressed the 100-element issue in another post -- it boils down to 100 elements needing 1.5X more time per iteration that 10, at least as far as reads are concerned.
I've moved these KeyPaths into the class due to a single CI run where it was complaining about a high setup time in KeyPathWritePerformance. It was this one here:
https://ci.swift.org/job/swift-PR-macos-perf/260/console
The error in question was: ERROR runtime: 'KeyPathWritePerformance' has setup overhead of 68 μs (8.3%).
So the idea here was to move everything but the kitchen sink into the class.

benchmark/single-source/KeyPathPerformanceTests.swift

eeckstein · 2022-08-19T05:20:05Z

benchmark/single-source/KeyPathPerformanceTests.swift

+    tags: [.validation, .api],
+    setUpFunction: setupSingleton
+  ),
+]


As you like. Just let me somehow know when the PR is ready for another round of review.

eeckstein · 2022-08-22T09:33:12Z

benchmark/single-source/KeyPathPerformanceTests.swift

+  let elementCount = FixedSizeArrayHolder.shared.mainArrayForNestedStructs.count
+  for _ in 1 ... iterationMultipier * n {
+    let element = FixedSizeArrayHolder.shared.mainArrayForNestedStructs[index]
+    sum += element[keyPath: identity(appendedKeyPath)]


You need to add identity outside of the loop, because it adds considerable runtime overhead:

let appendedKeyPath = identity(singleHopKeyPath0.appending(path: singleHopKeyPath1) .appending(path: singleHopKeyPath2).appending(path: singleHopKeyPath3) .appending(path: singleHopKeyPath4))

Please also correct this in all other functions.

…enchmark.

eeckstein

lgtm, thanks!

BradLarson · 2022-08-22T18:39:37Z

@swift-ci please test

BradLarson · 2022-08-22T18:44:29Z

@swift-ci please benchmark

fibrechannelscsi · 2022-08-22T20:20:07Z

It looks like there's a CI failure caused by this test:
TEST 'Swift(iphonesimulator-x86_64) :: attr/attr_availability_async_rename_toplevel.swift' FAILED
I've checked a couple of previous pipelines (3553 and 3554) and it's happening there too.

… long setup overhead errors.

BradLarson · 2022-08-23T17:35:24Z

@swift-ci please benchmark

…t() to try to reduce setup time errors.

BradLarson · 2022-08-23T19:45:22Z

@swift-ci please benchmark

BradLarson · 2022-08-23T20:37:55Z

@swift-ci please smoke test

BradLarson · 2022-08-24T00:27:41Z

@swift-ci please smoke test Windows platform

BradLarson · 2022-08-24T15:46:50Z

@swift-ci please smoke test Windows platform

…tlang#60383) * Add benchmarks that measure KeyPath read and write performance. * Added setUpFunctions. Revised number of iterations per benchmark. * Include n as a factor in the number of iterations. * Increased number of iterations for KeyPathDirectAccess by a factor of 25. * One last tweak to the number of iterations on testDirectAccess to get them above 20 us. * Made revisions based on feedback. Added three new benchmarks. * Added benchmarks to exhaustively benchmark all KeyPathComponent types. Removed benchmarks dealing with an inlining issue. * Wrapped additional keypaths with identity() where needed. More cleanup and documentation. * Moved KeyPaths for KeyPathRead and Write into FixedSizeArrayHolder. Renamed GetSet to Getset. * Added inline(never) to both versions of getKeypathToElement(). * Moved identity() wraps so that they're called once per variable per benchmark. * Moving destinationKeyPaths into FixedSizeArrayHolder to try to reduce long setup overhead errors. * Additional moving of the identity() wrapping into the singleton's init() to try to reduce setup time errors.

Add benchmarks that measure KeyPath read and write performance.

0089a50

BradLarson mentioned this pull request Aug 3, 2022

Adds KeyPath test to benchmark suite #36451

Closed

Added setUpFunctions. Revised number of iterations per benchmark.

7c4fd37

Include n as a factor in the number of iterations.

ea9984b

Increased number of iterations for KeyPathDirectAccess by a factor of…

578ce7d

… 25.

fibrechannelscsi force-pushed the main branch from f2147cb to 578ce7d Compare August 8, 2022 16:37

One last tweak to the number of iterations on testDirectAccess to get…

14ef45a

… them above 20 us.

eeckstein requested changes Aug 10, 2022

View reviewed changes

Made revisions based on feedback. Added three new benchmarks.

2efa8eb

eeckstein reviewed Aug 12, 2022

View reviewed changes

benchmark/single-source/KeyPathPerformanceTests.swift Show resolved Hide resolved

Added benchmarks to exhaustively benchmark all KeyPathComponent types…

221c0e5

…. Removed benchmarks dealing with an inlining issue.

eeckstein reviewed Aug 16, 2022

View reviewed changes

benchmark/single-source/KeyPathPerformanceTests.swift Outdated Show resolved Hide resolved

Wrapped additional keypaths with identity() where needed. More cleanu…

1b7d834

…p and documentation.

Moved KeyPaths for KeyPathRead and Write into FixedSizeArrayHolder. R…

8562c6a

…enamed GetSet to Getset.

eeckstein reviewed Aug 19, 2022

View reviewed changes

Added inline(never) to both versions of getKeypathToElement().

0db40a0

eeckstein reviewed Aug 22, 2022

View reviewed changes

Moved identity() wraps so that they're called once per variable per b…

1c9ba16

…enchmark.

eeckstein approved these changes Aug 22, 2022

View reviewed changes

Moving destinationKeyPaths into FixedSizeArrayHolder to try to reduce…

34d0ac0

… long setup overhead errors.

Additional moving of the identity() wrapping into the singleton's ini…

ceb69d0

…t() to try to reduce setup time errors.

BradLarson merged commit 730bf24 into swiftlang:main Aug 24, 2022

Add benchmarks that measure KeyPath read and write performance. #60383

Add benchmarks that measure KeyPath read and write performance. #60383

Uh oh!

Conversation

fibrechannelscsi commented Aug 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fibrechannelscsi commented Aug 3, 2022

Uh oh!

BradLarson commented Aug 3, 2022

Uh oh!

BradLarson commented Aug 5, 2022

Uh oh!

BradLarson commented Aug 5, 2022

Uh oh!

BradLarson commented Aug 8, 2022

Uh oh!

BradLarson commented Aug 8, 2022

Uh oh!

BradLarson commented Aug 8, 2022

Uh oh!

BradLarson commented Aug 9, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eeckstein Aug 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eeckstein Aug 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fibrechannelscsi Aug 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

BradLarson commented Aug 16, 2022

Uh oh!

BradLarson commented Aug 17, 2022

Uh oh!

eeckstein left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fibrechannelscsi commented Aug 3, 2022 •

edited

Loading

eeckstein Aug 12, 2022 •

edited

Loading

eeckstein Aug 12, 2022 •

edited

Loading

fibrechannelscsi Aug 12, 2022 •

edited

Loading

fibrechannelscsi Aug 19, 2022 •

edited

Loading