-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[benchmark] Variants for Set Sequence Methods #24156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[benchmark] Variants for Set Sequence Methods #24156
Conversation
CC: @lorentey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not sure we need these benchmarks.
-
The non-generic
isSuperset
/isStrictSuperset
are trivial wrappers aroundisSubset
/isStrictSubset
, and I don't expect they will ever change. (Or rather, they will remain mostly in sync withisSubset
when they do.) -
The mutation tests don't seem too useful in their current form, since they include a COW copy on every iteration, and I expect that will often swamp the actual operation.
Some Set operations have generic variants that take a sequence -- e.g., Set.isSuperset<S: Sequence>(of: S)
. These are currently not covered by benchmarks at all, even though we're planning to improve them soon! I'd very much welcome benchmarks that measure those.
@swift-ci benchmark |
This comment has been minimized.
This comment has been minimized.
Thank you for reviewing. Then, I'll implement benchmark of Improving set is amazing. I wish I could have skills like you... |
@lorentey |
@lorentey @palimondo @airspeedswift Could you please review this when you have time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use a shortened Seq
to designate the sequence variants? I.e. Set.isStrictSuperset.Seq.Box25
. (Just for consistency with a planned future rename of other Sequence
benchmarks that will be using Seq
to refer to the type, as illustrated in the examples under section Use groups and variants in the Naming Convention).
I don't see what are the Filter
benchmark variants doing here at all.
It seems to me there are still missing benchmarks for 2 generic methods that take a Sequence
parameter:
union
symmetricDifference
I'm assuming @lorentey wants to add these too...
@palimondo Thank you for reviewing. I fixed code. |
Oh, OK, I understand now that this set of new benchmarks should be tailored specifically for the pending improvements in #21300. Then it makes sense not to include As for the So we’d have What do you think @lorentey? |
@palimondo |
@swift-ci please benchmark |
This comment has been minimized.
This comment has been minimized.
I got warnings. |
Not yet. The memory warnings are safe to ignore in these cases. But we need to discuss the surprises… In principle, it is not a showstopper when some benchmarks are over the recommended runtime threshold of 1000 μs when there is pending improvement which will bring the runtime down. Going over 10 000 μs (10 ms) is much more serious issue, because this crosses the scheduler quantum on mac os — this means that every sample is guaranteed to include measurement error from a context switch which leads to very noisy results with sporadic false changes in our reports. We should aim for these benchmarks to run in reasonable times after #21300 lands, buts some of these results are disturbingly high. So high that performance improvements @lorentey mentions in #21300 will not get the runtimes of these benchmark to the ranges we need for stable measurements. I don't really see the need to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a few minor points about Int100
comparisons and a nitpick about variable naming in filter case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for those abnormally high runtimes are incorrect multipliers. In this benchmark family, we are using 5000
only for Int0
, Box0
and Empty
to inflate the workload for the case where the operation is almost nothing — to make it observable. All other cases use 50
.
This works for set-to-set comparisons in both directions, because their size is known. The size of a Sequence
is unknown before it is fully consumed. So we need to handle the set-to-sequence Int0
and Box0
variants depending on their runtime. 50 is better multiplier constant for those cases.
Edit: I forgot to comment on these, which should also use 50:
Set.isSubset.Seq.[Int0, Box0]
Set.isStrictSubset.Seq.[Int0, Box0]
Set.isStrictSuperset.Seq.[Int0, Box0]
These two runtime outliers are interesting:
because they indicate we might benefit from special casing the empty set in the |
@Gumichocopengin8 Tip: You can apply the patch from PR 21300 locally to verify the final runtimes will fall, if possible into the 20–1000 μs range (adjusting for your machine's relative performance vis-a-vis the Mac Pros that are running the CI here). |
@palimondo
Basically, I followed your instruction above. However, I didn't apply it sometimes in order to adjust benchmark for PR 21300. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gumichocopengin8 We've overlooked one more workload multiplier. Could you please fix it and force-push again?
@eeckstein, @lorentey How soon can we be landing the #21300? This additional performance coverage is tuned (in terms of workload scaling) for those improvements. Otherwise we'll have 4 benchmarks with really suboptimal runtimes: A recent addition (21 days ago in #23690) Similarly, these rely on the newly optimized handling and currently run at over 30k μs:
Are you OK with landing it this way now? |
894e92e
to
b1a96e8
Compare
@palimondo |
@Gumichocopengin8 Hmm… I see there is an option to do that in the dropdown next to Just to clarify, it is not necessary to squash commits into single one for a PR to be merged here. It is in fact beneficial to have many small and semantically single-themed commits if you are doing non-trivial multi-step changes such as refactoring, to ease the reviewer's job. But it is good practice to clean up the fits and starts (all the reverted dead ends etc.) and not pollute the history with unnecessary distractions. So thank you again for your patience with me, you did excellent work here! I wanted to wait for a green-light from @lorentey and @eeckstein (I'm assuming many Apple devs are quite busy preparing for WWDC these days), but I guess I could merge this myself and ask for forgiveness if they'd disagree with my judgement. But I have noticed something we should change before accepting this. I'll do a re-review later today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should stack the deck a bit more in favor of #21300 to showcase how its optimizations don't have to consume the entire Sequence
because they provide early exits as soon as all the required conditions are met. I did one more review pass, comparing the original Set.Set
benchmarks with the the newly added Set.Sequence
. You have faithfully translated the arguments for isSubset
and isStrictSubset
👍. But to better showcase the optimizations, we should be comparing against the next-one-up arrays (they have same size and same intersection, but their set difference is at the end of the sequence, instead of the beginning):
Int25
- usesetB
witharrayBC
(instead ofarrayAB
); analogously forBox25
Int50
- usesetY
witharrayYZ
(instead ofarrayXY
).
This is similar argument swapping like @lorentey requested in his review before.
(We will not be changing the arguments on Set.Set
variants to match these, because it would cause unnecessary performance changes in our long-term tracking.)
Generally we should always have the same size of sequence (400) on the right side for all of these Seq
variants. So for the isSuperset.Seq
and isStrictSuperset.Seq
we also need to swap the arguments and expected results around a bit, again, to allow for early exit from the optimizations.
@Gumichocopengin8 Will you find a time to do the requested modifications today? If not, I can make them for you, if you want... (I have the commit ready, just let me know.) |
@palimondo I fixed them. |
I don’t see the fixes on |
@palimondo oh sorry. Let me fix them tonight. I don't have time now... |
No problem. Here you go. |
@swift-ci please benchmark |
This comment has been minimized.
This comment has been minimized.
The workload multipliers need to be increased in order to accommodate the pending improvements from `lorentey:set-on-fire`, so that these benchmark don't drop to under 20 μs runtimes when those optimizations kick in. These multiplier will differ from those used on Set.subtracting.Set by a factor of 5.
@swift-ci please benchmark |
@swift-ci please smoke test |
@swift-ci please smoke test os x platform |
@palimondo Thanks! |
Regarding the concern about the 4 long running benchmarks from above, I asked @lorentey via email if there was something else, beside the missing benchmarks which this PR provides, that is holding the merge of those optimizations. His response:
|
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
I will merge this and run the benchmarks on #21300 to validate we have all multipliers set up correctly to get 20—1000 μs runtimes. Then I'll temporarily disable the following benchmarks in #24725 until
@Gumichocopengin8 Thank you for your great work! |
Added benchmarks for Set.
To support this: #24156 (review)