[stdlib] Optimize high-level Set operations #40012
Use a temporary bitset to speed up the Sequence variant by roughly a factor of 3.
Call into the specialized overload if the argument happens to be a `Set`.
- Use a temporary bitset to speed up the `Sequence` variant by roughly a factor of 4.
- Fix a logic error causing the `a == b` case for the set variant to be O(n) instead of O(1).
Have the generic variant call out to the specialized overload if the argument happens to be a `Set`.
Use a temporary bitset to avoid hashing elements more than once, and to prevent rehashings during the creation of the result set. This leads to a speedup of about 0-4x, depending on the number of elements removed.
This works the same way as `Set.subtracting<S>(_:)`, and has similar performance benefits.
Use a temporary bitset to speed up the `Sequence` variant by a factor of ~4-6, and the set/set variant by a factor of ~1-4, depending on the ratio of overlapping elements.
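To make the idea in these commit messages concrete, here is a minimal sketch of a bitset-based subset check written against the public `Set` API. It is not the stdlib implementation: the name `isSubsetSketch` is hypothetical, and the `slotOf` dictionary is an assumption that stands in for the bucket index the real hash table would provide as part of a single lookup. The sketch only shows the shape of the technique: the fast path for `Set` arguments, one bit per member of `self`, and a hand-maintained count that allows an early exit.

```swift
extension Set {
    /// Hypothetical name; models the shape of a bitset-based `isSubset(of:)`.
    func isSubsetSketch<S: Sequence>(of other: S) -> Bool
    where S.Element == Element {
        // Fast path: defer to the specialized set/set overload.
        if let otherSet = other as? Set<Element> {
            return isSubset(of: otherSet)
        }
        if isEmpty { return true }

        // Assumption: this dictionary stands in for the bucket index that
        // a real hash table yields as a by-product of one lookup.
        var slotOf: [Element: Int] = [:]
        slotOf.reserveCapacity(count)
        for (slot, member) in self.enumerated() { slotOf[member] = slot }

        // One bit per member of `self`, packed into 64-bit words.
        var seen = [UInt64](repeating: 0, count: (count + 63) / 64)
        var seenCount = 0  // number of set bits, maintained by hand

        for element in other {
            guard let slot = slotOf[element] else { continue }
            let mask = UInt64(1) << UInt64(slot % 64)
            if seen[slot / 64] & mask == 0 {
                seen[slot / 64] |= mask
                seenCount += 1
                // Every member of `self` has been found; stop early.
                if seenCount == count { return true }
            }
        }
        return false
    }
}
```

Duplicates in the sequence do not inflate the count, because a bit that is already set is never counted twice.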
@swift-ci test
@swift-ci benchmark
Performance (x86_64): -O
Code size: -O
Performance (x86_64): -Osize
Code size: -Osize
Performance (x86_64): -Onone
Code size: -swiftlibs
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.
Build failed
Huh, a mere 5-10% overall code size increase on SetTests.o seems like a welcome surprise! I have some assertion failures I'll need to investigate. (That isn't cause for alarm, this PR is a draft for a reason. 🙈)
@swift-ci benchmark
@swift-ci test
Build failed
@swift-ci test
This revives and replaces #21300, dramatically speeding up some high-level `Set` operations while also restoring the (undocumented) behavior that `Set.intersection` always returns items from `self`, not its argument.
As discussed on the forums, Swift 5.5 changed `Set.intersection`'s behavior as an optimization (#36678). This method does not guarantee which input set it uses to construct its results, so arguably, code that relies on the original behavior is in the wrong. However, there is no reason to break such code: we can implement intersections even faster than #36678, while also preserving the original behavior. This PR does this and more. (rdar://84831592)
(I believe it would be worth documenting this behavior for both `Set.intersection` and `Set.union`. However, that is an API change that likely requires a Swift evolution proposal.)
This new incarnation of #21300 does not introduce a new type, which makes it possible to cleanly deploy these improvements to any shipping stdlib version. In exchange, the implementation turned slightly more complicated in places: the existing `_UnsafeBitset` type does not have a cached `count`, so we need to explicitly maintain it as a standalone variable. Temporary bitmaps are now allocated using the new temporary allocation facility, so in most cases they will be allocated on the stack.
The primary drawback of all these optimizations is a code size increase. I believe the performance increase will be worth this extra cost, but let's keep an eye on benchmark reports.
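As an illustration of the points above, the following sketch intersects two sets while always keeping the members of the first one, using a temporary bitmap and a separately maintained count. It is a model under stated assumptions, not the PR's code: `Tagged` and `intersectionKeepingSelf` are hypothetical names, the `Array(lhs)` copy stands in for the set's internal bucket storage, and `withUnsafeTemporaryAllocation(of:capacity:_:)` (Swift 5.6 or later) is assumed to be the "temporary allocation facility" mentioned in the description.

```swift
/// Equality and hashing look only at `id`, so `tag` reveals which
/// input set a result element was taken from.
struct Tagged: Hashable {
    let id: Int
    let tag: String
    static func == (a: Tagged, b: Tagged) -> Bool { a.id == b.id }
    func hash(into hasher: inout Hasher) { hasher.combine(id) }
}

/// Hypothetical helper: intersects `lhs` with `rhs`, always keeping
/// the members of `lhs` (the role `self` plays in `Set.intersection`).
func intersectionKeepingSelf<E: Hashable>(_ lhs: Set<E>, _ rhs: Set<E>) -> Set<E> {
    let members = Array(lhs)  // assumption: stand-in for bucket storage
    let wordCount = max(1, (members.count + 63) / 64)

    // The bitmap lives in a temporary allocation, which the runtime
    // may place on the stack when it is small enough.
    return withUnsafeTemporaryAllocation(
        of: UInt64.self, capacity: wordCount
    ) { words -> Set<E> in
        words.initialize(repeating: 0)

        // The bitmap has no cached count, so track it in a variable.
        var count = 0
        for (i, member) in members.enumerated() where rhs.contains(member) {
            words[i / 64] |= UInt64(1) << UInt64(i % 64)
            count += 1
        }

        // Knowing the final size up front avoids resizing the result.
        var result = Set<E>(minimumCapacity: count)
        for i in members.indices {
            let mask = UInt64(1) << UInt64(i % 64)
            if words[i / 64] & mask != 0 { result.insert(members[i]) }
        }
        return result
    }
}

let mine: Set = [Tagged(id: 1, tag: "self"), Tagged(id: 2, tag: "self")]
let theirs: Set = [Tagged(id: 2, tag: "other"), Tagged(id: 3, tag: "other")]
let common = intersectionKeepingSelf(mine, theirs)
```

Because the result is assembled only from `members`, the single element of `common` carries the tag `"self"`, regardless of which set is larger.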
The original PR promised the following improvements: (these are very rough numbers, and they depend on the contents/size of the set)
- `Set.isSubset<S>(of:)`
- `Set.isStrictSubset<S>(of:)`
- `Set.isSuperset<S>(of:)`
- `Set.isStrictSuperset<S>(of:)`
- `Set.isDisjoint<S>(with:)`
- `Set.subtracting(_:)`
- `Set.filter(_:)`
- `Set.intersection<S>(_:)`
- `Set.intersection(_:)`