Skip to content

[stdlib] Fix handling of duplicate items in generic Set.intersection #59417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 14, 2022

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Jun 13, 2022

If the sequence passed to Set.intersection has exactly as many duplicate items as items missing from self, then the new implementation in 5.6 mistakenly returns self unchanged. (It counts duplicate hits as distinct items in the result.)

rdar://94803458

If the sequence passed to `Set.intersection` has exactly as many duplicate items as items missing from `self`, then the new implementation in 5.6 will mistakenly return `self` unchanged. (It counts duplicate hits as distinct items in the result.)

rdar://94803458
@lorentey
Copy link
Member Author

@swift-ci test

@lorentey
Copy link
Member Author

@swift-ci benchmark

// Prefer to iterate over the smaller set. However, we must be careful to
// only include elements from `self`, not `other`.
guard self.count <= other.count else {
return genericIntersection(other)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note: we could continue forwarding to genericIntersection in the new code, but extra branches in the new variant would've unnecessarily slowed down this code path.)

@lorentey
Copy link
Member Author

The fix will (mostly) deploy back to 5.6, as we expect these paths to be (almost) always specialized.

Copy link
Contributor

@rctcwyvrn rctcwyvrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lorentey
Copy link
Member Author

Very reasonable performance results. Also, a code size improvement??? 😵‍💫

------- Performance (x86_64): -O -------

REGRESSION                                         OLD    NEW    DELTA    RATIO    
FlattenListFlatMap                                 3160   4541   +43.7%   **0.70x (?)**
Set.intersection.Seq.Int100                        38     42     +10.5%   **0.90x (?)**
StringToDataSmall                                  650    700    +7.7%    **0.93x (?)**

IMPROVEMENT                                        OLD    NEW    DELTA    RATIO    
NSStringConversion.MutableCopy.Rebridge.LongUTF8   372    341    -8.3%    **1.09x (?)**
String.data.Medium                                 67     62     -7.5%    **1.08x (?)**
TypeName                                           639    596    -6.7%    **1.07x (?)**

------- Code size: -O -------

IMPROVEMENT   OLD      NEW      DELTA   RATIO  
SetTests.o   140457   138865   -1.1%   **1.01x**

@lorentey
Copy link
Member Author

The -Osize results also support the hypothesis that this will have minimal performance impact in practice. None of our Set benchmarks are testing cases with duplicate items, so the additional branch per iteration is highly predictable.

(Not that we wouldn't land a correctness fix if it regressed benchmarks, but it's nice that this won't really have that effect.)

------- Performance (x86_64): -Osize -------

IMPROVEMENT                            OLD    NEW    DELTA    RATIO    
FlattenListLoop                        1380   1030   -25.4%   **1.34x (?)**
FlattenListFlatMap                     3131   2447   -21.8%   **1.28x (?)**
StringInterpolationManySmallSegments   8500   7100   -16.5%   **1.20x (?)**
Set.filter.Int100.28k                  49     42     -14.3%   **1.17x (?)**
Set.filter.Int100.24k                  39     34     -12.8%   **1.15x (?)**
Set.filter.Int100.20k                  33     29     -12.1%   **1.14x (?)**
Set.filter.Int100.16k                  27     24     -11.1%   **1.12x (?)**
Set.subtracting.Empty.Box              19     17     -10.5%   **1.12x (?)**
ParseFloat.Float.Exp                   10     9      -10.0%   **1.11x (?)**
Set.isDisjoint.Int.Empty               49     45     -8.2%    **1.09x (?)**

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants