Skip to content

[Optimization] skip copying old values during Set's removeAll() #477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dduan
Copy link
Contributor

@dduan dduan commented Dec 13, 2015

As noted in FIXME(performance), when not uniquely referenced,
Set.nativeRemoveAll() makes a copy of the memory with same content
and only then proceed to remove each element.

This patch makes nativeRemoveAll() skip the copying step.

@dduan
Copy link
Contributor Author

dduan commented Dec 13, 2015

Tests cleared: validation-test/stdlib/Set.swift
Performance gain: ~1.5% (tested with this program)

@dduan dduan changed the title Set: skip copying old values during removeAll [Optimization] skip copying old values during Set's removeAll() Dec 13, 2015
@gribozavr gribozavr self-assigned this Dec 13, 2015
@nadavrot
Copy link
Contributor

@aschwaighofer Can you please take a look?

@dduan Thanks for working on this. Can you please write a small dedicated benchmark?

@dduan
Copy link
Contributor Author

dduan commented Dec 13, 2015

@nadavrot Here's result from running the program from the 1st comment on my Mac, each version was ran 4 times.

unoptimized -R
0.811820268630981
0.795395374298096
0.773354709148407
0.800016462802887

optimized -R
0.787416100502014
0.800009787082672
0.828950881958008
0.805211424827576

averaged gain in speed: 1.3%

@dduan dduan force-pushed the set_faster_nativeRemoveAll branch from 6c350e1 to 4190cbc Compare December 13, 2015 19:40
@nadavrot
Copy link
Contributor

@dduan The gains look suspiciously low. I would expect something higher than 1.3%. Can you step with a debugger (or add printfs, or look at sil) and check that the code does what you think it does?

@dduan
Copy link
Contributor Author

dduan commented Dec 14, 2015

@nadavrot I made a mistake in the benchmark, it's fixed now. And the gains don't look low any more 😛.

These numbers are from the same MBP (with Netflix playing), the workload is reduced compared to the results I posted above, otherwise the unoptimized version takes forever to finish.

optimized -R
0.0569544434547424

unoptimized -R
23.6760957241058

gain in speed: 415%

@dabrahams
Copy link
Contributor

On Dec 13, 2015, at 11:15 PM, Daniel Duan [email protected] wrote:

@nadavrot https://github.com/nadavrot I made a mistake in the benchmark, it's fixed https://gist.github.com/dduan/279b3a1a83e31cec5ec1 now. And the gains don't look low any more 😛.

These numbers are from the same MBP (with Netflix playing), the workload is reduced compared to the results I posted above, otherwise the unoptimized version takes forever to finish.

optimized -R
0.0569544434547424

unoptimized -R
23.6760957241058

gain in speed: A LOT

That's what I'm talking about!
Thanks for working on this…

-Dave

@nadavrot
Copy link
Contributor

@dduan Excellent. Please commit (if @dabrahams and @gribozavr approve)

@dabrahams
Copy link
Contributor

A few questions before accepting the change:

  1. do we already do the equivalent optimization for set intersection? (x.removeAll()x.intersectInPlace([]))
  2. if so, can we make this just fast by sharing that code?
  3. if not, shouldn’t we be applying the same optimization there (and elsewhere) in set?

As noted in FIXME(performance), when not uniquely referenced,
Set.nativeRemoveAll() makes a copy of the memory with same content
and only then proceed to remove each element.

This patch makes nativeRemoveAll() skip the copying part.
@dduan dduan force-pushed the set_faster_nativeRemoveAll branch from 4190cbc to b4a7993 Compare December 16, 2015 00:26
@dduan
Copy link
Contributor Author

dduan commented Dec 16, 2015

@dabrahams I updated intersectInPlace() to call removeAll() when possible.

@dabrahams
Copy link
Contributor

@dduan Thanks, but that’s not what I had in mind. intersectInPlace is the more general case and should get the optimization for all cases. Suppose intersectInPlace gets an argument that’s a single element. The result will then be a single element or zero elements. It would be wasteful to make a copy of the original set and then remove everything, or everything but one element.

@dduan
Copy link
Contributor Author

dduan commented Dec 16, 2015

@dabrahams I happen to have a patch for intersectInPlace() addressing that, it was created before this patch. If that patch is accepted, then implementing removeAll() as intersectInPlace([]) would be faster already. But the change in this PR is even faster, because no inspection of self's storage is needed at all.

@dabrahams
Copy link
Contributor

IMO no inspection is needed either way. What am I missing? intersectInPlace looks like self = self.intersect(other) in that case, and self.intersect(other) should iterate whichever set is shorter and poke the elements contained in the longer set into the result.

Also, if there’s no measurable difference, it’s better to share code ;-)

@dduan
Copy link
Contributor Author

dduan commented Dec 16, 2015

@dabrahams my other patch re-implemented intersectInPlace() to avoid the copy self.intersect(other) creates. It walks self's underlying hash table memory and delete element not found in other. It's 40+% faster than the current implementation.

@dabrahams
Copy link
Contributor

@dduan I think we’re talking past each-other. self.intersect(other) should only be called in the non-unique case, and it shouldn’t create a copy of anything. It should create a fresh hash table of capacity matching the smaller of self and other, and populate it once, with the elements both sets have in common.

@dduan
Copy link
Contributor Author

dduan commented Dec 16, 2015

I finally got it :)

One thing that confused me is that intersectInPlace() is not optimized. Further, I missed the fact that if the shorter set were to be iterated in intersect(), then removeAll() would get the same optimization because [] would be the shorter set.

I'll rework this patch starting at intersect().

Thank you @dabrahams

@dduan
Copy link
Contributor Author

dduan commented Dec 16, 2015

I took another look of this patch and remembered that it optimizes both Dictionary and Set (courtesy of GYP). In terms of code sharing, following through the intersectInPlace() discussion means having separate code for Dictionary.

There's another nuance: a normal call on .removeAll() will NOT trigger a copy as is. This patch only affects .removeAll(keepCapacity: true) cases.

What do you think @dabrahams?

@dabrahams
Copy link
Contributor

on Wed Dec 16 2015, Daniel Duan <notifications-AT-i.8713187.xyz> wrote:

I took another look of this patch and remembered that it optimizes
both Dictionary and Set (courtesy of GYP). In terms of code sharing,
following through the intersectInPlace() discussion means having
separate code for Dictionary.

I'm sorry it's been so long, but I'm afraid I don't know what this means.

There's another nuance: a normal call on .removeAll() will NOT
trigger a copy as is. This patch only affects
.removeAll(keepCapacity: true) cases.

What do you think @dabrahams?

Ideally .removeAll() would just release the buffer, so not copying
sounds perfect to me. .removeAll(keepCapacity: true) has to mutate
the elements, so if there's another collection out there sharing the
same elements, a copy should be triggered. Otherwise, not.

-Dave

@dduan dduan deleted the set_faster_nativeRemoveAll branch December 23, 2019 21:49
freak4pc pushed a commit to freak4pc/swift that referenced this pull request Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants