Skip to content

Add bifurcate(_:) #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Add bifurcate(_:) #151

wants to merge 3 commits into from

Conversation

mdznr
Copy link
Contributor

@mdznr mdznr commented Jul 13, 2021

Description

Adds a bifurcate(_:) algorithm. This is very similar to filter(_:), but instead of just getting an array of the elements that did match the given predicate, also get a second array for the elements that did not match the given predicate.

This is more performant than calling filter(_:) twice on the same input with mutually-exclusive predicates since:

  1. It only requires a single pass of the elements
  2. If the input has a known number of elements, the cumulative space for both returned arrays is known and can avoid array buffer resizing.
let cast = ["Vivien", "Marlon", "Kim", "Karl"]
let (shortNames, longNames) = cast.bifurcate({ $0.count < 5 })
print(shortNames)
// Prints "["Kim", "Karl"]"
print(longNames)
// Prints "["Vivien", "Marlon"]"

Detailed Design

extension Sequence {
  @inlinable
  public func bifurcate(
    _ belongsInFirstCollection: (Element) throws -> Bool
  ) rethrows -> ([Element], [Element])
}

Naming

It was hard to find a word to describe this behavior without using terms like “split” or “separate”, which might have other interpretations (like only being able to get the prefix and suffix), so I chose “bifurcate”, but definitely open to hearing different names.

For consistency with filter(_:), I avoided calling this function something like bifurcating(_:), bifurcated(_:), or giving the parameter a name at the call-site, like bifurcate(where:).

Documentation Plan

  • Inline documentation for each new function
  • Comments in the implementation
  • Updated README.md
  • Added Guides/Bifurcate.md

Test Plan

  • Adds unit tests for example given in documentation
  • Adds unit tests for various inputs
  • Adds unit tests for empty input

Source Impact

This is purely additive

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

mdznr added 3 commits July 12, 2021 17:00
Like `filter(_:)`, but also returns a collection of the elements for which the predicate returned `false`
@mdznr mdznr marked this pull request as ready for review July 13, 2021 00:02
@mdznr mdznr changed the title Bifurcate Add bifurcate(_:) Jul 13, 2021
@timvermeulen
Copy link
Contributor

Thanks for suggesting this @mdznr! This operation is commonly called partition in other languages. The Swift standard library happens to already have an in-place partition(by:) (with a stable variant residing in this package) and this could be considered the not-in-place version of that. What do you think about calling this method partitioned(by:)?

Your implementation of the Collection overload is nicely able to avoid any array resizing, but in doing so it takes a couple of (potentially) O(n) hits. We should probably run some benchmarks using Swift Collections Benchmark to make sure this added complexity is worthwhile. There are some additional optimizations we could consider:

  • Don't allocate two new arrays for the return values but instead return ArraySlices that both point to the same buffer.
  • Taking that to extremes, don't reverse the elements on the right side at all, and instead return (a wrapper around) a ReversedCollection<ArraySlice>.
  • Another direction entirely could be to not allocate a new buffer with the same size of the input collection at all, but instead store which elements satisfy the predicate in a bitset. Once that is done we'll know exactly how much capacity to reserve for the return values.

There are some trade-offs here regarding performance, implementation simplicity, and having a clean API. A good next step would be to get our hands on some benchmark results that might tell us which directions are worth pursuing.

@mdznr
Copy link
Contributor Author

mdznr commented Jul 14, 2021

Thanks for suggesting this @mdznr! This operation is commonly called partition in other languages. The Swift standard library happens to already have an in-place partition(by:) (with a stable variant residing in this package) and this could be considered the not-in-place version of that. What do you think about calling this method partitioned(by:)?

I like that idea a lot. I don’t know how I missed the existing stable partition function in this package. 🤦‍♂️

Your implementation of the Collection overload is nicely able to avoid any array resizing, but in doing so it takes a couple of (potentially) O(n) hits. We should probably run some benchmarks using Swift Collections Benchmark to make sure this added complexity is worthwhile. There are some additional optimizations we could consider:

  • Don't allocate two new arrays for the return values but instead return ArraySlices that both point to the same buffer.
  • Taking that to extremes, don't reverse the elements on the right side at all, and instead return (a wrapper around) a ReversedCollection<ArraySlice>.

I like that idea. That allows the caller to decide whether the conversation to Array is needed or not.

  • Another direction entirely could be to not allocate a new buffer with the same size of the input collection at all, but instead store which elements satisfy the predicate in a bitset. Once that is done we'll know exactly how much capacity to reserve for the return values.

I see _UnsafeBitset in swift-collections, but it needs to know the size ahead of time. Is there another API I should be looking at for this?

There are some trade-offs here regarding performance, implementation simplicity, and having a clean API. A good next step would be to get our hands on some benchmark results that might tell us which directions are worth pursuing.

Sounds like a good idea. I'll use https://github.com/apple/swift-collections-benchmark to guide this.

@mdznr mdznr closed this Jul 14, 2021
@mdznr mdznr deleted the bifurcate branch July 14, 2021 16:55
@mdznr mdznr restored the bifurcate branch July 15, 2021 17:44
@mdznr
Copy link
Contributor Author

mdznr commented Jul 15, 2021

GitHub won’t let me re-open this PR because I’ve force-pushed changes onto the branch. Following up with #152

Screen Shot 2021-07-15 at 12 58 35 PM

@mdznr mdznr mentioned this pull request Jul 27, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants