Evenly divide a collection into chunks #96

Merged
merged 9 commits into from
Jul 26, 2023
75 changes: 51 additions & 24 deletions Guides/Chunked.md
@@ -3,24 +3,24 @@
[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Chunked.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/ChunkedTests.swift)]

Break a collection into subsequences where consecutive elements pass a binary
predicate, or where all elements in each chunk project to the same value.
Break a collection into nonoverlapping subsequences:

Also, includes a `chunks(ofCount:)` that breaks a collection into subsequences
of a given `count`.
* `chunked(by:)` forms chunks of consecutive elements that pass a binary predicate,
* `chunked(on:)` forms chunks of consecutive elements that project to equal values,
* `chunks(ofCount:)` forms chunks of a given size, and
* `evenlyChunked(into:)` forms a given number of equally-sized chunks.

There are two variations of the `chunked` method: `chunked(by:)` and
`chunked(on:)`. `chunked(by:)` uses a binary predicate to test consecutive
elements, separating chunks where the predicate returns `false`. For example,
you can chunk a collection into ascending sequences using this method:
`chunked(by:)` uses a binary predicate to test consecutive elements, separating
chunks where the predicate returns `false`. For example, you can chunk a
collection into ascending sequences using this method:

```swift
let numbers = [10, 20, 30, 10, 40, 40, 10, 20]
let chunks = numbers.chunked(by: { $0 <= $1 })
// [[10, 20, 30], [10, 40, 40], [10, 20]]
```

The `chunk(on:)` method, by contrast, takes a projection of each element and
The `chunked(on:)` method, by contrast, takes a projection of each element and
separates chunks where the projection of two consecutive elements is not equal.

```swift
@@ -29,11 +29,10 @@ let chunks = names.chunked(on: \.first!)
// [["David"], ["Kyle", "Karoy"], ["Nate"]]
```

The `chunks(ofCount:)` takes a `count` parameter (required to be > 0) and separates
the collection into `n` chunks of this given count. If the `count` parameter is
evenly divided by the count of the base `Collection` all the chunks will have
the count equals to the parameter. Otherwise, the last chunk will contain the
remaining elements.
The `chunks(ofCount:)` method takes a `count` parameter (required to be > 0) and
separates the collection into chunks of this given count. If the length of the
collection is a multiple of the `count` parameter, all chunks will have the
specified size. Otherwise, the last chunk will contain the remaining elements.

```swift
let names = ["David", "Kyle", "Karoy", "Nate"]
@@ -44,7 +43,22 @@ let remaining = names.chunks(ofCount: 3)
// equivalent to [["David", "Kyle", "Karoy"], ["Nate"]]
```

The `chunks(ofCount:)` is the method of the [existing SE proposal][proposal].
The `chunks(ofCount:)` method was previously [proposed][proposal] for inclusion
in the standard library.

The `evenlyChunked(into:)` method takes a `count` parameter and divides the
collection into `count` number of equally-sized chunks. If the length of the
collection is not a multiple of the `count` parameter, the chunks at the start
will be longer than the chunks at the end.

```swift
let evenChunks = (0..<15).evenlyChunked(into: 3)
// equivalent to [0..<5, 5..<10, 10..<15]

let nearlyEvenChunks = (0..<15).evenlyChunked(into: 4)
// equivalent to [0..<4, 4..<8, 8..<12, 12..<15]
```
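
The size rule described above can be sketched with plain integer arithmetic. This is a minimal illustration using only the standard library; `evenChunkSizes(count:into:)` is a hypothetical helper and not part of the package.

```swift
/// Hypothetical helper (not part of the package): the sizes of the chunks
/// produced when `count` elements are divided into `parts` chunks.
func evenChunkSizes(count: Int, into parts: Int) -> [Int] {
  precondition(count >= 0 && parts >= 0, "Counts must be non-negative")
  precondition(parts > 0 || count == 0, "Can't divide a non-empty collection into 0 chunks")

  // The first `count % parts` chunks each receive one extra element.
  // When `parts` is 0 the range is empty, so no division takes place.
  return (0..<parts).map { offset in
    count / parts + (offset < count % parts ? 1 : 0)
  }
}

let evenSizes = evenChunkSizes(count: 15, into: 3)        // [5, 5, 5]
let nearlyEvenSizes = evenChunkSizes(count: 15, into: 4)  // [4, 4, 4, 3]
```

The sizes always sum to `count`, which is why no element is dropped or repeated.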

Unlike the `split` family of methods, the entire collection is included in the
chunked result — joining the resulting chunks recreates the original collection.

@@ -53,14 +67,13 @@ c.elementsEqual(c.chunked(...).joined())
// true
```
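
To make the contrast with `split` concrete, here is a small sketch that uses only the standard library. The `stride`-based expression is an assumed stand-in for `chunks(ofCount: 3)`, used so the example runs without the package.

```swift
let numbers = [1, 2, 3, 0, 4, 5, 0, 6]

// Stand-in for `numbers.chunks(ofCount: 3)`, built from the standard library.
let chunks = stride(from: 0, to: numbers.count, by: 3).map { start in
  numbers[start..<min(start + 3, numbers.count)]
}
// chunks is equivalent to [[1, 2, 3], [0, 4, 5], [0, 6]]

// Joining the chunks gives back every element in order.
let rejoinedChunks = Array(chunks.joined())
// [1, 2, 3, 0, 4, 5, 0, 6]

// `split` drops the separator elements, so joining its pieces loses them.
let pieces = numbers.split(separator: 0)
let rejoinedPieces = Array(pieces.joined())
// [1, 2, 3, 4, 5, 6]
```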

Check the [proposal][proposal] detailed design section for more info.

[proposal]: https://github.com/apple/swift-evolution/pull/935

## Detailed Design

The two methods are added as extension to `Collection`, with two matching
versions that return a lazy wrapper added to `LazyCollectionProtocol`.
The four methods are added to `Collection`, with matching versions of
`chunked(by:)` and `chunked(on:)` that return a lazy wrapper added to
`LazyCollectionProtocol`.

```swift
extension Collection {
@@ -71,9 +84,14 @@ extension Collection {
  public func chunked<Subject: Equatable>(
    on projection: (Element) -> Subject
  ) -> [SubSequence]

  public func chunks(ofCount count: Int) -> ChunkedByCount<Self>

  public func evenlyChunked(into count: Int) -> EvenChunks<Self>
}

extension LazyCollectionProtocol {
  public func chunked(
    by belongInSameGroup: @escaping (Element, Element) -> Bool
  ) -> Chunked<Elements>
@@ -84,8 +102,17 @@ extension Collection {
}
```

The `Chunked` type is bidirectional when the wrapped collection is
bidirectional.
The `Chunked` type conforms to `LazyCollectionProtocol` and also conforms to
`BidirectionalCollection` when the base collection conforms.

The `ChunkedByCount` type conforms to `Collection` and also conforms to both
`BidirectionalCollection` and `RandomAccessCollection` when the base collection
is random-access, as well as to `LazyCollectionProtocol` when the base
collection conforms.

The `EvenChunks` type conforms to `Collection` and also
conforms to `BidirectionalCollection`, `RandomAccessCollection`, and
`LazyCollectionProtocol` when the base collection conforms.

### Complexity

@@ -105,5 +132,5 @@ The operation performed by these methods is similar to other ways of breaking a
**Ruby:** Ruby’s `Enumerable` class defines `chunk_while` and `chunk`, which map
to the proposed `chunked(by:)` and `chunked(on:)` methods.

**Rust:** Rust defines a variety of size-based `chunks` methods, but doesn’t
include any with the functionality described here.
**Rust:** Rust defines a variety of size-based `chunks` methods, of which the
standard version corresponds to the `chunks(ofCount:)` method defined here.
237 changes: 237 additions & 0 deletions Sources/Algorithms/Chunked.swift
@@ -131,9 +131,214 @@ extension Chunked: BidirectionalCollection
}
}


@available(*, deprecated, renamed: "Chunked")
public typealias LazyChunked<Base: Collection, Subject> = Chunked<Base, Subject>

/// A collection wrapper that evenly breaks a collection into a given number of
/// chunks.
public struct EvenChunks<Base: Collection> {
  /// The base collection.
  @usableFromInline
  internal let base: Base

  /// The number of equal chunks the base collection is divided into.
  @usableFromInline
  internal let numberOfChunks: Int

  /// The count of the base collection.
  @usableFromInline
  internal let baseCount: Int

  /// The upper bound of the first chunk.
  @usableFromInline
  internal var firstUpperBound: Base.Index

  @inlinable
  internal init(base: Base, numberOfChunks: Int) {
    self.base = base
    self.numberOfChunks = numberOfChunks
    self.baseCount = base.count
    self.firstUpperBound = base.startIndex

    if numberOfChunks > 0 {
      firstUpperBound = endOfChunk(startingAt: base.startIndex, offset: 0)
    }
  }
}

extension EvenChunks {
  /// Returns the number of chunks with size `smallChunkSize + 1` at the start
  /// of this collection.
  @inlinable
  internal var numberOfLargeChunks: Int {
    baseCount % numberOfChunks
  }

  /// Returns the size of a chunk at a given offset.
  @inlinable
  internal func sizeOfChunk(offset: Int) -> Int {
    let isLargeChunk = offset < numberOfLargeChunks
    return baseCount / numberOfChunks + (isLargeChunk ? 1 : 0)
  }

  /// Returns the index in the base collection of the end of the chunk starting
  /// at the given index.
  @inlinable
  internal func endOfChunk(startingAt start: Base.Index, offset: Int) -> Base.Index {
    base.index(start, offsetBy: sizeOfChunk(offset: offset))
  }

  /// Returns the index in the base collection of the start of the chunk ending
  /// at the given index.
  @inlinable
  internal func startOfChunk(endingAt end: Base.Index, offset: Int) -> Base.Index {
    base.index(end, offsetBy: -sizeOfChunk(offset: offset))
  }

  /// Returns the index that corresponds to the chunk that starts at the given
  /// base index.
  @inlinable
  internal func indexOfChunk(startingAt start: Base.Index, offset: Int) -> Index {
    guard offset != numberOfChunks else { return endIndex }
    let end = endOfChunk(startingAt: start, offset: offset)
    return Index(start..<end, offset: offset)
  }

  /// Returns the index that corresponds to the chunk that ends at the given
  /// base index.
  @inlinable
  internal func indexOfChunk(endingAt end: Base.Index, offset: Int) -> Index {
    let start = startOfChunk(endingAt: end, offset: offset)
    return Index(start..<end, offset: offset)
  }
}

Member

These helpers are great! 👏🏻


extension EvenChunks: Collection {
  public struct Index: Comparable {
    /// The range corresponding to the chunk at this position.
    @usableFromInline
    internal var baseRange: Range<Base.Index>

    /// The offset corresponding to the chunk at this position. The first chunk
    /// has offset `0` and all other chunks have an offset `1` greater than the
    /// previous.
    @usableFromInline
    internal var offset: Int

    @inlinable
    internal init(_ baseRange: Range<Base.Index>, offset: Int) {
      self.baseRange = baseRange
      self.offset = offset
    }

    @inlinable
    public static func == (lhs: Self, rhs: Self) -> Bool {
      lhs.offset == rhs.offset
    }

    @inlinable
    public static func < (lhs: Self, rhs: Self) -> Bool {
      lhs.offset < rhs.offset
    }
  }

  public typealias Element = Base.SubSequence

  @inlinable
  public var startIndex: Index {
    Index(base.startIndex..<firstUpperBound, offset: 0)
  }

  @inlinable
  public var endIndex: Index {
    Index(base.endIndex..<base.endIndex, offset: numberOfChunks)
  }

  @inlinable
  public func index(after i: Index) -> Index {
    precondition(i != endIndex, "Can't advance past endIndex")
    let start = i.baseRange.upperBound
    return indexOfChunk(startingAt: start, offset: i.offset + 1)
  }

  @inlinable
  public subscript(position: Index) -> Element {
    precondition(position != endIndex)
    return base[position.baseRange]
  }

  @inlinable
  public func index(_ i: Index, offsetBy distance: Int) -> Index {
    /// Returns the base distance between two `EvenChunks` indices from the end
    /// of one to the start of the other, when given their offsets.
    func baseDistance(from offsetA: Int, to offsetB: Int) -> Int {
      let smallChunkSize = baseCount / numberOfChunks

Contributor

Hey @timvermeulen :)
Should we guard against a value of 0 in `numberOfChunks`? For example, the following code

    let ec = "".evenlyChunked(into: 0)
    let d = ec.index(ec.startIndex, offsetBy: 1)

could lead to a division by zero.

Given that this should fail anyway because the index can't be advanced, should we add a precondition instead of failing with a division by zero? WDYT?

Contributor Author

As you pointed out, it's fine (or even desired) for the program to crash in that scenario since it's a programmer error to advance past the end. Division by zero doesn't have the most descriptive error message, but other than that it's a totally reasonable way to crash.

Advancing past the end (or before the start) isn't required to crash in any particular way. In fact, it isn't required to crash at all: `Array` is a common example of a collection that is totally fine with you moving an index outside the bounds of the collection, as long as you don't use it to try to index the array. In the Algorithms package we try to be a lot more vigilant about making sure the program crashes when an invalid index is used for subscripting than about whatever happens when you try to move out of bounds (usually deferring to however the base collection handles it).

Note that in this particular case the division by zero only happens when the number of chunks is 0. When moving past the end of, say, `[1, 2, 3].evenlyChunked(into: 2)` using `index(_:offsetBy:)`, no crash happens at all. `index(_:offsetBy:)` is probably where the change should be made if we wanted to be more strict about this behavior.

Contributor

Ah ok! Thanks :)

> Division by zero doesn't have the most descriptive error message, but other than that it's a totally reasonable way to crash.

That is what I was thinking with the precondition suggestion: if we are going to crash anyway, a precondition seems like a smoother way to do it from the user's perspective, since we can give a better error message. But I agree, it is totally fine.
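
The `Array` behavior mentioned in the thread above can be seen with the standard library alone. The sketch below also shows what the suggested precondition might look like; `checkedSmallChunkSize(baseCount:numberOfChunks:)` and its message are hypothetical and are not what the PR implements.

```swift
let array = [1, 2, 3]

// Array doesn't trap when an index is merely moved out of bounds;
// it would only trap if that index were used to subscript the array.
let outOfBounds = array.index(array.startIndex, offsetBy: 10)
// 10

// The limited variant reports the overshoot by returning nil instead.
let limited = array.index(array.startIndex, offsetBy: 10, limitedBy: array.endIndex)
// nil

/// Hypothetical stricter check (an assumption for illustration): fail with a
/// descriptive message rather than a bare division-by-zero trap.
func checkedSmallChunkSize(baseCount: Int, numberOfChunks: Int) -> Int {
  precondition(numberOfChunks > 0, "Can't offset an index of a collection with 0 chunks")
  return baseCount / numberOfChunks
}
```

Either way the program stops on the programmer error; the only difference is the message the user sees.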

      let numberOfChunks = (offsetB - offsetA) - 1

      let largeChunksEnd = Swift.min(self.numberOfLargeChunks, offsetB)
      let largeChunksStart = Swift.min(self.numberOfLargeChunks, offsetA + 1)
      let numberOfLargeChunks = largeChunksEnd - largeChunksStart

      return smallChunkSize * numberOfChunks + numberOfLargeChunks
    }

    if distance == 0 {
      return i
    } else if distance > 0 {
      let offset = i.offset + distance
      let baseOffset = baseDistance(from: i.offset, to: offset)
      let start = base.index(i.baseRange.upperBound, offsetBy: baseOffset)
      return indexOfChunk(startingAt: start, offset: offset)
    } else {
      let offset = i.offset + distance
      let baseOffset = baseDistance(from: offset, to: i.offset)
      let end = base.index(i.baseRange.lowerBound, offsetBy: -baseOffset)
      return indexOfChunk(endingAt: end, offset: offset)
    }
  }

  @inlinable
  public func index(_ i: Index, offsetBy distance: Int, limitedBy limit: Index) -> Index? {
    if distance >= 0 {
      if (0..<distance).contains(self.distance(from: i, to: limit)) {
        return nil
      }
    } else {
      if (0..<(-distance)).contains(self.distance(from: limit, to: i)) {
        return nil
      }
    }
    return index(i, offsetBy: distance)
  }

  @inlinable
  public func distance(from start: Index, to end: Index) -> Int {
    end.offset - start.offset
  }
}
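
The closed form used by the nested `baseDistance(from:to:)` helper can be checked against a brute-force sum of chunk sizes. The sketch below is standalone: the free functions and their `baseCount` and `numberOfChunks` parameters are assumed stand-ins for the stored properties of `EvenChunks`, not code from the package.

```swift
/// Size of the chunk at `offset`; the first `baseCount % numberOfChunks`
/// chunks hold one extra element.
func sizeOfChunk(offset: Int, baseCount: Int, numberOfChunks: Int) -> Int {
  baseCount / numberOfChunks + (offset < baseCount % numberOfChunks ? 1 : 0)
}

/// Closed form mirroring the nested helper: the number of base elements in
/// the chunks strictly between offsets `offsetA` and `offsetB`.
func baseDistance(from offsetA: Int, to offsetB: Int, baseCount: Int, numberOfChunks: Int) -> Int {
  let smallChunkSize = baseCount / numberOfChunks
  let largeChunkCount = baseCount % numberOfChunks
  let chunksBetween = (offsetB - offsetA) - 1

  let largeChunksEnd = min(largeChunkCount, offsetB)
  let largeChunksStart = min(largeChunkCount, offsetA + 1)

  return smallChunkSize * chunksBetween + (largeChunksEnd - largeChunksStart)
}

/// Brute-force reference: add up the sizes of the chunks in between.
/// Requires `offsetA < offsetB`.
func summedDistance(from offsetA: Int, to offsetB: Int, baseCount: Int, numberOfChunks: Int) -> Int {
  ((offsetA + 1)..<offsetB).reduce(0) { total, offset in
    total + sizeOfChunk(offset: offset, baseCount: baseCount, numberOfChunks: numberOfChunks)
  }
}

// 15 elements in 4 chunks have sizes 4, 4, 4, 3.
let skipped = baseDistance(from: 0, to: 3, baseCount: 15, numberOfChunks: 4)
// 8: chunks 1 and 2 lie between offsets 0 and 3
```

The closed form lets `index(_:offsetBy:)` jump in a single base offset instead of walking chunk by chunk, which matters for random-access bases.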

extension EvenChunks.Index: Hashable where Base.Index: Hashable {}

extension EvenChunks: BidirectionalCollection
  where Base: BidirectionalCollection
{
  @inlinable
  public func index(before i: Index) -> Index {
    precondition(i != startIndex, "Can't advance before startIndex")
    return indexOfChunk(endingAt: i.baseRange.lowerBound, offset: i.offset - 1)
  }
}

extension EvenChunks: RandomAccessCollection
  where Base: RandomAccessCollection {}

extension EvenChunks: LazySequenceProtocol
  where Base: LazySequenceProtocol {}

extension EvenChunks: LazyCollectionProtocol
  where Base: LazyCollectionProtocol {}

//===----------------------------------------------------------------------===//
// lazy.chunked(by:)
//===----------------------------------------------------------------------===//
@@ -519,3 +724,35 @@ extension ChunkedByCount: LazySequenceProtocol
  where Base: LazySequenceProtocol {}
extension ChunkedByCount: LazyCollectionProtocol
  where Base: LazyCollectionProtocol {}

//===----------------------------------------------------------------------===//
// evenlyChunked(into:)
//===----------------------------------------------------------------------===//

extension Collection {
  /// Returns a collection of `count` evenly divided subsequences of this
  /// collection.
  ///
  /// This method divides the collection into a given number of equally sized
  /// chunks. If the length of the collection is not divisible by `count`, the
  /// chunks at the start will be longer than the chunks at the end, like in
  /// this example:
  ///
  ///     for chunk in "Hello, world!".evenlyChunked(into: 5) {
  ///       print(chunk)
  ///     }
  ///     // "Hel"
  ///     // "lo,"
  ///     // " wo"
  ///     // "rl"
  ///     // "d!"
  ///
  /// - Complexity: O(1) if the collection conforms to `RandomAccessCollection`,
  ///   otherwise O(*n*), where *n* is the length of the collection.
  @inlinable
  public func evenlyChunked(into count: Int) -> EvenChunks<Self> {
    precondition(count >= 0, "Can't divide into a negative number of chunks")
    precondition(count > 0 || isEmpty, "Can't divide a non-empty collection into 0 chunks")
    return EvenChunks(base: self, numberOfChunks: count)
  }
}
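
For readers without the package at hand, the documented behavior can be approximated eagerly with the standard library. This is a rough sketch under stated assumptions: `eagerEvenChunks(into:)` is a hypothetical name, and it returns an array of subsequences rather than the lazy `EvenChunks` wrapper that the PR adds.

```swift
extension Collection {
  /// Hypothetical eager counterpart of `evenlyChunked(into:)`, for
  /// illustration only. It walks the collection once, slicing as it goes.
  func eagerEvenChunks(into count: Int) -> [SubSequence] {
    precondition(count >= 0, "Can't divide into a negative number of chunks")
    precondition(count > 0 || isEmpty, "Can't divide a non-empty collection into 0 chunks")
    guard count > 0 else { return [] }

    let total = self.count
    let smallSize = total / count    // size of the shorter chunks
    let largeChunks = total % count  // leading chunks with one extra element

    var result: [SubSequence] = []
    result.reserveCapacity(count)
    var lower = startIndex
    for offset in 0..<count {
      let size = smallSize + (offset < largeChunks ? 1 : 0)
      let upper = index(lower, offsetBy: size)
      result.append(self[lower..<upper])
      lower = upper
    }
    return result
  }
}

let parts = "Hello, world!".eagerEvenChunks(into: 5).map { String($0) }
// ["Hel", "lo,", " wo", "rl", "d!"]
```

The two preconditions match the ones above: a count of 0 is accepted only for an empty collection, in which case the result has no chunks.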