Unify chunked(by:) and chunked(on:) #44

timvermeulen · 2020-11-22T10:31:39Z

Unify chunked(by:) and chunked(on:) by:

forwarding the eager versions to a new internal chunked(on:by:), and
having LazyChunked store both a projection function and a predicate.

The status quo of chunked(on:) calling chunked(by: { f($0) == f($1) }) causes more calls to the projection function than necessary, specifically by repeatedly calling it with the first element of the chunk. Unifying chunked(by:) and chunked(on:) allows this projected value to be stored and reused.

This PR also slightly improves endOfChunk(startingAt:) by moving the endIndex check to index(after:) (at the only point where it can ever be hit). This change, together with changing the endIndex representation to have no upper bound, has the pleasant consequence that chunked[chunked.endIndex] and chunked.index(after: chunked.endIndex) now both trap rather than returning an empty slice and endIndex, respectively.

Open questions:

LazyChunked.Index could store the projected value. This saves a single call to the projection function for each chunk, but for chunked(by:) it only increases the size of the index without any benefit. This does not seem like a clear win so I chose to keep Index lean for now.
Should chunked(on:by:) be public as well? I've left the public API unchanged but I think exposing it might be the nice thing to do.

Checklist

I've added at least one test that validates that my change is working, if appropriate
I've followed the code style of the rest of the project
I've read the Contribution Guidelines
I've updated the documentation if necessary

kylemacomber

These seem like great improvements!

It makes sense to me to be more frugal about calling the projection closure in case, say, there's a type conversion that heap allocates in there.

It also strikes me as unusual that chunked[chunked.endIndex] didn't trap before. I think this new approach is more consistent with how the other collections in the stdlib behave. I consider this a bigger benefit than chunked.index(after: chunked.endIndex) trapping, which I agree is nice to have as long as performance isn't impacted.

As for the open questions, I think I agree with the decisions you made:

LazyChunked.Index could store the projected value.

I think keeping the Index lightweight is more consistent with the stdlib. I believe this is preferable so as to mitigate incurring referencing counting overhead.

Should chunked(on:by:) be public as well?

My inclination is to be cautious here and limit the API surface area. What do you see as the biggest benefits for adding it?

timvermeulen · 2020-12-01T01:29:15Z

It also strikes me as unusual that chunked[chunked.endIndex] didn't trap before. I think this new approach is more consistent with how the other collections in the stdlib behave. I consider this a bigger benefit than chunked.index(after: chunked.endIndex) trapping, which I agree is nice to have as long as performance isn't impacted.

Yep, indeed just nice to have, and in this case a natural consequence of organizing the code better (making endOfChunk symmetric to startOfChunk).

LazyChunked.Index could store the projected value.

I think keeping the Index lightweight is more consistent with the stdlib. I believe this is preferable so as to mitigate incurring referencing counting overhead.

That's a great point! This makes me quite confident that this is the right choice, especially knowing how few closure calls it would prevent in the first place.

Should chunked(on:by:) be public as well?

My inclination is to be cautious here and limit the API surface area. What do you see as the biggest benefits for adding it?

It's a tough question. My initial thought was that I wouldn't want people to have to hand-roll this type if they need the chunked(on:by:) functionality that we already have sitting around. On the other hand, you only need this functionality if you have a projection function and a custom predicate (already quite unusual), and they're both expensive to execute. Sounds like a very rare occurrence.

Let's err on the side of keeping the API surface area small, we can always add it later if there's a compellinig use case.

natecook1000

Very cool, @timvermeulen! Just one note below, then I think this is ready to merge.

natecook1000 · 2020-11-30T22:25:40Z

Sources/Algorithms/Chunked.swift

-    let upperBound = i.upperBound ?? endOfChunk(from: i.lowerBound)
-    let end = endOfChunk(from: upperBound)
+    let upperBound = i.upperBound ?? endOfChunk(startingAt: i.lowerBound)
+    guard upperBound != base.endIndex else { return endIndex }


~~This seems like it still has the c.index(after: c.endIndex) == c.endIndex behavior — let's have it trap on that case instead.~~ Ah, I see it. I think I'd rather be explicit to trap here with an index(after:)-specific message than rely on the subscripting in endOfChunk(from:).

That sounds reasonable!

timvermeulen added 3 commits November 22, 2020 00:27

Unify the eager chunked(on:) and chunked(by:)

2f66634

Unify the lazy chunked(on:) and chunked(by:)

b79ef3e

Trap when going past the end

fcaed33

kylemacomber approved these changes Nov 30, 2020

View reviewed changes

natecook1000 reviewed Dec 1, 2020

View reviewed changes

Add preconditions in index(before:)/index(after:)

2976fa6

timvermeulen force-pushed the unify-chunked branch from 4c0037c to 2976fa6 Compare December 2, 2020 20:00

natecook1000 approved these changes Dec 2, 2020

View reviewed changes

natecook1000 merged commit 99152ec into apple:main Dec 2, 2020

timvermeulen deleted the unify-chunked branch December 24, 2020 22:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Unify chunked(by:) and chunked(on:) #44

Unify chunked(by:) and chunked(on:) #44

Uh oh!

timvermeulen commented Nov 22, 2020

Uh oh!

kylemacomber left a comment

Uh oh!

timvermeulen commented Dec 1, 2020

Uh oh!

natecook1000 left a comment

Uh oh!

natecook1000 Nov 30, 2020

Uh oh!

timvermeulen Dec 2, 2020

Uh oh!

Uh oh!

Unify chunked(by:) and chunked(on:) #44

Unify chunked(by:) and chunked(on:) #44

Uh oh!

Conversation

timvermeulen commented Nov 22, 2020

Checklist

Uh oh!

kylemacomber left a comment

Choose a reason for hiding this comment

Uh oh!

timvermeulen commented Dec 1, 2020

Uh oh!

natecook1000 left a comment

Choose a reason for hiding this comment

Uh oh!

natecook1000 Nov 30, 2020

Choose a reason for hiding this comment

Uh oh!

timvermeulen Dec 2, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!