Skip to content

Unify chunked(by:) and chunked(on:) #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 2, 2020

Conversation

timvermeulen
Copy link
Contributor

Unify chunked(by:) and chunked(on:) by:

  • forwarding the eager versions to a new internal chunked(on:by:), and
  • having LazyChunked store both a projection function and a predicate.

The status quo of chunked(on:) calling chunked(by: { f($0) == f($1) }) causes more calls to the projection function than necessary, specifically by repeatedly calling it with the first element of the chunk. Unifying chunked(by:) and chunked(on:) allows this projected value to be stored and reused.

This PR also slightly improves endOfChunk(startingAt:) by moving the endIndex check to index(after:) (at the only point where it can ever be hit). This change, together with changing the endIndex representation to have no upper bound, has the pleasant consequence that chunked[chunked.endIndex] and chunked.index(after: chunked.endIndex) now both trap rather than returning an empty slice and endIndex, respectively.

Open questions:

  • LazyChunked.Index could store the projected value. This saves a single call to the projection function for each chunk, but for chunked(by:) it only increases the size of the index without any benefit. This does not seem like a clear win so I chose to keep Index lean for now.
  • Should chunked(on:by:) be public as well? I've left the public API unchanged but I think exposing it might be the nice thing to do.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

Copy link

@kylemacomber kylemacomber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like great improvements!

It makes sense to me to be more frugal about calling the projection closure in case, say, there's a type conversion that heap allocates in there.

It also strikes me as unusual that chunked[chunked.endIndex] didn't trap before. I think this new approach is more consistent with how the other collections in the stdlib behave. I consider this a bigger benefit than chunked.index(after: chunked.endIndex) trapping, which I agree is nice to have as long as performance isn't impacted.

As for the open questions, I think I agree with the decisions you made:

LazyChunked.Index could store the projected value.

I think keeping the Index lightweight is more consistent with the stdlib. I believe this is preferable so as to mitigate incurring referencing counting overhead.

Should chunked(on:by:) be public as well?

My inclination is to be cautious here and limit the API surface area. What do you see as the biggest benefits for adding it?

@timvermeulen
Copy link
Contributor Author

It also strikes me as unusual that chunked[chunked.endIndex] didn't trap before. I think this new approach is more consistent with how the other collections in the stdlib behave. I consider this a bigger benefit than chunked.index(after: chunked.endIndex) trapping, which I agree is nice to have as long as performance isn't impacted.

Yep, indeed just nice to have, and in this case a natural consequence of organizing the code better (making endOfChunk symmetric to startOfChunk).

LazyChunked.Index could store the projected value.

I think keeping the Index lightweight is more consistent with the stdlib. I believe this is preferable so as to mitigate incurring referencing counting overhead.

That's a great point! This makes me quite confident that this is the right choice, especially knowing how few closure calls it would prevent in the first place.

Should chunked(on:by:) be public as well?

My inclination is to be cautious here and limit the API surface area. What do you see as the biggest benefits for adding it?

It's a tough question. My initial thought was that I wouldn't want people to have to hand-roll this type if they need the chunked(on:by:) functionality that we already have sitting around. On the other hand, you only need this functionality if you have a projection function and a custom predicate (already quite unusual), and they're both expensive to execute. Sounds like a very rare occurrence.

Let's err on the side of keeping the API surface area small, we can always add it later if there's a compellinig use case.

Copy link
Member

@natecook1000 natecook1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, @timvermeulen! Just one note below, then I think this is ready to merge.

let upperBound = i.upperBound ?? endOfChunk(from: i.lowerBound)
let end = endOfChunk(from: upperBound)
let upperBound = i.upperBound ?? endOfChunk(startingAt: i.lowerBound)
guard upperBound != base.endIndex else { return endIndex }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it still has the c.index(after: c.endIndex) == c.endIndex behavior — let's have it trap on that case instead. Ah, I see it. I think I'd rather be explicit to trap here with an index(after:)-specific message than rely on the subscripting in endOfChunk(from:).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable!

@natecook1000 natecook1000 merged commit 99152ec into apple:main Dec 2, 2020
@timvermeulen timvermeulen deleted the unify-chunked branch December 24, 2020 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants