Skip to content

CharIterator changes for std::str #8590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed

CharIterator changes for std::str #8590

wants to merge 12 commits into from

Conversation

bluss
Copy link
Member

@bluss bluss commented Aug 18, 2013

Implement CharIterator as a separate struct, so that it can be .clone()'d. Fix .char_range_at_reverse so that it performs better, closer to the forwards version. This makes the reverse iterators and users like .rfind() perform better.

Before
test str::bench::char_iterator ... bench: 146 ns/iter (+/- 0)
test str::bench::char_iterator_ascii ... bench: 397 ns/iter (+/- 49)
test str::bench::char_iterator_rev ... bench: 576 ns/iter (+/- 8)
test str::bench::char_offset_iterator ... bench: 128 ns/iter (+/- 2)
test str::bench::char_offset_iterator_rev ... bench: 425 ns/iter (+/- 59)

After
test str::bench::char_iterator ... bench: 130 ns/iter (+/- 1)
test str::bench::char_iterator_ascii ... bench: 307 ns/iter (+/- 5)
test str::bench::char_iterator_rev ... bench: 185 ns/iter (+/- 8)
test str::bench::char_offset_iterator ... bench: 131 ns/iter (+/- 13)
test str::bench::char_offset_iterator_rev ... bench: 183 ns/iter (+/- 2)

To be able to use a string slice to represent the CharIterator, a function slice_unchecked is added, that does the same as slice_bytes but without any boundary checks.

It would be possible to implement CharIterator with pointer arithmetic to make it much more efficient, but since vec iterator is still improving, it's too early to attempt to re-implement it in other places. Hopefully CharIterator can be implemented on top of vec iterator without any unsafe code later.

Additional changes fix the documentation about null termination.

@alexcrichton
Copy link
Member

This looks pretty good to me, I don't totally agree with word_iter => whitespace_split_iter but I've seen discussion happening on that so I figure that was the consensus at least.

It took me a second to figure out why you split the iterators into two structs, and it's kinda unfortunate that we have to. I suppose that &fn being cloneable is something that'll have to wait until the future. Could a comment be added explaining why there are separate structs instead of one? Probably a good idea to have some explanation because we don't want that to be the trend.

Could you also add some comments how the iterators are implemented? These would just be for the implementation details that most people wouldn't need to see, but it'd be nice to easily read that the &'self str field is actually just the rest of the string to be iterated over.

Other than that, nice job and nice speed improvements!

@bluss
Copy link
Member Author

bluss commented Aug 19, 2013

Thanks for the comments!

We could leave word_iter in place, but tag it "for later improvement".

I didn't really mind another struct and Iterator implementation, CharIterator is the default iterator for str after all (and I think this PR has the two Char/CharOffset nicely factored into pieces), but I suppose there could be other ways to solve it. A FilterMapStatic using a function pointer instead of &fn would resolve clonability of .byte_iter() and .iter() too.

Do you think the right thing to do is don't solve clonability now, but wait for later?

@alexcrichton
Copy link
Member

I agree that you've implemented the two structures well, and I'd be more in favor of what you've done than having another map iterator defined separately for extern functions.

I think it may also be a good idea to rename word_iter in a separate pull request if possible.

blake2-ppc added 10 commits August 19, 2013 11:19
Implement char_range_at_reverse similarly to char_range_at, instead of
re-using that method.
Add a function like raw::slice_bytes, but it doesn't check slice
boundaries. For iterator use where we always know the begin, end indices
are in range.
Let CharIterator be a separate type from CharOffsetIterator (so that
CharIterator can be cloned, for example).

Implement CharOffsetIterator by using the same technique as the method
subslice_offset.
@bluss
Copy link
Member Author

bluss commented Aug 19, 2013

Removed the word_iter rename

@bluss
Copy link
Member Author

bluss commented Aug 19, 2013

I added a cleanup of CharSplitIterator that's quite funky

Embed an iterator in the CharSplitIterator struct, and combine that with
the former bool `only_ascii`; so use an enum instead.
@alexcrichton
Copy link
Member

As one final thing, could you write a test which exercises the clone implementation of CharIterator? It'd be a shame to do this and then have someone accidentally refactor it back such that clone didn't work any more.

@bluss
Copy link
Member Author

bluss commented Aug 21, 2013

Added test for clone

bors added a commit that referenced this pull request Aug 22, 2013
Implement CharIterator as a separate struct, so that it can be .clone()'d. Fix `.char_range_at_reverse` so that it performs better, closer to the forwards version. This makes the reverse iterators and users like `.rfind()` perform better.

    Before
    test str::bench::char_iterator ... bench: 146 ns/iter (+/- 0)
    test str::bench::char_iterator_ascii ... bench: 397 ns/iter (+/- 49)
    test str::bench::char_iterator_rev ... bench: 576 ns/iter (+/- 8)
    test str::bench::char_offset_iterator ... bench: 128 ns/iter (+/- 2)
    test str::bench::char_offset_iterator_rev ... bench: 425 ns/iter (+/- 59)
    
    After
    test str::bench::char_iterator ... bench: 130 ns/iter (+/- 1)
    test str::bench::char_iterator_ascii ... bench: 307 ns/iter (+/- 5)
    test str::bench::char_iterator_rev ... bench: 185 ns/iter (+/- 8)
    test str::bench::char_offset_iterator ... bench: 131 ns/iter (+/- 13)
    test str::bench::char_offset_iterator_rev ... bench: 183 ns/iter (+/- 2)

To be able to use a string slice to represent the CharIterator, a function `slice_unchecked` is added, that does the same as `slice_bytes` but without any boundary checks.

It would be possible to implement CharIterator with pointer arithmetic to make it *much more efficient*, but since vec iterator is still improving, it's too early to attempt to re-implement it in other places. Hopefully CharIterator can be implemented on top of vec iterator without any unsafe code later.

Additional changes fix the documentation about null termination.
@bors bors closed this Aug 22, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants