CharIterator changes for std::str #8590

bluss · 2013-08-18T12:41:21Z

Implement CharIterator as a separate struct, so that it can be .clone()'d. Fix .char_range_at_reverse so that it performs better, closer to the forwards version. This makes the reverse iterators and users like .rfind() perform better.

Before
test str::bench::char_iterator ... bench: 146 ns/iter (+/- 0)
test str::bench::char_iterator_ascii ... bench: 397 ns/iter (+/- 49)
test str::bench::char_iterator_rev ... bench: 576 ns/iter (+/- 8)
test str::bench::char_offset_iterator ... bench: 128 ns/iter (+/- 2)
test str::bench::char_offset_iterator_rev ... bench: 425 ns/iter (+/- 59)

After
test str::bench::char_iterator ... bench: 130 ns/iter (+/- 1)
test str::bench::char_iterator_ascii ... bench: 307 ns/iter (+/- 5)
test str::bench::char_iterator_rev ... bench: 185 ns/iter (+/- 8)
test str::bench::char_offset_iterator ... bench: 131 ns/iter (+/- 13)
test str::bench::char_offset_iterator_rev ... bench: 183 ns/iter (+/- 2)

To be able to use a string slice to represent the CharIterator, a function slice_unchecked is added, that does the same as slice_bytes but without any boundary checks.

It would be possible to implement CharIterator with pointer arithmetic to make it much more efficient, but since vec iterator is still improving, it's too early to attempt to re-implement it in other places. Hopefully CharIterator can be implemented on top of vec iterator without any unsafe code later.

Additional changes fix the documentation about null termination.

alexcrichton · 2013-08-19T04:25:39Z

This looks pretty good to me, I don't totally agree with word_iter => whitespace_split_iter but I've seen discussion happening on that so I figure that was the consensus at least.

It took me a second to figure out why you split the iterators into two structs, and it's kinda unfortunate that we have to. I suppose that &fn being cloneable is something that'll have to wait until the future. Could a comment be added explaining why there are separate structs instead of one? Probably a good idea to have some explanation because we don't want that to be the trend.

Could you also add some comments how the iterators are implemented? These would just be for the implementation details that most people wouldn't need to see, but it'd be nice to easily read that the &'self str field is actually just the rest of the string to be iterated over.

Other than that, nice job and nice speed improvements!

bluss · 2013-08-19T08:23:27Z

Thanks for the comments!

We could leave word_iter in place, but tag it "for later improvement".

I didn't really mind another struct and Iterator implementation, CharIterator is the default iterator for str after all (and I think this PR has the two Char/CharOffset nicely factored into pieces), but I suppose there could be other ways to solve it. A FilterMapStatic using a function pointer instead of &fn would resolve clonability of .byte_iter() and .iter() too.

Do you think the right thing to do is don't solve clonability now, but wait for later?

alexcrichton · 2013-08-19T08:25:41Z

I agree that you've implemented the two structures well, and I'd be more in favor of what you've done than having another map iterator defined separately for extern functions.

I think it may also be a good idea to rename word_iter in a separate pull request if possible.

Implement char_range_at_reverse similarly to char_range_at, instead of re-using that method.

Add a function like raw::slice_bytes, but it doesn't check slice boundaries. For iterator use where we always know the begin, end indices are in range.

Let CharIterator be a separate type from CharOffsetIterator (so that CharIterator can be cloned, for example). Implement CharOffsetIterator by using the same technique as the method subslice_offset.

…&str

bluss · 2013-08-19T09:31:56Z

Removed the word_iter rename

bluss · 2013-08-19T14:08:58Z

I added a cleanup of CharSplitIterator that's quite funky

Embed an iterator in the CharSplitIterator struct, and combine that with the former bool `only_ascii`; so use an enum instead.

alexcrichton · 2013-08-20T08:18:15Z

As one final thing, could you write a test which exercises the clone implementation of CharIterator? It'd be a shame to do this and then have someone accidentally refactor it back such that clone didn't work any more.

bluss · 2013-08-21T22:36:19Z

Added test for clone

Implement CharIterator as a separate struct, so that it can be .clone()'d. Fix `.char_range_at_reverse` so that it performs better, closer to the forwards version. This makes the reverse iterators and users like `.rfind()` perform better. Before test str::bench::char_iterator ... bench: 146 ns/iter (+/- 0) test str::bench::char_iterator_ascii ... bench: 397 ns/iter (+/- 49) test str::bench::char_iterator_rev ... bench: 576 ns/iter (+/- 8) test str::bench::char_offset_iterator ... bench: 128 ns/iter (+/- 2) test str::bench::char_offset_iterator_rev ... bench: 425 ns/iter (+/- 59) After test str::bench::char_iterator ... bench: 130 ns/iter (+/- 1) test str::bench::char_iterator_ascii ... bench: 307 ns/iter (+/- 5) test str::bench::char_iterator_rev ... bench: 185 ns/iter (+/- 8) test str::bench::char_offset_iterator ... bench: 131 ns/iter (+/- 13) test str::bench::char_offset_iterator_rev ... bench: 183 ns/iter (+/- 2) To be able to use a string slice to represent the CharIterator, a function `slice_unchecked` is added, that does the same as `slice_bytes` but without any boundary checks. It would be possible to implement CharIterator with pointer arithmetic to make it *much more efficient*, but since vec iterator is still improving, it's too early to attempt to re-implement it in other places. Hopefully CharIterator can be implemented on top of vec iterator without any unsafe code later. Additional changes fix the documentation about null termination.

blake2-ppc added 10 commits August 19, 2013 11:19

std::str: Bench test for char iterators

548bdba

std::str: Small fix for slice

4043c70

std::str: Special case char_range_at_reverse so it is faster

3cb5b8d

Implement char_range_at_reverse similarly to char_range_at, instead of re-using that method.

std::str: Add str::raw::slice_unchecked

8a5889d

Add a function like raw::slice_bytes, but it doesn't check slice boundaries. For iterator use where we always know the begin, end indices are in range.

std::str: Implement CharIterator separately

db3eb72

Let CharIterator be a separate type from CharOffsetIterator (so that CharIterator can be cloned, for example). Implement CharOffsetIterator by using the same technique as the method subslice_offset.

std::str: Use CharOffsetIterator in .find() and .rfind()

595dd84

std::str: Correct docstrings for lack of null terminator in ~str and …

f33a30e

…&str

std::str: Only check char boundary for end index in .slice_to()

8931ad9

std::str: Use CharOffsetIterator in slice_chars

5eff3e1

std::str: Improve comments for CharIterator

30ab96b

std::str: Use iterators instead of while loops for CharSplitIterator

8fe8302

Embed an iterator in the CharSplitIterator struct, and combine that with the former bool `only_ascii`; so use an enum instead.

std::str: Add test for CharIterator .clone()

93de60e

bors closed this Aug 22, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CharIterator changes for std::str #8590

CharIterator changes for std::str #8590

Uh oh!

bluss commented Aug 18, 2013

Uh oh!

alexcrichton commented Aug 19, 2013

Uh oh!

bluss commented Aug 19, 2013

Uh oh!

alexcrichton commented Aug 19, 2013

Uh oh!

bluss commented Aug 19, 2013

Uh oh!

bluss commented Aug 19, 2013

Uh oh!

alexcrichton commented Aug 20, 2013

Uh oh!

bluss commented Aug 21, 2013

Uh oh!

Uh oh!

CharIterator changes for std::str #8590

CharIterator changes for std::str #8590

Uh oh!

Conversation

bluss commented Aug 18, 2013

Uh oh!

alexcrichton commented Aug 19, 2013

Uh oh!

bluss commented Aug 19, 2013

Uh oh!

alexcrichton commented Aug 19, 2013

Uh oh!

bluss commented Aug 19, 2013

Uh oh!

bluss commented Aug 19, 2013

Uh oh!

alexcrichton commented Aug 20, 2013

Uh oh!

bluss commented Aug 21, 2013

Uh oh!

Uh oh!