-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib] Speed up short UTF-16 distance calculations #62717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci test |
@swift-ci benchmark |
Benchmark results are... peculiar.
|
How stale is the baseline for the perf tests? |
@@ -201,6 +201,17 @@ extension String.UTF16View: BidirectionalCollection { | |||
return _foreignIndex(i, offsetBy: n) | |||
} | |||
|
|||
if n.magnitude <= _StringBreadcrumbs.breadcrumbStride { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On average, using breadcrumbs ought to win when |n| > breadcrumbStride/2, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the average cost of using breadcrumbs is breadcrumbStride/2 each time we do it. In this case, we're doing it twice, though, so I expect the cutoff to be doubled. 🤔
8e85017
to
3b340dc
Compare
@swift-ci benchmark |
We are building & measuring the baseline on the fly with each benchmark run, so it should be always up to date. Huh, unless we somehow select the wrong commit for the baseline builds, perhaps? Edit: Hm, maybe: the logs indicate that CI is for some reason rebuilding swiftAST and swiftParse after switching to the PR branch, but this PR only contains stdlib changes. 🤔 |
cf3551c
to
a8ab24c
Compare
Previously we insisted on using breadcrumbs even if we only needed to travel a very short way. This could be as much as ten times slower than the naive algorithm of simply visiting all the Unicode scalars in between the start and the end. (Using breadcrumbs generally means that we need to walk to both endpoints from their nearest breadcrumb, which on average requires walking half the distance between breadcrumbs — and this can mean visiting vastly more Unicode scalars than the ones that are simply lying in between the endpoints themselves.)
… ranges Instead of calling `_toUTF16Index` twice, call it once and then use `index(_:offsetBy:)` to potentially avoid another breadcrumbs lookup.
a8ab24c
to
6fee1b3
Compare
In fact, let's try if the rebuild set changes after rebasing this on top of current main. @swift-ci benchmark |
@swift-ci benchmark |
@swift-ci test |
This has still rebuilt a lot more source files between
|
Hm; it seems the CI benchmarks are no longer providing a usable signal. Even if we accept that the timer results are unreliable (given all the
|
The code size differences don't seem to reproduce in local benchmark builds -- evidently the "swift-ci benchmark" command is not doing the right thing. |
Is any of the code in question inlineable? Last time I poked at it, benchmarks didn't properly rebuild in the face of inlineable/transparent code changes. (I'm pretty sure all of this is behind ABI, but that's the first thing to check.) |
No, the changes in this PR only affect non-inlinable functions. The logs also indicate that the benchmarks are fully rebuilt twice -- once for the base commit (presumably from the head of |
Lovely, so
We need a reliable way to track the performance of the stdlib. This is not it. |
We commonly start from the `startIndex`, in which case `_nativeGetOffset` is essentially free. Consider this case when calculating the threshold for using breadcrumbs.
Speed up conversion between UTF-16 offset ranges and string index ranges, by carefully switching between absolute and relative index calculations, depending on the distance we need to go. It is a surprisingly tricky puzzle to do this correctly while avoiding redundant calculations. Offset ranges within substrings add the additional complication of having to bias offset values with the absolute offset of the substring’s start index.
b7d1174
to
d00f8ed
Compare
Evidently Local benchmark results, including the new benchmarks in #62783: Regression (30)
Improvement (32)
|
I'd like to understand what's going on with |
@@ -201,6 +204,14 @@ extension String.UTF16View: BidirectionalCollection { | |||
return _foreignIndex(i, offsetBy: n) | |||
} | |||
|
|||
let threshold = ( | |||
i == startIndex ? _breadcrumbStride / 2 : _breadcrumbStride) | |||
if n.magnitude < threshold, !_guts.isASCII { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the sense of this condition right? Isn't direct computation always faster when _guts.isASCII
is true? What am I misreading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _nativeGetIndex
/_nativeGetOffset
calls below have O(1) ASCII fast paths, so calling the advancing loop in _index(_:offsetBy:)
would make that case worse. (I used to have a special case for _guts.isASCII
inside this branch, but it seems simpler to just let the original code take care of it.)
The generated code looked reasonable enough, but the Frustratingly though this resulted in 130-135% regressions for (FWIW, the new -20% improvement to Regression (14)
Improvement (46)
|
@swift-ci test |
OK. I'm happy with taking this once we get a clean test run at this point. |
Evidently we did not have any tests that exercised `distance(from:to:)` and `index(_:offsetBy:)`. :-O
- Align input indices to scalar boundaries - Don’t pass decreasing indices to _utf16Distance
c564b1f
to
5d354ce
Compare
It turns out we did not have proper test coverage for The last commits do not materially change benchmark results. (Except for the ASCII Regression (17)
Improvement (63)
|
@swift-ci test |
…ithms [Bidirectional]Collection’s default index manipulation methods (as well as _utf16Distance) do not expect to be given unreachable indices, and they tend to fail when operating on them. Round indices down to the nearest scalar boundary before calling these.
@swift-ci test |
let's clone this for 5.8 too, once you're ready. |
|
@swift-ci test macOS platform |
1 similar comment
@swift-ci test macOS platform |
|
@swift-ci smoke test macOS platform |
This is a wild guess at what might be causing our persistent, random String failures on the main branch: ``` Swift(macosx-x86_64) :: Prototypes/CollectionTransformers.swift Swift(macosx-x86_64) :: stdlib/NSSlowString.swift Swift(macosx-x86_64) :: stdlib/NSStringAPI.swift Swift(macosx-x86_64) :: stdlib/StringIndex.swift Swift-validation(macosx-x86_64) :: stdlib/String.swift Swift-validation(macosx-x86_64) :: stdlib/StringBreadcrumbs.swift Swift-validation(macosx-x86_64) :: stdlib/StringUTF8.swift ``` FWIW, it appears this is *not* caused by swiftlang#62717: that change has also landed on release/5.8, and I haven’t seen these issues on that branch. Our atomic breadcrumbs initialization vs its non-atomic loading gives me an uneasy feeling that this may in fact be a long standing synchronization issue that is only now causing problems (for whatever reason). I am unable to reproduce these issues locally, so this guess may be (and probably is) wildly off the mark, but this PR is likely to be a good idea anyway, if only to rule out this possibility. rdar://104751936
This is a wild guess at what might be causing our persistent, random String failures on the main branch: ``` Swift(macosx-x86_64) :: Prototypes/CollectionTransformers.swift Swift(macosx-x86_64) :: stdlib/NSSlowString.swift Swift(macosx-x86_64) :: stdlib/NSStringAPI.swift Swift(macosx-x86_64) :: stdlib/StringIndex.swift Swift-validation(macosx-x86_64) :: stdlib/String.swift Swift-validation(macosx-x86_64) :: stdlib/StringBreadcrumbs.swift Swift-validation(macosx-x86_64) :: stdlib/StringUTF8.swift ``` FWIW, it appears this is *not* caused by swiftlang#62717: that change has also landed on release/5.8, and I haven’t seen these issues on that branch. Our atomic breadcrumbs initialization vs its non-atomic loading gives me an uneasy feeling that this may in fact be a long standing synchronization issue that is only now causing problems (for whatever reason). I am unable to reproduce these issues locally, so this guess may be (and probably is) wildly off the mark, but this PR is likely to be a good idea anyway, if only to rule out this possibility. rdar://104751936 (cherry picked from commit 73f349c)
Previously we insisted on using UTF-16 breadcrumbs even if we only needed to travel a very short way. This could be as much as
tenforty times slower than the naive algorithm of simply visiting all the Unicode scalars in between the start and the end.(Using breadcrumbs generally means that we need to walk to both endpoints from their nearest breadcrumb, which on average requires walking half the distance between breadcrumbs, twice — and this can mean visiting vastly more Unicode scalars than if we simply walked through the ones that are lying in between the endpoints themselves.)
To put it another way, when we want to measure how long it takes to walk between two trees within a nearby park, it probably isn't a great idea to start by separately measuring each of their distances from the nearest airport. 😛
rdar://103575481