Skip to content

[String] Add UTF-8 fast-paths for Foundation initializers #21959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 23, 2019

Conversation

milseman
Copy link
Member

Many Foundation initializers could benefit from faster string
construction and subsequent reads in Swift 5. Add UTF-8 fast paths for
when constructing a string from a valid UTF-8 code units.

Many Foundation initializers could benefit from faster string
construction and subsequent reads in Swift 5. Add UTF-8 fast paths for
when constructing a string from a valid UTF-8 code units.
@milseman
Copy link
Member Author

@swift-ci please test

@milseman
Copy link
Member Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
UTF8Decode_InitFromBytes_ascii 390 675 +73.1% 0.58x
UTF8Decode_InitFromData_ascii 544 648 +19.1% 0.84x (?)
Improvement
UTF8Decode_InitFromData 1084 214 -80.3% 5.07x
UTF8Decode_InitFromBytes 1053 228 -78.3% 4.62x
DictionaryBridgeToObjC_Access 968 780 -19.4% 1.24x (?)
Data.hash.Empty 64 58 -9.4% 1.10x (?)

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
UTF8Decode_InitFromBytes_ascii 432 701 +62.3% 0.62x
Improvement
UTF8Decode_InitFromData 1104 215 -80.5% 5.13x
UTF8Decode_InitFromBytes 1049 229 -78.2% 4.58x
Data.hash.Empty 66 61 -7.6% 1.08x (?)

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
UTF8Decode_InitFromBytes_ascii 423 691 +63.4% 0.61x
UTF8Decode_InitFromData_ascii 548 654 +19.3% 0.84x (?)
Improvement
UTF8Decode_InitFromData 1101 215 -80.5% 5.12x
UTF8Decode_InitFromBytes 1058 226 -78.6% 4.68x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB
--------------

@airspeedswift
Copy link
Member

Do we have test coverage for these paths?

@milseman
Copy link
Member Author

@swift-ci please test

@milseman
Copy link
Member Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - a088e13

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - a088e13

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
UTF8Decode_InitFromBytes_ascii 441 586 +32.9% 0.75x (?)
Improvement
UTF8Decode_InitFromData 1213 192 -84.2% 6.32x
UTF8Decode_InitFromBytes 1166 207 -82.2% 5.63x
UTF8Decode_InitFromData_ascii 641 529 -17.5% 1.21x
LessSubstringSubstring 43 39 -9.3% 1.10x
EqualStringSubstring 43 39 -9.3% 1.10x (?)
EqualSubstringSubstringGenericEquatable 43 39 -9.3% 1.10x
EqualSubstringString 43 39 -9.3% 1.10x (?)
LessSubstringSubstringGenericComparable 43 39 -9.3% 1.10x
Data.hash.Empty 71 65 -8.5% 1.09x (?)
StringComparison_slowerPrenormal 1720 1580 -8.1% 1.09x (?)
StringBuilderSmallReservingCapacity 384 354 -7.8% 1.08x (?)
StringBuilder 368 340 -7.6% 1.08x (?)
EqualSubstringSubstring 43 40 -7.0% 1.07x (?)

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
UTF8Decode_InitFromBytes_ascii 449 608 +35.4% 0.74x
Improvement
UTF8Decode_InitFromData 1218 193 -84.2% 6.31x
UTF8Decode_InitFromBytes 1167 210 -82.0% 5.56x
DataAppendDataLargeToLarge 50800 37800 -25.6% 1.34x (?)
LessSubstringSubstringGenericComparable 44 39 -11.4% 1.13x
EqualSubstringSubstring 43 39 -9.3% 1.10x (?)
EqualStringSubstring 43 39 -9.3% 1.10x (?)
EqualSubstringString 43 39 -9.3% 1.10x
Data.hash.Empty 74 68 -8.1% 1.09x (?)
UTF8Decode_InitFromData_ascii 598 552 -7.7% 1.08x
LessSubstringSubstring 43 40 -7.0% 1.07x
EqualSubstringSubstringGenericEquatable 43 40 -7.0% 1.07x (?)
StringComparison_slowerPrenormal 1730 1610 -6.9% 1.07x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
UTF8Decode_InitFromBytes_ascii 462 591 +27.9% 0.78x (?)
StrComplexWalk 6670 7360 +10.3% 0.91x (?)
Improvement
UTF8Decode_InitFromData 1220 194 -84.1% 6.29x
UTF8Decode_InitFromBytes 1176 209 -82.2% 5.63x
UTF8Decode_InitFromData_ascii 669 530 -20.8% 1.26x
ObjectiveCBridgeStubToNSDate2 1529 1314 -14.1% 1.16x (?)
DataAppendDataSmallToSmall 4520 4020 -11.1% 1.12x (?)
SortStringsUnicode 5305 4924 -7.2% 1.08x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

if let str = String(validatingUTF8: bytes) {
self = str
return
}
if let ns = NSString(utf8String: bytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not remove the NSString-and-bridge part of this entirely?
Then we could also make it inlineable since it trivially forwards to the stdlib function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to preserve behavior if the bytes are invalidly encoded. We should probably deprecate this initializer.

@milseman
Copy link
Member Author

@swift-ci please test

@milseman milseman requested a review from itaiferber January 22, 2019 18:26
@milseman
Copy link
Member Author

@itaiferber these are some pretty exciting numbers, and they actually understate the benefit as they only measure creation time: reads of these fast-pathed strings will also be on all our fast-paths.

I'm working on the last regression currently, but we'd like to get this ready ASAP

Copy link
Contributor

@itaiferber itaiferber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good! Perf numbers are great 😄 I am somewhat concerned about memory binding here though — can we express this without the binding?

@@ -381,6 +395,14 @@ extension String {
/// Returns a `String` initialized by converting given `data` into
/// Unicode characters using a given `encoding`.
public init?(data: __shared Data, encoding: Encoding) {
if encoding == .utf8,
let str = data.withUnsafeBytes({
String._tryFromUTF8($0.bindMemory(to: UInt8.self))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, in previous discussions, we've agreed that the safety of binding Data is dubious given the fact that the underlying buffer may have been previously bound. Are we making the assertion that binding to a trivial type here is safe? /CC @atrick

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to express the UTF-8 validation in terms of raw buffer pointers instead of typed? (Or overload to make this possible safely?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is something to be concerned with, but if it is:

The fix would be for Data to provide a way to access its contents as a UBP<UInt8>, since Data is the owner (or knows who the owner is) and is the only one allowed to do so. If String exposed something taking a URBP, then treated it as a collection of UInt8s, that seems like it's sweeping the potential time bomb under the living room rug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data might know who the owner is, but it doesn't necessarily make it safe to bind the contents to UInt8, since the previous owner may have bound the buffer to a nontrivial type, and Data cannot unbind it — URBPs specifically do allow reading as UInt8s via a raw interface, though, making it always safe to read from.

I'm not terribly concerned (as we've previously discussed — we can't prevent someone from doing something like reading arbitrary data bound to some non-trivial type U and trying to pass it along to be interpreted as UTF-8), but we did do a lot in Data to make sure we don't invoke undefined behavior like this silently on others' behalf. @atrick might have stronger opinions here, but I agree with him in that we shouldn't be repeating that here.

Copy link
Contributor

@karwa karwa Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI withContiguousStorageIfAvailable provides a typed pointer, so Data could use that. Unfortunately, URBP doesn't support it (I guess because it doesn't want to bind random memory). EDIT: Oops, I thought Data supported that, but it's doing what URBP does.

I encountered that issue with #22028
Currently, if you write:

class MyClass {}
var s = [MyClass(), MyClass(), MyClass()]
let sbytes = withUnsafeBytes(of: &s) { String(decoding: $0, as: UTF8.self) }

you will invoke a fast-path which calls assumingMemoryBound(to: UInt8.self) on the raw pointer, which IIUC is technically undefined behaviour. But it was marked "unsafe"...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any String utilities that we want to accept a raw byte stream, like Data, including validateUTF8 and _uncheckedFromUTF8 should either take URBP or some ContiguouslyStored protocol. Either will be source compatible with the existing APIs, so they can probably be fixed later.

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 98a38b4b3114668d56248049b5fecedd7c616f64

@milseman
Copy link
Member Author

@swift-ci please benchmark

let address = Int(bitPattern: ptr)

while (address &+ i) % stride != 0 && i < count {
guard ptr[i] <= 0x7F else { return false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really semantically important, but just for readability could we stick to (& 0x80…) rather than using that in one place and <= 0x7F in the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on what the expected trip-count is here, it may not be worth explicitly aligning. Did you experiment with this at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and there's not really any time for much experimentation if this is going to ship. According to @jckarter, our loads and stores assumed aligned pointers, so I thought this was necessary for correctness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good enough for now, then.

let bytes: UInt = UnsafePointer(
bitPattern: address &+ i
)._unsafelyUnwrappedUnchecked.pointee
guard bytes & UInt(truncatingIfNeeded: 0x8080_8080_8080_8080 as UInt64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should give the compiler folks a bug to unroll this so we don't have to do it by hand ;)

(Also, does truncatingIfNeeded elide truncation checks when given a constant like this? We definitely don't want any unnecessary branches in this loop)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean do the whole operation, or just the promoting to a higher stride? CC @eeckstein @aschwaighofer

(As wacky as it sounds, truncatingIfNeeded is the way to skip checks)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both would be nice! But I meant doing multiple words per loop, which this doesn't currently do. We squeezed another 1.5x out of CF's by doing that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the first loop is unrolled AFAICT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. That matches my mental model of the state of LLVM's loop unroller, which is that it handles known trip counts well, but mostly bails on unknown trip counts.

@milseman
Copy link
Member Author

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 98a38b4b3114668d56248049b5fecedd7c616f64

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 98a38b4b3114668d56248049b5fecedd7c616f64

@milseman
Copy link
Member Author

@swift-ci please test

@milseman
Copy link
Member Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 245cacb102faefeef2d7a18901f2568d8c088624

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 245cacb102faefeef2d7a18901f2568d8c088624

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 98a38b4b3114668d56248049b5fecedd7c616f64

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
StringComparison_abnormal 1080 1240 +14.8% 0.87x (?)
Improvement
UTF8Decode_InitFromData 1210 233 -80.7% 5.19x
UTF8Decode_InitFromBytes 1158 246 -78.8% 4.71x
UTF8Decode_InitFromData_ascii 632 245 -61.2% 2.58x
UTF8Decode_InitDecoding_ascii 546 262 -52.0% 2.08x
CStringLongAscii 391 232 -40.7% 1.69x
UTF8Decode_InitFromBytes_ascii 442 303 -31.4% 1.46x
StringComparison_slowerPrenormal 1760 1600 -9.1% 1.10x
Data.hash.Empty 71 65 -8.5% 1.09x (?)
LessSubstringSubstring 42 39 -7.1% 1.08x (?)
LessSubstringSubstringGenericComparable 42 39 -7.1% 1.08x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
DataCreateSmall 2000 3000 +50.0% 0.67x
StringComparison_abnormal 1100 1240 +12.7% 0.89x (?)
Improvement
UTF8Decode_InitFromData 1214 234 -80.7% 5.19x
UTF8Decode_InitFromBytes 1168 247 -78.9% 4.73x
UTF8Decode_InitFromData_ascii 612 246 -59.8% 2.49x
UTF8Decode_InitDecoding_ascii 541 256 -52.7% 2.11x
CStringLongAscii 387 229 -40.8% 1.69x
UTF8Decode_InitFromBytes_ascii 454 306 -32.6% 1.48x
StringComparison_slowerPrenormal 1780 1610 -9.6% 1.11x
LessSubstringSubstringGenericComparable 43 39 -9.3% 1.10x (?)
Data.hash.Empty 74 68 -8.1% 1.09x (?)
EqualSubstringSubstringGenericEquatable 43 40 -7.0% 1.07x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DataAppendDataLargeToLarge 38800 51000 +31.4% 0.76x (?)
Histogram 8304 9663 +16.4% 0.86x (?)
LazilyFilteredArrays2 125055 144597 +15.6% 0.86x
Breadcrumbs.MutatedUTF16ToIdx.ASCII 23 26 +13.0% 0.88x (?)
Breadcrumbs.UTF16ToIdxRange.longASCII 358 401 +12.0% 0.89x (?)
ChainedFilterMap 333783 373491 +11.9% 0.89x (?)
LazilyFilteredArrayContains 1342309 1499861 +11.7% 0.89x (?)
Breadcrumbs.UTF16ToIdx.longASCII 1704 1897 +11.3% 0.90x (?)
Breadcrumbs.UTF16ToIdx.longMixed 1933 2148 +11.1% 0.90x (?)
Breadcrumbs.UTF16ToIdxRange.longMixed 452 494 +9.3% 0.91x (?)
FatCompactMap 448180 489180 +9.1% 0.92x
Improvement
UTF8Decode_InitFromData 1216 235 -80.7% 5.17x
UTF8Decode_InitFromBytes 1164 247 -78.8% 4.71x
UTF8Decode_InitFromData_ascii 651 261 -59.9% 2.49x
UTF8Decode_InitDecoding_ascii 682 395 -42.1% 1.73x
CStringLongAscii 417 261 -37.4% 1.60x
UTF8Decode_InitFromBytes_ascii 482 314 -34.9% 1.54x
StrComplexWalk 7350 6690 -9.0% 1.10x (?)
StringBuilderWithLongSubstring 3240 2950 -9.0% 1.10x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@milseman
Copy link
Member Author

There we go, that gives us a good 2x for ASCII content. The regressions seem unrelated and flaky benchmarks.

Perform ASCII checking using pointer-width strides, making sure to
align properly.
@milseman
Copy link
Member Author

That force-push is just to squash the WIP commit in the middle.

@swift-ci please smoke test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants