[string] Fix string implementation for big endian platforms #21077

Merged: 2 commits, Feb 5, 2019

Conversation

@mundaym (Contributor) commented Dec 6, 2018:

This commit supersedes 2866b4a which was overwritten by 4ab45df.

When code points are stored in multi-byte types (e.g. UInt64) we
need to convert the order of the bytes in the value to little
endian before we can access the value as raw bytes.
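The conversion the commit message describes can be illustrated with a small standalone sketch (invented helper, not the stdlib implementation): `littleEndian` is a no-op on little-endian hosts and a byte swap on big-endian hosts, so converting before taking a raw-bytes view puts the first code unit at offset 0 on every platform.

```swift
// Standalone sketch, not the stdlib code. Code units are packed starting
// at the least-significant byte of the UInt64; converting to little-endian
// byte order before viewing the raw bytes makes the byte order uniform.
func withRawCodeUnits<R>(_ codeUnits: UInt64,
                         _ body: (UnsafeRawBufferPointer) -> R) -> R {
    // .littleEndian is the identity on LE hosts and a byte swap on BE hosts.
    return withUnsafeBytes(of: codeUnits.littleEndian, body)
}

let packed: UInt64 = 0x0000_0000_0063_6261  // "abc": 'a' (0x61) in the LSB
let first = withRawCodeUnits(packed) { $0[0] }
// first == 0x61 ('a') regardless of host endianness
```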

@lorentey lorentey requested a review from milseman December 6, 2018 15:15
@milseman (Member) left a comment:

Why do we want to store SmallString as little endian in memory, and swap on read? Wouldn't we want SmallString to be the in-memory representation on both platforms?

// Give raw, nul-terminated code units. This is only for limited internal
// usage: it always clears the discriminator and count (in case it's full)
@inlinable
internal var zeroTerminatedRawCodeUnits: RawBitPattern {
Member:

Why are you removing this? Wouldn't this simplify your code as well?

Member:

Also, this is an ABI break

@mundaym (Author) commented Dec 7, 2018:

I wasn't sure about this one. I think the new code is easier to understand without this property but I'm happy to reinstate it.

Why does this break the ABI? (EDIT: Ah, I see, because it's inlinable.) Is there any way to know when such breakages occur? This code isn't in a released version of Swift yet, right?

@milseman (Member) commented Dec 7, 2018:

If you run the test suite, you'll get a failure inside the api-digester.

edit: This code is in the 5.0 release branch.

mundaym (Author):

I ran the validation test suite (specifically utils/build-script --release-debuginfo -T) on this PR on a Linux x86-64 box running Ubuntu 18.04 and it passed the tests. Should it have failed or will it only fail in the release branch?

Is the ABI still modifiable at this stage or is it already locked down?

Member:

It is modifiable in case of emergency, and worst case here we just keep this method around.

internal init(_ object: _SmallString) {
self.init(raw: object.rawBits)
}

Member:

What does this init do over self = object?

mundaym (Author):

Nothing, will fix it.

mundaym (Author):

Oh, it does mean the code does an invariant check, whereas an assignment wouldn't, I think.

Member:

Right, if you want to do a final invariant check at the end. Alternatively, we could have the discriminator setter do the check. The discriminator property was added by @lorentey after I wrote much of this code, so there are some potential cleanups like the one you found.

Member:

Adding an invariant check to the discriminator setter could misfire in the ASCII check if the string contents haven't yet been fully updated at the time the discriminator is set.

Member:

Good point. We do generally want to do an invariant check at the very end of creation, and at the very end of a mutation (e.g. verify isASCII after any RRC operation). Dealing with this is why I started to shy away from settable properties in _StringObject and _SmallString in favor of always calling inits, as it's easy to forget to make the check.

Outside the scope of this PR obviously, but we'll want to come up with a general programming pattern and codify it in https://github.com/apple/swift/blob/master/docs/StandardLibraryProgrammersManual.md
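The "check in the init, not in the setter" pattern discussed above can be sketched with a hypothetical type (names and bit layout invented for illustration): construction funnels through one initializer that runs the invariant check only once the value is fully formed.

```swift
// Hypothetical sketch of the pattern, not stdlib code: a setter-based
// check could fire while the value is half-updated, so the check runs
// only at the end of the initializer.
struct Discriminated {
    private(set) var rawBits: UInt64

    init(rawBits: UInt64) {
        self.rawBits = rawBits
        _invariantCheck()  // value is fully formed here
    }

    // Invented layout: discriminator lives in the most-significant byte.
    var discriminator: UInt8 { UInt8(truncatingIfNeeded: rawBits &>> 56) }

    func _invariantCheck() {
        assert(discriminator != 0, "missing discriminator")
    }
}

let d = Discriminated(rawBits: 0xE000_0000_0000_0041)
// d.discriminator == 0xE0
```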

@@ -210,7 +204,10 @@ extension _SmallString {
internal func withUTF8<Result>(
_ f: (UnsafeBufferPointer<UInt8>) throws -> Result
) rethrows -> Result {
var raw = self.zeroTerminatedRawCodeUnits
// Code points need to be in little endian byte order and zero terminated.
Member:

Code units

mundaym (Author):

oops, thanks!

}

@usableFromInline // @testable
internal init?(_ base: _SmallString, appending other: _SmallString) {
let totalCount = base.count + other.count
guard totalCount <= _SmallString.capacity else { return nil }

// TODO(SIMD): The below can be replaced with just be a couple vector ops
Member:

Is this incorrect? Why?

mundaym (Author):

Concatenating two 64-bit words to form one 128-bit word and then treating it as a vector of bytes is likely to cause more endianness issues, so I'd rather we avoided adding SIMD to this file. I also think this code could be made considerably faster without resorting to SIMD (using rotates and logical ops). That would be a better approach, as all platforms could benefit regardless of whether they have the necessary SIMD instructions in the base ISA.
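The shift-and-logical-ops alternative isn't spelled out in the PR; the following single-word sketch (invented helper, hypothetical packing) shows the flavor: with code units packed LSB-first, appending reduces to a shift and an OR, with no per-byte loop and no lane-order assumptions.

```swift
// Hypothetical single-word sketch, not the PR's code. `base` holds
// `baseCount` code units packed starting at the least-significant byte;
// `other`'s units are shifted up past them and OR'd in.
func concatPacked(_ base: UInt64, baseCount: Int, _ other: UInt64) -> UInt64 {
    precondition((0..<8).contains(baseCount))
    return base | (other &<< (8 &* UInt64(baseCount)))
}

let ab: UInt64 = 0x6261        // "ab" packed LSB-first
let c:  UInt64 = 0x63          // "c"
let abc = concatPacked(ab, baseCount: 2, c)
// abc == 0x63_6261, i.e. "abc" packed LSB-first
```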

Member:

Right, but should Linux, macOS, and iOS suffer a several-times performance regression in hot paths to facilitate big-endian systems? At the very least, we would use #if if necessary, assuming SIMD doesn't abstract it away sufficiently.

// Code points need to be in little endian byte order and zero terminated.
let l = self.leadingRawBits
let t = self.trailingRawBits & _StringObject.Nibbles.largeAddressMask
var raw = (l.littleEndian, t.littleEndian)
Member:

Why do we convert the bytes to little endian on read? If the plan is to store the contents in big-endian on big-endian systems, then we don't want to convert on read, right?

Member:

And if not, then would it make more sense to just have a littleEndianBytes computed property?

mundaym (Author):

The trouble is that the discriminator is stored in the most significant byte of the _object pointer. On little endian systems this means the discriminator is stored in the 'last' byte of the string object's memory:

[ x x x x x x x x x x x x x x x d ]

However on big endian systems this means that the discriminator is stored in the middle byte of the string object:

[ x x x x x x x x d x x x x x x x ]

Since the discriminator is in the middle we can only access the first 8 code units with a withUnsafeBytes call if we use big endian byte order. However if we byte swap to little endian byte order before any withUnsafeBytes calls then we can access all 15 code units and so big endian machines can do all the same operations a little endian machine can do. It also means we can add code units to the small string in the same order on both big and little endian platforms which is much cleaner and easier to manage.

We could add a littleEndianBytes computed property but I don't think the code saving is worth the additional obfuscation. We should only ever need to do the little endian conversion before a withUnsafeBytes call and ideally we won't add any more direct calls to that and instead always go through withUTF8 and withMutableCapacity.
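The layout problem described above can be demonstrated with a standalone sketch (invented values, not the real bit patterns): with the discriminator in the most-significant byte of the second word, converting both words to little-endian byte order before taking a raw-bytes view moves the discriminator to byte 15 on every platform, leaving bytes 0...14 contiguous for code units.

```swift
// Sketch only: two 64-bit words viewed as 16 bytes after a little-endian
// conversion. .littleEndian is a no-op on LE hosts and a swap on BE hosts,
// so the resulting byte order is identical on both.
func memoryOrderBytes(_ leading: UInt64, _ trailing: UInt64) -> [UInt8] {
    withUnsafeBytes(of: (leading.littleEndian, trailing.littleEndian)) {
        Array($0)
    }
}

let leading: UInt64  = 0x0807_0605_0403_0201  // code units 1...8, unit 1 in LSB
let trailing: UInt64 = 0xE000_0000_0000_0009  // unit 9 in LSB, discriminator 0xE0 in MSB
let bytes = memoryOrderBytes(leading, trailing)
// bytes[0] == 0x01 (first code unit), bytes[15] == 0xE0 (discriminator)
```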

mundaym (Author):

We could also go back to the approach I took in 2866b4a and always store _SmallString._storage in little endian byte order. That avoids byte swapping before a read but means that _StringObject and _SmallString have different raw values for the same string. Though this is also true for 32-bit architectures, so maybe we should go down that path. It would have the nice side effect of making withUnsafeBytes always safe on _SmallString values.

Member:

Yeah, I think that's the path we'll want to go down. For 64-bit LE, _StringObject and _SmallString have equivalent layout, but otherwise we need to do extra unpacking / byte-swapping as needed.

That will probably make further maintenance easier as well, as we're more likely to add more readers than creators.

Member:

... and it might give any SIMD code, which we really want to do (on LE platforms at the very least), at least a chance of being portable.

@mundaym force-pushed the s390x-smallstring-v2 branch 2 times, most recently from ba3c7b9 to 8289e16 on January 3, 2019.
@mundaym (Author) commented Jan 3, 2019:

I've reworked this PR to minimize the code changes and to remove the ABI changes. The code units in small strings are now always stored in little-endian byte order rather than being re-ordered when accessed as an array. Please take a look when you get a chance. Thanks!

@milseman (Member) commented Jan 3, 2019:

I’m a little confused. We had said:

We could also go back to the approach I took in 2866b4a and always store _SmallString._storage in little endian byte order. That avoids byte swapping before a read but means that _StringObject and _SmallString have different raw values for the same string. Though this is also true for 32-bit architectures, so maybe we should go down that path. It would have the nice side effect of making withUnsafeBytes always safe on _SmallString values.

My reading of that was that we would byte swap when converting between _SmallString and a _StringObject (analogous to 32-bit), rather than before every read. That would make it so that on any platform, the _SmallString's in-memory representation is that of a 16-byte Array from first to last code unit, with a discriminator+count stored in the last byte. We're much more likely to be extending/enhancing/maintaining readers of the contents of _SmallString than the converter to/from _StringObject.

In that sense, _StringObject stores the contents in little-endian byte order so that the discriminator stays in the high bits available from our 50-bit limited address space. They get swapped when forming the _SmallString struct, wherein they are now stored in native memory order (i.e. pretend it's an Array). This seems less invasive in code as well, and keeps consistency by having _SmallString.rawBits be the raw bits of a _SmallString, rather than byte-swapped.

A read from a String will have a byte-swap somewhere down the line, it just seems the cleanest for that to be during the _StringObject -> _SmallString conversion so that we can have uniform code that accesses the _SmallString, like on 32-bit. This also speeds up code that does multiple operations on the _SmallString by avoiding extra byte swaps. Is there a downside to this approach that I’m missing?
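The conversion boundary described above can be sketched with hypothetical wrapper types (names invented, not the stdlib's): byte-swap exactly once, at the conversion between the object form and the small-string form, so every reader of the small form is uniform across platforms.

```swift
// Hypothetical sketch of swapping only at the conversion boundary.
// ObjectBits: contents in the low bits, discriminator in the high bits.
// SmallBits: contents in memory order (a pretend 16-byte Array).
struct ObjectBits { var leading: UInt64, trailing: UInt64 }
struct SmallBits  { var leading: UInt64, trailing: UInt64 }

func smallBits(from o: ObjectBits) -> SmallBits {
    // Identity on LE hosts, byte swap on BE hosts.
    SmallBits(leading: o.leading.littleEndian,
              trailing: o.trailing.littleEndian)
}

func objectBits(from s: SmallBits) -> ObjectBits {
    ObjectBits(leading: UInt64(littleEndian: s.leading),
               trailing: UInt64(littleEndian: s.trailing))
}

let o = ObjectBits(leading: 0x0807_0605_0403_0201,
                   trailing: 0xE000_0000_0000_0009)
let roundTripped = objectBits(from: smallBits(from: o))
// roundTripped holds the same bits as o on both LE and BE hosts
```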

@mundaym (Author) commented Jan 3, 2019:

In the new code _StringObject is always stored in native byte order (so that the discriminator is in the right place as you mentioned). Then when we create a _SmallString we put the bytes into little-endian order. This happens in the _SmallString constructor rather than in _StringObject but the logic could live in either place. I think it's cleaner this way (only _SmallString needs to know it is in little-endian byte order).

The little-endian byte order of the raw bits in storage means that on big-endian systems the bytes in each 64-bit chunk of the stored tuple are reversed. This is great for operations such as withUTF8 which will now work out of the box. It does, however, add some extra complexity to operations, such as append, that manipulate the stored code units using 64-bit operations. Either we create two code paths, one for big-endian systems and one for little-endian systems, or we convert the tuple into native byte order before manipulating it and convert it back into little-endian byte order. I went for the second approach since it means that the algorithm is the same on all platforms and we only need one code path. The cost of byte swaps is negligible.

Ideally I'd change the data type of the RawBitPattern (like I did in 2866b4a) to encapsulate the endianness of the platform completely, but that changes the ABI. The nice thing about the new code is that it shouldn't have any effect on little-endian platforms (since UInt64.littleEndian is a no-op there), but it is a bit more fragile than the fully encapsulated solution.
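The single-code-path approach described above (convert to native order, manipulate, convert back) can be sketched with an invented helper; both conversions compile to no-ops on little-endian hosts, so the cost is confined to big-endian platforms.

```swift
// Hypothetical helper, not the PR's code: set byte `i` (in memory order)
// of a word whose bytes are stored in little-endian order.
func settingByte(_ storedLE: UInt64, index i: Int, to byte: UInt8) -> UInt64 {
    precondition((0..<8).contains(i))
    var native = UInt64(littleEndian: storedLE)   // to native order (no-op on LE)
    native &= ~(0xFF &<< (8 &* UInt64(i)))        // clear byte i
    native |= UInt64(byte) &<< (8 &* UInt64(i))   // set byte i
    return native.littleEndian                    // back to stored order
}

let stored = settingByte(0, index: 2, to: 0x41)
// Reinterpreting `stored` as little-endian yields 0x41 in byte 2,
// i.e. the value 0x0041_0000, on both LE and BE hosts.
```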

@milseman (Member) commented Jan 4, 2019:

I feel like we’re talking past each other, and it might be due to conflicting terminology. Rather than say “it’s in little endian”, let’s use the term “memory order” to mean the order in memory that it would be laid out in as if it were a 16-byte Array of UInt8 starting at the first code unit and finishing with the discriminator and count. To me, “memory order” seems like it would be equivalent to platform-native endianness, and I think we’re trying to say that _SmallString is in memory order.

We both agree that we do a byte swap on BE platforms when creating a _SmallString from a _StringObject. This puts the discriminator in the LSB of the second word, and the first byte is in the MSB of the first word. I.e., it’s now in memory order. I wouldn't think of this as “little endian”; if anything it seems like _StringObject is the one in little endian because its first code unit is in the LSB of the first word, and the discriminator is in the MSB of the second word.

Back to the code. I find it pretty confusing and brittle to change rawBits to no longer be the raw bits, but rather byte swap away from memory-order. That means that an initializer on _SmallString which wanted to take these raw bits would need to remember to byte swap them before storing them, i.e. they’re not really the raw bits. Furthermore, code such as the appending initializer needs to do another byte-swap to undo the prior byte-swap done by rawBits, and this is all very confusing. All of this byte swapping is mostly just there to facilitate subscript, because subscript uses a helper function that byte-indexes a UInt with 0 meaning the LSB. Wouldn’t it be clearer to instead change _uncheckedGetByte, _uncheckedSetByte, and _bytesToUInt64 to be endianness correct, rather than byte swapping before and after every use? E.g. _uncheckedGetByte uses self.littleEndian rather than self.
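The suggestion above can be sketched as follows (`byteInMemoryOrder` is an invented stand-in for `_uncheckedGetByte`): rather than byte-swapping around every use, the accessor itself reads through the little-endian view of the word, so index 0 always means the byte at the lowest memory address.

```swift
// Hypothetical endianness-correct byte accessor, modeled on the idea of
// _uncheckedGetByte using self.littleEndian rather than self.
extension UInt64 {
    func byteInMemoryOrder(at i: Int) -> UInt8 {
        precondition((0..<8).contains(i))
        // .littleEndian is the identity on LE hosts, a swap on BE hosts,
        // so shifting by 8*i always selects the i-th byte in memory order.
        return UInt8(truncatingIfNeeded: self.littleEndian &>> (8 &* UInt64(i)))
    }
}

let word = UInt64(littleEndian: 0x6463_6261)  // memory bytes: "abcd\0\0\0\0"
// word.byteInMemoryOrder(at: 0) == 0x61 ('a') on both LE and BE hosts
```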

This commit supersedes 2866b4a which was overwritten by 4ab45df.

Store small string code units in little-endian byte order. This way
the code units are in the same order on all machines and can be
safely treated as an array of bytes.
@mundaym force-pushed the s390x-smallstring-v2 branch from 8289e16 to 9587582 on January 7, 2019.
@mundaym (Author) commented Jan 7, 2019:

Thanks for your patience. I've updated the PR with your suggestions. Is it a bit closer to what you had in mind now?

@milseman (Member) left a comment:

This is looking great! Thanks for your patience as well.

Review notes are just details about comments, which I don't mind doing in a followup PR. I'll start testing and benchmarking now.

@@ -74,7 +89,8 @@ extension _SmallString {

@inlinable @inline(__always)
Member:

This is worth a comment stating it's back to _StringObject's notion of a raw discriminated object. I can add that in a followup PR.

mundaym (Author):

Added a brief comment. Happy to drop this additional commit if you'd rather do a comment follow-up yourself.

// Fetches the `i`th byte, from least-significant to most-significant
//
// TODO: endianess awareness day
// Fetches the `i`th byte, from left to right.
@inlinable @inline(__always)
Member:

What does "left to right" mean, memory order / host endianness?

mundaym (Author):

Clarified the comment (memory-order).

@milseman (Member) commented Jan 7, 2019:

@swift-ci please test

@milseman (Member) commented Jan 7, 2019:

@swift-ci please benchmark

@swift-ci (Contributor) commented Jan 7, 2019:

Build failed
Swift Test Linux Platform
Git Sha - 8289e16446b662007beda3ec063fed883e6f75a3

@swift-ci (Contributor) commented Jan 7, 2019:

Build failed
Swift Test OS X Platform
Git Sha - 8289e16446b662007beda3ec063fed883e6f75a3

@swift-ci (Contributor) commented Jan 7, 2019:

Build comment file:

Performance: -O

TEST                                  OLD  NEW  DELTA   RATIO
Improvement
Breadcrumbs.CopyUTF16CodeUnits.Mixed  68   56   -17.6%  1.21x

Performance: -Osize

TEST                                  OLD   NEW   DELTA   RATIO
Regression
CountAlgoString                       1760  1930  +9.7%   0.91x
Improvement
Breadcrumbs.CopyUTF16CodeUnits.Mixed  69    56    -18.8%  1.23x

Performance: -Onone

TEST                  OLD  NEW  DELTA  RATIO
Regression
EqualStringSubstring  47   51   +8.5%  0.92x (?)

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

Clarify the expected behavior of some small string helper functions.
@milseman (Member) commented Jan 8, 2019:

@swift-ci please test

@swift-ci (Contributor) commented Jan 8, 2019:

Build failed
Swift Test OS X Platform
Git Sha - 9587582

@swift-ci (Contributor) commented Jan 8, 2019:

Build failed
Swift Test Linux Platform
Git Sha - 9587582

@mundaym (Author) commented Jan 31, 2019:

@milseman Is there a way to restart the CI tests? I'm not sure what went wrong with them but I don't think it was anything to do with this change. Thanks!

@milseman (Member):

@swift-ci please test os x platform

@milseman (Member):

@swift-ci please test linux platform

@swift-ci (Contributor) commented Feb 1, 2019:

Build failed
Swift Test OS X Platform
Git Sha - 8d4608f

@milseman (Member) commented Feb 4, 2019:

@swift-ci please test os x platform
