[string] Fix string implementation for big endian platforms #21077

Merged: 2 commits, Feb 5, 2019

Conversation

@mundaym (Contributor) commented Dec 6, 2018:

This commit supersedes 2866b4a which was overwritten by 4ab45df.

When code points are stored in multi-byte types (e.g. UInt64) we
need to convert the order of the bytes in the value to little
endian before we can access the value as raw bytes.
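The conversion the commit message describes can be illustrated with a small standalone sketch (invented helper, not the stdlib implementation): `littleEndian` is a no-op on little-endian hosts and a byte swap on big-endian hosts, so converting before taking a raw-bytes view puts the first code unit at offset 0 on every platform.

```swift
// Standalone sketch, not the stdlib code. Code units are packed starting
// at the least-significant byte of the UInt64; converting to little-endian
// byte order before viewing the raw bytes makes the byte order uniform.
func withRawCodeUnits<R>(_ codeUnits: UInt64,
                         _ body: (UnsafeRawBufferPointer) -> R) -> R {
    // .littleEndian is the identity on LE hosts and a byte swap on BE hosts.
    return withUnsafeBytes(of: codeUnits.littleEndian, body)
}

let packed: UInt64 = 0x0000_0000_0063_6261  // "abc": 'a' (0x61) in the LSB
let first = withRawCodeUnits(packed) { $0[0] }
// first == 0x61 ('a') regardless of host endianness
```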

@lorentey lorentey requested a review from milseman December 6, 2018 15:15
@milseman (Member) left a comment:

Why do we want to store SmallString as little endian in memory, and swap on read? Wouldn't we want SmallString to be the in-memory representation on both platforms?

// Give raw, nul-terminated code units. This is only for limited internal
// usage: it always clears the discriminator and count (in case it's full)
@inlinable
internal var zeroTerminatedRawCodeUnits: RawBitPattern {
Member:

Why are you removing this? Wouldn't this simplify your code as well?

Member:

Also, this is an ABI break

@mundaym (Author) commented Dec 7, 2018:

I wasn't sure about this one. I think the new code is easier to understand without this property but I'm happy to reinstate it.

Why does this break the ABI? (EDIT: Ah, I see, because it's inlinable.) Is there any way to know when such breakages occur? This code isn't in a released version of Swift yet, right?

@milseman (Member) commented Dec 7, 2018:

If you run the test suite, you'll get a failure inside the api-digester.

edit: This code is in the 5.0 release branch.

mundaym (Author):

I ran the validation test suite (specifically utils/build-script --release-debuginfo -T) on this PR on a Linux x86-64 box running Ubuntu 18.04 and it passed the tests. Should it have failed or will it only fail in the release branch?

Is the ABI still modifiable at this stage or is it already locked down?

Member:

It is modifiable in case of emergency, and worst case here we just keep this method around.

internal init(_ object: _SmallString) {
self.init(raw: object.rawBits)
}

Member:

What does this init do over self = object?

mundaym (Author):

Nothing, will fix it.

mundaym (Author):

Oh, it does mean the code does an invariant check, whereas an assignment wouldn't, I think.

Member:

Right, if you want to do a final invariant check at the end. Alternatively, we could have the discriminator setter do the check. The discriminator property was added by @lorentey after I wrote much of this code, so there are some potential cleanups like the one you found.

Member:

Adding an invariant check to the discriminator setter could misfire in the ASCII check if the string contents haven't yet been fully updated at the time the discriminator is set.

Member:

Good point. We do generally want to do an invariant check at the very end of creation, and at the very end of a mutation (e.g. verify isASCII after any RRC operation). Dealing with this is why I started to shy away from settable properties in _StringObject and _SmallString in favor of always calling inits, as it's easy to forget to make the check.

Outside the scope of this PR obviously, but we'll want to come up with a general programming pattern and codify it in https://github.com/apple/swift/blob/master/docs/StandardLibraryProgrammersManual.md
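The "check in the init, not in the setter" pattern discussed above can be sketched with a hypothetical type (names and bit layout invented for illustration): construction funnels through one initializer that runs the invariant check only once the value is fully formed.

```swift
// Hypothetical sketch of the pattern, not stdlib code: a setter-based
// check could fire while the value is half-updated, so the check runs
// only at the end of the initializer.
struct Discriminated {
    private(set) var rawBits: UInt64

    init(rawBits: UInt64) {
        self.rawBits = rawBits
        _invariantCheck()  // value is fully formed here
    }

    // Invented layout: discriminator lives in the most-significant byte.
    var discriminator: UInt8 { UInt8(truncatingIfNeeded: rawBits &>> 56) }

    func _invariantCheck() {
        assert(discriminator != 0, "missing discriminator")
    }
}

let d = Discriminated(rawBits: 0xE000_0000_0000_0041)
// d.discriminator == 0xE0
```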

@@ -210,7 +204,10 @@ extension _SmallString {
internal func withUTF8<Result>(
_ f: (UnsafeBufferPointer<UInt8>) throws -> Result
) rethrows -> Result {
var raw = self.zeroTerminatedRawCodeUnits
// Code points need to be in little endian byte order and zero terminated.
Member:

Code units

mundaym (Author):

oops, thanks!

}

@usableFromInline // @testable
internal init?(_ base: _SmallString, appending other: _SmallString) {
let totalCount = base.count + other.count
guard totalCount <= _SmallString.capacity else { return nil }

// TODO(SIMD): The below can be replaced with just be a couple vector ops
Member:

Is this incorrect? Why?

mundaym (Author):

Concatenating two 64-bit words to form one 128-bit word and then treating it as a vector of bytes is likely to cause more endianness issues, so I'd rather we avoided adding SIMD to this file. I also think this code could be made considerably faster without resorting to SIMD (using rotates and logical ops). That would be a better approach, as all platforms could benefit regardless of whether they have the necessary SIMD instructions in the base ISA.
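The shift-and-logical-ops alternative isn't spelled out in the PR; the following single-word sketch (invented helper, hypothetical packing) shows the flavor: with code units packed LSB-first, appending reduces to a shift and an OR, with no per-byte loop and no lane-order assumptions.

```swift
// Hypothetical single-word sketch, not the PR's code. `base` holds
// `baseCount` code units packed starting at the least-significant byte;
// `other`'s units are shifted up past them and OR'd in.
func concatPacked(_ base: UInt64, baseCount: Int, _ other: UInt64) -> UInt64 {
    precondition((0..<8).contains(baseCount))
    return base | (other &<< (8 &* UInt64(baseCount)))
}

let ab: UInt64 = 0x6261        // "ab" packed LSB-first
let c:  UInt64 = 0x63          // "c"
let abc = concatPacked(ab, baseCount: 2, c)
// abc == 0x63_6261, i.e. "abc" packed LSB-first
```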

Member:

Right, but should Linux, macOS, and iOS suffer a several-times performance regression in hot paths to facilitate big-endian systems? At the very least, we would use #if if necessary, assuming SIMD doesn't abstract it away sufficiently.

// Code points need to be in little endian byte order and zero terminated.
let l = self.leadingRawBits
let t = self.trailingRawBits & _StringObject.Nibbles.largeAddressMask
var raw = (l.littleEndian, t.littleEndian)
Member:

Why do we convert the bytes to little endian on read? If the plan is to store the contents in big-endian on big-endian systems, then we don't want to convert on read, right?

Member:

And if not, then would it make more sense to just have a littleEndianBytes computed property?

mundaym (Author):

The trouble is that the discriminator is stored in the most significant byte of the _object pointer. On little endian systems this means the discriminator is stored in the 'last' byte of the string object's memory:

[ x x x x x x x x x x x x x x x d ]

However on big endian systems this means that the discriminator is stored in the middle byte of the string object:

[ x x x x x x x x d x x x x x x x ]

Since the discriminator is in the middle we can only access the first 8 code units with a withUnsafeBytes call if we use big endian byte order. However if we byte swap to little endian byte order before any withUnsafeBytes calls then we can access all 15 code units and so big endian machines can do all the same operations a little endian machine can do. It also means we can add code units to the small string in the same order on both big and little endian platforms which is much cleaner and easier to manage.

We could add a littleEndianBytes computed property but I don't think the code saving is worth the additional obfuscation. We should only ever need to do the little endian conversion before a withUnsafeBytes call and ideally we won't add any more direct calls to that and instead always go through withUTF8 and withMutableCapacity.
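The layout problem described above can be demonstrated with a standalone sketch (invented values, not the real bit patterns): with the discriminator in the most-significant byte of the second word, converting both words to little-endian byte order before taking a raw-bytes view moves the discriminator to byte 15 on every platform, leaving bytes 0...14 contiguous for code units.

```swift
// Sketch only: two 64-bit words viewed as 16 bytes after a little-endian
// conversion. .littleEndian is a no-op on LE hosts and a swap on BE hosts,
// so the resulting byte order is identical on both.
func memoryOrderBytes(_ leading: UInt64, _ trailing: UInt64) -> [UInt8] {
    withUnsafeBytes(of: (leading.littleEndian, trailing.littleEndian)) {
        Array($0)
    }
}

let leading: UInt64  = 0x0807_0605_0403_0201  // code units 1...8, unit 1 in LSB
let trailing: UInt64 = 0xE000_0000_0000_0009  // unit 9 in LSB, discriminator 0xE0 in MSB
let bytes = memoryOrderBytes(leading, trailing)
// bytes[0] == 0x01 (first code unit), bytes[15] == 0xE0 (discriminator)
```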

mundaym (Author):

We could also go back to the approach I took in 2866b4a and always store _SmallString._storage in little endian byte order. That avoids byte swapping before a read but means that _StringObject and _SmallString have different raw values for the same string. Though this is also true for 32-bit architectures, so maybe we should go down that path. It would have the nice side effect of making withUnsafeBytes always safe on _SmallString values.

Member:

Yeah, I think that's the path we'll want to go down. For 64-bit LE, _StringObject and _SmallString have equivalent layout, but otherwise we need to do extra unpacking / byte-swapping as needed.

That will probably make further maintenance easier as well, as we're more likely to add more readers than creators.

Member:

... and it might give any SIMD code, which we really want to do (on LE platforms at the very least), at least a chance of being portable.

@mundaym force-pushed the s390x-smallstring-v2 branch 2 times, most recently from ba3c7b9 to 8289e16 on January 3, 2019.
@mundaym (Author) commented Jan 3, 2019:

I've reworked this PR to minimize the code changes and to remove the ABI changes. The code units in small strings are now always stored in little-endian byte order rather than being re-ordered when accessed as an array. Please take a look when you get a chance. Thanks!

@milseman (Member) commented Jan 3, 2019:

I’m a little confused. We had said:

We could also go back to the approach I took in 2866b4a and always store _SmallString._storage in little endian byte order. That avoids byte swapping before a read but means that _StringObject and _SmallString have different raw values for the same string. Though this is also true for 32-bit architectures, so maybe we should go down that path. It would have the nice side effect of making withUnsafeBytes always safe on _SmallString values.

My reading of that was that we would byte swap when converting between _SmallString and a _StringObject (analogous to 32-bit), rather than before every read. That would make it so that on any platform, the _SmallString's in-memory representation is that of a 16-byte Array from first to last code unit, with a discriminator+count stored in the last byte. We're much more likely to be extending/enhancing/maintaining readers of the contents of _SmallString than the converter to/from _StringObject.

In that sense, _StringObject stores the contents in little-endian byte order so that the discriminator stays in the high bits available from our 50-bit limited address space. They get swapped when forming the _SmallString struct, wherein they are now stored in native memory order (i.e. pretend it's an Array). This seems less invasive in code as well, and keeps consistency by having _SmallString.rawBits be the raw bits of a _SmallString, rather than byte-swapped.

A read from a String will have a byte-swap somewhere down the line, it just seems the cleanest for that to be during the _StringObject -> _SmallString conversion so that we can have uniform code that accesses the _SmallString, like on 32-bit. This also speeds up code that does multiple operations on the _SmallString by avoiding extra byte swaps. Is there a downside to this approach that I’m missing?
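The conversion boundary described above can be sketched with hypothetical wrapper types (names invented, not the stdlib's): byte-swap exactly once, at the conversion between the object form and the small-string form, so every reader of the small form is uniform across platforms.

```swift
// Hypothetical sketch of swapping only at the conversion boundary.
// ObjectBits: contents in the low bits, discriminator in the high bits.
// SmallBits: contents in memory order (a pretend 16-byte Array).
struct ObjectBits { var leading: UInt64, trailing: UInt64 }
struct SmallBits  { var leading: UInt64, trailing: UInt64 }

func smallBits(from o: ObjectBits) -> SmallBits {
    // Identity on LE hosts, byte swap on BE hosts.
    SmallBits(leading: o.leading.littleEndian,
              trailing: o.trailing.littleEndian)
}

func objectBits(from s: SmallBits) -> ObjectBits {
    ObjectBits(leading: UInt64(littleEndian: s.leading),
               trailing: UInt64(littleEndian: s.trailing))
}

let o = ObjectBits(leading: 0x0807_0605_0403_0201,
                   trailing: 0xE000_0000_0000_0009)
let roundTripped = objectBits(from: smallBits(from: o))
// roundTripped holds the same bits as o on both LE and BE hosts
```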

@mundaym (Author) commented Jan 3, 2019:

In the new code _StringObject is always stored in native byte order (so that the discriminator is in the right place as you mentioned). Then when we create a _SmallString we put the bytes into little-endian order. This happens in the _SmallString constructor rather than in _StringObject but the logic could live in either place. I think it's cleaner this way (only _SmallString needs to know it is in little-endian byte order).

The little-endian byte order of the raw bits in storage means that on big-endian systems the bytes in each 64-bit chunk of the stored tuple are reversed. This is great for operations such as withUTF8 which will now work out of the box. It does, however, add some extra complexity to operations, such as append, that manipulate the stored code units using 64-bit operations. Either we create two code paths, one for big-endian systems and one for little-endian systems, or we convert the tuple into native byte order before manipulating it and convert it back into little-endian byte order. I went for the second approach since it means that the algorithm is the same on all platforms and we only need one code path. The cost of byte swaps is negligible.

Ideally I'd change the data type of the RawBitPattern (like I did in 2866b4a) to encapsulate the endianness of the platform completely, but that changes the ABI. The nice thing about the new code is that it shouldn't have any effect on little-endian platforms (since UInt64.littleEndian is a no-op there), but it is a bit more fragile than the fully encapsulated solution.
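The single-code-path approach described above (convert to native order, manipulate, convert back) can be sketched with an invented helper; both conversions compile to no-ops on little-endian hosts, so the cost is confined to big-endian platforms.

```swift
// Hypothetical helper, not the PR's code: set byte `i` (in memory order)
// of a word whose bytes are stored in little-endian order.
func settingByte(_ storedLE: UInt64, index i: Int, to byte: UInt8) -> UInt64 {
    precondition((0..<8).contains(i))
    var native = UInt64(littleEndian: storedLE)   // to native order (no-op on LE)
    native &= ~(0xFF &<< (8 &* UInt64(i)))        // clear byte i
    native |= UInt64(byte) &<< (8 &* UInt64(i))   // set byte i
    return native.littleEndian                    // back to stored order
}

let stored = settingByte(0, index: 2, to: 0x41)
// Reinterpreting `stored` as little-endian yields 0x41 in byte 2,
// i.e. the value 0x0041_0000, on both LE and BE hosts.
```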

@milseman (Member) commented Jan 4, 2019:

I feel like we’re talking past each other, and it might be due to conflicting terminology. Rather than say “it’s in little endian”, let’s use the term “memory order” to mean the order in memory that it would be laid out in as if it were a 16-byte Array of UInt8 starting at the first code unit and finishing with the discriminator and count. To me, “memory order” seems like it would be equivalent to platform-native endianness, and I think we’re trying to say that _SmallString is in memory order.

We both agree that we do a byte swap on BE platforms when creating a _SmallString from a _StringObject. This puts the discriminator in the LSB of the second word, and the first byte is in the MSB of the first word. I.e., it’s now in memory order. I wouldn't think of this as “little endian”; if anything it seems like _StringObject is the one in little endian because its first code unit is in the LSB of the first word, and the discriminator is in the MSB of the second word.

Back to the code. I find it pretty confusing and brittle to change rawBits to no longer be the raw bits, but rather byte swap away from memory-order. That means that an initializer on _SmallString which wanted to take these raw bits would need to remember to byte swap them before storing them, i.e. they’re not really the raw bits. Furthermore, code such as the appending initializer needs to do another byte-swap to undo the prior byte-swap done by rawBits, and this is all very confusing. All of this byte swapping is mostly just there to facilitate subscript, because subscript uses a helper function that byte-indexes a UInt with 0 meaning the LSB. Wouldn’t it be clearer to instead change _uncheckedGetByte, _uncheckedSetByte, and _bytesToUInt64 to be endianness correct, rather than byte swapping before and after every use? E.g. _uncheckedGetByte uses self.littleEndian rather than self.
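The suggestion above can be sketched as follows (`byteInMemoryOrder` is an invented stand-in for `_uncheckedGetByte`): rather than byte-swapping around every use, the accessor itself reads through the little-endian view of the word, so index 0 always means the byte at the lowest memory address.

```swift
// Hypothetical endianness-correct byte accessor, modeled on the idea of
// _uncheckedGetByte using self.littleEndian rather than self.
extension UInt64 {
    func byteInMemoryOrder(at i: Int) -> UInt8 {
        precondition((0..<8).contains(i))
        // .littleEndian is the identity on LE hosts, a swap on BE hosts,
        // so shifting by 8*i always selects the i-th byte in memory order.
        return UInt8(truncatingIfNeeded: self.littleEndian &>> (8 &* UInt64(i)))
    }
}

let word = UInt64(littleEndian: 0x6463_6261)  // memory bytes: "abcd\0\0\0\0"
// word.byteInMemoryOrder(at: 0) == 0x61 ('a') on both LE and BE hosts
```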

This commit supersedes 2866b4a which was overwritten by 4ab45df.

Store small string code units in little-endian byte order. This way
the code units are in the same order on all machines and can be
safely treated as an array of bytes.
@mundaym force-pushed the s390x-smallstring-v2 branch from 8289e16 to 9587582 on January 7, 2019.
@mundaym (Author) commented Jan 7, 2019:

Thanks for your patience. I've updated the PR with your suggestions. Is it a bit closer to what you had in mind now?

@milseman (Member) left a comment:

This is looking great! Thanks for your patience as well.

Review notes are just details about comments, which I don't mind doing in a followup PR. I'll start testing and benchmarking now.

@@ -74,7 +89,8 @@ extension _SmallString {

@inlinable @inline(__always)
Member:

This is worth a comment stating it's back to _StringObject's notion of a raw discriminated object. I can add that in a followup PR.

mundaym (Author):

Added a brief comment. Happy to drop this additional commit if you'd rather do a comment follow-up yourself.

// Fetches the `i`th byte, from least-significant to most-significant
//
// TODO: endianess awareness day
// Fetches the `i`th byte, from left to right.
@inlinable @inline(__always)
Member:

What does "left to right" mean, memory order / host endianness?

mundaym (Author):

Clarified the comment (memory-order).

@milseman (Member) commented Jan 7, 2019:

@swift-ci please test

@milseman (Member) commented Jan 7, 2019:

@swift-ci please benchmark

@swift-ci (Contributor) commented Jan 7, 2019:

Build failed
Swift Test Linux Platform
Git Sha - 8289e16446b662007beda3ec063fed883e6f75a3

@swift-ci (Contributor) commented Jan 7, 2019:

Build failed
Swift Test OS X Platform
Git Sha - 8289e16446b662007beda3ec063fed883e6f75a3

@swift-ci (Contributor) commented Jan 7, 2019:

Build comment file:

Performance: -O

TEST                                  OLD  NEW  DELTA   RATIO
Improvement
Breadcrumbs.CopyUTF16CodeUnits.Mixed  68   56   -17.6%  1.21x

Performance: -Osize

TEST                                  OLD   NEW   DELTA   RATIO
Regression
CountAlgoString                       1760  1930  +9.7%   0.91x
Improvement
Breadcrumbs.CopyUTF16CodeUnits.Mixed  69    56    -18.8%  1.23x

Performance: -Onone

TEST                  OLD  NEW  DELTA  RATIO
Regression
EqualStringSubstring  47   51   +8.5%  0.92x (?)

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

Clarify the expected behavior of some small string helper functions.
@milseman (Member) commented Jan 8, 2019:

@swift-ci please test

@swift-ci (Contributor) commented Jan 8, 2019:

Build failed
Swift Test OS X Platform
Git Sha - 9587582

@swift-ci (Contributor) commented Jan 8, 2019:

Build failed
Swift Test Linux Platform
Git Sha - 9587582

@mundaym (Author) commented Jan 31, 2019:

@milseman Is there a way to restart the CI tests? I'm not sure what went wrong with them but I don't think it was anything to do with this change. Thanks!

@milseman (Member):

@swift-ci please test os x platform

@milseman (Member):

@swift-ci please test linux platform

@swift-ci (Contributor) commented Feb 1, 2019:

Build failed
Swift Test OS X Platform
Git Sha - 8d4608f

@milseman (Member) commented Feb 4, 2019:

@swift-ci please test os x platform
