Skip to content

[stdlib][SR-14424] Zero-initialize any unused capacity in '_SmallString.init(initializingUTF8With:)' #36663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

xwu
Copy link
Collaborator

@xwu xwu commented Mar 30, 2021

There's a subtle problem with small strings:

let str = String(unsafeUninitializedCapacity: 2) {
    let last = $0.count &- 1
    $0[last] = 49 // "1"
    $0.baseAddress!.moveInitialize(from: $0.baseAddress! + last, count: 1)
    return 1
}
print(str) // "1"
print(str == "1") // false (!)

Here's why.

The unsafeUninitializedCapacity initializer requires users to return the initialized count and to leave the remaining bytes uninitialized. This is what's happening here. To wit:

  • Bytes are written from the end of the buffer towards the start as the string representation is being built up.
  • Once the final byte sequence for the string has been obtained, it is move-initialized to the start of the buffer (as per discussions on the forums, it is permitted to use moveInitialize with overlapping source and destination in this way).
  • This leaves the unused bytes uninitialized, as required by the documentation.

All of this works perfectly when the string isn't small. Digging through the sources, I understand that a _SmallString's buffer is backed by a tuple of (UInt64, UInt64). Although unused bytes are deinitialized here, they aren't zero immediately prior to deinitialization.

It has to be the job of _SmallString to initialize unused bytes to zero before rebinding the buffer to its original type. The user can't leave the unused capacity zero-initialized themselves after using it because the documentation tells the user that they should leave the unused capacity uninitialized, and therefore leaving it initialized to zero may not be what a non-small string expects.

To fix this bug, we simply zero-initialize the unused capacity before rebinding the memory back to the type of self._storage.

Resolves SR-14424.

@xwu
Copy link
Collaborator Author

xwu commented Mar 30, 2021

cc @milseman

@xwu
Copy link
Collaborator Author

xwu commented Mar 30, 2021

@swift-ci smoke test macOS platform

@xwu xwu force-pushed the smol-unused-capacity branch from 4d884f0 to df532ba Compare March 30, 2021 22:24
@xwu
Copy link
Collaborator Author

xwu commented Mar 30, 2021

@swift-ci smoke test macOS platform

@xwu
Copy link
Collaborator Author

xwu commented Mar 31, 2021

@swift-ci smoke test Linux platform

@milseman
Copy link
Member

milseman commented Mar 31, 2021

@xwu I think this is a better fix: #36667. If you want credit for it, feel free to cherry-pick that commit here and add your test case to its test cases, or vice versa.

Putting the fix in the common funnel function is better at consistently enforcing this invariant. That PR also has an assertion inside _invariantCheck() so that any test cases will check the invariant.

@xwu
Copy link
Collaborator Author

xwu commented Mar 31, 2021

I do like your fix better! I’ll defer to your PR, then add this test case afterwards.

@xwu xwu closed this Apr 3, 2021
@xwu xwu deleted the smol-unused-capacity branch April 3, 2021 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants