Skip to content

Make sure we convert the source to a native UTF-8 string before parsing #251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 2, 2020

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Dec 1, 2020

It appears like the compiler is now optimizing the self += "" away in release builds. Furthermore, we didn't catch the issue since the assertion checking for a native UTF-8 string also gets optimized away in release builds.

Fix the issue in two ways:

  1. Make the assertion a precondition that also fails in release builds
  2. Use the newly (as of Swift 5.1) provided standard library APIs to make the String a contiguous UTF-8 string

rdar://71863757

@ahoppen ahoppen requested a review from akyrtzi December 1, 2020 16:55
@ahoppen
Copy link
Member Author

ahoppen commented Dec 1, 2020

@swift-ci Please test

@akyrtzi
Copy link
Contributor

akyrtzi commented Dec 1, 2020

@milseman do you maybe have another recommendation for forcing a native UTF8 string?
We also use this:

  var isNativeUTF8: Bool {
    return utf8.withContiguousStorageIfAvailable { _ in 0 } != nil
  }

@milseman
Copy link
Member

milseman commented Dec 1, 2020

Are you able to use Swift 5.1? SE-0247 Contiguous Strings provides the preferred API on String: var isContiguousUTF8: Bool and mutating func makeContiguousUTF8().

@milseman
Copy link
Member

milseman commented Dec 1, 2020

If you cannot use Swift 5.1, let's come up with a solution that doesn't assemble and then disassemble grapheme clusters. Have you tried something like self = String(decoding: self.utf16, as: UTF16.self) if you know it's a bridged fully-UTF-16 Cocoa string? You can compare that performance to self = String(decoding: self.utf8, as: UTF8.self).

@akyrtzi
Copy link
Contributor

akyrtzi commented Dec 1, 2020

We can use Swift 5.1, thank you!

Copy link
Contributor

@akyrtzi akyrtzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the String APIs that are provided for this kind of functionality.

It appears like the compiler is now optimizing the `self += ""` away in
release builds. Furthermore, we didn't catch the issue since the
assertion checking for a native UTF-8 string also gets optimized away in
release builds.

Fix the issue in two ways:
1. Make the assertion a precondition that also fails in release builds
2. Use the newly (as of Swift 5.1) provided standard library APIs to
   make the String a contiguous UTF-8 string
@ahoppen
Copy link
Member Author

ahoppen commented Dec 2, 2020

Thanks for the pointer @milseman, that’s a much cleaner solution.

@swift-ci Please test.

@swift-ci
Copy link
Contributor

swift-ci commented Dec 2, 2020

Build failed
Swift Test Linux Platform
Git Sha - b3b1f79

@swift-ci
Copy link
Contributor

swift-ci commented Dec 2, 2020

Build failed
Swift Test OS X Platform
Git Sha - b3b1f79

@akyrtzi akyrtzi merged commit ce90205 into swiftlang:main Dec 2, 2020
@ahoppen ahoppen deleted the utf8-string branch January 14, 2023 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants