[stdlib] Rewrite UTF8._isValidUTF8() #1477
@swift-ci Please test
@PatrickPijnappel There's a build failure, would you mind taking a look?
@gribozavr My bad! Will take a look and resolve.
// Require 10xx xxxx 110x xxxx.
if buffer & 0xc0e0 != 0x80c0 { return false }
// Disallow xxxx xxxx xxx0 000x (<= 7 bits case).
if buffer & 0x001e == 0x0000 { return false }
Sorry, I don't understand this case. I think you meant to test against 0x1f00 instead of 0x001e.
The bytes come in reverse order. Never mind.
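To make the byte order concrete, here is a minimal sketch of the two-byte check in isolation. It assumes the first code unit sits in the lowest 8 bits of the buffer, which is what "reverse order" means here. The names `isValidTwoByteSequence` and `pack` are hypothetical helpers for illustration, not part of the standard library, and the sketch uses current Swift syntax rather than the syntax in the diff.

```swift
// Hypothetical helper, not the stdlib implementation: checks only the
// two-byte UTF-8 form and ignores the upper 16 bits of `buffer`.
// Assumption: the lead byte is in bits 0-7, the continuation byte in bits 8-15.
func isValidTwoByteSequence(_ buffer: UInt32) -> Bool {
    // Require 10xx xxxx (continuation byte) and 110x xxxx (lead byte).
    if buffer & 0xc0e0 != 0x80c0 { return false }
    // Disallow lead bytes 0xC0 and 0xC1: their payload bits xxx0 000x mean
    // the scalar fits in 7 bits, so the encoding would be overlong.
    if buffer & 0x001e == 0x0000 { return false }
    return true
}

// Hypothetical helper: packs code units so the first one lands in the lowest byte.
func pack(_ units: [UInt8]) -> UInt32 {
    var buffer: UInt32 = 0
    for (i, unit) in units.enumerated() {
        buffer |= UInt32(unit) << UInt32(8 * i)
    }
    return buffer
}

let eAcute = pack([0xc3, 0xa9])    // U+00E9, well-formed
let overlong = pack([0xc0, 0x80])  // overlong encoding of U+0000
let badTrail = pack([0xc3, 0x28])  // second byte is not a continuation byte
print(isValidTwoByteSequence(eAcute), isValidTwoByteSequence(overlong), isValidTwoByteSequence(badTrail))
```

With this layout the mask 0x001e selects bits of the lead byte, so it is the right mask for the overlong check; 0x1f00 would instead select bits of the continuation byte.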
@PatrickPijnappel This is brilliant! Please fix the build issue, and I'll run the benchmarks.
@PatrickPijnappel Great stuff! 👍
This is a replacement for usages of UTF8._numTrailingBytes(). Note that the sanityCheck was redundant at both call sites.
The checks are technically different (previous check only rejected malformed initial code units, not all malformed sequences). Which is more correct is debatable, but since _buffer is only filled by transcoding from UTF-16 it should always be well-formed anyway and the difference is not very relevant.
Replaces the tests for the removed _numTrailingBytes()
@gribozavr OK, fixed the issues and added some validation tests as well!
@swift-ci Please test
}
}
return true
public static func _isValidUTF8(buffer: UInt32) -> Bool {
Please use public // @testable (like we do in other places), for documentation purposes, and to make it easy to fix up the code when @testable works for the standard library.
Added
@swift-ci Please test
Running benchmarks.
I'm seeing >10% improvements for ErrorHandling, NSError, NSStringConversion, and SevenBoom. @PatrickPijnappel If you have a targeted microbenchmark, feel free to contribute it to a new file under
@gribozavr Added a UTF-8 benchmark (#1493). I'm in the process of simplifying/optimizing the other parts of UTF-8 decoding so it'll be useful to have a benchmark!
What's in this pull request?
A rewrite of UTF8._isValidUTF8(), which further improves performance (mainly by removing branches) and reduces code size.
Tested against the original implementation for all input values (0...0xffffffff); results are identical.
Before merging this pull request to apple/swift repository:
Triggering Swift CI
The swift-ci is triggered by writing a comment on this PR addressed to the github user @swift-ci. Different tests will run depending on the specific comment that you use. The currently available comments are:
Smoke Testing
Validation Testing
Note: Only members of the Apple organization can trigger swift-ci.
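The description above says the rewrite was compared against the original implementation for every input in 0...0xffffffff. A minimal sketch of that style of exhaustive comparison is shown here, restricted to the two-byte form and to the range 0...0xffff so it runs quickly. `referenceCheck`, `candidateCheck`, and `firstMismatch` are hypothetical names for illustration; neither function is the actual old or new `_isValidUTF8()`.

```swift
// Hypothetical reference: a plain byte-by-byte check of the two-byte form.
// Assumption: lead byte in bits 0-7, continuation byte in bits 8-15.
func referenceCheck(_ buffer: UInt32) -> Bool {
    let lead = UInt8(buffer & 0xff)
    let trail = UInt8((buffer >> 8) & 0xff)
    // Lead bytes 0xC2...0xDF start a well-formed two-byte sequence.
    guard lead >= 0xc2 && lead <= 0xdf else { return false }
    // The second byte must be a continuation byte, 0x80...0xBF.
    return trail >= 0x80 && trail <= 0xbf
}

// Hypothetical candidate: the mask-based checks from the diff excerpt.
func candidateCheck(_ buffer: UInt32) -> Bool {
    if buffer & 0xc0e0 != 0x80c0 { return false }
    if buffer & 0x001e == 0x0000 { return false }
    return true
}

// Returns the first input on which the two checks disagree, or nil if none.
func firstMismatch(in range: ClosedRange<UInt32>) -> UInt32? {
    for value in range where referenceCheck(value) != candidateCheck(value) {
        return value
    }
    return nil
}

if let mismatch = firstMismatch(in: 0...0xffff) {
    print("Mismatch at", String(mismatch, radix: 16))
} else {
    print("No mismatch found")
}
```

Widening the range to 0...0xffffffff follows the same pattern but takes correspondingly longer to run.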