-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[stdlib]String normalization functions #21026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please test |
Build failed |
Build failed |
@swift-ci please test |
Build failed |
@swift-ci please test |
@swift-ci please test Linux platform |
Build failed |
@swift-ci please test |
Build failed |
@swift-ci please benchmark |
Build failed |
Build comment file:Performance: -O
Performance: -Osize
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick, mostly cosmetic review. I'll do a more in-depth one after you're done with some of your in-flight changes
_ outputBuffer: UnsafeMutableBufferPointer<UInt8> | ||
) -> (Int, Int) { | ||
// Quick check if a scalar is NFC and a segment starter | ||
@inline(__always) func isNFCStarter(_ scalar: Unicode.Scalar) -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already exists in StringNormalization.swift
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still exists in StringNormalization.swift
|
||
var outputBufferThreshold: Int { | ||
return outputBuffer.count - 4 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this computed? It's not going to change.
return (inputCount, outputCount) | ||
} | ||
|
||
internal func transcodeToUTF16( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is specifically transcoding a segment. Please adjust the name to reflect that and add a comment, otherwise it looks like a general purpose transcoding function.
outputBuffer: UnsafeMutableBufferPointer<UInt8>, | ||
icuInputBuffer: UnsafeMutableBufferPointer<UInt16>, | ||
icuOutputBuffer: UnsafeMutableBufferPointer<UInt16> | ||
) -> NormalizationResult { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
return (index.encodedOffset - readIndex.encodedOffset, writeIndex) | ||
} | ||
|
||
private func sharedNormalize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared meaning shared strings, or shared code, or what? Please add comments to these methods. If this is a core common implementation method, Impl
or is probably better than Shared
@@ -23,6 +23,7 @@ internal func _isTrailingSurrogate(_ cu: UInt16) -> Bool { | |||
return cu & _surrogateMask == _trailingSurrogateBias | |||
} | |||
@inline(__always) | |||
@usableFromInline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
@@ -84,6 +85,23 @@ internal func _decodeUTF8( | |||
return Unicode.Scalar(_unchecked: value) | |||
} | |||
|
|||
@inlinable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this inlinable?
@@ -148,6 +166,7 @@ internal func _continuationPayload(_ x: UInt8) -> UInt32 { | |||
} | |||
|
|||
@inline(__always) | |||
@usableFromInline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
@swift-ci please benchmark |
@swift-ci please test |
@swift-ci please benchmark |
@swift-ci please test OS X platform |
Build failed |
Build comment file:Performance: -O
Performance: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
normalize = fastNormalize | ||
} else { | ||
normalize = foreignNormalize | ||
var bufferCleanup: (() -> ())? = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, why not just a Bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like all this logic could be refactored into a helper function that took the buffers inout
and told the caller if they're responsible for calling deallocate
.
@@ -219,7 +218,7 @@ internal func transcodeToUTF16( | |||
return (readIndex, writeIndex) | |||
} | |||
|
|||
internal func transcodeToUTF8( | |||
private func transcodeToUTF8( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment?
_ sourceBuffer: UnsafeBufferPointer<UInt8>, | ||
_ outputBuffer: UnsafeMutableBufferPointer<UInt8>, | ||
_ icuInputBuffer: UnsafeMutableBufferPointer<UInt16>, | ||
_ icuOutputBuffer: UnsafeMutableBufferPointer<UInt16> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think argument labels are necessary here, otherwise it's hard to know what buffer to pass where.
E.g. _fastNormalized(startingAt:from:into:icuInputBuffer:icuOutputBuffer:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
let sourceCount = sourceBuffer.count - readIndex.encodedOffset | ||
let start = readIndex.encodedOffset | ||
let rebasedSourceBuffer = UnsafeBufferPointer(rebasing: sourceBuffer[start...]) | ||
do { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scopes the returns for fastFill so I don't have to come up with some other name for read/filled. Also prevents me from accidentally using the wrong variable later on.
@swift-ci please benchmark |
Build comment file:Performance: -O
Performance: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please benchmark |
@swift-ci please test |
@swift-ci please benchmark |
@swift-ci please test |
@swift-ci please test |
Build failed |
Build failed |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please benchmark |
@swift-ci please test |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please benchmark |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
@swift-ci please test |
Build failed |
Build failed |
@swift-ci please test OS X platform |
Build failed |
@swift-ci please test |
Build failed |
Build failed |
@@ -114,11 +114,11 @@ let codeUnitNormalizationTests: [([UInt8], String)] = [ | |||
"몌̴ᆮ닮돈a֮𖬴̀̕b됆섨⽱쀪ﺏ탅" | |||
), | |||
( | |||
[0xc3, 0xa0, 0xd6, 0xae, 0xf0, 0x91, 0x8d, 0xb3, 0xcc, 0x95, 0x62, 0xe1, 0xb6, 0xb6, 0xec, 0x93, 0xa1, 0xed, 0x83, 0x97, 0xec, 0xa4, 0x80, 0xec, 0xa0, 0x95, 0xef, 0xbb, 0xaa, 0xec, 0x82, 0x90, 0xcc, 0xb4, 0xe1, 0x86, 0xb5, 0xed, 0x84, 0xa3, 0xed, 0x8b, 0xad, 0xeb, 0x9c, 0xaf, 0xef, 0xb6, 0x8e, 0xe3, 0x8e, 0xa1, 0xef, 0xb6, 0x94, 0xe3, 0x8e, 0x9a, 0xe2, 0x85, 0xef, 0xb0, 0x92, 0xeb, 0x83, 0xa8, 0xed, 0x9d, 0x94, 0xf0, 0x9f, 0x88, 0xb9], | |||
[0xc3, 0xa0, 0xd6, 0xae, 0xf0, 0x91, 0x8d, 0xb3, 0xcc, 0x95, 0x62, 0xe1, 0xb6, 0xb6, 0xec, 0x93, 0xa1, 0xed, 0x83, 0x97, 0xec, 0xa4, 0x80, 0xec, 0xa0, 0x95, 0xef, 0xbb, 0xaa, 0xec, 0x82, 0x90, 0xcc, 0xb4, 0xe1, 0x86, 0xb5, 0xed, 0x84, 0xa3, 0xed, 0x8b, 0xad, 0xeb, 0x9c, 0xaf, 0xef, 0xb6, 0x8e, 0xe3, 0x8e, 0xa1, 0xef, 0xb6, 0x94, 0xe3, 0x8e, 0x9a, 0xe2, 0x85, 0xb3, 0xef, 0xb0, 0x92, 0xeb, 0x83, 0xa8, 0xed, 0x9d, 0x94, 0xf0, 0x9f, 0x88, 0xb9], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed?
"à֮𑍳̕bᶶ쓡탗준정ﻪ삐̴ᆵ턣틭뜯ﶎ㎡ﶔ㎚ⅳﰒ냨흔🈹" | ||
), | ||
( | ||
[0xeb, 0xb3, 0x95, 0xf0, 0xa6, 0xbc, 0xac, 0xef, 0xbc, 0xba, 0xce, 0xa9, 0xec, 0xb2, 0x84, 0xe7, 0xb3, 0x96, 0xed, 0x91, 0xa0, 0xc3, 0xa0, 0xd6, 0xae, 0xe2, 0xb7, 0xb9, 0xcc, 0x95, 0x62, 0xec, 0xbc, 0x98, 0xe5, 0x8b, 0x9e, 0xec, 0x82, 0x9a, 0xed, 0x88, 0x92, 0xed, 0x87, 0x91, 0xec, 0xaf, 0x99, 0xe6, 0x86, 0xaf, 0xe2, 0xb5, 0xaf, 0xec, 0x9d, 0x8c, 0xe9, 0xb6, 0xe2, 0x93, 0xa2, 0xe2, 0x82, 0x84], | ||
[0xeb, 0xb3, 0x95, 0xf0, 0xa6, 0xbc, 0xac, 0xef, 0xbc, 0xba, 0xce, 0xa9, 0xec, 0xb2, 0x84, 0xe7, 0xb3, 0x96, 0xed, 0x91, 0xa0, 0xc3, 0xa0, 0xd6, 0xae, 0xe2, 0xb7, 0xb9, 0xcc, 0x95, 0x62, 0xec, 0xbc, 0x98, 0xe5, 0x8b, 0x9e, 0xec, 0x82, 0x9a, 0xed, 0x88, 0x92, 0xed, 0x87, 0x91, 0xec, 0xaf, 0x99, 0xe6, 0x86, 0xaf, 0xe2, 0xb5, 0xaf, 0xec, 0x9d, 0x8c, 0xe9, 0xb6, 0xb4, 0xe2, 0x93, 0xa2, 0xe2, 0x82, 0x84], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed?
@@ -134,323 +134,323 @@ let codeUnitNormalizationTests: [([UInt8], String)] = [ | |||
"쁏音MW삢썍곛뻏쩄뻍겔ᄇ̴ᅪ킾a〪᳟̖֚b죆퉆Ṑﰆ뒳킟저̴ᆼ" | |||
), | |||
( | |||
[0xec, 0x8c, 0x9d, 0xe6, 0x86, 0x8e, 0xec, 0xb0, 0x9a, 0xec, 0xba, 0xb0, 0xe1, 0xb6, 0xb0, 0xc3, 0xa0, 0xd6, 0xae, 0xe1, 0xaa, 0xb4, 0xcc, 0x95, 0x62, 0xec, 0xac, 0x9e, 0xec, 0xa7, 0xb9, 0xea, 0xbe, 0x89, 0xef, 0xad, 0xab, 0xd7, 0x9c, 0xd6, 0xbc, 0xe2, 0x85, 0x85, 0xeb, 0x94, 0x93, 0xe1, 0xbd, 0x92, 0xec, 0xaa, 0xa3, 0xeb, 0xab, 0x85, 0xec, 0xb7, 0x8c, 0xea, 0xb9, 0xa9, 0xec, 0xab, 0x8d, 0xec, 0x8c, 0x84, 0xe5, 0x8f, 0xa5, 0xeb, 0x89, 0x85, 0xeb, 0xed, 0x91, 0xa4, 0xeb, 0x9b, 0x95, 0xe3, 0x89, 0xa5, 0x61, 0xd6, 0xae, 0xf0, 0x9e, 0x80, 0x9d, 0xcc, 0x80, 0xcc, 0x95, 0x62, 0xe4, 0x84, 0xaf, 0xc8, 0x9f, 0xec, 0xba, 0x9b], | |||
[0xec, 0x8c, 0x9d, 0xe6, 0x86, 0x8e, 0xec, 0xb0, 0x9a, 0xec, 0xba, 0xb0, 0xe1, 0xb6, 0xb0, 0xc3, 0xa0, 0xd6, 0xae, 0xe1, 0xaa, 0xb4, 0xcc, 0x95, 0x62, 0xec, 0xac, 0x9e, 0xec, 0xa7, 0xb9, 0xea, 0xbe, 0x89, 0xef, 0xad, 0xab, 0xd7, 0x9c, 0xd6, 0xbc, 0xe2, 0x85, 0x85, 0xeb, 0x94, 0x93, 0xe1, 0xbd, 0x92, 0xec, 0xaa, 0xa3, 0xeb, 0xab, 0x85, 0xec, 0xb7, 0x8c, 0xea, 0xb9, 0xa9, 0xec, 0xab, 0x8d, 0xec, 0x8c, 0x84, 0xe5, 0x8f, 0xa5, 0xeb, 0x89, 0x85, 0xeb, 0x8c, 0xa6, 0xed, 0x91, 0xa4, 0xeb, 0x9b, 0x95, 0xe3, 0x89, 0xa5, 0x61, 0xd6, 0xae, 0xf0, 0x9e, 0x80, 0x9d, 0xcc, 0x80, 0xcc, 0x95, 0x62, 0xe4, 0x84, 0xaf, 0xc8, 0x9f, 0xec, 0xba, 0x9b], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the rest whitespace changes?
Michael approved in person |
* fast/foreignNormalize functions
rdar://problem/45443666