Skip to content

[stdlib] Implement native Unicode.Scalar binary properties #39597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 15, 2021

Conversation

Azoy
Copy link
Contributor

@Azoy Azoy commented Oct 5, 2021

This adds data tables for all binary properties for every scalar removing yet another function dependency on ICU. These binary property data tables are barely under 21KB.

@Azoy Azoy requested a review from milseman October 5, 2021 21:52
@Azoy
Copy link
Contributor Author

Azoy commented Oct 5, 2021

@swift-ci please test

@Azoy
Copy link
Contributor Author

Azoy commented Oct 5, 2021

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Oct 5, 2021

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
RemoveWhereMoveInts 31 34 +9.7% 0.91x (?)
RemoveWhereSwapInts 34 37 +8.8% 0.92x (?)
NormalizedIterator_ascii 86 93 +8.1% 0.92x (?)
ObjectiveCBridgeToNSArray 11750 12700 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 6794 6179 -9.1% 1.10x (?)
LessSubstringSubstring 42 39 -7.1% 1.08x
EqualStringSubstring 42 39 -7.1% 1.08x
EqualSubstringSubstringGenericEquatable 42 39 -7.1% 1.08x
EqualSubstringString 42 39 -7.1% 1.08x (?)
LessSubstringSubstringGenericComparable 42 39 -7.1% 1.08x (?)
DictionaryKeysContainsNative 28 26 -7.1% 1.08x (?)
EqualSubstringSubstring 43 40 -7.0% 1.07x (?)
SortStringsUnicode 3050 2845 -6.7% 1.07x

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
FlattenListLoop 2035 2551 +25.4% 0.80x (?)
StringFromLongWholeSubstring 4 5 +25.0% 0.80x
FlattenListFlatMap 6073 7036 +15.9% 0.86x (?)
String.replaceSubrange.String.Small 66 73 +10.6% 0.90x
String.data.LargeUnicode 105 116 +10.5% 0.91x (?)
RemoveWhereMoveInts 34 37 +8.8% 0.92x (?)
RemoveWhereSwapInts 37 40 +8.1% 0.93x (?)
Array2D 6992 7520 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
Breadcrumbs.MutatedUTF16ToIdx.Mixed 299 267 -10.7% 1.12x (?)
EqualSubstringSubstringGenericEquatable 43 39 -9.3% 1.10x
EqualSubstringString 44 40 -9.1% 1.10x
FindString.Loop1.Substring 529 489 -7.6% 1.08x (?)
SortStringsUnicode 3075 2865 -6.8% 1.07x (?)
DataToStringMedium 4550 4250 -6.6% 1.07x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
String.replaceSubrange.String 24 27 +12.5% 0.89x (?)
StringToDataLargeUnicode 7100 7900 +11.3% 0.90x (?)
MapReduceLazyCollection 35152 38056 +8.3% 0.92x (?)
ArrayAppendLazyMap 270920 292780 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DictionaryBridgeToObjC_Access 1180 1025 -13.1% 1.15x (?)
LineSink.bytes.alpha 303 272 -10.2% 1.11x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also need to beef up our testing, as before we only had to make sure we passed the right constant to ICU.

@swift-ci
Copy link
Contributor

swift-ci commented Oct 6, 2021

Build failed
Swift Test OS X Platform
Git Sha - 643cf08745eff24a4286110f0bf4ea152046d88b

@@ -27,7 +27,7 @@ public func formatCollection<C: Collection>(
result += " "
}

if rowLength + string.count + 1 > 80 {
if rowLength + string.count + 1 > 100 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why 100?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had figured that this would stuff more data on a single line making it somewhat easier to get access to the actual lookup implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just decide to bump the stdlib's line length limit to 100... It seems to be the line length that GitHub uses for its web interface. (Although side-by-side diffs could become less useful.)

Increasing the limit would reduce the number of source files that can be comfortably displayed/edited side by side, but this isn't necessarily a big deal.

(OTOH, the 80 char limit also applies to the compiler, and if we do change it, it should probably apply to the entire repository. FWIW it comes from LLVM.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbf this is only increasing the line limit of the Unicode data arrays and nothing else. The implementation and all the stdlib changes still abide by the 80 rule.

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good!

Have you looked into testing at all?

@Azoy
Copy link
Contributor Author

Azoy commented Oct 6, 2021

@milseman It seems we have some tests regarding these binary properties in https://github.com/apple/swift/blob/main/test/stdlib/UnicodeScalarPropertiesTests.swift#L10 , so I don't know if that alleviates your concerns, or if you think we should have beefy tests.

@milseman
Copy link
Member

milseman commented Oct 7, 2021

Those are nice to have, but they're sanity check to make sure we're not passing the wrong constant to ICU. Bugs in the generator script, binary search, etc., would have a decent likelihood of hiding out.

We probably want to have a long-running test that walks each scalar and checks the property vs ICU, but I'm not sure where this would live.

@Azoy Azoy force-pushed the native-scalar-bin-props branch from 3cdbfe3 to fe568dd Compare October 7, 2021 03:44
@Azoy Azoy force-pushed the native-scalar-bin-props branch from fe568dd to 0630208 Compare November 9, 2021 00:28
@Azoy
Copy link
Contributor Author

Azoy commented Nov 9, 2021

@swift-ci please test

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main point of feedback is that I want to update our testing, even if just a little bit. Have you looked at our testing coverage? Can we expand it at least a little so that we feel more confident our data and way of accessing that data is correct?

@Azoy
Copy link
Contributor Author

Azoy commented Nov 15, 2021

@swift-ci please test

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Azoy Azoy merged commit 3b402f0 into swiftlang:main Nov 15, 2021
@Azoy Azoy deleted the native-scalar-bin-props branch November 15, 2021 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants