[stdlib]String normalization functions #21026

TEST	OLD	NEW	DELTA	RATIO
Regression
StringUTF16Builder	400	432	+8.0%	0.93x
Improvement
UTF8Decode_InitDecoding_ascii	725	560	-22.8%	1.29x
UTF8Decode_InitDecoding	231	201	-13.0%	1.15x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
InsertCharacterEndIndex	147	162	+10.2%	0.91x (?)
NSStringConversion	539	588	+9.1%	0.92x
Improvement
Breadcrumbs.CopyUTF16CodeUnits.ASCII	20	18	-10.0%	1.11x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 16 GB

--------------

milseman

A quick, mostly cosmetic review. I'll do a more in-depth one after you're done with some of your in-flight changes

milseman · 2018-12-05T23:10:43Z

stdlib/public/core/StringNormalization.swift

+  _ outputBuffer: UnsafeMutableBufferPointer<UInt8>
+) -> (Int, Int) {
+  // Quick check if a scalar is NFC and a segment starter
+  @inline(__always) func isNFCStarter(_ scalar: Unicode.Scalar) -> Bool {


This already exists in StringNormalization.swift

Still exists in StringNormalization.swift

milseman · 2018-12-05T23:11:03Z

stdlib/public/core/StringNormalization.swift

+
+  var outputBufferThreshold: Int {
+    return outputBuffer.count - 4
+  }


Why is this computed? It's not going to change.

milseman · 2018-12-05T23:12:23Z

stdlib/public/core/StringNormalization.swift

+  return (inputCount, outputCount)
+}
+
+internal func transcodeToUTF16(


This function is specifically transcoding a segment. Please adjust the name to reflect that and add a comment, otherwise it looks like a general purpose transcoding function.

stdlib/public/core/StringNormalization.swift

+  outputBuffer: UnsafeMutableBufferPointer<UInt8>,
+  icuInputBuffer: UnsafeMutableBufferPointer<UInt16>,
+  icuOutputBuffer: UnsafeMutableBufferPointer<UInt16>
+) -> NormalizationResult {


milseman · 2018-12-05T23:15:56Z

stdlib/public/core/StringNormalization.swift

+  return (index.encodedOffset - readIndex.encodedOffset, writeIndex)
+}
+
+private func sharedNormalize(


Shared meaning shared strings, or shared code, or what? Please add comments to these methods. If this is a core common implementation method, Impl or is probably better than Shared

milseman · 2018-12-05T23:16:14Z

stdlib/public/core/UnicodeHelpers.swift

@@ -23,6 +23,7 @@ internal func _isTrailingSurrogate(_ cu: UInt16) -> Bool {
  return cu & _surrogateMask == _trailingSurrogateBias
 }
 @inline(__always)
+@usableFromInline


Why is this needed?

milseman · 2018-12-05T23:16:21Z

stdlib/public/core/UnicodeHelpers.swift

@@ -84,6 +85,23 @@ internal func _decodeUTF8(
  return Unicode.Scalar(_unchecked: value)
 }

+@inlinable


Why is this inlinable?

milseman · 2018-12-05T23:16:30Z

stdlib/public/core/UnicodeHelpers.swift

@@ -148,6 +166,7 @@ internal func _continuationPayload(_ x: UInt8) -> UInt32 {
 }

 @inline(__always)
+@usableFromInline


Why is this needed?

lancep · 2018-12-05T23:48:52Z

@swift-ci please benchmark

lancep · 2018-12-05T23:49:02Z

@swift-ci please test

lancep · 2018-12-06T01:11:34Z

@swift-ci please benchmark

lancep · 2018-12-06T01:11:55Z

@swift-ci please test OS X platform

swift-ci · 2018-12-06T01:42:34Z

Build failed
Swift Test OS X Platform
Git Sha - 21455e3

swift-ci · 2018-12-06T02:20:23Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_latin1	366	782	+113.7%	0.47x
StringHashing_fastPrenormal	1042	1691	+62.3%	0.62x
StringHashing_slowerPrenormal	1038	1607	+54.8%	0.65x
StringHashing_emoji	1258	1697	+34.9%	0.74x
WordCountUniqueUTF16	2766	3674	+32.8%	0.75x
StringHashing_nonBMPSlowestPrenormal	1624	2139	+31.7%	0.76x
WordCountHistogramUTF16	6406	7375	+15.1%	0.87x
Improvement
StringHashing_zalgo	4086	3601	-11.9%	1.13x
UTF8Decode_InitFromBytes_ascii	534	490	-8.2%	1.09x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_latin1	367	792	+115.8%	0.46x
StringHashing_fastPrenormal	1042	1698	+63.0%	0.61x
StringHashing_slowerPrenormal	1038	1552	+49.5%	0.67x
WordCountUniqueUTF16	2762	3749	+35.7%	0.74x
StringHashing_nonBMPSlowestPrenormal	1620	2162	+33.5%	0.75x
StringHashing_emoji	1258	1651	+31.2%	0.76x
WordCountHistogramUTF16	6576	7553	+14.9%	0.87x
Improvement
StringHashing_zalgo	4072	3607	-11.4%	1.13x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_latin1	822	1374	+67.2%	0.60x
StringHashing_fastPrenormal	1544	2545	+64.8%	0.61x
StringHashing_slowerPrenormal	1417	2103	+48.4%	0.67x
StringHashing_emoji	1514	2026	+33.8%	0.75x
StringHashing_nonBMPSlowestPrenormal	1976	2606	+31.9%	0.76x
Improvement
StringHashing_zalgo	4301	3788	-11.9%	1.14x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

milseman · 2018-12-06T19:37:50Z

stdlib/public/core/StringHashable.swift

-      normalize = fastNormalize
-    } else {
-      normalize = foreignNormalize
+    var bufferCleanup: (() -> ())? = nil


Err, why not just a Bool?

milseman · 2018-12-06T19:40:39Z

stdlib/public/core/StringHashable.swift

    }
+  }
+


Seems like all this logic could be refactored into a helper function that took the buffers inout and told the caller if they're responsible for calling deallocate.

milseman · 2018-12-06T19:41:17Z

stdlib/public/core/StringNormalization.swift

@@ -219,7 +218,7 @@ internal func transcodeToUTF16(
  return (readIndex, writeIndex)
 }

-internal func transcodeToUTF8(
+private func transcodeToUTF8(


milseman · 2018-12-06T19:41:55Z

stdlib/public/core/StringNormalization.swift

+  _ sourceBuffer: UnsafeBufferPointer<UInt8>,
+  _ outputBuffer: UnsafeMutableBufferPointer<UInt8>,
+  _ icuInputBuffer: UnsafeMutableBufferPointer<UInt16>,
+  _ icuOutputBuffer: UnsafeMutableBufferPointer<UInt16>


I think argument labels are necessary here, otherwise it's hard to know what buffer to pass where.

E.g. _fastNormalized(startingAt:from:into:icuInputBuffer:icuOutputBuffer:)

milseman · 2018-12-06T19:43:25Z

stdlib/public/core/StringNormalization.swift

+  let sourceCount = sourceBuffer.count - readIndex.encodedOffset
+  let start = readIndex.encodedOffset
+  let rebasedSourceBuffer = UnsafeBufferPointer(rebasing: sourceBuffer[start...])
+  do {


What does this do do?

Scopes the returns for fastFill so I don't have to come up with some other name for read/filled. Also prevents me from accidentally using the wrong variable later on.

lancep · 2018-12-06T21:39:49Z

@swift-ci please benchmark

swift-ci · 2018-12-06T22:39:23Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_nonBMPSlowestPrenormal	1603	1746	+8.9%	0.92x
StringHashing_fastPrenormal	1040	1122	+7.9%	0.93x
Improvement
StringHashing_zalgo	4062	3414	-16.0%	1.19x
StringHashing_abnormal	1681	1452	-13.6%	1.16x
UTF8Decode_InitDecoding	230	201	-12.6%	1.14x
WordCountUniqueUTF16	2690	2504	-6.9%	1.07x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_fastPrenormal	1042	1123	+7.8%	0.93x
Improvement
StringHashing_abnormal	1852	1443	-22.1%	1.28x
StringHashing_zalgo	4208	3381	-19.7%	1.24x
WordCountUniqueUTF16	2783	2523	-9.3%	1.10x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
StringHashing_abnormal	2135	1681	-21.3%	1.27x
StringHashing_zalgo	4412	3575	-19.0%	1.23x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

lancep · 2018-12-06T22:40:33Z

@swift-ci please benchmark

lancep · 2018-12-06T23:37:04Z

@swift-ci please test

lancep · 2018-12-06T23:37:11Z

@swift-ci please benchmark

lancep · 2018-12-06T23:38:54Z

@swift-ci please test

lancep · 2019-01-03T01:44:01Z

@swift-ci please test

swift-ci · 2019-01-03T01:45:59Z

Build failed
Swift Test Linux Platform
Git Sha - ad584dcfd471aaeea1d01eff7b05d2891d8036db

swift-ci · 2019-01-03T01:46:27Z

Build failed
Swift Test OS X Platform
Git Sha - ad584dcfd471aaeea1d01eff7b05d2891d8036db

swift-ci · 2019-01-03T02:30:52Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_nonBMPSlowestPrenormal	1620	2040	+25.9%	0.79x
StringHashing_fastPrenormal	1090	1360	+24.8%	0.80x
StringHashing_emoji	1252	1452	+16.0%	0.86x
StringHashing_latin1	380	436	+14.7%	0.87x
InsertCharacterEndIndex	147	166	+12.9%	0.89x
EqualSubstringString	39	44	+12.8%	0.89x
StringHashing_slowerPrenormal	1030	1140	+10.7%	0.90x
LessSubstringSubstring	39	43	+10.3%	0.91x
EqualStringSubstring	39	43	+10.3%	0.91x (?)
EqualSubstringSubstringGenericEquatable	39	43	+10.3%	0.91x
LessSubstringSubstringGenericComparable	39	43	+10.3%	0.91x
Improvement
StringComparison_abnormal	8980	1100	-87.8%	8.16x
StringComparison_zalgo	98000	63500	-35.2%	1.54x
StringHashing_abnormal	1800	1460	-18.9%	1.23x
StringHashing_zalgo	4050	3350	-17.3%	1.21x
RemoveWhereQuadraticString	423	378	-10.6%	1.12x
Added
StringNormalization_abnormal	1460	1500	1473	—
StringNormalization_ascii	164	167	165	—
StringNormalization_emoji	1552	1552	1552	—
StringNormalization_fastPrenormal	1460	1490	1470	—
StringNormalization_latin1	476	480	477	—
StringNormalization_nonBMPSlowestPrenormal	1980	2000	1990	—
StringNormalization_slowerPrenormal	1190	1220	1200	—
StringNormalization_zalgo	3425	3475	3442	—
Removed
NormalizedIterator_abnormal	1780	1820	1793	—
NormalizedIterator_ascii	141	143	142	—
NormalizedIterator_emoji	1328	1328	1328	—
NormalizedIterator_fastPrenormal	1210	1230	1217	—
NormalizedIterator_latin1	414	420	416	—
NormalizedIterator_nonBMPSlowestPrenormal	1720	1720	1720	—
NormalizedIterator_slowerPrenormal	1060	1070	1063	—
NormalizedIterator_zalgo	4075	4150	4108	—

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
CSVParsing.o	30353	31233	+2.9%	0.97x
RemoveWhere.o	23767	24119	+1.5%	0.99x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_fastPrenormal	1090	1340	+22.9%	0.81x
StringHashing_nonBMPSlowestPrenormal	1630	1990	+22.1%	0.82x
StringHashing_emoji	1260	1492	+18.4%	0.84x
StringHashing_latin1	380	434	+14.2%	0.88x
EqualStringSubstring	39	44	+12.8%	0.89x
EqualSubstringString	39	44	+12.8%	0.89x
LessSubstringSubstringGenericComparable	39	44	+12.8%	0.89x
EqualSubstringSubstring	39	43	+10.3%	0.91x
StringHashing_slowerPrenormal	1040	1140	+9.6%	0.91x
Improvement
StringComparison_abnormal	8880	1100	-87.6%	8.07x
StringWalk	3840	1520	-60.4%	2.53x
StrComplexWalk	5540	3530	-36.3%	1.57x
StringComparison_zalgo	97425	63550	-34.8%	1.53x
CSVParsingAltIndices2	1441	1078	-25.2%	1.34x
StringHashing_zalgo	4075	3375	-17.2%	1.21x
WordCountUniqueUTF16	2966	2557	-13.8%	1.16x
StringComparison_slowerPrenormal	1730	1590	-8.1%	1.09x
StringMatch	8800	8100	-8.0%	1.09x
HashTest	1394	1293	-7.2%	1.08x (?)
Added
StringNormalization_abnormal	1480	1520	1493	—
StringNormalization_ascii	164	167	165	—
StringNormalization_emoji	1564	1564	1564	—
StringNormalization_fastPrenormal	1460	1500	1473	—
StringNormalization_latin1	480	484	481	—
StringNormalization_nonBMPSlowestPrenormal	2000	2150	2060	—
StringNormalization_slowerPrenormal	1210	1230	1217	—
StringNormalization_zalgo	3400	3475	3425	—
Removed
NormalizedIterator_abnormal	1760	1780	1767	—
NormalizedIterator_ascii	142	153	146	—
NormalizedIterator_emoji	1328	1328	1328	—
NormalizedIterator_fastPrenormal	1210	1240	1220	—
NormalizedIterator_latin1	416	426	419	—
NormalizedIterator_nonBMPSlowestPrenormal	1720	1720	1720	—
NormalizedIterator_slowerPrenormal	1060	1090	1070	—
NormalizedIterator_zalgo	4125	4125	4125	—

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
CSVParsing.o	30433	30977	+1.8%	0.98x
Hash.o	19873	20193	+1.6%	0.98x
RemoveWhere.o	23710	24062	+1.5%	0.99x
ReversedCollections.o	9610	9746	+1.4%	0.99x
Improvement
RangeAssignment.o	5101	5018	-1.6%	1.02x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_fastPrenormal	1600	2000	+25.0%	0.80x
StringHashing_nonBMPSlowestPrenormal	2000	2460	+23.0%	0.81x
ReversedDictionary2	30740	37101	+20.7%	0.83x
StringHashing_emoji	1560	1800	+15.4%	0.87x
Improvement
StringComparison_abnormal	10740	2420	-77.5%	4.44x
StringComparison_zalgo	102250	66900	-34.6%	1.53x
StringHashing_abnormal	2100	1700	-19.0%	1.24x
StringHashing_zalgo	4300	3525	-18.0%	1.22x
QueueConcrete	18510	15779	-14.8%	1.17x
QueueGeneric	24309	21251	-12.6%	1.14x
ArrayOfPOD	858	779	-9.2%	1.10x
StrComplexWalk	7980	7340	-8.0%	1.09x
Added
StringNormalization_abnormal	1680	1720	1693	—
StringNormalization_ascii	526	526	526	—
StringNormalization_emoji	1808	1808	1808	—
StringNormalization_fastPrenormal	1850	1880	1867	—
StringNormalization_latin1	896	916	903	—
StringNormalization_nonBMPSlowestPrenormal	2350	2400	2373	—
StringNormalization_slowerPrenormal	1640	1660	1647	—
StringNormalization_zalgo	3550	3625	3575	—
Removed
NormalizedIterator_abnormal	2100	2100	2100	—
NormalizedIterator_ascii	582	583	582	—
NormalizedIterator_emoji	1588	1588	1588	—
NormalizedIterator_fastPrenormal	1740	1760	1747	—
NormalizedIterator_latin1	934	936	935	—
NormalizedIterator_nonBMPSlowestPrenormal	2090	2090	2090	—
NormalizedIterator_slowerPrenormal	1480	1520	1493	—
NormalizedIterator_zalgo	4325	4325	4325	—

✅	Benchmark Check Report
⛔️🔤	`StringNormalization_abnormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_fastPrenormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_slowerPrenormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_nonBMPSlowestPrenormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_nonBMPSlowestPrenormal` name is 42 characters long. _{Benchmark name should not be longer than 40 characters.}
⛔️🔤	`StringNormalization_emoji` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_zalgo` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_ascii` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_latin1` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

lancep · 2019-01-04T23:34:01Z

@swift-ci please benchmark

lancep · 2019-01-04T23:34:15Z

@swift-ci please test

swift-ci · 2019-01-05T00:20:29Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_nonBMPSlowestPrenormal	1450	2090	+44.1%	0.69x
StringHashing_emoji	1132	1628	+43.8%	0.70x
StringHashing_slowerPrenormal	920	1300	+41.3%	0.71x
StringHashing_fastPrenormal	930	1190	+28.0%	0.78x
StringHashing_latin1	330	394	+19.4%	0.84x
DataSetCountSmall	118	133	+12.7%	0.89x
Improvement
StringComparison_abnormal	7620	960	-87.4%	7.94x
StringComparison_zalgo	87050	57475	-34.0%	1.51x
StringHashing_zalgo	3575	3075	-14.0%	1.16x
Added
StringNormalization_abnormal	1280	1320	1293	—
StringNormalization_ascii	147	151	148	—
StringNormalization_emoji	1404	1416	1408	—
StringNormalization_fastPrenormal	1290	1310	1297	—
StringNormalization_latin1	426	432	428	—
StringNormalization_nonBMPSlowestPrenormal	1820	1830	1823	—
StringNormalization_slowerPrenormal	1080	1100	1087	—
StringNormalization_zalgo	3050	3125	3075	—
Removed
NormalizedIterator_abnormal	1480	1500	1487	—
NormalizedIterator_ascii	126	128	127	—
NormalizedIterator_emoji	1200	1204	1201	—
NormalizedIterator_fastPrenormal	1040	1060	1047	—
NormalizedIterator_latin1	358	362	359	—
NormalizedIterator_nonBMPSlowestPrenormal	1540	1540	1540	—
NormalizedIterator_slowerPrenormal	940	950	943	—
NormalizedIterator_zalgo	3625	3650	3633	—

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
CSVParsing.o	30353	31233	+2.9%	0.97x
RemoveWhere.o	23767	24119	+1.5%	0.99x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_fastPrenormal	930	1190	+28.0%	0.78x
StringHashing_emoji	1132	1360	+20.1%	0.83x
StringHashing_latin1	330	396	+20.0%	0.83x
StringHashing_nonBMPSlowestPrenormal	1460	1730	+18.5%	0.84x
StringHashing_slowerPrenormal	920	1050	+14.1%	0.88x
DataSetCountSmall	120	135	+12.5%	0.89x
InsertCharacterEndIndex	131	144	+9.9%	0.91x (?)
StringComparison_slowerPrenormal	1440	1550	+7.6%	0.93x
Improvement
StringComparison_abnormal	7480	980	-86.9%	7.63x
StringWalk	3440	1360	-60.5%	2.53x
StrComplexWalk	5000	3140	-37.2%	1.59x
StringComparison_zalgo	87100	57450	-34.0%	1.52x
CSVParsingAltIndices2	1298	979	-24.6%	1.33x
StringHashing_zalgo	3600	3025	-16.0%	1.19x
StringMatch	7800	7100	-9.0%	1.10x
StringHashing_abnormal	1440	1320	-8.3%	1.09x
Added
StringNormalization_abnormal	1300	1360	1327	—
StringNormalization_ascii	152	156	153	—
StringNormalization_emoji	1452	1456	1453	—
StringNormalization_fastPrenormal	1330	1360	1340	—
StringNormalization_latin1	438	446	441	—
StringNormalization_nonBMPSlowestPrenormal	1860	1900	1873	—
StringNormalization_slowerPrenormal	1110	1150	1127	—
StringNormalization_zalgo	3125	3225	3158	—
Removed
NormalizedIterator_abnormal	1500	1500	1500	—
NormalizedIterator_ascii	126	128	127	—
NormalizedIterator_emoji	1192	1196	1193	—
NormalizedIterator_fastPrenormal	1040	1050	1043	—
NormalizedIterator_latin1	358	362	359	—
NormalizedIterator_nonBMPSlowestPrenormal	1540	1540	1540	—
NormalizedIterator_slowerPrenormal	940	970	950	—
NormalizedIterator_zalgo	3675	3675	3675	—

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
CSVParsing.o	30433	30977	+1.8%	0.98x
Hash.o	19873	20193	+1.6%	0.98x
RemoveWhere.o	23710	24062	+1.5%	0.99x
ReversedCollections.o	9610	9746	+1.4%	0.99x
Improvement
RangeAssignment.o	5101	5018	-1.6%	1.02x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
StringHashing_fastPrenormal	1380	1700	+23.2%	0.81x
PopFrontArrayGeneric	4749	5361	+12.9%	0.89x
Improvement
StringComparison_abnormal	9220	2060	-77.7%	4.48x
StringComparison_zalgo	91575	60450	-34.0%	1.51x
StringHashing_zalgo	3825	3200	-16.3%	1.20x
CharIndexing_chinese_unicodeScalars	449600	378440	-15.8%	1.19x
StringHashing_abnormal	1740	1480	-14.9%	1.18x
CharIndexing_russian_unicodeScalars	486920	417080	-14.3%	1.17x
StrComplexWalk	7290	6610	-9.3%	1.10x
WordCountUniqueASCII	11761	10686	-9.1%	1.10x
WordCountUniqueUTF16	12486	11465	-8.2%	1.09x
Added
StringNormalization_abnormal	1440	1500	1460	—
StringNormalization_ascii	479	480	479	—
StringNormalization_emoji	1580	1580	1580	—
StringNormalization_fastPrenormal	1640	1670	1650	—
StringNormalization_latin1	804	842	817	—
StringNormalization_nonBMPSlowestPrenormal	2040	2080	2053	—
StringNormalization_slowerPrenormal	1380	1410	1390	—
StringNormalization_zalgo	3150	3225	3175	—
Removed
NormalizedIterator_abnormal	1720	1720	1720	—
NormalizedIterator_ascii	451	459	456	—
NormalizedIterator_emoji	1376	1376	1376	—
NormalizedIterator_fastPrenormal	1400	1410	1403	—
NormalizedIterator_latin1	738	738	738	—
NormalizedIterator_nonBMPSlowestPrenormal	1790	1790	1790	—
NormalizedIterator_slowerPrenormal	1260	1290	1273	—
NormalizedIterator_zalgo	3800	3825	3808	—

✅	Benchmark Check Report
⛔️🔤	`StringNormalization_abnormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_fastPrenormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_slowerPrenormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_nonBMPSlowestPrenormal` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_nonBMPSlowestPrenormal` name is 42 characters long. _{Benchmark name should not be longer than 40 characters.}
⛔️🔤	`StringNormalization_emoji` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_zalgo` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_ascii` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⛔️🔤	`StringNormalization_latin1` name doesn`t conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

--------------

lancep · 2019-01-05T00:28:55Z

@swift-ci please benchmark

swift-ci · 2019-01-05T01:14:30Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DataSetCountSmall	117	133	+13.7%	0.88x
StringComparison_slowerPrenormal	1440	1570	+9.0%	0.92x
Improvement
StringComparison_abnormal	7500	960	-87.2%	7.81x
StringComparison_zalgo	87650	57075	-34.9%	1.54x
StringHashing_zalgo	3600	3225	-10.4%	1.12x
NormalizedIterator_zalgo	3625	3275	-9.7%	1.11x
NormalizedIterator_ascii	126	114	-9.5%	1.11x
StringHashing_abnormal	1500	1360	-9.3%	1.10x
NormalizedIterator_latin1	356	332	-6.7%	1.07x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
CSVParsing.o	30353	31233	+2.9%	0.97x
RemoveWhere.o	23767	24119	+1.5%	0.99x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DataSetCountSmall	120	135	+12.5%	0.89x
InsertCharacterEndIndex	131	147	+12.2%	0.89x
StringComparison_slowerPrenormal	1440	1580	+9.7%	0.91x
InsertCharacterEndIndexNonASCII	50	54	+8.0%	0.93x (?)
Improvement
StringComparison_abnormal	7540	960	-87.3%	7.85x
StringWalk	3440	1360	-60.5%	2.53x
StrComplexWalk	4990	3120	-37.5%	1.60x
StringComparison_zalgo	87450	57000	-34.8%	1.53x
CSVParsingAltIndices2	1298	990	-23.7%	1.31x
NormalizedIterator_ascii	126	112	-11.1%	1.12x
StringHashing_zalgo	3600	3225	-10.4%	1.12x
NormalizedIterator_zalgo	3625	3275	-9.7%	1.11x
RemoveWhereMoveInts	11	10	-9.1%	1.10x
StringHashing_latin1	330	302	-8.5%	1.09x
StringHashing_abnormal	1480	1360	-8.1%	1.09x
NormalizedIterator_latin1	358	332	-7.3%	1.08x
NormalizedIterator_abnormal	1440	1340	-6.9%	1.07x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
CSVParsing.o	30433	30977	+1.8%	0.98x
Hash.o	19873	20193	+1.6%	0.98x
RemoveWhere.o	23710	24062	+1.5%	0.99x
ReversedCollections.o	9610	9746	+1.4%	0.99x
Improvement
RangeAssignment.o	5101	5018	-1.6%	1.02x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
InsertCharacterEndIndex	195	213	+9.2%	0.92x
InsertCharacterEndIndexNonASCII	70	76	+8.6%	0.92x (?)
Improvement
StringComparison_abnormal	9060	2160	-76.2%	4.19x
StringComparison_zalgo	91300	60150	-34.1%	1.52x
StrComplexWalk	7260	6110	-15.8%	1.19x
CharIteration_ascii_unicodeScalars_Backwards	376520	320520	-14.9%	1.17x
CharIndexing_punctuatedJapanese_unicodeScalars	99720	87680	-12.1%	1.14x
RemoveWhereFilterStrings	4191	3690	-12.0%	1.14x
WordCountUniqueUTF16	12265	10871	-11.4%	1.13x
NormalizedIterator_zalgo	3825	3400	-11.1%	1.12x
StringHashing_zalgo	3825	3400	-11.1%	1.12x
NormalizedIterator_abnormal	1720	1540	-10.5%	1.12x
StringHashing_abnormal	1740	1560	-10.3%	1.12x
CharIteration_punctuated_unicodeScalars_Backwards	79240	71440	-9.8%	1.11x
ArrayOfPOD	763	693	-9.2%	1.10x
WordCountUniqueASCII	11775	10764	-8.6%	1.09x
StringWalk	12520	11640	-7.0%	1.08x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

--------------

lancep · 2019-01-05T01:19:05Z

@swift-ci please test

swift-ci · 2019-01-05T01:21:23Z

Build failed
Swift Test Linux Platform
Git Sha - 280e002

swift-ci · 2019-01-05T01:22:07Z

Build failed
Swift Test OS X Platform
Git Sha - 280e002

lancep · 2019-01-07T18:43:49Z

@swift-ci please test OS X platform

swift-ci · 2019-01-07T19:42:45Z

Build failed
Swift Test OS X Platform
Git Sha - cef0fb8

lancep · 2019-01-07T19:52:15Z

@swift-ci please test

swift-ci · 2019-01-07T19:52:52Z

Build failed
Swift Test OS X Platform
Git Sha - cef0fb8

swift-ci · 2019-01-07T19:54:27Z

Build failed
Swift Test Linux Platform
Git Sha - cef0fb8

milseman · 2019-01-08T19:57:46Z

validation-test/stdlib/StringNormalization.swift

@@ -114,11 +114,11 @@ let codeUnitNormalizationTests: [([UInt8], String)] = [
    "몌̴ᆮ닮돈a֮𖬴̀̕b됆섨⽱쀪ﺏ탅"
  ),
  (
-    [0xc3, 0xa0, 0xd6, 0xae, 0xf0, 0x91, 0x8d, 0xb3, 0xcc, 0x95, 0x62, 0xe1, 0xb6, 0xb6, 0xec, 0x93, 0xa1, 0xed, 0x83, 0x97, 0xec, 0xa4, 0x80, 0xec, 0xa0, 0x95, 0xef, 0xbb, 0xaa, 0xec, 0x82, 0x90, 0xcc, 0xb4, 0xe1, 0x86, 0xb5, 0xed, 0x84, 0xa3, 0xed, 0x8b, 0xad, 0xeb, 0x9c, 0xaf, 0xef, 0xb6, 0x8e, 0xe3, 0x8e, 0xa1, 0xef, 0xb6, 0x94, 0xe3, 0x8e, 0x9a, 0xe2, 0x85, 0xef, 0xb0, 0x92, 0xeb, 0x83, 0xa8, 0xed, 0x9d, 0x94, 0xf0, 0x9f, 0x88, 0xb9],
+    [0xc3, 0xa0, 0xd6, 0xae, 0xf0, 0x91, 0x8d, 0xb3, 0xcc, 0x95, 0x62, 0xe1, 0xb6, 0xb6, 0xec, 0x93, 0xa1, 0xed, 0x83, 0x97, 0xec, 0xa4, 0x80, 0xec, 0xa0, 0x95, 0xef, 0xbb, 0xaa, 0xec, 0x82, 0x90, 0xcc, 0xb4, 0xe1, 0x86, 0xb5, 0xed, 0x84, 0xa3, 0xed, 0x8b, 0xad, 0xeb, 0x9c, 0xaf, 0xef, 0xb6, 0x8e, 0xe3, 0x8e, 0xa1, 0xef, 0xb6, 0x94, 0xe3, 0x8e, 0x9a, 0xe2, 0x85, 0xb3, 0xef, 0xb0, 0x92, 0xeb, 0x83, 0xa8, 0xed, 0x9d, 0x94, 0xf0, 0x9f, 0x88, 0xb9],


Why is this change needed?

milseman · 2019-01-08T19:57:53Z

validation-test/stdlib/StringNormalization.swift

    "à֮𑍳̕bᶶ쓡탗준정ﻪ삐̴ᆵ턣틭뜯ﶎ㎡ﶔ㎚ⅳﰒ냨흔🈹"
  ),
  (
-    [0xeb, 0xb3, 0x95, 0xf0, 0xa6, 0xbc, 0xac, 0xef, 0xbc, 0xba, 0xce, 0xa9, 0xec, 0xb2, 0x84, 0xe7, 0xb3, 0x96, 0xed, 0x91, 0xa0, 0xc3, 0xa0, 0xd6, 0xae, 0xe2, 0xb7, 0xb9, 0xcc, 0x95, 0x62, 0xec, 0xbc, 0x98, 0xe5, 0x8b, 0x9e, 0xec, 0x82, 0x9a, 0xed, 0x88, 0x92, 0xed, 0x87, 0x91, 0xec, 0xaf, 0x99, 0xe6, 0x86, 0xaf, 0xe2, 0xb5, 0xaf, 0xec, 0x9d, 0x8c, 0xe9, 0xb6, 0xe2, 0x93, 0xa2, 0xe2, 0x82, 0x84],
+    [0xeb, 0xb3, 0x95, 0xf0, 0xa6, 0xbc, 0xac, 0xef, 0xbc, 0xba, 0xce, 0xa9, 0xec, 0xb2, 0x84, 0xe7, 0xb3, 0x96, 0xed, 0x91, 0xa0, 0xc3, 0xa0, 0xd6, 0xae, 0xe2, 0xb7, 0xb9, 0xcc, 0x95, 0x62, 0xec, 0xbc, 0x98, 0xe5, 0x8b, 0x9e, 0xec, 0x82, 0x9a, 0xed, 0x88, 0x92, 0xed, 0x87, 0x91, 0xec, 0xaf, 0x99, 0xe6, 0x86, 0xaf, 0xe2, 0xb5, 0xaf, 0xec, 0x9d, 0x8c, 0xe9, 0xb6, 0xb4, 0xe2, 0x93, 0xa2, 0xe2, 0x82, 0x84],


Why is this change needed?

milseman · 2019-01-08T19:58:05Z

validation-test/stdlib/StringNormalization.swift

@@ -134,323 +134,323 @@ let codeUnitNormalizationTests: [([UInt8], String)] = [
    "쁏音MW삢썍곛뻏쩄뻍겔ᄇ̴ᅪ킾a〪᳟̖֚b죆퉆Ṑﰆ뒳킟저̴ᆼ"
  ),
  (
-    [0xec, 0x8c, 0x9d, 0xe6, 0x86, 0x8e, 0xec, 0xb0, 0x9a, 0xec, 0xba, 0xb0, 0xe1, 0xb6, 0xb0, 0xc3, 0xa0, 0xd6, 0xae, 0xe1, 0xaa, 0xb4, 0xcc, 0x95, 0x62, 0xec, 0xac, 0x9e, 0xec, 0xa7, 0xb9, 0xea, 0xbe, 0x89, 0xef, 0xad, 0xab, 0xd7, 0x9c, 0xd6, 0xbc, 0xe2, 0x85, 0x85, 0xeb, 0x94, 0x93, 0xe1, 0xbd, 0x92, 0xec, 0xaa, 0xa3, 0xeb, 0xab, 0x85, 0xec, 0xb7, 0x8c, 0xea, 0xb9, 0xa9, 0xec, 0xab, 0x8d, 0xec, 0x8c, 0x84, 0xe5, 0x8f, 0xa5, 0xeb, 0x89, 0x85, 0xeb, 0xed, 0x91, 0xa4, 0xeb, 0x9b, 0x95, 0xe3, 0x89, 0xa5, 0x61, 0xd6, 0xae, 0xf0, 0x9e, 0x80, 0x9d, 0xcc, 0x80, 0xcc, 0x95, 0x62, 0xe4, 0x84, 0xaf, 0xc8, 0x9f, 0xec, 0xba, 0x9b],
+    [0xec, 0x8c, 0x9d, 0xe6, 0x86, 0x8e, 0xec, 0xb0, 0x9a, 0xec, 0xba, 0xb0, 0xe1, 0xb6, 0xb0, 0xc3, 0xa0, 0xd6, 0xae, 0xe1, 0xaa, 0xb4, 0xcc, 0x95, 0x62, 0xec, 0xac, 0x9e, 0xec, 0xa7, 0xb9, 0xea, 0xbe, 0x89, 0xef, 0xad, 0xab, 0xd7, 0x9c, 0xd6, 0xbc, 0xe2, 0x85, 0x85, 0xeb, 0x94, 0x93, 0xe1, 0xbd, 0x92, 0xec, 0xaa, 0xa3, 0xeb, 0xab, 0x85, 0xec, 0xb7, 0x8c, 0xea, 0xb9, 0xa9, 0xec, 0xab, 0x8d, 0xec, 0x8c, 0x84, 0xe5, 0x8f, 0xa5, 0xeb, 0x89, 0x85, 0xeb, 0x8c, 0xa6, 0xed, 0x91, 0xa4, 0xeb, 0x9b, 0x95, 0xe3, 0x89, 0xa5, 0x61, 0xd6, 0xae, 0xf0, 0x9e, 0x80, 0x9d, 0xcc, 0x80, 0xcc, 0x95, 0x62, 0xe4, 0x84, 0xaf, 0xc8, 0x9f, 0xec, 0xba, 0x9b],


Are the rest whitespace changes?

lancep · 2019-01-08T21:57:08Z

Michael approved in person

* fast/foreignNormalize functions

* [stdlib]String normalization functions (#21026) * fast/foreignNormalize functions * fix bad cherry-pick

lancep requested a review from milseman December 5, 2018 00:35

lancep force-pushed the no-mo-iterato branch from 963eed1 to a3a723a Compare December 5, 2018 00:50

lancep force-pushed the no-mo-iterato branch from 42abea9 to 9c26ca4 Compare December 5, 2018 19:41

milseman requested changes Dec 5, 2018

View reviewed changes

milseman reviewed Dec 6, 2018

View reviewed changes

Lance Parker added 3 commits January 4, 2019 15:07

Make hashing faster

922437e

Fix character iteration benchmarks

a47bcb8

Remove some @inline(__always)

2e3eb1f

restore the original benchmark name (for now)

cef0fb8

fix up the gyb file

36989a1

milseman reviewed Jan 8, 2019

View reviewed changes

lancep merged commit 15aaa1e into swiftlang:master Jan 8, 2019

lancep deleted the no-mo-iterato branch January 8, 2019 21:55

lancep mentioned this pull request Jan 9, 2019

[stdlib]String normalization functions (#21026) #21744

Merged

lancep pushed a commit to lancep/swift that referenced this pull request Jan 10, 2019

[stdlib]String normalization functions (swiftlang#21026)

b25d3a0

* fast/foreignNormalize functions

airspeedswift pushed a commit that referenced this pull request Jan 11, 2019

[stdlib]String normalization functions (#21026) (#21744)

4469ae9

* [stdlib]String normalization functions (#21026) * fast/foreignNormalize functions * fix bad cherry-pick

[stdlib]String normalization functions #21026

[stdlib]String normalization functions #21026

Uh oh!

Conversation

lancep commented Dec 5, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Uh oh!

milseman commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Uh oh!

swift-ci commented Dec 5, 2018

Build comment file:

Performance: -O

Performance: -Osize

Uh oh!

milseman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as outdated.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

lancep commented Dec 5, 2018

Uh oh!

lancep commented Dec 6, 2018

Uh oh!

lancep commented Dec 6, 2018

Uh oh!

swift-ci commented Dec 6, 2018

Uh oh!

swift-ci commented Dec 6, 2018

Build comment file:

Performance: -O

Performance: -Osize

Performance: -Onone

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

lancep commented Dec 5, 2018 •

edited

Loading