[cxx-interop] Add initial benchmark to compare vector<uint32_t> sum in C++ vs Swift #61456

hyp · 2022-10-05T18:49:21Z

This benchmark compares the performance of summing a vector<uint32_t> of one million elements in Swift and in C++.

Initial numbers on M1 Mac:

#,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
139,CreateObjects,200,11,29,14,2,15
140,CxxVectorOfU32SumInCxx,200,77,107,95,7,93
141,CxxVectorOfU32SumInSwift,36,24980,27957,25654,646,25546
142,CxxVectorOfU32SumInSwift_Fastest,200,77,83,77,1,77

We're investigating why `CxxVectorOfU32SumInSwift` is so much slower (a mean of 25654 μs versus 95 μs for `CxxVectorOfU32SumInCxx`) and how to fix it correctly.
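For orientation, the C++ side of such a comparison might look roughly like the sketch below. The PR's actual sources are not shown in this conversation, so the function names (`makeInput`, `sumRangedFor`, `sumIterator`) and the 64-bit accumulator are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a vector holding 0, 1, ..., count - 1.
std::vector<uint32_t> makeInput(std::size_t count) {
  std::vector<uint32_t> v;
  v.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    v.push_back(static_cast<uint32_t>(i));
  return v;
}

// Sum using a range-based for loop.
uint64_t sumRangedFor(const std::vector<uint32_t> &v) {
  uint64_t sum = 0;
  for (uint32_t x : v)
    sum += x;
  return sum;
}

// Sum using an explicit begin/end iterator pair, which is the shape a
// hand-written iterator loop would take.
uint64_t sumIterator(const std::vector<uint32_t> &v) {
  uint64_t sum = 0;
  for (auto it = v.begin(); it != v.end(); ++it)
    sum += *it;
  return sum;
}
```

Both functions compute the same value; for `makeInput(1000000)` that is the sum of the integers 0 through 999999.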

hyp · 2022-10-05T18:49:30Z

@swift-ci please test

hyp · 2022-10-05T18:49:38Z

@swift-ci please benchmark

hyp · 2022-10-05T19:03:57Z

@swift-ci please test macOS platform

zoecarver · 2022-10-05T21:35:15Z

Benchmark Info

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
Dictionary4	153	193	+26.1%	0.79x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	1620	1386	-14.4%	1.17x (?)
ObjectiveCBridgeStubToNSStringRef	93	86	-7.5%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVectorOfU32SumInCxx	3	5	4	—
CxxVectorOfU32SumInSwift	38989	39456	39183	—
CxxVectorOfU32SumInSwift_Fastest	3	3	3	—

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
PrefixAnySeqCntRange	13	16	+23.1%	0.81x (?)
ObjectiveCBridgeStubDateAccess	130	152	+16.9%	0.86x (?)
FlattenListLoop	1385	1615	+16.6%	0.86x (?)
PrefixWhileAnySequence	172	198	+15.1%	0.87x (?)
StringComparison_ascii	360	390	+8.3%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
Dictionary4	231	159	-31.2%	1.45x (?)
Dictionary4OfObjects	316	277	-12.3%	1.14x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVectorOfU32SumInCxx	54	56	55	—
CxxVectorOfU32SumInSwift	37043	38907	38182	—
CxxVectorOfU32SumInSwift_Fastest	35	36	35	—

Code size: -Osize

Performance (x86_64): -Onone

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64Exactly	475	441	-7.2%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVectorOfU32SumInCxx	8147	8154	8152	—
CxxVectorOfU32SumInSwift	72724	77494	75066	—
CxxVectorOfU32SumInSwift_Fastest	10992	11758	11369	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️🔤	`CxxVectorOfU32SumInSwift` name is composed of 6 words. _{Split CxxVectorOfU32SumInSwift name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVectorOfU32SumInSwift` has setup overhead of 36214 μs (99.6%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️Ⓜ️	`CxxVectorOfU32SumInSwift` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32SumInSwift mem_pages [i1, i2]: min=[3928, 3928] 𝚫=0 R=[5876, 2938]}
⚠️🔤	`CxxVectorOfU32SumInCxx` name is composed of 6 words. _{Split CxxVectorOfU32SumInCxx name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVectorOfU32SumInCxx` has setup overhead of 70 μs (100.0%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⛔️⏱	`CxxVectorOfU32SumInCxx` execution took 0 μs. _{Ensure the workload of CxxVectorOfU32SumInCxx has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).}
⚠️Ⓜ️	`CxxVectorOfU32SumInCxx` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32SumInCxx mem_pages [i1, i2]: min=[1960, 1960] 𝚫=0 R=[0, 19]}
⛔️🔤	`CxxVectorOfU32SumInSwift_Fastest` name doesn't conform to benchmark naming convention. _{See http://bit.ly/BenchmarkNaming}
⚠️🔤	`CxxVectorOfU32SumInSwift_Fastest` name is composed of 6 words. _{Split CxxVectorOfU32SumInSwift_Fastest name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVectorOfU32SumInSwift_Fastest` has setup overhead of 78 μs (106.8%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️⏱	`CxxVectorOfU32SumInSwift_Fastest` execution took -5 μs. _{Increase the workload of CxxVectorOfU32SumInSwift_Fastest to be more than 20 μs.}
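Several entries in the report above ask for input initialization to be moved out of the measured function and for the result to be kept alive so the compiler cannot eliminate the work. The sketch below shows that separation in plain C++. It is an analogy under stated assumptions, not the Swift benchmark suite's API: `VectorSumBenchmark`, `blackHole`, `g_sink`, and `timeRunMicros` are all hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a blackHole helper: storing into a volatile
// forces the computed sum to be materialized instead of discarded.
volatile uint64_t g_sink = 0;
inline void blackHole(uint64_t value) { g_sink = value; }

struct VectorSumBenchmark {
  std::vector<uint32_t> data;

  // Setup phase: runs once, outside the timed region.
  void setUp(std::size_t count) {
    data.resize(count);
    for (std::size_t i = 0; i < count; ++i)
      data[i] = static_cast<uint32_t>(i);
  }

  // Measured workload: only the summation.
  uint64_t run() const {
    uint64_t sum = 0;
    for (uint32_t x : data)
      sum += x;
    return sum;
  }
};

// Times `iterations` calls of run() and returns the elapsed microseconds.
// The vector must already be populated, so setup cost is not included.
long long timeRunMicros(const VectorSumBenchmark &bench, int iterations) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    blackHole(bench.run());
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}
```

With this split, a reported runtime reflects the summation alone, which is what the setup-overhead entries are asking for.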

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
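The DELTA and RATIO columns in the tables above appear consistent with DELTA = (NEW - OLD) / OLD and RATIO = OLD / NEW. That reading is inferred from the reported rows, not taken from the benchmark driver's source, and the `Change` type and `compare` function below are hypothetical names used only to make the arithmetic concrete.

```cpp
#include <cmath>

// Relative change between an old and a new timing.
struct Change {
  double deltaPercent;  // positive means NEW is slower than OLD
  double ratio;         // below 1.0 means NEW is slower than OLD
};

// Assumed formulas, checked against the rows reported above.
Change compare(double oldValue, double newValue) {
  Change c;
  c.deltaPercent = (newValue - oldValue) / oldValue * 100.0;
  c.ratio = oldValue / newValue;
  return c;
}
```

For the `Dictionary4` row at -O (OLD 153, NEW 193) this gives roughly +26.1% and 0.79x, matching the table.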

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 32 GB

benchmark/cxx-source/CxxVectorSum.swift

hyp · 2022-10-07T17:44:45Z

@swift-ci please benchmark

hyp · 2022-10-07T17:44:51Z

@swift-ci please test

hyp · 2022-10-07T19:34:07Z

Benchmark info:

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
StringInterpolationManySmallSegments	7300	8800	+20.5%	0.83x (?)
Dictionary3	122	146	+19.7%	0.84x (?)
RemoveWhereMoveInts	12	13	+8.3%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
Dictionary4	201	152	-24.4%	1.32x
Dictionary4OfObjects	248	209	-15.7%	1.19x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVectorOfU32.Sum.Cxx.RangedForLoop	101	103	102	—
CxxVectorOfU32.Sum.Swift.ForInLoop	36881	41604	39568	—
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop	10690028	11153274	10972647	—
CxxVectorOfU32.Sum.Swift.RawIteratorLoop	90	91	91	—
CxxVectorOfU32.Sum.Swift.Reduce	39856	40399	40106	—

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
PrefixAnySeqCRangeIter	13	15	+15.4%	0.87x (?)
StringWalk	1360	1520	+11.8%	0.89x (?)
PrefixAnySeqCntRange	13	14	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubDateAccess	152	130	-14.5%	1.17x (?)
DictionaryKeysContainsCocoa	15	14	-6.7%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVectorOfU32.Sum.Cxx.RangedForLoop	331	332	331	—
CxxVectorOfU32.Sum.Swift.ForInLoop	36610	38637	37663	—
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop	10746758	10871907	10796945	—
CxxVectorOfU32.Sum.Swift.RawIteratorLoop	325	329	327	—
CxxVectorOfU32.Sum.Swift.Reduce	39519	39997	39750	—

Code size: -Osize

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubNSDateRefAccess	2516	3057	+21.5%	0.82x (?)
ObjectiveCBridgeStubDateAccess	2604	3077	+18.2%	0.85x (?)
NSError	371	403	+8.6%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubDateMutation	352	326	-7.4%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVectorOfU32.Sum.Cxx.RangedForLoop	11179	11419	11336	—
CxxVectorOfU32.Sum.Swift.ForInLoop	67463	67624	67524	—
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop	703459153	704797431	704231260	—
CxxVectorOfU32.Sum.Swift.RawIteratorLoop	14508	14595	14548	—
CxxVectorOfU32.Sum.Swift.Reduce	80136	81468	80918	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️🔤	`CxxVectorOfU32.Sum.Cxx.RangedForLoop` name is composed of 6 words. _{Split CxxVectorOfU32.Sum.Cxx.RangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVectorOfU32.Sum.Cxx.RangedForLoop` has setup overhead of 122 μs (64.2%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️Ⓜ️	`CxxVectorOfU32.Sum.Cxx.RangedForLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32.Sum.Cxx.RangedForLoop mem_pages [i1, i2]: min=[102, 102] 𝚫=0 R=[105, 210]}
⚠️🔤	`CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop` name is composed of 7 words. _{Split CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️🔤	`CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop` name is 46 characters long. _{Benchmark name should not be longer than 40 characters.}
⛔️⏱	`CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop` execution took at least 10495803 μs. _{Decrease the workload of CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop by a factor of 16384 (100000), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop mem_pages [i1, i2]: min=[1018, 1020] 𝚫=2 R=[206, 102]}
⚠️🔤	`CxxVectorOfU32.Sum.Swift.ForInLoop` name is composed of 6 words. _{Split CxxVectorOfU32.Sum.Swift.ForInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVectorOfU32.Sum.Swift.ForInLoop` execution took at least 34850 μs. _{Decrease the workload of CxxVectorOfU32.Sum.Swift.ForInLoop by a factor of 64 (100), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVectorOfU32.Sum.Swift.ForInLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32.Sum.Swift.ForInLoop mem_pages [i1, i2]: min=[299, 299] 𝚫=0 R=[1211, 301]}
⛔️⏱	`CxxVectorOfU32.Sum.Swift.Reduce` execution took at least 36683 μs. _{Decrease the workload of CxxVectorOfU32.Sum.Swift.Reduce by a factor of 64 (100), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVectorOfU32.Sum.Swift.Reduce` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32.Sum.Swift.Reduce mem_pages [i1, i2]: min=[300, 300] 𝚫=0 R=[917, 1204]}
⚠️🔤	`CxxVectorOfU32.Sum.Swift.RawIteratorLoop` name is composed of 6 words. _{Split CxxVectorOfU32.Sum.Swift.RawIteratorLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVectorOfU32.Sum.Swift.RawIteratorLoop` has setup overhead of 126 μs (66.0%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️Ⓜ️	`CxxVectorOfU32.Sum.Swift.RawIteratorLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVectorOfU32.Sum.Swift.RawIteratorLoop mem_pages [i1, i2]: min=[102, 102] 𝚫=0 R=[105, 105]}
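The "Decrease the workload ... by a factor of X (Y)" suggestions in the report above look like the smallest power of two, and in parentheses the smallest power of ten, that would bring the reported runtime down to the 1000 μs target. The sketch below reproduces that arithmetic. The rule is inferred from the numbers in this report only, and `Factors` and `suggestedFactors` are hypothetical names.

```cpp
#include <cstdint>

struct Factors {
  uint64_t powerOfTwo;
  uint64_t powerOfTen;
};

// Smallest power of two and power of ten that, used as a divisor, would
// bring runtimeMicros down to at most targetMicros (assumed rule).
Factors suggestedFactors(uint64_t runtimeMicros, uint64_t targetMicros = 1000) {
  Factors f{1, 1};
  while (runtimeMicros > f.powerOfTwo * targetMicros)
    f.powerOfTwo *= 2;
  while (runtimeMicros > f.powerOfTen * targetMicros)
    f.powerOfTen *= 10;
  return f;
}
```

For the 10495803 μs reported for `IndexAndSubscriptLoop` this yields 16384 and 100000, and for the 34850 μs reported for `ForInLoop` it yields 64 and 100, the same values the report prints.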

hyp · 2022-10-10T16:25:24Z

@swift-ci please benchmark

hyp · 2022-10-10T16:25:30Z

@swift-ci please test

hyp · 2022-10-10T17:51:33Z

Benchmark results

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromCustom_contiguous	123	161	+30.9%	0.76x
UTF8Decode_InitDecoding	123	161	+30.9%	0.76x
Dictionary3	125	151	+20.8%	0.83x (?)
StringUTF16Builder	220	260	+18.2%	0.85x (?)
StringBuilder	202	237	+17.3%	0.85x (?)
StringBuilderSmallReservingCapacity	210	246	+17.1%	0.85x (?)
UTF8Decode_InitFromCustom_noncontiguous	241	279	+15.8%	0.86x (?)
CStringLongNonAscii	144	164	+13.9%	0.88x (?)
ParseFloat.Double.Exp	9	10	+11.1%	0.90x (?)
NormalizedIterator_emoji	328	364	+11.0%	0.90x (?)
NormalizedIterator_nonBMPSlowestPrenormal	410	450	+9.8%	0.91x (?)
Calculator	141	153	+8.5%	0.92x (?)
Chars2	3150	3400	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
Dictionary4	227	152	-33.0%	1.49x
LessSubstringSubstring	30	24	-20.0%	1.25x (?)
EqualSubstringSubstringGenericEquatable	30	24	-20.0%	1.25x (?)
EqualSubstringString	30	24	-20.0%	1.25x (?)
LessSubstringSubstringGenericComparable	30	24	-20.0%	1.25x (?)
EqualStringSubstring	31	25	-19.4%	1.24x
Dictionary4OfObjects	256	210	-18.0%	1.22x (?)
DictionaryKeysContainsCocoa	17	14	-17.6%	1.21x (?)
EqualSubstringSubstring	30	25	-16.7%	1.20x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed	214	187	-12.6%	1.14x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed	216	191	-11.6%	1.13x
Data.hash.Medium	27	24	-11.1%	1.12x (?)
StringComparison_longSharedPrefix	343	306	-10.8%	1.12x (?)
DropLastAnySeqCRangeIter	381	342	-10.2%	1.11x (?)
DistinctClassFieldAccesses	42	38	-9.5%	1.11x (?)
RemoveWhereFilterInts	27	25	-7.4%	1.08x (?)
PrefixWhileSequence	211	197	-6.6%	1.07x (?)
PrefixWhileAnySequence	197	184	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop	8	8	8	—
CxxVecU32.Sum.Swift.ForInLoop	4610	4710	4643	—
CxxVecU32.Sum.Swift.IteratorLoop	8	8	8	—
CxxVecU32.Sum.Swift.Reduce	4544	4824	4654	—
CxxVecU32.Sum.Swift.SubscriptLoop	8	8	8	—

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromCustom_contiguous	121	161	+33.1%	0.75x (?)
UTF8Decode_InitDecoding	124	161	+29.8%	0.77x
StringWalk	1240	1520	+22.6%	0.82x (?)
StringUTF16Builder	210	250	+19.0%	0.84x
FlattenListLoop	1374	1617	+17.7%	0.85x (?)
StrComplexWalk	2790	3190	+14.3%	0.87x (?)
StringAdder	247	282	+14.2%	0.88x (?)
StringBuilder	208	236	+13.5%	0.88x (?)
UTF8Decode_InitFromCustom_noncontiguous	317	358	+12.9%	0.89x (?)
CStringLongNonAscii	145	162	+11.7%	0.90x (?)
StringBuilderSmallReservingCapacity	221	245	+10.9%	0.90x (?)
StringInterpolationSmall	1140	1240	+8.8%	0.92x (?)
Chars2	3450	3750	+8.7%	0.92x (?)
CharIteration_japanese_unicodeScalars	6200	6680	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	3589	2680	-25.3%	1.34x (?)
LessSubstringSubstring	30	24	-20.0%	1.25x (?)
EqualSubstringSubstring	30	24	-20.0%	1.25x (?)
EqualStringSubstring	30	24	-20.0%	1.25x (?)
EqualSubstringSubstringGenericEquatable	30	24	-20.0%	1.25x (?)
EqualSubstringString	30	24	-20.0%	1.25x
LessSubstringSubstringGenericComparable	30	24	-20.0%	1.25x (?)
Fibonacci	6	5	-16.7%	1.20x (?)
ObjectiveCBridgeStubDateAccess	152	130	-14.5%	1.17x (?)
DistinctClassFieldAccesses	42	36	-14.3%	1.17x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed	214	187	-12.6%	1.14x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed	218	191	-12.4%	1.14x (?)
NormalizedIterator_fastPrenormal	530	470	-11.3%	1.13x (?)
Data.hash.Medium	27	24	-11.1%	1.12x (?)
StringComparison_longSharedPrefix	344	308	-10.5%	1.12x (?)
StringBuilderWithLongSubstring	1100	1000	-9.1%	1.10x (?)
NormalizedIterator_emoji	364	332	-8.8%	1.10x (?)
RemoveWhereFilterInts	25	23	-8.0%	1.09x (?)
StringBuilderLong	880	820	-6.8%	1.07x (?)
NormalizedIterator_slowerPrenormal	300	280	-6.7%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop	36	36	36	—
CxxVecU32.Sum.Swift.ForInLoop	4461	4558	4504	—
CxxVecU32.Sum.Swift.IteratorLoop	36	36	36	—
CxxVecU32.Sum.Swift.Reduce	4738	4790	4771	—
CxxVecU32.Sum.Swift.SubscriptLoop	35	35	35	—

Code size: -Osize

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitDecoding	128	168	+31.2%	0.76x
UTF8Decode_InitFromCustom_contiguous	130	170	+30.8%	0.76x
CharacterLiteralsLarge	409	466	+13.9%	0.88x (?)
ArrayAppendLatin1Substring	23112	26172	+13.2%	0.88x (?)
ArrayAppendUTF16Substring	22932	25848	+12.7%	0.89x (?)
ArrayAppendAsciiSubstring	22932	25848	+12.7%	0.89x (?)
SubstringFromLongString2	88	95	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
EqualSubstringSubstringGenericEquatable	34	27	-20.6%	1.26x (?)
LessSubstringSubstringGenericComparable	33	27	-18.2%	1.22x
EqualStringSubstring	36	30	-16.7%	1.20x (?)
EqualSubstringSubstring	35	30	-14.3%	1.17x (?)
EqualSubstringString	35	30	-14.3%	1.17x (?)
LessSubstringSubstring	35	31	-11.4%	1.13x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed	231	205	-11.3%	1.13x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed	227	203	-10.6%	1.12x (?)
SIMDReduce.Int8	7626	6870	-9.9%	1.11x (?)
Data.hash.Medium	31	28	-9.7%	1.11x (?)
Breadcrumbs.MutatedIdxToUTF16.ASCII	14	13	-7.1%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop	1662	1740	1688	—
CxxVecU32.Sum.Swift.ForInLoop	8489	9136	8767	—
CxxVecU32.Sum.Swift.IteratorLoop	1871	2193	2074	—
CxxVecU32.Sum.Swift.Reduce	9588	9680	9625	—
CxxVecU32.Sum.Swift.SubscriptLoop	1827	1876	1844	—

Code size: -swiftlibs

✅	Benchmark Check Report
⛔️⏱	`CxxVecU32.Sum.Swift.SubscriptLoop` has setup overhead of 28 μs (75.7%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️⏱	`CxxVecU32.Sum.Swift.SubscriptLoop` execution took 9 μs. _{Increase the workload of CxxVecU32.Sum.Swift.SubscriptLoop to be more than 20 μs.}
⚠️Ⓜ️	`CxxVecU32.Sum.Swift.SubscriptLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.Sum.Swift.SubscriptLoop mem_pages [i1, i2]: min=[4, 4] 𝚫=0 R=[32, 0]}
⚠️🔤	`CxxVecU32.Sum.Swift.ForInLoop` name is composed of 5 words. _{Split CxxVecU32.Sum.Swift.ForInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⚠️⏱	`CxxVecU32.Sum.Swift.ForInLoop` execution took at least 4374 μs. _{Decrease the workload of CxxVecU32.Sum.Swift.ForInLoop by a factor of 8 (10), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVecU32.Sum.Swift.ForInLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.Sum.Swift.ForInLoop mem_pages [i1, i2]: min=[69, 69] 𝚫=0 R=[164, 164]}
⛔️⏱	`CxxVecU32.Sum.Swift.IteratorLoop` has setup overhead of 30 μs (81.1%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️⏱	`CxxVecU32.Sum.Swift.IteratorLoop` execution took 7 μs. _{Increase the workload of CxxVecU32.Sum.Swift.IteratorLoop to be more than 20 μs.}
⚠️🔤	`CxxVecU32.Sum.Cxx.RangedForLoop` name is composed of 5 words. _{Split CxxVecU32.Sum.Cxx.RangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⛔️⏱	`CxxVecU32.Sum.Cxx.RangedForLoop` has setup overhead of 32 μs (84.2%). _{Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.}
⚠️⏱	`CxxVecU32.Sum.Cxx.RangedForLoop` execution took 6 μs. _{Increase the workload of CxxVecU32.Sum.Cxx.RangedForLoop to be more than 20 μs.}
⚠️⏱	`CxxVecU32.Sum.Swift.Reduce` execution took at least 4500 μs. _{Decrease the workload of CxxVecU32.Sum.Swift.Reduce by a factor of 8 (10), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVecU32.Sum.Swift.Reduce` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.Sum.Swift.Reduce mem_pages [i1, i2]: min=[70, 70] 𝚫=0 R=[164, 246]}

hyp · 2022-10-10T18:07:52Z

@swift-ci please benchmark

hyp · 2022-10-10T18:07:58Z

@swift-ci please test

hyp · 2022-10-10T19:44:32Z

benchmark info

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
Dictionary3	124	154	+24.2%	0.81x (?)
DropFirstArray	13	14	+7.7%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
Dictionary4	227	152	-33.0%	1.49x
Dictionary4OfObjects	255	210	-17.6%	1.21x
Fibonacci	6	5	-16.7%	1.20x (?)
TypeName	673	620	-7.9%	1.09x (?)
RemoveWhereFilterInts	27	25	-7.4%	1.08x
DictionaryKeysContainsNative	15	14	-6.7%	1.07x (?)
DropLastAnySeqCRangeIter	381	356	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop	10	10	10	—
CxxVecU32.Sum.Swift.ForInLoop	6185	6398	6316	—
CxxVecU32.Sum.Swift.IteratorLoop	10	10	10	—
CxxVecU32.Sum.Swift.Reduce	6376	6502	6432	—
CxxVecU32.Sum.Swift.SubscriptLoop	10	10	10	—

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
StringWalk	1240	1520	+22.6%	0.82x (?)
StrComplexWalk	2800	3100	+10.7%	0.90x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubDateAccess	152	130	-14.5%	1.17x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop	78	78	78	—
CxxVecU32.Sum.Swift.ForInLoop	6382	6490	6433	—
CxxVecU32.Sum.Swift.IteratorLoop	78	80	79	—
CxxVecU32.Sum.Swift.Reduce	6622	6719	6656	—
CxxVecU32.Sum.Swift.SubscriptLoop	78	78	78	—

Code size: -Osize

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
SubstringFromLongString2	88	95	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubFromArrayOfNSString2	1670	1550	-7.2%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop	1587	1623	1600	—
CxxVecU32.Sum.Swift.ForInLoop	10684	10814	10750	—
CxxVecU32.Sum.Swift.IteratorLoop	1784	1827	1799	—
CxxVecU32.Sum.Swift.Reduce	12520	12689	12617	—
CxxVecU32.Sum.Swift.SubscriptLoop	1784	1816	1795	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️⏱	`CxxVecU32.Sum.Swift.SubscriptLoop` execution took 11 μs. _{Increase the workload of CxxVecU32.Sum.Swift.SubscriptLoop to be more than 20 μs.}
⚠️🔤	`CxxVecU32.Sum.Swift.ForInLoop` name is composed of 5 words. _{Split CxxVecU32.Sum.Swift.ForInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⚠️⏱	`CxxVecU32.Sum.Swift.ForInLoop` execution took at least 6038 μs. _{Decrease the workload of CxxVecU32.Sum.Swift.ForInLoop by a factor of 8 (10), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVecU32.Sum.Swift.ForInLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.Sum.Swift.ForInLoop mem_pages [i1, i2]: min=[161, 79] 𝚫=82 R=[103, 324]}
⚠️⏱	`CxxVecU32.Sum.Swift.IteratorLoop` execution took 10 μs. _{Increase the workload of CxxVecU32.Sum.Swift.IteratorLoop to be more than 20 μs.}
⚠️Ⓜ️	`CxxVecU32.Sum.Swift.IteratorLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.Sum.Swift.IteratorLoop mem_pages [i1, i2]: min=[28, 28] 𝚫=0 R=[0, 32]}
⚠️🔤	`CxxVecU32.Sum.Cxx.RangedForLoop` name is composed of 5 words. _{Split CxxVecU32.Sum.Cxx.RangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⚠️⏱	`CxxVecU32.Sum.Cxx.RangedForLoop` execution took 11 μs. _{Increase the workload of CxxVecU32.Sum.Cxx.RangedForLoop to be more than 20 μs.}
⚠️⏱	`CxxVecU32.Sum.Swift.Reduce` execution took at least 6315 μs. _{Decrease the workload of CxxVecU32.Sum.Swift.Reduce by a factor of 8 (10), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVecU32.Sum.Swift.Reduce` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.Sum.Swift.Reduce mem_pages [i1, i2]: min=[80, 80] 𝚫=0 R=[164, 328]}

hyp · 2022-10-11T19:27:06Z

@swift-ci please test

hyp · 2022-10-11T19:27:13Z

@swift-ci please benchmark

hyp · 2022-10-11T20:15:11Z

results:

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
Dictionary3	124	153	+23.4%	0.81x (?)

Improvement	OLD	NEW	DELTA	RATIO
Dictionary4	227	152	-33.0%	1.49x
Dictionary4OfObjects	253	210	-17.0%	1.20x (?)
RemoveWhereFilterInts	27	25	-7.4%	1.08x (?)
DropLastAnySeqCRangeIter	381	356	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.sum.Cxx.rangedForLoop	9	11	10	—
CxxVecU32.sum.Swift.forInLoop	6206	6465	6374	—
CxxVecU32.sum.Swift.iteratorLoop	9	9	9	—
CxxVecU32.sum.Swift.reduce	6446	7498	7125	—
CxxVecU32.sum.Swift.subscriptLoop	10	10	10	—

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
StringWalk	1240	1520	+22.6%	0.82x
StrComplexWalk	2810	3110	+10.7%	0.90x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	1586	1073	-32.3%	1.48x (?)
ObjectiveCBridgeStubDateAccess	152	130	-14.5%	1.17x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.sum.Cxx.rangedForLoop	93	99	96	—
CxxVecU32.sum.Swift.forInLoop	6531	6707	6641	—
CxxVecU32.sum.Swift.iteratorLoop	81	81	81	—
CxxVecU32.sum.Swift.reduce	6912	6949	6927	—
CxxVecU32.sum.Swift.subscriptLoop	81	81	81	—

Code size: -Osize

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
StringBuilderLong	980	1060	+8.2%	0.92x (?)
SubstringFromLongString2	88	95	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
NSError	402	360	-10.4%	1.12x (?)

Added	MIN	MAX	MEAN	MAX_RSS
CxxVecU32.sum.Cxx.rangedForLoop	1698	1704	1700	—
CxxVecU32.sum.Swift.forInLoop	10681	10947	10819	—
CxxVecU32.sum.Swift.iteratorLoop	1781	1814	1792	—
CxxVecU32.sum.Swift.reduce	12434	12495	12466	—
CxxVecU32.sum.Swift.subscriptLoop	1783	1815	1794	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️⏱	`CxxVecU32.sum.Swift.subscriptLoop` execution took 11 μs. _{Increase the workload of CxxVecU32.sum.Swift.subscriptLoop to be more than 20 μs.}
⚠️🔤	`CxxVecU32.sum.Swift.forInLoop` name is composed of 5 words. _{Split CxxVecU32.sum.Swift.forInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⚠️⏱	`CxxVecU32.sum.Swift.forInLoop` execution took at least 6023 μs. _{Decrease the workload of CxxVecU32.sum.Swift.forInLoop by a factor of 8 (10), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVecU32.sum.Swift.forInLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.sum.Swift.forInLoop mem_pages [i1, i2]: min=[79, 79] 𝚫=0 R=[278, 185]}
⚠️⏱	`CxxVecU32.sum.Swift.iteratorLoop` execution took 10 μs. _{Increase the workload of CxxVecU32.sum.Swift.iteratorLoop to be more than 20 μs.}
⚠️🔤	`CxxVecU32.sum.Cxx.rangedForLoop` name is composed of 5 words. _{Split CxxVecU32.sum.Cxx.rangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming}
⚠️⏱	`CxxVecU32.sum.Cxx.rangedForLoop` execution took 11 μs. _{Increase the workload of CxxVecU32.sum.Cxx.rangedForLoop to be more than 20 μs.}
⚠️Ⓜ️	`CxxVecU32.sum.Cxx.rangedForLoop` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.sum.Cxx.rangedForLoop mem_pages [i1, i2]: min=[28, 28] 𝚫=0 R=[32, 0]}
⚠️⏱	`CxxVecU32.sum.Swift.reduce` execution took at least 6291 μs. _{Decrease the workload of CxxVecU32.sum.Swift.reduce by a factor of 8 (10), to be less than 1000 μs.}
⚠️Ⓜ️	`CxxVecU32.sum.Swift.reduce` has very wide range of memory used between independent, repeated measurements. _{CxxVecU32.sum.Swift.reduce mem_pages [i1, i2]: min=[80, 80] 𝚫=0 R=[185, 103]}

hyp · 2022-10-11T23:40:20Z

@swift-ci please test macOS platform

hyp · 2022-10-12T04:29:59Z

@swift-ci please test macOS platform

hyp · 2022-10-12T15:32:48Z

@swift-ci please test macOS platform

hyp · 2022-10-12T19:45:00Z

@swift-ci please test macOS platform

hyp requested review from egorzhdan and zoecarver October 5, 2022 18:49

hyp added the `c++ interop` (Feature: Interoperability with C++) and `benchmarks` labels Oct 5, 2022

zoecarver reviewed Oct 5, 2022

benchmark/cxx-source/CxxVectorSum.swift (outdated, resolved)

zoecarver reviewed Oct 5, 2022

benchmark/cxx-source/CxxVectorSum.swift (outdated, resolved)

zoecarver reviewed Oct 5, 2022

benchmark/cxx-source/CxxVectorSum.swift (outdated, resolved)

hyp mentioned this pull request Oct 7, 2022

[performance] copy_addr is not optimized out for a C++ subscript invocation #61499

Open

hyp added 6 commits October 10, 2022 09:13

[cxx-interop] Add initial benchmark to compare vector<uint32_t> sum in C++ vs Swift

d72b592

[interop] rewrite the C++ vector sum benchmark to have clear naming and take iter into account

157be05

[interop] benchmark: add run_CxxVectorOfU32_Sum_Swift_RawIteratorLoop that doesn't use C++ inline helpers

2a07c1c

[interop] cxx vector sum benchmark - reduce iterations to avoid long times for slow runs

4634230

[interop] cxx vector benchmark: iterator is fast with operator == in C++, and subscript is fast with swiftlang#61499 fixed

ca927dc

[interop] cxx vector benchmark: do not use subscript until swiftlang#61499 is fixed

0e95c75

hyp force-pushed the eng/benchmark-sum1 branch from 106ea29 to 0e95c75 Compare October 10, 2022 16:14

hyp added 2 commits October 10, 2022 11:06

[interop] cxx vec benchmark - initialize vector before benchmark

a50939b

[interop] cxx vec benchmark - bump up the iter repeat count

3341aef

[interop] vec benchmark - update benchmark naming

d50f834

hyp added 2 commits October 10, 2022 16:39

[interop] benchmark - add a comment to bump up the iteration count

5b88dc3

[interop] benchmark - disable linux until swiftlang#61547 is fixed

c4a9136

hyp merged commit 73dddfc into swiftlang:main Oct 12, 2022
