[cxx-interop] Add initial benchmark to compare vector<uint32_t> sum in C++ vs Swift #61456

Merged: 11 commits, Oct 12, 2022

Conversation

@hyp hyp commented Oct 5, 2022

…n C++ vs Swift

This benchmark compares the performance of summing a vector of one million elements in Swift and in C++.

Initial numbers on M1 Mac:

#,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
139,CreateObjects,200,11,29,14,2,15
140,CxxVectorOfU32SumInCxx,200,77,107,95,7,93
141,CxxVectorOfU32SumInSwift,36,24980,27957,25654,646,25546
142,CxxVectorOfU32SumInSwift_Fastest,200,77,83,77,1,77

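We're investigating why CxxVectorOfU32SumInSwift is so much slower (a mean of about 25,654 μs, versus 95 μs for CxxVectorOfU32SumInCxx and 77 μs for CxxVectorOfU32SumInSwift_Fastest) and how to fix it correctly.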
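
For reference, the C++ side of such a comparison can be sketched as below. This is a minimal illustration, not the code added by this PR: `makeVector` and `sumRangedFor` are hypothetical names, and filling the vector with a constant value is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a vector of `count` elements, each equal to `value`.
std::vector<std::uint32_t> makeVector(std::size_t count, std::uint32_t value) {
  return std::vector<std::uint32_t>(count, value);
}

// Sums the vector with a range-based for loop.
// Unsigned 32-bit addition wraps modulo 2^32 on overflow.
std::uint32_t sumRangedFor(const std::vector<std::uint32_t> &v) {
  std::uint32_t sum = 0;
  for (std::uint32_t x : v) {
    sum += x;
  }
  return sum;
}
```

Calling `sumRangedFor(makeVector(1000000, 1))` would sum one million elements, which matches the workload size described above.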

@hyp hyp requested review from egorzhdan and zoecarver October 5, 2022 18:49

hyp commented Oct 5, 2022

@swift-ci please test

hyp commented Oct 5, 2022

@swift-ci please benchmark

@hyp hyp added the "c++ interop" (Feature: Interoperability with C++) and "benchmarks" labels Oct 5, 2022

hyp commented Oct 5, 2022

@swift-ci please test macOS platform

@zoecarver
Contributor

Benchmark Info

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
Dictionary4 153 193 +26.1% 0.79x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListLoop 1620 1386 -14.4% 1.17x (?)
ObjectiveCBridgeStubToNSStringRef 93 86 -7.5% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVectorOfU32SumInCxx 3 5 4
CxxVectorOfU32SumInSwift 38989 39456 39183
CxxVectorOfU32SumInSwift_Fastest 3 3 3

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
PrefixAnySeqCntRange 13 16 +23.1% 0.81x (?)
ObjectiveCBridgeStubDateAccess 130 152 +16.9% 0.86x (?)
FlattenListLoop 1385 1615 +16.6% 0.86x (?)
PrefixWhileAnySequence 172 198 +15.1% 0.87x (?)
StringComparison_ascii 360 390 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Dictionary4 231 159 -31.2% 1.45x (?)
Dictionary4OfObjects 316 277 -12.3% 1.14x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVectorOfU32SumInCxx 54 56 55
CxxVectorOfU32SumInSwift 37043 38907 38182
CxxVectorOfU32SumInSwift_Fastest 35 36 35

Code size: -Osize

Performance (x86_64): -Onone

Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64Exactly 475 441 -7.2% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVectorOfU32SumInCxx 8147 8154 8152
CxxVectorOfU32SumInSwift 72724 77494 75066
CxxVectorOfU32SumInSwift_Fastest 10992 11758 11369

Code size: -swiftlibs

Benchmark Check Report
⚠️🔤 CxxVectorOfU32SumInSwift name is composed of 6 words.
Split CxxVectorOfU32SumInSwift name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVectorOfU32SumInSwift has setup overhead of 36214 μs (99.6%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️Ⓜ️ CxxVectorOfU32SumInSwift has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32SumInSwift mem_pages [i1, i2]: min=[3928, 3928] 𝚫=0 R=[5876, 2938]
⚠️🔤 CxxVectorOfU32SumInCxx name is composed of 6 words.
Split CxxVectorOfU32SumInCxx name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVectorOfU32SumInCxx has setup overhead of 70 μs (100.0%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ CxxVectorOfU32SumInCxx execution took 0 μs.
Ensure the workload of CxxVectorOfU32SumInCxx has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⚠️Ⓜ️ CxxVectorOfU32SumInCxx has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32SumInCxx mem_pages [i1, i2]: min=[1960, 1960] 𝚫=0 R=[0, 19]
⛔️🔤 CxxVectorOfU32SumInSwift_Fastest name doesn't conform to benchmark naming convention.
See http://bit.ly/BenchmarkNaming
⚠️🔤 CxxVectorOfU32SumInSwift_Fastest name is composed of 6 words.
Split CxxVectorOfU32SumInSwift_Fastest name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVectorOfU32SumInSwift_Fastest has setup overhead of 78 μs (106.8%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️ CxxVectorOfU32SumInSwift_Fastest execution took -5 μs.
Increase the workload of CxxVectorOfU32SumInSwift_Fastest to be more than 20 μs.
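
The advice above about work being "eliminated by the compiler" can be illustrated with a C++ analogue. The `blackHole` named in the report belongs to the Swift benchmark suite; the function below is a hypothetical stand-in written for this sketch, not that function. It only keeps the result observable by storing it in a `volatile` variable, and it does not prevent the compiler from simplifying how the result is computed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sink: storing into a volatile object is an observable side
// effect, so the value passed in cannot be discarded as unused.
inline void blackHole(std::uint32_t value) {
  static volatile std::uint32_t sink;
  sink = value;
}

// Sums the vector and hands the result to the sink, so the loop's result is used.
std::uint32_t sumAndConsume(const std::vector<std::uint32_t> &v) {
  std::uint32_t sum = 0;
  for (std::uint32_t x : v) {
    sum += x;
  }
  blackHole(sum);
  return sum;
}
```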
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
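
The DELTA and RATIO columns in the tables above are consistent with the arithmetic sketched below: DELTA as the relative change from OLD to NEW, and RATIO as OLD divided by NEW. The function names are hypothetical, and the exact rounding used by the reporting tool is an assumption.

```cpp
#include <cassert>
#include <cmath>

// DELTA in percent: positive means NEW took longer than OLD (a regression).
double deltaPercent(double oldTime, double newTime) {
  return (newTime - oldTime) / oldTime * 100.0;
}

// RATIO: OLD / NEW, so values below 1 indicate a regression.
double ratio(double oldTime, double newTime) {
  return oldTime / newTime;
}
```

For the Dictionary4 row at -O (OLD 153, NEW 193), this gives (193 - 153) / 153 ≈ +26.1% and 153 / 193 ≈ 0.79x, matching the reported values.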

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 32 GB

hyp commented Oct 7, 2022

@swift-ci please benchmark

hyp commented Oct 7, 2022

@swift-ci please test

hyp commented Oct 7, 2022

Benchmark info:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
StringInterpolationManySmallSegments 7300 8800 +20.5% 0.83x (?)
Dictionary3 122 146 +19.7% 0.84x (?)
RemoveWhereMoveInts 12 13 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Dictionary4 201 152 -24.4% 1.32x
Dictionary4OfObjects 248 209 -15.7% 1.19x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVectorOfU32.Sum.Cxx.RangedForLoop 101 103 102
CxxVectorOfU32.Sum.Swift.ForInLoop 36881 41604 39568
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop 10690028 11153274 10972647
CxxVectorOfU32.Sum.Swift.RawIteratorLoop 90 91 91
CxxVectorOfU32.Sum.Swift.Reduce 39856 40399 40106

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
PrefixAnySeqCRangeIter 13 15 +15.4% 0.87x (?)
StringWalk 1360 1520 +11.8% 0.89x (?)
PrefixAnySeqCntRange 13 14 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x (?)
DictionaryKeysContainsCocoa 15 14 -6.7% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVectorOfU32.Sum.Cxx.RangedForLoop 331 332 331
CxxVectorOfU32.Sum.Swift.ForInLoop 36610 38637 37663
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop 10746758 10871907 10796945
CxxVectorOfU32.Sum.Swift.RawIteratorLoop 325 329 327
CxxVectorOfU32.Sum.Swift.Reduce 39519 39997 39750

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubNSDateRefAccess 2516 3057 +21.5% 0.82x (?)
ObjectiveCBridgeStubDateAccess 2604 3077 +18.2% 0.85x (?)
NSError 371 403 +8.6% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateMutation 352 326 -7.4% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVectorOfU32.Sum.Cxx.RangedForLoop 11179 11419 11336
CxxVectorOfU32.Sum.Swift.ForInLoop 67463 67624 67524
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop 703459153 704797431 704231260
CxxVectorOfU32.Sum.Swift.RawIteratorLoop 14508 14595 14548
CxxVectorOfU32.Sum.Swift.Reduce 80136 81468 80918

Code size: -swiftlibs

Benchmark Check Report
⚠️🔤 CxxVectorOfU32.Sum.Cxx.RangedForLoop name is composed of 6 words.
Split CxxVectorOfU32.Sum.Cxx.RangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVectorOfU32.Sum.Cxx.RangedForLoop has setup overhead of 122 μs (64.2%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️Ⓜ️ CxxVectorOfU32.Sum.Cxx.RangedForLoop has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32.Sum.Cxx.RangedForLoop mem_pages [i1, i2]: min=[102, 102] 𝚫=0 R=[105, 210]
⚠️🔤 CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop name is composed of 7 words.
Split CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️🔤 CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop name is 46 characters long.
Benchmark name should not be longer than 40 characters.
⛔️⏱ CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop execution took at least 10495803 μs.
Decrease the workload of CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop by a factor of 16384 (100000), to be less than 1000 μs.
⚠️Ⓜ️ CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32.Sum.Swift.IndexAndSubscriptLoop mem_pages [i1, i2]: min=[1018, 1020] 𝚫=2 R=[206, 102]
⚠️🔤 CxxVectorOfU32.Sum.Swift.ForInLoop name is composed of 6 words.
Split CxxVectorOfU32.Sum.Swift.ForInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVectorOfU32.Sum.Swift.ForInLoop execution took at least 34850 μs.
Decrease the workload of CxxVectorOfU32.Sum.Swift.ForInLoop by a factor of 64 (100), to be less than 1000 μs.
⚠️Ⓜ️ CxxVectorOfU32.Sum.Swift.ForInLoop has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32.Sum.Swift.ForInLoop mem_pages [i1, i2]: min=[299, 299] 𝚫=0 R=[1211, 301]
⛔️⏱ CxxVectorOfU32.Sum.Swift.Reduce execution took at least 36683 μs.
Decrease the workload of CxxVectorOfU32.Sum.Swift.Reduce by a factor of 64 (100), to be less than 1000 μs.
⚠️Ⓜ️ CxxVectorOfU32.Sum.Swift.Reduce has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32.Sum.Swift.Reduce mem_pages [i1, i2]: min=[300, 300] 𝚫=0 R=[917, 1204]
⚠️🔤 CxxVectorOfU32.Sum.Swift.RawIteratorLoop name is composed of 6 words.
Split CxxVectorOfU32.Sum.Swift.RawIteratorLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVectorOfU32.Sum.Swift.RawIteratorLoop has setup overhead of 126 μs (66.0%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️Ⓜ️ CxxVectorOfU32.Sum.Swift.RawIteratorLoop has very wide range of memory used between independent, repeated measurements.
CxxVectorOfU32.Sum.Swift.RawIteratorLoop mem_pages [i1, i2]: min=[102, 102] 𝚫=0 R=[105, 105]
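
The "Decrease the workload ... by a factor of 64 (100)" suggestions in this report are consistent with picking the smallest power of 2 (and, in parentheses, the smallest power of 10) that brings the measured runtime down to the 1000 μs target. The sketch below reproduces that arithmetic; `scaleFactor` is a hypothetical name, and the actual rule used by the checking tool is not shown in this thread, so treat this as an inference from the numbers.

```cpp
#include <cassert>
#include <cstdint>

// Returns the smallest power of `base` such that runtimeUs / factor <= targetUs.
std::uint64_t scaleFactor(double runtimeUs, double targetUs, std::uint64_t base) {
  std::uint64_t factor = 1;
  while (static_cast<double>(factor) * targetUs < runtimeUs) {
    factor *= base;
  }
  return factor;
}
```

With a runtime of 34850 μs this yields 64 and 100, and with 10495803 μs it yields 16384 and 100000, the same pairs the report prints for ForInLoop and IndexAndSubscriptLoop.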

@hyp hyp force-pushed the eng/benchmark-sum1 branch from 106ea29 to 0e95c75 October 10, 2022 16:14

hyp commented Oct 10, 2022

@swift-ci please benchmark

hyp commented Oct 10, 2022

@swift-ci please test

hyp commented Oct 10, 2022

Benchmark results

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous 123 161 +30.9% 0.76x
UTF8Decode_InitDecoding 123 161 +30.9% 0.76x
Dictionary3 125 151 +20.8% 0.83x (?)
StringUTF16Builder 220 260 +18.2% 0.85x (?)
StringBuilder 202 237 +17.3% 0.85x (?)
StringBuilderSmallReservingCapacity 210 246 +17.1% 0.85x (?)
UTF8Decode_InitFromCustom_noncontiguous 241 279 +15.8% 0.86x (?)
CStringLongNonAscii 144 164 +13.9% 0.88x (?)
ParseFloat.Double.Exp 9 10 +11.1% 0.90x (?)
NormalizedIterator_emoji 328 364 +11.0% 0.90x (?)
NormalizedIterator_nonBMPSlowestPrenormal 410 450 +9.8% 0.91x (?)
Calculator 141 153 +8.5% 0.92x (?)
Chars2 3150 3400 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
Dictionary4 227 152 -33.0% 1.49x
LessSubstringSubstring 30 24 -20.0% 1.25x (?)
EqualSubstringSubstringGenericEquatable 30 24 -20.0% 1.25x (?)
EqualSubstringString 30 24 -20.0% 1.25x (?)
LessSubstringSubstringGenericComparable 30 24 -20.0% 1.25x (?)
EqualStringSubstring 31 25 -19.4% 1.24x
Dictionary4OfObjects 256 210 -18.0% 1.22x (?)
DictionaryKeysContainsCocoa 17 14 -17.6% 1.21x (?)
EqualSubstringSubstring 30 25 -16.7% 1.20x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 214 187 -12.6% 1.14x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 216 191 -11.6% 1.13x
Data.hash.Medium 27 24 -11.1% 1.12x (?)
StringComparison_longSharedPrefix 343 306 -10.8% 1.12x (?)
DropLastAnySeqCRangeIter 381 342 -10.2% 1.11x (?)
DistinctClassFieldAccesses 42 38 -9.5% 1.11x (?)
RemoveWhereFilterInts 27 25 -7.4% 1.08x (?)
PrefixWhileSequence 211 197 -6.6% 1.07x (?)
PrefixWhileAnySequence 197 184 -6.6% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop 8 8 8
CxxVecU32.Sum.Swift.ForInLoop 4610 4710 4643
CxxVecU32.Sum.Swift.IteratorLoop 8 8 8
CxxVecU32.Sum.Swift.Reduce 4544 4824 4654
CxxVecU32.Sum.Swift.SubscriptLoop 8 8 8

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous 121 161 +33.1% 0.75x (?)
UTF8Decode_InitDecoding 124 161 +29.8% 0.77x
StringWalk 1240 1520 +22.6% 0.82x (?)
StringUTF16Builder 210 250 +19.0% 0.84x
FlattenListLoop 1374 1617 +17.7% 0.85x (?)
StrComplexWalk 2790 3190 +14.3% 0.87x (?)
StringAdder 247 282 +14.2% 0.88x (?)
StringBuilder 208 236 +13.5% 0.88x (?)
UTF8Decode_InitFromCustom_noncontiguous 317 358 +12.9% 0.89x (?)
CStringLongNonAscii 145 162 +11.7% 0.90x (?)
StringBuilderSmallReservingCapacity 221 245 +10.9% 0.90x (?)
StringInterpolationSmall 1140 1240 +8.8% 0.92x (?)
Chars2 3450 3750 +8.7% 0.92x (?)
CharIteration_japanese_unicodeScalars 6200 6680 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 3589 2680 -25.3% 1.34x (?)
LessSubstringSubstring 30 24 -20.0% 1.25x (?)
EqualSubstringSubstring 30 24 -20.0% 1.25x (?)
EqualStringSubstring 30 24 -20.0% 1.25x (?)
EqualSubstringSubstringGenericEquatable 30 24 -20.0% 1.25x (?)
EqualSubstringString 30 24 -20.0% 1.25x
LessSubstringSubstringGenericComparable 30 24 -20.0% 1.25x (?)
Fibonacci 6 5 -16.7% 1.20x (?)
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x (?)
DistinctClassFieldAccesses 42 36 -14.3% 1.17x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 214 187 -12.6% 1.14x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 218 191 -12.4% 1.14x (?)
NormalizedIterator_fastPrenormal 530 470 -11.3% 1.13x (?)
Data.hash.Medium 27 24 -11.1% 1.12x (?)
StringComparison_longSharedPrefix 344 308 -10.5% 1.12x (?)
StringBuilderWithLongSubstring 1100 1000 -9.1% 1.10x (?)
NormalizedIterator_emoji 364 332 -8.8% 1.10x (?)
RemoveWhereFilterInts 25 23 -8.0% 1.09x (?)
StringBuilderLong 880 820 -6.8% 1.07x (?)
NormalizedIterator_slowerPrenormal 300 280 -6.7% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop 36 36 36
CxxVecU32.Sum.Swift.ForInLoop 4461 4558 4504
CxxVecU32.Sum.Swift.IteratorLoop 36 36 36
CxxVecU32.Sum.Swift.Reduce 4738 4790 4771
CxxVecU32.Sum.Swift.SubscriptLoop 35 35 35

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 128 168 +31.2% 0.76x
UTF8Decode_InitFromCustom_contiguous 130 170 +30.8% 0.76x
CharacterLiteralsLarge 409 466 +13.9% 0.88x (?)
ArrayAppendLatin1Substring 23112 26172 +13.2% 0.88x (?)
ArrayAppendUTF16Substring 22932 25848 +12.7% 0.89x (?)
ArrayAppendAsciiSubstring 22932 25848 +12.7% 0.89x (?)
SubstringFromLongString2 88 95 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 34 27 -20.6% 1.26x (?)
LessSubstringSubstringGenericComparable 33 27 -18.2% 1.22x
EqualStringSubstring 36 30 -16.7% 1.20x (?)
EqualSubstringSubstring 35 30 -14.3% 1.17x (?)
EqualSubstringString 35 30 -14.3% 1.17x (?)
LessSubstringSubstring 35 31 -11.4% 1.13x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 231 205 -11.3% 1.13x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 227 203 -10.6% 1.12x (?)
SIMDReduce.Int8 7626 6870 -9.9% 1.11x (?)
Data.hash.Medium 31 28 -9.7% 1.11x (?)
Breadcrumbs.MutatedIdxToUTF16.ASCII 14 13 -7.1% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop 1662 1740 1688
CxxVecU32.Sum.Swift.ForInLoop 8489 9136 8767
CxxVecU32.Sum.Swift.IteratorLoop 1871 2193 2074
CxxVecU32.Sum.Swift.Reduce 9588 9680 9625
CxxVecU32.Sum.Swift.SubscriptLoop 1827 1876 1844

Code size: -swiftlibs

Benchmark Check Report
⛔️⏱ CxxVecU32.Sum.Swift.SubscriptLoop has setup overhead of 28 μs (75.7%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️ CxxVecU32.Sum.Swift.SubscriptLoop execution took 9 μs.
Increase the workload of CxxVecU32.Sum.Swift.SubscriptLoop to be more than 20 μs.
⚠️Ⓜ️ CxxVecU32.Sum.Swift.SubscriptLoop has very wide range of memory used between independent, repeated measurements.
CxxVecU32.Sum.Swift.SubscriptLoop mem_pages [i1, i2]: min=[4, 4] 𝚫=0 R=[32, 0]
⚠️🔤 CxxVecU32.Sum.Swift.ForInLoop name is composed of 5 words.
Split CxxVecU32.Sum.Swift.ForInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ CxxVecU32.Sum.Swift.ForInLoop execution took at least 4374 μs.
Decrease the workload of CxxVecU32.Sum.Swift.ForInLoop by a factor of 8 (10), to be less than 1000 μs.
⚠️Ⓜ️ CxxVecU32.Sum.Swift.ForInLoop has very wide range of memory used between independent, repeated measurements.
CxxVecU32.Sum.Swift.ForInLoop mem_pages [i1, i2]: min=[69, 69] 𝚫=0 R=[164, 164]
⛔️⏱ CxxVecU32.Sum.Swift.IteratorLoop has setup overhead of 30 μs (81.1%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️ CxxVecU32.Sum.Swift.IteratorLoop execution took 7 μs.
Increase the workload of CxxVecU32.Sum.Swift.IteratorLoop to be more than 20 μs.
⚠️🔤 CxxVecU32.Sum.Cxx.RangedForLoop name is composed of 5 words.
Split CxxVecU32.Sum.Cxx.RangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ CxxVecU32.Sum.Cxx.RangedForLoop has setup overhead of 32 μs (84.2%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️ CxxVecU32.Sum.Cxx.RangedForLoop execution took 6 μs.
Increase the workload of CxxVecU32.Sum.Cxx.RangedForLoop to be more than 20 μs.
⚠️ CxxVecU32.Sum.Swift.Reduce execution took at least 4500 μs.
Decrease the workload of CxxVecU32.Sum.Swift.Reduce by a factor of 8 (10), to be less than 1000 μs.
⚠️Ⓜ️ CxxVecU32.Sum.Swift.Reduce has very wide range of memory used between independent, repeated measurements.
CxxVecU32.Sum.Swift.Reduce mem_pages [i1, i2]: min=[70, 70] 𝚫=0 R=[164, 246]
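
The "setup overhead" findings above ask for data initialization to happen outside the measured work. In the Swift benchmark suite that is done through the setUpFunction registered in BenchmarkInfo, as the report says; the C++ sketch below only illustrates the general idea of timing setup and workload separately. `Timing` and `measureSum` are hypothetical names, and an optimizing compiler may simplify the loop, so the numbers it produces are illustrative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result record: time spent in setup, time spent in the workload,
// and the computed sum (returned so the work has an observable result).
struct Timing {
  std::int64_t setupUs;
  std::int64_t runUs;
  std::uint32_t result;
};

Timing measureSum(std::size_t count) {
  using Clock = std::chrono::steady_clock;
  using Micros = std::chrono::microseconds;

  const auto t0 = Clock::now();
  std::vector<std::uint32_t> data(count, 1);  // setup: not part of the workload
  const auto t1 = Clock::now();

  std::uint32_t sum = 0;
  for (std::uint32_t x : data) {  // workload: the only part that should be timed
    sum += x;
  }
  const auto t2 = Clock::now();

  return Timing{std::chrono::duration_cast<Micros>(t1 - t0).count(),
                std::chrono::duration_cast<Micros>(t2 - t1).count(), sum};
}
```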

hyp commented Oct 10, 2022

@swift-ci please benchmark

hyp commented Oct 10, 2022

@swift-ci please test

hyp commented Oct 10, 2022

Benchmark info:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
Dictionary3 124 154 +24.2% 0.81x (?)
DropFirstArray 13 14 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
Dictionary4 227 152 -33.0% 1.49x
Dictionary4OfObjects 255 210 -17.6% 1.21x
Fibonacci 6 5 -16.7% 1.20x (?)
TypeName 673 620 -7.9% 1.09x (?)
RemoveWhereFilterInts 27 25 -7.4% 1.08x
DictionaryKeysContainsNative 15 14 -6.7% 1.07x (?)
DropLastAnySeqCRangeIter 381 356 -6.6% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop 10 10 10
CxxVecU32.Sum.Swift.ForInLoop 6185 6398 6316
CxxVecU32.Sum.Swift.IteratorLoop 10 10 10
CxxVecU32.Sum.Swift.Reduce 6376 6502 6432
CxxVecU32.Sum.Swift.SubscriptLoop 10 10 10

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
StringWalk 1240 1520 +22.6% 0.82x (?)
StrComplexWalk 2800 3100 +10.7% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop 78 78 78
CxxVecU32.Sum.Swift.ForInLoop 6382 6490 6433
CxxVecU32.Sum.Swift.IteratorLoop 78 80 79
CxxVecU32.Sum.Swift.Reduce 6622 6719 6656
CxxVecU32.Sum.Swift.SubscriptLoop 78 78 78

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
SubstringFromLongString2 88 95 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromArrayOfNSString2 1670 1550 -7.2% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.Sum.Cxx.RangedForLoop 1587 1623 1600
CxxVecU32.Sum.Swift.ForInLoop 10684 10814 10750
CxxVecU32.Sum.Swift.IteratorLoop 1784 1827 1799
CxxVecU32.Sum.Swift.Reduce 12520 12689 12617
CxxVecU32.Sum.Swift.SubscriptLoop 1784 1816 1795

Code size: -swiftlibs

Benchmark Check Report
⚠️ CxxVecU32.Sum.Swift.SubscriptLoop execution took 11 μs.
Increase the workload of CxxVecU32.Sum.Swift.SubscriptLoop to be more than 20 μs.
⚠️🔤 CxxVecU32.Sum.Swift.ForInLoop name is composed of 5 words.
Split CxxVecU32.Sum.Swift.ForInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ CxxVecU32.Sum.Swift.ForInLoop execution took at least 6038 μs.
Decrease the workload of CxxVecU32.Sum.Swift.ForInLoop by a factor of 8 (10), to be less than 1000 μs.
⚠️Ⓜ️ CxxVecU32.Sum.Swift.ForInLoop has very wide range of memory used between independent, repeated measurements.
CxxVecU32.Sum.Swift.ForInLoop mem_pages [i1, i2]: min=[161, 79] 𝚫=82 R=[103, 324]
⚠️ CxxVecU32.Sum.Swift.IteratorLoop execution took 10 μs.
Increase the workload of CxxVecU32.Sum.Swift.IteratorLoop to be more than 20 μs.
⚠️Ⓜ️ CxxVecU32.Sum.Swift.IteratorLoop has very wide range of memory used between independent, repeated measurements.
CxxVecU32.Sum.Swift.IteratorLoop mem_pages [i1, i2]: min=[28, 28] 𝚫=0 R=[0, 32]
⚠️🔤 CxxVecU32.Sum.Cxx.RangedForLoop name is composed of 5 words.
Split CxxVecU32.Sum.Cxx.RangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ CxxVecU32.Sum.Cxx.RangedForLoop execution took 11 μs.
Increase the workload of CxxVecU32.Sum.Cxx.RangedForLoop to be more than 20 μs.
⚠️ CxxVecU32.Sum.Swift.Reduce execution took at least 6315 μs.
Decrease the workload of CxxVecU32.Sum.Swift.Reduce by a factor of 8 (10), to be less than 1000 μs.
⚠️Ⓜ️ CxxVecU32.Sum.Swift.Reduce has very wide range of memory used between independent, repeated measurements.
CxxVecU32.Sum.Swift.Reduce mem_pages [i1, i2]: min=[80, 80] 𝚫=0 R=[164, 328]

hyp commented Oct 11, 2022

@swift-ci please test

hyp commented Oct 11, 2022

@swift-ci please benchmark

hyp commented Oct 11, 2022

results:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
Dictionary3 124 153 +23.4% 0.81x (?)
 
Improvement OLD NEW DELTA RATIO
Dictionary4 227 152 -33.0% 1.49x
Dictionary4OfObjects 253 210 -17.0% 1.20x (?)
RemoveWhereFilterInts 27 25 -7.4% 1.08x (?)
DropLastAnySeqCRangeIter 381 356 -6.6% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.sum.Cxx.rangedForLoop 9 11 10
CxxVecU32.sum.Swift.forInLoop 6206 6465 6374
CxxVecU32.sum.Swift.iteratorLoop 9 9 9
CxxVecU32.sum.Swift.reduce 6446 7498 7125
CxxVecU32.sum.Swift.subscriptLoop 10 10 10

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
StringWalk 1240 1520 +22.6% 0.82x
StrComplexWalk 2810 3110 +10.7% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListLoop 1586 1073 -32.3% 1.48x (?)
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.sum.Cxx.rangedForLoop 93 99 96
CxxVecU32.sum.Swift.forInLoop 6531 6707 6641
CxxVecU32.sum.Swift.iteratorLoop 81 81 81
CxxVecU32.sum.Swift.reduce 6912 6949 6927
CxxVecU32.sum.Swift.subscriptLoop 81 81 81

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
StringBuilderLong 980 1060 +8.2% 0.92x (?)
SubstringFromLongString2 88 95 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NSError 402 360 -10.4% 1.12x (?)
 
Added MIN MAX MEAN MAX_RSS
CxxVecU32.sum.Cxx.rangedForLoop 1698 1704 1700
CxxVecU32.sum.Swift.forInLoop 10681 10947 10819
CxxVecU32.sum.Swift.iteratorLoop 1781 1814 1792
CxxVecU32.sum.Swift.reduce 12434 12495 12466
CxxVecU32.sum.Swift.subscriptLoop 1783 1815 1794

Code size: -swiftlibs

Benchmark Check Report
⚠️ CxxVecU32.sum.Swift.subscriptLoop execution took 11 μs.
Increase the workload of CxxVecU32.sum.Swift.subscriptLoop to be more than 20 μs.
⚠️🔤 CxxVecU32.sum.Swift.forInLoop name is composed of 5 words.
Split CxxVecU32.sum.Swift.forInLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ CxxVecU32.sum.Swift.forInLoop execution took at least 6023 μs.
Decrease the workload of CxxVecU32.sum.Swift.forInLoop by a factor of 8 (10), to be less than 1000 μs.
⚠️Ⓜ️ CxxVecU32.sum.Swift.forInLoop has very wide range of memory used between independent, repeated measurements.
CxxVecU32.sum.Swift.forInLoop mem_pages [i1, i2]: min=[79, 79] 𝚫=0 R=[278, 185]
⚠️ CxxVecU32.sum.Swift.iteratorLoop execution took 10 μs.
Increase the workload of CxxVecU32.sum.Swift.iteratorLoop to be more than 20 μs.
⚠️🔤 CxxVecU32.sum.Cxx.rangedForLoop name is composed of 5 words.
Split CxxVecU32.sum.Cxx.rangedForLoop name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ CxxVecU32.sum.Cxx.rangedForLoop execution took 11 μs.
Increase the workload of CxxVecU32.sum.Cxx.rangedForLoop to be more than 20 μs.
⚠️Ⓜ️ CxxVecU32.sum.Cxx.rangedForLoop has very wide range of memory used between independent, repeated measurements.
CxxVecU32.sum.Cxx.rangedForLoop mem_pages [i1, i2]: min=[28, 28] 𝚫=0 R=[32, 0]
⚠️ CxxVecU32.sum.Swift.reduce execution took at least 6291 μs.
Decrease the workload of CxxVecU32.sum.Swift.reduce by a factor of 8 (10), to be less than 1000 μs.
⚠️Ⓜ️ CxxVecU32.sum.Swift.reduce has very wide range of memory used between independent, repeated measurements.
CxxVecU32.sum.Swift.reduce mem_pages [i1, i2]: min=[80, 80] 𝚫=0 R=[185, 103]

hyp commented Oct 11, 2022

@swift-ci please test macOS platform

hyp commented Oct 12, 2022

@swift-ci please test macOS platform

hyp commented Oct 12, 2022

@swift-ci please test macOS platform

hyp commented Oct 12, 2022

@swift-ci please test macOS platform

@hyp hyp merged commit 73dddfc into swiftlang:main Oct 12, 2022