[benchmark] Add HTTP2StateMachine benchmark #19954

Merged (1 commit) on Apr 15, 2020

Contributor

@Lukasa Lukasa commented Oct 19, 2018

This pull request adds a new benchmark to the benchmark suite: HTTP2StateMachine. This benchmark was extracted from work-in-progress code from the SwiftNIO HTTP/2 implementation, and reproduces the core stream state machine that will be used for SwiftNIO's pure-Swift HTTP/2 stack.

This state machine ultimately consists of a moderately-sized enumeration, almost all cases of which contain associated data. Some functions switch over parts of this associated data; others extract it and compute on it in the case bodies.
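
As a rough illustration of the shape described above, here is a minimal sketch of a state enumeration whose cases carry associated data. The names (`StreamState`, `StreamStateMachine`, `TransitionResult`) and the initial window size are hypothetical; they are not taken from the benchmark or from SwiftNIO.

```swift
// Hypothetical sketch only: these are not the benchmark's actual types.
enum StreamState {
    case idle(localWindow: Int, remoteWindow: Int)
    case halfOpenLocal(localWindow: Int, remoteWindow: Int)
    case fullyOpen(localWindow: Int, remoteWindow: Int)
    case closed
}

enum TransitionResult: Equatable {
    case succeed
    case streamError
}

struct StreamStateMachine {
    var state: StreamState = .idle(localWindow: 65_535, remoteWindow: 65_535)

    // Switches over the state and carries the associated data into the next case.
    mutating func sendHeaders() -> TransitionResult {
        switch state {
        case .idle(let local, let remote):
            state = .halfOpenLocal(localWindow: local, remoteWindow: remote)
            return .succeed
        case .halfOpenLocal, .fullyOpen, .closed:
            return .streamError
        }
    }

    mutating func receiveHeaders() -> TransitionResult {
        switch state {
        case .halfOpenLocal(let local, let remote):
            state = .fullyOpen(localWindow: local, remoteWindow: remote)
            return .succeed
        case .idle, .fullyOpen, .closed:
            return .streamError
        }
    }

    // Extracts the associated data and computes on it in the case body.
    mutating func sendData(length: Int) -> TransitionResult {
        switch state {
        case .fullyOpen(let local, let remote) where length <= remote:
            state = .fullyOpen(localWindow: local, remoteWindow: remote - length)
            return .succeed
        case .idle, .halfOpenLocal, .fullyOpen, .closed:
            return .streamError
        }
    }
}
```

In this sketch an invalid transition leaves the state untouched and returns `.streamError`; a real implementation may well handle errors differently.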

Why benchmark this?

For SwiftNIO, and network protocol programming in general, it is highly desirable to write protocol implementations that store almost all of their state in the associated data of enumerations. This lets the type system give as much assistance as possible in guaranteeing the correctness of the program.

For this reason, it's important that Swift generates good code for switching over these kinds of enumerations, to avoid providing developers with an incentive to use a less-powerful but faster pattern. Developers are infamous for choosing to do the fast thing instead of the correct thing.
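
To make the trade-off concrete, here is a hedged sketch contrasting two ways of modelling the same state. Both types (`FlatStream`, `TypedStream`) are invented for illustration and do not appear in the benchmark. The flat version leaves `window` readable and writable even while the stream is closed; the enumeration version only exposes it in the case where it is meaningful, at the cost of a switch on every access.

```swift
// Hypothetical "flat" representation: the compiler cannot stop code from
// touching `window` while `isOpen` is false.
struct FlatStream {
    var isOpen = false
    var window = 0

    mutating func consume(_ amount: Int) -> Bool {
        guard isOpen, amount <= window else { return false }
        window -= amount
        return true
    }
}

// Hypothetical enumeration-based representation: `window` only exists
// while the stream is open, so it cannot be used in the closed state.
enum TypedStream {
    case closed
    case open(window: Int)

    mutating func consume(_ amount: Int) -> Bool {
        switch self {
        case .open(let window) where amount <= window:
            self = .open(window: window - amount)
            return true
        case .open, .closed:
            return false
        }
    }
}
```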

Some quick analysis of the generated assembly for this example reveals that the Swift compiler currently generates pretty good code here, so I think this benchmark is more of a regression test than a target for improvement.

Differences from product code

  1. Some HTTP/2-specific types have been removed and replaced with Int. These types are ultimately structs backed by Int32, so the core layout of the enumeration is not meaningfully different, and I don't think this change will affect the benchmark.
  2. In two of these functions in the product code, the switch statement is contained within a do { } catch block, which has been removed here. That block does meaningfully affect the complexity of the functions in question, so it may be worth adding it back in: please let me know if that's worthwhile.
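
For reference, the second difference can be sketched roughly as follows: the switch sits inside a `do { } catch` block, so that an error thrown from a case body is turned into a state transition. This is an assumption about the general shape only; `StateError`, `Machine`, `activate()` and `shrink(_:by:)` are hypothetical names, not the product code.

```swift
// Hypothetical sketch of a switch wrapped in do/catch.
enum StateError: Error {
    case windowExceeded
    case invalidState
}

enum State {
    case idle(window: Int)
    case open(window: Int)
    case closed
}

struct Machine {
    var state: State = .idle(window: 10)

    // Throwing helper that a case body might call.
    private func shrink(_ window: Int, by amount: Int) throws -> Int {
        guard amount <= window else { throw StateError.windowExceeded }
        return window - amount
    }

    mutating func activate() {
        if case .idle(let window) = state {
            state = .open(window: window)
        }
    }

    mutating func consume(_ amount: Int) -> Bool {
        do {
            switch state {
            case .open(let window):
                let remaining = try shrink(window, by: amount)
                state = .open(window: remaining)
                return true
            case .idle, .closed:
                throw StateError.invalidState
            }
        } catch {
            // In this sketch, any error closes the stream.
            state = .closed
            return false
        }
    }
}
```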

@Lukasa Lukasa force-pushed the cb-bench-h2-enumeration branch from 87af504 to 906a9fc Compare October 19, 2018 15:24
@lorentey
Member

@swift-ci please smoke test

@lorentey
Member

@swift-ci smoke benchmark

@weissi
Contributor

weissi commented Oct 19, 2018

@swift-ci Please smoke test OS X platform

@weissi
Contributor

weissi commented Oct 19, 2018

@swift-ci Please benchmark

@weissi
Contributor

weissi commented Oct 19, 2018

oh sorry, Karoy had already kicked that off :)

@swift-ci
Contributor

Build comment file:

Build failed before running benchmark.


@Lukasa Lukasa force-pushed the cb-bench-h2-enumeration branch from 906a9fc to 0caab21 Compare October 19, 2018 16:15
@Lukasa
Contributor Author

Lukasa commented Oct 19, 2018

@swift-ci please smoke test

@Lukasa
Contributor Author

Lukasa commented Oct 19, 2018

Aww, clearly requires the commit bit. 😞

@weissi
Contributor

weissi commented Oct 19, 2018

@swift-ci Please smoke benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

Improvement OLD NEW DELTA RATIO
DictionaryKeysContainsNative 44 30 -31.8% 1.47x
Array2D 7505 6907 -8.0% 1.09x
MapReduce 428 397 -7.2% 1.08x

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 0 0 0

Performance: -Osize

Improvement OLD NEW DELTA RATIO
Array2D 7208 6611 -8.3% 1.09x
MapReduceAnyCollection 437 408 -6.6% 1.07x

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 0 0 0

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 11 12 12

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@Lukasa Lukasa force-pushed the cb-bench-h2-enumeration branch from 0caab21 to 794fea9 Compare October 19, 2018 16:48
@Lukasa
Contributor Author

Lukasa commented Oct 19, 2018

Added a few orders of magnitude.

@weissi
Contributor

weissi commented Oct 19, 2018

@swift-ci Please benchmark

@weissi
Contributor

weissi commented Oct 19, 2018

@swift-ci Please smoke benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

Improvement OLD NEW DELTA RATIO
Array2D 7505 6907 -8.0% 1.09x
OpenClose 77 71 -7.8% 1.08x
MapReduceAnyCollection 398 370 -7.0% 1.08x
MapReduce 427 397 -7.0% 1.08x

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 5 6 5

Performance: -Osize

Improvement OLD NEW DELTA RATIO
Array2D 7208 6611 -8.3% 1.09x

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 5 5 5

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 11000 11164 11055

@swift-ci
Contributor

Build comment file:

Performance: -O

Improvement OLD NEW DELTA RATIO
Array2D 7505 6907 -8.0% 1.09x
MapReduceAnyCollection 398 370 -7.0% 1.08x
MapReduce 425 397 -6.6% 1.07x

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 5 6 5

Performance: -Osize

Improvement OLD NEW DELTA RATIO
Array2D 7208 6611 -8.3% 1.09x
MapReduceAnyCollection 437 408 -6.6% 1.07x (?)

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 5 5 5

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 11518 11685 11574

@Lukasa
Contributor Author

Lukasa commented Oct 22, 2018

Ok, added two more orders of magnitude.

@Lukasa Lukasa force-pushed the cb-bench-h2-enumeration branch from 794fea9 to 32c366e Compare October 22, 2018 21:10
@weissi
Contributor

weissi commented Oct 24, 2018

@swift-ci Please smoke benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

Regression OLD NEW DELTA RATIO
DataCopyBytes 560 612 +9.3% 0.92x

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 571 571 571

Performance: -Osize

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 571 571 571

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 1131384 1132150 1131804

@CodaFi
Contributor

CodaFi commented Nov 18, 2019

@Lukasa This benchmark still seems valuable to have. Could you rebase this patch so we can get it merged?

@Lukasa Lukasa force-pushed the cb-bench-h2-enumeration branch from 32c366e to 7e82b6f Compare November 19, 2019 21:10
@Lukasa
Contributor Author

Lukasa commented Nov 19, 2019

@swift-ci please smoke test

@Lukasa
Contributor Author

Lukasa commented Nov 19, 2019

@swift-ci please benchmark

@Lukasa
Contributor Author

Lukasa commented Nov 19, 2019

@CodaFi Sure thing, rebased and resolved the conflicts, and kicked off some tests.

@Lukasa
Contributor Author

Lukasa commented Nov 19, 2019

I don't think that smoke test failure is on me.

@swift-ci
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 7038 9437 +34.1% 0.75x (?)
CharacterLiteralsLarge 97 108 +11.3% 0.90x
 
Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 571 571 571
 
Removed MIN MAX MEAN MAX_RSS
InsertCharacterEndIndex 159 160 160
InsertCharacterEndIndexNonASCII 56 57 56
InsertCharacterStartIndex 656 656 656
InsertCharacterStartIndexNonASCII 362 365 363
InsertCharacterTowardsEndIndex 199 201 200
InsertCharacterTowardsEndIndexNonASCII 216 218 217
ParseInt.Large.Binary 271 271 271
ParseInt.Large.Decimal 127 137 131
ParseInt.Large.Hex 195 197 196
ParseInt.Large.UncommonRadix 144 144 144
ParseInt.Small.Binary 552 552 552
ParseInt.Small.Decimal 334 334 334
ParseInt.Small.Hex 344 367 352
ParseInt.Small.UncommonRadix 358 358 358

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
FlattenListLoop 4950 4079 -17.6% 1.21x (?)
NSStringConversion.LongUTF8 593 544 -8.3% 1.09x (?)
ObjectiveCBridgeStubNSDateRefAccess 400 371 -7.2% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 571 571 571
 
Removed MIN MAX MEAN MAX_RSS
InsertCharacterEndIndex 159 174 164
InsertCharacterEndIndexNonASCII 56 60 58
InsertCharacterStartIndex 659 659 659
InsertCharacterStartIndexNonASCII 365 367 366
InsertCharacterTowardsEndIndex 213 218 215
InsertCharacterTowardsEndIndexNonASCII 222 225 223
ParseInt.Large.Binary 267 267 267
ParseInt.Large.Decimal 131 131 131
ParseInt.Large.Hex 199 201 200
ParseInt.Large.UncommonRadix 144 144 144
ParseInt.Small.Binary 566 575 569
ParseInt.Small.Decimal 334 334 334
ParseInt.Small.Hex 351 352 351
ParseInt.Small.UncommonRadix 354 354 354

Code size: -Osize

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
HTTP2StateMachine 3419182 3505167 3476193
 
Removed MIN MAX MEAN MAX_RSS
InsertCharacterEndIndex 229 232 230
InsertCharacterEndIndexNonASCII 80 82 81
InsertCharacterStartIndex 754 758 757
InsertCharacterStartIndexNonASCII 413 413 413
InsertCharacterTowardsEndIndex 243 246 244
InsertCharacterTowardsEndIndexNonASCII 235 239 236
ParseInt.Large.Binary 11822 11857 11838
ParseInt.Large.Decimal 4090 4100 4095
ParseInt.Large.Hex 3737 3832 3772
ParseInt.Large.UncommonRadix 4679 4689 4685
ParseInt.Small.Binary 23380 23492 23433
ParseInt.Small.Decimal 13721 13761 13747
ParseInt.Small.Hex 13392 13446 13415
ParseInt.Small.UncommonRadix 15865 16021 15926

Code size: -swiftlibs

Benchmark Check Report

@CodaFi
Contributor

CodaFi commented Apr 14, 2020

@swift-ci test

@CodaFi
Contributor

CodaFi commented Apr 15, 2020
