[Runtime] Cache protocol conformance descriptors, not witness tables. #20491

Merged

DougGregor
Member

The conformance cache was caching the witness table for a conformance
T: P, where T is a concrete type and P is a protocol. However, it
essentially picked one of potentially many witness tables for that
conformance, because retroactive conformances might produce different results
from different modules.

Make the conformance cache what it says it is: a cache of the conformance
descriptor for a given T: P. Clients of the conformance cache can choose how to interpret
the protocol conformance descriptor, e.g., by instantiating a witness table with a
particular set of arguments.

We can bring back a specific conformance cache for swift_conformsToProtocol()
if it is profitable.
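
A minimal sketch of the idea, not the runtime's actual code: the cache maps a `T: P` pair to a conformance descriptor, and each client instantiates its own witness table from that descriptor. Every type and function name here (`ConformanceCache`, `ConformanceDescriptor`, `instantiate`, and so on) is a hypothetical stand-in, and types and protocols are identified by plain strings purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// All names below are hypothetical stand-ins, not the Swift runtime's types.
struct ConformanceDescriptor;

struct WitnessTable {
  const ConformanceDescriptor *descriptor; // descriptor this table came from
  std::string arguments;                   // arguments it was instantiated with
};

struct ConformanceDescriptor {
  std::string module; // module that declares the conformance

  // Each call produces a distinct witness table for the given arguments.
  WitnessTable instantiate(const std::string &arguments) const {
    return WitnessTable{this, arguments};
  }
};

// Caches one descriptor per (type, protocol) pair; never a witness table.
class ConformanceCache {
public:
  void insert(const std::string &type, const std::string &protocol,
              const ConformanceDescriptor *descriptor) {
    assert(descriptor != nullptr);
    entries_[std::make_pair(type, protocol)] = descriptor;
  }

  // Returns nullptr when no conformance T: P has been recorded.
  const ConformanceDescriptor *lookup(const std::string &type,
                                      const std::string &protocol) const {
    auto it = entries_.find(std::make_pair(type, protocol));
    return it == entries_.end() ? nullptr : it->second;
  }

private:
  std::map<std::pair<std::string, std::string>, const ConformanceDescriptor *>
      entries_;
};
```

In this sketch, two clients that look up the same `T: P` get the same descriptor but may instantiate different witness tables from it, which is the ambiguity a witness-table cache could not represent.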

@DougGregor
Member Author

@swift-ci please smoke test

@DougGregor
Member Author

@swift-ci please benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Improvement** | | | | |
| StringEqualPointerComparison | 657 | 600 | -8.7% | 1.09x |

Performance: -Osize

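| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| ObjectiveCBridgeToNSDictionary | 14477 | 18427 | +27.3% | 0.79x |
| ObjectiveCBridgeToNSSet | 17451 | 19380 | +11.1% | 0.90x |
| **Improvement** | | | | |
| StringEqualPointerComparison | 628 | 571 | -9.1% | 1.10x |
| EqualStringSubstring | 13 | 12 | -7.7% | 1.08x |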

Performance: -Onone

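| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| ArrayOfGenericPOD2 | 1066 | 1216 | +14.1% | 0.88x (?) |
| ArrayOfPOD | 777 | 878 | +13.0% | 0.88x |
| ObjectiveCBridgeToNSSet | 17931 | 19822 | +10.5% | 0.90x |
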
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
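
The bot does not spell out how DELTA and RATIO are computed. Judging from the rows in the tables, DELTA looks like the relative change of NEW against OLD and RATIO looks like OLD divided by NEW; the sketch below encodes that inferred reading. The function names `deltaPercent` and `speedRatio` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Relative change of NEW against OLD, in percent; positive means slower.
// (Formula inferred from the table rows, not documented by the bot.)
inline double deltaPercent(double oldValue, double newValue) {
  assert(oldValue > 0);
  return (newValue - oldValue) / oldValue * 100.0;
}

// Speed ratio OLD/NEW; values below 1.0 indicate a regression.
inline double speedRatio(double oldValue, double newValue) {
  assert(newValue > 0);
  return oldValue / newValue;
}
```

For the `ObjectiveCBridgeToNSDictionary` row at `-Osize` (14477 to 18427), these give roughly +27.3% and 0.79, which agrees with the reported values.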

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@DougGregor
Member Author

@swift-ci please smoke test Linux

… contexts

When a (file)private entity occurs inside a generic context, we still need
information about the genericity of the enclosing context to demangle
to metadata. Emit complete context descriptors for parents of anonymous
contexts.

Fixes rdar://problem/46109026.
@DougGregor DougGregor force-pushed the runtime-conformance-descriptor-cache branch from 0770929 to 17699d4 Compare November 17, 2018 06:26
The conformance cache was caching the witness table for a conformance
`T: P`, where `T` is a concrete type and `P` is a protocol. However, it
essentially picked one of potentially many witness tables for that
conformance, because retroactive conformances might produce different results
from different modules.

Make the conformance cache what it says it is: a cache of the conformance
descriptor for a given `T: P`, potentially filtered by a module (when
requested). Clients of the conformance cache can choose how to interpret
the protocol conformance descriptor, e.g., by instantiating a witness table.

We can bring back a specific conformance cache for swift_conformsToProtocol()
if it is profitable.

(cherry picked from commit 0af2af00a739a4d912d2a9c3b196449e4164484f)
@DougGregor
Member Author

@swift-ci please smoke test

…metadata.

Metadata uniquing might encounter witness tables that were distinctly
generated but come from identical descriptors. Handle this case in metadata
uniquing by looking into the protocol conformance descriptors themselves.
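
As a rough illustration of that comparison (a sketch under simplifying assumptions, not the runtime's code): two witness tables are treated as the same key component when they are pointer-identical or when they were generated from the same descriptor. The `WitnessTable` and `ConformanceDescriptor` structs and the `sameConformance` helper are hypothetical.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the runtime structures.
struct ConformanceDescriptor {
  int id;
};

struct WitnessTable {
  const ConformanceDescriptor *descriptor; // descriptor the table came from
};

// Treat two witness tables as equivalent for uniquing purposes when they
// are the same table, or distinct tables generated from one descriptor.
inline bool sameConformance(const WitnessTable *a, const WitnessTable *b) {
  if (a == b)
    return true; // fast path: pointer-identical tables
  if (a == nullptr || b == nullptr)
    return false;
  return a->descriptor == b->descriptor;
}
```

The pointer-equality fast path keeps the common case cheap; the descriptor comparison only runs for tables that were generated separately.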
@DougGregor
Member Author

@swift-ci please smoke test

@DougGregor
Member Author

@swift-ci please smoke test

@DougGregor
Member Author

@swift-ci please benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| DropLastAnySequenceLazy | 3406 | 4010 | +17.7% | 0.85x |
| SuffixAnySequenceLazy | 3586 | 4220 | +17.7% | 0.85x |
| ReversedBidirectional | 11101 | 12943 | +16.6% | 0.86x |
| ObjectiveCBridgeToNSDictionary | 16119 | 17970 | +11.5% | 0.90x |
| SuffixAnySeqCntRangeLazy | 20359 | 21929 | +7.7% | 0.93x |
| COWArrayGuaranteedParameterOverhead | 10542 | 11346 | +7.6% | 0.93x |

Performance: -Osize

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| ReversedBidirectional | 15937 | 18158 | +13.9% | 0.88x |
| SuffixAnySequenceLazy | 3929 | 4425 | +12.6% | 0.89x |
| PrefixAnyCollectionLazy | 58921 | 65967 | +12.0% | 0.89x |
| DropLastAnySequenceLazy | 3797 | 4223 | +11.2% | 0.90x |
| DropLastAnyCollectionLazy | 19811 | 21919 | +10.6% | 0.90x |
| DropFirstAnySequence | 4418 | 4858 | +10.0% | 0.91x |
| ObjectiveCBridgeToNSArray | 15916 | 17495 | +9.9% | 0.91x |
| SequenceAlgosAnySequence | 11983 | 12936 | +8.0% | 0.93x |
| PrefixAnySeqCRangeIter | 15963 | 17205 | +7.8% | 0.93x |
| **Improvement** | | | | |
| PrefixWhileAnySeqCntRangeLazy | 176 | 159 | -9.7% | 1.11x (?) |

Performance: -Onone

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| DropWhileSequenceLazy | 11180 | 13286 | +18.8% | 0.84x |
| PrefixWhileSequenceLazy | 9977 | 11835 | +18.6% | 0.84x |
| PrefixWhileAnySequenceLazy | 10227 | 12090 | +18.2% | 0.85x |
| DropFirstAnySequence | 10199 | 12043 | +18.1% | 0.85x |
| DropWhileAnySequenceLazy | 11661 | 13752 | +17.9% | 0.85x |
| PrefixSequence | 7480 | 8809 | +17.8% | 0.85x |
| DropFirstSequenceLazy | 9264 | 10905 | +17.7% | 0.85x |
| DropFirstAnySequenceLazy | 10252 | 12066 | +17.7% | 0.85x |
| DropFirstSequence | 9322 | 10954 | +17.5% | 0.85x |
| PrefixAnySequenceLazy | 7783 | 9131 | +17.3% | 0.85x |
| StringRemoveDupes | 751 | 876 | +16.6% | 0.86x |
| DropWhileSequence | 12712 | 14813 | +16.5% | 0.86x |
| PrefixSequenceLazy | 7493 | 8690 | +16.0% | 0.86x |
| MapReduceLazySequence | 19126 | 22008 | +15.1% | 0.87x |
| SequenceAlgosUnfoldSequence | 6044 | 6953 | +15.0% | 0.87x |
| DropWhileAnySequence | 13481 | 15490 | +14.9% | 0.87x |
| ArrayAppendUTF16 | 23036 | 26435 | +14.8% | 0.87x |
| ArrayAppendLatin1 | 23103 | 26472 | +14.6% | 0.87x |
| PrefixAnySequence | 8355 | 9563 | +14.5% | 0.87x |
| ArrayAppendAscii | 23078 | 26348 | +14.2% | 0.88x |
| ArrayAppendRepeatCol | 197121 | 222786 | +13.0% | 0.88x |
| DropWhileAnySeqCntRangeLazy | 23138 | 26126 | +12.9% | 0.89x |
| DropWhileAnyCollectionLazy | 23143 | 26108 | +12.8% | 0.89x |
| DataAppendSequence | 2040013 | 2299147 | +12.7% | 0.89x |
| DropLastAnySequenceLazy | 22821 | 25479 | +11.6% | 0.90x |
| SetSubtractingInt50 | 499 | 557 | +11.6% | 0.90x |
| PrefixCountableRangeLazy | 29566 | 32968 | +11.5% | 0.90x |
| DropLastAnyCollectionLazy | 30253 | 33722 | +11.5% | 0.90x |
| DictionarySwap | 4208 | 4689 | +11.4% | 0.90x |
| DictionaryCopy | 264038 | 294059 | +11.4% | 0.90x |
| SetSubtractingInt100 | 581 | 647 | +11.4% | 0.90x |
| DropFirstCountableRangeLazy | 29687 | 32993 | +11.1% | 0.90x |
| SuffixCountableRangeLazy | 9889 | 10958 | +10.8% | 0.90x |
| PrefixWhileCountableRangeLazy | 18465 | 20407 | +10.5% | 0.90x |
| SetSymmetricDifferenceInt100 | 709 | 780 | +10.0% | 0.91x |
| DropWhileCountableRangeLazy | 22656 | 24917 | +10.0% | 0.91x |
| DropLastSequence | 22368 | 24599 | +10.0% | 0.91x |
| SetSymmetricDifferenceInt50 | 750 | 824 | +9.9% | 0.91x |
| SuffixSequenceLazy | 17415 | 19128 | +9.8% | 0.91x |
| SuffixAnySequence | 17426 | 19136 | +9.8% | 0.91x |
| SequenceAlgosRange | 1311544 | 1439537 | +9.8% | 0.91x |
| SuffixAnySequenceLazy | 17863 | 19597 | +9.7% | 0.91x |
| PrefixWhileCountableRange | 15012 | 16460 | +9.6% | 0.91x |
| DropLastSequenceLazy | 22458 | 24615 | +9.6% | 0.91x |
| LazilyFilteredRange | 572485 | 627237 | +9.6% | 0.91x |
| DropLastAnySequence | 22425 | 24538 | +9.4% | 0.91x |
| SetSubtractingInt25 | 462 | 503 | +8.9% | 0.92x |
| SuffixSequence | 17462 | 19005 | +8.8% | 0.92x |
| SequenceAlgosAnySequence | 13381 | 14562 | +8.8% | 0.92x |
| DropWhileCountableRange | 5289 | 5751 | +8.7% | 0.92x |
| PrefixAnySeqCntRange | 16870 | 18312 | +8.5% | 0.92x |
| PrefixAnyCollection | 16253 | 17608 | +8.3% | 0.92x |
| DropFirstAnyCollectionLazy | 91575 | 99208 | +8.3% | 0.92x |
| DropLastAnyCollection | 5427 | 5875 | +8.3% | 0.92x |
| PrefixAnySeqCRangeIter | 16846 | 18233 | +8.2% | 0.92x |
| DropFirstAnyCollection | 16285 | 17611 | +8.1% | 0.92x |
| MapReduceSequence | 24064 | 26019 | +8.1% | 0.92x |
| SequenceAlgosList | 8622 | 9320 | +8.1% | 0.93x |
| DropFirstAnySeqCntRangeLazy | 21761 | 23468 | +7.8% | 0.93x |
| DropFirstAnySeqCRangeIterLazy | 21834 | 23528 | +7.8% | 0.93x |
| SuffixArrayLazy | 7412 | 7977 | +7.6% | 0.93x |
| PrefixAnySeqCRangeIterLazy | 16455 | 17707 | +7.6% | 0.93x |
| DropFirstArrayLazy | 22203 | 23878 | +7.5% | 0.93x |


@DougGregor
Member Author

Benchmark regressions imply that the code comparing two metadata cache keys is hot, given that it's so much slower now. We can certainly improve things there with (e.g.) some kind of run-length encoding for sequences of exact pointer comparisons and sequences of witness table comparisons.

Rather than scanning the type descriptor each time we perform a comparison
or hash of a metadata cache entry, do so only once to establish the number
of key parameters and the number of witness tables. Use those values to
more efficiently compare keys.
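
A sketch of how precomputed counts could speed up key comparison, assuming a key is a flat array of pointers with the key parameters (type metadata) first and the witness tables after them. That layout, along with the names `KeyLayout` and `keysEqual`, is an assumption made for illustration rather than a description of the actual runtime.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the runtime structures.
struct ConformanceDescriptor {};

struct WitnessTable {
  const ConformanceDescriptor *descriptor;
};

// Computed once per cache instead of rescanning the type descriptor
// on every comparison or hash.
struct KeyLayout {
  std::size_t numKeyParameters; // compared by exact pointer equality
  std::size_t numWitnessTables; // compared via their descriptors
};

// Assumes each key holds numKeyParameters metadata pointers followed by
// numWitnessTables witness table pointers.
inline bool keysEqual(const KeyLayout &layout, const void *const *lhs,
                      const void *const *rhs) {
  std::size_t i = 0;
  // First run: plain pointer comparisons.
  for (; i < layout.numKeyParameters; ++i)
    if (lhs[i] != rhs[i])
      return false;
  // Second run: witness tables match if identical or from one descriptor.
  const std::size_t end = i + layout.numWitnessTables;
  for (; i < end; ++i) {
    if (lhs[i] == rhs[i])
      continue;
    const WitnessTable *a = static_cast<const WitnessTable *>(lhs[i]);
    const WitnessTable *b = static_cast<const WitnessTable *>(rhs[i]);
    if (a == nullptr || b == nullptr || a->descriptor != b->descriptor)
      return false;
  }
  return true;
}
```

With the two counts known up front, the comparison is two tight loops with no descriptor walking in between.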
@DougGregor
Member Author

@swift-ci please benchmark

@DougGregor
Member Author

@swift-ci please smoke test

@DougGregor
Member Author

@swift-ci please benchmark

@swift-ci
Contributor

Build comment file:

Performance: -O

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| ReversedBidirectional | 11114 | 13082 | +17.7% | 0.85x |
| SuffixAnySequenceLazy | 3559 | 4185 | +17.6% | 0.85x |
| ObjectiveCBridgeToNSSet | 17234 | 19739 | +14.5% | 0.87x |
| PrefixAnyCollectionLazy | 58668 | 65178 | +11.1% | 0.90x |
| DropFirstAnyCollectionLazy | 58482 | 64546 | +10.4% | 0.91x |
| SuffixAnyCollectionLazy | 19521 | 21530 | +10.3% | 0.91x |
| DropLastAnySeqCntRangeLazy | 20283 | 22115 | +9.0% | 0.92x |
| SuffixAnySeqCRangeIterLazy | 20448 | 22224 | +8.7% | 0.92x |

Performance: -Osize

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| SuffixAnySequenceLazy | 3842 | 4687 | +22.0% | 0.82x |
| DropLastAnySequenceLazy | 3720 | 4428 | +19.0% | 0.84x |
| ObjectiveCBridgeToNSArray | 14611 | 17264 | +18.2% | 0.85x |
| DropFirstAnySequence | 4409 | 5186 | +17.6% | 0.85x |
| ReversedBidirectional | 16010 | 18440 | +15.2% | 0.87x |
| SequenceAlgosAnySequence | 11891 | 13161 | +10.7% | 0.90x |
| DropLastAnySeqCntRangeLazy | 20157 | 22193 | +10.1% | 0.91x |
| DropLastAnySeqCRangeIterLazy | 20161 | 22191 | +10.1% | 0.91x |
| DropFirstAnySeqCntRange | 20462 | 22488 | +9.9% | 0.91x |
| DropFirstAnySeqCRangeIter | 20488 | 22488 | +9.8% | 0.91x |
| SuffixAnySeqCRangeIterLazy | 20306 | 22265 | +9.6% | 0.91x |
| SuffixAnySeqCntRangeLazy | 20254 | 22195 | +9.6% | 0.91x |
| PrefixAnySeqCRangeIter | 15908 | 17413 | +9.5% | 0.91x |
| PrefixAnySeqCntRange | 15930 | 17406 | +9.3% | 0.92x |
| SuffixAnyCollectionLazy | 20559 | 22323 | +8.6% | 0.92x |
| PrefixAnyCollectionLazy | 62505 | 67695 | +8.3% | 0.92x |
| DropLastAnyCollectionLazy | 20702 | 22363 | +8.0% | 0.93x |

Performance: -Onone

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| DropWhileSequenceLazy | 11406 | 13703 | +20.1% | 0.83x |
| PrefixSequence | 7717 | 9207 | +19.3% | 0.84x |
| CaptureProp | 301126 | 359166 | +19.3% | 0.84x |
| DropFirstAnySequenceLazy | 10442 | 12452 | +19.2% | 0.84x |
| PrefixWhileSequenceLazy | 10160 | 12108 | +19.2% | 0.84x |
| SequenceAlgosUnfoldSequence | 6146 | 7311 | +19.0% | 0.84x |
| DropWhileAnySequenceLazy | 11829 | 14065 | +18.9% | 0.84x |
| DropWhileSequence | 12815 | 15210 | +18.7% | 0.84x |
| DropWhileAnySequence | 13552 | 16040 | +18.4% | 0.84x |
| PrefixWhileAnySequenceLazy | 10430 | 12335 | +18.3% | 0.85x |
| MapReduceLazySequence | 19261 | 22725 | +18.0% | 0.85x |
| PrefixAnySequenceLazy | 8058 | 9474 | +17.6% | 0.85x |
| SetSubtractingInt100 | 582 | 684 | +17.5% | 0.85x |
| DropFirstSequenceLazy | 9548 | 11201 | +17.3% | 0.85x |
| DropFirstAnySequence | 10443 | 12217 | +17.0% | 0.85x |
| DropFirstSequence | 9622 | 11225 | +16.7% | 0.86x |
| ArrayAppendAscii | 23049 | 26812 | +16.3% | 0.86x |
| ArrayAppendLatin1 | 23109 | 26875 | +16.3% | 0.86x |
| PrefixAnySequence | 8535 | 9900 | +16.0% | 0.86x |
| ArrayPlusEqualSingleElementCollection | 191813 | 222482 | +16.0% | 0.86x |
| DropLastSequence | 22556 | 26022 | +15.4% | 0.87x |
| ArrayAppendUTF16 | 23091 | 26560 | +15.0% | 0.87x |
| PrefixWhileAnySequence | 19711 | 22544 | +14.4% | 0.87x |
| SetUnionInt100 | 333 | 380 | +14.1% | 0.88x |
| SetUnionInt50 | 477 | 544 | +14.0% | 0.88x |
| SetSymmetricDifferenceInt100 | 715 | 812 | +13.6% | 0.88x |
| DictionaryCopy | 252124 | 285572 | +13.3% | 0.88x |
| SequenceAlgosAnySequence | 13398 | 15163 | +13.2% | 0.88x |
| SuffixCountableRangeLazy | 9779 | 11067 | +13.2% | 0.88x |
| DropFirstCountableRangeLazy | 29344 | 33205 | +13.2% | 0.88x |
| PrefixCountableRangeLazy | 29313 | 33167 | +13.1% | 0.88x |
| DropLastCountableRangeLazy | 9810 | 11065 | +12.8% | 0.89x |
| SetUnionInt25 | 541 | 609 | +12.6% | 0.89x |
| DropLastAnySequence | 22553 | 25347 | +12.4% | 0.89x |
| ArrayAppendRepeatCol | 196613 | 220581 | +12.2% | 0.89x |
| SequenceAlgosList | 8556 | 9594 | +12.1% | 0.89x |
| DropLastAnySequenceLazy | 23102 | 25845 | +11.9% | 0.89x |
| SuffixAnySequence | 17544 | 19618 | +11.8% | 0.89x |
| SuffixSequenceLazy | 17563 | 19634 | +11.8% | 0.89x |
| SuffixAnySequenceLazy | 18024 | 20119 | +11.6% | 0.90x |
| DataAppendSequence | 2034861 | 2270944 | +11.6% | 0.90x |
| SuffixSequence | 17604 | 19611 | +11.4% | 0.90x |
| PrefixWhileSequence | 19729 | 21973 | +11.4% | 0.90x |
| SetSymmetricDifferenceInt50 | 743 | 827 | +11.3% | 0.90x |
| PrefixWhileAnySeqCRangeIter | 28201 | 31314 | +11.0% | 0.90x |
| MapReduceSequence | 24029 | 26568 | +10.6% | 0.90x |
| SetSymmetricDifferenceInt25 | 757 | 836 | +10.4% | 0.91x |
| DropWhileCountableRangeLazy | 22487 | 24796 | +10.3% | 0.91x |
| SetSubtractingInt0 | 362 | 399 | +10.2% | 0.91x |
| DropFirstArrayLazy | 22064 | 24281 | +10.0% | 0.91x |
| PrefixAnySeqCntRangeLazy | 16181 | 17792 | +10.0% | 0.91x |
| PrefixArrayLazy | 22025 | 24199 | +9.9% | 0.91x |
| PrefixWhileCountableRange | 14988 | 16457 | +9.8% | 0.91x |
| DropLastArrayLazy | 7367 | 8081 | +9.7% | 0.91x |
| DropFirstAnyCollection | 16087 | 17607 | +9.4% | 0.91x |
| ArrayAppendLazyMap | 119959 | 131214 | +9.4% | 0.91x |
| PrefixAnySeqCRangeIter | 16763 | 18325 | +9.3% | 0.91x |
| PrefixAnySeqCRangeIterLazy | 16258 | 17749 | +9.2% | 0.92x |
| PrefixAnyCollection | 16063 | 17535 | +9.2% | 0.92x |
| PrefixWhileCountableRangeLazy | 18580 | 20270 | +9.1% | 0.92x |
| SuffixArrayLazy | 7438 | 8107 | +9.0% | 0.92x |
| DropFirstAnySeqCntRangeLazy | 21598 | 23539 | +9.0% | 0.92x |
| DropFirstAnySeqCRangeIterLazy | 21531 | 23450 | +8.9% | 0.92x |
| LazilyFilteredRange | 565488 | 615053 | +8.8% | 0.92x |
| SequenceAlgosRange | 1319659 | 1435045 | +8.7% | 0.92x |
| DropWhileCountableRange | 5287 | 5738 | +8.5% | 0.92x |
| PrefixAnySeqCntRange | 16699 | 18110 | +8.4% | 0.92x |
| ReversedBidirectional | 38976 | 42246 | +8.4% | 0.92x |
| DropWhileArrayLazy | 8830 | 9553 | +8.2% | 0.92x |
| DropLastAnySeqCntRange | 33460 | 36091 | +7.9% | 0.93x |
| FatCompactMap | 238524 | 257197 | +7.8% | 0.93x (?) |


Rather than scanning through the generic parameters and generic requirements
each time we form a key for the generic metadata cache, compute these
values once, when the cache itself is first initialized.
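
One way to picture this, as a sketch with made-up types rather than the runtime's real ones: the cache scans its type descriptor in the constructor and stores the two counts, so forming a key later only reads the stored values. `GenericMetadataCache`, `TypeDescriptor`, `GenericParam`, and `GenericRequirement` are all hypothetical simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified descriptor; the real runtime layout differs.
struct GenericParam {
  bool isKey; // whether the parameter contributes to the cache key
};

struct GenericRequirement {
  bool hasWitnessTable; // whether the requirement passes a witness table
};

struct TypeDescriptor {
  std::vector<GenericParam> params;
  std::vector<GenericRequirement> requirements;
};

class GenericMetadataCache {
public:
  // Scan the descriptor exactly once, when the cache is created.
  explicit GenericMetadataCache(const TypeDescriptor &descriptor) {
    for (const GenericParam &p : descriptor.params)
      if (p.isKey)
        ++numKeyParameters_;
    for (const GenericRequirement &r : descriptor.requirements)
      if (r.hasWitnessTable)
        ++numWitnessTables_;
  }

  std::size_t numKeyParameters() const { return numKeyParameters_; }
  std::size_t numWitnessTables() const { return numWitnessTables_; }

  // Number of pointers in a key; no rescanning needed to form one.
  std::size_t keyLength() const {
    return numKeyParameters_ + numWitnessTables_;
  }

private:
  std::size_t numKeyParameters_ = 0;
  std::size_t numWitnessTables_ = 0;
};
```

The trade-off in this sketch is two extra words of storage per cache in exchange for skipping the scan on every key construction.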
@DougGregor
Member Author

@swift-ci please benchmark

@DougGregor
Member Author

@swift-ci please smoke test

@DougGregor
Member Author

@swift-ci please benchmark

@DougGregor
Member Author

@swift-ci please smoke test

@DougGregor
Member Author

@swift-ci please smoke test Linux

@swift-ci
Contributor

Build comment file:

Performance: -O

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| ObjectiveCBridgeToNSArray | 15016 | 17731 | +18.1% | 0.85x |
| DropLastAnySequenceLazy | 3410 | 3669 | +7.6% | 0.93x (?) |

Performance: -Osize

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| ObjectiveCBridgeToNSArray | 15119 | 17169 | +13.6% | 0.88x |
| SuffixAnySequenceLazy | 3857 | 4290 | +11.2% | 0.90x |
| DropLastAnySequenceLazy | 3721 | 4083 | +9.7% | 0.91x (?) |
| ReversedBidirectional | 15917 | 17434 | +9.5% | 0.91x |

Performance: -Onone

| TEST | OLD | NEW | DELTA | RATIO |
| --- | ---: | ---: | ---: | ---: |
| **Regression** | | | | |
| DropWhileSequenceLazy | 11252 | 13964 | +24.1% | 0.81x |
| PrefixSequenceLazy | 7496 | 8628 | +15.1% | 0.87x |
| DropFirstAnySequenceLazy | 10423 | 11887 | +14.0% | 0.88x |
| DropWhileAnySequenceLazy | 11691 | 13295 | +13.7% | 0.88x |
| PrefixSequence | 7636 | 8651 | +13.3% | 0.88x |
| DropFirstSequence | 9539 | 10794 | +13.2% | 0.88x |
| DropFirstSequenceLazy | 9548 | 10796 | +13.1% | 0.88x |
| SequenceAlgosUnfoldSequence | 6109 | 6905 | +13.0% | 0.88x |
| PrefixAnySequenceLazy | 7919 | 8920 | +12.6% | 0.89x |
| DropWhileAnySequence | 13535 | 15189 | +12.2% | 0.89x |
| DropWhileSequence | 12789 | 14351 | +12.2% | 0.89x |
| DropFirstAnySequence | 10410 | 11675 | +12.2% | 0.89x |
| PrefixWhileSequenceLazy | 10089 | 11292 | +11.9% | 0.89x |
| PrefixAnySequence | 8426 | 9397 | +11.5% | 0.90x |
| CaptureProp | 303038 | 336682 | +11.1% | 0.90x |
| MapReduceLazySequence | 19433 | 21528 | +10.8% | 0.90x |
| ArrayAppendLatin1 | 23168 | 25634 | +10.6% | 0.90x |
| ArrayAppendUTF16 | 23011 | 25437 | +10.5% | 0.90x |
| ArrayAppendAscii | 23111 | 25444 | +10.1% | 0.91x |
| DataAppendSequence | 2004492 | 2200069 | +9.8% | 0.91x |
| ArrayAppendRepeatCol | 195887 | 213207 | +8.8% | 0.92x |
| DropLastAnySequenceLazy | 23062 | 24869 | +7.8% | 0.93x |


@DougGregor DougGregor merged commit d5f6d7f into swiftlang:master Nov 26, 2018
@DougGregor DougGregor deleted the runtime-conformance-descriptor-cache branch November 26, 2018 20:56