Skip to content

[WIP] Fix the mid-level function-pass pipeline #31163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

[WIP] Fix the mid-level function-pass pipeline #31163

wants to merge 3 commits into from

Conversation

atrick
Copy link
Contributor

@atrick atrick commented Apr 20, 2020

This is the first of two major pass pipeline design changes that I have been wanting to make to fix performance instability.

This PR fixes the pipeline restart mechanism by rearranging passes to respect the function pass pipeline.

The second PR will fix handling of @semantic calls in the inliner so their inlining is deferred until late in the pipeline and they are inlined in a predictable order so that passes that optimize @semantic calls are guaranteed to see them.

The higher-level goal of both these PRs is to allow for incremental progress on improvements to the pass pipeline and inlining without being backed into a corner, where current benchmarks perform well only by chance. For this to happen, our design needs to strictly adhere to some principles:

  • Pass pipelines are organized according to clear rules and
    principles. Any other dependencies between passes are explicitly
    called out. The major phases should be

    • Optimization pipeline for unoptimized (Onone) builds
    • Optimization pipeline required before serialization
    • Optimization pipeline that handles @semantic and @effects functions
      (the most powerful optimizations need to go here)
    • Optimization pipeline that further reduces the inlined implementation
      of (non-nested) @semantic and @effects functions
      (this last phase has less information from analyses)
      (ultimiately this last phase will be split between OSSA/non-OSSA pipelines)
  • A single decision to inline can only expose more
    optimization/analyses opportunities (assuming it doesn't block
    further inlinining because of code size)

  • Inlining cannot prevent an optimization pass from processing a given
    region of code.

  • A @semantic call site will always be processed by the optimization
    pass that operates on those semantics regardless of inlining
    decisions.

This PR is only one small step toward these principles.

This PR currently exposes a bug where SimplifyCFG does not update the dominator tree. This will be hit be the unit test:
SILOptimizer/specialize_opaque_type_archetypes.swift

atrick added 3 commits April 20, 2020 15:29
Module passes need to be in a separate pipeline, otherwise the
pipeline restart mechanism will be broken.

This makes GlobalOpt and serialization run earlier in the
pipeline. There's no explicit reason for them to be run later, in the
middle of a function pass pipeline.

Also, pipeline boundaries, like serialization and module passes should
be explicit at the the top level function that creates the pass
pipelines.
Don't allow module passes to be inserted within a function pass
pipeline. This silently breaks the function pipeline both interfering
with analysis and the normal pipeline restart mechanism.
Not all simplifications correctly invalidate CFG analyses.
@airspeedswift
Copy link
Member

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
ArrayPlusEqualSingleElementCollection 799 4747 +494.1% 0.17x
ObjectiveCBridgeStubFromNSDateRef 4810 5810 +20.8% 0.83x (?)
FlattenListLoop 4451 4967 +11.6% 0.90x (?)
ObserverForwarderStruct 910 1015 +11.5% 0.90x (?)
Prims.NonStrongRef.UnownedSafe.Closure 795 883 +11.1% 0.90x (?)
Prims.NonStrongRef.UnownedSafe 795 882 +10.9% 0.90x (?)
DictionaryOfAnyHashableStrings_insert 5054 5586 +10.5% 0.90x (?)
ObserverUnappliedMethod 1200 1300 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ArrayPlusEqualFiveElementCollection 8436 5106 -39.5% 1.65x
String.data.LargeUnicode 124 102 -17.7% 1.22x (?)
SubstringEquatable 866 750 -13.4% 1.15x
ArrayAppendLazyMap 7030 6090 -13.4% 1.15x (?)
LazilyFilteredArrayContains 41400 36300 -12.3% 1.14x (?)
RemoveWhereMoveInts 36 33 -8.3% 1.09x (?)
MapReduce 371 342 -7.8% 1.08x (?)
RemoveWhereSwapInts 64 59 -7.8% 1.08x (?)
ParseInt.UInt64.Decimal 253 235 -7.1% 1.08x (?)
MapReduceAnyCollection 397 369 -7.1% 1.08x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
Array2D.o 2984 3016 +1.1% 0.99x
 
Improvement OLD NEW DELTA RATIO
PrimsNonStrongRef.o 121112 119344 -1.5% 1.01x
LazyFilter.o 8384 8296 -1.0% 1.01x

Performance: -Osize

Regression OLD NEW DELTA RATIO
ArrayPlusEqualSingleElementCollection 846 4653 +450.0% 0.18x
ObjectiveCBridgeStubToNSDateRef 3500 4020 +14.9% 0.87x (?)
Combos 490 562 +14.7% 0.87x (?)
ObserverForwarderStruct 870 975 +12.1% 0.89x (?)
Prims.NonStrongRef.UnownedSafe.Closure 786 873 +11.1% 0.90x (?)
Prims.NonStrongRef.UnownedSafe 786 873 +11.1% 0.90x (?)
DictionaryOfAnyHashableStrings_insert 5110 5628 +10.1% 0.91x (?)
DataCountSmall 31 34 +9.7% 0.91x (?)
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii 401 437 +9.0% 0.92x (?)
ObserverUnappliedMethod 1220 1320 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
SubstringEquatable 875 734 -16.1% 1.19x (?)
ObjectiveCBridgeFromNSSetAnyObjectToStringForced 96500 84500 -12.4% 1.14x (?)
ArrayAppendLazyMap 7900 6990 -11.5% 1.13x (?)
ObjectiveCBridgeStubToNSDate2 620 560 -9.7% 1.11x (?)
UTF8Decode_InitFromBytes_ascii 348 319 -8.3% 1.09x (?)
DictionaryBridgeToObjC_Access 851 781 -8.2% 1.09x (?)
RemoveWhereMoveInts 37 34 -8.1% 1.09x (?)
ArrayPlusEqualFiveElementCollection 8325 7696 -7.6% 1.08x (?)
RemoveWhereSwapInts 67 62 -7.5% 1.08x
FlattenListLoop 5193 4841 -6.8% 1.07x (?)
MapReduce 435 406 -6.7% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
ArrayOfPOD.o 3286 3558 +8.3% 0.92x
RecursiveOwnedParameter.o 1471 1551 +5.4% 0.95x
 
Improvement OLD NEW DELTA RATIO
SequenceAlgos.o 23463 22103 -5.8% 1.06x
DropFirst.o 19631 18975 -3.3% 1.03x
DataBenchmarks.o 57542 56829 -1.2% 1.01x
NibbleSort.o 13000 12856 -1.1% 1.01x
LazyFilter.o 8011 7923 -1.1% 1.01x
ArrayAppend.o 23159 22911 -1.1% 1.01x

Performance: -Onone

Regression OLD NEW DELTA RATIO
DictionaryRemove 12300 13750 +11.8% 0.89x (?)
CharIndexing_japanese_unicodeScalars_Backwards 756440 821400 +8.6% 0.92x (?)
Dict.CopyKeyValue.24k 5365 5822 +8.5% 0.92x (?)
DictionarySwap 4352 4716 +8.4% 0.92x (?)
Dict.CopyKeyValue.20k 4614 4984 +8.0% 0.93x (?)
ExclusivityIndependent 91 98 +7.7% 0.93x (?)
Dict.CopyKeyValue.16k 3914 4209 +7.5% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DictionarySubscriptDefaultMutation 12215 4388 -64.1% 2.78x
DictionarySubscriptDefaultMutationArray 12648 4721 -62.7% 2.68x
FrequenciesUsingReduceInto 9970 4080 -59.1% 2.44x
DictionarySubscriptDefaultMutationOfObjects 17220 8640 -49.8% 1.99x
DictionarySubscriptDefaultMutationArrayOfObjects 19680 10220 -48.1% 1.93x
FrequenciesUsingReduce 15620 9770 -37.5% 1.60x
Dictionary3OfObjects 1568 1463 -6.7% 1.07x (?)

Code size: -swiftlibs

Improvement OLD NEW DELTA RATIO
libswiftCore.dylib 4128768 4059136 -1.7% 1.02x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@atrick
Copy link
Contributor Author

atrick commented May 1, 2020

@meg-gupta took over this PR. Thanks.

@atrick atrick closed this May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants