[inliner] Treat inline_always as meaning truly inline_always even in … #20589

gottesmm
Contributor

…situations where we have large caller CFGs.

Currently, if a caller has more than 400 blocks, the inliner bails out of
finding inlinable targets. This is incorrect behavior for inline-always
functions. In such cases, we should continue inlining inline-always functions
and skip any functions that are not inline-always.

rdar://45976860
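
The change described above can be sketched as a small model. This is only an illustration, not the Swift inliner (which is written in C++): the names `BLOCK_LIMIT`, `Callee`, `select_callees_old`, and `select_callees_new` are hypothetical, and the `profitable` flag is a stand-in for whatever cost heuristic the real pass applies to functions that are not inline-always.

```python
from dataclasses import dataclass

# Block-count threshold from the description above; the constant name is hypothetical.
BLOCK_LIMIT = 400


@dataclass
class Callee:
    name: str
    inline_always: bool
    profitable: bool = False  # stand-in for the real cost/benefit heuristic


def select_callees_old(caller_blocks, callees):
    """Old behavior as described: a large caller yields no candidates at all."""
    if caller_blocks > BLOCK_LIMIT:
        return []
    return [c.name for c in callees if c.inline_always or c.profitable]


def select_callees_new(caller_blocks, callees):
    """New behavior as described: inline-always callees are always selected;
    other callees are skipped once the caller exceeds the limit."""
    too_large = caller_blocks > BLOCK_LIMIT
    selected = []
    for c in callees:
        if c.inline_always:
            selected.append(c.name)
        elif not too_large and c.profitable:
            selected.append(c.name)
    return selected
```

With a 401-block caller, the old policy selects nothing, while the new policy still selects every inline-always callee and skips the rest.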

@gottesmm gottesmm requested a review from eeckstein November 15, 2018 02:17
@gottesmm
Contributor Author

I am going to add some tests before I commit this. Just want to do some initial testing/benchmarking.

@gottesmm
Contributor Author

@swift-ci test

@gottesmm
Contributor Author

@swift-ci benchmark

@gottesmm
Contributor Author

@swift-ci test compiler performance

@swift-ci
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
NopDeinit 46488 54118 +16.4% 0.86x
Improvement
InsertCharacterEndIndex 163 151 -7.4% 1.08x
StringHashing_fastPrenormal 1677 1559 -7.0% 1.08x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
BinaryFloatingPointConversionFromBinaryInteger.o 11615 26319 +126.6% 0.44x
AnyHashableWithAClass.o 2965 3013 +1.6% 0.98x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Improvement
StringHashing_fastPrenormal 1676 1561 -6.9% 1.07x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
BinaryFloatingPointConversionFromBinaryInteger.o 11519 25679 +122.9% 0.45x
AnyHashableWithAClass.o 3181 3229 +1.5% 0.99x
Improvement
SequenceAlgos.o 22418 21954 -2.1% 1.02x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
WordCountUniqueASCII 6133 7969 +29.9% 0.77x
WordCountUniqueUTF16 6948 8553 +23.1% 0.81x
UTF8Decode_InitDecoding_ascii 752 909 +20.9% 0.83x
Dictionary2 1403 1644 +17.2% 0.85x
Dictionary 1865 2141 +14.8% 0.87x
Improvement
BitCount 9882 4700 -52.4% 2.10x
ByteSwap 11018 5489 -50.2% 2.01x
MonteCarloPi 5949025 4684267 -21.3% 1.27x
ArraySubscript 118938 94335 -20.7% 1.26x
MonteCarloE 1254037 1020057 -18.7% 1.23x
ExclusivityIndependent 86 70 -18.6% 1.23x
Radix2CooleyTukeyf 52424 43540 -16.9% 1.20x
RandomDoubleLCG 51594 44232 -14.3% 1.17x
SubstringEquatable 3898 3376 -13.4% 1.15x
DropFirstCountableRange 407 353 -13.3% 1.15x (?)
PrefixCountableRange 406 353 -13.1% 1.15x
SubstringComparable 1093 966 -11.6% 1.13x
NibbleSort 216198 193535 -10.5% 1.12x
Radix2CooleyTukey 52800 47895 -9.3% 1.10x
ArrayOfPOD 859 780 -9.2% 1.10x (?)
Memset 12871 11728 -8.9% 1.10x
XorLoop 8018 7306 -8.9% 1.10x

Code size: -swiftlibs

TEST OLD NEW DELTA RATIO
Regression
libswiftCore.dylib 2998272 3162112 +5.5% 0.95x
Improvement
libswiftSwiftOnoneSupport.dylib 163840 159744 -2.5% 1.03x
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
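
The DELTA and RATIO columns in the tables above appear consistent with a relative change `(new - old) / old` and a ratio `old / new`. The following is a minimal sketch under that assumption. The formulas are inferred from the reported numbers, not taken from the benchmark driver, and the function names are hypothetical.

```python
def delta_percent(old, new):
    """Relative change of the new value versus the old value, in percent."""
    return (new - old) / old * 100.0


def ratio(old, new):
    """Old divided by new: below 1.0 when the new value is larger."""
    return old / new


def format_row(test, old, new):
    """Format one row in the same column order as the tables: TEST OLD NEW DELTA RATIO."""
    return f"{test} {old} {new} {delta_percent(old, new):+.1f}% {ratio(old, new):.2f}x"


print(format_row("NopDeinit", 46488, 54118))  # NopDeinit 46488 54118 +16.4% 0.86x
print(format_row("BitCount", 9882, 4700))     # BitCount 9882 4700 -52.4% 2.10x
```

Both printed rows reproduce the corresponding lines of the -O and -Onone performance tables.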

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@swift-ci
Contributor

Build comment file:

Summary for master full

Unexpected test results, excluded stats for Tagged, NonEmpty, ProcedureKitCloud, GRDB, ModelAssistant, Wordy

Regressions found (see below)

Debug-batch

debug-batch brief

Regressed (1)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 21,758,976,619,852 423,302,851,830,744 401,543,875,210,892 1845.42% ⛔
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 913,073,158 911,972,488 -1,100,670 -0.12%
time.swift-driver.wall 2243.0s 2230.9s -12.2s -0.54%

debug-batch detailed

Regressed (2)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 21,758,976,619,852 423,302,851,830,744 401,543,875,210,892 1845.42% ⛔
Sema.NumConformancesDeserialized 3,555,336 3,866,869 311,533 8.76% ⛔
Improved (1)
name old new delta delta_pct
Sema.USRGenerationRequest 6,089,550 6,014,746 -74,804 -1.23% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (92)
name old new delta delta_pct
AST.NumASTBytesAllocated 45,403,890,092 45,118,647,199 -285,242,893 -0.63%
AST.NumDecls 65,849 65,849 0 0.0%
AST.NumDependencies 152,691 152,682 -9 -0.01%
AST.NumImportedExternalDefinitions 960,730 960,730 0 0.0%
AST.NumInfixOperators 24,640 24,640 0 0.0%
AST.NumLinkLibraries 0 0 0 0.0%
AST.NumLoadedModules 180,845 180,845 0 0.0%
AST.NumLocalTypeDecls 113 113 0 0.0%
AST.NumObjCMethods 12,528 12,528 0 0.0%
AST.NumPostfixOperators 13 13 0 0.0%
AST.NumPrecedenceGroups 12,687 12,687 0 0.0%
AST.NumPrefixOperators 71 71 0 0.0%
AST.NumReferencedDynamicNames 101 101 0 0.0%
AST.NumReferencedMemberNames 3,046,566 3,046,566 0 0.0%
AST.NumReferencedTopLevelNames 210,981 210,981 0 0.0%
AST.NumSourceBuffers 288,503 288,503 0 0.0%
AST.NumSourceLines 2,107,547 2,107,547 0 0.0%
AST.NumSourceLinesPerSecond 1,171,759 1,167,085 -4,674 -0.4%
AST.NumTotalClangImportedEntities 3,563,211 3,559,780 -3,431 -0.1%
AST.NumUsedConformances 185,274 185,274 0 0.0%
Driver.ChildrenMaxRSS 69,075,830,784 69,003,202,560 -72,628,224 -0.11%
Driver.DriverDepCascadingDynamic 0 0 0 0.0%
Driver.DriverDepCascadingExternal 0 0 0 0.0%
Driver.DriverDepCascadingMember 0 0 0 0.0%
Driver.DriverDepCascadingNominal 0 0 0 0.0%
Driver.DriverDepCascadingTopLevel 0 0 0 0.0%
Driver.DriverDepDynamic 0 0 0 0.0%
Driver.DriverDepExternal 0 0 0 0.0%
Driver.DriverDepMember 0 0 0 0.0%
Driver.DriverDepNominal 0 0 0 0.0%
Driver.DriverDepTopLevel 0 0 0 0.0%
Driver.NumDriverJobsRun 13,529 13,529 0 0.0%
Driver.NumDriverJobsSkipped 0 0 0 0.0%
Driver.NumDriverPipePolls 300,431 297,791 -2,640 -0.88%
Driver.NumDriverPipeReads 340,265 338,697 -1,568 -0.46%
Driver.NumProcessFailures 0 0 0 0.0%
Frontend.MaxMallocUsage 359,990,470,464 358,861,768,616 -1,128,701,848 -0.31%
Frontend.NumProcessFailures 0 0 0 0.0%
IRModule.NumIRAliases 92,680 92,680 0 0.0%
IRModule.NumIRBasicBlocks 3,590,781 3,588,080 -2,701 -0.08%
IRModule.NumIRComdatSymbols 0 0 0 0.0%
IRModule.NumIRFunctions 1,634,126 1,632,660 -1,466 -0.09%
IRModule.NumIRGlobals 1,858,336 1,857,573 -763 -0.04%
IRModule.NumIRIFuncs 0 0 0 0.0%
IRModule.NumIRInsts 42,155,606 42,093,230 -62,376 -0.15%
IRModule.NumIRNamedMetaData 66,181 66,181 0 0.0%
IRModule.NumIRValueSymbols 3,122,971 3,120,747 -2,224 -0.07%
LLVM.NumLLVMBytesOutput 913,073,158 911,972,488 -1,100,670 -0.12%
Parse.NumFunctionsParsed 2,170,339 2,170,339 0 0.0%
Parse.NumIterableDeclContextParsed 865,730 865,730 0 0.0%
SILModule.NumSILGenDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILGenFunctions 1,286,682 1,286,682 0 0.0%
SILModule.NumSILGenGlobalVariables 26,821 26,821 0 0.0%
SILModule.NumSILGenVtables 10,171 10,171 0 0.0%
SILModule.NumSILGenWitnessTables 37,360 37,360 0 0.0%
SILModule.NumSILOptDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILOptFunctions 1,170,740 1,170,046 -694 -0.06%
SILModule.NumSILOptGlobalVariables 27,550 27,550 0 0.0%
SILModule.NumSILOptVtables 16,355 16,355 0 0.0%
SILModule.NumSILOptWitnessTables 72,616 72,675 59 0.08%
Sema.AccessLevelRequest 1,913,002 1,911,581 -1,421 -0.07%
Sema.DefaultAndMaxAccessLevelRequest 45,844 45,837 -7 -0.02%
Sema.EnumRawTypeRequest 13,205 13,205 0 0.0%
Sema.ExtendedNominalRequest 2,674,339 2,670,860 -3,479 -0.13%
Sema.InheritedDeclsReferencedRequest 84,545,515 84,451,918 -93,597 -0.11%
Sema.InheritedTypeRequest 441,390 441,421 31 0.01%
Sema.IsDynamicRequest 1,535,733 1,535,733 0 0.0%
Sema.IsObjCRequest 1,333,452 1,332,941 -511 -0.04%
Sema.NamedLazyMemberLoadFailureCount 18,175 18,168 -7 -0.04%
Sema.NamedLazyMemberLoadSuccessCount 13,012,538 13,017,051 4,513 0.03%
Sema.NominalTypeLookupDirectCount 24,975,521 24,968,523 -6,998 -0.03%
Sema.NumConstraintScopes 13,042,247 13,040,509 -1,738 -0.01%
Sema.NumConstraintsConsideredForEdgeContraction 29,328,630 29,328,404 -226 -0.0%
Sema.NumDeclsDeserialized 30,831,270 30,736,031 -95,239 -0.31%
Sema.NumDeclsValidated 1,645,642 1,645,643 1 0.0%
Sema.NumFunctionsTypechecked 920,531 920,531 0 0.0%
Sema.NumGenericSignatureBuilders 878,213 876,903 -1,310 -0.15%
Sema.NumLazyGenericEnvironments 6,210,553 6,193,700 -16,853 -0.27%
Sema.NumLazyGenericEnvironmentsLoaded 165,438 165,456 18 0.01%
Sema.NumLazyIterableDeclContexts 4,955,981 4,952,699 -3,282 -0.07%
Sema.NumLeafScopes 8,941,312 8,939,775 -1,537 -0.02%
Sema.NumTypesDeserialized 10,891,683 10,928,772 37,089 0.34%
Sema.NumTypesValidated 1,078,131 1,078,134 3 0.0%
Sema.NumUnloadedLazyIterableDeclContexts 3,527,489 3,529,718 2,229 0.06%
Sema.OverriddenDeclsRequest 4,081,422 4,043,569 -37,853 -0.93%
Sema.RequirementRequest 54,620 54,620 0 0.0%
Sema.SelfBoundsFromWhereClauseRequest 50,293,999 50,136,044 -157,955 -0.31%
Sema.SetterAccessLevelRequest 108,626 108,626 0 0.0%
Sema.SuperclassDeclRequest 66,570,131 66,549,883 -20,248 -0.03%
Sema.SuperclassTypeRequest 30,277 30,277 0 0.0%
Sema.TypeDeclsFromWhereClauseRequest 26,258 26,251 -7 -0.03%
Sema.UnderlyingTypeDeclsReferencedRequest 2,295,989 2,295,784 -205 -0.01%

Release

release brief

Regressed (1)
name old new delta delta_pct
time.swift-driver.wall 4349.4s 4442.0s 92.6s 2.13% ⛔
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 23,847,762,193,226 24,073,353,594,246 225,591,401,020 0.95%
LLVM.NumLLVMBytesOutput 811,777,136 810,474,628 -1,302,508 -0.16%

release detailed

Regressed (2)
name old new delta delta_pct
Sema.NumConformancesDeserialized 1,329,137 1,588,136 258,999 19.49% ⛔
Sema.NumTypesDeserialized 2,141,037 2,164,125 23,088 1.08% ⛔
Improved (2)
name old new delta delta_pct
SILModule.NumSILOptFunctions 722,403 694,467 -27,936 -3.87% ✅
Sema.NumGenericSignatureBuilders 147,471 145,901 -1,570 -1.06% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (19)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 178,485 178,485 0 0.0%
AST.NumLoadedModules 11,404 11,404 0 0.0%
AST.NumTotalClangImportedEntities 606,341 606,341 0 0.0%
AST.NumUsedConformances 186,004 186,004 0 0.0%
IRModule.NumIRBasicBlocks 2,978,566 2,975,491 -3,075 -0.1%
IRModule.NumIRFunctions 1,338,894 1,336,674 -2,220 -0.17%
IRModule.NumIRGlobals 1,482,942 1,470,633 -12,309 -0.83%
IRModule.NumIRInsts 28,679,406 28,624,546 -54,860 -0.19%
IRModule.NumIRValueSymbols 2,625,619 2,623,474 -2,145 -0.08%
LLVM.NumLLVMBytesOutput 811,777,136 810,474,628 -1,302,508 -0.16%
SILModule.NumSILGenFunctions 561,090 561,090 0 0.0%
Sema.NumConstraintScopes 11,497,127 11,497,127 0 0.0%
Sema.NumDeclsDeserialized 3,940,045 3,939,750 -295 -0.01%
Sema.NumDeclsValidated 855,460 855,460 0 0.0%
Sema.NumFunctionsTypechecked 450,388 450,388 0 0.0%
Sema.NumLazyGenericEnvironments 801,545 801,457 -88 -0.01%
Sema.NumLazyGenericEnvironmentsLoaded 16,261 16,260 -1 -0.01%
Sema.NumLazyIterableDeclContexts 532,848 532,841 -7 -0.0%
Sema.NumTypesValidated 401,820 401,820 0 0.0%
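
The Regressed / Improved / Unchanged grouping in the report above can be approximated with a simple threshold on the percentage delta. This is a sketch under stated assumptions: it uses only the 1.0% cutoff named in the "Unchanged" headers, ignores the separate 100.0ms rule for timers, and treats any increase as a regression, which matches the rows shown here. The function name `classify` is hypothetical and not part of the CI tooling.

```python
def classify(old, new, threshold_pct=1.0):
    """Bucket a counter by its relative change, using a 1.0% cutoff by default."""
    if old == 0:
        # No meaningful percentage when the baseline is zero.
        return "unchanged" if new == 0 else "regressed"
    pct = (new - old) / old * 100.0
    if abs(pct) < threshold_pct:
        return "unchanged"
    return "regressed" if pct > 0 else "improved"


# Values taken from the release tables above.
print(classify(1_329_137, 1_588_136))  # Sema.NumConformancesDeserialized
print(classify(722_403, 694_467))      # SILModule.NumSILOptFunctions
print(classify(23_847_762_193_226, 24_073_353_594_246))  # Frontend.NumInstructionsExecuted
```

The three calls fall into the regressed, improved, and unchanged buckets respectively, in line with how the report lists those counters.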

Contributor

@eeckstein eeckstein left a comment

lgtm

@gottesmm gottesmm merged commit 7537740 into swiftlang:master Nov 15, 2018
@gottesmm gottesmm deleted the pr-37958925c26b34b09c8e6638f5f45e8763af2110 branch November 15, 2018 21:17
@milseman
Member

Do we know why the stdlib is 5% bigger with this? This is worth taking even if we have to suffer that regression, but I'm thinking that this might expose some needless code bloat.

@gottesmm
Contributor Author

I didn't look into it. @airspeedswift and @eeckstein talked about this and decided we were ok taking the code-size hit.

@gottesmm
Contributor Author

@milseman

Also, I took a look at the overall code size per project. Here is the percent change. You'll see that almost all of the projects actually /decrease/ in size (left is good, right is bad):

[Image: pastedgraphic-1, chart of percent code-size change per project]

@gottesmm
Contributor Author

To be clear, by "per project" I mean per source compatibility project.
