StringOptimization: optimize interpolated C strings. #38274

Merged

@eeckstein eeckstein commented Jul 6, 2021

Optimize code like:

   puts("\(String.self)")

String interpolation and C-strings are both optimized in StringOptimization.
A second run of StringOptimization is needed in the pipeline to optimize such code, because the result of the interpolation optimization must be cleaned up before the C-string optimization can kick in.
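
As a rough source-level illustration of the intended effect (a sketch only, not compiler output; whether a given build folds the first call all the way down to a literal is an assumption here), the interpolated call should end up behaving like a call with a plain literal:

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Interpolation of a metatype: without optimization, the String is built at
// run time and then converted to a C string for the call.
puts("\(String.self)")

// The form the optimization aims for: a constant C string passed directly.
puts("String")

// Both calls print the same text, because "\(String.self)" evaluates to "String".
```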

Also, StringOptimization must handle `struct_extract(struct(literal))`, where the `struct_extract` may be in a called function.
To solve a phase-ordering problem between inlining String semantics and inlining the `String(stringInterpolation: DefaultStringInterpolation)` constructor, we do a simple analysis of the callee. This simple "interprocedural" analysis avoids relying on inlining that String constructor.
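
The callee analysis can be pictured with a toy model. This is a minimal sketch in Swift, not the pass's C++ implementation: `Value`, `Callee`, `literalString(of:)`, and the field name `storage` are all hypothetical names invented for illustration. It only follows values back to a literal; the call itself is left untouched.

```swift
// Hypothetical mini-IR for illustration; these are not the compiler's types.
indirect enum Value {
    case literal(String)                      // a string literal
    case structValue([String: Value])         // struct(...) with named fields
    case structExtract(Value, field: String)  // struct_extract(base, #field)
    case apply(Callee, argument: Value)       // result of calling `Callee`
}

// Summary of a simple look at the callee's body: non-nil if the callee just
// returns struct_extract(argument, #field). (Assumed shape, for illustration.)
struct Callee {
    let name: String
    let returnsFieldOfArgument: String?
}

// Returns the fields if `value` is directly a struct(...), otherwise nil.
func structFields(of value: Value) -> [String: Value]? {
    if case .structValue(let fields) = value { return fields }
    return nil
}

// Follows values back to a string literal, or returns nil if the pattern does
// not match. Nothing is deleted or rewritten; this is a pure query.
func literalString(of value: Value) -> String? {
    switch value {
    case .literal(let text):
        return text
    case .structValue:
        return nil
    case .structExtract(let base, let field):
        // Only a struct built directly in view is handled; anything else is unknown.
        guard let fields = structFields(of: base), let inner = fields[field] else {
            return nil
        }
        return literalString(of: inner)
    case .apply(let callee, let argument):
        // Look through the call without inlining it.
        guard let field = callee.returnsFieldOfArgument else { return nil }
        return literalString(of: .structExtract(argument, field: field))
    }
}

// Caller builds struct(literal); the struct_extract lives in the callee.
let interpolation = Value.structValue(["storage": .literal("String")])
let initializer = Callee(name: "init(stringInterpolation:)", returnsFieldOfArgument: "storage")
let opaque = Callee(name: "somethingElse", returnsFieldOfArgument: nil)

let known = literalString(of: .apply(initializer, argument: interpolation))
let unknown = literalString(of: .apply(opaque, argument: interpolation))
print(known ?? "unknown", unknown ?? "unknown")
```

The sketch is deliberately conservative: any shape it does not recognize yields `nil`, which mirrors the pass returning an unknown result rather than guessing.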

rdar://79723829

rdar://74941849
@eeckstein eeckstein requested a review from meg-gupta July 6, 2021 14:26
@eeckstein

@swift-ci test

@eeckstein

@swift-ci benchmark

swift-ci commented Jul 6, 2021

Performance (x86_64): -O

| Regression | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| DictionaryOfAnyHashableStrings_insert | 3094 | 5558 | +79.6% | 0.56x |
| Set.isDisjoint.Box25 | 358 | 508 | +41.9% | 0.70x (?) |
| Set.isDisjoint.Int25 | 268 | 345 | +28.7% | 0.78x (?) |
| Set.isDisjoint.Int50 | 268 | 339 | +26.5% | 0.79x (?) |
| StringFromLongWholeSubstringGeneric | 5 | 6 | +20.0% | 0.83x (?) |
| DictionaryKeysContainsNative | 22 | 25 | +13.6% | 0.88x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| AngryPhonebook.Strasse.Small | 780 | 588 | -24.6% | 1.33x |
| AngryPhonebook.Cyrillic.Small | 675 | 514 | -23.9% | 1.31x (?) |
| AngryPhonebook.Armenian.Small | 660 | 518 | -21.5% | 1.27x (?) |
| SortStringsUnicode | 3105 | 2890 | -6.9% | 1.07x (?) |

Code size: -O

Performance (x86_64): -Osize

| Regression | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| FlattenListLoop | 1633 | 2550 | +56.2% | 0.64x (?) |
| FlattenListFlatMap | 5044 | 6071 | +20.4% | 0.83x (?) |
| StringFromLongWholeSubstringGeneric | 5 | 6 | +20.0% | 0.83x |
| Data.hash.Medium | 39 | 42 | +7.7% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| AngryPhonebook.Armenian.Small | 719 | 508 | -29.3% | 1.42x |
| AngryPhonebook.Strasse.Small | 824 | 593 | -28.0% | 1.39x |
| AngryPhonebook.Cyrillic.Small | 676 | 519 | -23.2% | 1.30x (?) |
| LessSubstringSubstring | 43 | 39 | -9.3% | 1.10x (?) |
| EqualSubstringSubstringGenericEquatable | 43 | 39 | -9.3% | 1.10x |
| String.data.LargeUnicode | 111 | 102 | -8.1% | 1.09x (?) |
| NSStringConversion.MutableCopy.Medium | 842 | 778 | -7.6% | 1.08x (?) |
| String.data.Medium | 109 | 101 | -7.3% | 1.08x (?) |
| EqualSubstringSubstring | 42 | 39 | -7.1% | 1.08x (?) |
| EqualStringSubstring | 42 | 39 | -7.1% | 1.08x |
| EqualSubstringString | 42 | 39 | -7.1% | 1.08x |
| LessSubstringSubstringGenericComparable | 42 | 39 | -7.1% | 1.08x (?) |
| SortStringsUnicode | 3130 | 2915 | -6.9% | 1.07x (?) |
| SubstringEqualString | 441 | 412 | -6.6% | 1.07x (?) |

Code size: -Osize

Performance (x86_64): -Onone

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| AngryPhonebook.Strasse.Small | 806 | 597 | -25.9% | 1.35x (?) |
| AngryPhonebook.Armenian.Small | 692 | 525 | -24.1% | 1.32x (?) |
| AngryPhonebook.Cyrillic.Small | 704 | 536 | -23.9% | 1.31x (?) |
| DataToStringSmall | 4500 | 3850 | -14.4% | 1.17x (?) |
| NSStringConversion.MutableCopy.Rebridge.Medium | 861 | 746 | -13.4% | 1.15x (?) |
| NSStringConversion.Rebridge | 533 | 463 | -13.1% | 1.15x (?) |
| NSStringConversion.Rebridge.Mutable | 1626 | 1423 | -12.5% | 1.14x (?) |
| FloatingPointPrinting_Float_interpolated | 75600 | 67400 | -10.8% | 1.12x (?) |
| ErrorHandling | 4070 | 3650 | -10.3% | 1.12x (?) |
| DataToStringMedium | 7100 | 6550 | -7.7% | 1.08x (?) |
| StringWordBuilder | 2910 | 2700 | -7.2% | 1.08x (?) |
| String.data.LargeUnicode | 183 | 170 | -7.1% | 1.08x (?) |
| SortStringsUnicode | 4800 | 4470 | -6.9% | 1.07x (?) |
| StringWordBuilderReservingCapacity | 2850 | 2660 | -6.7% | 1.07x (?) |

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

    auto *si = dyn_cast<StructInst>(value);
    if (!si)
      return StringInfo::unknown();
    value = si->getFieldValue(field);
Contributor

It looks like we are directly forwarding the callee's result value after the pattern match. Does this still work when there could be a runtime failure (e.g. `cond_fail`) in the callee's entry block before the return?

Contributor Author

This is not a problem. The call is not eliminated; the analysis just follows SSA values.

// exposed once optimized String interpolations (from the high-level string
// optimization) are cleaned up. But before the mid-level inliner inlines
// semantic calls.
P.addStringOptimization();
Contributor

What type of cleanup is required? Is this something that could be implemented in InstSimplifier in the future, so that we don't need another instance of the pass here?

Contributor Author

All kinds of function passes (e.g. SILMem2Reg) are needed to clean up the code produced by the first StringOptimization run.
Of course, we could put all optimizations in SILCombine, but that would defeat the purpose of having separate passes for separate optimizations. It's a design decision.

Contributor

Okay

@eeckstein eeckstein merged commit 26cf028 into swiftlang:main Jul 7, 2021
@eeckstein eeckstein deleted the optimize-interpolated-c-strings branch July 7, 2021 17:37
swift-ci commented Jul 7, 2021

Build failed before running benchmark.

swift-ci commented Jul 7, 2021

Build failed
Swift Test Linux Platform
Git Sha - 88a74a2

swift-ci commented Jul 7, 2021

Build failed
Swift Test OS X Platform
Git Sha - 88a74a2
