Skip to content

Array related optimization improvements #40291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 29, 2021

Conversation

eeckstein
Copy link
Contributor

  • stdlib: Don't check for overflows when adding 1 to Array.count
  • instruction pass to simplify begin_cow_mutation instructions
  • replace Array.capacity with 0 if it's the empty array singleton
  • better handling of ref_element/tail_addr [immutable] in redundant load elimination
  • handle guaranteed arguments in non-OSSA mode in MemBehavior

For details see the commit messages

rdar://83825776
https://bugs.swift.org/browse/SR-15259

@eeckstein
Copy link
Contributor Author

@swift-ci benchmark

@eeckstein
Copy link
Contributor Author

@swift-ci test

@eeckstein eeckstein force-pushed the cow-opt-improvements branch from 846d066 to 20041be Compare November 24, 2021 17:32
@eeckstein
Copy link
Contributor Author

@swift-ci benchmark

@eeckstein
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 20041becc4f5b5539adfd714f9c6278940ee95fd

@swift-ci
Copy link
Contributor

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
FlattenListLoop 1654 2550 +54.2% 0.65x (?)
FlattenListFlatMap 5166 6593 +27.6% 0.78x (?)
NSStringConversion.Rebridge.Mutable 1037 1188 +14.6% 0.87x (?)
InsertCharacterEndIndex 167 185 +10.8% 0.90x (?)
MapReduce 127 139 +9.4% 0.91x (?)
ArrayAppendSequence 640 700 +9.4% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
SuffixCountableRangeLazy 9 8 -11.1% 1.12x (?)
RemoveWhereFilterString 314 285 -9.2% 1.10x
StringBuilderSmallReservingCapacity 342 316 -7.6% 1.08x
StringBuilder 331 306 -7.6% 1.08x (?)
NSStringConversion.Rebridge.LongUTF8 54 50 -7.4% 1.08x (?)
EqualSubstringSubstring 42 39 -7.1% 1.08x (?)
EqualStringSubstring 42 39 -7.1% 1.08x (?)
EqualSubstringString 42 39 -7.1% 1.08x (?)
LessSubstringSubstring 43 40 -7.0% 1.07x (?)
EqualSubstringSubstringGenericEquatable 43 40 -7.0% 1.07x
LessSubstringSubstringGenericComparable 43 40 -7.0% 1.07x (?)
NSStringConversion.Rebridge.UTF8 450 419 -6.9% 1.07x (?)
BinaryFloatingPointPropertiesNextUp 30 28 -6.7% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
RangeReplaceableCollectionPlusDefault.o 5772 5964 +3.3% 0.97x
RomanNumbers.o 6748 6892 +2.1% 0.98x
 
Improvement OLD NEW DELTA RATIO
MapReduce.o 28651 27915 -2.6% 1.03x
BinaryFloatingPointProperties.o 4880 4784 -2.0% 1.02x
ArrayAppend.o 24825 24345 -1.9% 1.02x
Suffix.o 21052 20700 -1.7% 1.02x
IntegerParsing.o 59413 58453 -1.6% 1.02x
AngryPhonebook.o 8983 8855 -1.4% 1.01x
ObjectiveCBridgingStubs.o 19181 18909 -1.4% 1.01x
RangeAssignment.o 3404 3356 -1.4% 1.01x
NopDeinit.o 3639 3591 -1.3% 1.01x
FlattenList.o 3969 3921 -1.2% 1.01x
BucketSort.o 8855 8759 -1.1% 1.01x
Array2D.o 2965 2933 -1.1% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
MapReduceLazyCollectionShort 62 85 +37.1% 0.73x
PrefixAnySeqCRangeIterLazy 158 176 +11.4% 0.90x
PrefixAnySeqCntRangeLazy 158 176 +11.4% 0.90x (?)
DropFirstAnySeqCRangeIterLazy 216 240 +11.1% 0.90x
DropFirstAnySeqCntRangeLazy 216 240 +11.1% 0.90x (?)
DropFirstAnySeqCRangeIter 164 182 +11.0% 0.90x (?)
DropFirstAnySeqCntRange 164 182 +11.0% 0.90x
BinaryFloatingPointPropertiesBinade 28 31 +10.7% 0.90x (?)
SuffixAnyCollection 57 63 +10.5% 0.90x
PrefixAnyCollection 163 180 +10.4% 0.91x
DropFirstAnyCollection 180 198 +10.0% 0.91x (?)
DropWhileAnyCollection 180 198 +10.0% 0.91x (?)
PrefixWhileAnyCollectionLazy 176 193 +9.7% 0.91x
DropLastAnyCollection 63 69 +9.5% 0.91x
DropWhileAnyCollectionLazy 252 275 +9.1% 0.92x (?)
DropWhileAnySeqCRangeIterLazy 264 287 +8.7% 0.92x (?)
DropWhileAnySeqCntRangeLazy 264 287 +8.7% 0.92x (?)
InsertCharacterEndIndex 168 181 +7.7% 0.93x (?)
PrefixWhileAnyCollection 233 251 +7.7% 0.93x
 
Improvement OLD NEW DELTA RATIO
SuffixAnySeqCRangeIter 947 801 -15.4% 1.18x
SuffixAnySeqCntRange 881 758 -14.0% 1.16x
DictionaryKeysContainsCocoa 29 26 -10.3% 1.12x (?)
LessSubstringSubstring 44 40 -9.1% 1.10x
EqualSubstringSubstring 43 40 -7.0% 1.07x (?)
EqualStringSubstring 43 40 -7.0% 1.07x (?)
EqualSubstringSubstringGenericEquatable 43 40 -7.0% 1.07x
EqualSubstringString 43 40 -7.0% 1.07x (?)
LessSubstringSubstringGenericComparable 43 40 -7.0% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
RangeReplaceableCollectionPlusDefault.o 5597 5720 +2.2% 0.98x
 
Improvement OLD NEW DELTA RATIO
Suffix.o 24662 23849 -3.3% 1.03x
MapReduce.o 21255 20730 -2.5% 1.03x
IntegerParsing.o 54551 53251 -2.4% 1.02x
NopDeinit.o 4314 4234 -1.9% 1.02x
ArrayAppend.o 22953 22541 -1.8% 1.02x
BinaryFloatingPointProperties.o 4960 4872 -1.8% 1.02x
RomanNumbers.o 5321 5230 -1.7% 1.02x
ObjectiveCBridgingStubs.o 16993 16721 -1.6% 1.02x
Array2D.o 2768 2726 -1.5% 1.02x
FlattenList.o 3858 3801 -1.5% 1.01x
RangeAssignment.o 3261 3221 -1.2% 1.01x
BucketSort.o 8591 8488 -1.2% 1.01x
AngryPhonebook.o 8299 8207 -1.1% 1.01x
SortIntPyramids.o 8937 8841 -1.1% 1.01x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 207 232 +12.1% 0.89x (?)
LineSink.scalars.alpha 346 377 +9.0% 0.92x (?)
RangeReplaceableCollectionPlusDefault 5828 6312 +8.3% 0.92x (?)
ClassArrayGetter2 4900 5280 +7.8% 0.93x (?)
Set.isSuperset.Seq.Box0 2340 2519 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ErrorHandling 4160 3590 -13.7% 1.16x (?)
StringBuilderLong 2120 1900 -10.4% 1.12x (?)
NSStringConversion.Rebridge.UTF8 769 705 -8.3% 1.09x (?)
SortStringsUnicode 5130 4775 -6.9% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@eeckstein eeckstein force-pushed the cow-opt-improvements branch from 20041be to b0be31a Compare November 26, 2021 17:01
@@ -245,28 +245,60 @@ void LSLocation::reduce(LSLocation Base, SILModule *M,
replaceSubLocations(Base, M, context, Locs, SubLocations);
}

void LSLocation::enumerateLSLocation(TypeExpansionContext context, SILModule *M,
std::pair<SILValue, bool> LSLocation::getBaseAddress(SILValue v,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eeckstein could you use AccessBase::compute(address), AccessBase::isObjectAccess(), AccessBase::getReference() (then check for immutability), and AccessBase::getStorageReferenceRoot() rather than using ad-hoc stripWhateverProjection. This way the possible patterns of object/variable access are well-defined by SIL, verified by the SIL verifier, and the logic used by passes is complete and maintainable.

Note that this version of getBaseAddress is misleading because it does not always return a "base address" as defined in MemAccessUtils.

Copy link
Contributor Author

@eeckstein eeckstein Nov 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it's not that simple. RLE and DSE internally rely on that kind of stripX pattern. When I use AccessBase, I get a crash.
I don't wan't to rewrite and RLE and DSE for this small change - not yet 🙂.

Note that this version of getBaseAddress is misleading

I renamed it to getBaseAddressOrObject

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the crash inside an AccessBase method? I'd like to fix that before something else hits it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it crashed in RLE. There is no problem in AccessBase.

// argument.
// Note that in OSSA, getSingleBorrowIntroducingValue will detect a
// guaranteed argument.
for (;;) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a partial reimplementation of findOwnershipReferenceAggregate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Using findOwnershipReferenceAggregate now

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about LSLocation::getBaseAddress, presumably it was copied from code in InstOptUtils. I'd strongly prefer the use of APIs in MemAccessUtils in two places. This sort of ad-hoc address/reference cast traversal makes it hard to migrate passes to handle SIL changes like access markers for example, and they contribute to pipeline instability and lack of test coverage.

Otherwise everything looks good to me.

@eeckstein eeckstein force-pushed the cow-opt-improvements branch from b0be31a to 6035fcf Compare November 28, 2021 15:43
That addition can not possibly overflow. If Array.count would be Int max, the allocation of the array buffer would have failed way before.
* add the IntegerLiteralInst + the Builder function to create it
* add Value.nonDebugUses
* add a general utility Sequence.isEmpty
* add PassContext.replaceAllUses
* add a function to erase all `debug_value` uses in PassContext.erase
…ructions

* Replace the uniqueness result of a begin_cow_mutation of an empty Array/Set/Dictionary singleton with zero.
* Remove empty begin_cow_mutation - end_cow_mutation pairs
* Remove empty end_cow_mutation - begin_cow_mutation pairs
…gleton

We already do that for Array.count. Now add Array.capacity.
Rerun RLE with cutting off the base address of loads at `ref_element/tail_addr [immutable]`. This increases the chance of catching loads of immutable COW class properties or elements.
In non-OSSA, this is the only case for which we know that if an instruction is within a "borrow-scope".
…y function.

This is an end-to-end test to check if the optimization improvements of the previous commits result in good code generation for this function.

rdar://83825776
https://bugs.swift.org/browse/SR-15259
@eeckstein eeckstein force-pushed the cow-opt-improvements branch from 6035fcf to 8e83d5f Compare November 29, 2021 08:42
@eeckstein
Copy link
Contributor Author

@swift-ci test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants