SILOptimizer: restructure the apply(partial_apply) peephole and the dead partial_apply elimination optimizations #29703

Merged
merged 1 commit into from
Feb 11, 2020

eeckstein
Contributor

Changes:

  • Allow optimizing partial_apply capturing opened existential: we didn't do this originally because it was complicated to insert the required alloc/dealloc_stack instructions at the right places. Now we have the StackNesting utility, which makes this easier.

  • Support indirect-in parameters. Not super important, but why not? It's also easy to do with the StackNesting utility.

  • Share code between dead closure elimination and the apply(partial_apply) optimization. It's a bit of refactoring, and it allowed eliminating some code that is no longer used.

  • Fix an ownership problem: We inserted copies of partial_apply arguments after the partial_apply (which consumes the arguments).

  • When replacing an apply(partial_apply) -> apply and the partial_apply becomes dead, avoid inserting copies of the arguments twice.

These changes don't have any immediate effect on our current benchmarks, but will allow eliminating curry thunks for existentials.
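
As a rough analogy in plain C++ (not SIL, and not the compiler's code), the apply(partial_apply) peephole replaces a call through a closure that captured some trailing arguments with a direct call that passes those arguments explicitly. All names below (`callee`, `makeClosure`, `callThroughClosure`, `callDirect`, `checkPeephole`) are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callee: 'x' is the applied argument, 'captured' plays the role
// of a value bound by the closure (captures are the trailing parameters).
inline int callee(int x, int captured) { return x * 10 + captured; }

// Analogue of creating a closure that binds the trailing argument.
inline std::function<int(int)> makeClosure(int captured) {
  return [captured](int x) { return callee(x, captured); };
}

// Before the rewrite: the call goes through the closure.
inline int callThroughClosure(int x, int captured) {
  return makeClosure(captured)(x);
}

// After the rewrite: a direct call with the captured value appended as a
// trailing argument; the closure object is no longer needed.
inline int callDirect(int x, int captured) { return callee(x, captured); }

// Both forms must agree for the rewrite to be sound.
inline void checkPeephole() {
  assert(callThroughClosure(3, 4) == callDirect(3, 4));
}
```

The part this analogy leaves out is the hard part of the PR: in SIL the captured values must stay alive (or be copied to temporaries) until the last rewritten call, which is where the lifetime and stack-nesting handling comes in.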

@eeckstein eeckstein requested a review from gottesmm February 7, 2020 15:52
@eeckstein
Contributor Author

@swift-ci test

@eeckstein
Contributor Author

@swift-ci benchmark

@swift-ci
Contributor

swift-ci commented Feb 7, 2020

Performance: -O

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| UnicodeStringFromCodable | 366 | 295 | -19.4% | 1.24x |
| DictionaryCompactMapValuesOfCastValue | 6318 | 5670 | -10.3% | 1.11x (?) |
| NormalizedIterator_fastPrenormal | 690 | 630 | -8.7% | 1.10x (?) |
| StringHashing_fastPrenormal | 600 | 550 | -8.3% | 1.09x (?) |
| NormalizedIterator_latin1 | 230 | 214 | -7.0% | 1.07x (?) |

Code size: -O

Performance: -Osize

| Regression | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| DropLastArray | 5 | 9 | +80.0% | 0.56x |
| Chars2 | 3250 | 3600 | +10.8% | 0.90x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| DropLastArrayLazy | 9 | 4 | -55.5% | 2.25x |
| UnicodeStringFromCodable | 366 | 295 | -19.4% | 1.24x (?) |
| DictionaryCompactMapValuesOfCastValue | 6750 | 5940 | -12.0% | 1.14x (?) |
| Set.isDisjoint.Seq.Int.Empty | 52 | 46 | -11.5% | 1.13x (?) |
| StringHashing_fastPrenormal | 600 | 550 | -8.3% | 1.09x (?) |
| StringHashing_latin1 | 208 | 194 | -6.7% | 1.07x (?) |

Code size: -Osize

Performance: -Onone

| Improvement | OLD | NEW | DELTA | RATIO |
| --- | --- | --- | --- | --- |
| UnicodeStringFromCodable | 380 | 308 | -18.9% | 1.23x |
| ObjectiveCBridgeToNSString | 300 | 278 | -7.3% | 1.08x (?) |

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

Contributor

@gottesmm gottesmm left a comment

Quick nit. I need to read this offline in my editor. I'll do a more in-depth review in a bit.

/// Returns true, if there are no other users beside those collected in \p
/// destroys, i.e. if \p inst can be considered as "dead".
bool collectDestroys(SingleValueInstruction *inst,
SmallVectorImpl<SILInstruction *> &destroys);
Contributor

Can you run this patch through git-clang-format?

Contributor Author

okay, done

@eeckstein eeckstein force-pushed the fix-partial-apply-combine branch from ccef7b7 to f170052 Compare February 10, 2020 09:27
@eeckstein
Contributor Author

@swift-ci smoke test

@eeckstein eeckstein force-pushed the fix-partial-apply-combine branch from f170052 to 30beefe Compare February 10, 2020 11:13
@eeckstein
Contributor Author

@swift-ci smoke test

@eeckstein
Contributor Author

@swift-ci smoke test


void swift::endLifetimeAtFrontier(
SILValue valueOrStackLoc, const ValueLifetimeAnalysis::Frontier &frontier,
SILBuilder &builder) {
Contributor

Why pass in a SILBuilder here? Better to pass in a SILBuilderContext and use a locally created SILBuilderWithScope. Also, why are you messing with RegularLocations here?

Contributor Author

SILBuilder: ok, makes sense.

The frontier instruction can have a ReturnLocation. Therefore I have to construct a RegularLocation.

@@ -190,7 +197,8 @@ bool ValueLifetimeAnalysis::computeFrontier(Frontier &frontier, Mode mode,

for (unsigned i = 0, e = succBlocks.size(); i != e; ++i) {
if (unhandledFrontierBlocks.count(succBlocks[i])) {
assert(isCriticalEdge(term, i) && "actually not a critical edge?");
assert((isCriticalEdge(term, i) || userSet.count(term)) &&
Contributor

Can you explain this to me? Why is this change needed? Does userSet.count(term) mean a critical edge?

Contributor Author

This is a bug fix.

/// An edge is also considered as "critical" if it has a single predecessor
/// but the predecessor's terminal instruction is a user of the value.
///
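
The condition in the relaxed assertion can be sketched on a toy control-flow graph. This is an illustration only: `Block`, `addEdge`, `isClassicCriticalEdge` and `needsSplitForFrontier` are hypothetical names, the "classic" check is the textbook definition (source has several successors, destination has several predecessors), and none of this is the actual ValueLifetimeAnalysis code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy CFG node; hypothetical and unrelated to SIL's real data structures.
struct Block {
  std::vector<Block *> succs;
  std::vector<Block *> preds;
};

inline void addEdge(Block &from, Block &to) {
  from.succs.push_back(&to);
  to.preds.push_back(&from);
}

// Textbook definition: the source block has more than one successor and the
// destination block has more than one predecessor.
inline bool isClassicCriticalEdge(const Block &from, std::size_t succIdx) {
  assert(succIdx < from.succs.size() && "successor index out of range");
  const Block *to = from.succs[succIdx];
  return from.succs.size() > 1 && to->preds.size() > 1;
}

// Extended notion from the quoted comment: the edge is also treated as
// critical when the terminator of 'from' is itself a user of the value.
// 'blocksWithUsingTerminator' stands in for the userSet lookup.
inline bool needsSplitForFrontier(
    const Block &from, std::size_t succIdx,
    const std::set<const Block *> &blocksWithUsingTerminator) {
  return isClassicCriticalEdge(from, succIdx) ||
         blocksWithUsingTerminator.count(&from) != 0;
}
```

With blocks `a -> b`, `a -> c` and `d -> c`, only `a -> c` is critical in the classic sense; `a -> b` is treated as critical as well once `a`'s terminator is recorded as a user.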

// frontier (see below).
assert(deadInSucc && "The final using TermInst must have successors");
// frontier (see below). Except it is a function exiting TermInst, like
// 'return' or 'throw'.
Contributor

Can you expand the comment here to explain why the term inst is an exception? I understand why; it should just be explicit in the code.

}

builder.setInsertionPoint(insertPoint);
builder.setInsertionPoint(paiAI.getInstruction());
Contributor

Please change this to use a new SILBuilderWithScope that takes in a SILBuilderContext.

/// not be needed anymore with OSSA.
static bool keepArgsOfPartialApplyAlive(PartialApplyInst *pai,
ArrayRef<SILInstruction *> paiUsers,
SILBuilder &builder) {
Contributor

Another case where I would suggest passing in a SILBuilderContext and creating builders locally. Will prevent Debug Info bugs.

/// as they may be destroyed/deallocated before the last use by one of the
/// apply instructions.
bool PartialApplyCombiner::copyArgsToTemporaries(
const SmallVectorImpl<FullApplySite> &applies) {
Contributor

=> ArrayRef

continue;
ArrayRef<SILParameterInfo> paramList = paiTy->getParameters();
auto argList = pai->getArgumentOperands();
paramList = paramList.drop_front(paramList.size() - argList.size());
Contributor

Can we make an API around this on ApplySite? Or on partial apply?
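
For illustration, the `drop_front` arithmetic in the quoted line could be wrapped in a small named helper along these lines. This is a sketch on `std::vector`, not on `ArrayRef` or the real `ApplySite`; `capturedParams` is a hypothetical name, and it assumes (as the quoted code does) that the closure binds the trailing parameters of the callee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the last 'numCaptured' entries of 'params',
// i.e. the parameters that correspond to the closure's bound arguments.
inline std::vector<std::string>
capturedParams(const std::vector<std::string> &params,
               std::size_t numCaptured) {
  assert(numCaptured <= params.size() &&
         "cannot capture more parameters than the callee has");
  return std::vector<std::string>(
      params.end() - static_cast<std::ptrdiff_t>(numCaptured), params.end());
}
```

For a callee with parameters `{"a", "b", "c", "d"}` and two captured arguments, the helper yields `{"c", "d"}`; giving the computation a name is the point of the review request, since call sites no longer repeat the index arithmetic.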

continue;
for (Operand *argOp : argsToHandle) {
SILValue arg = argOp->get();
int argIdx = argOp->getOperandNumber() - pai->getArgumentOperandNumber();
Contributor

Please don't hard code this. This should be some sort of higher-level API on function conventions or ApplySite.
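
One possible shape for such an API, sketched on a toy type rather than the real `ApplySite`: the subtraction of the first-argument operand number lives in one checked accessor instead of at each use. `ToyApplySite`, `isArgumentOperand` and `appliedArgIndex` are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Toy stand-in for an apply-like instruction: operands before
// 'firstArgOperand' (e.g. the callee) are not arguments.
struct ToyApplySite {
  unsigned firstArgOperand;
  unsigned numOperands;

  // True if the operand at 'operandNumber' is one of the applied arguments.
  bool isArgumentOperand(unsigned operandNumber) const {
    return operandNumber >= firstArgOperand && operandNumber < numOperands;
  }

  // Index of the operand within the applied-argument list.
  unsigned appliedArgIndex(unsigned operandNumber) const {
    assert(isArgumentOperand(operandNumber) && "operand is not an argument");
    return operandNumber - firstArgOperand;
  }
};
```

With one callee operand followed by three arguments (`ToyApplySite{1, 4}`), operand 1 maps to argument 0 and operand 3 to argument 2, while operand 0 is rejected by the assertion.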

@eeckstein eeckstein force-pushed the fix-partial-apply-combine branch from 30beefe to 8578936 Compare February 11, 2020 11:50
@eeckstein
Contributor Author

@gottesmm Thanks for the review. I have addressed your points in this new version. Much of the code is just copied from the original version, which is already quite old. That's why SILBuilderContext etc. was not used. But, yes, it makes sense to refresh the code.

@eeckstein
Contributor Author

@swift-ci smoke test

@@ -104,10 +104,20 @@ SILInstruction *SILCombiner::visitPartialApplyInst(PartialApplyInst *PAI) {
if (foldInverseReabstractionThunks(PAI, this))
return nullptr;

tryOptimizeApplyOfPartialApply(PAI, Builder, getInstModCallbacks());
SILBuilderContext BuilderCtxt(Builder.getModule(), Builder.getTrackingList());
Contributor

You can get this directly from the builder.

@eeckstein eeckstein merged commit d4837c4 into swiftlang:master Feb 11, 2020
@eeckstein eeckstein deleted the fix-partial-apply-combine branch February 11, 2020 16:54
eeckstein added a commit that referenced this pull request Feb 12, 2020
@vedantk
Contributor

vedantk commented Feb 13, 2020

Hi @eeckstein @gottesmm, I'm starting to see a use-before-init on the sanitizer bot after this PR:

/Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/swift/lib/SILOptimizer/Mandatory/MandatoryCombine.cpp:122:9: runtime error: load of value 109, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/swift/lib/SILOptimizer/Mandatory/MandatoryCombine.cpp:122:9 in 
Stack dump:
0.	Program arguments: /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/swift-macosx-x86_64/bin/swift -frontend -module-cache-path /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/lldb-macosx-x86_64/./lldb-test-build.noindex/module-cache-clang -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -serialize-debugging-options -module-cache-path /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/lldb-macosx-x86_64/test/Swift/Output/RemoteASTImport.test.tmp/cache -merge-modules -emit-module -parse-as-library -sil-merge-partial-modules -disable-diagnostic-passes -disable-sil-perf-optzns -module-name Library Library.part.swiftmodule -o Library.swiftmodule -I/Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/lldb-macosx-x86_64/test/Swift/Output/RemoteASTImport.test.tmp 

This is here:

  bool runOnFunction(SILFunction &function) {
    bool changed = false;

    while (doOneIteration(function, iteration)) {
      changed = true;
      ++iteration;
    }

    if (invalidatedStackNesting) { //< here
      StackNesting().correctStackNesting(&function);
    }

Logs: https://ci.swift.org/job/oss-lldb-asan-osx/3867/consoleText. The easiest way to repro might be to pass --enable-ubsan --lldb --test to build-script.
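
For readers unfamiliar with this class of report: "load of value 109, which is not a valid value for type 'bool'" typically means a `bool` was read before anything was stored to it. A minimal sketch of the usual remedy, a default member initializer, follows. `ToyPass` and its methods are hypothetical; this thread does not show how the flag is declared in MandatoryCombine.cpp or what the eventual fix looks like.

```cpp
#include <cassert>

// Hypothetical stand-in for a pass that tracks a "needs fix-up" flag.
class ToyPass {
  // Default member initializer. Without "= false", reading the flag before
  // any assignment is undefined behaviour, and UBSan can report an invalid
  // bool load like the one quoted above.
  bool invalidatedStackNesting = false;

public:
  void noteStackNestingInvalidated() { invalidatedStackNesting = true; }
  bool needsStackNestingFix() const { return invalidatedStackNesting; }
};

// Usage: a freshly constructed pass reports no pending fix-up.
inline bool freshPassIsClean() {
  ToyPass pass;
  assert(!pass.needsStackNestingFix());
  return !pass.needsStackNestingFix();
}
```

Initializing at the declaration keeps the flag well-defined no matter which constructor or code path runs first.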

@vedantk
Contributor

vedantk commented Feb 13, 2020

This also shows up when building the swift stdlib, so no lldb testing required to repro:

3.      While running pass #1 SILFunctionTransform "MandatoryCombine" on SILFunction "@$sSAyxGs8_PointerssABP9_rawValueBpvgTW".
 for getter for _rawValue (at /Users/vsk/src/github-swift-master/swift/stdlib/public/core/BridgeObjectiveC.swift:408:14)

@eeckstein
Contributor Author

@vedantk Thanks for letting me know.
#29842 should fix the issue
