SILOptimizer: restructure the apply(partial_apply) peephole and the dead partial_apply elimination optimizations #29703

eeckstein · 2020-02-07T15:51:50Z

Changes:

* Allow optimizing a partial_apply that captures an opened existential. We didn't do this originally because it was complicated to insert the required alloc_stack/dealloc_stack instructions at the right places. Now we have the StackNesting utility, which makes this easier.
* Support indirect-in parameters. Not super important, but why not? It's also easy to do with the StackNesting utility.
* Share code between dead closure elimination and the apply(partial_apply) optimization. It's a bit of refactoring, and it allowed removing some code that is no longer used.
* Fix an ownership problem: we inserted copies of the partial_apply arguments after the partial_apply (which consumes the arguments).
* When replacing an apply(partial_apply) with an apply and the partial_apply becomes dead, avoid inserting copies of the arguments twice.

These changes don't have any immediate effect on our current benchmarks, but will allow eliminating curry thunks for existentials.

eeckstein · 2020-02-07T15:52:11Z

@swift-ci test

eeckstein · 2020-02-07T15:52:19Z

@swift-ci benchmark

swift-ci · 2020-02-07T16:19:29Z

Performance: -O

Improvement	OLD	NEW	DELTA	RATIO
UnicodeStringFromCodable	366	295	-19.4%	1.24x
DictionaryCompactMapValuesOfCastValue	6318	5670	-10.3%	1.11x (?)
NormalizedIterator_fastPrenormal	690	630	-8.7%	1.10x (?)
StringHashing_fastPrenormal	600	550	-8.3%	1.09x (?)
NormalizedIterator_latin1	230	214	-7.0%	1.07x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
DropLastArray	5	9	+80.0%	0.56x
Chars2	3250	3600	+10.8%	0.90x (?)

Improvement	OLD	NEW	DELTA	RATIO
DropLastArrayLazy	9	4	-55.5%	2.25x
UnicodeStringFromCodable	366	295	-19.4%	1.24x (?)
DictionaryCompactMapValuesOfCastValue	6750	5940	-12.0%	1.14x (?)
Set.isDisjoint.Seq.Int.Empty	52	46	-11.5%	1.13x (?)
StringHashing_fastPrenormal	600	550	-8.3%	1.09x (?)
StringHashing_latin1	208	194	-6.7%	1.07x (?)

Code size: -Osize

Performance: -Onone

Improvement	OLD	NEW	DELTA	RATIO
UnicodeStringFromCodable	380	308	-18.9%	1.23x
ObjectiveCBridgeToNSString	300	278	-7.3%	1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

gottesmm

Quick nit. I need to read this offline in my editor. I'll do a more in-depth review in a bit.

gottesmm · 2020-02-07T21:26:06Z

include/swift/SILOptimizer/Utils/InstOptUtils.h

+/// Returns true, if there are no other users beside those collected in \p
+/// destroys, i.e. if \p inst can be considered as "dead".
+bool collectDestroys(SingleValueInstruction *inst,
+                             SmallVectorImpl<SILInstruction *> &destroys);


Can you run this patch through git-clang-format?

eeckstein · 2020-02-10T09:27:31Z

@swift-ci smoke test

eeckstein · 2020-02-10T11:13:22Z

@swift-ci smoke test

eeckstein · 2020-02-10T11:13:45Z

@swift-ci smoke test

gottesmm · 2020-02-10T20:02:53Z

lib/SILOptimizer/Utils/ValueLifetime.cpp

+
+void swift::endLifetimeAtFrontier(
+    SILValue valueOrStackLoc, const ValueLifetimeAnalysis::Frontier &frontier,
+    SILBuilder &builder) {


Why pass in a SILBuilder here? Better to pass in a SILBuilderContext and use a locally created SILBuilderWithScope. Also, why are you messing with RegularLocations here?

SILBuilder: ok, makes sense.

The frontier instruction can have a ReturnLocation. Therefore I have to construct a RegularLocation.

gottesmm · 2020-02-10T20:03:09Z

lib/SILOptimizer/Utils/ValueLifetime.cpp

@@ -190,7 +197,8 @@ bool ValueLifetimeAnalysis::computeFrontier(Frontier &frontier, Mode mode,

    for (unsigned i = 0, e = succBlocks.size(); i != e; ++i) {
      if (unhandledFrontierBlocks.count(succBlocks[i])) {
-        assert(isCriticalEdge(term, i) && "actually not a critical edge?");
+        assert((isCriticalEdge(term, i) || userSet.count(term)) &&


Can you explain this to me? Why is this change needed? Does userSet.count(term) mean a critical edge?

This is a bug fix.

/// An edge is also considered as "critical" if it has a single predecessor
/// but the predecessor's terminal instruction is a user of the value.
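To make the relaxed assertion easier to follow, here is a minimal sketch of the two notions on a toy CFG. Everything in it (`ToyCFG`, `needsSplit`, the `terminatorIsUser` flag) is a hypothetical stand-in rather than the ValueLifetimeAnalysis code; it only assumes the usual definition of a critical edge plus the extension quoted in the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG: succs[i] lists the successor blocks of block i.
// All names are hypothetical; this is not the Swift compiler's CFG.
struct ToyCFG {
  std::vector<std::vector<std::size_t>> succs;

  // Counts the incoming edges of a block.
  std::size_t numPreds(std::size_t block) const {
    std::size_t n = 0;
    for (const auto &s : succs)
      for (std::size_t b : s)
        if (b == block)
          ++n;
    return n;
  }

  // Usual definition: the source block has several successors and the
  // destination block has several predecessors.
  bool isCriticalEdge(std::size_t from, std::size_t succIdx) const {
    assert(from < succs.size() && succIdx < succs[from].size());
    return succs[from].size() > 1 && numPreds(succs[from][succIdx]) > 1;
  }

  // Extended notion from the quoted comment: the edge is also treated as
  // critical when the source block's terminator is itself a user of the
  // value, even if the destination has a single predecessor.
  bool needsSplit(std::size_t from, std::size_t succIdx,
                  bool terminatorIsUser) const {
    return isCriticalEdge(from, succIdx) || terminatorIsUser;
  }
};
```

With blocks `0 -> {1, 2}` and `1 -> {2}`, the edge from 0 to 2 is critical in the usual sense, while the edge from 0 to 1 only counts under the extended notion when block 0's terminator uses the value.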

gottesmm · 2020-02-10T20:29:26Z

lib/SILOptimizer/Utils/ValueLifetime.cpp

-      // frontier (see below).
-      assert(deadInSucc && "The final using TermInst must have successors");
+      // frontier (see below). Except it is a function exiting TermInst, like
+      // 'return' or 'throw'.


Can you expand the comment here to say why the term inst is an exception? I understand why, but it should be explicit in the code.

gottesmm · 2020-02-10T20:31:22Z

lib/SILOptimizer/Utils/PartialApplyCombiner.cpp

-  }
-
-  builder.setInsertionPoint(insertPoint);
+  builder.setInsertionPoint(paiAI.getInstruction());


Please change this to use a new SILBuilderWithScope that takes in a SILBuilderContext.

gottesmm · 2020-02-10T20:52:24Z

lib/SILOptimizer/Utils/InstOptUtils.cpp

+///       not be needed anymore with OSSA.
+static bool keepArgsOfPartialApplyAlive(PartialApplyInst *pai,
+                                        ArrayRef<SILInstruction *> paiUsers,
+                                        SILBuilder &builder) {


Another case where I would suggest passing in a SILBuilderContext and creating builders locally. Will prevent Debug Info bugs.

gottesmm · 2020-02-10T20:54:03Z

lib/SILOptimizer/Utils/PartialApplyCombiner.cpp

+/// as they may be destroyed/deallocated before the last use by one of the
+/// apply instructions.
+bool PartialApplyCombiner::copyArgsToTemporaries(
+    const SmallVectorImpl<FullApplySite> &applies) {


=> ArrayRef

gottesmm · 2020-02-10T20:55:02Z

lib/SILOptimizer/Utils/InstOptUtils.cpp

-      continue;
+  ArrayRef<SILParameterInfo> paramList = paiTy->getParameters();
+  auto argList = pai->getArgumentOperands();
+  paramList = paramList.drop_front(paramList.size() - argList.size());


Can we make an API around this on ApplySite? Or on partial apply?
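For illustration, the kind of helper being asked for might look like the sketch below. It models the parameter list with standard containers; `ParamInfo` and `paramsBoundByPartialApply` are hypothetical names, not existing ApplySite or PartialApplyInst API. The only thing taken from the patch is that the applied arguments correspond to the trailing parameters of the callee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for SILParameterInfo; only a name is modelled.
using ParamInfo = std::string;

// Returns the callee parameters that a partial application with
// numAppliedArgs arguments binds, i.e. the last numAppliedArgs entries.
// This mirrors paramList.drop_front(paramList.size() - argList.size()).
std::vector<ParamInfo>
paramsBoundByPartialApply(const std::vector<ParamInfo> &calleeParams,
                          std::size_t numAppliedArgs) {
  assert(numAppliedArgs <= calleeParams.size() &&
         "cannot apply more arguments than the callee has parameters");
  auto first =
      calleeParams.end() - static_cast<std::ptrdiff_t>(numAppliedArgs);
  return std::vector<ParamInfo>(first, calleeParams.end());
}
```

Keeping the size arithmetic in one named function means callers no longer repeat the `size() - size()` subtraction, and the precondition is checked in a single place.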

gottesmm · 2020-02-10T20:55:45Z

lib/SILOptimizer/Utils/PartialApplyCombiner.cpp

-      continue;
+  for (Operand *argOp : argsToHandle) {
+    SILValue arg = argOp->get();
+    int argIdx = argOp->getOperandNumber() - pai->getArgumentOperandNumber();


Please don't hard code this. This should be some sort of higher-level API on function conventions or ApplySite.
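A rough sketch of what hiding the subtraction behind one accessor could look like. `ToyApplyLayout` and its members are made up for this example and assume a layout where operand 0 is the callee and the applied arguments follow; the real accessor would live on ApplySite or the function conventions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of an apply-like instruction's operand layout:
// operand 0 is the callee, the applied arguments follow it.
struct ToyApplyLayout {
  std::size_t numOperands;

  // Operand number of the first applied argument in this toy layout.
  static constexpr std::size_t firstArgOperand() { return 1; }

  bool isArgumentOperand(std::size_t operandNumber) const {
    return operandNumber >= firstArgOperand() && operandNumber < numOperands;
  }

  // The single place that knows how operand numbers map to argument
  // indices, so callers never write the subtraction themselves.
  std::size_t argIndexOf(std::size_t operandNumber) const {
    assert(isArgumentOperand(operandNumber) && "operand is not an argument");
    return operandNumber - firstArgOperand();
  }
};
```

A caller would then write `layout.argIndexOf(n)` instead of open-coding `n - firstArgOperand`, and passing the callee operand trips the assertion rather than silently wrapping around.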

eeckstein · 2020-02-11T11:52:27Z

@gottesmm Thanks for the review. I have addressed your points in this new version. Much of the code is just copied from the original version, which is already quite old. That's why SILBuilderContext etc. was not used. But, yes, it makes sense to refresh the code.

eeckstein · 2020-02-11T11:52:38Z

@swift-ci smoke test

gottesmm · 2020-02-11T16:20:22Z

lib/SILOptimizer/SILCombiner/SILCombinerApplyVisitors.cpp

@@ -104,10 +104,20 @@ SILInstruction *SILCombiner::visitPartialApplyInst(PartialApplyInst *PAI) {
  if (foldInverseReabstractionThunks(PAI, this))
    return nullptr;

-  tryOptimizeApplyOfPartialApply(PAI, Builder, getInstModCallbacks());
+  SILBuilderContext BuilderCtxt(Builder.getModule(), Builder.getTrackingList());


You can get this directly from the builder.

vedantk · 2020-02-13T19:13:20Z

Hi @eeckstein @gottesmm, I'm starting to see a use-before-init on the sanitizer bot after this PR:

/Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/swift/lib/SILOptimizer/Mandatory/MandatoryCombine.cpp:122:9: runtime error: load of value 109, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/swift/lib/SILOptimizer/Mandatory/MandatoryCombine.cpp:122:9 in 
Stack dump:
0.	Program arguments: /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/swift-macosx-x86_64/bin/swift -frontend -module-cache-path /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/lldb-macosx-x86_64/./lldb-test-build.noindex/module-cache-clang -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -serialize-debugging-options -module-cache-path /Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/lldb-macosx-x86_64/test/Swift/Output/RemoteASTImport.test.tmp/cache -merge-modules -emit-module -parse-as-library -sil-merge-partial-modules -disable-diagnostic-passes -disable-sil-perf-optzns -module-name Library Library.part.swiftmodule -o Library.swiftmodule -I/Users/buildnode/jenkins/workspace/oss-lldb-asan-osx/Ninja-ReleaseAssert+asan+ubsan/lldb-macosx-x86_64/test/Swift/Output/RemoteASTImport.test.tmp

This is here:

  bool runOnFunction(SILFunction &function) {
    bool changed = false;

    while (doOneIteration(function, iteration)) {
      changed = true;
      ++iteration;
    }

    if (invalidatedStackNesting) { //< here
      StackNesting().correctStackNesting(&function);
    }

Logs: https://ci.swift.org/job/oss-lldb-asan-osx/3867/consoleText. The easiest way to repro might be to pass --enable-ubsan --lldb --test to build-script.
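For readers unfamiliar with this class of report: UBSan fires when a `bool` is read before any value was stored to it. The reduction below is hypothetical (`ToyPass` is not the MandatoryCombine source, and this is not necessarily how the issue was fixed); it only shows how a default member initializer gives the flag a defined value on every construction path.

```cpp
#include <cassert>

// Hypothetical reduction of the pattern in the report above.
class ToyPass {
  // Default member initializers: without "= false" the flag would hold an
  // indeterminate value until first assigned, and reading it in run()
  // would be undefined behaviour.
  bool invalidatedStackNesting = false;
  unsigned iteration = 0;
  unsigned fixups = 0;

  // Pretends that every second iteration disturbs stack nesting.
  bool doOneIteration(unsigned workItems) {
    if (iteration >= workItems)
      return false;
    if (iteration % 2 == 1)
      invalidatedStackNesting = true;
    return true;
  }

public:
  bool run(unsigned workItems) {
    bool changed = false;
    while (doOneIteration(workItems)) {
      changed = true;
      ++iteration;
    }
    assert(iteration >= workItems && "loop ends only when work is exhausted");

    if (invalidatedStackNesting) { // well-defined: the flag always has a value
      ++fixups;
      invalidatedStackNesting = false;
    }
    return changed;
  }

  unsigned numFixups() const { return fixups; }
};
```

Running the toy pass with zero work items never writes the flag, which is exactly the path where an uninitialized member would be read.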

vedantk · 2020-02-13T22:03:56Z

This also shows up when building the swift stdlib, so no lldb testing required to repro:

3.      While running pass #1 SILFunctionTransform "MandatoryCombine" on SILFunction "@$sSAyxGs8_PointerssABP9_rawValueBpvgTW".
 for getter for _rawValue (at /Users/vsk/src/github-swift-master/swift/stdlib/public/core/BridgeObjectiveC.swift:408:14)

eeckstein · 2020-02-14T12:56:19Z

@vedantk Thanks for letting me know.
#29842 should fix the issue

eeckstein requested a review from gottesmm February 7, 2020 15:52

gottesmm reviewed Feb 7, 2020

eeckstein force-pushed the fix-partial-apply-combine branch from ccef7b7 to f170052 Compare February 10, 2020 09:27

eeckstein force-pushed the fix-partial-apply-combine branch from f170052 to 30beefe Compare February 10, 2020 11:13

gottesmm reviewed Feb 10, 2020

eeckstein force-pushed the fix-partial-apply-combine branch from 30beefe to 8578936 Compare February 11, 2020 11:50

gottesmm reviewed Feb 11, 2020

eeckstein merged commit d4837c4 into swiftlang:master Feb 11, 2020

eeckstein deleted the fix-partial-apply-combine branch February 11, 2020 16:54

eeckstein added a commit that referenced this pull request Feb 12, 2020

Merge pull request #29762 from eeckstein/refactor

5a8e3ea

Minor cleanups after #29703
