Fix logic related to isTriviallyDuplicatable. #28249

Merged: 1 commit, merged Nov 14, 2019

@atrick (Contributor) commented Nov 14, 2019

In SILInstruction::isTriviallyDuplicatable():

  • Make deallocating instructions trivially duplicatable. They are, by
    any useful definition: duplicating an instruction does not imply
    reordering it. Tail duplication was already treating deallocations
    as duplicatable, but inconsistently: sometimes it checked
    isTriviallyDuplicatable and sometimes it didn't, which appears to
    have been an accident. Disallowing duplication of deallocations would
    cause severe performance regressions. Instead, consistently allow
    them to be duplicated. This makes tail duplication more powerful,
    which could expose other bugs.

  • Do not duplicate on-stack AllocRefInst (without special
    consideration). This is a correctness fix for a bug that apparently
    was never exposed.
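A minimal sketch of the classification described above, using a hypothetical toy model: the enum Kind, the struct Inst, and both functions are illustrative names, not the compiler's real SILInstruction API. Under these assumptions, any stack allocation (alloc_stack, or an alloc_ref / partial_apply carrying the [stack] attribute) is rejected, while deallocations are accepted because duplicating them does not reorder them.

```cpp
#include <cassert>

// Hypothetical toy model of a few SIL instruction kinds; these are
// illustrative names, not the compiler's real classes.
enum class Kind {
  AllocStack,   // alloc_stack
  AllocRef,     // alloc_ref, heap or [stack]
  PartialApply, // partial_apply, heap or [stack]
  DeallocStack, // dealloc_stack
  DeallocRef,   // dealloc_ref
  Other
};

struct Inst {
  Kind kind;
  bool onStack; // models the [stack] attribute on alloc_ref / partial_apply
  explicit Inst(Kind k, bool stack = false) : kind(k), onStack(stack) {}
};

// True for instructions that allocate on the stack in this model.
bool isAllocatingStack(const Inst &i) {
  switch (i.kind) {
  case Kind::AllocStack:
    return true;
  case Kind::AllocRef:
  case Kind::PartialApply:
    return i.onStack;
  default:
    return false;
  }
}

// The policy as modeled here: cloning a stack allocation needs special
// handling, so it is rejected; deallocations are accepted, because
// duplicating an instruction does not reorder it.
bool isTriviallyDuplicatable(const Inst &i) {
  return !isAllocatingStack(i);
}
```

In this sketch, isTriviallyDuplicatable(Inst(Kind::AllocRef, true)) is false, while the same call on a heap alloc_ref or on any deallocation is true. The real predicate also rejects other instruction kinds that this model omits.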

Fix SILLoop::canDuplicate():

  • Handle isDeallocatingStack. It's not clear how we were avoiding an
    assertion before when a stack-allocatable reference was confined to
    a loop; probably just by luck.

  • Handle begin/end_access inside a loop. This is extremely important:
    its absence probably prevented many loop optimizations from working
    with exclusivity enforcement.
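A hedged sketch of the loop-level rule, built on a hypothetical toy IR (Kind, Inst, Loop, and canDuplicate here are illustrative, not the real SILLoop API). It assumes the rule is "both halves of a pair must be inside the loop": an on-stack dealloc_ref needs its alloc_ref in the loop, and a begin_access needs all of its end_access instructions in the loop, and vice versa.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy IR; these names are illustrative, not the real SIL API.
enum class Kind {
  AllocRefStack,   // alloc_ref [stack]
  DeallocRefStack, // dealloc_ref [stack]
  BeginAccess,     // begin_access
  EndAccess,       // end_access
  Other
};

struct Inst {
  Kind kind;
  const Inst *operand;             // dealloc -> its alloc; end_access -> its begin_access
  std::vector<const Inst *> users; // alloc -> its deallocs; begin_access -> its end_accesses
  explicit Inst(Kind k, const Inst *op = nullptr) : kind(k), operand(op) {}
};

// A loop modeled simply as the set of instructions in its blocks.
struct Loop {
  std::set<const Inst *> body;

  bool contains(const Inst *i) const { return body.count(i) != 0; }

  // Paired instructions are only safe to duplicate when both halves of the
  // pair are in the loop; otherwise the half outside the loop would have to
  // refer to two different copies of the half inside it.
  bool canDuplicate(const Inst &i) const {
    switch (i.kind) {
    case Kind::AllocRefStack:
    case Kind::BeginAccess:
      // Every matching dealloc_ref / end_access must be in the loop.
      for (const Inst *u : i.users)
        if (!contains(u))
          return false;
      return true;
    case Kind::DeallocRefStack:
    case Kind::EndAccess:
      // The matching alloc_ref / begin_access must be in the loop.
      return i.operand != nullptr && contains(i.operand);
    case Kind::Other:
      break;
    }
    return true; // other instructions are assumed duplicatable in this model
  }
};
```

For example, a loop whose body holds a begin_access but not its end_access reports false for the begin_access; once the end_access is added to the body, both instructions report true.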

Update LoopRotate canDuplicateOrMoveToPreheader(). This is NFC (no functional change).

@atrick (Contributor, Author) commented Nov 14, 2019

I figured out that this is what has been blocking another correctness fix:
#27444

That fix was regressing performance because it was correctly checking isTriviallyDuplicatable before cloning blocks.
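To illustrate why a stricter check can regress performance, here is a minimal sketch assuming a block-cloning transform (such as tail duplication) refuses to clone a block unless every instruction in it is trivially duplicatable. The struct Inst, the Block alias, and canCloneBlock are hypothetical names for this illustration only; marking dealloc_stack as non-duplicatable mimics the classification before this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy model: an instruction only records its name and whether
// it is trivially duplicatable; a block is a list of instructions.
struct Inst {
  const char *name;
  bool triviallyDuplicatable;
};

using Block = std::vector<Inst>;

// A block may be cloned only if every instruction in it can be duplicated,
// so a single non-duplicatable instruction blocks cloning of the whole block.
bool canCloneBlock(const Block &bb) {
  return std::all_of(bb.begin(), bb.end(), [](const Inst &i) {
    return i.triviallyDuplicatable;
  });
}
```

With dealloc_stack marked non-duplicatable, any block that frees a stack slot becomes uncloneable once the check is applied consistently; reclassifying deallocations as duplicatable lifts that restriction.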

@atrick (Contributor, Author) commented Nov 14, 2019

@swift-ci test

@atrick (Contributor, Author) commented Nov 14, 2019

@swift-ci benchmark

@atrick (Contributor, Author) commented Nov 14, 2019

@swift-ci test source compatibility

@swift-ci (Contributor)

Performance: -O

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Set.isDisjoint.Seq.Box.Empty | 145 | 160 | +10.3% | 0.91x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ClassArrayGetter2 | 1830 | 120 | -93.4% | 15.25x |
| MapReduceClass2 | 210 | 42 | -80.0% | 5.00x |
| MapReduceClassShort2 | 376 | 203 | -46.0% | 1.85x |
| Set.isDisjoint.Seq.Box25 | 484 | 296 | -38.8% | 1.64x |
| Set.isDisjoint.Seq.Box0 | 658 | 408 | -38.0% | 1.61x |
| Set.isSuperset.Seq.Box25 | 153 | 96 | -37.3% | 1.59x |
| Set.isSuperset.Seq.Box0 | 242 | 184 | -24.0% | 1.32x |
| ArrayAppendGenericStructs | 1760 | 1340 | -23.9% | 1.31x (?) |
| MapReduceLazyCollectionShort | 37 | 34 | -8.1% | 1.09x (?) |

Code size: -O

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| DictionaryGroup.o | 16197 | 17573 | +8.5% | 0.92x |
| MapReduce.o | 31909 | 32261 | +1.1% | 0.99x |
| SetTests.o | 141957 | 143437 | +1.0% | 0.99x |

Performance: -Osize

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ObjectiveCBridgeStubFromNSDateRef | 3780 | 4200 | +11.1% | 0.90x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ObjectiveCBridgeFromNSStringForced | 2735 | 2545 | -6.9% | 1.07x (?) |

Code size: -Osize

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Hash.o | 18997 | 19445 | +2.4% | 0.98x |

Performance: -Onone

| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ObjectiveCBridgeStubToNSDate2 | 1320 | 1470 | +11.4% | 0.90x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ObjectiveCBridgeStubFromNSDateRef | 4850 | 4370 | -9.9% | 1.11x (?) |

Code size: -swiftlibs

How to read the data: The tables contain differences in performance that are larger than 8% and differences in code size that are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
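The DELTA and RATIO columns in the tables above appear to follow two simple formulas, inferred from the rows themselves rather than documented by the bot (treat this as an assumption): DELTA is the relative change from OLD to NEW, and RATIO is OLD divided by NEW, so a ratio above 1.0x means the new build is faster or smaller. A small sketch with illustrative function names:

```cpp
#include <cassert>
#include <cmath>

// Relative change from OLD to NEW, in percent (assumed meaning of DELTA).
double deltaPercent(double oldValue, double newValue) {
  assert(oldValue > 0 && "OLD must be positive");
  return (newValue - oldValue) / oldValue * 100.0;
}

// OLD divided by NEW (assumed meaning of RATIO); above 1.0 is an improvement.
double ratio(double oldValue, double newValue) {
  assert(newValue > 0 && "NEW must be positive");
  return oldValue / newValue;
}
```

For ClassArrayGetter2, deltaPercent(1830, 120) is about -93.4 and ratio(1830, 120) is 15.25, matching the -O row; for Set.isDisjoint.Seq.Box.Empty, the same formulas give about +10.3% and 0.91x.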

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@eeckstein (Contributor) left a comment


LGTM. Though I don't know why @aschwaighofer excluded partial_apply on stack; he should approve this specific change.

@aschwaighofer (Contributor)

The case of partial_apply [stack] is handled earlier by querying the isAllocatingStack() predicate, so this part of the change is NFC.

if (auto *Dealloc = dyn_cast<DeallocRefInst>(I))
Alloc = dyn_cast<AllocRefInst>(Dealloc->getOperand());
// The matching alloc_stack must be in the loop.
return Alloc && contains(Alloc);
Contributor (inline review comment):

You could add partial_apply [stack] here.

@atrick merged commit 1ca57e0 into swiftlang:master on Nov 14, 2019

@atrick (Contributor, Author) commented Nov 14, 2019

I don't think the previous support in SILLoop::canDuplicate() for on-stack PartialApply actually worked, because it only handled the PartialApply, not the corresponding DeallocStack.

I added full support here:
#28261
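A hypothetical sketch of the DeallocStack side that the comment above says was missing: a dealloc_stack may be duplicated with the loop only if the stack allocation it frees, whether an alloc_stack or an on-stack partial_apply, is also in the loop. The toy types and canDuplicateDeallocStack are illustrative names only; the actual change in #28261 may be structured differently.

```cpp
#include <cassert>
#include <set>

// Hypothetical toy IR; names are illustrative, not the real SIL API.
enum class Kind {
  AllocStack,        // alloc_stack
  PartialApplyStack, // partial_apply [stack]
  DeallocStack,      // dealloc_stack
  Other
};

struct Inst {
  Kind kind;
  const Inst *operand; // for dealloc_stack: the allocation it frees
  explicit Inst(Kind k, const Inst *op = nullptr) : kind(k), operand(op) {}
};

struct Loop {
  std::set<const Inst *> body;

  bool contains(const Inst *i) const { return body.count(i) != 0; }

  // A dealloc_stack is duplicatable only if the stack allocation it frees is
  // in the loop. In this model, recognizing only alloc_stack here would
  // reject every dealloc_stack of an on-stack partial_apply, so a loop
  // containing one could never be duplicated.
  bool canDuplicateDeallocStack(const Inst &dealloc) const {
    assert(dealloc.kind == Kind::DeallocStack && "expected a dealloc_stack");
    const Inst *alloc = dealloc.operand;
    bool freesStackAlloc =
        alloc != nullptr && (alloc->kind == Kind::AllocStack ||
                             alloc->kind == Kind::PartialApplyStack);
    return freesStackAlloc && contains(alloc);
  }
};
```

The design choice modeled here is to check the pair from the deallocation side as well, so that a loop body holding both halves of an on-stack partial_apply is accepted.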

@atrick deleted the fix-trivially-dup branch on December 23, 2019 03:13

4 participants