Skip to content

[temp-rvalue] Handle promoting alloc_stack that are destroyed via a load [take] in ossa. #30088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gottesmm
Copy link
Contributor

There are a few interesting things here:

  1. All of this code is conditionalized implicitly on ossa by us checking for
    load [take]. If we are in non-ossa, then we will have unqualified loads and this
    code path will be skipped. I left in the old test that made sure we did not
    support this in non-ossa code for this purpose.

  2. We treat the load [take] like a destroy_addr. Thus the current method of
    providing safety via using lifetime analysis to show all "destroys" form a
    minimal post-dominating set for the copy_addr.

  3. We do not allow for load [take] on sub-element initializations. The reason
    why is that SILGen initializes temporaries completely in memory and generally
    destroys addresses completely as well. Given that is the pattern we are looking
    for, we just reduce the state space by banning this pattern.

  4. If we have a copy_addr [init], then we not only change the load [take] to
    have the original source address as an argument, but we also change the load
    [take] to a load [copy] so we don't lose the +1.

I am doing this to clean up some simple temp rvalue patterns in SIL before
semantic arc opts runs. This ensures that we are more likely to have a load
operation come from a more meaningful semantic entity (for instance a let field
of a guaranteed class) and thus allow us to convert a load [copy] to a
load_borrow.

I am also going to add support in subsequent commits for eliminating alloc_stack
that are stored into as well (another pattern I am seeing).

Replace this paragraph with a description of your changes and rationale. Provide links to external references/discussions if appropriate.

Resolves SR-NNNN.

@gottesmm gottesmm requested a review from eeckstein February 27, 2020 01:38
@gottesmm
Copy link
Contributor Author

@eeckstein I need to add a few more tests (I want to add one that shows load [take] not being converted to load [copy]) here but the implementation is complete.

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

Found the problem. The problem was that when I have a copy_addr that doesn't take the source, if I try to eliminate a load [take] (and thus convert the load [take] to be a load [copy]), I need to hoist the load [copy] to be where the copy_addr is since otherwise, in between where the copy_addr was and the load [take], the original value may be destroyed.

…oad [take] in ossa.

There are a few interesting things here:

1. All of this code is conditionalized implicitly on ossa by us checking for
load [take]. If we are in non-ossa, then we will have unqualified loads and this
code path will be skipped. I left in the old test that made sure we did not
support this in non-ossa code for this purpose.

2. We treat the load [take] like a destroy_addr. Thus the current method of
providing safety via using lifetime analysis to show all "destroys" form a
minimal post-dominating set for the copy_addr.

3. We do not allow for load [take] on sub-element initializations. The reason
why is that SILGen initializes temporaries completely in memory and generally
destroys addresses completely as well. Given that is the pattern we are looking
for, we just reduce the state space by banning this pattern.

4. If we have a copy_addr [init], then we not only change the load [take] to
have the original source address as an argument, but we also change the load
[take] to a load [copy] so we don't lose the +1. Additionally, we have to hoist
the load [copy] to the copy_addr [init] site to ensure that we do not insert a
use-after-free from the retain happening too late.

I am doing this to clean up some simple temp rvalue patterns in SIL before
semantic arc opts runs. This ensures that we are more likely to have a load
operation come from a more meaningful semantic entity (for instance a let field
of a guaranteed class) and thus allow us to convert a load [copy] to a
load_borrow.

I am also going to add support in subsequent commits for eliminating alloc_stack
that are stored into as well (another pattern I am seeing).
@gottesmm gottesmm force-pushed the pr-3e6f27334eea2b48b5d8768445143250d4fa13d4 branch from ba18825 to 3a725c3 Compare February 28, 2020 01:10
@gottesmm
Copy link
Contributor Author

@swift-ci test

1 similar comment
@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

1 similar comment
@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor Author

@swift-ci test

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

1 similar comment
@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - ba188252482d4503996a74c63fdbaa18ee2f0152

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
DictionaryBridgeToObjC_Access 765 913 +19.3% 0.84x (?)
EqualSubstringSubstring 36 39 +8.3% 0.92x (?)
LessSubstringSubstring 36 39 +8.3% 0.92x
EqualSubstringSubstringGenericEquatable 36 39 +8.3% 0.92x (?)
EqualSubstringString 36 39 +8.3% 0.92x (?)
LessSubstringSubstringGenericComparable 36 39 +8.3% 0.92x
EqualStringSubstring 37 40 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 70 61 -12.9% 1.15x
String.data.Medium 89 82 -7.9% 1.09x (?)
StringRemoveDupes 334 312 -6.6% 1.07x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
DictionaryBridgeToObjC_Access 741 897 +21.1% 0.83x (?)
EqualSubstringString 36 39 +8.3% 0.92x (?)
EqualSubstringSubstring 37 40 +8.1% 0.93x (?)
LessSubstringSubstring 37 40 +8.1% 0.93x
EqualStringSubstring 37 40 +8.1% 0.93x (?)
EqualSubstringSubstringGenericEquatable 37 40 +8.1% 0.93x (?)
LessSubstringSubstringGenericComparable 37 40 +8.1% 0.93x
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 70 61 -12.9% 1.15x
FlattenListFlatMap 6652 5875 -11.7% 1.13x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
StringUTF16SubstringBuilder 15860 17950 +13.2% 0.88x (?)
LessSubstringSubstring 42 46 +9.5% 0.91x (?)
LessSubstringSubstringGenericComparable 42 46 +9.5% 0.91x
ObjectiveCBridgeFromNSSetAnyObjectToString 70000 75500 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 70 61 -12.9% 1.15x
ArrayOfPOD 987 913 -7.5% 1.08x (?)
ObjectiveCBridgeStringGetASCIIContents 295 275 -6.8% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@gottesmm
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
EqualSubstringSubstring 40 44 +10.0% 0.91x (?)
EqualSubstringSubstringGenericEquatable 40 44 +10.0% 0.91x (?)
EqualStringSubstring 41 45 +9.8% 0.91x (?)
ObjectiveCBridgeFromNSSetAnyObjectToString 72000 78000 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSDate 6330 5450 -13.9% 1.16x (?)
ObjectiveCBridgeStringHash 78 68 -12.8% 1.15x
ObjectiveCBridgeStringGetASCIIContents 328 306 -6.7% 1.07x (?)
StringRemoveDupes 373 348 -6.7% 1.07x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
DictionaryBridgeToObjC_Access 827 996 +20.4% 0.83x (?)
EqualSubstringString 40 44 +10.0% 0.91x (?)
EqualSubstringSubstring 41 45 +9.8% 0.91x (?)
EqualStringSubstring 41 45 +9.8% 0.91x (?)
EqualSubstringSubstringGenericEquatable 41 45 +9.8% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 79 68 -13.9% 1.16x (?)
ObjectiveCBridgeStringCStringUsingEncoding 570 532 -6.7% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
LessSubstringSubstringGenericComparable 47 51 +8.5% 0.92x
EqualStringSubstring 48 52 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStringHash 78 68 -12.8% 1.15x
ArrayOfPOD 826 745 -9.8% 1.11x (?)
ObjectiveCBridgeStringGetASCIIContents 329 307 -6.7% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 16 GB

@gottesmm
Copy link
Contributor Author

@swift-ci build toolchain

1 similar comment
@gottesmm
Copy link
Contributor Author

@swift-ci build toolchain

@swift-ci
Copy link
Contributor

Linux Toolchain (Ubuntu 16.04)
Download Toolchain
Git Sha - 3a725c3

Install command
tar zxf swift-PR-30088-351-ubuntu16.04.tar.gz
More info

@swift-ci
Copy link
Contributor

macOS Toolchain
Download Toolchain
Git Sha - 3a725c3

Install command
tar -zxf swift-PR-30088-490-osx.tar.gz --directory ~/

@gottesmm
Copy link
Contributor Author

Seems like noise.

@gottesmm gottesmm merged commit c4fe3c2 into swiftlang:master Feb 28, 2020
@gottesmm gottesmm deleted the pr-3e6f27334eea2b48b5d8768445143250d4fa13d4 branch February 28, 2020 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants