[Runtime] Eliminate stack frames in swift_retain and swift_bridgeObjectRetain on ARM64. #61794

Merged: 1 commit, merged Nov 8, 2022

@mikeash (Contributor) commented Oct 28, 2022

Rearrange the slow paths so that they are reached through tail calls, which allows the compiler to emit these functions without stack frames.

Clang is happy to emit frameless functions on ARM64 if no execution path needs stack space. However, when a fast path needs no stack space but a slow path does, clang emits code that pushes a stack frame first and only then decides which path to take. That is correct, but it means the fast path pays for a frame it never uses.
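A minimal sketch of the problematic shape, assuming a heavily simplified object model: `Object`, its `needsSlowPath` flag, `incrementSlowVoid`, and `retainBefore` are hypothetical names invented for illustration, not Swift runtime code. Whether a given clang version places the frame setup ahead of the branch depends on version and flags; the point is that the slow-path call is not in tail position.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified object; NOT the Swift runtime's HeapObject.
struct Object {
  std::atomic<std::uint32_t> strongCount{1};
  bool needsSlowPath = false;  // hypothetical stand-in for the real slow-path condition
};

// Out-of-line slow path that returns nothing. [[gnu::noinline]] (understood by
// clang and GCC) keeps it a real call rather than letting it be inlined.
[[gnu::noinline]] void incrementSlowVoid(Object *obj) {
  obj->strongCount.fetch_add(1, std::memory_order_relaxed);
}

// "Before" shape. The fast path needs no stack, but the slow path calls
// incrementSlowVoid and then still has to return obj. That call is therefore
// not a tail call: the link register and obj must survive it, which on ARM64
// means saving them in a stack frame.
Object *retainBefore(Object *obj) {
  if (!obj->needsSlowPath) {
    obj->strongCount.fetch_add(1, std::memory_order_relaxed);
    return obj;  // fast path: no call, no stack
  }
  incrementSlowVoid(obj);  // slow path: a call with work left after it
  return obj;
}

// Usage check: either path adds one reference and returns obj.
inline void checkRetainBefore() {
  Object fast;
  assert(retainBefore(&fast) == &fast && fast.strongCount.load() == 2);
  Object slow;
  slow.needsSlowPath = true;
  assert(retainBefore(&slow) == &slow && slow.strongCount.load() == 2);
}
```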

We can work around this by manually outlining the slow path into a separate function and making sure it is invoked with a tail call. The original function then needs no stack frame on any path, so clang omits it entirely.
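A sketch of the rearranged shape, again using a hypothetical simplified `Object` and made-up names (`incrementSlow`, `retainAfter`) rather than the runtime's real code: the slow path is outlined into a function that returns the object, and the caller simply returns that function's result.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified object; NOT the Swift runtime's HeapObject.
struct Object {
  std::atomic<std::uint32_t> strongCount{1};
  bool needsSlowPath = false;  // hypothetical stand-in for the real slow-path condition
};

// Manually outlined slow path. It returns obj, so it produces exactly the
// value the caller wants to return.
[[gnu::noinline]] Object *incrementSlow(Object *obj) {
  obj->strongCount.fetch_add(1, std::memory_order_relaxed);
  return obj;
}

// "After" shape. Both paths end with nothing left to do, so the slow path can
// be a tail call (on ARM64, a plain `b` instead of `bl`) and no path needs a
// stack frame. C++ does not guarantee tail calls; this relies on the
// optimizer emitting one.
Object *retainAfter(Object *obj) {
  if (!obj->needsSlowPath) {
    obj->strongCount.fetch_add(1, std::memory_order_relaxed);
    return obj;
  }
  return incrementSlow(obj);
}

// Usage check: both paths add one reference and hand back the same pointer.
inline void checkRetainAfter() {
  Object fast;
  assert(retainAfter(&fast) == &fast && fast.strongCount.load() == 2);
  Object slow;
  slow.needsSlowPath = true;
  assert(retainAfter(&slow) == &slow && slow.strongCount.load() == 2);
}
```

The design choice is to make the callee return exactly what the caller returns; once the caller has no work after the call, it never needs to save the link register.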

We tweak `RefCounts::increment` to return the object whose reference count it increments, which allows `swift_retain` to tail-call it. We also manually outline the `objc_retain` call in `swift_bridgeObjectRetain`, which allows its `swift_retain` path to be frameless.
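A sketch of the `swift_bridgeObjectRetain` change under loudly stated assumptions: the tagged-pointer encoding (`kForeignBit`), `nativeRetain` (standing in for `swift_retain`), `foreignRetain` (standing in for `objc_retain`), and the outlined helper `foreignRetainReturningTagged` are all hypothetical, and the real bridge-object representation differs. It illustrates why outlining helps: the foreign path must return the caller's original tagged value rather than whatever the retain function returns, so calling the retain function inline would leave work after the call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified object; NOT the Swift runtime's layout.
struct Object {
  std::atomic<std::uint32_t> strongCount{1};
};

// Made-up encoding for this sketch: bit 0 set means "foreign" (think
// Objective-C) object. Object is at least 4-byte aligned, so bit 0 of a real
// Object* is always clear.
constexpr std::uintptr_t kForeignBit = 1;

// Native retain in the "returns its argument" style, so callers can tail-call it.
[[gnu::noinline]] Object *nativeRetain(Object *obj) {
  obj->strongCount.fetch_add(1, std::memory_order_relaxed);
  return obj;
}

// Hypothetical stand-in for objc_retain, which lives in libobjc.
[[gnu::noinline]] Object *foreignRetain(Object *obj) {
  obj->strongCount.fetch_add(1, std::memory_order_relaxed);
  return obj;
}

// Manually outlined slow path: performs the foreign retain and returns the
// tagged value itself. This helper may need a frame, but only the foreign
// path pays for it.
[[gnu::noinline]] void *foreignRetainReturningTagged(void *tagged, Object *obj) {
  foreignRetain(obj);
  return tagged;
}

// Both branches end in a tail call, so no path needs a stack frame.
void *bridgeRetain(void *tagged) {
  auto bits = reinterpret_cast<std::uintptr_t>(tagged);
  auto *obj = reinterpret_cast<Object *>(bits & ~kForeignBit);
  if ((bits & kForeignBit) == 0)
    return nativeRetain(obj);  // untagged, so obj == tagged
  return foreignRetainReturningTagged(tagged, obj);
}

// Usage check: each path adds one reference and returns the caller's
// original (possibly tagged) value.
inline void checkBridgeRetain() {
  Object o;
  assert(bridgeRetain(&o) == &o);
  auto *tagged = reinterpret_cast<void *>(
      reinterpret_cast<std::uintptr_t>(&o) | kForeignBit);
  assert(bridgeRetain(tagged) == tagged);
  assert(o.strongCount.load() == 3);
}
```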

rdar://101764509

@mikeash (Contributor, Author) commented Oct 28, 2022

@swift-ci Please Apple Silicon benchmark

@mikeash (Contributor, Author) commented Oct 31, 2022

------- Performance (arm64): -O -------

REGRESSION                                 OLD   NEW   DELTA    RATIO    
ObjectiveCBridgeFromNSString               630   685   +8.7%    **0.92x (?)**

IMPROVEMENT                                OLD   NEW   DELTA    RATIO    
StringAdder                                248   173   -30.2%   **1.43x (?)**
StringFromLongWholeSubstringGeneric        4     3     -25.0%   **1.33x (?)**
PrefixWhileAnySequenceLazy                 403   329   -18.4%   **1.22x (?)**
DictionaryBridgeToObjC_Bridge              6     5     -16.7%   **1.20x (?)**
ParseFloat.Float.Exp                       7     6     -14.3%   **1.17x (?)**
LessSubstringSubstring                     22    19    -13.6%   **1.16x (?)**
EqualSubstringSubstring                    22    19    -13.6%   **1.16x (?)**
EqualStringSubstring                       22    19    -13.6%   **1.16x (?)**
EqualSubstringSubstringGenericEquatable    22    19    -13.6%   **1.16x (?)**
EqualSubstringString                       22    19    -13.6%   **1.16x (?)**
LessSubstringSubstringGenericComparable    22    19    -13.6%   **1.16x (?)**
ConvertFloatingPoint.MockFloat64ToDouble   10    9     -10.0%   **1.11x (?)**
Hanoi                                      950   860   -9.5%    **1.10x (?)**
SequenceAlgosList                          330   300   -9.1%    **1.10x (?)**
Set.subtracting.Box.Empty                  13    12    -7.7%    **1.08x (?)**
DictionaryKeysContainsNative               14    13    -7.1%    **1.08x (?)**

------- Performance (arm64): -Osize -------

REGRESSION                                        OLD    NEW    DELTA    RATIO    
RawBufferCopyBytes                                11     13     +18.2%   **0.85x (?)**
ObjectiveCBridgeStubNSDataAppend                  1020   1110   +8.8%    **0.92x (?)**
ArrayAppendOptionals                              370    400    +8.1%    **0.93x (?)**

IMPROVEMENT                                       OLD    NEW    DELTA    RATIO    
StringAdder                                       247    191    -22.7%   **1.29x (?)**
EqualStringSubstring                              22     19     -13.6%   **1.16x (?)**
EqualSubstringString                              22     19     -13.6%   **1.16x (?)**
LessSubstringSubstring                            23     20     -13.0%   **1.15x (?)**
EqualSubstringSubstring                           23     20     -13.0%   **1.15x (?)**
EqualSubstringSubstringGenericEquatable           23     20     -13.0%   **1.15x (?)**
LessSubstringSubstringGenericComparable           23     20     -13.0%   **1.15x (?)**
StringToDataEmpty                                 400    350    -12.5%   **1.14x (?)**
StringToDataSmall                                 450    400    -11.1%   **1.12x (?)**
CharIteration_russian_unicodeScalars              3040   2720   -10.5%   **1.12x (?)**
SequenceAlgosList                                 330    300    -9.1%    **1.10x (?)**
CharIteration_ascii_unicodeScalars                2560   2360   -7.8%    **1.08x**
Set.subtracting.Empty.Box                         13     12     -7.7%    **1.08x (?)**
CharIteration_korean_unicodeScalars               3280   3040   -7.3%    **1.08x (?)**
CharIteration_tweet_unicodeScalars                4960   4600   -7.3%    **1.08x (?)**
String.replaceSubrange.ArrChar                    28     26     -7.1%    **1.08x (?)**
Set.subtracting.Box.Empty                         14     13     -7.1%    **1.08x (?)**
CharIteration_chinese_unicodeScalars              2280   2120   -7.0%    **1.08x (?)**
CharIteration_punctuatedJapanese_unicodeScalars   600    560    -6.7%    **1.07x (?)**
Set.isStrictSuperset.Seq.Box0                     15     14     -6.7%    **1.07x (?)**
CharIteration_japanese_unicodeScalars             3640   3400   -6.6%    **1.07x (?)**

------- Performance (arm64): -Onone -------

REGRESSION                                OLD     NEW     DELTA    RATIO    
StringToDataLargeUnicode                  1800    1950    +8.3%    **0.92x (?)**
NSArray.bridged.objectAtIndex             264     285     +8.0%    **0.93x (?)**

IMPROVEMENT                               OLD     NEW     DELTA    RATIO    
SortArrayInClass                          74803   62321   -16.7%   **1.20x (?)**
EqualSubstringSubstringGenericEquatable   25      21      -16.0%   **1.19x (?)**
EqualSubstringSubstring                   26      22      -15.4%   **1.18x (?)**
LessSubstringSubstring                    25      22      -12.0%   **1.14x (?)**
LessSubstringSubstringGenericComparable   25      22      -12.0%   **1.14x (?)**
EqualStringSubstring                      26      23      -11.5%   **1.13x (?)**
EqualSubstringString                      26      23      -11.5%   **1.13x (?)**
SequenceAlgosList                         3950    3540    -10.4%   **1.12x**
SortSortedStrings                         59      55      -6.8%    **1.07x**
SortStrings                               902     842     -6.7%    **1.07x (?)**
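The DELTA and RATIO columns appear to be derived from OLD and NEW as DELTA = (NEW - OLD) / OLD and RATIO = OLD / NEW; every row checked is consistent with that. For example, StringAdder under -O goes from 248 to 173, giving (173 - 248) / 248 ≈ -30.2% and 248 / 173 ≈ 1.43x. A minimal sketch of the arithmetic (the function names are ours, not the benchmark tooling's):

```cpp
#include <cassert>

// DELTA column: relative change from OLD to NEW, in percent. In these tables
// lower numbers are better (improvements have NEW < OLD), so a negative
// delta is an improvement.
double deltaPercent(double oldValue, double newValue) {
  assert(oldValue > 0);
  return (newValue - oldValue) / oldValue * 100.0;
}

// RATIO column: OLD / NEW. Above 1.0 is a speedup, below 1.0 a regression.
double speedupRatio(double oldValue, double newValue) {
  assert(newValue > 0);
  return oldValue / newValue;
}
```

For instance, `speedupRatio(630, 685)` comes out at about 0.92, matching the ObjectiveCBridgeFromNSString regression under -O.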

@mikeash force-pushed the retain-stack-frames branch from 8b93591 to 9a94afc on November 3, 2022 16:47
@mikeash changed the title from "Experiment with avoiding stack frames in swift_retain and swift_bridgeObjectRetain" to "[Runtime] Eliminate stack frames in swift_retain and swift_bridgeObjectRetain on ARM64." on Nov 3, 2022
@mikeash marked this pull request as ready for review on November 3, 2022 16:48
@mikeash force-pushed the retain-stack-frames branch 2 times, most recently from 12ae508 to f635d21
@mikeash (Contributor, Author) commented Nov 3, 2022

@swift-ci please test

@mikeash force-pushed the retain-stack-frames branch from f635d21 to 10b5b69 on November 4, 2022 15:38

@mikeash (Contributor, Author) commented Nov 4, 2022

@swift-ci please test

@mikeash force-pushed the retain-stack-frames branch from 10b5b69 to 724a9a7 on November 7, 2022 20:38

@mikeash (Contributor, Author) commented Nov 7, 2022

@swift-ci please test

@mikeash merged commit b9391c0 into swiftlang:main on Nov 8, 2022