Skip to content

Throws Prediction + HotColdSplitting #75382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 9, 2024

Conversation

kavon
Copy link
Member

@kavon kavon commented Jul 20, 2024

It's reasonable to statically assume that errors are not typically thrown by functions. By making this assumption, we can annotate blocks of code dealing with errors as cold in LLVM. This aids in block layout among other things.

@kavon
Copy link
Member Author

kavon commented Jul 20, 2024

@swift-ci Please test compiler performance

@kavon
Copy link
Member Author

kavon commented Jul 23, 2024

@swift-ci Please benchmark

@kavon kavon force-pushed the static-branch-prediction branch from 4dfd362 to 51baeaf Compare July 30, 2024 20:43
@kavon
Copy link
Member Author

kavon commented Jul 30, 2024

@swift-ci Please benchmark

@kavon kavon requested a review from aschwaighofer July 30, 2024 21:19
@kavon
Copy link
Member Author

kavon commented Jul 30, 2024

Benchmark results with StaticBranchPrediction disabled:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 3110.0 3771.0 +21.3% 0.82x (?)

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 2689.0 4224.0 +57.1% 0.64x (?)
FlattenListLoop 1411.0 1556.0 +10.3% 0.91x (?)

Code size: -Osize

Performance (x86_64): -Onone

Improvement OLD NEW DELTA RATIO
ArrayAppendGenericStructs 1402.5 1112.5 -20.7% 1.26x (?)

Code size: -swiftlibs

@kavon
Copy link
Member Author

kavon commented Jul 30, 2024

@swift-ci Please benchmark

Copy link
Contributor

@aschwaighofer aschwaighofer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation of the llvm's LowerExpectIntrinsics pass states the following.

// These default values are chosen to represent an extremely skewed outcome for 
// a condition, but they leave some room for interpretation by later passes.    
//                                                                              
// If the documentation for __builtin_expect() was made explicit that it should 
// only be used in extreme cases, we could make this ratio higher. As it stands,
// programmers may be using __builtin_expect() / llvm.expect to annotate that a 
// branch is likely or unlikely to be taken.                                    
                                                                                
// WARNING: these values are internal implementation detail of the pass.        
// They should not be exposed to the outside of the pass, front-end codegen     
// should emit @llvm.expect intrinsics instead of using these weights directly. 

And then use the values 1 and 2000 as the branch weights.

Maybe, we should follow that recommendation and use the llvm.expect intrinsic to be guarded against potential llvm changes in the interpretation of the branch weights.

@kavon
Copy link
Member Author

kavon commented Jul 31, 2024

Looks like if you ignore the ones with noise (marked with ? below), enabling it shows a code size improvement in the micro benchmarks:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 2306.0 3834.0 +66.3% 0.60x (?)
SortAdjacentIntPyramids 782.5 944.167 +20.7% 0.83x (?)
String.replaceSubrange.String 9.362 10.776 +15.1% 0.87x (?)
CxxStringConversion.cxx.to.swift 100.625 112.5 +11.8% 0.89x (?)
Phonebook 978.727 1090.6 +11.4% 0.90x (?)
SortStrings 471.0 515.5 +9.4% 0.91x (?)
String.initRepeating.longMixStr.Count100 702.0 761.333 +8.5% 0.92x (?)
StringBuilder 235.25 254.857 +8.3% 0.92x (?)
ArrayAppendAscii 1821.429 1970.3 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Data.hash.Medium 28.531 24.27 -14.9% 1.18x (?)
StringFromLongWholeSubstring 3.043 2.717 -10.7% 1.12x (?)
Breadcrumbs.UTF16ToIdx.longASCII 43.588 39.382 -9.6% 1.11x (?)
StringComparison_zalgo 41925.0 37900.0 -9.6% 1.11x (?)
NormalizedIterator_fastPrenormal 566.75 517.907 -8.6% 1.09x (?)
LineSink.bytes.complex 70.25 64.229 -8.6% 1.09x (?)
LineSink.scalars.complex 70.152 64.162 -8.5% 1.09x (?)
Breadcrumbs.UTF16ToIdx.longMixed 344.833 318.286 -7.7% 1.08x (?)
StringDistance.scalars.mixed 36.483 33.69 -7.7% 1.08x (?)
StringHashing_zalgo 2401.667 2221.875 -7.5% 1.08x (?)
StringDistance.scalars.ascii 751.333 697.333 -7.2% 1.08x (?)
NaiveRRC.init.largeContiguous 6.059 5.65 -6.7% 1.07x (?)

Code size: -O

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
Array.removeAll.keepingCapacity.Object 6.331 7.404 +16.9% 0.86x (?)
FlattenListFlatMap 3749.0 4198.0 +12.0% 0.89x (?)
FindString.Rec3.Substring 96.842 107.056 +10.5% 0.90x (?)
FindString.Loop1.Substring 215.818 236.889 +9.8% 0.91x (?)
Set.isDisjoint.Empty.Int 81.737 89.286 +9.2% 0.92x (?)
Set.isDisjoint.Seq.Empty.Int 81.75 89.294 +9.2% 0.92x (?)
Phonebook 994.0 1079.556 +8.6% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Data.init.Sequence.64kB.Count 58.0 29.455 -49.2% 1.97x
Data.init.Sequence.64kB.Count.I 58.0 29.467 -49.2% 1.97x
Data.append.Sequence.64kB.Count.RE 44.222 22.769 -48.5% 1.94x
Data.append.Sequence.64kB.Count.RE.I 44.231 22.853 -48.3% 1.94x
Data.init.Sequence.2049B.Count.I 99.375 52.477 -47.2% 1.89x
Data.init.Sequence.2047B.Count.I 99.313 52.45 -47.2% 1.89x
Data.init.Sequence.809B.Count.I 85.0 50.12 -41.0% 1.70x
Data.init.Sequence.809B.Count 85.0 50.28 -40.8% 1.69x
Data.init.Sequence.513B.Count.I 95.917 61.96 -35.4% 1.55x (?)
Data.init.Sequence.511B.Count.I 93.938 61.5 -34.5% 1.53x
DataAppendSequence 8836.842 6330.0 -28.4% 1.40x (?)
Data.append.Sequence.809B.Count.RE.I 86.278 64.143 -25.7% 1.35x (?)
Data.append.Sequence.809B.Count.RE 86.667 64.45 -25.6% 1.34x (?)
Data.hash.Medium 28.733 24.415 -15.0% 1.18x (?)
NibbleSort 1805.0 1555.0 -13.9% 1.16x (?)
Breadcrumbs.UTF16ToIdx.longASCII 44.68 39.345 -11.9% 1.14x (?)
SequenceAlgosArray 1927.5 1711.429 -11.2% 1.13x (?)
StringComparison_zalgo 41700.0 37700.0 -9.6% 1.11x (?)
NormalizedIterator_fastPrenormal 541.136 493.542 -8.8% 1.10x (?)
Breadcrumbs.UTF16ToIdx.longMixed 348.667 321.571 -7.8% 1.08x (?)
SortIntPyramid 640.556 591.364 -7.7% 1.08x (?)
NormalizedIterator_zalgo 2480.435 2292.708 -7.6% 1.08x (?)
StringHashing_zalgo 2393.75 2216.176 -7.4% 1.08x (?)
RGBHistogram 1788.571 1662.857 -7.0% 1.08x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
DataSubscriptMedium 52.25 59.909 +14.7% 0.87x (?)
ObjectiveCBridgeStubDateMutation 305.0 348.5 +14.3% 0.88x (?)
BinaryFloatingPointPropertiesUlp 65.29 74.1 +13.5% 0.88x (?)
String.replaceSubrange.String 10.594 11.864 +12.0% 0.89x (?)
Data.hash.Empty 94.333 102.933 +9.1% 0.92x (?)
String.replaceSubrange.Substring 12.302 13.394 +8.9% 0.92x (?)
String.initRepeating.longMixStr.Count100 697.0 754.0 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
SIMDReduce.Int8 7171.0 6431.0 -10.3% 1.12x (?)
Data.hash.Medium 32.095 29.0 -9.6% 1.11x (?)
StringComparison_zalgo 42800.0 39000.0 -8.9% 1.10x (?)
RandomDouble01LCG 17728.0 16187.0 -8.7% 1.10x (?)
NormalizedIterator_zalgo 2510.0 2326.471 -7.3% 1.08x (?)
StringHashing_zalgo 2451.667 2284.524 -6.8% 1.07x (?)
StringEnumRawValueInitialization 7572.0 7068.0 -6.7% 1.07x (?)
CharIteration_punctuated_unicodeScalars_Backwards 38000.0 35500.0 -6.6% 1.07x (?)

Code size: -swiftlibs

@kavon
Copy link
Member Author

kavon commented Jul 31, 2024

Maybe, we should follow that recommendation and use the llvm.expect intrinsic to be guarded against potential llvm changes in the interpretation of the branch weights.

@aschwaighofer I think we don't need to worry about a change to how the branch_weight metadata is interpreted, since we should see it right away with how our existing PGO emission works (which emits that metadata).

I think for this PR, we should prefer llvm.expect for the miscellaneous static predictions, so they keep up with any changes in the 1:2000 ratio,, but for the try_apply's emitted by SILGen, we leave them annotated with our SIL-equivalent branch_weights so that SIL passes have access to that data in a uniform way (whether that data came from actual profile data, or from these predictions)

@aschwaighofer
Copy link
Contributor

Is anyone using swift PGO?

how would we “see” if llvm’s interpretation change

hopefully nobody uses exceptions on the expected path — ie in which desired situation would profiling tell us otherwise — why should we design for this undesirable edge case.

@kavon kavon force-pushed the static-branch-prediction branch from e5a48d8 to 8b569c4 Compare August 7, 2024 21:30
@kavon
Copy link
Member Author

kavon commented Aug 7, 2024

@swift-ci test

@kavon kavon marked this pull request as ready for review August 7, 2024 21:31
@kavon kavon changed the title Static branch prediction Throws Prediction + HotColdSplitting Aug 7, 2024
@kavon kavon force-pushed the static-branch-prediction branch from 8b569c4 to a4c7df4 Compare August 8, 2024 15:20
@kavon
Copy link
Member Author

kavon commented Aug 8, 2024

@swift-ci test

Copy link
Contributor

@aschwaighofer aschwaighofer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning of hooking up PGO to set the new ProfileCounter arguments of the try_apply?

An alternative approach could be to use either the profile counters or llvm.expect in IRGen depending on whether a ProfileCounter (eventually by PGO) is set or EnableThrowsPrediction is on.

But I am nitpicking.

This lgtm.

@kavon
Copy link
Member Author

kavon commented Aug 8, 2024

how would we “see” if llvm’s interpretation change

I think we have some regression tests, but if not, we can add them to see, e.g., if profile_weight continues to cause the blocks to change order.

hopefully nobody uses exceptions on the expected path

I plan to evaluate these new performance flags to see how it affects "ordinary" code. So in this PR everything's off by default.

I think it's reasonable to expect no one is using throw as fast-path control-flow. Using it to exit multiple nested control-flow structures (like a nested loops) is better done with break label. If people are using throws to catch non-error cases, then that's definitely unusual, since the type thrown has to conform to Error.

@kavon
Copy link
Member Author

kavon commented Aug 8, 2024

Are you planning of hooking up PGO to set the new ProfileCounter arguments of the try_apply?

@aschwaighofer yup that's the goal! This is a first step in that process.

@kavon kavon enabled auto-merge August 8, 2024 15:40
@kavon kavon force-pushed the static-branch-prediction branch from a4c7df4 to cafabc0 Compare August 8, 2024 19:07
@kavon
Copy link
Member Author

kavon commented Aug 8, 2024

@swift-ci smoke test

kavon added 2 commits August 8, 2024 21:21
Some calls to throwing functions aren't represented with `try_apply` in
SIL, so we generate llvm.expect's when throw prediction is enabled.
@kavon kavon force-pushed the static-branch-prediction branch from cafabc0 to baf87b4 Compare August 9, 2024 01:22
@kavon
Copy link
Member Author

kavon commented Aug 9, 2024

@swift-ci smoke test

@kavon kavon force-pushed the static-branch-prediction branch from baf87b4 to 11e8bb8 Compare August 9, 2024 16:09
@kavon
Copy link
Member Author

kavon commented Aug 9, 2024

@swift-ci smoke test

@kavon kavon merged commit 9d69f2b into swiftlang:main Aug 9, 2024
3 checks passed
@kavon kavon deleted the static-branch-prediction branch August 9, 2024 20:28
@rjmccall
Copy link
Contributor

rjmccall commented Aug 9, 2024

I'm not sure what criterion you used here to decide that some of these results are noise and others aren't. The wins here are very good, but it would be good to understand why it's a 10% regression on some benchmarks, including some fairly weighty ones like FlattenListFlatMap.

@kavon
Copy link
Member Author

kavon commented Aug 13, 2024

@rjmccall I plan to run the benchmarks with more samples on a local machine to see what really is noise or not. The (?)-marked benchmarks I suggested are noise came from the benchmark run's analysis tool itself. The regression in FlattenListFlatMap is odd since it doesn't appear to have any throwing functions at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants