Throws Prediction + HotColdSplitting #75382

kavon · 2024-07-20T03:56:10Z

It's reasonable to statically assume that errors are not typically thrown by functions. By making this assumption, we can annotate blocks of code dealing with errors as cold in LLVM. This aids in block layout among other things.

kavon · 2024-07-20T03:57:31Z

@swift-ci Please test compiler performance

kavon · 2024-07-23T18:11:46Z

@swift-ci Please benchmark

kavon · 2024-07-30T20:43:52Z

@swift-ci Please benchmark

kavon · 2024-07-30T22:02:20Z

Benchmark results with StaticBranchPrediction disabled:

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	3110.0	3771.0	+21.3%	0.82x (?)

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	2689.0	4224.0	+57.1%	0.64x (?)
FlattenListLoop	1411.0	1556.0	+10.3%	0.91x (?)

Code size: -Osize

Performance (x86_64): -Onone

Improvement	OLD	NEW	DELTA	RATIO
ArrayAppendGenericStructs	1402.5	1112.5	-20.7%	1.26x (?)

Code size: -swiftlibs

kavon · 2024-07-30T22:03:36Z

@swift-ci Please benchmark

aschwaighofer

The documentation of the llvm's LowerExpectIntrinsics pass states the following.

// These default values are chosen to represent an extremely skewed outcome for 
// a condition, but they leave some room for interpretation by later passes.    
//                                                                              
// If the documentation for __builtin_expect() was made explicit that it should 
// only be used in extreme cases, we could make this ratio higher. As it stands,
// programmers may be using __builtin_expect() / llvm.expect to annotate that a 
// branch is likely or unlikely to be taken.                                    
                                                                                
// WARNING: these values are internal implementation detail of the pass.        
// They should not be exposed to the outside of the pass, front-end codegen     
// should emit @llvm.expect intrinsics instead of using these weights directly.

And then use the values 1 and 2000 as the branch weights.

Maybe, we should follow that recommendation and use the llvm.expect intrinsic to be guarded against potential llvm changes in the interpretation of the branch weights.

kavon · 2024-07-31T16:55:34Z

Looks like if you ignore the ones with noise (marked with ? below), enabling it shows a code size improvement in the micro benchmarks:

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	2306.0	3834.0	+66.3%	0.60x (?)
SortAdjacentIntPyramids	782.5	944.167	+20.7%	0.83x (?)
String.replaceSubrange.String	9.362	10.776	+15.1%	0.87x (?)
CxxStringConversion.cxx.to.swift	100.625	112.5	+11.8%	0.89x (?)
Phonebook	978.727	1090.6	+11.4%	0.90x (?)
SortStrings	471.0	515.5	+9.4%	0.91x (?)
String.initRepeating.longMixStr.Count100	702.0	761.333	+8.5%	0.92x (?)
StringBuilder	235.25	254.857	+8.3%	0.92x (?)
ArrayAppendAscii	1821.429	1970.3	+8.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
Data.hash.Medium	28.531	24.27	-14.9%	1.18x (?)
StringFromLongWholeSubstring	3.043	2.717	-10.7%	1.12x (?)
Breadcrumbs.UTF16ToIdx.longASCII	43.588	39.382	-9.6%	1.11x (?)
StringComparison_zalgo	41925.0	37900.0	-9.6%	1.11x (?)
NormalizedIterator_fastPrenormal	566.75	517.907	-8.6%	1.09x (?)
LineSink.bytes.complex	70.25	64.229	-8.6%	1.09x (?)
LineSink.scalars.complex	70.152	64.162	-8.5%	1.09x (?)
Breadcrumbs.UTF16ToIdx.longMixed	344.833	318.286	-7.7%	1.08x (?)
StringDistance.scalars.mixed	36.483	33.69	-7.7%	1.08x (?)
StringHashing_zalgo	2401.667	2221.875	-7.5%	1.08x (?)
StringDistance.scalars.ascii	751.333	697.333	-7.2%	1.08x (?)
NaiveRRC.init.largeContiguous	6.059	5.65	-6.7%	1.07x (?)

Code size: -O

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
Array.removeAll.keepingCapacity.Object	6.331	7.404	+16.9%	0.86x (?)
FlattenListFlatMap	3749.0	4198.0	+12.0%	0.89x (?)
FindString.Rec3.Substring	96.842	107.056	+10.5%	0.90x (?)
FindString.Loop1.Substring	215.818	236.889	+9.8%	0.91x (?)
Set.isDisjoint.Empty.Int	81.737	89.286	+9.2%	0.92x (?)
Set.isDisjoint.Seq.Empty.Int	81.75	89.294	+9.2%	0.92x (?)
Phonebook	994.0	1079.556	+8.6%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
Data.init.Sequence.64kB.Count	58.0	29.455	-49.2%	1.97x
Data.init.Sequence.64kB.Count.I	58.0	29.467	-49.2%	1.97x
Data.append.Sequence.64kB.Count.RE	44.222	22.769	-48.5%	1.94x
Data.append.Sequence.64kB.Count.RE.I	44.231	22.853	-48.3%	1.94x
Data.init.Sequence.2049B.Count.I	99.375	52.477	-47.2%	1.89x
Data.init.Sequence.2047B.Count.I	99.313	52.45	-47.2%	1.89x
Data.init.Sequence.809B.Count.I	85.0	50.12	-41.0%	1.70x
Data.init.Sequence.809B.Count	85.0	50.28	-40.8%	1.69x
Data.init.Sequence.513B.Count.I	95.917	61.96	-35.4%	1.55x (?)
Data.init.Sequence.511B.Count.I	93.938	61.5	-34.5%	1.53x
DataAppendSequence	8836.842	6330.0	-28.4%	1.40x (?)
Data.append.Sequence.809B.Count.RE.I	86.278	64.143	-25.7%	1.35x (?)
Data.append.Sequence.809B.Count.RE	86.667	64.45	-25.6%	1.34x (?)
Data.hash.Medium	28.733	24.415	-15.0%	1.18x (?)
NibbleSort	1805.0	1555.0	-13.9%	1.16x (?)
Breadcrumbs.UTF16ToIdx.longASCII	44.68	39.345	-11.9%	1.14x (?)
SequenceAlgosArray	1927.5	1711.429	-11.2%	1.13x (?)
StringComparison_zalgo	41700.0	37700.0	-9.6%	1.11x (?)
NormalizedIterator_fastPrenormal	541.136	493.542	-8.8%	1.10x (?)
Breadcrumbs.UTF16ToIdx.longMixed	348.667	321.571	-7.8%	1.08x (?)
SortIntPyramid	640.556	591.364	-7.7%	1.08x (?)
NormalizedIterator_zalgo	2480.435	2292.708	-7.6%	1.08x (?)
StringHashing_zalgo	2393.75	2216.176	-7.4%	1.08x (?)
RGBHistogram	1788.571	1662.857	-7.0%	1.08x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
DataSubscriptMedium	52.25	59.909	+14.7%	0.87x (?)
ObjectiveCBridgeStubDateMutation	305.0	348.5	+14.3%	0.88x (?)
BinaryFloatingPointPropertiesUlp	65.29	74.1	+13.5%	0.88x (?)
String.replaceSubrange.String	10.594	11.864	+12.0%	0.89x (?)
Data.hash.Empty	94.333	102.933	+9.1%	0.92x (?)
String.replaceSubrange.Substring	12.302	13.394	+8.9%	0.92x (?)
String.initRepeating.longMixStr.Count100	697.0	754.0	+8.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
SIMDReduce.Int8	7171.0	6431.0	-10.3%	1.12x (?)
Data.hash.Medium	32.095	29.0	-9.6%	1.11x (?)
StringComparison_zalgo	42800.0	39000.0	-8.9%	1.10x (?)
RandomDouble01LCG	17728.0	16187.0	-8.7%	1.10x (?)
NormalizedIterator_zalgo	2510.0	2326.471	-7.3%	1.08x (?)
StringHashing_zalgo	2451.667	2284.524	-6.8%	1.07x (?)
StringEnumRawValueInitialization	7572.0	7068.0	-6.7%	1.07x (?)
CharIteration_punctuated_unicodeScalars_Backwards	38000.0	35500.0	-6.6%	1.07x (?)

Code size: -swiftlibs

kavon · 2024-07-31T17:25:45Z

Maybe, we should follow that recommendation and use the llvm.expect intrinsic to be guarded against potential llvm changes in the interpretation of the branch weights.

@aschwaighofer I think we don't need to worry about a change to how the branch_weight metadata is interpreted, since we should see it right away with how our existing PGO emission works (which emits that metadata).

I think for this PR, we should prefer llvm.expect for the miscellaneous static predictions, so they keep up with any changes in the 1:2000 ratio,, but for the try_apply's emitted by SILGen, we leave them annotated with our SIL-equivalent branch_weights so that SIL passes have access to that data in a uniform way (whether that data came from actual profile data, or from these predictions)

aschwaighofer · 2024-08-02T18:44:23Z

Is anyone using swift PGO?

how would we “see” if llvm’s interpretation change

hopefully nobody uses exceptions on the expected path — ie in which desired situation would profiling tell us otherwise — why should we design for this undesirable edge case.

kavon · 2024-08-07T21:31:00Z

@swift-ci test

kavon · 2024-08-08T15:21:03Z

@swift-ci test

aschwaighofer

Are you planning of hooking up PGO to set the new ProfileCounter arguments of the try_apply?

An alternative approach could be to use either the profile counters or llvm.expect in IRGen depending on whether a ProfileCounter (eventually by PGO) is set or EnableThrowsPrediction is on.

But I am nitpicking.

This lgtm.

kavon · 2024-08-08T15:35:50Z

how would we “see” if llvm’s interpretation change

I think we have some regression tests, but if not, we can add them to see, e.g., if profile_weight continues to cause the blocks to change order.

hopefully nobody uses exceptions on the expected path

I plan to evaluate these new performance flags to see how it affects "ordinary" code. So in this PR everything's off by default.

I think it's reasonable to expect no one is using throw as fast-path control-flow. Using it to exit multiple nested control-flow structures (like a nested loops) is better done with break label. If people are using throws to catch non-error cases, then that's definitely unusual, since the type thrown has to conform to Error.

kavon · 2024-08-08T15:40:06Z

Are you planning of hooking up PGO to set the new ProfileCounter arguments of the try_apply?

@aschwaighofer yup that's the goal! This is a first step in that process.

kavon · 2024-08-08T19:08:11Z

@swift-ci smoke test

Some calls to throwing functions aren't represented with `try_apply` in SIL, so we generate llvm.expect's when throw prediction is enabled.

kavon · 2024-08-09T01:23:02Z

@swift-ci smoke test

kavon · 2024-08-09T16:11:54Z

@swift-ci smoke test

rjmccall · 2024-08-09T21:43:51Z

I'm not sure what criterion you used here to decide that some of these results are noise and others aren't. The wins here are very good, but it would be good to understand why it's a 10% regression on some benchmarks, including some fairly weighty ones like FlattenListFlatMap.

kavon · 2024-08-13T21:38:56Z

@rjmccall I plan to run the benchmarks with more samples on a local machine to see what really is noise or not. The (?)-marked benchmarks I suggested are noise came from the benchmark run's analysis tool itself. The regression in FlattenListFlatMap is odd since it doesn't appear to have any throwing functions at all.

kavon force-pushed the static-branch-prediction branch from 4dfd362 to 51baeaf Compare July 30, 2024 20:43

kavon requested a review from aschwaighofer July 30, 2024 21:19

aschwaighofer reviewed Jul 31, 2024

View reviewed changes

kavon force-pushed the static-branch-prediction branch from e5a48d8 to 8b569c4 Compare August 7, 2024 21:30

kavon marked this pull request as ready for review August 7, 2024 21:31

kavon requested review from eeckstein, rjmccall, jckarter, artemcm, tshortli, hborla, slavapestov and xedin as code owners August 7, 2024 21:31

kavon changed the title ~~Static branch prediction~~ Throws Prediction + HotColdSplitting Aug 7, 2024

NFC: throws prediction frontend flag

d373f6f

kavon force-pushed the static-branch-prediction branch from 8b569c4 to a4c7df4 Compare August 8, 2024 15:20

aschwaighofer approved these changes Aug 8, 2024

View reviewed changes

kavon enabled auto-merge August 8, 2024 15:40

kavon force-pushed the static-branch-prediction branch from a4c7df4 to cafabc0 Compare August 8, 2024 19:07

kavon added 2 commits August 8, 2024 21:21

SIL: branch weights for try_apply's

0cce602

IRGen: throws prediction for non-SIL calls

885b758

Some calls to throwing functions aren't represented with `try_apply` in SIL, so we generate llvm.expect's when throw prediction is enabled.

kavon force-pushed the static-branch-prediction branch from cafabc0 to baf87b4 Compare August 9, 2024 01:22

IRGen: -enable-split-cold-code frontend feature

11e8bb8

kavon force-pushed the static-branch-prediction branch from baf87b4 to 11e8bb8 Compare August 9, 2024 16:09

kavon merged commit 9d69f2b into swiftlang:main Aug 9, 2024
3 checks passed

kavon deleted the static-branch-prediction branch August 9, 2024 20:28

Throws Prediction + HotColdSplitting #75382

Throws Prediction + HotColdSplitting #75382

Uh oh!

Conversation

kavon commented Jul 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kavon commented Jul 20, 2024

Uh oh!

kavon commented Jul 23, 2024

Uh oh!

kavon commented Jul 30, 2024

Uh oh!

kavon commented Jul 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance (x86_64): -O

Code size: -O

Performance (x86_64): -Osize

Code size: -Osize

Performance (x86_64): -Onone

Code size: -swiftlibs

Uh oh!

kavon commented Jul 30, 2024

Uh oh!

aschwaighofer left a comment

Choose a reason for hiding this comment

Uh oh!

kavon commented Jul 31, 2024

Performance (x86_64): -O

Code size: -O

Performance (x86_64): -Osize

Code size: -Osize

Performance (x86_64): -Onone

Code size: -swiftlibs

Uh oh!

kavon commented Jul 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aschwaighofer commented Aug 2, 2024

Uh oh!

kavon commented Aug 7, 2024

Uh oh!

kavon commented Aug 8, 2024

Uh oh!

aschwaighofer left a comment

Choose a reason for hiding this comment

Uh oh!

kavon commented Aug 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kavon commented Aug 8, 2024

Uh oh!

kavon commented Aug 8, 2024

Uh oh!

kavon commented Aug 9, 2024

Uh oh!

kavon commented Aug 9, 2024

Uh oh!

Uh oh!

rjmccall commented Aug 9, 2024

Uh oh!

kavon commented Aug 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

kavon commented Jul 20, 2024 •

edited

Loading

kavon commented Jul 30, 2024 •

edited

Loading

kavon commented Jul 31, 2024 •

edited

Loading

kavon commented Aug 8, 2024 •

edited

Loading

kavon commented Aug 13, 2024 •

edited

Loading