[stdlib] Add another fast path for generic floating-point conversion #33826

xwu · 2020-09-05T17:28:13Z

Based on my reading of IEEE 754, for any given w (exponent bit width) and p (precision, or significand bit width + 1), there can be one and only one encoding of any finite value [edit: or ±∞] in a binary interchange format with those parameters.

Therefore, we can use init(sign:exponentBitPattern:significandBitPattern:) to convert from any binary floating-point type with w and p parameters that match a known standard library type's parameters to the latter type. From there, we can use the existing protocol requirements to convert to Self.
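
As a rough illustration of the idea (not the code in this PR), a helper along the following lines compares only the encoding parameters and then rebuilds the value field by field. The names `convertViaMatchingLayout` and `convertSketch` are hypothetical, and only the `Double` case is shown.

```swift
// Hypothetical helper for illustration; not the standard library's implementation.
// Returns nil when the source type's encoding parameters differ from Double's.
func convertViaMatchingLayout<Source: BinaryFloatingPoint>(
  _ value: Source
) -> Double? {
  // Compare exponent bit count (w) and significand bit count (p - 1).
  // Double uses (11, 52).
  guard Source.exponentBitCount == Double.exponentBitCount,
        Source.significandBitCount == Double.significandBitCount
  else { return nil }

  // With matching parameters, the raw fields carry over unchanged.
  return Double(
    sign: value.sign,
    exponentBitPattern: UInt(truncatingIfNeeded: value.exponentBitPattern),
    significandBitPattern: UInt64(truncatingIfNeeded: value.significandBitPattern))
}

// Second step: the existing protocol requirement init(_: Double) converts
// the intermediate Double to any target type.
func convertSketch<T: BinaryFloatingPoint, U: BinaryFloatingPoint>(
  _ value: T, to: U.Type
) -> U {
  if let double = convertViaMatchingLayout(value) {
    return U(double)
  }
  return U(value)  // fall back to the generic conversion
}
```

A type with different parameters, such as `Float`, makes the helper return `nil`, so `convertSketch` takes the ordinary generic path.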

In a perfect world, since everything is inlinable, the compiler would be able to see that essentially the whole conversion involves bit shifting values in opposite directions and converting between unsigned integer types that are type aliases of each other, turning everything into no-ops. But I would imagine that there is some overhead here that can't quite be eliminated.

A companion PR (#33821) with benchmarks [edit: now merged] tests how well this fast path does the job. [edit: Either the benchmark results are too good to be true, or the compiler is doing a fantastic job of inlining.]

xwu · 2020-09-05T17:29:26Z

cc @troughton

xwu · 2020-09-06T18:13:58Z

@swift-ci smoke test

xwu · 2020-09-06T21:29:12Z

@swift-ci benchmark

swift-ci · 2020-09-06T22:19:46Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	2373	3023	+27.4%	0.78x (?)
UTF8Decode_InitDecoding	140	172	+22.9%	0.81x
UTF8Decode_InitFromCustom_contiguous	143	173	+21.0%	0.83x (?)
ParseFloat.Float.Exp	8	9	+12.5%	0.89x (?)
FlattenListLoop	927	1022	+10.2%	0.91x (?)
UTF8Decode_InitFromCustom_noncontiguous	298	326	+9.4%	0.91x (?)
FloatingPointPrinting_Float_description_uniform	3500	3800	+8.6%	0.92x (?)
StringHashing_latin1	214	232	+8.4%	0.92x (?)
ArrayInClass	915	990	+8.2%	0.92x (?)
StringHashing_fastPrenormal	620	670	+8.1%	0.93x (?)
ParseInt.IntSmall.Decimal	263	284	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	2727	18	-99.3%	151.49x
EqualSubstringSubstring	29	22	-24.1%	1.32x
LessSubstringSubstring	29	22	-24.1%	1.32x
EqualStringSubstring	29	22	-24.1%	1.32x
EqualSubstringSubstringGenericEquatable	29	22	-24.1%	1.32x (?)
EqualSubstringString	29	22	-24.1%	1.32x
LessSubstringSubstringGenericComparable	29	22	-24.1%	1.32x
StringComparison_longSharedPrefix	378	342	-9.5%	1.11x (?)
Set.isSubset.Seq.Empty.Int	87	79	-9.2%	1.10x
Set.isDisjoint.Seq.Empty.Int	92	84	-8.7%	1.10x
Set.isDisjoint.Empty.Int	92	84	-8.7%	1.10x (?)
ArrayAppendUTF16Substring	33588	30780	-8.4%	1.09x (?)
ArrayAppendAsciiSubstring	33588	30780	-8.4%	1.09x (?)
ArrayAppendLatin1Substring	34236	31392	-8.3%	1.09x (?)
Set.isDisjoint.Empty.Box	94	87	-7.4%	1.08x (?)
Set.isSuperset.Seq.Int.Empty	94	87	-7.4%	1.08x (?)
Set.isDisjoint.Seq.Empty.Box	94	87	-7.4%	1.08x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitDecoding	140	172	+22.9%	0.81x
UTF8Decode_InitFromCustom_contiguous	141	169	+19.9%	0.83x
UTF8Decode_InitFromCustom_noncontiguous	260	291	+11.9%	0.89x (?)
ArrayAppendUTF16Substring	32580	35280	+8.3%	0.92x (?)
StringHashing_fastPrenormal	620	670	+8.1%	0.93x (?)
ArrayAppendAsciiSubstring	32580	35136	+7.8%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	2737	20	-99.3%	136.84x
EqualSubstringSubstring	29	22	-24.1%	1.32x
LessSubstringSubstring	29	22	-24.1%	1.32x
EqualStringSubstring	29	22	-24.1%	1.32x
EqualSubstringSubstringGenericEquatable	29	22	-24.1%	1.32x
EqualSubstringString	29	22	-24.1%	1.32x
LessSubstringSubstringGenericComparable	29	22	-24.1%	1.32x
Set.isSuperset.Seq.Int.Empty	95	84	-11.6%	1.13x (?)
FlattenListLoop	1548	1386	-10.5%	1.12x (?)
StringComparison_longSharedPrefix	379	341	-10.0%	1.11x (?)
Data.hash.Medium	33	30	-9.1%	1.10x (?)
Set.isDisjoint.Seq.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Empty.Int	92	84	-8.7%	1.10x (?)
DropFirstAnySequenceLazy	1768	1643	-7.1%	1.08x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode_InitDecoding	149	180	+20.8%	0.83x
UTF8Decode_InitFromCustom_contiguous	154	182	+18.2%	0.85x
ConvertFloatingPoint.GenericDoubleToDouble	1143	1280	+12.0%	0.89x (?)
ArrayOfPOD	652	726	+11.3%	0.90x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	4450	2250	-49.4%	1.98x
EqualSubstringSubstringGenericEquatable	33	26	-21.2%	1.27x
LessSubstringSubstringGenericComparable	33	26	-21.2%	1.27x
EqualSubstringSubstring	34	27	-20.6%	1.26x
LessSubstringSubstring	34	27	-20.6%	1.26x
EqualStringSubstring	34	27	-20.6%	1.26x (?)
EqualSubstringString	34	28	-17.6%	1.21x
Set.subtracting.Seq.Empty.Box	1157	1031	-10.9%	1.12x (?)
Set.subtracting.Empty.Box	221	197	-10.9%	1.12x (?)
StringAdder	426	381	-10.6%	1.12x (?)
ArrayAppendAscii	25874	23290	-10.0%	1.11x (?)
Set.isDisjoint.Empty.Box	1074	973	-9.4%	1.10x (?)
ArrayAppendLatin1	25704	23290	-9.4%	1.10x (?)
Set.isDisjoint.Box.Empty	1181	1074	-9.1%	1.10x (?)
Set.isDisjoint.Seq.Empty.Box	914	832	-9.0%	1.10x (?)
Set.subtracting.Box.Empty	246	225	-8.5%	1.09x (?)
ArrayAppendAsciiSubstring	47592	43596	-8.4%	1.09x (?)
SetIsSubsetBox0	1233	1131	-8.3%	1.09x (?)
Set.isDisjoint.Seq.Box.Empty	1128	1035	-8.2%	1.09x (?)
NSDictionaryCastToSwift	3460	3190	-7.8%	1.08x (?)
ArrayPlusEqualFiveElementCollection	176601	164724	-6.7%	1.07x (?)
ArrayValueProp4	5203	4860	-6.6%	1.07x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
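
The DELTA and RATIO columns appear to follow from OLD and NEW as a relative change and a speed ratio. The formulas below are inferred from the rows in the tables, not taken from the benchmark tooling, and the function names are hypothetical.

```swift
// Hypothetical helpers; the formulas are inferred from the table rows above.

// Relative change in percent: positive means slower (a regression).
func deltaPercent(old: Double, new: Double) -> Double {
  (new - old) / old * 100
}

// Speed ratio: values above 1 mean the new build is faster.
func speedRatio(old: Double, new: Double) -> Double {
  old / new
}

// Row: ConvertFloatingPoint.MockFloat64ToDouble  2727  18
let delta = deltaPercent(old: 2727, new: 18)  // about -99.3
let ratio = speedRatio(old: 2727, new: 18)    // 151.5
```

The report lists 151.49x for this row rather than 151.5x, presumably because it works from unrounded timings.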

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

xwu · 2020-09-06T22:23:17Z

Wow. I did not expect that. Wow.

stdlib/public/core/FloatingPoint.swift

stephentyrone · 2020-09-07T12:13:08Z

As written, the approach is sound.

The main question I have is whether it makes sense to simply use an unsafeBitCast, which would eliminate some tests from the fast path (and eliminate potential slow paths entirely). That's formally unsound, but in a way that's fundamentally uninteresting and unlikely to cause problems. In particular, we could tighten the semantic requirements of BinaryFloatingPoint to specify the layout, and make the problem go away.

Also, would that then allow us to get rid of the switch added in the previous change?
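
For comparison, a minimal sketch of what the `unsafeBitCast` variant could look like. It assumes that a conforming type with matching parameters stores its value in memory in the IEEE 754 interchange encoding, which `BinaryFloatingPoint` does not currently promise (the "formally unsound" part). The function name and the extra size guard are illustrative additions.

```swift
// Hypothetical sketch of the bit-cast alternative; not code from this PR.
// ASSUMPTION: a type with Double's encoding parameters and size also has
// Double's in-memory layout. The protocol does not guarantee this today.
func bitCastToDouble<Source: BinaryFloatingPoint>(_ value: Source) -> Double? {
  guard Source.exponentBitCount == Double.exponentBitCount,
        Source.significandBitCount == Double.significandBitCount,
        // unsafeBitCast traps on a size mismatch, so check it up front.
        MemoryLayout<Source>.size == MemoryLayout<Double>.size
  else { return nil }

  return unsafeBitCast(value, to: Double.self)
}
```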

xwu · 2020-09-07T14:43:46Z

I think the unsafeBitCast would certainly be a practical approach, but unless my eyes deceive me, the compiler is doing a fantastic job with this safe code. I'm relieved that the approach here is sound.

I have already refactored to merge both switches into one that matches on encoding parameters only, which might actually be faster. (With the previous change, there appeared to be some difference between concrete and generic Double-to-Double conversion ~~which I would hypothesize arises from not being able to optimize away the first dynamic check whether the value can be cast to Float—I haven't inspected the generated code, however~~ [edit: apparently due to the issue outlined in #33839, though how this particular refactoring here sidesteps the optimizer issue is not clear to me].)

I've changed the fast path condition from value.isFinite to !value.isNaN, since the IEEE 754 specification also fixes the encoding of infinities.

I'll run the benchmarks again to see where this all shakes out.
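
To illustrate why infinities qualify for the fast path: in a binary interchange format, ±∞ is encoded with an all-ones exponent field and a zero significand field, so the raw fields identify it unambiguously. The helper name below is hypothetical.

```swift
// Hypothetical check for illustration: does the value use the encoding
// that IEEE 754 fixes for infinity (all-ones exponent, zero significand)?
func hasInfinityEncoding<T: BinaryFloatingPoint>(_ x: T) -> Bool {
  // All-ones exponent field, e.g. 2047 for Double and 255 for Float.
  let allOnes = (T.RawExponent(1) << T.exponentBitCount) - 1
  return x.exponentBitPattern == allOnes && x.significandBitPattern == 0
}

// Rebuilding from those fields gives back the same infinity.
let negativeInfinity = Double(
  sign: .minus, exponentBitPattern: 2047, significandBitPattern: 0)
```

NaN shares the all-ones exponent but has a nonzero significand, which is why the condition at this stage was `!value.isNaN` rather than a test on the exponent alone.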

stdlib/public/core/FloatingPoint.swift

        }
      }
-      self = Self._convert(from: value).value
+#endif
+    case (8, 7):


xwu · 2020-09-07T15:04:07Z

@swift-ci test

xwu · 2020-09-07T15:04:21Z

@swift-ci benchmark

swift-ci · 2020-09-07T15:39:49Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	110	142	+29.1%	0.77x
UTF8Decode_InitFromCustom_contiguous	141	171	+21.3%	0.82x
UTF8Decode_InitDecoding	141	170	+20.6%	0.83x
ParseInt.IntSmall.Decimal	263	285	+8.4%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	2726	19	-99.3%	143.47x
ConvertFloatingPoint.GenericDoubleToDouble	59	15	-74.6%	3.93x
EqualSubstringSubstring	29	22	-24.1%	1.32x
LessSubstringSubstring	29	22	-24.1%	1.32x (?)
EqualStringSubstring	29	22	-24.1%	1.32x
EqualSubstringSubstringGenericEquatable	29	22	-24.1%	1.32x
EqualSubstringString	29	22	-24.1%	1.32x
LessSubstringSubstringGenericComparable	29	22	-24.1%	1.32x
StringComparison_longSharedPrefix	378	342	-9.5%	1.11x (?)
FlattenListLoop	1022	933	-8.7%	1.10x (?)
Set.isDisjoint.Seq.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Empty.Int	92	84	-8.7%	1.10x (?)
ArrayAppendUTF16Substring	33552	30780	-8.3%	1.09x (?)
Set.isSubset.Seq.Empty.Int	87	80	-8.0%	1.09x (?)
Set.isDisjoint.Empty.Box	94	87	-7.4%	1.08x (?)
Set.isSuperset.Seq.Int.Empty	94	87	-7.4%	1.08x (?)
Set.isDisjoint.Seq.Empty.Box	94	87	-7.4%	1.08x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
StringFromLongWholeSubstringGeneric	3	4	+33.3%	0.75x
AngryPhonebook.ASCII2	110	142	+29.1%	0.77x
UTF8Decode_InitFromCustom_contiguous	142	170	+19.7%	0.84x (?)
UTF8Decode_InitDecoding	142	169	+19.0%	0.84x (?)
UTF8Decode_InitFromCustom_noncontiguous	262	290	+10.7%	0.90x (?)
ArrayAppendUTF16Substring	32580	35388	+8.6%	0.92x (?)
ArrayAppendLatin1Substring	33300	36036	+8.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	2730	19	-99.3%	143.68x
ConvertFloatingPoint.GenericDoubleToDouble	59	15	-74.6%	3.93x
EqualSubstringSubstring	29	22	-24.1%	1.32x
LessSubstringSubstring	29	22	-24.1%	1.32x
EqualStringSubstring	29	22	-24.1%	1.32x
EqualSubstringSubstringGenericEquatable	29	22	-24.1%	1.32x
EqualSubstringString	29	22	-24.1%	1.32x
LessSubstringSubstringGenericComparable	29	22	-24.1%	1.32x
StringComparison_longSharedPrefix	379	340	-10.3%	1.11x (?)
Data.hash.Medium	33	30	-9.1%	1.10x (?)
Set.isDisjoint.Seq.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Empty.Box	95	88	-7.4%	1.08x (?)
Set.isSuperset.Seq.Int.Empty	95	88	-7.4%	1.08x (?)
Set.isStrictSubset.Int.Empty	56	52	-7.1%	1.08x (?)
PrefixAnySequenceLazy	1327	1233	-7.1%	1.08x (?)
PrefixWhileAnySequenceLazy	1328	1234	-7.1%	1.08x (?)
DropFirstAnySequenceLazy	1768	1643	-7.1%	1.08x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	110	142	+29.1%	0.77x
UTF8Decode_InitFromCustom_contiguous	155	185	+19.4%	0.84x
UTF8Decode_InitDecoding	151	180	+19.2%	0.84x

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	4443	1701	-61.7%	2.61x
LessSubstringSubstring	35	27	-22.9%	1.30x
EqualSubstringSubstringGenericEquatable	33	26	-21.2%	1.27x
LessSubstringSubstringGenericComparable	33	26	-21.2%	1.27x
EqualSubstringSubstring	34	27	-20.6%	1.26x
EqualStringSubstring	35	28	-20.0%	1.25x
EqualSubstringString	35	28	-20.0%	1.25x
ArrayAppendAsciiSubstring	47772	42444	-11.2%	1.13x (?)
ArrayAppendLatin1Substring	48204	42876	-11.1%	1.12x (?)
Set.subtracting.Seq.Empty.Box	1157	1032	-10.8%	1.12x
Set.isDisjoint.Seq.Empty.Box	915	819	-10.5%	1.12x
Set.subtracting.Empty.Box	222	200	-9.9%	1.11x
ArrayAppendAscii	25840	23290	-9.9%	1.11x (?)
DropFirstAnySequenceLazy	9461	8541	-9.7%	1.11x (?)
StringWalk	3320	3000	-9.6%	1.11x (?)
ArrayAppendLatin1	25738	23290	-9.5%	1.11x (?)
Set.subtracting.Box.Empty	247	225	-8.9%	1.10x
Set.isDisjoint.Box.Empty	1182	1078	-8.8%	1.10x (?)
Set.isDisjoint.Empty.Box	1073	979	-8.8%	1.10x (?)
Set.isDisjoint.Seq.Box.Empty	1124	1026	-8.7%	1.10x (?)
Set.subtracting.Seq.Empty.Int	554	508	-8.3%	1.09x (?)
StringAdder	423	389	-8.0%	1.09x (?)
SetIsSubsetBox0	1232	1135	-7.9%	1.09x (?)
StackPromo	75200	69300	-7.8%	1.09x (?)
DropWhileAnySequenceLazy	12352	11415	-7.6%	1.08x (?)
PrefixSequence	6959	6486	-6.8%	1.07x (?)

Code size: -swiftlibs

xwu · 2020-09-08T00:10:55Z

// input.swift
func convert<
  T: BinaryFloatingPoint, U: BinaryFloatingPoint
>(_ value: T, to: U.Type) -> U {
  U(value)
}

func testDoubleToDouble(_ x: Double) -> Double {
  return convert(x, to: Double.self)
}

func testDoubleToFloat(_ x: Double) -> Float {
  return convert(x, to: Float.self)
}

The SIL generated for conversion between known stdlib types is perfect:

// testDoubleToDouble(_:)
sil hidden @$s5input012testDoubleToC0yS2dF : $@convention(thin) (Double) -> Double {
// %0 "x"                                         // users: %2, %3, %1
bb0(%0 : $Double):
  debug_value %0 : $Double, let, name "x", argno 1 // id: %1
  debug_value %0 : $Double, let, name "value", argno 1 // id: %2
  return %0 : $Double                             // id: %3
} // end sil function '$s5input012testDoubleToC0yS2dF'

// testDoubleToFloat(_:)
sil hidden @$s5input17testDoubleToFloatySfSdF : $@convention(thin) (Double) -> Float {
// %0 "x"                                         // users: %2, %3, %1
bb0(%0 : $Double):
  debug_value %0 : $Double, let, name "x", argno 1 // id: %1
  debug_value %0 : $Double, let, name "value", argno 1 // id: %2
  %3 = struct_extract %0 : $Double, #Double._value // user: %4
  %4 = builtin "fptrunc_FPIEEE64_FPIEEE32"(%3 : $Builtin.FPIEEE64) : $Builtin.FPIEEE32 // user: %5
  %5 = struct $Float (%4 : $Builtin.FPIEEE32)     // user: %6
  return %5 : $Float                              // id: %6
} // end sil function '$s5input17testDoubleToFloatySfSdF'

...for MockFloat64, not so much.

stephentyrone · 2020-09-08T13:38:19Z

Can we factor the !isNaN out of the switch?

xwu · 2020-09-08T16:13:16Z

@stephentyrone Yeah, I'm not totally satisfied with the control flow here, so I'm going to try something.

I don't think there's any harm--even theoretical--in preserving the encoding of any NaN values as-is between two types with the same exponent and significand bit width; there is the Float/Double/Float80-to-Self conversion afterwards in any case that will do its thing. Therefore, I'm going to delete the check for NaN and refactor accordingly.

It makes the whole implementation a lot more concise. I'll have to see if it affects code generation though.
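
A small sketch of what "preserving the encoding as-is" means for a NaN, using `Double` on both sides as a stand-in for two types with the same parameters. The helper name and the payload value are illustrative, and only a quiet NaN is shown.

```swift
// Hypothetical helper: rebuilds a Double from its three raw fields,
// which is the operation the fast path performs between matching types.
func rebuiltFromFields(_ x: Double) -> Double {
  Double(sign: x.sign,
         exponentBitPattern: x.exponentBitPattern,
         significandBitPattern: x.significandBitPattern)
}

// A quiet NaN carrying an arbitrary payload (0x1234 is an illustrative value).
let quietNaN = Double(nan: 0x1234, signaling: false)
let copy = rebuiltFromFields(quietNaN)
// The raw fields are copied unchanged, so the payload is carried over.
```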

xwu · 2020-09-08T16:30:30Z

@swift-ci test

xwu · 2020-09-08T16:33:24Z

@swift-ci benchmark

xwu · 2020-09-08T16:34:07Z

It makes the whole implementation a lot more concise. I'll have to see if it affects code generation though.

Code generation for standard library types is good. @swift-ci is being a little recalcitrant today.

stephentyrone · 2020-09-08T16:56:04Z

@swift-ci benchmark

stephentyrone · 2020-09-08T16:56:17Z

@swift-ci test

stephentyrone · 2020-09-08T16:56:32Z

@xwu CI outage. Should be restored now.

swift-ci · 2020-09-08T17:27:06Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	109	144	+32.1%	0.76x
EqualSubstringSubstring	22	29	+31.8%	0.76x
LessSubstringSubstring	22	29	+31.8%	0.76x (?)
EqualSubstringSubstringGenericEquatable	22	29	+31.8%	0.76x
EqualSubstringString	22	29	+31.8%	0.76x
LessSubstringSubstringGenericComparable	22	29	+31.8%	0.76x
EqualStringSubstring	23	29	+26.1%	0.79x
NopDeinit	8800	9900	+12.5%	0.89x
StringComparison_longSharedPrefix	341	379	+11.1%	0.90x (?)
ObjectiveCBridgeStringHash	82	89	+8.5%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	2707	16	-99.4%	169.18x
Dictionary4	194	149	-23.2%	1.30x
Dictionary4OfObjects	223	184	-17.5%	1.21x
Set.isDisjoint.Int.Empty	59	53	-10.2%	1.11x (?)
Set.isDisjoint.Seq.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Empty.Int	92	84	-8.7%	1.10x
StringHashing_latin1	230	212	-7.8%	1.08x (?)
NSDictionaryCastToSwift	1440	1330	-7.6%	1.08x (?)
StringHashing_fastPrenormal	660	610	-7.6%	1.08x (?)
Set.isSuperset.Seq.Int.Empty	94	87	-7.4%	1.08x
Set.isStrictSubset.Empty.Int	139	129	-7.2%	1.08x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	109	144	+32.1%	0.76x
EqualSubstringSubstring	22	29	+31.8%	0.76x
LessSubstringSubstring	22	29	+31.8%	0.76x
EqualStringSubstring	22	29	+31.8%	0.76x (?)
EqualSubstringSubstringGenericEquatable	22	29	+31.8%	0.76x
EqualSubstringString	22	29	+31.8%	0.76x
LessSubstringSubstringGenericComparable	22	29	+31.8%	0.76x
StringComparison_longSharedPrefix	341	377	+10.6%	0.90x (?)
ObjectiveCBridgeStringHash	82	89	+8.5%	0.92x (?)
NSStringConversion.Long	612	662	+8.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	2706	16	-99.4%	169.11x
CharacterLiteralsLarge	71	63	-11.3%	1.13x (?)
ArrayLiteral2	116	104	-10.3%	1.12x (?)
StrComplexWalk	3280	2960	-9.8%	1.11x (?)
Set.isSubset.Seq.Empty.Int	88	80	-9.1%	1.10x (?)
PrefixWhileSequence	234	213	-9.0%	1.10x (?)
Set.isDisjoint.Seq.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Empty.Int	92	84	-8.7%	1.10x (?)
Set.isDisjoint.Int.Empty	61	56	-8.2%	1.09x (?)
StringHashing_latin1	230	212	-7.8%	1.08x (?)
NormalizedIterator_fastPrenormal	790	730	-7.6%	1.08x (?)
StringHashing_fastPrenormal	660	610	-7.6%	1.08x (?)
Set.isDisjoint.Empty.Box	95	88	-7.4%	1.08x (?)
Set.isSuperset.Seq.Int.Empty	95	88	-7.4%	1.08x (?)
StringWalk	1640	1520	-7.3%	1.08x (?)
Set.isStrictSubset.Int.Empty	56	52	-7.1%	1.08x (?)
Set.isDisjoint.Seq.Int100	128	119	-7.0%	1.08x (?)
NormalizedIterator_latin1	270	252	-6.7%	1.07x (?)

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	109	144	+32.1%	0.76x
EqualStringSubstring	27	35	+29.6%	0.77x
EqualSubstringSubstring	27	34	+25.9%	0.79x
LessSubstringSubstring	27	34	+25.9%	0.79x
EqualSubstringSubstringGenericEquatable	27	33	+22.2%	0.82x
LessSubstringSubstringGenericComparable	27	33	+22.2%	0.82x (?)
EqualSubstringString	28	34	+21.4%	0.82x
StringWalk	3000	3360	+12.0%	0.89x (?)

Improvement	OLD	NEW	DELTA	RATIO
ConvertFloatingPoint.MockFloat64ToDouble	4524	1608	-64.5%	2.81x
ArrayAppendAsciiSubstring	50400	44136	-12.4%	1.14x (?)
ArrayAppendLatin1Substring	50832	44604	-12.3%	1.14x (?)
ArrayAppendUTF16Substring	50292	44460	-11.6%	1.13x (?)
DropWhileSequenceLazy	12237	10863	-11.2%	1.13x (?)
Set.subtracting.Empty.Box	224	201	-10.3%	1.11x (?)
ArrayAppendAscii	25772	23222	-9.9%	1.11x (?)
SequenceAlgosUnfoldSequence	5790	5240	-9.5%	1.10x (?)
NSDictionaryCastToSwift	3490	3160	-9.5%	1.10x (?)
Set.isDisjoint.Seq.Empty.Box	926	840	-9.3%	1.10x (?)
Set.isDisjoint.Box.Empty	1182	1076	-9.0%	1.10x (?)
ArrayAppendLatin1	25568	23290	-8.9%	1.10x (?)
Set.subtracting.Seq.Empty.Box	1153	1051	-8.8%	1.10x (?)
DropWhileSequence	12360	11280	-8.7%	1.10x (?)
Combos	2040	1866	-8.5%	1.09x (?)
Set.isDisjoint.Seq.Box.Empty	1119	1027	-8.2%	1.09x (?)
Set.isDisjoint.Empty.Box	1086	1002	-7.7%	1.08x (?)
StringMatch	50000	46200	-7.6%	1.08x (?)
DropWhileAnySequenceLazy	12046	11145	-7.5%	1.08x (?)
ArrayInitFromSlice	616	574	-6.8%	1.07x (?)
PrefixWhileSequenceLazy	10275	9602	-6.5%	1.07x (?)

Code size: -swiftlibs

xwu · 2020-09-08T19:49:26Z

@swift-ci test Windows platform

xwu · 2020-09-08T20:55:22Z

@stephentyrone Thoughts on the latest (and simplest) iteration?

stephentyrone · 2020-09-08T21:39:57Z

Seems reasonable to me. I would like to have a cleaner solution, but this is an acceptable stopgap.

We have to do due diligence on the regressions still. I'm fairly certain that they're just noise, but it would be good to disassemble the worst few and make sure there isn't a real problem somehow caused by this change (I can help with this).

xwu · 2020-09-08T22:42:58Z

Sure, your help with that would be great. The String/Substring tests oscillate between 22 and 29 μs (see earlier iterations of the benchmarks), but it would be worth confirming that we're not missing anything.

xwu · 2020-09-09T17:13:21Z

@stephentyrone Disassembly of AngryPhonebook.o, NopDeinit.o, and Substring.o (optimized x86_64) is identical before and after applying this patch. (As a positive control, disassembly of FloatingPointConversion.o is very different.)

xwu · 2020-09-09T17:19:38Z

@swift-ci test Windows platform

xwu requested a review from stephentyrone September 5, 2020 17:29

stephentyrone reviewed Sep 7, 2020

xwu force-pushed the float-like-a-butterfly branch from e32aae2 to f560ecf Compare September 7, 2020 15:01

xwu mentioned this pull request Sep 7, 2020

[benchmark] Add new benchmark for floating-point conversion #33801

Merged

xwu requested a review from stephentyrone September 7, 2020 17:44

xwu added 2 commits September 7, 2020 18:47

[stdlib] Add another fast path for generic floating-point conversion

f40fd36

[stdlib] Refactor generic floating-point conversion fast paths

9a6900a

xwu added 2 commits September 8, 2020 11:32

[stdlib] Use 'truncatingIfNeeded:' in floating-point conversion fast …

45d69d5

…paths

[stdlib] Simplify generic floating-point conversion fast paths

643834e

xwu force-pushed the float-like-a-butterfly branch from f560ecf to 643834e Compare September 8, 2020 16:28

xwu merged commit 4865207 into swiftlang:master Sep 10, 2020

This was referenced Sep 10, 2020

[stdlib] Add fast paths for generic floating-point-to-integer conversion #33889

Open

[stdlib] Silence signaling NaN in generic conversions #33902

Merged

[stdlib] Simplify 'BinaryFloatingPoint.init?<T: BinaryFloatingPoint>(exactly: T)' #33910

Merged

xwu deleted the float-like-a-butterfly branch March 20, 2021 04:07
