[stdlib] Add another fast path for generic floating-point conversion #33826

Merged
merged 4 commits into from
Sep 10, 2020

Conversation

xwu
Collaborator

@xwu xwu commented Sep 5, 2020

Based on my reading of IEEE 754, for any given w (exponent bit width) and p (precision, or significand bit width + 1), there can be one and only one encoding of any finite value [edit: or ±∞] in a binary interchange format with those parameters.

Therefore, we can use init(sign:exponentBitPattern:significandBitPattern:) to convert from any binary floating-point type with w and p parameters that match a known standard library type's parameters to the latter type. From there, we can use the existing protocol requirements to convert to Self.
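A minimal sketch of that idea for the binary64 case, assuming a hypothetical helper named `convertToDouble` (this is an illustration, not the standard library's actual implementation):

```swift
// Hypothetical helper for illustration only.
// Double is binary64: w = 11 exponent bits and 52 stored significand bits (p = 53).
func convertToDouble<T: BinaryFloatingPoint>(_ value: T) -> Double? {
  // Only take the fast path when the source type's parameters match Double's.
  guard T.exponentBitCount == 11, T.significandBitCount == 52 else {
    return nil
  }
  // Same w and p means the raw fields can be copied across unchanged.
  return Double(
    sign: value.sign,
    exponentBitPattern: UInt(truncatingIfNeeded: value.exponentBitPattern),
    significandBitPattern: UInt64(truncatingIfNeeded: value.significandBitPattern))
}
```

A caller would fall back to the general conversion whenever the helper returns `nil`.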

In a perfect world, since everything is inlinable, the compiler would be able to see that essentially the whole conversion involves bit shifting values in opposite directions and converting between unsigned integer types that are type aliases of each other, turning everything into no-ops. But I would imagine that there is some overhead here that can't quite be eliminated.

A companion PR (#33821) adds benchmarks [edit: it has been merged] to test how well this fast path does the job. [edit: Either the benchmarks are too good to be true, or the compiler is really doing a fantastic job of inlining.]

@xwu
Collaborator Author

xwu commented Sep 5, 2020

cc @troughton

@xwu xwu requested a review from stephentyrone September 5, 2020 17:29
@xwu
Collaborator Author

xwu commented Sep 6, 2020

@swift-ci smoke test

@xwu
Collaborator Author

xwu commented Sep 6, 2020

@swift-ci benchmark

@swift-ci
Contributor

swift-ci commented Sep 6, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListFlatMap 2373 3023 +27.4% 0.78x (?)
UTF8Decode_InitDecoding 140 172 +22.9% 0.81x
UTF8Decode_InitFromCustom_contiguous 143 173 +21.0% 0.83x (?)
ParseFloat.Float.Exp 8 9 +12.5% 0.89x (?)
FlattenListLoop 927 1022 +10.2% 0.91x (?)
UTF8Decode_InitFromCustom_noncontiguous 298 326 +9.4% 0.91x (?)
FloatingPointPrinting_Float_description_uniform 3500 3800 +8.6% 0.92x (?)
StringHashing_latin1 214 232 +8.4% 0.92x (?)
ArrayInClass 915 990 +8.2% 0.92x (?)
StringHashing_fastPrenormal 620 670 +8.1% 0.93x (?)
ParseInt.IntSmall.Decimal 263 284 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 2727 18 -99.3% 151.49x
EqualSubstringSubstring 29 22 -24.1% 1.32x
LessSubstringSubstring 29 22 -24.1% 1.32x
EqualStringSubstring 29 22 -24.1% 1.32x
EqualSubstringSubstringGenericEquatable 29 22 -24.1% 1.32x (?)
EqualSubstringString 29 22 -24.1% 1.32x
LessSubstringSubstringGenericComparable 29 22 -24.1% 1.32x
StringComparison_longSharedPrefix 378 342 -9.5% 1.11x (?)
Set.isSubset.Seq.Empty.Int 87 79 -9.2% 1.10x
Set.isDisjoint.Seq.Empty.Int 92 84 -8.7% 1.10x
Set.isDisjoint.Empty.Int 92 84 -8.7% 1.10x (?)
ArrayAppendUTF16Substring 33588 30780 -8.4% 1.09x (?)
ArrayAppendAsciiSubstring 33588 30780 -8.4% 1.09x (?)
ArrayAppendLatin1Substring 34236 31392 -8.3% 1.09x (?)
Set.isDisjoint.Empty.Box 94 87 -7.4% 1.08x (?)
Set.isSuperset.Seq.Int.Empty 94 87 -7.4% 1.08x (?)
Set.isDisjoint.Seq.Empty.Box 94 87 -7.4% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 140 172 +22.9% 0.81x
UTF8Decode_InitFromCustom_contiguous 141 169 +19.9% 0.83x
UTF8Decode_InitFromCustom_noncontiguous 260 291 +11.9% 0.89x (?)
ArrayAppendUTF16Substring 32580 35280 +8.3% 0.92x (?)
StringHashing_fastPrenormal 620 670 +8.1% 0.93x (?)
ArrayAppendAsciiSubstring 32580 35136 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 2737 20 -99.3% 136.84x
EqualSubstringSubstring 29 22 -24.1% 1.32x
LessSubstringSubstring 29 22 -24.1% 1.32x
EqualStringSubstring 29 22 -24.1% 1.32x
EqualSubstringSubstringGenericEquatable 29 22 -24.1% 1.32x
EqualSubstringString 29 22 -24.1% 1.32x
LessSubstringSubstringGenericComparable 29 22 -24.1% 1.32x
Set.isSuperset.Seq.Int.Empty 95 84 -11.6% 1.13x (?)
FlattenListLoop 1548 1386 -10.5% 1.12x (?)
StringComparison_longSharedPrefix 379 341 -10.0% 1.11x (?)
Data.hash.Medium 33 30 -9.1% 1.10x (?)
Set.isDisjoint.Seq.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Empty.Int 92 84 -8.7% 1.10x (?)
DropFirstAnySequenceLazy 1768 1643 -7.1% 1.08x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 149 180 +20.8% 0.83x
UTF8Decode_InitFromCustom_contiguous 154 182 +18.2% 0.85x
ConvertFloatingPoint.GenericDoubleToDouble 1143 1280 +12.0% 0.89x (?)
ArrayOfPOD 652 726 +11.3% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 4450 2250 -49.4% 1.98x
EqualSubstringSubstringGenericEquatable 33 26 -21.2% 1.27x
LessSubstringSubstringGenericComparable 33 26 -21.2% 1.27x
EqualSubstringSubstring 34 27 -20.6% 1.26x
LessSubstringSubstring 34 27 -20.6% 1.26x
EqualStringSubstring 34 27 -20.6% 1.26x (?)
EqualSubstringString 34 28 -17.6% 1.21x
Set.subtracting.Seq.Empty.Box 1157 1031 -10.9% 1.12x (?)
Set.subtracting.Empty.Box 221 197 -10.9% 1.12x (?)
StringAdder 426 381 -10.6% 1.12x (?)
ArrayAppendAscii 25874 23290 -10.0% 1.11x (?)
Set.isDisjoint.Empty.Box 1074 973 -9.4% 1.10x (?)
ArrayAppendLatin1 25704 23290 -9.4% 1.10x (?)
Set.isDisjoint.Box.Empty 1181 1074 -9.1% 1.10x (?)
Set.isDisjoint.Seq.Empty.Box 914 832 -9.0% 1.10x (?)
Set.subtracting.Box.Empty 246 225 -8.5% 1.09x (?)
ArrayAppendAsciiSubstring 47592 43596 -8.4% 1.09x (?)
SetIsSubsetBox0 1233 1131 -8.3% 1.09x (?)
Set.isDisjoint.Seq.Box.Empty 1128 1035 -8.2% 1.09x (?)
NSDictionaryCastToSwift 3460 3190 -7.8% 1.08x (?)
ArrayPlusEqualFiveElementCollection 176601 164724 -6.7% 1.07x (?)
ArrayValueProp4 5203 4860 -6.6% 1.07x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@xwu
Collaborator Author

xwu commented Sep 6, 2020

Wow. I did not expect that. Wow.

@stephentyrone
Contributor

stephentyrone commented Sep 7, 2020

As written, the approach is sound.

The main question I have is whether it makes sense to simply use an unsafeBitCast, which would eliminate some tests from the fast path (and eliminate potential slow paths entirely). That's formally unsound, but in a way that's fundamentally uninteresting and unlikely to cause problems. In particular, we could tighten the semantic requirements of BinaryFloatingPoint to specify the layout, and make the problem go away.
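For comparison, a rough sketch of what the `unsafeBitCast` alternative could look like. The helper name `bitCastToDouble` and the explicit size check are assumptions added for illustration. As noted above, `BinaryFloatingPoint` does not currently guarantee the in-memory layout, so this is formally unsound for arbitrary conforming types:

```swift
// Hypothetical helper for illustration only; not what the PR implements.
func bitCastToDouble<T: BinaryFloatingPoint>(_ value: T) -> Double? {
  // Matching encoding parameters plus matching size is taken as a proxy
  // for "same layout". The protocol itself does not promise this.
  guard T.exponentBitCount == Double.exponentBitCount,
        T.significandBitCount == Double.significandBitCount,
        MemoryLayout<T>.size == MemoryLayout<Double>.size else {
    return nil
  }
  // Reinterprets the bits without inspecting sign, exponent, or significand.
  return unsafeBitCast(value, to: Double.self)
}
```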

Also, would that then allow us to get rid of the switch added in the previous change?

@xwu
Collaborator Author

xwu commented Sep 7, 2020

I think the unsafeBitCast would certainly be a practical approach, but unless my eyes deceive me, the compiler is doing a fantastic job with this safe code. I'm relieved that the approach here is sound.

I have already refactored to merge both switches into one that matches on encoding parameters only, which might actually be faster. With the previous change, there appeared to be some difference between concrete and generic Double-to-Double conversion. I would hypothesize that it arises from not being able to optimize away the first dynamic check of whether the value can be cast to Float, though I haven't inspected the generated code. [edit: apparently due to the issue outlined in #33839, though how this particular refactoring here sidesteps the optimizer issue is not clear to me]
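A single switch on encoding parameters might look roughly like the following. This is a sketch under assumptions: the function name `fastPathToDouble` is hypothetical, only the binary32 and binary64 cases are shown, and the real change targets `Self` rather than `Double`:

```swift
// Hypothetical sketch; the actual standard library code differs.
func fastPathToDouble<T: BinaryFloatingPoint>(_ value: T) -> Double? {
  // Dispatch on (exponent bit count, stored significand bit count) only.
  switch (T.exponentBitCount, T.significandBitCount) {
  case (8, 23):
    // Same parameters as Float (binary32): rebuild a Float, then widen.
    let f = Float(
      sign: value.sign,
      exponentBitPattern: UInt(truncatingIfNeeded: value.exponentBitPattern),
      significandBitPattern: UInt32(truncatingIfNeeded: value.significandBitPattern))
    return Double(f)
  case (11, 52):
    // Same parameters as Double (binary64): rebuild a Double directly.
    return Double(
      sign: value.sign,
      exponentBitPattern: UInt(truncatingIfNeeded: value.exponentBitPattern),
      significandBitPattern: UInt64(truncatingIfNeeded: value.significandBitPattern))
  default:
    // No matching standard type: the caller uses the generic slow path.
    return nil
  }
}
```

Because the switch subject depends only on static properties of `T`, a specializing optimizer can in principle resolve it at compile time for each concrete type.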

I've changed the fast path condition from value.isFinite to !value.isNaN, since the IEEE 754 specification also fixes the encoding of infinities.
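The following snippet illustrates why infinities qualify while NaNs need separate thought, using `Double` as the example type. The predicate name `usesFastPath` is hypothetical and only mirrors the condition described above:

```swift
// Hypothetical predicate mirroring the fast-path condition in this comment.
func usesFastPath<T: BinaryFloatingPoint>(_ value: T) -> Bool {
  return !value.isNaN
}

// Infinity has exactly one encoding per sign:
// exponent field all ones (0x7FF for binary64), significand field zero.
let inf = Double.infinity
let infExponentAllOnes = inf.exponentBitPattern == 0x7FF
let infSignificandZero = inf.significandBitPattern == 0

// NaN shares the all-ones exponent but has many possible significand fields,
// for example the quiet and signaling NaN constants.
let nanEncodingsDiffer =
  Double.nan.significandBitPattern != Double.signalingNaN.significandBitPattern
```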

I'll run the benchmarks again to see where this all shakes out.

}
}
self = Self._convert(from: value).value
#endif
case (8, 7):

This comment was marked as outdated.

@xwu xwu force-pushed the float-like-a-butterfly branch from e32aae2 to f560ecf Compare September 7, 2020 15:01
@xwu
Collaborator Author

xwu commented Sep 7, 2020

@swift-ci test

@xwu
Collaborator Author

xwu commented Sep 7, 2020

@swift-ci benchmark

@swift-ci
Contributor

swift-ci commented Sep 7, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
AngryPhonebook.ASCII2 110 142 +29.1% 0.77x
UTF8Decode_InitFromCustom_contiguous 141 171 +21.3% 0.82x
UTF8Decode_InitDecoding 141 170 +20.6% 0.83x
ParseInt.IntSmall.Decimal 263 285 +8.4% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 2726 19 -99.3% 143.47x
ConvertFloatingPoint.GenericDoubleToDouble 59 15 -74.6% 3.93x
EqualSubstringSubstring 29 22 -24.1% 1.32x
LessSubstringSubstring 29 22 -24.1% 1.32x (?)
EqualStringSubstring 29 22 -24.1% 1.32x
EqualSubstringSubstringGenericEquatable 29 22 -24.1% 1.32x
EqualSubstringString 29 22 -24.1% 1.32x
LessSubstringSubstringGenericComparable 29 22 -24.1% 1.32x
StringComparison_longSharedPrefix 378 342 -9.5% 1.11x (?)
FlattenListLoop 1022 933 -8.7% 1.10x (?)
Set.isDisjoint.Seq.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Empty.Int 92 84 -8.7% 1.10x (?)
ArrayAppendUTF16Substring 33552 30780 -8.3% 1.09x (?)
Set.isSubset.Seq.Empty.Int 87 80 -8.0% 1.09x (?)
Set.isDisjoint.Empty.Box 94 87 -7.4% 1.08x (?)
Set.isSuperset.Seq.Int.Empty 94 87 -7.4% 1.08x (?)
Set.isDisjoint.Seq.Empty.Box 94 87 -7.4% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
StringFromLongWholeSubstringGeneric 3 4 +33.3% 0.75x
AngryPhonebook.ASCII2 110 142 +29.1% 0.77x
UTF8Decode_InitFromCustom_contiguous 142 170 +19.7% 0.84x (?)
UTF8Decode_InitDecoding 142 169 +19.0% 0.84x (?)
UTF8Decode_InitFromCustom_noncontiguous 262 290 +10.7% 0.90x (?)
ArrayAppendUTF16Substring 32580 35388 +8.6% 0.92x (?)
ArrayAppendLatin1Substring 33300 36036 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 2730 19 -99.3% 143.68x
ConvertFloatingPoint.GenericDoubleToDouble 59 15 -74.6% 3.93x
EqualSubstringSubstring 29 22 -24.1% 1.32x
LessSubstringSubstring 29 22 -24.1% 1.32x
EqualStringSubstring 29 22 -24.1% 1.32x
EqualSubstringSubstringGenericEquatable 29 22 -24.1% 1.32x
EqualSubstringString 29 22 -24.1% 1.32x
LessSubstringSubstringGenericComparable 29 22 -24.1% 1.32x
StringComparison_longSharedPrefix 379 340 -10.3% 1.11x (?)
Data.hash.Medium 33 30 -9.1% 1.10x (?)
Set.isDisjoint.Seq.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Empty.Box 95 88 -7.4% 1.08x (?)
Set.isSuperset.Seq.Int.Empty 95 88 -7.4% 1.08x (?)
Set.isStrictSubset.Int.Empty 56 52 -7.1% 1.08x (?)
PrefixAnySequenceLazy 1327 1233 -7.1% 1.08x (?)
PrefixWhileAnySequenceLazy 1328 1234 -7.1% 1.08x (?)
DropFirstAnySequenceLazy 1768 1643 -7.1% 1.08x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
AngryPhonebook.ASCII2 110 142 +29.1% 0.77x
UTF8Decode_InitFromCustom_contiguous 155 185 +19.4% 0.84x
UTF8Decode_InitDecoding 151 180 +19.2% 0.84x
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 4443 1701 -61.7% 2.61x
LessSubstringSubstring 35 27 -22.9% 1.30x
EqualSubstringSubstringGenericEquatable 33 26 -21.2% 1.27x
LessSubstringSubstringGenericComparable 33 26 -21.2% 1.27x
EqualSubstringSubstring 34 27 -20.6% 1.26x
EqualStringSubstring 35 28 -20.0% 1.25x
EqualSubstringString 35 28 -20.0% 1.25x
ArrayAppendAsciiSubstring 47772 42444 -11.2% 1.13x (?)
ArrayAppendLatin1Substring 48204 42876 -11.1% 1.12x (?)
Set.subtracting.Seq.Empty.Box 1157 1032 -10.8% 1.12x
Set.isDisjoint.Seq.Empty.Box 915 819 -10.5% 1.12x
Set.subtracting.Empty.Box 222 200 -9.9% 1.11x
ArrayAppendAscii 25840 23290 -9.9% 1.11x (?)
DropFirstAnySequenceLazy 9461 8541 -9.7% 1.11x (?)
StringWalk 3320 3000 -9.6% 1.11x (?)
ArrayAppendLatin1 25738 23290 -9.5% 1.11x (?)
Set.subtracting.Box.Empty 247 225 -8.9% 1.10x
Set.isDisjoint.Box.Empty 1182 1078 -8.8% 1.10x (?)
Set.isDisjoint.Empty.Box 1073 979 -8.8% 1.10x (?)
Set.isDisjoint.Seq.Box.Empty 1124 1026 -8.7% 1.10x (?)
Set.subtracting.Seq.Empty.Int 554 508 -8.3% 1.09x (?)
StringAdder 423 389 -8.0% 1.09x (?)
SetIsSubsetBox0 1232 1135 -7.9% 1.09x (?)
StackPromo 75200 69300 -7.8% 1.09x (?)
DropWhileAnySequenceLazy 12352 11415 -7.6% 1.08x (?)
PrefixSequence 6959 6486 -6.8% 1.07x (?)

Code size: -swiftlibs

@xwu
Collaborator Author

xwu commented Sep 8, 2020

// input.swift
func convert<
  T: BinaryFloatingPoint, U: BinaryFloatingPoint
>(_ value: T, to: U.Type) -> U {
  U(value)
}

func testDoubleToDouble(_ x: Double) -> Double {
  return convert(x, to: Double.self)
}

func testDoubleToFloat(_ x: Double) -> Float {
  return convert(x, to: Float.self)
}

The SIL generated for conversion between known stdlib types is perfect:

// testDoubleToDouble(_:)
sil hidden @$s5input012testDoubleToC0yS2dF : $@convention(thin) (Double) -> Double {
// %0 "x"                                         // users: %2, %3, %1
bb0(%0 : $Double):
  debug_value %0 : $Double, let, name "x", argno 1 // id: %1
  debug_value %0 : $Double, let, name "value", argno 1 // id: %2
  return %0 : $Double                             // id: %3
} // end sil function '$s5input012testDoubleToC0yS2dF'

// testDoubleToFloat(_:)
sil hidden @$s5input17testDoubleToFloatySfSdF : $@convention(thin) (Double) -> Float {
// %0 "x"                                         // users: %2, %3, %1
bb0(%0 : $Double):
  debug_value %0 : $Double, let, name "x", argno 1 // id: %1
  debug_value %0 : $Double, let, name "value", argno 1 // id: %2
  %3 = struct_extract %0 : $Double, #Double._value // user: %4
  %4 = builtin "fptrunc_FPIEEE64_FPIEEE32"(%3 : $Builtin.FPIEEE64) : $Builtin.FPIEEE32 // user: %5
  %5 = struct $Float (%4 : $Builtin.FPIEEE32)     // user: %6
  return %5 : $Float                              // id: %6
} // end sil function '$s5input17testDoubleToFloatySfSdF'

...for MockFloat64, not so much.

@stephentyrone
Contributor

Can we factor the !isNaN out of the switch?

@xwu
Collaborator Author

xwu commented Sep 8, 2020

@stephentyrone Yeah, I'm not totally satisfied with the control flow here, so I'm going to try something.

I don't think there's any harm--even theoretical--in preserving the encoding of any NaN values as-is between two types with the same exponent and significand bit width; there is the Float/Double/Float80-to-Self conversion afterwards in any case that will do its thing. Therefore, I'm going to delete the check for NaN and refactor accordingly.
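As a small illustration of what "preserving the encoding as-is" means, the sketch below copies the raw fields of a quiet NaN with a payload and checks that the bit pattern survives. The helper name `rebuild` and the payload value `0x1234` are arbitrary choices for this example:

```swift
// Hypothetical helper: reassembles a Double from its raw fields,
// with no special handling for NaN.
func rebuild(_ value: Double) -> Double {
  return Double(
    sign: value.sign,
    exponentBitPattern: value.exponentBitPattern,
    significandBitPattern: value.significandBitPattern)
}

// A quiet NaN carrying an arbitrary payload.
let payloadNaN = Double(nan: 0x1234, signaling: false)
let copy = rebuild(payloadNaN)

// The copy is still NaN and carries the same sign, quiet bit, and payload.
let nanPreserved = copy.isNaN && copy.bitPattern == payloadNaN.bitPattern
```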

It makes the whole implementation a lot more concise. I'll have to see if it affects code generation though.

@xwu xwu force-pushed the float-like-a-butterfly branch from f560ecf to 643834e Compare September 8, 2020 16:28
@xwu
Collaborator Author

xwu commented Sep 8, 2020

@swift-ci test

@xwu
Collaborator Author

xwu commented Sep 8, 2020

@swift-ci benchmark

@xwu
Collaborator Author

xwu commented Sep 8, 2020

> It makes the whole implementation a lot more concise. I'll have to see if it affects code generation though.

Code generation for standard library types is good. @swift-ci is being a little recalcitrant today.

@stephentyrone
Contributor

@swift-ci benchmark

@stephentyrone
Contributor

@swift-ci test

@stephentyrone
Contributor

@xwu CI outage. Should be restored now.

@swift-ci
Contributor

swift-ci commented Sep 8, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
AngryPhonebook.ASCII2 109 144 +32.1% 0.76x
EqualSubstringSubstring 22 29 +31.8% 0.76x
LessSubstringSubstring 22 29 +31.8% 0.76x (?)
EqualSubstringSubstringGenericEquatable 22 29 +31.8% 0.76x
EqualSubstringString 22 29 +31.8% 0.76x
LessSubstringSubstringGenericComparable 22 29 +31.8% 0.76x
EqualStringSubstring 23 29 +26.1% 0.79x
NopDeinit 8800 9900 +12.5% 0.89x
StringComparison_longSharedPrefix 341 379 +11.1% 0.90x (?)
ObjectiveCBridgeStringHash 82 89 +8.5% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 2707 16 -99.4% 169.18x
Dictionary4 194 149 -23.2% 1.30x
Dictionary4OfObjects 223 184 -17.5% 1.21x
Set.isDisjoint.Int.Empty 59 53 -10.2% 1.11x (?)
Set.isDisjoint.Seq.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Empty.Int 92 84 -8.7% 1.10x
StringHashing_latin1 230 212 -7.8% 1.08x (?)
NSDictionaryCastToSwift 1440 1330 -7.6% 1.08x (?)
StringHashing_fastPrenormal 660 610 -7.6% 1.08x (?)
Set.isSuperset.Seq.Int.Empty 94 87 -7.4% 1.08x
Set.isStrictSubset.Empty.Int 139 129 -7.2% 1.08x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
AngryPhonebook.ASCII2 109 144 +32.1% 0.76x
EqualSubstringSubstring 22 29 +31.8% 0.76x
LessSubstringSubstring 22 29 +31.8% 0.76x
EqualStringSubstring 22 29 +31.8% 0.76x (?)
EqualSubstringSubstringGenericEquatable 22 29 +31.8% 0.76x
EqualSubstringString 22 29 +31.8% 0.76x
LessSubstringSubstringGenericComparable 22 29 +31.8% 0.76x
StringComparison_longSharedPrefix 341 377 +10.6% 0.90x (?)
ObjectiveCBridgeStringHash 82 89 +8.5% 0.92x (?)
NSStringConversion.Long 612 662 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 2706 16 -99.4% 169.11x
CharacterLiteralsLarge 71 63 -11.3% 1.13x (?)
ArrayLiteral2 116 104 -10.3% 1.12x (?)
StrComplexWalk 3280 2960 -9.8% 1.11x (?)
Set.isSubset.Seq.Empty.Int 88 80 -9.1% 1.10x (?)
PrefixWhileSequence 234 213 -9.0% 1.10x (?)
Set.isDisjoint.Seq.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Empty.Int 92 84 -8.7% 1.10x (?)
Set.isDisjoint.Int.Empty 61 56 -8.2% 1.09x (?)
StringHashing_latin1 230 212 -7.8% 1.08x (?)
NormalizedIterator_fastPrenormal 790 730 -7.6% 1.08x (?)
StringHashing_fastPrenormal 660 610 -7.6% 1.08x (?)
Set.isDisjoint.Empty.Box 95 88 -7.4% 1.08x (?)
Set.isSuperset.Seq.Int.Empty 95 88 -7.4% 1.08x (?)
StringWalk 1640 1520 -7.3% 1.08x (?)
Set.isStrictSubset.Int.Empty 56 52 -7.1% 1.08x (?)
Set.isDisjoint.Seq.Int100 128 119 -7.0% 1.08x (?)
NormalizedIterator_latin1 270 252 -6.7% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
AngryPhonebook.ASCII2 109 144 +32.1% 0.76x
EqualStringSubstring 27 35 +29.6% 0.77x
EqualSubstringSubstring 27 34 +25.9% 0.79x
LessSubstringSubstring 27 34 +25.9% 0.79x
EqualSubstringSubstringGenericEquatable 27 33 +22.2% 0.82x
LessSubstringSubstringGenericComparable 27 33 +22.2% 0.82x (?)
EqualSubstringString 28 34 +21.4% 0.82x
StringWalk 3000 3360 +12.0% 0.89x (?)
 
Improvement OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToDouble 4524 1608 -64.5% 2.81x
ArrayAppendAsciiSubstring 50400 44136 -12.4% 1.14x (?)
ArrayAppendLatin1Substring 50832 44604 -12.3% 1.14x (?)
ArrayAppendUTF16Substring 50292 44460 -11.6% 1.13x (?)
DropWhileSequenceLazy 12237 10863 -11.2% 1.13x (?)
Set.subtracting.Empty.Box 224 201 -10.3% 1.11x (?)
ArrayAppendAscii 25772 23222 -9.9% 1.11x (?)
SequenceAlgosUnfoldSequence 5790 5240 -9.5% 1.10x (?)
NSDictionaryCastToSwift 3490 3160 -9.5% 1.10x (?)
Set.isDisjoint.Seq.Empty.Box 926 840 -9.3% 1.10x (?)
Set.isDisjoint.Box.Empty 1182 1076 -9.0% 1.10x (?)
ArrayAppendLatin1 25568 23290 -8.9% 1.10x (?)
Set.subtracting.Seq.Empty.Box 1153 1051 -8.8% 1.10x (?)
DropWhileSequence 12360 11280 -8.7% 1.10x (?)
Combos 2040 1866 -8.5% 1.09x (?)
Set.isDisjoint.Seq.Box.Empty 1119 1027 -8.2% 1.09x (?)
Set.isDisjoint.Empty.Box 1086 1002 -7.7% 1.08x (?)
StringMatch 50000 46200 -7.6% 1.08x (?)
DropWhileAnySequenceLazy 12046 11145 -7.5% 1.08x (?)
ArrayInitFromSlice 616 574 -6.8% 1.07x (?)
PrefixWhileSequenceLazy 10275 9602 -6.5% 1.07x (?)

Code size: -swiftlibs

@xwu
Collaborator Author

xwu commented Sep 8, 2020

@swift-ci test Windows platform

@xwu
Collaborator Author

xwu commented Sep 8, 2020

@stephentyrone Thoughts on the latest (and simplest) iteration?

@stephentyrone
Contributor

Seems reasonable to me. I would like to have a cleaner solution, but this is an acceptable stopgap.

We still have to do due diligence on the regressions. I'm fairly certain that they're just noise, but it would be good to disassemble the worst few and make sure there isn't a real problem somehow caused by this change (I can help with this).

@xwu
Collaborator Author

xwu commented Sep 8, 2020

Sure, your help with that would be great. The String/Substring tests oscillate between 22 and 29 μs (see earlier iterations of the benchmarks), but it would be worth confirming that we're not missing anything.

@xwu
Collaborator Author

xwu commented Sep 9, 2020

@stephentyrone Disassembly of AngryPhonebook.o, NopDeinit.o, and Substring.o (optimized x86_64) is identical before and after applying this patch. (As a positive control, disassembly of FloatingPointConversion.o is very different.)

@xwu
Collaborator Author

xwu commented Sep 9, 2020

@swift-ci test Windows platform
