Skip to content

[DO NOT MERGE] Inline data and dataprotocol #21165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

[DO NOT MERGE] Inline data and dataprotocol #21165

wants to merge 4 commits into from

Conversation

atrick
Copy link
Contributor

@atrick atrick commented Dec 10, 2018

This adds three more optimizer fixes, in addition to the one that was already committed, in order to fix the performance of the generic Data initializer's dynamic customization.

phausler and others added 2 commits December 7, 2018 13:57
Previously the cast optimizer bailed out on any conformance with
requirements.

We can now constant-propagate this:

```
protocol P {}
struct S<E> {
  var e: E
}

extension S : P where E == Int {}

func specializeMe<T>(_ t: T) {
  if let p = t as? P {
    // do fast things.
  }
}

specializeMe(S(e: 0))
```

This turns out to be as simple as calling the TypeChecker.

<rdar://problem/46375150> Inlining does not seem to handle
specialization properly for Data.
@atrick
Copy link
Contributor Author

atrick commented Dec 10, 2018

@swift-ci benchmark.

@atrick
Copy link
Contributor Author

atrick commented Dec 10, 2018

@swift-ci test source compatibility.

@atrick
Copy link
Contributor Author

atrick commented Dec 10, 2018

@swift-ci test.

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
DataSetCountMedium 464 6219 +1240.3% 0.07x
ObjectiveCBridgeStubDataAppend 5315 40536 +662.7% 0.13x
StringToDataMedium 2507 9323 +271.9% 0.27x
StringToDataSmall 2430 5450 +124.3% 0.45x
StringToDataEmpty 2423 5248 +116.6% 0.46x
DataCreateMedium 15616 17869 +14.4% 0.87x
Improvement
DataCreateEmpty 21345 179 -99.2% 119.25x
DataAppendBytesSmall 4366 247 -94.3% 17.68x
DataAccessBytesMedium 1021 66 -93.5% 15.47x
DataAccessBytesSmall 1022 79 -92.3% 12.94x
DataSubscriptSmall 192 30 -84.4% 6.40x
DataCreateEmptyArray 24194 3886 -83.9% 6.23x
DataCreateSmallArray 24692 4589 -81.4% 5.38x
DataSetCountSmall 446 107 -76.0% 4.17x
DataMutateBytesSmall 3747 977 -73.9% 3.84x
DataCopyBytesSmall 264 82 -68.9% 3.22x
DataSubscriptMedium 210 69 -67.1% 3.04x
DataReplaceSmall 5464 3058 -44.0% 1.79x
DataReplaceLarge 33344 19462 -41.6% 1.71x
DataAppendDataLargeToMedium 32126 18983 -40.9% 1.69x
DataAppendDataLargeToSmall 31036 18344 -40.9% 1.69x
DataCountSmall 33 20 -39.4% 1.65x
DataCreateSmall 85156 53851 -36.8% 1.58x
DataAppendDataMediumToSmall 5958 3841 -35.5% 1.55x
DataAppendDataMediumToMedium 6557 4397 -32.9% 1.49x
DataAppendDataSmallToMedium 5728 4165 -27.3% 1.38x
DataReplaceMedium 6962 5103 -26.7% 1.36x
DataAppendDataSmallToSmall 5386 4175 -22.5% 1.29x
DataReplaceMediumBuffer 12304 10046 -18.4% 1.22x
DataReplaceSmallBuffer 9595 8140 -15.2% 1.18x
DataToStringEmpty 2726 2344 -14.0% 1.16x
CharacterLiteralsLarge 97 87 -10.3% 1.11x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DataBenchmarks.o 55956 88959 +59.0% 0.63x
ObjectiveCBridgingStubs.o 19363 27763 +43.4% 0.70x
IterateData.o 1797 2119 +17.9% 0.85x
UTF8Decode.o 11514 12470 +8.3% 0.92x
Improvement
Codable.o 36479 27091 -25.7% 1.35x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
DataSetCountMedium 633 6284 +892.7% 0.10x
ObjectiveCBridgeStubDataAppend 6428 36656 +470.3% 0.18x
StringToDataMedium 2402 8059 +235.5% 0.30x
StringToDataSmall 2340 5356 +128.9% 0.44x
StringToDataEmpty 2331 5187 +122.5% 0.45x
ObjectiveCBridgeStubToNSStringRef 98 118 +20.4% 0.83x (?)
DataCreateMedium 14881 17800 +19.6% 0.84x
Improvement
DataCreateEmpty 24177 179 -99.3% 135.07x
DataAppendBytesSmall 4550 251 -94.5% 18.13x
DataAccessBytesMedium 1027 84 -91.8% 12.23x
DataAccessBytesSmall 1027 89 -91.3% 11.54x
DataSubscriptSmall 284 30 -89.4% 9.47x
DataCreateEmptyArray 26860 3891 -85.5% 6.90x
DataSetCountSmall 636 110 -82.7% 5.78x
DataCreateSmallArray 27835 5359 -80.7% 5.19x
DataSubscriptMedium 284 66 -76.8% 4.30x
DataMutateBytesSmall 3728 990 -73.4% 3.77x
DataCopyBytesSmall 264 82 -68.9% 3.22x
DataReplaceSmall 5346 2886 -46.0% 1.85x
DataReplaceLarge 33891 18964 -44.0% 1.79x
DataAppendDataLargeToSmall 31492 18182 -42.3% 1.73x
DataAppendDataLargeToMedium 31827 19045 -40.2% 1.67x
DataCreateSmall 87936 53622 -39.0% 1.64x
DataAppendDataMediumToSmall 6012 3847 -36.0% 1.56x
DataReplaceMedium 7119 4582 -35.6% 1.55x
DataAppendDataSmallToSmall 6186 4182 -32.4% 1.48x
DataAppendDataSmallToMedium 6011 4277 -28.8% 1.41x
DataAppendDataMediumToMedium 6173 4466 -27.7% 1.38x
DataReplaceMediumBuffer 12431 9285 -25.3% 1.34x
DataReplaceSmallBuffer 9653 7372 -23.6% 1.31x
DataCountSmall 30 23 -23.3% 1.30x
CharacterLiteralsLarge 100 89 -11.0% 1.12x
DataCountMedium 33 30 -9.1% 1.10x
CharacterLiteralsSmall 310 289 -6.8% 1.07x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
IterateData.o 1989 2223 +11.8% 0.89x
Improvement
DataBenchmarks.o 51029 36243 -29.0% 1.41x
Codable.o 34855 27339 -21.6% 1.27x
UTF8Decode.o 11057 10822 -2.1% 1.02x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DataCreateEmptyArray 40543 596025 +1370.1% 0.07x
DataSetCountMedium 541 6272 +1059.3% 0.09x
DataCreateMediumArray 6830 67216 +884.1% 0.10x
DataCreateSmallArray 63897 628218 +883.2% 0.10x
ObjectiveCBridgeStubDataAppend 5601 35980 +542.4% 0.16x
StringToDataMedium 3184 10059 +215.9% 0.32x
StringToDataSmall 2992 6758 +125.9% 0.44x
StringToDataEmpty 2989 6573 +119.9% 0.45x
DataCreateMedium 14927 18808 +26.0% 0.79x
Improvement
DataCreateEmpty 21684 1051 -95.2% 20.63x
DataAppendBytesSmall 3795 333 -91.2% 11.40x
DataSubscriptMedium 397 92 -76.8% 4.32x
DataSubscriptSmall 397 105 -73.6% 3.78x
DataSetCountSmall 472 161 -65.9% 2.93x
DataCountSmall 200 82 -59.0% 2.44x
DataMutateBytesSmall 5104 2119 -58.5% 2.41x
DataCountMedium 200 84 -58.0% 2.38x
DataAccessBytesMedium 2112 888 -58.0% 2.38x
DataAccessBytesSmall 2112 928 -56.1% 2.28x
DataReplaceLarge 35165 18724 -46.8% 1.88x
DataReplaceSmall 5207 2927 -43.8% 1.78x
DataAppendDataLargeToMedium 32293 18585 -42.4% 1.74x
DataAppendDataLargeToSmall 30810 18427 -40.2% 1.67x
DataReplaceMedium 7594 4630 -39.0% 1.64x
DataCopyBytesSmall 305 200 -34.4% 1.52x
DataAppendDataMediumToSmall 5641 3819 -32.3% 1.48x
DataCreateSmall 89939 65783 -26.9% 1.37x
DataAppendDataSmallToMedium 5540 4231 -23.6% 1.31x
DataAppendDataMediumToMedium 5991 4672 -22.0% 1.28x
DataAppendDataSmallToSmall 5148 4464 -13.3% 1.15x

Code size: -swiftlibs

TEST OLD NEW DELTA RATIO
Regression
libswiftFoundation.dylib 1515520 1671168 +10.3% 0.91x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB
--------------

@atrick
Copy link
Contributor Author

atrick commented Dec 10, 2018

@itaiferber Let me know if the commits on this branch solve the immediate
performance problems that you're blocked on.

I still see the following regressions:

------- Performance: -Osize -------

| TEST                              |   OLD |   NEW |   DELTA | RATIO     |
|                                   |       |       |         |           |
| **Regression**                    |       |       |         |           |
| DataSetCountMedium                |   633 |  6284 | +892.7% | **0.10x** |
| ObjectiveCBridgeStubDataAppend    |  6428 | 36656 | +470.3% | **0.18x** |
| StringToDataMedium                |  2402 |  8059 | +235.5% | **0.30x** |
| StringToDataSmall                 |  2340 |  5356 | +128.9% | **0.44x** |
| StringToDataEmpty                 |  2331 |  5187 | +122.5% | **0.45x** |
| ObjectiveCBridgeStubToNSStringRef |    98 |   118 |  +20.4% | **0.83x   |
| DataCreateMedium                  | 14881 | 17800 |  +19.6% | **0.84x** |

I haven't looked at any individually. Are these expected regressions?

I'm actually surprised that DataCreateMedium still lags, because the
Sequence initializer should be mostly optimized away now. Hopefully,
this is at least within the range that we can accept until the optimizer
has more time to mature.

Note that I see a ton of code being inlined now from Data's Sequence
initializer. If there's code that probably won't benefit from
inlining, that should be moved behind an @inline(never) attribute.

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 598de228948514b27ad49a72b7d4bb06fb4c3e8f

@itaiferber
Copy link
Contributor

@atrick Thanks so much for putting in the effort on this stuff! Our concerns about CreateArrayEmpty and CreateArrayMedium appear to be resolved here, but I'm really concerned about what's happening in DataCreateMedium, and more so, the StringToData tests. I'll dig into these today to see if we can figure out what's going on, and if there's anything we can do to work around these issues at the implementation level

@itaiferber
Copy link
Contributor

@atrick Upon reflection, it's entirely possible that we haven't conformed String.UTF8View to the necessary protocols to make this efficient; looking right now

@itaiferber
Copy link
Contributor

@airspeedswift @atrick @milseman It looks like we're likely hitting this because String.UTF8View doesn't appear to have withContiguousStorageIfAvailable, so we're falling into byte-wise copying. Is this expected, or should String.UTF8View expose the underlying buffer?

@milseman
Copy link
Member

IMO, we should fix String.UTF8View to expose this

@itaiferber
Copy link
Contributor

@milseman @airspeedswift Should I file a Radar or should we track this in 30541077?

@itaiferber
Copy link
Contributor

@atrick Otherwise these changes are great and unblock us!

@milseman
Copy link
Member

Wherever, I'll try to push it soon.

@milseman
Copy link
Member

#21178

@itaiferber
Copy link
Contributor

@milseman I just filed 46601459 so we can track this appropriately internally too. Thanks for the quick turnaround!

This allows expensive dynamic runtime checks like this:

  unconditional_checked_cast_addr Array<UInt8> in %array : $*Array<UInt8> to ContiguousBytes in %protocol : $*ContiguousBytes

To be converted into:

  %value = init_existential_addr %existential : $*ContiguousBytes, $Array<UInt8>
  store %array to %value : $*Array<UInt8>
Handle the unconditional_checked_cast case.

This involved a major cleanup/rewrite of the utility for analyzing
existential types.

Rewrite potentially fixes some latent bugs in the
ExistentialSpecializer that I found by code inspection.

There still appears to be a bug where it bit-casts an opened
existential to a concrete type, but I believe that code path is
disabled.
@atrick atrick closed this Dec 22, 2018
@atrick atrick deleted the inline_data_and_dataprotocol branch December 22, 2018 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants