[WIP][stdlib] Minor Sequence Optimizations #20758

palimondo · 2018-11-26T17:30:19Z

Following up on discussion from #20221 (review), these are minor tweaks to Sequence implementation:

In-place rotation of ringBuffer used in suffix, avoiding new array copy.
Internal use of ContiguousArray in dropLast, prefix and suffix (modeled after map and _filter implementations).

Rotate the ringBuffer in-place instead of copying to new array for lower memory footprint.

Nano-optimization: Following the template used in methods `map` and `_filter`, internally use ContiguousArray before converting to Array on return.

palimondo · 2018-11-26T17:36:42Z

When measuring locally, I saw no performance difference. But I don't think our benchmarks, slicing only short sequences of Ints, cover the affected area very thoroughly…

@airspeedswift Can this be accepted only on theoretical memory-saving grounds? Or do we need to devise better benchmark that showcases the tradeoffs between in-place rotation and array copy?

palimondo · 2018-12-08T21:26:39Z

@swift-ci please benchmark

palimondo · 2018-12-08T21:27:03Z

@swift-ci please smoke test

palimondo · 2018-12-08T21:27:42Z

@airspeedswift Please review 🙏

palimondo · 2018-12-10T21:39:47Z

Hmm… I don't recall seeing any regressions with SuffixSequence when I ran this locally 2 weeks ago. 🤨

palimondo · 2018-12-10T21:42:05Z

@airspeedswift Any idea why that ^^ might be?

palimondo · 2018-12-18T20:35:28Z

Let's try again…
@swift-ci please benchmark

swift-ci · 2018-12-18T21:28:36Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
SuffixSequenceLazy	569	2242	+294.0%	0.25x
SuffixAnySequence	570	2236	+292.3%	0.25x
SuffixSequence	569	2147	+277.3%	0.27x
DictionaryOfAnyHashableStrings_insert	5164	5918	+14.6%	0.87x
Improvement
PrefixWhileAnySeqCntRange	362	323	-10.8%	1.12x
PrefixWhileAnySeqCRangeIter	346	323	-6.6%	1.07x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
Suffix.o	25873	37761	+45.9%	0.69x
Improvement
PrefixWhile.o	22878	21422	-6.4%	1.07x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
SuffixSequenceLazy	948	2541	+168.0%	0.37x
SuffixAnySequence	948	2502	+163.9%	0.38x
SuffixSequence	948	2411	+154.3%	0.39x
PrefixWhileAnySeqCntRangeLazy	159	176	+10.7%	0.90x
SuffixCountableRangeLazy	11	12	+9.1%	0.92x
Improvement
PrefixWhileAnyCollectionLazy	176	160	-9.1%	1.10x
DropLastCountableRangeLazy	12	11	-8.3%	1.09x
PrefixWhileAnyCollection	253	235	-7.1%	1.08x (?)

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
Suffix.o	24657	37073	+50.4%	0.67x
Improvement
PrefixWhile.o	21366	20118	-5.8%	1.06x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
DropLastSequence	27911	18712	-33.0%	1.49x
DropLastAnySequence	27690	18679	-32.5%	1.48x
DropLastAnySequenceLazy	28728	19780	-31.1%	1.45x
DropLastSequenceLazy	27753	19366	-30.2%	1.43x
SuffixAnySequence	22602	17760	-21.4%	1.27x
SuffixSequence	22524	17728	-21.3%	1.27x
SuffixAnySequenceLazy	23169	18757	-19.0%	1.24x
SuffixSequenceLazy	22526	18305	-18.7%	1.23x
DropLastAnySeqCRangeIterLazy	63674	54343	-14.7%	1.17x
DropLastAnySeqCRangeIter	63187	54141	-14.3%	1.17x
DropLastAnySeqCntRangeLazy	63519	54535	-14.1%	1.16x
PrefixWhileAnySeqCRangeIter	61305	54217	-11.6%	1.13x
PrefixWhileSequence	33282	29723	-10.7%	1.12x
PrefixWhileAnySequence	33055	29742	-10.0%	1.11x
DropLastAnySeqCntRange	62930	57704	-8.3%	1.09x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

palimondo · 2018-12-19T07:13:48Z

I cannot reproduce the regression on SuffixSequence locally. Let's try remote performance debugging… First, reverting all changes to suffix for a clean baseline.

palimondo · 2018-12-19T07:14:38Z

@swift-ci please benchmark

palimondo · 2018-12-19T08:24:43Z

Now just the in-place Array rotation...

palimondo · 2018-12-19T09:26:49Z

OK, so the in-place rotation is causing the de-optimization in SuffixSequence case. Let's try if we can stop the optimizer from tripping over itself if this isn't inlined...

swift-ci · 2018-12-19T16:47:07Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Improvement
PrefixWhileAnySeqCntRange	362	324	-10.5%	1.12x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Improvement
PrefixWhile.o	22878	21422	-6.4%	1.07x
Suffix.o	25873	24513	-5.3%	1.06x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
PrefixWhileAnySeqCntRangeLazy	159	176	+10.7%	0.90x
Improvement
PrefixWhileAnyCollectionLazy	176	160	-9.1%	1.10x
DropLastCountableRangeLazy	12	11	-8.3%	1.09x
PrefixWhileAnyCollection	252	235	-6.7%	1.07x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Improvement
Suffix.o	24657	22961	-6.9%	1.07x
PrefixWhile.o	21366	20118	-5.8%	1.06x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
DropLastAnySequenceLazy	29419	19290	-34.4%	1.53x
DropLastAnySequence	27777	18640	-32.9%	1.49x
DropLastSequenceLazy	27720	18678	-32.6%	1.48x
DropLastSequence	27899	18882	-32.3%	1.48x
SuffixAnySequenceLazy	23745	18011	-24.1%	1.32x
SuffixAnySequence	22630	17529	-22.5%	1.29x
SuffixSequenceLazy	22440	17476	-22.1%	1.28x
SuffixSequence	22714	17758	-21.8%	1.28x
PrefixWhileSequence	34700	27716	-20.1%	1.25x
PrefixWhileAnySequence	34567	27671	-19.9%	1.25x
DropLastAnySeqCntRangeLazy	67280	54253	-19.4%	1.24x
DropLastAnySeqCRangeIterLazy	64172	54601	-14.9%	1.18x
DropLastAnySeqCRangeIter	62879	54022	-14.1%	1.16x
DropLastAnySeqCntRange	63356	55511	-12.4%	1.14x
SuffixAnySeqCRangeIterLazy	60113	53409	-11.2%	1.13x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

palimondo · 2018-12-19T17:00:12Z

@airspeedswift I think I have this down now… can you please have a look, because in order to work around the weird de-optimization in SuffixSequence, I had to take the array rotation out of the suffix method itself. What would be a proper way to declare a non-public utility extension like I did here for ContiguousArray.rotate(left:) inside the stdlib? I can't quickly find other precedent.

palimondo · 2018-12-19T17:00:42Z

CC @dabrahams @lorentey

dabrahams · 2018-12-19T17:33:07Z

stdlib/public/core/Sequence.swift

+  @usableFromInline
+  mutating func _rotate(left distance: Int) {
+    guard distance > 0 else { return }
+    let i = index(startIndex, offsetBy: distance)


This is not necessarily the best way to rotate, and is probably not the API we want to commit to supporting even internally. Please try to use what's already written in the algorithms prototype, and especially read this comment and test accordingly.

I guess it's better than to split this and do just the consistent use of ContiguousArray in this PR, and tackle the in-place rotation in a separate PR, right?

palimondo added 3 commits November 26, 2018 17:42

[stdlib] Sequence.suffix ringBuffer rotate in-place

0523e8c

Rotate the ringBuffer in-place instead of copying to new array for lower memory footprint.

[stdlib] dropLast, prefix, suffix: ContiguousArray

f211d01

Nano-optimization: Following the template used in methods `map` and `_filter`, internally use ContiguousArray before converting to Array on return.

[stdlib] Gardening: ringBuffer position index

91ff835

palimondo requested a review from airspeedswift November 26, 2018 17:30

This comment has been minimized.

Sign in to view

[DNM] revert all changes on Suffix

f244420

This comment has been minimized.

Sign in to view

palimondo force-pushed the within-one-stem branch from 6d943cc to 46b462f Compare December 19, 2018 09:31

swiftlang deleted a comment from swift-ci Dec 19, 2018

[DNM] Extract in-place rotation to ContiguousArray

fa15cb2

palimondo force-pushed the within-one-stem branch from 46b462f to fa15cb2 Compare December 19, 2018 16:00

palimondo changed the title ~~[stdlib] Minor Sequence Optimizations~~ [WIP][stdlib] Minor Sequence Optimizations Dec 19, 2018

dabrahams reviewed Dec 19, 2018

View reviewed changes

palimondo closed this Jan 5, 2019

palimondo mentioned this pull request Jan 5, 2019

[stdlib] Use ContiguousArray internally in Sequence dropLast, prefix, suffix #21658

Merged

palimondo mannequin mentioned this pull request Apr 25, 2022

[SR-9621] In place rotation for RingBuffer in Sequence.suffix #52067

Open

[WIP][stdlib] Minor Sequence Optimizations #20758

[WIP][stdlib] Minor Sequence Optimizations #20758

Uh oh!

Conversation

palimondo commented Nov 26, 2018

Uh oh!

palimondo commented Nov 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Dec 8, 2018

Uh oh!

palimondo commented Dec 8, 2018

Uh oh!

palimondo commented Dec 8, 2018

Uh oh!

This comment has been minimized.

palimondo commented Dec 10, 2018

Uh oh!

palimondo commented Dec 10, 2018

Uh oh!

palimondo commented Dec 18, 2018

Uh oh!

swift-ci commented Dec 18, 2018

Build comment file:

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Uh oh!

palimondo commented Dec 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Dec 19, 2018

Uh oh!

This comment has been minimized.

palimondo commented Dec 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

palimondo commented Dec 19, 2018

Uh oh!

swift-ci commented Dec 19, 2018

Build comment file:

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Uh oh!

palimondo commented Dec 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Dec 19, 2018

Uh oh!

dabrahams Dec 19, 2018

Choose a reason for hiding this comment

Uh oh!

palimondo Dec 21, 2018

Choose a reason for hiding this comment

Uh oh!

Uh oh!

palimondo commented Nov 26, 2018 •

edited

Loading

palimondo commented Dec 19, 2018 •

edited

Loading

palimondo commented Dec 19, 2018 •

edited

Loading

palimondo commented Dec 19, 2018 •

edited

Loading