[DO NOT MERGE][stdlib] Collection: switch to using '<' instead of '==' in some loop termination conditions #41331

lorentey · 2022-02-11T03:14:15Z

Note: This is just an experiment; we may well end up solving this problem in another way, or leaving it as is for now.

IndexingIterator.next(), Collection.distance(from:to:) and Collection.index(_:offsetBy:limitedBy:) algorithms currently use i != end as their termination condition. This works great when the indices are valid in the collection they run on, but it leads to unterminated loops in cases where a Collection type allows (or even encourages) substricting by index values that aren't members of indices.

Unfortunately we have a small handful of types in the Standard Library that encourage such sloppy indexing -- the big one is String, whose various encoding views all use the same Index (since SE-0180), but there isn't necessarily a one-to-one mapping between indices of one view and the next.

let s = "💩Cafe\u{301} Banana" // "💩" is encoded in four bytes, "\u{301}" is encoded in two

let start = s.utf8.index(s.startIndex, offsetBy: 2)
let end = s.utf8.index(s.startIndex, offsetBy: 8)

In this case, start and end aren't strictly valid in the regular String value (as they point to the middle of the "💩" emoji and to the second Unicode scalar of the decomposed "é" grapheme cluster, respectively), but String still sort-of-kind-of accepts them:

print(s[start]) // "💩"
print(s[end]) // "\u{301}" a.k.a. "́"
print(s[start ..< end]) //  "💩La Cafe", note no acute accent

The behavior seems to be somewhat sensible -- the indices are rounded to down to the nearest Unicode Scalar boundary, then a substring is constructed out of that.

However, this illusion breaks down when we iterate through the substring:

for c in s[start ..< end] {
  print(c)
}

// 💩
// C
// a
// f
// é
// Swift/Collection.swift:711: Fatal error: Out of bounds: index >= endIndex

for i in s[start ..< end].indices {
    print("Character: \(s[i]) Index: \(i)")
}

// Character: 💩 Index: Index(_rawBits: 1)
// Character: C Index: Index(_rawBits: 262401)
// Character: a Index: Index(_rawBits: 327937)
// Character: f Index: Index(_rawBits: 393473)
// Character: é Index: Index(_rawBits: 459521)
// Swift/Substring.swift:165: Fatal error: Cannot increment beyond endIndex

This is rather surprising. The issue is that Substring and DefaultIndices both use IndexingIterator as their iterator type, and IndexingIterator.next() is defined to stop when _position == _elements.endIndex. Because _elements.endIndex falls in between two reachable indices, the condition never triggers and we run off the end of the substring, skipping over our endIndex. Worse, the entire final grapheme cluster "é" is returned as if it was part of the substring, which it isn't, accoring to how the substring is printed.

Similar issues can be found when calling distance(from:to:) or index(_:offsetBy:limitedBy:). Besides String, LazyFilterCollection also suffers from similar index sharing issues with the base collection, although the expectations aren't that well documented there.

Now in String's case, we could choose to catch this and trap when the substring is created with these iffy indices; however, SE-0180 rejected this approach in favor of this passage:

With respect to subscripts, an index that does not fall on an exact boundary in a given String or Substring view will be treated as falling at its encodedOffset in the underlying code units, with the actual contents of the result being an emergent property of applying the usual Unicode rules for decoding those code units. For example, when slicing a string with an index i that falls between two Character boundaries, i.encodedOffset is treated as a position in the string's underlying code units, and the Characters of the result are determined by performing standard Unicode grapheme breaking on the resulting sequence of code units.

However, iterating through a substring with such irregular bounds therefore ought to behave as if we were iterating over a string composed of the same code units -- the current behavior is clearly violating this by overrunning the end index.

(A strict reading of this passage implies that s[start] should return the replacement character "\u{FFFD}", as it attempts to read an invalid (partial) sequence of UTF-8 code units. This does not seem desirable to me; the current rounding-to-nearest-scalar behavior is very likely to be more useful in practice. We can't really split scalars without violating String's fundamental invariant that its underlying encoding must always be valid, so rounding to the nearest scalar boundary is the best we can do.)

This PR updates IndexingIterator.next() and the default implementations of Collection.distance(from:to:) and Collection.index(_:offsetBy:limitedBy:) to compare indices using < instead of == in their termination conditions, thereby resolving this subtle issue. (Another potential benefit is an (anecdotal) code gen improvement in some integer-indexed collection types -- < seems to enable better code analysis in some cases.)

One huge potential drawback is that Index.< might be much slower than Index.== -- for example, a hypothetical collection type that implements a classic linked list (like std::list in C++) would likely have an O(n) Index.< implementation while == would take constant time. This does not seem like a strong enough objection to me -- linked lists aren't that practical (or popular) these days, and it doesn't seem too much of a burden for such collection types to provide custom iterator / index navigation implementations. (After all, they will need to deal with far trickier issues like ARC stack exhaustion while destroying long lists.)

(There are no such performance issues with more practical Collection implementations, such as tree-based containers.)

Tweaking the implementation of such basic functions can also turn out to be source-breaking (or even ABI-breaking) enough that we won't be able to ship this change. In that case, it might be worth special casing String.Index in the implementation of these methods.

rdar://88705758

… termination conditions `IndexingIterator.next()`, `Collection.distance(from:to:)` and `Collection.index(_:offsetBy:limitedBy:)` algorithms currently use `i != end` as their termination condition. This works great when the indices are valid in the collection they run on, but it leads to unterminated loops in cases where a Collection type allows (or even encourages) substricting by index values that aren't members of `indices`. Unfortunately we have a small handful of types in the Standard Library that encourage such sloppy indexing -- the big one is `String`, whose various encoding views all use the same `Index` (since SE-0180), but there isn't necessarily a one-to-one mapping between indices of one view and the next. ``` let s = "💩Cafe\u{301} Banana" // "💩" is encoded in four bytes, "\u{301}" is encoded in two let start = s.utf8.index(s.startIndex, offsetBy: 2) let end = s.utf8.index(s.startIndex, offsetBy: 8) ``` In this case, `start` and `end` aren't strictly valid in the regular `String` value (as they point to the middle of the "💩" emoji and to the second Unicode scalar of the decomposed "é" grapheme cluster, respectively), but `String` still sort-of-kind-of accepts them: ``` print(s[start]) // "💩" print(s[end]) // "\u{301}" a.k.a. "́" print(s[start ..< end]) // "💩La Cafe", note no acute accent ``` The behavior seems to be somewhat sensible -- the indices are rounded to down to the nearest Unicode Scalar boundary, then a substring is constructed out of that. However, this illusion breaks down when we iterate through the substring: ``` for c in s[start ..< end] { print(c) } // 💩 // C // a // f // é // Swift/Collection.swift:711: Fatal error: Out of bounds: index >= endIndex ``` ``` for i in s[start ..< end].indices { print("Character: \(s[i]) Index: \(i)") } // Character: 💩 Index: Index(_rawBits: 1) // Character: C Index: Index(_rawBits: 262401) // Character: a Index: Index(_rawBits: 327937) // Character: f Index: Index(_rawBits: 393473) // Character: é Index: Index(_rawBits: 459521) // Swift/Substring.swift:165: Fatal error: Cannot increment beyond endIndex ``` This is rather surprising. The issue is that `Substring` and `DefaultIndices` both use `IndexingIterator` as their iterator type, and `IndexingIterator.next()` is defined to stop when `_position == _elements.endIndex`. Because `_elements.endIndex` falls in between two reachable indices, the condition never triggers and we run off the end of the substring, skipping over our endIndex. Worse, the entire final grapheme cluster "é" is returned as if it was part of the substring, which it isn't, accoring to how the substring is printed. Similar issues can be found when calling `distance(from:to:)` or `index(_:offsetBy:limitedBy:)`. Besides `String`, `LazyFilterCollection` also suffers from similar index sharing issues with the base collection, although the expectations aren't that well documented there. Now in String's case, we could choose to catch this and trap when the substring is created with these iffy indices; however, SE-0180 rejected this approach in favor of [this passage](https://github.com/apple/swift-evolution/blob/main/proposals/0180-string-index-overhaul.md#comparison-and-subscript-semantics): > With respect to subscripts, an index that does not fall on an exact boundary in a given String or Substring view will be treated as falling at its encodedOffset in the underlying code units, with the actual contents of the result being an emergent property of applying the usual Unicode rules for decoding those code units. For example, when slicing a string with an index i that falls between two Character boundaries, i.encodedOffset is treated as a position in the string's underlying code units, and the Characters of the result are determined by performing standard Unicode grapheme breaking on the resulting sequence of code units. However, iterating through a substring with such irregular bounds therefore ought to behave as if we were iterating over a string composed of the same code units -- the current behavior is clearly violating this by overrunning the end index. (A strict reading of this passage implies that `s[start]` should return the replacement character `"\u{FFFD}"`, as it attempts to read an invalid (partial) sequence of UTF-8 code units. This does not seem desirable to me; the current rounding-to-nearest-scalar behavior is very likely to be more useful in practice. We can't really split scalars without violating `String`'s fundamental invariant that its underlying encoding must always be valid, so rounding to the nearest scalar boundary is the best we can do.) This PR updates `IndexingIterator.next()` and the default implementations of `Collection.distance(from:to:)` and `Collection.index(_:offsetBy:limitedBy:)` to compare indices using `<` instead of `==` in their termination conditions, thereby resolving this subtle issue. (Another potential benefit is an (anecdotal) code gen improvement in some integer-indexed collection types -- `<` seems to enable better code analysis in some cases.) One huge potential drawback is that 'Index.<` might be much slower than `Index.==` -- for example, a hypothetical collection type that implements a classic linked list (like `std::list` in C++) would likely have an O(n) `Index.<` implementation while `==` would take constant time. This does not seem like a strong enough objection to me -- linked lists aren't that practical (or popular) these days, and it doesn't seem too much of a burden for such collection types to provide custom iterator / index navigation implementations. (After all, they will need to deal with far trickier issues like ARC stack exhaustion while destroying long lists.) (There are no such performance issues with more practical Collection implementations, such as tree-based containers.) Tweaking the implementation of such basic functions can also turn out to be source-breaking (or even ABI-breaking) enough that we won't be able to ship this change. In that case, it might be worth special casing `String.Index` in the implementation of these methods. rdar://88705758

lorentey · 2022-02-11T03:14:35Z

@swift-ci test

lorentey · 2022-02-11T03:14:44Z

@swift-ci benchmark

lorentey · 2022-02-11T03:15:27Z

(Sadly I don't expect this would uncover anything that this might break)

@swift-ci test source compatibility

lorentey · 2022-02-11T03:23:10Z

There is a very unfortunate thing about s[start ..< end] above that this PR does nothing to solve: the substring's endIndex is not reachable through its index navigation methods. I expect that will induce similar issues in a bunch of code inside & outside the stdlib; perhaps playing whack-a-mole with such algorithms isn't a winning strategy at all.

let p = s[start ..< end]
print(p.startIndex == start) // false -- there is rounding!
print(p.endIndex == end) // true, unfortunately

This means that such substrings aren't valid collections in the current stdlib.

One way to fix this would be to add a special case to Substring.index(after:) (and similar methods) to return endIndex when the regular grapheme cluster boundary would otherwise run off the end of the substring.

swift-ci · 2022-02-11T05:15:22Z

Build failed
Swift Test Linux Platform
Git Sha - 86bac55

swift-ci · 2022-02-11T06:07:26Z

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
DropLastArray	11	17	+54.5%	0.65x
DropLastArrayLazy	11	17	+54.5%	0.65x
PrefixArrayLazy	35	52	+48.6%	0.67x
MapReduceAnyCollection	137	194	+41.6%	0.71x
MapReduce	139	196	+41.0%	0.71x
MapReduceClass2	15	20	+33.3%	0.75x
DropWhileArrayLazy	82	105	+28.0%	0.78x
XorLoop	381	477	+25.2%	0.80x
PrefixWhileArray	88	105	+19.3%	0.84x
ReversedArray2	171	200	+17.0%	0.86x
LazilyFilteredArrayContains	14500	16400	+13.1%	0.88x
StringComparison_ascii	649	729	+12.3%	0.89x (?)
Walsh	336	377	+12.2%	0.89x (?)
ReversedDictionary2	315	346	+9.8%	0.91x (?)
DictionaryBridgeToObjC_Access	956	1048	+9.6%	0.91x (?)
UTF8Decode_InitDecoding	188	203	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	188	88	-53.2%	2.14x
SequenceAlgosRange	3140	1880	-40.1%	1.67x
StringEqualPointerComparison	228	171	-25.0%	1.33x
LazilyFilteredRange	3410	2610	-23.5%	1.31x
Sim2DArray	800	614	-23.2%	1.30x
StringFromLongWholeSubstring	5	4	-20.0%	1.25x
DropWhileCountableRangeLazy	106	87	-17.9%	1.22x
RawBufferInitializeMemory	171	142	-17.0%	1.20x
Array2D	7504	6672	-11.1%	1.12x
Diffing.Similar	576	514	-10.8%	1.12x (?)
SubstringComparable	13	12	-7.7%	1.08x
RemoveWhereFilterString	332	308	-7.2%	1.08x (?)
DropLastSequence	586	544	-7.2%	1.08x (?)
String.replaceSubrange.Substring.Small	84	78	-7.1%	1.08x (?)
PrefixAnySeqCntRange	28	26	-7.1%	1.08x (?)
StringHasPrefixUnicode	57000	53000	-7.0%	1.08x (?)
RandomShuffleLCG2	480	448	-6.7%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
Walsh.o	5196	5516	+6.2%	0.94x
DiffingMyers.o	6781	7149	+5.4%	0.95x
IterateData.o	1917	2013	+5.0%	0.95x
ArrayInClass.o	4139	4339	+4.8%	0.95x
RecursiveOwnedParameter.o	2198	2294	+4.4%	0.96x
SuperChars.o	1497	1561	+4.3%	0.96x
DropLast.o	21230	22062	+3.9%	0.96x
RomanNumbers.o	6812	7074	+3.8%	0.96x
ChaCha.o	15744	16336	+3.8%	0.96x
ArraySetElement.o	1300	1348	+3.7%	0.96x
Sim2DArray.o	1380	1428	+3.5%	0.97x
RangeAssignment.o	3275	3387	+3.4%	0.97x
Chars.o	1574	1622	+3.0%	0.97x
Radix2CooleyTukey.o	6264	6440	+2.8%	0.97x
IntegerParsing.o	58421	59957	+2.6%	0.97x
PopFrontGeneric.o	2662	2726	+2.4%	0.98x
DictionaryLiteralTest.o	1369	1400	+2.3%	0.98x
ObserverForwarderStruct.o	3036	3100	+2.1%	0.98x
ReduceInto.o	12267	12525	+2.1%	0.98x
RangeReplaceableCollectionPlusDefault.o	5740	5852	+2.0%	0.98x
CString.o	6271	6383	+1.8%	0.98x
Breadcrumbs.o	43755	44507	+1.7%	0.98x
ProtocolConformance.o	3935	3999	+1.6%	0.98x
ArrayOfGenericPOD.o	5445	5525	+1.5%	0.99x
Diffing.o	7963	8075	+1.4%	0.99x
ObserverClosure.o	2436	2468	+1.3%	0.99x
Prefix.o	17160	17384	+1.3%	0.99x
ObserverPartiallyAppliedMethod.o	2483	2515	+1.3%	0.99x
StringComparison.o	36792	37224	+1.2%	0.99x
DictTest.o	13724	13884	+1.2%	0.99x
DataBenchmarks.o	60155	60851	+1.2%	0.99x
QueueTest.o	12013	12149	+1.1%	0.99x
Histogram.o	3135	3167	+1.0%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
Memset.o	2203	2011	-8.7%	1.10x
FloatingPointParsing.o	19809	18481	-6.7%	1.07x
StackPromo.o	2241	2145	-4.3%	1.04x
RangeOverlaps.o	6698	6419	-4.2%	1.04x
StaticArray.o	12728	12336	-3.1%	1.03x
SIMDRandomMask.o	6637	6445	-2.9%	1.03x
FloatingPointPrinting.o	5348	5204	-2.7%	1.03x
DropWhile.o	16291	15907	-2.4%	1.02x
StringMatch.o	4162	4082	-1.9%	1.02x
RandomValues.o	7795	7651	-1.8%	1.02x
ArrayAppend.o	24137	23785	-1.5%	1.01x
DropFirst.o	17331	17107	-1.3%	1.01x
PopFront.o	3738	3690	-1.3%	1.01x
SIMDReduceInteger.o	10140	10012	-1.3%	1.01x
FlattenList.o	3841	3793	-1.2%	1.01x
Hash.o	21931	21659	-1.2%	1.01x
Suffix.o	20636	20396	-1.2%	1.01x
BucketSort.o	8551	8455	-1.1%	1.01x
StringTests.o	7505	7425	-1.1%	1.01x

Performance (x86_64): -Osize

Regression	OLD	NEW	DELTA	RATIO
DropWhileAnySeqCRangeIter	182	59769	+32739.9%	0.00x
DropWhileAnySeqCntRange	182	59540	+32614.1%	0.00x
SIMDRandomMask.Int8x16	482	1148	+138.2%	0.42x
DropLastArray	11	17	+54.5%	0.65x
DropLastArrayLazy	11	17	+54.5%	0.65x
PrefixArrayLazy	35	52	+48.6%	0.67x
RC4	171	238	+39.2%	0.72x
MapReduce	175	231	+32.0%	0.76x
XorLoop	394	493	+25.1%	0.80x (?)
PrefixWhileArray	88	105	+19.3%	0.84x
UTF8Decode_InitFromCustom_contiguous	188	222	+18.1%	0.85x (?)
UTF8Decode_InitDecoding	188	222	+18.1%	0.85x (?)
DataCreateEmpty	170	200	+17.6%	0.85x
DevirtualizeProtocolComposition	228	268	+17.5%	0.85x
StringEqualPointerComparison	171	200	+17.0%	0.86x
UTF8Decode_InitFromBytes_ascii	284	330	+16.2%	0.86x (?)
BucketSort	172	195	+13.4%	0.88x
ObjectiveCBridgeStubDateAccess	228	257	+12.7%	0.89x
ClosedRangeOverlapsClosedRange	90	100	+11.1%	0.90x
DataCountSmall	28	31	+10.7%	0.90x (?)
DataCountMedium	28	31	+10.7%	0.90x
RandomDoubleOpaqueLCG	469	516	+10.0%	0.91x
DropFirstAnyCollection	180	198	+10.0%	0.91x
Data.append.Sequence.809B.Count.RE.I	101	111	+9.9%	0.91x (?)
SequenceAlgosRange	3140	3430	+9.2%	0.92x (?)
SuffixArrayLazy	11	12	+9.1%	0.92x
SuffixArray	11	12	+9.1%	0.92x
ArrayAppendLatin1Substring	28512	31068	+9.0%	0.92x (?)
ArrayAppendUTF16Substring	27972	30456	+8.9%	0.92x (?)
ArrayAppendAsciiSubstring	27972	30456	+8.9%	0.92x

Improvement	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	188	88	-53.2%	2.14x
AnyHashableWithAClass	124500	96000	-22.9%	1.30x
Sim2DArray	794	625	-21.3%	1.27x
ArrayAppendRepeatCol	1030	850	-17.5%	1.21x (?)
RawBufferInitializeMemory	171	142	-17.0%	1.20x
SubstringComparable	13	11	-15.4%	1.18x
RemoveWhereSwapInts	43	37	-14.0%	1.16x (?)
DataSubscriptMedium	71	62	-12.7%	1.15x
Set.subtracting.Seq.Empty.Box	261	228	-12.6%	1.14x
RandomShuffleLCG2	512	448	-12.5%	1.14x
SIMDReduce.Int32x4.Cast	73	66	-9.6%	1.11x (?)
ConvertFloatingPoint.MockFloat64Exactly2	11	10	-9.1%	1.10x
DataSubscriptSmall	34	31	-8.8%	1.10x
PrefixWhileAnyCollectionLazy	193	176	-8.8%	1.10x (?)
PrefixAnySeqCRangeIterLazy	193	176	-8.8%	1.10x
PrefixAnySeqCntRangeLazy	193	176	-8.8%	1.10x
Breadcrumbs.MutatedUTF16ToIdx.Mixed	288	264	-8.3%	1.09x (?)
SuffixCountableRange	12	11	-8.3%	1.09x (?)
RemoveWhereMoveInts	37	34	-8.1%	1.09x
LazilyFilteredRange	7920	7290	-8.0%	1.09x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed	293	270	-7.8%	1.09x (?)
StringUTF16SubstringBuilder	2700	2490	-7.8%	1.08x (?)
Array2D	7504	6928	-7.7%	1.08x
DropWhileCountableRangeLazy	94	87	-7.4%	1.08x
DistinctClassFieldAccesses	391	362	-7.4%	1.08x (?)
ArrayInClass	1940	1800	-7.2%	1.08x (?)
String.replaceSubrange.Substring.Small	84	78	-7.1%	1.08x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
IterateData.o	1648	2483	+50.7%	0.66x
RC4.o	3202	4242	+32.5%	0.75x
MonteCarloE.o	2797	3449	+23.3%	0.81x
ChaCha.o	12913	14533	+12.5%	0.89x
ArrayOfGenericPOD.o	5148	5740	+11.5%	0.90x
SIMDReduceInteger.o	8220	8844	+7.6%	0.93x
RomanNumbers.o	5149	5507	+7.0%	0.93x
RemoveWhere.o	13467	14050	+4.3%	0.96x
Prims.o	23258	24081	+3.5%	0.97x
PrimsSplit.o	23310	24133	+3.5%	0.97x
SuperChars.o	1434	1467	+2.3%	0.98x
Radix2CooleyTukey.o	5678	5808	+2.3%	0.98x
RangeReplaceableCollectionPlusDefault.o	5597	5725	+2.3%	0.98x
IntegerParsing.o	53133	54242	+2.1%	0.98x
QueueTest.o	11866	12107	+2.0%	0.98x
StringComparison.o	32565	33116	+1.7%	0.98x
SequenceAlgos.o	20142	20464	+1.6%	0.98x
InsertCharacter.o	4010	4074	+1.6%	0.98x
Histogram.o	1656	1682	+1.6%	0.98x
AngryPhonebook.o	8191	8318	+1.6%	0.98x
ArrayOfPOD.o	2951	2996	+1.5%	0.98x
CString.o	5793	5870	+1.3%	0.99x
DropWhile.o	15331	15534	+1.3%	0.99x
PrimsNonStrongRef.o	86439	87319	+1.0%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
SIMDRandomMask.o	6193	5583	-9.8%	1.11x
StrToInt.o	4633	4239	-8.5%	1.09x
RangeOverlaps.o	6362	6133	-3.6%	1.04x
Suffix.o	23793	22960	-3.5%	1.04x
RandomShuffle.o	3404	3297	-3.1%	1.03x
DropFirst.o	15299	14897	-2.6%	1.03x
FloatingPointPrinting.o	4543	4428	-2.5%	1.03x
FlattenList.o	3744	3656	-2.4%	1.02x
RandomValues.o	6701	6546	-2.3%	1.02x
XorLoop.o	1387	1358	-2.1%	1.02x
Walsh.o	4303	4224	-1.8%	1.02x
Array2D.o	2726	2678	-1.8%	1.02x
StackPromo.o	2127	2097	-1.4%	1.01x
BucketSort.o	8385	8275	-1.3%	1.01x
SortLettersInPlace.o	7945	7842	-1.3%	1.01x
StaticArray.o	11711	11571	-1.2%	1.01x
FindStringNaive.o	7992	7899	-1.2%	1.01x
SortIntPyramids.o	8596	8500	-1.1%	1.01x
ArrayAppend.o	22429	22185	-1.1%	1.01x
StringSplitting.o	32347	31997	-1.1%	1.01x
LazyFilter.o	7023	6952	-1.0%	1.01x

Performance (x86_64): -Onone

Regression	OLD	NEW	DELTA	RATIO
RecursiveOwnedParameter	7424	8402	+13.2%	0.88x (?)
ConvertFloatingPoint.MockFloat64ToInt64	50305	56490	+12.3%	0.89x (?)
String.replaceSubrange.String	25	28	+12.0%	0.89x (?)
StringUTF16SubstringBuilder	10580	11560	+9.3%	0.92x (?)
Breadcrumbs.MutatedIdxToUTF16.ASCII	12	13	+8.3%	0.92x (?)
Hanoi	12860	13910	+8.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
AngryPhonebook.ASCII2	190	90	-52.6%	2.11x
FloatingPointPrinting_Double_description_small	29000	23500	-19.0%	1.23x (?)
AnyHashableWithAClass	165500	136500	-17.5%	1.21x (?)
ArrayOfPOD	1148	1060	-7.7%	1.08x (?)
ProtocolDispatch2	9915	9221	-7.0%	1.08x (?)
DataToStringLargeUnicode	10500	9800	-6.7%	1.07x (?)
Set.isDisjoint.Seq.Int100	2393	2234	-6.6%	1.07x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

swift-ci · 2022-02-11T06:48:42Z

Build failed
Swift Test OS X Platform
Git Sha - 86bac55

karwa · 2022-02-11T11:41:26Z

Another potential benefit is an (anecdotal) code gen improvement in some integer-indexed collection types -- < seems to enable better code analysis in some cases.

It is relatively simple to demonstrate. In this example, compare the code generated for iterateOne and iterateTwo (which use a for loop and != termination condition respectively) against iterateThree, which uses a < condition. iterateFour is there to demonstrate that Array is full of delicately-tuned magic, so I'm not entirely surprised to see some regressions related to it. It may need some careful examination to determine exactly what is going on there.

I'd also recommend building a toolchain and benchmarking this with generic collections that don't use Array as their storage.

If this helps solve issues with String, that's great, but I think there is value to be unlocked via this change even if it turns out that String needs some other delicate workarounds.

lorentey · 2022-02-11T19:57:09Z

The benchmark results are a wash, which is expected -- except for DropWhileAnySeqCRangeIter and DropWhileAnySeqCntRange which seem to have suffered what looks like an algorithmic slowdown.

These are testing a Sequence algorithm over AnySequence<Range<Int>>; this change is involved in that, as Range uses an IndexingIterator, but it's unclear how/why there could be such a large impact.

The correctness issue uncovered by the unit tests is more pressing, though.

lorentey · 2022-02-11T20:04:49Z

(Note: Fixing Substring to correctly implement Collection is going to be the right fix to the original issue, not this PR. This means that this PR has lost most of its primary motivating case -- this does not mean that we won't/can't land it at some point, but its (considerable) risks need to be weighed against its potential benefits, and we need to find time to work on it. (It started as a correctness fix, and it is now reduced mosty to a small performance tweak that prioritizes integer-indexed collections over some exotic ones. We may still be able to carve out a correctness argument for it, but it looks like it will be far less convincing than I initially thought.)

karwa · 2022-02-11T20:19:04Z

As far as integer-indexed collections go, I think the indices of most correct Collections ultimately boil down to some integer or another. Even if they're opaque indices, since they represent a position of some sort, they usually wrap one or more integers. Yes, there are linked-lists, but Collection already requires that its indices be Comparable, so they'd need to invent some clever indexing scheme to efficiently implement that anyway.

I've been mentioning this particular thing for some time, as I've seen really great improvements in my libraries by using it (especially together with other tweaks, such as unsigned integer indexes). But I also appreciate that the standard library maintainers have limited time and must prioritise correctness over performance. I just hope it stays on the radar for whenever there's time to work on it :)

lorentey · 2022-02-11T20:40:51Z

I just hope it stays on the radar for whenever there's time to work on it :)

This is definitely on my radar -- I'll keep looking for an excuse to land it.

It's very difficult to ship potentially ABI breaking behavioral changes like this purely for performance reasons, especially if the performance benefits do not show up as a clear win in benchmarks. (The risk can be quite large -- debugging / dealing with binary compatibility issues tends to take a long time. Every minute spent on it takes resources away from more productive work, so we need to always weigh such risk against expected benefits.)

lorentey requested a review from milseman February 11, 2022 03:14

lorentey mentioned this pull request Feb 16, 2022

[stdlib] Fix String indexing edge cases, anomalies & validation bugs #41417

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[DO NOT MERGE][stdlib] Collection: switch to using '<' instead of '==' in some loop termination conditions #41331

[DO NOT MERGE][stdlib] Collection: switch to using '<' instead of '==' in some loop termination conditions #41331

Uh oh!

lorentey commented Feb 11, 2022 •

edited

Loading

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

swift-ci commented Feb 11, 2022

Uh oh!

swift-ci commented Feb 11, 2022

Uh oh!

swift-ci commented Feb 11, 2022

Uh oh!

karwa commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022 •

edited

Loading

Uh oh!

karwa commented Feb 11, 2022 •

edited

Loading

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

Uh oh!

[DO NOT MERGE][stdlib] Collection: switch to using '<' instead of '==' in some loop termination conditions #41331

Are you sure you want to change the base?

[DO NOT MERGE][stdlib] Collection: switch to using '<' instead of '==' in some loop termination conditions #41331

Uh oh!

Conversation

lorentey commented Feb 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

swift-ci commented Feb 11, 2022

Uh oh!

swift-ci commented Feb 11, 2022

Performance (x86_64): -O

Code size: -O

Performance (x86_64): -Osize

Code size: -Osize

Performance (x86_64): -Onone

Code size: -swiftlibs

Uh oh!

swift-ci commented Feb 11, 2022

Uh oh!

karwa commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

lorentey commented Feb 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

karwa commented Feb 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lorentey commented Feb 11, 2022

Uh oh!

Uh oh!

lorentey commented Feb 11, 2022 •

edited

Loading

lorentey commented Feb 11, 2022 •

edited

Loading

karwa commented Feb 11, 2022 •

edited

Loading