Remove as much checked math as possible from buffer pointers. #37424

Merged: 3 commits merged into swiftlang:main from the cb-fewer-checks-in-ubp branch on Feb 4, 2022

Conversation

Lukasa
Contributor

@Lukasa Lukasa commented May 14, 2021

This patch removes as much checked math as I was able to find from the
unsafe buffer pointer types. In particular, I removed two kinds of
checked math:

  1. Checked math that was provably unable to trap due to prior operations
    done in the same function, or in a function that must have been
    called before the current one. For example, in
    index(_:offsetBy:limitedBy:), when we do the final index addition we
    have already validated that it must be in-bounds. Therefore, it
    cannot overflow.
  2. Checked math that can only fail if the user is misusing the indices.
    As previously discussed with Karoy and Andrew, the unsafe raw buffer
    types are not obligated to crash if you misuse their indices; they
    are free to invoke undefined behaviour. In these cases, I added
    defensive overflow checks in debug mode.

The result of this change should be reductions in code size in almost
all usage sites of the raw buffer types.
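
To make the two kinds of removal concrete, here is a minimal standalone sketch of their shape (a hypothetical, simplified type; assert stands in for the stdlib's debug-only _debugPrecondition; illustrative only, not the actual diff):

struct TinyBuffer {
    var count: Int

    // Kind 1: indices live in 0...count, so the subtraction cannot wrap,
    // and once we know n fits within the distance to `limit`, the final
    // addition is provably in bounds. Wrapping arithmetic (&-, &+) skips
    // a checked-math trap that the test above already made unreachable.
    func index(_ i: Int, offsetBy n: Int, limitedBy limit: Int) -> Int? {
        let distance = limit &- i
        if n >= 0 ? distance < n : distance > n { return nil }
        return i &+ n
    }

    // Kind 2: misusing indices is undefined behaviour for the unsafe
    // buffer types, so overflow is checked only in debug builds; the
    // wrapping addition emits no trap in release builds.
    func index(after i: Int) -> Int {
        assert(i < Int.max, "Can't advance past Int.max")
        return i &+ 1
    }
}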

@Lukasa
Contributor Author

Lukasa commented May 14, 2021

@swift-ci please test

@Lukasa
Contributor Author

Lukasa commented May 14, 2021

cc @atrick @lorentey

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - ea3e1c5199dd2579643037d410ea40ffaa3cc97e

Contributor

@atrick atrick left a comment

Thanks. I agree with the premise. UnsafeBufferPointer indices are only checked in debug builds, so this change makes the implementation more consistent. And these are inlinable, critical code paths where we don't want extra traps.

UnsafeRawBufferPointer may have the same issue, at least in the iterator.

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - ea3e1c5199dd2579643037d410ea40ffaa3cc97e

@Lukasa
Contributor Author

Lukasa commented May 14, 2021

@swift-ci please test

@Lukasa
Contributor Author

Lukasa commented May 14, 2021

UnsafeRawBufferPointer definitely also needs cleaning up, I’ll tackle that as well.

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 0fedd0b9cfe3dcac2543dddec3f789d6d3b1016f

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 0fedd0b9cfe3dcac2543dddec3f789d6d3b1016f

@Lukasa Lukasa force-pushed the cb-fewer-checks-in-ubp branch from 0fedd0b to b5d176b on May 15, 2021 06:51
@Lukasa
Contributor Author

Lukasa commented May 15, 2021

@swift-ci please test

@Lukasa Lukasa force-pushed the cb-fewer-checks-in-ubp branch from b5d176b to a38d9f6 on May 15, 2021 07:08
@Lukasa
Contributor Author

Lukasa commented May 15, 2021

@swift-ci please test

@Lukasa
Contributor Author

Lukasa commented May 15, 2021

@atrick I've added URBP to this as well; the same kinds of changes, really, though I also removed a division by replacing it with a series of unchecked additions in a loop we were going to execute anyway.
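
A hypothetical before/after sketch of that kind of division removal (made-up function names, assuming a suitably aligned buffer; not the actual stdlib change):

// Before: divide up front to get an element count, then loop over it.
func sumWithDivision(_ bytes: UnsafeRawBufferPointer) -> Int {
    let stride = MemoryLayout<UInt32>.stride
    let count = bytes.count / stride   // the division we want to avoid
    var total = 0
    for i in 0..<count {
        total &+= Int(bytes.load(fromByteOffset: i * stride, as: UInt32.self))
    }
    return total
}

// After: step a byte offset with unchecked additions inside the loop we
// were going to execute anyway -- no division, and no checked math.
func sumWithoutDivision(_ bytes: UnsafeRawBufferPointer) -> Int {
    let stride = MemoryLayout<UInt32>.stride
    var total = 0
    var offset = 0
    while offset &+ stride <= bytes.count {
        total &+= Int(bytes.load(fromByteOffset: offset, as: UInt32.self))
        offset &+= stride
    }
    return total
}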

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - a38d9f6eb63de9ee3ba0f39a9debad1f81943a7c

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - a38d9f6eb63de9ee3ba0f39a9debad1f81943a7c

@Lukasa Lukasa force-pushed the cb-fewer-checks-in-ubp branch from a38d9f6 to bb36d84 on May 15, 2021 09:32
@Lukasa
Contributor Author

Lukasa commented May 15, 2021

@swift-ci please test

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - bb36d84f3aec5bd681d3a4b58fce1f095edb3daf

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - bb36d84f3aec5bd681d3a4b58fce1f095edb3daf

@Lukasa Lukasa force-pushed the cb-fewer-checks-in-ubp branch from bb36d84 to 7e98c9c on May 15, 2021 11:31
@Lukasa
Contributor Author

Lukasa commented May 15, 2021

@swift-ci please test

@karwa
Contributor

karwa commented May 15, 2021

Fantastic! I've noticed something like a 20% performance improvement in WebURL by using a custom UBP which doesn't trap on overflow.

As discussed on the forums, I think this is actually a general improvement that applies to all Collections - including Array and ContiguousArray, and the default implementation for all collections with integer indexes.

The comments in this patch reason that it is okay to ignore overflow because this is an unsafe buffer; that is not entirely correct, or at least not the whole picture. It is okay to ignore overflow here because a Collection's indexing operations are not arithmetic operations, so arithmetic overflow is a meaningless signal, even in debug builds. If there were a boundary possibly worth raising a debug-mode error for, it would be incrementing past endIndex, not incrementing past Int.max.

The thing that gives safe types their memory safety is bounds-checking on access, and the reason UBP is unsafe is that it doesn't do that. It has nothing to do with whether its indexing operations trap on overflow.

The only thing trapping on overflow gives us is slightly earlier termination if somebody increments beyond endIndex, doesn't notice, continues all the way to Int.max (a process I estimate would take literally decades if I left my 2.6 GHz Intel MBP running 24/7), wraps around to Int.min, and increments all the way back (more decades) to coincidentally arrive at another valid index -- and even in that extreme case, memory safety would be preserved by bounds-checking the eventual access.

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 7e98c9c5745971969d70deb24456ec414d05b0cb

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 7e98c9c5745971969d70deb24456ec414d05b0cb

@Lukasa
Contributor Author

Lukasa commented May 15, 2021

@gottesmm Any idea what's going on with the SILOptimizer pass here? I think this is just the result of the loop being restructured, but I haven't gotten around to rebuilding this toolchain locally to be sure.

@Lukasa
Contributor Author

Lukasa commented May 18, 2021

Ok, here are my comparative LLVM IR outputs. Firstly, from the existing 5.4-RELEASE toolchain:

; Function Attrs: minsize nounwind optsize
define protected swiftcc i64 @"$s4test0A9Subscriptys5Int64VSWF"(i64 %0, i64 %1) #0 {
entry:
  %2 = icmp eq i64 %0, 0
  br i1 %2, label %.thread3, label %3

3:                                                ; preds = %entry
  %4 = icmp eq i64 %1, 0
  br i1 %4, label %5, label %6

5:                                                ; preds = %3
  tail call void asm sideeffect "", "n"(i32 2) #2
  tail call void @llvm.trap()
  unreachable

6:                                                ; preds = %3
  %7 = sub i64 %1, %0
  %8 = icmp slt i64 %7, 0
  br i1 %8, label %23, label %9, !prof !7, !misexpect !8

9:                                                ; preds = %6
  %10 = icmp eq i64 %7, 0
  br i1 %10, label %.thread3, label %11

11:                                               ; preds = %9
  %12 = inttoptr i64 %0 to i8*
  br label %13

13:                                               ; preds = %11, %13
  %14 = phi i64 [ 0, %11 ], [ %20, %13 ]
  %15 = phi i64 [ 0, %11 ], [ %16, %13 ]
  %16 = add nuw i64 %15, 1
  %17 = getelementptr inbounds i8, i8* %12, i64 %15
  %18 = load i8, i8* %17, align 1
  %19 = zext i8 %18 to i64
  %20 = add i64 %14, %19
  %21 = icmp eq i64 %16, %7
  br i1 %21, label %.thread3, label %13

.thread3:                                         ; preds = %13, %entry, %9
  %22 = phi i64 [ 0, %9 ], [ 0, %entry ], [ %20, %13 ]
  ret i64 %22

23:                                               ; preds = %6
  tail call void asm sideeffect "", "n"(i32 0) #2
  tail call void @llvm.trap()
  unreachable
}

Then, from my new toolchain:

; Function Attrs: minsize nounwind optsize
define protected swiftcc i64 @"$s4test0A9Subscriptys5Int64VSWF"(i64 %0, i64 %1) #0 {
entry:
  %2 = icmp eq i64 %0, 0
  %3 = sub i64 %1, %0
  %4 = select i1 %2, i64 0, i64 %3
  %5 = icmp slt i64 %4, 0
  br i1 %5, label %20, label %6, !prof !7

6:                                                ; preds = %entry
  %7 = icmp eq i64 %4, 0
  br i1 %7, label %.loopexit, label %8

8:                                                ; preds = %6
  %9 = inttoptr i64 %0 to i8*
  br label %10

10:                                               ; preds = %8, %10
  %11 = phi i64 [ 0, %8 ], [ %17, %10 ]
  %12 = phi i64 [ 0, %8 ], [ %13, %10 ]
  %13 = add nuw i64 %12, 1
  %14 = getelementptr inbounds i8, i8* %9, i64 %12
  %15 = load i8, i8* %14, align 1
  %16 = zext i8 %15 to i64
  %17 = add i64 %11, %16
  %18 = icmp eq i64 %13, %4
  br i1 %18, label %.loopexit, label %10

.loopexit:                                        ; preds = %10, %6
  %19 = phi i64 [ 0, %6 ], [ %17, %10 ]
  ret i64 %19

20:                                               ; preds = %entry
  tail call void asm sideeffect "", "n"(i32 0) #2
  tail call void @llvm.trap()
  unreachable
}

@Lukasa
Contributor Author

Lukasa commented May 18, 2021

I think I've convinced myself that this IR is strictly better. We don't do an immediate jump to the end of the function if the base pointer is nil, which is a bit sad: instead we do a sub/cmov/cmp chain. I don't love that (I'm not sure of the relative cost of cmov in this context).

Might need to chat to @gottesmm to work out what we think is better here. LLVM seems to have a strong preference for the cmov though.

@atrick
Contributor

atrick commented May 19, 2021

In the interest of code size, LLVM should be able to reorder traps (if the frontend says it's ok). That's probably worth an LLVM radar. Although this case might be fixable in SIL. Would need to see -emit-sil and -emit-ir -disable-llvm-optzns.

@Lukasa
Contributor Author

Lukasa commented May 19, 2021

To convince myself, I replicated this code in C, compiled it to LLVM IR, and then stripped out the debug noise. My code was:

#include <stddef.h>

long testSubscript(char *base, char *end) {
    size_t count;
    long accumulator = 0;

    if (base == NULL) {
        count = 0;
    } else {
        count = end - base;
    }

    for (size_t i = 0; i < count; i++) {
        accumulator += base[i];
    }

    return accumulator;
}

This includes the slightly weird pattern of NULL leading to a zero count, which is unavoidable due to the way we have constructed URBP.

The resulting IR is:

define dso_local i64 @_Z13testSubscriptPcS_(i8* %0, i8* %1) local_unnamed_addr #0 !dbg !8 {
  %3 = icmp eq i8* %0, null
  %4 = ptrtoint i8* %1 to i64
  %5 = ptrtoint i8* %0 to i64
  %6 = sub i64 %4, %5
  %7 = icmp eq i64 %6, 0
  %8 = select i1 %3, i1 true, i1 %7
  br i1 %8, label %9, label %11

9:                                                ; preds = %11, %2
  %10 = phi i64 [ 0, %2 ], [ %17, %11 ]
  ret i64 %10

11:                                               ; preds = %2, %11
  %12 = phi i64 [ %18, %11 ], [ 0, %2 ]
  %13 = phi i64 [ %17, %11 ], [ 0, %2 ]
  %14 = getelementptr inbounds i8, i8* %0, i64 %12
  %15 = load i8, i8* %14, align 1
  %16 = sext i8 %15 to i64
  %17 = add nsw i64 %13, %16
  %18 = add nuw i64 %12, 1
  %19 = icmp eq i64 %18, %6
  br i1 %19, label %9, label %11
}

This is, modulo some differences in signedness and the extra precondition branch on the end pointer, basically the same as what we have now, including the loop preamble. So I've convinced myself this is fine.

@lorentey
Member

lorentey commented Jun 2, 2021

@swift-ci Apple Silicon benchmark

@swift-ci
Contributor

swift-ci commented Jun 2, 2021

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
DictionaryOfAnyHashableStrings_insert 2058 3696 +79.6% 0.56x
Set.isDisjoint.Box25 276 397 +43.8% 0.70x (?)
EqualSubstringSubstring 22 30 +36.4% 0.73x
LessSubstringSubstring 22 29 +31.8% 0.76x
EqualStringSubstring 22 29 +31.8% 0.76x
EqualSubstringSubstringGenericEquatable 22 29 +31.8% 0.76x
EqualSubstringString 22 29 +31.8% 0.76x
LessSubstringSubstringGenericComparable 22 29 +31.8% 0.76x
DataCreateSmallArray 1450 1850 +27.6% 0.78x (?)
DataCountSmall 15 19 +26.7% 0.79x
Set.isDisjoint.Int50 196 233 +18.9% 0.84x (?)
DictionaryKeysContainsCocoa 16 19 +18.7% 0.84x (?)
ObjectiveCBridgeStubDateAccess 130 152 +16.9% 0.86x (?)
DataCountMedium 15 17 +13.3% 0.88x
StringComparison_longSharedPrefix 325 358 +10.2% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
StaticArray 2 1 -50.0% 2.00x
FlattenListLoop 1565 1026 -34.4% 1.53x (?)
SIMDReduce.Int8x16.Cast 72 48 -33.3% 1.50x
ObjectiveCBridgeStubDataAppend 2420 1840 -24.0% 1.32x (?)
SIMDReduce.Int32x4.Initializer 71 56 -21.1% 1.27x
Dictionary4 186 148 -20.4% 1.26x
SortIntPyramid 515 420 -18.4% 1.23x
SortAdjacentIntPyramids 815 705 -13.5% 1.16x (?)
Data.init.Sequence.64kB.Count0.RE 183 159 -13.1% 1.15x (?)
Data.init.Sequence.64kB.Count0 178 157 -11.8% 1.13x (?)
Data.init.Sequence.809B.Count 55 49 -10.9% 1.12x (?)
Data.init.Sequence.64kB.Count0.I 176 157 -10.8% 1.12x (?)
ObjectiveCBridgeStubFromNSDate 3820 3440 -9.9% 1.11x (?)
Data.append.Sequence.64kB.Count0.RE.I 180 163 -9.4% 1.10x (?)
Data.append.Sequence.64kB.Count0 173 157 -9.2% 1.10x (?)
Data.append.Sequence.64kB.Count0.I 173 157 -9.2% 1.10x (?)
Data.append.Sequence.809B.Count0.I 246 225 -8.5% 1.09x (?)
Data.append.Sequence.809B.Count0 245 225 -8.2% 1.09x (?)
RGBHistogram 1610 1480 -8.1% 1.09x (?)
Phonebook 1113 1029 -7.5% 1.08x (?)
Data.append.Sequence.64kB.Count0.RE 176 164 -6.8% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
ObjectiveCBridgingStubs.o 15237 17189 +12.8% 0.89x
 
Improvement OLD NEW DELTA RATIO
SortLettersInPlace.o 8290 8162 -1.5% 1.02x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
LessSubstringSubstring 22 30 +36.4% 0.73x
EqualSubstringSubstring 22 30 +36.4% 0.73x
EqualStringSubstring 22 30 +36.4% 0.73x
EqualSubstringString 22 30 +36.4% 0.73x
LessSubstringSubstringGenericComparable 22 30 +36.4% 0.73x
EqualSubstringSubstringGenericEquatable 22 29 +31.8% 0.76x
DataCountSmall 15 19 +26.7% 0.79x
Data.init.Sequence.809B.Count.RE.I 40 48 +20.0% 0.83x
FlattenListFlatMap 2725 3248 +19.2% 0.84x (?)
Data.init.Sequence.809B.Count.RE 40 46 +15.0% 0.87x (?)
DataCountMedium 15 17 +13.3% 0.88x (?)
Data.append.Sequence.64kB.Count 29 32 +10.3% 0.91x (?)
Data.hash.Empty 45 49 +8.9% 0.92x (?)
DataCreateSmallArray 1850 2000 +8.1% 0.93x (?)
DataCreateMedium 7600 8200 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
Data.init.Sequence.64kB.Count.I 58 29 -50.0% 2.00x
Data.init.Sequence.64kB.Count 58 29 -50.0% 2.00x
Data.init.Sequence.2047B.Count.I 98 52 -46.9% 1.88x
Data.init.Sequence.2049B.Count.I 98 52 -46.9% 1.88x
Data.init.Sequence.809B.Count 85 50 -41.2% 1.70x
Data.init.Sequence.809B.Count.I 85 50 -41.2% 1.70x
Data.init.Sequence.511B.Count.I 93 60 -35.5% 1.55x
Data.init.Sequence.513B.Count.I 93 62 -33.3% 1.50x
Data.init.Sequence.64kB.Count0 212 155 -26.9% 1.37x
Data.init.Sequence.64kB.Count0.I 212 156 -26.4% 1.36x
ObjectiveCBridgeStubDataAppend 2420 1900 -21.5% 1.27x (?)
Data.init.Sequence.2047B.Count0.I 370 297 -19.7% 1.25x
Data.init.Sequence.2049B.Count0.I 366 295 -19.4% 1.24x
DataSubscriptSmall 19 16 -15.8% 1.19x
FlattenListLoop 1065 936 -12.1% 1.14x (?)
Data.append.Sequence.64kB.Count0 171 152 -11.1% 1.12x (?)
SortIntPyramid 820 740 -9.8% 1.11x (?)
WordCountHistogramASCII 2800 2600 -7.1% 1.08x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
ObjectiveCBridgingStubs.o 13827 15212 +10.0% 0.91x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
LessSubstringSubstringGenericComparable 49 56 +14.3% 0.88x (?)
EqualSubstringSubstringGenericEquatable 50 57 +14.0% 0.88x
LessSubstringSubstring 51 58 +13.7% 0.88x (?)
EqualSubstringSubstring 51 58 +13.7% 0.88x (?)
EqualStringSubstring 51 58 +13.7% 0.88x (?)
EqualSubstringString 51 58 +13.7% 0.88x
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDataAppend 3260 2980 -8.6% 1.09x (?)

Code size: -swiftlibs

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@swift-ci
Contributor

swift-ci commented Jun 2, 2021

Performance (arm64): -O

Regression OLD NEW DELTA RATIO
DictionaryOfAnyHashableStrings_insert 1316 2324 +76.6% 0.57x
Set.isDisjoint.Box25 117 199 +70.1% 0.59x (?)
Set.isDisjoint.Int25 89 135 +51.7% 0.66x (?)
Set.isDisjoint.Int50 89 131 +47.2% 0.68x (?)
StringRemoveDupes 121 147 +21.5% 0.82x (?)
ArrayAppend 380 460 +21.1% 0.83x (?)
DictionaryGroupOfObjects 459 549 +19.6% 0.84x (?)
DictionaryKeysContainsNative 12 14 +16.7% 0.86x (?)
SequenceAlgosUnfoldSequence 310 360 +16.1% 0.86x
DataCreateSmallArray 1300 1500 +15.4% 0.87x
DictionaryRemove 1750 2000 +14.3% 0.88x (?)
SetIsSubsetBox25 86 98 +14.0% 0.88x (?)
DictionaryRemoveOfObjects 5800 6600 +13.8% 0.88x (?)
Set.isStrictSubset.Box25 87 98 +12.6% 0.89x (?)
DictionarySwap 528 592 +12.1% 0.89x (?)
DictionaryGroup 117 131 +12.0% 0.89x (?)
Histogram 252 280 +11.1% 0.90x (?)
DictionarySwapAt 424 456 +7.5% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDataAppend 1720 1220 -29.1% 1.41x
DataAccessBytesMedium 45 32 -28.9% 1.41x
DataAppendBytesSmall 128 93 -27.3% 1.38x
DataAccessBytesSmall 48 35 -27.1% 1.37x
StringComparison_ascii 356 288 -19.1% 1.24x
Phonebook 896 742 -17.2% 1.21x
LessSubstringSubstring 22 19 -13.6% 1.16x
EqualSubstringSubstring 22 19 -13.6% 1.16x
EqualStringSubstring 22 19 -13.6% 1.16x
EqualSubstringSubstringGenericEquatable 22 19 -13.6% 1.16x
EqualSubstringString 22 19 -13.6% 1.16x
LessSubstringSubstringGenericComparable 22 19 -13.6% 1.16x
SIMDReduce.Int32x4.Initializer 49 44 -10.2% 1.11x
CharacterPropertiesStashed 610 550 -9.8% 1.11x (?)
DataCreateMediumArray 880 800 -9.1% 1.10x (?)
Chars2 2600 2400 -7.7% 1.08x
Set.filter.Int100.24k 874 808 -7.6% 1.08x
Dict.CopyKeyValue.24k 888 822 -7.4% 1.08x
Dict.CopyKeyValue.20k 768 715 -6.9% 1.07x
Calculator 135 126 -6.7% 1.07x
SetSymmetricDifferenceBox25 243 227 -6.6% 1.07x

Code size: -O

Regression OLD NEW DELTA RATIO
ObjectiveCBridgingStubs.o 14006 15502 +10.7% 0.90x
 
Improvement OLD NEW DELTA RATIO
SortLettersInPlace.o 7529 7337 -2.6% 1.03x
SortStrings.o 23949 23693 -1.1% 1.01x

Performance (arm64): -Osize

Regression OLD NEW DELTA RATIO
DataCreateSmallArray 1550 1900 +22.6% 0.82x
Data.append.Sequence.64kB.Count.I 37 42 +13.5% 0.88x
Data.append.Sequence.64kB.Count 37 42 +13.5% 0.88x
Data.append.Sequence.809B.Count.I 61 68 +11.5% 0.90x
Data.append.Sequence.809B.Count 61 67 +9.8% 0.91x
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDataAppend 1720 1220 -29.1% 1.41x
DataAppendBytesSmall 138 106 -23.2% 1.30x
Phonebook 903 742 -17.8% 1.22x
EqualStringSubstring 22 19 -13.6% 1.16x
EqualSubstringString 22 19 -13.6% 1.16x
StringComparison_ascii 521 451 -13.4% 1.16x (?)
LessSubstringSubstring 23 20 -13.0% 1.15x
EqualSubstringSubstring 23 20 -13.0% 1.15x
EqualSubstringSubstringGenericEquatable 23 20 -13.0% 1.15x
LessSubstringSubstringGenericComparable 23 20 -13.0% 1.15x
SortStrings 419 378 -9.8% 1.11x
Chars2 3150 2900 -7.9% 1.09x
CharIndexing_punctuated_unicodeScalars 560 520 -7.1% 1.08x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
ObjectiveCBridgingStubs.o 13718 15006 +9.4% 0.91x
 
Improvement OLD NEW DELTA RATIO
DataBenchmarks.o 47449 46977 -1.0% 1.01x

Performance (arm64): -Onone

Regression OLD NEW DELTA RATIO
XorLoop 4745 5551 +17.0% 0.85x
IterateData 1469 1665 +13.3% 0.88x
StringToDataEmpty 1200 1300 +8.3% 0.92x (?)
NSArray.bridged.mutableCopy.objectAtIndex 597 643 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDataAppend 2180 1840 -15.6% 1.18x
LessSubstringSubstringGenericComparable 40 36 -10.0% 1.11x
ParseInt.UInt64.Decimal 1380 1275 -7.6% 1.08x
LessSubstringSubstring 40 37 -7.5% 1.08x
EqualSubstringSubstring 40 37 -7.5% 1.08x
EqualStringSubstring 40 37 -7.5% 1.08x
EqualSubstringSubstringGenericEquatable 40 37 -7.5% 1.08x

Code size: -swiftlibs

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini9,1
  Total Number of Cores: 8 (4 performance and 4 efficiency)
  Memory: 16 GB

@Lukasa
Contributor Author

Lukasa commented Jun 3, 2021

Thoughts on those results @lorentey? DictionaryOfAnyHashableStrings_insert is a particularly worrying result. I have a suspicion that some of these preconditions were load-bearing.

@eeckstein
Contributor

IIRC, _debugPrecondition is only completely eliminated if the function is inlined into client code. If it's called within the stdlib, it performs a dynamic check for the debug configuration. This may be even worse than an overflow check.
Maybe you want to use _internalInvariant instead? But this would not do any checks in debug builds (of the client code).

@Lukasa
Contributor Author

Lukasa commented Jun 3, 2021

I think the performance of this object relies fairly critically on the idea that the function calls are being inlined into client code.

@eeckstein
Contributor

eeckstein commented Jun 3, 2021

That's true, but if those functions are called and inlined within the stdlib core, the debug-configuration check is still not eliminated.

That could, e.g., explain the perf regressions in the string benchmarks.

@Lukasa
Contributor Author

Lukasa commented Jun 3, 2021

Hmm, so we don't really have an appropriate kind of precondition for this?

@eeckstein
Contributor

I don't think so. Is _internalInvariant an option for this?

@Lukasa
Contributor Author

Lukasa commented Jun 3, 2021

I think that question is above my pay grade. @lorentey and @atrick are the right people to answer that.

@lorentey
Member

lorentey commented Jun 3, 2021

The benchmarks are largely in line with my expectations -- absolutely everything about these low-level types is load bearing, and I don't expect large performance improvements from eliminating highly predictable branches. (#34961 wasn't exactly worth the effort it took to land it, either, when we only look at the performance results. I consider this stuff to be mostly cosmetic.)

I was hoping for a somewhat larger impact on code size, though.

DictionaryOfAnyHashableStrings_insert is surprising -- I can't immediately think of a reason why a buffer pointer change would affect Dictionary or AnyHashable. Perhaps it's the same root cause as the String regressions -- String has a lot more non-inlinable code than usual for the stdlib.

I don't think _debugPrecondition does any runtime check for debug config (or at least not in the usual case) -- in the cases I've seen, in non-inlinable code it effectively turns into a regular precondition.

The next step towards landing this would probably be to look at codegen changes that could explain the regressions -- for that we'd need to pinpoint what exactly uses UBPs on the affected benchmarks.

If it turns out to be a precondition thing, then the usual approach to mitigate it is to create internal-only variants of the affected entry points that use _internalInvariant. The public entry points must continue to use _debugPrecondition though (or something with the same inlinable behavior).
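
For illustration, that pattern might look roughly like this (hypothetical names; assert stands in for _debugPrecondition, and the comments note where the stdlib would use _internalInvariant, which compiles away outside special internal-checks builds):

// Public entry point: keeps the debug-build check, which inlines into
// client code and is eliminated there in optimized builds.
@inlinable
public func formIndex(after i: inout Int) {
    assert(i < Int.max, "Can't advance past Int.max")  // _debugPrecondition
    _uncheckedFormIndex(after: &i)
}

// Hypothetical internal-only variant for stdlib-internal call sites:
// no dynamic configuration check; the stdlib would put an
// _internalInvariant here, which only fires in internal-checks builds.
@usableFromInline
internal func _uncheckedFormIndex(after i: inout Int) {
    i &+= 1
}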

@karwa
Contributor

karwa commented Jun 4, 2021

The benchmarks are largely in line with my expectations -- absolutely everything about these low-level types is load bearing, and I don't expect large performance improvements from eliminating highly predictable branches. (#34961 wasn't exactly worth the effort it took to land it, either, when we only look at the performance results. I consider this stuff to be mostly cosmetic.)

@lorentey

The branches may be highly predictable, but that is a runtime consideration. It definitely blocks the compiler from performing other optimisations, although the performance impact will of course vary based on the kind of code you're benchmarking. This isn't theoretical; it is practical experience. WebURL uses a custom UnsafeBufferPointer for exactly this reason. You can try it for yourself:

This is the command I use to run the benchmarks (from the repo's Benchmarks folder):

# To generate a baseline
xcrun -toolchain swift swift build -c release && ./.build/release/WebURLBenchmark --warmup-iterations 30 --format json > results/main.json

# And again, outputting to a different file and using the python script to compare results.
xcrun -toolchain swift swift build -c release && ./.build/release/WebURLBenchmark --warmup-iterations 30 --format json > results/latest.json && python compare.py results/main.json results/latest.json

Also, I run each command twice. I don't trust the benchmark results directly after a build.

The simplest way to test the impact of indexing overflow checks would be to change this property and have it return Self. All the code that uses it is generic, so everything should work and you can get a direct comparison.

If the only thing to consider were how predictable the branches are, and eliminating them had no significant performance impact, there would be ~no change (within the margin of error). That is demonstrably not the case. Run the benchmarks a few times and you'll see the Google Benchmark library is very good at getting consistent results for the same code. They are reliable indicators of the code's performance; it's not a fluke.

I had already optimised this code to death before I switched to the custom buffer pointer, and switching gave me about a 20% increase in performance. It's huge -- the best change I ever made, and I'm not sure I could reach that level of performance without it.

By all means, copy the type and test it in your own projects. It's very easy to adopt and to compare results. If you have code which uses withContiguousStorageIfAvailable (wCSIA), just tack .withoutTrappingOnIndexOverflow on the end of the given buffer.

If it turns out to be a precondition thing, then the usual approach to mitigate it is to create internal-only variants of the affected entry points that use _internalInvariant. The public entry points must continue to use _debugPrecondition though (or something with the same inlinable behavior).

I would like to (respectfully) challenge this. Why does it need to trap on overflow? What is it actually protecting against?

As I mentioned in my earlier comment, if there were salient indexes which the developer should not increment past, they would be the bounds of the collection, not the bounds of the integer type. I've asked a number of times now, but haven't had a direct answer to these questions, and I'm quite disappointed about that. This is an area that I care about and have spent a lot of time digging through, and my experience shows that it can yield significant benefits for a great number of Swift developers.

It would be a terrible shame if I had to advise people that the standard library has known performance issues which won't be fixed for nebulous reasons. I know that I am not owed answers to any of these questions, but I would really appreciate some. I think we're missing a trick here that could unlock better performance for other developers.

Thanks.

@lorentey
Member

lorentey commented Jun 5, 2021

This is a gentle reminder that the single best way to convince us stdlib engineers that a potential change would be a performance improvement is to contribute a self-contained benchmark that demonstrates the issue. 😉

As it stands, this PR does not look like much of a performance improvement, and neither did #34961. (Which is not to say I'm arguing against landing it! I just don't think it's fair to push for this as a performance fix when the available data indicates it's more of a regression.)

Clearly, you have some specific algorithms that make this a measurable performance win -- have you considered porting & reducing some of them and adding them to the benchmarking suite? (If you aren't able to do that, I'd be more than happy to take on the work of understanding your project myself, but please understand that it will have to be on my own schedule.)

@lorentey
Member

lorentey commented Jun 5, 2021

It would be a terrible shame if I had to advise people that the standard library has known performance issues which won't be fixed for nebulous reasons.

Nice. To be extremely clear -- removing stdlib preconditions is an extraordinarily risky proposition. We are treading on extremely thin ice here.

I don't think it's fair to characterize the desire to check, double check, and triple check these changes as "nebulous". If we make a mistake here, it could result in irreparable damage to the language. (Re-adding a precondition we've previously removed would be an ABI breaking change, and because this is in inlinable code, merely updating libswiftCore.dylib wouldn't be enough to patch up a potential issue.)

Searching for some evidence that this is actually going to be a beneficial change (and investigating actually measurable regressions) is simply due diligence. The way you seem to be arguing against doing this work (up to and including intimidation tactics like the quote above) makes me want to look through this change again in case there is a hidden issue I'm not seeing -- even though I've gone through it many times now, and I've already indicated I'd like to merge this.

I would like to (respectfully) challenge this. Why does it need to trap on overflow? What is it actually protecting against?

By default, we like Swift code to perform overflow checks on arithmetic operations, despite the obvious fact that this can sometimes have a performance impact. These unsafe constructs are, by definition, unable to fully check their preconditions, but we keep widening this safety hole -- we decided to systematically weaken conditions that we can check, starting from our original sin: disabling the range check on the UBP subscript in optimized code. This is a risky process, but we still have one safety net left: we are still leaving the precondition checks enabled in debug builds.

I dislike the idea of removing even that, unless we can prove that the precondition cannot possibly trigger. I think we can prove this in the case of the range initializer calls (which is why I'm fine with using the unchecked Range initializer). We obviously can't prove this in the case of the indexing operations -- these can trivially overflow.

Disabling [debug mode -- ed] overflow checks in index(_:offsetBy:) is particularly suspect to me, as it can be used to generate logic errors, where incrementing an index by a positive value turns out to actually decrement it*. How confident are we that this will not lead to some clever exploit down the road? (That said, there is important precedent here, as IIRC, UnsafePointer.advanced() doesn't check for overflow either -- and UBP indices are sort of like pointers. Disabling the checks altogether is not entirely off the table; but it's a last resort thing.)

* Edit: Actually, this requires multiple index(_:offsetBy:) calls -- this does weaken my point, but it doesn't entirely negate it.
* Edit^2: added clarification in brackets above
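
To make the wrap-around concern concrete, here is a tiny standalone example with plain Ints standing in for unchecked index arithmetic (not the actual stdlib entry points):

// Wrapping addition lets a positive offset move an index backwards.
let i = Int.max - 5
let j = i &+ 10   // wraps past Int.max to Int.min + 4
print(j < i)      // true: "advancing" by +10 produced a smaller index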

@Lukasa
Contributor Author

Lukasa commented Jun 5, 2021

Ok, I think we need to take a step back for a moment.

Based on @lorentey’s comment above, it is clear that I have been labouring under a misapprehension. Specifically, my understanding was that the position of the core standard library maintainers is that URBP should not be responsible for policing misuse of its indices: that given that we did not consider it an error to subscript URBP with an invalid index, and given that URBP’s index is an Int and so can trivially have unchecked operations performed on it in user code, there was simply no point adding costs to the (more verbose) Collection-based equivalents.

This understanding is clearly not true! @lorentey my read of what you’re saying is that in an ideal world you’d prefer URBP to have more checks, not fewer. This is definitely against the tide of this PR, and I’d like to better understand the situation before we do anything, including merging this.

Importantly, for me, I’d like to try to come to an understanding of what we think URBP is and is for. This understanding must necessarily be holistic: it needs to take into account our similar views about URP and the other stdlib Collection classes. Who should be using URBP? When? How does this differ from our ideal world?

In the absence of a clear, shared understanding of what this data type is for we are going to go around in circles on this kind of issue. I was more than happy to go digging to try to find out why the benchmarks got slower, but if this PR represents a direction the stdlib team aren’t happy with then I’d rather not waste my time.

@lorentey
Member

lorentey commented Jun 5, 2021

This understanding is clearly not true! @lorentey my read of what you’re saying is that in an ideal world you’d prefer URBP to have more checks, not fewer. This is definitely against the tide of this PR, and I’d like to better understand the situation before we do anything, including merging this.

This isn't what I'm arguing. I'm arguing that we need to preserve assertions in unoptimized debug builds. That's the contract we ended up with for these unsafe buffer pointer types.

(My personal opinion is indeed that eliminating the bounds check in the subscript was a mistake we'll eventually regret -- in hindsight, I would've preferred if UBP had a separate, explicitly unchecked subscript instead, leaving the default subscript safe. But this opinion doesn't matter the slightest -- in the stdlib we have, UBP types are completely unsafe in optimized code, and I want to make sure we implement this the best we can.)

In the absence of a clear, shared understanding of what this data type is for we are going to go around in circles on this kind of issue. I was more than happy to go digging to try to find out why the benchmarks got slower, but if this PR represents a direction the stdlib team aren’t happy with then I’d rather not waste my time.

This is not my position. Have I not been actively working on landing this?

What I do have an issue with is @karwa's push to merge this quickly by violating the debug mode contract, or by taking other shortcuts. I don't feel there is any reason to land this PR as soon as possible, but there is every reason to tread carefully. We should investigate, understand, and fix issues, and make sure that this isn't going to actually make things worse.

These are fundamental abstractions; absolutely any change we make to them will show up as a regression in some benchmark. This is okay! -- we can fix the egregious ones without affecting the goals of this PR. But is it really too much to ask to add a benchmark that is dedicated to exercising these code paths? UBP operations are woefully underrepresented in our benchmarking suite. (If the answer is yes, that's fine -- I'm more than willing to do it myself! In my own time, of course.)

@atrick
Contributor

atrick commented Jun 7, 2021

I'm about to throw more mud into the water, so apologies for that. I personally think the Swift compiler has always been over-optimizing trap operations. Generally reordering traps and hoisting them above memory side effects is just fine. Hoisting traps above potential program exits, I/O, and memory barriers leads to a mystifying programming model and ultimately systems that can't be easily debugged. If we ever hope to improve that without too much pushback from performance watchdogs, it will help if critical paths are not laden with traps in -O code.

Add to that the fact that inlining heuristics are impossible to predict, and I think it's fair to write contrived benchmarks using @inline(never) to shut down heroic optimization.

As a rule though, there needs to be some way of checking UB assumptions. For bounds/overflow, the best way to do that is testing with -Onone. So let's not remove any checks from debug builds. If these critical paths are slowing down code within the stdlib itself, rather than being inlined at least into the application code, then I think we'll need to address that as a separate stdlib performance problem.

@lorentey
Member

Hoisting traps above potential program exits, I/O, and memory barriers leads to a mystifying programming model and ultimately systems that can't be easily debugged.

I honestly had no idea that was a thing swiftc was doing! 🤯

@atrick
Contributor

atrick commented Jul 19, 2021

@Lukasa you can ignore the DictionaryOfAnyHashableStrings_insert benchmark regression. I'll be fixing that if someone else doesn't get to it first: rdar://80746149 (Fix 90% performance regression in benchmark DictionaryOfAnyHashableStrings_insert)

@Lukasa
Contributor Author

Lukasa commented Feb 3, 2022

Just a quick reminder that this patch exists.

@karwa
Contributor

karwa commented Feb 3, 2022

Thanks for bumping this. It's an important thing to address.

What I do have an issue with is @karwa's push to merge this quickly by violating the debug mode contract, or by taking other shortcuts. I don't feel there is any reason to land this PR as soon as possible, but there is every reason to tread carefully. We should investigate, understand, and fix issues, and make sure that this isn't going to actually make things worse.

I have an even bigger issue with this mischaracterisation, @lorentey. I'm not trying to push to merge this quickly - indeed, I've filed many, many bug reports and started forum topics (and you were pinged in many/all of them, but did not respond), trying to actually have a discussion about whether it is meaningful to trap on overflow for indexing operations which do not actually read from/write to memory. Precisely because I believe it is important to be careful, investigate, understand, and fix issues. This characterisation that I'm being lazy or hasty or "taking shortcuts" is grossly inaccurate and I'm deeply offended by the suggestion.

My understanding is that, in general, for bounds-checked collections, indexing operations which do not touch memory do not need to trap on overflow. Overflow is only possible in those situations if the developer makes a mistake using the collection, and bounds-checking will catch that when they actually try to touch memory.
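
As a concrete example of that point, using Array, whose accesses are always bounds-checked (a sketch, not code from this patch):

let a = [1, 2, 3]
var i = a.endIndex       // 3: a valid "past the end" position
i = a.index(after: i)    // 4: pure arithmetic, touches no memory, no trap
// _ = a[i]              // only this access needs to trap -- and it does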

For the unsafe buffers, bounds-checking is disabled in release mode. That means that -- even without overflow -- there are many, many more mistakes a developer can make which will allow them to touch memory outside of the buffer's bounds. When using unsafe buffers, we rely on the developer to perform their own bounds-checking (in whichever form that takes), and if they do so correctly, that will also cover integer overflows. If they don't, they will see those problems long before they even come close to overflowing a 64-bit integer.

But - again - I'm very open to hearing alternative opinions. Preferably on the forum topic I linked to previously, because I think it is an important discussion for the entire Swift community.

@lorentey
Member

lorentey commented Feb 3, 2022

@swift-ci test

@lorentey
Member

lorentey commented Feb 3, 2022

@swift-ci benchmark

@swift-ci
Contributor

swift-ci commented Feb 3, 2022

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
Data.append.Sequence.64kB.Count.RE.I 20 29 +45.0% 0.69x
Data.append.Sequence.64kB.Count.RE 20 29 +45.0% 0.69x
Data.append.Sequence.809B.Count.RE.I 55 70 +27.3% 0.79x
Data.append.Sequence.809B.Count.RE 55 69 +25.5% 0.80x
DataCreateSmallArray 1400 1700 +21.4% 0.82x
DataAppendSequence 5700 6900 +21.1% 0.83x
Breadcrumbs.MutatedIdxToUTF16.Mixed 190 223 +17.4% 0.85x (?)
Set.isStrictSubset.Int0 72 80 +11.1% 0.90x (?)
Set.isDisjoint.Empty.Int 79 87 +10.1% 0.91x (?)
Set.isSuperset.Seq.Int.Empty 80 88 +10.0% 0.91x (?)
Set.isDisjoint.Seq.Empty.Int 74 81 +9.5% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
RawBufferInitializeMemory 130 87 -33.1% 1.49x
ObjectiveCBridgeStubDataAppend 2520 1880 -25.4% 1.34x
DataCountSmall 19 15 -21.1% 1.27x
DataCountMedium 19 15 -21.1% 1.27x
FlattenListFlatMap 4313 3428 -20.5% 1.26x (?)
Data.init.Sequence.64kB.Count.RE 20 16 -20.0% 1.25x
Data.init.Sequence.64kB.Count.RE.I 19 16 -15.8% 1.19x
SortIntPyramid 540 460 -14.8% 1.17x (?)
PrefixWhileAnySequence 215 188 -12.6% 1.14x
PrefixWhileSequence 203 178 -12.3% 1.14x
Data.init.Sequence.809B.Count.RE 43 38 -11.6% 1.13x (?)
Data.init.Sequence.809B.Count.RE.I 43 38 -11.6% 1.13x
StringHasSuffixAscii 1650 1470 -10.9% 1.12x
RandomShuffleLCG2 320 288 -10.0% 1.11x
DataSubscriptMedium 41 37 -9.8% 1.11x
Array2D 4624 4176 -9.7% 1.11x
Data.hash.Empty 52 47 -9.6% 1.11x (?)
MapReduceString 46 42 -8.7% 1.10x (?)
ArrayInClass 1190 1090 -8.4% 1.09x (?)
DropLastAnySequence 331 304 -8.2% 1.09x
RGBHistogram 1790 1660 -7.3% 1.08x (?)
WordCountHistogramASCII 2800 2600 -7.1% 1.08x (?)
ConvertFloatingPoint.MockFloat64Exactly2 15 14 -6.7% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
ObjectiveCBridgingStubs.o 19069 21049 +10.4% 0.91x
 
Improvement OLD NEW DELTA RATIO
SortLettersInPlace.o 8720 8496 -2.6% 1.03x
SortIntPyramids.o 9299 9203 -1.0% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
Breadcrumbs.MutatedIdxToUTF16.Mixed 191 224 +17.3% 0.85x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 187 219 +17.1% 0.85x
Set.isStrictSubset.Int0 72 80 +11.1% 0.90x (?)
Set.subtracting.Empty.Int 28 31 +10.7% 0.90x
Set.isDisjoint.Seq.Empty.Box 81 89 +9.9% 0.91x (?)
Set.isStrictSubset.Box0 75 82 +9.3% 0.91x (?)
ObjectiveCBridgeStubToNSStringRef 89 97 +9.0% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Data.append.Sequence.64kB.Count.I 58 30 -48.3% 1.93x
Data.append.Sequence.64kB.Count 58 30 -48.3% 1.93x
RawBufferInitializeMemory 152 87 -42.8% 1.75x
Data.append.Sequence.809B.Count.I 99 64 -35.4% 1.55x
Data.append.Sequence.809B.Count 98 64 -34.7% 1.53x
Dictionary4 196 155 -20.9% 1.26x
ObjectiveCBridgeStubDateAccess 152 130 -14.5% 1.17x
NibbleSort 2120 1850 -12.7% 1.15x (?)
DataCountSmall 17 15 -11.8% 1.13x (?)
DataCountMedium 17 15 -11.8% 1.13x
SortStrings 555 498 -10.3% 1.11x
Data.hash.Empty 52 47 -9.6% 1.11x
RemoveWhereSwapInts 23 21 -8.7% 1.10x (?)
RemoveWhereFilterInts 25 23 -8.0% 1.09x (?)
DataAccessBytesMedium 56 52 -7.1% 1.08x (?)
PrefixWhileCountableRange 14 13 -7.1% 1.08x (?)

Code size: -Osize

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
Breadcrumbs.MutatedUTF16ToIdx.Mixed 188 221 +17.6% 0.85x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 199 231 +16.1% 0.86x (?)
ArrayOfPOD 667 733 +9.9% 0.91x (?)
SIMDReduce.Int8 6201 6737 +8.6% 0.92x (?)
ArrayPlusEqualThreeElements 4780 5150 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DataReplaceLargeBuffer 12 11 -8.3% 1.09x (?)
DropFirstSequence 8735 8116 -7.1% 1.08x
DropFirstSequenceLazy 8740 8121 -7.1% 1.08x
DropFirstAnySequence 9465 8821 -6.8% 1.07x (?)
DropFirstAnySequenceLazy 8968 8379 -6.6% 1.07x

Code size: -swiftlibs

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@lorentey lorentey merged commit 8533b42 into swiftlang:main Feb 4, 2022
@swiftlang swiftlang deleted a comment Feb 4, 2022