Skip to content

Use associated objects to attach contiguous array buffers to lazy ones #75148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 20, 2024

Conversation

Catfish-Man
Copy link
Contributor

@Catfish-Man Catfish-Man commented Jul 10, 2024

This uses ObjC runtime trickery to allow us to get an inner pointer with lifetime bounded to the array buffer from every array, rather than just from native ones

Fixes rdar://132124808

@Catfish-Man Catfish-Man self-assigned this Jul 10, 2024
@Catfish-Man
Copy link
Contributor Author

@swift-ci please smoke test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Copy link
Contributor Author

Benchmark results look fine so far, let's see if that stays true once I adopt it in CoW

------- Performance (arm64): -Osize -------

REGRESSION                            OLD        NEW        DELTA     RATIO    
StringWithCString2                    0.0        0.002      +200.0%   **0.33x**

IMPROVEMENT                           OLD        NEW        DELTA     RATIO    
StringSwitch                          201.909    150.067    -25.7%    **1.35x**
BufferFillFromSlice                   10.745     9.702      -9.7%     **1.11x (?)**
RawBufferCopyBytes                    12.308     11.149     -9.4%     **1.10x (?)**
DifferentiationIdentity               89.0       81.619     -8.3%     **1.09x (?)**
Set.isSubset.Seq.Int.Empty            86.923     80.207     -7.7%     **1.08x**
RandomTree.insert.Unmanaged.fast      75.037     69.259     -7.7%     **1.08x**
RandomTree.insert.Unmanaged.slow      82.84      76.577     -7.6%     **1.08x**
Set.isSuperset.Seq.Box0               85.037     78.679     -7.5%     **1.08x**
StringBuilderSmallReservingCapacity   161.429    149.813    -7.2%     **1.08x**
CharIteration_utf16_unicodeScalars    3527.273   3276.667   -7.1%     **1.08x**
CSVParsingAlt2                        579.089    540.796    -6.6%     **1.07x**

------- Code size: -Osize -------

IMPROVEMENT    OLD    NEW    DELTA    RATIO  
BufferFill.o   6825   5721   -16.2%   **1.19x**

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Copy link
Contributor Author

Still doing ok. That honestly makes me wonder if we don't have benchmark coverage for this case, since I'd expect it to be better, worse, or both.

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

Trying out some new benchmarks for this in #75215

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

(Note: this is still not fully ready, due to a thread safety issue)

@Catfish-Man
Copy link
Contributor Author


IMPROVEMENT                               OLD        NEW       DELTA     RATIO    
NSArray.bridged.repeatedBufferAccess      545.25     43.529    -92.0%    **12.53x**
NSArray.bridged.bufferAccess              622.667    112.7     -81.9%    **5.52x**
StringSwitch                              202.0      149.533   -26.0%    **1.35x**
EqualSubstringString                      20.229     17.407    -13.9%    **1.16x**
LessSubstringSubstringGenericComparable   20.365     17.556    -13.8%    **1.16x**
EqualSubstringSubstringGenericEquatable   20.365     17.558    -13.8%    **1.16x**
EqualStringSubstring                      20.128     17.4      -13.6%    **1.16x**
LessSubstringSubstring                    20.26      17.566    -13.3%    **1.15x**
EqualSubstringSubstring                   20.25      17.567    -13.2%    **1.15x**
RandomTree.insert.Unmanaged.slow          84.957     78.52     -7.6%     **1.08x**
StringBuilderSmallReservingCapacity       162.071    151.2     -6.7%     **1.07x**

------- Code size: -Osize -------

REGRESSION             OLD    NEW    DELTA   RATIO  
ClassArrayGetter.o     3006   3102   +3.2%   **0.97x**
NopDeinit.o            3265   3361   +2.9%   **0.97x**
ArrayOfRef.o           5629   5729   +1.8%   **0.98x**
ArrayRemoveAll.o       5580   5676   +1.7%   **0.98x**
ArrayOfGenericRef.o    6300   6400   +1.6%   **0.98x**
SortLettersInPlace.o   6386   6482   +1.5%   **0.99x**
COWTree.o              8488   8584   +1.1%   **0.99x**
DictionaryGroup.o      8695   8791   +1.1%   **0.99x**

@Catfish-Man
Copy link
Contributor Author

@swift-ci please smoke test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

@@ -51,12 +51,16 @@ internal struct _CocoaArrayWrapper: RandomAccessCollection {

@usableFromInline
internal var endIndex: Int {
return core.count
@_effects(releasenone) get {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My claim is that any NSArray subclass that responds to -count or -objectAtIndex: by releasing something relevant is broken and will crash when used in ObjC code anyway

@Catfish-Man
Copy link
Contributor Author

Fixed the code size issues! And adding the lock doesn't seem to have been particularly detrimental to performance

------- Performance (arm64): -Osize -------

REGRESSION                             OLD        NEW        DELTA    RATIO    
CharacterLiteralsLarge                 48.511     59.919     +23.5%   **0.81x**
CharacterLiteralsSmall                 142.688    165.5      +16.0%   **0.86x**
FindString.Rec3.Substring              53.079     60.829     +14.6%   **0.87x (?)**
SubstringFromLongStringGeneric2        18.546     19.969     +7.7%    **0.93x**
SubstringFromLongString2               18.548     19.97      +7.7%    **0.93x**
CharIteration_tweet_unicodeScalars     4915.789   5287.059   +7.6%    **0.93x**

IMPROVEMENT                            OLD        NEW        DELTA    RATIO    
NSArray.bridged.repeatedBufferAccess   543.75     45.673     -91.6%   **11.91x**
NSArray.bridged.bufferAccess           628.333    137.733    -78.1%   **4.56x**
Set.isDisjoint.Seq.Box.Empty           62.769     37.09      -40.9%   **1.69x**
Set.subtracting.Seq.Box.Empty          88.462     64.2       -27.4%   **1.38x**
StringSwitch                           201.909    150.067    -25.7%   **1.35x**
BufferFillFromSlice                    11.25      10.13      -10.0%   **1.11x (?)**
DataAppendDataMediumToLarge            5455.556   5023.077   -7.9%    **1.09x (?)**
RandomTree.insert.Unmanaged.slow       82.783     76.56      -7.5%    **1.08x (?)**
RandomTree.insert.Unmanaged.fast       74.741     69.786     -6.6%    **1.07x (?)**
StringBuilderSmallReservingCapacity    161.929    151.2      -6.6%    **1.07x**

------- Code size: -Osize -------

IMPROVEMENT          OLD    NEW    DELTA   RATIO  
ArrayRemoveAll.o     5580   5264   -5.7%   **1.06x**
ClassArrayGetter.o   3006   2958   -1.6%   **1.02x**
COWTree.o            8488   8380   -1.3%   **1.01x**
NopDeinit.o          3265   3225   -1.2%   **1.01x**

@Catfish-Man
Copy link
Contributor Author

@swift-ci please smoke test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man Catfish-Man marked this pull request as ready for review July 19, 2024 06:34
@Catfish-Man Catfish-Man requested a review from a team as a code owner July 19, 2024 06:34
@Catfish-Man Catfish-Man changed the title First draft of using associated objects to attach contiguous array buffers to lazy ones Use associated objects to attach contiguous array buffers to lazy ones Jul 19, 2024
@Catfish-Man Catfish-Man requested a review from mikeash July 19, 2024 19:48

@inlinable @_alwaysEmitIntoClient
internal func setAssociatedBuffer(_ buffer: _ContiguousArrayBuffer<Element>) {
let setter = unsafeBitCast(getSetAssociatedObjectPtr(), to: (@convention(c)(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ridiculous dance is a way of tricking swiftc into calling objc_setAssociatedObject() directly without a) being able to import the definition, or b) going through the importer's id -> Any thunking


@inlinable @_alwaysEmitIntoClient
internal func getAssociatedBuffer() -> _ContiguousArrayBuffer<Element>? {
let getter = objc_getAssociatedObject as @convention(c)(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below about objc_setAssociatedObject for what's going on here. It's almost the same trick but slightly less tangly because we don't have to deal with objc_associationPolicy


#endif

#if !FOUNDATION_HELPER_FILE
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FoundationHelpers.mm does import the header, but this header can't import it, so we need to provide the decl only for non-FoundationHelpers.mm files. I hate this, suggestions welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this section be moved to a new separate header?

internal func withUnsafeBufferPointer_nonNative<R>(
_ body: (UnsafeBufferPointer<Element>) throws -> R
) rethrows -> R {
objc_sync_enter(_storage.objCInstance)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objc_sync_enter is equivalent to @synchronized(), which gives us a way to get a lock here without declaring any storage for it, which in turn means we can do it from inlinable code in a back deployment friendly way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you expect this to usually be retrieved multiple per object, then this might benefit from double-checked locking. Get, return if set, otherwise lock and get again in case something else set it. The locking in the associated objects stuff saves you from needing to worry about memory barriers.

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

Copy link
Contributor

@mikeash mikeash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks... I hesitate to say "good" so let's just say "correct," overall. A few little comments.

internal func withUnsafeBufferPointer_nonNative<R>(
_ body: (UnsafeBufferPointer<Element>) throws -> R
) rethrows -> R {
objc_sync_enter(_storage.objCInstance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you expect this to usually be retrieved multiple per object, then this might benefit from double-checked locking. Get, return if set, otherwise lock and get again in case something else set it. The locking in the associated objects stuff saves you from needing to worry about memory barriers.


#endif

#if !FOUNDATION_HELPER_FILE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this section be moved to a new separate header?

_storage.objCInstance,
_ArrayBuffer.associationKey,
buffer._storage,
0o1401 //OBJC_ASSOCIATION_RETAIN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OBJC_ASSOCIATION_RETAIN_NONATOMIC will save you a retain/autorelease when getting the value. The atomic version just protects you from a concurrent set replacing the object while you're retrieving it, and that won't happen here. I expect the performance difference would be pretty small, though, since the autorelease will be elided.

setAssociatedBuffer(unwrapped)
}
defer { _fixLifetime(unwrapped) }
objc_sync_exit(_storage.objCInstance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this is true but just checking: this is definitely 100% sure always going to be the exact same pointer that you got from _storage.objCInstance above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also pretty sure it's true, but there's no reason not to be paranoid and stash it in a local

@@ -189,6 +189,10 @@ extension _ArrayBuffer {
internal __consuming func _consumeAndCreateNew(
bufferIsUnique: Bool, minimumCapacity: Int, growForAppend: Bool
) -> _ArrayBuffer {
if !_isNative && capacity >= minimumCapacity,
let associatedBuffer = getAssociatedBuffer() {
return _ArrayBuffer(_buffer: associatedBuffer, shiftedToStartIndex: 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look right -- IIUC this function needs to return a uniquely referenced buffer, which (again IIUC) this won't be

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting point. It will become uniquely referenced the moment it replaces the old buffer (because deallocating the old buffer will remove the association). Is that sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm no it is not, because the old buffer might have an extra reference from another variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed a radar to track figuring out how to do this safely, going to just rip it out for now

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please Apple Silicon benchmark

to: (@convention(c)(
AnyObject,
UnsafeRawPointer
) -> Unmanaged<AnyObject>?).self
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Unmanaged here prevents swiftc from generating pointless calls to objc_retainAutoreleasedReturnValue

@Catfish-Man
Copy link
Contributor Author

------- Performance (arm64): -Osize -------

REGRESSION                             OLD       NEW       DELTA    RATIO    
String.replaceSubrange.String          5.708     6.187     +8.4%    **0.92x**
ArrayAppendToGeneric                   164.425   177.273   +7.8%    **0.93x (?)**
SuperChars2                            136.688   147.2     +7.7%    **0.93x**

IMPROVEMENT                            OLD       NEW       DELTA    RATIO    
NSArray.bridged.repeatedBufferAccess   547.0     15.232    -97.2%   **35.91x**
NSArray.bridged.bufferAccess           635.333   95.13     -85.0%   **6.68x**
Set.isDisjoint.Seq.Box.Empty           62.821    35.661    -43.2%   **1.76x**
Set.subtracting.Seq.Box.Empty          88.5      62.771    -29.1%   **1.41x**
StringSwitch                           201.909   149.8     -25.8%   **1.35x**
Join                                   66.759    58.364    -12.6%   **1.14x (?)**
BufferFillFromSlice                    11.5      10.161    -11.6%   **1.13x (?)**
MapReduceNSDecimalNumberShort          155.067   142.75    -7.9%    **1.09x (?)**
Set.isSubset.Seq.Int.Empty             87.077    80.185    -7.9%    **1.09x**
StringBuilderSmallReservingCapacity    162.071   150.063   -7.4%    **1.08x**
StringFromLongWholeSubstring           1.998     1.856     -7.1%    **1.08x**

------- Code size: -Osize -------

IMPROVEMENT            OLD    NEW    DELTA   RATIO  
ArrayRemoveAll.o       5580   5208   -6.7%   **1.07x**
ClassArrayGetter.o     3006   2902   -3.5%   **1.04x**
NopDeinit.o            3265   3169   -2.9%   **1.03x**
COWTree.o              8488   8324   -1.9%   **1.02x**
ArrayOfRef.o           5629   5537   -1.6%   **1.02x**
ArrayOfGenericRef.o    6300   6208   -1.5%   **1.01x**
SortLettersInPlace.o   6386   6314   -1.1%   **1.01x**

@mikeash 's suggestions worked very nicely

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man
Copy link
Contributor Author

@swift-ci please test

@Catfish-Man Catfish-Man enabled auto-merge (squash) July 20, 2024 00:45
@Catfish-Man Catfish-Man merged commit 7c01181 into swiftlang:main Jul 20, 2024
4 of 5 checks passed
iMostfa pushed a commit to iMostfa/swift that referenced this pull request Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants