Skip to content

Use in_guaranteed for let captures #29812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 10, 2020
Merged

Conversation

meg-gupta
Copy link
Contributor

@meg-gupta meg-gupta commented Feb 13, 2020

With this all let values will be captured with in_guaranteed convention
by the closure. Following are the main changes :

SILGen changes:

  • A new CaptureKind::Immutable is introduced, to capture let values as in_guaranteed.
  • SILGen of in_guaranteed capture had to be fixed.
    in_guaranteed captures as per convention are consumed by the closure. And so SILGen should not generate a destroy_addr for an in_guaranteed capture.
    But LetValueInitialization can push Dealloc and Release states of the captured arg in the Cleanup stack, and there is no way to access the CleanupHandle and disable the emission of destroy_addr while emitting the captures in SILGenFunction::emitCaptures.
    So we now create, temporary allocation of the in_guaranteed capture iduring SILGenFunction::emitCaptures without emitting destroy_addr for it.

SILOptimizer changes:

  • Handle in_guaranteed in CopyForwarding.
  • Adjust dealloc_stack of in_guaranteed capture to occur after destroy_addr for on_stack closures in ClosureLifetimeFixup.

IRGen changes :

  • Since HeapLayout can be non-fixed now, make sure emitSize is used conditionally
  • Don't consider ClassPointerSource kind parameter type for fulfillments while generating code for partial apply forwarder.
    The TypeMetadata of ClassPointSource kind sources are not populated in HeapLayout's NecessaryBindings. If we have a generic parameter on the HeapLayout which can be fulfilled by a ClassPointerSource, its TypeMetaData will not be found while constructing the dtor function of the HeapLayout.
    So it is important to skip considering sources of ClassPointerSource kind, so that TypeMetadata of a dependent generic parameters gets populated in HeapLayout's NecessaryBindings.

rdar://58459539

@meg-gupta
Copy link
Contributor Author

@swift-ci benchmark

@gottesmm
Copy link
Contributor

@meg-gupta before this goes in can you fix up the git history?

@meg-gupta
Copy link
Contributor Author

@gottesmm Will do

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
LazilyFilteredArrayContains 26000 34200 +31.5% 0.76x
ArrayAppendLazyMap 5420 6240 +15.1% 0.87x (?)
Data.init.Sequence.809B.Count.RE.I 113 128 +13.3% 0.88x (?)
Set.subtracting.Seq.Empty.Int 158 174 +10.1% 0.91x (?)
RemoveWhereSwapInts 53 58 +9.4% 0.91x (?)
DataCountSmall 23 25 +8.7% 0.92x (?)
MapReduceClass2 35 38 +8.6% 0.92x (?)
MapReduceAnyCollection 331 357 +7.9% 0.93x (?)
MapReduce 333 359 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DictionaryBridgeToObjC_Access 891 785 -11.9% 1.14x (?)
FlattenListLoop 4419 3986 -9.8% 1.11x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
LazyFilter.o 8177 8493 +3.9% 0.96x
Substring.o 18197 18493 +1.6% 0.98x

Performance: -Osize

Regression OLD NEW DELTA RATIO
ArrayAppendLazyMap 6040 6860 +13.6% 0.88x (?)
CharacterLiteralsLarge 89 100 +12.4% 0.89x
LazilyFilteredArrayContains 55900 62300 +11.4% 0.90x (?)
Data.init.Sequence.809B.Count.RE 120 131 +9.2% 0.92x (?)
ObjectiveCBridgeStubNSDateRefAccess 333 359 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 6578 4976 -24.4% 1.32x (?)
FlattenListLoop 4695 3990 -15.0% 1.18x (?)
DictionaryBridgeToObjC_Access 979 847 -13.5% 1.16x (?)
DataAccessBytesMedium 92 82 -10.9% 1.12x (?)
DataCountMedium 25 23 -8.0% 1.09x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
LazyFilter.o 7687 8123 +5.7% 0.95x
Substring.o 17029 17453 +2.5% 0.98x

Performance: -Onone

Improvement OLD NEW DELTA RATIO
ArrayAppendUTF16 21080 16762 -20.5% 1.26x (?)
ArrayAppendLatin1 21080 16796 -20.3% 1.26x (?)
ArrayAppendAscii 21080 16898 -19.8% 1.25x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@meg-gupta
Copy link
Contributor Author

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
LessSubstringSubstring 21 28 +33.3% 0.75x
EqualSubstringSubstring 21 27 +28.6% 0.78x
EqualSubstringString 21 27 +28.6% 0.78x
LessSubstringSubstringGenericComparable 21 27 +28.6% 0.78x
LazilyFilteredArrayContains 17500 22400 +28.0% 0.78x
EqualStringSubstring 22 28 +27.3% 0.79x
EqualSubstringSubstringGenericEquatable 22 28 +27.3% 0.79x
RemoveWhereSwapInts 33 39 +18.2% 0.85x
StringComparison_longSharedPrefix 313 352 +12.5% 0.89x
ArrayPlusEqualFiveElementCollection 4292 4810 +12.1% 0.89x
Array2D 3760 4192 +11.5% 0.90x
IterateData 798 864 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Data.init.Sequence.64kB.Count.RE.I 50 38 -24.0% 1.32x
Data.init.Sequence.64kB.Count.RE 50 38 -24.0% 1.32x
Set.isStrictSubset.Seq.Int.Empty 112 101 -9.8% 1.11x (?)
Data.init.Sequence.809B.Count.RE 94 85 -9.6% 1.11x (?)
ParseInt.Small.Decimal 199 185 -7.0% 1.08x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
LazyFilter.o 8177 8493 +3.9% 0.96x
Substring.o 18197 18493 +1.6% 0.98x

Performance: -Osize

Regression OLD NEW DELTA RATIO
Data.append.Sequence.64kB.Count.RE 20 30 +50.0% 0.67x
Data.append.Sequence.64kB.Count.RE.I 21 30 +42.9% 0.70x
LessSubstringSubstring 21 27 +28.6% 0.78x
EqualSubstringString 21 27 +28.6% 0.78x
LessSubstringSubstringGenericComparable 21 27 +28.6% 0.78x
EqualSubstringSubstring 22 28 +27.3% 0.79x
EqualStringSubstring 22 28 +27.3% 0.79x
EqualSubstringSubstringGenericEquatable 22 28 +27.3% 0.79x
DataCountSmall 15 19 +26.7% 0.79x
DataCountMedium 15 19 +26.7% 0.79x (?)
Data.init.Sequence.64kB.Count.RE.I 40 50 +25.0% 0.80x
Data.init.Sequence.64kB.Count.RE 41 50 +22.0% 0.82x
Data.init.Sequence.809B.Count.RE 81 95 +17.3% 0.85x
Data.init.Sequence.809B.Count.RE.I 82 96 +17.1% 0.85x (?)
ArraySetElement 283 327 +15.5% 0.87x
Data.append.Sequence.809B.Count.RE 91 105 +15.4% 0.87x (?)
Data.append.Sequence.809B.Count.RE.I 95 108 +13.7% 0.88x (?)
ObjectiveCBridgeStubNSDateRefAccess 174 196 +12.6% 0.89x (?)
StringComparison_longSharedPrefix 314 351 +11.8% 0.89x (?)
UTF8Decode_InitDecoding 116 125 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DataAccessBytesMedium 59 52 -11.9% 1.13x (?)
ArrayInClass 935 845 -9.6% 1.11x (?)
DataCreateEmptyArray 1500 1400 -6.7% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
LazyFilter.o 7687 8123 +5.7% 0.95x
Substring.o 17029 17453 +2.5% 0.98x

Performance: -Onone

Regression OLD NEW DELTA RATIO
EqualSubstringSubstringGenericEquatable 25 31 +24.0% 0.81x (?)
LessSubstringSubstringGenericComparable 25 31 +24.0% 0.81x (?)
LessSubstringSubstring 26 32 +23.1% 0.81x
EqualSubstringSubstring 26 31 +19.2% 0.84x
EqualStringSubstring 27 32 +18.5% 0.84x
EqualSubstringString 27 32 +18.5% 0.84x
DataSubscriptMedium 56 61 +8.9% 0.92x (?)
UTF8Decode_InitDecoding 131 142 +8.4% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSStringRef 131 120 -8.4% 1.09x (?)
ObjectiveCBridgeStubFromNSString 518 477 -7.9% 1.09x (?)
ObjectiveCBridgeStubFromNSDateRef 3050 2820 -7.5% 1.08x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

Copy link
Contributor

@slavapestov slavapestov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy to see usages of Inout_Aliasable go away.

Ideally we would get rid of capture.isNoEscape() too. Is that a possibility? We should be able to promote box captures to stack in the SIL optimizer, instead of relying on Sema/SILGen setting capture.isNoEscape().

@slavapestov
Copy link
Contributor

Can we use inout instead of inout_aliasable for mutable captures as well? If I understand correctly, they already should not alias because of exclusivity checking.

@gottesmm
Copy link
Contributor

@slavapestov that is a much larger change. In reality what we need to do is make it so that inout_aliasable is either in_guaranteed or inout. It would let us eliminate a bunch of unfortunate things in the code base.

@meg-gupta
Copy link
Contributor Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - d297d9a391fda226881c3d39e038dd297653d2aa

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - d297d9a391fda226881c3d39e038dd297653d2aa

@meg-gupta
Copy link
Contributor Author

@slavapestov In certain cases there is a mismatch between the capture.isNoEscape in the frontend and outType.isNoEscape in IRGen. This can cause issue for generating code for the in_guaranteed param. I am working on changes to get rid of this. Will post an update soon.

@meg-gupta meg-gupta changed the title Eliminate copy_addr for in_guaranteed captures in no escape closures Use in_guaranteed for let captures Mar 3, 2020
@meg-gupta meg-gupta marked this pull request as ready for review March 3, 2020 22:40
@meg-gupta
Copy link
Contributor Author

@swift-ci please test

@meg-gupta
Copy link
Contributor Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Mar 3, 2020

Build failed
Swift Test Linux Platform
Git Sha - d88bf33e1945cf0c7bd2ec2a748d6bb80a4e0379

@swift-ci
Copy link
Contributor

swift-ci commented Mar 3, 2020

Build failed
Swift Test OS X Platform
Git Sha - d88bf33e1945cf0c7bd2ec2a748d6bb80a4e0379

@meg-gupta
Copy link
Contributor Author

@swift-ci please test

@meg-gupta
Copy link
Contributor Author

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Mar 4, 2020

Build failed
Swift Test Linux Platform
Git Sha - d88bf33e1945cf0c7bd2ec2a748d6bb80a4e0379

@meg-gupta
Copy link
Contributor Author

@swift-ci please test

@meg-gupta
Copy link
Contributor Author

@swift-ci Please test OS X platform

@meg-gupta
Copy link
Contributor Author

@swift-ci please benchmark

@meg-gupta meg-gupta merged commit 13b9915 into swiftlang:master Mar 10, 2020
@atrick
Copy link
Contributor

atrick commented Mar 10, 2020

Can we use inout instead of inout_aliasable for mutable captures as well? If I understand correctly, they already should not alias because of exclusivity checking.

@slavapestov exclusivity is a prerequisite. But now the no-escape closure representation needs to be fixed. @inout_aliasable is still used for aliasing stuff

func doit(_ f: ()->Int) -> Int {
  f()
}

public func testInoutAlias() -> Int {
  var i = 0
  let f = { i = 3 }
  return doit() {
    f()
    return i
  }
}

It's very clear to me how this should be fixed, and I wrote up a design proposal for it years ago but it has been lower priority than bug fixing.

rintaro added a commit that referenced this pull request Mar 10, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
@swiftlang swiftlang deleted a comment from swift-ci Mar 11, 2020
Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@meg-gupta I'm very happy you got this working. It's a very clean design that defines away a category of memory bugs.

Please address these issues in a follow-up PR:

  • Comments should be 80 columns

In TypeConverter::getDeclCaptureKind:

  • assert that Immutable captures are address-only

  • Fix the above check for opaque values. Note that SILOpaqueValues has
    not effect on SIL calling convention, so whether it is enabled should probably
    have no effect on the CaptureKind.

In TempRValueOpts:

  • collectLoads: this looks correct, but please comment why mark_dependence base operand is safe to consider a load, and why the value operand needs to be transitive (otherwise we would miscompile!)

In irgen::emitFunctionPartialApplication():

  • In the block comment please explain what is happening clearly. You or @aschwaighofer or @rjmccall can explain this better than I can, but basically:

The context's HeapLayout is only "fixed" when all of the captures have
fixed type info. When the HeapLayout is fixed, then [the bindings]
need to map the context's fields to type parameters so that the
context destructor can find the metadata. When the HeapLayout is not
fixed, then the metadata is embedded in the context to be read
directly by the destructor, so there is not need to map captures to
type parameters--in this case, bypass computing the bindings because
it would fail when it sees ClassPointer or Metadata sources.

[I could have gotten this completely wrong, even after staring at the code for about an hour, so I think it's well worth explaining]

@aschwaighofer
Copy link
Contributor

aschwaighofer commented Mar 11, 2020

@atrick The issue is that we must not consider parameter sources as type sources (example: we can and do reconstruct type metadata T for a class reference parameter SomeClass in <T> closure_func(@in_guaranteed T, @guaranteed SomeClass<T>)) when computing which metadata to store in the heap layout because -- as opposed to the partially applied function forwarder -- those parameters (the unapplied ones) won't be available in the destructor.

This is only an issue in non fixed heap layouts because only for those we need the metadata of non-fixed captures in the destructor.

@atrick
Copy link
Contributor

atrick commented Mar 11, 2020

@aschwaighofer that makes perfect sense, but then why does this PR set considerParameterSources to false when HeapLayout is not fixed. That's what I find so confusing and why it needs to be explained.

@aschwaighofer
Copy link
Contributor

aschwaighofer commented Mar 11, 2020

"The issue is that we must not consider parameter sources as type sources..."
"This is only an issue in non fixed heap layouts ...."

It is only safe to considerParameterSources when we know that we won't need type metadata provided by said parameter sources in the destructor (the destructor does not have that parameter source). In other words: It is only the safe when the HeapLayout is fixed.

There are two sources of type metadata: the ones passed explicitly to a generic function (or stored explicitly in the heap context), and those implicit in other parameters (e.g SomeClass parameter). We must ignore the SomeClass parameter source because it is not available in the destructor but only in the partial apply forwarder.

All this is only an issue for non-fixed types because for fixed ones we don't need that Metadata.

@gottesmm
Copy link
Contributor

@aschwaighofer add that in a comment? Just doing a drive by.

@meg-gupta
Copy link
Contributor Author

@gottesmm I'll include it in a follow up PR with other suggestions above

@atrick
Copy link
Contributor

atrick commented Mar 12, 2020

@aschwaighofer thanks

My confusion had to do with what the "bindings" for type metadata are
used for... Let me try again:

The context's HeapLayout is "fixed" when all of its captures [partially
applied parameters?] have a fixed layout. A context with fixed
HeapLayout can be destroyed without access to type metadata;
consequently, it is safe to consider partially applied parameters as
the source of type metadata bindings even though those parameters are
unavailable when the context is destroyed.

When the context's HeapLayout is not fixed, then its destructor
requires access to type metadata of each capture. Therefore, partially
parameters, which are unavailable when the destructor is called, cannot
be the souce of type metadata.

(I'm not actually sure what the comment about ClassPointer or Metadata
sources has to do with, presumably there are other sources?)

meg-gupta added a commit that referenced this pull request Mar 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants