-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Add value witnesses for single payload enums #12606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add value witnesses for single payload enums #12606
Conversation
So far single payload enums were implemented in terms of runtime functions which internally emitted several calls to value witnesses. This commit adds value witnesses to get and store the enum tag side stepping the need for witness calls as this information is statically available in many cases /// int (*getEnumTagSinglePayload)(const T* enum, UINT_TYPE emptyCases) /// Given an instance of valid single payload enum with a payload of this /// witness table's type (e.g Optional<ThisType>) , get the tag of the enum. /// void (*storeEnumTagSinglePayload)(T* enum, INT_TYPE whichCase, /// UINT_TYPE emptyCases) /// Given uninitialized memory for an instance of a single payload enum with a /// payload of this witness table's type (e.g Optional<ThisType>), store the /// tag. A simple 'for element in array' loop in generic code operating on a ContigousArray of Int is ~25% faster on arm64. rdar://31408033
Fast-isel does not like them
@swift-ci Please test |
Build failed |
@swift-ci Please test linux |
What's the code size impact for this change? |
I don’t have numbers yet. it should somewhat even out with the removal of the array witnesses (of which there where four) |
@swift-ci Please benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the extra inhabitant value witnesses are now only used when you have an enum whose payload is a structural type, like a tuple. Can you think of a clever way around this, implementing the single payload value witnesses for tuples in terms of the single payload value witnesses of the payload type? If so, you could trim the value witness table further.
lib/IRGen/GenType.cpp
Outdated
// numTags += ((emptyCases + (casesPerTagBitValue - 1U)) >> bits); | ||
// } | ||
// return (numTags < 256 ? 1 : | ||
// numTags < 65536 ? 2 : 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
lib/IRGen/GenType.cpp
Outdated
auto *phi = Builder.CreatePHI(int32Ty, 3); | ||
phi->addIncoming(one, entryBB); | ||
phi->addIncoming(numTags, notLT256BB); | ||
return phi; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
For example, would it be sufficient to say that
This relies on the fact that extra inhabitants of a tuple type are always the extra inhabitants of its first component. If this were to change, the above would no longer work, but maybe we can live with that restriction? |
Build comment file:Optimized (O)Regression (9)
Improvement (14)
No Changes (311)
Unoptimized (Onone)Regression (3)
Improvement (60)
No Changes (271)
Hardware Overview
|
@slavapestov They are also used for non-fixed size structs and enums. Where the enums are the hairy part. structs currently follow the same rule as tuples (first field's extra inhabitants).
If we are okay to loose the single payload enums extra inhabitants I believe the scheme would work. A further complication is that now the enum witnesses would have to be described in terms of an abstract size such that we can find the address of the extra tag bytes when we forward to the payload of an enum/first element of a struct/tuple . This is doable (see below) but means that the witnesses of fixed size types will no longer simplify as well because the size is dynamic. The witnesses would get an additional argument that is the size of the 'outermost' type that we append any potential .
would become
We pass null at the top-level of witness calls. So fixed size types will still use the static size in the common case. Hand-wavy sketch for a fixed size type:
The witness for a tuple will look like:
What does that mean for code size: We have the cost of passing an extra zero at all call-sites plus bigger witness implementations for fixed size types because the size is dynamic. Not immediately clear that this is a code size win. You only pay for extra inhabitant witnesses if there are extra inhabitants. |
Ok, we can stick with the current approach then. |
Conditionally (on Osize) outline the witness call - this saves two loads
The standard library text size increases by 0.9% on x86-64. From 4254426 to 4292298 bytes. Out of the 37K bytes, 21K bytes are from the witnesses, the rest is overhead because we now have to load the witness on the caller side. |
(Numbers on arm are similar) |
@swift-ci Please smoke test mac os x |
@swift-ci Please test |
Build failed |
Build failed |
@swift-ci Please test linux |
@swift-ci Please benchmark |
Build comment file:Optimized (O)Regression (10)
Improvement (8)
No Changes (316)
Unoptimized (Onone)Regression (6)
Improvement (51)
No Changes (277)
Hardware Overview
|
Code size swift-source-compat-suite. On interesting fact I noticed while looking at:
that the delta (3.3%) could roughly be half that. There are duplicate value witnesses that llvm's merge function pass would merge. We run -wmo -num-threads 1..N and so these witnesses are split up per source file and don't get merged. That is -wmo -num-threads >0 is not helpful for the merge function pass and we leave code size improvements on the table (not only for witnesses). This is the obvious trade-off of num-threads but had not occurred to me before.
|
Revert "Merge pull request #12606 from aschwaighofer/single_payload_enum_witness"
Revert "Revert "Merge pull request swiftlang#12606 from aschwaighofer/single_payload_enum_witness"" This reverts commit c422f80.
This is essentially a long-belated follow-up to Arnold's swiftlang#12606. The key observation here is that the enum-tag-single-payload witnesses are strictly more powerful than the XI witnesses: you can simulate the XI witnesses by using an extra case count that's <= the XI count. Of course the result is less efficient than the XI witnesses, but that's less important than overall code size, and we can work on fast-paths for that. The extra inhabitant count is stored in a 32-bit field (always present) following the ValueWitnessFlags, which now occupy a fixed 32 bits. This inflates non-XI VWTs on 32-bit targets by a word, but the net effect on XI VWTs is to shrink them by two words, which is likely to be the more important change. Also, being able to access the XI count directly should be a nice win.
This is essentially a long-belated follow-up to Arnold's swiftlang#12606. The key observation here is that the enum-tag-single-payload witnesses are strictly more powerful than the XI witnesses: you can simulate the XI witnesses by using an extra case count that's <= the XI count. Of course the result is less efficient than the XI witnesses, but that's less important than overall code size, and we can work on fast-paths for that. The extra inhabitant count is stored in a 32-bit field (always present) following the ValueWitnessFlags, which now occupy a fixed 32 bits. This inflates non-XI VWTs on 32-bit targets by a word, but the net effect on XI VWTs is to shrink them by two words, which is likely to be the more important change. Also, being able to access the XI count directly should be a nice win.
So far single payload enums were implemented in terms of runtime functions which
internally emitted several calls to value witnesses.
This commit adds value witnesses to get and store the enum tag side stepping the
need for witness calls as this information is statically available in many cases
/// int (getEnumTagSinglePayload)(const T enum, UINT_TYPE emptyCases)
/// Given an instance of valid single payload enum with a payload of this
/// witness table's type (e.g Optional) , get the tag of the enum.
/// void (storeEnumTagSinglePayload)(T enum, INT_TYPE whichCase,
/// UINT_TYPE emptyCases)
/// Given uninitialized memory for an instance of a single payload enum with a
/// payload of this witness table's type (e.g Optional), store the
/// tag.
A simple 'for element in array' loop in generic code operating on a
ContiguousArray of Int is ~25% faster on arm64.
rdar://31408033