[Don't merge] docs: description of new mangling scheme #5428
Some results (of a prototype implementation) on stdlib, benchmarks and a large framework:
@rjmccall, @slavapestov (and all others): can you please take a look at this?
nominal-type-kind ::= 'C'   // class
nominal-type-kind ::= 'O'   // enum
nominal-type-kind ::= 'V'   // struct
If we're going to redo the mangling, we should do away with differentiating nominal type kinds, IMO.
type ::= 'XPM' metatype-repr type           // existential metatype with representation
type ::= context identifier 'a'             // Type alias (DWARF only)
type ::= type type 'c' THROWS-ANNOTATION?   // function type
type ::= type type 'cu' THROWS-ANNOTATION?  // uncurried function type
Function types should have a variable number of input arguments, instead of a single tuple argument. "Variadic" and "inout" should be argument modifiers on function types instead of separate type encodings.
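To make the suggestion concrete, here is a minimal Python sketch of a function-type representation that carries a variable-length parameter list, with `inout` and variadic expressed as per-parameter flags rather than as separate types. The names `Param`, `FunctionType` and `describe` are hypothetical and are not taken from the compiler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Param:
    """One function parameter; modifiers are flags, not separate type encodings."""
    type: str
    inout: bool = False
    variadic: bool = False


@dataclass(frozen=True)
class FunctionType:
    """A function type with any number of parameters instead of one tuple argument."""
    params: Tuple[Param, ...]
    result: str


def describe(fn: FunctionType) -> str:
    """Render the type in Swift-like surface syntax, for illustration only."""
    rendered = []
    for p in fn.params:
        text = p.type + ("..." if p.variadic else "")
        rendered.append(("inout " + text) if p.inout else text)
    return f"({', '.join(rendered)}) -> {fn.result}"


# (inout Int, String...) -> Bool
example = FunctionType((Param("Int", inout=True), Param("String", variadic=True)), "Bool")
print(describe(example))
```

With this shape, a mangler would walk `params` and emit each parameter followed by its modifiers, so a zero-argument function no longer needs an empty-tuple encoding.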
values indicates a single generic parameter at the outermost depth::

  q_q_cru         // <T_0_0> T_0_0 -> T_0_0
  q_qd_0_cr_0_u   // <T_0_0><T_1_0, T_1_1> T_0_0 -> T_1_1
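The two example lines are consistent with reading a dependent generic parameter as `q` followed either by one index (depth 0) or by `d` and two indices (depth = first index + 1), where an index is `_` for 0 and `N_` for N+1. The Python sketch below decodes parameter references under exactly that assumption, which is inferred from these two examples rather than from the full grammar; the function names are hypothetical.

```python
from typing import Tuple


def parse_index(s: str, pos: int) -> Tuple[int, int]:
    """Parse an index at s[pos]: '_' means 0, '<N>_' means N + 1.

    Returns (value, position after the index).
    """
    if pos >= len(s):
        raise ValueError("unexpected end of string while reading an index")
    if s[pos] == "_":
        return 0, pos + 1
    end = pos
    while end < len(s) and s[end].isdigit():
        end += 1
    if end == pos or end >= len(s) or s[end] != "_":
        raise ValueError(f"malformed index at offset {pos} in {s!r}")
    return int(s[pos:end]) + 1, end + 1


def parse_generic_param(s: str, pos: int = 0) -> Tuple[Tuple[int, int], int]:
    """Parse 'q' INDEX (depth 0) or 'q' 'd' INDEX INDEX (depth = first index + 1).

    Returns ((depth, index), position after the parameter reference).
    """
    if s[pos:pos + 1] != "q":
        raise ValueError(f"expected 'q' at offset {pos} in {s!r}")
    pos += 1
    if s[pos:pos + 1] == "d":
        depth, pos = parse_index(s, pos + 1)
        depth += 1
    else:
        depth = 0
    index, pos = parse_index(s, pos)
    return (depth, index), pos


# The second parameter reference in "q_qd_0_cr_0_u" starts at offset 2.
(depth, index), end = parse_generic_param("q_qd_0_cr_0_u", 2)
print(f"T_{depth}_{index}", end)
```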
Having a single-character encoding for `t_0_0` (`x` in the old scheme) was a pretty significant size win, since a large number of generics have a single generic parameter. Is there room to do that in this scheme?
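The size argument can be quantified with a small, hypothetical encoder. Assuming the index convention the examples in the hunk appear to use (`_` for 0, `N_` for N+1, and `d` for deeper levels), `T_0_0` costs two characters (`q_`), while an optional one-character shortcut (here `x`, the character the comment attributes to the old scheme) costs one. The function names and the `shortcut` flag are illustrative only.

```python
def encode_index(n: int) -> str:
    """Index encoding assumed from the examples: 0 -> '_', N -> '<N-1>_'."""
    return "_" if n == 0 else f"{n - 1}_"


def encode_generic_param(depth: int, index: int, shortcut: bool = False) -> str:
    """Encode T_<depth>_<index>; optionally use a one-character form for T_0_0."""
    if shortcut and depth == 0 and index == 0:
        return "x"  # hypothetical single-character shortcut for the common case
    if depth == 0:
        return "q" + encode_index(index)
    return "qd" + encode_index(depth - 1) + encode_index(index)


# A signature mentioning T_0_0 three times, as in many single-parameter generics.
params = [(0, 0), (0, 0), (0, 0)]
plain = "".join(encode_generic_param(d, i) for d, i in params)
short = "".join(encode_generic_param(d, i, shortcut=True) for d, i in params)
print(plain, len(plain), short, len(short))
```

The saving is one character per occurrence of `T_0_0`; whether `x` is still free in the new scheme's operator alphabet is not settled by this sketch.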
encoded string itself. For example, the identifier ``vergüenza`` is mangled
to ``0012vergenza_JFa``. (The encoding in standard Punycode would be
``vergenza-95a``.)
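The example can be reproduced with an RFC 3492 Punycode encoder. Judging from `vergenza_JFa` versus `vergenza-95a`, the Swift variant appears to differ from standard Punycode only in its delimiter (`_` instead of `-`) and in spelling digit values 26 to 35 as `A` to `J` instead of `0` to `9`, which keeps ASCII digits out of the encoded tail. The Python sketch below encodes under that assumption; `SWIFT_DIGITS` and `mangle_identifier` are hypothetical names, and the `00` plus length prefix simply follows the example.

```python
STANDARD_DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"  # RFC 3492 digit values 0..35
SWIFT_DIGITS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJ"     # assumed Swift variant

BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80


def adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(text: str, digits: str, delimiter: str) -> str:
    """RFC 3492 encoding with a configurable digit alphabet and delimiter."""
    code_points = [ord(ch) for ch in text]
    output = [ch for ch in text if ord(ch) < INITIAL_N]  # basic (ASCII) code points first
    handled = basic = len(output)
    if basic:
        output.append(delimiter)
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while handled < len(code_points):
        m = min(c for c in code_points if c >= n)  # next non-basic code point to insert
        delta += (m - n) * (handled + 1)
        n = m
        for c in code_points:
            if c < n:
                delta += 1
            elif c == n:
                q, k = delta, BASE
                while True:  # emit delta as a generalized variable-length integer
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(digits[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(digits[q])
                bias = adapt(delta, handled + 1, handled == basic)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(output)


def mangle_identifier(name: str) -> str:
    """Hypothetical helper for non-ASCII identifiers: '00', length, encoded payload."""
    encoded = punycode_encode(name, SWIFT_DIGITS, "_")
    return f"00{len(encoded)}{encoded}"


print(punycode_encode("vergüenza", STANDARD_DIGITS, "-"))  # vergenza-95a
print(mangle_identifier("vergüenza"))                      # 0012vergenza_JFa
```

Python's built-in `punycode` codec produces the same standard output, which is a convenient cross-check for the shared algorithm.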
There's a bug with this mangling scheme for Unicode that we need to fix: https://bugs.swift.org/browse/SR-660. If the first ASCII character in an identifier is a digit, as in `é22`, then we'll end up butting the `22` against the run length of the encoded string. We could fix that by just inserting a `_`, or by encoding UTF-8 directly (though Punycode should in theory be more efficient for single-script identifiers).
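The ambiguity is easy to reproduce with a greedy reader that treats every leading digit as part of the length. In the Python sketch below, the payload `22_Jia` is a hypothetical Punycode-style encoding that begins with the digits `22`. The separator is the `_` fix suggested above; applying it also when the payload itself starts with `_` is an added assumption that keeps the separator unambiguous. The function names and the `00` prefix are illustrative.

```python
def mangle(payload: str, separator_fix: bool, prefix: str = "00") -> str:
    """Length-prefix an encoded identifier, optionally inserting a '_' separator."""
    needs_sep = separator_fix and payload[:1] in tuple("0123456789_")
    return f"{prefix}{len(payload)}{'_' if needs_sep else ''}{payload}"


def read_identifier(symbol: str, prefix: str = "00") -> str:
    """Greedy reader: all leading digits form the length, then one optional '_'."""
    if not symbol.startswith(prefix):
        raise ValueError(f"missing {prefix!r} prefix in {symbol!r}")
    pos = end = len(prefix)
    while end < len(symbol) and symbol[end].isdigit():
        end += 1
    length = int(symbol[pos:end])
    if symbol[end:end + 1] == "_":
        end += 1
    return symbol[end:end + length]


payload = "22_Jia"
naive = mangle(payload, separator_fix=False)  # length 6 runs straight into "22"
fixed = mangle(payload, separator_fix=True)   # '_' marks where the length ends
print(naive, read_identifier(naive))          # reader sees a length of 622
print(fixed, read_identifier(fixed))
```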
This is great! I dumped a laundry list of other things-to-fix, though we could do them incrementally on top of this.
Just a heads-up that this will break compatibility with the Swift remote mirrors library, so we should bump the reflection metadata version number. Alternatively, we could keep the old mangling format around until we're ready to do an ABI break there.
@eeckstein Since you are changing the prefix from `_T` => `_S`, we are probably going to have to change `strip` as well.
I accidentally closed this PR.
Sorry, somehow I managed to close this PR accidentally. The new PR is #5433.
This PR is a proposal for a new mangling scheme. It's still a draft and the grammar probably contains several bugs, but it should give an idea of what it could look like.
These are the main changes:
This is the biggest change.
It will help produce more common prefixes in the mangled names, which helps optimize the trie in Mach-O object files.
The length of the mangled names will mostly stay the same, but the order of 'operands' inside the mangling is more or less reversed.
This change also required using different 'operator' characters in some cases.
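A toy model shows why moving the 'operator' after its operands helps the trie: entities declared in the same context then share the mangled context as a common prefix. The node kinds below (`C` for a class, `F` for a method, `v` for a variable) and the length-prefixed identifiers are a simplified, hypothetical encoding for illustration, not the proposed grammar.

```python
from os.path import commonprefix
from typing import Tuple, Union

# A node is either an identifier (str) or a tuple: (operator, operand, operand, ...).
Node = Union[str, Tuple]


def mangle(node: Node, postfix: bool) -> str:
    """Serialize a toy symbol tree with the operator before or after its operands."""
    if isinstance(node, str):
        return f"{len(node)}{node}"  # length-prefixed identifier
    op, *operands = node
    body = "".join(mangle(child, postfix) for child in operands)
    return body + op if postfix else op + body


foo = ("C", "main", "Foo")                        # class main.Foo
symbols = [("F", foo, "bar"), ("v", foo, "baz")]  # method bar, variable baz in Foo

for postfix in (False, True):
    names = [mangle(s, postfix) for s in symbols]
    print("postfix" if postfix else "prefix ", names, repr(commonprefix(names)))
```

In the prefix form the two symbols diverge at the very first character; in the postfix form they share everything up to the member name, which is what a prefix trie can exploit. The total length is identical in both forms.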
Similar to the S-substitutions, but finer grained. See section 'Identifiers'.
Reduces the size of mangled names in general.
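The idea behind word substitutions can be sketched as follows: identifiers are split into words, and a word that already appeared earlier in the same symbol is replaced by a short back-reference. The actual word-splitting rules and reference syntax are defined in the 'Identifiers' section of the proposal, which is not reproduced here; the camel-case splitter and the `<n>` reference syntax below are hypothetical placeholders.

```python
import re
from typing import List


def split_words(identifier: str) -> List[str]:
    """Toy splitter: an uppercase run plus trailing lowercase/digits, or a non-uppercase run.

    Every character lands in exactly one word, so "".join(words) == identifier.
    """
    return re.findall(r"[A-Z]+[a-z0-9]*|[^A-Z]+", identifier)


def mangle_with_word_refs(identifiers: List[str]) -> List[str]:
    """Replace each repeated word with '<n>', n being its first-appearance index."""
    seen: List[str] = []
    mangled = []
    for identifier in identifiers:
        parts = []
        for word in split_words(identifier):
            if word in seen:
                parts.append(f"<{seen.index(word)}>")  # hypothetical back-reference
            else:
                seen.append(word)
                parts.append(word)
        mangled.append("".join(parts))
    return mangled


print(mangle_with_word_refs(["ArrayBuffer", "ArraySlice", "SliceBuffer"]))
```

Compared with S-substitutions, which can only reuse whole previously mangled nodes, this reuses fragments of identifiers, so symbols that repeat parts of type and member names get shorter.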
A more efficient way to mangle multiple S-substitutions.
Reduces the size of mangled names with lots of substitutions, e.g. specialized functions.
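As an illustration of the saving, consider three consecutive substitutions with indices 0, 3 and 5. In the old scheme each one is spelled `S` followed by an index (`_` for 0, `N_` for N+1), so together they cost eight characters. The multi-substitution form sketched below, one introducer followed by one letter per index with the last letter capitalized to mark the end, is a hypothetical encoding chosen for illustration; the introducer `A` and the letter alphabet are assumptions.

```python
import string
from typing import List


def old_style_substitutions(indices: List[int]) -> str:
    """One 'S' + index per reference: index 0 -> 'S_', index N -> 'S<N-1>_'."""
    return "".join("S_" if i == 0 else f"S{i - 1}_" for i in indices)


def multi_substitution(indices: List[int]) -> str:
    """Hypothetical compact form: 'A', lowercase letters, then an uppercase last letter."""
    if not indices or not all(0 <= i < 26 for i in indices):
        raise ValueError("toy encoding supports one or more indices in the range 0..25")
    letters = [string.ascii_lowercase[i] for i in indices[:-1]]
    letters.append(string.ascii_uppercase[indices[-1]])
    return "A" + "".join(letters)


refs = [0, 3, 5]
print(old_style_substitutions(refs), multi_substitution(refs))
```

Capitalizing the final letter removes the need for a separate terminator, so each additional small index costs one character instead of two or three.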
(on John's request)
Because it's basically a completely new mangling scheme.