[Don't merge] docs: description of new mangling scheme #5428

Closed

eeckstein
Contributor

This PR is a proposal for a new mangling scheme. It's still a draft and the grammar probably contains several bugs, but it should give an idea of what the scheme could look like.

These are the main changes:

  1. Change the order of the mangling to a post-fix like structure.

This is the biggest change.
It produces more common prefixes among the mangled names, which optimizes the trie in the Mach-O object files.
The length of the mangled names mostly stays the same, but the order of 'operands' inside the mangling is more or less reversed.
This change also required using different 'operator' characters in some cases.

  2. Word-substitutions

Similar to the S-substitutions, but finer grained. See section 'Identifiers'.
Reduces the size of mangled names in general.

  3. Combined substitutions

A more efficient way to mangle multiple S-substitutions.
Reduces the size of mangled names with lots of substitutions, e.g. specialized functions.

  4. Change the '_T' prefix to '_S'

(at John's request)
This is because it's basically a completely new mangling scheme.
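
The prefix-sharing argument behind the first change can be illustrated with a toy model. The sketch below is not the prototype implementation: the names are hypothetical strings that only imitate the shape of mangled names, and the trie is an uncompressed character trie rather than the real Mach-O export trie. It compares how many nodes are needed when the 'operator' character comes before the context versus after it.

```python
def trie_node_count(names):
    """Count character nodes in an uncompressed trie built from `names`."""
    root = {}
    count = 0
    for name in names:
        node = root
        for ch in name:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


# Hypothetical entities inside one context; these are NOT real Swift manglings.
CONTEXT = "4main3Foo"
ENTITIES = [("F", "3bar"), ("M", ""), ("W", "1x"), ("F", "3baz")]

# Operator first: names diverge right after the prefix.
prefix_style = ["_T" + op + CONTEXT + member for op, member in ENTITIES]
# Operator last: every name shares the context as a common prefix.
postfix_style = ["_S" + CONTEXT + member + op for op, member in ENTITIES]

if __name__ == "__main__":
    print("operator-first nodes:", trie_node_count(prefix_style))
    print("operator-last nodes: ", trie_node_count(postfix_style))
```

The total string length is identical in both styles; only the amount of sharing in the trie differs, which matches the claim that name lengths mostly stay the same while the trie shrinks.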

@eeckstein
Contributor Author

Some results (of a prototype implementation) on stdlib, benchmarks and a large framework:
Mach-O trie size reduction: ~15%
string table reduction (used for nlist symbols): ~10%
total file size reduction: ~4%

@eeckstein
Contributor Author

@rjmccall, @slavapestov (and all others): can you please take a look at this?


nominal-type-kind ::= 'C' // class
nominal-type-kind ::= 'O' // enum
nominal-type-kind ::= 'V' // struct

If we're going to redo the mangling, we should do away with differentiating nominal type kinds, IMO.

type ::= 'XPM' metatype-repr type // existential metatype with representation
type ::= context identifier 'a' // Type alias (DWARF only)
type ::= type type 'c' THROWS-ANNOTATION? // function type
type ::= type type 'cu' THROWS-ANNOTATION? // uncurried function type

Function types should have a variable number of input arguments, instead of a single tuple argument. "Variadic" and "inout" should be argument modifiers on function types instead of separate type encodings.

values indicates a single generic parameter at the outermost depth::

q_q_cru // <T_0_0> T_0_0 -> T_0_0
q_qd_0_cr_0_u // <T_0_0><T_1_0, T_1_1> T_0_0 -> T_1_1

Having a single character encoding for t_0_0 (x in the old scheme) was a pretty significant size win, since a large number of generics have a single generic parameter. Is there room to do that in this scheme?

encoded string itself. For example, the identifier ``vergüenza`` is mangled
to ``X12vergenza_JFa``. (The encoding in standard Punycode would be
to ``0012vergenza_JFa``. (The encoding in standard Punycode would be
``vergenza-95a``)

There's a bug with this mangling scheme for Unicode that we need to fix (https://bugs.swift.org/browse/SR-660). If the first ASCII character in an identifier is a digit, as in `é22`, then we'll end up butting the `22` against the run length of the encoded string. We could fix that by just inserting a `_`, or by encoding UTF-8 directly (though Punycode should in theory be more efficient for single-script identifiers).
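
The digit collision can be reproduced with a small sketch. It assumes a simplified, hypothetical format (`X`, then the decimal length, then the encoded identifier) and uses Python's standard `punycode` codec rather than Swift's modified Punycode, so the exact strings differ from the real mangler; `toy_mangle` and `toy_demangle` are illustrative names only.

```python
import re


def toy_mangle(identifier: str, separator: str = "") -> str:
    """Hypothetical format: 'X' + decimal length + optional separator + Punycode."""
    encoded = identifier.encode("punycode").decode("ascii")
    return f"X{len(encoded)}{separator}{encoded}"


def toy_demangle(mangled: str, separator: str = "") -> str:
    """Read the run length greedily, as a simple digit-scanning parser would."""
    match = re.match(r"X(\d+)", mangled)
    if match is None:
        raise ValueError("not a toy-mangled identifier")
    length = int(match.group(1))
    start = match.end() + len(separator)
    body = mangled[start:start + length]
    if len(body) != length:
        raise ValueError(f"declared length {length} exceeds the remaining input")
    return body.encode("ascii").decode("punycode")


if __name__ == "__main__":
    print(toy_mangle("vergüenza"))   # length and payload are unambiguous
    print(toy_mangle("é22"))         # leading digits fuse with the length
    print(toy_mangle("é22", "_"))    # a separator keeps them apart
```

Without a separator, the encoded form of `é22` begins with `22`, so the greedy length scan swallows those digits and the declared length no longer matches the payload. Passing `"_"` as the separator to both functions restores the round trip, which corresponds to the "just insert a `_`" option above.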

@jckarter
Contributor

This is great! I dumped a laundry list of other things-to-fix, though we could do them incrementally on top of this.

@slavapestov
Contributor

Just a heads-up that this will break compatibility with the swift remote mirrors library, so we should bump the reflection metadata version number. Alternatively we could keep the old mangling format around until we're ready to do an ABI break there.

@gottesmm
Contributor

@eeckstein Since you are changing the prefix from _T => _S, we are probably going to have to change strip as well.

@eeckstein
Contributor Author

I accidentally closed this PR.

@eeckstein
Contributor Author

Sorry, somehow I managed to close this PR accidentally. The new PR is #5433

MaxDesiatov pushed a commit that referenced this pull request Sep 7, 2023
[wasm] Revert unnecessary dso_local change