Modify cache file formats to shrink overall cache size #2108

Merged (3 commits) on Sep 8, 2016

Michael0x2a
Collaborator

This pull request includes a small collection of improvements to the format of the data cache files, targeting primarily the Var, Argument, FuncDef, and TypeInfo classes.

More specifically, this pull request:

  • Avoids serializing flags whose value is the default of False within Var, FuncDef, and TypeInfo
  • No longer serializes the type_annotation field from Argument
  • Serializes only the argument names and argument kinds inside FuncDef, rather than the full argument list

These changes together don't appear to have a statistically significant impact on the speed of mypy when it's running in incremental mode, but do end up shrinking the average cache size by approximately 45% to 50%.

This commit modifies several mypy nodes that tend to contain a large
number of usually-false boolean flags to store a compressed version of
those flags.

More precisely, rather than storing each flag and its boolean value as a
key-value pair in the output JSON, we now store a list of the flags that
are known to be true at the time of serialization.

When deserializing, we iterate over the list of flags and set each
attribute to True, overriding the default value of False.
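The scheme described above can be sketched roughly as follows. This is a minimal illustration and not the actual mypy code: the class `ToyVar`, the helpers `get_flags` and `set_flags`, and the JSON key `'flags'` are all names assumed here for the example.

```python
from typing import Any, Dict, List


class ToyVar:
    """Toy stand-in for a node with boolean flags (hypothetical, not mypy's Var)."""

    # Every flag defaults to False at class level.
    is_ready = False
    is_property = False
    is_staticmethod = False

    # Names of the boolean attributes that take part in flag compression.
    FLAGS = ['is_ready', 'is_property', 'is_staticmethod']

    def __init__(self, name: str) -> None:
        self.name = name


def get_flags(node: Any, names: List[str]) -> List[str]:
    """Return only the names of flags that are currently true."""
    return [name for name in names if getattr(node, name)]


def set_flags(node: Any, flags: List[str]) -> None:
    """Set each listed flag to True; unlisted flags keep their False default."""
    for name in flags:
        setattr(node, name, True)


def serialize(node: ToyVar) -> Dict[str, Any]:
    # Store a short list of true flags instead of one key-value pair per flag.
    return {'name': node.name, 'flags': get_flags(node, ToyVar.FLAGS)}


def deserialize(data: Dict[str, Any]) -> ToyVar:
    node = ToyVar(data['name'])
    set_flags(node, data['flags'])
    return node
```

Because most flags are False on most nodes, the stored list is usually empty or very short, which is where the size saving would come from.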

This does not result in any significant speedups, but does end up
reducing the size of the cache by a non-trivial factor.

This commit modifies the serialization logic of Argument to stop saving
the type_annotation data.

After some investigation, it appears as if that data is never actually used
once the cache data is loaded by mypy, and removing it would help shrink
the cache size by a respectable margin.

This commit modifies the serialization logic and some of the surrounding
code for FuncDefs so we no longer serialize the full argument list --
instead, we just stick to serializing the argument names and the
argument kinds.

As with the other commits, this does not really improve speed, but does
end up shrinking the cache size by a fair amount.

It would have been nice to avoid storing the arg kinds and arg names
altogether since the information can usually be found within the
corresponding callable type, but unfortunately, a FuncDef is not
guaranteed to actually have a type in all instances -- sometimes, the
type is None. This is a problem since we need the argument names and
kinds to be able to infer a fallback type later on in mypy.
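As a rough sketch of the idea in this commit, the example below stores only argument names and kinds and uses them to build a fallback signature when no type is available. Everything here is assumed for illustration: `ToyFuncDef`, `describe_signature`, the string-valued `type_str` field, and the integer kind codes are not taken from the mypy source.

```python
import json
from typing import Any, Dict, List, Optional

# Illustrative integer codes for argument kinds (assumed for this sketch).
ARG_POS = 0   # positional argument
ARG_OPT = 1   # argument with a default value
ARG_STAR = 2  # *args


class ToyFuncDef:
    """Toy function node that keeps only argument names and kinds."""

    def __init__(self, name: str, arg_names: List[str], arg_kinds: List[int],
                 type_str: Optional[str] = None) -> None:
        self.name = name
        self.arg_names = arg_names
        self.arg_kinds = arg_kinds
        self.type_str = type_str  # may be None, as the commit message notes

    def serialize(self) -> Dict[str, Any]:
        # Only names and kinds are written, not full argument objects.
        return {
            'name': self.name,
            'arg_names': self.arg_names,
            'arg_kinds': self.arg_kinds,
            'type': self.type_str,
        }

    @classmethod
    def deserialize(cls, data: Dict[str, Any]) -> 'ToyFuncDef':
        return cls(data['name'], data['arg_names'], data['arg_kinds'], data['type'])


def describe_signature(func: ToyFuncDef) -> str:
    """Use the stored type if present; otherwise build an all-Any fallback."""
    if func.type_str is not None:
        return func.type_str
    parts = []
    for arg_name, kind in zip(func.arg_names, func.arg_kinds):
        if kind == ARG_STAR:
            parts.append('*' + arg_name + ': Any')
        elif kind == ARG_OPT:
            parts.append(arg_name + ': Any = ...')
        else:
            parts.append(arg_name + ': Any')
    return 'def (' + ', '.join(parts) + ') -> Any'


# Round trip through JSON, then recover a fallback signature without a type.
original = ToyFuncDef('f', ['x', 'y', 'rest'], [ARG_POS, ARG_OPT, ARG_STAR])
restored = ToyFuncDef.deserialize(json.loads(json.dumps(original.serialize())))
fallback = describe_signature(restored)
```

The fallback path is the reason the names and kinds cannot be dropped entirely: when the type is missing, they are the only record of the function's shape.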
@@ -587,6 +595,11 @@ class Var(SymbolNode, Statement):
    # parse for some reason (eg a silenced module)
    is_suppressed_import = False

    FLAGS = [
        'is_self', 'is_ready', 'is_initialized_in_class', 'is_staticmethod',
        'is_classmethod', 'is_property', 'is_settable_property', 'is_suppressed_import'
Member

It still hasn't become second nature for you to add a trailing comma to a list like this, when the close bracket is on the next line? (Don't push a fix, just noticing. :-)

Collaborator Author

Old habits die hard, I guess :(

I think I typically remember when modifying an existing list of items, since the trailing comma is already there, but it hasn't quite sunk in for new lists yet.

@gvanrossum gvanrossum merged commit 6cc3462 into python:master Sep 8, 2016
@Michael0x2a Michael0x2a deleted the shrink-cache-files branch September 9, 2016 18:58