Skip to content

bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() #3454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Sep 10, 2017

Conversation

rhettinger
Copy link
Contributor

@rhettinger rhettinger commented Sep 8, 2017

Current version passes tests but doesn't not support the _source attribute or the verbose option.

Obtains a 5x creation time speed-up and with small positive impacts on post creation performance.

=============== Baseline ===============
Cumulative time over 10,000 iterations:
0.044 arg_checking
0.238 templating
2.907 exec
3.257 all
Total with call overhead: 3.270 seconds

===== This verion ======
Cumulative time over 10,000 iterations:
0.073 arg_checking
0.306 template_and_exec
0.207 non_exec
0.626 all
Total with call overhead: 0.639 seconds

Ideas going forward

  • Cache the itemgetter instances
  • Generate _source dynamically
  • Need to compare to Jelle's patch

https://bugs.python.org/issue28638

)
for index, name in enumerate(field_names):
class_namespace[name] = property(fget = reuse_itemgetter(index),
doc = f'Alias for field number {index}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice hack! How about reusing docstring too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a great idea. Adding: doc = _sys.intern(f'Alias for field number {index}')

@rhettinger
Copy link
Contributor Author

Incorporating by reference all the comments in the closed PR: #2736

The two patches are mostly synced. The principal differences are:

  1. Dropped _source and verbose
  2. Better use of closure variables as a fast way to pass fixed values into the methods
  3. Use dict() notation versus {k:v} style
  4. Early conversion of field_names to a tuple so this gets done just once
  5. Re-use itemgetter objects and intern the docstrings
  6. Better comments of sections and note where exec interns for us
  7. Compact single-line template for __new__

@rhettinger rhettinger changed the title bpo-28638: Optimize namedtuple() creation time (Work-in-progress) bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() Sep 8, 2017
@serhiy-storchaka serhiy-storchaka self-requested a review September 8, 2017 21:52
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a compatibility-breaking change, it should be documented in corresponding section of What's New.

field_defs = '\n'.join(_field_template.format(index=index, name=name)
for index, name in enumerate(field_names))
# Variables used in the methods and docstrings
field_names = tuple(field_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intern names. Comparing identical strings is faster. In the current implementation the compiler does this for us.

# Create all the named tuple methods to be added to the class namespace

s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))'
namespace = dict(_tuple_new=tuple_new, __name__=module_name)
Copy link
Member

@emmatyping emmatyping Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why you chose dict() over {k:v}? I believe {k:v} is faster because it doesn't need to make a function call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beauty counts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Now that I think about it, I wonder if optimizing it in peephole would be a good thing? The pattern is pretty consistent LOAD_GLOBAL (dict) (LOAD_CONST arg names and values) CALL_FUNCTION_KW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't do it because builtins.dict can be replaced by a user or shadowed in globals(). Just the facts of life in a very dynamic language.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. Thanks for the explanation! I should have guessed there was good reason for a "simple improvement" such as that not being implemented.

return _nt_itemgetters[index]
except KeyError:
getter = _nt_itemgetters[index] = _itemgetter(index)
return getter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about caching property?

doc = f'Alias for field number {index}'
prop = _nt_props[index] = property(_itemgetter(index), doc=doc)
return prop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't cache property() the docstring is mutable and can updated by the user, so the property objects need to be distinct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realized it.
Anyway, I prefer caching docstring to sys.intern, because sys.intern approach
requires temporal new string and hashing it.
But that's just my preference and there are no real problem.

if name in seen:
raise ValueError('Encountered duplicate field name: %r' % name)
raise ValueError('Encountered duplicate field name: {name!r}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing an f

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks for noticing.


for method in (__new__, _make.__func__, _replace,
__repr__, _asdict, __getnewargs__):
method.__module__ = module_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the same as the __module__ attribute of the class. But the latter is defined at the end of this function, so the code needs reordering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether the method.__module__ assignment should be removed entirely. It doesn't seem to be needed for pickling. For introspection purposes, the code is actually defined in collections.__init__ rather than in the caller's namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for pickling these methods.

This change is not required for this PR since pickling methods is not supported in the current implementation. But I think it is easy to add pickle support while we rewrite this code.


for method in (__new__, _make.__func__, _replace,
__repr__, _asdict, __getnewargs__):
method.__module__ = module_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for pickling these methods.

This change is not required for this PR since pickling methods is not supported in the current implementation. But I think it is easy to add pickle support while we rewrite this code.

or ``_source`` attribute which showed the generated source code for the
named tuple class. This was part of an optimization designed to speed-up
class creation. (Contributed by Jelle Zijlstra with further improvements
by INADA Naoki, Serhiy Storchaka, and Raymond Hettinger in :issue:`28638`.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose here should be "Naoki Inada" instead of "INADA Naoki".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. That is for him to decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't care in general, I prefer "INADA Naoki" here because I use it in git config "user.name".

@@ -435,6 +435,12 @@ API and Feature Removals
Python 3.1, and has now been removed. Use the :func:`~os.path.splitdrive`
function instead.

* :func:`collections.namedtuple` no longer supports the *verbose* parameter
or ``_source`` attribute which showed the generated source code for the
named tuple class. This was part of an optimization designed to speed-up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth to add a note about an optimization in the "Optimizations" section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to litter the docs with more notes about this. The more important part to note is the feature removal in whatsnew. The optimization is mostly negligible and insignificant to most users (normal startup doesn't use named tuples at all; a module using one named tuple was using only 0.3ms which was just under 1% of the overall start-up time; formerly, we could make 3 namedtuple classes per millisecond and now we can make about 15). It is one of the least important optimizations we could have done.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except writing Naoki's names.

@rhettinger
Copy link
Contributor Author

Breakdown of time to create a named tuple class using the current patch:

  • 9μs for the argument checking and creation of variables used in the methods and docstrings
  • 32μs for creating __new__ using templating and exec()
  • 2μs for making all the other methods using def
  • 2μs for adding __qualname__ to all the methods
  • 20μs for making the class dict, making the properties, and calling type() to build the class
  • 3μs for function call overhead, computing __module__, and timing overhead

The total of 68μs with the patch compares to 327 μs without the patch, giving an approx 5x speed-up for named tuple class creation. There should be some memory savings (_source is no longer stored, the code objects for all methods with __new__ are reused, and the itemgetter objects are reused).

For the actual operation of named tuple instances, there are small (hard to measure) differences in performance:

  • 2% better for __new__() at about 0.4μs per call (with 3 fields)
  • 1% better for _replace() at about 1.2μs per call (with 2 out of 3 fields replaced)
  • 6% worse for _make() at about 0.4μs per call (with 3 fields)
  • 0% change for _asdict() at about 1.2μs per call (with 3 fields)

itemgetter_object = cache[index]
except KeyError:
itemgetter_object = cache[index] = _itemgetter(index)
doc = _sys.intern(f'Alias for field number {index}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the same cache for docstrings?

try:
    itemgetter_object, doc = cache[index]
except KeyError:
    itemgetter_object, doc = cache[index] = _itemgetter(index), f'Alias for field number {index}'
class_namespace[name] = property(itemgetter_object, doc=doc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need. sys.intern() takes care of the caching for us and does so at C-speed. And though it probably isn't needed, it can share strings across the whole system including hand-rolled named tuples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually using sys.intern() makes namedtuple class creation up to 10% slower due to the cost of string formatting, attribute lookup and function call.

Additionally, adding strings that are not attribute or parameter names to an interned strings hashtable increases the size of a hashtable and the probability of hash collisions.

@rhettinger rhettinger merged commit 8b57d73 into python:master Sep 10, 2017
@rhettinger rhettinger deleted the optimize-namedtuple branch September 10, 2017 17:23
@gvanrossum
Copy link
Member

Congrats Raymond and everyone who contributed to this major improvement!

(To whoever merged this PR, next time please clean up the commit description (it's too late for this one, commits pushed to GitHub are forever). Core committers who merge PRs are responsible for producing a readable commit message. The current message is just the concatenation of all local commits, including things like "Neaten-up a bit". Hopefully the news blurb is better.)

@methane
Copy link
Member

methane commented Sep 11, 2017

Congrats!
import inspect (asyncio imports it) time is reduced from about 19ms to 16ms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants