Skip to content

bpo-28638: speed up namedtuple creation by avoiding exec #2736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

JelleZijlstra
Copy link
Member

@JelleZijlstra JelleZijlstra commented Jul 16, 2017

Creating a namedtuple is relatively slow because it uses exec().
This commit reduces the exec()'ed code, but still uses exec() for
creating the __new__ method. I don't know of a way to avoid using
exec() for __new__ beyond manipulating bytecode directly.

However, avoiding exec() for creating the class itself still yields
a significant speedup. In an unscientific benchmark I ran, creating
1000 namedtuple classes now takes about 0.14 s instead of 0.44 s.

There is one backward compatibility break: namedtuples no longer have
a _source attribute, because we no longer exec() their source. I kept
the verbose=True argument around for compatibility, but it now does
nothing.

cc @sixolet @methane

https://bugs.python.org/issue28638

Creating a namedtuple is relatively slow because it uses exec().
This commit reduces the exec()'ed code, but still uses exec() for
creating the __new__ method. I don't know of a way to avoid using
exec() for __new__ beyond manipulating bytecode directly.

However, avoiding exec() for creating the class itself still yields
a significant speedup. In an unscientific benchmark I ran, creating
1000 namedtuple classes now takes about 0.14 s instead of 0.44 s.

There is one backward compatibility break: namedtuples no longer have
a _source attribute, because we no longer exec() their source. I kept
the verbose=True argument around for compatibility, but it now does
nothing.
@mention-bot
Copy link

@JelleZijlstra, thanks for your PR! By analyzing the history of the files in this pull request, we identified @rhettinger, @serhiy-storchaka and @vsajip to be potential reviewers.

@pitrou pitrou added the performance Performance or resource usage label Jul 17, 2017
@ilevkivskyi
Copy link
Member

But why can't we keep a _source just for backward compatibility, containing a practical equivalent of the proposed no-exec version? Maybe make it a property to avoid some time overhead of large string formatting for users who don't use _source.

raise TypeError('Expected %d arguments, got %d' % (num_fields, len(result)))
return result

_make.__func__.__doc__ = 'Make a new {typename} object from a sequence or iterable'.format(typename=typename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line should be wrapped (as you did below for _replace).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
result = new(cls, iterable)
if len(result) != num_fields:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will keep the closure alive, maybe make num_fileds a (private) attribute of the class and define this function outside of namedtuple? (TBH however, I have no idea how efficient this will be, but intuitively this should be more efficient than with nested function at least in terms of memory).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably @serhiy-storchaka could say whether this makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my POC implementation (see bpo-28638) I used len(cls._fields).

But bpo-28638 is closed by Raymond. I suggest to close this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to add cls._num_fields to avoid adding another function call here.

_make.__func__.__doc__ = 'Make a new {typename} object from a sequence or iterable'.format(typename=typename)

def _replace(_self, **kwds):
result = _self._make(map(kwds.pop, field_names, _self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same situation with closure here and in __repr__, maybe use _self._fields instead of field_names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rhettinger rhettinger self-assigned this Jul 18, 2017
@rhettinger
Copy link
Contributor

This looks pretty good so far. I would like to layer in lazy generation of "_source" and to keep the "verbose" option functional. I think is possible to do this all in a way that it transparent to the user, that keeps the current API fully intact and only modestly increases the complexity of the code

@JelleZijlstra
Copy link
Member Author

Thanks! I'm going to first address Ivan's comments related to closures, then work on bringing back _source.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good now. Maybe Raymond will have more comments. Also it would be good to see detailed benchmarks (space and time) for this particular implementation vs the original one.

exec(class_definition, namespace)
result = namespace[typename]
result._source = class_definition
arg_list = repr(tuple(field_names)).replace("'", "")[1:-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel ', '.join(field_names) is pythonic code.
Is this line different from it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is in the case of 1-element tuple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the trailing comma desirable in the 1-element tuple case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise it's no longer a tuple.

@methane
Copy link
Member

methane commented Jul 18, 2017

Quick and dirty bench on pypy3.5-5.8

# x.py
from collections import namedtuple
nts = [namedtuple(f"Foo{i}", "foo,bar,baz") for i in range(10000)]

original:

$ time pyenv/versions/pypy3.5-5.8.0/bin/pypy x.py
real	0m7.540s
user	0m7.448s
sys	0m0.092s

$ time pyenv/versions/pypy3.5-5.8.0/bin/pypy x.py
real	0m7.529s
user	0m7.448s
sys	0m0.076s

patched:

$ time pyenv/versions/pypy3.5-5.8.0/bin/pypy x.py
real	0m2.010s
user	0m1.988s
sys	0m0.020s

$ time pyenv/versions/pypy3.5-5.8.0/bin/pypy x.py
real	0m2.016s
user	0m1.980s
sys	0m0.032s

@methane
Copy link
Member

methane commented Jul 18, 2017

Same bench on Python 3.6.2 on macOS

original:

$ time ./pyenv/versions/3.6.2/bin/python3 x.py
real	0m4.469s
user	0m4.349s
sys	0m0.105s

$ time ./pyenv/versions/3.6.2/bin/python3 x.py
real	0m4.510s
user	0m4.374s
sys	0m0.116s

patched:

$ time ./pyenv/versions/3.6.2/bin/python3 x.py
real	0m1.131s
user	0m1.063s
sys	0m0.059s

$ time ./pyenv/versions/3.6.2/bin/python3 x.py
real	0m1.123s
user	0m1.055s
sys	0m0.060s

@methane
Copy link
Member

methane commented Jul 18, 2017

And memory usage on CPython 3.6.2

original:
270 arenas * 262144 bytes/arena = 70,778,880

patched:
144 arenas * 262144 bytes/arena = 37,748,736


.. versionadded:: 3.3

.. versionchanged:: 3.7
``_source`` is no longer used to created the named tuple class.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this sentence a bit confusing to understand. Something like "_source is no longer the actual named tuple class implementation, but an equivalent implementation" would be clearer to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this to ``_source`` is no longer used to create the named tuple class implementation, but contains an equivalent implementation.

exec(class_definition, namespace)
result = namespace[typename]
result._source = class_definition
arg_list = repr(tuple(field_names)).replace("'", "")[1:-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is in the case of 1-element tuple.

@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
result = new(cls, iterable)
if len(result) != cls._num_fields:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use just len(cls._fields)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to introduce another function call. That might be premature optimization though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to add num_fields=num_fields argument like len=len.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with @methane's suggestion.

_repr_template = '{name}=%r'
_new_template = '''
def __new__(_cls, {arg_list}):
'Create new instance of {typename}({arg_list})'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be faster to set the __doc__ attribute explicitly rather than compile it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

namespace = {'_tuple': tuple}
new_source = _new_template.format(typename=typename, arg_list=arg_list)
exec(new_source, namespace)
__new__ = namespace['__new__']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set __qualname__ and __module__ attributes of the __new__ method and other methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pitrou
Copy link
Member

pitrou commented Jul 18, 2017

General question: do we want to use f-strings here rather than explicit format() calls?

@ilevkivskyi
Copy link
Member

General question: do we want to use f-strings here rather than explicit format() calls?

f-strings should be faster, so that probably we should use them where possible.

@JelleZijlstra
Copy link
Member Author

A downside of f-strings would be that they would make it harder to port this implementation to e.g. pypy, which doesn't support 3.6 yet. I can check how much of a difference it makes in benchmarks.

@methane
Copy link
Member

methane commented Jul 19, 2017

PyPy3.5 supports f-string.
But I don't know about other implementations (e.g. MicroPython support Python 3.5).

@mlouielu
Copy link
Contributor

mlouielu commented Jul 19, 2017

If we want performance about formatting string, why not use '%'-format? It seems faster than two others.

@methane
Copy link
Member

methane commented Jul 19, 2017

why not use '%'-format?

Because original code uses "...{typename}...".format(typename=typename) style.
Changing it to f-string (f"...{typename}...") is very easy and straight.

It seems faster than two others.

Why do you think so? It seems f-string is fast in most cases.

$ python3 -m perf timeit -s 'bar="bar"' -- '"foo{bar}baz".format(bar=bar)'
.....................
Mean +- std dev: 524 ns +- 14 ns

$ python3 -m perf timeit -s 'bar="bar"' -- 'f"foo{bar}baz"'
.....................
Mean +- std dev: 101 ns +- 3 ns

$ python3 -m perf timeit -s 'bar="bar"' -- '"foo%sbaz" % (bar,)'
.....................
Mean +- std dev: 237 ns +- 8 ns

@mlouielu
Copy link
Contributor

@methane I take a wrong mesurement about this, I'm testing format 10 items in the string, not only one item.

JelleZijlstra added a commit to JelleZijlstra/cnamedtuple that referenced this pull request Jul 19, 2017
Notes:
- fasternt.py is a copy of collections/__init__.py as of python/cpython#2736
- you need to install perf==0.9.6 to run the benchmark, the latest version doesn't work
- run the benchmark with ./bench in the prof directory.

In summary, fasternt is ~4x faster than the stdlib at creating namedtuple classes and has no
effect on instantiation or attribute access. cnamedtuple is 40x faster at class creation
and ~30% faster at instantiation and attribute access.
_new_template = '''
def __new__(_cls, {arg_list}):
'Create new instance of {typename}({arg_list})'
return _tuple.__new__(_cls, ({arg_list}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of _tuple.__new__, how about put namespace['_tuple_new'] = tuple.__new__ and call _tuple_new?
It makes eval and instantiation bit (about 3%) faster.

this pull request:

$ ./pyenv/versions/3.6.2/bin/python -m perf timeit -s 'from collections import namedtuple' -- 'namedtuple("Foo", "foo bar baz")'
.....................
Mean +- std dev: 90.1 us +- 3.3 us

$ ./pyenv/versions/3.6.2/bin/python -m perf timeit -s 'from collections import namedtuple; Foo=namedtuple("Foo", "foo bar baz")' -- 'Foo(1,2,3)'
.....................
Mean +- std dev: 503 ns +- 15 ns

_tuple_new version:

$ ./pyenv/versions/3.6.2/bin/python -m perf timeit -s 'from collections import namedtuple' -- 'namedtuple("Foo", "foo bar baz")'
.....................
Mean +- std dev: 87.1 us +- 2.7 us

$ ./pyenv/versions/3.6.2/bin/python -m perf timeit -s 'from collections import namedtuple; Foo=namedtuple("Foo", "foo bar baz")' -- 'Foo(1,2,3)'
.....................
Mean +- std dev: 485 ns +- 21 ns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, making that change.

@methane
Copy link
Member

methane commented Aug 27, 2017

@JelleZijlstra would you update this pull request?

@JelleZijlstra
Copy link
Member Author

Sorry, I've been busy. I'll try to find some time for it today.

@JelleZijlstra
Copy link
Member Author

I think I've addressed all comments since my last push. Let me know if you'd like to see any other changes.

Another open question: Should we use f-strings? They'll likely be a little faster, but may make it harder to port this code to other Python versions and implementations.

@methane
Copy link
Member

methane commented Aug 28, 2017

Another open question: Should we use f-strings? They'll likely be a little faster, but may make it harder to port this code to other Python versions and implementations.

This patch will not be backported even Python 3.6 which supports f-string.
There are no chance to backport this to Python 3.5.
Additionally, PyPy3.5 supports f-string.

So please don't worry about it.

@ilevkivskyi
Copy link
Member

Yes, I also think f-strings should be used here.

@rhettinger
Copy link
Contributor

Per past policy, optimizations don't get backported. I'll look at this patch more during the sprints next week.

__new__ = namespace['__new__']
__new__.__doc__ = f'Create new instance of {typename}({arg_list})'

@classmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be cleaner to move this down to the class_namespace, to avoid having to use .__func__


def __getnewargs__(self):
'Return self as a plain tuple. Used by copy and pickle.'
return tuple(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three functions without formatted docstrings don't need to be redeclared for each namedtuple, do they?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that doesn't work now because I'm setting __qualname__ and __module__ on them.

_repr_template = '{name}=%r'
_new_template = '''
def __new__(_cls, {arg_list}):
return _tuple_new(_cls, ({arg_list}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be better handled by inserting the trailing comma here?

new_source = _new_template.format(arg_list=arg_list)
exec(new_source, namespace)
__new__ = namespace['__new__']
__new__.__doc__ = f'Create new instance of {typename}({arg_list})'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise you get a strange trailing comma in this docstring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request is about performance. I don't think changing much is better.
Current pull request follows current code.

>>> foo = collections.namedtuple("foo", "a")
>>> print(foo._source)
...
    def __new__(_cls, a,):
        'Create new instance of foo(a,)'
        return _tuple.__new__(_cls, (a,))
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the current pull request doesn't produce exactly the same code any more anyway, since _tuple.__new__ is now _tuple_new, so there's no particular argument against changing it other than "it doesn't need to be part of this pr" (which I agree is true)

@@ -0,0 +1,2 @@
Speed up namedtuple class creation by 4x by avoiding usage of exec(). Patch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be read as exec was eliminated, so perhaps '...by reducing usage of exec().'?

@rhettinger
Copy link
Contributor

The main differences are:

  1. Dropped _source and verbose
  2. Better use of closure variables as a fast way to pass fixed values into the methods
  3. Use dict() notation versus {k:v} style
  4. Early conversion of field_names to a tuple so this gets done just once
  5. Re-use itemgetter objects and intern the docstrings
  6. Better comments of sections and note where exec interns for us
  7. Compact single-line template for __new__

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.