-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
bpo-28638: Optimize namedtuple() creation time by minimizing use of exec() #3454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lib/collections/__init__.py
Outdated
) | ||
for index, name in enumerate(field_names): | ||
class_namespace[name] = property(fget = reuse_itemgetter(index), | ||
doc = f'Alias for field number {index}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice hack! How about reusing docstring too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is a great idea. Adding: doc = _sys.intern(f'Alias for field number {index}')
Incorporating by reference all the comments in the closed PR: #2736 The two patches are mostly synced. The principal differences are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a compatibility-breaking change, it should be documented in corresponding section of What's New.
Lib/collections/__init__.py
Outdated
field_defs = '\n'.join(_field_template.format(index=index, name=name) | ||
for index, name in enumerate(field_names)) | ||
# Variables used in the methods and docstrings | ||
field_names = tuple(field_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intern names. Comparing identical strings is faster. In the current implementation the compiler does this for us.
Lib/collections/__init__.py
Outdated
# Create all the named tuple methods to be added to the class namespace | ||
|
||
s = f'def __new__(_cls, {arg_list}): return _tuple_new(_cls, ({arg_list}))' | ||
namespace = dict(_tuple_new=tuple_new, __name__=module_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why you chose dict() over {k:v}? I believe {k:v} is faster because it doesn't need to make a function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beauty counts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Now that I think about it, I wonder if optimizing it in peephole would be a good thing? The pattern is pretty consistent LOAD_GLOBAL
(dict) (LOAD_CONST
arg names and values) CALL_FUNCTION_KW
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't do it because builtins.dict can be replaced by a user or shadowed in globals(). Just the facts of life in a very dynamic language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course. Thanks for the explanation! I should have guessed there was good reason for a "simple improvement" such as that not being implemented.
Lib/collections/__init__.py
Outdated
return _nt_itemgetters[index] | ||
except KeyError: | ||
getter = _nt_itemgetters[index] = _itemgetter(index) | ||
return getter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about caching property?
doc = f'Alias for field number {index}'
prop = _nt_props[index] = property(_itemgetter(index), doc=doc)
return prop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't cache property() the docstring is mutable and can updated by the user, so the property objects need to be distinct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't realized it.
Anyway, I prefer caching docstring to sys.intern
, because sys.intern
approach
requires temporal new string and hashing it.
But that's just my preference and there are no real problem.
Lib/collections/__init__.py
Outdated
if name in seen: | ||
raise ValueError('Encountered duplicate field name: %r' % name) | ||
raise ValueError('Encountered duplicate field name: {name!r}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing an f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks for noticing.
Lib/collections/__init__.py
Outdated
|
||
for method in (__new__, _make.__func__, _replace, | ||
__repr__, _asdict, __getnewargs__): | ||
method.__module__ = module_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the same as the __module__
attribute of the class. But the latter is defined at the end of this function, so the code needs reordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether the method.__module__
assignment should be removed entirely. It doesn't seem to be needed for pickling. For introspection purposes, the code is actually defined in collections.__init__
rather than in the caller's namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed for pickling these methods.
This change is not required for this PR since pickling methods is not supported in the current implementation. But I think it is easy to add pickle support while we rewrite this code.
Lib/collections/__init__.py
Outdated
|
||
for method in (__new__, _make.__func__, _replace, | ||
__repr__, _asdict, __getnewargs__): | ||
method.__module__ = module_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed for pickling these methods.
This change is not required for this PR since pickling methods is not supported in the current implementation. But I think it is easy to add pickle support while we rewrite this code.
or ``_source`` attribute which showed the generated source code for the | ||
named tuple class. This was part of an optimization designed to speed-up | ||
class creation. (Contributed by Jelle Zijlstra with further improvements | ||
by INADA Naoki, Serhiy Storchaka, and Raymond Hettinger in :issue:`28638`.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose here should be "Naoki Inada" instead of "INADA Naoki".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. That is for him to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't care in general, I prefer "INADA Naoki" here because I use it in git config "user.name".
@@ -435,6 +435,12 @@ API and Feature Removals | |||
Python 3.1, and has now been removed. Use the :func:`~os.path.splitdrive` | |||
function instead. | |||
|
|||
* :func:`collections.namedtuple` no longer supports the *verbose* parameter | |||
or ``_source`` attribute which showed the generated source code for the | |||
named tuple class. This was part of an optimization designed to speed-up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth to add a note about an optimization in the "Optimizations" section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to litter the docs with more notes about this. The more important part to note is the feature removal in whatsnew. The optimization is mostly negligible and insignificant to most users (normal startup doesn't use named tuples at all; a module using one named tuple was using only 0.3ms which was just under 1% of the overall start-up time; formerly, we could make 3 namedtuple classes per millisecond and now we can make about 15). It is one of the least important optimizations we could have done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except writing Naoki's names.
Breakdown of time to create a named tuple class using the current patch:
The total of 68μs with the patch compares to 327 μs without the patch, giving an approx 5x speed-up for named tuple class creation. There should be some memory savings ( For the actual operation of named tuple instances, there are small (hard to measure) differences in performance:
|
Lib/collections/__init__.py
Outdated
itemgetter_object = cache[index] | ||
except KeyError: | ||
itemgetter_object = cache[index] = _itemgetter(index) | ||
doc = _sys.intern(f'Alias for field number {index}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use the same cache for docstrings?
try:
itemgetter_object, doc = cache[index]
except KeyError:
itemgetter_object, doc = cache[index] = _itemgetter(index), f'Alias for field number {index}'
class_namespace[name] = property(itemgetter_object, doc=doc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need. sys.intern() takes care of the caching for us and does so at C-speed. And though it probably isn't needed, it can share strings across the whole system including hand-rolled named tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually using sys.intern()
makes namedtuple class creation up to 10% slower due to the cost of string formatting, attribute lookup and function call.
Additionally, adding strings that are not attribute or parameter names to an interned strings hashtable increases the size of a hashtable and the probability of hash collisions.
Congrats Raymond and everyone who contributed to this major improvement! (To whoever merged this PR, next time please clean up the commit description (it's too late for this one, commits pushed to GitHub are forever). Core committers who merge PRs are responsible for producing a readable commit message. The current message is just the concatenation of all local commits, including things like "Neaten-up a bit". Hopefully the news blurb is better.) |
Congrats! |
Current version passes tests but doesn't not support the _source attribute or the verbose option.
Obtains a 5x creation time speed-up and with small positive impacts on post creation performance.
=============== Baseline ===============
Cumulative time over 10,000 iterations:
0.044 arg_checking
0.238 templating
2.907 exec
3.257 all
Total with call overhead: 3.270 seconds
===== This verion ======
Cumulative time over 10,000 iterations:
0.073 arg_checking
0.306 template_and_exec
0.207 non_exec
0.626 all
Total with call overhead: 0.639 seconds
Ideas going forward
https://bugs.python.org/issue28638