Skip to content

bpo-39495: Remove default value from C impl of TreeBuilder.start #18275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 2, 2020

Conversation

hauntsaninja
Copy link
Contributor

@hauntsaninja hauntsaninja commented Jan 30, 2020

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Jan 30, 2020

As suggested by Serhiy, this changes the C implementation of TreeBuilder.start to remove the default value.

Note however, that the C implementation of TreeBuilder.start still accepts None as a value whereas the Python implementation TypeErrors (see snippet at end).

I looked into changing the C implementation to also throw a TypeError. Unfortunately, it appears treebuilder_handle_start is called elsewhere where attrib can be Py_None (set on line 3310) and I'm not familiar enough with this module to know if I can break that:

res = treebuilder_handle_start((TreeBuilderObject*) self->target,

~/dev/cpython bpo39495 λ ./python.exe                       
Python 3.9.0a3+ (heads/bpo39495_2-dirty:2e6569b669, Jan 29 2020, 22:45:33) 
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree
>>> # show the fix
>>> xml.etree.ElementTree.TreeBuilder().start("asdf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: start expected 2 arguments, got 1
>>> # function still accepts None
>>> xml.etree.ElementTree.TreeBuilder().start("asdf", None)
<Element 'asdf' at 0x10b2e6710>

vs

~ λ python3.9
Python 3.9.0a3+ (heads/master:188bb5b, Jan 29 2020, 22:55:03) 
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree
>>> # not fixed
>>> xml.etree.ElementTree.TreeBuilder().start("asdf")
<Element 'asdf' at 0x10b89f2c0>
>>> # function accepts None
>>> xml.etree.ElementTree.TreeBuilder().start("asdf", None)
<Element 'asdf' at 0x10b89f310>
>>> from test.support import import_fresh_module  
>>> pyElementTree = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
>>> # show the inconsistency with python implementation
>>> pyElementTree.TreeBuilder().start("asdf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: start() missing 1 required positional argument: 'attrs'
>>> # function does not accept None, throws TypeError
>>> pyElementTree.TreeBuilder().start("asdf", None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/shantanu/.pyenv/versions/3.9-dev/lib/python3.9/xml/etree/ElementTree.py", line 1463, in start
    self._last = elem = self._factory(tag, attrs)
  File "/Users/shantanu/.pyenv/versions/3.9-dev/lib/python3.9/xml/etree/ElementTree.py", line 171, in __init__
    raise TypeError("attrib must be dict, not %s" % (
TypeError: attrib must be dict, not NoneType

@serhiy-storchaka
Copy link
Member

Thank you for your PR @hauntsaninja.

I prefer to remove also the support of None as a value of attrib in treebuilder_handle_start. According to the documentation attrib is a dict, and a dict is always passed to custom TreeBuilder.start().

treebuilder_handle_start is called from two places: the implementation of TreeBuilder.start() and expat_start_handler. In the latter case None is passed for optimization, to avoid creation of an empty dictionary. We can use NULL instead of None for this optimization.

@serhiy-storchaka
Copy link
Member

Or you can just use object(subclass_of='&PyDict_Type') for attrs.

@hauntsaninja
Copy link
Contributor Author

Thanks Dong-hee and Serhiy! Don't know much about argument clinic converters, that's a neat fix.

Copy link
Member

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Feb 11, 2020

Thanks, sorry for the delay, done. It seems test_xml_etree_c.py automatically runs all tests in test_xml_etree.py, so I only added a test in the latter.

@codecov
Copy link

codecov bot commented Feb 11, 2020

Codecov Report

Merging #18275 into master will increase coverage by 1.10%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #18275       +/-   ##
===========================================
+ Coverage   82.12%   83.22%    +1.10%     
===========================================
  Files        1955     1571      -384     
  Lines      588706   415384   -173322     
  Branches    44401    44488       +87     
===========================================
- Hits       483462   345712   -137750     
+ Misses      95601    60027    -35574     
- Partials     9643     9645        +2     
Impacted Files Coverage Δ
Lib/distutils/tests/test_bdist_rpm.py 30.00% <0.00%> (-65.00%) ⬇️
Lib/distutils/command/bdist_rpm.py 7.63% <0.00%> (-56.88%) ⬇️
Lib/test/test_urllib2net.py 76.92% <0.00%> (-13.85%) ⬇️
Lib/test/test_smtpnet.py 78.57% <0.00%> (-7.15%) ⬇️
Lib/ftplib.py 63.85% <0.00%> (-6.06%) ⬇️
Lib/test/test_ftplib.py 87.11% <0.00%> (-4.72%) ⬇️
Tools/scripts/db2pickle.py 17.82% <0.00%> (-3.97%) ⬇️
Tools/scripts/pickle2db.py 16.98% <0.00%> (-3.78%) ⬇️
Lib/test/test_socket.py 71.94% <0.00%> (-3.77%) ⬇️
Lib/distutils/tests/test_bdist_msi.py 56.25% <0.00%> (-3.75%) ⬇️
... and 522 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e6569b...46695a0. Read the comment docs.

@corona10
Copy link
Member

corona10 commented Mar 1, 2020

@serhiy-storchaka Can you please take a look?

@serhiy-storchaka serhiy-storchaka merged commit 4edc95c into python:master Mar 2, 2020
@hauntsaninja hauntsaninja deleted the bpo39495_2 branch March 2, 2020 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants