Skip to content

bpo-31648: Improve ElementPath #3835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 30, 2017
5 changes: 5 additions & 0 deletions Doc/library/xml.etree.elementtree.rst
Original file line number Diff line number Diff line change
Expand Up @@ -437,6 +437,11 @@ Supported XPath syntax
| ``[tag]`` | Selects all elements that have a child named |
| | ``tag``. Only immediate children are supported. |
+-----------------------+------------------------------------------------------+
| ``[.='text']`` | Selects all elements whose complete text content, |
| | including descendants, equals the given ``text``. |
| | |
| | .. versionadded:: 3.7 |
+-----------------------+------------------------------------------------------+
| ``[tag='text']`` | Selects all elements that have a child named |
| | ``tag`` whose complete text content, including |
| | descendants, equals the given ``text``. |
Expand Down
8 changes: 8 additions & 0 deletions Doc/whatsnew/3.7.rst
Original file line number Diff line number Diff line change
Expand Up @@ -281,6 +281,14 @@ Function :func:`~uu.encode` now accepts an optional *backtick*
keyword argument. When it's true, zeros are represented by ``'`'``
instead of spaces. (Contributed by Xiang Zhang in :issue:`30103`.)

xml.etree
---------

:ref:`ElementPath <elementtree-xpath>` predicates in the :meth:`find`
methods can now compare text of the current node with ``[. = "text"]``,
not only text in children. Predicates also allow adding spaces for
better readability. (Contributed by Stefan Behnel in :issue:`31648`.)

zipapp
------

Expand Down
33 changes: 33 additions & 0 deletions Lib/test/test_xml_etree.py
Original file line number Diff line number Diff line change
Expand Up @@ -2237,6 +2237,39 @@ def test_findall(self):
['tag'] * 2)
self.assertEqual(e.findall('section//'), e.findall('section//*'))

self.assertEqual(summarize_list(e.findall(".//section[tag='subtext']")),
['section'])
self.assertEqual(summarize_list(e.findall(".//section[tag ='subtext']")),
['section'])
self.assertEqual(summarize_list(e.findall(".//section[tag= 'subtext']")),
['section'])
self.assertEqual(summarize_list(e.findall(".//section[tag = 'subtext']")),
['section'])
self.assertEqual(summarize_list(e.findall(".//section[ tag = 'subtext' ]")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be enough to test just this variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitebox testing, yes. Blackbox testing, no. It's short enough, so I chose the latter. In parsers, it's easy to accidentally forget about special cases.

['section'])

self.assertEqual(summarize_list(e.findall(".//tag[.='subtext']")),
['tag'])
self.assertEqual(summarize_list(e.findall(".//tag[. ='subtext']")),
['tag'])
self.assertEqual(summarize_list(e.findall('.//tag[.= "subtext"]')),
['tag'])
self.assertEqual(summarize_list(e.findall('.//tag[ . = "subtext" ]')),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be enough to test just this variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment applies.

['tag'])
self.assertEqual(summarize_list(e.findall(".//tag[. = 'subtext']")),
['tag'])
self.assertEqual(summarize_list(e.findall(".//tag[. = 'subtext ']")),
[])
self.assertEqual(summarize_list(e.findall(".//tag[.= ' subtext']")),
[])

# duplicate section => 2x tag matches
e[1] = e[2]
self.assertEqual(summarize_list(e.findall(".//section[tag = 'subtext']")),
['section', 'section'])
self.assertEqual(summarize_list(e.findall(".//tag[. = 'subtext']")),
['tag', 'tag'])

def test_test_find_with_ns(self):
e = ET.XML(SAMPLE_XML_NS)
self.assertEqual(summarize_list(e.findall('tag')), [])
Expand Down
23 changes: 16 additions & 7 deletions Lib/xml/etree/ElementPath.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,9 @@ def prepare_predicate(next, token):
return
if token[0] == "]":
break
if token == ('', ''):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it for whitespaces? Please add a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# ignore whitespace
continue
if token[0] and token[0][:1] in "'\"":
token = "'", token[0][1:-1]
signature.append(token[0] or "-")
Expand Down Expand Up @@ -188,16 +191,22 @@ def select(context, result):
if elem.find(tag) is not None:
yield elem
return select
if signature == "-='" and not re.match(r"\-?\d+$", predicate[0]):
# [tag='value']
if signature == ".='" or (signature == "-='" and not re.match(r"\-?\d+$", predicate[0])):
# [.='value'] or [tag='value']
tag = predicate[0]
value = predicate[-1]
def select(context, result):
for elem in result:
for e in elem.findall(tag):
if "".join(e.itertext()) == value:
if tag:
def select(context, result):
for elem in result:
for e in elem.findall(tag):
if "".join(e.itertext()) == value:
yield elem
break
else:
def select(context, result):
for elem in result:
if "".join(elem.itertext()) == value:
yield elem
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a special test case for this bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it is already added!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done.

return select
if signature == "-" or signature == "-()" or signature == "-()-":
# [index] or [last()] or [last()-index]
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
Improvements to path predicates in ElementTree:

* Allow whitespace around predicate parts, i.e. "[a = 'text']" instead of requiring the less readable "[a='text']".
* Add support for text comparison of the current node, like "[.='text']".

Patch by Stefan Behnel.