Skip to content

Commit 2d2df11

Browse files
committed
bpo-36673: Rewrite the comment/PI factory handling for the TreeBuilder in "_elementtree" to make it use the same factories as the ElementTree module, and to make it explicit when the comments/PIs are inserted into the tree and when they are not (which is the default).
1 parent ab7341d commit 2d2df11

File tree

5 files changed

+258
-80
lines changed

5 files changed

+258
-80
lines changed

Doc/library/xml.etree.elementtree.rst

Lines changed: 18 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1026,24 +1026,24 @@ TreeBuilder Objects
10261026
^^^^^^^^^^^^^^^^^^^
10271027

10281028

1029-
.. class:: TreeBuilder(element_factory=None, comment_factory=None, \
1030-
pi_factory=None)
1029+
.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
1030+
pi_factory=None, insert_comments=False, insert_pis=False)
10311031

10321032
Generic element structure builder. This builder converts a sequence of
1033-
start, data, and end method calls to a well-formed element structure. You
1034-
can use this class to build an element structure using a custom XML parser,
1035-
or a parser for some other XML-like format.
1033+
start, data, end, comment and pi method calls to a well-formed element
1034+
structure. You can use this class to build an element structure using
1035+
a custom XML parser, or a parser for some other XML-like format.
10361036

10371037
*element_factory*, when given, must be a callable accepting two positional
10381038
arguments: a tag and a dict of attributes. It is expected to return a new
10391039
element instance.
10401040

10411041
The *comment_factory* and *pi_factory* functions, when given, should behave
10421042
like the :func:`Comment` and :func:`ProcessingInstruction` functions to
1043-
create comments and processing instructions. When not given, no comments
1044-
or processing instructions will be created. Note that these objects will
1045-
not currently be appended to the tree when they appear outside of the root
1046-
element.
1043+
create comments and processing instructions. When not given, the default
1044+
factories will be used. When *insert_comments* and/or *insert_pis* is true,
1045+
comments/pis will be inserted into the tree if they appear within the root
1046+
element (but not outside of it).
10471047

10481048
.. method:: close()
10491049

@@ -1068,13 +1068,15 @@ TreeBuilder Objects
10681068
Opens a new element. *tag* is the element name. *attrs* is a dictionary
10691069
containing element attributes. Returns the opened element.
10701070

1071+
10711072
.. method:: comment(text)
10721073

10731074
Adds a comment with the given *text*. If *comment_factory* is
10741075
:const:`None`, this will just return the text.
10751076

10761077
.. versionadded:: 3.8
10771078

1079+
10781080
.. method:: pi(target, text)
10791081

10801082
Adds a comment with the given *target* name and *text*. If
@@ -1201,11 +1203,13 @@ XMLPullParser Objects
12011203
data fed to the
12021204
parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
12031205
string representing the type of event (e.g. ``"end"``) and *elem* is the
1204-
encountered :class:`Element` object.
1205-
For ``start-ns`` events, the ``elem`` is a tuple ``(prefix, uri)`` naming
1206-
the declared namespace mapping. For ``end-ns`` events, the ``elem`` is
1207-
:const:`None`. For ``comment`` events, the second value is the comment
1208-
text and for ``pi`` events a tuple ``(target, text)``.
1206+
encountered :class:`Element` object, or other context value as follows.
1207+
1208+
* ``start``, ``end``: the current Element.
1209+
* ``comment``, ``pi``: the current comment / processing instruction
1210+
* ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
1211+
mapping.
1212+
* ``end-ns``: :const:`None` (this may change in a future version)
12091213

12101214
Events provided in a previous call to :meth:`read_events` will not be
12111215
yielded again. Events are consumed from the internal queue only when

Lib/test/test_xml_etree.py

Lines changed: 23 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1194,7 +1194,10 @@ def _feed(self, parser, data, chunk_size=None):
11941194
parser.feed(data[i:i+chunk_size])
11951195

11961196
def assert_events(self, parser, expected):
1197-
self.assertEqual(list(parser.read_events()), expected)
1197+
self.assertEqual(
1198+
[(event, (elem.tag, elem.text))
1199+
for event, elem in parser.read_events()],
1200+
expected)
11981201

11991202
def assert_event_tags(self, parser, expected):
12001203
events = parser.read_events()
@@ -1321,30 +1324,29 @@ def test_events(self):
13211324
def test_events_comment(self):
13221325
parser = ET.XMLPullParser(events=('start', 'comment', 'end'))
13231326
self._feed(parser, "<!-- text here -->\n")
1324-
self.assert_events(parser, [('comment', ' text here ')])
1327+
self.assert_events(parser, [('comment', (ET.Comment, ' text here '))])
13251328
self._feed(parser, "<!-- more text here -->\n")
1326-
self.assert_events(parser, [('comment', ' more text here ')])
1329+
self.assert_events(parser, [('comment', (ET.Comment, ' more text here '))])
13271330
self._feed(parser, "<root-tag>text")
13281331
self.assert_event_tags(parser, [('start', 'root-tag')])
13291332
self._feed(parser, "<!-- inner comment-->\n")
1330-
self.assert_events(parser, [('comment', ' inner comment')])
1333+
self.assert_events(parser, [('comment', (ET.Comment, ' inner comment'))])
13311334
self._feed(parser, "</root-tag>\n")
13321335
self.assert_event_tags(parser, [('end', 'root-tag')])
13331336
self._feed(parser, "<!-- outer comment -->\n")
1334-
self.assert_events(parser, [('comment', ' outer comment ')])
1337+
self.assert_events(parser, [('comment', (ET.Comment, ' outer comment '))])
13351338

13361339
parser = ET.XMLPullParser(events=('comment',))
13371340
self._feed(parser, "<!-- text here -->\n")
1338-
self.assert_events(parser, [('comment', ' text here ')])
1341+
self.assert_events(parser, [('comment', (ET.Comment, ' text here '))])
13391342

13401343
def test_events_pi(self):
13411344
parser = ET.XMLPullParser(events=('start', 'pi', 'end'))
13421345
self._feed(parser, "<?pitarget?>\n")
1343-
self.assert_events(parser, [('pi', ('pitarget', ''))])
1346+
self.assert_events(parser, [('pi', (ET.PI, 'pitarget'))])
13441347
parser = ET.XMLPullParser(events=('pi',))
13451348
self._feed(parser, "<?pitarget some text ?>\n")
1346-
self.assert_events(parser, [('pi', ('pitarget', 'some text '))])
1347-
1349+
self.assert_events(parser, [('pi', (ET.PI, 'pitarget some text '))])
13481350

13491351
def test_events_sequence(self):
13501352
# Test that events can be some sequence that's not just a tuple or list
@@ -1365,7 +1367,6 @@ def __next__(self):
13651367
self._feed(parser, "<foo>bar</foo>")
13661368
self.assert_event_tags(parser, [('start', 'foo'), ('end', 'foo')])
13671369

1368-
13691370
def test_unknown_event(self):
13701371
with self.assertRaises(ValueError):
13711372
ET.XMLPullParser(events=('start', 'end', 'bogus'))
@@ -2693,7 +2694,8 @@ class DummyBuilder(BaseDummyBuilder):
26932694

26942695
def test_treebuilder_comment(self):
26952696
b = ET.TreeBuilder()
2696-
self.assertEqual(b.comment('ctext'), 'ctext')
2697+
self.assertEqual(b.comment('ctext').tag, ET.Comment)
2698+
self.assertEqual(b.comment('ctext').text, 'ctext')
26972699

26982700
b = ET.TreeBuilder(comment_factory=ET.Comment)
26992701
self.assertEqual(b.comment('ctext').tag, ET.Comment)
@@ -2704,7 +2706,8 @@ def test_treebuilder_comment(self):
27042706

27052707
def test_treebuilder_pi(self):
27062708
b = ET.TreeBuilder()
2707-
self.assertEqual(b.pi('target', None), ('target', None))
2709+
self.assertEqual(b.pi('target', None).tag, ET.PI)
2710+
self.assertEqual(b.pi('target', None).text, 'target')
27082711

27092712
b = ET.TreeBuilder(pi_factory=ET.PI)
27102713
self.assertEqual(b.pi('target').tag, ET.PI)
@@ -3408,6 +3411,12 @@ def test_main(module=None):
34083411
# Copy the path cache (should be empty)
34093412
path_cache = ElementPath._cache
34103413
ElementPath._cache = path_cache.copy()
3414+
# Align the Comment/PI factories.
3415+
if hasattr(ET, '_set_factories'):
3416+
old_factories = ET._set_factories(ET.Comment, ET.PI)
3417+
else:
3418+
old_factories = None
3419+
34113420
try:
34123421
support.run_unittest(*test_classes)
34133422
finally:
@@ -3416,6 +3425,8 @@ def test_main(module=None):
34163425
nsmap.clear()
34173426
nsmap.update(nsmap_copy)
34183427
ElementPath._cache = path_cache
3428+
if old_factories is not None:
3429+
ET._set_factories(*old_factories)
34193430
# don't interfere with subsequent tests
34203431
ET = pyET = None
34213432

Lib/xml/etree/ElementTree.py

Lines changed: 32 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -1374,22 +1374,30 @@ class TreeBuilder:
13741374
*element_factory* is an optional element factory which is called
13751375
to create new Element instances, as necessary.
13761376
1377-
*comment_factory* is a factory to create comments. If not provided,
1378-
comments will not be inserted into the tree and "comment" pull parser
1379-
events will only return the plain text.
1377+
*comment_factory* is a factory to create comments to be used instead of
1378+
the standard factory. If *insert_comments* is false (the default),
1379+
comments will not be inserted into the tree.
13801380
1381-
*pi_factory* is a factory to create processing instructions. If not
1382-
provided, PIs will not be inserted into the tree and "pi" pull parser
1383-
events will only return a (target, text) tuple.
1381+
*pi_factory* is a factory to create processing instructions to be used
1382+
instead of the standard factory. If *insert_pis* is false (the default),
1383+
processing instructions will not be inserted into the tree.
13841384
"""
1385-
def __init__(self, element_factory=None, comment_factory=None, pi_factory=None):
1385+
def __init__(self, element_factory=None, *,
1386+
comment_factory=None, pi_factory=None,
1387+
insert_comments=False, insert_pis=False):
13861388
self._data = [] # data collector
13871389
self._elem = [] # element stack
13881390
self._last = None # last element
13891391
self._root = None # root element
13901392
self._tail = None # true if we're after an end tag
1393+
if comment_factory is None:
1394+
comment_factory = Comment
13911395
self._comment_factory = comment_factory
1396+
self.insert_comments = insert_comments
1397+
if pi_factory is None:
1398+
pi_factory = ProcessingInstruction
13921399
self._pi_factory = pi_factory
1400+
self.insert_pis = insert_pis
13931401
if element_factory is None:
13941402
element_factory = Element
13951403
self._factory = element_factory
@@ -1450,34 +1458,28 @@ def end(self, tag):
14501458
def comment(self, text):
14511459
"""Create a comment using the comment_factory.
14521460
1453-
If no factory is provided, comments are ignored
1454-
and the text returned as is.
1455-
14561461
*text* is the text of the comment.
14571462
"""
1458-
if self._comment_factory is None:
1459-
return text
1460-
return self._handle_single(self._comment_factory, text)
1463+
return self._handle_single(
1464+
self._comment_factory, self.insert_comments, text)
14611465

14621466
def pi(self, target, text=None):
14631467
"""Create a processing instruction using the pi_factory.
14641468
1465-
If no factory is provided, PIs are ignored and a (target, text)
1466-
tuple is returned.
1467-
14681469
*target* is the target name of the processing instruction.
14691470
*text* is the data of the processing instruction, or ''.
14701471
"""
1471-
if self._pi_factory is None:
1472-
return (target, text)
1473-
return self._handle_single(self._pi_factory, target, text)
1474-
1475-
def _handle_single(self, factory, *args):
1476-
self._flush()
1477-
self._last = elem = factory(*args)
1478-
if self._elem:
1479-
self._elem[-1].append(elem)
1480-
self._tail = 1
1472+
return self._handle_single(
1473+
self._pi_factory, self.insert_pis, target, text)
1474+
1475+
def _handle_single(self, factory, insert, *args):
1476+
elem = factory(*args)
1477+
if insert:
1478+
self._flush()
1479+
self._last = elem
1480+
if self._elem:
1481+
self._elem[-1].append(elem)
1482+
self._tail = 1
14811483
return elem
14821484

14831485

@@ -1694,7 +1696,10 @@ def close(self):
16941696
# (see tests)
16951697
_Element_Py = Element
16961698

1697-
# Element, SubElement, ParseError, TreeBuilder, XMLParser
1699+
# Element, SubElement, ParseError, TreeBuilder, XMLParser, _set_factories
16981700
from _elementtree import *
1701+
from _elementtree import _set_factories
16991702
except ImportError:
17001703
pass
1704+
else:
1705+
_set_factories(Comment, ProcessingInstruction)

0 commit comments

Comments
 (0)