Commit 5fd8123

bpo-39011: Preserve line endings within ElementTree attributes (GH-18468)
* bpo-39011: Preserve line endings within attributes

Line endings within attributes were previously normalized to "\n" in Py3.7/3.8. This patch removes that normalization, as line endings which were replaced by entity numbers should be preserved in original form.
1 parent 8f87eef commit 5fd8123
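
The effect described in the commit message can be sketched as follows. This is a minimal illustration, assuming Python 3.9 or later (where this patch is included); the attribute names and values mirror the updated test, and the expected bytes are the ones that test asserts. On Python 3.7/3.8 the carriage returns would come out as `&#10;` instead.

```python
import xml.etree.ElementTree as ET

# Attribute values containing CR, CR LF, and a mix of tab/LF/CR/space.
elem = ET.Element("test")
elem.set("a", "\r")
elem.set("b", "\r\n")
elem.set("c", "\t\n\r ")

serialized = ET.tostring(elem)
print(serialized)
# With this patch applied (Python 3.9+):
# b'<test a="&#13;" b="&#13;&#10;" c="&#09;&#10;&#13; " />'
```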

4 files changed: +22 -9 lines changed

Doc/whatsnew/3.9.rst

Lines changed: 9 additions & 0 deletions
@@ -412,6 +412,15 @@ customization consistently by always using the value specified by
 case), and one used ``__VENV_NAME__`` instead.
 (Contributed by Brett Cannon in :issue:`37663`.)
 
+xml
+---
+
+White space characters within attributes are now preserved when serializing
+:mod:`xml.etree.ElementTree` to XML file. EOLNs are no longer normalized
+to "\n". This is the result of discussion about how to interpret
+section 2.11 of XML spec.
+(Contributed by Mefistotelis in :issue:`39011`.)
+
 
 Optimizations
 =============

Lib/test/test_xml_etree.py

Lines changed: 3 additions & 2 deletions
@@ -430,13 +430,14 @@ def test_attrib(self):
         self.assertEqual(ET.tostring(elem),
                 b'<test testa="testval" testb="test1" testc="test2">aa</test>')
 
+        # Test preserving white space chars in attributes
         elem = ET.Element('test')
         elem.set('a', '\r')
         elem.set('b', '\r\n')
         elem.set('c', '\t\n\r ')
-        elem.set('d', '\n\n')
+        elem.set('d', '\n\n\r\r\t\t ')
         self.assertEqual(ET.tostring(elem),
-                b'<test a="&#10;" b="&#10;" c="&#09;&#10;&#10; " d="&#10;&#10;" />')
+                b'<test a="&#13;" b="&#13;&#10;" c="&#09;&#10;&#13; " d="&#10;&#10;&#13;&#13;&#09;&#09; " />')
 
     def test_makeelement(self):
         # Test makeelement handling.

Lib/xml/etree/ElementTree.py

Lines changed: 7 additions & 7 deletions
@@ -1057,15 +1057,15 @@ def _escape_attrib(text):
             text = text.replace(">", "&gt;")
         if "\"" in text:
             text = text.replace("\"", "&quot;")
-        # The following business with carriage returns is to satisfy
-        # Section 2.11 of the XML specification, stating that
-        # CR or CR LN should be replaced with just LN
+        # Although section 2.11 of the XML specification states that CR or
+        # CR LN should be replaced with just LN, it applies only to EOLNs
+        # which take part of organizing file into lines. Within attributes,
+        # we are replacing these with entity numbers, so they do not count.
         # http://www.w3.org/TR/REC-xml/#sec-line-ends
-        if "\r\n" in text:
-            text = text.replace("\r\n", "\n")
+        # The current solution, contained in following six lines, was
+        # discussed in issue 17582 and 39011.
         if "\r" in text:
-            text = text.replace("\r", "\n")
-        #The following four lines are issue 17582
+            text = text.replace("\r", "&#13;")
         if "\n" in text:
             text = text.replace("\n", "&#10;")
         if "\t" in text:
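
The escaping order in the hunk above can be sketched as a standalone function. This is an illustrative stand-in, not the library's code: the name `escape_attrib_sketch` is hypothetical, the `&` and `<` replacements fall outside the hunk and are assumed here, and the `in` checks and error handling of the real `_escape_attrib` are omitted.

```python
def escape_attrib_sketch(text: str) -> str:
    """Hypothetical, simplified stand-in for the patched escaping logic."""
    # Markup characters first. "&" must be handled before anything that
    # introduces new "&" characters (assumed; not shown in the hunk).
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    # After the patch, CR is written as its own character reference
    # instead of being folded into LF.
    text = text.replace("\r", "&#13;")
    text = text.replace("\n", "&#10;")
    text = text.replace("\t", "&#09;")
    return text


print(escape_attrib_sketch("\r\n"))            # &#13;&#10;
print(escape_attrib_sketch("\n\n\r\r\t\t "))   # &#10;&#10;&#13;&#13;&#09;&#09; (plus a trailing space)
```

Because each whitespace character maps to a distinct reference, no separate `"\r\n"` case is needed: the pair is covered by the individual CR and LF replacements.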
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Normalization of line endings in ElementTree attributes was removed, as line
+endings which were replaced by entity numbers should be preserved in
+original form.
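
The reasoning in this news entry can be checked from the parsing side. The sketch below assumes the standard expat-backed parser behind `ET.fromstring`; the sample document is made up for illustration. A literal CR LF inside an attribute value is subject to the parser's line-end and attribute-value normalization, while the same characters written as character references reach the application unchanged.

```python
import xml.etree.ElementTree as ET

# Attribute "a" holds a literal CR LF; attribute "b" holds the same
# characters written as numeric character references.
doc = '<test a="x\r\ny" b="x&#13;&#10;y" />'
elem = ET.fromstring(doc)

literal = elem.get("a")
referenced = elem.get("b")

print(repr(literal))     # the literal line break does not survive parsing
print(repr(referenced))  # 'x\r\ny'
```

This is why writing CR as `&#13;` on output is enough to preserve it: the reference is not treated as a line ending when the file is read back.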
