Commit e1d5dd6

bpo-13611: C14N 2.0 implementation for ElementTree (GH-12966)
* Implement C14N 2.0 as a new canonicalize() function in ElementTree.

Missing features:
- prefix renaming in XPath expressions (tag and attribute text is supported)
- preservation of original prefixes given redundant namespace declarations
1 parent ee88af3 commit e1d5dd6

60 files changed (+920, -0 lines)

Doc/library/xml.etree.elementtree.rst

Lines changed: 60 additions & 0 deletions
@@ -465,6 +465,53 @@ Reference
 Functions
 ^^^^^^^^^

+.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)
+
+   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.
+
+   Canonicalization is a way to normalise XML output in a way that allows
+   byte-by-byte comparisons and digital signatures. It reduces the freedom
+   that XML serializers have and instead generates a more constrained XML
+   representation. The main restrictions regard the placement of namespace
+   declarations, the ordering of attributes, and ignorable whitespace.
+
+   This function takes an XML data string (*xml_data*) or a file path or
+   file-like object (*from_file*) as input, converts it to the canonical
+   form, and writes it out using the *out* file(-like) object, if provided,
+   or returns it as a text string if not. The output file receives text,
+   not bytes. It should therefore be opened in text mode with ``utf-8``
+   encoding.
+
+   Typical uses::
+
+       xml_data = "<root>...</root>"
+       print(canonicalize(xml_data))
+
+       with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
+           canonicalize(xml_data, out=out_file)
+
+       with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
+           canonicalize(from_file="inputfile.xml", out=out_file)
+
+   The configuration *options* are as follows:
+
+   - *with_comments*: set to true to include comments (default: false)
+   - *strip_text*: set to true to strip whitespace before and after text content
+     (default: false)
+   - *rewrite_prefixes*: set to true to replace namespace prefixes with "n{number}"
+     (default: false)
+   - *qname_aware_tags*: a set of QName-aware tag names in which prefixes
+     should be replaced in text content (default: empty)
+   - *qname_aware_attrs*: a set of QName-aware attribute names in which prefixes
+     should be replaced in text content (default: empty)
+   - *exclude_attrs*: a set of attribute names that should not be serialised
+   - *exclude_tags*: a set of tag names that should not be serialised
+
+   In the option list above, "a set" refers to any collection or iterable of
+   strings; no ordering is expected.
+
+   .. versionadded:: 3.8
+

 .. function:: Comment(text=None)

@@ -1114,6 +1161,19 @@ TreeBuilder Objects
       .. versionadded:: 3.8


+.. class:: C14NWriterTarget(write, *, \
+             with_comments=False, strip_text=False, rewrite_prefixes=False, \
+             qname_aware_tags=None, qname_aware_attrs=None, \
+             exclude_attrs=None, exclude_tags=None)
+
+   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
+   same as for the :func:`canonicalize` function. This class does not build a
+   tree but translates the callback events directly into a serialised form
+   using the *write* function.
+
+   .. versionadded:: 3.8
+
+
 .. _elementtree-xmlparser-objects:

 XMLParser Objects
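Since C14NWriterTarget only receives parser callback events, it is meant to be plugged into an XMLParser as its target. A minimal sketch, assuming that any callable accepting strings is acceptable as *write* (here `list.append`) and feeding the input in two pieces to show that the writer serialises incrementally. The input is one of the roundtrip cases from the test suite in this commit, so the expected output is taken from there:

```python
import xml.etree.ElementTree as ET

chunks = []
# C14NWriterTarget calls write() with string pieces; list.append collects them.
target = ET.C14NWriterTarget(chunks.append)
parser = ET.XMLParser(target=target)

# Input may arrive in pieces; the target serialises events as they come in.
parser.feed("<doc xmlns:prefix='uri'>")
parser.feed("<prefix:bar/></doc>")
parser.close()

result = ''.join(chunks)
# The unused declaration on <doc> is moved down to where the prefix is used.
print(result)  # <doc><prefix:bar xmlns:prefix="uri"></prefix:bar></doc>
```

This is essentially what canonicalize() does internally when no *out* object is given, except that it collects the pieces in an io.StringIO.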

Doc/whatsnew/3.8.rst

Lines changed: 4 additions & 0 deletions
@@ -525,6 +525,10 @@ xml
   external entities by default.
   (Contributed by Christian Heimes in :issue:`17239`.)

+* The :mod:`xml.etree.ElementTree` module provides a new function
+  :func:`~xml.etree.ElementTree.canonicalize()` that implements C14N 2.0.
+  (Contributed by Stefan Behnel in :issue:`13611`.)
+

 Optimizations
 =============
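To make the documented options concrete, here is a small sketch of canonicalize() on an invented input, showing how *with_comments*, *strip_text* and *out* change the result. The input string is an assumption made up for this example; the expected strings follow the C14N 2.0 rules described in the documentation above (comments dropped by default, whitespace kept unless stripped):

```python
import io
import xml.etree.ElementTree as ET

xml_data = "<doc>\n  <item> Hello, world! </item><!-- note -->\n</doc>"

# Defaults: comments are dropped, whitespace in text is kept as-is.
default = ET.canonicalize(xml_data)

# with_comments=True keeps the comment in place.
commented = ET.canonicalize(xml_data, with_comments=True)

# strip_text=True trims whitespace around text; whitespace-only text vanishes.
stripped = ET.canonicalize(xml_data, strip_text=True)

# With out=..., the result is written to the text stream and None is returned.
buf = io.StringIO()
returned = ET.canonicalize(xml_data, out=buf)

print(repr(default))    # '<doc>\n  <item> Hello, world! </item>\n</doc>'
print(repr(commented))  # '<doc>\n  <item> Hello, world! </item><!-- note -->\n</doc>'
print(repr(stripped))   # '<doc><item>Hello, world!</item></doc>'
print(returned)         # None
```

An io.StringIO stands in for a real output file here; for files on disk, the documentation's advice to open them in text mode with ``utf-8`` encoding applies.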

Lib/test/test_xml_etree.py

Lines changed: 229 additions & 0 deletions
@@ -12,6 +12,7 @@
 import itertools
 import locale
 import operator
+import os
 import pickle
 import sys
 import textwrap
@@ -20,6 +21,7 @@
 import warnings
 import weakref

+from functools import partial
 from itertools import product, islice
 from test import support
 from test.support import TESTFN, findfile, import_fresh_module, gc_collect, swap_attr
@@ -3527,6 +3529,231 @@ def test_correct_import_pyET(self):
         self.assertIsInstance(pyET.Element.__init__, types.FunctionType)
         self.assertIsInstance(pyET.XMLParser.__init__, types.FunctionType)

+
+# --------------------------------------------------------------------
+
+def c14n_roundtrip(xml, **options):
+    return pyET.canonicalize(xml, **options)
+
+
+class C14NTest(unittest.TestCase):
+    maxDiff = None
+
+    #
+    # simple roundtrip tests (from c14n.py)
+
+    def test_simple_roundtrip(self):
+        # Basics
+        self.assertEqual(c14n_roundtrip("<doc/>"), '<doc></doc>')
+        self.assertEqual(c14n_roundtrip("<doc xmlns='uri'/>"), # FIXME
+                '<doc xmlns="uri"></doc>')
+        self.assertEqual(c14n_roundtrip("<prefix:doc xmlns:prefix='uri'/>"),
+                '<prefix:doc xmlns:prefix="uri"></prefix:doc>')
+        self.assertEqual(c14n_roundtrip("<doc xmlns:prefix='uri'><prefix:bar/></doc>"),
+                '<doc><prefix:bar xmlns:prefix="uri"></prefix:bar></doc>')
+        self.assertEqual(c14n_roundtrip("<elem xmlns:wsu='http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd' xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/' />"),
+                '<elem></elem>')
+
+        # C14N spec
+        self.assertEqual(c14n_roundtrip("<doc>Hello, world!<!-- Comment 1 --></doc>"),
+                '<doc>Hello, world!</doc>')
+        self.assertEqual(c14n_roundtrip("<value>&#x32;</value>"),
+                '<value>2</value>')
+        self.assertEqual(c14n_roundtrip('<compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>'),
+                '<compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute>')
+        self.assertEqual(c14n_roundtrip('''<compute expr='value>"0" &amp;&amp; value&lt;"10" ?"valid":"error"'>valid</compute>'''),
+                '<compute expr="value>&quot;0&quot; &amp;&amp; value&lt;&quot;10&quot; ?&quot;valid&quot;:&quot;error&quot;">valid</compute>')
+        self.assertEqual(c14n_roundtrip("<norm attr=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>"),
+                '<norm attr=" \'    &#xD;&#xA;&#x9;   \' "></norm>')
+        self.assertEqual(c14n_roundtrip("<normNames attr='   A   &#x20;&#13;&#xa;&#9;   B   '/>"),
+                '<normNames attr="   A    &#xD;&#xA;&#x9;   B   "></normNames>')
+        self.assertEqual(c14n_roundtrip("<normId id=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>"),
+                '<normId id=" \'    &#xD;&#xA;&#x9;   \' "></normId>')
+
+        # fragments from PJ's tests
+        #self.assertEqual(c14n_roundtrip("<doc xmlns:x='http://example.com/x' xmlns='http://example.com/default'><b y:a1='1' xmlns='http://example.com/default' a3='3' xmlns:y='http://example.com/y' y:a2='2'/></doc>"),
+        #'<doc xmlns:x="http://example.com/x"><b xmlns:y="http://example.com/y" a3="3" y:a1="1" y:a2="2"></b></doc>')
+
+    def test_c14n_exclusion(self):
+        xml = textwrap.dedent("""\
+        <root xmlns:x="http://example.com/x">
+            <a x:attr="attrx">
+                <b>abtext</b>
+            </a>
+            <b>btext</b>
+            <c>
+                <x:d>dtext</x:d>
+            </c>
+        </root>
+        """)
+        self.assertEqual(
+            c14n_roundtrip(xml, strip_text=True),
+            '<root>'
+            '<a xmlns:x="http://example.com/x" x:attr="attrx"><b>abtext</b></a>'
+            '<b>btext</b>'
+            '<c><x:d xmlns:x="http://example.com/x">dtext</x:d></c>'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, strip_text=True, exclude_attrs=['{http://example.com/x}attr']),
+            '<root>'
+            '<a><b>abtext</b></a>'
+            '<b>btext</b>'
+            '<c><x:d xmlns:x="http://example.com/x">dtext</x:d></c>'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d']),
+            '<root>'
+            '<a xmlns:x="http://example.com/x" x:attr="attrx"><b>abtext</b></a>'
+            '<b>btext</b>'
+            '<c></c>'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, strip_text=True, exclude_attrs=['{http://example.com/x}attr'],
+                           exclude_tags=['{http://example.com/x}d']),
+            '<root>'
+            '<a><b>abtext</b></a>'
+            '<b>btext</b>'
+            '<c></c>'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, strip_text=True, exclude_tags=['a', 'b']),
+            '<root>'
+            '<c><x:d xmlns:x="http://example.com/x">dtext</x:d></c>'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, exclude_tags=['a', 'b']),
+            '<root>\n'
+            '    \n'
+            '    \n'
+            '    <c>\n'
+            '        <x:d xmlns:x="http://example.com/x">dtext</x:d>\n'
+            '    </c>\n'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d', 'b']),
+            '<root>'
+            '<a xmlns:x="http://example.com/x" x:attr="attrx"></a>'
+            '<c></c>'
+            '</root>')
+        self.assertEqual(
+            c14n_roundtrip(xml, exclude_tags=['{http://example.com/x}d', 'b']),
+            '<root>\n'
+            '    <a xmlns:x="http://example.com/x" x:attr="attrx">\n'
+            '        \n'
+            '    </a>\n'
+            '    \n'
+            '    <c>\n'
+            '        \n'
+            '    </c>\n'
+            '</root>')
+
+    #
+    # basic method=c14n tests from the c14n 2.0 specification. uses
+    # test files under xmltestdata/c14n-20.
+
+    # note that this uses generated C14N versions of the standard ET.write
+    # output, not roundtripped C14N (see above).
+
+    def test_xml_c14n2(self):
+        datadir = findfile("c14n-20", subdir="xmltestdata")
+        full_path = partial(os.path.join, datadir)
+
+        files = [filename[:-4] for filename in sorted(os.listdir(datadir))
+                 if filename.endswith('.xml')]
+        input_files = [
+            filename for filename in files
+            if filename.startswith('in')
+        ]
+        configs = {
+            filename: {
+                # <c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
+                option.tag.split('}')[-1]: ((option.text or '').strip(), option)
+                for option in ET.parse(full_path(filename) + ".xml").getroot()
+            }
+            for filename in files
+            if filename.startswith('c14n')
+        }
+
+        tests = {
+            input_file: [
+                (filename, configs[filename.rsplit('_', 1)[-1]])
+                for filename in files
+                if filename.startswith(f'out_{input_file}_')
+                and filename.rsplit('_', 1)[-1] in configs
+            ]
+            for input_file in input_files
+        }
+
+        # Make sure we found all test cases.
+        self.assertEqual(30, len([
+            output_file for output_files in tests.values()
+            for output_file in output_files]))
+
+        def get_option(config, option_name, default=None):
+            return config.get(option_name, (default, ()))[0]
+
+        for input_file, output_files in tests.items():
+            for output_file, config in output_files:
+                keep_comments = get_option(
+                    config, 'IgnoreComments') == 'true'  # no, it's right :)
+                strip_text = get_option(
+                    config, 'TrimTextNodes') == 'true'
+                rewrite_prefixes = get_option(
+                    config, 'PrefixRewrite') == 'sequential'
+                if 'QNameAware' in config:
+                    qattrs = [
+                        f"{{{el.get('NS')}}}{el.get('Name')}"
+                        for el in config['QNameAware'][1].findall(
+                            '{http://www.w3.org/2010/xml-c14n2}QualifiedAttr')
+                    ]
+                    qtags = [
+                        f"{{{el.get('NS')}}}{el.get('Name')}"
+                        for el in config['QNameAware'][1].findall(
+                            '{http://www.w3.org/2010/xml-c14n2}Element')
+                    ]
+                else:
+                    qtags = qattrs = None
+
+                # Build subtest description from config.
+                config_descr = ','.join(
+                    f"{name}={value or ','.join(c.tag.split('}')[-1] for c in children)}"
+                    for name, (value, children) in sorted(config.items())
+                )
+
+                with self.subTest(f"{output_file}({config_descr})"):
+                    if input_file == 'inNsRedecl' and not rewrite_prefixes:
+                        self.skipTest(
+                            f"Redeclared namespace handling is not supported in {output_file}")
+                    if input_file == 'inNsSuperfluous' and not rewrite_prefixes:
+                        self.skipTest(
+                            f"Redeclared namespace handling is not supported in {output_file}")
+                    if 'QNameAware' in config and config['QNameAware'][1].find(
+                            '{http://www.w3.org/2010/xml-c14n2}XPathElement') is not None:
+                        self.skipTest(
+                            f"QName rewriting in XPath text is not supported in {output_file}")
+
+                    f = full_path(input_file + ".xml")
+                    if input_file == 'inC14N5':
+                        # Hack: avoid setting up external entity resolution in the parser.
+                        with open(full_path('world.txt'), 'rb') as entity_file:
+                            with open(f, 'rb') as f:
+                                f = io.BytesIO(f.read().replace(b'&ent2;', entity_file.read()))
+
+                    text = ET.canonicalize(
+                        from_file=f,
+                        with_comments=keep_comments,
+                        strip_text=strip_text,
+                        rewrite_prefixes=rewrite_prefixes,
+                        qname_aware_tags=qtags, qname_aware_attrs=qattrs)
+
+                    with open(full_path(output_file + ".xml"), 'r', encoding='utf8') as f:
+                        expected = f.read()
+                    if input_file == 'inC14N3':
+                        # FIXME: cET resolves default attributes but ET does not!
+                        expected = expected.replace(' attr="default"', '')
+                        text = text.replace(' attr="default"', '')
+                    self.assertEqual(expected, text)
+
 # --------------------------------------------------------------------


@@ -3559,6 +3786,8 @@ def test_main(module=None):
         XMLParserTest,
         XMLPullParserTest,
         BugsTest,
+        KeywordArgsTest,
+        C14NTest,
         ]

     # These tests will only run for the pure-Python version that doesn't import

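The exclusion behaviour checked by test_c14n_exclusion() can be reduced to a small standalone sketch. The input documents here are invented for illustration; the behaviour they exercise mirrors the test: *exclude_tags* drops an element together with its content but keeps its tail text, and namespaced names are given in `{uri}local` form:

```python
import xml.etree.ElementTree as ET

xml_data = '<root><a secret="1">x</a>tail<b>y</b></root>'

# exclude_attrs drops matching attributes; the element itself stays.
no_attr = ET.canonicalize(xml_data, exclude_attrs=['secret'])

# exclude_tags drops the element and its content, but not its tail text.
no_a = ET.canonicalize(xml_data, exclude_tags=['a'])

# Namespaced names use {uri}local notation; the now-unused
# xmlns:x declaration is not emitted either.
ns_data = '<root xmlns:x="http://example.com/x"><a x:attr="1">t</a></root>'
no_ns_attr = ET.canonicalize(ns_data, exclude_attrs=['{http://example.com/x}attr'])

print(no_attr)     # <root><a>x</a>tail<b>y</b></root>
print(no_a)        # <root>tail<b>y</b></root>
print(no_ns_attr)  # <root><a>t</a></root>
```

Keeping the tail is why the non-stripped variants in the test still contain the whitespace-only lines where the excluded elements used to be.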
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:IgnoreComments>true</c14n2:IgnoreComments>
+</dsig:CanonicalizationMethod>
+

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" Algorithm="http://www.w3.org/2010/xml-c14n2">
+</dsig:CanonicalizationMethod>
+

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
+</dsig:CanonicalizationMethod>
+

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
+<c14n2:QNameAware>
+<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
+</c14n2:QNameAware>
+</dsig:CanonicalizationMethod>
+

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
+<c14n2:QNameAware>
+<c14n2:Element Name="bar" NS="http://a"/>
+<c14n2:XPathElement Name="IncludedXPath" NS="http://www.w3.org/2010/xmldsig2#"/>
+</c14n2:QNameAware>
+</dsig:CanonicalizationMethod>
+

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:QNameAware>
+<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
+</c14n2:QNameAware>
+</dsig:CanonicalizationMethod>
+

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:QNameAware>
+<c14n2:Element Name="bar" NS="http://a"/>
+</c14n2:QNameAware>
+</dsig:CanonicalizationMethod>
+

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:QNameAware>
+<c14n2:Element Name="bar" NS="http://a"/>
+<c14n2:XPathElement Name="IncludedXPath" NS="http://www.w3.org/2010/xmldsig2#"/>
+</c14n2:QNameAware>
+</dsig:CanonicalizationMethod>
+

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
+<c14n2:TrimTextNodes>true</c14n2:TrimTextNodes>
+</dsig:CanonicalizationMethod>
+
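The configuration files above are consumed by test_xml_c14n2(). It pairs each input file named `in*.xml` with the expected outputs named `out_<input>_<config>.xml`, and translates each `c14n*.xml` configuration into canonicalize() keyword arguments. The sketch below restates that logic as two standalone helpers. `pair_test_files()` and `config_to_options()` are hypothetical names, not part of ElementTree. The file names in the usage example are illustrative, since the actual data file names are not shown here:

```python
import xml.etree.ElementTree as ET

C14N2_NS = "{http://www.w3.org/2010/xml-c14n2}"


def pair_test_files(filenames):
    """Hypothetical helper: map each in*.xml input to its (output, config) pairs."""
    names = [name[:-4] for name in sorted(filenames) if name.endswith('.xml')]
    configs = {name for name in names if name.startswith('c14n')}
    return {
        input_file: [
            (name, name.rsplit('_', 1)[-1])
            for name in names
            if name.startswith(f'out_{input_file}_')
            and name.rsplit('_', 1)[-1] in configs
        ]
        for input_file in names
        if input_file.startswith('in')
    }


def config_to_options(config_xml):
    """Hypothetical helper: turn one CanonicalizationMethod config into canonicalize() kwargs."""
    root = ET.fromstring(config_xml)
    # Local option name -> (stripped text, element), as built in test_xml_c14n2().
    config = {
        child.tag.split('}')[-1]: ((child.text or '').strip(), child)
        for child in root
    }

    def get_option(name):
        return config.get(name, (None, None))[0]

    options = {
        # Same mapping as the test: IgnoreComments == 'true' -> with_comments=True.
        'with_comments': get_option('IgnoreComments') == 'true',
        'strip_text': get_option('TrimTextNodes') == 'true',
        'rewrite_prefixes': get_option('PrefixRewrite') == 'sequential',
        'qname_aware_tags': None,
        'qname_aware_attrs': None,
    }
    if 'QNameAware' in config:
        qname_aware = config['QNameAware'][1]

        def clark_names(local_tag):
            # {NS}Name for every matching child of <c14n2:QNameAware>.
            return [f"{{{el.get('NS')}}}{el.get('Name')}"
                    for el in qname_aware.findall(C14N2_NS + local_tag)]

        options['qname_aware_attrs'] = clark_names('QualifiedAttr')
        options['qname_aware_tags'] = clark_names('Element')
    return options


files = ['inC14N1.xml', 'c14nDefault.xml', 'c14nComment.xml',
         'out_inC14N1_c14nDefault.xml', 'out_inC14N1_c14nComment.xml', 'world.txt']
print(pair_test_files(files))

config_xml = """\
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
<c14n2:QNameAware>
<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>
"""
print(config_to_options(config_xml))
```

The returned dictionary can be passed to canonicalize() as keyword arguments. Note that the sketch does not detect the `XPathElement` entries that the test uses to skip unsupported cases.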
