Commit ea97668

Make headerregistry fully part of the provisional api.
When I made the checkin of the provisional email policy, I knew that Address and Group needed to be made accessible from somewhere. The more I looked at it, though, the more it became clear that since this is a provisional API anyway, there's no good reason to hide headerregistry as a private API. It was designed to ultimately be part of the public API, and so it should be part of the provisional API. This patch fully documents the headerregistry API, and deletes the abbreviated version of those docs I had added to the provisional policy docs.
1 parent 393da32 commit ea97668

7 files changed

+429
-213
lines changed

Doc/library/email.headerregistry.rst

Lines changed: 379 additions & 0 deletions
@@ -0,0 +1,379 @@
1+
:mod:`email.headerregistry`: Custom Header Objects
2+
--------------------------------------------------
3+
4+
.. module:: email.headerregistry
5+
:synopsis: Automatic Parsing of headers based on the field name
6+
7+
.. note::
8+
9+
The headerregistry module has been included in the standard library on a
10+
:term:`provisional basis <provisional package>`. Backwards incompatible
11+
changes (up to and including removal of the module) may occur if deemed
12+
necessary by the core developers.
13+
14+
.. versionadded:: 3.3
15+
as a :term:`provisional module <provisional package>`
16+
17+
Headers are represented by customized subclasses of :class:`str`. The
18+
particular class used to represent a given header is determined by the
19+
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
20+
effect when the headers are created. This section documents the particular
21+
``header_factory`` implemented by the email package for handling :rfc:`5322`
22+
compliant email messages, which not only provides customized header objects for
23+
various header types, but also provides an extension mechanism for applications
24+
to add their own custom header types.
25+
26+
When using any of the policy objects derived from
27+
:class:`~email.policy.EmailPolicy`, all headers are produced by
28+
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
29+
class. Each header class has an additional base class that is determined by
30+
the type of the header. For example, many headers have the class
31+
:class:`.UnstructuredHeader` as their other base class. The specialized second
32+
class for a header is determined by the name of the header, using a lookup
33+
table stored in the :class:`.HeaderRegistry`. All of this is managed
34+
transparently for the typical application program, but interfaces are provided
35+
for modifying the default behavior for use by more complex applications.
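
A minimal sketch of the class composition described above. It assumes the default policy used by ``email.message.EmailMessage`` (an ``EmailPolicy`` instance); the subject text is made up for illustration.

```python
from email.message import EmailMessage
from email.headerregistry import BaseHeader, UnstructuredHeader

# EmailMessage defaults to email.policy.default, which is an EmailPolicy.
msg = EmailMessage()
msg['Subject'] = 'Quarterly report'

subject = msg['Subject']
header_class = type(subject)

# The header object is a str whose class was composed by the registry.
is_str = isinstance(subject, str)
last_base = header_class.__bases__[-1]                     # the base class
is_unstructured = isinstance(subject, UnstructuredHeader)  # the specialized class
```

Here ``last_base`` is ``BaseHeader``, and the specialized class for ``Subject`` derives from ``UnstructuredHeader``.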
36+
37+
The sections below first document the header base classes and their attributes,
38+
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
39+
finally the support classes used to represent the data parsed from structured
40+
headers.
41+
42+
43+
.. class:: BaseHeader(name, value)
44+
45+
*name* and *value* are passed to ``BaseHeader`` from the
46+
:attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
47+
any header object is the *value* fully decoded to unicode.
48+
49+
This base class defines the following read-only properties:
50+
51+
52+
.. attribute:: name
53+
54+
The name of the header (the portion of the field before the ':'). This
55+
is exactly the value passed in the :attr:`~email.policy.EmailPolicy.header_factory`
56+
call for *name*; that is, case is preserved.
57+
58+
59+
.. attribute:: defects
60+
61+
A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
62+
RFC compliance problems found during parsing. The email package tries to
63+
be complete about detecting compliance issues. See the :mod:`~email.errors`
64+
module for a discussion of the types of defects that may be reported.
65+
66+
67+
.. attribute:: max_count
68+
69+
The maximum number of headers of this type that can have the same
70+
``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
71+
for this attribute is ``None``; it is expected that specialized header
72+
classes will override this value as needed.
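
As an illustration, the three properties can be inspected on headers created directly from a ``HeaderRegistry``. This is only a sketch; ``X-Example`` is a hypothetical header name chosen because it is not in the default map.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

subject = registry('SUBJECT', 'Hello')
custom = registry('X-Example', 'anything')  # hypothetical, unregistered name

name = subject.name                # case is preserved exactly as passed in
defects = subject.defects          # a tuple; empty for this well-formed value
subject_limit = subject.max_count  # Subject uses a Unique variant
custom_limit = custom.max_count    # the default class places no limit
```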
73+
74+
``BaseHeader`` also provides the following method, which is called by the
75+
email library code and should not in general be called by application
76+
programs:
77+
78+
.. method:: fold(*, policy)
79+
80+
Return a string containing :attr:`~email.policy.Policy.linesep`
81+
characters as required to correctly fold the header according
82+
to *policy*. A :attr:`~email.policy.Policy.cte_type` of
83+
``8bit`` will be treated as if it were ``7bit``, since strings
84+
may not contain binary data.
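
Although applications rarely need to call ``fold`` themselves, a short sketch may clarify what it returns. The subject text below is invented, and the two policies are the stock ``email.policy.default`` and ``email.policy.SMTP`` objects.

```python
from email import policy
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
long_subject = ' '.join(['word'] * 30)  # long enough to need folding
header = registry('Subject', long_subject)

# fold() takes the policy as a keyword-only argument and returns the
# complete header line(s), including the field name and line separators.
folded = header.fold(policy=policy.default)
lines = folded.splitlines()

# The SMTP policy differs in its linesep, so the result ends with CRLF.
smtp_folded = header.fold(policy=policy.SMTP)
```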
85+
86+
87+
``BaseHeader`` by itself cannot be used to create a header object. It
88+
defines a protocol that each specialized header cooperates with in order to
89+
produce the header object. Specifically, ``BaseHeader`` requires that
90+
the specialized class provide a :func:`classmethod` named ``parse``. This
91+
method is called as follows::
92+
93+
parse(string, kwds)
94+
95+
``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
96+
``defects`` is an empty list. The parse method should append any detected
97+
defects to this list. On return, the ``kwds`` dictionary *must* contain
98+
values for at least the keys ``decoded`` and ``defects``. ``decoded``
99+
should be the string value for the header (that is, the header value fully
100+
decoded to unicode). The parse method should assume that *string* may
101+
contain transport encoded parts, but should correctly handle all valid
102+
unicode characters as well so that it can parse un-encoded header values.
103+
104+
``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
105+
``init`` method. The specialized class only needs to provide an ``init``
106+
method if it wishes to set additional attributes beyond those provided by
107+
``BaseHeader`` itself. Such an ``init`` method should look like this::
108+
109+
def init(self, *args, **kw):
110+
self._myattr = kw.pop('myattr')
111+
super().init(*args, **kw)
112+
113+
That is, anything extra that the specialized class puts into the ``kwds``
114+
dictionary should be removed and handled, and the remaining contents of
115+
``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
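
The protocol can be sketched with a small custom header type. Everything named here (``WordCountHeader``, ``word_count``, the ``X-Note`` field) is hypothetical. The sketch subclasses ``UnstructuredHeader`` so that the stock parser fills in ``decoded`` and whatever other keys the base ``init`` expects, which is a design choice for this example rather than a requirement stated above.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class WordCountHeader(UnstructuredHeader):
    """Hypothetical specialized class that records a word count."""

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser populate the standard keys first.
        super().parse(value, kwds)
        # Then add one extra key for this class's own use.
        kwds['word_count'] = len(kwds['decoded'].split())

    def init(self, *args, **kw):
        # Remove and handle the extra key, then pass the rest along.
        self._word_count = kw.pop('word_count')
        super().init(*args, **kw)

    @property
    def word_count(self):
        return self._word_count


registry = HeaderRegistry()
registry.map_to_type('X-Note', WordCountHeader)

note = registry('X-Note', 'three short words')
count = note.word_count
```

The header still behaves as a string; the extra attribute is simply available alongside ``name``, ``defects`` and ``max_count``.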
116+
117+
118+
.. class:: UnstructuredHeader
119+
120+
An "unstructured" header is the default type of header in :rfc:`5322`.
121+
Any header that does not have a specified syntax is treated as
122+
unstructured. The classic example of an unstructured header is the
123+
:mailheader:`Subject` header.
124+
125+
In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
126+
ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
127+
mechanism for encoding non-ASCII text as ASCII characters within a header
128+
value. When a *value* containing encoded words is passed to the
129+
constructor, the ``UnstructuredHeader`` parser converts such encoded words
130+
back into the original unicode, following the :rfc:`2047` rules for
131+
unstructured text. The parser uses heuristics to attempt to decode certain
132+
non-compliant encoded words. Defects are registered in such cases, as well
133+
as defects for issues such as invalid characters within the encoded words or
134+
the non-encoded text.
135+
136+
This header type provides no additional attributes.
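
For example, a value containing an :rfc:`2047` encoded word is decoded when the header is created. This is a sketch using an invented subject line:

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# "caf=C3=A9" is the UTF-8, quoted-printable encoding of "café".
subject = registry('Subject', '=?utf-8?q?caf=C3=A9?= menu')

decoded = str(subject)
defects = subject.defects
```

After this runs, ``decoded`` holds the unicode text ``café menu``.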
137+
138+
139+
.. class:: DateHeader
140+
141+
:rfc:`5322` specifies a very specific format for dates within email headers.
142+
The ``DateHeader`` parser recognizes that date format, as well as
143+
recognizing a number of variant forms that are sometimes found "in the
144+
wild".
145+
146+
This header type provides the following additional attributes:
147+
148+
.. attribute:: datetime
149+
150+
If the header value can be recognized as a valid date of one form or
151+
another, this attribute will contain a :class:`~datetime.datetime`
152+
instance representing that date. If the timezone of the input date is
153+
specified as ``-0000`` (indicating it is in UTC but contains no
154+
information about the source timezone), then :attr:`.datetime` will be a
155+
naive :class:`~datetime.datetime`. If a specific timezone offset is
156+
found (including ``+0000``), then :attr:`.datetime` will contain an aware
157+
``datetime`` that uses :class:`datetime.timezone` to record the timezone
158+
offset.
159+
160+
The ``decoded`` value of the header is determined by formatting the
161+
``datetime`` according to the :rfc:`5322` rules; that is, it is set to::
162+
163+
email.utils.format_datetime(self.datetime)
164+
165+
When creating a ``DateHeader``, *value* may be a
166+
:class:`~datetime.datetime` instance. This means, for example, that
167+
the following code is valid and does what one would expect::
168+
169+
msg['Date'] = datetime(2011, 7, 15, 21)
170+
171+
Because this is a naive ``datetime`` it will be interpreted as a UTC
172+
timestamp, and the resulting value will have a timezone of ``-0000``. Much
173+
more useful is to use the :func:`~email.utils.localtime` function from the
174+
:mod:`~email.utils` module::
175+
176+
msg['Date'] = utils.localtime()
177+
178+
This example sets the date header to the current time and date using
179+
the current timezone offset.
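
The naive versus aware distinction can be seen in a short sketch. The dates are arbitrary, and the headers are built directly from a ``HeaderRegistry`` rather than through a message object.

```python
import datetime
from email import utils
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# An explicit offset (even +0000) produces an aware datetime.
aware = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0000')

# -0000 means "UTC, source timezone unknown" and produces a naive datetime.
naive = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')

# A naive datetime passed as the value is treated as UTC with -0000.
from_naive = registry('Date', datetime.datetime(2011, 7, 15, 21))
print(str(from_naive))  # Fri, 15 Jul 2011 21:00:00 -0000

# localtime() returns an aware datetime carrying the local offset.
now = registry('Date', utils.localtime())
```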
180+
181+
182+
.. class:: AddressHeader
183+
184+
Address headers are one of the most complex structured header types.
185+
The ``AddressHeader`` class provides a generic interface to any address
186+
header.
187+
188+
This header type provides the following additional attributes:
189+
190+
191+
.. attribute:: groups
192+
193+
A tuple of :class:`.Group` objects encoding the
194+
addresses and groups found in the header value. Addresses that are
195+
not part of a group are represented in this list as single-address
196+
``Groups`` whose :attr:`~.Group.display_name` is ``None``.
197+
198+
199+
.. attribute:: addresses
200+
201+
A tuple of :class:`.Address` objects encoding all
202+
of the individual addresses from the header value. If the header value
203+
contains any groups, the individual addresses from the group are included
204+
in the list at the point where the group occurs in the value (that is,
205+
the list of addresses is "flattened" into a one dimensional list).
206+
207+
The ``decoded`` value of the header will have all encoded words decoded to
208+
unicode. :mod:`~encodings.idna` encoded domain names are also decoded to unicode. The
209+
``decoded`` value is set by :meth:`~str.join`\ ing the :class:`str` value of
210+
the elements of the ``groups`` attribute with ``', '``.
211+
212+
A list of :class:`.Address` and :class:`.Group` objects in any combination
213+
may be used to set the value of an address header. ``Group`` objects whose
214+
``display_name`` is ``None`` will be interpreted as single addresses, which
215+
allows an address list to be copied with groups intact by using the list
216+
obtained from the ``groups`` attribute of the source header.
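
A sketch of both attributes, and of copying an address list with its groups intact. The addresses are invented examples.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
to = registry(
    'To',
    'Ann <ann@example.com>, Team: bob@example.com, cy@example.com;',
)

# One Group per top-level item; the bare address has display_name None.
group_names = [group.display_name for group in to.groups]

# All individual addresses, flattened in order of appearance.
addr_specs = [address.addr_spec for address in to.addresses]

# The groups tuple can be used directly as the value of another header.
cc = registry('Cc', to.groups)
```

Here ``group_names`` is ``[None, 'Team']``, and ``cc`` has the same string value as ``to``.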
217+
218+
219+
.. class:: SingleAddressHeader
220+
221+
A subclass of :class:`.AddressHeader` that adds one
222+
additional attribute:
223+
224+
225+
.. attribute:: address
226+
227+
The single address encoded by the header value. If the header value
228+
actually contains more than one address (which would be a violation of
229+
the RFC under the default :mod:`~email.policy`), accessing this attribute will
230+
result in a :exc:`ValueError`.
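
A brief sketch of the ``address`` attribute, including the error case, using invented addresses:

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

sender = registry('Sender', 'Ann <ann@example.com>')
single = sender.address  # an Address object

# Two addresses in a single-address header: parsing succeeds, but
# reading .address raises ValueError.
bad = registry('Sender', 'ann@example.com, bob@example.com')
try:
    bad.address
    raised = False
except ValueError:
    raised = True
```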
231+
232+
233+
Each of the above classes also has a ``Unique`` variant (for example,
234+
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
235+
variant, :attr:`~.BaseHeader.max_count` is set to 1.
236+
237+
238+
.. class:: HeaderRegistry(base_class=BaseHeader, \
239+
default_class=UnstructuredHeader, \
240+
use_default_map=True)
241+
242+
This is the factory used by :class:`~email.policy.EmailPolicy` by default.
243+
``HeaderRegistry`` builds the class used to create a header instance
244+
dynamically, using *base_class* and a specialized class retrieved from a
245+
registry that it holds. When a given header name does not appear in the
246+
registry, the class specified by *default_class* is used as the specialized
247+
class. When *use_default_map* is ``True`` (the default), the standard
248+
mapping of header names to classes is copied into the registry during
249+
initialization. *base_class* is always the last class in the generated
250+
class's ``__bases__`` list.
251+
252+
The default mappings are:
253+
254+
:subject: UniqueUnstructuredHeader
255+
:date: UniqueDateHeader
256+
:resent-date: DateHeader
257+
:orig-date: UniqueDateHeader
258+
:sender: UniqueSingleAddressHeader
259+
:resent-sender: SingleAddressHeader
260+
:to: UniqueAddressHeader
261+
:resent-to: AddressHeader
262+
:cc: UniqueAddressHeader
263+
:resent-cc: AddressHeader
264+
:from: UniqueAddressHeader
265+
:resent-from: AddressHeader
266+
:reply-to: UniqueAddressHeader
267+
268+
``HeaderRegistry`` has the following methods:
269+
270+
271+
.. method:: map_to_type(name, cls)
272+
273+
*name* is the name of the header to be mapped. It will be converted to
274+
lower case in the registry. *cls* is the specialized class to be used,
275+
along with *base_class*, to create the class used to instantiate headers
276+
that match *name*.
277+
278+
279+
.. method:: __getitem__(name)
280+
281+
Construct and return a class to handle creating a *name* header.
282+
283+
284+
.. method:: __call__(name, value)
285+
286+
Retrieves the specialized header associated with *name* from the
287+
registry (using *default_class* if *name* does not appear in the
288+
registry) and composes it with *base_class* to produce a class. It then
289+
calls that class's constructor with the same argument list and returns
290+
the resulting header instance.
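
The effect of *use_default_map* and ``map_to_type`` can be sketched as follows, using only the classes documented in this module:

```python
from email.headerregistry import (
    BaseHeader,
    DateHeader,
    HeaderRegistry,
    UnstructuredHeader,
)

default_registry = HeaderRegistry()
bare_registry = HeaderRegistry(use_default_map=False)

# With the default map, "Date" resolves to a DateHeader-derived class.
date_class = default_registry['Date']

# With an empty registry, every name falls back to default_class.
fallback_class = bare_registry['Date']

# Names are lowercased on registration, so lookup is case-insensitive.
bare_registry.map_to_type('Date', DateHeader)
remapped_class = bare_registry['date']
```

In each case the constructed class lists *base_class* (here ``BaseHeader``) last in its ``__bases__``.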
291+
292+
293+
The following classes are the classes used to represent data parsed from
294+
structured headers and can, in general, be used by an application program to
295+
construct structured values to assign to specific headers.
296+
297+
298+
.. class:: Address(display_name='', username='', domain='', addr_spec=None)
299+
300+
The class used to represent an email address. The general form of an
301+
address is::
302+
303+
[display_name] <username@domain>
304+
305+
or::
306+
307+
username@domain
308+
309+
where each part must conform to specific syntax rules spelled out in
310+
:rfc:`5322`.
311+
312+
As a convenience *addr_spec* can be specified instead of *username* and
313+
*domain*, in which case *username* and *domain* will be parsed from the
314+
*addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
315+
not, ``Address`` will raise an error. Unicode characters are allowed and
316+
will be properly encoded when serialized. However, per the RFCs, unicode is
317+
*not* allowed in the username portion of the address.
318+
319+
.. attribute:: display_name
320+
321+
The display name portion of the address, if any, with all quoting
322+
removed. If the address does not have a display name, this attribute
323+
will be an empty string.
324+
325+
.. attribute:: username
326+
327+
The ``username`` portion of the address, with all quoting removed.
328+
329+
.. attribute:: domain
330+
331+
The ``domain`` portion of the address.
332+
333+
.. attribute:: addr_spec
334+
335+
The ``username@domain`` portion of the address, correctly quoted
336+
for use as a bare address (the second form shown above). This
337+
attribute is not mutable.
338+
339+
.. method:: __str__()
340+
341+
The ``str`` value of the object is the address quoted according to
342+
:rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
343+
characters.
344+
345+
To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
346+
``username`` and ``domain`` are both the empty string (or ``None``), then
347+
the string value of the ``Address`` is ``<>``.
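
A sketch of constructing ``Address`` objects in the ways described above; all names and domains are made up.

```python
from email.headerregistry import Address

full = Address('Ann Example', 'ann', 'example.com')
quoted = Address('Example, Ann', 'ann', 'example.com')
parsed = Address(addr_spec='bob@example.com')
null_sender = Address()  # empty username and domain

print(str(full))                        # Ann Example <ann@example.com>
print(str(quoted))                      # "Example, Ann" <ann@example.com>
print(parsed.username, parsed.domain)   # bob example.com
print(str(null_sender))                 # <>
```

The display name is quoted only when it contains characters that require quoting, such as the comma in the second example.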
348+
349+
350+
.. class:: Group(display_name=None, addresses=None)
351+
352+
The class used to represent an address group. The general form of an
353+
address group is::
354+
355+
display_name: [address-list];
356+
357+
As a convenience for processing lists of addresses that consist of a mixture
358+
of groups and single addresses, a ``Group`` may also be used to represent
359+
single addresses that are not part of a group by setting *display_name* to
360+
``None`` and providing a list of the single address as *addresses*.
361+
362+
.. attribute:: display_name
363+
364+
The ``display_name`` of the group. If it is ``None`` and there is
365+
exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
366+
single address that is not in a group.
367+
368+
.. attribute:: addresses
369+
370+
A possibly empty tuple of :class:`.Address` objects representing the
371+
addresses in the group.
372+
373+
.. method:: __str__()
374+
375+
The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
376+
but with no Content Transfer Encoding of any non-ASCII characters. If
377+
``display_name`` is ``None`` and there is a single ``Address`` in the
378+
``addresses`` list, the ``str`` value will be the same as the ``str`` of
379+
that single ``Address``.
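
A sketch of the two uses of ``Group`` described above, with invented addresses:

```python
from email.headerregistry import Address, Group

team = Group('Team', [
    Address('', 'bob', 'example.com'),
    Address('', 'cy', 'example.com'),
])

# A Group with no display name and one address stands for that address.
single = Group(None, [Address('Ann', 'ann', 'example.com')])

# A group may also be empty.
empty = Group('Empty')

print(str(team))    # Team: bob@example.com, cy@example.com;
print(str(single))  # Ann <ann@example.com>
print(str(empty))   # Empty:;
```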
