|
| 1 | +:mod:`email.headerregistry`: Custom Header Objects |
| 2 | +-------------------------------------------------- |
| 3 | + |
| 4 | +.. module:: email.headerregistry |
| 5 | + :synopsis: Automatic Parsing of headers based on the field name |
| 6 | + |
| 7 | +.. note:: |
| 8 | + |
| 9 | + The headerregistry module has been included in the standard library on a |
| 10 | + :term:`provisional basis <provisional package>`. Backwards incompatible |
| 11 | + changes (up to and including removal of the module) may occur if deemed |
| 12 | + necessary by the core developers. |
| 13 | + |
| 14 | +.. versionadded:: 3.3 |
| 15 | + as a :term:`provisional module <provisional package>` |
| 16 | + |
| 17 | +Headers are represented by customized subclasses of :class:`str`. The |
| 18 | +particular class used to represent a given header is determined by the |
| 19 | +:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in |
| 20 | +effect when the headers are created. This section documents the particular |
| 21 | +``header_factory`` implemented by the email package for handling :RFC:`5322` |
| 22 | +compliant email messages, which not only provides customized header objects for |
| 23 | +various header types, but also provides an extension mechanism for applications |
| 24 | +to add their own custom header types. |
| 25 | + |
| 26 | +When using any of the policy objects derived from |
| 27 | +:class:`~email.policy.EmailPolicy`, all headers are produced by |
| 28 | +:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base |
| 29 | +class. Each header class has an additional base class that is determined by |
| 30 | +the type of the header. For example, many headers have the class |
| 31 | +:class:`.UnstructuredHeader` as their other base class. The specialized second |
| 32 | +class for a header is determined by the name of the header, using a lookup |
| 33 | +table stored in the :class:`.HeaderRegistry`. All of this is managed |
| 34 | +transparently for the typical application program, but interfaces are provided |
| 35 | +for modifying the default behavior for use by more complex applications. |
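As a quick illustration, the sketch below inspects a header object. It assumes a current Python 3 release, where `EmailMessage` defaults to `email.policy.default` (an `EmailPolicy` whose `header_factory` is a `HeaderRegistry`). The object is a `str` subclass whose generated class lists the specialized class first and `BaseHeader` last among its bases.

```python
from email.headerregistry import BaseHeader, UnstructuredHeader
from email.message import EmailMessage

# EmailMessage defaults to email.policy.default, whose header_factory
# is a HeaderRegistry instance.
msg = EmailMessage()
msg['Subject'] = 'Quarterly report'
subject = msg['Subject']

# The generated class has two bases: the specialized class chosen by
# header name, then BaseHeader.
bases = type(subject).__bases__
print(bases)
print(isinstance(subject, str), isinstance(subject, UnstructuredHeader))
print(bases[-1] is BaseHeader)
```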
| 36 | + |
| 37 | +The sections below first document the header base classes and their attributes, |
| 38 | +followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and |
| 39 | +finally the support classes used to represent the data parsed from structured |
| 40 | +headers. |
| 41 | + |
| 42 | + |
| 43 | +.. class:: BaseHeader(name, value) |
| 44 | + |
| 45 | + *name* and *value* are passed to ``BaseHeader`` from the |
| 46 | + :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of |
| 47 | + any header object is the *value* fully decoded to unicode. |
| 48 | + |
| 49 | + This base class defines the following read-only properties: |
| 50 | + |
| 51 | + |
| 52 | + .. attribute:: name |
| 53 | + |
| 54 | + The name of the header (the portion of the field before the ':'). This |
| 55 | + is exactly the value passed in the :attr:`~email.policy.EmailPolicy.header_factory` |
| 56 | + call for *name*; that is, case is preserved. |
| 57 | + |
| 58 | + |
| 59 | + .. attribute:: defects |
| 60 | + |
| 61 | + A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any |
| 62 | + RFC compliance problems found during parsing. The email package tries to |
| 63 | + be complete about detecting compliance issues. See the :mod:`~email.errors` |
| 64 | + module for a discussion of the types of defects that may be reported. |
| 65 | + |
| 66 | + |
| 67 | + .. attribute:: max_count |
| 68 | + |
| 69 | + The maximum number of headers of this type that can have the same |
| 70 | + ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value |
| 71 | + for this attribute is ``None``; it is expected that specialized header |
| 72 | + classes will override this value as needed. |
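The sketch below shows the three read-only properties on headers built directly by a `HeaderRegistry`. It assumes a current Python 3, and `X-Mailer-Note` is an arbitrary, hypothetical header name with no registry entry.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# The name is kept exactly as passed in; only the registry lookup
# ignores case.
subject = registry('SUBJECT', 'Lunch?')
print(subject.name)        # SUBJECT
print(subject.defects)     # () (a tuple, empty for a clean value)
print(subject.max_count)   # 1, because Subject maps to a "Unique" type

# A name with no registry entry falls back to UnstructuredHeader,
# whose max_count is None (unlimited).
custom = registry('X-Mailer-Note', 'anything goes')
print(custom.max_count)    # None
```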
| 73 | + |
| 74 | + ``BaseHeader`` also provides the following method, which is called by the |
| 75 | + email library code and should not in general be called by application |
| 76 | + programs: |
| 77 | + |
| 78 | + .. method:: fold(*, policy) |
| 79 | + |
| 80 | + Return a string containing :attr:`~email.policy.Policy.linesep` |
| 81 | + characters as required to correctly fold the header according |
| 82 | + to *policy*. A :attr:`~email.policy.Policy.cte_type` of |
| 83 | + ``8bit`` will be treated as if it were ``7bit``, since strings |
| 84 | + may not contain binary data. |
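Applications normally let the generator call `fold`. The sketch below calls it directly, only to show what it returns. It assumes a current Python 3, where `email.policy.SMTP` uses `'\r\n'` as `linesep` and the default `max_line_length` of 78.

```python
import email.policy
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
subject = registry('Subject', ' '.join(['word'] * 40))

# The result includes the "Subject: " label, is folded to the policy's
# line length, and ends with the policy's linesep.
folded = subject.fold(policy=email.policy.SMTP)
lines = folded.split('\r\n')
print(len(lines))
print(max(len(line) for line in lines))
```

Continuation lines begin with whitespace, which is what makes them part of the same header when the message is parsed again.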
| 85 | + |
| 86 | + |
| 87 | + ``BaseHeader`` by itself cannot be used to create a header object. It |
| 88 | + defines a protocol that each specialized header cooperates with in order to |
| 89 | + produce the header object. Specifically, ``BaseHeader`` requires that |
| 90 | + the specialized class provide a :func:`classmethod` named ``parse``. This |
| 91 | + method is called as follows:: |
| 92 | + |
| 93 | + parse(string, kwds) |
| 94 | + |
| 95 | + ``kwds`` is a dictionary containing one pre-initialized key, ``defects``. |
| 96 | + ``defects`` is an empty list. The parse method should append any detected |
| 97 | + defects to this list. On return, the ``kwds`` dictionary *must* contain |
| 98 | + values for at least the keys ``decoded`` and ``defects``. ``decoded`` |
| 99 | + should be the string value for the header (that is, the header value fully |
| 100 | + decoded to unicode). The parse method should assume that *string* may |
| 101 | + contain transport encoded parts, but should correctly handle all valid |
| 102 | + unicode characters as well so that it can parse un-encoded header values. |
| 103 | + |
| 104 | + ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its |
| 105 | + ``init`` method. The specialized class only needs to provide an ``init`` |
| 106 | + method if it wishes to set additional attributes beyond those provided by |
| 107 | + ``BaseHeader`` itself. Such an ``init`` method should look like this:: |
| 108 | + |
| 109 | + def init(self, *args, **kw): |
| 110 | + self._myattr = kw.pop('myattr') |
| 111 | + super().init(*args, **kw) |
| 112 | + |
| 113 | + That is, anything extra that the specialized class puts into the ``kwds`` |
| 114 | + dictionary should be removed and handled, and the remaining contents of |
| 115 | + ``kw`` (and ``args``) should be passed to the ``BaseHeader`` ``init`` method. |
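Here is a minimal sketch of the `parse`/`init` protocol, assuming a current Python 3. `TaggedHeader`, its `tag` attribute and the `X-Tagged` header name are all hypothetical. The class subclasses `UnstructuredHeader` and delegates to its `parse`, so that whatever keys the running Python's `BaseHeader.init` expects get filled in. Recent versions expect more than `decoded` and `defects`, such as a `parse_tree` key.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class TaggedHeader(UnstructuredHeader):
    """Hypothetical header type: unstructured text whose first
    colon-separated word is also exposed as a ``tag`` attribute."""

    @classmethod
    def parse(cls, value, kwds):
        # Let the base class fill in 'decoded' (and any other keys the
        # running Python's BaseHeader.init expects), then add our own key.
        super().parse(value, kwds)
        kwds['tag'] = kwds['decoded'].split(':', 1)[0].strip()

    def init(self, *args, **kw):
        # Remove the extra key before handing the rest to BaseHeader.init.
        self._tag = kw.pop('tag')
        super().init(*args, **kw)

    @property
    def tag(self):
        return self._tag


registry = HeaderRegistry()
registry.map_to_type('X-Tagged', TaggedHeader)
header = registry('X-Tagged', 'urgent: check the nightly build')
print(header.tag)   # urgent
print(str(header))  # urgent: check the nightly build
```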
| 116 | + |
| 117 | + |
| 118 | +.. class:: UnstructuredHeader |
| 119 | + |
| 120 | + An "unstructured" header is the default type of header in :rfc:`5322`. |
| 121 | + Any header that does not have a specified syntax is treated as |
| 122 | + unstructured. The classic example of an unstructured header is the |
| 123 | + :mailheader:`Subject` header. |
| 124 | + |
| 125 | + In :rfc:`5322`, an unstructured header is a run of arbitrary text in the |
| 126 | + ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible |
| 127 | + mechanism for encoding non-ASCII text as ASCII characters within a header |
| 128 | + value. When a *value* containing encoded words is passed to the |
| 129 | + constructor, the ``UnstructuredHeader`` parser converts such encoded words |
| 130 | + back into the original unicode, following the :rfc:`2047` rules for |
| 131 | + unstructured text. The parser uses heuristics to attempt to decode certain |
| 132 | + non-compliant encoded words. Defects are registered in such cases, as well |
| 133 | + as defects for issues such as invalid characters within the encoded words or |
| 134 | + the non-encoded text. |
| 135 | + |
| 136 | + This header type provides no additional attributes. |
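For example, the sketch below decodes an RFC 2047 encoded word in a `Subject` value. It assumes a current Python 3, and the sample value is made up for illustration.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
# '=?utf-8?q?caf=C3=A9?=' is an RFC 2047 "Q"-encoded word for 'café'.
subject = registry('Subject', '=?utf-8?q?caf=C3=A9?= menu')
print(str(subject))  # café menu
```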
| 137 | + |
| 138 | + |
| 139 | +.. class:: DateHeader |
| 140 | + |
| 141 | + :rfc:`5322` specifies a very specific format for dates within email headers. |
| 142 | + The ``DateHeader`` parser recognizes that date format, as well as |
| 143 | + recognizing a number of variant forms that are sometimes found "in the |
| 144 | + wild". |
| 145 | + |
| 146 | + This header type provides the following additional attributes: |
| 147 | + |
| 148 | + .. attribute:: datetime |
| 149 | + |
| 150 | + If the header value can be recognized as a valid date of one form or |
| 151 | + another, this attribute will contain a :class:`~datetime.datetime` |
| 152 | + instance representing that date. If the timezone of the input date is |
| 153 | + specified as ``-0000`` (indicating it is in UTC but contains no |
| 154 | + information about the source timezone), then :attr:`.datetime` will be a |
| 155 | + naive :class:`~datetime.datetime`. If a specific timezone offset is |
| 156 | + found (including ``+0000``), then :attr:`.datetime` will contain an aware |
| 157 | + ``datetime`` that uses :class:`datetime.timezone` to record the timezone |
| 158 | + offset. |
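The sketch below shows the naive-versus-aware distinction. It assumes a current Python 3, and the date values are arbitrary samples.

```python
from datetime import timedelta
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# -0000: UTC with no source-zone information, so the datetime is naive.
utc_unknown = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
print(utc_unknown.datetime.tzinfo)        # None

# An explicit offset (even +0000) gives an aware datetime.
utc_known = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0000')
eastern = registry('Date', 'Fri, 15 Jul 2011 17:00:00 -0400')
print(utc_known.datetime.utcoffset())     # 0:00:00
print(eastern.datetime.utcoffset() == timedelta(hours=-4))  # True

# Aware datetimes for the same instant compare equal.
print(utc_known.datetime == eastern.datetime)  # True
```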
| 159 | + |
| 160 | + The ``decoded`` value of the header is determined by formatting the |
| 161 | + ``datetime`` according to the :rfc:`5322` rules; that is, it is set to:: |
| 162 | + |
| 163 | + email.utils.format_datetime(self.datetime) |
| 164 | + |
| 165 | + When creating a ``DateHeader``, *value* may be a |
| 166 | + :class:`~datetime.datetime` instance. This means, for example, that |
| 167 | + the following code is valid and does what one would expect:: |
| 168 | + |
| 169 | + msg['Date'] = datetime(2011, 7, 15, 21) |
| 170 | + |
| 171 | + Because this is a naive ``datetime`` it will be interpreted as a UTC |
| 172 | + timestamp, and the resulting value will have a timezone of ``-0000``. Much |
| 173 | + more useful is to use the :func:`~email.utils.localtime` function from the |
| 174 | + :mod:`~email.utils` module:: |
| 175 | + |
| 176 | + msg['Date'] = utils.localtime() |
| 177 | + |
| 178 | + This example sets the date header to the current time and date using |
| 179 | + the current timezone offset. |
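The two assignments above can be combined into one runnable sketch, assuming a current Python 3 with the default policy. `Date` is a unique header there, so the old value is deleted before the second assignment.

```python
from datetime import datetime
from email import utils
from email.message import EmailMessage

msg = EmailMessage()
# A naive datetime is rendered with the "-0000" (unknown zone) offset.
msg['Date'] = datetime(2011, 7, 15, 21)
naive_date = msg['Date']
print(naive_date)   # Fri, 15 Jul 2011 21:00:00 -0000

# The decoded value is format_datetime() applied to the stored datetime.
print(naive_date == utils.format_datetime(naive_date.datetime))  # True

# utils.localtime() returns an aware datetime, so the local offset is kept.
del msg['Date']
msg['Date'] = utils.localtime()
local_date = msg['Date']
print(local_date.datetime.tzinfo is not None)  # True
```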
| 180 | + |
| 181 | + |
| 182 | +.. class:: AddressHeader |
| 183 | + |
| 184 | + Address headers are one of the most complex structured header types. |
| 185 | + The ``AddressHeader`` class provides a generic interface to any address |
| 186 | + header. |
| 187 | + |
| 188 | + This header type provides the following additional attributes: |
| 189 | + |
| 190 | + |
| 191 | + .. attribute:: groups |
| 192 | + |
| 193 | + A tuple of :class:`.Group` objects encoding the |
| 194 | + addresses and groups found in the header value. Addresses that are |
| 195 | + not part of a group are represented in this list as single-address |
| 196 | + ``Groups`` whose :attr:`~.Group.display_name` is ``None``. |
| 197 | + |
| 198 | + |
| 199 | + .. attribute:: addresses |
| 200 | + |
| 201 | + A tuple of :class:`.Address` objects encoding all |
| 202 | + of the individual addresses from the header value. If the header value |
| 203 | + contains any groups, the individual addresses from the group are included |
| 204 | + in the list at the point where the group occurs in the value (that is, |
| 205 | + the list of addresses is "flattened" into a one-dimensional list). |
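The sketch below contrasts `groups` and `addresses` for a `To` value that mixes a lone address with a group. It assumes a current Python 3, and the addresses and the `Team` group are sample data.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['To'] = 'Ann <ann@example.com>, Team: bob@example.com, carol@example.com;'
to = msg['To']

# groups: one entry per top-level item; the lone address becomes a
# Group whose display_name is None.
print([g.display_name for g in to.groups])   # [None, 'Team']

# addresses: every address, flattened in order of appearance.
print([a.addr_spec for a in to.addresses])
```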
| 206 | + |
| 207 | + The ``decoded`` value of the header will have all encoded words decoded to |
| 208 | + unicode. :mod:`~encodings.idna` encoded domain names are also decoded to unicode. The |
| 209 | + ``decoded`` value is set by :meth:`~str.join`\ ing the :class:`str` value of |
| 210 | + the elements of the ``groups`` attribute with ``', '``. |
| 211 | + |
| 212 | + A list of :class:`.Address` and :class:`.Group` objects in any combination |
| 213 | + may be used to set the value of an address header. ``Group`` objects whose |
| 214 | + ``display_name`` is ``None`` will be interpreted as single addresses, which |
| 215 | + allows an address list to be copied with groups intact by using the list |
| 216 | + obtained from the ``groups`` attribute of the source header. |
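The sketch below sets an address header from a mixed list and then copies it through `groups`. It assumes a current Python 3, and all names and addresses are sample data.

```python
from email.headerregistry import Address, Group
from email.message import EmailMessage

msg = EmailMessage()
# A mixed list of Address and Group objects sets an address header.
msg['To'] = [
    Address('Ann', 'ann', 'example.com'),
    Group('Team', [Address(addr_spec='bob@example.com')]),
]
print(msg['To'])    # Ann <ann@example.com>, Team: bob@example.com;

# Reusing the groups of the source header keeps the group structure.
reply = EmailMessage()
reply['Cc'] = msg['To'].groups
print(reply['Cc'])  # Ann <ann@example.com>, Team: bob@example.com;
```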
| 217 | + |
| 218 | + |
| 219 | +.. class:: SingleAddressHeader |
| 220 | + |
| 221 | + A subclass of :class:`.AddressHeader` that adds one |
| 222 | + additional attribute: |
| 223 | + |
| 224 | + |
| 225 | + .. attribute:: address |
| 226 | + |
| 227 | + The single address encoded by the header value. If the header value |
| 228 | + actually contains more than one address (which would be a violation of |
| 229 | + the RFC under the default :mod:`~email.policy`), accessing this attribute will |
| 230 | + result in a :exc:`ValueError`. |
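The sketch below shows the `address` attribute with a well-formed `Sender` value and with one that holds two addresses. It assumes a current Python 3, where the second header object is still created and only reading `.address` fails.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

sender = registry('Sender', 'Ann <ann@example.com>')
print(sender.address.username, sender.address.domain)  # ann example.com

# Two addresses in a single-address header: reading .address fails.
bad = registry('Sender', 'ann@example.com, bob@example.com')
raised = False
try:
    bad.address
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```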
| 231 | + |
| 232 | + |
| 233 | +Each of the above classes also has a ``Unique`` variant (for example, |
| 234 | +``UniqueUnstructuredHeader``). The only difference is that in the ``Unique`` |
| 235 | +variant, :attr:`~.BaseHeader.max_count` is set to 1. |
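The sketch below compares a `Unique` variant with its plain counterpart. It assumes a current Python 3, where `EmailMessage` also enforces `max_count` when a header is assigned under the default policy.

```python
from email.headerregistry import HeaderRegistry
from email.message import EmailMessage

registry = HeaderRegistry()
stamp = 'Fri, 15 Jul 2011 21:00:00 -0000'
date_count = registry('Date', stamp).max_count           # UniqueDateHeader
resent_count = registry('Resent-Date', stamp).max_count  # DateHeader
print(date_count, resent_count)   # 1 None

# A second Subject is rejected because Subject is a Unique header.
msg = EmailMessage()
msg['Subject'] = 'first'
raised = False
try:
    msg['Subject'] = 'second'
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```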
| 236 | + |
| 237 | + |
| 238 | +.. class:: HeaderRegistry(base_class=BaseHeader, \ |
| 239 | + default_class=UnstructuredHeader, \ |
| 240 | + use_default_map=True) |
| 241 | + |
| 242 | + This is the factory used by :class:`~email.policy.EmailPolicy` by default. |
| 243 | + ``HeaderRegistry`` builds the class used to create a header instance |
| 244 | + dynamically, using *base_class* and a specialized class retrieved from a |
| 245 | + registry that it holds. When a given header name does not appear in the |
| 246 | + registry, the class specified by *default_class* is used as the specialized |
| 247 | + class. When *use_default_map* is ``True`` (the default), the standard |
| 248 | + mapping of header names to classes is copied into the registry during |
| 249 | + initialization. *base_class* is always the last class in the generated |
| 250 | + class's ``__bases__`` list. |
| 251 | + |
| 252 | + The default mappings are: |
| 253 | + |
| 254 | + :subject: UniqueUnstructuredHeader |
| 255 | + :date: UniqueDateHeader |
| 256 | + :resent-date: DateHeader |
| 257 | + :orig-date: UniqueDateHeader |
| 258 | + :sender: UniqueSingleAddressHeader |
| 259 | + :resent-sender: SingleAddressHeader |
| 260 | + :to: UniqueAddressHeader |
| 261 | + :resent-to: AddressHeader |
| 262 | + :cc: UniqueAddressHeader |
| 263 | + :resent-cc: AddressHeader |
| 264 | + :from: UniqueAddressHeader |
| 265 | + :resent-from: AddressHeader |
| 266 | + :reply-to: UniqueAddressHeader |
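The effect of *use_default_map* can be checked directly, as in the sketch below. It assumes a current Python 3.

```python
from email.headerregistry import (
    DateHeader, HeaderRegistry, UniqueDateHeader, UnstructuredHeader)

with_defaults = HeaderRegistry()
without_defaults = HeaderRegistry(use_default_map=False)

# 'date' is in the default map, so it gets a date-aware class.
print(issubclass(with_defaults['Date'], UniqueDateHeader))        # True
# Without the default map, every name falls back to default_class.
print(issubclass(without_defaults['Date'], DateHeader))           # False
print(issubclass(without_defaults['Date'], UnstructuredHeader))   # True
```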
| 267 | + |
| 268 | + ``HeaderRegistry`` has the following methods: |
| 269 | + |
| 270 | + |
| 271 | + .. method:: map_to_type(name, cls) |
| 272 | + |
| 273 | + *name* is the name of the header to be mapped. It will be converted to |
| 274 | + lower case in the registry. *cls* is the specialized class to be used, |
| 275 | + along with *base_class*, to create the class used to instantiate headers |
| 276 | + that match *name*. |
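The sketch below maps a custom header to an existing class and installs the registry in a policy. It assumes a current Python 3, and `X-Archived-Date` is a hypothetical header name. The policy is cloned from `email.policy.default` with the new `header_factory`, so parsed messages use the mapping.

```python
import email.policy
from email import message_from_string
from email.headerregistry import HeaderRegistry, UniqueDateHeader

registry = HeaderRegistry()
# Hypothetical header name mapped to an existing specialized class.
registry.map_to_type('X-Archived-Date', UniqueDateHeader)

policy = email.policy.default.clone(header_factory=registry)
msg = message_from_string(
    'X-Archived-Date: Fri, 15 Jul 2011 21:00:00 +0000\n\nbody\n',
    policy=policy)
print(msg['x-archived-date'].datetime.year)  # 2011
```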
| 277 | + |
| 278 | + |
| 279 | + .. method:: __getitem__(name) |
| 280 | + |
| 281 | + Construct and return a class to handle creating a *name* header. |
| 282 | + |
| 283 | + |
| 284 | + .. method:: __call__(name, value) |
| 285 | + |
| 286 | + Retrieves the specialized header associated with *name* from the |
| 287 | + registry (using *default_class* if *name* does not appear in the |
| 288 | + registry) and composes it with *base_class* to produce a class, |
| 289 | + calls the constructed class's constructor, passing it the same |
| 290 | + argument list, and finally returns the class instance created thereby. |
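The sketch below contrasts `__getitem__` and `__call__`. It assumes a current Python 3, where the generated class's bases are exactly the registered class followed by *base_class*.

```python
from email.headerregistry import BaseHeader, HeaderRegistry, UniqueAddressHeader

registry = HeaderRegistry()

# __getitem__ builds a class from the registered type plus base_class.
to_class = registry['To']
print(to_class.__bases__ == (UniqueAddressHeader, BaseHeader))  # True

# __call__ does the same lookup and then instantiates the class.
to = registry('To', 'ann@example.com')
print(isinstance(to, UniqueAddressHeader), str(to))  # True ann@example.com
```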
| 291 | + |
| 292 | + |
| 293 | +The following classes are the classes used to represent data parsed from |
| 294 | +structured headers and can, in general, be used by an application program to |
| 295 | +construct structured values to assign to specific headers. |
| 296 | + |
| 297 | + |
| 298 | +.. class:: Address(display_name='', username='', domain='', addr_spec=None) |
| 299 | + |
| 300 | + The class used to represent an email address. The general form of an |
| 301 | + address is:: |
| 302 | + |
| 303 | + [display_name] <username@domain> |
| 304 | + |
| 305 | + or:: |
| 306 | + |
| 307 | + username@domain |
| 308 | + |
| 309 | + where each part must conform to specific syntax rules spelled out in |
| 310 | + :rfc:`5322`. |
| 311 | + |
| 312 | + As a convenience, *addr_spec* can be specified instead of *username* and |
| 313 | + *domain*, in which case *username* and *domain* will be parsed from the |
| 314 | + *addr_spec*. An *addr_spec* must be a properly RFC-quoted string; if it is |
| 315 | + not, ``Address`` will raise an error. Unicode characters are allowed and |
| 316 | + will be properly encoded when serialized. However, per the RFCs, unicode is |
| 317 | + *not* allowed in the username portion of the address. |
| 318 | + |
| 319 | + .. attribute:: display_name |
| 320 | + |
| 321 | + The display name portion of the address, if any, with all quoting |
| 322 | + removed. If the address does not have a display name, this attribute |
| 323 | + will be an empty string. |
| 324 | + |
| 325 | + .. attribute:: username |
| 326 | + |
| 327 | + The ``username`` portion of the address, with all quoting removed. |
| 328 | + |
| 329 | + .. attribute:: domain |
| 330 | + |
| 331 | + The ``domain`` portion of the address. |
| 332 | + |
| 333 | + .. attribute:: addr_spec |
| 334 | + |
| 335 | + The ``username@domain`` portion of the address, correctly quoted |
| 336 | + for use as a bare address (the second form shown above). This |
| 337 | + attribute is not mutable. |
| 338 | + |
| 339 | + .. method:: __str__() |
| 340 | + |
| 341 | + The ``str`` value of the object is the address quoted according to |
| 342 | + :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII |
| 343 | + characters. |
| 344 | + |
| 345 | + To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if |
| 346 | + ``username`` and ``domain`` are both the empty string (or ``None``), then |
| 347 | + the string value of the ``Address`` is ``<>``. |
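The sketch below covers the two constructor forms, quoting of the display name, and the SMTP special case. It assumes a current Python 3, and the names are sample data.

```python
from email.headerregistry import Address

ann = Address('Ann Example', 'ann', 'example.com')
print(str(ann))                    # Ann Example <ann@example.com>

# addr_spec may be given instead of username and domain.
bob = Address(addr_spec='bob@example.com')
print(bob.username, bob.domain)    # bob example.com

# Display names containing specials such as ',' are quoted on output.
carol = Address('Smith, Carol', addr_spec='carol@example.com')
print(str(carol))                  # "Smith, Carol" <carol@example.com>

# Empty username and domain give the SMTP null address.
print(str(Address()))              # <>
```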
| 348 | + |
| 349 | + |
| 350 | +.. class:: Group(display_name=None, addresses=None) |
| 351 | + |
| 352 | + The class used to represent an address group. The general form of an |
| 353 | + address group is:: |
| 354 | + |
| 355 | + display_name: [address-list]; |
| 356 | + |
| 357 | + As a convenience for processing lists of addresses that consist of a mixture |
| 358 | + of groups and single addresses, a ``Group`` may also be used to represent |
| 359 | + single addresses that are not part of a group by setting *display_name* to |
| 360 | + ``None`` and providing a list of the single address as *addresses*. |
| 361 | + |
| 362 | + .. attribute:: display_name |
| 363 | + |
| 364 | + The ``display_name`` of the group. If it is ``None`` and there is |
| 365 | + exactly one ``Address`` in ``addresses``, then the ``Group`` represents a |
| 366 | + single address that is not in a group. |
| 367 | + |
| 368 | + .. attribute:: addresses |
| 369 | + |
| 370 | + A possibly empty tuple of :class:`.Address` objects representing the |
| 371 | + addresses in the group. |
| 372 | + |
| 373 | + .. method:: __str__() |
| 374 | + |
| 375 | + The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`, |
| 376 | + but with no Content Transfer Encoding of any non-ASCII characters. If |
| 377 | + ``display_name`` is ``None`` and there is a single ``Address`` in the |
| 378 | + ``addresses`` list, the ``str`` value will be the same as the ``str`` of |
| 379 | + that single ``Address``. |
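The sketch below shows the three cases of `Group.__str__`: a named group, a single ungrouped address, and an empty group. It assumes a current Python 3, and the names are sample data.

```python
from email.headerregistry import Address, Group

team = Group('Team', [Address('Ann', 'ann', 'example.com'),
                      Address(addr_spec='bob@example.com')])
print(str(team))    # Team: Ann <ann@example.com>, bob@example.com;

# display_name None plus one address stands for that bare address.
solo = Group(None, [Address(addr_spec='carol@example.com')])
print(str(solo))    # carol@example.com

# addresses defaults to an empty tuple.
undisclosed = Group('Undisclosed recipients')
print(str(undisclosed), undisclosed.addresses)  # Undisclosed recipients:; ()
```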