Commit 7a94c3e

PEP 618: Third Draft (#1429)
Lots of improvements, and GvR is now BDFL-Delegate.
1 parent 23585d7 commit 7a94c3e

1 file changed

+120
-56
lines changed

pep-0618.rst

Lines changed: 120 additions & 56 deletions
Original file line number | Diff line number | Diff line change
@@ -4,12 +4,13 @@ Version: $Revision$
44
Last-Modified: $Date$
55
Author: Brandt Bucher <[email protected]>
66
Sponsor: Antoine Pitrou <[email protected]>
7+
BDFL-Delegate: Guido van Rossum <[email protected]>
78
Status: Draft
89
Type: Standards Track
910
Content-Type: text/x-rst
1011
Created: 01-May-2020
11-
Python-Version: 3.9
12-
Post-History: 01-May-2020
12+
Python-Version: 3.10
13+
Post-History: 01-May-2020, 10-May-2020
1314
Resolution:
1415

1516

@@ -24,14 +25,15 @@ raised if one of the arguments is exhausted before the others.
2425
Motivation
2526
==========
2627

27-
Many Python users find that most of their ``zip`` usage involves
28-
iterables that *should* be of equal length. Sometimes this invariant
29-
is proven true from the context of the surrounding code, but often the
30-
data being zipped is passed from the caller, sourced separately, or
31-
generated in some fashion. In any of these cases, the default
32-
behavior of ``zip`` means that faulty refactoring or logic errors
33-
could easily result in silently losing data. These bugs are not only
34-
difficult to diagnose, but difficult to even detect at all.
28+
It is clear from the author's personal experience and a `survey of the
29+
standard library <examples_>`_ that much (if not most) ``zip`` usage
30+
involves iterables that *must* be of equal length. Sometimes this
31+
invariant is proven true from the context of the surrounding code, but
32+
often the data being zipped is passed from the caller, sourced
33+
separately, or generated in some fashion. In any of these cases, the
34+
default behavior of ``zip`` means that faulty refactoring or logic
35+
errors could easily result in silently losing data. These bugs are
36+
not only difficult to diagnose, but difficult to even detect at all.
3537

3638
It is easy to come up with simple cases where this could be a problem.
3739
For example, the following code may work fine when ``items`` is a
@@ -40,16 +42,16 @@ if ``items`` is refactored by the caller to be a consumable iterator::
4042

4143
def apply_calculations(items):
4244
transformed = transform(items)
43-
for x, y in zip(items, transformed):
44-
yield something(x, y)
45+
for i, t in zip(items, transformed):
46+
yield calculate(i, t)
4547

4648
There are several other ways in which ``zip`` is commonly used.
4749
Idiomatic tricks are especially susceptible, because they are often
4850
employed by users who lack a complete understanding of how the code
4951
works. One example is unpacking into ``zip`` to lazily "unzip" or
5052
"transpose" nested iterables::
5153

52-
>>> x = iter([iter([1, 2, 3]), iter(["one" "two" "three"])])
54+
>>> x = [[1, 2, 3], ["one" "two" "three"]]
5355
>>> xt = list(zip(*x))
5456

5557
Another is "chunking" data into equal-sized groups::
@@ -63,19 +65,19 @@ the second case, data with a length that is not a multiple of ``n`` is
6365
often an error as well. However, both of these idioms will silently
6466
omit the tail-end items of malformed input.
6567

66-
Perhaps most convincingly, the current use of ``zip`` in the
67-
standard-library ``ast`` module has created multiple bugs that
68-
`silently drop parts of malformed nodes
68+
Perhaps most convincingly, the use of ``zip`` in the standard-library
69+
``ast`` module created a bug in ``literal_eval`` which `silently
70+
dropped parts of malformed nodes
6971
<https://bugs.python.org/issue40355>`_::
7072

7173
>>> from ast import Constant, Dict, literal_eval
7274
>>> nasty_dict = Dict(keys=[Constant(None)], values=[])
7375
>>> literal_eval(nasty_dict) # Like eval("{None: }")
7476
{}
7577

76-
In fact, the author has counted dozens of other call sites in Python's
77-
standard library and tooling where it would be appropriate to enable
78-
this new feature immediately.
78+
In fact, the author has `counted dozens of other call sites
79+
<examples_>`_ in Python's standard library and tooling where it
80+
would be appropriate to enable this new feature immediately.
7981

8082

8183
Rationale
@@ -93,10 +95,6 @@ functions which are typically called with compile-time constants:
9395

9496
Many more exist in the standard library.
9597

96-
A good rule of thumb is that "mode-switches" which change return types
97-
or significantly alter functionality are indeed an anti-pattern, while
98-
ones which enable or disable complementary checks or behavior are not.
99-
10098
The idea and name for this new parameter were `originally proposed
10199
<https://mail.python.org/archives/list/[email protected]/message/6GFUADSQ5JTF7W7OGWF7XF2NH2XUTUQM>`_
102100
by Ram Rachum. The thread received over 100 replies, with the
@@ -117,9 +115,6 @@ When the built-in ``zip`` is called with the keyword-only argument
117115
the arguments are exhausted at differing lengths. This error will
118116
occur at the point when iteration would normally stop today.
119117

120-
At most one additional item may be consumed from one of the iterators
121-
when compared to normal ``zip`` usage.
122-
123118

124119
Backward Compatibility
125120
======================
@@ -133,7 +128,7 @@ Reference Implementation
133128
The author has drafted a `C implementation
134129
<https://github.com/python/cpython/compare/master...brandtbucher:zip-strict>`_.
135130

136-
An approximate pure-Python translation is::
131+
An approximate Python translation is::
137132

138133
def zip(*iterables, strict=False):
139134
if not iterables:
@@ -160,38 +155,71 @@ An approximate pure-Python translation is::
160155
Rejected Ideas
161156
==============
162157

163-
Add Additional Flavors Of ``zip`` To ``itertools``
164-
''''''''''''''''''''''''''''''''''''''''''''''''''
158+
Add ``itertools.zip_strict``
159+
----------------------------
160+
161+
This is the alternative with the most support on the Python-Ideas
162+
mailing list, so it deserves to be discussed in some detail here. It
163+
does not have any disqualifying flaws, and could work well enough as a
164+
substitute if this PEP is rejected.
165+
166+
With that in mind, this section aims to outline why adding an optional
167+
parameter to ``zip`` is a smaller change that ultimately does a better
168+
job of solving the problems motivating this PEP.
165169

166-
Adding ``zip_strict`` to itertools is a larger change with greater
167-
maintenance burden than the simple modification being proposed.
170+
171+
Precedent
172+
'''''''''
168173

169174
It seems that a great deal of the motivation driving this alternative
170175
is that ``zip_longest`` already exists in ``itertools``. However,
171-
``zip_longest`` is really another beast entirely: it takes on the
172-
responsibility of filling in missing values, a problem neither of
173-
the other variants even have. It also arguably has the most
174-
specialized behavior of the three (to the point of exposing a new
175-
``fillvalue`` parameter), so it makes sense that it would live in
176-
``itertools`` while ``zip`` grows in-place.
177-
178-
Importing a drop-in replacement for a built-in also feels too heavy,
179-
especially just to check a tricky condition that should "always" be
180-
true. The goal here is not just to provide a way to catch bugs, but
181-
to also make it easy (even tempting) for a user to enable the check
182-
whenever using ``zip`` at a call site with this property.
176+
``zip_longest`` is in many ways a much more complicated, specialized
177+
utility: it takes on the responsibility of filling in missing values,
178+
a job neither of the other variants needs to concern itself with.
179+
180+
If both ``zip`` and ``zip_longest`` lived alongside each other in
181+
``itertools`` or as builtins, then adding ``zip_strict`` in the same
182+
location would indeed be a much stronger argument. However, the new
183+
"strict" variant is conceptually *much* closer to ``zip`` in interface
184+
and behavior than ``zip_longest``, while still not meeting the high
185+
bar of being its own builtin. Given this situation, it seems most
186+
natural for ``zip`` to grow this new option in-place.
187+
188+
189+
Usability
190+
'''''''''
191+
192+
If ``zip`` is capable of preventing this class of bug, it becomes much
193+
simpler for users to enable the check at call sites with this
194+
property. Compare this with importing a drop-in replacement for a
195+
built-in utility, which feels somewhat heavy just to check a tricky
196+
condition that should "always" be true.
183197

184198
Some have also argued that a new function buried in the standard
185199
library is somehow more "discoverable" than a keyword parameter on the
186-
built-in itself. The author does not believe this to be true.
200+
built-in itself. The author does not agree with this assessment.
201+
202+
203+
Maintenance Cost
204+
''''''''''''''''
205+
206+
While implementation should only be a secondary concern when making
207+
usability improvements, it is important to recognize that adding a new
208+
utility is significantly more complicated than modifying an existing
209+
one. The CPython implementation accompanying this PEP is simple and
210+
has no measurable performance impact on default ``zip`` behavior,
211+
while adding an entirely new utility to ``itertools`` would require
212+
either:
187213

188-
Another proposed idiom, per-module shadowing of the built-in ``zip``
189-
with some subtly different variant from ``itertools``, is an
190-
anti-pattern that shouldn't be encouraged.
214+
- Duplicating much of the existing ``zip`` logic, as ``zip_longest``
215+
already does.
216+
- Significantly refactoring either ``zip``, ``zip_longest``, or both
217+
to share a common or inherited implementation (which may impact
218+
performance).
191219

192220

193221
Add Several "Modes" To Switch Between
194-
'''''''''''''''''''''''''''''''''''''
222+
-------------------------------------
195223

196224
This option only makes more sense than a binary flag if we anticipate
197225
having three or more modes. The "obvious" three choices for these
@@ -211,7 +239,7 @@ long-lived namesake utility in ``itertools``.
211239

212240

213241
Add A Method Or Alternate Constructor To The ``zip`` Type
214-
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''
242+
---------------------------------------------------------
215243

216244
Consider the following two options, which have both been proposed::
217245

@@ -235,11 +263,14 @@ nothing (which is the problem we are trying to avoid in the first
235263
place).
236264

237265
This proposal is further complicated by the fact that CPython's actual
238-
``zip`` type is an undocumented implementation detail.
266+
``zip`` type is currently an undocumented implementation detail. This
267+
means that choosing one of the above behaviors will effectively "lock
268+
in" the current implementation (or at least require it to be emulated)
269+
going forward.
239270

240271

241272
Change The Default Behavior Of ``zip``
242-
''''''''''''''''''''''''''''''''''''''
273+
--------------------------------------
243274

244275
There is nothing "wrong" with the default behavior of ``zip``, since
245276
there are many cases where it is indeed the correct way to handle
@@ -251,15 +282,15 @@ the "extra" tail-end data is still needed.
251282

252283

253284
Accept A Callback To Handle Remaining Items
254-
'''''''''''''''''''''''''''''''''''''''''''
285+
-------------------------------------------
255286

256287
While able to do basically anything a user could need, this solution
257288
makes handling the more common cases (like rejecting mismatched
258289
lengths) unnecessarily complicated and non-obvious.
259290

260291

261-
Raise An ``AssertionError`` Instead Of A ``ValueError``
262-
'''''''''''''''''''''''''''''''''''''''''''''''''''''''
292+
Raise An ``AssertionError``
293+
---------------------------
263294

264295
There are no built-in functions or types that raise an
265296
``AssertionError`` as part of their API. Further, the `official
@@ -276,7 +307,7 @@ Users desiring a check that is disabled in optimized mode (like an
276307

277308

278309
Add A Similar Feature to ``map``
279-
''''''''''''''''''''''''''''''''
310+
--------------------------------
280311

281312
This PEP does not propose any changes to ``map``, since the use of
282313
``map`` with multiple iterable arguments is quite rare. However, this
@@ -291,7 +322,7 @@ debated here for ``zip``.
291322

292323

293324
Do Nothing
294-
''''''''''
325+
----------
295326

296327
This option is perhaps the least attractive.
297328

@@ -303,6 +334,39 @@ are evidence that it's *very* easy to fall into the sort of trap that
303334
this feature aims to avoid.
304335

305336

337+
References
338+
==========
339+
340+
Examples
341+
--------
342+
343+
.. note:: This listing is not exhaustive.
344+
345+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3394
346+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3418
347+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3435
348+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L94-L95
349+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1184
350+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1275
351+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1363
352+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1391
353+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/copy.py#L217
354+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/csv.py#L142
355+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/dis.py#L462
356+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/filecmp.py#L142
357+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/filecmp.py#L143
358+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/inspect.py#L1440
359+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/inspect.py#L2095
360+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/os.py#L510
361+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/plistlib.py#L577
362+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1317
363+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1323
364+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1339
365+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3015
366+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3071
367+
- https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3901
368+
369+
306370
Copyright
307371
=========
308372
