@@ -4,12 +4,13 @@ Version: $Revision$
  Last-Modified: $Date$
  Author: Brandt Bucher <[email protected]>
  Sponsor: Antoine Pitrou <[email protected]>
+ BDFL-Delegate: Guido van Rossum <[email protected]>
  Status: Draft
  Type: Standards Track
  Content-Type: text/x-rst
  Created: 01-May-2020
- Python-Version: 3.9
- Post-History: 01-May-2020
+ Python-Version: 3.10
+ Post-History: 01-May-2020, 10-May-2020
  Resolution:


@@ -24,14 +25,15 @@ raised if one of the arguments is exhausted before the others.
  Motivation
  ==========

- Many Python users find that most of their ``zip`` usage involves
- iterables that *should* be of equal length. Sometimes this invariant
- is proven true from the context of the surrounding code, but often the
- data being zipped is passed from the caller, sourced separately, or
- generated in some fashion. In any of these cases, the default
- behavior of ``zip`` means that faulty refactoring or logic errors
- could easily result in silently losing data. These bugs are not only
- difficult to diagnose, but difficult to even detect at all.
+ It is clear from the author's personal experience and a `survey of the
+ standard library <examples_>`_ that much (if not most) ``zip`` usage
+ involves iterables that *must* be of equal length. Sometimes this
+ invariant is proven true from the context of the surrounding code, but
+ often the data being zipped is passed from the caller, sourced
+ separately, or generated in some fashion. In any of these cases, the
+ default behavior of ``zip`` means that faulty refactoring or logic
+ errors could easily result in silently losing data. These bugs are
+ not only difficult to diagnose, but difficult to even detect at all.

  It is easy to come up with simple cases where this could be a problem.
  For example, the following code may work fine when ``items`` is a
@@ -40,16 +42,16 @@ if ``items`` is refactored by the caller to be a consumable iterator::

      def apply_calculations(items):
          transformed = transform(items)
-         for x, y in zip(items, transformed):
-             yield something(x, y)
+         for i, t in zip(items, transformed):
+             yield calculate(i, t)

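A minimal sketch of the failure mode described in this hunk, assuming hypothetical stand-ins for ``transform`` and ``calculate`` (neither is defined in the PEP): here ``transform`` is a lazy doubling generator and ``calculate`` simply returns its inputs as a pair. With a list, every item meets its own transformed value; with a one-shot iterator, ``zip`` and the generator pull from the same iterator, so the output is both shorter and mismatched, and nothing is raised.

```python
def transform(items):
    # Hypothetical stand-in: lazily doubles each item as it is requested.
    return (item * 2 for item in items)


def calculate(i, t):
    # Hypothetical stand-in: returns the pair unchanged so it can be inspected.
    return (i, t)


def apply_calculations(items):
    transformed = transform(items)
    for i, t in zip(items, transformed):
        yield calculate(i, t)


# A list can be iterated more than once, so each item meets its own transform.
print(list(apply_calculations([1, 2, 3, 4])))  # [(1, 2), (2, 4), (3, 6), (4, 8)]

# A one-shot iterator is shared by zip and the generator: each loop step
# consumes two items, so half the data vanishes and the pairs are wrong.
print(list(apply_calculations(iter([1, 2, 3, 4]))))  # [(1, 4), (3, 8)]
```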
  There are several other ways in which ``zip`` is commonly used.
  Idiomatic tricks are especially susceptible, because they are often
  employed by users who lack a complete understanding of how the code
  works. One example is unpacking into ``zip`` to lazily "unzip" or
  "transpose" nested iterables::

-     >>> x = iter([iter([1, 2, 3]), iter(["one" "two" "three"])])
+     >>> x = [[1, 2, 3], ["one", "two", "three"]]
      >>> xt = list(zip(*x))

  Another is "chunking" data into equal-sized groups::
@@ -63,19 +65,19 @@ the second case, data with a length that is not a multiple of ``n`` is
  often an error as well. However, both of these idioms will silently
  omit the tail-end items of malformed input.

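Both idioms can be checked quickly. The sketch below applies the "unzip" pattern from the previous hunk to rows of unequal length, and uses the well-known ``zip(*[iter(data)] * n)`` grouping recipe as an assumed stand-in for the chunking idiom (the PEP's own chunking example falls outside this diff). In each case ``zip`` returns a shorter result without raising anything.

```python
# "Unzip"/transpose: the second row is one element short.
rows = [[1, 2, 3], ["one", "two"]]
columns = list(zip(*rows))
print(columns)  # [(1, 'one'), (2, 'two')]; the 3 is silently dropped

# Chunking into groups of n with the zip(*[iter(data)] * n) recipe:
# the same iterator is passed n times, so each tuple takes n consecutive items.
n = 3
data = range(7)
chunks = list(zip(*[iter(data)] * n))
print(chunks)  # [(0, 1, 2), (3, 4, 5)]; the trailing 6 is silently dropped
```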
- Perhaps most convincingly, the current use of ``zip`` in the
- standard-library ``ast`` module has created multiple bugs that
- `silently drop parts of malformed nodes
+ Perhaps most convincingly, the use of ``zip`` in the standard-library
+ ``ast`` module created a bug in ``literal_eval`` that `silently
+ dropped parts of malformed nodes
  <https://bugs.python.org/issue40355>`_::

      >>> from ast import Constant, Dict, literal_eval
      >>> nasty_dict = Dict(keys=[Constant(None)], values=[])
      >>> literal_eval(nasty_dict) # Like eval("{None: }")
      {}

- In fact, the author has counted dozens of other call sites in Python's
- standard library and tooling where it would be appropriate to enable
- this new feature immediately.
+ In fact, the author has `counted dozens of other call sites
+ <examples_>`_ in Python's standard library and tooling where it
+ would be appropriate to enable this new feature immediately.


  Rationale
@@ -93,10 +95,6 @@ functions which are typically called with compile-time constants:

  Many more exist in the standard library.

- A good rule of thumb is that "mode-switches" which change return types
- or significantly alter functionality are indeed an anti-pattern, while
- ones which enable or disable complementary checks or behavior are not.
-
  The idea and name for this new parameter were `originally proposed
  <https://mail.python.org/archives/list/[email protected]/message/6GFUADSQ5JTF7W7OGWF7XF2NH2XUTUQM>`_
  by Ram Rachum. The thread received over 100 replies, with the
@@ -117,9 +115,6 @@ When the built-in ``zip`` is called with the keyword-only argument
  the arguments are exhausted at differing lengths. This error will
  occur at the point when iteration would normally stop today.

- At most one additional item may be consumed from one of the iterators
- when compared to normal ``zip`` usage.
-

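The specification above says the ``ValueError`` surfaces only at the point where iteration would otherwise stop, so every matched tuple is still produced first. The following is a rough pure-Python sketch of that behavior. ``zip_strict_sketch`` is a hypothetical name, and this is not the PEP's own translation (only its first lines appear in this diff); it simply shows one way to perform the check lazily.

```python
_MISSING = object()  # private sentinel: cannot be confused with a real item


def zip_strict_sketch(*iterables):
    """Yield tuples like zip(), but raise ValueError on a length mismatch.

    Hypothetical helper for illustration; not the PEP's reference code.
    """
    iterators = [iter(iterable) for iterable in iterables]
    if not iterators:
        return  # zip() with no arguments yields nothing
    while True:
        row = []
        for position, iterator in enumerate(iterators, start=1):
            item = next(iterator, _MISSING)
            if item is _MISSING:
                if position > 1:
                    # Argument 1 produced an item this round; this one did not.
                    raise ValueError(
                        f"argument {position} is shorter than argument 1"
                    )
                # Argument 1 is exhausted, so every other iterator must be too.
                # This probe may consume one item from a longer iterator.
                for other, other_iterator in enumerate(iterators[1:], start=2):
                    if next(other_iterator, _MISSING) is not _MISSING:
                        raise ValueError(
                            f"argument {other} is longer than argument 1"
                        )
                return  # all inputs ended together: stop like zip()
            row.append(item)
        yield tuple(row)


print(list(zip_strict_sketch("ab", [1, 2])))  # [('a', 1), ('b', 2)]

# Matched tuples are produced before the error, as the specification says.
pairs = []
try:
    for pair in zip_strict_sketch("abc", [1, 2]):
        pairs.append(pair)
except ValueError as error:
    print(pairs, error)
```

Using a private sentinel rather than ``next(iterator, None)`` keeps a legitimate ``None`` item from being mistaken for exhaustion.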
  Backward Compatibility
  ======================
@@ -133,7 +128,7 @@ Reference Implementation
  The author has drafted a `C implementation
  <https://github.com/python/cpython/compare/master...brandtbucher:zip-strict>`_.

- An approximate pure-Python translation is::
+ An approximate Python translation is::

      def zip(*iterables, strict=False):
          if not iterables:
@@ -160,38 +155,71 @@ An approximate pure-Python translation is::
  Rejected Ideas
  ==============

- Add Additional Flavors Of ``zip`` To ``itertools``
- ''''''''''''''''''''''''''''''''''''''''''''''''''
+ Add ``itertools.zip_strict``
+ ----------------------------
+
+ This is the alternative with the most support on the Python-Ideas
+ mailing list, so it deserves to be discussed in some detail here. It
+ does not have any disqualifying flaws, and could work well enough as a
+ substitute if this PEP is rejected.
+
+ With that in mind, this section aims to outline why adding an optional
+ parameter to ``zip`` is a smaller change that ultimately does a better
+ job of solving the problems motivating this PEP.

- Adding ``zip_strict`` to itertools is a larger change with greater
- maintenance burden than the simple modification being proposed.
+
+ Precedent
+ '''''''''

  It seems that a great deal of the motivation driving this alternative
  is that ``zip_longest`` already exists in ``itertools``. However,
- ``zip_longest`` is really another beast entirely: it takes on the
- responsibility of filling in missing values, a problem neither of
- the other variants even have. It also arguably has the most
- specialized behavior of the three (to the point of exposing a new
- ``fillvalue`` parameter), so it makes sense that it would live in
- ``itertools`` while ``zip`` grows in-place.
-
- Importing a drop-in replacement for a built-in also feels too heavy,
- especially just to check a tricky condition that should "always" be
- true. The goal here is not just to provide a way to catch bugs, but
- to also make it easy (even tempting) for a user to enable the check
- whenever using ``zip`` at a call site with this property.
+ ``zip_longest`` is in many ways a much more complicated, specialized
+ utility: it takes on the responsibility of filling in missing values,
+ a job that neither of the other variants needs to concern itself with.
+
+ If both ``zip`` and ``zip_longest`` lived alongside each other in
+ ``itertools`` or as builtins, then adding ``zip_strict`` in the same
+ location would indeed be a much stronger argument. However, the new
+ "strict" variant is conceptually *much* closer to ``zip`` in interface
+ and behavior than ``zip_longest``, while still not meeting the high
+ bar of being its own builtin. Given this situation, it seems most
+ natural for ``zip`` to grow this new option in-place.
+
+
+ Usability
+ '''''''''
+
+ If ``zip`` is capable of preventing this class of bug, it becomes much
+ simpler for users to enable the check at call sites with this
+ property. Compare this with importing a drop-in replacement for a
+ built-in utility, which feels somewhat heavy just to check a tricky
+ condition that should "always" be true.

  Some have also argued that a new function buried in the standard
  library is somehow more "discoverable" than a keyword parameter on the
- built-in itself. The author does not believe this to be true.
+ built-in itself. The author does not agree with this assessment.
+
+
+ Maintenance Cost
+ ''''''''''''''''
+
+ While implementation should only be a secondary concern when making
+ usability improvements, it is important to recognize that adding a new
+ utility is significantly more complicated than modifying an existing
+ one. The CPython implementation accompanying this PEP is simple and
+ has no measurable performance impact on default ``zip`` behavior,
+ while adding an entirely new utility to ``itertools`` would require
+ either:

- Another proposed idiom, per-module shadowing of the built-in ``zip``
- with some subtly different variant from ``itertools``, is an
- anti-pattern that shouldn't be encouraged.
+ - Duplicating much of the existing ``zip`` logic, as ``zip_longest``
+   already does.
+ - Significantly refactoring either ``zip``, ``zip_longest``, or both
+   to share a common or inherited implementation (which may impact
+   performance).


  Add Several "Modes" To Switch Between
- '''''''''''''''''''''''''''''''''''''
+ -------------------------------------

  This option only makes more sense than a binary flag if we anticipate
  having three or more modes. The "obvious" three choices for these
@@ -211,7 +239,7 @@ long-lived namesake utility in ``itertools``.


  Add A Method Or Alternate Constructor To The ``zip`` Type
- '''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+ ---------------------------------------------------------

  Consider the following two options, which have both been proposed::

@@ -235,11 +263,14 @@ nothing (which is the problem we are trying to avoid in the first
  place).

  This proposal is further complicated by the fact that CPython's actual
- ``zip`` type is an undocumented implementation detail.
+ ``zip`` type is currently an undocumented implementation detail. This
+ means that choosing one of the above behaviors will effectively "lock
+ in" the current implementation (or at least require it to be emulated)
+ going forward.


  Change The Default Behavior Of ``zip``
- ''''''''''''''''''''''''''''''''''''''
+ --------------------------------------

  There is nothing "wrong" with the default behavior of ``zip``, since
  there are many cases where it is indeed the correct way to handle
@@ -251,15 +282,15 @@ the "extra" tail-end data is still needed.


  Accept A Callback To Handle Remaining Items
- '''''''''''''''''''''''''''''''''''''''''''
+ -------------------------------------------

  While able to do basically anything a user could need, this solution
  makes handling the more common cases (like rejecting mismatched
  lengths) unnecessarily complicated and non-obvious.


- Raise An ``AssertionError`` Instead Of A ``ValueError``
- '''''''''''''''''''''''''''''''''''''''''''''''''''''''
+ Raise An ``AssertionError``
+ ---------------------------

  There are no built-in functions or types that raise an
  ``AssertionError`` as part of their API. Further, the `official
@@ -276,7 +307,7 @@ Users desiring a check that is disabled in optimized mode (like an


  Add A Similar Feature to ``map``
- ''''''''''''''''''''''''''''''''
+ --------------------------------

  This PEP does not propose any changes to ``map``, since the use of
  ``map`` with multiple iterable arguments is quite rare. However, this
@@ -291,7 +322,7 @@ debated here for ``zip``.


  Do Nothing
- ''''''''''
+ ----------

  This option is perhaps the least attractive.

@@ -303,6 +334,39 @@ are evidence that it's *very* easy to fall into the sort of trap that
  this feature aims to avoid.


+ References
+ ==========
+
+ Examples
+ --------
+
+ .. note:: This listing is not exhaustive.
+
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3394
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3418
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/_pydecimal.py#L3435
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L94-L95
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1184
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1275
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1363
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/ast.py#L1391
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/copy.py#L217
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/csv.py#L142
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/dis.py#L462
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/filecmp.py#L142
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/filecmp.py#L143
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/inspect.py#L1440
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/inspect.py#L2095
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/os.py#L510
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/plistlib.py#L577
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1317
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1323
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/tarfile.py#L1339
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3015
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3071
+ - https://github.com/python/cpython/blob/27c0d9b54abaa4112d5a317b8aa78b39ad60a808/Lib/turtle.py#L3901
+
+
  Copyright
  =========
