Commit 530ad98

Merge remote-tracking branch 'upstream/master' into STY-repr-batch-3

Author: MomIsBestFriend
Parents: e10d8cc + 83812e1

60 files changed: 459 additions, 520 deletions

doc/source/user_guide/integer_na.rst

Lines changed: 27 additions & 9 deletions
@@ -25,8 +25,7 @@ numbers.
 
 Pandas can represent integer data with possibly missing values using
 :class:`arrays.IntegerArray`. This is an :ref:`extension types <extending.extension-types>`
-implemented within pandas. It is not the default dtype for integers, and will not be inferred;
-you must explicitly pass the dtype into :meth:`array` or :class:`Series`:
+implemented within pandas.
 
 .. ipython:: python
 
@@ -50,24 +49,43 @@ NumPy array.
 You can also pass the list-like object to the :class:`Series` constructor
 with the dtype.
 
-.. ipython:: python
+.. warning::
 
-   s = pd.Series([1, 2, np.nan], dtype="Int64")
-   s
+   Currently :meth:`pandas.array` and :meth:`pandas.Series` use different
+   rules for dtype inference. :meth:`pandas.array` will infer a
+   nullable-integer dtype:
 
-By default (if you don't specify ``dtype``), NumPy is used, and you'll end
-up with a ``float64`` dtype Series:
+   .. ipython:: python
 
-.. ipython:: python
+      pd.array([1, None])
+      pd.array([1, 2])
+
+   For backwards-compatibility, :class:`Series` infers these as either
+   integer or float dtype:
+
+   .. ipython:: python
+
+      pd.Series([1, None])
+      pd.Series([1, 2])
 
-   pd.Series([1, 2, np.nan])
+   We recommend explicitly providing the dtype to avoid confusion.
+
+   .. ipython:: python
+
+      pd.array([1, None], dtype="Int64")
+      pd.Series([1, None], dtype="Int64")
+
+   In the future, we may provide an option for :class:`Series` to infer a
+   nullable-integer dtype.
 
 Operations involving an integer array will behave similar to NumPy arrays.
 Missing values will be propagated, and the data will be coerced to another
 dtype if needed.
 
 .. ipython:: python
 
+   s = pd.Series([1, 2, None], dtype="Int64")
+
    # arithmetic
    s + 1

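You can confirm the inference difference described in the warning by checking the resulting dtypes. This is a minimal sketch that assumes pandas 1.0 or later is installed. The variable names are illustrative and are not taken from the pandas docs.

```python
import pandas as pd

# pd.array infers the nullable "Int64" extension dtype for integer data,
# with or without missing values.
arr_missing = pd.array([1, None])
arr_plain = pd.array([1, 2])

# pd.Series keeps NumPy-based inference: a missing value forces float64,
# while complete integer data stays int64.
ser_missing = pd.Series([1, None])
ser_plain = pd.Series([1, 2])

# Passing the dtype explicitly gives the same nullable dtype in both constructors.
arr_explicit = pd.array([1, None], dtype="Int64")
ser_explicit = pd.Series([1, None], dtype="Int64")

for label, obj in [
    ("pd.array([1, None])", arr_missing),
    ("pd.array([1, 2])", arr_plain),
    ("pd.Series([1, None])", ser_missing),
    ("pd.Series([1, 2])", ser_plain),
    ('pd.Series([1, None], dtype="Int64")', ser_explicit),
]:
    print(f"{label:40} -> {obj.dtype}")
```

Only the two inferred Series end up with NumPy dtypes. Every other call yields Int64.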
doc/source/user_guide/style.ipynb

Lines changed: 1 addition & 1 deletion
@@ -677,7 +677,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Notice that you're able share the styles even though they're data aware. The styles are re-evaluated on the new DataFrame they've been `use`d upon."
+    "Notice that you're able to share the styles even though they're data aware. The styles are re-evaluated on the new DataFrame they've been `use`d upon."
    ]
   },
   {

doc/source/whatsnew/v0.15.0.rst

Lines changed: 2 additions & 3 deletions
@@ -312,14 +312,13 @@ Timezone handling improvements
   previously this resulted in ``Exception`` or ``TypeError`` (:issue:`7812`)
 
 .. ipython:: python
-   :okwarning:
 
    ts = pd.Timestamp('2014-08-01 09:00', tz='US/Eastern')
    ts
    ts.tz_localize(None)
 
-   didx = pd.DatetimeIndex(start='2014-08-01 09:00', freq='H',
-                           periods=10, tz='US/Eastern')
+   didx = pd.date_range(start='2014-08-01 09:00', freq='H',
+                        periods=10, tz='US/Eastern')
    didx
    didx.tz_localize(None)

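You can exercise the replacement constructor from the whatsnew example on its own. This is a minimal sketch that assumes pandas is installed. It passes an Hour offset object instead of the 'H' string, because recent pandas releases deprecate that alias in favor of 'h'.

```python
import pandas as pd

# pd.date_range replaces the removed DatetimeIndex(start=..., periods=...) form.
didx = pd.date_range(start="2014-08-01 09:00", freq=pd.offsets.Hour(),
                     periods=10, tz="US/Eastern")

# tz_localize(None) drops the time zone but keeps the local wall-clock times.
naive = didx.tz_localize(None)
print(didx[0], "->", naive[0])
print(didx[-1], "->", naive[-1])
```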
doc/source/whatsnew/v1.0.0.rst

Lines changed: 56 additions & 1 deletion
@@ -303,6 +303,58 @@ The following methods now also correctly output values for unobserved categories
 
    df.groupby(["cat_1", "cat_2"], observed=False)["value"].count()
 
+:meth:`pandas.array` inference changes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:meth:`pandas.array` now infers pandas' new extension types in several cases (:issue:`29791`):
+
+1. String data (including missing values) now returns a :class:`arrays.StringArray`.
+2. Integer data (including missing values) now returns a :class:`arrays.IntegerArray`.
+3. Boolean data (including missing values) now returns the new :class:`arrays.BooleanArray`.
+
+*pandas 0.25.x*
+
+.. code-block:: python
+
+   >>> pd.array(["a", None])
+   <PandasArray>
+   ['a', None]
+   Length: 2, dtype: object
+
+   >>> pd.array([1, None])
+   <PandasArray>
+   [1, None]
+   Length: 2, dtype: object
+
+
+*pandas 1.0.0*
+
+.. ipython:: python
+
+   pd.array(["a", None])
+   pd.array([1, None])
+
+As a reminder, you can specify the ``dtype`` to disable all inference.
+
+By default :meth:`Categorical.min` now returns the minimum instead of np.nan
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When :class:`Categorical` contains ``np.nan``,
+:meth:`Categorical.min` no longer returns ``np.nan`` by default (``skipna=True``) (:issue:`25303`)
+
+*pandas 0.25.x*
+
+.. code-block:: ipython
+
+   In [1]: pd.Categorical([1, 2, np.nan], ordered=True).min()
+   Out[1]: nan
+
+
+*pandas 1.0.0*
+
+.. ipython:: python
+
+   pd.Categorical([1, 2, np.nan], ordered=True).min()
 
 .. _whatsnew_1000.api_breaking.deps:
 
@@ -388,7 +440,6 @@ Other API changes
 - :meth:`Series.dropna` has dropped its ``**kwargs`` argument in favor of a single ``how`` parameter.
   Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`)
 - When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`)
--
 
 
 .. _whatsnew_1000.api.documentation:
@@ -410,6 +461,8 @@ Deprecations
 - :func:`is_extension_type` is deprecated, :func:`is_extension_array_dtype` should be used instead (:issue:`29457`)
 - :func:`eval` keyword argument "truediv" is deprecated and will be removed in a future version (:issue:`29812`)
 - :meth:`Categorical.take_nd` is deprecated, use :meth:`Categorical.take` instead (:issue:`27745`)
+- The parameter ``numeric_only`` of :meth:`Categorical.min` and :meth:`Categorical.max` is deprecated and replaced with ``skipna`` (:issue:`25303`)
+-
 
 .. _whatsnew_1000.prior_deprecations:
 
@@ -465,6 +518,8 @@ or ``matplotlib.Axes.plot``. See :ref:`plotting.formatters` for more.
 - :meth:`pandas.Series.str.cat` now defaults to aligning ``others``, using ``join='left'`` (:issue:`27611`)
 - :meth:`pandas.Series.str.cat` does not accept list-likes *within* list-likes anymore (:issue:`27611`)
 - :meth:`Series.where` with ``Categorical`` dtype (or :meth:`DataFrame.where` with ``Categorical`` column) no longer allows setting new categories (:issue:`24114`)
+- :class:`DatetimeIndex`, :class:`TimedeltaIndex`, and :class:`PeriodIndex` constructors no longer allow ``start``, ``end``, and ``periods`` keywords; use :func:`date_range`, :func:`timedelta_range`, and :func:`period_range` instead (:issue:`23919`)
+- :class:`DatetimeIndex` and :class:`TimedeltaIndex` constructors no longer have a ``verify_integrity`` keyword argument (:issue:`23919`)
 - :func:`core.internals.blocks.make_block` no longer accepts the "fastpath" keyword(:issue:`19265`)
 - :meth:`Block.make_block_same_class` no longer accepts the "dtype" keyword(:issue:`19434`)
 - Removed the previously deprecated :meth:`ExtensionArray._formatting_values`. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)

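You can check both user-facing changes in this file interactively. This is a sketch that assumes pandas 1.0 or later. The variable names are hypothetical. The object-dtype call relies on the note above that an explicit dtype disables inference.

```python
import numpy as np
import pandas as pd

# pd.array now returns pandas extension arrays instead of object arrays.
strings = pd.array(["a", None])
integers = pd.array([1, None])
booleans = pd.array([True, None])

# An explicit dtype disables inference, giving an object-dtype array as in 0.25.x.
objects = pd.array([1, None], dtype=object)

# Categorical.min/max now skip missing values by default (skipna=True);
# skipna=False gives back the old NaN result.
cat = pd.Categorical([1, 2, np.nan], ordered=True)

print(strings.dtype, integers.dtype, booleans.dtype, objects.dtype)
print(cat.min(), cat.min(skipna=False), cat.max())
```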
pandas/_libs/interval.pyx

Lines changed: 4 additions & 4 deletions
@@ -179,8 +179,8 @@ cdef class IntervalMixin:
             When `other` is not closed exactly the same as self.
         """
         if self.closed != other.closed:
-            msg = f"'{name}.closed' is '{other.closed}', expected '{self.closed}'."
-            raise ValueError(msg)
+            raise ValueError(f"'{name}.closed' is {repr(other.closed)}, "
+                             f"expected {repr(self.closed)}.")
 
 
 cdef _interval_like(other):
@@ -316,7 +316,7 @@ cdef class Interval(IntervalMixin):
                 not tz_compare(left.tzinfo, right.tzinfo)):
             # GH 18538
             msg = (f"left and right must have the same time zone, got "
-                   f"'{left.tzinfo}' and '{right.tzinfo}'")
+                   f"{repr(left.tzinfo)} and {repr(right.tzinfo)}")
             raise ValueError(msg)
         self.left = left
         self.right = right
@@ -379,7 +379,7 @@ cdef class Interval(IntervalMixin):
 
         left, right = self._repr_base()
         name = type(self).__name__
-        repr_str = f'{name}({left!r}, {right!r}, closed={self.closed!r})'
+        repr_str = f'{name}({repr(left)}, {repr(right)}, closed={repr(self.closed)})'
         return repr_str
 
     def __str__(self) -> str:

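The reworked Interval.__repr__ should produce the same text as the old !r version, and the time-zone check should still raise ValueError. This is a minimal sketch that assumes pandas is installed. It relies only on the exception type, because the message wording has varied between releases.

```python
import pandas as pd

# repr() inside the f-string quotes the closed side exactly as !r did.
right_closed = pd.Interval(0, 1)
left_closed = pd.Interval(0.5, 1.5, closed="left")
print(repr(right_closed))
print(repr(left_closed))

# Endpoints in different time zones are rejected (GH 18538).
tz_error = None
try:
    pd.Interval(pd.Timestamp("2020-01-01", tz="UTC"),
                pd.Timestamp("2020-01-02", tz="US/Eastern"))
except ValueError as err:
    tz_error = str(err)
print(tz_error)
```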
pandas/_libs/lib.pyx

Lines changed: 1 addition & 1 deletion
@@ -1313,7 +1313,7 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
 
     elif isinstance(val, str):
         if is_string_array(values, skipna=skipna):
-            return 'string'
+            return "string"
 
     elif isinstance(val, bytes):
         if is_bytes_array(values, skipna=skipna):

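The branch touched here is what makes infer_dtype report "string". This is a short check that assumes pandas is installed and uses the public pandas.api.types.infer_dtype entry point. The skipna=False case follows that function's documented behavior for a missing value mixed into strings.

```python
import numpy as np
from pandas.api.types import infer_dtype

# Every value is a str, so the string branch returns "string".
all_strings = infer_dtype(["a", "b"])
# skipna=True (the default) ignores the missing value during inference.
with_missing = infer_dtype(["a", np.nan, "b"])
# With skipna=False the NaN counts, so the values are no longer all strings.
not_skipped = infer_dtype(["a", np.nan, "b"], skipna=False)
print(all_strings, with_missing, not_skipped)
```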
pandas/_libs/tslibs/offsets.pyx

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 import cython
 
 import time
+from typing import Any
 from cpython.datetime cimport (PyDateTime_IMPORT,
                                PyDateTime_Check,
                                PyDelta_Check,
@@ -328,7 +329,7 @@ class _BaseOffset:
     def __setattr__(self, name, value):
         raise AttributeError("DateOffset objects are immutable.")
 
-    def __eq__(self, other) -> bool:
+    def __eq__(self, other: Any) -> bool:
         if isinstance(other, str):
             try:
                 # GH#23524 if to_offset fails, we are dealing with an

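Because of the str branch of __eq__ (GH#23524), you can compare an offset directly with a frequency alias. This is a sketch that assumes pandas is installed. Day and the "D" alias are used only as a convenient example.

```python
import pandas as pd

day = pd.offsets.Day()

# A valid alias is converted with to_offset and then compared.
matches_alias = day == "D"
# An unparseable string makes to_offset fail, so == returns False instead of raising.
matches_garbage = day == "not-a-frequency"
print(matches_alias, matches_garbage)
```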
pandas/conftest.py

Lines changed: 2 additions & 2 deletions
@@ -88,7 +88,7 @@ def spmatrix(request):
     return getattr(sparse, request.param + "_matrix")
 
 
-@pytest.fixture(params=[0, 1, "index", "columns"], ids=lambda x: "axis {!r}".format(x))
+@pytest.fixture(params=[0, 1, "index", "columns"], ids=lambda x: f"axis {repr(x)}")
 def axis(request):
     """
     Fixture for returning the axis numbers of a DataFrame.
@@ -99,7 +99,7 @@ def axis(request):
 axis_frame = axis
 
 
-@pytest.fixture(params=[0, "index"], ids=lambda x: "axis {!r}".format(x))
+@pytest.fixture(params=[0, "index"], ids=lambda x: f"axis {repr(x)}")
 def axis_series(request):
     """
     Fixture for returning the axis numbers of a Series.

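repr() and the !r conversion produce the same text, so this change should leave the generated test ids unchanged. This is a pure-Python sketch. axis_id is a hypothetical stand-in for the lambda passed as ids=.

```python
PARAMS = [0, 1, "index", "columns"]


def axis_id(x):
    # Same expression as the new ids= lambda: repr() quotes strings, not ints.
    return f"axis {repr(x)}"


new_ids = [axis_id(p) for p in PARAMS]
# The pre-commit str.format spelling, kept for comparison.
old_ids = ["axis {!r}".format(p) for p in PARAMS]
print(new_ids)
```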
pandas/core/accessor.py

Lines changed: 3 additions & 3 deletions
@@ -183,9 +183,9 @@ def _register_accessor(name, cls):
     def decorator(accessor):
         if hasattr(cls, name):
             warnings.warn(
-                "registration of accessor {!r} under name {!r} for type "
-                "{!r} is overriding a preexisting attribute with the same "
-                "name.".format(accessor, name, cls),
+                f"registration of accessor {repr(accessor)} under name "
+                f"{repr(name)} for type {repr(cls)} is overriding a preexisting "
+                "attribute with the same name.",
                 UserWarning,
                 stacklevel=2,
             )

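The warning text is built from adjacent string literals. Python joins them with no separator, so every fragment except the last must end in a space. This is a minimal sketch. override_message is a hypothetical helper that only rebuilds the message; it does not reproduce pandas' registration logic.

```python
def override_message(accessor, name, cls):
    # Adjacent literals are concatenated at compile time with no separator,
    # so the trailing spaces inside each fragment are significant.
    return (
        f"registration of accessor {repr(accessor)} under name "
        f"{repr(name)} for type {repr(cls)} is overriding a preexisting "
        "attribute with the same name."
    )


msg = override_message("GeoAccessor", "geo", "DataFrame")
print(msg)
```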
pandas/core/algorithms.py

Lines changed: 2 additions & 4 deletions
@@ -1194,10 +1194,8 @@ def compute(self, method):
             dtype = frame[column].dtype
             if not self.is_valid_dtype_n_method(dtype):
                 raise TypeError(
-                    (
-                        "Column {column!r} has dtype {dtype}, cannot use method "
-                        "{method!r} with this dtype"
-                    ).format(column=column, dtype=dtype, method=method)
+                    f"Column {repr(column)} has dtype {dtype}, "
+                    f"cannot use method {repr(method)} with this dtype"
                 )
 
         def get_indexer(current_indexer, other_indexer):

pandas/core/arrays/base.py

Lines changed: 7 additions & 6 deletions
@@ -451,7 +451,9 @@ def _values_for_argsort(self) -> np.ndarray:
         # Note: this is used in `ExtensionArray.argsort`.
         return np.array(self)
 
-    def argsort(self, ascending=True, kind="quicksort", *args, **kwargs):
+    def argsort(
+        self, ascending: bool = True, kind: str = "quicksort", *args, **kwargs
+    ) -> np.ndarray:
         """
         Return the indices that would sort this array.
 
@@ -467,7 +469,7 @@ def argsort(self, ascending=True, kind="quicksort", *args, **kwargs):
 
         Returns
         -------
-        index_array : ndarray
+        ndarray
             Array of indices that sort ``self``. If NaN values are contained,
             NaN values are placed at the end.
 
@@ -1198,10 +1200,9 @@ def _maybe_convert(arr):
 
             if op.__name__ in {"divmod", "rdivmod"}:
                 a, b = zip(*res)
-                res = _maybe_convert(a), _maybe_convert(b)
-            else:
-                res = _maybe_convert(res)
-            return res
+                return _maybe_convert(a), _maybe_convert(b)
+
+            return _maybe_convert(res)
 
         op_name = ops._get_op_name(op, True)
         return set_function_name(_binop, op_name, cls)

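You can illustrate the early-return shape of the divmod branch without pandas. This is a pure-Python sketch. split_results is hypothetical, and list() stands in for the _maybe_convert helper.

```python
def split_results(op_name, res):
    """Mimic the tail of _binop: divmod-style results are split into two sequences."""
    convert = list  # hypothetical stand-in for _maybe_convert
    if op_name in {"divmod", "rdivmod"}:
        # Each element is a (quotient, remainder) pair; zip(*...) transposes them.
        a, b = zip(*res)
        return convert(a), convert(b)

    return convert(res)


lefts, rights = [7, 9, 10], [2, 4, 5]
pairs = [divmod(x, y) for x, y in zip(lefts, rights)]
sums = [x + y for x, y in zip(lefts, rights)]
print(split_results("divmod", pairs))
print(split_results("add", sums))
```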
pandas/core/arrays/categorical.py

Lines changed: 23 additions & 21 deletions
@@ -1632,7 +1632,7 @@ def sort_values(self, inplace=False, ascending=True, na_position="last"):
         """
         inplace = validate_bool_kwarg(inplace, "inplace")
         if na_position not in ["last", "first"]:
-            raise ValueError(f"invalid na_position: {na_position!r}")
+            raise ValueError(f"invalid na_position: {repr(na_position)}")
 
         sorted_idx = nargsort(self, ascending=ascending, na_position=na_position)
 
@@ -1769,8 +1769,8 @@ def fillna(self, value=None, method=None, limit=None):
 
         else:
             raise TypeError(
-                '"value" parameter must be a scalar, dict '
-                f'or Series, but you passed a {type(value).__name__!r}"'
+                f"'value' parameter must be a scalar, dict "
+                f"or Series, but you passed a {type(value).__name__}"
             )
 
         return self._constructor(codes, dtype=self.dtype, fastpath=True)
@@ -2123,7 +2123,8 @@ def _reduce(self, name, axis=0, **kwargs):
             raise TypeError(f"Categorical cannot perform the operation {name}")
         return func(**kwargs)
 
-    def min(self, numeric_only=None, **kwargs):
+    @deprecate_kwarg(old_arg_name="numeric_only", new_arg_name="skipna")
+    def min(self, skipna=True):
         """
         The minimum value of the object.
 
@@ -2139,17 +2140,18 @@ def min(self, numeric_only=None, **kwargs):
         min : the minimum of this `Categorical`
         """
         self.check_for_ordered("min")
-        if numeric_only:
-            good = self._codes != -1
-            pointer = self._codes[good].min(**kwargs)
-        else:
-            pointer = self._codes.min(**kwargs)
-        if pointer == -1:
-            return np.nan
+        good = self._codes != -1
+        if not good.all():
+            if skipna:
+                pointer = self._codes[good].min()
+            else:
+                return np.nan
         else:
-            return self.categories[pointer]
+            pointer = self._codes.min()
+        return self.categories[pointer]
 
-    def max(self, numeric_only=None, **kwargs):
+    @deprecate_kwarg(old_arg_name="numeric_only", new_arg_name="skipna")
+    def max(self, skipna=True):
         """
         The maximum value of the object.
 
@@ -2165,15 +2167,15 @@ def max(self, numeric_only=None, **kwargs):
         max : the maximum of this `Categorical`
         """
         self.check_for_ordered("max")
-        if numeric_only:
-            good = self._codes != -1
-            pointer = self._codes[good].max(**kwargs)
-        else:
-            pointer = self._codes.max(**kwargs)
-        if pointer == -1:
-            return np.nan
+        good = self._codes != -1
+        if not good.all():
+            if skipna:
+                pointer = self._codes[good].max()
+            else:
+                return np.nan
         else:
-            return self.categories[pointer]
+            pointer = self._codes.max()
+        return self.categories[pointer]
 
     def mode(self, dropna=True):
         """

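The new min/max bodies work on the integer codes. There, -1 marks a missing value, and in an ordered categorical a smaller code points to a smaller category. Below is a pure-Python sketch of the min branch. categorical_min is hypothetical. The all-missing guard is an added assumption: in the diff, self._codes[good].min() would run on an empty array in that case, and NumPy raises ValueError for that.

```python
import math


def categorical_min(codes, categories, skipna=True):
    """Sketch of Categorical.min after this change; -1 in codes marks a missing value."""
    present = [c for c in codes if c != -1]
    if len(present) != len(codes):  # at least one missing value
        if not skipna:
            return math.nan
        if not present:
            # Assumption: an all-missing input returns NaN instead of raising.
            return math.nan
        pointer = min(present)
    else:
        pointer = min(codes)
    # Ordered categorical: the smallest code maps to the smallest category.
    return categories[pointer]


cats = ["low", "mid", "high"]
print(categorical_min([2, 0, -1], cats))
print(categorical_min([2, 0, -1], cats, skipna=False))
```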
pandas/core/arrays/numpy_.py

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ def __init__(self, dtype):
         self._type = dtype.type
 
     def __repr__(self) -> str:
-        return "PandasDtype({!r})".format(self.name)
+        return f"PandasDtype({repr(self.name)})"
 
     @property
     def numpy_dtype(self):
