Commit 96ccda8

Merge pull request #35 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 72bf4ef + 808004a commit 96ccda8

71 files changed, +1135 -806 lines changed

ci/deps/travis-37.yaml

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,6 @@ name: pandas-dev
22
channels:
33
- defaults
44
- conda-forge
5-
- c3i_test
65
dependencies:
76
- python=3.7.*
87

doc/source/getting_started/basics.rst

Lines changed: 1 addition & 1 deletion
@@ -1973,7 +1973,7 @@ Pandas has two ways to store strings.
19731973
1. ``object`` dtype, which can hold any Python object, including strings.
19741974
2. :class:`StringDtype`, which is dedicated to strings.
19751975

1976-
Generally, we recommend using :class:`StringDtype`. See :ref:`text.types` fore more.
1976+
Generally, we recommend using :class:`StringDtype`. See :ref:`text.types` for more.
19771977

19781978
Finally, arbitrary objects may be stored using the ``object`` dtype, but should
19791979
be avoided to the extent possible (for performance and interoperability with

doc/source/user_guide/visualization.rst

Lines changed: 43 additions & 0 deletions
@@ -1641,3 +1641,46 @@ when plotting a large number of points.
16411641
:suppress:
16421642
16431643
plt.close('all')
1644+
1645+
Plotting backends
1646+
-----------------
1647+
1648+
Starting in version 0.25, pandas can be extended with third-party plotting backends. The
1649+
main idea is letting users select a plotting backend different than the provided
1650+
one based on Matplotlib.
1651+
1652+
This can be done by passing 'backend.module' as the argument ``backend`` in ``plot``
1653+
function. For example:
1654+
1655+
.. code-block:: python
1656+
1657+
>>> Series([1, 2, 3]).plot(backend='backend.module')
1658+
1659+
Alternatively, you can also set this option globally, so you don't need to specify
1660+
the keyword in each ``plot`` call. For example:
1661+
1662+
.. code-block:: python
1663+
1664+
>>> pd.set_option('plotting.backend', 'backend.module')
1665+
>>> pd.Series([1, 2, 3]).plot()
1666+
1667+
Or:
1668+
1669+
.. code-block:: python
1670+
1671+
>>> pd.options.plotting.backend = 'backend.module'
1672+
>>> pd.Series([1, 2, 3]).plot()
1673+
1674+
This would be more or less equivalent to:
1675+
1676+
.. code-block:: python
1677+
1678+
>>> import backend.module
1679+
>>> backend.module.plot(pd.Series([1, 2, 3]))
1680+
1681+
The backend module can then use other visualization tools (Bokeh, Altair, hvplot,...)
1682+
to generate the plots. Some libraries implementing a backend for pandas are listed
1683+
on the ecosystem :ref:`ecosystem.visualization` page.
1684+
1685+
The developer guide can be found at
1686+
https://dev.pandas.io/docs/development/extending.html#plotting-backends
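
The "more or less equivalent" dispatch described in the added section can be illustrated without pandas. The following is a minimal sketch, not pandas' actual implementation: `load_backend`, `plot_with`, and the in-memory `demo_backend` module are hypothetical names introduced here. It assumes only what the text states, namely that a backend is an importable module exposing a top-level `plot` function.

```python
import importlib
import sys
import types


def load_backend(name):
    """Import a backend module by name and check that it exposes ``plot``."""
    module = importlib.import_module(name)
    if not hasattr(module, "plot"):
        raise ValueError(f"{name!r} does not define a top-level plot function")
    return module


def plot_with(data, backend, **kwargs):
    """Forward the data and keyword arguments to the backend's ``plot``."""
    return load_backend(backend).plot(data, **kwargs)


# Hypothetical backend, registered in memory so the sketch is self-contained.
demo_backend = types.ModuleType("demo_backend")


def _demo_plot(data, kind="line", **kwargs):
    # A real backend would draw with Bokeh, Altair, etc.; this one
    # only describes what it was asked to plot.
    return f"{kind} plot of {list(data)}"


demo_backend.plot = _demo_plot
sys.modules["demo_backend"] = demo_backend

result = plot_with([1, 2, 3], backend="demo_backend", kind="bar")
```

Here `result` is the string returned by the demo backend. A module without a `plot` attribute is rejected with a `ValueError`.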

doc/source/whatsnew/index.rst

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ Version 1.0
2626

2727
v1.0.0
2828
v1.0.1
29+
v1.0.2
2930

3031
Version 0.25
3132
------------

doc/source/whatsnew/v1.0.1.rst

Lines changed: 16 additions & 2 deletions
@@ -1,7 +1,7 @@
11
.. _whatsnew_101:
22

3-
What's new in 1.0.1 (??)
4-
------------------------
3+
What's new in 1.0.1 (February 5, 2020)
4+
--------------------------------------
55

66
These are the changes in pandas 1.0.1. See :ref:`release` for a full changelog
77
including other versions of pandas.
@@ -19,13 +19,22 @@ Fixed regressions
1919
- Fixed regression when indexing a ``Series`` or ``DataFrame`` indexed by ``DatetimeIndex`` with a slice containing a :class:`datetime.date` (:issue:`31501`)
2020
- Fixed regression in ``DataFrame.__setitem__`` raising an ``AttributeError`` with a :class:`MultiIndex` and a non-monotonic indexer (:issue:`31449`)
2121
- Fixed regression in :class:`Series` multiplication when multiplying a numeric :class:`Series` with >10000 elements with a timedelta-like scalar (:issue:`31457`)
22+
- Fixed regression in ``.groupby().agg()`` raising an ``AssertionError`` for some reductions like ``min`` on object-dtype columns (:issue:`31522`)
23+
- Fixed regression in ``.groupby()`` aggregations with categorical dtype using Cythonized reduction functions (e.g. ``first``) (:issue:`31450`)
2224
- Fixed regression in :meth:`GroupBy.apply` if called with a function which returned a non-pandas non-scalar object (e.g. a list or numpy array) (:issue:`31441`)
25+
- Fixed regression in :meth:`DataFrame.groupby` whereby taking the minimum or maximum of a column with period dtype would raise a ``TypeError``. (:issue:`31471`)
26+
- Fixed regression in :meth:`DataFrame.groupby` with an empty DataFrame grouping by a level of a MultiIndex (:issue:`31670`).
27+
- Fixed regression in :meth:`DataFrame.apply` with object dtype and non-reducing function (:issue:`31505`)
2328
- Fixed regression in :meth:`to_datetime` when parsing non-nanosecond resolution datetimes (:issue:`31491`)
2429
- Fixed regression in :meth:`~DataFrame.to_csv` where specifying an ``na_rep`` might truncate the values written (:issue:`31447`)
30+
- Fixed regression in :class:`Categorical` construction with ``numpy.str_`` categories (:issue:`31499`)
31+
- Fixed regression in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` when selecting a row containing a single ``datetime64`` or ``timedelta64`` column (:issue:`31649`)
2532
- Fixed regression where setting :attr:`pd.options.display.max_colwidth` was not accepting a negative integer. In addition, this behavior has been deprecated in favor of using ``None`` (:issue:`31532`)
2633
- Fixed regression in objTOJSON.c fix return-type warning (:issue:`31463`)
2734
- Fixed regression in :meth:`qcut` when passed a nullable integer. (:issue:`31389`)
2835
- Fixed regression in assigning to a :class:`Series` using a nullable integer dtype (:issue:`31446`)
36+
- Fixed performance regression when indexing a ``DataFrame`` or ``Series`` with a :class:`MultiIndex` for the index using a list of labels (:issue:`31648`)
37+
- Fixed regression in :meth:`read_csv` where the ``encoding`` option was not recognized when reading from a ``RawIOBase`` file-like object (:issue:`31575`)
2938

3039
.. ---------------------------------------------------------------------------
3140
@@ -56,10 +65,15 @@ Bug fixes
5665

5766
- Plotting tz-aware timeseries no longer gives UserWarning (:issue:`31205`)
5867

68+
**Interval**
69+
70+
- Bug in :meth:`Series.shift` with ``interval`` dtype raising a ``TypeError`` when shifting an interval array of integers or datetimes (:issue:`34195`)
5971

6072
.. ---------------------------------------------------------------------------
6173
6274
.. _whatsnew_101.contributors:
6375

6476
Contributors
6577
~~~~~~~~~~~~
78+
79+
.. contributors:: v1.0.0..v1.0.1|HEAD

doc/source/whatsnew/v1.0.2.rst

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
.. _whatsnew_102:
2+
3+
What's new in 1.0.2 (February ??, 2020)
4+
---------------------------------------
5+
6+
These are the changes in pandas 1.0.2. See :ref:`release` for a full changelog
7+
including other versions of pandas.
8+
9+
{{ header }}
10+
11+
.. ---------------------------------------------------------------------------
12+
13+
.. _whatsnew_102.regressions:
14+
15+
Fixed regressions
16+
~~~~~~~~~~~~~~~~~
17+
18+
-
19+
-
20+
21+
.. ---------------------------------------------------------------------------
22+
23+
.. _whatsnew_102.bug_fixes:
24+
25+
Bug fixes
26+
~~~~~~~~~
27+
28+
-
29+
-
30+
31+
.. ---------------------------------------------------------------------------
32+
33+
.. _whatsnew_102.contributors:
34+
35+
Contributors
36+
~~~~~~~~~~~~
37+
38+
.. contributors:: v1.0.1..v1.0.2|HEAD

doc/source/whatsnew/v1.1.0.rst

Lines changed: 4 additions & 1 deletion
@@ -156,6 +156,7 @@ Indexing
156156
- Bug in :meth:`Series.at` and :meth:`DataFrame.at` not matching ``.loc`` behavior when looking up an integer in a :class:`Float64Index` (:issue:`31329`)
157157
- Bug in :meth:`PeriodIndex.is_monotonic` incorrectly returning ``True`` when containing leading ``NaT`` entries (:issue:`31437`)
158158
- Bug in :meth:`DatetimeIndex.get_loc` raising ``KeyError`` with converted-integer key instead of the user-passed key (:issue:`31425`)
159+
- Bug in :meth:`Series.xs` incorrectly returning ``Timestamp`` instead of ``datetime64`` in some object-dtype cases (:issue:`31630`)
159160

160161
Missing
161162
^^^^^^^
@@ -180,7 +181,9 @@ I/O
180181
- Bug in :meth:`read_json` where integer overflow was occurring when the JSON contains big number strings. (:issue:`30320`)
181182
- `read_csv` will now raise a ``ValueError`` when the arguments `header` and `prefix` both are not `None`. (:issue:`27394`)
182183
- Bug in :meth:`DataFrame.to_json` was raising ``NotFoundError`` when ``path_or_buf`` was an S3 URI (:issue:`28375`)
183-
-
184+
- Bug in :meth:`DataFrame.to_parquet` overwriting pyarrow's default for
185+
``coerce_timestamps``; following pyarrow's default allows writing nanosecond
186+
timestamps with ``version="2.0"`` (:issue:`31652`).
184187

185188
Plotting
186189
^^^^^^^^

doc/sphinxext/announce.py

Lines changed: 10 additions & 0 deletions
@@ -57,6 +57,16 @@ def get_authors(revision_range):
5757
pat = "^.*\\t(.*)$"
5858
lst_release, cur_release = [r.strip() for r in revision_range.split("..")]
5959

60+
if "|" in cur_release:
61+
# e.g. v1.0.1|HEAD
62+
maybe_tag, head = cur_release.split("|")
63+
assert head == "HEAD"
64+
if maybe_tag in this_repo.tags:
65+
cur_release = maybe_tag
66+
else:
67+
cur_release = head
68+
revision_range = f"{lst_release}..{cur_release}"
69+
6070
# authors, in current release and previous to current release.
6171
cur = set(re.findall(pat, this_repo.git.shortlog("-s", revision_range), re.M))
6272
pre = set(re.findall(pat, this_repo.git.shortlog("-s", lst_release), re.M))
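
The `|HEAD` fallback added in this hunk can be exercised in isolation. The sketch below mirrors the added lines but is a standalone illustration: `resolve_revision_range` and its `known_tags` parameter are hypothetical stand-ins for the inline code and `this_repo.tags`, and the `assert` is replaced by an explicit `ValueError`.

```python
def resolve_revision_range(revision_range, known_tags):
    """Resolve a range such as 'v1.0.0..v1.0.1|HEAD' against known tags.

    If the part before '|' is an existing tag, it ends the range;
    otherwise the range ends at HEAD.
    """
    lst_release, cur_release = [r.strip() for r in revision_range.split("..")]

    if "|" in cur_release:
        # e.g. v1.0.1|HEAD
        maybe_tag, head = cur_release.split("|")
        if head != "HEAD":
            raise ValueError(f"expected 'HEAD' after '|', got {head!r}")
        cur_release = maybe_tag if maybe_tag in known_tags else head

    return f"{lst_release}..{cur_release}"


# Before the v1.0.1 tag exists, the range ends at HEAD.
before_tag = resolve_revision_range("v1.0.0..v1.0.1|HEAD", {"v1.0.0"})
# Once the tag exists, it is used instead.
after_tag = resolve_revision_range("v1.0.0..v1.0.1|HEAD", {"v1.0.0", "v1.0.1"})
```

This lets the same `.. contributors::` directive work during development and after the release is tagged.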

doc/sphinxext/contributors.py

Lines changed: 7 additions & 1 deletion
@@ -6,7 +6,13 @@
66
77
This will be replaced with a message indicating the number of
88
code contributors and commits, and then list each contributor
9-
individually.
9+
individually. For development versions (before a tag is available)
10+
use::
11+
12+
.. contributors:: v0.23.0..v0.23.1|HEAD
13+
14+
While the v0.23.1 tag does not exist, that will use the HEAD of the
15+
branch as the end of the revision range.
1016
"""
1117
from announce import build_components
1218
from docutils import nodes

pandas/_libs/hashtable_class_helper.pxi.in

Lines changed: 6 additions & 2 deletions
@@ -670,7 +670,9 @@ cdef class StringHashTable(HashTable):
670670
val = values[i]
671671

672672
if isinstance(val, str):
673-
v = get_c_string(val)
673+
# GH#31499 if we have a np.str_ get_c_string won't recognize
674+
# it as a str, even though isinstance does.
675+
v = get_c_string(<str>val)
674676
else:
675677
v = get_c_string(self.na_string_sentinel)
676678
vecs[i] = v
@@ -703,7 +705,9 @@ cdef class StringHashTable(HashTable):
703705
val = values[i]
704706

705707
if isinstance(val, str):
706-
v = get_c_string(val)
708+
# GH#31499 if we have a np.str_ get_c_string won't recognize
709+
# it as a str, even though isinstance does.
710+
v = get_c_string(<str>val)
707711
else:
708712
v = get_c_string(self.na_string_sentinel)
709713
vecs[i] = v

pandas/_libs/index.pyx

Lines changed: 0 additions & 55 deletions
@@ -535,61 +535,6 @@ cdef class PeriodEngine(Int64Engine):
535535
return super(PeriodEngine, self).get_indexer_non_unique(ordinal_array)
536536

537537

538-
cpdef convert_scalar(ndarray arr, object value):
539-
# we don't turn integers
540-
# into datetimes/timedeltas
541-
542-
# we don't turn bools into int/float/complex
543-
544-
if arr.descr.type_num == NPY_DATETIME:
545-
if util.is_array(value):
546-
pass
547-
elif isinstance(value, (datetime, np.datetime64, date)):
548-
return Timestamp(value).to_datetime64()
549-
elif util.is_timedelta64_object(value):
550-
# exclude np.timedelta64("NaT") from value != value below
551-
pass
552-
elif value is None or value != value:
553-
return np.datetime64("NaT", "ns")
554-
raise ValueError("cannot set a Timestamp with a non-timestamp "
555-
f"{type(value).__name__}")
556-
557-
elif arr.descr.type_num == NPY_TIMEDELTA:
558-
if util.is_array(value):
559-
pass
560-
elif isinstance(value, timedelta) or util.is_timedelta64_object(value):
561-
value = Timedelta(value)
562-
if value is NaT:
563-
return np.timedelta64("NaT", "ns")
564-
return value.to_timedelta64()
565-
elif util.is_datetime64_object(value):
566-
# exclude np.datetime64("NaT") which would otherwise be picked up
567-
# by the `value != value check below
568-
pass
569-
elif value is None or value != value:
570-
return np.timedelta64("NaT", "ns")
571-
raise ValueError("cannot set a Timedelta with a non-timedelta "
572-
f"{type(value).__name__}")
573-
574-
else:
575-
validate_numeric_casting(arr.dtype, value)
576-
577-
return value
578-
579-
580-
cpdef validate_numeric_casting(dtype, object value):
581-
# Note: we can't annotate dtype as cnp.dtype because that cases dtype.type
582-
# to integer
583-
if issubclass(dtype.type, (np.integer, np.bool_)):
584-
if util.is_float_object(value) and value != value:
585-
raise ValueError("Cannot assign nan to integer series")
586-
587-
if (issubclass(dtype.type, (np.integer, np.floating, np.complex)) and
588-
not issubclass(dtype.type, np.bool_)):
589-
if util.is_bool_object(value):
590-
raise ValueError("Cannot assign bool to float/integer series")
591-
592-
593538
cdef class BaseMultiIndexCodesEngine:
594539
"""
595540
Base class for MultiIndexUIntEngine and MultiIndexPyIntEngine, which

pandas/_libs/parsers.pyx

Lines changed: 1 addition & 1 deletion
@@ -638,7 +638,7 @@ cdef class TextReader:
638638
raise ValueError(f'Unrecognized compression type: '
639639
f'{self.compression}')
640640

641-
if self.encoding and isinstance(source, io.BufferedIOBase):
641+
if self.encoding and isinstance(source, (io.BufferedIOBase, io.RawIOBase)):
642642
source = io.TextIOWrapper(
643643
source, self.encoding.decode('utf-8'), newline='')
644644
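
The reason for widening the `isinstance` check can be seen with the standard library alone. A file opened in binary mode with `buffering=0` is an `io.FileIO`, which derives from `io.RawIOBase` and not from `io.BufferedIOBase`, so the old check skipped it. The sketch below is a simplified illustration: `wrap_binary_source` and `read_text_unbuffered` are hypothetical helpers, and the encoding is passed as a `str` rather than the `bytes` value the Cython code decodes.

```python
import io
import os
import tempfile


def wrap_binary_source(source, encoding):
    """Wrap buffered or raw binary streams in a text layer, as the patched check does."""
    if encoding and isinstance(source, (io.BufferedIOBase, io.RawIOBase)):
        return io.TextIOWrapper(source, encoding, newline="")
    return source


def read_text_unbuffered(data, encoding):
    """Write bytes to a temporary file, reopen it unbuffered, and decode it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        with open(path, "wb") as fh:
            fh.write(data)
        # buffering=0 yields an io.FileIO, i.e. a RawIOBase instance.
        raw = open(path, "rb", buffering=0)
        with wrap_binary_source(raw, encoding) as text:
            return text.read()


decoded = read_text_unbuffered("a,b\nñ,1\n".encode("latin-1"), "latin-1")
```

Sources that are already text, such as `io.StringIO`, pass through unchanged.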

pandas/_libs/reduction.pyx

Lines changed: 2 additions & 1 deletion
@@ -114,7 +114,8 @@ cdef class Reducer:
114114
if self.typ is not None:
115115
# In this case, we also have self.index
116116
name = labels[i]
117-
cached_typ = self.typ(chunk, index=self.index, name=name)
117+
cached_typ = self.typ(
118+
chunk, index=self.index, name=name, dtype=arr.dtype)
118119

119120
# use the cached_typ if possible
120121
if cached_typ is not None:

pandas/_libs/tslibs/resolution.pyx

Lines changed: 9 additions & 9 deletions
@@ -27,7 +27,7 @@ cdef:
2727

2828
# ----------------------------------------------------------------------
2929

30-
cpdef resolution(int64_t[:] stamps, tz=None):
30+
cpdef resolution(const int64_t[:] stamps, tz=None):
3131
cdef:
3232
Py_ssize_t i, n = len(stamps)
3333
npy_datetimestruct dts
@@ -38,7 +38,7 @@ cpdef resolution(int64_t[:] stamps, tz=None):
3838
return _reso_local(stamps, tz)
3939

4040

41-
cdef _reso_local(int64_t[:] stamps, object tz):
41+
cdef _reso_local(const int64_t[:] stamps, object tz):
4242
cdef:
4343
Py_ssize_t i, n = len(stamps)
4444
int reso = RESO_DAY, curr_reso
@@ -106,7 +106,7 @@ cdef inline int _reso_stamp(npy_datetimestruct *dts):
106106
return RESO_DAY
107107

108108

109-
def get_freq_group(freq):
109+
def get_freq_group(freq) -> int:
110110
"""
111111
Return frequency code group of given frequency str or offset.
112112

@@ -189,7 +189,7 @@ class Resolution:
189189
_freq_reso_map = {v: k for k, v in _reso_freq_map.items()}
190190

191191
@classmethod
192-
def get_str(cls, reso):
192+
def get_str(cls, reso: int) -> str:
193193
"""
194194
Return resolution str against resolution code.
195195

@@ -201,7 +201,7 @@ class Resolution:
201201
return cls._reso_str_map.get(reso, 'day')
202202

203203
@classmethod
204-
def get_reso(cls, resostr):
204+
def get_reso(cls, resostr: str) -> int:
205205
"""
206206
Return resolution str against resolution code.
207207

@@ -216,7 +216,7 @@ class Resolution:
216216
return cls._str_reso_map.get(resostr, cls.RESO_DAY)
217217

218218
@classmethod
219-
def get_freq_group(cls, resostr):
219+
def get_freq_group(cls, resostr: str) -> int:
220220
"""
221221
Return frequency str against resolution str.
222222

@@ -228,7 +228,7 @@ class Resolution:
228228
return get_freq_group(cls.get_freq(resostr))
229229

230230
@classmethod
231-
def get_freq(cls, resostr):
231+
def get_freq(cls, resostr: str) -> str:
232232
"""
233233
Return frequency str against resolution str.
234234

@@ -240,7 +240,7 @@ class Resolution:
240240
return cls._reso_freq_map[resostr]
241241

242242
@classmethod
243-
def get_str_from_freq(cls, freq):
243+
def get_str_from_freq(cls, freq: str) -> str:
244244
"""
245245
Return resolution str against frequency str.
246246

@@ -252,7 +252,7 @@ class Resolution:
252252
return cls._freq_reso_map.get(freq, 'day')
253253

254254
@classmethod
255-
def get_reso_from_freq(cls, freq):
255+
def get_reso_from_freq(cls, freq: str) -> int:
256256
"""
257257
Return resolution code against frequency str.
258258
