
Commit c75108d

Merge remote-tracking branch 'origin/multi-index-join' into multi-index-join
# Conflicts:
#   doc/source/whatsnew/v0.24.0.txt
#   pandas/core/reshape/merge.py
#   pandas/tests/reshape/merge/test_multi.py
2 parents 405c1a4 + f54c151 commit c75108d


72 files changed, +1813 -802 lines changed

ci/code_checks.sh

Lines changed: 8 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -44,6 +44,14 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
4444
flake8 pandas/_libs --filename=*.pxi.in,*.pxd --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403
4545
RET=$(($RET + $?)) ; echo $MSG "DONE"
4646

47+
# Check that cython casting is of the form `<type>obj` as opposed to `<type> obj`;
48+
# it doesn't make a difference, but we want to be internally consistent.
49+
# Note: this grep pattern is (intended to be) equivalent to the python
50+
# regex r'(?<![ ->])> '
51+
MSG='Linting .pyx code for spacing conventions in casting' ; echo $MSG
52+
! grep -r -E --include '*.pyx' --include '*.pxi.in' '> ' pandas/_libs | grep -v '[ ->]> '
53+
RET=$(($RET + $?)) ; echo $MSG "DONE"
54+
4755
# readability/casting: Warnings about C casting instead of C++ casting
4856
# runtime/int: Warnings about using C number types instead of C++ ones
4957
# build/include_subdir: Warnings about prefacing included header files with directory
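
The comment added in this hunk says the `grep` pipeline is intended to be equivalent to the Python regex `r'(?<![ ->])> '`. A minimal sketch of that regex applied to a few lines follows. The helper name `has_spaced_cast` and the sample strings are assumptions made for illustration, not part of the commit.

```python
import re

# Regex quoted in the code_checks.sh comment: a "> " that is not
# preceded by a character matched by the class [ ->].
SPACED_CAST = re.compile(r'(?<![ ->])> ')


def has_spaced_cast(line):
    """Return True if the line contains a cast written as `<type> obj`."""
    return SPACED_CAST.search(line) is not None


samples = [
    "cdef double NaN = <double> np.NaN",  # old style, flagged
    "cdef double NaN = <double>np.NaN",   # new style, not flagged
    "if x > y:",                          # comparison, not flagged
    "cdef f(x) -> int",                   # arrow, not flagged
]
flagged = [line for line in samples if has_spaced_cast(line)]
```

In Python's `re`, the sequence ` ->` inside a character class is read as a range from space to `>`, so the lookbehind also skips a `> ` that follows a digit or most ASCII punctuation, not only a space, `-`, or `>`.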

doc/source/conf.py

Lines changed: 8 additions & 4 deletions
@@ -99,7 +99,7 @@
9999
# JP: added from sphinxdocs
100100
autosummary_generate = False
101101

102-
if any(re.match("\s*api\s*", l) for l in index_rst_lines):
102+
if any(re.match(r"\s*api\s*", l) for l in index_rst_lines):
103103
autosummary_generate = True
104104

105105
# numpydoc
@@ -341,8 +341,8 @@
341341
# file, target name, title, author, documentclass [howto/manual]).
342342
latex_documents = [
343343
('index', 'pandas.tex',
344-
u'pandas: powerful Python data analysis toolkit',
345-
u'Wes McKinney\n\& PyData Development Team', 'manual'),
344+
'pandas: powerful Python data analysis toolkit',
345+
r'Wes McKinney\n\& PyData Development Team', 'manual'),
346346
]
347347

348348
# The name of an image file (relative to this directory) to place at the top of
@@ -569,7 +569,11 @@ def linkcode_resolve(domain, info):
569569
return None
570570

571571
try:
572-
fn = inspect.getsourcefile(obj)
572+
# inspect.unwrap() was added in Python version 3.4
573+
if sys.version_info >= (3, 5):
574+
fn = inspect.getsourcefile(inspect.unwrap(obj))
575+
else:
576+
fn = inspect.getsourcefile(obj)
573577
except:
574578
fn = None
575579
if not fn:
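
The last hunk above passes `obj` through `inspect.unwrap` before asking for its source file. A minimal sketch of what `inspect.unwrap` does for a decorated function follows. The decorator `traced` and the function `original` are hypothetical names used only for illustration.

```python
import functools
import inspect


def traced(func):
    """Hypothetical pass-through decorator used only for this illustration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def original(x):
    return x + 1


decorated = traced(original)

# functools.wraps records the wrapped function as `__wrapped__`;
# inspect.unwrap follows that chain back to the innermost function.
unwrapped = inspect.unwrap(decorated)
```

Without the unwrap step, a source lookup on `decorated` would resolve to the file that defines the wrapper rather than the file that defines `original`, which is presumably why `linkcode_resolve` unwraps first.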

doc/source/contributing.rst

Lines changed: 52 additions & 0 deletions
@@ -612,6 +612,54 @@ Alternatively, you can install the ``grep`` and ``xargs`` commands via the
612612
`MinGW <http://www.mingw.org/>`__ toolchain, and it will allow you to run the
613613
commands above.
614614

615+
.. _contributing.import-formatting:
616+
617+
Import Formatting
618+
~~~~~~~~~~~~~~~~~
619+
*pandas* uses `isort <https://pypi.org/project/isort/>`__ to standardise import
620+
formatting across the codebase.
621+
622+
A guide to import layout as per PEP 8 can be found `here <https://www.python.org/dev/peps/pep-0008/#imports/>`__.
623+
624+
A summary of our current import sections (in order):
625+
626+
* Future
627+
* Python Standard Library
628+
* Third Party
629+
* ``pandas._libs``, ``pandas.compat``, ``pandas.util._*``, ``pandas.errors`` (largely not dependent on ``pandas.core``)
630+
* ``pandas.core.dtypes`` (largely not dependent on the rest of ``pandas.core``)
631+
* Rest of ``pandas.core.*``
632+
* Non-core ``pandas.io``, ``pandas.plotting``, ``pandas.tseries``
633+
* Local application/library specific imports
634+
635+
Imports are alphabetically sorted within these sections.
636+
637+
638+
As part of :ref:`Continuous Integration <contributing.ci>` checks we run::
639+
640+
isort --recursive --check-only pandas
641+
642+
to check that imports are correctly formatted as per the ``setup.cfg``.
643+
644+
If you see output like the below in :ref:`Continuous Integration <contributing.ci>` checks:
645+
646+
.. code-block:: shell
647+
648+
Check import format using isort
649+
ERROR: /home/travis/build/pandas-dev/pandas/pandas/io/pytables.py Imports are incorrectly sorted
650+
Check import format using isort DONE
651+
The command "ci/code_checks.sh" exited with 1
652+
653+
You should run::
654+
655+
isort pandas/io/pytables.py
656+
657+
to automatically format imports correctly. This will modify your local copy of the files.
658+
659+
The ``--recursive`` flag can be passed to sort all files in a directory.
660+
661+
You can then verify the changes look ok, then git :ref:`commit <contributing.commit-code>` and :ref:`push <contributing.push-code>`.
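
The section order listed above for pandas-internal imports can be sketched as a sort key. This is only an illustration under stated assumptions: `PANDAS_SECTIONS` and `section_rank` are hypothetical names, the prefix matching is a simplification, and none of it reflects how isort or the project's ``setup.cfg`` is actually configured.

```python
# Prefixes for the pandas-internal sections, in the order given in the list above.
# pandas.core.dtypes must come before pandas.core so the narrower prefix wins.
PANDAS_SECTIONS = [
    ("pandas._libs", "pandas.compat", "pandas.util._", "pandas.errors"),
    ("pandas.core.dtypes",),
    ("pandas.core",),
    ("pandas.io", "pandas.plotting", "pandas.tseries"),
]


def section_rank(module):
    """Return the index of the first section whose prefix matches the module."""
    for rank, prefixes in enumerate(PANDAS_SECTIONS):
        if module.startswith(prefixes):
            return rank
    # Anything else is treated as a local application import.
    return len(PANDAS_SECTIONS)


modules = [
    "pandas.io.common",
    "pandas.core.frame",
    "pandas.core.dtypes.common",
    "pandas._libs.lib",
    "pandas.compat",
]

# Sort by section first, then alphabetically within each section.
ordered = sorted(modules, key=lambda name: (section_rank(name), name))
```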
662+
615663
Backwards Compatibility
616664
~~~~~~~~~~~~~~~~~~~~~~~
617665

@@ -1078,6 +1126,8 @@ or a new keyword argument (`example <https://github.com/pandas-dev/pandas/blob/v
10781126
Contributing your changes to *pandas*
10791127
=====================================
10801128

1129+
.. _contributing.commit-code:
1130+
10811131
Committing your code
10821132
--------------------
10831133

@@ -1122,6 +1172,8 @@ Now you can commit your changes in your local repository::
11221172

11231173
git commit -m
11241174

1175+
.. _contributing.push-code:
1176+
11251177
Pushing your changes
11261178
--------------------
11271179

doc/source/cookbook.rst

Lines changed: 11 additions & 0 deletions
@@ -1226,6 +1226,17 @@ Computation
12261226
Correlation
12271227
***********
12281228

1229+
Often it's useful to obtain the lower (or upper) triangular form of a correlation matrix calculated from :func:`DataFrame.corr`. This can be achieved by passing a boolean mask to ``where`` as follows:
1230+
1231+
.. ipython:: python
1232+
1233+
df = pd.DataFrame(np.random.random(size=(100, 5)))
1234+
1235+
corr_mat = df.corr()
1236+
mask = np.tril(np.ones_like(corr_mat, dtype=bool), k=-1)
1237+
1238+
corr_mat.where(mask)
1239+
12291240
The `method` argument within `DataFrame.corr` can accept a callable in addition to the named correlation types. Here we compute the `distance correlation <https://en.wikipedia.org/wiki/Distance_correlation>`__ matrix for a `DataFrame` object.
12301241

12311242
.. code-block:: python

doc/source/groupby.rst

Lines changed: 10 additions & 0 deletions
@@ -125,6 +125,16 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
125125
grouped = df.groupby('A')
126126
grouped = df.groupby(['A', 'B'])
127127
128+
.. versionadded:: 0.24
129+
130+
If we also have a MultiIndex on columns ``A`` and ``B``, we can group by all
131+
but the specified columns:
132+
133+
.. ipython:: python
134+
135+
df2 = df.set_index(['A', 'B'])
136+
grouped = df2.groupby(level=df2.index.names.difference(['B']))
137+
128138
These will split the DataFrame on its index (rows). We could also split by the
129139
columns:
130140

doc/source/whatsnew/v0.24.0.txt

Lines changed: 9 additions & 4 deletions
@@ -13,10 +13,9 @@ v0.24.0 (Month XX, 2018)
1313
New features
1414
~~~~~~~~~~~~
1515
- :func:`merge` now directly allows merge between objects of type ``DataFrame`` and named ``Series``, without the need to convert the ``Series`` object into a ``DataFrame`` beforehand (:issue:`21220`)
16-
17-
1816
- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)
19-
17+
- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby operations that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups
18+
<groupby.split>` for more information (:issue:`15475`, :issue:`15506`)
2019
- :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing
2120
the user to override the engine's default behavior to include or omit the
2221
dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
@@ -219,7 +218,8 @@ For earlier versions this can be done using the following.
219218
.. ipython:: python
220219

221220
pd.merge(left.reset_index(), right.reset_index(),
222-
on=['key'], how='inner').set_index(['key','X','Y'])
221+
on=['key'], how='inner').set_index(['key', 'X', 'Y'])
222+
223223
.. _whatsnew_0240.enhancements.rename_axis:
224224

225225
Renaming names in a MultiIndex
@@ -267,6 +267,7 @@ Other Enhancements
267267
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
268268
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
269269
- :meth:`round`, :meth:`ceil`, and :meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
270+
- :meth:`round`, :meth:`ceil`, and :meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexsistent` (:issue:`22647`)
270271
- :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`).
271272
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` method (:issue:`15023`).
272273
- :meth:`pandas.core.dtypes.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
@@ -1100,6 +1101,7 @@ Performance Improvements
11001101
- Improved the performance of :func:`pandas.get_dummies` with ``sparse=True`` (:issue:`21997`)
11011102
- Improved performance of :func:`IndexEngine.get_indexer_non_unique` for sorted, non-unique indexes (:issue:`9466`)
11021103
- Improved performance of :func:`PeriodIndex.unique` (:issue:`23083`)
1104+
- Improved performance of :func:`pd.concat` for `Series` objects (:issue:`23404`)
11031105

11041106

11051107
.. _whatsnew_0240.docs:
@@ -1189,6 +1191,7 @@ Timezones
11891191
- Bug in :meth:`DatetimeIndex.unique` that did not re-localize tz-aware dates correctly (:issue:`21737`)
11901192
- Bug when indexing a :class:`Series` with a DST transition (:issue:`21846`)
11911193
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` where an ``AmbiguousTimeError`` or ``NonExistentTimeError`` would raise if a timezone aware timeseries ended on a DST transition (:issue:`19375`, :issue:`10117`)
1194+
- Bug in :meth:`DataFrame.drop` and :meth:`Series.drop` when specifying a tz-aware Timestamp key to drop from a :class:`DatetimeIndex` with a DST transition (:issue:`21761`)
11921195

11931196
Offsets
11941197
^^^^^^^
@@ -1236,6 +1239,7 @@ Indexing
12361239
- The traceback from a ``KeyError`` when asking ``.loc`` for a single missing label is now shorter and clearer (:issue:`21557`)
12371240
- When ``.ix`` is asked for a missing integer label in a :class:`MultiIndex` with a first level of integer type, it now raises a ``KeyError``, consistently with the case of a flat :class:`Int64Index`, rather than falling back to positional indexing (:issue:`21593`)
12381241
- Bug in :meth:`DatetimeIndex.reindex` when reindexing a tz-naive and tz-aware :class:`DatetimeIndex` (:issue:`8306`)
1242+
- Bug in :meth:`Series.reindex` when reindexing an empty series with a ``datetime64[ns, tz]`` dtype (:issue:`20869`)
12391243
- Bug in :class:`DataFrame` when setting values with ``.loc`` and a timezone aware :class:`DatetimeIndex` (:issue:`11365`)
12401244
- ``DataFrame.__getitem__`` now accepts dictionaries and dictionary keys as list-likes of labels, consistently with ``Series.__getitem__`` (:issue:`21294`)
12411245
- Fixed ``DataFrame[np.nan]`` when columns are non-unique (:issue:`21428`)
@@ -1312,6 +1316,7 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
13121316
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
13131317
- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
13141318
- Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
1319+
- Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
13151320

13161321
Plotting
13171322
^^^^^^^^

pandas/_libs/algos.pyx

Lines changed: 8 additions & 6 deletions
@@ -32,7 +32,7 @@ import missing
3232

3333
cdef float64_t FP_ERR = 1e-13
3434

35-
cdef double NaN = <double> np.NaN
35+
cdef double NaN = <double>np.NaN
3636
cdef double nan = NaN
3737

3838
cdef int64_t iNaT = get_nat()
@@ -77,6 +77,8 @@ class NegInfinity(object):
7777
__ge__ = lambda self, other: isinstance(other, NegInfinity)
7878

7979

80+
@cython.wraparound(False)
81+
@cython.boundscheck(False)
8082
cpdef ndarray[int64_t, ndim=1] unique_deltas(ndarray[int64_t] arr):
8183
"""
8284
Efficiently find the unique first-differences of the given array.
@@ -240,7 +242,7 @@ def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
240242
int64_t nobs = 0
241243
float64_t vx, vy, sumx, sumy, sumxx, sumyy, meanx, meany, divisor
242244

243-
N, K = (<object> mat).shape
245+
N, K = (<object>mat).shape
244246

245247
if minp is None:
246248
minpv = 1
@@ -305,7 +307,7 @@ def nancorr_spearman(ndarray[float64_t, ndim=2] mat, Py_ssize_t minp=1):
305307
int64_t nobs = 0
306308
float64_t vx, vy, sumx, sumxx, sumyy, mean, divisor
307309

308-
N, K = (<object> mat).shape
310+
N, K = (<object>mat).shape
309311

310312
result = np.empty((K, K), dtype=np.float64)
311313
mask = np.isfinite(mat).view(np.uint8)
@@ -529,7 +531,7 @@ def pad_2d_inplace(ndarray[algos_t, ndim=2] values,
529531
algos_t val
530532
int lim, fill_count = 0
531533

532-
K, N = (<object> values).shape
534+
K, N = (<object>values).shape
533535

534536
# GH#2778
535537
if N == 0:
@@ -728,7 +730,7 @@ def backfill_2d_inplace(ndarray[algos_t, ndim=2] values,
728730
algos_t val
729731
int lim, fill_count = 0
730732

731-
K, N = (<object> values).shape
733+
K, N = (<object>values).shape
732734

733735
# GH#2778
734736
if N == 0:
@@ -793,7 +795,7 @@ arrmap_bool = arrmap["uint8_t"]
793795

794796
@cython.boundscheck(False)
795797
@cython.wraparound(False)
796-
def is_monotonic(ndarray[algos_t] arr, bint timelike):
798+
def is_monotonic(ndarray[algos_t, ndim=1] arr, bint timelike):
797799
"""
798800
Returns
799801
-------

pandas/_libs/algos_common_helper.pxi.in

Lines changed: 5 additions & 15 deletions
@@ -1,16 +1,6 @@
11
"""
22
Template for each `dtype` helper function using 1-d template
33

4-
# 1-d template
5-
- pad
6-
- pad_1d
7-
- pad_2d
8-
- backfill
9-
- backfill_1d
10-
- backfill_2d
11-
- is_monotonic
12-
- arrmap
13-
144
WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
155
"""
166

@@ -44,7 +34,7 @@ def diff_2d_{{name}}(ndarray[{{c_type}}, ndim=2] arr,
4434
cdef:
4535
Py_ssize_t i, j, sx, sy
4636

47-
sx, sy = (<object> arr).shape
37+
sx, sy = (<object>arr).shape
4838
if arr.flags.f_contiguous:
4939
if axis == 0:
5040
if periods >= 0:
@@ -98,14 +88,14 @@ def put2d_{{name}}_{{dest_name}}(ndarray[{{c_type}}, ndim=2, cast=True] values,
9888
# ensure_dtype
9989
#----------------------------------------------------------------------
10090

101-
cdef int PLATFORM_INT = (<ndarray> np.arange(0, dtype=np.intp)).descr.type_num
91+
cdef int PLATFORM_INT = (<ndarray>np.arange(0, dtype=np.intp)).descr.type_num
10292

10393

10494
def ensure_platform_int(object arr):
10595
# GH3033, GH1392
10696
# platform int is the size of the int pointer, e.g. np.intp
10797
if util.is_array(arr):
108-
if (<ndarray> arr).descr.type_num == PLATFORM_INT:
98+
if (<ndarray>arr).descr.type_num == PLATFORM_INT:
10999
return arr
110100
else:
111101
return arr.astype(np.intp)
@@ -115,7 +105,7 @@ def ensure_platform_int(object arr):
115105

116106
def ensure_object(object arr):
117107
if util.is_array(arr):
118-
if (<ndarray> arr).descr.type_num == NPY_OBJECT:
108+
if (<ndarray>arr).descr.type_num == NPY_OBJECT:
119109
return arr
120110
else:
121111
return arr.astype(np.object_)
@@ -152,7 +142,7 @@ def get_dispatch(dtypes):
152142

153143
def ensure_{{name}}(object arr, copy=True):
154144
if util.is_array(arr):
155-
if (<ndarray> arr).descr.type_num == NPY_{{c_type}}:
145+
if (<ndarray>arr).descr.type_num == NPY_{{c_type}}:
156146
return arr
157147
else:
158148
return arr.astype(np.{{dtype}}, copy=copy)

pandas/_libs/algos_rank_helper.pxi.in

Lines changed: 1 addition & 1 deletion
@@ -263,7 +263,7 @@ def rank_2d_{{dtype}}(object in_arr, axis=0, ties_method='average',
263263
np.putmask(values, mask, nan_value)
264264
{{endif}}
265265

266-
n, k = (<object> values).shape
266+
n, k = (<object>values).shape
267267
ranks = np.empty((n, k), dtype='f8')
268268

269269
{{if dtype == 'object'}}

pandas/_libs/algos_take_helper.pxi.in

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -278,7 +278,7 @@ cdef _take_2d(ndarray[take_t, ndim=2] values, object idx):
278278
ndarray[take_t, ndim=2] result
279279
object val
280280

281-
N, K = (<object> values).shape
281+
N, K = (<object>values).shape
282282

283283
if take_t is object:
284284
# evaluated at compile-time
