Commit 30cb09b

author
Albert Villanova del Moral
committed
Address requested changes
1 parent f41884e commit 30cb09b

5 files changed: +154 -66 lines changed

doc/source/whatsnew/v0.20.0.txt

Lines changed: 116 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -228,6 +228,7 @@ Other enhancements
228228
- ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`)
229229
- ``pd.types.concat.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
230230
- ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
231+
- ``Index.intersection()`` accepts parameter ``sort`` (:issue:`15582`)
231232

232233
.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
233234

@@ -567,52 +568,135 @@ New Behavior:
567568

568569
.. _whatsnew_0200.api_breaking.index_order:
569570

570-
Index order after DataFrame inner join or Index intersection
571-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
571+
Index order after inner join due to Index intersection
572+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
572573

573-
The ``DataFrame`` inner join and the ``Index`` intersection, now preserve the
574-
order of the calling's Index (left) instead of the other's Index (right)
575-
(:issue:`15582`)
574+
The ``Index.intersection`` now preserves the order of the calling Index (left)
575+
instead of the other Index (right) (:issue:`15582`). This affects the inner
576+
joins (methods ``Index.join``, ``DataFrame.join``, ``DataFrame.merge`` and
577+
``pd.merge``) and the alignments with inner join (methods ``Series.align`` and
578+
``DataFrame.align``).
576579

577-
Previous Behavior:
580+
- ``Index.intersection`` and ``Index.join``
578581

579-
.. code-block:: ipython
580-
In [2]: df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
582+
.. ipython:: python
581583

582-
In [3]: df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
584+
idx1 = pd.Index([2, 1, 0])
585+
idx1
586+
idx2 = pd.Index([1, 2, 3])
587+
idx2
583588

584-
In [4]: df1.join(df2, how='inner')
585-
Out[4]:
586-
a b
587-
1 10 100
588-
2 20 200
589+
Previous Behavior:
589590

590-
In [5]: idx1 = pd.Index([5, 3, 2, 4, 1])
591+
.. code-block:: ipython
591592

592-
In [6]: idx2 = pd.Index([4, 7, 6, 5, 3])
593+
In [4]: idx1.intersection(idx2)
594+
Out[4]: Int64Index([1, 2], dtype='int64')
593595

594-
In [7]: idx1.intersection(idx2)
595-
Out[7]: Int64Index([4, 5, 3], dtype='int64')
596+
In [5]: idx1.join(idx2, how='inner')
597+
Out[5]: Int64Index([1, 2], dtype='int64')
596598

597-
New Behavior:
599+
New Behavior:
598600

599-
.. code-block:: ipython
600-
In [2]: df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
601+
.. ipython:: python
602+
603+
idx1.intersection(idx2)
604+
605+
idx1.join(idx2, how='inner')
606+
607+
- ``Series.align``
608+
609+
.. ipython:: python
610+
611+
s1 = pd.Series([20, 10, 0], index=[2, 1, 0])
612+
s1
613+
s2 = pd.Series([100, 200, 300], index=[1, 2, 3])
614+
s2
615+
616+
Previous Behavior:
617+
618+
.. code-block:: ipython
601619

602-
In [3]: df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
620+
In [4]: (res1, res2) = s1.align(s2, join='inner')
621+
622+
In [5]: res1
623+
Out[5]:
624+
1 10
625+
2 20
626+
dtype: int64
627+
628+
In [6]: res2
629+
Out[6]:
630+
1 100
631+
2 200
632+
dtype: int64
633+
634+
New Behavior:
635+
636+
.. ipython:: python
637+
638+
(res1, res2) = s1.align(s2, join='inner')
639+
res1
640+
res2
641+
642+
- ``DataFrame.join``, ``DataFrame.merge`` and ``pd.merge``
643+
644+
.. ipython:: python
645+
646+
df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
647+
df1
648+
df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
649+
df2
650+
651+
Previous Behavior:
652+
653+
.. code-block:: ipython
654+
655+
In [4]: df1.join(df2, how='inner')
656+
Out[4]:
657+
a b
658+
1 10 100
659+
2 20 200
660+
661+
In [5]: df1.merge(df2, how='inner', left_index=True, right_index=True)
662+
Out[5]:
663+
a b
664+
1 10 100
665+
2 20 200
666+
667+
In [6]: pd.merge(df1, df2, how='inner', left_index=True, right_index=True)
668+
Out[6]:
669+
a b
670+
1 10 100
671+
2 20 200
672+
673+
In [7]: (res1, res2) = df1.align(df2, axis=0, join='inner')
674+
675+
In [8]: res1
676+
Out[8]:
677+
a
678+
1 10
679+
2 20
680+
681+
In [9]: res2
682+
Out[9]:
683+
b
684+
1 100
685+
2 200
686+
687+
New Behavior:
688+
689+
.. ipython:: python
603690

604-
In [4]: df1.join(df2, how='inner')
605-
Out[4]:
606-
a b
607-
2 20 200
608-
1 10 100
691+
df1.join(df2, how='inner')
609692

610-
In [5]: idx1 = pd.Index([5, 3, 2, 4, 1])
693+
df1.merge(df2, how='inner', left_index=True, right_index=True)
611694

612-
In [6]: idx2 = pd.Index([4, 7, 6, 5, 3])
695+
pd.merge(df1, df2, how='inner', left_index=True, right_index=True)
613696

614-
In [7]: idx1.intersection(idx2)
615-
Out[7]: Int64Index([5, 3, 4], dtype='int64')
697+
(res1, res2) = df1.align(df2, axis=0, join='inner')
698+
res1
699+
res2
616700

617701

618702
.. _whatsnew_0200.api:
@@ -820,3 +904,4 @@ Bug Fixes
820904
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
821905
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
822906
- Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
907+
- Bug with ``sort=True`` in ``DataFrame.join``, ``DataFrame.merge`` and ``pd.merge`` when joining on index (:issue:`15582`)
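
The ordering change described in the whatsnew entry above can be checked directly. The following is a minimal sketch, assuming a pandas version that includes this change (0.20.0 or later); the variable names mirror the whatsnew example, and the printed lists are only for inspection.

```python
import pandas as pd

idx1 = pd.Index([2, 1, 0])
idx2 = pd.Index([1, 2, 3])

# The common labels come back in the order of the calling (left) index.
common = idx1.intersection(idx2)
joined = idx1.join(idx2, how='inner')

s1 = pd.Series([20, 10, 0], index=[2, 1, 0])
s2 = pd.Series([100, 200, 300], index=[1, 2, 3])

# Alignment with an inner join relies on the same intersection,
# so both aligned results follow the order of s1's index.
res1, res2 = s1.align(s2, join='inner')

print(list(common))
print(list(joined))
print(list(res1.index), list(res2.index))
```

Code that relied on the previous behavior (the order of the right index) would need an explicit sort of the result to get an ascending order.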

pandas/core/frame.py

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -129,7 +129,8 @@
129129
* left: use only keys from left frame (SQL: left outer join)
130130
* right: use only keys from right frame (SQL: right outer join)
131131
* outer: use union of keys from both frames (SQL: full outer join)
132-
* inner: use intersection of keys from both frames (SQL: inner join)
132+
* inner: use intersection of keys from both frames (SQL: inner join),
133+
preserving the order of the left keys
133134
on : label or list
134135
Field names to join on. Must be found in both DataFrames. If on is
135136
None and not merging on indexes, then it merges on the intersection of
@@ -149,7 +150,8 @@
149150
Use the index from the right DataFrame as the join key. Same caveats as
150151
left_index
151152
sort : boolean, default False
152-
Sort the join keys lexicographically in the result DataFrame
153+
Sort the join keys lexicographically in the result DataFrame. If False,
154+
the order of the join keys depends on the join type (how keyword)
153155
suffixes : 2-length sequence (tuple, list, ...)
154156
Suffix to apply to overlapping column names in the left and right
155157
side, respectively
@@ -4486,6 +4488,7 @@ def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
44864488
* right: use other frame's index
44874489
* outer: form union of calling frame's index (or column if on is
44884490
specified) with other frame's index, and sort it
4491+
lexicographically
44894492
* inner: form intersection of calling frame's index (or column if
44904493
on is specified) with other frame's index, preserving the
44914494
order of the calling's one
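
The docstring changes above describe how ``how`` and ``sort`` interact when joining on the index. A small sketch of that interaction, assuming a pandas version that includes this commit; the frames are the ones used in the whatsnew entry and the tests:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])

# sort=False (the default): an inner join keeps the order of the left keys.
unsorted_inner = df1.merge(df2, how='inner',
                           left_index=True, right_index=True)

# sort=True: the join keys are sorted in the result.
sorted_inner = df1.join(df2, how='inner', sort=True)

# A left join with sort=False keeps the calling frame's index as is.
left = df1.join(df2, how='left')

print(unsorted_inner)
print(sorted_inner)
print(left)
```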

pandas/indexes/base.py

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2047,7 +2047,7 @@ def intersection(self, other):
20472047
Form the intersection of two Index objects.
20482048
20492049
This returns a new Index with elements common to the index and `other`,
2050-
preserving the calling index order.
2050+
preserving the order of the calling index.
20512051
20522052
Parameters
20532053
----------
@@ -2788,9 +2788,7 @@ def _reindex_non_unique(self, target):
27882788
new_index = self._shallow_copy_with_infer(new_labels, freq=None)
27892789
return new_index, indexer, new_indexer
27902790

2791-
def join(self, other, how='left', level=None, return_indexers=False,
2792-
sort=False):
2793-
"""
2791+
_index_shared_docs['join'] = """
27942792
*this is an internal non-public method*
27952793
27962794
Compute join_index and indexers to conform data
@@ -2804,10 +2802,16 @@ def join(self, other, how='left', level=None, return_indexers=False,
28042802
return_indexers : boolean, default False
28052803
sort : boolean, default False
28062804
2805+
.. versionadded:: 0.20.0
2806+
28072807
Returns
28082808
-------
28092809
join_index, (left_indexer, right_indexer)
28102810
"""
2811+
2812+
@Appender(_index_shared_docs['join'])
2813+
def join(self, other, how='left', level=None, return_indexers=False,
2814+
sort=False):
28112815
from .multi import MultiIndex
28122816
self_is_mi = isinstance(self, MultiIndex)
28132817
other_is_mi = isinstance(other, MultiIndex)
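
The hunk above moves the ``join`` docstring into ``_index_shared_docs`` so that subclasses can reuse it through a decorator. The pattern can be sketched in plain Python as follows. Note that ``shared_docs``, ``append_doc``, ``Base`` and ``Range`` are hypothetical stand-ins for illustration, not pandas' actual ``Appender`` or index classes.

```python
# Hypothetical stand-in for a docstring-appending decorator; this is not
# pandas' Appender, only an illustration of the shared-docstring pattern.
shared_docs = {}

shared_docs['join'] = """
Compute join_index and indexers to conform data
structures to the new index.
"""


def append_doc(addendum):
    """Return a decorator that appends ``addendum`` to a function's docstring."""
    def decorator(func):
        func.__doc__ = (func.__doc__ or '') + addendum
        return func
    return decorator


class Base(object):
    @append_doc(shared_docs['join'])
    def join(self, other):
        return other


class Range(Base):
    # The override reuses the shared text instead of repeating it.
    @append_doc(shared_docs['join'])
    def join(self, other):
        return self


print(Range.join.__doc__)
```

Keeping the text in one place means an addition such as the ``versionadded`` note only has to be written once.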

pandas/indexes/range.py

Lines changed: 1 addition & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -431,26 +431,9 @@ def union(self, other):
431431

432432
return self._int64index.union(other)
433433

434+
@Appender(_index_shared_docs['join'])
434435
def join(self, other, how='left', level=None, return_indexers=False,
435436
sort=False):
436-
"""
437-
*this is an internal non-public method*
438-
439-
Compute join_index and indexers to conform data
440-
structures to the new index.
441-
442-
Parameters
443-
----------
444-
other : Index
445-
how : {'left', 'right', 'inner', 'outer'}
446-
level : int or level name, default None
447-
return_indexers : boolean, default False
448-
sort : boolean, default False
449-
450-
Returns
451-
-------
452-
join_index, (left_indexer, right_indexer)
453-
"""
454437
if how == 'outer' and self is not other:
455438
# note: could return RangeIndex in more circumstances
456439
return self._int64index.join(other, how, level, return_indexers,

pandas/tests/frame/test_join.py

Lines changed: 24 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@
22

33
from __future__ import print_function
44

5+
import numpy as np
6+
57
import pandas as pd
68

79
from pandas.tests.frame.common import TestData
@@ -15,64 +17,75 @@ def test_join(self):
1517
df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
1618
df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
1719

20+
# default how='left'
1821
result = df1.join(df2)
19-
expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, None]},
22+
expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, np.nan]},
2023
index=[2, 1, 0])
2124
tm.assert_frame_equal(result, expected)
2225

26+
# how='left'
2327
result = df1.join(df2, how='left')
24-
expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, None]},
28+
expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, np.nan]},
2529
index=[2, 1, 0])
2630
tm.assert_frame_equal(result, expected)
2731

32+
# how='right'
2833
result = df1.join(df2, how='right')
29-
expected = pd.DataFrame({'a': [10, 20, None], 'b': [100, 200, 300]},
34+
expected = pd.DataFrame({'a': [10, 20, np.nan], 'b': [100, 200, 300]},
3035
index=[1, 2, 3])
3136
tm.assert_frame_equal(result, expected)
3237

38+
# how='inner'
3339
result = df1.join(df2, how='inner')
3440
expected = pd.DataFrame({'a': [20, 10], 'b': [200, 100]},
3541
index=[2, 1])
3642
tm.assert_frame_equal(result, expected)
3743

44+
# how='outer'
3845
result = df1.join(df2, how='outer')
39-
expected = pd.DataFrame({'a': [0, 10, 20, None],
40-
'b': [None, 100, 200, 300]},
46+
expected = pd.DataFrame({'a': [0, 10, 20, np.nan],
47+
'b': [np.nan, 100, 200, 300]},
4148
index=[0, 1, 2, 3])
4249
tm.assert_frame_equal(result, expected)
4350

4451
def test_join_sort(self):
4552
df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
4653
df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
4754

55+
# default how='left'
4856
result = df1.join(df2, sort=True)
49-
expected = pd.DataFrame({'a': [0, 10, 20], 'b': [None, 100, 200]},
57+
expected = pd.DataFrame({'a': [0, 10, 20], 'b': [np.nan, 100, 200]},
5058
index=[0, 1, 2])
5159
tm.assert_frame_equal(result, expected)
5260

61+
# how='left'
5362
result = df1.join(df2, how='left', sort=True)
54-
expected = pd.DataFrame({'a': [0, 10, 20], 'b': [None, 100, 200]},
63+
expected = pd.DataFrame({'a': [0, 10, 20], 'b': [np.nan, 100, 200]},
5564
index=[0, 1, 2])
5665
tm.assert_frame_equal(result, expected)
5766

67+
# how='right' (already sorted)
5868
result = df1.join(df2, how='right', sort=True)
59-
expected = pd.DataFrame({'a': [10, 20, None], 'b': [100, 200, 300]},
69+
expected = pd.DataFrame({'a': [10, 20, np.nan], 'b': [100, 200, 300]},
6070
index=[1, 2, 3])
6171
tm.assert_frame_equal(result, expected)
6272

73+
# how='right'
6374
result = df2.join(df1, how='right', sort=True)
64-
expected = pd.DataFrame([[None, 0], [100, 10], [200, 20]],
75+
expected = pd.DataFrame([[np.nan, 0], [100, 10], [200, 20]],
6576
columns=['b', 'a'], index=[0, 1, 2])
6677
tm.assert_frame_equal(result, expected)
6778

79+
# how='inner'
6880
result = df1.join(df2, how='inner', sort=True)
6981
expected = pd.DataFrame({'a': [10, 20], 'b': [100, 200]},
7082
index=[1, 2])
7183
tm.assert_frame_equal(result, expected)
7284

85+
# how='outer'
7386
result = df1.join(df2, how='outer', sort=True)
74-
expected = pd.DataFrame({'a': [0, 10, 20, None],
75-
'b': [None, 100, 200, 300]},
87+
expected = pd.DataFrame({'a': [0, 10, 20, np.nan],
88+
'b': [np.nan, 100, 200, 300]},
7689
index=[0, 1, 2, 3])
7790
tm.assert_frame_equal(result, expected)
7891
