
BUG: can't concatenate DataFrame with Series with duplicate keys #33805



Merged
merged 3 commits into from
May 1, 2020


MarcoGorelli
Member

@MarcoGorelli MarcoGorelli commented Apr 26, 2020

@MarcoGorelli MarcoGorelli changed the title don't use getloc, which may return a slice BUG: can't concatenate DataFrame with Series with duplicate keys Apr 26, 2020
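For context on the old title: Index.get_loc does not always return an integer position. The sketch below is a minimal illustration, assuming a reasonably recent pandas; the index values are made up. It shows the three return types, which is why code that expects an int breaks once a level contains duplicate keys:

```python
import pandas as pd

# Index.get_loc returns different types depending on the index:
unique_level = pd.Index(["a", "b", "c"])        # unique values
sorted_dup_level = pd.Index(["a", "b", "b"])    # duplicates, monotonic
unsorted_dup_level = pd.Index(["b", "a", "b"])  # duplicates, not monotonic

loc_unique = unique_level.get_loc("b")              # -> 1 (an int)
loc_sorted_dup = sorted_dup_level.get_loc("b")      # -> slice(1, 3, None)
loc_unsorted_dup = unsorted_dup_level.get_loc("b")  # -> boolean mask [True, False, True]

print(loc_unique, loc_sorted_dup, loc_unsorted_dup)
```

Only the first case can be passed straight to something like np.repeat as a single code.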
except KeyError as err:
    raise ValueError(f"Key {key} not in level {level}") from err
mask = level == key
if not any(mask):
Contributor


use mask.any()
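A small sketch of this suggestion: builtin any(mask) walks the boolean array element by element in Python, while mask.any() performs a single NumPy reduction. The helper name first_position is hypothetical and is not the code that was merged. It only wraps the mask-based lookup from the diff so it can be run on its own:

```python
import numpy as np
import pandas as pd


def first_position(level: pd.Index, key) -> int:
    """Hypothetical helper: position of the first entry in `level` equal to `key`."""
    # Comparing an Index with a scalar gives one boolean per level value.
    mask = np.asarray(level == key)
    # Vectorized NumPy reduction instead of Python's builtin any().
    if not mask.any():
        raise ValueError(f"Key {key} not in level {level}")
    # flatnonzero lists the True positions; duplicates resolve to the first one.
    return int(np.flatnonzero(mask)[0])


level = pd.Index(["x", "y", "y"])
print(first_position(level, "y"))  # 1
```

One caveat: level == key never matches a NaN key, because NaN compares unequal to itself. This sketch does not special-case that.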

Contributor


It's a bit more idiomatic to use .get_indexer here, which handles duplicates, if you can make that work.
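For reference, here is how get_indexer behaves on a level whose values are unique. The values below are illustrative. It maps each target label to its position and returns -1 for labels that are absent. Repeated labels in the target are fine. The restriction is on duplicates in the index itself:

```python
import pandas as pd

level = pd.Index(["x", "y", "z"])  # unique level values
keys = ["y", "x", "y", "w"]        # repeated and missing labels in the target

# One position per target label; -1 marks a label not found in the level.
positions = level.get_indexer(keys)  # values: 1, 0, 1, -1

# Collect the labels that were not found, e.g. to build an error message.
missing = [key for key, pos in zip(keys, positions) if pos == -1]
if missing:
    print(f"Keys {missing} not in level")
```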

Member Author


@jreback thanks for the review. I've looked into that, but it seems to raise an error when the index contains duplicates:

>>> pd.Index(['a', 'b']).get_indexer(['a'])                                 
array([0])
>>> pd.Index(['a', 'b', 'b']).get_indexer(['a'])
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-10-0a1bc1d27a1b> in <module>
----> 1 pd.Index(['a', 'b', 'b']).get_indexer(['a'])

~/pandas-dev/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2926 
   2927         if not self.is_unique:
-> 2928             raise InvalidIndexError(
   2929                 "Reindexing only valid with uniquely valued Index objects"
   2930             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Should I first open an issue to make get_indexer handle duplicates before coming back to this one?
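One existing alternative may be worth checking before opening an issue. Index.get_indexer_non_unique accepts an index with duplicates and returns a pair: the positions of every match, and the positions in the target of labels that were not found. This is a minimal sketch with illustrative values. If only one position per key is needed, taking the first match is the caller's own choice, not something pandas decides:

```python
import pandas as pd

level = pd.Index(["a", "b", "b"])  # level with a duplicated value

# Each call returns (indexer, missing) as NumPy integer arrays.
indexer_a, missing_a = level.get_indexer_non_unique(["a"])  # indexer [0], nothing missing
indexer_b, missing_b = level.get_indexer_non_unique(["b"])  # indexer [1, 2]: every match
indexer_z, missing_z = level.get_indexer_non_unique(["z"])  # indexer [-1], missing [0]

print(indexer_a, missing_a)
print(indexer_b, missing_b)
print(indexer_z, missing_z)
```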

@jreback jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Apr 27, 2020
@jreback jreback added this to the 1.1 milestone May 1, 2020
@jreback
Contributor

jreback commented May 1, 2020

LGTM. Let's merge on green.

@MarcoGorelli
Member Author

Thanks - sure, it's green now

@WillAyd WillAyd merged commit b7f061c into pandas-dev:master May 1, 2020
@WillAyd
Member

WillAyd commented May 1, 2020

Thanks @MarcoGorelli

@MarcoGorelli MarcoGorelli deleted the issue-33654 branch May 1, 2020 17:23
Development

Successfully merging this pull request may close these issues.

BUG: can't concatenate DataFrame with Series with duplicate keys
3 participants