BUG: MultiIndex dtypes incorrect if level names not unique #45175

Merged
merged 7 commits into from
Jan 4, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -835,6 +835,7 @@ MultiIndex
 - Bug in :meth:`MultiIndex.get_loc` raising ``TypeError`` instead of ``KeyError`` on nested tuple (:issue:`42440`)
 - Bug in :meth:`MultiIndex.union` setting wrong ``sortorder`` causing errors in subsequent indexing operations with slices (:issue:`44752`)
 - Bug in :meth:`MultiIndex.putmask` where the other value was also a :class:`MultiIndex` (:issue:`43212`)
+- Bug in :meth:`MultiIndex.dtypes` when duplicate level names returned only one dtype (:issue:`45174`)
Member

Could you clarify a bit? Maybe: returned only one entry for all levels with the same name. I thought this would return a Series with the correct length but with the same dtype in every row.

Contributor Author

Hopefully done.

 -

 I/O
7 changes: 4 additions & 3 deletions pandas/core/indexes/multi.py
@@ -735,13 +735,14 @@ def array(self):
     def dtypes(self) -> Series:
         """
         Return the dtypes as a Series for the underlying MultiIndex.
+
+        .. versionchanged:: 1.4.0
Member

I don't think that we need this, since this qualifies as a bug fix

Contributor

agree

Contributor Author

Done

+            Correct result when there are duplicated level names.
         """
         from pandas import Series

         names = com.fill_missing_names([level.name for level in self.levels])
-        return Series(
-            {names[idx]: level.dtype for idx, level in enumerate(self.levels)}
-        )
+        return Series([level.dtype for level in self.levels], index=names)

     def __len__(self) -> int:
         return len(self.codes[0])
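
The change above swaps a dict keyed by level name for a list of dtypes plus a separate index. A dict can hold only one value per key, so levels that share a name overwrite one another. The following is a minimal pandas-free sketch of that mechanism; the `names` and `dtypes` lists are hypothetical stand-ins for the level names and level dtypes, not code from the PR.

```python
# Hypothetical stand-ins for the level names and level dtypes of a MultiIndex.
names = [1, 1]
dtypes = ["int64", "float64"]

# Old approach: a dict keyed by name keeps only the last value for each key,
# so the two levels named 1 collapse into a single entry.
as_dict = {names[idx]: dtype for idx, dtype in enumerate(dtypes)}

# New approach: values and labels stay in parallel sequences, which is the
# shape that Series(values, index=names) preserves.
as_pairs = list(zip(names, dtypes))

print(as_dict)   # {1: 'float64'}
print(as_pairs)  # [(1, 'int64'), (1, 'float64')]
```

With unique names the two forms carry the same information, which is presumably why the problem only surfaced with duplicated level names.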
7 changes: 7 additions & 0 deletions pandas/tests/indexes/multi/test_get_set.py
@@ -67,6 +67,13 @@ def test_get_dtypes_no_level_name():
     tm.assert_series_equal(expected, idx_multitype.dtypes)


+def test_get_dtypes_duplicate_level_names():
+    # Test MultiIndex.dtypes with non-unique level names (# GH45174 )
+    result = MultiIndex.from_arrays([[1], [2]], names=[1, 1]).dtypes
Member

Could you also add a test with different dtypes?

+    expected = pd.Series([np.dtype("int64"), np.dtype("int64")], index=[1, 1])
+    tm.assert_series_equal(result, expected)
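
A sketch of the kind of test the reviewer asks for, where two levels share a name but hold different dtypes. The test name and the int/float level values are assumptions chosen for illustration; this is not the test that was added to the PR. It uses the public `pd.testing.assert_series_equal` instead of the internal `tm` helper and assumes a pandas version that includes this fix (1.4.0 or later).

```python
import numpy as np
import pandas as pd


def test_get_dtypes_duplicate_level_names_mixed_dtypes():
    # Hypothetical test: both levels are named 1, one holds ints, one holds floats.
    result = pd.MultiIndex.from_arrays([[1], [2.5]], names=[1, 1]).dtypes
    expected = pd.Series([np.dtype("int64"), np.dtype("float64")], index=[1, 1])
    pd.testing.assert_series_equal(result, expected)


test_get_dtypes_duplicate_level_names_mixed_dtypes()
```

Using different dtypes matters here because a result that repeats one dtype in every row would still pass the same-dtype test above.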


 def test_get_level_number_out_of_bounds(multiindex_dataframe_random_data):
     frame = multiindex_dataframe_random_data
