Commit db6a491

BUG: head and tail not dropping groups with nan (#45102)
1 parent d23a5f8 commit db6a491

3 files changed: +36 -0 lines changed

doc/source/whatsnew/v1.4.0.rst

Lines changed: 1 addition & 0 deletions
@@ -898,6 +898,7 @@ Groupby/resample/rolling
 - Bug in :meth:`GroupBy.nth` failing on ``axis=1`` (:issue:`43926`)
 - Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not respecting right bound on centered datetime-like windows, if the index contain duplicates (:issue:`3944`)
 - Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` when using a :class:`pandas.api.indexers.BaseIndexer` subclass that returned unequal start and end arrays would segfault instead of raising a ``ValueError`` (:issue:`44470`)
+- Bug in :meth:`GroupBy.head` and :meth:`GroupBy.tail` not dropping groups with ``NaN`` when ``dropna=True`` (:issue:`45089`)
 - Fixed bug in :meth:`GroupBy.__iter__` after selecting a subset of columns in a :class:`GroupBy` object, which returned all columns instead of the chosen subset (:issue:`#44821`)
 - Bug in :meth:`Groupby.rolling` when non-monotonic data passed, fails to correctly raise ``ValueError`` (:issue:`43909`)
 - Fixed bug where grouping by a :class:`Series` that has a categorical data type and length unequal to the axis of grouping raised ``ValueError`` (:issue:`44179`)
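
The behaviour described by the new whatsnew entry can be checked through the public API. The following is a minimal sketch, assuming a pandas version that already contains this fix (1.4.0 or later); the frame and the column names `key` and `val` are made up for illustration and do not come from the commit.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the commit): two rows have a NaN group key.
df = pd.DataFrame({"key": ["a", np.nan, "a", np.nan], "val": [1, 2, 3, 4]})

# dropna=True is the groupby default, so rows whose key is NaN
# belong to no group and should not appear in head()/tail().
kept = df.groupby("key").head(1)

# With dropna=False, NaN forms its own group and its first row is kept.
kept_all = df.groupby("key", dropna=False).head(1)

print(kept.index.tolist())
print(kept_all.index.tolist())
```

On a fixed version, the first call should return only the first row of group `"a"`, while the second should also return the first row of the NaN group.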

pandas/core/groupby/groupby.py

Lines changed: 3 additions & 0 deletions
@@ -3580,6 +3580,9 @@ def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:
         Series or DataFrame
             Filtered _selected_obj.
         """
+        ids = self.grouper.group_info[0]
+        mask = mask & (ids != -1)
+
         if self.axis == 0:
             return self._selected_obj[mask]
         else:
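
The three added lines suggest that rows whose group key contains `NaN` carry a group id of `-1` when `dropna=True`, and that the fix clears those rows from the selection mask before indexing. A plain-Python sketch of that idea follows. It uses lists as a stand-in for the NumPy expression `mask & (ids != -1)`, and the helper name `mask_selected_rows`, the sample ids, and the sample mask are all hypothetical, not values taken from pandas.

```python
def mask_selected_rows(rows, ids, mask):
    """Keep rows that the mask selects and whose group id is not -1.

    List-based stand-in for `mask & (ids != -1)` followed by boolean
    indexing; -1 is assumed to mark rows that belong to no group.
    """
    combined = [selected and gid != -1 for selected, gid in zip(mask, ids)]
    return [row for row, keep in zip(rows, combined) if keep]


# Hypothetical inputs mirroring the shape of the new test's frame.
rows = [("a", "z"), ("b", None), ("c", None), ("c", None)]
ids = [0, -1, -1, -1]                   # assumed group ids; -1 = dropped key
head_mask = [True, True, True, False]   # assumed mask before the NaN filter

selected = mask_selected_rows(rows, ids, head_mask)
print(selected)
```

Applying the filter inside the shared masking helper, rather than in `head` and `tail` separately, means every caller of that helper gets the same treatment of dropped groups.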

pandas/tests/groupby/test_nth.py

Lines changed: 32 additions & 0 deletions
@@ -809,3 +809,35 @@ def test_nth_slices_with_column_axis(
     }[method](start, stop)
     expected = DataFrame([expected_values], columns=expected_columns)
     tm.assert_frame_equal(result, expected)
+
+
+def test_head_tail_dropna_true():
+    # GH#45089
+    df = DataFrame(
+        [["a", "z"], ["b", np.nan], ["c", np.nan], ["c", np.nan]], columns=["X", "Y"]
+    )
+    expected = DataFrame([["a", "z"]], columns=["X", "Y"])
+
+    result = df.groupby(["X", "Y"]).head(n=1)
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby(["X", "Y"]).tail(n=1)
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby(["X", "Y"]).nth(n=0).reset_index()
+    tm.assert_frame_equal(result, expected)
+
+
+def test_head_tail_dropna_false():
+    # GH#45089
+    df = DataFrame([["a", "z"], ["b", np.nan], ["c", np.nan]], columns=["X", "Y"])
+    expected = DataFrame([["a", "z"], ["b", np.nan], ["c", np.nan]], columns=["X", "Y"])
+
+    result = df.groupby(["X", "Y"], dropna=False).head(n=1)
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby(["X", "Y"], dropna=False).tail(n=1)
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby(["X", "Y"], dropna=False).nth(n=0).reset_index()
+    tm.assert_frame_equal(result, expected)
