BUG: setitem with boolean mask and series as value is broken for Series with EA type #37676

Merged
merged 12 commits into from
Nov 17, 2020

Conversation

phofl
Member

@phofl phofl commented Nov 6, 2020

This was already fixed; added a test.

@phofl phofl added the Indexing ("Related to indexing on series/frames, not to indexes themselves") and Needs Tests ("Unit test(s) needed to prevent regressions") labels Nov 6, 2020
df = DataFrame(
    {"a": [0, 0, np.nan, np.nan], "b": array(range(4), dtype="Int64")}
)
s = Series(array([1] * 4, dtype="Int64"))
s[df["a"].isna()] = df.loc[df["a"].isna(), "b"]
Member

can you do this without constructing a DataFrame

Member Author

Avoided the DataFrame construction. I checked that this raised the same error as reported in the issue.
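
For reference, a DataFrame-free reproduction along these lines might look like the following. It is a minimal sketch, assuming a pandas version that includes the fix (this PR targets the 1.2 milestone); the names `mask`, `ser`, and `value` are illustrative and not taken from the PR.

```python
import numpy as np
import pandas as pd

# Boolean mask built from a plain Series instead of a DataFrame column.
mask = pd.Series([0, 0, np.nan, np.nan]).isna()

# Target and value both use the nullable Int64 extension dtype.
ser = pd.Series([1] * 4, dtype="Int64")
value = pd.Series(range(4), dtype="Int64")

# Masked assignment with a Series value: only the positions where the
# mask is True are overwritten, using the values at the same labels.
ser[mask] = value

expected = pd.Series([1, 1, 2, 3], dtype="Int64")
pd.testing.assert_series_equal(ser, expected)
```

On a pandas version without the fix, the assignment line is where the error from the issue would be expected to surface.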

@jreback jreback changed the title from "Add test for 26468" to "BUG: setitem with boolean mask and series as value is broken for Series with EA type" Nov 7, 2020
@jreback jreback added the ExtensionArray ("Extending pandas with custom dtypes or arrays.") label Nov 7, 2020
def test_setitem_boolean_ea_type(self):
    # GH: 26468
    s = Series(array([5, 6, 7, 8], dtype="Int64"))
    s[s > 6] = Series(range(4), dtype="Int64")
Member

Generally looks good. A couple of nitpicks: can you import array as pd_array; name this ser instead of s (it doesn't have to be ser, the important thing is to avoid one-letter variable names); and instead of "ea_type" use "Int64_values" (or better yet, parametrize over nullable int dtypes)?

Member Author

Done, thanks for the tips
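
The suggestion to parametrize over nullable int dtypes could be sketched outside the pandas test suite roughly as follows. Inside pandas this is done with the `any_nullable_int_dtype` fixture that appears later in this thread; the explicit list `NULLABLE_INT_DTYPES` and the test name below are assumptions made so the example is self-contained.

```python
import pandas as pd
import pytest

# Hypothetical stand-in for the pandas-internal any_nullable_int_dtype fixture.
NULLABLE_INT_DTYPES = [
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
]


@pytest.mark.parametrize("dtype", NULLABLE_INT_DTYPES)
def test_setitem_boolean_nullable_int(dtype):
    # Same scenario as the Int64-only test, run once per nullable int dtype.
    ser = pd.Series([5, 6, 7, 8], dtype=dtype)
    ser[ser > 6] = pd.Series(range(4), dtype=dtype)

    expected = pd.Series([5, 6, 2, 3], dtype=dtype)
    pd.testing.assert_series_equal(ser, expected)
```

Run under pytest, this collects one test case per dtype, so a failure report names the dtype that broke.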

ser[ser > 6] = Series(range(4), dtype="Int64")
expected = Series([5, 6, 2, 3], dtype="Int64")
tm.assert_series_equal(ser, expected)
ser = Series(pd_array([5, 6, 7, 8], dtype="Int64"))
Contributor

can you add a blank line between cases

Member Author

Done

@@ -126,6 +127,16 @@ def test_setitem_boolean_different_order(self, string_series):

tm.assert_series_equal(copy, expected)

def test_setitem_boolean_nullable_int_types(self, any_nullable_int_dtype):
    # GH: 26468
    ser = Series(pd_array([5, 6, 7, 8], dtype="Int64"))
Contributor

shouldn't you be using the any_nullable_int_dtype fixture here?

Member Author

Sorry, somehow missed that. Thanks for pointing it out. Changed it.

@jreback jreback added this to the 1.2 milestone Nov 9, 2020
Member

@jbrockmendel jbrockmendel left a comment

LGTM

# Conflicts:
#	pandas/tests/series/indexing/test_setitem.py
@@ -127,6 +127,15 @@ def test_setitem_boolean_different_order(self, string_series):

tm.assert_series_equal(copy, expected)

@pytest.mark.parametrize("value", [None, NaT, np.nan])
Member

can you also check that np.datetime64("NaT") doesn't get cast to td64

Member

why did this disappear?

Member Author

That's not from my change; it was introduced by 5c4f737.

Member

@jorisvandenbossche jorisvandenbossche left a comment

The original issue assigned a Series that was also subsetted (eg using the same mask). I suppose it won't matter in practice (as it gets aligned first), but can you add such a variant to the test as well?
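
A variant with a subsetted value could be sketched like this. It assumes a pandas version that includes the fix and uses illustrative names (`value`, `subset`) that are not from the PR; the point is only to show that the shorter right-hand side is aligned on index labels before being written.

```python
import pandas as pd

ser = pd.Series([5, 6, 7, 8], dtype="Int64")
value = pd.Series(range(4), dtype="Int64")

# Subset the value as well: only index labels 2 and 3 survive the filter.
subset = value.loc[value > 1]

# The assignment aligns on index labels, so the two remaining values land
# on labels 2 and 3 of the target.
ser.loc[ser > 6] = subset

expected = pd.Series([5, 6, 2, 3], dtype="Int64")
pd.testing.assert_series_equal(ser, expected)
```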

@@ -135,6 +136,17 @@ def test_setitem_boolean_td64_values_cast_na(self, value):
expected = Series([NaT, 1, 2], dtype="timedelta64[ns]")
tm.assert_series_equal(series, expected)

def test_setitem_boolean_nullable_int_types(self, any_nullable_int_dtype):
    # GH: 26468
    ser = Series(pd_array([5, 6, 7, 8], dtype=any_nullable_int_dtype))
Member

No need for the pd.array here, since the Series constructor will handle the nullable dtype as well.

Member Author

Thx, removed it
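
As a quick illustration of the point above, the two constructions can be compared directly. This is a small sketch with made-up variable names (`via_array`, `direct`), not code from the PR.

```python
import pandas as pd

# Explicitly building the extension array first...
via_array = pd.Series(pd.array([5, 6, 7, 8], dtype="Int64"))

# ...versus letting the Series constructor handle the nullable dtype.
direct = pd.Series([5, 6, 7, 8], dtype="Int64")

# Both produce an Int64 Series backed by an IntegerArray.
pd.testing.assert_series_equal(via_array, direct)
assert isinstance(direct.array, pd.arrays.IntegerArray)
```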

tm.assert_series_equal(ser, expected)

ser = Series([5, 6, 7, 8], dtype=any_numeric_dtype)
ser.loc[ser > 6] = Series(range(4), dtype=any_numeric_dtype)
Contributor

Member

AFAICT that's from the reduction bugfix yesterday. I'm troubleshooting it now.

Member Author

Tuesday, so not really recently. Merged now.

@jreback jreback merged commit fef84f4 into pandas-dev:master Nov 17, 2020
@jreback
Contributor

jreback commented Nov 17, 2020

thanks @phofl

@phofl phofl deleted the 26468 branch November 17, 2020 08:35

Successfully merging this pull request may close these issues.

BUG: setitem with boolean mask and series as value is broken for Series with EA type
4 participants