Skip to content

TST: base test for ExtensionArray.astype to its own type + copy keyword #35116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Sep 22, 2020
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
2b8f738
Add test for self dtype
Jul 2, 2020
6cc98c9
Fix boolean astype method copying behaior
Jul 2, 2020
ecbc762
Fix integer IntegerArray astype, same problem
Jul 2, 2020
4fa6d92
Hack in fix for PandasArray.astype()
Jul 2, 2020
6ff48ea
Same hacky fix for period
Jul 2, 2020
1470c6b
Same hacky fix for sparse
Jul 2, 2020
9e96547
Fix StringArray in same hacky way, does not fix __eq__ problem
Jul 2, 2020
2f287ca
Fix DecimalArray in same hacky way
Jul 2, 2020
c7baa62
Undo removal of string tests
Jul 3, 2020
b68f4fb
Fix formatting to pass 'black pandas'
Jul 3, 2020
462704d
Add changes to whatsnew
Jul 3, 2020
609b255
Merge branch 'master' into extension-base-test
Jul 19, 2020
b761926
Merge branch 'master' into extension-base-test
tomaszps Sep 6, 2020
72cb49e
Update v1.1.0.rst
tomaszps Sep 6, 2020
ced29ed
Temporarily disable TestCasting for strings
Sep 6, 2020
7762eb9
Merge branch 'extension-base-test' of https://github.com/tomaszps/pan…
Sep 6, 2020
4ff47fb
Revert "Temporarily disable TestCasting for strings"
Sep 8, 2020
d5095d4
Fix style in `sparse/array.py` to match other changes
Sep 8, 2020
5f464a6
Remove redundant copy check from `string_.py`
Sep 9, 2020
7f45370
Add 'if copy' check to base ExtensionArray astype class
Sep 9, 2020
f134cf4
Clean up small formatting/import issues in numpy_.py
Sep 9, 2020
fbbe28a
Swap `dtype == self.dtype` checks for `is_dtype_equal`
Sep 9, 2020
dcd884d
Remove pointless astype method definition in `PandasArray`
Sep 10, 2020
a4a32db
Make DecimalArray use preexisting code for own-type copy
Sep 10, 2020
b5b1ccc
Fix passing xfail tests in test_numpy.py
Sep 10, 2020
5720f67
Remove unnecessary import from numpy_
Sep 19, 2020
7155e0f
Merge branch 'master' into extension-base-test
Sep 19, 2020
047de22
Fix import formatting using isort
Sep 19, 2020
3dcef56
add comment
jorisvandenbossche Sep 19, 2020
854d2bb
Add changes to whatsnew
Sep 19, 2020
42cd940
Fix .rst typo
Sep 19, 2020
10f0f64
Update doc/source/whatsnew/v1.2.0.rst
tomaszps Sep 21, 2020
3f34d28
Update pandas/tests/extension/test_numpy.py
tomaszps Sep 21, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion pandas/core/arrays/boolean.py
Original file line number Diff line number Diff line change
Expand Up @@ -375,7 +375,10 @@ def astype(self, dtype, copy: bool = True) -> ArrayLike:

if isinstance(dtype, BooleanDtype):
values, mask = coerce_to_array(self, copy=copy)
return BooleanArray(values, mask, copy=False)
if not copy:
return self
else:
return BooleanArray(values, mask, copy=False)
elif isinstance(dtype, StringDtype):
return dtype.construct_array_type()._from_sequence(self, copy=False)

Expand Down
11 changes: 10 additions & 1 deletion pandas/core/arrays/numpy_.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
from numpy.lib.mixins import NDArrayOperatorsMixin

from pandas._libs import lib
from pandas._typing import Scalar
from pandas._typing import Scalar, ArrayLike
from pandas.compat.numpy import function as nv
from pandas.util._decorators import doc
from pandas.util._validators import validate_fillna_kwargs
Expand Down Expand Up @@ -276,6 +276,15 @@ def __setitem__(self, key, value) -> None:

self._ndarray[key] = value

def astype(self, dtype, copy: bool = True) -> ArrayLike:
if dtype == self.dtype:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem is that np.dtype(object) == ExtensionDtype raises an error, which is an error coming from numpy and not directly solvable in pandas.
So to workaround it, can you do is_dtype_equal(dtype, self.dtype):, with is_dtype_equal imported from pandas.core.dtypes.common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and moved checks to base class. This just calls super().astype now.

if not copy:
return self
elif copy:
return self.copy()
else:
return super().astype(dtype, copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this to the base class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, tried it and it worked, seems logically consistent. Going to also look to see if I can drop the checks elsewhere by leaving it to the base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem so, without doing more restructuring than I'm comfortable with ATM.


def isna(self) -> np.ndarray:
return isna(self._ndarray)

Expand Down
6 changes: 5 additions & 1 deletion pandas/core/arrays/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -577,7 +577,11 @@ def astype(self, dtype, copy: bool = True):
# We handle Period[T] -> Period[U]
# Our parent handles everything else.
dtype = pandas_dtype(dtype)

if dtype == self._dtype:
if not copy:
return self
elif copy:
return self.copy()
if is_period_dtype(dtype):
return self.asfreq(dtype.freq)
return super().astype(dtype, copy=copy)
Expand Down
5 changes: 5 additions & 0 deletions pandas/core/arrays/sparse/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -1063,6 +1063,11 @@ def astype(self, dtype=None, copy=True):
IntIndex
Indices: array([2, 3], dtype=int32)
"""
if dtype == self._dtype:
if not copy:
return self
elif copy:
return self.copy()
dtype = self.dtype.update_dtype(dtype)
subtype = dtype._subtype_with_str
# TODO copy=False is broken for astype_nansafe with int -> float, so cannot
Expand Down
5 changes: 5 additions & 0 deletions pandas/core/arrays/string_.py
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,11 @@ def fillna(self, value=None, method=None, limit=None):
return super().fillna(value, method, limit)

def astype(self, dtype, copy=True):
if dtype == self.dtype:
if not copy:
return self
elif copy:
return self.copy()
dtype = pandas_dtype(dtype)
if isinstance(dtype, StringDtype):
if copy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can remove the new code, as you can see on this line and the two lines below, the logic is already present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; must've not been paying attention.

Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/extension/base/casting.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import numpy as np
import pytest

import pandas as pd
from pandas.core.internals import ObjectBlock
Expand Down Expand Up @@ -56,3 +57,9 @@ def test_astype_empty_dataframe(self, dtype):
df = pd.DataFrame()
result = df.astype(dtype)
self.assert_frame_equal(result, df)

@pytest.mark.parametrize("copy", [True, False])
def test_astype_own_type(self, data, copy):
result = data.astype(data.dtype, copy=copy)
assert (result is data) is (not copy)
self.assert_extension_array_equal(result, data)
5 changes: 5 additions & 0 deletions pandas/tests/extension/decimal/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,11 @@ def copy(self):
return type(self)(self._data.copy())

def astype(self, dtype, copy=True):
if dtype == self._dtype:
if not copy:
return self
elif copy:
return self.copy()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Is this implemented correctly? I'm concerned because of the way returning a copy is implemented on line 141, which seems to take into account dtype.context. Maybe the fix would be to remove lines 137-138 and let that case fall to 141-142?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is correct, yes. The first is_dtype_equal(dtype, self._dtype) will only catch decimal dtypes with an equal context, and then the if isinstance(dtype, type(self.dtype)): will work correctly in case the context differs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but we might need to pass through the copy

(anyway, this array implementation is only for testing, it's not actually used, so it's not super important)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy enough to check, so I changed it to pass through the copy. It passes the test suite the same way, and seems right, so I'll go with that version.

dtype = pandas_dtype(dtype)
if isinstance(dtype, type(self.dtype)):
return type(self)(self._data, context=dtype.context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return type(self)(self._data, context=dtype.context)
return type(self)(self._data, copy=copy, context=dtype.context)

Expand Down