Fix Series construction with dtype=str #20401

Merged
Changes from 7 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -1035,6 +1035,7 @@ Reshaping
- Bug in :class:`Series` constructor with ``Categorical`` where a ```ValueError`` is not raised when an index of different length is given (:issue:`19342`)
- Bug in :meth:`DataFrame.astype` where column metadata is lost when converting to categorical or a dictionary of dtypes (:issue:`19920`)
- Bug in :func:`cut` and :func:`qcut` where timezone information was dropped (:issue:`19872`)
- Bug in :class:`Series` constructor with a ``dtype=str``, previously raised in some cases (:issue:`19853`)

Other
^^^^^
9 changes: 5 additions & 4 deletions pandas/core/series.py
@@ -4059,9 +4059,10 @@ def _try_cast(arr, take_fast_path):
if issubclass(subarr.dtype.type, compat.string_types):
# GH 16605
# If not empty convert the data to dtype
if not isna(data).all():
Contributor

You don't need all this; just change it to use np.any:

In [3]: np.any(pd.isna(''))
Out[3]: False
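
To illustrate the suggestion above: the method call `isna(data).all()` relies on the result being an array, while `isna` applied to a scalar returns a single boolean value. The function forms `np.all` and `np.any` accept scalars and arrays alike. A minimal sketch, with illustrative variable names that are not taken from the PR:

```python
import numpy as np
import pandas as pd

# isna on a scalar gives one boolean, not an array of booleans.
scalar_mask = pd.isna('')

# isna on array-like input gives an elementwise boolean array.
array_mask = pd.isna(np.array(['a', None], dtype=object))

# np.all / np.any work uniformly on both kinds of result.
all_scalar = bool(np.all(scalar_mask))
any_scalar = bool(np.any(scalar_mask))
all_array = bool(np.all(array_mask))
any_array = bool(np.any(array_mask))

print(all_scalar, any_scalar, all_array, any_array)
```

An empty string is not treated as missing, so both scalar results are False; the array contains one missing value, so `any_array` is True while `all_array` is False.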

data = np.array(data, dtype=dtype, copy=False)

subarr = np.array(data, dtype=object, copy=copy)
# GH 19853: If data is a scalar, subarr has already the result
if not np.isscalar(data):
Contributor

Right, but is this still an extra call here? Do we need the scalar check? (And it should be is_scalar anyhow, if it's needed.)

Contributor Author

We do need a check; that was the problem from the beginning.
If data is a scalar, subarr already holds the correct result.
I'll change it to is_scalar.
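
For context on why `is_scalar` was requested over `np.isscalar`: the two agree on strings and numbers but differ on values such as `None` and `datetime` objects, which NumPy does not count as scalars. A small comparison, assuming the public `pandas.api.types.is_scalar` behaves like the internal helper the reviewer refers to:

```python
import datetime

import numpy as np
from pandas.api.types import is_scalar

# Values the Series constructor may receive as `data`.
empty_string = ''
missing = None
timestamp = datetime.datetime(2018, 3, 1)
listlike = [1, 2]

# (np.isscalar result, pandas is_scalar result) for each value.
comparison = {
    'empty_string': (np.isscalar(empty_string), is_scalar(empty_string)),
    'missing': (np.isscalar(missing), is_scalar(missing)),
    'timestamp': (np.isscalar(timestamp), is_scalar(timestamp)),
    'listlike': (np.isscalar(listlike), is_scalar(listlike)),
}

for name, (numpy_says, pandas_says) in comparison.items():
    print(name, numpy_says, pandas_says)
```

Both checks treat the empty string from the original report as a scalar; the pandas version additionally covers `None` and datetime-like values.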

if not np.all(isna(data)):
data = np.array(data, dtype=dtype, copy=False)
subarr = np.array(data, dtype=object, copy=copy)

return subarr
5 changes: 5 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -110,6 +110,11 @@ def test_constructor_empty(self, input_class):
empty2 = Series(input_class(), index=lrange(10), dtype='float64')
assert_series_equal(empty, empty2)

# GH 19853 : with empty string, index and dtype str
empty = Series('', dtype=str, index=range(3))
empty2 = Series('', index=range(3))
assert_series_equal(empty, empty2)

@pytest.mark.parametrize('input_arg', [np.nan, float('nan')])
def test_constructor_nan(self, input_arg):
empty = Series(dtype='float64', index=lrange(10))
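
The behavior covered by the new test can be reproduced outside the test suite. This is a sketch, assuming a pandas version that includes this fix; it checks the broadcast values directly rather than relying on `assert_series_equal`, because the dtype pandas reports for strings may differ between versions:

```python
import pandas as pd

# Scalar empty string broadcast over an index, with an explicit str dtype.
with_dtype = pd.Series('', dtype=str, index=range(3))

# The same construction without a dtype, used as the reference in the test.
without_dtype = pd.Series('', index=range(3))

print(list(with_dtype))
print(list(without_dtype))
```

Both constructions should yield three empty strings, one per index label.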