API: stop special-casing SparseArray._quantile #49583

Merged on Nov 22, 2022 (2 commits)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -139,6 +139,7 @@ Other API changes
- Passing ``dtype`` of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
- The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
- Changed behavior of :meth:`Series.quantile` and :meth:`DataFrame.quantile` with :class:`SparseDtype` to retain sparse dtype (:issue:`49583`)
- When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
- :meth:`Series.unique` with dtype "timedelta64[ns]" or "datetime64[ns]" now returns :class:`TimedeltaArray` or :class:`DatetimeArray` instead of ``numpy.ndarray`` (:issue:`49176`)
-
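
For illustration, the new changelog entry for :issue:`49583` can be exercised with a small script. This is a minimal sketch, not part of the PR: the sample values are made up, and the printed dtype depends on the installed pandas version (per the entry, 2.0 and later keep the sparse dtype, while the code removed by this PR returned a dense result).

```python
import pandas as pd

# Made-up sample data stored with a sparse dtype (not taken from the PR).
ser = pd.Series([0.0, 0.0, 1.0, 2.0], dtype="Sparse[float]")

# Median via Series.quantile; the result is a one-element Series indexed by q.
result = ser.quantile([0.5])

# The dtype shown here depends on the pandas version, so no output is asserted.
print(result.dtype)
print(float(result.iloc[0]))
```

Checking `result.dtype` on the pandas version in use is the quickest way to see which side of the change a given environment is on.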
24 changes: 0 additions & 24 deletions pandas/core/arrays/sparse/array.py
@@ -80,7 +80,6 @@

from pandas.core import arraylike
import pandas.core.algorithms as algos
- from pandas.core.array_algos.quantile import quantile_with_mask
from pandas.core.arraylike import OpsMixin
from pandas.core.arrays import ExtensionArray
from pandas.core.arrays.sparse.dtype import SparseDtype
@@ -927,29 +926,6 @@ def value_counts(self, dropna: bool = True) -> Series:
index = keys
return Series(counts, index=index)

- def _quantile(self, qs: npt.NDArray[np.float64], interpolation: str):
-
-     if self._null_fill_value or self.sp_index.ngaps == 0:
-         # We can avoid densifying

Review comment (Member) on the line above:
Was this optimization preventing the original sparse dtype from remaining? If not, I imagine this still being nice not to densify

Reply (Member Author):
SparseArray.__array__ already doesn't densify in this case.

-         npvalues = self.sp_values
-         mask = np.zeros(npvalues.shape, dtype=bool)
-     else:
-         npvalues = self.to_numpy()
-         mask = self.isna()
-
-     fill_value = na_value_for_dtype(npvalues.dtype, compat=False)
-     res_values = quantile_with_mask(
-         npvalues,
-         mask,
-         fill_value,
-         qs,
-         interpolation,
-     )
-
-     # Special case: the returned array isn't _really_ sparse, so we don't
-     # wrap it in a SparseArray
-     return res_values
-
# --------
# Indexing
# --------
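
The removed `_quantile` built a value array plus a boolean NA mask and handed both to `quantile_with_mask`. As a rough picture of what a masked quantile computes, here is a pure-Python sketch. The helper name `masked_quantile` is hypothetical, linear interpolation is assumed, and this is not how pandas implements `quantile_with_mask`.

```python
import math


def masked_quantile(values, mask, q):
    """Linear-interpolation quantile over entries whose mask flag is False.

    Hypothetical helper for illustration only; not pandas' implementation.
    """
    # Drop masked (missing) entries, then sort what remains.
    kept = sorted(v for v, is_masked in zip(values, mask) if not is_masked)
    if not kept:
        # Nothing left to summarise: report a missing value.
        return math.nan
    # Fractional position of the requested quantile in the sorted data.
    pos = q * (len(kept) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # Interpolate between the two neighbouring order statistics.
    return kept[lo] + (kept[hi] - kept[lo]) * (pos - lo)


values = [3.0, math.nan, 1.0, 2.0]
mask = [math.isnan(v) for v in values]  # True marks a missing entry
median = masked_quantile(values, mask, 0.5)
print(median)
```

Separating the mask from the values lets one routine serve any array type that can expose a dense view and an NA mask, which is why a sparse-specific override is not strictly needed.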
2 changes: 1 addition & 1 deletion pandas/tests/frame/methods/test_quantile.py
@@ -78,7 +78,7 @@ def test_quantile_sparse(self, df, expected):
# GH#17198
# GH#24600
result = df.quantile()
-
+ expected = expected.astype("Sparse[float]")
tm.assert_series_equal(result, expected)

def test_quantile(
2 changes: 1 addition & 1 deletion pandas/tests/series/methods/test_quantile.py
@@ -183,7 +183,7 @@ def test_quantile_nat(self):
def test_quantile_sparse(self, values, dtype):
ser = Series(values, dtype=dtype)
result = ser.quantile([0.5])
- expected = Series(np.asarray(ser)).quantile([0.5])
+ expected = Series(np.asarray(ser)).quantile([0.5]).astype("Sparse[float]")
tm.assert_series_equal(result, expected)

def test_quantile_empty(self):
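
Both updated tests build their expected value the same way: compute the quantile on dense data, then cast the result with `astype("Sparse[float]")`. A minimal sketch of that cast follows. The values and variable names are made up for illustration and do not come from the test suite.

```python
import pandas as pd

# A dense result, shaped like the output of Series.quantile([0.5]).
dense_expected = pd.Series([0.5], index=[0.5])

# Cast to a sparse dtype, as the updated tests do for their expected value.
sparse_expected = dense_expected.astype("Sparse[float]")

print(dense_expected.dtype)  # float64
print(sparse_expected.dtype)
```

Because `tm.assert_series_equal` checks dtypes by default, a dense expected value would stop matching once `quantile` returns a sparse result, which is presumably why the cast was added.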