Skip to content

BUG: rolling aggregate() with a list of functions along axis 1 raises ValueError #46132 #46892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from
Closed
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -629,6 +629,7 @@ Groupby/resample/rolling
- Bug in :meth:`Rolling.var` and :meth:`Rolling.std` would give non-zero result with window of same values (:issue:`42064`)
- Bug in :meth:`.Rolling.var` would segfault calculating weighted variance when window size was larger than data size (:issue:`46760`)
- Bug in :meth:`Grouper.__repr__` where ``dropna`` was not included. Now it is (:issue:`46754`)
- Bug in :meth:`Rolling.aggregate` raise ValueError when ``func`` was a list of functions and ``axis=1`` (:issue:`46132`)

Reshaping
^^^^^^^^^
Expand Down
15 changes: 14 additions & 1 deletion pandas/core/window/rolling.py
Original file line number Diff line number Diff line change
Expand Up @@ -658,9 +658,22 @@ def _numba_apply(
return self._resolve_output(out, obj)

def aggregate(self, func, *args, **kwargs):
result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
# GH46132
_axis_modifed_flag = False
if self.axis == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not using the _get_data_to_aggregate pattern here? this seems like making yet another way to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to write another function or use some methods in some python class to get the data ready for aggregation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_data_to_aggregate pattern was never implemented on rolling, just groupby.

Maybe instead @xr-chen, could you try to investigate assigning ResamplerWindowApply's self.axis attribute like GroupByApply fixes this instead?

self.axis = obj.obj._get_axis_number(kwargs.get("axis", 0))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion. I will have a try.

Copy link
Contributor Author

@xr-chen xr-chen May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to initialize ResamperWindowApply's self.axis like in GroupByApply, but that doesn't work. The agg_list_like method of Apply class behaves the same whatever the self.axis is, and it still calls obj._gotitem with subset as a Series and self.axis=1 which finally causes an error. @mroeschke

self.obj = self.obj.T
_axis_modifed_flag = True
self.axis = 0
try:
result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just add an axis argument here and thread it thru. this state change is not at all obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just would like to make sure if you want me to make this comment more clear? And I notice that in PR #47078, it says rolling window doesn't support axis = 1 anymore and that caused this change didn't pass the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure what you mean. I don't really like this approach because its hiding the axis state. What I am suggesting is we actually fix this by passing thru the axis argument to do this properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the agg method of ResamplerWindowApply hasn't been implemented, yet, and it will call the agg method of Apply class which can't recognize that we want to agg on axis=1 instead of axis=0. Even though I am not entirely familiar with pandas, I think passing the axis argument could not help.

finally:
if _axis_modifed_flag:
self.axis = 1
self.obj = self.obj.T
if result is None:
return self.apply(func, raw=False, args=args, kwargs=kwargs)
if _axis_modifed_flag:
result = result.T
return result

agg = aggregate
Expand Down
31 changes: 31 additions & 0 deletions pandas/tests/window/test_rolling.py
Original file line number Diff line number Diff line change
Expand Up @@ -1851,6 +1851,22 @@ def test_rolling_var_same_value_count_logic(values, window, min_periods, expecte
tm.assert_series_equal(expected == 0, result_std == 0)


def test_rolling_agg_on_columns():
# GH 46132

df = DataFrame({"a": [1, 3], "b": [2, 4]})
res = df.rolling(window=2, axis=1, min_periods=1).aggregate([np.sum, np.mean])
expected_val = np.array([[1, 3.0], [1, 1.5], [3, 7], [3, 3.5]])

expected_index = MultiIndex.from_tuples(
[(0, "sum"), (0, "mean"), (1, "sum"), (1, "mean")]
)

expected_frame = DataFrame(expected_val, index=expected_index, columns=["a", "b"])

tm.assert_frame_equal(expected_frame, res)


def test_rolling_mean_sum_floating_artifacts():
# GH 42064.

Expand All @@ -1860,3 +1876,18 @@ def test_rolling_mean_sum_floating_artifacts():
assert (result[-3:] == 0).all()
result = r.sum()
assert (result[-3:] == 0).all()


def test_rolling_agg_when_agg_fail():
# GH 46132
df = DataFrame({"a": [1, 3], "b": [2, 4]})
win = df.rolling(window=2, axis=1, min_periods=1)
try:
win.aggregate([np.log, np.sqrt])
except ValueError:
pass
tm.assert_frame_equal(win.obj, df)
# make sure if aggregate fails the attribute of Rolling/Window will not be change
assert win.axis == 1
win.aggregate([np.mean, np.sum])
assert win.axis == 1