
Center rolling window for time offset #38780

Merged
merged 61 commits into from
Apr 9, 2021
Changes from 38 commits

Commits
11c1bb7
update syntax for pandas style
adamamiller Sep 3, 2020
3cac891
Merge branch 'master' of https://github.com/pandas-dev/pandas into ce…
adamamiller Sep 3, 2020
c73ffc4
fix syntax error
adamamiller Sep 3, 2020
58a8ebf
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Dec 29, 2020
dca9d04
reintroduce calculate_center_offset as private function
luholacc Dec 29, 2020
37cb6fa
fix double declaration of index_growth_sign
luholacc Dec 29, 2020
81e0e4e
apply review suggestions
luholacc Dec 30, 2020
463c7f0
remove unneccessary exception
luholacc Dec 30, 2020
321f07c
add method "_center_window" to class BaseWindow
luholacc Jan 5, 2021
9270bab
use spaces around operators
luholacc Jan 5, 2021
dd33e32
remove unneccessary calculations from rolling.py, tests work again
luholacc Jan 6, 2021
6e4da84
remove "test_invalid_center_datetimelike"
luholacc Jan 7, 2021
e0966e8
remove white spaces and TODO
luholacc Jan 7, 2021
b82f514
remove unnecessary lines
luholacc Jan 29, 2021
18a7b5b
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Jan 29, 2021
9c4cc58
change existing test to cover new case
luholacc Jan 29, 2021
c27f50e
adapt test parameters
luholacc Jan 29, 2021
abaa43b
add center testing to test_closed_fixed
sevberg Jan 29, 2021
dc046da
clean test_closed_fixed_binary_col
sevberg Jan 29, 2021
95e3f26
fix formatting and rename `center_window` to `center`
luholacc Feb 1, 2021
8d582a1
define len of test data dynamically
luholacc Feb 1, 2021
8d5a55c
move if-statement back into two lines (failed before)
luholacc Feb 1, 2021
525cc69
remove hard-coded center
sevberg Feb 1, 2021
6ac79b9
remove fixture
sevberg Feb 1, 2021
9bf6ce3
use `center` fixture for `test_closed_fixed_binary_col`
luholacc Feb 1, 2021
4f98fc5
add `center` testing to `test_rolling_window_as_string`
luholacc Feb 1, 2021
e5ae3b2
correct expected data and remove debug prints
luholacc Feb 2, 2021
d106940
align ddof usage of rolling sem with nanops nansem
sevberg Feb 2, 2021
d4f6d22
explicitly test centered datetimelike windows
sevberg Feb 2, 2021
c2a7333
correct sem test
sevberg Feb 2, 2021
73313e6
align rolling.sem with nanops.nansem
sevberg Feb 2, 2021
92f8992
fix test
sevberg Feb 2, 2021
648d2d3
revert ddof behavior
sevberg Feb 3, 2021
278d33f
revert sem test
sevberg Feb 3, 2021
5e50f36
side-step ddof bug
sevberg Feb 3, 2021
c11cf15
fix black failure
sevberg Feb 3, 2021
2e3f875
disable black
sevberg Feb 3, 2021
f63309b
fix missing datetimelike word
sevberg Feb 3, 2021
9f76a41
update whatsnew
sevberg Feb 3, 2021
f05ed61
add to enhancements
sevberg Feb 3, 2021
6fbd080
Merge remote-tracking branch 'upstream/master' into center_window
sevberg Feb 3, 2021
dff6942
Merge branch 'master' of https://github.com/pandas-dev/pandas into ce…
Zaubeerer Feb 10, 2021
0520e18
trim trailing whitespaces
luholacc Feb 25, 2021
a087a6b
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Mar 29, 2021
5b9b8ff
correct `too many blank lines`
luholacc Mar 29, 2021
fca3b4d
fix wrong var name after merge
luholacc Mar 29, 2021
6c1c58a
remove unused `type: ignore`
luholacc Mar 31, 2021
b7e5035
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Mar 31, 2021
f44c6e6
fix datatype in docstring (now bool)
luholacc Apr 6, 2021
3dcad64
dd parametrize for window_selections
luholgit Apr 7, 2021
0f9f6df
black formatting
luholgit Apr 7, 2021
315b320
parametrize "test_rolling_window_as_string" outside of function
luholgit Apr 7, 2021
f7d1110
add whatsnew note
luholgit Apr 7, 2021
fc88ae4
remove center=False, is already default
luholgit Apr 7, 2021
e07a1f2
remove prompts and output from whatsnew
luholgit Apr 7, 2021
cefbb16
remove `in` and `out` annotations
luholgit Apr 8, 2021
edbfd21
add datetime-like center example
luholgit Apr 8, 2021
bfc0f0d
add test for different behavior of window alignment
luholgit Apr 8, 2021
43e04ed
additional output in centering example
luholgit Apr 9, 2021
1e724dc
additional output in centering example and comparison
luholgit Apr 9, 2021
47a3b14
add version tag 1.3
luholgit Apr 9, 2021
41 changes: 34 additions & 7 deletions pandas/_libs/window/indexers.pyx
@@ -11,7 +11,7 @@ def calculate_variable_window_bounds(
     int64_t num_values,
     int64_t window_size,
     object min_periods,  # unused but here to match get_window_bounds signature
-    object center,  # unused but here to match get_window_bounds signature
+    bint center,
     object closed,
     const int64_t[:] index
 ):
@@ -30,7 +30,7 @@ def calculate_variable_window_bounds(
         ignored, exists for compatibility

     center : object
-        ignored, exists for compatibility
+        center the rolling window on the current observation

     closed : str
         string of side of the window that should be closed
@@ -43,7 +43,8 @@ def calculate_variable_window_bounds(
     (ndarray[int64], ndarray[int64])
     """
     cdef:
-        bint left_closed = False, right_closed = False
+        bint left_closed = False
+        bint right_closed = False
         ndarray[int64_t, ndim=1] start, end
         int64_t start_bound, end_bound, index_growth_sign = 1
         Py_ssize_t i, j
@@ -74,14 +75,27 @@ def calculate_variable_window_bounds(
     # right endpoint is open
     else:
         end[0] = 0
+    if center:
+        for j in range(0, num_values + 1):
+            if (index[j] == index[0] + index_growth_sign * window_size / 2 and
+                    right_closed):
+                end[0] = j + 1
+                break
+            elif index[j] >= index[0] + index_growth_sign * window_size / 2:
+                end[0] = j
+                break

     with nogil:

         # start is start of slice interval (including)
         # end is end of slice interval (not including)
         for i in range(1, num_values):
-            end_bound = index[i]
-            start_bound = index[i] - index_growth_sign * window_size
+            if center:
+                end_bound = index[i] + index_growth_sign * window_size / 2
+                start_bound = index[i] - index_growth_sign * window_size / 2
+            else:
+                end_bound = index[i]
+                start_bound = index[i] - index_growth_sign * window_size

             # left endpoint is closed
             if left_closed:
@@ -95,14 +109,27 @@ def calculate_variable_window_bounds(
                     start[i] = j
                     break

+            # for centered window advance the end bound until we are
+            # outside the constraint
+            if center:
+                for j in range(end[i - 1], num_values + 1):
+                    if j == num_values:
+                        end[i] = j
+                    elif ((index[j] - end_bound) * index_growth_sign == 0 and
+                          right_closed):
+                        end[i] = j + 1
+                        break
+                    elif (index[j] - end_bound) * index_growth_sign >= 0:
+                        end[i] = j
+                        break
             # end bound is previous end
             # or current index
-            if (index[end[i - 1]] - end_bound) * index_growth_sign <= 0:
+            elif (index[end[i - 1]] - end_bound) * index_growth_sign <= 0:
                 end[i] = i + 1
             else:
                 end[i] = end[i - 1]

             # right endpoint is open
-            if not right_closed:
+            if not right_closed and not center:
                 end[i] -= 1
     return start, end
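The centered branch above places each window at t - window_size / 2 .. t + window_size / 2 and lets `closed` decide which ends are inclusive. The following is a simplified pure-Python restatement for illustration, not the Cython code: it assumes an increasing index (it ignores `index_growth_sign`), uses binary search instead of the incremental scan, and `centered_offset_bounds` / `selections` are hypothetical names. With five consecutive days and a two-day window it should reproduce the `window_selections` masks used in `test_datetimelike_centered_selections` further down.

```python
from bisect import bisect_left, bisect_right


def centered_offset_bounds(index, window, closed="right"):
    """Hypothetical helper: (start, end) slice bounds of a window of width
    `window` centered on each value of a sorted, increasing `index`.

    The window around t spans t - window/2 .. t + window/2; `closed`
    ("right", "left", "both" or "neither") decides which ends are inclusive.
    """
    half = window / 2
    left_closed = closed in ("left", "both")
    right_closed = closed in ("right", "both")
    starts, ends = [], []
    for t in index:
        lo, hi = t - half, t + half
        # First position whose value is >= lo (closed left) or > lo (open left).
        start = bisect_left(index, lo) if left_closed else bisect_right(index, lo)
        # One past the last position whose value is <= hi (closed right)
        # or < hi (open right).
        end = bisect_right(index, hi) if right_closed else bisect_left(index, hi)
        starts.append(start)
        ends.append(end)
    return starts, ends


def selections(index, window, closed):
    """Turn (start, end) bounds into one boolean row mask per observation."""
    starts, ends = centered_offset_bounds(index, window, closed)
    n = len(index)
    return [[start <= k < end for k in range(n)] for start, end in zip(starts, ends)]


# Five consecutive days (as integers) and a 2-day window, as in the PR's test.
days = [0, 1, 2, 3, 4]
for closed in ("both", "left", "right", "neither"):
    print(closed, centered_offset_bounds(days, 2, closed))
```

Binary search keeps the sketch short; the Cython version instead advances `start` and `end` monotonically from the previous row, which avoids the per-row logarithmic search on long indexes.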
11 changes: 3 additions & 8 deletions pandas/core/window/rolling.py
@@ -339,7 +339,9 @@ def _get_window_indexer(self) -> BaseIndexer:
             return self.window
         if self._win_freq_i8 is not None:
             return VariableWindowIndexer(
-                index_array=self._index_array, window_size=self._win_freq_i8
+                index_array=self._index_array,
+                window_size=self._win_freq_i8,
+                center=self.center,
             )
         return FixedWindowIndexer(window_size=self.window)

@@ -2044,13 +2046,6 @@ def validate(self):

             self._validate_monotonic()

-            # we don't allow center
-            if self.center:
-                raise NotImplementedError(
-                    "center is not implemented for "
-                    "datetimelike and offset based windows"
-                )
-
             # this will raise ValueError on non-fixed freqs
             try:
                 freq = to_offset(self.window)
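With `center=self.center` forwarded to `VariableWindowIndexer` and the `NotImplementedError` branch removed from `validate`, an offset-based window such as `"2D"` accepts `center=True`. A short usage sketch, assuming pandas 1.3 or later (the commit history adds a 1.3 version tag for this feature). The centered values follow the `closed="right"` selections in `test_datetimelike_centered_selections`, since the default `closed` for offset windows is `"right"`.

```python
import pandas as pd

# Five daily observations, mirroring the PR's centered-selection test.
s = pd.Series(
    [0.0, 1.0, 2.0, 3.0, 4.0], index=pd.date_range("2020-01-01", periods=5)
)

# Trailing (default) 2-day window: each row sees (t - 2D, t].
trailing = s.rolling("2D").sum()

# Centered 2-day window: each row sees (t - 1D, t + 1D].
centered = s.rolling("2D", center=True).sum()

print(trailing.tolist())
print(centered.tolist())
```

Before this change, the second call raised `NotImplementedError("center is not implemented for datetimelike and offset based windows")`.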
215 changes: 162 additions & 53 deletions pandas/tests/window/test_rolling.py
@@ -127,29 +127,94 @@ def test_closed_fixed(closed, arithmetic_win_operators):
     df_fixed = DataFrame({"A": [0, 1, 2, 3, 4]})
     df_time = DataFrame({"A": [0, 1, 2, 3, 4]}, index=date_range("2020", periods=5))

-    result = getattr(df_fixed.rolling(2, closed=closed, min_periods=1), func_name)()
-    expected = getattr(df_time.rolling("2D", closed=closed), func_name)().reset_index(
-        drop=True
-    )
+    result = getattr(
+        df_fixed.rolling(2, closed=closed, min_periods=1, center=False), func_name
+    )()
+    expected = getattr(
+        df_time.rolling("2D", closed=closed, min_periods=1, center=False), func_name
+    )().reset_index(drop=True)

     tm.assert_frame_equal(result, expected)


-def test_closed_fixed_binary_col():
+def test_datetimelike_centered_selections(closed, arithmetic_win_operators):
Contributor
Can you parametrize on closed (e.g. put your window selections directly up there alongside the closed parameter), i.e. don't use the fixture?

Contributor Author
Good idea, this is done, too.

+    # GH 34315
+    func_name = arithmetic_win_operators
+    df_time = DataFrame(
+        {"A": [0.0, 1.0, 2.0, 3.0, 4.0]}, index=date_range("2020", periods=5)
+    )
+
+    if closed == "both":
+        window_selections = [
+            [True, True, False, False, False],
+            [True, True, True, False, False],
+            [False, True, True, True, False],
+            [False, False, True, True, True],
+            [False, False, False, True, True],
+        ]
+    elif closed == "left":
+        window_selections = [
+            [True, False, False, False, False],
+            [True, True, False, False, False],
+            [False, True, True, False, False],
+            [False, False, True, True, False],
+            [False, False, False, True, True],
+        ]
+    elif closed == "right":
+        window_selections = [
+            [True, True, False, False, False],
+            [False, True, True, False, False],
+            [False, False, True, True, False],
+            [False, False, False, True, True],
+            [False, False, False, False, True],
+        ]
+    else:  # closed=="neither"
+        window_selections = [
+            [True, False, False, False, False],
+            [False, True, False, False, False],
+            [False, False, True, False, False],
+            [False, False, False, True, False],
+            [False, False, False, False, True],
+        ]
+
+    expected = DataFrame(
+        {"A": [getattr(df_time["A"].iloc[s], func_name)() for s in window_selections]},
+        index=date_range("2020", periods=5),
+    )
+
+    if func_name == "sem":
+        kwargs = {"ddof": 0}
+    else:
+        kwargs = {}
+
+    result = getattr(
+        df_time.rolling("2D", closed=closed, min_periods=1, center=True), func_name
+    )(**kwargs)
+
+    tm.assert_frame_equal(result, expected, check_dtype=False)
+
+
+def test_closed_fixed_binary_col(center):
     # GH 34315
     data = [0, 1, 1, 0, 0, 1, 0, 1]
     df = DataFrame(
         {"binary_col": data},
         index=date_range(start="2020-01-01", freq="min", periods=len(data)),
     )

-    rolling = df.rolling(window=len(df), closed="left", min_periods=1)
-    result = rolling.mean()
+    if center:
+        expected_data = [2 / 3, 0.5, 0.4, 0.5, 0.428571, 0.5, 0.571429, 0.5]
+    else:
+        expected_data = [np.nan, 0, 0.5, 2 / 3, 0.5, 0.4, 0.5, 0.428571]
+
     expected = DataFrame(
-        [np.nan, 0, 0.5, 2 / 3, 0.5, 0.4, 0.5, 0.428571],
+        expected_data,
         columns=["binary_col"],
-        index=date_range(start="2020-01-01", freq="min", periods=len(data)),
+        index=date_range(start="2020-01-01", freq="min", periods=len(expected_data)),
     )
+
+    rolling = df.rolling(window=len(df), closed="left", min_periods=1, center=center)
+    result = rolling.mean()
     tm.assert_frame_equal(result, expected)


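`test_closed_fixed_binary_col` uses an integer window (`window=len(df)`, i.e. 8 rows), so its centered expectations exercise the fixed-size path rather than the offset path changed in `indexers.pyx`. A minimal sketch, assuming a centered fixed window is the trailing window shifted forward by `(window - 1) // 2` rows and clipped to the data, reproduces both `expected_data` lists; `fixed_window_bounds` and `rolling_mean` are hypothetical helpers, not pandas functions.

```python
import math


def fixed_window_bounds(n, window, closed, center):
    """Hypothetical helper: (start, end) row bounds for a fixed-size window.

    Assumption: centering shifts the trailing window forward by
    (window - 1) // 2 rows; bounds are clipped to [0, n].
    """
    offset = (window - 1) // 2 if center else 0
    starts, ends = [], []
    for i in range(n):
        end = i + 1 + offset
        start = end - window
        if closed in ("left", "both"):
            start -= 1  # closed left edge: reach one row further back
        if closed in ("left", "neither"):
            end -= 1  # open right edge: drop the last row of the window
        starts.append(min(max(start, 0), n))
        ends.append(min(max(end, 0), n))
    return starts, ends


def rolling_mean(data, window, closed, center, min_periods=1):
    """Mean over each window, or NaN when it holds fewer than min_periods rows."""
    starts, ends = fixed_window_bounds(len(data), window, closed, center)
    result = []
    for start, end in zip(starts, ends):
        chunk = data[start:end]
        result.append(sum(chunk) / len(chunk) if len(chunk) >= min_periods else math.nan)
    return result


data = [0, 1, 1, 0, 0, 1, 0, 1]
print(rolling_mean(data, len(data), "left", center=True))
print(rolling_mean(data, len(data), "left", center=False))
```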
@@ -394,7 +459,7 @@ def test_rolling_datetime(axis_frame, tz_naive_fixture):
     tm.assert_frame_equal(result, expected)


-def test_rolling_window_as_string():
+def test_rolling_window_as_string(center):
     # see gh-22590
     date_today = datetime.now()
     days = date_range(date_today, date_today + timedelta(365), freq="D")
@@ -405,50 +470,94 @@ def test_rolling_window_as_string():
     df = DataFrame({"DateCol": days, "metric": data})

     df.set_index("DateCol", inplace=True)
-    result = df.rolling(window="21D", min_periods=2, closed="left")["metric"].agg("max")
-
-    expData = (
-        [np.nan] * 2
-        + [88.0] * 16
-        + [97.0] * 9
-        + [98.0]
-        + [99.0] * 21
-        + [95.0] * 16
-        + [93.0] * 5
-        + [89.0] * 5
-        + [96.0] * 21
-        + [94.0] * 14
-        + [90.0] * 13
-        + [88.0] * 2
-        + [90.0] * 9
-        + [96.0] * 21
-        + [95.0] * 6
-        + [91.0]
-        + [87.0] * 6
-        + [92.0] * 21
-        + [83.0] * 2
-        + [86.0] * 10
-        + [87.0] * 5
-        + [98.0] * 21
-        + [97.0] * 14
-        + [93.0] * 7
-        + [87.0] * 4
-        + [86.0] * 4
-        + [95.0] * 21
-        + [85.0] * 14
-        + [83.0] * 2
-        + [76.0] * 5
-        + [81.0] * 2
-        + [98.0] * 21
-        + [95.0] * 14
-        + [91.0] * 7
-        + [86.0]
-        + [93.0] * 3
-        + [95.0] * 20
-    )
+    result = df.rolling(window="21D", min_periods=2, closed="left", center=center)[
+        "metric"
+    ].agg("max")
+
+    if center:
+        expected_data = (
+            [88.0] * 7
+            + [97.0] * 9
+            + [98.0]
+            + [99.0] * 21
+            + [95.0] * 16
+            + [93.0] * 5
+            + [89.0] * 5
+            + [96.0] * 21
+            + [94.0] * 14
+            + [90.0] * 13
+            + [88.0] * 2
+            + [90.0] * 9
+            + [96.0] * 21
+            + [95.0] * 6
+            + [91.0]
+            + [87.0] * 6
+            + [92.0] * 21
+            + [83.0] * 2
+            + [86.0] * 10
+            + [87.0] * 5
+            + [98.0] * 21
+            + [97.0] * 14
+            + [93.0] * 7
+            + [87.0] * 4
+            + [86.0] * 4
+            + [95.0] * 21
+            + [85.0] * 14
+            + [83.0] * 2
+            + [76.0] * 5
+            + [81.0] * 2
+            + [98.0] * 21
+            + [95.0] * 14
+            + [91.0] * 7
+            + [86.0]
+            + [93.0] * 3
+            + [95.0] * 29
+            + [77.0] * 2
+        )
+
+    else:
Contributor
If doing this, please parametrize outside the function.

Contributor Author
Done!

+        expected_data = (
+            [np.nan] * 2
+            + [88.0] * 16
+            + [97.0] * 9
+            + [98.0]
+            + [99.0] * 21
+            + [95.0] * 16
+            + [93.0] * 5
+            + [89.0] * 5
+            + [96.0] * 21
+            + [94.0] * 14
+            + [90.0] * 13
+            + [88.0] * 2
+            + [90.0] * 9
+            + [96.0] * 21
+            + [95.0] * 6
+            + [91.0]
+            + [87.0] * 6
+            + [92.0] * 21
+            + [83.0] * 2
+            + [86.0] * 10
+            + [87.0] * 5
+            + [98.0] * 21
+            + [97.0] * 14
+            + [93.0] * 7
+            + [87.0] * 4
+            + [86.0] * 4
+            + [95.0] * 21
+            + [85.0] * 14
+            + [83.0] * 2
+            + [76.0] * 5
+            + [81.0] * 2
+            + [98.0] * 21
+            + [95.0] * 14
+            + [91.0] * 7
+            + [86.0]
+            + [93.0] * 3
+            + [95.0] * 20
+        )

     expected = Series(
-        expData, index=days.rename("DateCol")._with_freq(None), name="metric"
+        expected_data, index=days.rename("DateCol")._with_freq(None), name="metric"
     )
     tm.assert_series_equal(result, expected)

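The two `expected_data` lists in `test_rolling_window_as_string` are related by a fixed shift. With `closed="left"` and a daily index, a centered `"21D"` window around day r covers days r-10 through r+10 (the half-width is 10.5 days), which is exactly the trailing left-closed window of day r+11 (days r+11-21 through r+11-1). The stdlib sketch below rebuilds both lists from the run lengths in the diff and checks that relationship; `MIDDLE`, `expand` and `SHIFT` are names introduced only for this check.

```python
import math

# (value, repeat) runs shared by both expected lists in the test above.
MIDDLE = [
    (97.0, 9), (98.0, 1), (99.0, 21), (95.0, 16), (93.0, 5), (89.0, 5),
    (96.0, 21), (94.0, 14), (90.0, 13), (88.0, 2), (90.0, 9), (96.0, 21),
    (95.0, 6), (91.0, 1), (87.0, 6), (92.0, 21), (83.0, 2), (86.0, 10),
    (87.0, 5), (98.0, 21), (97.0, 14), (93.0, 7), (87.0, 4), (86.0, 4),
    (95.0, 21), (85.0, 14), (83.0, 2), (76.0, 5), (81.0, 2), (98.0, 21),
    (95.0, 14), (91.0, 7), (86.0, 1), (93.0, 3),
]


def expand(runs):
    """Expand (value, repeat) pairs into a flat list."""
    return [value for value, repeat in runs for _ in range(repeat)]


trailing = expand([(math.nan, 2), (88.0, 16)] + MIDDLE + [(95.0, 20)])
centered = expand([(88.0, 7)] + MIDDLE + [(95.0, 29), (77.0, 2)])

# Centered day r sees the same rows as trailing day r + 11.
SHIFT = 11
print(len(trailing), len(centered))
print(all(centered[r] == trailing[r + SHIFT] for r in range(len(centered) - SHIFT)))
```

Both lists have one entry per day of the 366-day index; only the last 11 centered values (which look past the last trailing window) and the first 11 trailing values (including the two NaN rows below `min_periods=2`) have no counterpart in the other list.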
@@ -887,7 +996,7 @@ def test_rolling_sem(frame_or_series):
     result = obj.rolling(2, min_periods=1).sem()
     if isinstance(result, DataFrame):
         result = Series(result[0].values)
-    expected = Series([np.nan] + [0.707107] * 2)
+    expected = Series([np.nan] + [0.7071067811865476] * 2)
     tm.assert_series_equal(result, expected)

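The `ddof=0` workaround for `sem` in `test_datetimelike_centered_selections` and the 0.7071067811865476 expectation here fit one assumption: in this code base, rolling `sem` divides the ddof=1 standard deviation by `sqrt(count - ddof)`, whereas `Series.sem()` divides it by `sqrt(count)`. A stdlib sketch under that assumption, using a hypothetical two-value window `[0.0, 1.0]` (`rolling_sem` and `textbook_sem` are illustrative helpers, not pandas functions):

```python
import math
import statistics


def rolling_sem(window_values, ddof=1):
    """Assumed form of rolling sem here: ddof=1 stdev / sqrt(count - ddof)."""
    std = statistics.stdev(window_values)  # sample standard deviation (ddof=1)
    return std / math.sqrt(len(window_values) - ddof)


def textbook_sem(window_values):
    """Standard error of the mean, as Series.sem() computes it by default."""
    return statistics.stdev(window_values) / math.sqrt(len(window_values))


window = [0.0, 1.0]  # hypothetical two-value window
print(rolling_sem(window))          # default ddof=1, divides by sqrt(1)
print(rolling_sem(window, ddof=0))  # divides by sqrt(2)
print(textbook_sem(window))
```

Under this assumption, passing `ddof=0` to the rolling method makes it agree with `Series.sem()`, which is what the centered-selection test needs for its expected values; the commit history describes this as side-stepping the ddof bug rather than fixing it.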
6 changes: 0 additions & 6 deletions pandas/tests/window/test_timeseries_window.py
@@ -83,12 +83,6 @@ def test_invalid_minp(self, minp):
         with pytest.raises(ValueError, match=msg):
             self.regular.rolling(window="1D", min_periods=minp)

-    def test_invalid_center_datetimelike(self):
-        # center is not implemented
-        msg = "center is not implemented for datetimelike and offset based windows"
-        with pytest.raises(NotImplementedError, match=msg):
-            self.regular.rolling(window="1D", center=True)
-
     def test_on(self):

         df = self.regular