Skip to content

Center rolling window for time offset #38780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
Apr 9, 2021
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
11c1bb7
update syntax for pandas style
adamamiller Sep 3, 2020
3cac891
Merge branch 'master' of https://github.com/pandas-dev/pandas into ce…
adamamiller Sep 3, 2020
c73ffc4
fix syntax error
adamamiller Sep 3, 2020
58a8ebf
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Dec 29, 2020
dca9d04
reintroduce calculate_center_offset as private function
luholacc Dec 29, 2020
37cb6fa
fix double declaration of index_growth_sign
luholacc Dec 29, 2020
81e0e4e
apply review suggestions
luholacc Dec 30, 2020
463c7f0
remove unneccessary exception
luholacc Dec 30, 2020
321f07c
add method "_center_window" to class BaseWindow
luholacc Jan 5, 2021
9270bab
use spaces around operators
luholacc Jan 5, 2021
dd33e32
remove unneccessary calculations from rolling.py, tests work again
luholacc Jan 6, 2021
6e4da84
remove "test_invalid_center_datetimelike"
luholacc Jan 7, 2021
e0966e8
remove white spaces and TODO
luholacc Jan 7, 2021
b82f514
remove unnecessary lines
luholacc Jan 29, 2021
18a7b5b
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Jan 29, 2021
9c4cc58
change existing test to cover new case
luholacc Jan 29, 2021
c27f50e
adapt test parameters
luholacc Jan 29, 2021
abaa43b
add center testing to test_closed_fixed
sevberg Jan 29, 2021
dc046da
clean test_closed_fixed_binary_col
sevberg Jan 29, 2021
95e3f26
fix formatting and rename `center_window` to `center`
luholacc Feb 1, 2021
8d582a1
define len of test data dynamically
luholacc Feb 1, 2021
8d5a55c
move if-statement back into two lines (failed before)
luholacc Feb 1, 2021
525cc69
remove hard-coded center
sevberg Feb 1, 2021
6ac79b9
remove fixture
sevberg Feb 1, 2021
9bf6ce3
use `center` fixture for `test_closed_fixed_binary_col`
luholacc Feb 1, 2021
4f98fc5
add `center` testing to `test_rolling_window_as_string`
luholacc Feb 1, 2021
e5ae3b2
correct expected data and remove debug prints
luholacc Feb 2, 2021
d106940
align ddof usage of rolling sem with nanops nansem
sevberg Feb 2, 2021
d4f6d22
explicitly test centered datetimelike windows
sevberg Feb 2, 2021
c2a7333
correct sem test
sevberg Feb 2, 2021
73313e6
align rolling.sem with nanops.nansem
sevberg Feb 2, 2021
92f8992
fix test
sevberg Feb 2, 2021
648d2d3
revert ddof behavior
sevberg Feb 3, 2021
278d33f
revert sem test
sevberg Feb 3, 2021
5e50f36
side-step ddof bug
sevberg Feb 3, 2021
c11cf15
fix black failure
sevberg Feb 3, 2021
2e3f875
disable black
sevberg Feb 3, 2021
f63309b
fix missing datetimelike word
sevberg Feb 3, 2021
9f76a41
update whatsnew
sevberg Feb 3, 2021
f05ed61
add to enhancements
sevberg Feb 3, 2021
6fbd080
Merge remote-tracking branch 'upstream/master' into center_window
sevberg Feb 3, 2021
dff6942
Merge branch 'master' of https://github.com/pandas-dev/pandas into ce…
Zaubeerer Feb 10, 2021
0520e18
trim trailing whitespaces
luholacc Feb 25, 2021
a087a6b
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Mar 29, 2021
5b9b8ff
correct `too many blank lines`
luholacc Mar 29, 2021
fca3b4d
fix wrong var name after merge
luholacc Mar 29, 2021
6c1c58a
remove unused `type: ignore`
luholacc Mar 31, 2021
b7e5035
Merge branch 'master' of github.com:pandas-dev/pandas into center_window
luholacc Mar 31, 2021
f44c6e6
fix datatype in docstring (now bool)
luholacc Apr 6, 2021
3dcad64
dd parametrize for window_selections
luholgit Apr 7, 2021
0f9f6df
black formatting
luholgit Apr 7, 2021
315b320
parametrize "test_rolling_window_as_string" outside of function
luholgit Apr 7, 2021
f7d1110
add whatsnew note
luholgit Apr 7, 2021
fc88ae4
remove center=False, is already default
luholgit Apr 7, 2021
e07a1f2
remove prompts and output from whatsnew
luholgit Apr 7, 2021
cefbb16
remove `in` and `out` annotations
luholgit Apr 8, 2021
edbfd21
add datetime-like center example
luholgit Apr 8, 2021
bfc0f0d
add test for different behavior of window alignment
luholgit Apr 8, 2021
43e04ed
additional output in centering example
luholgit Apr 9, 2021
1e724dc
additional output in centering example and comparison
luholgit Apr 9, 2021
47a3b14
add version tag 1.3
luholgit Apr 9, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 37 additions & 7 deletions pandas/_libs/window/indexers.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ def calculate_variable_window_bounds(
int64_t num_values,
int64_t window_size,
object min_periods, # unused but here to match get_window_bounds signature
object center, # unused but here to match get_window_bounds signature
bint center,
object closed,
const int64_t[:] index
):
Expand All @@ -30,7 +30,7 @@ def calculate_variable_window_bounds(
ignored, exists for compatibility

center : object
ignored, exists for compatibility
center the rolling window on the current observation

closed : str
string of side of the window that should be closed
Expand All @@ -43,7 +43,9 @@ def calculate_variable_window_bounds(
(ndarray[int64], ndarray[int64])
"""
cdef:
bint left_closed = False, right_closed = False
bint left_closed = False
bint right_closed = False
bint center_window = False
ndarray[int64_t, ndim=1] start, end
int64_t start_bound, end_bound, index_growth_sign = 1
Py_ssize_t i, j
Expand All @@ -60,6 +62,8 @@ def calculate_variable_window_bounds(

if index[num_values - 1] < index[0]:
index_growth_sign = -1
if center:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sine we typed center as bint. This isnt needed anymore

center_window = True

start = np.empty(num_values, dtype='int64')
start.fill(-1)
Expand All @@ -74,14 +78,27 @@ def calculate_variable_window_bounds(
# right endpoint is open
else:
end[0] = 0
if center_window:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be center right (and elsewhere)?

for j in range(0, num_values+1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range(num_values + 1)

if (index[j] == index[0] + index_growth_sign*window_size/2 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use spaces around operators & parens to format this (we don't lint these files automatically)

right_closed):
end[0] = j+1
break
elif index[j] >= index[0] + index_growth_sign * window_size/2:
end[0] = j
break

with nogil:

# start is start of slice interval (including)
# end is end of slice interval (not including)
for i in range(1, num_values):
end_bound = index[i]
start_bound = index[i] - index_growth_sign * window_size
if center_window:
end_bound = index[i] + index_growth_sign * window_size/2
start_bound = index[i] - index_growth_sign * window_size/2
else:
end_bound = index[i]
start_bound = index[i] - index_growth_sign * window_size

# left endpoint is closed
if left_closed:
Expand All @@ -95,14 +112,27 @@ def calculate_variable_window_bounds(
start[i] = j
break

# for centered window advance the end bound until we are
# outside the constraint
if center_window:
for j in range(end[i - 1], num_values+1):
if j == num_values:
end[i] = j
elif ((index[j] - end_bound) * index_growth_sign == 0 and
right_closed):
end[i] = j+1
break
elif (index[j] - end_bound) * index_growth_sign >= 0:
end[i] = j
break
# end bound is previous end
# or current index
if (index[end[i - 1]] - end_bound) * index_growth_sign <= 0:
elif (index[end[i - 1]] - end_bound) * index_growth_sign <= 0:
end[i] = i + 1
else:
end[i] = end[i - 1]

# right endpoint is open
if not right_closed:
if not right_closed and not center_window:
end[i] -= 1
return start, end
49 changes: 41 additions & 8 deletions pandas/core/window/rolling.py
Original file line number Diff line number Diff line change
Expand Up @@ -334,9 +334,12 @@ def _get_window_indexer(self) -> BaseIndexer:
"""
if isinstance(self.window, BaseIndexer):
return self.window

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this white space?

if self._win_freq_i8 is not None:
return VariableWindowIndexer(
index_array=self._index_array, window_size=self._win_freq_i8
index_array=self._index_array,
window_size=self._win_freq_i8,
center=self.center,
)
return FixedWindowIndexer(window_size=self.window)

Expand Down Expand Up @@ -450,7 +453,34 @@ def homogeneous_func(values: np.ndarray):
if values.size == 0:
return values.copy()

def _calculate_center_offset(window) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this code below should be needed.

The center calculation should just adjust the indexers return in calculate_variable_window_bounds and everything else should work correctly.

"""
Calculate an offset necessary to have the window label to be centered.

Parameters
----------
window: ndarray or int
window weights or window

Returns
-------
int
"""
if not is_integer(window):
window = len(window)
return int((window - 1) / 2.0)

offset = (
_calculate_center_offset(self.window)
if self.center
and not isinstance(self._get_window_indexer(), VariableWindowIndexer)
else 0
)
additional_nans = np.array([np.nan] * offset)

def calc(x):
x = np.concatenate((x, additional_nans))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this white space?

start, end = window_indexer.get_window_bounds(
num_values=len(x),
min_periods=min_periods,
Expand All @@ -469,6 +499,15 @@ def calc(x):
if numba_cache_key is not None:
NUMBA_FUNC_CACHE[numba_cache_key] = func

# TODO: is this equivalent to the above?
# if use_numba_cache:
# NUMBA_FUNC_CACHE[(kwargs["original_func"], "rolling_apply")] = func

if self.center and not isinstance(
self._get_window_indexer(), VariableWindowIndexer
):
result = self._center_window(result, self.window)

return result

if self.method == "single":
Expand Down Expand Up @@ -1880,20 +1919,14 @@ def validate(self):
# we allow rolling on a datetimelike index
if (
self.obj.empty
# TODO: add "or self.is_datetimelike"?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this TODO

or isinstance(
self._on, (ABCDatetimeIndex, ABCTimedeltaIndex, ABCPeriodIndex)
)
) and isinstance(self.window, (str, BaseOffset, timedelta)):

self._validate_monotonic()

# we don't allow center
if self.center:
raise NotImplementedError(
"center is not implemented for "
"datetimelike and offset based windows"
)

# this will raise ValueError on non-fixed freqs
try:
freq = to_offset(self.window)
Expand Down