-
-
Notifications
You must be signed in to change notification settings - Fork 143
GH456 First attempt GroupBy.transform improved typing #1242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I think it won't close 456, because that was also about agg
.
For
is there a convention to name the TypeAlias or should we keep the same name as in the pandas code? |
Probably shouldn't use the same name, because then the types won't match. Not that anyone should be importing that anyway. So keep the same name and append |
pandas-stubs/core/groupby/base.pyi
Outdated
|
||
@dataclasses.dataclass(order=True, frozen=True) | ||
class OutputKey: | ||
label: Hashable | ||
position: int | ||
|
||
reduction_kernels: TypeAlias = Literal[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as commented elsewhere, lets call this reduction_kernels_type
so that the types of the objects line up.
tests/test_series.py
Outdated
# type of `sum` not well inferred by mypy | ||
return sum(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use s.sum()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was if I passed sum
the builtin one in the aggregate method directly, here is does not change anything.
tests/test_series.py
Outdated
return s.astype(float).min() | ||
|
||
s = pd.Series([1, 2, 3, 4]) | ||
s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you want a check(assert_type(...
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct my mistake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the inference on the fly of lambdas is not super clear so you need to define the function on the side to have the right types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is an issue with lambda functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think you can have a test of
check(assert_type( s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min()), pd.Series), pd.Series)
which would be worthwhile
tests/test_series.py
Outdated
# type of `sum` not well inferred by mypy | ||
return sum(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use s.sum()
check( | ||
assert_type(s.groupby(level=0).agg([min, sum]), pd.DataFrame), pd.DataFrame | ||
) | ||
|
||
|
||
def test_types_groupby_transform() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should add tests for two of the string transform arguments (e.g., "mean", "first")
def aggregate( | ||
self, | ||
func: AggFuncTypeBase | None = ..., | ||
/, | ||
*args, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this overload, you could add this overload:
@overload
def aggregate(
self,
func: Callable[[Series], S2],
*args,
engine: WindowingEngine = ...,
engine_kwargs: WindowingEngineKwargs = ...,
**kwargs,
) -> Series[S2]: ...
Then you know that if you start with a Series
with a known type, then the return type would be inferred from the callable. And it works with a lambda function, e.g.:
s = pd.Series([1, 2, 3, 4])
q = s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min())
In this case, q
would have type Series[float]
, which is what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because the type of new_func
isn't clear.
But I think it would work if you did check(assert_type(s.groupby([1,1,2,2]).agg(lambda x: x.astype(float).min()), "pd.Series[int]"), pd.Series, int)
Because then it can know that x
is a Series[int]
and that the lambda
becomes Series[int]
Can you try that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that for the last push, see
pandas-stubs/tests/test_series.py
Line 1167 in f9863d0
check(assert_type(s.groupby([1,1,2,2]).agg(lambda x: x.astype(float).min()), "pd.Series[float]"), pd.Series, int) |
It fails in all CI:
===========================================
Beginning: 'Run mypy on 'tests' (using the local stubs) and on the local stubs'
===========================================
tests/test_series.py:1167: error: Expression is of type "Series[Any]", not "Series[float]" [assert-type]
Found 1 error in 1 file (checked 224 source files)
assert_type()
to assert the type of any return value