-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: groupby.agg returns incorrect results for uint64 cols (#26310) #26359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -313,3 +313,16 @@ def test_order_aggregate_multiple_funcs(): | |
expected = pd.Index(['sum', 'max', 'mean', 'ohlc', 'min']) | ||
|
||
tm.assert_index_equal(result, expected) | ||
|
||
|
||
@pytest.mark.parametrize('dtype', [np.int64, np.uint64]) | ||
@pytest.mark.parametrize('how', ['first', 'last', 'min', | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just out of curiosity how did you select these methods? The issue would cover more than just these right? Just asking as at some point we should probably add a shared fixture for the different aggregation / transformation methods. Probably a separate PR but just curious if this subset is intentional There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This subset is not intentional. I tried to select a set that would satisfy the requests by the other reviewer. Since this PR alters a pretty generic method, I'm sure you could test for a way larger set of methods. It's just that I don't know the codebase well enough to design such tests. |
||
'max', 'mean', 'median']) | ||
def test_uint64_type_handling(dtype, how): | ||
# GH 26310 | ||
df = pd.DataFrame({'x': 6903052872240755750, 'y': [1, 2]}) | ||
expected = df.groupby('y').agg({'x': how}) | ||
df.x = df.x.astype(dtype) | ||
result = df.groupby('y').agg({'x': how}) | ||
result.x = result.x.astype(np.int64) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we add test coverage for a value that exceeds the upper limit of an int64? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not immediately clear to me what we would be testing for in that case. Issue #26310 is basically that there is some method If the value of If I have misunderstood, please elaborate. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem here is going to be with unsigned integers in the range of 263 through 264-1 (range where uint64 exceeds int64). I'm not sure if that would even work with these agg functions and don't want to open up a Pandora's box here but if that doesn't work we might just need to reword the whatsnew There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest rewording whatsnew here and dealing with that in another pull request. What do you think @WillAyd? How would you like the whatsnew to be worded? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, input uint64's should work with your code change, but @mahepe ok to make an issue about this. |
||
tm.assert_frame_equal(result, expected, check_exact=True) |
Uh oh!
There was an error while loading. Please reload this page.