-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
DOC: Updating pandas.Interval docstring #20057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Updating pandas.Interval docstring #20057
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Added some comments
pandas/_libs/interval.pyx
Outdated
Left bound for the interval | ||
right : value | ||
Right bound for the interval | ||
left : orderable object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe 'scalar' instead of object? To make clear an array-like is not accepted?
pandas/_libs/interval.pyx
Outdated
|
||
Notes | ||
----- | ||
You must be able to compare the parameters **left** and **right**, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use single backticks to refer to the keywords? (like `left`
), and double backticks for the code snippet on the line below ?
pandas/_libs/interval.pyx
Outdated
The examples below show how you can build Intervals of various types | ||
of elements. In particular a numeric interval, a time interval and | ||
finally a string interval. Given an Interval you can check whether | ||
an element belongs to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think it can be easier to follow to put some of those explanations between the examples, instead of first all explanation and then all examples.
Can you try the latest version? I think this should be fixed now.
and then try the script again.
Ah, indeed, that's a good catch. That's something we need to add to the guide. |
I would personally give 'good examples', as you did now. And the note you added about |
pandas/_libs/interval.pyx
Outdated
>>> year_2017 = pd.Interval(pd.Timestamp('2017-01-01'), | ||
... pd.Timestamp('2017-12-31'), closed='both') | ||
>>> pd.Timestamp('2017-01-01 00:00') in year_2017 | ||
True | ||
>>> year_2017.length | ||
Timedelta('364 days 00:00:00') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to change year_2017
to
year_2017 = pd.Interval(pd.Timestamp('2017-01-01'),
pd.Timestamp('2018-01-01'), closed='left')
That seems like a more accurate representation of a year, since the current version doesn't include anything beyond 2017-12-31 00:00:00
, e.g. 2017-12-31 12:00:00
isn't included in year_2017
as currently written. This change would also result in year_2017.length
being 365 days, which should be more intuitive for users.
…val_docstring Updating the advanced branch
…val_docstring Preparing for commit
Codecov Report
@@ Coverage Diff @@
## master #20057 +/- ##
==========================================
+ Coverage 91.7% 91.73% +0.02%
==========================================
Files 150 150
Lines 49168 49168
==========================================
+ Hits 45090 45102 +12
+ Misses 4078 4066 -12
Continue to review full report at Codecov.
|
Thanks for the comments, I tried to improve it following your guidelines. |
@jschendel could you give this another look? |
pandas/_libs/interval.pyx
Outdated
|
||
Examples | ||
-------- | ||
It is possible to build Intervals of different types, like numeric ones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add that left & right must be of the same type. (add to Notes)
|
||
>>> volume_1 = pd.Interval('Ant', 'Dog', closed='both') | ||
>>> 'Bee' in volume_1 | ||
True | ||
|
||
See Also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add Period here as well (its pretty much a time based interval)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add a couple of points here: Interval
is basically a more generic Period
, in that Period
requires a frequency and Interval
does not. The trade-off being that Period
has additional frequency specific functionality that Interval
does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM pending @jreback's comments. Added one comment regarding an optional potential addition that could be made.
>>> extended_iv = iv * 10.0 | ||
>>> extended_iv | ||
Interval(0.0, 50.0, closed='right') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Could maybe add that an interval does not necessarily need to be bounded, e.g. pd.Interval(-np.inf, 5)
. Not sure if there's a ton of utility in adding this though.
@akielbowicz Thanks a lot! |
Docstring assigned to the Buenos Aires chapter for the sprint.
Hi, I have a few doubts about the changes.
First about the type of the parameters left and right, it's ok to refer to them as orderable?
In the implementation, the constructor takes any pair of objects that satisfy
left <= right
and the comparison returns a boolean value.I tried to pass two numpy arrays and it breaks because of ambiguity, but still it's so general that you can even make Intervals of Intervals, sets or booleans. And I believe this is too permissive and prone to error.
So what it's better, put a warning about this or add an example that is a little more complex and uses this capability (I tried to think one, but nothing came up to my mind).
And last, validate_docstrings.py returned this error
2341, in _signature_from_callable 'no signature found for builtin type {!r}'.format(obj)) ValueError: no signature found for builtin type <class 'pandas._libs.interval.Interval'>
Thanks, happy to collaborate.
PS: Given that this is a Cython file I needed to recompile the project to see the changes in the documentation. I mention this because is not in the sprint guide and I don't know if I will able to add it before the event.