Skip to content

DOC: Updating pandas.Interval docstring #20057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 14, 2018
Merged

DOC: Updating pandas.Interval docstring #20057

merged 11 commits into from
Mar 14, 2018

Conversation

akielbowicz
Copy link
Contributor

Docstring assigned to the Buenos Aires chapter for the sprint.

Hi, I have a few doubts about the changes.

First about the type of the parameters left and right, it's ok to refer to them as orderable?
In the implementation, the constructor takes any pair of objects that satisfy left <= right and the comparison returns a boolean value.
I tried to pass two numpy arrays and it breaks because of ambiguity, but still it's so general that you can even make Intervals of Intervals, sets or booleans. And I believe this is too permissive and prone to error.

So what it's better, put a warning about this or add an example that is a little more complex and uses this capability (I tried to think one, but nothing came up to my mind).

And last, validate_docstrings.py returned this error

2341, in _signature_from_callable 'no signature found for builtin type {!r}'.format(obj)) ValueError: no signature found for builtin type <class 'pandas._libs.interval.Interval'>

Thanks, happy to collaborate.

PS: Given that this is a Cython file I needed to recompile the project to see the changes in the documentation. I mention this because is not in the sprint guide and I don't know if I will able to add it before the event.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Added some comments

Left bound for the interval
right : value
Right bound for the interval
left : orderable object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe 'scalar' instead of object? To make clear an array-like is not accepted?


Notes
-----
You must be able to compare the parameters **left** and **right**,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use single backticks to refer to the keywords? (like `left`), and double backticks for the code snippet on the line below ?

The examples below show how you can build Intervals of various types
of elements. In particular a numeric interval, a time interval and
finally a string interval. Given an Interval you can check whether
an element belongs to it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think it can be easier to follow to put some of those explanations between the examples, instead of first all explanation and then all examples.

@jorisvandenbossche
Copy link
Member

And last, validate_docstrings.py returned this error

2341, in _signature_from_callable 'no signature found for builtin type {!r}'.format(obj)) ValueError: no signature found for builtin type <class 'pandas._libs.interval.Interval'>

Can you try the latest version? I think this should be fixed now.
You can update this branch by doing:

git fetch upstream
git merge upstream/master

and then try the script again.

Given that this is a Cython file I needed to recompile the project to see the changes in the documentation. I mention this because is not in the sprint guide and I don't know if I will able to add it before the event.

Ah, indeed, that's a good catch. That's something we need to add to the guide.

@jorisvandenbossche
Copy link
Member

So what it's better, put a warning about this or add an example that is a little more complex and uses this capability (I tried to think one, but nothing came up to my mind).

I would personally give 'good examples', as you did now. And the note you added about left <= right needing to work seems sufficient to me.

>>> year_2017 = pd.Interval(pd.Timestamp('2017-01-01'),
... pd.Timestamp('2017-12-31'), closed='both')
>>> pd.Timestamp('2017-01-01 00:00') in year_2017
True
>>> year_2017.length
Timedelta('364 days 00:00:00')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to change year_2017 to

year_2017 = pd.Interval(pd.Timestamp('2017-01-01'),
                        pd.Timestamp('2018-01-01'), closed='left')

That seems like a more accurate representation of a year, since the current version doesn't include anything beyond 2017-12-31 00:00:00, e.g. 2017-12-31 12:00:00 isn't included in year_2017 as currently written. This change would also result in year_2017.length being 365 days, which should be more intuitive for users.

@codecov
Copy link

codecov bot commented Mar 10, 2018

Codecov Report

Merging #20057 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20057      +/-   ##
==========================================
+ Coverage    91.7%   91.73%   +0.02%     
==========================================
  Files         150      150              
  Lines       49168    49168              
==========================================
+ Hits        45090    45102      +12     
+ Misses       4078     4066      -12
Flag Coverage Δ
#multiple 90.11% <ø> (+0.02%) ⬆️
#single 41.86% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_converter.py 66.81% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df2e361...4682386. Read the comment docs.

@akielbowicz
Copy link
Contributor Author

Thanks for the comments, I tried to improve it following your guidelines.

@jorisvandenbossche
Copy link
Member

@jschendel could you give this another look?


Examples
--------
It is possible to build Intervals of different types, like numeric ones
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add that left & right must be of the same type. (add to Notes)


>>> volume_1 = pd.Interval('Ant', 'Dog', closed='both')
>>> 'Bee' in volume_1
True

See Also
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add Period here as well (its pretty much a time based interval)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add a couple of points here: Interval is basically a more generic Period, in that Period requires a frequency and Interval does not. The trade-off being that Period has additional frequency specific functionality that Interval does not.

Copy link
Member

@jschendel jschendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM pending @jreback's comments. Added one comment regarding an optional potential addition that could be made.

>>> extended_iv = iv * 10.0
>>> extended_iv
Interval(0.0, 50.0, closed='right')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Could maybe add that an interval does not necessarily need to be bounded, e.g. pd.Interval(-np.inf, 5). Not sure if there's a ton of utility in adding this though.

@jorisvandenbossche jorisvandenbossche merged commit 921fefb into pandas-dev:master Mar 14, 2018
@jorisvandenbossche
Copy link
Member

@akielbowicz Thanks a lot!

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants