DOC: update the pandas.Series/DataFrame.interpolate docstring #20270

Merged
Changes from 6 commits
190 changes: 151 additions & 39 deletions pandas/core/generic.py
@@ -5257,32 +5257,29 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
'polynomial', 'spline', 'piecewise_polynomial',
'polynomial', 'spline', 'piecewise_polynomial', 'pad',
'from_derivatives', 'pchip', 'akima'}
Interpolation technique to use.

* 'linear': ignore the index and treat the values as equally
* 'linear': Ignore the index and treat the values as equally
Member

Generally shouldn't need periods at the end of bullet points

spaced. This is the only method supported on MultiIndexes.
Member

I understand why you added these, but generally do not put punctuation at the end of bullet points. If you get an error as a result OK to ignore

default
* 'time': interpolation works on daily and higher resolution
data to interpolate given length of interval
* 'index', 'values': use the actual numerical values of the index
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
'barycentric', 'polynomial' is passed to
* 'time': Works on daily and higher resolution
data to interpolate given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
'barycentric', 'polynomial': Passed to
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
Member

Should have scipy.interpolate.interp1d in See Also

Member

I would do the same thing here you did for 'krogh' and move some of the implementation details down to the Notes section

Member

This should just be single backticks no?

require that you also specify an `order` (int),
e.g. df.interpolate(method='polynomial', order=4).
Member

Seems better served as a dedicated example than crammed into this

These use the actual numerical values of the index.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
are all wrappers around the scipy interpolation methods of
similar names. These use the actual numerical values of the
index. For more information on their behavior, see the
`scipy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `tutorial documentation
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__
* 'from_derivatives' refers to BPoly.from_derivatives which
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
Wrappers around the scipy interpolation methods of similar
Member

Use SciPy instead of scipy when referring to the package outside of code (couple other places this pops up)

Contributor Author

sure

names. See `Notes`.
* 'from_derivatives': Refers to
``scipy.interpolate.BPoly.from_derivatives`` which
Member

Single backtick too?

replaces 'piecewise_polynomial' interpolation method in
scipy 0.18
scipy 0.18.

.. versionadded:: 0.18.1

@@ -5291,47 +5288,162 @@ def replace(self, to_replace=None, value=None, inplace=False, limit=None,
'piecewise_polynomial' in scipy 0.18; backwards-compatible with
scipy < 0.18

axis : {0, 1}, default 0
* 0: fill column-by-column
* 1: fill row-by-row
limit : int, default None.
Maximum number of consecutive NaNs to fill. Must be greater than 0.
axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
limit_area : {'inside', 'outside'}, default None
* None: (default) no fill restriction
* 'inside' Only fill NaNs surrounded by valid values (interpolate).
* 'outside' Only fill NaNs outside valid values (extrapolate).
.. versionadded:: 0.21.0

If limit is specified, consecutive NaNs will be filled in this
Member

Put back ticks around `NaN`

direction.
inplace : bool, default False
Update the NDFrame in place if possible.
limit_area : {`None`, 'inside', 'outside'}
Member

Add ", default None" to the end here and remove the comment about it being the default below

If limit is specified, consecutive NaNs will be filled with this
restriction.

* None: No fill restriction (default).
* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
Member

Would be good to add an example for 'outside'
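
A minimal sketch of what such a `limit_area` example might look like, contrasting 'inside' with 'outside'. The Series values are made up for illustration and are not taken from the PR; `limit_direction='both'` is used in the second call so that the leading `NaN` is also filled.

```python
import numpy as np
import pandas as pd

# Illustrative data: one leading NaN, one interior NaN, two trailing NaNs.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

# 'inside': only NaNs surrounded by valid values are filled (interpolation).
inside = s.interpolate(method="linear", limit_area="inside")
print(inside.tolist())   # [nan, 1.0, 2.0, 3.0, nan, nan]

# 'outside': only NaNs beyond the first/last valid value are filled
# (extrapolation); the interior NaN is left untouched.
outside = s.interpolate(
    method="linear", limit_direction="both", limit_area="outside"
)
print(outside.tolist())  # [1.0, 1.0, nan, 3.0, 3.0, 3.0]
```

With the linear method, the "extrapolated" values simply repeat the nearest valid value, so 'outside' behaves like an edge fill here rather than continuing the trend.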


.. versionadded:: 0.21.0

downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
kwargs : keyword arguments to pass on to the interpolating function.
**kwargs
Keyword arguments to pass on to the interpolating function.

Returns
-------
Series or DataFrame of same shape interpolated at the NaNs
Series or DataFrame
Same-shape object interpolated at the NaN values
Member

For the description here say "Returns the same object type as the caller" - that wording has been used by a few other PRs so just want to be consistent


See Also
--------
reindex, replace, fillna
replace : replace a value
Member

See comments above - so much can be added here

Contributor Author

done

fillna : fill missing values
scipy.interpolate.Akima1DInterpolator : piecewise cubic polynomials
(Akima interpolator)
scipy.interpolate.BPoly.from_derivatives : piecewise polynomial in the
Bernstein basis
scipy.interpolate.interp1d : interpolate a 1-D function
scipy.interpolate.KroghInterpolator : interpolate polynomial (Krogh
interpolator)
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
interpolation
scipy.interpolate.CubicSpline : cubic spline data interpolator

Notes
-----
If the selected `method` is one of 'krogh', 'piecewise_polynomial',
Member

"The 'krogh', 'piecewise_polynomial', ... methods are wrappers around the respective SciPy implementations" would be better wording

'spline', 'pchip', 'akima':
They are wrappers around the scipy interpolation methods of similar
names. These use the actual numerical values of the index.
Member

What does "These use the actual numerical values of the index" mean?

Contributor Author

@math-and-data, Mar 14, 2018

"These use the actual numerical values of the index." Better grammar?
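
One way to read "use the actual numerical values of the index" is sketched below. It uses the built-in 'index' method rather than one of the SciPy wrappers, purely to show the difference from 'linear', which ignores the index and treats points as equally spaced. The data and index values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Unevenly spaced index: the missing point sits at x=1, between x=0 and x=10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index: the NaN is treated as the midpoint.
by_position = s.interpolate(method="linear")
print(by_position.loc[1])  # 5.0

# 'index' uses the index values as x-coordinates: x=1 is 1/10 of the way.
by_index = s.interpolate(method="index")
print(by_index.loc[1])     # 1.0
```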

For more information on their behavior, see the
`scipy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `tutorial documentation
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.

Examples
--------

Filling in NaNs
Filling in `NaN` in a :class:`~pandas.Series` via linear
Member

Double backticks

interpolation.

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0 0
1 1
2 2
3 3
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64

Filling in `NaN` in a Series by padding, but filling at most two
Member

Double ticks (here and next line)

consecutive `NaN` at a time.

>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object

Filling in `NaN` in a Series via polynomial interpolation or splines:
Member

Double backtick

Both `polynomial` and `spline` methods require that you also specify
an `order` (int).

>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=1)
0 0.0
1 2.0
2 5.0
3 8.0
dtype: float64
>>> s.interpolate(method='polynomial', order=2)
0 0.000000
1 2.000000
2 4.666667
3 8.000000
dtype: float64

Create a :class:`~pandas.DataFrame` with missing values to fill it
Member

You are explaining here what the code below is going to do, but not really saying why it's important. Would be better worded as "Interpolation can also be applied to DataFrames" or something to that effect

with different methods.

>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],
Member

Why not just construct with the missing values?

Contributor Author

Mainly so people can see the "expected" interpolation (I tried to have a pattern column-wise) and they can compare it with what actually happens, e.g. with linear interpolation (especially if the last entry is an NA)

Member

Thought I had this comment before but just use the NA values in your constructor - no reason to instantiate the DataFrame with values and then assign them missing values after the fact.

Also make sure you put a space after every comma

Contributor Author

I changed it so one can see how the columns get created - and we have linear values in 3 columns and quadratic on the 4th.
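
For reference, a sketch of the reviewer's suggestion: the same frame built with the missing values directly in the constructor, with a space after every comma. The values are taken from the docstring example in this diff; the variable name `filled` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the docstring example, with NaN placed in the constructor
# instead of being assigned afterwards via df.loc[...].
df = pd.DataFrame(
    [
        [0.0, np.nan, 2, 0.0, 4.0],
        [np.nan, 2.0, 3, np.nan, np.nan],
        [2.0, 3.0, 4, np.nan, 12.0],
        [np.nan, 4.0, 5, -3.0, 16.0],
    ],
    columns=["a", "b", "c", "d", "e"],
)

# Fill forward (down each column) with linear interpolation.
filled = df.interpolate(method="linear", limit_direction="forward", axis=0)
print(filled)
```

The trade-off raised in the thread still applies: this form is shorter, but it no longer shows the underlying pattern in each column before the values are removed.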

... [2,3,4,-2,12],[3,4,5,-3,16]],
... columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 0 1 2 0 4
1 1 2 3 -1 8
2 2 3 4 -2 12
3 3 4 5 -3 16
>>> df.loc[1,'a'] = np.nan
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df.loc[2,'d'] = np.nan
>>> df.loc[1,'e'] = np.nan
>>> df
a b c d e
0 0.0 NaN 2 0.0 4.0
1 NaN 2.0 3 NaN NaN
2 2.0 3.0 4 NaN 12.0
3 NaN 4.0 5 -3.0 16.0

Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently
(because there is no entry after it to use for interpolation).
Member

Don't need the parentheses here (nor on the next line)

Note how the first entry in column `b` remains `NaN` (because there
is no entry before it to use for interpolation).

>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d e
0 0.0 NaN 2 0.0 4.0
1 1.0 2.0 3 -1.0 8.0
2 2.0 3.0 4 -2.0 12.0
3 2.0 4.0 5 -3.0 16.0
"""

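The interaction between `limit` and `limit_direction` described in the parameter section of this docstring can be sketched as follows. This is an illustration with made-up data, not part of the PR; it assumes the default linear method.

```python
import numpy as np
import pandas as pd

# A gap of three consecutive NaNs between two valid values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# limit=1 fills at most one consecutive NaN, counted in the given direction.
forward = s.interpolate(limit=1)  # default limit_direction='forward'
backward = s.interpolate(limit=1, limit_direction="backward")
both = s.interpolate(limit=1, limit_direction="both")

print(forward.tolist())   # [1.0, 2.0, nan, nan, 5.0]
print(backward.tolist())  # [1.0, nan, nan, 4.0, 5.0]
print(both.tolist())      # [1.0, 2.0, nan, 4.0, 5.0]
```

The filled values are still the linearly interpolated ones (2.0 and 4.0); `limit` only controls how many of them are written back.
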
@Appender(_shared_docs['interpolate'] % _shared_doc_kwargs)