Skip to content

DOC: improved the scatter method #20118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Mar 14, 2018

Conversation

DaniGate
Copy link
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:



################################################################################
################## Docstring (pandas.DataFrame.plot.scatter)  ##################
################################################################################

A scatter plot with point size *s* and color *c*.

The coordinates of each point *x,y* are defined by two dataframe
columns and filled circles are used to represent each point.

Parameters
----------
x : column name or column position
    Horizontal and vertical coordinates of each point.
y : column name or column position
    Vertical coordinates of each point.
s : scalar or array_like, optional
    Size of each point.
c : label, column name or column position, optional
    Color of each point.
kwds : optional
    Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them

See Also
--------
matplotlib.pyplot.scatter : scatter plot using multiple input data
    formats.

Examples
--------

.. plot::
    :context: close-figs

    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> df = pd.DataFrame(iris.data[:,:2],
    ...                   columns=iris.feature_names[:2])
    >>> df['species'] = load_iris().target
    >>> f = df.plot.scatter(x='sepal length (cm)',
    ...                     y='sepal width (cm)',
    ...                     c='species',
    ...                     colormap='viridis')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.plot.scatter" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

@@ -2852,22 +2852,48 @@ def pie(self, y=None, **kwds):

def scatter(self, x, y, s=None, c=None, **kwds):
"""
Scatter plot
A scatter plot with point size *s* and color *c*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the first line is "A scatter plot of two columns in the DataFrame."

Scatter plot
A scatter plot with point size *s* and color *c*.

The coordinates of each point *x,y* are defined by two dataframe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter names should be in singel backticks.


Parameters
----------
x, y : label or position, optional
Coordinates for each point.
x : column name or column position
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think label or position is OK. You can remove optional, as both x and y are required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the description of the argument c specifies column name or column position. My intention was to make it consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some confusion about using "label or position" as type. In other PR @datapythonista suggests using real types like "int or str" as the argument type, and mention it's the column name or position in the description of the parameter on next line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, my mistake, I realized later that label could be any type, I guess that's why label or position is being used.

.. plot::
:context: close-figs

>>> from sklearn.datasets import load_iris
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't import sklearn here, since it isn't installed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just make a small sample dataset instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passed all checks and test and the plot was shown in the docs. But ok, I will write down the dataframe explicitely.

@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds):

def scatter(self, x, y, s=None, c=None, **kwds):
"""
Scatter plot
A scatter plot with point size `s` and color `c`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the rest of the docs from the chapter I'd start this with an infinitive verb like "Create a scatter plot", "Generate a scatter plot", "Make a scatter plot"...

@dukebody
Copy link
Contributor

FTR this is pandas.DataFrame.plot.scatter.

Copy link
Contributor

@dukebody dukebody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the type of the parameters and don't mention them in the description and extended description.

@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds):

def scatter(self, x, y, s=None, c=None, **kwds):
"""
Scatter plot
A scatter plot with point size `s` and color `c`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More: I don't think you should mention the meaning of the arguments in the short description of the method. This is what the Parameters section is for.


Parameters
----------
x, y : label or position, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scatter plot
A scatter plot with point size `s` and color `c`.

The coordinates of each point `x,y` are defined by two dataframe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: don't talk about the arguments, talk about what the function does, use cases, etc.

s : scalar or array_like, optional
Size of each point.
c : label or position, optional
c : label, column name or column position, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this column contain anything? How does it work? How does the function choose the color of the dots, just any unique color?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think valid options are matplotlib's + a column label, which extracts the array and passes it to matplotlib.

from plt.scatter:

c : color, sequence, or sequence of color, optional, default: 'b'
    `c` can be a single color format string, or a sequence of color
    specifications of length `N`, or a sequence of `N` numbers to be
    mapped to colors using the `cmap` and `norm` specified via kwargs
    (see below). Note that `c` should not be a single numeric RGB or
    RGBA sequence because that is indistinguishable from an array of
    values to be colormapped.  `c` can be a 2-D array in which the
    rows are RGB or RGBA, however, including the case of a single
    row to specify the same color for all points.

You'll to verify that. Check if / when a colorbar is added.

Color of each point.
`**kwds` : optional
kwds : optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use **kwds instaed of kwds, even if the validation script complains. Sorry, this have been one the biggest sources of confusion in this sprint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove the : optional. Just **kwds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I believe we have used **kwds : optionals in other similar docstrings. Probably we should come up with an agreement on how to document the **kwds parameter in these cases.

Examples
--------

.. plot::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some text above the code explaining what are you doing here?

>>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
... [6.4, 3.2, 1], [5.9, 3.0, 2]],
... columns = ['length', 'width', 'species'])
>>> f = df.plot.scatter(x='length',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If what is returned is an axis, better to use ax = ... here.

>>> f = df.plot.scatter(x='length',
... y='width',
... c='species',
... colormap='viridis')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add at least one more example with different parameters to see the differences?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if we recommend cmap or colormap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger They are both accepted but the one described in pandas.DataFrame.plot is colormap.

…values, extended description with use cases, extended examples with extra case.
@pep8speaks
Copy link

pep8speaks commented Mar 13, 2018

Hello @DaniGate! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 14, 2018 at 12:57 Hours UTC

@codecov
Copy link

codecov bot commented Mar 13, 2018

Codecov Report

Merging #20118 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #20118   +/-   ##
=======================================
  Coverage   91.76%   91.76%           
=======================================
  Files         150      150           
  Lines       49151    49151           
=======================================
  Hits        45102    45102           
  Misses       4049     4049
Flag Coverage Δ
#multiple 90.14% <ø> (ø) ⬆️
#single 41.9% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_core.py 82.27% <ø> (ø) ⬆️
pandas/core/indexes/datetimelike.py 96.72% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.64% <0%> (ø) ⬆️
pandas/core/generic.py 95.85% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21ae073...15b6073. Read the comment docs.


The coordinates of each point `x,y` are defined by two dataframe
columns and filled circles are used to represent each point.
The coordinates of each point are defined by two dataframe columns and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Muuuuch better!!!!!!

c : label, column name or column position, optional
Color of each point.
kwds : optional
x : int, str
Copy link
Contributor

@dukebody dukebody Mar 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be int or str instead, according to the guide: "If more than one type is accepted, separate them by commas, except the last two types, that need to be separated by the word ‘or’"

- A single scalar so all points have the same size.

- A sequence of scalars, which will be used for each point's size
recursively. For intance [2,14] all points will be size 2 or 14,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"For instance, using [2, 14] all points will be of size ...". Typo in "instance", the rest is a proposal to make it more prose-English.

c : str, int, array_like, optional
The color of each point. Possible values are:

- A single color string referred to by name, RGB or RGBA code,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified if these bullet points render nicely in the final HTML? I'm not good at restructured text so I tend to be wary about these things. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the blank lines to properly render the bullet points ;)
screen shot 2018-03-14 at 09 59 01

@dukebody
Copy link
Contributor

This is almost done I believe, just a couple of style fixes and IMO it will be good to merge. :)

@@ -2881,17 +2906,22 @@ def scatter(self, x, y, s=None, c=None, **kwds):

Examples
--------
Let's see how to draw a scatter plot using coordinates and color from
the values in three DataFrame columns.

.. plot::
:context: close-figs

>>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
... [6.4, 3.2, 1], [5.9, 3.0, 2]],
... columns = ['length', 'width', 'species'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove spaces between the equal sign? columns=['length...

@TomAugspurger
Copy link
Contributor

Thanks @DaniGate and @dukebody .

Fixed the line length issue.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 14, 2018

OK, also split the examples in two to be a little clearer. Merging, thanks!

@TomAugspurger TomAugspurger merged commit c0d93f9 into pandas-dev:master Mar 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants