DOC: improved the scatter method #20118

Merged
merged 14 commits into from
Mar 14, 2018
Changes from 4 commits
34 changes: 29 additions & 5 deletions pandas/plotting/_core.py
@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds):

def scatter(self, x, y, s=None, c=None, **kwds):
"""
Scatter plot
A scatter plot with point size `s` and color `c`.
Contributor

To be consistent with the rest of the docs from the chapter I'd start this with an infinitive verb like "Create a scatter plot", "Generate a scatter plot", "Make a scatter plot"...

Contributor

More: I don't think you should mention the meaning of the arguments in the short description of the method. This is what the Parameters section is for.


The coordinates of each point `x,y` are defined by two dataframe
Contributor

Same here: don't talk about the arguments, talk about what the function does, use cases, etc.

columns and filled circles are used to represent each point.

Parameters
----------
x, y : label or position, optional
Coordinates for each point.
x : column name or column position
Contributor

I think label or position is OK. You can remove optional, as both x and y are required.

Contributor Author

But the description of the argument c specifies column name or column position. My intention was to make it consistent

Contributor

There is some confusion about using "label or position" as the type. In another PR, @datapythonista suggested using real types like "int or str" as the argument type, and mentioning that it is the column name or position in the parameter description on the next line.

Member

yes, my mistake, I realized later that label could be any type, I guess that's why label or position is being used.
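
For reference, here is a minimal sketch of what "label or position" means for `x` and `y`. It assumes a headless matplotlib backend and a small made-up frame, and it relies on integer positions being resolved to column labels only when the column labels are not themselves integers, which is worth double-checking against the pandas version at hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1]],
                  columns=['length', 'width', 'species'])

# Select the coordinate columns by label ...
ax_by_label = df.plot.scatter(x='length', y='width')

# ... or by integer position (0 -> 'length', 1 -> 'width').
ax_by_position = df.plot.scatter(x=0, y=1)
```

Both calls should plot the same two columns, so the axis labels on the two returned axes should match.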

Horizontal coordinates of each point.
y : column name or column position
Vertical coordinates of each point.
s : scalar or array_like, optional
Size of each point.
c : label or position, optional
c : label, column name or column position, optional
Contributor

Can this column contain anything? How does it work? How does the function choose the color of the dots, just any unique color?

Contributor

I think valid options are matplotlib's + a column label, which extracts the array and passes it to matplotlib.

from plt.scatter:

c : color, sequence, or sequence of color, optional, default: 'b'
    `c` can be a single color format string, or a sequence of color
    specifications of length `N`, or a sequence of `N` numbers to be
    mapped to colors using the `cmap` and `norm` specified via kwargs
    (see below). Note that `c` should not be a single numeric RGB or
    RGBA sequence because that is indistinguishable from an array of
    values to be colormapped.  `c` can be a 2-D array in which the
    rows are RGB or RGBA, however, including the case of a single
    row to specify the same color for all points.

You'll need to verify that. Check if / when a colorbar is added.
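
One way to check the colorbar question is a quick experiment like the sketch below. It assumes a headless matplotlib backend and a small made-up frame, and it counts the axes on each figure as a rough proxy for "a colorbar was added"; the exact rule may differ between pandas versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])

# `c` as a single matplotlib color string: every point gets the same color.
ax_single = df.plot.scatter(x='length', y='width', c='red')

# `c` as a column label: the numeric column is extracted and mapped to
# colors through the colormap.
ax_mapped = df.plot.scatter(x='length', y='width', c='species',
                            colormap='viridis')

# A colorbar lives on its own axes, so count the axes on each figure.
n_axes_single = len(ax_single.figure.axes)
n_axes_mapped = len(ax_mapped.figure.axes)
```

If the column-label case reports one more axes than the single-color case, the extra one is the colorbar.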

Color of each point.
`**kwds` : optional
kwds : optional
Contributor

Use **kwds instead of kwds, even if the validation script complains. Sorry, this has been one of the biggest sources of confusion in this sprint.

Contributor

And remove the : optional. Just **kwds.

Contributor

@TomAugspurger I believe we have used **kwds : optional in other similar docstrings. We should probably agree on how to document the **kwds parameter in these cases.

Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them

See Also
--------
matplotlib.pyplot.scatter : scatter plot using multiple input data
formats.

Examples
--------

.. plot::
Contributor

Can you add some text above the code explaining what you are doing here?

:context: close-figs

>>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
... [6.4, 3.2, 1], [5.9, 3.0, 2]],
... columns = ['length', 'width', 'species'])
Contributor

Can you remove the spaces around the equals sign? columns=['length...

>>> f = df.plot.scatter(x='length',
Contributor

If what is returned is an axis, better to use ax = ... here.

... y='width',
... c='species',
... colormap='viridis')
Contributor

Can you add at least one more example with different parameters to see the differences?

Contributor

Can you check if we recommend cmap or colormap?

Contributor Author

@TomAugspurger They are both accepted but the one described in pandas.DataFrame.plot is colormap.
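
A possible second example along these lines, offered only as a sketch: it varies the point size through `s` and passes the colormap under both spellings. The frame, the size scaling factor of 40, and the headless backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])

# Array-like `s`: one marker size per row, scaled up so the difference
# between points is visible.
sizes = (df['width'] * 40).values

# Same plot, once with the `colormap` keyword and once with `cmap`.
ax_colormap = df.plot.scatter(x='length', y='width', c='species',
                              s=sizes, colormap='viridis')
ax_cmap = df.plot.scatter(x='length', y='width', c='species',
                          s=sizes, cmap='viridis')

# The scatter points are the first collection on each axes.
name_colormap = ax_colormap.collections[0].get_cmap().name
name_cmap = ax_cmap.collections[0].get_cmap().name
drawn_sizes = ax_colormap.collections[0].get_sizes()
```

Both spellings should end up with the same colormap on the plotted points, so the choice for the docstring comes down to consistency with `pandas.DataFrame.plot`.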

"""
return self(kind='scatter', x=x, y=y, c=c, s=s, **kwds)
