DOC: improved the scatter method #20118

DaniGate · 2018-03-10T12:47:44Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:



################################################################################
################## Docstring (pandas.DataFrame.plot.scatter)  ##################
################################################################################

A scatter plot with point size *s* and color *c*.

The coordinates of each point *x,y* are defined by two dataframe
columns and filled circles are used to represent each point.

Parameters
----------
x : column name or column position
    Horizontal coordinates of each point.
y : column name or column position
    Vertical coordinates of each point.
s : scalar or array_like, optional
    Size of each point.
c : label, column name or column position, optional
    Color of each point.
kwds : optional
    Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them

See Also
--------
matplotlib.pyplot.scatter : scatter plot using multiple input data
    formats.

Examples
--------

.. plot::
    :context: close-figs

    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> df = pd.DataFrame(iris.data[:,:2],
    ...                   columns=iris.feature_names[:2])
    >>> df['species'] = load_iris().target
    >>> f = df.plot.scatter(x='sepal length (cm)',
    ...                     y='sepal width (cm)',
    ...                     c='species',
    ...                     colormap='viridis')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.plot.scatter" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

TomAugspurger · 2018-03-10T13:02:10Z

pandas/plotting/_core.py

@@ -2852,22 +2852,48 @@ def pie(self, y=None, **kwds):

    def scatter(self, x, y, s=None, c=None, **kwds):
        """
-        Scatter plot
+        A scatter plot with point size *s* and color *c*.


How about the first line is "A scatter plot of two columns in the DataFrame."

TomAugspurger · 2018-03-10T13:02:19Z

pandas/plotting/_core.py

-        Scatter plot
+        A scatter plot with point size *s* and color *c*.
+
+        The coordinates of each point *x,y* are defined by two dataframe


Parameter names should be in single backticks.

TomAugspurger · 2018-03-10T13:03:17Z

pandas/plotting/_core.py


        Parameters
        ----------
-        x, y : label or position, optional
-            Coordinates for each point.
+        x : column name or column position


I think label or position is OK. You can remove optional, as both x and y are required.

But the description of the argument c specifies column name or column position. My intention was to make it consistent

There is some confusion about using "label or position" as the type. In another PR @datapythonista suggests using real types like "int or str" as the argument type, and mentioning that it is the column name or position in the parameter description on the next line.

Yes, my mistake. I realized later that a label could be any type; I guess that's why "label or position" is being used.

TomAugspurger · 2018-03-10T13:06:45Z

pandas/plotting/_core.py

+        .. plot::
+            :context: close-figs
+
+            >>> from sklearn.datasets import load_iris


We can't import sklearn here, since it isn't installed.

Maybe just make a small sample dataset instead.

It passed all checks and tests, and the plot was shown in the docs. But OK, I will write out the dataframe explicitly.
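For reference, a minimal sketch of such a sample dataset with no third-party import beyond pandas. The values and column names are the ones that appear in the later revision of this PR; treating `species` as an integer class code is an assumption made for illustration.

```python
import pandas as pd

# Small hand-written dataset replacing sklearn's load_iris().
# Columns: two measurements plus an integer class code (assumed meaning).
df = pd.DataFrame(
    [[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
     [6.4, 3.2, 1], [5.9, 3.0, 2]],
    columns=["length", "width", "species"],
)

print(df)
```

Writing the rows inline keeps the docstring example runnable in any environment where pandas itself is installed.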

dukebody · 2018-03-10T14:37:22Z

pandas/plotting/_core.py

@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds):

    def scatter(self, x, y, s=None, c=None, **kwds):
        """
-        Scatter plot
+        A scatter plot with point size `s` and color `c`.


To be consistent with the rest of the docs from the chapter I'd start this with an infinitive verb like "Create a scatter plot", "Generate a scatter plot", "Make a scatter plot"...

dukebody · 2018-03-10T14:46:50Z

FTR this is pandas.DataFrame.plot.scatter.

dukebody

Please add the type of the parameters and don't mention them in the description and extended description.

dukebody · 2018-03-10T14:38:07Z

pandas/plotting/_core.py

@@ -2852,22 +2852,46 @@ def pie(self, y=None, **kwds):

    def scatter(self, x, y, s=None, c=None, **kwds):
        """
-        Scatter plot
+        A scatter plot with point size `s` and color `c`.


More: I don't think you should mention the meaning of the arguments in the short description of the method. This is what the Parameters section is for.

dukebody · 2018-03-10T14:46:09Z

pandas/plotting/_core.py


        Parameters
        ----------
-        x, y : label or position, optional


You are missing the types for all parameters. See https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-3-parameters.

dukebody · 2018-03-10T14:47:36Z

pandas/plotting/_core.py

-        Scatter plot
+        A scatter plot with point size `s` and color `c`.
+
+        The coordinates of each point `x,y` are defined by two dataframe


Same here: don't talk about the arguments, talk about what the function does, use cases, etc.

dukebody · 2018-03-10T14:49:35Z

pandas/plotting/_core.py

        s : scalar or array_like, optional
            Size of each point.
-        c : label or position, optional
+        c : label, column name or column position, optional


Can this column contain anything? How does it work? How does the function choose the color of the dots, just any unique color?

I think valid options are matplotlib's + a column label, which extracts the array and passes it to matplotlib.

from plt.scatter:

c : color, sequence, or sequence of color, optional, default: 'b'
    `c` can be a single color format string, or a sequence of color
    specifications of length `N`, or a sequence of `N` numbers to be
    mapped to colors using the `cmap` and `norm` specified via kwargs
    (see below). Note that `c` should not be a single numeric RGB or
    RGBA sequence because that is indistinguishable from an array of
    values to be colormapped. `c` can be a 2-D array in which the rows
    are RGB or RGBA, however, including the case of a single row to
    specify the same color for all points.

You'll need to verify that. Check if / when a colorbar is added.
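One way to check this is to count the axes on the resulting figure, since a colorbar is drawn on an axes of its own. The sketch below is an assumption-laden probe, not a statement of the documented contract: it assumes a headless `Agg` backend, reuses the toy column names from this PR, and relies on the behaviour of the pandas versions I am aware of, where a numeric column passed as `c` gets a colorbar and a plain color string does not.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(
    [[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
     [6.4, 3.2, 1], [5.9, 3.0, 2]],
    columns=["length", "width", "species"],
)

# c names a column: its values are mapped to colors through the colormap.
ax_mapped = df.plot.scatter(x="length", y="width",
                            c="species", colormap="viridis")

# c is a single color string: every point gets the same color.
ax_single = df.plot.scatter(x="length", y="width", c="DarkBlue")

# A colorbar lives on an extra axes attached to the same figure.
n_axes_mapped = len(ax_mapped.figure.axes)
n_axes_single = len(ax_single.figure.axes)
print(n_axes_mapped, n_axes_single)
```

If the two counts differ, the extra axes in the first figure is the colorbar.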

dukebody · 2018-03-10T14:50:56Z

pandas/plotting/_core.py

            Color of each point.
-        `**kwds` : optional
+        kwds : optional


Use **kwds instead of kwds, even if the validation script complains. Sorry, this has been one of the biggest sources of confusion in this sprint.

And remove the : optional. Just **kwds.

@TomAugspurger I believe we have used "**kwds : optional" in other similar docstrings. We should probably agree on how to document the **kwds parameter in these cases.

dukebody · 2018-03-10T15:05:27Z

pandas/plotting/_core.py

+        Examples
+        --------
+
+        .. plot::


Can you add some text above the code explaining what you are doing here?

dukebody · 2018-03-10T15:05:48Z

pandas/plotting/_core.py

+            >>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
+            ...                    [6.4, 3.2, 1], [5.9, 3.0, 2]],
+            ...                   columns = ['length', 'width', 'species'])
+            >>> f = df.plot.scatter(x='length',


If what is returned is an axis, better to use ax = ... here.
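A quick sketch of why the `ax` name fits: the object returned for a single plot is a matplotlib Axes, so it can be customised further with the usual Axes methods. The `Agg` backend and the title text are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
from matplotlib.axes import Axes
import pandas as pd

df = pd.DataFrame(
    [[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
     [6.4, 3.2, 1], [5.9, 3.0, 2]],
    columns=["length", "width", "species"],
)

# The return value is an Axes, not a Figure, hence `ax` rather than `f`.
ax = df.plot.scatter(x="length", y="width")

# Because it is an Axes, the standard matplotlib API applies.
ax.set_title("Toy measurements")
print(type(ax).__name__, isinstance(ax, Axes))
```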

dukebody · 2018-03-10T15:07:53Z

pandas/plotting/_core.py

+            >>> f = df.plot.scatter(x='length',
+            ...                     y='width',
+            ...                     c='species',
+            ...                     colormap='viridis')


Can you add at least one more example with different parameters to see the differences?

Can you check if we recommend cmap or colormap?

@TomAugspurger They are both accepted but the one described in pandas.DataFrame.plot is colormap.
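A small sketch to confirm that both spellings reach matplotlib with the same effect, assuming a headless `Agg` backend and the toy data from this PR. It inspects the colormap attached to the scatter collection rather than comparing images.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(
    [[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
     [6.4, 3.2, 1], [5.9, 3.0, 2]],
    columns=["length", "width", "species"],
)

# Spelling documented in pandas.DataFrame.plot.
ax_colormap = df.plot.scatter(x="length", y="width",
                              c="species", colormap="viridis")

# matplotlib-style spelling, also accepted.
ax_cmap = df.plot.scatter(x="length", y="width",
                          c="species", cmap="viridis")

# The scatter points are the first collection on each Axes.
name_colormap = ax_colormap.collections[0].get_cmap().name
name_cmap = ax_cmap.collections[0].get_cmap().name
print(name_colormap, name_cmap)
```

Using `colormap` in the docstring example keeps it aligned with the name described in `pandas.DataFrame.plot`.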

pep8speaks · 2018-03-13T22:49:59Z

Hello @DaniGate! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 14, 2018 at 12:57 Hours UTC

codecov · 2018-03-13T22:50:18Z

Codecov Report

Merging #20118 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20118   +/-   ##
=======================================
  Coverage   91.76%   91.76%           
=======================================
  Files         150      150           
  Lines       49151    49151           
=======================================
  Hits        45102    45102           
  Misses       4049     4049

Flag	Coverage Δ
#multiple	`90.14% <ø> (ø)`	⬆️
#single	`41.9% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/plotting/_core.py	`82.27% <ø> (ø)`	⬆️
pandas/core/indexes/datetimelike.py	`96.72% <0%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`95.64% <0%> (ø)`	⬆️
pandas/core/generic.py	`95.85% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21ae073...15b6073. Read the comment docs.

dukebody · 2018-03-14T07:20:24Z

pandas/plotting/_core.py


-        The coordinates of each point `x,y` are defined by two dataframe
-        columns and filled circles are used to represent each point.
+        The coordinates of each point are defined by two dataframe columns and


Muuuuch better!!!!!!

dukebody · 2018-03-14T07:21:01Z

pandas/plotting/_core.py

-        c : label, column name or column position, optional
-            Color of each point.
-        kwds : optional
+        x : int, str


I think this should be int or str instead, according to the guide: "If more than one type is accepted, separate them by commas, except the last two types, that need to be separated by the word ‘or’"

dukebody · 2018-03-14T07:23:39Z

pandas/plotting/_core.py

+            - A single scalar so all points have the same size.
+
+            - A sequence of scalars, which will be used for each point's size
+            recursively. For intance [2,14] all points will be size 2 or 14,


"For instance, using [2, 14] all points will be of size ...". Typo in "instance", the rest is a proposal to make it more prose-English.

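To make the two forms of `s` concrete, here is a hedged sketch that passes a scalar and then one size per row, and reads the sizes back from the scatter collection. It deliberately avoids a sequence shorter than the data: as far as I know, some matplotlib versions reject a size sequence whose length differs from the number of points, so the cycling behaviour described in the docstring may not hold everywhere. The `Agg` backend and the scaling factor are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(
    [[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
     [6.4, 3.2, 1], [5.9, 3.0, 2]],
    columns=["length", "width", "species"],
)

# A single scalar: all points share the same size.
ax_uniform = df.plot.scatter(x="length", y="width", s=50)

# One size per row (arbitrary scaling of the width column).
sizes = (df["width"] * 20).tolist()
ax_varied = df.plot.scatter(x="length", y="width", s=sizes)

# Read the sizes back from the scatter collection on each Axes.
uniform_sizes = ax_uniform.collections[0].get_sizes()
varied_sizes = ax_varied.collections[0].get_sizes()
print(uniform_sizes, varied_sizes)
```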
dukebody · 2018-03-14T07:24:59Z

pandas/plotting/_core.py

+        c : str, int, array_like, optional
+            The color of each point. Possible values are:
+
+            - A single color string referred to by name, RGB or RGBA code,


Have you verified if these bullet points render nicely in the final HTML? I'm not good at restructured text so I tend to be wary about these things. :)

You need the blank lines to properly render the bullet points ;)
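As an illustration of the layout being discussed, here is a hypothetical stub (not the real pandas method) whose docstring puts a blank line before the list and between items, and indents continuation lines to align with the item text. The first bullet is taken from the diff above; the second is an illustrative paraphrase of the column-label case mentioned earlier in this review.

```python
def scatter_doc_sketch(c=None):
    """
    Hypothetical stub showing a bullet list inside a numpydoc section.

    Parameters
    ----------
    c : str, int or array_like, optional
        The color of each point. Possible values are:

        - A single color string referred to by name, RGB or RGBA code.

        - A column name or position whose values are passed on to
          matplotlib to color the points (illustrative wording).
    """
    return None


# Print the docstring to eyeball the indentation and blank lines.
print(scatter_doc_sketch.__doc__)
```

Building the HTML with `python doc/make.py --single <your-function-or-method>`, as listed in the checklist, remains the way to confirm the final rendering.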

dukebody · 2018-03-14T07:42:41Z

This is almost done I believe, just a couple of style fixes and IMO it will be good to merge. :)

dukebody · 2018-03-14T07:44:01Z

pandas/plotting/_core.py

@@ -2881,17 +2906,22 @@ def scatter(self, x, y, s=None, c=None, **kwds):

        Examples
        --------
+        Let's see how to draw a scatter plot using coordinates and color from
+        the values in three DataFrame columns.

        .. plot::
            :context: close-figs

            >>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
            ...                    [6.4, 3.2, 1], [5.9, 3.0, 2]],
            ...                   columns = ['length', 'width', 'species'])


Can you remove spaces between the equal sign? columns=['length...

TomAugspurger · 2018-03-14T12:54:30Z

Thanks @DaniGate and @dukebody .

Fixed the line length issue.

TomAugspurger · 2018-03-14T12:57:32Z

OK, also split the examples in two to be a little clearer. Merging, thanks!

DaniGate added 2 commits March 10, 2018 13:41

DOC: improved the scatter method

ab4757f

Typo in docstring variable description

9b17c14

TomAugspurger reviewed Mar 10, 2018

DaniGate added 2 commits March 10, 2018 14:56

Style changes

f71d139

Removing import of external packages in the example, explicitly writi…

c24a5c6

…ng a toy dataset instead

dukebody reviewed Mar 10, 2018

jreback added Docs Visualization plotting labels Mar 10, 2018

dukebody suggested changes Mar 10, 2018

Type of arguments added, more detailed explanation of their possible …

fcb8d77

…values, extended description with use cases, extended examples with extra case.

DaniGate and others added 3 commits March 13, 2018 23:55

Type of arguments added, more detailed explanation of their possible …

08b5f5a

…values, extended description with use cases, extended examples with extra case.

Merge branch 'doc-string_scatter' of https://github.com/DaniGate/pandas…

fca7db2

… into doc-string_scatter

Merge branch 'master' into doc-string_scatter

051bd76

dukebody reviewed Mar 14, 2018

DaniGate and others added 4 commits March 14, 2018 10:03

Some typo and style corrections

a4aede7

Some typo and style corrections

4fcb6a8

Merge branch 'doc-string_scatter' of https://github.com/DaniGate/pandas…

c70c1e2

… into doc-string_scatter

Line length [ci skip]

ded0ea4

[ci skip]

TomAugspurger added 2 commits March 14, 2018 07:54

kwds format [ci skip]

b2abf5a

[ci skip]

Split examples [ci skip]

15b6073

[ci skip]

TomAugspurger merged commit c0d93f9 into pandas-dev:master Mar 14, 2018
