DOC: Improve the docstring of Str.contains() #20870

Blair-Young · 2018-04-29T19:05:31Z

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "flags" description should start with capital letter

Achieved through PyData 2018 pandas sprint.

Blair-Young · 2018-04-29T19:14:25Z

Unsure of the best way to phrase the flags parameter description, since it starts with the name of the re module.

codecov · 2018-04-29T20:13:15Z

Codecov Report

Merging #20870 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20870      +/-   ##
==========================================
+ Coverage   91.79%    91.8%   +<.01%     
==========================================
  Files         153      153              
  Lines       49411    49411              
==========================================
+ Hits        45359    45361       +2     
+ Misses       4052     4050       -2

Flag	Coverage Δ
#multiple	`90.19% <ø> (ø)`	⬆️
#single	`41.91% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/strings.py	`98.34% <ø> (ø)`	⬆️
pandas/util/testing.py	`84.59% <0%> (+0.2%)`	⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c4da79b...99eb015.

WillAyd

Thanks for submitting this

WillAyd · 2018-04-30T02:37:51Z

pandas/core/strings.py

@@ -295,20 +295,24 @@ def str_count(arr, pat, flags=0):

 def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):
    """
-    Return boolean Series/``array`` whether given pattern/regex is
-    contained in each string in the Series/Index.
+    Test if pattern or regex is contained within each string of a Series.


This method is used by Index as well which was removed here - please be sure to add that back

WillAyd · 2018-04-30T02:38:26Z

pandas/core/strings.py

-    contained in each string in the Series/Index.
+    Test if pattern or regex is contained within each string of a Series.
+
+    Return boolean Series based on whether a given pattern or regex is


Related to comment above, this isn't always true

>>> index = pd.Index(['foo', 'bar', 'baz'])
>>> index.str.contains('a')
array([False,  True,  True])
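The return-type difference raised in this comment can be checked side by side. This is a minimal sketch, assuming a pandas installation where `Index.str.contains` hands back a plain boolean array; the variable names are illustrative and not part of the PR.

```python
import pandas as pd

values = ['foo', 'bar', 'baz']

# Same data, two callers: a Series and an Index.
series_result = pd.Series(values).str.contains('a')
index_result = pd.Index(values).str.contains('a')

# The Series caller gets a Series back; the Index caller does not.
print(type(series_result).__name__)
print(type(index_result).__name__)
print(list(index_result))
```

A docstring summary that only promises a "boolean Series" would therefore be inaccurate for the Index case, which is the point of the review comment.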

WillAyd · 2018-04-30T02:43:26Z

pandas/core/strings.py

    regex : bool, default True
-        If True use re.search, otherwise use Python in operator
+        If True, assumes the passed-in pattern is a regular expression.\n
+        If False, treats the pattern as a literal string.

    Returns


Maybe use the Returns section from the below PR as reference

https://github.com/pandas-dev/pandas/pull/20491/files

WillAyd · 2018-04-30T02:43:43Z

pandas/core/strings.py

    regex : bool, default True
-        If True use re.search, otherwise use Python in operator
+        If True, assumes the passed-in pattern is a regular expression.\n


Rogue newline?
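The original wording of the `regex` parameter ("If True use re.search, otherwise use Python in operator") can be illustrated with the standard library alone. The helper below is a hypothetical stand-in written for this note, not pandas code; it only mirrors the two branches the old docstring describes.

```python
import re


def contains(text, pat, regex=True):
    """Hypothetical per-string analogue of the two modes described above."""
    if regex:
        # Regex mode: the pattern is interpreted by re.search.
        return re.search(pat, text) is not None
    # Literal mode: plain substring test with the `in` operator.
    return pat in text


# '.' is a wildcard in regex mode, so 'h.use' matches 'house'.
regex_hit = contains('house', 'h.use', regex=True)
# In literal mode the dot must appear verbatim, so there is no match.
literal_miss = contains('house', 'h.use', regex=False)
literal_hit = contains('h.use', 'h.use', regex=False)
print(regex_hit, literal_miss, literal_hit)
```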

WillAyd · 2018-04-30T02:59:27Z

pandas/core/strings.py

@@ -318,6 +322,48 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):
    --------
    match : analogous, but stricter, relying on re.match instead of re.search

+    Examples
+    --------
+    >>> s = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.NaN])


There have been at least two issues opened on the pandas tracker recently due to a misunderstanding of how this function works - can you update the examples to account for and explain using a pattern of .0 against numbers like 40.0?

Here are the issues I am referring to:

Pandas series.str.replace('.0', '') replaces string preceding decimal point #20733

Pandas problem .str.replace #20836
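The misunderstanding behind both issues is that `.` is a regex wildcard, so the pattern `.0` matches any character followed by `0`. The sketch below shows one way the requested example could look; the sample values are chosen for illustration and are not taken from the linked issues.

```python
import pandas as pd

s = pd.Series(['40', '40.0', '41', '41.0', '35'])

# Default regex mode: '.' matches any character, so '40' also matches
# ('4' followed by '0') even though it has no decimal point.
as_regex = s.str.contains('.0', regex=True)

# Literal mode: only strings that really contain '.0' match.
as_literal = s.str.contains('.0', regex=False)

# Escaping the dot gives the literal meaning while staying in regex mode.
escaped = s.str.contains(r'\.0', regex=True)

print(as_regex.tolist())
print(as_literal.tolist())
print(escaped.tolist())
```

The same wildcard behaviour explains why `str.replace('.0', '')` in regex mode removes more than the trailing `.0`.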

WillAyd · 2018-04-30T03:00:26Z

pandas/core/strings.py

    flags : int, default 0 (no flags)
-        re module flags, e.g. re.IGNORECASE
-    na : default NaN, fill value for missing values.
+        re module flags, e.g. re.IGNORECASE.


To suppress the warning just rewrite this as "Flags to pass through to the re module, e.g. re.IGNORECASE." or something similar

WillAyd · 2018-04-30T03:09:51Z

pandas/core/strings.py

+    4      NaN
+    dtype: object
+
+    >>> s.str.contains('og', na=False, regex=True)


These examples are good but some text to direct the user on what to look at preceding each would be helpful. For example here I would prepend something like:

With the above examples, you'll note that NA values in the caller return `np.nan` by default and as a result cast the dtype of the returned object to `object`. You can control this behavior by specifying an alternate `na` argument.
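A short sketch of the `na` behavior described in this comment, reusing the Series from the docstring under review. The explicit `dtype=object` is an assumption added here so the example targets the object-dtype case the comment talks about; newer string dtypes may treat missing values differently.

```python
import numpy as np
import pandas as pd

# Assumption: object dtype is forced so the missing value is handled
# the way the review comment describes.
s = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan],
              dtype=object)

# Default: the missing entry is expected to stay missing in the result,
# which prevents the result from being a plain boolean column.
default = s.str.contains('og')

# With na=False the missing entry is filled, giving a boolean result.
filled = s.str.contains('og', na=False)

print(default.dtype, filled.dtype)
print(filled.tolist())
```

A fully boolean result is what makes the output usable as a mask, for example `s[filled]`.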

WillAyd · 2018-04-30T03:10:22Z

pandas/core/strings.py

@@ -318,6 +322,48 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):
    --------
    match : analogous, but stricter, relying on re.match instead of re.search

+    Examples


Can you add examples for the case and flags parameters?
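One possible shape for the requested `case` and `flags` examples is sketched below. The sample strings follow the docstring's existing Series; the specific patterns are illustrative choices rather than the text that was eventually merged.

```python
import re

import pandas as pd

s = pd.Series(['Mouse', 'dog', 'house and parrot', '23'])

# Default is case sensitive, so lowercase 'mouse' does not match 'Mouse'.
case_sensitive = s.str.contains('mouse')

# case=False ignores letter case.
case_insensitive = s.str.contains('mouse', case=False)

# flags are passed through to the re module; re.IGNORECASE has the
# same effect here as case=False.
with_flag = s.str.contains('MOUSE', flags=re.IGNORECASE, regex=True)

print(case_sensitive.tolist())
print(case_insensitive.tolist())
print(with_flag.tolist())
```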

Blair-Young · 2018-05-03T07:09:12Z

Hey @WillAyd, I've tried to address your corrections. What's the next step?

jreback

lgtm. @TomAugspurger @jorisvandenbossche

TomAugspurger · 2018-05-10T11:30:00Z

Thanks @Blair-Young!

Blair-Young · 2018-05-10T12:31:03Z

Thanks @TomAugspurger looking forward to more.

Blair-Young added 3 commits April 29, 2018 19:43

Update docstring- str_contains

4059626

Update docstring- str_contains

699bd0a

Update docstring- str_contains

db2f862

WillAyd requested changes Apr 30, 2018


Blair-Young added 2 commits May 1, 2018 21:15

Merge remote-tracking branch 'upstream/master' into str.contains

c16d9b7

fix str.contain documentation

99eb015

jreback added the Docs and Strings labels May 3, 2018

jreback approved these changes May 3, 2018


WillAyd approved these changes May 5, 2018


TomAugspurger added this to the 0.23.0 milestone May 10, 2018

TomAugspurger merged commit 6d5d701 into pandas-dev:master May 10, 2018

Blair-Young deleted the str.contains branch May 10, 2018 12:31

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018

DOC: Improve the docstring of Str.contains() (pandas-dev#20870)

8fe2ec0

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018

DOC: Improve the docstring of Str.contains() (pandas-dev#20870)

341b4cb
