Skip to content

DOC: Improve the docstring of Str.contains() #20870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 10, 2018

Conversation

Blair-Young
Copy link
Contributor

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameter "flags" description should start with capital letter

Achieved through PyData 2018 pandas sprint.

@Blair-Young
Copy link
Contributor Author

Unsure best way to phrase flags parameter as description starts with the re module.

@codecov
Copy link

codecov bot commented Apr 29, 2018

Codecov Report

Merging #20870 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20870      +/-   ##
==========================================
+ Coverage   91.79%    91.8%   +<.01%     
==========================================
  Files         153      153              
  Lines       49411    49411              
==========================================
+ Hits        45359    45361       +2     
+ Misses       4052     4050       -2
Flag Coverage Δ
#multiple 90.19% <ø> (ø) ⬆️
#single 41.91% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/strings.py 98.34% <ø> (ø) ⬆️
pandas/util/testing.py 84.59% <0%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4da79b...99eb015. Read the comment docs.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this

@@ -295,20 +295,24 @@ def str_count(arr, pat, flags=0):

def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):
"""
Return boolean Series/``array`` whether given pattern/regex is
contained in each string in the Series/Index.
Test if pattern or regex is contained within each string of a Series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used by Index as well which was removed here - please be sure to add that back

contained in each string in the Series/Index.
Test if pattern or regex is contained within each string of a Series.

Return boolean Series based on whether a given pattern or regex is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to comment above, this isn't always true

index = pd.Index(['foo>>> index = pd.Index(['foo', 'bar', 'baz'])
>>> index.str.contains('a')
array([False,  True,  True])

regex : bool, default True
If True use re.search, otherwise use Python in operator
If True, assumes the passed-in pattern is a regular expression.\n
If False, treats the pattern as a literal string.

Returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the Returns section from the below PR as reference

https://github.com/pandas-dev/pandas/pull/20491/files

regex : bool, default True
If True use re.search, otherwise use Python in operator
If True, assumes the passed-in pattern is a regular expression.\n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rogue newline?

@@ -318,6 +322,48 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):
--------
match : analogous, but stricter, relying on re.match instead of re.search

Examples
--------
>>> s = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.NaN])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There have been at least two issues opened on the pandas tracker recently due to a misunderstanding of how this function works - can you update the examples to account for and explain using a pattern of .0 against numbers like 40.0?

Here are the issues I am referring to:

flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
na : default NaN, fill value for missing values.
re module flags, e.g. re.IGNORECASE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To suppress the warning just rewrite this as "Flags to pass through to the re module, e.g. re.IGNORECASE." or something similar

4 NaN
dtype: object

>>> s.str.contains('og', na=False, regex=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples are good but some text to direct the user on what to look at preceding each would be helpful. For example here I would prepend something like:

With the above examples, you'll note that NA values in the caller return `np.nan` by default and as a result cast the dtype of the returned object to `object`. You can control this behavior by specifying a an alternate `na` argument

@@ -318,6 +322,48 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):
--------
match : analogous, but stricter, relying on re.match instead of re.search

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add examples for the case and flags parameters?

@Blair-Young
Copy link
Contributor Author

Hey @WillAyd I've tried to fix your correction. What's the next step?

@jreback jreback added Docs Strings String extension data type and string data labels May 3, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone May 10, 2018
@TomAugspurger TomAugspurger merged commit 6d5d701 into pandas-dev:master May 10, 2018
@TomAugspurger
Copy link
Contributor

Thanks @Blair-Young!

@Blair-Young
Copy link
Contributor Author

Thanks @TomAugspurger looking forward to more.

@Blair-Young Blair-Young deleted the str.contains branch May 10, 2018 12:31
topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018
topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants