Skip to content

REF/PERF: deduplicate kth_smallest #40559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 23, 2021
Merged

Conversation

mzeitlin11
Copy link
Member

Ensuring a contiguous input seems to give about a 5-10% performance improvement on the existing benchmark gil.ParallelKth

@@ -1305,6 +1305,8 @@ def compute(self, method: str) -> Series:
narr = len(arr)
n = min(n, narr)

# arr passed into kth_smallest must be contiguous, which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ascontiguousarray instead (or specify the ordering flag in copy)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -12,6 +12,48 @@ cdef inline Py_ssize_t swap(numeric *a, numeric *b) nogil:
return 0


cdef inline numeric kth_smallest_c(numeric* arr, Py_ssize_t k, Py_ssize_t n) nogil:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the implementation in the pxd file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cimported from groupby.pyx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that means that it needs to be declared in the pxd file, the actual implementation can be in the pyx file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh got it I see what you mean will change


Parameters
----------
arr: numeric* arr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing "arr" not needed?

space before the colon in Parameters section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cc @jreback

@@ -64,6 +64,14 @@ cdef:
float64_t NaN = <float64_t>np.NaN
int64_t NPY_NAT = get_nat()

cdef enum TiebreakEnumType:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you move this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was in the pxd but didnt need to be

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep sorry it was an unrelated change

@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Performance Memory or execution speed performance labels Mar 22, 2021
@jreback jreback added this to the 1.3 milestone Mar 22, 2021
@jreback
Copy link
Contributor

jreback commented Mar 22, 2021

@mzeitlin11 any top-level user visible perf worth mentioning in the what's new? (e.g. nlargest / nsmallest)

@mzeitlin11
Copy link
Member Author

@mzeitlin11 any top-level user visible perf worth mentioning in the what's new? (e.g. nlargest / nsmallest)

Based on the benchmark should speed up nlargest/nsmallest about 5-10%. Let me know if that's worth adding a whatsnew

@jreback jreback merged commit 86d5980 into pandas-dev:master Mar 23, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

thanks @mzeitlin11

ok this was more of a refactor then.

@mzeitlin11 mzeitlin11 deleted the cln/kth_smallest branch March 23, 2021 16:18
vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants