Speed up max_len_string_array #10024
Merged
8 commits
077d353 Network-ize a test (cpcloud)
b698772 Use explicit len dispatch to avoid overhead (cpcloud)
e6a831f Improve perf (cpcloud)
c1caf7f Use a fused type (cpcloud)
def1479 Ensure object on stata (cpcloud)
13b7474 Test that we do not accept unicode (cpcloud)
b88139d Use proper types so that we work with python3 (cpcloud)
ee2626e Better name for fused type (cpcloud)
@@ -1,6 +1,7 @@
 cimport numpy as np
 cimport cython
 import numpy as np
+import sys

 from numpy cimport *

@@ -10,6 +11,7 @@ cdef extern from "numpy/arrayobject.h":
     cdef enum NPY_TYPES:
         NPY_intp "NPY_INTP"

+
 from cpython cimport (PyDict_New, PyDict_GetItem, PyDict_SetItem,
                       PyDict_Contains, PyDict_Keys,
                       Py_INCREF, PyTuple_SET_ITEM,
@@ -18,7 +20,14 @@ from cpython cimport (PyDict_New, PyDict_GetItem, PyDict_SetItem,
                       PyBytes_Check,
                       PyTuple_SetItem,
                       PyTuple_New,
-                      PyObject_SetAttrString)
+                      PyObject_SetAttrString,
+                      PyBytes_GET_SIZE,
+                      PyUnicode_GET_SIZE)
+
+try:
+    from cpython cimport PyString_GET_SIZE
+except ImportError:
+    from cpython cimport PyUnicode_GET_SIZE as PyString_GET_SIZE

 cdef extern from "Python.h":
     Py_ssize_t PY_SSIZE_T_MAX
@@ -32,7 +41,6 @@ cdef extern from "Python.h":
         Py_ssize_t *slicelength) except -1

-

 cimport cpython

 isnan = np.isnan
@@ -896,23 +904,32 @@ def clean_index_list(list obj):

     return maybe_convert_objects(converted), 0

+
+ctypedef fused pandas_string:
Review comment: I was going to mention you could use a better name here :).
+    str
+    unicode
+    bytes
+
+
 @cython.boundscheck(False)
 @cython.wraparound(False)
-def max_len_string_array(ndarray arr):
+cpdef Py_ssize_t max_len_string_array(pandas_string[:] arr):
     """ return the maximum size of elements in a 1-dim string array """
     cdef:
-        int i, m, l
-        int length = arr.shape[0]
-        object v
+        Py_ssize_t i, m = 0, l = 0, length = arr.shape[0]
+        pandas_string v

-    m = 0
-    for i from 0 <= i < length:
+    for i in range(length):
         v = arr[i]
-        if PyString_Check(v) or PyBytes_Check(v) or PyUnicode_Check(v):
-            l = len(v)
-
-            if l > m:
-                m = l
+        if PyString_Check(v):
+            l = PyString_GET_SIZE(v)
+        elif PyBytes_Check(v):
+            l = PyBytes_GET_SIZE(v)
+        elif PyUnicode_Check(v):
+            l = PyUnicode_GET_SIZE(v)
+
+        if l > m:
+            m = l

     return m

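For readers who want to check what the function in this hunk computes without building the Cython extension, the following is a minimal pure-Python sketch of the same scan. The name `max_len_string_array_py` is hypothetical and not part of pandas. The sketch assumes Python 3, where `str` covers the `unicode` case, and it makes no claim about performance.

```python
def max_len_string_array_py(arr):
    """Return the length of the longest str/bytes element in a 1-D sequence.

    Hypothetical pure-Python reference, not the pandas implementation.
    """
    m = 0
    for v in arr:
        # Only string-like elements contribute; other objects are skipped,
        # mirroring the type checks in the Cython version.
        if isinstance(v, (str, bytes)):
            l = len(v)
            if l > m:
                m = l
    return m
```

A reference like this can serve as an oracle when testing the compiled function: both should agree on any 1-D object array of strings.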
Does this also speed up any user-facing API? pandas.lib is not really for public consumption :).