@@ -58,27 +58,6 @@ library, making their behavior more consistent with the handling of
NumPy arrays. We'll do this by cleaning up pandas' internals and
adding new methods to the extension array interface.

- ### String data type
-
- Currently, pandas stores text data in an `object` -dtype NumPy array.
- The current implementation has two primary drawbacks: First, `object`
- -dtype is not specific to strings: any Python object can be stored in an
- `object` -dtype array, not just strings. Second: this is not efficient.
- The NumPy memory model isn't especially well-suited to variable width
- text data.
-
- To solve the first issue, we propose a new extension type for string
- data. This will initially be opt-in, with users explicitly requesting
- `dtype="string"`. The array backing this string dtype may initially be
- the current implementation: an `object` -dtype NumPy array of Python
- strings.
-
- To solve the second issue (performance), we'll explore alternative
- in-memory array libraries (for example, Apache Arrow). As part of the
- work, we may need to implement certain operations expected by pandas
- users (for example the algorithm used in, `Series.str.upper`). That work
- may be done outside of pandas.
-

### Apache Arrow interoperability

[Apache Arrow](https://arrow.apache.org) is a cross-language development