WEB: remove "String data type" from "Roadmap points pending a PDEP" section. #61601

Open. Wants to merge 1 commit into base: main

21 changes: 0 additions & 21 deletions web/pandas/about/roadmap.md
@@ -58,27 +58,6 @@ library, making their behavior more consistent with the handling of
NumPy arrays. We'll do this by cleaning up pandas' internals and
adding new methods to the extension array interface.

### String data type

Currently, pandas stores text data in an `object` -dtype NumPy array.
The current implementation has two primary drawbacks: First, `object`
-dtype is not specific to strings: any Python object can be stored in an
`object` -dtype array, not just strings. Second: this is not efficient.
The NumPy memory model isn't especially well-suited to variable width
text data.

To solve the first issue, we propose a new extension type for string
data. This will initially be opt-in, with users explicitly requesting
`dtype="string"`. The array backing this string dtype may initially be
the current implementation: an `object` -dtype NumPy array of Python
strings.

To solve the second issue (performance), we'll explore alternative
in-memory array libraries (for example, Apache Arrow). As part of the
work, we may need to implement certain operations expected by pandas
users (for example the algorithm used in, `Series.str.upper`). That work
may be done outside of pandas.

### Apache Arrow interoperability

[Apache Arrow](https://arrow.apache.org) is a cross-language development