Roadmap: remove ArrayManager #57554

Merged 1 commit on Feb 21, 2024

27 changes: 0 additions & 27 deletions web/pandas/about/roadmap.md
@@ -90,33 +90,6 @@ data types within pandas. This will let us take advantage of its I/O
capabilities and provide for better interoperability with other
languages and libraries using Arrow.

### Block manager rewrite

We'd like to replace pandas' current internal data structures (a
collection of 1- or 2-D arrays) with a simpler collection of 1-D arrays.

The pandas internal data model is quite complex. A DataFrame is made up of
one or more 2-dimensional "blocks", with one or more blocks per dtype.
This collection of 2-D arrays is managed by the BlockManager.
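As a rough illustration of the block layout described above, here is a toy model in Python with NumPy. It is a hypothetical sketch, not pandas' actual BlockManager: the `consolidate` helper and the `(names, values)` tuple used to represent a block are assumptions made purely for illustration. It groups same-dtype columns into a single 2-D array each, storing one column per row.

```python
import numpy as np

# Hypothetical toy model of dtype-based blocks; NOT pandas' real internals.
columns = {
    "a": np.array([1, 2, 3], dtype=np.int64),
    "b": np.array([4.0, 5.0, 6.0]),
    "c": np.array([7, 8, 9], dtype=np.int64),
}


def consolidate(columns):
    """Group same-dtype columns into 2-D blocks of shape (n_columns, n_rows)."""
    by_dtype = {}
    for name, values in columns.items():
        # Remember which column names share each dtype, in insertion order.
        by_dtype.setdefault(values.dtype, []).append(name)
    blocks = []
    for names in by_dtype.values():
        # Stack the 1-D columns as rows of one 2-D array (the "block").
        values = np.vstack([columns[n] for n in names])
        blocks.append((names, values))
    return blocks


blocks = consolidate(columns)
for names, values in blocks:
    print(names, values.dtype, values.shape)
# ['a', 'c'] int64 (2, 3)
# ['b'] float64 (1, 3)
```

In this toy layout, a reduction across the integer columns for each row is one vectorized call on a single array (`blocks[0][1].sum(axis=0)`), which hints at why such a layout can help operations like reductions across columns. The price is the bookkeeping needed to map column names to rows of the right block.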

The primary benefit of the BlockManager is improved performance on
certain operations (construction from a 2-D array, binary operations,
reductions across the columns), especially for wide DataFrames. However,
the BlockManager substantially increases the complexity and maintenance
burden of pandas.

By replacing the BlockManager, we hope to achieve:

- Substantially simpler code
- Easier extensibility with new logical types
- Better user control over memory use and layout
- Improved micro-performance
- Option to provide a C / Cython API to pandas' internals

See [these design
documents](https://dev.pandas.io/pandas2/internal-architecture.html#removal-of-blockmanager-new-dataframe-internals)
for more.

### Decoupling of indexing and internals

The code for getting and setting values in pandas' data structures