PERF: avoid copy in concatenate_array_managers if reindex already copies #44559

Merged
7 changes: 6 additions & 1 deletion pandas/core/internals/concat.py
@@ -76,12 +76,17 @@ def _concatenate_array_managers(
     """
     # reindex all arrays
     mgrs = []
+    axis1_needs_copy = False
     for mgr, indexers in mgrs_indexers:
+        axis1_needs_copy_this = True
         for ax, indexer in indexers.items():
             mgr = mgr.reindex_indexer(
                 axes[ax], indexer, axis=ax, allow_dups=True, use_na_proxy=True
             )
+            if ax == 1 and indexer is not None:
+                axis1_needs_copy_this = False
         mgrs.append(mgr)
+        axis1_needs_copy = axis1_needs_copy or axis1_needs_copy_this

     if concat_axis == 1:
         # concatting along the rows -> concat the reindexed arrays
@@ -94,7 +99,7 @@ def _concatenate_array_managers(
         # concatting along the columns -> combine reindexed arrays in a single manager
         assert concat_axis == 0
         arrays = list(itertools.chain.from_iterable([mgr.arrays for mgr in mgrs]))
-        if copy:
+        if copy and axis1_needs_copy_this:
Member

Might be cleaner (and avoid copies in corner cases) to do this up in the for loop, right before mgrs.append(mgr).

Member Author

@jorisvandenbossche jorisvandenbossche Dec 3, 2021

Thanks, good idea, and that also simplified it quite a bit.

             arrays = [x.copy() for x in arrays]

         new_mgr = ArrayManager(arrays, [axes[1], axes[0]], verify_integrity=False)
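
For readers following the review thread, here is a minimal sketch of the suggested restructuring: decide per manager, inside the loop, whether an explicit copy is still needed, rather than tracking a flag and copying everything afterwards. It does not use pandas internals. Plain NumPy arrays stand in for a manager's arrays, and `reindex_arrays` and `combine_columns` are hypothetical names invented for this illustration. The only assumption carried over from the PR is that reindexing with a non-`None` indexer already produces fresh arrays, which is modeled here with `ndarray.take`.

```python
import numpy as np


def reindex_arrays(arrays, indexer):
    """Hypothetical stand-in for reindexing along the row axis.

    With an indexer, ndarray.take returns new arrays (a copy).
    Without one, the original array objects are passed through.
    """
    if indexer is None:
        return arrays
    return [arr.take(indexer) for arr in arrays]


def combine_columns(groups, copy=True):
    """Collect the arrays of several groups, copying each group at most once.

    groups is a list of (arrays, indexer) pairs. The copy decision is made
    per group inside the loop, right before the arrays are collected.
    """
    combined = []
    for arrays, indexer in groups:
        # Reindexing with an indexer already copied, so no second copy needed.
        needs_copy = indexer is None
        arrays = reindex_arrays(arrays, indexer)
        if copy and needs_copy:
            arrays = [arr.copy() for arr in arrays]
        combined.extend(arrays)
    return combined


a = np.arange(3)
b = np.arange(3, 6)
result = combine_columns([([a], None), ([b], np.array([2, 0, 1]))])
```

In this sketch the first group is copied explicitly because it was not reindexed, while the second group relies on the copy that `take` already made. A single flag shared across all groups would either copy both or neither.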