REF: io.pytables operate on DataFrames instead of Blocks #29871

Merged 4 commits on Dec 3, 2019
33 changes: 22 additions & 11 deletions pandas/io/pytables.py
@@ -48,7 +48,6 @@
 import pandas.core.common as com
 from pandas.core.computation.pytables import PyTablesExpr, maybe_expression
 from pandas.core.index import ensure_index
-from pandas.core.internals import BlockManager, _block_shape, make_block

 from pandas.io.common import _stringify_path
 from pandas.io.formats.printing import adjoin, pprint_thing
@@ -2301,7 +2300,7 @@ def set_atom_categorical(self, block, items, info=None):
         # write the codes; must be in a block shape
         self.ordered = values.ordered
         self.typ = self.get_atom_data(block, kind=codes.dtype.name)
-        self.set_data(_block_shape(codes))
+        self.set_data(codes)

         # write the categories
         self.meta = "category"
@@ -3100,17 +3099,23 @@ def read(self, start=None, stop=None, **kwargs):
             axes.append(ax)

         items = axes[0]
-        blocks = []
+        dfs = []

Contributor: I think it would be better to use a list comprehension here and move the DataFrame creation (e.g. lines 3106-3110) into a function, but that can be a follow-up.

         for i in range(self.nblocks):

             blk_items = self.read_index(f"block{i}_items")
             values = self.read_array(f"block{i}_values", start=_start, stop=_stop)
-            blk = make_block(
-                values, placement=items.get_indexer(blk_items), ndim=len(axes)
-            )
-            blocks.append(blk)

-        return self.obj_type(BlockManager(blocks, axes))
+            columns = items[items.get_indexer(blk_items)]
+            df = DataFrame(values.T, columns=columns, index=axes[1])
+            dfs.append(df)

+        if len(dfs) > 0:
+            out = concat(dfs, axis=1)
Contributor: I actually find this simpler as a chained operation.

Member Author: OK, will change in a follow-up (since CI is failing for unrelated reasons right now).

+            out = out.reindex(columns=items, copy=False)
+            return out

+        return DataFrame(columns=axes[0], index=axes[1])

     def write(self, obj, **kwargs):
         super().write(obj, **kwargs)
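
The two follow-up suggestions in the review comments on `read` (a helper function for the DataFrame creation used from a list comprehension, and a chained `concat`/`reindex`) might look roughly like the sketch below. `block_to_frame`, `assemble`, and the sample data are hypothetical names introduced for illustration and are not code from this PR. The sketch uses the public `pd.concat` and `pd.DataFrame` rather than the module-internal imports, and omits `copy=False` from `reindex` to stay version-agnostic.

```python
import numpy as np
import pandas as pd


def block_to_frame(values, blk_items, items, index):
    """Hypothetical helper: turn one stored block into a DataFrame.

    `values` is assumed to have shape (n_block_columns, n_rows),
    which is why it is transposed before construction.
    """
    columns = items[items.get_indexer(blk_items)]
    return pd.DataFrame(values.T, columns=columns, index=index)


def assemble(blocks, items, index):
    """Hypothetical stand-in for the tail of read().

    `blocks` is a list of (blk_items, values) pairs.
    """
    dfs = [
        block_to_frame(values, blk_items, items, index)
        for blk_items, values in blocks
    ]
    if not dfs:
        # No blocks stored: return an empty frame with the right axes.
        return pd.DataFrame(columns=items, index=index)
    # Chained form: concatenate column-wise, then restore the column order.
    return pd.concat(dfs, axis=1).reindex(columns=items)


# Sample data (made up): a float block holding "a" and "c", an int block holding "b".
items = pd.Index(["a", "b", "c"])
index = pd.RangeIndex(2)
blocks = [
    (pd.Index(["a", "c"]), np.array([[1.0, 2.0], [5.0, 6.0]])),
    (pd.Index(["b"]), np.array([[3, 4]])),
]
out = assemble(blocks, items, index)
print(out)
```

Building one DataFrame per block keeps each block's dtype intact, and the final `reindex` only reorders columns to match `items`.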
@@ -4333,9 +4338,15 @@ def read(
             if values.ndim == 1 and isinstance(values, np.ndarray):
                 values = values.reshape((1, values.shape[0]))

-            block = make_block(values, placement=np.arange(len(cols_)), ndim=2)
-            mgr = BlockManager([block], [cols_, index_])
-            frames.append(DataFrame(mgr))
+            if isinstance(values, np.ndarray):
Contributor: I would merge this with the line above.

Member Author: I don't understand the suggestion.

Contributor: Put this

            if values.ndim == 1 and isinstance(values, np.ndarray):
                values = values.reshape((1, values.shape[0]))

inside this if.

Member Author: The purpose of that check is orthogonal to the purpose of this check. Lines 4341-4348 are logically grouped together.

Contributor: OK. I don't particularly like the fact that we have to construct things like this, but I guess it's OK.

+                df = DataFrame(values.T, columns=cols_, index=index_)
+            elif isinstance(values, Index):
+                df = DataFrame(values, columns=cols_, index=index_)
+            else:
+                # Categorical
+                df = DataFrame([values], columns=cols_, index=index_)
+            assert (df.dtypes == values.dtype).all(), (df.dtypes, values.dtype)
+            frames.append(df)

         if len(frames) == 1:
             df = frames[0]
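
As a rough illustration of why the construction branches on the type of `values`, the sketch below builds a frame from a 2-D ndarray (stored column-major, so it is transposed) and from a 1-D Index or Categorical, then applies the same dtype check as the diff. `frame_from_values` is a hypothetical helper, not part of pandas. One deliberate difference from the diff: the non-ndarray cases are passed as a dict (`{column: values}`) instead of `DataFrame([values], ...)`, on the assumption that the list-of-Categorical form may not behave the same way across pandas versions.

```python
import numpy as np
import pandas as pd


def frame_from_values(values, cols, index):
    """Hypothetical helper: build a DataFrame from one stored column group."""
    if isinstance(values, np.ndarray):
        if values.ndim == 1:
            # A single column arrives 1-D; give it the (n_columns, n_rows) shape.
            values = values.reshape((1, values.shape[0]))
        # Stored as (n_columns, n_rows); DataFrame expects (n_rows, n_columns).
        df = pd.DataFrame(values.T, columns=cols, index=index)
    else:
        # Index (e.g. tz-aware datetimes) or Categorical: one 1-D column.
        # Assumption: dict construction is used here instead of the diff's form.
        df = pd.DataFrame({cols[0]: values}, index=index)
    # Same sanity check as in the diff: construction must not change the dtype.
    assert (df.dtypes == values.dtype).all(), (df.dtypes, values.dtype)
    return df


rows = pd.RangeIndex(3)
arr_df = frame_from_values(np.arange(6.0).reshape(2, 3), pd.Index(["a", "b"]), rows)
cat = pd.Categorical(["x", "y", "x"])
cat_df = frame_from_values(cat, pd.Index(["c"]), rows)
stamps = pd.date_range("2019-12-03", periods=3, tz="UTC")
tz_df = frame_from_values(stamps, pd.Index(["t"]), rows)
```

The dtype assertion is what catches a silent conversion, for example a tz-aware column coming back as plain objects.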