REF: io.pytables operate on DataFrames instead of Blocks #29871

Merged
merged 4 commits into pandas-dev:master from cln-pytables-blocks
Dec 3, 2019

Conversation

jbrockmendel
Member

Wouldn't be surprised if there is a perf hit here, will run asvs.

@jbrockmendel
Member Author

asvs seem like noise:

       before           after         ratio
-      10.7±0.1ms      9.26±0.07ms     0.86  io.hdf.HDFStoreDataFrame.time_query_store_table
-         186±3ms          159±1ms     0.86  io.hdf.HDFStoreDataFrame.time_write_store_table_dc
-      7.59±0.2ms      6.34±0.07ms     0.84  io.hdf.HDFStoreDataFrame.time_store_info

       before           after         ratio
+        30.6±2ms         42.6±2ms     1.39  io.hdf.HDF.time_read_hdf('fixed')
-      7.55±0.2ms      6.80±0.01ms     0.90  io.hdf.HDFStoreDataFrame.time_store_info

       before           after         ratio
+        29.3±1ms         41.3±5ms     1.41  io.hdf.HDF.time_read_hdf('fixed')
+     4.00±0.04ms      4.40±0.07ms     1.10  io.hdf.HDFStoreDataFrame.time_read_store_table
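For readers unfamiliar with the tables above, the ratio column is `after / before`, and the leading `+` or `-` marks a slowdown or speedup. A minimal sketch of that reading follows. The function name `classify` and the threshold `factor=1.1` are assumptions made for illustration, not something stated in this thread, and the displayed ratios above are presumably computed from unrounded timings, so recomputing them from the rounded columns may differ in the last digit.

```python
def classify(before_ms, after_ms, factor=1.1):
    """Return (marker, ratio) for one benchmark row.

    Hypothetical helper: `factor` is an assumed threshold, chosen
    only to illustrate how the +/- markers can be read.
    """
    ratio = after_ms / before_ms
    if ratio > factor:
        marker = "+"  # slower after the change
    elif ratio < 1 / factor:
        marker = "-"  # faster after the change
    else:
        marker = " "  # within the threshold, treated as unchanged
    return marker, ratio


# Two rows taken from the tables above (mean timings only).
rows = [
    ("io.hdf.HDF.time_read_hdf('fixed')", 30.6, 42.6),
    ("io.hdf.HDFStoreDataFrame.time_store_info", 7.59, 6.34),
]

for name, before, after in rows:
    marker, ratio = classify(before, after)
    print(f"{marker} {ratio:5.2f}  {name}")
```

Note that a marker says nothing about run-to-run variance, which is why the same benchmark can appear in one run and not in another.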

@gfyoung added the IO Data, Internals, and Refactor labels Nov 27, 2019
Contributor

@jreback left a comment

lgtm, minor comments, pls rebase

@jreback jreback added this to the 1.0 milestone Nov 27, 2019
@jbrockmendel
Member Author

rebased+green

block = make_block(values, placement=np.arange(len(cols_)), ndim=2)
mgr = BlockManager([block], [cols_, index_])
frames.append(DataFrame(mgr))
if isinstance(values, np.ndarray):
Contributor

I would merge this with the line above

Member Author

I don't understand the suggestion

Contributor

put this

if values.ndim == 1 and isinstance(values, np.ndarray):
    values = values.reshape((1, values.shape[0]))

inside this if

Member Author

the purpose of that check is orthogonal to the purpose of this check. Lines 4341-4348 are logically grouped together

Contributor

ok, I don't particularly like the fact that we have to construct things like this, but i guess ok
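To make the two checks in this thread concrete, here is a minimal sketch of the nested form the reviewer proposed, where the 1-D reshape sits inside the `isinstance(values, np.ndarray)` branch. The helper name `frame_from_values` is hypothetical, and building the frame from `values.T` is an assumption about how the block-style layout (one row per column) maps onto a DataFrame; the merged PR keeps the two checks separate.

```python
import numpy as np
import pandas as pd


def frame_from_values(values, columns, index):
    """Hypothetical helper illustrating the reviewer's nested layout."""
    if isinstance(values, np.ndarray):
        if values.ndim == 1:
            # Promote a single column to the 2-D block-style layout:
            # shape (n_columns, n_rows) with n_columns == 1.
            values = values.reshape((1, values.shape[0]))
        # Assumption: rows of `values` are columns of the frame,
        # so transpose before handing it to the constructor.
        return pd.DataFrame(values.T, columns=columns, index=index)
    raise TypeError("this sketch only covers numpy arrays")


index = pd.RangeIndex(3)
one_col = frame_from_values(np.array([1.0, 2.0, 3.0]), ["a"], index)
two_col = frame_from_values(np.arange(6).reshape(2, 3), ["b", "c"], index)
print(one_col.shape, two_col.shape)
```

The author's objection is about intent rather than behaviour: the reshape normalises dimensionality, while the `isinstance` check selects a construction path, so keeping them apart keeps each check readable on its own.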

@jbrockmendel added the IO HDF5 label and removed the IO Data label Dec 1, 2019
Contributor

@jreback left a comment

ok to merge like this or handle suggestions as followup.

@@ -3100,17 +3099,23 @@ def read(self, start=None, stop=None, **kwargs):
axes.append(ax)

items = axes[0]
blocks = []
dfs = []

Contributor

i think it would be better to use a list comprehension here and make a function for the DataFrame creation (e.g. lines 3106-3110), but can be a followup
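A rough sketch of what that follow-up could look like: a small function that builds one DataFrame per stored block, a list comprehension that collects them, and the `concat(dfs, axis=1)` call from the diff. The helper name `_frame_for_block`, the sample data, and the empty-frame fallback are all assumptions for illustration, not code from the PR.

```python
import numpy as np
import pandas as pd


def _frame_for_block(values, columns, index):
    # Hypothetical helper: `values` is assumed to be 2-D with one row
    # per column, so it is transposed for the DataFrame constructor.
    return pd.DataFrame(values.T, columns=columns, index=index)


index = pd.RangeIndex(3)

# Stand-ins for the (values, column labels) pairs read from the store.
pieces = [
    (np.array([[1.0, 2.0, 3.0]]), ["a"]),
    (np.array([[4, 5, 6], [7, 8, 9]]), ["b", "c"]),
]

# List comprehension in place of an explicit append loop.
dfs = [_frame_for_block(values, cols, index) for values, cols in pieces]

if len(dfs) > 0:
    out = pd.concat(dfs, axis=1)
else:
    # Assumed fallback for an empty store.
    out = pd.DataFrame(index=index)

print(out)
```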

dfs.append(df)

if len(dfs) > 0:
    out = concat(dfs, axis=1)
Contributor

i actually find this simpler as a chained operation

Member Author

ok. will change in follow-up (since CI is failing for unrelated reasons right now)

@jreback jreback merged commit 6705b2a into pandas-dev:master Dec 3, 2019
@jreback
Contributor

jreback commented Dec 3, 2019

thanks

@jbrockmendel jbrockmendel deleted the cln-pytables-blocks branch December 3, 2019 16:08
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Labels
Internals, IO HDF5, Refactor

3 participants