
use of iloc with heterogeneous DataFrame coerces dtype? #5256

Closed · jason-s opened this issue Oct 17, 2013 · 6 comments

jason-s commented Oct 17, 2013

I'm using pandas 0.11.0 and numpy 1.7.1 in Anaconda Python 1.6.0.

The iloc method seems to coerce the dtype of values in some cases. For example:

import numpy as np
import pandas as pd

x5 = pd.DataFrame(index=['A','B','C'], columns=['foo','bar','baz'])
x5.foo = np.int16([-1,-2,-3])
x5.bar = np.uint16([1,2,3])
x5.baz = np.uint32([100000,200000,300000])
print x5
print x5.dtypes

   foo  bar     baz
A   -1    1  100000
B   -2    2  200000
C   -3    3  300000
foo     int16
bar    uint16
baz    uint32
dtype: object

So far, so good. But when a single row is selected with iloc, type coercion seems to occur:

print x5.iloc[1:2]
   foo  bar     baz
B   -2    2  200000

print x5.iloc[1]
foo    4294967294
bar             2
baz        200000
Name: B, dtype: uint32

jason-s commented Oct 17, 2013

Just upgraded to pandas 0.12.0, numpy 1.7.1, under Anaconda 1.7.0. The iloc coercion behavior is different (it now appears more correct), but it still occurs:

print x5.iloc[-1]
foo        -3
bar         3
baz    300000
Name: C, dtype: int64

jreback commented Oct 18, 2013

That is a correct coercion: by definition a Series has to pick a single dtype that can hold all of its elements, while the DataFrame will preserve the per-column dtypes.
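
A minimal sketch of that distinction, assuming a recent pandas (the exact common dtype the Series ends up with may vary by version): selecting the row as a one-row DataFrame keeps the per-column dtypes, while selecting it as a Series forces a single dtype wide enough for every column.

import numpy as np
import pandas as pd

x5 = pd.DataFrame({'foo': np.array([-1, -2, -3], dtype=np.int16),
                   'bar': np.array([1, 2, 3], dtype=np.uint16),
                   'baz': np.array([100000, 200000, 300000], dtype=np.uint32)},
                  index=['A', 'B', 'C'], columns=['foo', 'bar', 'baz'])

# One-row DataFrame: each column keeps its own dtype (int16/uint16/uint32).
print(x5.iloc[[1]].dtypes)

# Series: one dtype must hold all three columns, so they are upcast
# (typically to int64).
print(x5.iloc[1].dtype)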

jason-s commented Oct 18, 2013

So, is this the best way to get the values out in an iterable format that preserves their types?

lastrow = x5.tail(1).itertuples(index=False).next()
print lastrow
print [type(val) for val in lastrow]

(-3, 3, 300000)
[<type 'numpy.int16'>, <type 'numpy.uint16'>, <type 'numpy.uint32'>]

jreback commented Oct 18, 2013

What are you doing with the values?

The dtypes are column based, so it's best to operate that way.

Is there a reason you actually need these dtypes like this?
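
A column-oriented way to grab the last row while keeping the dtypes (a sketch reusing the x5 frame from the original report; the dict built here is just for illustration): pull the last element out of each column's Series, so every scalar keeps its column's numpy type.

last = {col: x5[col].iloc[-1] for col in x5.columns}

# Each value keeps its column's scalar type,
# e.g. numpy.int16, numpy.uint16, numpy.uint32.
print(last)
print({col: type(val) for col, val in last.items()})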

jason-s commented Oct 18, 2013

> is there a reason you actually need these dtypes like this?

Yes, they are fixed-point integers representing raw values from a data acquisition system, which are going straight into a PyTables HDF5 file. Some of the data are quantized analog samples, others are digital values. Preservation of the original format is important for auditing purposes.

They come in batches, and I'm just trying to process the last line of each batch for a UI update.

jreback commented Oct 18, 2013

HDFStore will properly serialize this, but if you need a list then your approach will work.
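
For reference, a minimal sketch of that round trip through HDFStore, reusing the x5 frame from above (the file name batches.h5 and the key 'batch' are hypothetical, and PyTables must be installed):

import pandas as pd

with pd.HDFStore('batches.h5') as store:
    store.put('batch', x5, format='table')   # store.append('batch', ...) can accumulate successive batches
    roundtrip = store['batch']

# The per-column dtypes should survive the round trip.
print(roundtrip.dtypes)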
