
use of iloc with heterogeneous DataFrame coerces dtype? #5256

Closed · jason-s opened this issue Oct 17, 2013 · 6 comments

jason-s commented Oct 17, 2013

I'm using pandas 0.11.0 and numpy 1.7.1 in Anaconda Python 1.6.0.

The iloc method seems to coerce the dtype of values in some cases. For example:

import numpy as np
import pandas as pd

x5 = pd.DataFrame(index=['A','B','C'], columns=['foo','bar','baz'])
x5.foo = np.int16([-1,-2,-3])
x5.bar = np.uint16([1,2,3])
x5.baz = np.uint32([100000,200000,300000])
print x5
print x5.dtypes

   foo  bar     baz
A   -1    1  100000
B   -2    2  200000
C   -3    3  300000
foo     int16
bar    uint16
baz    uint32
dtype: object

So far, so good. But when a single row is selected with iloc, type coercion seems to occur:

print x5.iloc[1:2]
   foo  bar     baz
B   -2    2  200000

print x5.iloc[1]
foo    4294967294
bar             2
baz        200000
Name: B, dtype: uint32

jason-s commented Oct 17, 2013

Just upgraded to pandas 0.12.0, numpy 1.7.1, under Anaconda 1.7.0. The iloc coercion behavior is different (it now appears more correct), but it still occurs:

print x5.iloc[-1]
foo        -3
bar         3
baz    300000
Name: C, dtype: int64

jreback commented Oct 18, 2013

That is a correct coercion: by definition a Series has to pick a single dtype that can hold all of its elements, while the DataFrame will preserve the per-column dtypes.
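
A minimal sketch of that distinction, assuming a recent pandas (the exact common dtype the Series ends up with may vary by version): selecting the row as a one-row DataFrame keeps the per-column dtypes, while selecting it as a Series forces a single dtype wide enough for every column.

import numpy as np
import pandas as pd

x5 = pd.DataFrame({'foo': np.array([-1, -2, -3], dtype=np.int16),
                   'bar': np.array([1, 2, 3], dtype=np.uint16),
                   'baz': np.array([100000, 200000, 300000], dtype=np.uint32)},
                  index=['A', 'B', 'C'], columns=['foo', 'bar', 'baz'])

# One-row DataFrame: each column keeps its own dtype (int16/uint16/uint32).
print(x5.iloc[[1]].dtypes)

# Series: one dtype must hold all three columns, so they are upcast
# (typically to int64).
print(x5.iloc[1].dtype)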

jason-s commented Oct 18, 2013

So, is this the best way to get the values out in an iterable format that preserves their types?

lastrow = x5.tail(1).itertuples(index=False).next()
print lastrow
print [type(val) for val in lastrow]

(-3, 3, 300000)
[<type 'numpy.int16'>, <type 'numpy.uint16'>, <type 'numpy.uint32'>]

jreback commented Oct 18, 2013

What are you doing with the values?

The dtypes are column based, so it's best to operate that way.

Is there a reason you actually need these dtypes like this?
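
A column-oriented way to grab the last row while keeping the dtypes (a sketch reusing the x5 frame from the original report; the dict built here is just for illustration): pull the last element out of each column's Series, so every scalar keeps its column's numpy type.

last = {col: x5[col].iloc[-1] for col in x5.columns}

# Each value keeps its column's scalar type,
# e.g. numpy.int16, numpy.uint16, numpy.uint32.
print(last)
print({col: type(val) for col, val in last.items()})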

jason-s commented Oct 18, 2013

> is there a reason you actually need these dtypes like this?

Yes, they are fixed-point integers representing raw values from a data acquisition system, which are going straight into a PyTables HDF5 file. Some of the data are quantized analog samples, others are digital values. Preservation of the original format is important for auditing purposes.

They come in batches, and I'm just trying to process the last line of each batch for a UI update.

jreback commented Oct 18, 2013

HDFStore will properly serialize this, but if you need a list then your approach will work.
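
For reference, a minimal sketch of that round trip through HDFStore, reusing the x5 frame from above (the file name batches.h5 and the key 'batch' are hypothetical, and PyTables must be installed):

import pandas as pd

with pd.HDFStore('batches.h5') as store:
    store.put('batch', x5, format='table')   # store.append('batch', ...) can accumulate successive batches
    roundtrip = store['batch']

# The per-column dtypes should survive the round trip.
print(roundtrip.dtypes)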
