API: New global option to set the default dtypes to use #61620

Open
datapythonista opened this issue Jun 10, 2025 · 3 comments
Labels: API Design, Needs Discussion (requires discussion from core team before further action)

Comments

@datapythonista (Member)

This was already implemented before 2.0 in #50748, but then removed before the release in #51853, as in too many cases the option wasn't being respected.

The idea is to have a global option to let pandas know which dtype kind to use when data is created (the exact option name needs to be discussed, but I'll use `use_arrow` to illustrate):

```python
pandas.options.mode.use_arrow = True

df = pandas.read_csv(...)   # the returned DataFrame will use PyArrow dtypes
df["foo"] = 1               # the added column will use PyArrow dtypes
df = pandas.DataFrame(...)  # the returned DataFrame will use PyArrow dtypes
...
```

I don't think adding the option is controversial, as it has no impact on users unless set, and it was already implemented without objections in the past.

I think the implementation requires a bit of discussion, as the exact behavior to implement is not immediately obvious, at least to me. The main points I can see:

  1. Should we have an option to set pyarrow as the default (since those should be the types we expect people to use in the future), or a more generic option to set dtype_backend to numpy|nullable|pyarrow?
  2. I think at least initially it makes sense that if a user is specific about the dtype they want to use (e.g. Series([1, 2], dtype="Int32")) we let them do it. But could it make sense to have a second option force_arrow or force_dtype_backend so any operation that would use another dtype kind would fail? I think this could be helpful for users that only want to live in the pyarrow world, and it would also be helpful to identify undesired casts for us.
  3. The exact namespace (mode vs future vs others) and name of the option, which will clearly depend on the previous points.
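For context on point 1: pandas already exposes a per-call `dtype_backend` parameter on its readers, and the proposed option would effectively set a global default for it. A minimal sketch of today's per-call behaviour, using the `numpy_nullable` backend so it runs without PyArrow installed:

```python
import io

import pandas as pd

csv = io.StringIO("a,b\n1,2.5\n3,4.5")

# Today the backend must be chosen per call; the proposed global option
# would make this the default for every constructor and reader.
df = pd.read_csv(csv, dtype_backend="numpy_nullable")
print(df.dtypes)  # a -> Int64, b -> Float64 (nullable extension dtypes)
```

With PyArrow installed, `dtype_backend="pyarrow"` produces `ArrowDtype`-backed columns instead.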
datapythonista added the Dtype Conversions, Needs Discussion, and pyarrow dtype retention labels on Jun 10, 2025
datapythonista changed the title from "ENH: New global option to set the default dtypes to use" to "API: New global option to set the default dtypes to use" on Jun 10, 2025
datapythonista added the API Design label and removed the Dtype Conversions and pyarrow dtype retention labels on Jun 10, 2025
@simonjayhawkins (Member)

> 2. I think at least initially it makes sense that if a user is specific about the dtype they want to use (e.g. Series([1, 2], dtype="Int32")) we let them do it. But could it make sense to have a second option force_arrow or force_dtype_backend so any operation that would use another dtype kind would fail? I think this could be helpful for users that only want to live in the pyarrow world, and it would also be helpful to identify undesired casts for us.

It would seem logical that, with the global option set, dtypes are silently mapped to Arrow types: the purpose of the option is to work only with Arrow types.

A secondary option to control that mapping would perhaps be desirable for some users.

But we definitely would not want to require any code changes. The idea of the option is to let users adopt PyArrow in existing code without modifications.

We could perhaps give consideration to logical types, as per PDEP-13 #58455, as a future direction, so that these silent dtype mappings do not occur; but that is definitely not a blocker to what you are proposing.
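The silent mapping described above already exists in pandas as a per-object operation: `convert_dtypes` accepts a `dtype_backend` argument. A small sketch, using the `numpy_nullable` backend so it runs without PyArrow:

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # plain NumPy int64
mapped = s.convert_dtypes(dtype_backend="numpy_nullable")
print(mapped.dtype)  # Int64 (nullable extension dtype)
```

With PyArrow installed, `dtype_backend="pyarrow"` maps the same series to `int64[pyarrow]` instead.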

@arthurlw (Member)

> 1. Should we have an option to set pyarrow as the default (since those should be the types we expect people to use in the future), or a more generic option to set dtype_backend to numpy|nullable|pyarrow?

Not a maintainer, but personally I would prefer the latter: it feels more future-proof and flexible, especially if other backends are considered later on.

@datapythonista (Member, Author)

Thanks @arthurlw, this is good feedback. I see the appeal of the more generic option, but I prefer the first one, because I see the dtype backends not as a feature, but as something we had to do because we didn't get the backend we wanted initially.

Long term I think users should just think about float, int... and not how they are stored internally. In that sense maybe `pandas.options.mode.use_legacy_dtypes = True/False` would be even clearer, if others share my point of view.
