API: New global option to set the default dtypes to use #61620
Comments
It would seem logical that, with a global option, dtypes are silently mapped to Arrow types. The purpose of the global option is to work with Arrow types only. A secondary option to control that would perhaps be desirable for some users, but we definitely would not want to require any code changes: the idea of the option is to let users run existing code on PyArrow as-is. We could perhaps consider logical types, as per PDEP-13 #58455, as a future direction so that these silent dtype mappings do not occur, but that is definitely not a blocker to what you are proposing.
Not a maintainer, but personally I would prefer the latter: it feels more future-proof and flexible, especially if other backends are considered later on.
Thanks @arthurlw, this is good feedback. I agree, and I prefer the first option, because I see the dtype backends not as a feature, but as something we had to do because we didn't get the backend we wanted initially. Long term, I think users should just think about float, int... and not about how they are stored internally. In that sense maybe […]
This was already implemented before 2.0 in #50748, but then removed before the release in #51853, as in too many cases the option wasn't being respected.
The idea is to have a global option to let pandas know which dtype kind to use when data is created (the exact option name needs to be discussed, but I'll use `use_arrow` to illustrate).

I don't think adding the option is controversial, as it has no impact on users unless set, and it was already implemented without objections in the past.
I think the implementation requires a bit of discussion, as the exact behavior to implement is not immediately obvious, at least to me. Main points I can see:

- […] `dtype_backend` to `numpy|nullable|pyarrow`?
- […] `Series([1, 2], dtype="Int32")`) we let them do it. But could it make sense to have a second option, `force_arrow` or `force_dtype_backend`, so any operation that would use another dtype kind would fail? I think this could be helpful for users that only want to live in the pyarrow world, and it would also help us identify undesired casts.
- […] `mode` vs `future` vs others) and name of the option, which clearly will depend on the previous points.
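
To make the points above easier to discuss, here is a minimal, pure-Python sketch of the semantics being debated. It is a toy model, not pandas code: `Options`, `backend_of`, `resolve_dtype` and the `DEFAULT_DTYPES` table are all hypothetical names made up for illustration, and only the dtype strings mirror existing pandas aliases. It assumes the enum-style `dtype_backend` option (rather than a boolean `use_arrow`) plus a separate `force_dtype_backend` flag.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the proposal; none of these names exist in pandas.
BACKENDS = ("numpy", "nullable", "pyarrow")

# Default dtype chosen for each (data kind, backend) pair.
DEFAULT_DTYPES = {
    ("int", "numpy"): "int64",
    ("int", "nullable"): "Int64",
    ("int", "pyarrow"): "int64[pyarrow]",
    ("float", "numpy"): "float64",
    ("float", "nullable"): "Float64",
    ("float", "pyarrow"): "double[pyarrow]",
}


@dataclass
class Options:
    dtype_backend: str = "numpy"       # which backend supplies default dtypes
    force_dtype_backend: bool = False  # reject dtypes from any other backend


def backend_of(dtype: str) -> str:
    """Classify a dtype alias by backend (simplified heuristic for this sketch)."""
    if dtype.endswith("[pyarrow]"):
        return "pyarrow"
    if dtype[:1].isupper():  # e.g. "Int32", "Float64"
        return "nullable"
    return "numpy"


def resolve_dtype(kind: str, options: Options, explicit: Optional[str] = None) -> str:
    """Pick the dtype for newly created data of the given kind."""
    if options.dtype_backend not in BACKENDS:
        raise ValueError(f"unknown dtype_backend: {options.dtype_backend!r}")
    if explicit is None:
        # No dtype requested: the global option decides.
        return DEFAULT_DTYPES[(kind, options.dtype_backend)]
    if options.force_dtype_backend and backend_of(explicit) != options.dtype_backend:
        # Strict mode: an explicit dtype from another backend is an error.
        raise TypeError(
            f"dtype {explicit!r} is not a {options.dtype_backend} dtype"
        )
    # Non-strict mode: an explicit request always wins.
    return explicit
```

In this model, `resolve_dtype("int", Options(dtype_backend="pyarrow"))` picks the Arrow default, an explicit `"Int32"` is respected unless `force_dtype_backend=True`, and leaving the options at their defaults changes nothing for existing users.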