
BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside a target. #61598


Open
3 tasks done
monagai opened this issue Jun 7, 2025 · 1 comment
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments


monagai commented Jun 7, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({
    'A': [1, 9, 6, 2, 7],
    'B': [6, 1, 3, 6, 3],
    'C': [2, 8, 4, 4, 4]
}, index=list('abcde'))
df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
df['vals'] = df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)

Issue Description

This is a report about the ~ operator in a pandas DataFrame.

Here is the example on python=3.10.12, pandas=2.2.3.

python 3.10.12 (main, Feb  4 2025, 14:57:36) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.34.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({
   ...:     'A': [1, 9, 6, 2, 7],
   ...:     'B': [6, 1, 3, 6, 3],
   ...:     'C': [2, 8, 4, 4, 4]
   ...: }, index=list('abcde'))

In [3]: df
Out[3]:
   A  B  C
a  1  6  2
b  9  1  8
c  6  3  4
d  2  6  4
e  7  3  4

In [4]: df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[4]:
a    False
b     True
c     True
d    False
e     True
dtype: bool

In [5]: df['vals'] = df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)

In [6]: df
Out[6]:
   A  B  C   vals
a  1  6  2  False
b  9  1  8   True
c  6  3  4   True
d  2  6  4  False
e  7  3  4   True

In [7]: df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[7]:
a   -2
b   -1
c   -1
d   -2
e   -1
dtype: int64

In the example above, the same expression, df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1), is executed in steps 4, 5, and 7.
However, step 7 returns integers (-2 and -1) instead of booleans.
Unlike ~, the not operator returns the correct answer.
It seems that the ~ operator is quite dangerous and unreliable in a pandas DataFrame.

In an environment with Python 3.13.3 and pandas 2.2.3, Python emits the following warning, for step 7 only: <ipython-input-7-7d5677ff0f59>:1: DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
However, I think this warning comes from Python (not from pandas) and addresses a different point.

Expected Behavior

The result of step 7 should be the same as the results of steps 4 and 5.

Installed Versions

python = 3.10.12
pandas = 2.2.3

@monagai monagai added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 7, 2025
@monagai monagai changed the title BUG: Dangerous inconsistency: ~ operator changes behavior based on context. BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside the target. Jun 7, 2025
@monagai monagai changed the title BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside the target. BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside a target. Jun 7, 2025
@Liam3851
Contributor

Liam3851 commented Jun 7, 2025

I believe the way to do what you want is simply

~((df['B'] > 3) & (df['C'] < 8))

This keeps all the math within pandas and returns a boolean Series.
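As a minimal sketch of that suggestion, using the DataFrame from the report (the names mask and mask_again are only illustrative), the vectorized expression can be evaluated before and after the extra column is added. Because ~ is applied to a boolean Series rather than to individual scalars, the result should not depend on what other columns the frame holds:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 9, 6, 2, 7],
    'B': [6, 1, 3, 6, 3],
    'C': [2, 8, 4, 4, 4],
}, index=list('abcde'))

# ~ acts on a whole boolean Series here, so pandas performs the negation.
mask = ~((df['B'] > 3) & (df['C'] < 8))

# Adding a bool column changes the dtype of row cross-sections,
# but the column-wise expression never looks at rows.
df['vals'] = mask
mask_again = ~((df['B'] > 3) & (df['C'] < 8))

print(mask.dtype)               # bool
print(mask_again.equals(mask))  # True
```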

When you call

df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)

you are expressly telling the system that you want to take each row of df, cast it to a Series, look up individual elements of the Series, and then apply ~ to the elements. In this case you are taking all the math away from pandas and sending it to Python, which is doing something you don't want, which is why you get the Python warning.

In step 4 I think you happen to get away with it because the whole DataFrame is dtyped as np.int64, so you're staying with numpy scalars (np.int64 rather than Python int), your comparison operators return numpy.bool_ values, and numpy handles ~ the way you want. When you add the additional column, the cross-section becomes an object-dtyped Series (because you now have columns of different dtypes), so everything goes all the way to Python: you get Python ints, and thus Python bools, which give you a different answer, i.e.

In [1]: ~np.bool_(False)
Out[1]: True

In [2]: ~False
Out[2]: -1
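If a row-wise apply is really needed, one way to avoid depending on the scalar type is to negate with not instead of ~. The following is only a sketch, assuming the DataFrame from the report; negate_row is a hypothetical helper name, and the DeprecationWarning that newer Python versions raise for ~ on a bool is silenced just for the demonstration line:

```python
import warnings

import numpy as np
import pandas as pd

# Scalar behaviour: ~ is logical negation for numpy bools,
# but bitwise inversion of the underlying int for Python bools.
np_inverted = ~np.bool_(False)
with warnings.catch_warnings():
    # Python 3.12+ warns here; the warning says this is slated for removal.
    warnings.simplefilter("ignore", DeprecationWarning)
    py_inverted = ~False

df = pd.DataFrame({
    'A': [1, 9, 6, 2, 7],
    'B': [6, 1, 3, 6, 3],
    'C': [2, 8, 4, 4, 4],
}, index=list('abcde'))


def negate_row(row):
    # `not` and `and` give the same answer for numpy and Python scalars.
    return not ((row['B'] > 3) and (row['C'] < 8))


row_dtype_before = df.iloc[0].dtype   # integer dtype: all columns are ints
before = df.apply(negate_row, axis=1)

df['vals'] = before
row_dtype_after = df.iloc[0].dtype    # object: int and bool columns are mixed
after = df.apply(negate_row, axis=1)

print(bool(np_inverted), py_inverted)     # True -1
print(row_dtype_before, row_dtype_after)
print(before.equals(after))               # True
```

The row dtype switching to object is what changes the scalars the lambda sees; with not, the result is the same boolean Series either way.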
