ENH: Implement Kleene logic for BooleanArray #29842
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
.. currentmodule:: pandas | ||
|
||
.. ipython:: python | ||
:suppress: | ||
|
||
import pandas as pd | ||
import numpy as np | ||
|
||
.. _boolean: | ||
|
||
************************** | ||
Nullable Boolean Data Type | ||
************************** | ||
|
||
.. versionadded:: 1.0.0 | ||
|
||
.. _boolean.kleene: | ||
|
||
Kleene Logic | ||
|
||
------------ | ||
|
||
:class:`arrays.BooleanArray` implements Kleene logic (sometimes called three-value logic) for | ||
|
||
logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or). | ||
|
||
This table shows the result for every combination of operands to ``&``, ``|`` and ``^``. The operations are symmetric, so swapping the left- and right-hand sides does not change the result. | ||
|
||
|
||
================= ========= | ||
Expression Result | ||
================= ========= | ||
``True & True`` ``True`` | ||
``True & False`` ``False`` | ||
``True & NA`` ``NA`` | ||
``False & False`` ``False`` | ||
``False & NA`` ``False`` | ||
``NA & NA`` ``NA`` | ||
``True | True`` ``True`` | ||
``True | False`` ``True`` | ||
``True | NA`` ``True`` | ||
``False | False`` ``False`` | ||
``False | NA`` ``NA`` | ||
``NA | NA`` ``NA`` | ||
``True ^ True`` ``False`` | ||
``True ^ False`` ``True`` | ||
``True ^ NA`` ``NA`` | ||
``False ^ False`` ``False`` | ||
``False ^ NA`` ``NA`` | ||
``NA ^ NA`` ``NA`` | ||
================= ========= | ||
|
||
When an ``NA`` is present in an operation, the output value is ``NA`` only if | ||
the result cannot be determined solely based on the other input. For example, | ||
|
||
``True | NA`` is ``True``, because both ``True | True`` and ``True | False`` | ||
are ``True``. In that case, we don't actually need to consider the value | ||
of the ``NA``. | ||
|
||
On the other hand, ``True & NA`` is ``NA``. The result depends on whether | ||
the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``, | ||
but ``True & False`` is ``False``, so we can't determine the output. | ||
|
||
|
||
This differs from how ``np.nan`` behaves in logical operations. Pandas treats | ||
``np.nan`` as *always false in the output*. | ||
|
||
|
||
In ``or`` | ||
|
||
.. ipython:: python | ||
|
||
pd.Series([True, False, np.nan], dtype="object") | True | ||
pd.Series([True, False, np.nan], dtype="boolean") | True | ||
|
||
In ``and`` | ||
|
||
.. ipython:: python | ||
|
||
pd.Series([True, False, np.nan], dtype="object") & True | ||
pd.Series([True, False, np.nan], dtype="boolean") & True |
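The substitution argument in the Kleene Logic section ("the output is ``NA`` only if the result cannot be determined from the other input") can be sketched in plain Python. This is only an illustration of the reasoning, not how ``BooleanArray`` is implemented: ``NA`` is modelled as ``None`` and the helper name ``kleene`` is hypothetical.

```python
import operator
from itertools import product

NA = None  # stand-in for a missing value (an assumption of this sketch)


def kleene(op, left, right):
    """Evaluate a two-valued boolean ``op`` under Kleene logic."""
    # An NA operand could "really" be either True or False, so try both.
    left_options = [True, False] if left is NA else [left]
    right_options = [True, False] if right is NA else [right]
    outcomes = {op(l, r) for l, r in product(left_options, right_options)}
    # If every substitution agrees, the NA did not matter; otherwise the
    # result is unknown and stays NA.
    return outcomes.pop() if len(outcomes) == 1 else NA


# Print the full truth table for and, or and exclusive-or.
for symbol, op in [("&", operator.and_), ("|", operator.or_), ("^", operator.xor)]:
    for left, right in product([True, False, NA], repeat=2):
        print(f"{left} {symbol} {right} -> {kleene(op, left, right)}")
```

Running the loop should reproduce the rows of the table in the documentation, in both operand orders.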
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -184,6 +184,9 @@ class BooleanArray(ExtensionArray, ExtensionOpsMixin): | |
represented by 2 numpy arrays: a boolean array with the data and | ||
a boolean array with the mask (True indicating missing). | ||
|
||
BooleanArray implements Kleene logic (sometimes called three-value | ||
logic) for logical operations. See :ref:`` for more. | ||
is "ref" here a placeholder?
||
|
||
To construct a BooleanArray from generic array-like input, use | ||
:func:`pandas.array` specifying ``dtype="boolean"`` (see examples | ||
below). | ||
|
@@ -560,10 +563,12 @@ def logical_method(self, other): | |
return NotImplemented | ||
|
||
other = lib.item_from_zerodim(other) | ||
mask = None | ||
omask = mask = None | ||
|
||
other_is_booleanarray = isinstance(other, BooleanArray) | ||
|
||
if isinstance(other, BooleanArray): | ||
other, mask = other._data, other._mask | ||
if other_is_booleanarray: | ||
other, omask = other._data, other._mask | ||
mask = omask | ||
elif is_list_like(other): | ||
other = np.asarray(other, dtype="bool") | ||
if other.ndim > 1: | ||
|
@@ -576,17 +581,38 @@ def logical_method(self, other): | |
|
||
# numpy will show a DeprecationWarning on invalid elementwise | ||
# comparisons, this will raise in the future | ||
with warnings.catch_warnings(): | ||
warnings.filterwarnings("ignore", "elementwise", FutureWarning) | ||
with np.errstate(all="ignore"): | ||
result = op(self._data, other) | ||
if lib.is_scalar(other) and np.isnan( | ||
other | ||
): # TODO(NA): change to libmissing.NA: | ||
result = self._data | ||
mask = True | ||
else: | ||
with warnings.catch_warnings(): | ||
warnings.filterwarnings("ignore", "elementwise", FutureWarning) | ||
with np.errstate(all="ignore"): | ||
result = op(self._data, other) | ||
|
||
# nans propagate | ||
if mask is None: | ||
mask = self._mask | ||
|
||
else: | ||
mask = self._mask | mask | ||
|
||
# Kleene-logic adjustments to the mask. | ||
if op.__name__ in {"or_", "ror_"}: | ||
mask[result] = False | ||
elif op.__name__ in {"and_", "rand_"}: | ||
mask[~self._data & ~self._mask] = False | ||
if other_is_booleanarray: | ||
mask[~other & ~omask] = False | ||
elif lib.is_scalar(other) and np.isnan(other): # TODO(NA): change to NA | ||
mask[:] = True | ||
# Do we ever assume that masked values are False? | ||
|
||
result[mask] = False | ||
elif op.__name__ in {"xor", "rxor"}: | ||
# Do we ever assume that masked values are False? | ||
result[mask] = False | ||
|
||
return BooleanArray(result, mask) | ||
|
||
name = "__{name}__".format(name=op.__name__) | ||
|
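The idea behind the mask adjustments in this hunk can be sketched without NumPy: each array is a pair of (data, mask) lists where a ``True`` mask entry means missing, and the mask is cleared wherever one valid operand already decides the answer. This is a simplified illustration under those assumptions, not the code in this PR; the helper names ``to_masked``, ``from_masked``, ``kleene_or`` and ``kleene_and`` are hypothetical.

```python
def to_masked(values):
    """Split a list of True/False/None into (data, mask); mask True = missing."""
    data = [v is True for v in values]  # missing slots get a False placeholder
    mask = [v is None for v in values]
    return data, mask


def from_masked(data, mask):
    """Inverse of to_masked: put None back wherever the mask is set."""
    return [None if m else d for d, m in zip(data, mask)]


def kleene_or(data1, mask1, data2, mask2):
    result, mask = [], []
    for d1, m1, d2, m2 in zip(data1, mask1, data2, mask2):
        # A valid True on either side decides the result on its own.
        known_true = (d1 and not m1) or (d2 and not m2)
        result.append(known_true)
        mask.append((m1 or m2) and not known_true)
    return result, mask


def kleene_and(data1, mask1, data2, mask2):
    result, mask = [], []
    for d1, m1, d2, m2 in zip(data1, mask1, data2, mask2):
        # A valid False on either side decides the result on its own.
        known_false = (not d1 and not m1) or (not d2 and not m2)
        missing = (m1 or m2) and not known_false
        result.append(not known_false and not missing)
        mask.append(missing)
    return result, mask


a = to_masked([True] * 3 + [False] * 3 + [None] * 3)
b = to_masked([True, False, None] * 3)
or_result = from_masked(*kleene_or(*a, *b))
and_result = from_masked(*kleene_and(*a, *b))
print(or_result)
print(and_result)
```

The two printed lists can be compared against the ``expected`` arrays in ``test_kleene_or`` and ``test_kleene_and``.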
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -391,13 +391,101 @@ def test_scalar(self, data, all_logical_operators): | |
|
||
def test_array(self, data, all_logical_operators): | ||
op_name = all_logical_operators | ||
if "or" in op_name: | ||
pytest.skip("confusing") | ||
|
||
other = pd.array([True] * len(data), dtype="boolean") | ||
self._compare_other(data, op_name, other) | ||
other = np.array([True] * len(data)) | ||
self._compare_other(data, op_name, other) | ||
other = pd.Series([True] * len(data), dtype="boolean") | ||
self._compare_other(data, op_name, other) | ||
|
||
def test_kleene_or(self): | ||
A careful review of these new test cases would be greatly appreciated. I've tried to make them as clear as possible, while covering all the cases.
I went through the tests, very clear, added a few comments, for the rest looks good to me!
||
# A clear test of behavior. | ||
a = pd.array([True] * 3 + [False] * 3 + [None] * 3, dtype="boolean") | ||
b = pd.array([True, False, None] * 3, dtype="boolean") | ||
result = a | b | ||
expected = pd.array( | ||
[True, True, True, True, False, None, True, None, None], dtype="boolean" | ||
) | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = b | a | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
def test_kleene_or_scalar(self): | ||
a = pd.array([True, False, None], dtype="boolean") | ||
result = a | np.nan # TODO: pd.NA | ||
|
||
expected = pd.array([True, None, None], dtype="boolean") | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = np.nan | a # TODO: pd.NA | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"left,right,expected", | ||
[ | ||
([True, False, None], True, [True, True, True]), | ||
([True, False, None], False, [True, False, None]), | ||
([True, False, None], np.nan, [True, None, None]), | ||
# TODO: pd.NA | ||
], | ||
) | ||
def test_kleene_or_cases(self, left, right, expected): | ||
if isinstance(left, list): | ||
left = pd.array(left, dtype="boolean") | ||
if isinstance(right, list): | ||
right = pd.array(right, dtype="boolean") | ||
expected = pd.array(expected, dtype="boolean") | ||
result = left | right | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = right | left | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
def test_kleene_and(self): | ||
# A clear test of behavior. | ||
a = pd.array([True] * 3 + [False] * 3 + [None] * 3, dtype="boolean") | ||
b = pd.array([True, False, None] * 3, dtype="boolean") | ||
result = a & b | ||
expected = pd.array( | ||
[True, False, None, False, False, False, None, False, None], dtype="boolean" | ||
) | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = b & a | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
def test_kleene_and_scalar(self): | ||
a = pd.array([True, False, None], dtype="boolean") | ||
result = a & np.nan # TODO: pd.NA | ||
expected = pd.array([None, None, None], dtype="boolean") | ||
|
||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = np.nan & a # TODO: pd.NA | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
def test_kleene_xor(self): | ||
a = pd.array([True] * 3 + [False] * 3 + [None] * 3, dtype="boolean") | ||
b = pd.array([True, False, None] * 3, dtype="boolean") | ||
result = a ^ b | ||
expected = pd.array( | ||
[False, True, None, True, False, None, None, None, None], dtype="boolean" | ||
) | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = b ^ a | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
def test_kleene_xor_scalar(self): | ||
a = pd.array([True, False, None], dtype="boolean") | ||
result = a ^ np.nan # TODO: pd.NA | ||
expected = pd.array([None, None, None], dtype="boolean") | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
result = np.nan ^ a # TODO: pd.NA | ||
tm.assert_extension_array_equal(result, expected) | ||
|
||
|
||
class TestComparisonOps(BaseOpsUtil): | ||
def _compare_other(self, data, op_name, other): | ||
|
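The array tests above build ``a`` and ``b`` so that the nine element pairs cover every combination of ``True``, ``False`` and missing exactly once. A small stand-alone check of that layout, with ``None`` standing in for the missing value and a hypothetical ``kleene_xor`` helper, might look like this:

```python
from itertools import product

values = [True, False, None]  # None stands in for NA in this sketch

# Same layout as the tests: each value of `a` is paired with every value of `b`.
a = [True] * 3 + [False] * 3 + [None] * 3
b = [True, False, None] * 3
pairs = list(zip(a, b))

# The nine pairs are exactly the 3 x 3 grid of possible inputs, with no repeats.
covers_all_cases = (
    len(pairs) == 9 and set(pairs) == set(product(values, repeat=2))
)


def kleene_xor(left, right):
    """Exclusive-or is unknown as soon as either operand is unknown."""
    if left is None or right is None:
        return None
    return left != right


xor_expected = [kleene_xor(left, right) for left, right in pairs]
print(covers_all_cases)
print(xor_expected)
```

Unlike ``&`` and ``|``, exclusive-or has no value that decides the result on its own, which is why every pair involving a missing value stays missing in ``test_kleene_xor``.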