Commit 0fcdd8d

bpo-36502: Correct documentation of str.isspace() (GH-15019) (GH-15296)
The documented definition was much broader than the real one: there are tons of characters with general category "Other", and we don't (and shouldn't) treat most of them as whitespace.

Rewrite the definition to agree with the comment on _PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py, which is what generates that function and so ultimately governs.

Add suitable breadcrumbs so that a reader who wants to pin down exactly what this definition means (what's a "bidirectional class" of "B"?) can do so. The `unicodedata` module documentation is an appropriate central place for our references to Unicode's own copious documentation, so point there.

Also add to the isspace() test a thorough check that the implementation agrees with the intended definition.

(cherry picked from commit 8c1c426)

Co-authored-by: Greg Price <[email protected]>
1 parent 316acf2 commit 0fcdd8d
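
The commit message says the old wording was much broader than the real definition. One way to see this is to encode the old wording as a predicate and collect the code points it accepts but `str.isspace()` rejects. This is a sketch and not part of the commit: the helper names `old_doc_definition` and `overbroad_codepoints` are hypothetical, "Other" and "Separator" are assumed to mean the major general categories `C*` and `Z*`, and the results depend on the Unicode version bundled with the running interpreter (`unicodedata.unidata_version`).

```python
import sys
import unicodedata


def old_doc_definition(ch: str) -> bool:
    """Hypothetical encoding of the pre-fix wording: general category
    "Other" (C*) or "Separator" (Z*), or bidirectional class WS, B, or S."""
    return (unicodedata.category(ch)[0] in ('C', 'Z')
            or unicodedata.bidirectional(ch) in ('WS', 'B', 'S'))


def overbroad_codepoints() -> list[int]:
    """Code points the old wording calls whitespace but str.isspace() rejects."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if old_doc_definition(ch) and not ch.isspace():
            result.append(cp)
    return result


if __name__ == '__main__':
    extra = overbroad_codepoints()
    print(f'Unicode {unicodedata.unidata_version}: {len(extra)} code points')
    # A few familiar non-whitespace characters that the old text would include.
    for cp in (0x00, 0x7F, 0x200B, 0xFEFF):
        print(f'U+{cp:04X}', unicodedata.category(chr(cp)), cp in extra)
```

Most of the surplus comes from control, format, private-use, and unassigned code points (categories `Cc`, `Cf`, `Co`, `Cn`), none of which `str.isspace()` treats as whitespace.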

2 files changed: 19 additions, 4 deletions

Doc/library/stdtypes.rst

Lines changed: 7 additions & 3 deletions
@@ -1731,9 +1731,13 @@ expression support in the :mod:`re` module).
 .. method:: str.isspace()

    Return true if there are only whitespace characters in the string and there is
-   at least one character, false otherwise. Whitespace characters are those
-   characters defined in the Unicode character database as "Other" or "Separator"
-   and those with bidirectional property being one of "WS", "B", or "S".
+   at least one character, false otherwise.
+
+   A character is *whitespace* if in the Unicode character database
+   (see :mod:`unicodedata`), either its general category is ``Zs``
+   ("Separator, space"), or its bidirectional class is one of ``WS``,
+   ``B``, or ``S``.
+

 .. method:: str.istitle()
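
The new wording reduces to a two-part test on properties that the `unicodedata` module exposes directly through `category()` and `bidirectional()`. A minimal sketch of that rule follows; the helper names `is_documented_whitespace` and `documented_whitespace` are hypothetical, and the listing reflects whichever Unicode database the running interpreter bundles.

```python
import sys
import unicodedata


def is_documented_whitespace(ch: str) -> bool:
    """The documented rule: general category Zs, or bidi class WS, B, or S."""
    return (unicodedata.category(ch) == 'Zs'
            or unicodedata.bidirectional(ch) in ('WS', 'B', 'S'))


def documented_whitespace() -> list[tuple[int, str, str]]:
    """(code point, general category, bidi class) for every match."""
    rows = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if is_documented_whitespace(ch):
            rows.append((cp, unicodedata.category(ch),
                         unicodedata.bidirectional(ch)))
    return rows


if __name__ == '__main__':
    for cp, category, bidi in documented_whitespace():
        # Control characters have no Unicode name, so fall back to a label.
        name = unicodedata.name(chr(cp), '<control>')
        print(f'U+{cp:04X}  {category}  {bidi:<2}  {name}')
```

Control characters such as TAB (category `Cc`, bidi class `S`) and LINE FEED (`Cc`, `B`) qualify only through the bidirectional clause, while NO-BREAK SPACE (`Zs`, bidi class `CS`) qualifies only through the category clause, which is why the rule needs both halves.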

Lib/test/test_unicode.py

Lines changed: 12 additions & 1 deletion
@@ -11,6 +11,7 @@
 import operator
 import struct
 import sys
+import unicodedata
 import unittest
 import warnings
 from test import support, string_tests
@@ -615,11 +616,21 @@ def test_isspace(self):
         self.checkequalnofix(True, '\u2000', 'isspace')
         self.checkequalnofix(True, '\u200a', 'isspace')
         self.checkequalnofix(False, '\u2014', 'isspace')
-        # apparently there are no non-BMP spaces chars in Unicode 6
+        # There are no non-BMP whitespace chars as of Unicode 12.
         for ch in ['\U00010401', '\U00010427', '\U00010429', '\U0001044E',
                    '\U0001F40D', '\U0001F46F']:
             self.assertFalse(ch.isspace(), '{!a} is not space.'.format(ch))

+    @support.requires_resource('cpu')
+    def test_isspace_invariant(self):
+        for codepoint in range(sys.maxunicode + 1):
+            char = chr(codepoint)
+            bidirectional = unicodedata.bidirectional(char)
+            category = unicodedata.category(char)
+            self.assertEqual(char.isspace(),
+                             (bidirectional in ('WS', 'B', 'S')
+                              or category == 'Zs'))
+
     def test_isalnum(self):
         super().test_isalnum()
         for ch in ['\U00010401', '\U00010427', '\U00010429', '\U0001044E',
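
The updated comment in `test_isspace` states that there are no whitespace characters outside the Basic Multilingual Plane. That statement is tied to a Unicode version, so here is a small sketch (the helper name `non_bmp_whitespace` is hypothetical) that re-checks it against whatever version the running interpreter ships with.

```python
import sys
import unicodedata


def non_bmp_whitespace() -> list[int]:
    """Code points above U+FFFF for which str.isspace() returns True."""
    return [cp for cp in range(0x10000, sys.maxunicode + 1)
            if chr(cp).isspace()]


if __name__ == '__main__':
    found = non_bmp_whitespace()
    print(f'Unicode {unicodedata.unidata_version}:',
          [f'U+{cp:X}' for cp in found] or 'no non-BMP whitespace')
```

An empty result agrees with the comment; a non-empty one would mean the comment needs updating for a newer Unicode release.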
