Commit ec014a1

sir-sigurd authored and miss-islington committed
bpo-34636: Use fast path for more chars in SRE category macros. (GH-9170)
When handling \s, \d, or \w (and their inverse) escapes in bytes regexes, this is a small but measurable performance improvement.

https://bugs.python.org/issue34636
1 parent d13e59c commit ec014a1

2 files changed: +5 -3 lines changed
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Speed up re scanning of many non-matching characters for \s \w and \d within
+bytes objects. (microoptimization)

Modules/_sre.c

Lines changed: 3 additions & 3 deletions
@@ -87,13 +87,13 @@ static const char copyright[] =
 /* search engine state */
 
 #define SRE_IS_DIGIT(ch)\
-    ((ch) < 128 && Py_ISDIGIT(ch))
+    ((ch) <= '9' && Py_ISDIGIT(ch))
 #define SRE_IS_SPACE(ch)\
-    ((ch) < 128 && Py_ISSPACE(ch))
+    ((ch) <= ' ' && Py_ISSPACE(ch))
 #define SRE_IS_LINEBREAK(ch)\
     ((ch) == '\n')
 #define SRE_IS_WORD(ch)\
-    ((ch) < 128 && (Py_ISALNUM(ch) || (ch) == '_'))
+    ((ch) <= 'z' && (Py_ISALNUM(ch) || (ch) == '_'))
 
 static unsigned int sre_lower_ascii(unsigned int ch)
 {
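
The tighter bounds in the macros above should not change which characters match. They only let the range check reject more non-matching characters before the character-class test runs. The following is a minimal sketch that checks this equivalence outside of CPython. The helpers `is_digit`, `is_space`, and `is_alnum` are hypothetical stand-ins for `Py_ISDIGIT`, `Py_ISSPACE`, and `Py_ISALNUM`. They assume ASCII semantics and mask their argument to one byte, on the assumption that the real macros behave the same way. The `old_*`, `new_*`, and `count_mismatches` names are also illustrative and not part of `_sre.c`.

```c
#include <stdio.h>

/* Hypothetical stand-ins for Py_ISDIGIT / Py_ISSPACE / Py_ISALNUM.
   Assumption: like the real macros, they only look at the low byte. */
static int is_digit(unsigned int ch)
{
    ch &= 0xff;
    return ch >= '0' && ch <= '9';
}

static int is_space(unsigned int ch)
{
    ch &= 0xff;
    return ch == ' ' || (ch >= '\t' && ch <= '\r');
}

static int is_alnum(unsigned int ch)
{
    unsigned int b = ch & 0xff;
    return is_digit(b) || (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z');
}

/* Range checks as written before the commit. */
static int old_is_digit(unsigned int ch) { return ch < 128 && is_digit(ch); }
static int old_is_space(unsigned int ch) { return ch < 128 && is_space(ch); }
static int old_is_word(unsigned int ch)
{
    return ch < 128 && (is_alnum(ch) || ch == '_');
}

/* Range checks as written after the commit: the bound is the largest
   ASCII character that can belong to each class. */
static int new_is_digit(unsigned int ch) { return ch <= '9' && is_digit(ch); }
static int new_is_space(unsigned int ch) { return ch <= ' ' && is_space(ch); }
static int new_is_word(unsigned int ch)
{
    return ch <= 'z' && (is_alnum(ch) || ch == '_');
}

/* Count code points below `limit` where old and new checks disagree. */
static int count_mismatches(unsigned int limit)
{
    int mismatches = 0;
    for (unsigned int ch = 0; ch < limit; ch++) {
        if (old_is_digit(ch) != new_is_digit(ch)) mismatches++;
        if (old_is_space(ch) != new_is_space(ch)) mismatches++;
        if (old_is_word(ch) != new_is_word(ch)) mismatches++;
    }
    return mismatches;
}
```

Calling `count_mismatches(0x110000)` compares the two variants over every Unicode code point. For byte input, the digit check now short-circuits on values 58..255 rather than only 128..255, which is where the saving on long runs of non-matching characters would come from.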
