-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
bpo-35859: re module, fix three bugs about capturing groups #11756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This fix is not correct, I'll update this PR when I can use my computer. |
@serhiy-storchaka |
Lib/test/test_re.py
Outdated
def test_bug_35859(self): | ||
# Capture behavior depends on the order of an alternation | ||
s = 'ab' | ||
self.assertEqual(re.search(r'(ab|a)*?b', s).groups(), ('a',)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why search()
is used instead of match()
or fullmatch()
?
ab|a
is equivalent to ab?
. Is there a reason why use the former? If there is a difference, it is better to use .b|a
instead, because ab|a
can be transformed to ab?
by the RE compiler in future versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, this PR doesn't fix the problem.
>>> re.match(r'(ab?)*?b', 'ab').groups()
('',)
The correct output should be:
>>> regex.match(r'(ab?)*?b', 'ab').groups()
('a',)
I will recheck the patch tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
search()
is used instead ofmatch()
orfullmatch()
?
ab|a
is equivalent toab?
. Is there a reason why use the former?
"Because it can be". (ab|a)*?b
is a reduced test case for a real regex I found during my research.
Lib/test/test_re.py
Outdated
s = 'ab' | ||
self.assertEqual(re.search(r'(ab|a)*?b', s).groups(), ('a',)) | ||
self.assertEqual(re.search(r'(ab|a)+?b', s).groups(), ('a',)) | ||
self.assertEqual(re.search(r'(ab|a){0,}?b', s).groups(), ('a',)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X{0,}?
is equivalent to X*?
, so this test is redundant.
Show the wrong behaviors before this fix.
MARK_PUSH(lastmark) macro didn't protect MARK-0 if it was the only available mark.
before this fix, the test-case returns ('c', 'c', 'c') No need to save state->repeat in ctx->u.rep. SRE_OP_BRANCH saves state->repeat in ctx->u.rep, this is because after JUMP_BRANCH, the state->repeat may be NULL. While SRE_OP_ASSERT_NOT doesn't have this problem.
GH-12134 is a backport of this pull request to the 2.7 branch. |
See PR 12427 |
Fix wrong capturing groups in rare cases, the span of capturing group may be lost when backtracking.
These bugs exist since Python 2.
MARK_PUSH(lastmark)
didn't protectMARK 0
if it was the only available markJUMP_MIN_UNTIL_3
needsLASTMARK_SAVE()
andMARK_PUSH()
JUMP_ASSERT_NOT
needsLASTMARK_SAVE()
andMARK_PUSH()
Please read review guide in issue35859:
https://bugs.python.org/issue35859