Skip to content

Commit ea4009a

Browse files
committed
syntax: fix flag scoping issue
This fixes a rather nasty bug where flags set inside a group were being applies to expressions outside the group. e.g., In the simplest case, `((?i)a)b)` would match `aB`, even though the case insensitive flag _shouldn't_ be applied to `b`. The issue here was that we were actually going out of our way to reset the flags when a group is popped only _some_ of the time. Namely, when flags were set via `(?i:a)b` syntax. Instead, flags should be reset to their previous state _every_ time a group is popped in the translator. The fix here is pretty simple. When we open a group, if the group itself does not have any flags, then we simply record the current state of the flags instead of trying to replace the current flags. Then, when we pop the group, we are guaranteed to obtain the old flags, at which point, we reset them. Fixes #640
1 parent 2a8ddd0 commit ea4009a

File tree

2 files changed

+38
-10
lines changed

2 files changed

+38
-10
lines changed

regex-syntax/src/hir/translate.rs

Lines changed: 27 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -159,18 +159,19 @@ enum HirFrame {
159159
/// indicated by parentheses (including non-capturing groups). It is popped
160160
/// upon leaving a group.
161161
Group {
162-
/// The old active flags, if any, when this group was opened.
162+
/// The old active flags when this group was opened.
163163
///
164164
/// If this group sets flags, then the new active flags are set to the
165165
/// result of merging the old flags with the flags introduced by this
166-
/// group.
166+
/// group. If the group doesn't set any flags, then this is simply
167+
/// equivalent to whatever flags were set when the group was opened.
167168
///
168169
/// When this group is popped, the active flags should be restored to
169170
/// the flags set here.
170171
///
171172
/// The "active" flags correspond to whatever flags are set in the
172173
/// Translator.
173-
old_flags: Option<Flags>,
174+
old_flags: Flags,
174175
},
175176
/// This is pushed whenever a concatenation is observed. After visiting
176177
/// every sub-expression in the concatenation, the translator's stack is
@@ -219,8 +220,8 @@ impl HirFrame {
219220

220221
/// Assert that the current stack frame is a group indicator and return
221222
/// its corresponding flags (the flags that were active at the time the
222-
/// group was entered) if they exist.
223-
fn unwrap_group(self) -> Option<Flags> {
223+
/// group was entered).
224+
fn unwrap_group(self) -> Flags {
224225
match self {
225226
HirFrame::Group { old_flags } => old_flags,
226227
_ => {
@@ -252,8 +253,11 @@ impl<'t, 'p> Visitor for TranslatorI<'t, 'p> {
252253
}
253254
}
254255
Ast::Group(ref x) => {
255-
let old_flags = x.flags().map(|ast| self.set_flags(ast));
256-
self.push(HirFrame::Group { old_flags: old_flags });
256+
let old_flags = x
257+
.flags()
258+
.map(|ast| self.set_flags(ast))
259+
.unwrap_or_else(|| self.flags());
260+
self.push(HirFrame::Group { old_flags });
257261
}
258262
Ast::Concat(ref x) if x.asts.is_empty() => {}
259263
Ast::Concat(_) => {
@@ -350,9 +354,8 @@ impl<'t, 'p> Visitor for TranslatorI<'t, 'p> {
350354
}
351355
Ast::Group(ref x) => {
352356
let expr = self.pop().unwrap().unwrap_expr();
353-
if let Some(flags) = self.pop().unwrap().unwrap_group() {
354-
self.trans().flags.set(flags);
355-
}
357+
let old_flags = self.pop().unwrap().unwrap_group();
358+
self.trans().flags.set(old_flags);
356359
self.push(HirFrame::Expr(self.hir_group(x, expr)));
357360
}
358361
Ast::Concat(_) => {
@@ -1641,6 +1644,20 @@ mod tests {
16411644
hir_lit("β"),
16421645
])
16431646
);
1647+
assert_eq!(
1648+
t("(?:(?i-u)a)b"),
1649+
hir_cat(vec![
1650+
hir_group_nocap(hir_bclass(&[(b'A', b'A'), (b'a', b'a')])),
1651+
hir_lit("b"),
1652+
])
1653+
);
1654+
assert_eq!(
1655+
t("((?i-u)a)b"),
1656+
hir_cat(vec![
1657+
hir_group(1, hir_bclass(&[(b'A', b'A'), (b'a', b'a')])),
1658+
hir_lit("b"),
1659+
])
1660+
);
16441661
#[cfg(feature = "unicode-case")]
16451662
assert_eq!(
16461663
t("(?i)(?-i:a)a"),

tests/regression.rs

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -199,3 +199,14 @@ fn regression_nfa_stops1() {
199199
let re = ::regex::bytes::Regex::new(r"\bs(?:[ab])").unwrap();
200200
assert_eq!(0, re.find_iter(b"s\xE4").count());
201201
}
202+
203+
// See: https://github.com/rust-lang/regex/issues/640
204+
#[cfg(feature = "unicode-case")]
205+
matiter!(
206+
flags_are_unset,
207+
r"((?i)foo)|Bar",
208+
"foo Foo bar Bar",
209+
(0, 3),
210+
(4, 7),
211+
(12, 15)
212+
);

0 commit comments

Comments
 (0)