This is an official plugin for Regex+ (it can also be used standalone) that adds support for recursive matching up to a specified max depth N, where N can be between 2 and 100. Generated regexes are native JavaScript RegExp
instances.
Note
Regex flavors vary on whether they offer infinite or fixed-depth recursion. For example, recursion in Oniguruma uses a default depth limit of 20.
Recursive matching is added to a regex via the following syntax. The recursion depth limit is provided in place of N.
(?R=N)
β Recursively match the entire regex at this position.\g<name&R=N>
or\g<number&R=N>
β Recursively match the contents of the group referenced by name or number at this position. The\g<β¦>
subroutine must be within the referenced group.
Details:
- Multiple uses of recursion within the same pattern are supported if they're non-overlapping.
- Named captures and backreferences are supported within recursion and are independent per depth level. A match result's
groups.name
property holds the value captured by groupname
at the top level of the recursion stack. Subpatternsgroups.name_$2
, etc. are available for each level of nested subpattern matches.
npm install regex regex-recursion
import {regex} from 'regex';
import {recursion} from 'regex-recursion';
const re = regex({plugins: [recursion]})`β¦`;
Using CommonJS require
const {regex} = require('regex');
const {recursion} = require('regex-recursion-cjs');
const re = regex({plugins: [recursion]})`β¦`;
Note: regex-recursion-cjs is a third-party CommonJS wrapper for this library. It might not always be up to date with the latest version.
Using a global name in browsers
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/regex.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/regex-recursion.min.js"></script>
<script>
const {regex} = Regex;
const {recursion} = Regex.plugins;
const re = regex({plugins: [recursion]})`β¦`;
</script>
// Matches sequences of up to 20 'a' chars followed by the same number of 'b'
const re = regex({plugins: [recursion]})`a(?R=20)?b`;
re.exec('test aaaaaabbb')[0];
// β 'aaabbb'
Use \g<name&R=N>
to recursively match just the specified group.
const re = regex({plugins: [recursion]})`
^ (?<r> a \g<r&R=20>? b) $
`;
re.test('aaabbb'); // β true
re.test('aaabb'); // β false
// Matches all balanced parentheses up to depth 20
const parens = regex({flags: 'g', plugins: [recursion]})`
\( ([^\(\)] | (?R=20))* \)
`;
'test ) (balanced ((parens))) () ((a)) ( (b)'.match(parens);
/* β [
'(balanced ((parens)))',
'()',
'((a))',
'(b)'
] */
Following is an alternative that matches the same strings, but adds a nested quantifier. It then uses an atomic group to prevent the nested quantifier from creating the potential for catastrophic backtracking. Since the example above doesn't need a nested quantifier, this isn't an improvement but merely an alternative that shows how to deal with the general problem of nested quantifiers that create multiple ways to divide matches of the same strings.
// With an atomic group
const parens = regex({flags: 'g', plugins: [recursion]})`
\( ((?> [^\(\)]+) | (?R=20))* \)
`;
// Same thing, but with a possessive quantifier
const parens = regex({flags: 'g', plugins: [recursion]})`
\( ([^\(\)]++ | (?R=20))* \)
`;
The first example above matches sequences of non-parentheses in one step with the nested +
quantifier, and avoids backtracking into these sequences by wrapping it with an atomic group (?>β¦)
. Given that what the nested quantifier +
matches overlaps with what the outer group can match with its *
quantifier, the atomic group is important here. It avoids exponential backtracking when matching long strings with unbalanced parentheses.
In cases where you're repeating a single token within an atomic group, possessive quantifiers (in this case, ++
) provide syntax sugar for the same behavior.
Atomic groups and possessive quantifiers are provided by the base Regex+ library.
const palindromes = regex({flags: 'gi', plugins: [recursion]})`
(?<char> \w)
# Recurse, or match a lone unbalanced char in the middle
((?R=15) | \w?)
\k<char>
`;
'Racecar, ABBA, and redivided'.match(palindromes);
// β ['Racecar', 'ABBA', 'edivide']
Palindromes are sequences that read the same backwards as forwards. In the example above, the max length of matched palindromes is 31. That's because it sets the max recursion depth to 15 with (?R=15)
. So, depth 15 Γ 2 chars (left + right) for each depth level + 1 optional unbalanced char in the middle = 31. To match longer palindromes, the max recursion depth can be increased to a max of 100, which would enable matching palindromes up to 201 characters long.
const palindromeWords = regex({flags: 'gi', plugins: [recursion]})`
\b
(?<palindrome>
(?<char> \w)
(\g<palindrome&R=15> | \w?)
\k<char>
)
\b
`;
'Racecar, ABBA, and redivided'.match(palindromeWords);
// β ['Racecar', 'ABBA']
Following is an example of using this library standalone, without Regex+.
import {recursion} from 'regex-recursion';
// Create a pattern that matches balanced parentheses
const pattern = String.raw`\(([^\(\)]|(?R=20))*\)`;
const processed = recursion(pattern);
// The processed pattern can be used as a standard RegExp
const re = new RegExp(processed.pattern);
re.exec('foo (bar (baz) blah) end')[0];
// β '(bar (baz) blah)'
All ES2025 regex syntax is supported, but because the generated pattern is used without Regex+, you can't include Regex+'s extended syntax like insignificant whitespace, atomic groups, possessive quantifiers, and non-recursive subroutines.
Created by Steven Levithan.
If you want to support this project, I'd love your help by contributing improvements, sharing it with others, or sponsoring ongoing development.
Β© 2024βpresent. MIT License.