Skip to content

πŸͺ† Adds recursive matching to native JavaScript regexes

License

Notifications You must be signed in to change notification settings

slevithan/regex-recursion

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

85 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

regex-recursion πŸͺ†

npm version npm downloads bundle

This is an official plugin for Regex+ (it can also be used standalone) that adds support for recursive matching up to a specified max depth N, where N can be between 2 and 100. Generated regexes are native JavaScript RegExp instances.

Note

Regex flavors vary on whether they offer infinite or fixed-depth recursion. For example, recursion in Oniguruma uses a default depth limit of 20.

Recursive matching is added to a regex via the following syntax. The recursion depth limit is provided in place of N.

  • (?R=N) β€” Recursively match the entire regex at this position.
  • \g<name&R=N> or \g<number&R=N> β€” Recursively match the contents of the group referenced by name or number at this position. The \g<…> subroutine must be within the referenced group.

Details:

  • Multiple uses of recursion within the same pattern are supported if they're non-overlapping.
  • Named captures and backreferences are supported within recursion and are independent per depth level. A match result's groups.name property holds the value captured by group name at the top level of the recursion stack. Subpatterns groups.name_$2, etc. are available for each level of nested subpattern matches.

πŸ“œ Contents

πŸ•ΉοΈ Install and use

npm install regex regex-recursion
import {regex} from 'regex';
import {recursion} from 'regex-recursion';

const re = regex({plugins: [recursion]})`…`;
Using CommonJS require
const {regex} = require('regex');
const {recursion} = require('regex-recursion-cjs');

const re = regex({plugins: [recursion]})`…`;

Note: regex-recursion-cjs is a third-party CommonJS wrapper for this library. It might not always be up to date with the latest version.

Using a global name in browsers
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/regex.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/regex-recursion.min.js"></script>
<script>
  const {regex} = Regex;
  const {recursion} = Regex.plugins;

  const re = regex({plugins: [recursion]})`…`;
</script>

πŸͺ§ Examples

Match an equal number of two different subpatterns

Anywhere within a string

// Matches sequences of up to 20 'a' chars followed by the same number of 'b'
const re = regex({plugins: [recursion]})`a(?R=20)?b`;
re.exec('test aaaaaabbb')[0];
// β†’ 'aaabbb'

As the entire string

Use \g<name&R=N> to recursively match just the specified group.

const re = regex({plugins: [recursion]})`
  ^ (?<r> a \g<r&R=20>? b) $
`;
re.test('aaabbb'); // β†’ true
re.test('aaabb'); // β†’ false

Match balanced parentheses

// Matches all balanced parentheses up to depth 20
const parens = regex({flags: 'g', plugins: [recursion]})`
  \( ([^\(\)] | (?R=20))* \)
`;

'test ) (balanced ((parens))) () ((a)) ( (b)'.match(parens);
/* β†’ [
  '(balanced ((parens)))',
  '()',
  '((a))',
  '(b)'
] */

Following is an alternative that matches the same strings, but adds a nested quantifier. It then uses an atomic group to prevent the nested quantifier from creating the potential for catastrophic backtracking. Since the example above doesn't need a nested quantifier, this isn't an improvement but merely an alternative that shows how to deal with the general problem of nested quantifiers that create multiple ways to divide matches of the same strings.

// With an atomic group
const parens = regex({flags: 'g', plugins: [recursion]})`
  \( ((?> [^\(\)]+) | (?R=20))* \)
`;

// Same thing, but with a possessive quantifier
const parens = regex({flags: 'g', plugins: [recursion]})`
  \( ([^\(\)]++ | (?R=20))* \)
`;

The first example above matches sequences of non-parentheses in one step with the nested + quantifier, and avoids backtracking into these sequences by wrapping it with an atomic group (?>…). Given that what the nested quantifier + matches overlaps with what the outer group can match with its * quantifier, the atomic group is important here. It avoids exponential backtracking when matching long strings with unbalanced parentheses.

In cases where you're repeating a single token within an atomic group, possessive quantifiers (in this case, ++) provide syntax sugar for the same behavior.

Atomic groups and possessive quantifiers are provided by the base Regex+ library.

Match palindromes

Match palindromes anywhere within a string

const palindromes = regex({flags: 'gi', plugins: [recursion]})`
  (?<char> \w)
  # Recurse, or match a lone unbalanced char in the middle
  ((?R=15) | \w?)
  \k<char>
`;

'Racecar, ABBA, and redivided'.match(palindromes);
// β†’ ['Racecar', 'ABBA', 'edivide']

Palindromes are sequences that read the same backwards as forwards. In the example above, the max length of matched palindromes is 31. That's because it sets the max recursion depth to 15 with (?R=15). So, depth 15 Γ— 2 chars (left + right) for each depth level + 1 optional unbalanced char in the middle = 31. To match longer palindromes, the max recursion depth can be increased to a max of 100, which would enable matching palindromes up to 201 characters long.

Match palindromes as complete words

const palindromeWords = regex({flags: 'gi', plugins: [recursion]})`
  \b
  (?<palindrome>
    (?<char> \w)
    (\g<palindrome&R=15> | \w?)
    \k<char>
  )
  \b
`;

'Racecar, ABBA, and redivided'.match(palindromeWords);
// β†’ ['Racecar', 'ABBA']

⛓️‍πŸ’₯ Standalone use

Following is an example of using this library standalone, without Regex+.

import {recursion} from 'regex-recursion';

// Create a pattern that matches balanced parentheses
const pattern = String.raw`\(([^\(\)]|(?R=20))*\)`;
const processed = recursion(pattern);

// The processed pattern can be used as a standard RegExp
const re = new RegExp(processed.pattern);
re.exec('foo (bar (baz) blah) end')[0];
// β†’ '(bar (baz) blah)'

All ES2025 regex syntax is supported, but because the generated pattern is used without Regex+, you can't include Regex+'s extended syntax like insignificant whitespace, atomic groups, possessive quantifiers, and non-recursive subroutines.

🏷️ About

Created by Steven Levithan.

Sponsors and backers

Past sponsors

If you want to support this project, I'd love your help by contributing improvements, sharing it with others, or sponsoring ongoing development.

Β© 2024–present. MIT License.

About

πŸͺ† Adds recursive matching to native JavaScript regexes

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project