Commit 0a0e06f

[TableGen] Fix prefix detection with anchor (NFC) (llvm#71379)
instregex uses an optimization where the constant prefix of the regex is extracted so that a binary search can be performed first. However, this optimization currently fails to apply in most cases, because most instregex uses have an explicit ^ anchor, which is counted as a metacharacter and disables the optimization.

Make sure the anchor is skipped when determining the prefix. Also fix an implementation bug this exposes, where we pick a prefix that is too long if the first metacharacter is a quantifier.

This cuts the time needed to generate files like X86GenInstrInfo.inc by half.
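To illustrate why a constant prefix helps, here is a minimal standalone sketch of a prefix lookup by binary search. It assumes the instruction names are available as a lexicographically sorted list; `prefixRange` and `SortedNames` are hypothetical names used only for this illustration and are not taken from TableGen.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not TableGen code): returns the half-open index range
// [first, last) of entries in SortedNames that start with Prefix.
// SortedNames must be sorted lexicographically.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string> &SortedNames,
            const std::string &Prefix) {
  // First name that is not less than Prefix. Every name starting with Prefix
  // is >= Prefix, so the candidates begin here.
  auto Lo = std::lower_bound(SortedNames.begin(), SortedNames.end(), Prefix);
  // Names sharing the prefix are contiguous in a sorted list, so a second
  // binary search finds where they end.
  auto Hi = std::partition_point(
      Lo, SortedNames.end(), [&Prefix](const std::string &Name) {
        return Name.compare(0, Prefix.size(), Prefix) == 0;
      });
  return {static_cast<std::size_t>(Lo - SortedNames.begin()),
          static_cast<std::size_t>(Hi - SortedNames.begin())};
}
```

Only the names inside the returned range need to be tested against the rest of the regex. With an empty prefix the range covers the whole list, which is why a pattern whose prefix cannot be extracted falls back to checking every name.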
1 parent d64d5ea commit 0a0e06f

1 file changed, +19 -1 lines changed

llvm/utils/TableGen/CodeGenSchedule.cpp

Lines changed: 19 additions & 1 deletion
@@ -91,10 +91,25 @@ struct InstRegexOp : public SetTheory::Operator {
         PrintFatalError(Loc, "instregex requires pattern string: " +
                                  Expr->getAsString());
       StringRef Original = SI->getValue();
+      // Drop an explicit ^ anchor to not interfere with prefix search.
+      bool HadAnchor = Original.consume_front("^");

       // Extract a prefix that we can binary search on.
       static const char RegexMetachars[] = "()^$|*+?.[]\\{}";
       auto FirstMeta = Original.find_first_of(RegexMetachars);
+      if (FirstMeta != StringRef::npos && FirstMeta > 0) {
+        // If we have a regex like ABC* we can only use AB as the prefix, as
+        // the * acts on C.
+        switch (Original[FirstMeta]) {
+        case '+':
+        case '*':
+        case '?':
+          --FirstMeta;
+          break;
+        default:
+          break;
+        }
+      }

       // Look for top-level | or ?. We cannot optimize them to binary search.
       if (removeParens(Original).find_first_of("|?") != std::string::npos)
@@ -106,7 +121,10 @@ struct InstRegexOp : public SetTheory::Operator {
       if (!PatStr.empty()) {
         // For the rest use a python-style prefix match.
         std::string pat = std::string(PatStr);
-        if (pat[0] != '^') {
+        // Add ^ anchor. If we had one originally, don't need the group.
+        if (HadAnchor) {
+          pat.insert(0, "^");
+        } else {
           pat.insert(0, "^(");
           pat.insert(pat.end(), ')');
         }
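
The logic in this diff can be tried outside TableGen with a small standalone sketch. It is an approximation under stated assumptions, not the actual implementation: it uses `std::string` and `std::regex` in place of LLVM's `StringRef` and `Regex`, it omits the top-level `|` / `?` check, and it assumes the pattern is split at `FirstMeta` into a literal prefix and a remaining regex (that split is not visible in the hunks above). The names `SplitPattern`, `splitPattern`, `anchorRest`, and `matchesName` are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: literal prefix, remaining regex, and whether the
// original pattern started with an explicit ^ anchor.
struct SplitPattern {
  std::string Prefix;
  std::string Rest;
  bool HadAnchor;
};

SplitPattern splitPattern(std::string Pattern) {
  // Drop an explicit ^ anchor so it is not counted as a metacharacter.
  bool HadAnchor = !Pattern.empty() && Pattern[0] == '^';
  if (HadAnchor)
    Pattern.erase(0, 1);

  static const char RegexMetachars[] = "()^$|*+?.[]\\{}";
  std::string::size_type FirstMeta = Pattern.find_first_of(RegexMetachars);
  if (FirstMeta == std::string::npos)
    return {Pattern, "", HadAnchor}; // purely literal pattern

  // For a regex like ABC* only AB is a constant prefix: the quantifier
  // acts on C, so C must stay in the regex part.
  if (FirstMeta > 0) {
    switch (Pattern[FirstMeta]) {
    case '+':
    case '*':
    case '?':
      --FirstMeta;
      break;
    default:
      break;
    }
  }
  return {Pattern.substr(0, FirstMeta), Pattern.substr(FirstMeta), HadAnchor};
}

// Re-anchor the remaining regex. A group is only needed when the original
// pattern had no explicit anchor.
std::string anchorRest(const SplitPattern &S) {
  return S.HadAnchor ? "^" + S.Rest : "^(" + S.Rest + ")";
}

// Prefix-style match: Name must start with the literal prefix, and the
// remaining regex must match at the start of what follows.
bool matchesName(const std::string &Name, const SplitPattern &S) {
  if (Name.compare(0, S.Prefix.size(), S.Prefix) != 0)
    return false;
  if (S.Rest.empty())
    return true;
  const std::string Tail = Name.substr(S.Prefix.size());
  return std::regex_search(Tail, std::regex(anchorRest(S)));
}
```

For example, `splitPattern("^MOV(8|16)rr")` yields the prefix `MOV` and the remaining regex `(8|16)rr`, while `splitPattern("ABC*")` yields the prefix `AB` and the remaining regex `C*`. Without the quantifier adjustment the second case would use `ABC` as the prefix and wrongly reject the name `AB`.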
