[TableGen] Add !instances operator to get defined records #129680

Merged
merged 5 commits into from
Mar 28, 2025

Conversation

wangpc-pp
Contributor

@wangpc-pp wangpc-pp commented Mar 4, 2025

The format is: !instances<T>([regex]).

This operator produces a list of records whose type is T. If
regex is provided, only records whose name matches the regular
expression regex will be included. The format of regex is ERE
(Extended POSIX Regular Expressions).
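
The semantics described above can be sketched outside TableGen. The following is only an illustration, not the PR's implementation: `Rec`, `instances`, and `sampleRecords` are hypothetical names, and `std::regex` in extended mode stands in for `llvm::Regex`. The sample data mirrors the classes `A` and `B` from the PR's test file.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a TableGen record: a name plus the names of
// every class the record derives from (directly or indirectly).
struct Rec {
  std::string Name;
  std::vector<std::string> Classes;
};

// True if R derives from the class named Class.
static bool derivesFrom(const Rec &R, const std::string &Class) {
  return std::find(R.Classes.begin(), R.Classes.end(), Class) !=
         R.Classes.end();
}

// Form without a regex: names of all records deriving from Class.
std::vector<std::string> instances(const std::vector<Rec> &Records,
                                   const std::string &Class) {
  std::vector<std::string> Result;
  for (const Rec &R : Records)
    if (derivesFrom(R, Class))
      Result.push_back(R.Name);
  return Result;
}

// Form with a regex: additionally keep only names in which the ERE finds a
// match. Constructing std::regex throws std::regex_error for an invalid
// pattern, which loosely corresponds to the "invalid regex" fatal error.
std::vector<std::string> instances(const std::vector<Rec> &Records,
                                   const std::string &Class,
                                   const std::string &Pattern) {
  const std::regex Matcher(Pattern, std::regex::extended);
  std::vector<std::string> Result;
  for (const Rec &R : Records)
    if (derivesFrom(R, Class) && std::regex_search(R.Name, Matcher))
      Result.push_back(R.Name);
  return Result;
}

// Records shaped like the PR's test: a0/a1 derive from A; b0/b1 from B and A.
std::vector<Rec> sampleRecords() {
  return {{"a0", {"A"}}, {"a1", {"A"}}, {"b0", {"B", "A"}}, {"b1", {"B", "A"}}};
}
```

With this sample data, asking for class `A` with the pattern `.*0` keeps `a0` and `b0`, which is the same selection the test file expects from the operator.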

@llvmbot
Member

llvmbot commented Mar 4, 2025

@llvm/pr-subscribers-tablegen

Author: Pengcheng Wang (wangpc-pp)

Changes

This operator is like !select<T>(regex), which produces a list of
records whose type is T and record names match regular expression
regex.


Full diff: https://github.com/llvm/llvm-project/pull/129680.diff

7 Files Affected:

  • (modified) llvm/docs/TableGen/ProgRef.rst (+8-4)
  • (modified) llvm/include/llvm/TableGen/Record.h (+33)
  • (modified) llvm/lib/TableGen/Record.cpp (+62)
  • (modified) llvm/lib/TableGen/TGLexer.cpp (+1)
  • (modified) llvm/lib/TableGen/TGLexer.h (+2-1)
  • (modified) llvm/lib/TableGen/TGParser.cpp (+38)
  • (added) llvm/test/TableGen/select.td (+60)
diff --git a/llvm/docs/TableGen/ProgRef.rst b/llvm/docs/TableGen/ProgRef.rst
index edb97109c9289..563832e324539 100644
--- a/llvm/docs/TableGen/ProgRef.rst
+++ b/llvm/docs/TableGen/ProgRef.rst
@@ -226,10 +226,10 @@ TableGen provides "bang operators" that have a wide variety of uses:
                : !initialized !interleave  !isa         !le          !listconcat
                : !listflatten !listremove  !listsplat   !logtwo      !lt
                : !mul         !ne          !not         !or          !range
-               : !repr        !setdagarg   !setdagname  !setdagop    !shl
-               : !size        !sra         !srl         !strconcat   !sub
-               : !subst       !substr      !tail        !tolower     !toupper
-               : !xor
+               : !repr        !select      !setdagarg   !setdagname  !setdagop
+               : !shl         !size        !sra         !srl         !strconcat
+               : !sub         !subst       !substr      !tail        !tolower
+               : !toupper     !xor
 
 The ``!cond`` operator has a slightly different
 syntax compared to other bang operators, so it is defined separately:
@@ -1920,6 +1920,10 @@ and non-0 as true.
     Represents *value* as a string. String format for the value is not
     guaranteed to be stable. Intended for debugging purposes only.
 
+``!select<``\ *type*\ ``>(``\ *regex*\ ``)``
+    This operator produces a list of records whose type is *type* and record
+    names match regular expression *regex*.
+
 ``!setdagarg(``\ *dag*\ ``,``\ *key*\ ``,``\ *arg*\ ``)``
     This operator produces a DAG node with the same operator and arguments as
     *dag*, but replacing the value of the argument specified by the *key* with
diff --git a/llvm/include/llvm/TableGen/Record.h b/llvm/include/llvm/TableGen/Record.h
index 334007524c954..22c5c7032864b 100644
--- a/llvm/include/llvm/TableGen/Record.h
+++ b/llvm/include/llvm/TableGen/Record.h
@@ -316,6 +316,7 @@ class Init {
     IK_FoldOpInit,
     IK_IsAOpInit,
     IK_ExistsOpInit,
+    IK_SelectOpInit,
     IK_AnonymousNameInit,
     IK_StringInit,
     IK_VarInit,
@@ -1191,6 +1192,38 @@ class ExistsOpInit final : public TypedInit, public FoldingSetNode {
   std::string getAsString() const override;
 };
 
+/// !select<type>(regex) - Produces a list of records whose type is `type` and
+/// record names match regular expression `regex`.
+class SelectOpInit final : public TypedInit, public FoldingSetNode {
+private:
+  const RecTy *Type;
+  const Init *Regex;
+
+  SelectOpInit(const RecTy *Type, const Init *Regex)
+      : TypedInit(IK_SelectOpInit, IntRecTy::get(Type->getRecordKeeper())),
+        Type(Type), Regex(Regex) {}
+
+public:
+  SelectOpInit(const SelectOpInit &) = delete;
+  SelectOpInit &operator=(const SelectOpInit &) = delete;
+
+  static bool classof(const Init *I) { return I->getKind() == IK_SelectOpInit; }
+
+  static const SelectOpInit *get(const RecTy *Type, const Init *Regex);
+
+  void Profile(FoldingSetNodeID &ID) const;
+
+  const Init *Fold() const;
+
+  bool isComplete() const override { return false; }
+
+  const Init *resolveReferences(Resolver &R) const override;
+
+  const Init *getBit(unsigned Bit) const override;
+
+  std::string getAsString() const override;
+};
+
 /// 'Opcode' - Represent a reference to an entire variable object.
 class VarInit final : public TypedInit {
   const Init *VarName;
diff --git a/llvm/lib/TableGen/Record.cpp b/llvm/lib/TableGen/Record.cpp
index 590656786bc66..68870991e672d 100644
--- a/llvm/lib/TableGen/Record.cpp
+++ b/llvm/lib/TableGen/Record.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
+#include "llvm/Support/Regex.h"
 #include "llvm/Support/SMLoc.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TableGen/Error.h"
@@ -83,6 +84,7 @@ struct RecordKeeperImpl {
   FoldingSet<FoldOpInit> TheFoldOpInitPool;
   FoldingSet<IsAOpInit> TheIsAOpInitPool;
   FoldingSet<ExistsOpInit> TheExistsOpInitPool;
+  FoldingSet<SelectOpInit> TheSelectOpInitPool;
   DenseMap<std::pair<const RecTy *, const Init *>, VarInit *> TheVarInitPool;
   DenseMap<std::pair<const TypedInit *, unsigned>, VarBitInit *>
       TheVarBitInitPool;
@@ -2199,6 +2201,66 @@ std::string ExistsOpInit::getAsString() const {
       .str();
 }
 
+static void ProfileSelectOpInit(FoldingSetNodeID &ID, const RecTy *Type,
+                                const Init *Regex) {
+  ID.AddPointer(Type);
+  ID.AddPointer(Regex);
+}
+
+const SelectOpInit *SelectOpInit::get(const RecTy *Type, const Init *Regex) {
+  FoldingSetNodeID ID;
+  ProfileSelectOpInit(ID, Type, Regex);
+
+  detail::RecordKeeperImpl &RK = Regex->getRecordKeeper().getImpl();
+  void *IP = nullptr;
+  if (const SelectOpInit *I =
+          RK.TheSelectOpInitPool.FindNodeOrInsertPos(ID, IP))
+    return I;
+
+  SelectOpInit *I = new (RK.Allocator) SelectOpInit(Type, Regex);
+  RK.TheSelectOpInitPool.InsertNode(I, IP);
+  return I;
+}
+
+void SelectOpInit::Profile(FoldingSetNodeID &ID) const {
+  ProfileSelectOpInit(ID, Type, Regex);
+}
+
+const Init *SelectOpInit::Fold() const {
+  if (const auto *RegexInit = dyn_cast<StringInit>(Regex)) {
+    StringRef RegexStr = RegexInit->getValue();
+    llvm::Regex Matcher(RegexStr);
+    if (!Matcher.isValid())
+      PrintFatalError(Twine("invalid regex '") + RegexStr + Twine("'"));
+
+    const RecordKeeper &RK = Type->getRecordKeeper();
+    SmallVector<Init *, 8> Selected;
+    for (auto &Def : RK.getAllDerivedDefinitionsIfDefined(Type->getAsString()))
+      if (Matcher.match(Def->getName()))
+        Selected.push_back(Def->getDefInit());
+
+    return ListInit::get(Selected, Type);
+  }
+  return this;
+}
+
+const Init *SelectOpInit::resolveReferences(Resolver &R) const {
+  const Init *NewRegex = Regex->resolveReferences(R);
+  if (Regex != NewRegex)
+    return get(Type, NewRegex)->Fold();
+  return this;
+}
+
+const Init *SelectOpInit::getBit(unsigned Bit) const {
+  return VarBitInit::get(this, Bit);
+}
+
+std::string SelectOpInit::getAsString() const {
+  return (Twine("!select<") + Type->getAsString() + ">(" +
+          Regex->getAsString() + ")")
+      .str();
+}
+
 const RecTy *TypedInit::getFieldType(const StringInit *FieldName) const {
   if (const auto *RecordType = dyn_cast<RecordRecTy>(getType())) {
     for (const Record *Rec : RecordType->getClasses()) {
diff --git a/llvm/lib/TableGen/TGLexer.cpp b/llvm/lib/TableGen/TGLexer.cpp
index 983242ade0fe5..e495aeb9cb7fa 100644
--- a/llvm/lib/TableGen/TGLexer.cpp
+++ b/llvm/lib/TableGen/TGLexer.cpp
@@ -644,6 +644,7 @@ tgtok::TokKind TGLexer::LexExclaim() {
           .Case("tolower", tgtok::XToLower)
           .Case("toupper", tgtok::XToUpper)
           .Case("repr", tgtok::XRepr)
+          .Case("select", tgtok::XSelect)
           .Default(tgtok::Error);
 
   return Kind != tgtok::Error ? Kind
diff --git a/llvm/lib/TableGen/TGLexer.h b/llvm/lib/TableGen/TGLexer.h
index 6680915211205..e2fe98b483c7a 100644
--- a/llvm/lib/TableGen/TGLexer.h
+++ b/llvm/lib/TableGen/TGLexer.h
@@ -158,7 +158,8 @@ enum TokKind {
   XSetDagArg,
   XSetDagName,
   XRepr,
-  BANG_OPERATOR_LAST = XRepr,
+  XSelect,
+  BANG_OPERATOR_LAST = XSelect,
 
   // String valued tokens.
   STRING_VALUE_FIRST,
diff --git a/llvm/lib/TableGen/TGParser.cpp b/llvm/lib/TableGen/TGParser.cpp
index 9a8301cffb930..b8f312be73884 100644
--- a/llvm/lib/TableGen/TGParser.cpp
+++ b/llvm/lib/TableGen/TGParser.cpp
@@ -1455,6 +1455,44 @@ const Init *TGParser::ParseOperation(Record *CurRec, const RecTy *ItemType) {
     return (ExistsOpInit::get(Type, Expr))->Fold(CurRec);
   }
 
+  case tgtok::XSelect: {
+    // Value ::= !select '<' Type '>' '(' Regex ')'
+    Lex.Lex(); // eat the operation.
+
+    const RecTy *Type = ParseOperatorType();
+    if (!Type)
+      return nullptr;
+
+    if (!consume(tgtok::l_paren)) {
+      TokError("expected '(' after type of !select");
+      return nullptr;
+    }
+
+    SMLoc RegexLoc = Lex.getLoc();
+    const Init *Regex = ParseValue(CurRec);
+    if (!Regex)
+      return nullptr;
+
+    const auto *RegexType = dyn_cast<TypedInit>(Regex);
+    if (!RegexType) {
+      Error(RegexLoc, "expected string type argument in !select operator");
+      return nullptr;
+    }
+
+    const auto *SType = dyn_cast<StringRecTy>(RegexType->getType());
+    if (!SType) {
+      Error(RegexLoc, "expected string type argument in !select operator");
+      return nullptr;
+    }
+
+    if (!consume(tgtok::r_paren)) {
+      TokError("expected ')' in !select");
+      return nullptr;
+    }
+
+    return (SelectOpInit::get(Type, Regex))->Fold();
+  }
+
   case tgtok::XConcat:
   case tgtok::XADD:
   case tgtok::XSUB:
diff --git a/llvm/test/TableGen/select.td b/llvm/test/TableGen/select.td
new file mode 100644
index 0000000000000..22e0c61784023
--- /dev/null
+++ b/llvm/test/TableGen/select.td
@@ -0,0 +1,60 @@
+// RUN: llvm-tblgen %s | FileCheck %s
+// RUN: not llvm-tblgen -DERROR1 %s 2>&1 | FileCheck --check-prefix=ERROR1 %s
+// RUN: not llvm-tblgen -DERROR2 %s 2>&1 | FileCheck --check-prefix=ERROR2 %s
+// RUN: not llvm-tblgen -DERROR3 %s 2>&1 | FileCheck --check-prefix=ERROR3 %s
+// XFAIL: vg_leak
+
+class A;
+def a0 : A;
+def a1 : A;
+
+class B : A;
+def b0 : B;
+def b1 : B;
+
+def select_A {
+  list<A> selected = !select<A>(".*");
+}
+
+def select_A_x0 {
+  list<A> selected = !select<A>(".*0");
+}
+
+def select_A_x1 {
+  list<A> selected = !select<A>(".*1");
+}
+
+def select_B {
+  list<B> selected = !select<B>(".*");
+}
+
+// CHECK-LABEL: def select_A {
+// CHECK-NEXT:    list<A> selected = [a0, a1, b0, b1];
+// CHECK-NEXT:  }
+
+// CHECK-LABEL: def select_A_x0 {
+// CHECK-NEXT:    list<A> selected = [a0, b0];
+// CHECK-NEXT:  }
+
+// CHECK-LABEL: def select_A_x1 {
+// CHECK-NEXT:    list<A> selected = [a1, b1];
+// CHECK-NEXT:  }
+
+// CHECK-LABEL: def select_B {
+// CHECK-NEXT:    list<B> selected = [b0, b1];
+// CHECK-NEXT:  }
+
+#ifdef ERROR1
+defvar error1 = !select<A>(123)
+// ERROR1: error: expected string type argument in !select operator
+#endif
+
+#ifdef ERROR2
+defvar error2 = !select<1>("")
+// ERROR2: error: Unknown token when expecting a type
+#endif
+
+#ifdef ERROR3
+defvar error3 = !select<A>("([)]")
+// ERROR3: error: invalid regex '([)]'
+#endif
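
For readers unfamiliar with the uniquing done by `SelectOpInit::get` in the diff above, here is a minimal standalone sketch of the same find-or-insert idea. It is an assumption-laden simplification: `OpNode` and `OpNodePool` are hypothetical names, and a `std::map` keyed on strings replaces the `FoldingSet` keyed on the `Type` and `Regex` pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node standing in for the operator's Init object.
struct OpNode {
  std::string Type;
  std::string Regex;
};

// Hypothetical pool: returns the same node for the same (Type, Regex) pair,
// creating it only on first request.
class OpNodePool {
  std::map<std::pair<std::string, std::string>, std::unique_ptr<OpNode>> Pool;

public:
  const OpNode *get(const std::string &Type, const std::string &Regex) {
    const auto Key = std::make_pair(Type, Regex);
    auto It = Pool.find(Key);
    if (It != Pool.end())
      return It->second.get(); // already interned: reuse the existing node

    std::unique_ptr<OpNode> Node(new OpNode{Type, Regex});
    const OpNode *Raw = Node.get();
    Pool.emplace(Key, std::move(Node));
    return Raw;
  }

  std::size_t size() const { return Pool.size(); }
};
```

Because equal requests return the same pointer, nodes can be compared by address, which is what makes pointer-based profiling workable.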

@wangpc-pp
Contributor Author

wangpc-pp commented Mar 4, 2025

Add some RISC-V reviewers here because the intention is to simplify the generation of lists of RVV pseudos. For example:

list<RISCVVPseudo> VADD = !select<RISCVVPseudo>(".*VADD.*");
list<RISCVVPseudo> M8 = !select<RISCVVPseudo>(".*_M8");
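
One detail worth keeping in mind with patterns like these: the diff tests names with `Matcher.match(...)`, which, as far as I know, looks for a match anywhere in the name rather than requiring the whole name to match. The sketch below illustrates that behavior with `std::regex_search` in extended mode as a stand-in for `llvm::Regex`; `nameMatches` and the record names used with it are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the ERE Pattern matches anywhere in Name (search
// semantics, not whole-string matching).
bool nameMatches(const std::string &Name, const std::string &Pattern) {
  return std::regex_search(Name, std::regex(Pattern, std::regex::extended));
}
```

Under these assumptions, `.*_M8` accepts a name such as `PseudoVADD_VV_M8` but also `PseudoVADD_VV_M8_MASK`; anchoring the pattern as `_M8$` would restrict the selection to names that end in `_M8`.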

@jayfoad
Contributor

jayfoad commented Mar 4, 2025

My gut feeling is that this operator does too much, and the name "select" is too generic. Can you split the functionality into smaller pieces:

  • generate a list of names of records of type T
  • filter a list based on a regex (using !filter?)
  • cast the names back to records (using !foreach and !cast?)

@wangpc-pp
Contributor Author

wangpc-pp commented Mar 4, 2025

My gut feeling is that this operator does too much, and the name "select" is too generic. Can you split the functionality into smaller pieces:

  • generate a list of names of records of type T
  • filter a list based on a regex (using !filter?)
  • cast the names back to records (using !foreach and !cast?)

There are two things we can't do with current operators:

  1. We can't generate names based on type. We can only do some string manipulations like concatenation/replacement.
  2. We can't filter a list based on a regex via using !filter because no operator supports regex now.

What about this solution?

  1. For the first problem, we can add an operator !defined<T>() which is a wrapper around getAllDerivedDefinitionsIfDefined.
  2. For the second problem, we can add an operator !match(str, regex) to test if str matches regex.
  3. To simplify the !foreach(!filter(!match(), !defined()), !cast) chain (and improve performance, since the chain takes many intermediate steps), !defined can accept an optional regex parameter, making the format !defined<T>(regex?).

WDYT?
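
To make the two designs concrete, here is a rough standalone sketch of the three-step chain (list names, filter by regex, cast back to records) next to a fused single pass. All names (`Rec`, `namesOf`, `filterNames`, `castToRecords`, `instancesFused`) are hypothetical, and `std::regex` in extended mode stands in for `llvm::Regex`. It shows that both forms select the same records; it makes no claim about their relative cost inside TableGen.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a TableGen record with a single class.
struct Rec {
  std::string Name;
  std::string Class;
};

// Step 1: collect the names of all records of the given class.
std::vector<std::string> namesOf(const std::vector<Rec> &Records,
                                 const std::string &Class) {
  std::vector<std::string> Names;
  for (const Rec &R : Records)
    if (R.Class == Class)
      Names.push_back(R.Name);
  return Names;
}

// Step 2: keep only the names in which the ERE finds a match.
std::vector<std::string> filterNames(const std::vector<std::string> &Names,
                                     const std::string &Pattern) {
  const std::regex Matcher(Pattern, std::regex::extended);
  std::vector<std::string> Kept;
  for (const std::string &N : Names)
    if (std::regex_search(N, Matcher))
      Kept.push_back(N);
  return Kept;
}

// Step 3: turn the surviving names back into records by lookup.
std::vector<const Rec *> castToRecords(const std::vector<Rec> &Records,
                                       const std::vector<std::string> &Names) {
  std::vector<const Rec *> Result;
  for (const std::string &N : Names)
    for (const Rec &R : Records)
      if (R.Name == N)
        Result.push_back(&R);
  return Result;
}

// Fused variant: one pass over the records, no intermediate lists.
std::vector<const Rec *> instancesFused(const std::vector<Rec> &Records,
                                        const std::string &Class,
                                        const std::string &Pattern) {
  const std::regex Matcher(Pattern, std::regex::extended);
  std::vector<const Rec *> Result;
  for (const Rec &R : Records)
    if (R.Class == Class && std::regex_search(R.Name, Matcher))
      Result.push_back(&R);
  return Result;
}
```

The chained form builds two temporary lists and performs a name lookup per surviving entry, which is the "intermediate steps" concern raised in point 3.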

@wangpc-pp wangpc-pp requested a review from jayfoad March 4, 2025 11:37
@jayfoad
Contributor

jayfoad commented Mar 4, 2025

  1. For the first problem, we can add an operator !defined<T>() which is a wrapper around getAllDerivedDefinitionsIfDefined.

Sounds reasonable to me.

  2. For the second problem, we can add an operator !match(str, regex) to test if str matches regex.

Sounds reasonable, although there might be arguments about exactly what regex syntax it should support.

  3. To simplify the !foreach(!filter(!match(), !defined()), !cast) chain (and improve performance, since the chain takes many intermediate steps), !defined can accept an optional regex parameter, making the format !defined<T>(regex?).

This seems like premature optimization. Is there really a significant performance problem?

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this pull request Mar 4, 2025
These predicates can also be used in macro fusion and scheduling
model.

This is stacked on llvm#129680.
@wangpc-pp
Contributor Author

  1. For the first problem, we can add an operator !defined<T>() which is a wrapper around getAllDerivedDefinitionsIfDefined.

Sounds reasonable to me.

  2. For the second problem, we can add an operator !match(str, regex) to test if str matches regex.

Sounds reasonable, although there might be arguments about exactly what regex syntax it should support.

I just use llvm/include/llvm/Support/Regex.h, so the syntax is POSIX. From that header: "This file implements a POSIX regular expression matcher. Both Basic and Extended POSIX regular expressions (ERE) are supported."
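
Since the choice between Basic and Extended syntax changes what a pattern means, a small illustration may help. This sketch uses `std::regex` with the `basic` and `extended` grammars as a stand-in for `llvm::Regex`; `searchExtended` and `searchBasic` are hypothetical helper names. It relies on `|` being alternation in ERE but an ordinary character in BRE.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Searches Text with the pattern interpreted as POSIX extended syntax (ERE).
bool searchExtended(const std::string &Text, const std::string &Pattern) {
  return std::regex_search(Text, std::regex(Pattern, std::regex::extended));
}

// Searches Text with the pattern interpreted as POSIX basic syntax (BRE).
bool searchBasic(const std::string &Text, const std::string &Pattern) {
  return std::regex_search(Text, std::regex(Pattern, std::regex::basic));
}
```

With these helpers, the pattern `VADD|VSUB` finds a match in `VSUB` under ERE, while under BRE it only matches text containing the literal characters `VADD|VSUB`.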

  3. To simplify the !foreach(!filter(!match(), !defined()), !cast) chain (and improve performance, since the chain takes many intermediate steps), !defined can accept an optional regex parameter, making the format !defined<T>(regex?).

This seems like premature optimization. Is there really a significant performance problem?

I strongly believe it is! The chained TableGen operators will be at least 100x slower than a native filter implemented in C++.

@arsenm
Contributor

arsenm commented Mar 4, 2025

This seems like premature optimization. Is there really a significant performance problem?

It's tablegen, so when isn't there one?

@wangpc-pp wangpc-pp force-pushed the main-tablegen-select branch from 8e9bdeb to 69b0b7b Compare March 12, 2025 09:52
@wangpc-pp wangpc-pp changed the title [TableGen] Add !select operator to select records [TableGen] Add !defined operator to get defined records Mar 12, 2025
@wangpc-pp
Contributor Author

Ping!

@jayfoad
Contributor

jayfoad commented Mar 17, 2025

I don't like the name !defined. Maybe !instances would be better? Anyone else have ideas?

@wangpc-pp
Contributor Author

I don't like the name !defined. Maybe !instances would be better? Anyone else have ideas?

Maybe !records? "Record" is the terminology used in https://llvm.org/docs/TableGen/ProgRef.html.

@wangpc-pp wangpc-pp force-pushed the main-tablegen-select branch from 69b0b7b to d1535bf Compare March 17, 2025 12:34
@wangpc-pp wangpc-pp changed the title [TableGen] Add !defined operator to get defined records [TableGen] Add !records operator to get defined records Mar 17, 2025
@jayfoad
Contributor

jayfoad commented Mar 18, 2025

I don't like the name !defined. Maybe !instances would be better? Anyone else have ideas?

Maybe !records? "Record" is the terminology used in https://llvm.org/docs/TableGen/ProgRef.html.

As an aside about naming: I know that records are called "records". But the name of this operator should suggest not just "records"; it should suggest "list of all records derived from a specified class".

@wangpc-pp
Contributor Author

I don't like the name !defined. Maybe !instances would be better? Anyone else have ideas?

Maybe !records? "Record" is the terminology used in https://llvm.org/docs/TableGen/ProgRef.html.

As an aside about naming: I know that records are called "records". But the name of this operator should suggest not just "records"; it should suggest "list of all records derived from a specified class".

Yeah, what about recordsof?

@jayfoad
Contributor

jayfoad commented Mar 18, 2025

I think I still prefer !instances, but I would be happy to hear any other suggestions or opinions!

@wangpc-pp
Contributor Author

I think I still prefer !instances, but I would be happy to hear any other suggestions or opinions!

Makes sense to me. I renamed it to !instances!

@wangpc-pp wangpc-pp changed the title [TableGen] Add !records operator to get defined records [TableGen] Add !instances operator to get defined records Mar 18, 2025
@Paul-C-Anagnostopoulos

I think I still prefer !instances, but I would be happy to hear any other suggestions or opinions!

Yes, !instances.

@wangpc-pp wangpc-pp requested review from jayfoad and jurahul March 20, 2025 04:19
@wangpc-pp wangpc-pp force-pushed the main-tablegen-select branch from be046e8 to 72eeedc Compare March 25, 2025 10:18
Member

@Artem-B Artem-B left a comment

LGTM.

The format is: `!instances<T>([regex])`.

This operator produces a list of records whose type is `T`. If
`regex` is provided, only records whose name matches the regular
expression `regex` will be included. The format of `regex` is ERE
(Extended POSIX Regular Expressions).
@wangpc-pp wangpc-pp force-pushed the main-tablegen-select branch from e5c2d39 to 66f71ba Compare March 28, 2025 08:26
@wangpc-pp wangpc-pp merged commit 8836128 into llvm:main Mar 28, 2025
7 of 11 checks passed
@wangpc-pp wangpc-pp deleted the main-tablegen-select branch March 28, 2025 08:32
7 participants