TableGen: Optimize super-register class computation #134865

nhaehnle · 2025-04-08T14:51:19Z

Inferring super-register classes naively requires checking every
register class against every other register class and sub-register index.
Each of those checks is itself a non-trivial operation on register sets.

Culling as many (RC, RC, SubIdx) triples as possible is important for
the running time of TableGen for architectures with complex sub-register
relations.

Use transitivity to cull many (RC, RC, SubIdx) triples. This
unfortunately requires us to complete the transitive closure of
super-register classes explicitly, but it still cuts down the running
time on AMDGPU substantially -- in some upcoming work in the
backend by more than half (in very rough measurements).

This changes the names of some of the inferred register classes, since
the order in which they are inferred changes. The names of the inferred
register classes become shorter, which reduces the size of the generated
files.

Replacing some uses of SmallPtrSet by DenseSet shaves off a few more
percent; there are hundreds of register classes in AMDGPU.

Tweaking the topological signature check to skip reigsters without
super-registers further helps skip register classes that have "pseudo"
registers in them whose sub- and super-register structure is trivial.

Inferring register classes for getMatchingSuperRegClass via CodeGenRegBank::inferMatchingSuperRegClass is very expensive since it potentially iterates over all triple of (super class, sub class, sub class idx). Filtering out triples early is crucial. The filter based on topological signatures generally does a good job, but it fails in the presence of "pseudo registers" like FP_REG in AMDGPU: These registers don't participate in the usual subregister / superregister relationships and end up with a topological signature of 0 because of that. However, they appear in register classes whose "normal" registers all have a non-0 signature. We then fail to filter out those register classes. Since the TopoSigs are currently only used for inferring getMatchingSuperRegClass, we can explicitly skip registers that don't have super-registers. On some future work in the AMDGPU backend, this simple change reduces the running time of `llvm-tblgen -gen-register-info` by ~20% (in an extremely rough measurement).

Inferring super-register classes naively requires checking every register class against every other register class and sub-register index. Each of those checks is itself a non-trivial operation on register sets. Culling as many (RC, RC, SubIdx) triples as possible is important for the running time of TableGen for architectures with complex sub-register relations. Use transitivity to cull many (RC, RC, SubIdx) triples. This unfortunately requires us to complete the transitive closure of super-register classes explicitly, but it still cuts down the running time on AMDGPU substantially. Replacing some uses of SmallPtrSet by DenseSet shaves off a few more percent; there are hundreds of register classes in AMDGPU. This changes the names of some of the inferred register classes, since the order in which they are inferred changes. The names of the inferred register classes become shorter, which reduces the size of the generated files.

llvmbot · 2025-04-08T14:52:01Z

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-tablegen

Author: Nicolai Hähnle (nhaehnle)

Changes

Inferring super-register classes naively requires checking every
register class against every other register class and sub-register index.
Each of those checks is itself a non-trivial operation on register sets.

Culling as many (RC, RC, SubIdx) triples as possible is important for
the running time of TableGen for architectures with complex sub-register
relations.

Use transitivity to cull many (RC, RC, SubIdx) triples. This
unfortunately requires us to complete the transitive closure of
super-register classes explicitly, but it still cuts down the running
time on AMDGPU substantially -- in some upcoming work in the
backend by more than half (in very rough measurements).

This changes the names of some of the inferred register classes, since
the order in which they are inferred changes. The names of the inferred
register classes become shorter, which reduces the size of the generated
files.

Replacing some uses of SmallPtrSet by DenseSet shaves off a few more
percent; there are hundreds of register classes in AMDGPU.

Tweaking the topological signature check to skip reigsters without
super-registers further helps skip register classes that have "pseudo"
registers in them whose sub- and super-register structure is trivial.

Full diff: https://github.com/llvm/llvm-project/pull/134865.diff

3 Files Affected:

(modified) llvm/utils/TableGen/Common/CodeGenRegisters.cpp (+182-20)
(modified) llvm/utils/TableGen/Common/CodeGenRegisters.h (+41-11)
(modified) llvm/utils/TableGen/RegisterBankEmitter.cpp (+2-2)

diff --git a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
index 7105ced26be1c..3a6e828a99f2d 100644
--- a/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/Common/CodeGenRegisters.cpp
@@ -16,6 +16,8 @@
 #include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/IntEqClasses.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
@@ -765,7 +767,8 @@ static void sortAndUniqueRegisters(CodeGenRegister::Vec &M) {
 CodeGenRegisterClass::CodeGenRegisterClass(CodeGenRegBank &RegBank,
                                            const Record *R)
     : TheDef(R), Name(std::string(R->getName())),
-      TopoSigs(RegBank.getNumTopoSigs()), EnumValue(-1), TSFlags(0) {
+      RegsWithSuperRegsTopoSigs(RegBank.getNumTopoSigs()), EnumValue(-1),
+      TSFlags(0) {
   GeneratePressureSet = R->getValueAsBit("GeneratePressureSet");
   std::vector<const Record *> TypeList = R->getValueAsListOfDefs("RegTypes");
   if (TypeList.empty())
@@ -791,7 +794,8 @@ CodeGenRegisterClass::CodeGenRegisterClass(CodeGenRegBank &RegBank,
     const CodeGenRegister *Reg = RegBank.getReg((*Elements)[i]);
     Members.push_back(Reg);
     Artificial &= Reg->Artificial;
-    TopoSigs.set(Reg->getTopoSig());
+    if (!Reg->getSuperRegs().empty())
+      RegsWithSuperRegsTopoSigs.set(Reg->getTopoSig());
   }
   sortAndUniqueRegisters(Members);
 
@@ -849,13 +853,14 @@ CodeGenRegisterClass::CodeGenRegisterClass(CodeGenRegBank &RegBank,
 CodeGenRegisterClass::CodeGenRegisterClass(CodeGenRegBank &RegBank,
                                            StringRef Name, Key Props)
     : Members(*Props.Members), TheDef(nullptr), Name(std::string(Name)),
-      TopoSigs(RegBank.getNumTopoSigs()), EnumValue(-1), RSI(Props.RSI),
-      CopyCost(0), Allocatable(true), AllocationPriority(0),
+      RegsWithSuperRegsTopoSigs(RegBank.getNumTopoSigs()), EnumValue(-1),
+      RSI(Props.RSI), CopyCost(0), Allocatable(true), AllocationPriority(0),
       GlobalPriority(false), TSFlags(0) {
   Artificial = true;
   GeneratePressureSet = false;
   for (const auto R : Members) {
-    TopoSigs.set(R->getTopoSig());
+    if (!R->getSuperRegs().empty())
+      RegsWithSuperRegsTopoSigs.set(R->getTopoSig());
     Artificial &= R->Artificial;
   }
 }
@@ -1173,6 +1178,28 @@ void CodeGenRegisterClass::buildRegUnitSet(
                    std::back_inserter(RegUnits));
 }
 
+// Combine our super classes of the given sub-register index with all of their
+// super classes in turn.
+void CodeGenRegisterClass::extendSuperRegClasses(CodeGenSubRegIndex *SubIdx) {
+  auto It = SuperRegClasses.find(SubIdx);
+  if (It == SuperRegClasses.end())
+    return;
+
+  SmallVector<CodeGenRegisterClass *> MidRCs;
+  MidRCs.insert(MidRCs.end(), It->second.begin(), It->second.end());
+
+  for (CodeGenRegisterClass *MidRC : MidRCs) {
+    for (auto &Pair : MidRC->SuperRegClasses) {
+      CodeGenSubRegIndex *ComposedSubIdx = Pair.first->compose(SubIdx);
+      if (!ComposedSubIdx)
+        continue;
+
+      for (CodeGenRegisterClass *SuperRC : Pair.second)
+        addSuperRegClass(ComposedSubIdx, SuperRC);
+    }
+  }
+}
+
 //===----------------------------------------------------------------------===//
 //                           CodeGenRegisterCategory
 //===----------------------------------------------------------------------===//
@@ -1290,6 +1317,8 @@ CodeGenRegBank::CodeGenRegBank(const RecordKeeper &Records,
     }
   }
 
+  computeSubRegIndicesRPOT();
+
   // Native register units are associated with a leaf register. They've all been
   // discovered now.
   NumNativeRegUnits = RegUnits.size();
@@ -1364,7 +1393,7 @@ void CodeGenRegBank::addToMaps(CodeGenRegisterClass *RC) {
 }
 
 // Create a synthetic sub-class if it is missing.
-CodeGenRegisterClass *
+std::pair<CodeGenRegisterClass *, bool>
 CodeGenRegBank::getOrCreateSubClass(const CodeGenRegisterClass *RC,
                                     const CodeGenRegister::Vec *Members,
                                     StringRef Name) {
@@ -1372,12 +1401,12 @@ CodeGenRegBank::getOrCreateSubClass(const CodeGenRegisterClass *RC,
   CodeGenRegisterClass::Key K(Members, RC->RSI);
   RCKeyMap::const_iterator FoundI = Key2RC.find(K);
   if (FoundI != Key2RC.end())
-    return FoundI->second;
+    return {FoundI->second, false};
 
   // Sub-class doesn't exist, create a new one.
   RegClasses.emplace_back(*this, Name, K);
   addToMaps(&RegClasses.back());
-  return &RegClasses.back();
+  return {&RegClasses.back(), true};
 }
 
 CodeGenRegisterClass *CodeGenRegBank::getRegClass(const Record *Def) const {
@@ -1694,6 +1723,81 @@ void CodeGenRegBank::computeSubRegLaneMasks() {
 
 namespace {
 
+// A directed graph on sub-register indices with a virtual source node that
+// has an arc to all other nodes, and an arc from A to B if sub-register index
+// B can be obtained by composing A with some other sub-register index.
+struct SubRegIndexCompositionGraph {
+  std::deque<CodeGenSubRegIndex> &SubRegIndices;
+  CodeGenSubRegIndex::CompMap EntryNode;
+
+  SubRegIndexCompositionGraph(std::deque<CodeGenSubRegIndex> &SubRegIndices)
+      : SubRegIndices(SubRegIndices) {
+    for (CodeGenSubRegIndex &Idx : SubRegIndices) {
+      EntryNode.try_emplace(&Idx, &Idx);
+    }
+  }
+};
+
+} // namespace
+
+template <> struct llvm::GraphTraits<SubRegIndexCompositionGraph> {
+  using NodeRef =
+      PointerUnion<CodeGenSubRegIndex *, const CodeGenSubRegIndex::CompMap *>;
+
+  // Using a reverse iterator causes sub-register indices to appear in their
+  // more natural order in RPOT.
+  using CompMapIt = CodeGenSubRegIndex::CompMap::const_reverse_iterator;
+  struct ChildIteratorType
+      : public iterator_adaptor_base<
+            ChildIteratorType, CompMapIt,
+            typename std::iterator_traits<CompMapIt>::iterator_category,
+            NodeRef> {
+    ChildIteratorType(CompMapIt I)
+        : ChildIteratorType::iterator_adaptor_base(I) {}
+
+    NodeRef operator*() const { return wrapped()->second; }
+  };
+
+  static NodeRef getEntryNode(const SubRegIndexCompositionGraph &G) {
+    return &G.EntryNode;
+  }
+
+  static const CodeGenSubRegIndex::CompMap *children(NodeRef N) {
+    if (auto *Idx = dyn_cast<CodeGenSubRegIndex *>(N))
+      return &Idx->getComposites();
+    return cast<const CodeGenSubRegIndex::CompMap *>(N);
+  }
+
+  static ChildIteratorType child_begin(NodeRef N) {
+    return ChildIteratorType(children(N)->rbegin());
+  }
+  static ChildIteratorType child_end(NodeRef N) {
+    return ChildIteratorType(children(N)->rend());
+  }
+
+  static auto nodes_begin(SubRegIndexCompositionGraph *G) {
+    return G->SubRegIndices.begin();
+  }
+  static auto nodes_end(SubRegIndexCompositionGraph *G) {
+    return G->SubRegIndices.end();
+  }
+
+  static unsigned size(SubRegIndexCompositionGraph *G) {
+    return G->SubRegIndices.size();
+  }
+};
+
+void CodeGenRegBank::computeSubRegIndicesRPOT() {
+  SubRegIndexCompositionGraph G(SubRegIndices);
+  ReversePostOrderTraversal<SubRegIndexCompositionGraph> RPOT(G);
+  for (const auto N : RPOT) {
+    if (auto *Idx = dyn_cast<CodeGenSubRegIndex *>(N))
+      SubRegIndicesRPOT.push_back(Idx);
+  }
+}
+
+namespace {
+
 // UberRegSet is a helper class for computeRegUnitWeights. Each UberRegSet is
 // the transitive closure of the union of overlapping register
 // classes. Together, the UberRegSets form a partition of the registers. If we
@@ -2323,8 +2427,10 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
     if (SubIdx.Artificial)
       continue;
     // This is a real subset.  See if we have a matching class.
-    CodeGenRegisterClass *SubRC = getOrCreateSubClass(
-        RC, &I->second, RC->getName() + "_with_" + I->first->getName());
+    CodeGenRegisterClass *SubRC =
+        getOrCreateSubClass(RC, &I->second,
+                            RC->getName() + "_with_" + I->first->getName())
+            .first;
     RC->setSubClassWithSubReg(&SubIdx, SubRC);
   }
 }
@@ -2339,16 +2445,22 @@ void CodeGenRegBank::inferSubClassWithSubReg(CodeGenRegisterClass *RC) {
 void CodeGenRegBank::inferMatchingSuperRegClass(
     CodeGenRegisterClass *RC,
     std::list<CodeGenRegisterClass>::iterator FirstSubRegRC) {
+  DenseSet<const CodeGenSubRegIndex *> ImpliedSubRegIndices;
   std::vector<std::pair<const CodeGenRegister *, const CodeGenRegister *>>
       SubToSuperRegs;
   BitVector TopoSigs(getNumTopoSigs());
 
-  // Iterate in SubRegIndex numerical order to visit synthetic indices last.
-  for (auto &SubIdx : SubRegIndices) {
+  // Iterate subregister indices in topological order to visit larger indices
+  // first. This allows us to skip the smaller indices in many cases because
+  // their inferred super-register classes are implied.
+  for (auto *SubIdx : SubRegIndicesRPOT) {
     // Skip indexes that aren't fully supported by RC's registers. This was
     // computed by inferSubClassWithSubReg() above which should have been
     // called first.
-    if (RC->getSubClassWithSubReg(&SubIdx) != RC)
+    if (RC->getSubClassWithSubReg(SubIdx) != RC)
+      continue;
+
+    if (ImpliedSubRegIndices.count(SubIdx))
       continue;
 
     // Build list of (Sub, Super) pairs for this SubIdx, sorted by Sub. Note
@@ -2356,7 +2468,7 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
     SubToSuperRegs.clear();
     TopoSigs.reset();
     for (const auto Super : RC->getMembers()) {
-      const CodeGenRegister *Sub = Super->getSubRegs().find(&SubIdx)->second;
+      const CodeGenRegister *Sub = Super->getSubRegs().find(SubIdx)->second;
       assert(Sub && "Missing sub-register");
       SubToSuperRegs.emplace_back(Sub, Super);
       TopoSigs.set(Sub->getTopoSig());
@@ -2374,7 +2486,7 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
       if (SubRC.Artificial)
         continue;
       // Topological shortcut: SubRC members have the wrong shape.
-      if (!TopoSigs.anyCommon(SubRC.getTopoSigs()))
+      if (!TopoSigs.anyCommon(SubRC.getRegsWithSuperRegsTopoSigs()))
         continue;
       // Compute the subset of RC that maps into SubRC with a single linear scan
       // through SubToSuperRegs and the members of SubRC.
@@ -2395,15 +2507,54 @@ void CodeGenRegBank::inferMatchingSuperRegClass(
       // RC injects completely into SubRC.
       sortAndUniqueRegisters(SubSetVec);
       if (SubSetVec.size() == RC->getMembers().size()) {
-        SubRC.addSuperRegClass(&SubIdx, RC);
+        SubRC.addSuperRegClass(SubIdx, RC);
+
+        // We can skip checking subregister indices that can be composed from
+        // the current SubIdx.
+        //
+        // Proof sketch: Let SubRC' be another register class and SubSubIdx
+        // a subregister index that can be composed from SubIdx.
+        //
+        // Calling this function with SubRC in place of RC ensures the existence
+        // of a subclass X of SubRC with the registers that have subregisters in
+        // SubRC'.
+        //
+        // The set of registers in RC with SubSubIdx in SubRC' is equal to the
+        // set of registers in RC with SubIdx in X (because every register in
+        // RC has a corresponding subregister in SubRC), and so checking the
+        // pair (SubSubIdx, SubRC') is redundant with checking (SubIdx, X).
+        for (const auto &SubSubIdx : SubIdx->getComposites())
+          ImpliedSubRegIndices.insert(SubSubIdx.second);
+
         continue;
       }
 
       // Only a subset of RC maps into SubRC. Make sure it is represented by a
       // class.
-      getOrCreateSubClass(RC, &SubSetVec,
-                          RC->getName() + "_with_" + SubIdx.getName() + "_in_" +
-                              SubRC.getName());
+      //
+      // The name of the inferred register class follows the template
+      // "<RC>_with_<SubIdx>_in_<SubRC>".
+      //
+      // When SubRC is already an inferred class, prefer a name of the form
+      // "<RC>_with_<CompositeSubIdx>_in_<SubSubRC>" over a chain of the form
+      // "<RC>_with_<SubIdx>_in_<OtherRc>_with_<SubSubIdx>_in_<SubSubRC>".
+      CodeGenSubRegIndex *CompositeSubIdx = SubIdx;
+      CodeGenRegisterClass *CompositeSubRC = &SubRC;
+      if (CodeGenSubRegIndex *SubSubIdx = SubRC.getInferredFromSubRegIdx()) {
+        auto It = SubIdx->getComposites().find(SubSubIdx);
+        if (It != SubIdx->getComposites().end()) {
+          CompositeSubIdx = It->second;
+          CompositeSubRC = SubRC.getInferredFromRC();
+        }
+      }
+
+      auto [SubSetRC, Inserted] = getOrCreateSubClass(
+          RC, &SubSetVec,
+          RC->getName() + "_with_" + CompositeSubIdx->getName() + "_in_" +
+              CompositeSubRC->getName());
+
+      if (Inserted)
+        SubSetRC->setInferredFrom(CompositeSubIdx, CompositeSubRC);
     }
   }
 }
@@ -2438,7 +2589,7 @@ void CodeGenRegBank::computeInferredRegisterClasses() {
     inferMatchingSuperRegClass(RC);
 
     // New register classes are created while this loop is running, and we need
-    // to visit all of them.  I  particular, inferMatchingSuperRegClass needs
+    // to visit all of them.  In particular, inferMatchingSuperRegClass needs
     // to match old super-register classes with sub-register classes created
     // after inferMatchingSuperRegClass was called.  At this point,
     // inferMatchingSuperRegClass has checked SuperRC = [0..rci] with SubRC =
@@ -2451,6 +2602,17 @@ void CodeGenRegBank::computeInferredRegisterClasses() {
       FirstNewRC = NextNewRC;
     }
   }
+
+  // Compute the transitive closure for super-register classes.
+  //
+  // By iterating over sub-register indices in topological order, we only ever
+  // add super-register classes for sub-register indices that have not already
+  // been visited. That allows computing the transitive closure in a single
+  // pass.
+  for (CodeGenSubRegIndex *SubIdx : SubRegIndicesRPOT) {
+    for (CodeGenRegisterClass &SubRC : RegClasses)
+      SubRC.extendSuperRegClasses(SubIdx);
+  }
 }
 
 /// getRegisterClassForRegister - Find the register class that contains the
diff --git a/llvm/utils/TableGen/Common/CodeGenRegisters.h b/llvm/utils/TableGen/Common/CodeGenRegisters.h
index 5e2d1977545c1..f9a7904709830 100644
--- a/llvm/utils/TableGen/Common/CodeGenRegisters.h
+++ b/llvm/utils/TableGen/Common/CodeGenRegisters.h
@@ -21,7 +21,6 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SetVector.h"
-#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/ADT/StringMap.h"
@@ -337,12 +336,18 @@ class CodeGenRegisterClass {
   //
   //   R:SubRegIndex in this RC for all R in SuperRC.
   //
-  DenseMap<const CodeGenSubRegIndex *, SmallPtrSet<CodeGenRegisterClass *, 8>>
+  DenseMap<CodeGenSubRegIndex *, DenseSet<CodeGenRegisterClass *>>
       SuperRegClasses;
 
-  // Bit vector of TopoSigs for the registers in this class. This will be
-  // very sparse on regular architectures.
-  BitVector TopoSigs;
+  // Bit vector of TopoSigs for the registers with super registers in this
+  // class. This will be very sparse on regular architectures.
+  BitVector RegsWithSuperRegsTopoSigs;
+
+  // If the register class was inferred for getMatchingSuperRegClass, this
+  // holds the subregister index and subregister class for which the register
+  // class was created.
+  CodeGenSubRegIndex *InferredFromSubRegIdx = nullptr;
+  CodeGenRegisterClass *InferredFromRC = nullptr;
 
 public:
   unsigned EnumValue;
@@ -438,6 +443,8 @@ class CodeGenRegisterClass {
     SuperRegClasses[SubIdx].insert(SuperRC);
   }
 
+  void extendSuperRegClasses(CodeGenSubRegIndex *SubIdx);
+
   // getSubClasses - Returns a constant BitVector of subclasses indexed by
   // EnumValue.
   // The SubClasses vector includes an entry for this class.
@@ -463,8 +470,11 @@ class CodeGenRegisterClass {
   // getOrder(0).
   const CodeGenRegister::Vec &getMembers() const { return Members; }
 
-  // Get a bit vector of TopoSigs present in this register class.
-  const BitVector &getTopoSigs() const { return TopoSigs; }
+  // Get a bit vector of TopoSigs of registers with super registers in this
+  // register class.
+  const BitVector &getRegsWithSuperRegsTopoSigs() const {
+    return RegsWithSuperRegsTopoSigs;
+  }
 
   // Get a weight of this register class.
   unsigned getWeight(const CodeGenRegBank &) const;
@@ -505,6 +515,20 @@ class CodeGenRegisterClass {
       return TheDef->getValueAsInt("BaseClassOrder");
     return {};
   }
+
+  void setInferredFrom(CodeGenSubRegIndex *Idx, CodeGenRegisterClass *RC) {
+    assert(Idx && RC);
+    assert(!InferredFromSubRegIdx);
+
+    InferredFromSubRegIdx = Idx;
+    InferredFromRC = RC;
+  }
+
+  CodeGenSubRegIndex *getInferredFromSubRegIdx() const {
+    return InferredFromSubRegIdx;
+  }
+
+  CodeGenRegisterClass *getInferredFromRC() const { return InferredFromRC; }
 };
 
 // Register categories are used when we need to deterine the category a
@@ -587,6 +611,9 @@ class CodeGenRegBank {
   std::deque<CodeGenSubRegIndex> SubRegIndices;
   DenseMap<const Record *, CodeGenSubRegIndex *> Def2SubRegIdx;
 
+  // Subregister indices sorted topologically by composition.
+  std::vector<CodeGenSubRegIndex *> SubRegIndicesRPOT;
+
   CodeGenSubRegIndex *createSubRegIndex(StringRef Name, StringRef NameSpace);
 
   typedef std::map<SmallVector<CodeGenSubRegIndex *, 8>, CodeGenSubRegIndex *>
@@ -638,10 +665,10 @@ class CodeGenRegBank {
   // Add RC to *2RC maps.
   void addToMaps(CodeGenRegisterClass *);
 
-  // Create a synthetic sub-class if it is missing.
-  CodeGenRegisterClass *getOrCreateSubClass(const CodeGenRegisterClass *RC,
-                                            const CodeGenRegister::Vec *Membs,
-                                            StringRef Name);
+  // Create a synthetic sub-class if it is missing. Returns (RC, inserted).
+  std::pair<CodeGenRegisterClass *, bool>
+  getOrCreateSubClass(const CodeGenRegisterClass *RC,
+                      const CodeGenRegister::Vec *Membs, StringRef Name);
 
   // Infer missing register classes.
   void computeInferredRegisterClasses();
@@ -671,6 +698,9 @@ class CodeGenRegBank {
   // Compute a lane mask for each sub-register index.
   void computeSubRegLaneMasks();
 
+  // Compute RPOT of subregister indices by composition.
+  void computeSubRegIndicesRPOT();
+
   /// Computes a lane mask for each register unit enumerated by a physical
   /// register.
   void computeRegUnitLaneMasks();
diff --git a/llvm/utils/TableGen/RegisterBankEmitter.cpp b/llvm/utils/TableGen/RegisterBankEmitter.cpp
index a2fcf55e85132..e931000bb9c71 100644
--- a/llvm/utils/TableGen/RegisterBankEmitter.cpp
+++ b/llvm/utils/TableGen/RegisterBankEmitter.cpp
@@ -179,7 +179,7 @@ static void visitRegisterBankClasses(
     const CodeGenRegBank &RegisterClassHierarchy,
     const CodeGenRegisterClass *RC, const Twine &Kind,
     std::function<void(const CodeGenRegisterClass *, StringRef)> VisitFn,
-    SmallPtrSetImpl<const CodeGenRegisterClass *> &VisitedRCs) {
+    DenseSet<const CodeGenRegisterClass *> &VisitedRCs) {
 
   // Make sure we only visit each class once to avoid infinite loops.
   if (!VisitedRCs.insert(RC).second)
@@ -390,7 +390,7 @@ void RegisterBankEmitter::run(raw_ostream &OS) {
   Timer.startTimer("Analyze records");
   std::vector<RegisterBank> Banks;
   for (const auto &V : Records.getAllDerivedDefinitions("RegisterBank")) {
-    SmallPtrSet<const CodeGenRegisterClass *, 8> VisitedRCs;
+    DenseSet<const CodeGenRegisterClass *> VisitedRCs;
     RegisterBank Bank(*V, CGH.getNumModeIds());
 
     for (const CodeGenRegisterClass *RC :

llvm-ci · 2025-04-11T01:31:56Z

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-nfc running on bolt-worker while building llvm at step 9 "test-build-bolt-check-large-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/16960

Here is the relevant piece of the build log for the reference

Step 9 (test-build-bolt-check-large-bolt) failure: test (failure)
******************** TEST 'bolt-tests :: X86/clang-nolbr.test' FAILED ********************
Exit Code: 254

Command Output (stderr):
--
mkdir -p /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output # RUN: at line 5
+ mkdir -p /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output
test -f /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang || unzstd /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Inputs/clang.zst -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang # RUN: at line 6
+ test -f /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang
perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.data --    /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Inputs/bf.cpp -O2 -std=c++11 -c -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.out # RUN: at line 9
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.data -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Inputs/bf.cpp -O2 -std=c++11 -c -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.out
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.433 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.data (11117 samples) ]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang -p /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.data -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.fdata -nl    |& /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/clang-nolbr.test -check-prefix=CHECK-P2B # RUN: at line 13
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang -p /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.data -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.fdata -nl
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/clang-nolbr.test -check-prefix=CHECK-P2B
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/llvm-bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp -data /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.fdata     -relocs -reorder-blocks=ext-tsp -split-functions=3 -split-all-cold     -split-eh -icf=1 -reorder-functions=hfsort+ -use-gnu-stack     -jump-tables=move -frame-opt=hot -peepholes=all -dyno-stats    |& /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/clang-nolbr.test -check-prefix=CHECK-BOLT # RUN: at line 17
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/llvm-bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Output/clang -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp -data /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.fdata -relocs -reorder-blocks=ext-tsp -split-functions=3 -split-all-cold -split-eh -icf=1 -reorder-functions=hfsort+ -use-gnu-stack -jump-tables=move -frame-opt=hot -peepholes=all -dyno-stats
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/clang-nolbr.test -check-prefix=CHECK-BOLT
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Inputs/bf.cpp -O2 -std=c++11 -c -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.out # RUN: at line 24
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Inputs/bf.cpp -O2 -std=c++11 -c -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.out
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x1d)[0xb91357]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN4llvm3sys17RunSignalHandlersEv+0x2f)[0xb90173]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0xb903f1]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f7c61c42520]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0x37482df]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0x373f667]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0x199f49e]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0x373db1f]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x365)[0x341c16f]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x36)[0x341bdfe]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x185)[0x2219375]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsERKN4llvm10DataLayoutEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISM_EE+0x825)[0x22a3455]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0x2385f1e]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN5clang8ParseASTERNS_4SemaEbb+0x140)[0x1e933b0]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN5clang13CodeGenAction13ExecuteActionEv+0x36)[0x2385986]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN5clang14FrontendAction7ExecuteEv+0x24)[0x2337034]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x120)[0x231c0e0]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x1a7)[0x23433a7]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0x415)[0x2173c15]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp(main+0x4da)[0x1768eea]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f7c61c29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f7c61c29e40]
/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp[0x2172d29]
Stack dump:
0.	Program arguments: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -discard-value-names -main-file-name bf.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -momit-leaf-frame-pointer -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.gcno -resource-dir /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/lib/clang/6.0.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward -internal-isystem /usr/local/include -internal-isystem /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/lib/clang/6.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86 -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolttests/X86/Output/clang-nolbr.test.tmp.out -x c++ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/bolt-tests/test/X86/Inputs/bf.cpp 
1.	<eof> parser at end of file
2.	Code generation
...

bjope · 2025-04-15T11:50:05Z

@nhaehnle , I'm trying to understand the impact of this for our downstream out-of-tree target. Some of our benchmarks has shown regressions, and one thing I've noticed is that the register pressure sets differ a lot. There is really no comments in the commit message about this really changing anything (except for register class names). And there aren't any test cases that are impacted.

(No idea yet if we get codegen diffs due to something special we have in our downstream fork.)

Just want to double check if this just is supposed to save compilation time, without any changes in codegen. Or am I really supposed to see impact in llc codegen with this patch?

bjope · 2025-04-15T12:08:12Z

@nhaehnle , I'm trying to understand the impact of this for our downstream out-of-tree target. Some of our benchmarks has shown regressions, and one thing I've noticed is that the register pressure sets differ a lot. There is really no comments in the commit message about this really changing anything (except for register class names). And there aren't any test cases that are impacted.

(No idea yet if we get codegen diffs due to something special we have in our downstream fork.)

Just want to double check if this just is supposed to save compilation time, without any changes in codegen. Or am I really supposed to see impact in llc codegen with this patch?

I did check for some in-tree target and then the GenRegisterInfo.inc files isn't impacted. So it looks like we have some downstream added property on the register classes that isn't inferred the same way after this patch. Hmm. We have probably
been "lucky" with iteration order or something in the past. I'll keep digging on my end as I suspect this is a downstream issue.

Inferring super-register classes naively requires checking every register class against every other register class and sub-register index. Each of those checks is itself a non-trivial operation on register sets. Culling as many (RC, RC, SubIdx) triples as possible is important for the running time of TableGen for architectures with complex sub-register relations. Use transitivity to cull many (RC, RC, SubIdx) triples. This unfortunately requires us to complete the transitive closure of super-register classes explicitly, but it still cuts down the running time on AMDGPU substantially -- in some upcoming work in the backend by more than half (in very rough measurements). This changes the names of some of the inferred register classes, since the order in which they are inferred changes. The names of the inferred register classes become shorter, which reduces the size of the generated files. Replacing some uses of SmallPtrSet by DenseSet shaves off a few more percent; there are hundreds of register classes in AMDGPU. Tweaking the topological signature check to skip reigsters without super-registers further helps skip register classes that have "pseudo" registers in them whose sub- and super-register structure is trivial.

nhaehnle · 2025-04-18T16:56:36Z

Just to confirm, this was really only supposed to improve compile time. So yeah, if you depended on iteration order for something additional downstream, you may have gotten unlucky now.

nhaehnle added 2 commits April 8, 2025 16:38

nhaehnle requested review from jayfoad, arsenm and Sisyph April 8, 2025 14:51

llvmbot added the tablegen label Apr 8, 2025

arsenm approved these changes Apr 8, 2025

View reviewed changes

arsenm added the llvm:regalloc label Apr 8, 2025

nhaehnle merged commit 9c31155 into llvm:main Apr 10, 2025
14 checks passed

nhaehnle deleted the pub-tblgen branch April 10, 2025 19:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TableGen: Optimize super-register class computation #134865

TableGen: Optimize super-register class computation #134865

Uh oh!

nhaehnle commented Apr 8, 2025

Uh oh!

llvmbot commented Apr 8, 2025 •

edited

Loading

Uh oh!

Uh oh!

llvm-ci commented Apr 11, 2025

Uh oh!

bjope commented Apr 15, 2025

Uh oh!

bjope commented Apr 15, 2025

Uh oh!

nhaehnle commented Apr 18, 2025

Uh oh!

Uh oh!

TableGen: Optimize super-register class computation #134865

TableGen: Optimize super-register class computation #134865

Uh oh!

Conversation

nhaehnle commented Apr 8, 2025

Uh oh!

llvmbot commented Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Apr 11, 2025

Uh oh!

bjope commented Apr 15, 2025

Uh oh!

bjope commented Apr 15, 2025

Uh oh!

nhaehnle commented Apr 18, 2025

Uh oh!

Uh oh!

llvmbot commented Apr 8, 2025 •

edited

Loading