Skip to content

[TableGen] Improvements to Named operands in InstrInfoEmitter #124960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

jurahul
Copy link
Contributor

@jurahul jurahul commented Jan 29, 2025

  • Assign OpName enum values in the same alphabetical order in which they are emitted.
  • Get rid of OPERAND_LAST which is not used anywhere.
  • Inline initOperandMapData into emitOperandNameMappings to help see clearly how various maps are computed.
  • Emit the static OperandMap table as int8_t when possible. This should help reduce the static size by 50% in the common case.
  • Change maps/vectors to use StringRef instead of std::string to avoid unnecessary copies.

@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2025

@llvm/pr-subscribers-tablegen

Author: Rahul Joshi (jurahul)

Changes
  • Assign OpName enum values in the same alphabetical order in which they are emitted (and get of OPERAND_LAST which is not used anywhere).
  • Inline initOperandMapData into emitOperandNameMappings to help see clearly how various maps are computed.
  • Emit the static OperandMap table as int8_t when possible. This should help reduce the static size by 50% in the common case.

Full diff: https://github.com/llvm/llvm-project/pull/124960.diff

1 Files Affected:

  • (modified) llvm/utils/TableGen/InstrInfoEmitter.cpp (+54-66)
diff --git a/llvm/utils/TableGen/InstrInfoEmitter.cpp b/llvm/utils/TableGen/InstrInfoEmitter.cpp
index 12401a2f246a14..4f1bbe117bc9e8 100644
--- a/llvm/utils/TableGen/InstrInfoEmitter.cpp
+++ b/llvm/utils/TableGen/InstrInfoEmitter.cpp
@@ -40,7 +40,7 @@
 
 using namespace llvm;
 
-cl::OptionCategory InstrInfoEmitterCat("Options for -gen-instr-info");
+static cl::OptionCategory InstrInfoEmitterCat("Options for -gen-instr-info");
 static cl::opt<bool> ExpandMIOperandInfo(
     "instr-info-expand-mi-operand-info",
     cl::desc("Expand operand's MIOperandInfo DAG into suboperands"),
@@ -67,13 +67,6 @@ class InstrInfoEmitter {
   typedef std::vector<OperandInfoTy> OperandInfoListTy;
   typedef std::map<OperandInfoTy, unsigned> OperandInfoMapTy;
 
-  /// The keys of this map are maps which have OpName enum values as their keys
-  /// and instruction operand indices as their values.  The values of this map
-  /// are lists of instruction names.
-  typedef std::map<std::map<unsigned, unsigned>, std::vector<std::string>>
-      OpNameMapTy;
-  typedef std::map<std::string, unsigned>::iterator StrUintMapIter;
-
   /// Generate member functions in the target-specific GenInstrInfo class.
   ///
   /// This method is used to custom expand TIIPredicate definitions.
@@ -95,15 +88,9 @@ class InstrInfoEmitter {
   void emitOperandTypeMappings(
       raw_ostream &OS, const CodeGenTarget &Target,
       ArrayRef<const CodeGenInstruction *> NumberedInstructions);
-  void
-  initOperandMapData(ArrayRef<const CodeGenInstruction *> NumberedInstructions,
-                     StringRef Namespace,
-                     std::map<std::string, unsigned> &Operands,
-                     OpNameMapTy &OperandMap);
   void emitOperandNameMappings(
       raw_ostream &OS, const CodeGenTarget &Target,
       ArrayRef<const CodeGenInstruction *> NumberedInstructions);
-
   void emitLogicalOperandSizeMappings(
       raw_ostream &OS, StringRef Namespace,
       ArrayRef<const CodeGenInstruction *> NumberedInstructions);
@@ -239,35 +226,6 @@ void InstrInfoEmitter::EmitOperandInfo(raw_ostream &OS,
   }
 }
 
-/// Initialize data structures for generating operand name mappings.
-///
-/// \param Operands [out] A map used to generate the OpName enum with operand
-///        names as its keys and operand enum values as its values.
-/// \param OperandMap [out] A map for representing the operand name mappings for
-///        each instructions.  This is used to generate the OperandMap table as
-///        well as the getNamedOperandIdx() function.
-void InstrInfoEmitter::initOperandMapData(
-    ArrayRef<const CodeGenInstruction *> NumberedInstructions,
-    StringRef Namespace, std::map<std::string, unsigned> &Operands,
-    OpNameMapTy &OperandMap) {
-  unsigned NumOperands = 0;
-  for (const CodeGenInstruction *Inst : NumberedInstructions) {
-    if (!Inst->TheDef->getValueAsBit("UseNamedOperandTable"))
-      continue;
-    std::map<unsigned, unsigned> OpList;
-    for (const auto &Info : Inst->Operands) {
-      StrUintMapIter I = Operands.find(Info.Name);
-
-      if (I == Operands.end()) {
-        I = Operands.insert(Operands.begin(), {Info.Name, NumOperands++});
-      }
-      OpList[I->second] = Info.MIOperandNo;
-    }
-    OperandMap[OpList].push_back(Namespace.str() +
-                                 "::" + Inst->TheDef->getName().str());
-  }
-}
-
 /// Generate a table and function for looking up the indices of operands by
 /// name.
 ///
@@ -283,22 +241,52 @@ void InstrInfoEmitter::emitOperandNameMappings(
     raw_ostream &OS, const CodeGenTarget &Target,
     ArrayRef<const CodeGenInstruction *> NumberedInstructions) {
   StringRef Namespace = Target.getInstNamespace();
-  // Map of operand names to their enumeration value.  This will be used to
-  // generate the OpName enum.
-  std::map<std::string, unsigned> Operands;
-  OpNameMapTy OperandMap;
 
-  initOperandMapData(NumberedInstructions, Namespace, Operands, OperandMap);
+  /// To facilitate assigning OpName enum values in the sorted alphabetical
+  /// order, we go through an indirection from OpName -> ID, and Enum -> ID.
+  /// This allows us to build the OpList and assign IDs to OpNames in a single
+  /// scan of the instructions below.
+
+  // Map of operand names to their ID.
+  std::map<std::string, unsigned> OperandNameToID;
+  // Map from operand name enum value -> ID.
+  std::vector<unsigned> OperandEnumToID;
+
+  /// The keys of this map is a map which have OpName ID values as their keys
+  /// and instruction operand indices as their values. The values of this map
+  /// are lists of instruction names. This map helps to unique entries among
+  /// instructions that have identical OpName -> Operand index mapping.
+  std::map<std::map<unsigned, unsigned>, std::vector<std::string>> OperandMap;
+
+  // Max operand index seen.
+  unsigned MaxOperandNo = 0;
+  unsigned NumOperandNames = 0;
+
+  for (const CodeGenInstruction *Inst : NumberedInstructions) {
+    if (!Inst->TheDef->getValueAsBit("UseNamedOperandTable"))
+      continue;
+    std::map<unsigned, unsigned> OpList;
+    for (const auto &Info : Inst->Operands) {
+      auto [I, Inserted] = OperandNameToID.try_emplace(Info.Name, 0);
+      if (Inserted)
+        I->second = NumOperandNames++;
+      OpList[I->second] = Info.MIOperandNo;
+      MaxOperandNo = std::max(MaxOperandNo, Info.MIOperandNo);
+    }
+    OperandMap[OpList].push_back(Inst->TheDef->getName().str());
+  }
+
+  OperandEnumToID.reserve(NumOperandNames);
+  for (const auto & Op: OperandNameToID)
+    OperandEnumToID.push_back(Op.second);
 
   OS << "#ifdef GET_INSTRINFO_OPERAND_ENUM\n";
   OS << "#undef GET_INSTRINFO_OPERAND_ENUM\n";
   OS << "namespace llvm::" << Namespace << "::OpName {\n";
   OS << "enum {\n";
-  for (const auto &Op : Operands)
-    OS << "  " << Op.first << " = " << Op.second << ",\n";
-
-  OS << "  OPERAND_LAST";
-  OS << "\n};\n";
+  for (const auto &[I, Op] : enumerate(OperandNameToID))
+    OS << "  " << Op.first << " = " << I << ",\n";
+  OS << "};\n";
   OS << "} // end namespace llvm::" << Namespace << "::OpName\n";
   OS << "#endif //GET_INSTRINFO_OPERAND_ENUM\n\n";
 
@@ -307,28 +295,28 @@ void InstrInfoEmitter::emitOperandNameMappings(
   OS << "namespace llvm::" << Namespace << " {\n";
   OS << "LLVM_READONLY\n";
   OS << "int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx) {\n";
-  if (!Operands.empty()) {
-    OS << "  static const int16_t OperandMap [][" << Operands.size()
-       << "] = {\n";
+  if (NumOperandNames != 0) {
+    assert(MaxOperandNo <= std::numeric_limits<int16_t>::max());
+    StringRef Type = MaxOperandNo <= std::numeric_limits<int8_t>::max() ? "int8_t" : "int16_t";
+    OS << "  static constexpr " << Type << " OperandMap[][" << NumOperandNames << "] = {\n";
     for (const auto &Entry : OperandMap) {
       const std::map<unsigned, unsigned> &OpList = Entry.first;
-      OS << "{";
-
-      // Emit a row of the OperandMap table
-      for (unsigned i = 0, e = Operands.size(); i != e; ++i)
-        OS << (OpList.count(i) == 0 ? -1 : (int)OpList.find(i)->second) << ", ";
 
+      // Emit a row of the OperandMap table.
+      OS << "  {";
+      for (int i = 0; i < static_cast<int>(NumOperandNames); ++i) {
+        unsigned ID = OperandEnumToID[i];
+        OS << (OpList.count(ID) ? (int)OpList.find(ID)->second : -1) << ", ";
+      }
       OS << "},\n";
     }
     OS << "};\n";
 
     OS << "  switch(Opcode) {\n";
-    unsigned TableIndex = 0;
-    for (const auto &Entry : OperandMap) {
+    for (const auto &[TableIndex, Entry] : enumerate(OperandMap)) {
       for (const std::string &Name : Entry.second)
-        OS << "  case " << Name << ":\n";
-
-      OS << "    return OperandMap[" << TableIndex++ << "][NamedIdx];\n";
+        OS << "  case " << Namespace << "::" << Name << ":\n";
+      OS << "    return OperandMap[" << TableIndex << "][NamedIdx];\n";
     }
     OS << "  default: return -1;\n";
     OS << "  }\n";

Copy link

github-actions bot commented Jan 29, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jurahul jurahul force-pushed the instr_info_named_operand_idx branch 2 times, most recently from 13e662c to 114f547 Compare January 29, 2025 18:30
- Assign OpName enum values in the same alphabetical order in which
  they are emitted (and get of OPERAND_LAST which is not used anywhere).
- Inline `initOperandMapData` into `emitOperandNameMappings` to help
  see clearly how various maps are computed.
- Emit the static `OperandMap` table as int8_t when possible. This
  should help reduce the static size by 50% in the common case.
@jurahul jurahul force-pushed the instr_info_named_operand_idx branch from 114f547 to 3e865ea Compare January 30, 2025 17:15
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jurahul jurahul merged commit fdfd979 into llvm:main Jan 30, 2025
8 checks passed
@jurahul jurahul deleted the instr_info_named_operand_idx branch January 30, 2025 21:06
@nvjle
Copy link
Contributor

nvjle commented Jan 31, 2025

Hi @jurahul ,

My downstream backend uses sentinel OPERAND_LAST (perhaps others do as well). It seems best to restore that line. It is pretty common for TableGen to emit sentinels in enums like this, and they can be useful.

jurahul added a commit to jurahul/llvm-project that referenced this pull request Jan 31, 2025
- Looks like this sentinel value is used in some downstream backends,
  so restore emitting it.
- It now also has the correct value (earlier code may hve emitted an
  incorrect value for OPERAND_LAST and hence it was removed in
  llvm#124960)
@topperc
Copy link
Collaborator

topperc commented Jan 31, 2025

Hi @jurahul ,

My downstream backend uses sentinel OPERAND_LAST (perhaps others do as well). It seems best to restore that line. It is pretty common for TableGen to emit sentinels in enums like this, and they can be useful.

Can you say more about what you use it for? If I understand correctly the value wasn't guaranteed to be a value that wasn't used elsewhere in the enum.

@jurahul
Copy link
Contributor Author

jurahul commented Jan 31, 2025

Hi @jurahul ,
My downstream backend uses sentinel OPERAND_LAST (perhaps others do as well). It seems best to restore that line. It is pretty common for TableGen to emit sentinels in enums like this, and they can be useful.

Can you say more about what you use it for? If I understand correctly the value wasn't guaranteed to be a value that wasn't used elsewhere in the enum.

Really as a sentinel value, for example. to indicate that the enum value is not present. For example, a function that returns OpName can return OPERAND_LAST to indicate that it did not find the operand it was looking for. We could also instead use std::optional<OpName> but as @nvjle said its best to restore it as other downstream backends may be using this as well.

@topperc
Copy link
Collaborator

topperc commented Jan 31, 2025

Hi @jurahul ,
My downstream backend uses sentinel OPERAND_LAST (perhaps others do as well). It seems best to restore that line. It is pretty common for TableGen to emit sentinels in enums like this, and they can be useful.

Can you say more about what you use it for? If I understand correctly the value wasn't guaranteed to be a value that wasn't used elsewhere in the enum.

Really as a sentinel value, for example. to indicate that the enum value is not present. For example, a function that returns OpName can return OPERAND_LAST to indicate that it did not find the operand it was looking for. We could also instead use std::optional<OpName> but as @nvjle said its best to restore it as other downstream backends may be using this as well.

It's not named the same way most enums in LLVM are. "LAST" usually refers a value that is inclusive in the range. See LAST_FCMP_PREDICATE, LAST_MEMORY_OPCODE, LAST_*_INST. There are other exceptions to this, but that seemed to be the general pattern. enum values one past the end of the range are usually labeled END or NUM like NUM_TARGET_REGS or INSTRUCTION_LIST_END.

jurahul added a commit that referenced this pull request Jan 31, 2025
…125265)

- Looks like this sentinel value is used in some downstream backends, so
restore emitting it.
- It now also has the correct value (earlier code may have emitted an
incorrect value for OPERAND_LAST and hence it was removed in
#124960)
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 31, 2025
…nfoEmitter (#125265)

- Looks like this sentinel value is used in some downstream backends, so
restore emitting it.
- It now also has the correct value (earlier code may have emitted an
incorrect value for OPERAND_LAST and hence it was removed in
llvm/llvm-project#124960)
@jurahul
Copy link
Contributor Author

jurahul commented Jan 31, 2025

Hi @jurahul ,
My downstream backend uses sentinel OPERAND_LAST (perhaps others do as well). It seems best to restore that line. It is pretty common for TableGen to emit sentinels in enums like this, and they can be useful.

Can you say more about what you use it for? If I understand correctly the value wasn't guaranteed to be a value that wasn't used elsewhere in the enum.

Really as a sentinel value, for example. to indicate that the enum value is not present. For example, a function that returns OpName can return OPERAND_LAST to indicate that it did not find the operand it was looking for. We could also instead use std::optional<OpName> but as @nvjle said its best to restore it as other downstream backends may be using this as well.

It's not named the same way most enums in LLVM are. "LAST" usually refers a value that is inclusive in the range. See LAST_FCMP_PREDICATE, LAST_MEMORY_OPCODE, LAST_*_INST. There are other exceptions to this, but that seemed to be the general pattern. enum values one past the end of the range are usually labeled END or NUM like NUM_TARGET_REGS or INSTRUCTION_LIST_END.

Makes sense. We can add a NUM_OPERAND_NAMES alias to start with and remove OPERAND_LAST once all other users (downstream) migrate to it.

@mariusz-sikora-at-amd
Copy link
Contributor

Hi @jurahul ,
We are also using OPERAND_LAST in downstream branch and we noticed that it was/is accessing first element from next row in OperandMap 2dim table. Proposal: #125424

namespace llvm::AMDGPU::OpName {
enum {
  a16 = 0,
 ...
  OPERAND_LAST = 115,

int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx) {
  static constexpr int8_t OperandMap[][115] = {

@jurahul
Copy link
Contributor Author

jurahul commented Feb 2, 2025

Hi @jurahul , We are also using OPERAND_LAST in downstream branch and we noticed that it was/is accessing first element from next row in OperandMap 2dim table. Proposal: #125424

namespace llvm::AMDGPU::OpName {
enum {
  a16 = 0,
 ...
  OPERAND_LAST = 115,

int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIdx) {
  static constexpr int8_t OperandMap[][115] = {

I'd say passing OPERAND_LAST is an error into this function. It worked because it wasn't actually the last enum in many cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants