Skip to content

[GlobalISel] Optimized MatchData Handling #92115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Pierre-vh
Copy link
Contributor

@Pierre-vh Pierre-vh commented May 14, 2024

Surprisingly, MatchDatas were very, very expensive in GlobalISel Combiners. The time spent on constructing the big MatchData struct, and copying it to reset it (operator=) was equal or greater than the time spent iterating the MatchTable itself and executing it (excluding the time spent in callees, of course).

This was because I put every MatchData we could possibly need in a big struct, and rebuilt that struct every instruction we looked at. Turns out that when all your function is doing is iterating over a byte array and executing opcodes, the time spent allocating and initializing a huge non-trivially constructible/destructible struct over and over again is very, very significant.

Now, we use a simpler and more efficient strategy: be as lazy as possible, only allocate and deallocate as needed, end of story. The backend doesn't think about it anymore, and we simply don't pay for what we don't use.

Surprisingly, MatchDatas were very, very expensive in GlobalISel Combiners.
The time spent on constructing the big MatchData struct, and copying it to reset it (`operator=`) was equal or greater than the time spent iterating the MatchTable itself and executing it (excluding the time spent in callees, of course).

This was because I put every MatchData we could possibly need in a big struct, and rebuilt that struct every instruction we looked at.
Turns out that when all your function is doing is iterating over a byte array and executing opcodes, the time spent allocating and initializing a huge non-trivially constructible/destructible struct is very, very significant.

Now, we use a simpler and more efficient strategy: be as lazy as possible, only allocate and deallocate as needed, end of story. The backend doesn't think about it anymore, and we simply don't pay for what we don't use.
@llvmbot
Copy link
Member

llvmbot commented May 14, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Pierre van Houtryve (Pierre-vh)

Changes

Surprisingly, MatchDatas were very, very expensive in GlobalISel Combiners. The time spent on constructing the big MatchData struct, and copying it to reset it (operator=) was equal or greater than the time spent iterating the MatchTable itself and executing it (excluding the time spent in callees, of course).

This was because I put every MatchData we could possibly need in a big struct, and rebuilt that struct every instruction we looked at. Turns out that when all your function is doing is iterating over a byte array and executing opcodes, the time spent allocating and initializing a huge non-trivially constructible/destructible struct is very, very significant.

Now, we use a simpler and more efficient strategy: be as lazy as possible, only allocate and deallocate as needed, end of story. The backend doesn't think about it anymore, and we simply don't pay for what we don't use.


Full diff: https://github.com/llvm/llvm-project/pull/92115.diff

7 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/Combiner.h (+50)
  • (modified) llvm/test/TableGen/GlobalISelCombinerEmitter/match-table.td (+13-10)
  • (modified) llvm/test/TableGen/GlobalISelCombinerEmitter/pattern-parsing.td (+4-4)
  • (modified) llvm/utils/TableGen/Common/CMakeLists.txt (-1)
  • (removed) llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.cpp (-49)
  • (removed) llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.h (-90)
  • (modified) llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp (+77-29)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h b/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h
index f826601544932..c3b483bdc200f 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h
@@ -60,6 +60,56 @@ class Combiner : public GIMatchTableExecutor {
 
   bool combineMachineInstrs();
 
+  /// Base class for all dynamic MatchDatas definitions, used for type erasure
+  /// purposes.
+  ///
+  /// As derived classes may have non-trivial destructors, we need to be able to
+  /// call the destructor through the unique_ptr of the base class.
+  ///
+  /// There are two ways to achieve this.
+  ///   - (Easiest) Make MatchDataBase have a virtual destructor.
+  ///   - Use a custom deleter
+  ///
+  /// We use solution number 2, that way we don't have a vtable in all MatchData
+  /// objects (making them more compact), and we can have trivially destructible
+  /// MatchDatas, which avoids useless virtual function calls and allows the
+  /// compiler/libraries to use faster code (as it knows no destructor needs to
+  /// be called).
+  ///
+  /// This is really a micro-optimization, but MatchData used to be a
+  /// performance issue so better safe than sorry.
+  struct MatchDataBase {};
+
+  /// If a MatchData instance is active, cast it to Ty and return it.
+  /// Otherwise, create a new MatchData instance of type Ty and return it.
+  template <typename Ty> Ty &getOrCreateMatchData() const {
+    static_assert(std::is_base_of_v<MatchDataBase, Ty>,
+                  "Type must derive from MatchDataBase to be allocatable!");
+    // Allocate if we have no MatchData active, or if the MatchData that's
+    // active is not the one that we want.
+    if (!MatchData || Ty::ID != MatchDataID) {
+      MatchDataID = Ty::ID;
+      MatchData = std::unique_ptr<Ty, MatchDataDeleter>(
+          new Ty(), [](MatchDataBase *MD) { delete static_cast<Ty *>(MD); });
+    }
+    return *static_cast<Ty *>(MatchData.get());
+  }
+
+  /// Deallocates the currently active MatchData instance.
+  ///
+  /// TODO: Can we get away with never actually calling this? The MatchData
+  /// instance would then just be deleted by getOrCreateMatchData or when the
+  /// Combiner class is deallocated. I'm just a bit scared it'd cause issues
+  /// with captured values in some of the lambdas used as MatchInfo.
+  void resetMatchData() const { MatchData.reset(); }
+
+private:
+  using MatchDataDeleter = void (*)(MatchDataBase *);
+
+  mutable std::unique_ptr<MatchDataBase, MatchDataDeleter> MatchData = {
+      nullptr, [](auto *) {}};
+  mutable unsigned MatchDataID = unsigned(-1);
+
 protected:
   CombinerInfo &CInfo;
   GISelChangeObserver &Observer;
diff --git a/llvm/test/TableGen/GlobalISelCombinerEmitter/match-table.td b/llvm/test/TableGen/GlobalISelCombinerEmitter/match-table.td
index 1052e31b2d051..837f4c76c0f18 100644
--- a/llvm/test/TableGen/GlobalISelCombinerEmitter/match-table.td
+++ b/llvm/test/TableGen/GlobalISelCombinerEmitter/match-table.td
@@ -76,8 +76,15 @@ def MyCombiner: GICombiner<"GenMyCombiner", [
 
 // We have at most 2 registers used by one rule at a time, so we should only have 2 registers MDInfos.
 
-// CHECK:      struct MatchInfosTy {
-// CHECK-NEXT:   Register MDInfo0, MDInfo1;
+// CHECK:      struct MatchDataRule2 : MatchDataBase {
+// CHECK-NEXT:   static constexpr unsigned ID = 2;
+// CHECK-NEXT:   Register   r0;
+// CHECK-NEXT:   Register   r1;
+// CHECK-NEXT: };
+
+// CHECK:      struct MatchDataRule3 : MatchDataBase {
+// CHECK-NEXT:   static constexpr unsigned ID = 3;
+// CHECK-NEXT:   Register   r0;
 // CHECK-NEXT: };
 
 // Check predicates
@@ -96,13 +103,9 @@ def MyCombiner: GICombiner<"GenMyCombiner", [
 // CHECK-NEXT:   B.setInstrAndDebugLoc(I);
 // CHECK-NEXT:   State.MIs.clear();
 // CHECK-NEXT:   State.MIs.push_back(&I);
-// CHECK-NEXT:   MatchInfos = MatchInfosTy();
-// CHECK-EMPTY:
-// CHECK-NEXT:   if (executeMatchTable(*this, State, ExecInfo, B, getMatchTable(), *ST.getInstrInfo(), MRI, *MRI.getTargetRegisterInfo(), *ST.getRegBankInfo(), AvailableFeatures, /*CoverageInfo*/ nullptr))
-// CHECK-NEXT:     return true;
-// CHECK-NEXT:   }
-// CHECK-EMPTY:
-// CHECK-NEXT:   return false;
+// CHECK-NEXT:   const bool Result = executeMatchTable(*this, State, ExecInfo, B, getMatchTable(), *ST.getInstrInfo(), MRI, *MRI.getTargetRegisterInfo(), *ST.getRegBankInfo(), AvailableFeatures, /*CoverageInfo*/ nullptr);
+// CHECK-NEXT:   resetMatchData();
+// CHECK-NEXT:   return Result;
 // CHECK-NEXT: }
 
 // Check apply
@@ -120,7 +123,7 @@ def MyCombiner: GICombiner<"GenMyCombiner", [
 // CHECK-NEXT:   }
 // CHECK-NEXT:   case GICXXCustomAction_CombineApplyGICombiner1:{
 // CHECK-NEXT:     Helper.getBuilder().setInstrAndDebugLoc(*State.MIs[0]);
-// CHECK-NEXT:     APPLY MatchInfos.MDInfo0, MatchInfos.MDInfo1
+// CHECK-NEXT:     APPLY getOrCreateMatchData<MatchDataRule2>().r0, getOrCreateMatchData<MatchDataRule2>().r1
 // CHECK-NEXT:     return;
 // CHECK-NEXT:   }
 // CHECK-NEXT:   case GICXXCustomAction_CombineApplyGICombiner2:{
diff --git a/llvm/test/TableGen/GlobalISelCombinerEmitter/pattern-parsing.td b/llvm/test/TableGen/GlobalISelCombinerEmitter/pattern-parsing.td
index c261ec4575453..3de2924608faa 100644
--- a/llvm/test/TableGen/GlobalISelCombinerEmitter/pattern-parsing.td
+++ b/llvm/test/TableGen/GlobalISelCombinerEmitter/pattern-parsing.td
@@ -64,8 +64,8 @@ def InstTest0 : GICombineRule<
   (apply [{ APPLY }])>;
 
 // CHECK:      (CombineRule name:InstTest1 id:3 root:d
-// CHECK-NEXT:   (MatchDatas
-// CHECK-NEXT:      (MatchDataInfo pattern_symbol:r0 type:'Register' var_name:MDInfo0)
+// CHECK-NEXT:   (MatchData typename:MatchDataRule3 id:3
+// CHECK-NEXT:     (MatchDataDef symbol:r0 type:Register)
 // CHECK-NEXT:   )
 // CHECK-NEXT:   (MatchPats
 // CHECK-NEXT:     <match_root>d:(CodeGenInstructionPattern COPY operands:[<def>$a, i32:$b])
@@ -89,8 +89,8 @@ def InstTest1 : GICombineRule<
   (apply [{ APPLY }])>;
 
 // CHECK:      (CombineRule name:InstTest2 id:4 root:d
-// CHECK-NEXT:   (MatchDatas
-// CHECK-NEXT:     (MatchDataInfo pattern_symbol:r0 type:'Register' var_name:MDInfo0)
+// CHECK-NEXT:   (MatchData typename:MatchDataRule4 id:4
+// CHECK-NEXT:     (MatchDataDef symbol:r0 type:Register)
 // CHECK-NEXT:   )
 // CHECK-NEXT:   (MatchPats
 // CHECK-NEXT:     <match_root>__InstTest2_match_0:(CodeGenInstructionPattern COPY operands:[<def>$d, (i32 0):$x])
diff --git a/llvm/utils/TableGen/Common/CMakeLists.txt b/llvm/utils/TableGen/Common/CMakeLists.txt
index c31ed5a1de690..30f188ae48a2a 100644
--- a/llvm/utils/TableGen/Common/CMakeLists.txt
+++ b/llvm/utils/TableGen/Common/CMakeLists.txt
@@ -16,7 +16,6 @@ add_llvm_library(LLVMTableGenCommon STATIC OBJECT EXCLUDE_FROM_ALL
   GlobalISel/CXXPredicates.cpp
   GlobalISel/GlobalISelMatchTable.cpp
   GlobalISel/GlobalISelMatchTableExecutorEmitter.cpp
-  GlobalISel/MatchDataInfo.cpp
   GlobalISel/PatternParser.cpp
   GlobalISel/Patterns.cpp
 
diff --git a/llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.cpp b/llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.cpp
deleted file mode 100644
index b5c9e4f8c2485..0000000000000
--- a/llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.cpp
+++ /dev/null
@@ -1,49 +0,0 @@
-//===- MatchDataInfo.cpp ----------------------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-//
-//===----------------------------------------------------------------------===//
-
-#include "MatchDataInfo.h"
-#include "llvm/Support/Debug.h"
-#include "llvm/Support/raw_ostream.h"
-
-namespace llvm {
-namespace gi {
-
-StringMap<std::vector<std::string>> AllMatchDataVars;
-
-StringRef MatchDataInfo::getVariableName() const {
-  assert(hasVariableName());
-  return VarName;
-}
-
-void MatchDataInfo::print(raw_ostream &OS) const {
-  OS << "(MatchDataInfo pattern_symbol:" << PatternSymbol << " type:'" << Type
-     << "' var_name:" << (VarName.empty() ? "<unassigned>" : VarName) << ")";
-}
-
-void MatchDataInfo::dump() const { print(dbgs()); }
-
-void AssignMatchDataVariables(MutableArrayRef<MatchDataInfo> Infos) {
-  static unsigned NextVarID = 0;
-
-  StringMap<unsigned> SeenTypes;
-  for (auto &Info : Infos) {
-    unsigned &NumSeen = SeenTypes[Info.getType()];
-    auto &ExistingVars = AllMatchDataVars[Info.getType()];
-
-    if (NumSeen == ExistingVars.size())
-      ExistingVars.push_back("MDInfo" + std::to_string(NextVarID++));
-
-    Info.setVariableName(ExistingVars[NumSeen++]);
-  }
-}
-
-} // namespace gi
-} // namespace llvm
diff --git a/llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.h b/llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.h
deleted file mode 100644
index abe1245bc67d0..0000000000000
--- a/llvm/utils/TableGen/Common/GlobalISel/MatchDataInfo.h
+++ /dev/null
@@ -1,90 +0,0 @@
-//===- MatchDataInfo.h ------------------------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-/// \file Contains utilities related to handling "match data" for GlobalISel
-///  Combiners. Match data allows for setting some arbitrary data in the "match"
-///  phase and pass it down to the "apply" phase.
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef LLVM_UTILS_MIRPATTERNS_MATCHDATAINFO_H
-#define LLVM_UTILS_MIRPATTERNS_MATCHDATAINFO_H
-
-#include "llvm/ADT/ArrayRef.h"
-#include "llvm/ADT/StringMap.h"
-#include "llvm/ADT/StringRef.h"
-#include <string>
-#include <vector>
-
-namespace llvm {
-
-class raw_ostream;
-
-namespace gi {
-
-/// Represents MatchData defined by the match stage and required by the apply
-/// stage.
-///
-/// This allows the plumbing of arbitrary data from C++ predicates between the
-/// stages.
-///
-/// When this class is initially created, it only has a pattern symbol and a
-/// type. When all of the MatchDatas declarations of a given pattern have been
-/// parsed, `AssignVariables` must be called to assign storage variable names to
-/// each MatchDataInfo.
-class MatchDataInfo {
-  StringRef PatternSymbol;
-  StringRef Type;
-  std::string VarName;
-
-public:
-  static constexpr StringLiteral StructTypeName = "MatchInfosTy";
-  static constexpr StringLiteral StructName = "MatchInfos";
-
-  MatchDataInfo(StringRef PatternSymbol, StringRef Type)
-      : PatternSymbol(PatternSymbol), Type(Type.trim()) {}
-
-  StringRef getPatternSymbol() const { return PatternSymbol; };
-  StringRef getType() const { return Type; };
-
-  bool hasVariableName() const { return !VarName.empty(); }
-  void setVariableName(StringRef Name) { VarName = Name; }
-  StringRef getVariableName() const;
-
-  std::string getQualifiedVariableName() const {
-    return StructName.str() + "." + getVariableName().str();
-  }
-
-  void print(raw_ostream &OS) const;
-  void dump() const;
-};
-
-/// Pool of type -> variables used to emit MatchData variables declarations.
-///
-/// e.g. if the map contains "int64_t" -> ["MD0", "MD1"], then two variable
-/// declarations must be emitted: `int64_t MD0` and `int64_t MD1`.
-///
-/// This has a static lifetime and will outlive all the `MatchDataInfo` objects
-/// by design. It needs a static lifetime so the backends can emit variable
-/// declarations after processing all the inputs.
-extern StringMap<std::vector<std::string>> AllMatchDataVars;
-
-/// Assign variable names to all MatchDatas used by a pattern. This must be
-/// called after all MatchData decls have been parsed for a given processing
-/// unit (e.g. a combine rule)
-///
-/// Requires an array of MatchDataInfo so we can handle cases where a pattern
-/// uses multiple instances of the same MatchData type.
-///
-/// Writes to \ref AllMatchDataVars.
-void AssignMatchDataVariables(MutableArrayRef<MatchDataInfo> Infos);
-
-} // namespace gi
-} // end namespace llvm
-
-#endif // ifndef LLVM_UTILS_MIRPATTERNS_MATCHDATAINFO_H
diff --git a/llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp b/llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
index ef9e9ff04f85f..02f84bd9f02df 100644
--- a/llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
+++ b/llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp
@@ -35,7 +35,6 @@
 #include "Common/GlobalISel/CombinerUtils.h"
 #include "Common/GlobalISel/GlobalISelMatchTable.h"
 #include "Common/GlobalISel/GlobalISelMatchTableExecutorEmitter.h"
-#include "Common/GlobalISel/MatchDataInfo.h"
 #include "Common/GlobalISel/PatternParser.h"
 #include "Common/GlobalISel/Patterns.h"
 #include "Common/SubtargetFeatureInfo.h"
@@ -605,6 +604,56 @@ CombineRuleOperandTypeChecker::getRuleEqClasses() const {
   return TECs;
 }
 
+//===- MatchData Handling -------------------------------------------------===//
+//
+/// Parsing
+///   Record all MatchData definitions seen.
+///
+///   Once all defs have been parsed, handleRuleMatchDataDefs is used to create
+///   a new MatchDataStruct (if at least one MatchDataDef exists) and add it to
+///   AllMatchDataStructDecls for emission later.
+///
+/// Layout
+///   Each rule that has at least one MatchData will create a struct (See
+///   getMatchDataStructTypeName) that derives from Combiner::MatchDataBase
+///   and containing each MatchData definition. Fields in that struct are
+///   named after the MatchData symbols.
+///
+/// Access/Code Expansion
+///   Combiner::getOrCreateMatchData is used to fetch (and lazily allocate)
+///   MatchDatas as they're needed.
+struct MatchDataDef {
+  MatchDataDef(StringRef Symbol, StringRef Type) : Symbol(Symbol), Type(Type) {}
+
+  StringRef Symbol;
+  StringRef Type;
+};
+
+struct MatchDataStruct {
+  std::string TypeName;
+  unsigned ID;
+  std::vector<MatchDataDef> Fields;
+};
+
+static std::vector<std::unique_ptr<MatchDataStruct>> AllMatchDataStructs;
+
+/// If \p Defs has at least one item, prepare a new MatchDataStruct to contain
+/// this rule's MatchData defs. If \p Defs is empty, return nullptr.
+static const MatchDataStruct *
+handleRuleMatchDataDefs(ArrayRef<MatchDataDef> Defs, unsigned RuleID) {
+  if (Defs.empty())
+    return nullptr;
+
+  auto MDS = std::make_unique<MatchDataStruct>();
+  MDS->TypeName = "MatchDataRule" + std::to_string(RuleID);
+  MDS->ID = RuleID;
+  for (const auto &Def : Defs)
+    MDS->Fields.push_back(Def);
+  auto *Ptr = MDS.get();
+  AllMatchDataStructs.push_back(std::move(MDS));
+  return Ptr;
+}
+
 //===- CombineRuleBuilder -------------------------------------------------===//
 
 /// Parses combine rule and builds a small intermediate representation to tie
@@ -777,8 +826,8 @@ class CombineRuleBuilder {
   Pattern *MatchRoot = nullptr;
   SmallDenseSet<InstructionPattern *, 2> ApplyRoots;
 
-  SmallVector<MatchDataInfo, 2> MatchDatas;
   SmallVector<PatternAlternatives, 1> PermutationsToEmit;
+  const MatchDataStruct *RuleMatchData = nullptr;
 };
 
 bool CombineRuleBuilder::parseAll() {
@@ -842,13 +891,12 @@ void CombineRuleBuilder::print(raw_ostream &OS) const {
   OS << "(CombineRule name:" << RuleDef.getName() << " id:" << RuleID
      << " root:" << RootName << '\n';
 
-  if (!MatchDatas.empty()) {
-    OS << "  (MatchDatas\n";
-    for (const auto &MD : MatchDatas) {
-      OS << "    ";
-      MD.print(OS);
-      OS << '\n';
-    }
+  if (RuleMatchData) {
+    OS << "  (MatchData typename:" << RuleMatchData->TypeName
+       << " id:" << RuleMatchData->ID << "\n";
+    for (const auto &MD : RuleMatchData->Fields)
+      OS << "    (MatchDataDef symbol:" << MD.Symbol << " type:" << MD.Type
+         << ")\n";
     OS << "  )\n";
   }
 
@@ -1007,8 +1055,13 @@ bool CombineRuleBuilder::addMatchPattern(std::unique_ptr<Pattern> Pat) {
 
 void CombineRuleBuilder::declareAllMatchDatasExpansions(
     CodeExpansions &CE) const {
-  for (const auto &MD : MatchDatas)
-    CE.declare(MD.getPatternSymbol(), MD.getQualifiedVariableName());
+  if (!RuleMatchData)
+    return;
+
+  const std::string BaseAccessor =
+      "getOrCreateMatchData<" + RuleMatchData->TypeName + ">().";
+  for (const auto &MD : RuleMatchData->Fields)
+    CE.declare(MD.Symbol, (BaseAccessor + MD.Symbol).str());
 }
 
 void CombineRuleBuilder::addCXXPredicate(RuleMatcher &M,
@@ -1429,6 +1482,7 @@ bool CombineRuleBuilder::parseDefs(const DagInit &Def) {
     return false;
   }
 
+  SmallVector<MatchDataDef, 2> MatchDatas;
   SmallVector<StringRef> Roots;
   for (unsigned I = 0, E = Def.getNumArgs(); I < E; ++I) {
     if (isSpecificDef(*Def.getArg(I), "root")) {
@@ -1463,9 +1517,7 @@ bool CombineRuleBuilder::parseDefs(const DagInit &Def) {
   }
 
   RootName = Roots.front();
-
-  // Assign variables to all MatchDatas.
-  AssignMatchDataVariables(MatchDatas);
+  RuleMatchData = handleRuleMatchDataDefs(MatchDatas, RuleID);
   return true;
 }
 
@@ -2371,15 +2423,12 @@ void GICombinerEmitter::emitAdditionalImpl(raw_ostream &OS) {
      << "  B.setInstrAndDebugLoc(I);\n"
      << "  State.MIs.clear();\n"
      << "  State.MIs.push_back(&I);\n"
-     << "  " << MatchDataInfo::StructName << " = "
-     << MatchDataInfo::StructTypeName << "();\n\n"
-     << "  if (executeMatchTable(*this, State, ExecInfo, B"
+     << "  const bool Result = executeMatchTable(*this, State, ExecInfo, B"
      << ", getMatchTable(), *ST.getInstrInfo(), MRI, "
         "*MRI.getTargetRegisterInfo(), *ST.getRegBankInfo(), AvailableFeatures"
-     << ", /*CoverageInfo*/ nullptr)) {\n"
-     << "    return true;\n"
-     << "  }\n\n"
-     << "  return false;\n"
+     << ", /*CoverageInfo*/ nullptr);\n"
+     << "  resetMatchData();\n"
+     << "  return Result;\n"
      << "}\n\n";
 }
 
@@ -2472,14 +2521,13 @@ void GICombinerEmitter::emitRunCustomAction(raw_ostream &OS) {
 
 void GICombinerEmitter::emitAdditionalTemporariesDecl(raw_ostream &OS,
                                                       StringRef Indent) {
-  OS << Indent << "struct " << MatchDataInfo::StructTypeName << " {\n";
-  for (const auto &[Type, VarNames] : AllMatchDataVars) {
-    assert(!VarNames.empty() && "Cannot have no vars for this type!");
-    OS << Indent << "  " << Type << " " << join(VarNames, ", ") << ";\n";
-  }
-  OS << Indent << "};\n"
-     << Indent << "mutable " << MatchDataInfo::StructTypeName << " "
-     << MatchDataInfo::StructName << ";\n\n";
+  for (const auto &MDS : AllMatchDataStructs) {
+    OS << Indent << "struct " << MDS->TypeName << " : MatchDataBase {\n"
+       << Indent << "  static constexpr unsigned ID = " << MDS->ID << ";\n";
+    for (const auto &Field : MDS->Fields)
+      OS << Indent << "  " << Field.Type << " " << Field.Symbol << ";\n";
+    OS << Indent << "};\n\n";
+  }
 }
 
 GICombinerEmitter::GICombinerEmitter(RecordKeeper &RK,

@Pierre-vh
Copy link
Contributor Author

@aemerson If you have some time, can you check this on CTMark? I think it should be a neat improvement according to my profiling done using perf record but I'd like someone else to confirm

@jayfoad
Copy link
Contributor

jayfoad commented May 14, 2024

One related thing I have never understood is, why is every combine split into a "match" function and an "apply" function? Aren't they always effectively called in tandem like this:

  if (matchFunc(MatchData))
    applyFunc(MatchData);

? Is there a use case for calling "match" without "apply"?

@Pierre-vh
Copy link
Contributor Author

One related thing I have never understood is, why is every combine split into a "match" function and an "apply" function? Aren't they always effectively called in tandem like this:

  if (matchFunc(MatchData))
    applyFunc(MatchData);

? Is there a use case for calling "match" without "apply"?

Matchers can fail/reject after doing further analysis (e.g. calling KnowBits), so the matcher is like an analysis part and the apply is the actual transform.

We can also have C++ matcher but then have the apply written in C++. Not sure if we have any combine doing that but it's something that's definitely possible

@jayfoad
Copy link
Contributor

jayfoad commented May 14, 2024

One related thing I have never understood is, why is every combine split into a "match" function and an "apply" function? Aren't they always effectively called in tandem like this:

  if (matchFunc(MatchData))
    applyFunc(MatchData);

? Is there a use case for calling "match" without "apply"?

Matchers can fail/reject after doing further analysis (e.g. calling KnowBits), so the matcher is like an analysis part and the apply is the actual transform.

Right, but why not always compile the match+apply as a single C++ function like:

  bool matchAndApply(MI) {
    // match part goes here

    if (analysis failed)
      return false;

    // apply part goes here
    return true;
  }

Then you don't have to worry about managing the lifetime of matchdata objects outside of this function.

@aemerson
Copy link
Contributor

I'll get some perf numbers on this change, thanks.

Also curious: do you think there's any significant problem with our common use of the build function lambdas?

@aemerson
Copy link
Contributor

Strangely, after 2 attempts at running everything with 10x repeats each time, I'm seeing performance regressions, if any change.

Program                                       compile_time
                                              lhs          rhs    diff
Bullet/bullet                                  10.98        11.00  0.2%
7zip/7zip-benchmark                            14.76        14.78  0.2%
lencod/lencod                                   4.65         4.66  0.2%
tramp3d-v4/tramp3d-v4                           5.68         5.69  0.1%
mafft/pairlocalalign                            2.68         2.68  0.0%
SPASS/SPASS                                     4.57         4.57  0.0%
kimwitu++/kc                                    6.65         6.65 -0.0%
consumer-typeset/consumer-typeset               3.59         3.59 -0.0%
sqlite3/sqlite3                                 2.49         2.49 -0.1%
                           Geomean difference                      0.1%

This is measured in an MacBook with release mode + assertions disabled, no LTO.

@aemerson
Copy link
Contributor

I'm going to rebuild with ThinLTO to see if there's a difference there.

@aemerson
Copy link
Contributor

Similar thing with LTO:

Program                                       compile_time
                                              lhs          rhs    diff
kimwitu++/kc                                    6.30         6.33  0.4%
7zip/7zip-benchmark                            14.14        14.17  0.2%
Bullet/bullet                                  10.43        10.44  0.1%
sqlite3/sqlite3                                 2.36         2.36  0.1%
SPASS/SPASS                                     4.34         4.34  0.1%
consumer-typeset/consumer-typeset               3.41         3.40 -0.0%
mafft/pairlocalalign                            2.55         2.54 -0.0%
tramp3d-v4/tramp3d-v4                           5.37         5.36 -0.0%
lencod/lencod                                   4.41         4.41 -0.1%
                           Geomean difference                      0.1%

@aemerson
Copy link
Contributor

aemerson commented May 15, 2024

One thing I just noticed was that the binary sizes for clang w/ thinlto were quite different.
Before: 181781160
After: 182051240

That's a -0.15% size regression which is significant for such a relatively small change. Using a script to analyze function size diffs I see this as the top regressions (symbol, before, after, delta)

__ZNK12_GLOBAL__N_125AMDGPURegBankCombinerImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 940 1824 884
__ZNK12_GLOBAL__N_129RISCVPreLegalizerCombinerImpl18testMIPredicate_MIEjRKN4llvm12MachineInstrERKNS1_20GIMatchTableExecutor12MatcherStateE 8376 9552 1176
__ZNK12_GLOBAL__N_132AArch64PostLegalizerLoweringImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 7712 8996 1284
__ZN12_GLOBAL__N_19Demangler10parseLNameEPN4llvm16itanium_demangle12OutputBufferERNSt3__117basic_string_viewIcNS5_11char_traitsIcEEEEm 46275648 46277264 1616
__ZNK12_GLOBAL__N_130AMDGPUPreLegalizerCombinerImpl18testMIPredicate_MIEjRKN4llvm12MachineInstrERKNS1_20GIMatchTableExecutor12MatcherStateE 8500 10168 1668
__ZNK12_GLOBAL__N_131AArch64PreLegalizerCombinerImpl18testMIPredicate_MIEjRKN4llvm12MachineInstrERKNS1_20GIMatchTableExecutor12MatcherStateE 8708 10416 1708
__ZNK12_GLOBAL__N_131AMDGPUPostLegalizerCombinerImpl18testMIPredicate_MIEjRKN4llvm12MachineInstrERKNS1_20GIMatchTableExecutor12MatcherStateE 9004 10736 1732
__ZNK12_GLOBAL__N_132AArch64PostLegalizerCombinerImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 3720 6948 3228
__ZNK12_GLOBAL__N_129RISCVPreLegalizerCombinerImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 4296 7624 3328
__ZNK12_GLOBAL__N_130AMDGPUPreLegalizerCombinerImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 4424 7760 3336
__ZNK12_GLOBAL__N_131AMDGPUPostLegalizerCombinerImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 4812 8248 3436
__ZNK12_GLOBAL__N_131AArch64PreLegalizerCombinerImpl15runCustomActionEjRKN4llvm20GIMatchTableExecutor12MatcherStateERNS1_11SmallVectorINS1_19MachineInstrBuilderELj4EEE 4676 8328 3652

demangled:

(anonymous namespace)::AMDGPURegBankCombinerImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 940 1824 884
(anonymous namespace)::RISCVPreLegalizerCombinerImpl::testMIPredicate_MI(unsigned int, llvm::MachineInstr const&, llvm::GIMatchTableExecutor::MatcherState const&) const 8376 9552 1176
(anonymous namespace)::AArch64PostLegalizerLoweringImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 7712 8996 1284
(anonymous namespace)::Demangler::parseLName(llvm::itanium_demangle::OutputBuffer*, std::__1::basic_string_view<char, std::__1::char_traits<char>>&, unsigned long) 46275648 46277264 1616
(anonymous namespace)::AMDGPUPreLegalizerCombinerImpl::testMIPredicate_MI(unsigned int, llvm::MachineInstr const&, llvm::GIMatchTableExecutor::MatcherState const&) const 8500 10168 1668
(anonymous namespace)::AArch64PreLegalizerCombinerImpl::testMIPredicate_MI(unsigned int, llvm::MachineInstr const&, llvm::GIMatchTableExecutor::MatcherState const&) const 8708 10416 1708
(anonymous namespace)::AMDGPUPostLegalizerCombinerImpl::testMIPredicate_MI(unsigned int, llvm::MachineInstr const&, llvm::GIMatchTableExecutor::MatcherState const&) const 9004 10736 1732
(anonymous namespace)::AArch64PostLegalizerCombinerImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 3720 6948 3228
(anonymous namespace)::RISCVPreLegalizerCombinerImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 4296 7624 3328
(anonymous namespace)::AMDGPUPreLegalizerCombinerImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 4424 7760 3336
(anonymous namespace)::AMDGPUPostLegalizerCombinerImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 4812 8248 3436
(anonymous namespace)::AArch64PreLegalizerCombinerImpl::runCustomAction(unsigned int, llvm::GIMatchTableExecutor::MatcherState const&, llvm::SmallVector<llvm::MachineInstrBuilder, 4u>&) const 4676 8328 3652

I'm not sure what happened here, my guess this is all overeager inlining related.

@Pierre-vh
Copy link
Contributor Author

One related thing I have never understood is, why is every combine split into a "match" function and an "apply" function? Aren't they always effectively called in tandem like this:

  if (matchFunc(MatchData))
    applyFunc(MatchData);

? Is there a use case for calling "match" without "apply"?

Matchers can fail/reject after doing further analysis (e.g. calling KnowBits), so the matcher is like an analysis part and the apply is the actual transform.

Right, but why not always compile the match+apply as a single C++ function like:

  bool matchAndApply(MI) {
    // match part goes here

    if (analysis failed)
      return false;

    // apply part goes here
    return true;
  }

Then you don't have to worry about managing the lifetime of matchdata objects outside of this function.

That's interesting, I'm not sure if it's possible to do that but I'll add that to my list.

I'll get some perf numbers on this change, thanks.

Also curious: do you think there's any significant problem with our common use of the build function lambdas?

I haven't seen issues related to the build lambdas, no

As for the (size and runtime) regression, that's really weird. perf record led me to believe this would improve things.
The size regression is probably due to inlining of the getOrCreateMatchData all over the place indeed.

I'm wondering if the pointer cast + indirection to reach the MatchData is adding too much overhead, I'll rethink this and try to understand why

@Pierre-vh
Copy link
Contributor Author

I'll go with @jayfoad's idea: Call match & apply in the same function. I tested it and it works a lot better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants