
llvm-mca: Disentangle MemoryGroup from LSUnitBase #114159


Merged · 1 commit into llvm:main · Nov 18, 2024

Conversation

@andrej (Contributor) commented Oct 30, 2024

In MCA, the load/store unit is modeled through a LSUnitBase class. Judging from the name LSUnitBase, I believe there is an intent to allow for different specialized load/store unit implementations. (However, currently there is only one implementation used in-tree, LSUnit.)

PR #101534 fixed one instance where the specialized LSUnit was hard-coded, opening the door for other subclasses to be used, but what subclasses can do is, in my opinion, still overly limited due to a reliance on the MemoryGroup class, e.g. here.

The MemoryGroup class is currently used in the default LSUnit implementation to model data dependencies/hazards in the pipeline. MemoryGroups form a graph of memory dependencies that inform the scheduler when load/store instructions can be executed relative to each other.

In my eyes, this is an implementation detail. Other LSUnits may want to keep track of data dependencies in different ways. As a concrete example, a downstream use I am working on[1] uses a custom load/store unit that makes use of available aliasing information. I haven't been able to shoehorn our additional aliasing information into the existing MemoryGroup abstraction. I think there is no need to force subclasses to use MemoryGroups; users of LSUnitBase are only concerned with when, and for how long, a load/store instruction executes.

This PR instead leaves it up to subclasses how to model such dependencies, and only prescribes an abstract interface in LSUnitBase. It also moves data members and methods that are not necessary for that abstract interface from LSUnitBase into the LSUnit subclass. I decided to make MemoryGroup a protected nested class of LSUnit; that way, specializations may inherit from LSUnit and still make use of MemoryGroups if they wish (e.g. if they only want to override the dispatch method).
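The intended layering can be sketched in isolation. The following is a hypothetical, heavily stripped-down mirror of the restructured hierarchy (none of these simplified names or signatures are the real MCA API): the base class prescribes only pure virtual methods, LSUnit owns MemoryGroup as a protected nested type, and a further subclass can override just the dispatch policy while still reusing the group machinery.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical mirror of the restructured hierarchy: the base class
// prescribes an abstract interface only, with no mention of MemoryGroup.
struct LSUnitBase {
  virtual ~LSUnitBase() = default;
  virtual unsigned dispatch(unsigned InstrID) = 0; // returns a group token
  virtual bool isReady(unsigned Token) const = 0;
};

class LSUnit : public LSUnitBase {
protected:
  // MemoryGroup is now an implementation detail owned by LSUnit, but
  // protected so that subclasses of LSUnit can still reach it.
  struct MemoryGroup {
    unsigned NumInstructions = 0;
    void addInstruction() { ++NumInstructions; }
  };
  std::map<unsigned, std::unique_ptr<MemoryGroup>> Groups;
  unsigned NextGroupID = 1;

  unsigned createMemoryGroup() {
    Groups[NextGroupID] = std::make_unique<MemoryGroup>();
    return NextGroupID++;
  }

public:
  unsigned dispatch(unsigned) override {
    unsigned GID = createMemoryGroup();
    Groups[GID]->addInstruction();
    return GID;
  }
  bool isReady(unsigned Token) const override {
    return Groups.count(Token) != 0;
  }
};

// A specialization overrides only the dispatch policy; everything else,
// including MemoryGroup, is inherited unchanged.
class CustomLSUnit : public LSUnit {
public:
  unsigned dispatch(unsigned InstrID) override {
    // e.g. consult extra aliasing information here before grouping.
    return LSUnit::dispatch(InstrID);
  }
};
```

Subclassing LSUnitBase directly remains available for load/store models that want nothing to do with MemoryGroups.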

Drawbacks / Considerations

My reason for suggesting this PR is an out-of-tree use. As such, these changes don't introduce any new functionality for in-tree LLVM uses. However, in my opinion, these changes improve code clarity and prescribe a clear interface, which would be the main benefit for the LLVM community.

A drawback of the more abstract interface is that virtual dispatch is used in more places. However, note that virtual dispatch is already used in some performance-critical parts of LSUnitBase, e.g. the isAvailable and dispatch methods. As a quick check that these changes don't noticeably hurt performance, I ran time llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=3000 llvm/test/tools/llvm-mca/X86/BtVer2/dot-product.s before and after the changes; there was no observable difference in runtime (0.292 s total before, 0.286 s total after).

Alternatives

If there is a desire to require subclasses to use some form of MemoryGroups, it would also be possible to create something like an AbstractMemoryGroup interface and only reference that from within LSUnitBase. That would still allow subclassing of LSUnitBase to use different kinds of memory groups.
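As a rough illustration of that alternative (hypothetical names and deliberately minimal signatures, not part of this PR), LSUnitBase would reference only an abstract node type, and the default LSUnit would supply the concrete one:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the "AbstractMemoryGroup" alternative: LSUnitBase
// would see only this query interface for DAG nodes, so each load/store
// unit could supply its own node type.
struct AbstractMemoryGroup {
  virtual ~AbstractMemoryGroup() = default;
  virtual size_t getNumSuccessors() const = 0;
  virtual bool isReady() const = 0;
  virtual bool isWaiting() const = 0;
};

// The default LSUnit's node would then implement that interface, keeping
// its predecessor bookkeeping private.
class DefaultMemoryGroup : public AbstractMemoryGroup {
  unsigned NumPredecessors = 0;
  unsigned NumExecutedPredecessors = 0;

public:
  size_t getNumSuccessors() const override { return 0; }
  bool isReady() const override {
    return NumExecutedPredecessors == NumPredecessors;
  }
  bool isWaiting() const override {
    return NumPredecessors > NumExecutedPredecessors;
  }
};
```

This keeps the DAG in the shared interface while still letting subclasses change its node representation.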

Thank you in advance for giving this your consideration!

[1]: MCAD started by @mshockwave and @chinmaydd.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot (Member) commented Oct 30, 2024

@llvm/pr-subscribers-tools-llvm-mca

Author: André Rösti (andrej)

Changes



Patch is 22.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114159.diff

3 Files Affected:

  • (modified) llvm/include/llvm/MCA/HardwareUnits/LSUnit.h (+243-214)
  • (modified) llvm/lib/MCA/HardwareUnits/LSUnit.cpp (+15-21)
  • (modified) llvm/lib/MCA/HardwareUnits/Scheduler.cpp (+3-2)
diff --git a/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h b/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
index 81a5453bac26c4..38d6b77916ca8b 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
@@ -24,168 +24,6 @@
 namespace llvm {
 namespace mca {
 
-/// A node of a memory dependency graph. A MemoryGroup describes a set of
-/// instructions with same memory dependencies.
-///
-/// By construction, instructions of a MemoryGroup don't depend on each other.
-/// At dispatch stage, instructions are mapped by the LSUnit to MemoryGroups.
-/// A Memory group identifier is then stored as a "token" in field
-/// Instruction::LSUTokenID of each dispatched instructions. That token is used
-/// internally by the LSUnit to track memory dependencies.
-class MemoryGroup {
-  unsigned NumPredecessors = 0;
-  unsigned NumExecutingPredecessors = 0;
-  unsigned NumExecutedPredecessors = 0;
-
-  unsigned NumInstructions = 0;
-  unsigned NumExecuting = 0;
-  unsigned NumExecuted = 0;
-  // Successors that are in a order dependency with this group.
-  SmallVector<MemoryGroup *, 4> OrderSucc;
-  // Successors that are in a data dependency with this group.
-  SmallVector<MemoryGroup *, 4> DataSucc;
-
-  CriticalDependency CriticalPredecessor;
-  InstRef CriticalMemoryInstruction;
-
-  MemoryGroup(const MemoryGroup &) = delete;
-  MemoryGroup &operator=(const MemoryGroup &) = delete;
-
-public:
-  MemoryGroup() = default;
-  MemoryGroup(MemoryGroup &&) = default;
-
-  size_t getNumSuccessors() const {
-    return OrderSucc.size() + DataSucc.size();
-  }
-  unsigned getNumPredecessors() const { return NumPredecessors; }
-  unsigned getNumExecutingPredecessors() const {
-    return NumExecutingPredecessors;
-  }
-  unsigned getNumExecutedPredecessors() const {
-    return NumExecutedPredecessors;
-  }
-  unsigned getNumInstructions() const { return NumInstructions; }
-  unsigned getNumExecuting() const { return NumExecuting; }
-  unsigned getNumExecuted() const { return NumExecuted; }
-
-  const InstRef &getCriticalMemoryInstruction() const {
-    return CriticalMemoryInstruction;
-  }
-  const CriticalDependency &getCriticalPredecessor() const {
-    return CriticalPredecessor;
-  }
-
-  void addSuccessor(MemoryGroup *Group, bool IsDataDependent) {
-    // Do not need to add a dependency if there is no data
-    // dependency and all instructions from this group have been
-    // issued already.
-    if (!IsDataDependent && isExecuting())
-      return;
-
-    Group->NumPredecessors++;
-    assert(!isExecuted() && "Should have been removed!");
-    if (isExecuting())
-      Group->onGroupIssued(CriticalMemoryInstruction, IsDataDependent);
-
-    if (IsDataDependent)
-      DataSucc.emplace_back(Group);
-    else
-      OrderSucc.emplace_back(Group);
-  }
-
-  bool isWaiting() const {
-    return NumPredecessors >
-           (NumExecutingPredecessors + NumExecutedPredecessors);
-  }
-  bool isPending() const {
-    return NumExecutingPredecessors &&
-           ((NumExecutedPredecessors + NumExecutingPredecessors) ==
-            NumPredecessors);
-  }
-  bool isReady() const { return NumExecutedPredecessors == NumPredecessors; }
-  bool isExecuting() const {
-    return NumExecuting && (NumExecuting == (NumInstructions - NumExecuted));
-  }
-  bool isExecuted() const { return NumInstructions == NumExecuted; }
-
-  void onGroupIssued(const InstRef &IR, bool ShouldUpdateCriticalDep) {
-    assert(!isReady() && "Unexpected group-start event!");
-    NumExecutingPredecessors++;
-
-    if (!ShouldUpdateCriticalDep)
-      return;
-
-    unsigned Cycles = IR.getInstruction()->getCyclesLeft();
-    if (CriticalPredecessor.Cycles < Cycles) {
-      CriticalPredecessor.IID = IR.getSourceIndex();
-      CriticalPredecessor.Cycles = Cycles;
-    }
-  }
-
-  void onGroupExecuted() {
-    assert(!isReady() && "Inconsistent state found!");
-    NumExecutingPredecessors--;
-    NumExecutedPredecessors++;
-  }
-
-  void onInstructionIssued(const InstRef &IR) {
-    assert(!isExecuting() && "Invalid internal state!");
-    ++NumExecuting;
-
-    // update the CriticalMemDep.
-    const Instruction &IS = *IR.getInstruction();
-    if ((bool)CriticalMemoryInstruction) {
-      const Instruction &OtherIS = *CriticalMemoryInstruction.getInstruction();
-      if (OtherIS.getCyclesLeft() < IS.getCyclesLeft())
-        CriticalMemoryInstruction = IR;
-    } else {
-      CriticalMemoryInstruction = IR;
-    }
-
-    if (!isExecuting())
-      return;
-
-    // Notify successors that this group started execution.
-    for (MemoryGroup *MG : OrderSucc) {
-      MG->onGroupIssued(CriticalMemoryInstruction, false);
-      // Release the order dependency with this group.
-      MG->onGroupExecuted();
-    }
-
-    for (MemoryGroup *MG : DataSucc)
-      MG->onGroupIssued(CriticalMemoryInstruction, true);
-  }
-
-  void onInstructionExecuted(const InstRef &IR) {
-    assert(isReady() && !isExecuted() && "Invalid internal state!");
-    --NumExecuting;
-    ++NumExecuted;
-
-    if (CriticalMemoryInstruction &&
-        CriticalMemoryInstruction.getSourceIndex() == IR.getSourceIndex()) {
-      CriticalMemoryInstruction.invalidate();
-    }
-
-    if (!isExecuted())
-      return;
-
-    // Notify data dependent successors that this group has finished execution.
-    for (MemoryGroup *MG : DataSucc)
-      MG->onGroupExecuted();
-  }
-
-  void addInstruction() {
-    assert(!getNumSuccessors() && "Cannot add instructions to this group!");
-    ++NumInstructions;
-  }
-
-  void cycleEvent() {
-    if (isWaiting() && CriticalPredecessor.Cycles)
-      CriticalPredecessor.Cycles--;
-  }
-};
-
 /// Abstract base interface for LS (load/store) units in llvm-mca.
 class LSUnitBase : public HardwareUnit {
   /// Load queue size.
@@ -214,10 +52,6 @@ class LSUnitBase : public HardwareUnit {
   /// alias with stores.
   const bool NoAlias;
 
-  /// Used to map group identifiers to MemoryGroups.
-  DenseMap<unsigned, std::unique_ptr<MemoryGroup>> Groups;
-  unsigned NextGroupID;
-
 public:
   LSUnitBase(const MCSchedModel &SM, unsigned LoadQueueSize,
              unsigned StoreQueueSize, bool AssumeNoAlias);
@@ -265,72 +99,35 @@ class LSUnitBase : public HardwareUnit {
   bool isSQFull() const { return SQSize && SQSize == UsedSQEntries; }
   bool isLQFull() const { return LQSize && LQSize == UsedLQEntries; }
 
-  bool isValidGroupID(unsigned Index) const {
-    return Index && Groups.contains(Index);
-  }
-
   /// Check if a peviously dispatched instruction IR is now ready for execution.
-  bool isReady(const InstRef &IR) const {
-    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
-    const MemoryGroup &Group = getGroup(GroupID);
-    return Group.isReady();
-  }
+  virtual bool isReady(const InstRef &IR) const = 0;
 
   /// Check if instruction IR only depends on memory instructions that are
   /// currently executing.
-  bool isPending(const InstRef &IR) const {
-    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
-    const MemoryGroup &Group = getGroup(GroupID);
-    return Group.isPending();
-  }
+  virtual bool isPending(const InstRef &IR) const = 0;
 
   /// Check if instruction IR is still waiting on memory operations, and the
   /// wait time is still unknown.
-  bool isWaiting(const InstRef &IR) const {
-    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
-    const MemoryGroup &Group = getGroup(GroupID);
-    return Group.isWaiting();
-  }
+  virtual bool isWaiting(const InstRef &IR) const = 0;
 
-  bool hasDependentUsers(const InstRef &IR) const {
-    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
-    const MemoryGroup &Group = getGroup(GroupID);
-    return !Group.isExecuted() && Group.getNumSuccessors();
-  }
+  virtual bool hasDependentUsers(const InstRef &IR) const = 0;
 
-  const MemoryGroup &getGroup(unsigned Index) const {
-    assert(isValidGroupID(Index) && "Group doesn't exist!");
-    return *Groups.find(Index)->second;
-  }
+  virtual const CriticalDependency getCriticalPredecessor(unsigned GroupId) = 0;
 
-  MemoryGroup &getGroup(unsigned Index) {
-    assert(isValidGroupID(Index) && "Group doesn't exist!");
-    return *Groups.find(Index)->second;
-  }
-
-  unsigned createMemoryGroup() {
-    Groups.insert(
-        std::make_pair(NextGroupID, std::make_unique<MemoryGroup>()));
-    return NextGroupID++;
-  }
-
-  virtual void onInstructionExecuted(const InstRef &IR);
+  virtual void onInstructionExecuted(const InstRef &IR) = 0;
 
   // Loads are tracked by the LDQ (load queue) from dispatch until completion.
   // Stores are tracked by the STQ (store queue) from dispatch until commitment.
   // By default we conservatively assume that the LDQ receives a load at
   // dispatch. Loads leave the LDQ at retirement stage.
-  virtual void onInstructionRetired(const InstRef &IR);
+  virtual void onInstructionRetired(const InstRef &IR) = 0;
 
-  virtual void onInstructionIssued(const InstRef &IR) {
-    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
-    Groups[GroupID]->onInstructionIssued(IR);
-  }
+  virtual void onInstructionIssued(const InstRef &IR) = 0;
 
-  virtual void cycleEvent();
+  virtual void cycleEvent() = 0;
 
 #ifndef NDEBUG
-  void dump() const;
+  virtual void dump() const = 0;
 #endif
 };
 
@@ -396,6 +193,7 @@ class LSUnitBase : public HardwareUnit {
 /// the load/store queue(s). That also means, all the older loads/stores have
 /// already been executed.
 class LSUnit : public LSUnitBase {
+
   // This class doesn't know about the latency of a load instruction. So, it
   // conservatively/pessimistically assumes that the latency of a load opcode
   // matches the instruction latency.
@@ -434,6 +232,175 @@ class LSUnit : public LSUnitBase {
   // An instruction that both 'MayLoad' and 'HasUnmodeledSideEffects' is
   // conservatively treated as a load barrier. It forces older loads to execute
   // before newer loads are issued.
+
+protected:
+  /// A node of a memory dependency graph. A MemoryGroup describes a set of
+  /// instructions with same memory dependencies.
+  ///
+  /// By construction, instructions of a MemoryGroup don't depend on each other.
+  /// At dispatch stage, instructions are mapped by the LSUnit to MemoryGroups.
+  /// A Memory group identifier is then stored as a "token" in field
+  /// Instruction::LSUTokenID of each dispatched instructions. That token is
+  /// used internally by the LSUnit to track memory dependencies.
+  class MemoryGroup {
+    unsigned NumPredecessors = 0;
+    unsigned NumExecutingPredecessors = 0;
+    unsigned NumExecutedPredecessors = 0;
+
+    unsigned NumInstructions = 0;
+    unsigned NumExecuting = 0;
+    unsigned NumExecuted = 0;
+    // Successors that are in a order dependency with this group.
+    SmallVector<MemoryGroup *, 4> OrderSucc;
+    // Successors that are in a data dependency with this group.
+    SmallVector<MemoryGroup *, 4> DataSucc;
+
+    CriticalDependency CriticalPredecessor;
+    InstRef CriticalMemoryInstruction;
+
+    MemoryGroup(const MemoryGroup &) = delete;
+    MemoryGroup &operator=(const MemoryGroup &) = delete;
+
+  public:
+    MemoryGroup() = default;
+    MemoryGroup(MemoryGroup &&) = default;
+
+    size_t getNumSuccessors() const {
+      return OrderSucc.size() + DataSucc.size();
+    }
+    unsigned getNumPredecessors() const { return NumPredecessors; }
+    unsigned getNumExecutingPredecessors() const {
+      return NumExecutingPredecessors;
+    }
+    unsigned getNumExecutedPredecessors() const {
+      return NumExecutedPredecessors;
+    }
+    unsigned getNumInstructions() const { return NumInstructions; }
+    unsigned getNumExecuting() const { return NumExecuting; }
+    unsigned getNumExecuted() const { return NumExecuted; }
+
+    const InstRef &getCriticalMemoryInstruction() const {
+      return CriticalMemoryInstruction;
+    }
+    const CriticalDependency &getCriticalPredecessor() const {
+      return CriticalPredecessor;
+    }
+
+    void addSuccessor(MemoryGroup *Group, bool IsDataDependent) {
+      // Do not need to add a dependency if there is no data
+      // dependency and all instructions from this group have been
+      // issued already.
+      if (!IsDataDependent && isExecuting())
+        return;
+
+      Group->NumPredecessors++;
+      assert(!isExecuted() && "Should have been removed!");
+      if (isExecuting())
+        Group->onGroupIssued(CriticalMemoryInstruction, IsDataDependent);
+
+      if (IsDataDependent)
+        DataSucc.emplace_back(Group);
+      else
+        OrderSucc.emplace_back(Group);
+    }
+
+    bool isWaiting() const {
+      return NumPredecessors >
+             (NumExecutingPredecessors + NumExecutedPredecessors);
+    }
+    bool isPending() const {
+      return NumExecutingPredecessors &&
+             ((NumExecutedPredecessors + NumExecutingPredecessors) ==
+              NumPredecessors);
+    }
+    bool isReady() const { return NumExecutedPredecessors == NumPredecessors; }
+    bool isExecuting() const {
+      return NumExecuting && (NumExecuting == (NumInstructions - NumExecuted));
+    }
+    bool isExecuted() const { return NumInstructions == NumExecuted; }
+
+    void onGroupIssued(const InstRef &IR, bool ShouldUpdateCriticalDep) {
+      assert(!isReady() && "Unexpected group-start event!");
+      NumExecutingPredecessors++;
+
+      if (!ShouldUpdateCriticalDep)
+        return;
+
+      unsigned Cycles = IR.getInstruction()->getCyclesLeft();
+      if (CriticalPredecessor.Cycles < Cycles) {
+        CriticalPredecessor.IID = IR.getSourceIndex();
+        CriticalPredecessor.Cycles = Cycles;
+      }
+    }
+
+    void onGroupExecuted() {
+      assert(!isReady() && "Inconsistent state found!");
+      NumExecutingPredecessors--;
+      NumExecutedPredecessors++;
+    }
+
+    void onInstructionIssued(const InstRef &IR) {
+      assert(!isExecuting() && "Invalid internal state!");
+      ++NumExecuting;
+
+      // update the CriticalMemDep.
+      const Instruction &IS = *IR.getInstruction();
+      if ((bool)CriticalMemoryInstruction) {
+        const Instruction &OtherIS =
+            *CriticalMemoryInstruction.getInstruction();
+        if (OtherIS.getCyclesLeft() < IS.getCyclesLeft())
+          CriticalMemoryInstruction = IR;
+      } else {
+        CriticalMemoryInstruction = IR;
+      }
+
+      if (!isExecuting())
+        return;
+
+      // Notify successors that this group started execution.
+      for (MemoryGroup *MG : OrderSucc) {
+        MG->onGroupIssued(CriticalMemoryInstruction, false);
+        // Release the order dependency with this group.
+        MG->onGroupExecuted();
+      }
+
+      for (MemoryGroup *MG : DataSucc)
+        MG->onGroupIssued(CriticalMemoryInstruction, true);
+    }
+
+    void onInstructionExecuted(const InstRef &IR) {
+      assert(isReady() && !isExecuted() && "Invalid internal state!");
+      --NumExecuting;
+      ++NumExecuted;
+
+      if (CriticalMemoryInstruction &&
+          CriticalMemoryInstruction.getSourceIndex() == IR.getSourceIndex()) {
+        CriticalMemoryInstruction.invalidate();
+      }
+
+      if (!isExecuted())
+        return;
+
+      // Notify data dependent successors that this group has finished
+      // execution.
+      for (MemoryGroup *MG : DataSucc)
+        MG->onGroupExecuted();
+    }
+
+    void addInstruction() {
+      assert(!getNumSuccessors() && "Cannot add instructions to this group!");
+      ++NumInstructions;
+    }
+
+    void cycleEvent() {
+      if (isWaiting() && CriticalPredecessor.Cycles)
+        CriticalPredecessor.Cycles--;
+    }
+  };
+  /// Used to map group identifiers to MemoryGroups.
+  DenseMap<unsigned, std::unique_ptr<MemoryGroup>> Groups;
+  unsigned NextGroupID = 1;
+
   unsigned CurrentLoadGroupID;
   unsigned CurrentLoadBarrierGroupID;
   unsigned CurrentStoreGroupID;
@@ -453,6 +420,35 @@ class LSUnit : public LSUnitBase {
   /// accomodate instruction IR.
   Status isAvailable(const InstRef &IR) const override;
 
+  bool isReady(const InstRef &IR) const override {
+    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+    const MemoryGroup &Group = getGroup(GroupID);
+    return Group.isReady();
+  }
+
+  bool isPending(const InstRef &IR) const override {
+    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+    const MemoryGroup &Group = getGroup(GroupID);
+    return Group.isPending();
+  }
+
+  bool isWaiting(const InstRef &IR) const override {
+    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+    const MemoryGroup &Group = getGroup(GroupID);
+    return Group.isWaiting();
+  }
+
+  bool hasDependentUsers(const InstRef &IR) const override {
+    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+    const MemoryGroup &Group = getGroup(GroupID);
+    return !Group.isExecuted() && Group.getNumSuccessors();
+  }
+
+  const CriticalDependency getCriticalPredecessor(unsigned GroupId) override {
+    const MemoryGroup &Group = getGroup(GroupId);
+    return Group.getCriticalPredecessor();
+  }
+
   /// Allocates LS resources for instruction IR.
   ///
   /// This method assumes that a previous call to `isAvailable(IR)` succeeded
@@ -468,7 +464,40 @@ class LSUnit : public LSUnitBase {
   /// 6. A store has to wait until an older store barrier is fully executed.
   unsigned dispatch(const InstRef &IR) override;
 
-  void onInstructionExecuted(const InstRef &IR) override;
+  virtual void onInstructionIssued(const InstRef &IR) override {
+    unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+    Groups[GroupID]->onInstructionIssued(IR);
+  }
+
+  virtual void onInstructionRetired(const InstRef &IR) override;
+
+  virtual void onInstructionExecuted(const InstRef &IR) override;
+
+  virtual void cycleEvent() override;
+
+#ifndef NDEBUG
+  virtual void dump() const override;
+#endif
+
+private:
+  bool isValidGroupID(unsigned Index) const {
+    return Index && Groups.contains(Index);
+  }
+
+  const MemoryGroup &getGroup(unsigned Index) const {
+    assert(isValidGroupID(Index) && "Group doesn't exist!");
+    return *Groups.find(Index)->second;
+  }
+
+  MemoryGroup &getGroup(unsigned Index) {
+    assert(isValidGroupID(Index) && "Group doesn't exist!");
+    return *Groups.find(Index)->second;
+  }
+
+  unsigned createMemoryGroup() {
+    Groups.insert(std::make_pair(NextGroupID, std::make_unique<MemoryGroup>()));
+    return NextGroupID++;
+  }
 };
 
 } // namespace mca
diff --git a/llvm/lib/MCA/HardwareUnits/LSUnit.cpp b/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
index bdc8b3d0e390e2..bf0b4325248814 100644
--- a/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
+++ b/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
@@ -24,7 +24,7 @@ namespace mca {
 LSUnitBase::LSUnitBase(const MCSchedModel &SM, unsigned LQ, unsigned SQ,
                        bool AssumeNoAlias)
     : LQSize(LQ), SQSize(SQ), UsedLQEntries(0), UsedSQEntries(0),
-      NoAlias(AssumeNoAlias), NextGroupID(1) {
+      NoAlias(AssumeNoAlias) {
   if (SM.hasExtraProcessorInfo()) {
     const MCExtraProcessorInfo &EPI = SM.getExtraProcessorInfo();
     if (!LQSize && EPI.LoadQueueID) {
@@ -41,13 +41,13 @@ LSUnitBase::LSUnitBase(const MCSchedModel &SM, unsigned LQ, unsigned SQ,
 
 LSUnitBase::~LSUnitBase() = default;
 
-void LSUnitBase::cycleEvent() {
+void LSUnit::cycleEvent() {
   for (const std::pair<unsigned, std::unique_ptr<MemoryGroup>> &G : Groups)
     G.second->cycleEvent();
 }
 
 #ifndef NDEBUG
-void LSUnitBase::dump() const {
+void LSUnit::dump() const {
   dbgs() << "[LSUnit] LQ_Size = " << getLoadQueueSize() << '\n';
   dbgs() << "[LSUnit] SQ_Size = " << getStoreQueueSize() << '\n';
   dbgs() << "[LSUnit] NextLQSlotIdx = " << getUsedLQEntries() << '\n';
@@ -96,8 +96,8 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
     if (CurrentStoreBarrierGroupID) {
       MemoryGroup &StoreGroup = getGroup(CurrentStoreBarrierGroupID);
       LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
-                        << CurrentStoreBarrierGroupID
-                        << ") --> (" << NewGID << ")\n");
+                        << CurrentStoreBarrierGroupID << ") --> (" << NewGID
+                        << ")\n");
       StoreGroup.addSuccessor(&NewGroup, true);
     }
 
@@ -110,7 +110,6 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
       StoreGroup.addSuccessor(&NewGroup, !assumeNoAlias());
     }
 
-
     CurrentStoreGroupID = NewGI...
[truncated]

@adibiagio (Collaborator) commented Oct 30, 2024

In my eyes, this is an implementation detail. Other LSUnits may want to keep track of data dependencies in different ways.

In my eyes, using a DAG for scheduling memory groups is not necessarily "just" an implementation detail.
The general idea is that, no matter which consistency model you want to implement, you can always represent memory dependencies using edges of a DAG, where nodes are (groups of) instructions.
Other algorithms in LLVM use similar approaches. For example, machine schedulers use a DAG to track dependencies in a code region.

The original intention was to provide a target independent mechanism for adding nodes and edges to the dependency graph. Custom LSUnits can then decide when/how to assign edges/form groups.

Just to clarify, I am not against this patch. I am mainly curious about why using the existing framework would not work in your particular case. What is your solution using for tracking execution of dependent memory operations?

Generally speaking, I would be tempted to accept your patch. However, I'd like to hear the opinion of @mshockwave and @michaelmaitland on this.

@andrej (Contributor, Author) commented Oct 30, 2024

Thank you for replying to this so quickly.

In my eyes, using a DAG for scheduling memory groups is not necessarily "just" an implementation detail.
The general idea is that, no matter which consistency model you want to implement, you can always represent memory dependencies using edges of a DAG, where nodes are (groups of) instructions.

I agree that DAGs make sense for tracking data dependencies. Our implementation does in fact also use a DAG, but ...

I am mainly curious about why using the existing framework would not work in your particular case. What is your solution using for tracking execution of dependent memory operations?

... the problem is that subclasses of LSUnitBase are forced to use the MemoryGroup class to represent the DAG. That robs them of the opportunity of attaching more detailed information or custom behavior for the nodes of the DAG. In our case, we wanted to attach the addresses at which instructions access memory (obtained from a trace) to the MemoryGroup, but this isn't possible without patching.
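A hypothetical sketch of why this matters (simplified names, not the downstream implementation): a trace-driven load/store unit may want DAG nodes that carry the actual accessed addresses, so that dependency edges are added only between real aliases.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical trace-aware DAG node. The fixed MemoryGroup type leaves no
// room to attach this kind of per-node data without patching.
struct TracedMemoryGroup {
  std::vector<uint64_t> Addresses;       // accessed addresses, from a trace
  std::vector<TracedMemoryGroup *> Succ; // dependency edges

  // Naive exact-address aliasing check for illustration only.
  bool mayAlias(const TracedMemoryGroup &Other) const {
    for (uint64_t A : Addresses)
      for (uint64_t B : Other.Addresses)
        if (A == B)
          return true;
    return false;
  }

  // Add an edge only when the traced addresses actually overlap, instead
  // of conservatively assuming every store may alias every load.
  void addSuccessorIfAliases(TracedMemoryGroup *Other) {
    if (mayAlias(*Other))
      Succ.push_back(Other);
  }
};
```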

Note that subclasses of LSUnit could still use MemoryGroup, e.g. if they only wanted to override dispatch. Subclassing LSUnitBase would then be reserved for more broadly different approaches to modeling the load/store unit. The overarching principle being that LSUnitBase would only specify the interface that other hardware units would need to interact with the load/store unit, e.g. the scheduler.

Alternative Approach
I can see that there could be arguments for keeping the DAG as part of the LSUnitBase interface. My suggestion in that case would be to expose MemoryGroups at a more abstract level, so that subclasses can still customize the DAG implementation. I've implemented this as the "alternative approach" linked above, and am happy to instead make a PR for that approach if it is preferred.

Details on the alternative approach: that implementation represents the DAG in LSUnitBase using a new class AbstractMemoryGroup, which specifies an interface for inspecting the DAG through virtual methods (such as getNumPredecessors() or isWaiting() for each group). MemoryGroup becomes a subclass of AbstractMemoryGroup: a specific DAG implementation as used by the default LSUnit, containing the methods by which that load/store unit modifies the DAG (such as adding nodes in dispatch). This still gives other LSUnitBase subclasses the option to implement the DAG differently, but makes clear that every subclass is expected to provide some DAG.

On another note, I realized my performance comparison above was flawed because that test does not contain any load/store instructions. However, I just ran `time ./bin/llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=3000 ../llvm/test/tools/llvm-mca/X86/BtVer2/load-store-alias.s` and can confirm that the virtual dispatch still did not cause a significant slowdown (neither in this PR nor in the linked "alternative approach"). All runs complete in 0.60s.

Sorry for the long message. Please let me know if there is any preference between the approach in this PR (generalize LSUnitBase overall) or the alternative approach (generalize MemoryGroup only).

@adibiagio
Collaborator

... the problem is that subclasses of LSUnitBase are forced to use the MemoryGroup class to represent the DAG. That robs them of the opportunity of attaching more detailed information or custom behavior for the nodes of the DAG. In our case, we wanted to attach the addresses at which instructions access memory (obtained from a trace) to the MemoryGroup, but this isn't possible without patching.

Thanks, I understand it now.

This patch looks good to me, and I personally prefer it to the alternative approach.

Let's see if @mshockwave and @michaelmaitland are also of the same opinion.

@andrej
Contributor Author

andrej commented Nov 6, 2024

Ping

Collaborator

@adibiagio adibiagio left a comment


LGTM

@andrej
Contributor Author

andrej commented Nov 12, 2024

Can this be merged in that case or are we still waiting for reviews?

Member

@mshockwave mshockwave left a comment


Sorry for being late.
LGTM

@adibiagio adibiagio merged commit abda8ce into llvm:main Nov 18, 2024
10 checks passed

@andrej Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@andrej
Copy link
Contributor Author

andrej commented Nov 18, 2024

Thank you for merging this. I saw that some checks didn't pass; from what I can tell, they are unrelated to this changeset, e.g. failing builds due to "no space left on device".

andrej added a commit to securesystemslab/LLVM-MCA-Daemon that referenced this pull request Nov 28, 2024
This patch currently conflicts because it landed in LLVM upstream [LLVM #114159](llvm/llvm-project#114159)