llvm-mca: Disentangle MemoryGroup from LSUnitBase #114159
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository; in that case you can instead tag reviewers by name in a comment. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR with a comment of "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord, or on the forums.
@llvm/pr-subscribers-tools-llvm-mca

Author: André Rösti (andrej)

Changes

In MCA, the load/store unit is modeled through a `LSUnitBase` class. Judging from the name `LSUnitBase`, I believe there is an intent to allow for different specialized load/store unit implementations. (However, currently there is only one implementation used in-tree, `LSUnit`.)

PR #101534 fixed one instance where the specialized `LSUnit` was hard-coded, opening the door for other subclasses to be used, but what subclasses can do is, in my opinion, still overly limited due to a reliance on the `MemoryGroup` class. The `MemoryGroup` class is currently used in the default `LSUnit` implementation to model data dependencies/hazards in the pipeline. `MemoryGroup`s form a graph of memory dependencies that inform the scheduler when load/store instructions can be executed relative to each other.

In my eyes, this is an implementation detail. Other `LSUnit`s may want to keep track of data dependencies in different ways. This PR makes changes to instead leave it up to the subclasses how to model such dependencies, and only prescribes an abstract interface in `LSUnitBase`.

Drawbacks / Considerations: My reason for suggesting this PR is an out-of-tree use. As such, these changes don't introduce any new functionality for in-tree LLVM uses. However, in my opinion, these changes improve code clarity and prescribe a clear interface, which would be the main benefit for the LLVM community. A drawback of the more abstract interface is that virtual dispatching is used in more places. However, note that virtual dispatch is already currently used in some critical parts of the `LSUnitBase`.

Alternatives: If there is a desire to require subclasses to use some form of `MemoryGroup`s, it would also be possible to create something like an `AbstractMemoryGroup` interface and only reference that from within `LSUnitBase`.

Thank you in advance for giving this your consideration!

[1]: MCAD started by @mshockwave and @chinmaydd.

Patch is 22.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114159.diff

3 Files Affected:
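To make the shape of the proposed interface concrete, here is a minimal, hypothetical sketch (not actual llvm-mca code: `FakeInstRef`, `AbstractLSU`, and `FlagLSU` are illustrative stand-ins). The base class only prescribes queries about a dispatched instruction's memory state; how dependencies are tracked internally is left entirely to the subclass:

```cpp
#include <cassert>
#include <unordered_map>

// Stand-in for mca::InstRef; only the LSU token matters for this sketch.
struct FakeInstRef {
  unsigned LSUTokenID = 0;
};

// Sketch of the abstract interface: pure virtual queries, no MemoryGroup.
class AbstractLSU {
public:
  virtual ~AbstractLSU() = default;
  virtual bool isReady(const FakeInstRef &IR) const = 0;
  virtual bool isWaiting(const FakeInstRef &IR) const = 0;
};

// One possible subclass: it tracks a simple per-token ready flag instead of
// a dependency graph. The patch makes this kind of choice the subclass's own.
class FlagLSU : public AbstractLSU {
  std::unordered_map<unsigned, bool> Ready;

public:
  void markReady(unsigned Token) { Ready[Token] = true; }
  bool isReady(const FakeInstRef &IR) const override {
    auto It = Ready.find(IR.LSUTokenID);
    return It != Ready.end() && It->second;
  }
  bool isWaiting(const FakeInstRef &IR) const override { return !isReady(IR); }
};
```

A scheduler holding only an `AbstractLSU *` can then ask `isReady(IR)` without knowing whether the answer comes from a `MemoryGroup` graph, alias information, or something simpler.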
diff --git a/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h b/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
index 81a5453bac26c4..38d6b77916ca8b 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/LSUnit.h
@@ -24,168 +24,6 @@
namespace llvm {
namespace mca {
-/// A node of a memory dependency graph. A MemoryGroup describes a set of
-/// instructions with same memory dependencies.
-///
-/// By construction, instructions of a MemoryGroup don't depend on each other.
-/// At dispatch stage, instructions are mapped by the LSUnit to MemoryGroups.
-/// A Memory group identifier is then stored as a "token" in field
-/// Instruction::LSUTokenID of each dispatched instructions. That token is used
-/// internally by the LSUnit to track memory dependencies.
-class MemoryGroup {
- unsigned NumPredecessors = 0;
- unsigned NumExecutingPredecessors = 0;
- unsigned NumExecutedPredecessors = 0;
-
- unsigned NumInstructions = 0;
- unsigned NumExecuting = 0;
- unsigned NumExecuted = 0;
- // Successors that are in a order dependency with this group.
- SmallVector<MemoryGroup *, 4> OrderSucc;
- // Successors that are in a data dependency with this group.
- SmallVector<MemoryGroup *, 4> DataSucc;
-
- CriticalDependency CriticalPredecessor;
- InstRef CriticalMemoryInstruction;
-
- MemoryGroup(const MemoryGroup &) = delete;
- MemoryGroup &operator=(const MemoryGroup &) = delete;
-
-public:
- MemoryGroup() = default;
- MemoryGroup(MemoryGroup &&) = default;
-
- size_t getNumSuccessors() const {
- return OrderSucc.size() + DataSucc.size();
- }
- unsigned getNumPredecessors() const { return NumPredecessors; }
- unsigned getNumExecutingPredecessors() const {
- return NumExecutingPredecessors;
- }
- unsigned getNumExecutedPredecessors() const {
- return NumExecutedPredecessors;
- }
- unsigned getNumInstructions() const { return NumInstructions; }
- unsigned getNumExecuting() const { return NumExecuting; }
- unsigned getNumExecuted() const { return NumExecuted; }
-
- const InstRef &getCriticalMemoryInstruction() const {
- return CriticalMemoryInstruction;
- }
- const CriticalDependency &getCriticalPredecessor() const {
- return CriticalPredecessor;
- }
-
- void addSuccessor(MemoryGroup *Group, bool IsDataDependent) {
- // Do not need to add a dependency if there is no data
- // dependency and all instructions from this group have been
- // issued already.
- if (!IsDataDependent && isExecuting())
- return;
-
- Group->NumPredecessors++;
- assert(!isExecuted() && "Should have been removed!");
- if (isExecuting())
- Group->onGroupIssued(CriticalMemoryInstruction, IsDataDependent);
-
- if (IsDataDependent)
- DataSucc.emplace_back(Group);
- else
- OrderSucc.emplace_back(Group);
- }
-
- bool isWaiting() const {
- return NumPredecessors >
- (NumExecutingPredecessors + NumExecutedPredecessors);
- }
- bool isPending() const {
- return NumExecutingPredecessors &&
- ((NumExecutedPredecessors + NumExecutingPredecessors) ==
- NumPredecessors);
- }
- bool isReady() const { return NumExecutedPredecessors == NumPredecessors; }
- bool isExecuting() const {
- return NumExecuting && (NumExecuting == (NumInstructions - NumExecuted));
- }
- bool isExecuted() const { return NumInstructions == NumExecuted; }
-
- void onGroupIssued(const InstRef &IR, bool ShouldUpdateCriticalDep) {
- assert(!isReady() && "Unexpected group-start event!");
- NumExecutingPredecessors++;
-
- if (!ShouldUpdateCriticalDep)
- return;
-
- unsigned Cycles = IR.getInstruction()->getCyclesLeft();
- if (CriticalPredecessor.Cycles < Cycles) {
- CriticalPredecessor.IID = IR.getSourceIndex();
- CriticalPredecessor.Cycles = Cycles;
- }
- }
-
- void onGroupExecuted() {
- assert(!isReady() && "Inconsistent state found!");
- NumExecutingPredecessors--;
- NumExecutedPredecessors++;
- }
-
- void onInstructionIssued(const InstRef &IR) {
- assert(!isExecuting() && "Invalid internal state!");
- ++NumExecuting;
-
- // update the CriticalMemDep.
- const Instruction &IS = *IR.getInstruction();
- if ((bool)CriticalMemoryInstruction) {
- const Instruction &OtherIS = *CriticalMemoryInstruction.getInstruction();
- if (OtherIS.getCyclesLeft() < IS.getCyclesLeft())
- CriticalMemoryInstruction = IR;
- } else {
- CriticalMemoryInstruction = IR;
- }
-
- if (!isExecuting())
- return;
-
- // Notify successors that this group started execution.
- for (MemoryGroup *MG : OrderSucc) {
- MG->onGroupIssued(CriticalMemoryInstruction, false);
- // Release the order dependency with this group.
- MG->onGroupExecuted();
- }
-
- for (MemoryGroup *MG : DataSucc)
- MG->onGroupIssued(CriticalMemoryInstruction, true);
- }
-
- void onInstructionExecuted(const InstRef &IR) {
- assert(isReady() && !isExecuted() && "Invalid internal state!");
- --NumExecuting;
- ++NumExecuted;
-
- if (CriticalMemoryInstruction &&
- CriticalMemoryInstruction.getSourceIndex() == IR.getSourceIndex()) {
- CriticalMemoryInstruction.invalidate();
- }
-
- if (!isExecuted())
- return;
-
- // Notify data dependent successors that this group has finished execution.
- for (MemoryGroup *MG : DataSucc)
- MG->onGroupExecuted();
- }
-
- void addInstruction() {
- assert(!getNumSuccessors() && "Cannot add instructions to this group!");
- ++NumInstructions;
- }
-
- void cycleEvent() {
- if (isWaiting() && CriticalPredecessor.Cycles)
- CriticalPredecessor.Cycles--;
- }
-};
-
/// Abstract base interface for LS (load/store) units in llvm-mca.
class LSUnitBase : public HardwareUnit {
/// Load queue size.
@@ -214,10 +52,6 @@ class LSUnitBase : public HardwareUnit {
/// alias with stores.
const bool NoAlias;
- /// Used to map group identifiers to MemoryGroups.
- DenseMap<unsigned, std::unique_ptr<MemoryGroup>> Groups;
- unsigned NextGroupID;
-
public:
LSUnitBase(const MCSchedModel &SM, unsigned LoadQueueSize,
unsigned StoreQueueSize, bool AssumeNoAlias);
@@ -265,72 +99,35 @@ class LSUnitBase : public HardwareUnit {
bool isSQFull() const { return SQSize && SQSize == UsedSQEntries; }
bool isLQFull() const { return LQSize && LQSize == UsedLQEntries; }
- bool isValidGroupID(unsigned Index) const {
- return Index && Groups.contains(Index);
- }
-
/// Check if a peviously dispatched instruction IR is now ready for execution.
- bool isReady(const InstRef &IR) const {
- unsigned GroupID = IR.getInstruction()->getLSUTokenID();
- const MemoryGroup &Group = getGroup(GroupID);
- return Group.isReady();
- }
+ virtual bool isReady(const InstRef &IR) const = 0;
/// Check if instruction IR only depends on memory instructions that are
/// currently executing.
- bool isPending(const InstRef &IR) const {
- unsigned GroupID = IR.getInstruction()->getLSUTokenID();
- const MemoryGroup &Group = getGroup(GroupID);
- return Group.isPending();
- }
+ virtual bool isPending(const InstRef &IR) const = 0;
/// Check if instruction IR is still waiting on memory operations, and the
/// wait time is still unknown.
- bool isWaiting(const InstRef &IR) const {
- unsigned GroupID = IR.getInstruction()->getLSUTokenID();
- const MemoryGroup &Group = getGroup(GroupID);
- return Group.isWaiting();
- }
+ virtual bool isWaiting(const InstRef &IR) const = 0;
- bool hasDependentUsers(const InstRef &IR) const {
- unsigned GroupID = IR.getInstruction()->getLSUTokenID();
- const MemoryGroup &Group = getGroup(GroupID);
- return !Group.isExecuted() && Group.getNumSuccessors();
- }
+ virtual bool hasDependentUsers(const InstRef &IR) const = 0;
- const MemoryGroup &getGroup(unsigned Index) const {
- assert(isValidGroupID(Index) && "Group doesn't exist!");
- return *Groups.find(Index)->second;
- }
+ virtual const CriticalDependency getCriticalPredecessor(unsigned GroupId) = 0;
- MemoryGroup &getGroup(unsigned Index) {
- assert(isValidGroupID(Index) && "Group doesn't exist!");
- return *Groups.find(Index)->second;
- }
-
- unsigned createMemoryGroup() {
- Groups.insert(
- std::make_pair(NextGroupID, std::make_unique<MemoryGroup>()));
- return NextGroupID++;
- }
-
- virtual void onInstructionExecuted(const InstRef &IR);
+ virtual void onInstructionExecuted(const InstRef &IR) = 0;
// Loads are tracked by the LDQ (load queue) from dispatch until completion.
// Stores are tracked by the STQ (store queue) from dispatch until commitment.
// By default we conservatively assume that the LDQ receives a load at
// dispatch. Loads leave the LDQ at retirement stage.
- virtual void onInstructionRetired(const InstRef &IR);
+ virtual void onInstructionRetired(const InstRef &IR) = 0;
- virtual void onInstructionIssued(const InstRef &IR) {
- unsigned GroupID = IR.getInstruction()->getLSUTokenID();
- Groups[GroupID]->onInstructionIssued(IR);
- }
+ virtual void onInstructionIssued(const InstRef &IR) = 0;
- virtual void cycleEvent();
+ virtual void cycleEvent() = 0;
#ifndef NDEBUG
- void dump() const;
+ virtual void dump() const = 0;
#endif
};
@@ -396,6 +193,7 @@ class LSUnitBase : public HardwareUnit {
/// the load/store queue(s). That also means, all the older loads/stores have
/// already been executed.
class LSUnit : public LSUnitBase {
+
// This class doesn't know about the latency of a load instruction. So, it
// conservatively/pessimistically assumes that the latency of a load opcode
// matches the instruction latency.
@@ -434,6 +232,175 @@ class LSUnit : public LSUnitBase {
// An instruction that both 'MayLoad' and 'HasUnmodeledSideEffects' is
// conservatively treated as a load barrier. It forces older loads to execute
// before newer loads are issued.
+
+protected:
+ /// A node of a memory dependency graph. A MemoryGroup describes a set of
+ /// instructions with same memory dependencies.
+ ///
+ /// By construction, instructions of a MemoryGroup don't depend on each other.
+ /// At dispatch stage, instructions are mapped by the LSUnit to MemoryGroups.
+ /// A Memory group identifier is then stored as a "token" in field
+ /// Instruction::LSUTokenID of each dispatched instructions. That token is
+ /// used internally by the LSUnit to track memory dependencies.
+ class MemoryGroup {
+ unsigned NumPredecessors = 0;
+ unsigned NumExecutingPredecessors = 0;
+ unsigned NumExecutedPredecessors = 0;
+
+ unsigned NumInstructions = 0;
+ unsigned NumExecuting = 0;
+ unsigned NumExecuted = 0;
+ // Successors that are in a order dependency with this group.
+ SmallVector<MemoryGroup *, 4> OrderSucc;
+ // Successors that are in a data dependency with this group.
+ SmallVector<MemoryGroup *, 4> DataSucc;
+
+ CriticalDependency CriticalPredecessor;
+ InstRef CriticalMemoryInstruction;
+
+ MemoryGroup(const MemoryGroup &) = delete;
+ MemoryGroup &operator=(const MemoryGroup &) = delete;
+
+ public:
+ MemoryGroup() = default;
+ MemoryGroup(MemoryGroup &&) = default;
+
+ size_t getNumSuccessors() const {
+ return OrderSucc.size() + DataSucc.size();
+ }
+ unsigned getNumPredecessors() const { return NumPredecessors; }
+ unsigned getNumExecutingPredecessors() const {
+ return NumExecutingPredecessors;
+ }
+ unsigned getNumExecutedPredecessors() const {
+ return NumExecutedPredecessors;
+ }
+ unsigned getNumInstructions() const { return NumInstructions; }
+ unsigned getNumExecuting() const { return NumExecuting; }
+ unsigned getNumExecuted() const { return NumExecuted; }
+
+ const InstRef &getCriticalMemoryInstruction() const {
+ return CriticalMemoryInstruction;
+ }
+ const CriticalDependency &getCriticalPredecessor() const {
+ return CriticalPredecessor;
+ }
+
+ void addSuccessor(MemoryGroup *Group, bool IsDataDependent) {
+ // Do not need to add a dependency if there is no data
+ // dependency and all instructions from this group have been
+ // issued already.
+ if (!IsDataDependent && isExecuting())
+ return;
+
+ Group->NumPredecessors++;
+ assert(!isExecuted() && "Should have been removed!");
+ if (isExecuting())
+ Group->onGroupIssued(CriticalMemoryInstruction, IsDataDependent);
+
+ if (IsDataDependent)
+ DataSucc.emplace_back(Group);
+ else
+ OrderSucc.emplace_back(Group);
+ }
+
+ bool isWaiting() const {
+ return NumPredecessors >
+ (NumExecutingPredecessors + NumExecutedPredecessors);
+ }
+ bool isPending() const {
+ return NumExecutingPredecessors &&
+ ((NumExecutedPredecessors + NumExecutingPredecessors) ==
+ NumPredecessors);
+ }
+ bool isReady() const { return NumExecutedPredecessors == NumPredecessors; }
+ bool isExecuting() const {
+ return NumExecuting && (NumExecuting == (NumInstructions - NumExecuted));
+ }
+ bool isExecuted() const { return NumInstructions == NumExecuted; }
+
+ void onGroupIssued(const InstRef &IR, bool ShouldUpdateCriticalDep) {
+ assert(!isReady() && "Unexpected group-start event!");
+ NumExecutingPredecessors++;
+
+ if (!ShouldUpdateCriticalDep)
+ return;
+
+ unsigned Cycles = IR.getInstruction()->getCyclesLeft();
+ if (CriticalPredecessor.Cycles < Cycles) {
+ CriticalPredecessor.IID = IR.getSourceIndex();
+ CriticalPredecessor.Cycles = Cycles;
+ }
+ }
+
+ void onGroupExecuted() {
+ assert(!isReady() && "Inconsistent state found!");
+ NumExecutingPredecessors--;
+ NumExecutedPredecessors++;
+ }
+
+ void onInstructionIssued(const InstRef &IR) {
+ assert(!isExecuting() && "Invalid internal state!");
+ ++NumExecuting;
+
+ // update the CriticalMemDep.
+ const Instruction &IS = *IR.getInstruction();
+ if ((bool)CriticalMemoryInstruction) {
+ const Instruction &OtherIS =
+ *CriticalMemoryInstruction.getInstruction();
+ if (OtherIS.getCyclesLeft() < IS.getCyclesLeft())
+ CriticalMemoryInstruction = IR;
+ } else {
+ CriticalMemoryInstruction = IR;
+ }
+
+ if (!isExecuting())
+ return;
+
+ // Notify successors that this group started execution.
+ for (MemoryGroup *MG : OrderSucc) {
+ MG->onGroupIssued(CriticalMemoryInstruction, false);
+ // Release the order dependency with this group.
+ MG->onGroupExecuted();
+ }
+
+ for (MemoryGroup *MG : DataSucc)
+ MG->onGroupIssued(CriticalMemoryInstruction, true);
+ }
+
+ void onInstructionExecuted(const InstRef &IR) {
+ assert(isReady() && !isExecuted() && "Invalid internal state!");
+ --NumExecuting;
+ ++NumExecuted;
+
+ if (CriticalMemoryInstruction &&
+ CriticalMemoryInstruction.getSourceIndex() == IR.getSourceIndex()) {
+ CriticalMemoryInstruction.invalidate();
+ }
+
+ if (!isExecuted())
+ return;
+
+ // Notify data dependent successors that this group has finished
+ // execution.
+ for (MemoryGroup *MG : DataSucc)
+ MG->onGroupExecuted();
+ }
+
+ void addInstruction() {
+ assert(!getNumSuccessors() && "Cannot add instructions to this group!");
+ ++NumInstructions;
+ }
+
+ void cycleEvent() {
+ if (isWaiting() && CriticalPredecessor.Cycles)
+ CriticalPredecessor.Cycles--;
+ }
+ };
+ /// Used to map group identifiers to MemoryGroups.
+ DenseMap<unsigned, std::unique_ptr<MemoryGroup>> Groups;
+ unsigned NextGroupID = 1;
+
unsigned CurrentLoadGroupID;
unsigned CurrentLoadBarrierGroupID;
unsigned CurrentStoreGroupID;
@@ -453,6 +420,35 @@ class LSUnit : public LSUnitBase {
/// accomodate instruction IR.
Status isAvailable(const InstRef &IR) const override;
+ bool isReady(const InstRef &IR) const override {
+ unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+ const MemoryGroup &Group = getGroup(GroupID);
+ return Group.isReady();
+ }
+
+ bool isPending(const InstRef &IR) const override {
+ unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+ const MemoryGroup &Group = getGroup(GroupID);
+ return Group.isPending();
+ }
+
+ bool isWaiting(const InstRef &IR) const override {
+ unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+ const MemoryGroup &Group = getGroup(GroupID);
+ return Group.isWaiting();
+ }
+
+ bool hasDependentUsers(const InstRef &IR) const override {
+ unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+ const MemoryGroup &Group = getGroup(GroupID);
+ return !Group.isExecuted() && Group.getNumSuccessors();
+ }
+
+ const CriticalDependency getCriticalPredecessor(unsigned GroupId) override {
+ const MemoryGroup &Group = getGroup(GroupId);
+ return Group.getCriticalPredecessor();
+ }
+
/// Allocates LS resources for instruction IR.
///
/// This method assumes that a previous call to `isAvailable(IR)` succeeded
@@ -468,7 +464,40 @@ class LSUnit : public LSUnitBase {
/// 6. A store has to wait until an older store barrier is fully executed.
unsigned dispatch(const InstRef &IR) override;
- void onInstructionExecuted(const InstRef &IR) override;
+ virtual void onInstructionIssued(const InstRef &IR) override {
+ unsigned GroupID = IR.getInstruction()->getLSUTokenID();
+ Groups[GroupID]->onInstructionIssued(IR);
+ }
+
+ virtual void onInstructionRetired(const InstRef &IR) override;
+
+ virtual void onInstructionExecuted(const InstRef &IR) override;
+
+ virtual void cycleEvent() override;
+
+#ifndef NDEBUG
+ virtual void dump() const override;
+#endif
+
+private:
+ bool isValidGroupID(unsigned Index) const {
+ return Index && Groups.contains(Index);
+ }
+
+ const MemoryGroup &getGroup(unsigned Index) const {
+ assert(isValidGroupID(Index) && "Group doesn't exist!");
+ return *Groups.find(Index)->second;
+ }
+
+ MemoryGroup &getGroup(unsigned Index) {
+ assert(isValidGroupID(Index) && "Group doesn't exist!");
+ return *Groups.find(Index)->second;
+ }
+
+ unsigned createMemoryGroup() {
+ Groups.insert(std::make_pair(NextGroupID, std::make_unique<MemoryGroup>()));
+ return NextGroupID++;
+ }
};
} // namespace mca
diff --git a/llvm/lib/MCA/HardwareUnits/LSUnit.cpp b/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
index bdc8b3d0e390e2..bf0b4325248814 100644
--- a/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
+++ b/llvm/lib/MCA/HardwareUnits/LSUnit.cpp
@@ -24,7 +24,7 @@ namespace mca {
LSUnitBase::LSUnitBase(const MCSchedModel &SM, unsigned LQ, unsigned SQ,
bool AssumeNoAlias)
: LQSize(LQ), SQSize(SQ), UsedLQEntries(0), UsedSQEntries(0),
- NoAlias(AssumeNoAlias), NextGroupID(1) {
+ NoAlias(AssumeNoAlias) {
if (SM.hasExtraProcessorInfo()) {
const MCExtraProcessorInfo &EPI = SM.getExtraProcessorInfo();
if (!LQSize && EPI.LoadQueueID) {
@@ -41,13 +41,13 @@ LSUnitBase::LSUnitBase(const MCSchedModel &SM, unsigned LQ, unsigned SQ,
LSUnitBase::~LSUnitBase() = default;
-void LSUnitBase::cycleEvent() {
+void LSUnit::cycleEvent() {
for (const std::pair<unsigned, std::unique_ptr<MemoryGroup>> &G : Groups)
G.second->cycleEvent();
}
#ifndef NDEBUG
-void LSUnitBase::dump() const {
+void LSUnit::dump() const {
dbgs() << "[LSUnit] LQ_Size = " << getLoadQueueSize() << '\n';
dbgs() << "[LSUnit] SQ_Size = " << getStoreQueueSize() << '\n';
dbgs() << "[LSUnit] NextLQSlotIdx = " << getUsedLQEntries() << '\n';
@@ -96,8 +96,8 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
if (CurrentStoreBarrierGroupID) {
MemoryGroup &StoreGroup = getGroup(CurrentStoreBarrierGroupID);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
- << CurrentStoreBarrierGroupID
- << ") --> (" << NewGID << ")\n");
+ << CurrentStoreBarrierGroupID << ") --> (" << NewGID
+ << ")\n");
StoreGroup.addSuccessor(&NewGroup, true);
}
@@ -110,7 +110,6 @@ unsigned LSUnit::dispatch(const InstRef &IR) {
StoreGroup.addSuccessor(&NewGroup, !assumeNoAlias());
}
-
CurrentStoreGroupID = NewGI...
[truncated]
In my eyes, using a DAG for scheduling memory groups is not necessarily "just" an implementation detail. The original intention was to provide a target independent mechanism for adding nodes and edges to the dependency graph. Custom LSUnits can then decide when/how to assign edges/form groups. Just to clarify, I am not against this patch. I am mainly curious about why using the existing framework would not work in your particular case. What is your solution using for tracking execution of dependent memory operations? Generally speaking, I would be tempted to accept your patch. However, I'd like to hear the opinion of @mshockwave and @michaelmaitland on this.
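For readers following along, the MemoryGroup DAG mechanism under discussion can be illustrated with a stripped-down re-implementation (a sketch only; it omits the critical-dependency tracking, assertions, and issue-order subtleties of the real class in the diff above):

```cpp
#include <cassert>
#include <vector>

// Simplified model of mca's MemoryGroup state machine: a group becomes
// pending when all predecessors are at least executing, and ready once all
// predecessors have fully executed.
class MiniGroup {
  unsigned NumPreds = 0, NumExecutingPreds = 0, NumExecutedPreds = 0;
  unsigned NumInsts = 0, NumExecuting = 0, NumExecuted = 0;
  std::vector<MiniGroup *> OrderSucc; // order (non-data) dependencies
  std::vector<MiniGroup *> DataSucc;  // data dependencies

public:
  void addInstruction() { ++NumInsts; }

  void addSuccessor(MiniGroup *G, bool IsData) {
    ++G->NumPreds;
    (IsData ? DataSucc : OrderSucc).push_back(G);
  }

  bool isWaiting() const {
    return NumPreds > NumExecutingPreds + NumExecutedPreds;
  }
  bool isPending() const {
    return NumExecutingPreds &&
           NumExecutingPreds + NumExecutedPreds == NumPreds;
  }
  bool isReady() const { return NumExecutedPreds == NumPreds; }
  bool isExecuting() const {
    return NumExecuting && NumExecuting == NumInsts - NumExecuted;
  }
  bool isExecuted() const { return NumInsts == NumExecuted; }

  void onGroupIssued() { ++NumExecutingPreds; }
  void onGroupExecuted() { --NumExecutingPreds; ++NumExecutedPreds; }

  void onInstructionIssued() {
    ++NumExecuting;
    if (!isExecuting())
      return;
    // Once the whole group is executing, order dependencies are released
    // immediately; data dependencies only advance to the "executing" state.
    for (MiniGroup *G : OrderSucc) {
      G->onGroupIssued();
      G->onGroupExecuted();
    }
    for (MiniGroup *G : DataSucc)
      G->onGroupIssued();
  }

  void onInstructionExecuted() {
    --NumExecuting;
    ++NumExecuted;
    if (!isExecuted())
      return;
    for (MiniGroup *G : DataSucc)
      G->onGroupExecuted();
  }
};
```

Walking one store group feeding one load group through issue and execute reproduces the waiting, then pending, then ready progression that the `isReady`/`isPending`/`isWaiting` queries in the patch expose.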
Thank you for replying to this so quickly.
I agree that DAGs make sense for tracking data dependencies. Our implementation does in fact also use a DAG, but the problem is that subclasses of `LSUnitBase` currently have no way to represent that DAG other than through the concrete `MemoryGroup` class, and I haven't been able to fit our additional aliasing information into that abstraction. Note that subclasses of `LSUnit` can still use `MemoryGroup`s with this patch, since `MemoryGroup` becomes a protected member class of `LSUnit`.

Alternative Approach

This alternative implementation represents the DAG in `LSUnitBase` using a new class `AbstractMemoryGroup`, which specifies an interface for inspecting the DAG using virtual methods (such as `getNumPredecessors()` or `isWaiting()` for each group). `MemoryGroup` becomes a subclass of `AbstractMemoryGroup`: a specific implementation of a DAG as used by the default `LSUnit` implementation, which contains the methods for how that load/store unit implementation might modify the DAG (such as adding nodes in `dispatch`). This still gives other `LSUnitBase` subclasses options to implement the DAG in an alternative way, but makes it clear that it is expected of all subclasses to provide some DAG in some way.

On another note, I realized my performance comparison above was flawed because that test does not contain any load/store instructions. However, I just ran ...

Sorry for the long message. Please let me know if there is any preference between the approach in this PR (generalize `LSUnitBase`) or the alternative approach (`AbstractMemoryGroup`).
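The alternative approach described in that comment could look roughly like this (a hypothetical sketch; names mirror the comment, but the real `MemoryGroup` carries far more state):

```cpp
#include <cassert>

// Sketch of the alternative: LSUnitBase would reference only this read-only
// interface, while subclasses provide the concrete DAG node type.
class AbstractMemoryGroup {
public:
  virtual ~AbstractMemoryGroup() = default;
  virtual unsigned getNumPredecessors() const = 0;
  virtual bool isWaiting() const = 0;
  virtual bool isReady() const = 0;
};

// The default LSUnit's node type would derive from it and keep its existing
// counter-based implementation, reduced here to the two counters needed for
// the queries above.
class CountingGroup : public AbstractMemoryGroup {
  unsigned NumPreds = 0, NumExecutedPreds = 0;

public:
  void addPredecessor() { ++NumPreds; }
  void onPredecessorExecuted() { ++NumExecutedPreds; }
  unsigned getNumPredecessors() const override { return NumPreds; }
  bool isWaiting() const override { return NumExecutedPreds < NumPreds; }
  bool isReady() const override { return NumExecutedPreds == NumPreds; }
};
```

With this split, `LSUnitBase` could keep walking the DAG through `AbstractMemoryGroup` pointers while each subclass owns the mutation API of its own node type.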
Thanks, I understand it now. This patch looks good to me, and I personally prefer it to the alternative approach. Let's see if @mshockwave and @michaelmaitland are also of the same opinion.
Ping |
LGTM
Can this be merged in that case, or are we still waiting for reviews?
Sorry for being late.
LGTM
@andrej Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
Thank you for merging this. I saw some checks didn't pass; from what I can tell, these are unrelated to this changeset, e.g. failing builds due to "no space left on device".
This patch currently conflicts because it landed in LLVM upstream [LLVM #114159](llvm/llvm-project#114159)
In MCA, the load/store unit is modeled through a `LSUnitBase` class. Judging from the name `LSUnitBase`, I believe there is an intent to allow for different specialized load/store unit implementations. (However, currently there is only one implementation used in-tree, `LSUnit`.)

PR #101534 fixed one instance where the specialized `LSUnit` was hard-coded, opening the door for other subclasses to be used, but what subclasses can do is, in my opinion, still overly limited due to a reliance on the `MemoryGroup` class, e.g. here.

The `MemoryGroup` class is currently used in the default `LSUnit` implementation to model data dependencies/hazards in the pipeline. `MemoryGroup`s form a graph of memory dependencies that inform the scheduler when load/store instructions can be executed relative to each other.

In my eyes, this is an implementation detail. Other `LSUnit`s may want to keep track of data dependencies in different ways. As a concrete example, a downstream use I am working on[1] uses a custom load/store unit that makes use of available aliasing information. I haven't been able to shoehorn our additional aliasing information into the existing `MemoryGroup` abstraction. I think there is no need to force subclasses to use `MemoryGroup`s; users of `LSUnitBase` are only concerned with when, and for how long, a load/store instruction executes.

This PR makes changes to instead leave it up to the subclasses how to model such dependencies, and only prescribes an abstract interface in `LSUnitBase`. It also moves data members and methods that are not necessary to provide an abstract interface from `LSUnitBase` to the `LSUnit` subclass. I decided to make the `MemoryGroup` a protected subclass of `LSUnit`; that way, specializations may inherit from `LSUnit` and still make use of `MemoryGroup`s if they wish to do so (e.g. if they want to only overwrite the `dispatch` method).

Drawbacks / Considerations

My reason for suggesting this PR is an out-of-tree use. As such, these changes don't introduce any new functionality for in-tree LLVM uses. However, in my opinion, these changes improve code clarity and prescribe a clear interface, which would be the main benefit for the LLVM community.

A drawback of the more abstract interface is that virtual dispatching is used in more places. However, note that virtual dispatch is already currently used in some critical parts of the `LSUnitBase`, e.g. the `isAvailable` and `dispatch` methods. As a quick check to ensure these changes don't significantly negatively impact performance, I also ran `time llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=3000 llvm/test/tools/llvm-mca/X86/BtVer2/dot-product.s` before and after the changes; there was no observable difference in runtimes (`0.292 s` total before, `0.286 s` total after changes).

Alternatives

If there is a desire to require subclasses to use some form of `MemoryGroup`s, it would also be possible to create something like an `AbstractMemoryGroup` interface and only reference that from within `LSUnitBase`. That would still allow subclassing of `LSUnitBase` to use different kinds of memory groups.

Thank you in advance for giving this your consideration!

[1]: MCAD started by @mshockwave and @chinmaydd.