-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CodeGen][SDAG] Remove Combiner WorklistMap #92900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
DenseMap for pointer lookup is expensive, and this is only used for deduplication and index lookup. Instead, store the worklist index in the node itself.
@llvm/pr-subscribers-llvm-selectiondag Author: None (aengelke) ChangesDenseMap for pointer lookup is expensive, and this is only used for deduplication and index lookup. Instead, store the worklist index in the node itself. This brings a substantial performance improvement, see here (compile-time-tracker). Full diff: https://github.com/llvm/llvm-project/pull/92900.diff 2 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index ac94c6099d080..7adda059647cb 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -619,6 +619,9 @@ END_TWO_BYTE_PACK()
/// Unique id per SDNode in the DAG.
int NodeId = -1;
+ /// Index in worklist of DAGCombiner, or -1.
+ int CombinerWorklistIndex = -1;
+
/// The values that are used by this operation.
SDUse *OperandList = nullptr;
@@ -747,6 +750,12 @@ END_TWO_BYTE_PACK()
/// Set unique node id.
void setNodeId(int Id) { NodeId = Id; }
+ /// Get worklist index for DAGCombiner
+ int getCombinerWorklistIndex() const { return CombinerWorklistIndex; }
+
+ /// Set worklist index for DAGCombiner
+ void setCombinerWorklistIndex(int Index) { CombinerWorklistIndex = Index; }
+
/// Return the node ordering.
unsigned getIROrder() const { return IROrder; }
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 8607b50175359..90c940e991835 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -170,16 +170,11 @@ namespace {
/// back and when processing we pop off of the back.
///
/// The worklist will not contain duplicates but may contain null entries
- /// due to nodes being deleted from the underlying DAG.
+ /// due to nodes being deleted from the underlying DAG. For fast lookup and
+ /// deduplication, the index of the node in this vector is stored in the
+ /// node in SDNode::CombinerWorklistIndex.
SmallVector<SDNode *, 64> Worklist;
- /// Mapping from an SDNode to its position on the worklist.
- ///
- /// This is used to find and remove nodes from the worklist (by nulling
- /// them) when they are deleted from the underlying DAG. It relies on
- /// stable indices of nodes within the worklist.
- DenseMap<SDNode *, unsigned> WorklistMap;
-
/// This records all nodes attempted to be added to the worklist since we
/// considered a new worklist entry. As we keep do not add duplicate nodes
/// in the worklist, this is different from the tail of the worklist.
@@ -238,10 +233,9 @@ namespace {
}
if (N) {
- bool GoodWorklistEntry = WorklistMap.erase(N);
- (void)GoodWorklistEntry;
- assert(GoodWorklistEntry &&
+ assert(N->getCombinerWorklistIndex() >= 0 &&
"Found a worklist entry without a corresponding map entry!");
+ N->setCombinerWorklistIndex(-1);
}
return N;
}
@@ -285,8 +279,10 @@ namespace {
if (IsCandidateForPruning)
ConsiderForPruning(N);
- if (WorklistMap.insert(std::make_pair(N, Worklist.size())).second)
+ if (N->getCombinerWorklistIndex() == -1) {
+ N->setCombinerWorklistIndex(Worklist.size());
Worklist.push_back(N);
+ }
}
/// Remove all instances of N from the worklist.
@@ -295,13 +291,13 @@ namespace {
PruningList.remove(N);
StoreRootCountMap.erase(N);
- auto It = WorklistMap.find(N);
- if (It == WorklistMap.end())
+ int WorklistIndex = N->getCombinerWorklistIndex();
+ if (WorklistIndex == -1)
return; // Not in the worklist.
// Null out the entry rather than erasing it to avoid a linear operation.
- Worklist[It->second] = nullptr;
- WorklistMap.erase(It);
+ Worklist[WorklistIndex] = nullptr;
+ N->setCombinerWorklistIndex(-1);
}
void deleteAndRecombine(SDNode *N);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This brings a substantial performance improvement, see here (compile-time-tracker).
Nice!
@@ -619,6 +619,9 @@ END_TWO_BYTE_PACK() | |||
/// Unique id per SDNode in the DAG. | |||
int NodeId = -1; | |||
|
|||
/// Index in worklist of DAGCombiner, or -1. | |||
int CombinerWorklistIndex = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this increase the size of the node, or does it fill a hole (when pointers are 64-bit)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fills 4 padding bytes, the size of an SDNode remains unchanged at 88 bytes (x86-64).
Love this! Although I think it takes all the padding bits we'd be considering using to extend NumOperands/NumValues to U32 to fix #55737 etc. :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something with PersistentId instead? Or at least, steal the bytes in release builds?
I guess, we probably could? I'm not too familiar how SDAG behaves under pressure, but I can imagine that the worklist could exceed 64k nodes, so the worklist index would need 32 bits. SDNodeFlags is 2 bytes (14 bits), so there are 2 padding bytes near the end before CFIType (which btw appears to be only used for KCFI calls). When swapping Flags and PersistentId, we should get 32 bits where to put the combiner index (+ place the PersistentId under ENABLE_ABI_BREAKING_CHECKS, see D120714). (NB: just from glancing at the code, I haven't compile-checked.)
Well... I personally think that nobody should use large aggregates as IR values (especially not with >65k values) in the first place -- it's been discouraged for as long as I can remember -- and that this is something we shouldn't need to support. It's probably easier (and also in their interest w.r.t. compile times) to generate different IR in the front-end. |
This feels a bit pointless at this stage, but keeps the four padding bytes in the middle of the struct for future use.
I moved the CombinerWorklistIndex to the end, so there are currently two padding bytes in the middle for future use. Two further bytes, e.g. for expanding NumValues/NumOperands to 32 bits each, could be gained in future by placing PersistentId under LLVM_ENABLE_ABI_BREAKING_CHECKS. |
+ another reorder to avoid unnecessary visibility changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - cheers!
DenseMap for pointer lookup is expensive, and this is only used for deduplication and index lookup. Instead, store the worklist index in the node itself.
This brings a substantial performance improvement, see here (compile-time-tracker).