Skip to content

[CodeGen][SDAG] Remove Combiner WorklistMap #92900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

aengelke
Copy link
Contributor

DenseMap for pointer lookup is expensive, and this is only used for deduplication and index lookup. Instead, store the worklist index in the node itself.

This brings a substantial performance improvement, see here (compile-time-tracker).

DenseMap for pointer lookup is expensive, and this is only used for
deduplication and index lookup. Instead, store the worklist index in the
node itself.
@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label May 21, 2024
@llvmbot
Copy link
Member

llvmbot commented May 21, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: None (aengelke)

Changes

DenseMap for pointer lookup is expensive, and this is only used for deduplication and index lookup. Instead, store the worklist index in the node itself.

This brings a substantial performance improvement, see here (compile-time-tracker).


Full diff: https://github.com/llvm/llvm-project/pull/92900.diff

2 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/SelectionDAGNodes.h (+9)
  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+12-16)
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index ac94c6099d080..7adda059647cb 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -619,6 +619,9 @@ END_TWO_BYTE_PACK()
   /// Unique id per SDNode in the DAG.
   int NodeId = -1;
 
+  /// Index in worklist of DAGCombiner, or -1.
+  int CombinerWorklistIndex = -1;
+
   /// The values that are used by this operation.
   SDUse *OperandList = nullptr;
 
@@ -747,6 +750,12 @@ END_TWO_BYTE_PACK()
   /// Set unique node id.
   void setNodeId(int Id) { NodeId = Id; }
 
+  /// Get worklist index for DAGCombiner
+  int getCombinerWorklistIndex() const { return CombinerWorklistIndex; }
+
+  /// Set worklist index for DAGCombiner
+  void setCombinerWorklistIndex(int Index) { CombinerWorklistIndex = Index; }
+
   /// Return the node ordering.
   unsigned getIROrder() const { return IROrder; }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 8607b50175359..90c940e991835 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -170,16 +170,11 @@ namespace {
     /// back and when processing we pop off of the back.
     ///
     /// The worklist will not contain duplicates but may contain null entries
-    /// due to nodes being deleted from the underlying DAG.
+    /// due to nodes being deleted from the underlying DAG. For fast lookup and
+    /// deduplication, the index of the node in this vector is stored in the
+    /// node in SDNode::CombinerWorklistIndex.
     SmallVector<SDNode *, 64> Worklist;
 
-    /// Mapping from an SDNode to its position on the worklist.
-    ///
-    /// This is used to find and remove nodes from the worklist (by nulling
-    /// them) when they are deleted from the underlying DAG. It relies on
-    /// stable indices of nodes within the worklist.
-    DenseMap<SDNode *, unsigned> WorklistMap;
-
     /// This records all nodes attempted to be added to the worklist since we
     /// considered a new worklist entry. As we keep do not add duplicate nodes
     /// in the worklist, this is different from the tail of the worklist.
@@ -238,10 +233,9 @@ namespace {
       }
 
       if (N) {
-        bool GoodWorklistEntry = WorklistMap.erase(N);
-        (void)GoodWorklistEntry;
-        assert(GoodWorklistEntry &&
+        assert(N->getCombinerWorklistIndex() >= 0 &&
                "Found a worklist entry without a corresponding map entry!");
+        N->setCombinerWorklistIndex(-1);
       }
       return N;
     }
@@ -285,8 +279,10 @@ namespace {
       if (IsCandidateForPruning)
         ConsiderForPruning(N);
 
-      if (WorklistMap.insert(std::make_pair(N, Worklist.size())).second)
+      if (N->getCombinerWorklistIndex() == -1) {
+        N->setCombinerWorklistIndex(Worklist.size());
         Worklist.push_back(N);
+      }
     }
 
     /// Remove all instances of N from the worklist.
@@ -295,13 +291,13 @@ namespace {
       PruningList.remove(N);
       StoreRootCountMap.erase(N);
 
-      auto It = WorklistMap.find(N);
-      if (It == WorklistMap.end())
+      int WorklistIndex = N->getCombinerWorklistIndex();
+      if (WorklistIndex == -1)
         return; // Not in the worklist.
 
       // Null out the entry rather than erasing it to avoid a linear operation.
-      Worklist[It->second] = nullptr;
-      WorklistMap.erase(It);
+      Worklist[WorklistIndex] = nullptr;
+      N->setCombinerWorklistIndex(-1);
     }
 
     void deleteAndRecombine(SDNode *N);

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brings a substantial performance improvement, see here (compile-time-tracker).

Nice!

@@ -619,6 +619,9 @@ END_TWO_BYTE_PACK()
/// Unique id per SDNode in the DAG.
int NodeId = -1;

/// Index in worklist of DAGCombiner, or -1.
int CombinerWorklistIndex = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this increase the size of the node, or does it fill a hole (when pointers are 64-bit)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fills 4 padding bytes, the size of an SDNode remains unchanged at 88 bytes (x86-64).

@RKSimon
Copy link
Collaborator

RKSimon commented May 23, 2024

Love this!

Although I think it takes all the padding bits we'd be considering using to extend NumOperands/NumValues to U32 to fix #55737 etc. :(

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something with PersistentId instead? Or at least, steal the bytes in release builds?

@aengelke
Copy link
Contributor Author

Can we do something with PersistentId instead? Or at least, steal the bytes in release builds?

I guess, we probably could? I'm not too familiar how SDAG behaves under pressure, but I can imagine that the worklist could exceed 64k nodes, so the worklist index would need 32 bits. SDNodeFlags is 2 bytes (14 bits), so there are 2 padding bytes near the end before CFIType (which btw appears to be only used for KCFI calls). When swapping Flags and PersistentId, we should get 32 bits where to put the combiner index (+ place the PersistentId under ENABLE_ABI_BREAKING_CHECKS, see D120714). (NB: just from glancing at the code, I haven't compile-checked.)

it takes all the padding bits we'd be considering using to extend NumOperands/NumValues to U32 to fix #55737 etc. :(

Well... I personally think that nobody should use large aggregates as IR values (especially not with >65k values) in the first place -- it's been discouraged for as long as I can remember -- and that this is something we shouldn't need to support. It's probably easier (and also in their interest w.r.t. compile times) to generate different IR in the front-end.

This feels a bit pointless at this stage, but keeps the four padding
bytes in the middle of the struct for future use.
@aengelke
Copy link
Contributor Author

aengelke commented Jun 3, 2024

I moved the CombinerWorklistIndex to the end, so there are currently two padding bytes in the middle for future use. Two further bytes, e.g. for expanding NumValues/NumOperands to 32 bits each, could be gained in future by placing PersistentId under LLVM_ENABLE_ABI_BREAKING_CHECKS.

+ another reorder to avoid unnecessary visibility changes
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - cheers!

@aengelke aengelke merged commit 6150e84 into llvm:main Jun 5, 2024
7 checks passed
@aengelke aengelke deleted the perf/sdag-no-worklistmap branch June 5, 2024 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants