-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MCA] Do not allocate space for DependenceEdge by default in DependencyGraphNode (NFC) #125080
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…cyGraphNode (NFC) For each instruction from the input assembly sequence, DependencyGraph has a dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies) are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's rather unlikely that a usual input instruction will have approximately eight dependent instructions. Below is my statistics for several RISC-V input sequences: Number of | Number of nodes with edges | this # of edges -------------------------------------------- 0 | 8239447 1 | 464252 2 | 6164 3 | 6783 4 | 939 5 | 500 6 | 545 7 | 116 8 | 2 9 | 1 10 | 1 Approximately the same distribution is produced by llvm-mca lit tests (even modified ones with extra dependencies added). On a rather big input asm sequences, the use of SmallVector<..., 8> dramatically increases memory consumption without any need for it. In my case, replacing it with SmallVector<...,0> reduces memory usage by ~28% or ~1700% of input file size (2.2GB in absolute values). There is no change in execution time, I verified it on mca lit-tests and on my big test (execution time is ~30s in both cases).
@llvm/pr-subscribers-tools-llvm-mca Author: Anton Sidorenko (asi-sc) ChangesFor each instruction from the input assembly sequence, DependencyGraph has a dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies) are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's unlikely that a usual input instruction will have approximately eight dependent instructions. Below is my statistics for several RISC-V input sequences:
Approximately the same distribution is produced by llvm-mca lit tests for X86, AArch and RISC-V (even modified ones with extra dependencies added). There is no change in execution time, I verified it on mca lit-tests and on my big test (execution time is ~30s in both cases). This change was made with the same intention as #124904 and optimizes I believe quite an unusual scenario. However, if there is no negative impact on other known scenarios, I'd like to have the change in llvm-project. Full diff: https://github.com/llvm/llvm-project/pull/125080.diff 1 Files Affected:
diff --git a/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h b/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
index 529090cf543fc4..2621efb0413ae2 100644
--- a/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
+++ b/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
@@ -228,7 +228,7 @@ class DependencyGraph {
unsigned Depth;
DependencyEdge CriticalPredecessor;
- SmallVector<DependencyEdge, 8> OutgoingEdges;
+ SmallVector<DependencyEdge, 0> OutgoingEdges;
};
SmallVector<DGNode, 16> Nodes;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number of | Number of nodes with
edges | this # of edges0 | 8239447 1 | 464252 2 | 6164 3 | 6783 4 | 939 5 | 500 6 | 545 7 | 116 8 | 2 9 | 1 10 | 1
This is an interesting distribution. If my math is correct ~95% of the nodes have no out-going edges. Thus I think this is a reasonable change.
Thanks for this patch! I left a minor comment. Otherwise, the change looks good to me. |
I don't see any comment, I think you might have forgot to press the submit button :-) |
@@ -228,7 +228,7 @@ class DependencyGraph { | |||
unsigned Depth; | |||
|
|||
DependencyEdge CriticalPredecessor; | |||
SmallVector<DependencyEdge, 8> OutgoingEdges; | |||
SmallVector<DependencyEdge, 0> OutgoingEdges; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is useful to have a comment (not more than a couple of lines) explaining why zero was used for the number of inline elements. Otherwise it would be unclear for the reader why we chose that number in this case.
…ependencyGraphNode (NFC)
For each instruction from the input assembly sequence, DependencyGraph has a dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies) are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's unlikely that a usual input instruction will have approximately eight dependent instructions. Below is my statistics for several RISC-V input sequences:
Approximately the same distribution is produced by llvm-mca lit tests for X86, AArch and RISC-V (even modified ones with extra dependencies added).
On a rather big input asm sequences, the use of SmallVector<..., 8> dramatically increases memory consumption without any need for it. In my case, replacing it with SmallVector<...,0> reduces memory usage by ~28% or ~1700% of input file size (2.2GB in absolute values).
There is no change in execution time, I verified it on mca lit-tests and on my big test (execution time is ~30s in both cases).
This change was made with the same intention as #124904 and optimizes I believe quite an unusual scenario. However, if there is no negative impact on other known scenarios, I'd like to have the change in llvm-project.