-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We would always allocate maximum amount for vector containing DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This lead to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF. For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processesd in --cu-processing-batch-size size. Which defaults to 1. For theoretical cases where cross-cu references are present, but they do not share abbrev will increase the size of CloneUnitCtxMap as each CU is being processsed.
Context: All CUs are partitioned in the following way: all CUs with cross-references are put in a single bucket, and the rest are put into buckets of size llvm-project/bolt/lib/Rewrite/DWARFRewriter.cpp Lines 568 to 571 in 78a195e
However, in llvm-project/bolt/lib/Core/DIEBuilder.cpp Line 275 in 219355d
For large binaries this is overly conservative and caused massive memory allocations. It is enough to allocate space only for CUs in the current partition – either for current CU itself (no cross-CU refs), or for all CUs that have cross-refs (which are all put in the same partition). Note that the partition assignment is based on the heuristic that CUs with cross-refs share the abbrev_table. If this is not the case, BOLT detects that using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please retitle to avoid "we" and update the summary to provide more context to the readers with no background.
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest more explicit title e.g.:
"[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size"
We would always allocate maximum amount for vector containing DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This lead to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF.
For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processesd in --cu-processing-batch-size size. Which defaults to 1.
For theoretical cases where cross-cu references are present, but they do not share abbrev will increase the size of CloneUnitCtxMap as each CU is being processsed.