Skip to content

[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 21, 2023

Conversation

ayermolo
Copy link
Contributor

We would always allocate maximum amount for vector containing DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This lead to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF.

For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processesd in --cu-processing-batch-size size. Which defaults to 1.

For theoretical cases where cross-cu references are present, but they do not share abbrev will increase the size of CloneUnitCtxMap as each CU is being processsed.

We would always allocate maximum amount for vector containing DWARFUnitInfo. In
real usecases what ends up hapenning is we allocate a giant vector when
processing one CU, or for thin-lto case multiple CUs. This lead to a lot of
memory overhead, and 2x BOLT processing slowdown for at least one service built
with monolithic DWARF.

For binaries built with LTO with clang all of CUs that have cross references
will share an abbrev table and will be processed in one batch. Rest of CUs are
processesd in --cu-processing-batch-size size. Which defaults to 1.

For theoretical cases where cross-cu references are present, but they do not
share abbrev will increase the size of CloneUnitCtxMap as each CU is being
processsed.
@ayermolo ayermolo requested review from aaupov and maksfb December 18, 2023 23:54
@ayermolo ayermolo added the BOLT label Dec 18, 2023
@aaupov
Copy link
Contributor

aaupov commented Dec 19, 2023

Context:
BOLT normally processes CUs in a way to reduce memory overhead because raising DWARF to IR is memory intensive.

All CUs are partitioned in the following way: all CUs with cross-references are put in a single bucket, and the rest are put into buckets of size cu-processing-batch-size (default is 1, meaning they are handled one by one). This partitioning is done in partitionCUs, which returns a vector of vector of DWARFUnit pointers, where each first-level vector entry is a partition.

/// Partitions CUs in to buckets. Bucket size is controlled by
/// cu-processing-batch-size. All the CUs that have cross CU reference reference
/// as a source are put in to the same initial bucket.
static CUPartitionVector partitionCUs(DWARFContext &DwCtx) {

However, in buildCompileUnits invoked from DWARFRewriter::updateDebugInfo with those partitions, we used to allocate space in CloneUnitCtxMap for all CUs in DwarfContext (all CUs in the input binary):

getState().CloneUnitCtxMap.resize(DwarfContext->getNumCompileUnits());

For large binaries this is overly conservative and caused massive memory allocations. It is enough to allocate space only for CUs in the current partition – either for current CU itself (no cross-CU refs), or for all CUs that have cross-refs (which are all put in the same partition).

Note that the partition assignment is based on the heuristic that CUs with cross-refs share the abbrev_table. If this is not the case, BOLT detects that using DUList so extra entries are allocated in CloneUnitCtxMap to ensure correctness.

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please retitle to avoid "we" and update the summary to provide more context to the readers with no background.

@ayermolo ayermolo changed the title [BOLT][DWARF][NFC] Change how we re-size CloneUnitCtxMap [BOLT][DWARF][NFC] Change how CloneUnitCtxMap is re-sized Dec 20, 2023
Copy link

github-actions bot commented Dec 20, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest more explicit title e.g.:
"[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size"

@ayermolo ayermolo changed the title [BOLT][DWARF][NFC] Change how CloneUnitCtxMap is re-sized [BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size Dec 21, 2023
@ayermolo ayermolo merged commit ad4cead into llvm:main Dec 21, 2023
@ayermolo ayermolo deleted the debugMapResize branch December 21, 2023 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants