[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876

ayermolo · 2023-12-18T23:54:38Z

We would always allocate maximum amount for vector containing DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This lead to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF.

For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processesd in --cu-processing-batch-size size. Which defaults to 1.

For theoretical cases where cross-cu references are present, but they do not share abbrev will increase the size of CloneUnitCtxMap as each CU is being processsed.

We would always allocate maximum amount for vector containing DWARFUnitInfo. In real usecases what ends up hapenning is we allocate a giant vector when processing one CU, or for thin-lto case multiple CUs. This lead to a lot of memory overhead, and 2x BOLT processing slowdown for at least one service built with monolithic DWARF. For binaries built with LTO with clang all of CUs that have cross references will share an abbrev table and will be processed in one batch. Rest of CUs are processesd in --cu-processing-batch-size size. Which defaults to 1. For theoretical cases where cross-cu references are present, but they do not share abbrev will increase the size of CloneUnitCtxMap as each CU is being processsed.

aaupov · 2023-12-19T20:41:39Z

Context:
BOLT normally processes CUs in a way to reduce memory overhead because raising DWARF to IR is memory intensive.

All CUs are partitioned in the following way: all CUs with cross-references are put in a single bucket, and the rest are put into buckets of size cu-processing-batch-size (default is 1, meaning they are handled one by one). This partitioning is done in partitionCUs, which returns a vector of vector of DWARFUnit pointers, where each first-level vector entry is a partition.

llvm-project/bolt/lib/Rewrite/DWARFRewriter.cpp

Lines 568 to 571 in 78a195e

    
           /// Partitions CUs in to buckets. Bucket size is controlled by 
        
           /// cu-processing-batch-size. All the CUs that have cross CU reference reference 
        
           /// as a source are put in to the same initial bucket. 
        
           static CUPartitionVector partitionCUs(DWARFContext &DwCtx) {

However, in buildCompileUnits invoked from DWARFRewriter::updateDebugInfo with those partitions, we used to allocate space in CloneUnitCtxMap for all CUs in DwarfContext (all CUs in the input binary):

llvm-project/bolt/lib/Core/DIEBuilder.cpp

Line 275 in 219355d

getState().CloneUnitCtxMap.resize(DwarfContext->getNumCompileUnits());

For large binaries this is overly conservative and caused massive memory allocations. It is enough to allocate space only for CUs in the current partition – either for current CU itself (no cross-CU refs), or for all CUs that have cross-refs (which are all put in the same partition).

Note that the partition assignment is based on the heuristic that CUs with cross-refs share the abbrev_table. If this is not the case, BOLT detects that using DUList so extra entries are allocated in CloneUnitCtxMap to ensure correctness.

aaupov

Please retitle to avoid "we" and update the summary to provide more context to the readers with no background.

bolt/lib/Core/DIEBuilder.cpp

github-actions · 2023-12-20T22:32:06Z

✅ With the latest revision this PR passed the C/C++ code formatter.

aaupov

I'd suggest more explicit title e.g.:
"[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size"

ayermolo requested review from aaupov and maksfb December 18, 2023 23:54

ayermolo added the BOLT label Dec 18, 2023

aaupov reviewed Dec 19, 2023

View reviewed changes

bolt/lib/Core/DIEBuilder.cpp Show resolved Hide resolved

bolt/lib/Core/DIEBuilder.cpp Outdated Show resolved Hide resolved

Updated comments

5856dd1

ayermolo changed the title ~~[BOLT][DWARF][NFC] Change how we re-size CloneUnitCtxMap~~ [BOLT][DWARF][NFC] Change how CloneUnitCtxMap is re-sized Dec 20, 2023

clang-format

dfc28af

aaupov approved these changes Dec 21, 2023

View reviewed changes

ayermolo changed the title ~~[BOLT][DWARF][NFC] Change how CloneUnitCtxMap is re-sized~~ [BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size Dec 21, 2023

ayermolo merged commit ad4cead into llvm:main Dec 21, 2023

ayermolo deleted the debugMapResize branch December 21, 2023 00:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876

[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876

Uh oh!

ayermolo commented Dec 18, 2023

Uh oh!

aaupov commented Dec 19, 2023

Uh oh!

aaupov left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 20, 2023 •

edited

Loading

Uh oh!

aaupov left a comment

Uh oh!

Uh oh!

[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876

[BOLT][DWARF][NFC] Initialize CloneUnitCtxMap with current partition size #75876

Uh oh!

Conversation

ayermolo commented Dec 18, 2023

Uh oh!

aaupov commented Dec 19, 2023

Uh oh!

aaupov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aaupov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Dec 20, 2023 •

edited

Loading