Commit 5519e4d

Re-land "[PDB] Merge types in parallel when using ghashing"
Stored Error objects have to be checked, even if they are success values.

This reverts commit 8d250ac. Relands commit 49b3459.

Original commit message:
-----------------------------------------

This makes type merging much faster (-24% on chrome.dll) when multiple
threads are available, but it slightly increases the time to link (+10%)
when /threads:1 is passed. With only one more thread, the new type
merging is faster (-11%). The output PDB should be identical to what it
was before this change.

To give an idea, here is the /time output placed side by side:

                              BEFORE    | AFTER
  Input File Reading:           956 ms  |  968 ms
  Code Layout:                  258 ms  |  190 ms
  Commit Output File:             6 ms  |    7 ms
  PDB Emission (Cumulative):   6691 ms  | 4253 ms
    Add Objects:               4341 ms  | 2927 ms
      Type Merging:            2814 ms  | 1269 ms  -55%!
      Symbol Merging:          1509 ms  | 1645 ms
    Publics Stream Layout:      111 ms  |  112 ms
    TPI Stream Layout:          764 ms  |   26 ms  trivial
    Commit to Disk:            1322 ms  | 1036 ms  -300ms
----------------------------------------- --------
Total Link Time:               8416 ms    5882 ms  -30% overall

The main source of the additional overhead in the single-threaded case
is the need to iterate all .debug$T sections up front to check which
type records should go in the IPI stream. See fillIsItemIndexFromDebugT.
With changes to the .debug$H section, we could pre-calculate this info
and eliminate the need to do this walk up front. That should restore
single-threaded performance back to what it was before this change.

This change will cause LLD to be much more parallel than it used to, and
for users who do multiple links in parallel, it could regress
performance. However, when the user is only doing one link, it's a huge
improvement. In the future, we can use NT worker threads to avoid
oversaturating the machine with work, but for now, this is such an
improvement for the single-link use case that I think we should land
this as is.
Algorithm
----------

Before this change, we essentially used a
DenseMap<GloballyHashedType, TypeIndex> to check if a type has already
been seen, and if it hasn't been seen, insert it now and use the next
available type index for it in the destination type stream. DenseMap
does not support concurrent insertion, and even if it did, the linker
must be deterministic: it cannot produce different PDBs by using
different numbers of threads. The output type stream must be in the same
order regardless of the order of hash table insertions.

In order to create a hash table that supports concurrent insertion, the
table cells must be small enough that they can be updated atomically.
The algorithm I used for updating the table using linear probing is
described in this paper, "Concurrent Hash Tables: Fast and General(?)!":
https://dl.acm.org/doi/10.1145/3309206

The GHashCell in this change is essentially a pair of 32-bit integer
indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the
TpiSource object, and it represents an input type stream. The typeIndex
is the index of the type in the stream. Together, we have something like
a ragged 2D array of ghashes, which can be looked up as:

  tpiSources[tpiSrcIndex]->ghashes[typeIndex]

By using these side tables, we can omit the key data from the hash
table, and keep the table cell small. There is a cost to this: resolving
hash table collisions requires many more loads than simply looking at
the key in the same cache line as the insertion position. However, most
supported platforms should have a 64-bit CAS operation to update the
cell atomically.

To make the result of concurrent insertion deterministic, the cell
payloads must have a priority function. Defining one is pretty
straightforward: compare the two 32-bit numbers as a combined 64-bit
number. This means that types coming from inputs earlier on the command
line have a higher priority and are more likely to appear earlier in the
final PDB type stream than types from an input appearing later on the
link line.

After table insertion, the non-empty cells in the table can be copied
out of the main table and sorted by priority to determine the ordering
of the final type index stream. At this point, item and type records
must be separated, either by sorting or by splitting into two arrays,
and I chose sorting. This is why the GHashCell must contain the isItem
bit.

Once the final PDB TPI stream ordering is known, we need to compute a
mapping from source type index to PDB type index. To avoid starting over
from scratch and looking up every type again by its ghash, we save the
insertion position of every hash table insertion during the first
insertion phase. Because the table does not support rehashing, the
insertion position is stable. Using the array of insertion positions
indexed by source type index, we can replace the source type indices in
the ghash table cells with the PDB type indices.

Once the table cells have been updated to contain PDB type indices, the
mapping for each type source can be computed in parallel. Simply iterate
the list of cell positions and replace them with the PDB type index,
since the insertion positions are no longer needed.

Once we have a source to destination type index mapping for every type
source, there are no more data dependencies. We know which type records
are "unique" (not duplicates), and what their final type indices will
be. We can do the remapping in parallel, and accumulate type sizes and
type hashes in parallel by type source.

Lastly, TPI stream layout must be done serially. Accumulate all the type
records, sizes, and hashes, and add them to the PDB.

Differential Revision: https://reviews.llvm.org/D87805
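
For readers who want to see the insertion and ordering steps in miniature, here is a rough sketch. It is not the code in DebugTypes.cpp. It assumes a hypothetical `Cell` that packs (sourceIndex + 1, typeIndex) into one 64-bit word so that zero can mean "empty", uses plain `uint64_t` values as stand-ins for ghashes, leaves out the isItem bit, and produces zero-based destination indices. The names `GHashTableSketch`, `sortedCells`, and `buildIndexMaps` are made up for illustration. It also keeps a separate slot-to-index array where the message above describes overwriting the table cells in place.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cell: (sourceIndex + 1) in the high 32 bits, typeIndex in the
// low 32 bits. Zero means "empty"; a smaller value means higher priority.
struct Cell {
  uint64_t bits;
  static Cell make(uint32_t src, uint32_t ty) {
    return Cell{((uint64_t(src) + 1) << 32) | ty};
  }
  uint32_t srcIdx() const { return uint32_t(bits >> 32) - 1; }
  uint32_t tyIdx() const { return uint32_t(bits); }
};

struct GHashTableSketch {
  std::vector<std::atomic<uint64_t>> cells;
  // Side tables holding the keys: side[src][ty] is the hash of that record.
  const std::vector<std::vector<uint64_t>> &side;

  GHashTableSketch(size_t n, const std::vector<std::vector<uint64_t>> &s)
      : cells(n), side(s) {
    for (auto &c : cells)
      c.store(0);
  }

  // Insert with linear probing; returns the slot that holds this hash. The
  // slot never moves afterwards because the table is never rehashed.
  size_t insert(uint64_t hash, Cell incoming) {
    size_t n = cells.size();
    size_t i = hash % n;
    for (size_t probes = 0; probes < n; ++probes) {
      uint64_t old = cells[i].load();
      for (;;) {
        if (old != 0) {
          Cell cur{old};
          if (side[cur.srcIdx()][cur.tyIdx()] != hash)
            break; // Different key: probe the next slot.
          if (old <= incoming.bits)
            return i; // Same key, and the resident has priority.
        }
        // Empty slot, or same key with lower priority: try to claim it.
        // On failure, `old` is refreshed and the checks run again.
        if (cells[i].compare_exchange_weak(old, incoming.bits))
          return i;
      }
      i = (i + 1 == n) ? 0 : i + 1;
    }
    assert(false && "table is full; size it up front");
    return n;
  }
};

// Copy out the non-empty cells and sort them by priority.
std::vector<uint64_t> sortedCells(const GHashTableSketch &t) {
  std::vector<uint64_t> out;
  for (const auto &c : t.cells)
    if (uint64_t v = c.load())
      out.push_back(v);
  std::sort(out.begin(), out.end());
  return out;
}

// slots[src][ty] is the value insert() returned for that record. Each winner
// gets the next destination index; every record then maps through its slot.
std::vector<std::vector<uint32_t>>
buildIndexMaps(const GHashTableSketch &table,
               const std::vector<std::vector<size_t>> &slots) {
  std::vector<uint64_t> winners = sortedCells(table);
  std::vector<uint32_t> slotToDest(table.cells.size(), 0);
  for (uint32_t dest = 0; dest < winners.size(); ++dest) {
    Cell c{winners[dest]};
    slotToDest[slots[c.srcIdx()][c.tyIdx()]] = dest;
  }
  std::vector<std::vector<uint32_t>> maps(slots.size());
  for (size_t src = 0; src < slots.size(); ++src)
    for (size_t slot : slots[src])
      maps[src].push_back(slotToDest[slot]);
  return maps;
}
```

Because a slot only ever changes to a higher-priority cell with the same key, the set of surviving cells does not depend on the order in which insertions happen. That property is what lets the real insertions run on several threads while still yielding one output order.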
1 parent 722d792 commit 5519e4d

18 files changed: +1084 -236 lines changed

lld/COFF/DebugTypes.cpp

Lines changed: 750 additions & 97 deletions
Large diffs are not rendered by default.

lld/COFF/DebugTypes.h

Lines changed: 111 additions & 5 deletions
@@ -10,32 +10,37 @@
 #define LLD_COFF_DEBUGTYPES_H

 #include "lld/Common/LLVM.h"
-#include "llvm/DebugInfo/CodeView/TypeIndex.h"
+#include "llvm/ADT/BitVector.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/DebugInfo/CodeView/TypeIndexDiscovery.h"
+#include "llvm/DebugInfo/CodeView/TypeRecord.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/MemoryBuffer.h"

 namespace llvm {
 namespace codeview {
-class PrecompRecord;
-class TypeServer2Record;
+struct GloballyHashedType;
 } // namespace codeview
 namespace pdb {
 class NativeSession;
+class TpiStream;
 }
 } // namespace llvm

 namespace lld {
 namespace coff {

+using llvm::codeview::GloballyHashedType;
 using llvm::codeview::TypeIndex;

 class ObjFile;
 class PDBInputFile;
 class TypeMerger;
+struct GHashState;

 class TpiSource {
 public:
-  enum TpiKind { Regular, PCH, UsingPCH, PDB, PDBIpi, UsingPDB };
+  enum TpiKind : uint8_t { Regular, PCH, UsingPCH, PDB, PDBIpi, UsingPDB };

   TpiSource(TpiKind k, ObjFile *f);
   virtual ~TpiSource();
@@ -53,21 +58,97 @@ class TpiSource {
   /// caller-provided ObjectIndexMap.
   virtual Error mergeDebugT(TypeMerger *m);

+  /// Load global hashes, either by hashing types directly, or by loading them
+  /// from LLVM's .debug$H section.
+  virtual void loadGHashes();
+
+  /// Use global hashes to merge type information.
+  virtual void remapTpiWithGHashes(GHashState *g);
+
+  // Remap a type index in place.
+  bool remapTypeIndex(TypeIndex &ti, llvm::codeview::TiRefKind refKind) const;
+
+protected:
+  void remapRecord(MutableArrayRef<uint8_t> rec,
+                   ArrayRef<llvm::codeview::TiReference> typeRefs);
+
+  void mergeTypeRecord(llvm::codeview::CVType ty);
+
+  // Merge the type records listed in uniqueTypes. beginIndex is the TypeIndex
+  // of the first record in this source, typically 0x1000. When PCHs are
+  // involved, it may start higher.
+  void mergeUniqueTypeRecords(
+      ArrayRef<uint8_t> debugTypes,
+      TypeIndex beginIndex = TypeIndex(TypeIndex::FirstNonSimpleIndex));
+
+  // Use the ghash table to construct a map from source type index to
+  // destination PDB type index. Usable for either TPI or IPI.
+  void fillMapFromGHashes(GHashState *m,
+                          llvm::SmallVectorImpl<TypeIndex> &indexMap);
+
+  // Copies ghashes from a vector into an array. These are long lived, so it's
+  // worth the time to copy these into an appropriately sized vector to reduce
+  // memory usage.
+  void assignGHashesFromVector(std::vector<GloballyHashedType> &&hashVec);
+
+  // Walk over file->debugTypes and fill in the isItemIndex bit vector.
+  void fillIsItemIndexFromDebugT();
+
+public:
+  bool remapTypesInSymbolRecord(MutableArrayRef<uint8_t> rec);
+
+  void remapTypesInTypeRecord(MutableArrayRef<uint8_t> rec);
+
   /// Is this a dependent file that needs to be processed first, before other
   /// OBJs?
   virtual bool isDependency() const { return false; }

-  static void forEachSource(llvm::function_ref<void(TpiSource *)> fn);
+  /// Returns true if this type record should be omitted from the PDB, even if
+  /// it is unique. This prevents a record from being added to the input ghash
+  /// table.
+  bool shouldOmitFromPdb(uint32_t ghashIdx) {
+    return ghashIdx == endPrecompGHashIdx;
+  }
+
+  /// All sources of type information in the program.
+  static std::vector<TpiSource *> instances;
+
+  /// Dependency type sources, such as type servers or PCH object files. These
+  /// must be processed before objects that rely on them. Set by
+  /// TpiSources::sortDependencies.
+  static ArrayRef<TpiSource *> dependencySources;
+
+  /// Object file sources. These must be processed after dependencySources.
+  static ArrayRef<TpiSource *> objectSources;
+
+  /// Sorts the dependencies and reassigns TpiSource indices.
+  static void sortDependencies();

   static uint32_t countTypeServerPDBs();
   static uint32_t countPrecompObjs();

+  /// Free heap allocated ghashes.
+  static void clearGHashes();
+
   /// Clear global data structures for TpiSources.
   static void clear();

   const TpiKind kind;
+  bool ownedGHashes = true;
+  uint32_t tpiSrcIdx = 0;
+
+protected:
+  /// The ghash index (zero based, not 0x1000-based) of the LF_ENDPRECOMP record
+  /// in this object, if one exists. This is the all ones value otherwise. It is
+  /// recorded here so that it can be omitted from the final ghash table.
+  uint32_t endPrecompGHashIdx = ~0U;
+
+public:
   ObjFile *file;

+  /// An error encountered during type merging, if any.
+  Error typeMergingError = Error::success();
+
   // Storage for tpiMap or ipiMap, depending on the kind of source.
   llvm::SmallVector<TypeIndex, 0> indexMapStorage;

@@ -76,6 +157,31 @@ class TpiSource {
   // objects.
   llvm::ArrayRef<TypeIndex> tpiMap;
   llvm::ArrayRef<TypeIndex> ipiMap;
+
+  /// Array of global type hashes, indexed by TypeIndex. May be calculated on
+  /// demand, or present in input object files.
+  llvm::ArrayRef<llvm::codeview::GloballyHashedType> ghashes;
+
+  /// When ghashing is used, record the mapping from LF_[M]FUNC_ID to function
+  /// type index here. Both indices are PDB indices, not object type indexes.
+  llvm::DenseMap<TypeIndex, TypeIndex> funcIdToType;
+
+  /// Indicates if a type record is an item index or a type index.
+  llvm::BitVector isItemIndex;
+
+  /// A list of all "unique" type indices which must be merged into the final
+  /// PDB. GHash type deduplication produces this list, and it should be
+  /// considerably smaller than the input.
+  std::vector<uint32_t> uniqueTypes;
+
+  struct MergedInfo {
+    std::vector<uint8_t> recs;
+    std::vector<uint16_t> recSizes;
+    std::vector<uint32_t> recHashes;
+  };
+
+  MergedInfo mergedTpi;
+  MergedInfo mergedIpi;
 };

 TpiSource *makeTpiSource(ObjFile *file);

lld/COFF/Driver.cpp

Lines changed: 1 addition & 1 deletion
@@ -69,13 +69,13 @@ bool link(ArrayRef<const char *> args, bool canExitEarly, raw_ostream &stdoutOS,
   lld::stderrOS = &stderrOS;

   errorHandler().cleanupCallback = []() {
+    TpiSource::clear();
     freeArena();
     ObjFile::instances.clear();
     PDBInputFile::instances.clear();
     ImportFile::instances.clear();
     BitcodeFile::instances.clear();
     memset(MergeChunk::instances, 0, sizeof(MergeChunk::instances));
-    TpiSource::clear();
     OutputSection::clear();
   };
