-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[llvm-symbolizer] nfc, use map instead of vector #69552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -70,20 +70,6 @@ SymbolizableObjectFile::create(const object::ObjectFile *Obj, | |
return std::move(E); | ||
} | ||
|
||
std::vector<SymbolDesc> &SS = res->Symbols; | ||
// Sort by (Addr,Size,Name). If several SymbolDescs share the same Addr, | ||
// pick the one with the largest Size. This helps us avoid symbols with no | ||
// size information (Size=0). | ||
llvm::stable_sort(SS); | ||
auto I = SS.begin(), E = SS.end(), J = SS.begin(); | ||
while (I != E) { | ||
auto OI = I; | ||
while (++I != E && OI->Addr == I->Addr) { | ||
} | ||
*J++ = I[-1]; | ||
} | ||
SS.erase(J, SS.end()); | ||
|
||
return std::move(res); | ||
} | ||
|
||
|
@@ -134,7 +120,10 @@ Error SymbolizableObjectFile::addCoffExportSymbols( | |
uint32_t NextOffset = I != E ? I->Offset : Export.Offset + 1; | ||
uint64_t SymbolStart = ImageBase + Export.Offset; | ||
uint64_t SymbolSize = NextOffset - Export.Offset; | ||
Symbols.push_back({SymbolStart, SymbolSize, Export.Name, 0}); | ||
// If the SymbolStart exists, use the one with the large Size. | ||
// This helps us avoid symbols with no size information (Size = 0). | ||
if (!Symbols.count(SymbolStart) || SymbolSize >= Symbols[SymbolStart].Size) | ||
Symbols[SymbolStart] = {SymbolSize, Export.Name, 0}; | ||
Comment on lines
+125
to
+126
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What will happen if two symbols have the same address? For example:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If two symbols have same address, it will follow legacy logic, i.e. use the one with the large size. So for your case, |
||
} | ||
return Error::success(); | ||
} | ||
|
@@ -215,7 +204,14 @@ Error SymbolizableObjectFile::addSymbol(const SymbolRef &Symbol, | |
|
||
if (Obj.isELF() && ELFSymbolRef(Symbol).getBinding() != ELF::STB_LOCAL) | ||
ELFSymIdx = 0; | ||
Symbols.push_back({SymbolAddress, SymbolSize, SymbolName, ELFSymIdx}); | ||
|
||
// If the SymbolAddress exists, use the one with the large Size. | ||
// This helps us avoid symbols with no size information (Size = 0). | ||
if (!Symbols.count(SymbolAddress) || | ||
SymbolSize >= Symbols[SymbolAddress].Size) { | ||
Symbols[SymbolAddress] = {SymbolSize, SymbolName, ELFSymIdx}; | ||
} | ||
|
||
return Error::success(); | ||
} | ||
|
||
|
@@ -234,26 +230,25 @@ uint64_t SymbolizableObjectFile::getModulePreferredBase() const { | |
bool SymbolizableObjectFile::getNameFromSymbolTable( | ||
uint64_t Address, std::string &Name, uint64_t &Addr, uint64_t &Size, | ||
std::string &FileName) const { | ||
SymbolDesc SD{Address, UINT64_C(-1), StringRef(), 0}; | ||
auto SymbolIterator = llvm::upper_bound(Symbols, SD); | ||
auto SymbolIterator = Symbols.upper_bound(Address); | ||
if (SymbolIterator == Symbols.begin()) | ||
return false; | ||
--SymbolIterator; | ||
if (SymbolIterator->Size != 0 && | ||
SymbolIterator->Addr + SymbolIterator->Size <= Address) | ||
|
||
auto &SD = SymbolIterator->second; | ||
if (SD.Size != 0 && SymbolIterator->first + SD.Size <= Address) | ||
return false; | ||
Name = SymbolIterator->Name.str(); | ||
Addr = SymbolIterator->Addr; | ||
Size = SymbolIterator->Size; | ||
Name = SD.Name.str(); | ||
Addr = SymbolIterator->first; | ||
Size = SD.Size; | ||
|
||
if (SymbolIterator->ELFLocalSymIdx != 0) { | ||
if (SD.ELFLocalSymIdx != 0) { | ||
// If this is an ELF local symbol, find the STT_FILE symbol preceding | ||
// SymbolIterator to get the filename. The ELF spec requires the STT_FILE | ||
// symbol (if present) precedes the other STB_LOCAL symbols for the file. | ||
assert(Module->isELF()); | ||
auto It = llvm::upper_bound( | ||
FileSymbols, | ||
std::make_pair(SymbolIterator->ELFLocalSymIdx, StringRef())); | ||
auto It = llvm::upper_bound(FileSymbols, | ||
std::make_pair(SD.ELFLocalSymIdx, StringRef())); | ||
if (It != FileSymbols.begin()) | ||
FileName = It[-1].second.str(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you want a
std::map
and not a sorted vector. https://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc gives a lot of thoughts on this and provides various other alternatives too that might be more efficient.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
map
is better than sorted vector for my case. In #69553, when add a new symbol, I need to know if there is any existing symbol that has same address. If I am understanding sorted vector right, each time when a new symbol is inserted to a sorted vector, it has to be sorted manually and then for a new insertion, use a log(n) query(std::lower_bound) to check if the address already exists. I don't see the benefit compared withmap
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main benefit is the performance. As explained in the programming guide,
std::map
is not a very efficient data structure. Please do some runtime comparisons of the tool with and without this change, where you have for both a large number of elements and a small number. In particular, how does the runtime and memory usage compare before and after the change?The programming guide seems pretty clear in this regards: a sorted vector is the right approach if you want to insert lots of elements but don't care about the order during insertion. In other words, you haven't understood "sorted vector" correctly. The existing usage looks precisely like is described in the programmer's manual: lots of insertions, followed by a sort, and only then is the vector looked in:
Why do you need to know this at insertion time rather than just sorting things differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the data we stored to
Symbols
. For now, The symbol is stored as:When we do the sorting, we need to know some symbol flags besides address and size for XCOFF. See the comments here https://github.com/llvm/llvm-project/pull/69553/files#diff-795c0eb20ded2ff47056aa629fc0e183c20225b8fd40b607627ab3106995779dR222-R225
Is it acceptable to you that we add a new element (maybe a
SymbolRef
type and can eliminate current fieldELFLocalSymIdx
) instruct SymbolDesc
? If using this way, we don't need to change the container, can still use vector and do the sort once.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's an issue expanding
SymbolDesc
with more fields as necessary, though for the sake of memory usage, I'd try to keep those elements as small as possible. That would be the preferred approach for adding more conditions to sort on.