-
Notifications
You must be signed in to change notification settings - Fork 341
Cache Abbreviation entries after materializing them to speedup MCCAS replay #7852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c515933
to
53fe5f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exciting!
2158b92
to
53fe5f9
Compare
53fe5f9
to
7922368
Compare
} | ||
|
||
Error DIEVisitor::materializeAbbrevDIE(unsigned AbbrevIdx) { | ||
constexpr auto AddrSize = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this come from? Should it be hardcoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all of our purposes the AddressSize is always 8 bytes on MachO, if you see DIEVisitor::visitDIEAttrs
, the same is true there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, if you see getAndSetDebugAbbrevOffsetAndSkip
, we have a check to make sure that the address size is 8
if (AddressSize != 8)
return createStringError(
inconvertibleErrorCode(),
"Address size is not 8 bytes, unsupported architecture for MCCAS!");
Error DIEVisitor::materializeAbbrevDIE(unsigned AbbrevIdx) { | ||
constexpr auto AddrSize = 8; | ||
constexpr auto FormParams = | ||
dwarf::FormParams{4 /*Version*/, AddrSize, dwarf::DwarfFormat::DWARF32}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that also the version for DWARF 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized, what happens if you use MCCAS with a dwarf 2 binary? Is there a graceful degradation path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we never really addressed the dwarf version issue.
We need to do that asap but separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Felipe, the DWARF version is a separate issue and I will put up a separate patch for that
llvm/lib/MCCAS/MCCASObjectV1.cpp
Outdated
} | ||
|
||
auto AbbrevDIECacheVal = | ||
AbbrevDIECache[decodeAbbrevIndexAsAbbrevSetIdx(AbbrevIdx)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually use the find() method because operator[] default-constructs the value and thus silently fails if there is no cache hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be more evident once we disable the default constructor for that struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused, AbbrevDIECache
is a SmallVector
, not a DenseMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nevermind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how does this function work? It materializes all abbreviations (3299) every time an entry is needed? I must be missing something obvious...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue has been fixed, that was an oversight which did work when this patch was on top of the patch that switched materialization to iterative from recursive, but now I have a constructor in DIEVisitor to do the AbbrevEntryCache creation instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, a bunch of things escaped me the first time I looked into this. Feel free to merge once you address these and Adrian's comments.
(I think the DWARF version issue should be handled separately, as it is a pre-existing problem)
Error DIEVisitor::materializeAbbrevDIE(unsigned AbbrevIdx) { | ||
constexpr auto AddrSize = 8; | ||
constexpr auto FormParams = | ||
dwarf::FormParams{4 /*Version*/, AddrSize, dwarf::DwarfFormat::DWARF32}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we never really addressed the dwarf version issue.
We need to do that asap but separately
llvm/lib/MCCAS/MCCASObjectV1.cpp
Outdated
} | ||
|
||
auto AbbrevDIECacheVal = | ||
AbbrevDIECache[decodeAbbrevIndexAsAbbrevSetIdx(AbbrevIdx)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be more evident once we disable the default constructor for that struct
6b8bb0c
to
ec9ed7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
AbbrevEntryCache.reserve(AbbrevEntries.size()); | ||
for (unsigned I = 0; I < AbbrevEntries.size(); I++) { | ||
if (Error E = materializeAbbrevDIE(encodeAbbrevIndex(I))) | ||
report_fatal_error(std::move(E)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we are crashing the compiler on invalid input.
I don't remember how this used to be done in previous iterations of this PR, but we could (maybe in a separate PR) move this out of DIEVisitor entirely; the DIEVisitor could take the already-cached abbreviations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, once we move to the flat patch, we can do this materialization in visitDIERef
ec9ed7f
to
5dab406
Compare
@swift-ci please test llvm |
5dab406
to
6fcca53
Compare
This patch caches abbreviation entries after materializing them onces to avoid rematerializing the same abbreviation entry for every DIE, thus speeding up MCCAS replay
6fcca53
to
4e9c0dc
Compare
Cherry-pick PR #7852 into stable/20230725
When looking at a trace of MCCAS replay, we noticed that a large chunk of time is spent in the function
visitDIEAttrs
The reason for that is because, we materialize abbreviations every time we see a new DIE, which leads to decoding the same ULEB128 values that represent the form and attribute of a DIE for every DIE in the debug info section. We can materialize them once and cache them, which would reduce the time spent and give us some speedup.
We see some speedup as well:
MCCAS With Abbrev Caching
ninja clang 150.46s user 48.35s system 536% cpu 37.071 total
ninja clang 150.43s user 47.33s system 561% cpu 35.240 total
ninja clang 150.78s user 47.23s system 565% cpu 34.987 total
MCCAS Without Abbrev Caching
ninja clang 193.36s user 48.17s system 557% cpu 43.320 total
ninja clang 190.28s user 46.98s system 595% cpu 39.867 total
ninja clang 191.55s user 46.95s system 602% cpu 39.602 total