-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[MC][NFC] Allow MCInstrAnalysis to store state #65479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently, all the analysis functions provided by `MCInstrAnalysis` work on a single instruction. On some targets, this limits the kind of instructions that can be successfully analyzed as common constructs may need multiple instructions. For example, a typical call sequence on RISC-V uses a auipc+jalr pair. In order to analyse the jalr inside `evaluateBranch`, information about the corresponding auipc is needed. Similarly, AArch64 uses adrp+ldr pairs to access globals. This patch proposes to add state to `MCInstrAnalysis` to support these use cases. Two new virtual methods are added: - `updateState`: takes an instruction and its address. This methods should be called by clients on every instruction and allows targets to store whatever information they need to analyse future instructions. - `resetState`: clears the state whenever it becomes irrelevant. Clients could call this, for example, when starting to disassemble a new function. Note that the default implementations do nothing so this patch is NFC. No actual state is stored inside `MCInstrAnalysis`; deciding the structure of the state is left to the targets. This patch also modifies llvm-objdump to use the new interface. This patch is an alternative to D116677 and the idea of storing state in `MCInstrAnalysis` was first discussed there.
524fed0
to
7228670
Compare
Ping. |
@llvm/pr-subscribers-llvm-binary-utilities ChangesCurrently, all the analysis functions provided by For example, a typical call sequence on RISC-V uses a auipc+jalr pair. In order to analyse the jalr inside This patch proposes to add state to
Note that the default implementations do nothing so this patch is NFC. No actual state is stored inside This patch also modifies llvm-objdump to use the new interface. This patch is an alternative to D116677 and the idea of storing state in Full diff: https://github.com/llvm/llvm-project/pull/65479.diff 2 Files Affected:
diff --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index c3c675c39c5590c..e3ddf0b8b8939c9 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
virtual ~MCInstrAnalysis() = default;
+ /// Clear the internal state. See updateState for more information.
+ virtual void resetState() {}
+
+ /// Update internal state with \p Inst at \p Addr.
+ ///
+ /// For some types of analyses, inspecting a single instruction is not
+ /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+ /// on AArch64. To support inspecting multiple instructions, targets may keep
+ /// track of an internal state while analysing instructions. Clients should
+ /// call updateState for every instruction which allows later calls to one of
+ /// the analysis functions to take previous instructions into account.
+ /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+ /// new function), clients should call resetState to clear it.
+ virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
virtual bool isBranch(const MCInst &Inst) const {
return Info->get(Inst.getOpcode()).isBranch();
}
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 96d74d6e2d5e865..8f6479d3c6e31e4 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -842,7 +842,7 @@ class DisassemblerTarget {
std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
std::shared_ptr<MCContext> Context;
std::unique_ptr<MCDisassembler> DisAsm;
- std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+ std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
std::shared_ptr<MCInstPrinter> InstPrinter;
PrettyPrinter *Printer;
@@ -1265,14 +1265,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
}
}
-static void collectLocalBranchTargets(
- ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
- MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
- uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+ MCDisassembler *DisAsm, MCInstPrinter *IP,
+ const MCSubtargetInfo *STI, uint64_t SectionAddr,
+ uint64_t Start, uint64_t End,
+ std::unordered_map<uint64_t, std::string> &Labels) {
// So far only supports PowerPC and X86.
if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
return;
+ if (MIA)
+ MIA->resetState();
+
Labels.clear();
unsigned LabelCount = 0;
Start += SectionAddr;
@@ -1298,6 +1303,7 @@ static void collectLocalBranchTargets(
!Labels.count(Target) &&
!(STI->getTargetTriple().isPPC() && Target == Index))
Labels[Target] = ("L" + Twine(LabelCount++)).str();
+ MIA->updateState(Inst, Index);
}
Index += Size;
}
@@ -1939,6 +1945,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
BBAddrMapLabels);
}
+ if (DT->InstrAnalysis)
+ DT->InstrAnalysis->resetState();
+
while (Index < End) {
// ARM and AArch64 ELF binaries can interleave data and text in the
// same section. We rely on the markers introduced to understand what
@@ -2155,6 +2164,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
if (TargetOS == &CommentStream)
*TargetOS << "\n";
}
+
+ DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
}
}
|
This change LGTM, but I'd appreciate an OK from someone who has a broader view of the MC infrastructure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks okay to me, but I probably need to see the usage in practice.
Please see #65480 for an application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code formatter check seems to have failed. Might need to run clang-format?
No objections from me - the solution seems reasonably neat. Please give it a few more days to let @MaskRay chime in as he has a better understanding of the MC disassembly stuff than I do and he was away recently, so might still be catching up.
I'm not entirely sure why they're failing but it seems to be some intermittent error, unrelated to formatting, that I see happening every now and then on GH:
Sounds good! |
Currently, all the analysis functions provided by
MCInstrAnalysis
work on a single instruction. On some targets, this limits the kind of instructions that can be successfully analyzed as common constructs may need multiple instructions.For example, a typical call sequence on RISC-V uses a auipc+jalr pair. In order to analyse the jalr inside
evaluateBranch
, information about the corresponding auipc is needed. Similarly, AArch64 uses adrp+ldr pairs to access globals.This patch proposes to add state to
MCInstrAnalysis
to support these use cases. Two new virtual methods are added:updateState
: takes an instruction and its address. This methods should be called by clients on every instruction and allows targets to store whatever information they need to analyse future instructions.resetState
: clears the state whenever it becomes irrelevant. Clients could call this, for example, when starting to disassemble a new function.Note that the default implementations do nothing so this patch is NFC. No actual state is stored inside
MCInstrAnalysis
; deciding the structure of the state is left to the targets.This patch also modifies llvm-objdump to use the new interface.
This patch is an alternative to D116677 and the idea of storing state in
MCInstrAnalysis
was first discussed there.