-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb-dap] Fix disassembled ranges for disassemble
request
#105446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
TODO: * Update test case * Check if we can also support disassembling across functions Incorporated feedback from https://reviews.llvm.org/D140358
You can test this locally with the following command:darker --check --diff -r 82ee31f75ac1316006fa9e21dddfddec37cf7072...b809f570dd8055e5b899c337ec9ff5110ab94c6e lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py View the diff from darker here.--- packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py 2024-08-20 00:24:14.000000 +0000
+++ packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py 2024-08-20 23:09:29.317158 +0000
@@ -672,11 +672,15 @@
"arguments": args_dict,
}
return self.send_recv(command_dict)
def request_disassemble(
- self, memoryReference, instructionOffset=0, instructionCount=10, resolveSymbols=True
+ self,
+ memoryReference,
+ instructionOffset=0,
+ instructionCount=10,
+ resolveSymbols=True,
):
args_dict = {
"memoryReference": memoryReference,
"instructionOffset": instructionOffset,
"instructionCount": instructionCount,
|
@clayborg I would be interested in your guidance for this PR, given that you previously reviewed a similar change in https://reviews.llvm.org/D140358 I did use I did not know how to address the part
from your comment, though. Is there some API to find the next symbol? Or some API to get a list of symbols, sorted by their start address? I could of course use a brute-force approach, similar to
but that seems a bit wasteful |
@llvm/pr-subscribers-lldb Author: Adrian Vogelsgesang (vogelsgesang) ChangesThis is a first draft PR which fixes #103021 The main issue was that the This commit also incorporates previous feedback from https://reviews.llvm.org/D140358: With this change, we are using symbols and DWARF debug information to find function boundaries and correctly start disassembling at this address. TODO:
Full diff: https://github.com/llvm/llvm-project/pull/105446.diff 3 Files Affected:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index a324af57b61df3..75731ebfde6723 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -674,11 +674,11 @@ def request_disconnect(self, terminateDebuggee=None):
return self.send_recv(command_dict)
def request_disassemble(
- self, memoryReference, offset=-50, instructionCount=200, resolveSymbols=True
+ self, memoryReference, instructionOffset=0, instructionCount=10, resolveSymbols=True
):
args_dict = {
"memoryReference": memoryReference,
- "offset": offset,
+ "instructionOffset": instructionOffset,
"instructionCount": instructionCount,
"resolveSymbols": resolveSymbols,
}
diff --git a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
index 9e8ef5b289f2e8..ab72eb2d13d729 100644
--- a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
+++ b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
@@ -29,6 +29,14 @@ def test_disassemble(self):
)
self.continue_to_next_stop()
+ stackFrames = self.get_stackFrames(
+ threadId=threadId, startFrame=frameIndex, levels=1
+ )
+ self.assertIsNotNone(stackFrames)
+
+ # XXX finish updating test case
+ memoryReference = stackFrames[0]["instructionPointerReference"]
+
pc_assembly = self.disassemble(frameIndex=0)
self.assertIn("location", pc_assembly, "Source location missing.")
self.assertIn("instruction", pc_assembly, "Assembly instruction missing.")
diff --git a/lldb/tools/lldb-dap/lldb-dap.cpp b/lldb/tools/lldb-dap/lldb-dap.cpp
index ea84f31aec3a6c..5d9d28b59ea805 100644
--- a/lldb/tools/lldb-dap/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/lldb-dap.cpp
@@ -3910,8 +3910,8 @@ void request_disassemble(const llvm::json::Object &request) {
return;
}
- addr_ptr += GetSigned(arguments, "instructionOffset", 0);
- lldb::SBAddress addr(addr_ptr, g_dap.target);
+ addr_ptr += GetSigned(arguments, "offset", 0);
+ lldb::SBAddress addr = g_dap.target.ResolveLoadAddress(addr_ptr);
if (!addr.IsValid()) {
response["success"] = false;
response["message"] = "Memory reference not found in the current binary.";
@@ -3919,9 +3919,27 @@ void request_disassemble(const llvm::json::Object &request) {
return;
}
- const auto inst_count = GetUnsigned(arguments, "instructionCount", 0);
- lldb::SBInstructionList insts =
- g_dap.target.ReadInstructions(addr, inst_count);
+ int64_t inst_offset = GetSigned(arguments, "instructionOffset", 0);
+ const int64_t inst_count = GetSigned(arguments, "instructionCount", 0);
+ if (inst_count < 0) {
+ response["success"] = false;
+ response["message"] = "Negative instruction count.";
+ g_dap.SendJSON(llvm::json::Value(std::move(response)));
+ return;
+ }
+
+ lldb::SBInstructionList insts;
+ if (lldb::SBFunction func = addr.GetFunction()) {
+ // First try to identify the function boundaries through debug information
+ insts = func.GetInstructions(g_dap.target);
+ } else if (lldb::SBSymbol sym = addr.GetSymbol()) {
+ // Try to identify the function boundaries through the symbol table
+ insts = sym.GetInstructions(g_dap.target);
+ } else {
+ // We could not identify the function. Just disassemble starting from the
+ // provided address.
+ insts = g_dap.target.ReadInstructions(addr, inst_offset + inst_count);
+ }
if (!insts.IsValid()) {
response["success"] = false;
@@ -3930,10 +3948,46 @@ void request_disassemble(const llvm::json::Object &request) {
return;
}
+ // Find the start offset in the disassembled function
+ ssize_t offset = 0;
+ while (insts.GetInstructionAtIndex(offset).GetAddress().GetLoadAddress(
+ g_dap.target) < addr.GetLoadAddress(g_dap.target))
+ ++offset;
+ offset += inst_offset;
+
+ // Format the disassembled instructions
const bool resolveSymbols = GetBoolean(arguments, "resolveSymbols", false);
llvm::json::Array instructions;
- const auto num_insts = insts.GetSize();
- for (size_t i = 0; i < num_insts; ++i) {
+ for (ssize_t i = offset; i < inst_count + offset; ++i) {
+ // Before the range of disassembled instructions?
+ if (i < 0) {
+ auto fake_addr =
+ insts.GetInstructionAtIndex(0).GetAddress().GetLoadAddress(
+ g_dap.target) -
+ offset + i;
+ llvm::json::Object disassembled_inst{
+ {"address", "0x" + llvm::utohexstr(fake_addr)},
+ {"instruction", "Instruction before disassembled address range"},
+ {"presentationHint", "invalid"},
+ };
+ instructions.emplace_back(std::move(disassembled_inst));
+ continue;
+ }
+ // Past the range of disassembled instructions?
+ if (static_cast<uint64_t>(i) >= insts.GetSize()) {
+ auto fake_addr = insts.GetInstructionAtIndex(insts.GetSize() - 1)
+ .GetAddress()
+ .GetLoadAddress(g_dap.target) -
+ offset + i - insts.GetSize();
+ llvm::json::Object disassembled_inst{
+ {"address", "0x" + llvm::utohexstr(fake_addr)},
+ {"instruction", "Instruction after disassembled address range"},
+ {"presentationHint", "invalid"},
+ };
+ instructions.emplace_back(std::move(disassembled_inst));
+ continue;
+ }
+
lldb::SBInstruction inst = insts.GetInstructionAtIndex(i);
auto addr = inst.GetAddress();
const auto inst_addr = addr.GetLoadAddress(g_dap.target);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, let me know if you have any questions!
@@ -3910,18 +3910,36 @@ void request_disassemble(const llvm::json::Object &request) { | |||
return; | |||
} | |||
|
|||
addr_ptr += GetSigned(arguments, "instructionOffset", 0); | |||
lldb::SBAddress addr(addr_ptr, g_dap.target); | |||
addr_ptr += GetSigned(arguments, "offset", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "offset" required? If so we don't want to set it to zero if this fails, we would wan't to return an error. Use the above GetSigned
function that returns an llvm::Expected
@@ -3910,18 +3910,36 @@ void request_disassemble(const llvm::json::Object &request) { | |||
return; | |||
} | |||
|
|||
addr_ptr += GetSigned(arguments, "instructionOffset", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to use the "instructionOffset" here if it is specified. Seems like it isn't optional, so if it isn't here, then maybe return an error if it is required. We might want to add a overload of GetSigned like:
llvm::Expected<int64_t> GetSigned(arguments, "instructionOffset");
This would return an error if it isn't in the args, or if the decoding failed. We can just return a string version of the error as an error to this command.
lldb::SBInstructionList insts = | ||
g_dap.target.ReadInstructions(addr, inst_count); | ||
int64_t inst_offset = GetSigned(arguments, "instructionOffset", 0); | ||
const int64_t inst_count = GetSigned(arguments, "instructionCount", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the llvm::Expected<int64_t> GetSigned(...)
? If this is required. Same thing for "instructionOffset" above.
while (insts.GetInstructionAtIndex(offset).GetAddress().GetLoadAddress( | ||
g_dap.target) < addr.GetLoadAddress(g_dap.target)) | ||
++offset; | ||
offset += inst_offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to watch for "offset" exceeding the size of the insts
instruction list here. If we exceed, we need to get the next function and continue the search and update the "inst_offset" and "offset" correctly if we exceed the boundary
instructions.emplace_back(std::move(disassembled_inst)); | ||
continue; | ||
} | ||
// Past the range of disassembled instructions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will only have 1 function's worth of instructions in "insts" and we will need to update it. I would keep trying to resolve a new Symbol or Function by adding the last instruction byte size the last load address of the last instruction in the list and keep searching for a function or symbol after each resolve call. We will bail if we reach the end of the bytes that were specified.
superseded by #140486 |
This is a first draft PR which fixes #103021
The main issue was that the
instructionOffset
was handled as a byte offset and not as an instruction offset.This commit also incorporates previous feedback from https://reviews.llvm.org/D140358: With this change, we are using symbols and DWARF debug information to find function boundaries and correctly start disassembling at this address.
TODO: