Skip to content

[lldb-dap] Fix disassembled ranges for disassemble request #105446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

vogelsgesang
Copy link
Member

This is a first draft PR which fixes #103021

The main issue was that the instructionOffset was handled as a byte offset and not as an instruction offset.

This commit also incorporates previous feedback from https://reviews.llvm.org/D140358: With this change, we are using symbols and DWARF debug information to find function boundaries and correctly start disassembling at this address.

TODO:

  • Update test case
  • Check if we can also support disassembling across functions

TODO:
* Update test case
* Check if we can also support disassembling across functions

Incorporated feedback from https://reviews.llvm.org/D140358
Copy link

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 82ee31f75ac1316006fa9e21dddfddec37cf7072...b809f570dd8055e5b899c337ec9ff5110ab94c6e lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
View the diff from darker here.
--- packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py	2024-08-20 00:24:14.000000 +0000
+++ packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py	2024-08-20 23:09:29.317158 +0000
@@ -672,11 +672,15 @@
             "arguments": args_dict,
         }
         return self.send_recv(command_dict)
 
     def request_disassemble(
-        self, memoryReference, instructionOffset=0, instructionCount=10, resolveSymbols=True
+        self,
+        memoryReference,
+        instructionOffset=0,
+        instructionCount=10,
+        resolveSymbols=True,
     ):
         args_dict = {
             "memoryReference": memoryReference,
             "instructionOffset": instructionOffset,
             "instructionCount": instructionCount,

@vogelsgesang
Copy link
Member Author

@clayborg I would be interested in your guidance for this PR, given that you previously reviewed a similar change in https://reviews.llvm.org/D140358

I did use SBFunction / SBSymbol as suggested by you in this comment.

I did not know how to address the part

    // then we need to repeat the search for the next function or symbol. 
    // note there may be bytes between functions or symbols which we can disassemble
    // by calling _get_instructions_from_memory(...) but we must find the next
    // symbol or function boundary and get back on track

from your comment, though. Is there some API to find the next symbol? Or some API to get a list of symbols, sorted by their start address?

I could of course use a brute-force approach, similar to

SBAddress addr;
while (!addr.GetFunction() && !addr.GetSymbol())
   addr = addr.OffsetAddress(1);

but that seems a bit wasteful

@llvmbot
Copy link
Member

llvmbot commented Aug 21, 2024

@llvm/pr-subscribers-lldb

Author: Adrian Vogelsgesang (vogelsgesang)

Changes

This is a first draft PR which fixes #103021

The main issue was that the instructionOffset was handled as a byte offset and not as an instruction offset.

This commit also incorporates previous feedback from https://reviews.llvm.org/D140358: With this change, we are using symbols and DWARF debug information to find function boundaries and correctly start disassembling at this address.

TODO:

  • Update test case
  • Check if we can also support disassembling across functions

Full diff: https://github.com/llvm/llvm-project/pull/105446.diff

3 Files Affected:

  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+2-2)
  • (modified) lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py (+8)
  • (modified) lldb/tools/lldb-dap/lldb-dap.cpp (+61-7)
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index a324af57b61df3..75731ebfde6723 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -674,11 +674,11 @@ def request_disconnect(self, terminateDebuggee=None):
         return self.send_recv(command_dict)
 
     def request_disassemble(
-        self, memoryReference, offset=-50, instructionCount=200, resolveSymbols=True
+        self, memoryReference, instructionOffset=0, instructionCount=10, resolveSymbols=True
     ):
         args_dict = {
             "memoryReference": memoryReference,
-            "offset": offset,
+            "instructionOffset": instructionOffset,
             "instructionCount": instructionCount,
             "resolveSymbols": resolveSymbols,
         }
diff --git a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
index 9e8ef5b289f2e8..ab72eb2d13d729 100644
--- a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
+++ b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
@@ -29,6 +29,14 @@ def test_disassemble(self):
         )
         self.continue_to_next_stop()
 
+        stackFrames = self.get_stackFrames(
+            threadId=threadId, startFrame=frameIndex, levels=1
+        )
+        self.assertIsNotNone(stackFrames)
+
+        # XXX finish updating test case
+        memoryReference = stackFrames[0]["instructionPointerReference"]
+
         pc_assembly = self.disassemble(frameIndex=0)
         self.assertIn("location", pc_assembly, "Source location missing.")
         self.assertIn("instruction", pc_assembly, "Assembly instruction missing.")
diff --git a/lldb/tools/lldb-dap/lldb-dap.cpp b/lldb/tools/lldb-dap/lldb-dap.cpp
index ea84f31aec3a6c..5d9d28b59ea805 100644
--- a/lldb/tools/lldb-dap/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/lldb-dap.cpp
@@ -3910,8 +3910,8 @@ void request_disassemble(const llvm::json::Object &request) {
     return;
   }
 
-  addr_ptr += GetSigned(arguments, "instructionOffset", 0);
-  lldb::SBAddress addr(addr_ptr, g_dap.target);
+  addr_ptr += GetSigned(arguments, "offset", 0);
+  lldb::SBAddress addr = g_dap.target.ResolveLoadAddress(addr_ptr);
   if (!addr.IsValid()) {
     response["success"] = false;
     response["message"] = "Memory reference not found in the current binary.";
@@ -3919,9 +3919,27 @@ void request_disassemble(const llvm::json::Object &request) {
     return;
   }
 
-  const auto inst_count = GetUnsigned(arguments, "instructionCount", 0);
-  lldb::SBInstructionList insts =
-      g_dap.target.ReadInstructions(addr, inst_count);
+  int64_t inst_offset = GetSigned(arguments, "instructionOffset", 0);
+  const int64_t inst_count = GetSigned(arguments, "instructionCount", 0);
+  if (inst_count < 0) {
+    response["success"] = false;
+    response["message"] = "Negative instruction count.";
+    g_dap.SendJSON(llvm::json::Value(std::move(response)));
+    return;
+  }
+
+  lldb::SBInstructionList insts;
+  if (lldb::SBFunction func = addr.GetFunction()) {
+    // First try to identify the function boundaries through debug information
+    insts = func.GetInstructions(g_dap.target);
+  } else if (lldb::SBSymbol sym = addr.GetSymbol()) {
+    // Try to identify the function boundaries through the symbol table
+    insts = sym.GetInstructions(g_dap.target);
+  } else {
+    // We could not identify the function. Just disassemble starting from the
+    // provided address.
+    insts = g_dap.target.ReadInstructions(addr, inst_offset + inst_count);
+  }
 
   if (!insts.IsValid()) {
     response["success"] = false;
@@ -3930,10 +3948,46 @@ void request_disassemble(const llvm::json::Object &request) {
     return;
   }
 
+  // Find the start offset in the disassembled function
+  ssize_t offset = 0;
+  while (insts.GetInstructionAtIndex(offset).GetAddress().GetLoadAddress(
+             g_dap.target) < addr.GetLoadAddress(g_dap.target))
+    ++offset;
+  offset += inst_offset;
+
+  // Format the disassembled instructions
   const bool resolveSymbols = GetBoolean(arguments, "resolveSymbols", false);
   llvm::json::Array instructions;
-  const auto num_insts = insts.GetSize();
-  for (size_t i = 0; i < num_insts; ++i) {
+  for (ssize_t i = offset; i < inst_count + offset; ++i) {
+    // Before the range of disassembled instructions?
+    if (i < 0) {
+      auto fake_addr =
+          insts.GetInstructionAtIndex(0).GetAddress().GetLoadAddress(
+              g_dap.target) -
+          offset + i;
+      llvm::json::Object disassembled_inst{
+          {"address", "0x" + llvm::utohexstr(fake_addr)},
+          {"instruction", "Instruction before disassembled address range"},
+          {"presentationHint", "invalid"},
+      };
+      instructions.emplace_back(std::move(disassembled_inst));
+      continue;
+    }
+    // Past the range of disassembled instructions?
+    if (static_cast<uint64_t>(i) >= insts.GetSize()) {
+      auto fake_addr = insts.GetInstructionAtIndex(insts.GetSize() - 1)
+                           .GetAddress()
+                           .GetLoadAddress(g_dap.target) -
+                       offset + i - insts.GetSize();
+      llvm::json::Object disassembled_inst{
+          {"address", "0x" + llvm::utohexstr(fake_addr)},
+          {"instruction", "Instruction after disassembled address range"},
+          {"presentationHint", "invalid"},
+      };
+      instructions.emplace_back(std::move(disassembled_inst));
+      continue;
+    }
+
     lldb::SBInstruction inst = insts.GetInstructionAtIndex(i);
     auto addr = inst.GetAddress();
     const auto inst_addr = addr.GetLoadAddress(g_dap.target);

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, let me know if you have any questions!

@@ -3910,18 +3910,36 @@ void request_disassemble(const llvm::json::Object &request) {
return;
}

addr_ptr += GetSigned(arguments, "instructionOffset", 0);
lldb::SBAddress addr(addr_ptr, g_dap.target);
addr_ptr += GetSigned(arguments, "offset", 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "offset" required? If so we don't want to set it to zero if this fails, we would wan't to return an error. Use the above GetSigned function that returns an llvm::Expected

@@ -3910,18 +3910,36 @@ void request_disassemble(const llvm::json::Object &request) {
return;
}

addr_ptr += GetSigned(arguments, "instructionOffset", 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to use the "instructionOffset" here if it is specified. Seems like it isn't optional, so if it isn't here, then maybe return an error if it is required. We might want to add a overload of GetSigned like:

llvm::Expected<int64_t> GetSigned(arguments, "instructionOffset");

This would return an error if it isn't in the args, or if the decoding failed. We can just return a string version of the error as an error to this command.

lldb::SBInstructionList insts =
g_dap.target.ReadInstructions(addr, inst_count);
int64_t inst_offset = GetSigned(arguments, "instructionOffset", 0);
const int64_t inst_count = GetSigned(arguments, "instructionCount", 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the llvm::Expected<int64_t> GetSigned(...)? If this is required. Same thing for "instructionOffset" above.

while (insts.GetInstructionAtIndex(offset).GetAddress().GetLoadAddress(
g_dap.target) < addr.GetLoadAddress(g_dap.target))
++offset;
offset += inst_offset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to watch for "offset" exceeding the size of the insts instruction list here. If we exceed, we need to get the next function and continue the search and update the "inst_offset" and "offset" correctly if we exceed the boundary

instructions.emplace_back(std::move(disassembled_inst));
continue;
}
// Past the range of disassembled instructions?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will only have 1 function's worth of instructions in "insts" and we will need to update it. I would keep trying to resolve a new Symbol or Function by adding the last instruction byte size the last load address of the last instruction in the list and keep searching for a function or symbol after each resolve call. We will bail if we reach the end of the bytes that were specified.

@vogelsgesang
Copy link
Member Author

superseded by #140486

@vogelsgesang vogelsgesang deleted the avogelsgesang-fix-disassemble branch June 11, 2025 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[lldb-dap] incorrect offsets in assembly view
3 participants