Skip to content

Initial step in targets DAP support #86623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 25, 2024

Conversation

jeffreytan81
Copy link
Contributor

@jeffreytan81 jeffreytan81 commented Mar 26, 2024

This patch provides the initial implementation for the "Step Into Specific/Step In Targets" feature in VSCode DAP.

The implementation disassembles all the call instructions in step range and try to resolve operand name (assuming one operand) using debug info. Later, the call target function name is chosen by end user and specified in the StepInto() API call.

It is v1 because of using the existing step in target function name API. This implementation has several limitations:

  • Won't for indirect/virtual function call -- in most cases, our disassembler won't be able to solve the indirect call target address/name.
  • Won't work for target function without debug info -- if the target function has symbol but not debug info, the existing ThreadPlanStepInRange won't stop.
  • Relying on function names can be fragile -- if there is some middle glue/thunk code, our disassembler can only resolve the glue/thunk code's name not the real target function name. It can be fragile to depend compiler/linker emits the same names for both.
  • Does not support step into raw address call sites -- it is a valid scenario that in Visual Studio debugger, user can explicitly choose a raw address to step into which land in the function without debug info/symbol, then choose UI to load the debug info on-demand for that module/frame to continue exploring.

A more reliable design could be extending the ThreadPlanStepInRange to support step in based on call-site instruction offset/PC which I will propose in next iteration.

Copy link

github-actions bot commented Mar 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

github-actions bot commented Mar 26, 2024

✅ With the latest revision this PR passed the Python code formatter.

@jeffreytan81 jeffreytan81 marked this pull request as ready for review March 26, 2024 16:20
@llvmbot llvmbot added the lldb label Mar 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 26, 2024

@llvm/pr-subscribers-lldb

Author: None (jeffreytan81)

Changes

This patch provides the initial implementation for the "Step Into Specific/Step In Targets" feature in VSCode DAP.

The implementation disassembles all the call instructions in step range and try to resolve operand name (assuming one operand) using debug info. Later, the call target function name is chosen by end user and specified in the StepInto() API call.

It is v1 because of using the existing step in target function name API. This implementation has several limitations:

  • Won't for indirect/virtual function call -- in most cases, our disassembler won't be able to solve the indirect call target address/name.
  • Won't work for target function without debug info -- if the target function has symbol but not debug info, the existing ThreadPlanStepInRange won't stop.
  • Relying on function names can be fragile -- if there is some middle glue/thunk code, our disassembler can only resolve the glue/thunk code's name not the real target function name. It can be fragile to depend compiler/linker emits the same names for both.
  • Does not support step into raw address call sites -- it is a valid scenario that in Visual Studio debugger, user can explicitly choose a raw address to step into which land in the function without debug info/symbol, then choose UI to load the debug info on-demand for that module/frame to continue exploring.

A more reliable design could be extending the ThreadPlanStepInRange to support step in based on call-site instruction offset/PC which I will propose in next iteration.


Full diff: https://github.com/llvm/llvm-project/pull/86623.diff

10 Files Affected:

  • (modified) lldb/include/lldb/API/SBLineEntry.h (+3)
  • (modified) lldb/include/lldb/API/SBSymbolContextList.h (+1)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+16-5)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py (+2-2)
  • (modified) lldb/source/API/SBLineEntry.cpp (+15)
  • (added) lldb/test/API/tools/lldb-dap/stepInTargets/Makefile (+6)
  • (added) lldb/test/API/tools/lldb-dap/stepInTargets/TestDAP_stepInTargets.py (+66)
  • (added) lldb/test/API/tools/lldb-dap/stepInTargets/main.cpp (+11)
  • (modified) lldb/tools/lldb-dap/DAP.h (+2)
  • (modified) lldb/tools/lldb-dap/lldb-dap.cpp (+148-2)
diff --git a/lldb/include/lldb/API/SBLineEntry.h b/lldb/include/lldb/API/SBLineEntry.h
index 7c2431ba3c8a51..d70c4fac6ec717 100644
--- a/lldb/include/lldb/API/SBLineEntry.h
+++ b/lldb/include/lldb/API/SBLineEntry.h
@@ -29,6 +29,9 @@ class LLDB_API SBLineEntry {
 
   lldb::SBAddress GetEndAddress() const;
 
+  lldb::SBAddress
+  GetSameLineContiguousAddressRangeEnd(bool include_inlined_functions) const;
+
   explicit operator bool() const;
 
   bool IsValid() const;
diff --git a/lldb/include/lldb/API/SBSymbolContextList.h b/lldb/include/lldb/API/SBSymbolContextList.h
index 4026afc213571c..95100d219df20f 100644
--- a/lldb/include/lldb/API/SBSymbolContextList.h
+++ b/lldb/include/lldb/API/SBSymbolContextList.h
@@ -44,6 +44,7 @@ class LLDB_API SBSymbolContextList {
 protected:
   friend class SBModule;
   friend class SBTarget;
+  friend class SBCompileUnit;
 
   lldb_private::SymbolContextList *operator->() const;
 
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index 27a76a652f4063..45e48a10c7611d 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -811,23 +811,34 @@ def request_next(self, threadId):
         command_dict = {"command": "next", "type": "request", "arguments": args_dict}
         return self.send_recv(command_dict)
 
-    def request_stepIn(self, threadId):
+    def request_stepInTargets(self, frameId):
         if self.exit_status is not None:
-            raise ValueError("request_continue called after process exited")
-        args_dict = {"threadId": threadId}
+            raise ValueError("request_stepInTargets called after process exited")
+        args_dict = {"frameId": frameId}
+        command_dict = {
+            "command": "stepInTargets",
+            "type": "request",
+            "arguments": args_dict,
+        }
+        return self.send_recv(command_dict)
+    
+    def request_stepIn(self, threadId, targetId):
+        if self.exit_status is not None:
+            raise ValueError("request_stepIn called after process exited")
+        args_dict = {"threadId": threadId, "targetId": targetId}
         command_dict = {"command": "stepIn", "type": "request", "arguments": args_dict}
         return self.send_recv(command_dict)
 
     def request_stepOut(self, threadId):
         if self.exit_status is not None:
-            raise ValueError("request_continue called after process exited")
+            raise ValueError("request_stepOut called after process exited")
         args_dict = {"threadId": threadId}
         command_dict = {"command": "stepOut", "type": "request", "arguments": args_dict}
         return self.send_recv(command_dict)
 
     def request_pause(self, threadId=None):
         if self.exit_status is not None:
-            raise ValueError("request_continue called after process exited")
+            raise ValueError("request_pause called after process exited")
         if threadId is None:
             threadId = self.get_thread_id()
         args_dict = {"threadId": threadId}
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index 23f650d2d36fdd..d56ea5dca14beb 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -218,8 +218,8 @@ def set_global(self, name, value, id=None):
         """Set a top level global variable only."""
         return self.dap_server.request_setVariable(2, name, str(value), id=id)
 
-    def stepIn(self, threadId=None, waitForStop=True):
-        self.dap_server.request_stepIn(threadId=threadId)
+    def stepIn(self, threadId=None, targetId=None, waitForStop=True):
+        self.dap_server.request_stepIn(threadId=threadId, targetId=targetId)
         if waitForStop:
             return self.dap_server.wait_for_stopped()
         return None
diff --git a/lldb/source/API/SBLineEntry.cpp b/lldb/source/API/SBLineEntry.cpp
index 99a7b8fe644cb5..216ea6d18eab89 100644
--- a/lldb/source/API/SBLineEntry.cpp
+++ b/lldb/source/API/SBLineEntry.cpp
@@ -67,6 +67,21 @@ SBAddress SBLineEntry::GetEndAddress() const {
   return sb_address;
 }
 
+SBAddress SBLineEntry::GetSameLineContiguousAddressRangeEnd(
+    bool include_inlined_functions) const {
+  LLDB_INSTRUMENT_VA(this);
+
+  SBAddress sb_address;
+  if (m_opaque_up) {
+    AddressRange line_range = m_opaque_up->GetSameLineContiguousAddressRange(
+        include_inlined_functions);
+
+    sb_address.SetAddress(line_range.GetBaseAddress());
+    sb_address.OffsetAddress(line_range.GetByteSize());
+  }
+  return sb_address;
+}
+
 bool SBLineEntry::IsValid() const {
   LLDB_INSTRUMENT_VA(this);
   return this->operator bool();
diff --git a/lldb/test/API/tools/lldb-dap/stepInTargets/Makefile b/lldb/test/API/tools/lldb-dap/stepInTargets/Makefile
new file mode 100644
index 00000000000000..f772575cd5613b
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/stepInTargets/Makefile
@@ -0,0 +1,6 @@
+	
+ENABLE_THREADS := YES
+
+CXX_SOURCES := main.cpp
+
+include Makefile.rules
diff --git a/lldb/test/API/tools/lldb-dap/stepInTargets/TestDAP_stepInTargets.py b/lldb/test/API/tools/lldb-dap/stepInTargets/TestDAP_stepInTargets.py
new file mode 100644
index 00000000000000..3fa955305037cf
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/stepInTargets/TestDAP_stepInTargets.py
@@ -0,0 +1,66 @@
+"""
+Test lldb-dap stepInTargets request
+"""
+
+import dap_server
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+import lldbdap_testcase
+from lldbsuite.test import lldbutil
+
+
+class TestDAP_stepInTargets(lldbdap_testcase.DAPTestCaseBase):
+    @skipIf(archs=no_match(["x86_64"]))  # ARM flow kind is not supported yet.
+    def test_basic(self):
+        """
+        Tests the basic stepping in targets with directly calls.
+        """
+        program = self.getBuildArtifact("a.out")
+        self.build_and_launch(program)
+        source = "main.cpp"
+
+        breakpoint_line = line_number(source, "// set breakpoint here")
+        lines = [breakpoint_line]
+        # Set breakpoint in the thread function so we can step the threads
+        breakpoint_ids = self.set_source_breakpoints(source, lines)
+        self.assertEqual(
+            len(breakpoint_ids), len(lines), "expect correct number of breakpoints"
+        )
+        self.continue_to_breakpoints(breakpoint_ids)
+
+        threads = self.dap_server.get_threads()
+        self.assertEqual(len(threads), 1, "expect one thread")
+        tid = threads[0]["id"]
+
+        leaf_frame = self.dap_server.get_stackFrame()
+        self.assertIsNotNone(leaf_frame, "expect a leaf frame")
+
+        # Request all step in targets list and verify the response.
+        step_in_targets_response = self.dap_server.request_stepInTargets(
+            leaf_frame["id"]
+        )
+        self.assertEqual(step_in_targets_response["success"], True, "expect success")
+        self.assertIn(
+            "body", step_in_targets_response, "expect body field in response body"
+        )
+        self.assertIn(
+            "targets",
+            step_in_targets_response["body"],
+            "expect targets field in response body",
+        )
+
+        step_in_targets = step_in_targets_response["body"]["targets"]
+        self.assertEqual(len(step_in_targets), 3, "expect 3 step in targets")
+
+        # Verify the target names are correct.
+        self.assertEqual(step_in_targets[0]["label"], "bar()", "expect bar()")
+        self.assertEqual(step_in_targets[1]["label"], "bar2()", "expect bar2()")
+        self.assertEqual(
+            step_in_targets[2]["label"], "foo(int, int)", "expect foo(int, int)"
+        )
+
+        # Choose to step into second target and verify that we are in bar2()
+        self.stepIn(threadId=tid, targetId=step_in_targets[1]["id"], waitForStop=True)
+        leaf_frame = self.dap_server.get_stackFrame()
+        self.assertIsNotNone(leaf_frame, "expect a leaf frame")
+        self.assertEqual(leaf_frame["name"], "bar2()")
diff --git a/lldb/test/API/tools/lldb-dap/stepInTargets/main.cpp b/lldb/test/API/tools/lldb-dap/stepInTargets/main.cpp
new file mode 100644
index 00000000000000..d3c3dbcc139ef0
--- /dev/null
+++ b/lldb/test/API/tools/lldb-dap/stepInTargets/main.cpp
@@ -0,0 +1,11 @@
+
+int foo(int val, int extra) { return val + extra; }
+
+int bar() { return 22; }
+
+int bar2() { return 54; }
+
+int main(int argc, char const *argv[]) {
+  foo(bar(), bar2()); // set breakpoint here
+  return 0;
+}
diff --git a/lldb/tools/lldb-dap/DAP.h b/lldb/tools/lldb-dap/DAP.h
index 8015dec9ba8fe6..5c70a056fea4bf 100644
--- a/lldb/tools/lldb-dap/DAP.h
+++ b/lldb/tools/lldb-dap/DAP.h
@@ -162,6 +162,8 @@ struct DAP {
   std::vector<std::string> exit_commands;
   std::vector<std::string> stop_commands;
   std::vector<std::string> terminate_commands;
+  // Map step in target id to list of function targets that user can choose.
+  llvm::DenseMap<lldb::addr_t, std::string> step_in_targets;
   // A copy of the last LaunchRequest or AttachRequest so we can reuse its
   // arguments if we get a RestartRequest.
   std::optional<llvm::json::Object> last_launch_or_attach_request;
diff --git a/lldb/tools/lldb-dap/lldb-dap.cpp b/lldb/tools/lldb-dap/lldb-dap.cpp
index 55f8c920e60016..30cba6d23f542d 100644
--- a/lldb/tools/lldb-dap/lldb-dap.cpp
+++ b/lldb/tools/lldb-dap/lldb-dap.cpp
@@ -1645,7 +1645,7 @@ void request_initialize(const llvm::json::Object &request) {
   // The debug adapter supports the gotoTargetsRequest.
   body.try_emplace("supportsGotoTargetsRequest", false);
   // The debug adapter supports the stepInTargetsRequest.
-  body.try_emplace("supportsStepInTargetsRequest", false);
+  body.try_emplace("supportsStepInTargetsRequest", true);
   // The debug adapter supports the completions request.
   body.try_emplace("supportsCompletionsRequest", true);
   // The debug adapter supports the disassembly request.
@@ -3180,14 +3180,159 @@ void request_stepIn(const llvm::json::Object &request) {
   llvm::json::Object response;
   FillResponse(request, response);
   auto arguments = request.getObject("arguments");
+
+  std::string step_in_target;
+  uint64_t target_id = GetUnsigned(arguments, "targetId", 0);
+  auto it = g_dap.step_in_targets.find(target_id);
+  if (it != g_dap.step_in_targets.end())
+    step_in_target = it->second;
+
+  const bool single_thread = GetBoolean(arguments, "singleThread", false);
+  lldb::RunMode run_mode =
+      single_thread ? lldb::eOnlyThisThread : lldb::eOnlyDuringStepping;
   lldb::SBThread thread = g_dap.GetLLDBThread(*arguments);
   if (thread.IsValid()) {
     // Remember the thread ID that caused the resume so we can set the
     // "threadCausedFocus" boolean value in the "stopped" events.
     g_dap.focus_tid = thread.GetThreadID();
-    thread.StepInto();
+    thread.StepInto(step_in_target.c_str(), run_mode);
+  } else {
+    response["success"] = llvm::json::Value(false);
+  }
+  g_dap.SendJSON(llvm::json::Value(std::move(response)));
+}
+
+// "StepInTargetsRequest": {
+//   "allOf": [ { "$ref": "#/definitions/Request" }, {
+//     "type": "object",
+//     "description": "This request retrieves the possible step-in targets for
+//     the specified stack frame.\nThese targets can be used in the `stepIn`
+//     request.\nClients should only call this request if the corresponding
+//     capability `supportsStepInTargetsRequest` is true.", "properties": {
+//       "command": {
+//         "type": "string",
+//         "enum": [ "stepInTargets" ]
+//       },
+//       "arguments": {
+//         "$ref": "#/definitions/StepInTargetsArguments"
+//       }
+//     },
+//     "required": [ "command", "arguments"  ]
+//   }]
+// },
+// "StepInTargetsArguments": {
+//   "type": "object",
+//   "description": "Arguments for `stepInTargets` request.",
+//   "properties": {
+//     "frameId": {
+//       "type": "integer",
+//       "description": "The stack frame for which to retrieve the possible
+//       step-in targets."
+//     }
+//   },
+//   "required": [ "frameId" ]
+// },
+// "StepInTargetsResponse": {
+//   "allOf": [ { "$ref": "#/definitions/Response" }, {
+//     "type": "object",
+//     "description": "Response to `stepInTargets` request.",
+//     "properties": {
+//       "body": {
+//         "type": "object",
+//         "properties": {
+//           "targets": {
+//             "type": "array",
+//             "items": {
+//               "$ref": "#/definitions/StepInTarget"
+//             },
+//             "description": "The possible step-in targets of the specified
+//             source location."
+//           }
+//         },
+//         "required": [ "targets" ]
+//       }
+//     },
+//     "required": [ "body" ]
+//   }]
+// }
+void request_stepInTargets(const llvm::json::Object &request) {
+  llvm::json::Object response;
+  FillResponse(request, response);
+  auto arguments = request.getObject("arguments");
+
+  g_dap.step_in_targets.clear();
+  lldb::SBFrame frame = g_dap.GetLLDBFrame(*arguments);
+  if (frame.IsValid()) {
+    lldb::SBAddress pc_addr = frame.GetPCAddress();
+    lldb::addr_t line_end_addr = pc_addr.GetLineEntry()
+                                     .GetSameLineContiguousAddressRangeEnd(
+                                         /*include_inlined_functions=*/true)
+                                     .GetLoadAddress(g_dap.target);
+
+    int max_inst_count = line_end_addr - pc_addr.GetLoadAddress(g_dap.target);
+    lldb::SBInstructionList insts =
+        g_dap.target.ReadInstructions(pc_addr, max_inst_count);
+
+    if (!insts.IsValid()) {
+      response["success"] = false;
+      response["message"] = "Failed to get instructions for frame.";
+      g_dap.SendJSON(llvm::json::Value(std::move(response)));
+      return;
+    }
+
+    llvm::json::Array step_in_targets;
+    const auto num_insts = insts.GetSize();
+    for (size_t i = 0; i < num_insts; ++i) {
+      lldb::SBInstruction inst = insts.GetInstructionAtIndex(i);
+      if (!inst.IsValid())
+        break;
+
+      lldb::addr_t inst_addr = inst.GetAddress().GetLoadAddress(g_dap.target);
+      // line_end_addr is exclusive of the line range.
+      if (inst_addr >= line_end_addr)
+        break;
+
+      // Note: currently only x86/x64 supports flow kind.
+      lldb::InstructionControlFlowKind flow_kind =
+          inst.GetControlFlowKind(g_dap.target);
+      if (flow_kind == lldb::eInstructionControlFlowKindCall) {
+        // Use call site instruction address as id which is easy to debug.
+        llvm::json::Object step_in_target;
+        step_in_target["id"] = inst_addr;
+
+        llvm::StringRef call_operand_name = inst.GetOperands(g_dap.target);
+        lldb::addr_t call_target_addr;
+        if (call_operand_name.getAsInteger(0, call_target_addr))
+          continue;
+
+        lldb::SBAddress call_target_load_addr =
+            g_dap.target.ResolveLoadAddress(call_target_addr);
+        if (!call_target_load_addr.IsValid())
+          continue;
+
+        lldb::SBSymbolContext sc = g_dap.target.ResolveSymbolContextForAddress(
+            call_target_load_addr, lldb::eSymbolContextFunction);
+
+        // Step in targets only supports functions with debug info.
+        std::string step_in_target_name;
+        if (sc.IsValid() && sc.GetFunction().IsValid())
+          step_in_target_name = sc.GetFunction().GetDisplayName();
+
+        // Skip call sites if we fail to resolve its symbol name.
+        if (step_in_target_name.empty())
+          continue;
+
+        g_dap.step_in_targets.try_emplace(inst_addr, step_in_target_name);
+        step_in_target.try_emplace("label", step_in_target_name);
+        step_in_targets.emplace_back(std::move(step_in_target));
+      }
+    }
+    llvm::json::Object body;
+    body.try_emplace("targets", std::move(step_in_targets));
+    response.try_emplace("body", std::move(body));
   } else {
     response["success"] = llvm::json::Value(false);
+    response["message"] = "Failed to get frame for input frameId.";
   }
   g_dap.SendJSON(llvm::json::Value(std::move(response)));
 }
@@ -3904,6 +4049,7 @@ void RegisterRequestCallbacks() {
   g_dap.RegisterRequestCallback("source", request_source);
   g_dap.RegisterRequestCallback("stackTrace", request_stackTrace);
   g_dap.RegisterRequestCallback("stepIn", request_stepIn);
+  g_dap.RegisterRequestCallback("stepInTargets", request_stepInTargets);
   g_dap.RegisterRequestCallback("stepOut", request_stepOut);
   g_dap.RegisterRequestCallback("threads", request_threads);
   g_dap.RegisterRequestCallback("variables", request_variables);

@jeffreytan81 jeffreytan81 requested a review from zhyty March 26, 2024 16:21


class TestDAP_stepInTargets(lldbdap_testcase.DAPTestCaseBase):
@skipIf(archs=no_match(["x86_64"])) # ARM flow kind is not supported yet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mention that this doesn't work on arm because of lack of InstructionControlFlowKind support

Comment on lines 3272 to 3274
int max_inst_count = line_end_addr - pc_addr.GetLoadAddress(g_dap.target);
lldb::SBInstructionList insts =
g_dap.target.ReadInstructions(pc_addr, max_inst_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write a new ReadInstructions API that is range based. Internally you can use Disassembler::DisassembleRange.
That way you'll decode only the minimum necessary instructions.

continue;

lldb::SBSymbolContext sc = g_dap.target.ResolveSymbolContextForAddress(
call_target_load_addr, lldb::eSymbolContextFunction);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add a comment here explaining why eSymbolContextSymbol is not being considered, even when seemingly it should just work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained the line below:

// Step in targets only supports functions with debug info

But can repeat here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you don't explain why

Comment on lines +32 to +34
lldb::SBAddress
GetSameLineContiguousAddressRangeEnd(bool include_inlined_functions) const;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this API right? We spoke about using existing APIs for this. I believe we can find the index the SBLineEntry in the line table using:

uint32_t SBCompileUnit::FindLineEntryIndex(lldb::SBLineEntry &line_entry, bool exact = false) const;

And then you can use:

lldb::SBLineEntry SBCompileUnit::GetLineEntryAtIndex(uint32_t idx) const;

With the next index until the line doesn't match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clayborg, right, I know I can do that but I did not use it for two reasons:

  1. That seems to be what GetSameLineContiguousAddressRangeEnd API is doing internally so I prefer to reuse the API than implement myself.
  2. This seems to depend on the dwarf line table encoding which may not work for other platforms like PDB? Hiding behind GetSameLineContiguousAddressRangeEnd makes the code more resilient?

Anyway, if you still prefer us calling "GetLineEntryAtIndex" repeat, I will change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a second look at GetSameLineContiguousAddressRangeEnd implementation, it seems to take care of compiler generate code and inline functions which I find it is tedious to duplicate the implementation in lldb-dap:

if (*original_file_sp == *next_line_sc.line_entry.original_file_sp &&
(next_line_sc.line_entry.line == 0 ||
line == next_line_sc.line_entry.line)) {
// Include any line 0 entries - they indicate that this is compiler-
// generated code that does not correspond to user source code.
// next_line_sc is the same file & line as this LineEntry, so extend
// our AddressRange by its size and continue to see if there are more
// LineEntries that we can combine. However, if there was nothing to
// extend we're done.
if (!complete_line_range.Extend(next_line_sc.line_entry.range))
break;
continue;
}
if (include_inlined_functions && next_line_sc.block &&
next_line_sc.block->GetContainingInlinedBlock() != nullptr) {
// The next_line_sc might be in a different file if it's an inlined
// function. If this is the case then we still want to expand our line
// range to include them if the inlined function is at the same call site
// as this line entry. The current block could represent a nested inline
// function call so we need to need to check up the block tree to see if
// we find one.
auto inlined_parent_block =
next_line_sc.block->GetContainingInlinedBlockWithCallSite(
start_call_site);
if (!inlined_parent_block)
// We didn't find any parent inlined block with a call site at this line
// entry so this inlined function is probably at another line.
break;
// Extend our AddressRange by the size of the inlined block, but if there
// was nothing to add then we're done.
if (!complete_line_range.Extend(next_line_sc.line_entry.range))

So I think it makes more sense to reuse it. Please take a further review. Thanks.

Comment on lines 814 to 823
def request_stepInTargets(self, frameId):
if self.exit_status is not None:
raise ValueError("request_continue called after process exited")
args_dict = {"threadId": threadId}
raise ValueError("request_stepInTargets called after process exited")
args_dict = {"frameId": frameId}
command_dict = {
"command": "stepInTargets",
"type": "request",
"arguments": args_dict,
}
return self.send_recv(command_dict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move request_stepInTargets after the request_stepIn(self, threadID): So the diff doesn't get muddled up? Right now it thinks you rewrite request_stepIn, but if you add request_stepInTargets after this function the diff will be easier to read.

Comment on lines +70 to +84
SBAddress SBLineEntry::GetSameLineContiguousAddressRangeEnd(
bool include_inlined_functions) const {
LLDB_INSTRUMENT_VA(this);

SBAddress sb_address;
if (m_opaque_up) {
AddressRange line_range = m_opaque_up->GetSameLineContiguousAddressRange(
include_inlined_functions);

sb_address.SetAddress(line_range.GetBaseAddress());
sb_address.OffsetAddress(line_range.GetByteSize());
}
return sb_address;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this API and use existing ones I spoke about in the inline comment in the header file?

Comment on lines +3267 to +3268
lldb::SBAddress line_end_addr =
pc_addr.GetLineEntry().GetSameLineContiguousAddressRangeEnd(true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use existing APIs for this without adding GetSameLineContiguousAddressRangeEnd() as mentioned above in inline comments.

Comment on lines +3296 to +3299
llvm::StringRef call_operand_name = inst.GetOperands(g_dap.target);
lldb::addr_t call_target_addr;
if (call_operand_name.getAsInteger(0, call_target_addr))
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how portable this will as different disassemblers will have different things in the operands. Also many disassemblers will have the function name already in the comment:

const char *func_name = inst.GetComment(g_dap.target);

Might be worth checking into this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked into using the comment. But it is not very portable:

    0x55555555a426 <+934>:  callq  0x555555558990 ; bar at main.cpp:64
    0x55555555a42b <+939>:  movl   %eax, -0xb0(%rbp)
    0x55555555a431 <+945>:  callq  0x5555555589a0 ; bar2 at main.cpp:68
    0x55555555a436 <+950>:  movl   -0xb0(%rbp), %edi
    0x55555555a43c <+956>:  movl   %eax, %esi
    0x55555555a43e <+958>:  callq  0x555555558970 ; foo at main.cpp:60
  1. We will have to manually parse the comment to get the target function name.
  2. The function name does not include the function overloaded parameters like "foo" vs "for(int, int)".

@jimingham
Copy link
Collaborator

jimingham commented Mar 27, 2024

This patch provides the initial implementation for the "Step Into Specific/Step In Targets" feature in VSCode DAP.

The implementation disassembles all the call instructions in step range and try to resolve operand name (assuming one operand) using debug info. Later, the call target function name is chosen by end user and specified in the StepInto() API call.

It is v1 because of using the existing step in target function name API. This implementation has several limitations:

  • Won't for indirect/virtual function call -- in most cases, our disassembler won't be able to solve the indirect call target address/name.

Targeted step-in's don't predict anything. The algorithm works by stepping in and then matching the target name against the function you've stopped in, and if it doesn't match step out and keep going. So you don't ever need to resolve any names up front. Are you saying for indirect/virtual function calls you can't figure out the name once you've arrived at the target?

  • Won't work for target function without debug info -- if the target function has symbol but not debug info, the existing ThreadPlanStepInRange won't stop.

That's just a matter of adding the --step-in-avoids-nodebug option to the SB API's. That works from the command line
already.

  • Relying on function names can be fragile -- if there is some middle glue/thunk code, our disassembler can only resolve the glue/thunk code's name not the real target function name. It can be fragile to depend compiler/linker emits the same names for both.

lldb should always step through glue or thunks automatically. That's going to get handled by the trampoline handler's "step through" plan so the targeted StepIn plan won't ever stop on a thunk. If you have a system where there aren't trampoline handlers for your indirect thunks or other glue code, you would be better served adding those than trying to get the step in plans to handle them directly.

  • Does not support step into raw address call sites -- it is a valid scenario that in Visual Studio debugger, user can explicitly choose a raw address to step into which land in the function without debug info/symbol, then choose UI to load the debug info on-demand for that module/frame to continue exploring.

That would be trivial to add.

A more reliable design could be extending the ThreadPlanStepInRange to support step in based on call-site instruction offset/PC which I will propose in next iteration.

For setting step-in targets based on call-site instruction, you definitely will need to add some support for a pc rather than a symbol match. That should be quite easy.

But I don't think that's a viable method for the "step in target" that people really want to use. They see:

some_func(other_func(method_call()));

and they want to step into method_call. I don't think there's any way you are going to know what address "method_call" will resolve to in this context. You'd have to know what "this" actually is, and deal with overload resolution, etc. That does not sound very doable to me.

Note that this feature as implemented is robust because it doesn't try to predict anything. It just steps through the range you told it to, stepping one level deep (not counting trampolines) and stops if the place it stopped matches the target. I don't think switching to a "predict the target up front and go there" method will be nearly as reliable. The expression you are stepping through could be way more complicated than the one I showed, and trying to figure out what some text snippet is actually going to resolve to could be quite complex.

Adding the address match in this algorithm is easy, since instead of asking "does the name match my target" you would ask "am I at the target address".

The problem of thunks and trampolines should be handled by having a trampoline handler that handles them. That will benefit all of your step-in behavior since otherwise you'll be stopping in the thunk, which no-one really wants. And it will mean you can ignore them for the purposes of targeted step in.

For the other problems you mentioned, I think what is needed is a richer way to provide the match between what the user sees in their code, and the symbol they are actually going to stop at. We could allow regex matches, or be smarter about stripping out irrelevant bits to get to a base name as we do in other instances.

@jeffreytan81
Copy link
Contributor Author

jeffreytan81 commented Mar 27, 2024

@jimingham, thanks for detailed comment.

"Step in targets" or "Step Into Specific" is an IDE feature that works in two steps:

  1. Show a list of call sites before stepping to allow users specify which call to step into
  2. Perform step into with user choosing call site function to stop.
    https://www.tabsoverspaces.com/233742-step-into-a-specific-call-when-debugging-in-visual-studio contains a good screenshot.

Note: in step 1, we will have to show users the callsite target function before performing stepping. For indirect/virtual call, you can't reliable resolve the target function name without running a smaller interpreter to simulate the execution.

I don't think there's any way you are going to know what address "method_call"
will resolve to in this context. You'd have to know what "this" actually is, and deal with overload resolution, etc. That does > not sound very doable to me.

I am not proposing using the function target PC, but the call instruction address in the call site side. For example:

    0x55555555a426 <+934>:  callq  0x555555558990 ; bar at main.cpp:64
    0x55555555a42b <+939>:  movl   %eax, -0xb0(%rbp)
    0x55555555a431 <+945>:  callq  0x5555555589a0 ; bar2 at main.cpp:68
    0x55555555a436 <+950>:  movl   -0xb0(%rbp), %edi
    0x55555555a43c <+956>:  movl   %eax, %esi
    0x55555555a43e <+958>:  callq  0x555555558970 ; foo at main.cpp:60

For the above step range above, if user wants to step into "bar2()" function, I propose us pass 0x55555555a431 as an extra callsite PC to underlying ThreadPlanStepInRange, so ThreadPlanStepInRange::DoWillResume() can detect the current IP would match the user specified call-site PC 0x55555555a431. Then ThreadPlanStepInRange::DefaultShouldStopHereCallback() can detect the state that there is a previous matching callsite PC, so return true to stop when eFrameCompareYounger.

Hope this makes sense.

@jimingham
Copy link
Collaborator

@jimingham, thanks for detailed comment.

"Step in targets" or "Step Into Specific" is an IDE feature that works in two steps:

  1. Show a list of call sites before stepping to allow users specify which call to step into
  2. Perform step into with user choosing call site function to stop.
    https://www.tabsoverspaces.com/233742-step-into-a-specific-call-when-debugging-in-visual-studio contains a good screenshot.

Note: in step 1, we will have to show users the callsite target function before performing stepping. For indirect/virtual call, you can't reliable resolve the target function name without running a smaller interpreter to simulate the execution.

I don't think there's any way you are going to know what address "method_call"
will resolve to in this context. You'd have to know what "this" actually is, and deal with overload resolution, etc. That does > not sound very doable to me.

I am not proposing using the function target PC, but the call instruction address in the call site side. For example:

    0x55555555a426 <+934>:  callq  0x555555558990 ; bar at main.cpp:64
    0x55555555a42b <+939>:  movl   %eax, -0xb0(%rbp)
    0x55555555a431 <+945>:  callq  0x5555555589a0 ; bar2 at main.cpp:68
    0x55555555a436 <+950>:  movl   -0xb0(%rbp), %edi
    0x55555555a43c <+956>:  movl   %eax, %esi
    0x55555555a43e <+958>:  callq  0x555555558970 ; foo at main.cpp:60

For the above step range above, if user wants to step into "bar2()" function, I propose us pass 0x55555555a431 as an extra callsite PC to underlying ThreadPlanStepInRange, so ThreadPlanStepInRange::DoWillResume() can detect the current IP would match the user specified call-site PC 0x55555555a431. Then ThreadPlanStepInRange::DefaultShouldStopHereCallback() can detect the state that there is a previous matching callsite PC, so return true to stop when eFrameCompareYounger.

Hope this makes sense.

That sounds fine. I try to stay out of the game of telling people what is going to run in any given range of code because it is pretty hard to get that right from assembly, whereas if you watch execution you are always right. But if that's what the DAP requires, that seems to be your fate.

It would be fine to add an m_step_into_target_addr to ThreadPlanStepInRange and check that in its ShouldStopHereCallback. Not sure why you mention DoWillResume, that's not where ShouldStopHere gets called, and deciding your step was done then would be way too late in the whole "I've stopped, what should I do next" negotiation. But if you put it in the ShouldStopHereCallback it will get called at the right time.

Copy link
Contributor

@kusmour kusmour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most comments are addressed. Stamping to unblock. We can iterate with another PR :D

@jeffreytan81 jeffreytan81 merged commit 2f2e31c into llvm:main Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants