Skip to content

[lldb] Claim to support swbreak and hwbreak packets when debugging a gdbremote #102873

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

xusheng6
Copy link
Contributor

@xusheng6 xusheng6 commented Aug 12, 2024

This fixes #56125 and vadimcn/codelldb#666, as well as the downstream issue in our binary ninja debugger: Vector35/debugger#535

Basically, lldb does not claim to support the swbreak packet so the gdbserver would not use it. As a result, the gdbserver always sends the unmodified program counter value which, on systems like x86, causes the program counter to be off-by-one (or otherwise wrong). For reference, the lldb-server always sends the modified program counter value so it works perfectly with lldb.

https://sourceware.org/gdb/current/onlinedocs/gdb.html/Stop-Reply-Packets.html#swbreak-stop-reason

No new code is added to add support swbreak, since the way lldb works already expects the remote to have adjusted the program counter. The change just lets the gdbserver know that lldb supports it, so that it will send the adjusted program counter.

To test this PR, you can use lldb to connect to a gdbserver running on e.g., Ubuntu 22.04, and see the program counter is off-by-one without the patch. With the patch, things work as expected

@xusheng6 xusheng6 requested a review from JDevlieghere as a code owner August 12, 2024 10:07
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-lldb

Author: xusheng (xusheng6)

Changes

This fixes #56125 and vadimcn/codelldb#666, as well as the downstream issue in our binary ninja debugger: Vector35/debugger#535

Basically, lldb does not claim to support the swbreak packet so the gdbserver would not use it. As a result, it always sends the unmodified program counter value which, on systems like x86, causes the program counter to be off-by-off (or otherwise wrong).

No new code is added to add support swbreak, since the way lldb works already expects the remote to have adjusted the program counter. The change just lets the gdbserver know that lldb supports it, and it will send the adjusted program counter.

I am unable to add a unit test for this. I browsed the gdbserver unit test code and it seems to be testing against the gdbserver in lldb, which also does not use the swbreak packet. But as discussed in the issue, there is no reason to add support for sending swbreak in lldb-server.

To test this PR, you can use lldb to connect to a gdbserver running on e.g., Ubuntu 22.04, and see the program counter is off-by-one without the patch. With the patch, things work as expected


Full diff: https://github.com/llvm/llvm-project/pull/102873.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp (+1-1)
  • (modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+3)
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
index 74e392249a94eb..d8b17a8ff59baf 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
@@ -353,7 +353,7 @@ void GDBRemoteCommunicationClient::GetRemoteQSupported() {
   // build the qSupported packet
   std::vector<std::string> features = {"xmlRegisters=i386,arm,mips,arc",
                                        "multiprocess+", "fork-events+",
-                                       "vfork-events+"};
+                                       "vfork-events+", "swbreak+", "hwbreak+"};
   StreamString packet;
   packet.PutCString("qSupported");
   for (uint32_t i = 0; i < features.size(); ++i) {
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 6f9c2cc1e4b4e8..b8fe8fdc9b8742 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -2349,6 +2349,9 @@ StateType ProcessGDBRemote::SetThreadStopInfo(StringExtractor &stop_packet) {
         if (!value.getAsInteger(0, addressing_bits)) {
           addressable_bits.SetHighmemAddressableBits(addressing_bits);
         }
+      } else if (key.compare("swbreak") == 0 || key.compare("hwbreak") == 0) {
+        // There is nothing needs to be done for swbreak or hwbreak since
+        // the value is expected to be empty
       } else if (key.size() == 2 && ::isxdigit(key[0]) && ::isxdigit(key[1])) {
         uint32_t reg = UINT32_MAX;
         if (!key.getAsInteger(16, reg))

Copy link

github-actions bot commented Aug 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@DavidSpickett
Copy link
Collaborator

On the testing, lldb sends this qsupported value regardless of what the remote sends. So there's nothing to do there unless you just repeat the feature list. There are some tests that look at lldb-server's qsupported values, but I don't see any for the client lldb.

You could check we don't crash when seeing swbreak/hwbreak in a stop response, but it's the same as checking we don't crash on any other unknown key in there. Since you can't observe any behaviour change from swbreak/hwbreak, so I don't think that has much value either.

What you could do is write a test where lldb connects to a mock gdbserver that claims to be an architecture where the correction to addresses would be done by a gdb server (GDB's gdbserver I mean). Then via lldb check that the stopped location is unmodified. You could then run the test with swbreak / hwbreak / neither of those in the stop info and each time lldb will report the value unchanged.

So in the unlikely case that we decided to change how lldb behaves, the test would fail. lldb/test/API/functionalities/gdb_remote_client/TestStopPCs.py might be a good starting point.

@DavidSpickett
Copy link
Collaborator

Also there is probably a qsupported parsing function in lldb-server, please add a comment there to say we consume lldb's swbreak/hwbreak feature but it doesn't change the behaviour of lldb-server.

@xusheng6
Copy link
Contributor Author

Also there is probably a qsupported parsing function in lldb-server, please add a comment there to say we consume lldb's swbreak/hwbreak feature but it doesn't change the behaviour of lldb-server.

Fixed in d0851d4

@xusheng6
Copy link
Contributor Author

On the testing, lldb sends this qsupported value regardless of what the remote sends. So there's nothing to do there unless you just repeat the feature list. There are some tests that look at lldb-server's qsupported values, but I don't see any for the client lldb.

You could check we don't crash when seeing swbreak/hwbreak in a stop response, but it's the same as checking we don't crash on any other unknown key in there. Since you can't observe any behaviour change from swbreak/hwbreak, so I don't think that has much value either.

What you could do is write a test where lldb connects to a mock gdbserver that claims to be an architecture where the correction to addresses would be done by a gdb server (GDB's gdbserver I mean). Then via lldb check that the stopped location is unmodified. You could then run the test with swbreak / hwbreak / neither of those in the stop info and each time lldb will report the value unchanged.

So in the unlikely case that we decided to change how lldb behaves, the test would fail. lldb/test/API/functionalities/gdb_remote_client/TestStopPCs.py might be a good starting point.

I totally agree with you that the first two cases, even if added, would not provide much value. For the third case, i.e., writing a mock gdbserver, I think that is quite a bit of work -- or do we already have some infra that I can use?

By the way, do I always need some unit test change to get a PR accepted? In this particular case I do not see a compelling reason to add a new test, though if this is a LLVM development policy then I will start looking into it.

@DavidSpickett
Copy link
Collaborator

or do we already have some infra that I can use?

lldb/test/API/functionalities/gdb_remote_client/TestStopPCs.py is an example of that infrastructure. The threadStopInfo is where you would insert the swbreak hwbreak key.

In fact, if i386 is an architecture where this correction of PC might happen (or not) you could just extend that test. The result should be the same whether the mock gdbserver returns "swbreak" "hwbreak" or leaves that field out completely.

If i386 isn't an architecture where correction would be done, you should be able to find some other target xml stubs elsewhere in tests. I don't think this test case is i386 specific. It can be tricky because that XML needs to include a minimal set of registers before lldb will function, so if in doubt, copy paste a working one.

By the way, do I always need some unit test change to get a PR accepted? In this particular case I do not see a compelling reason to add a new test, though if this is a LLVM development policy then I will start looking into it.

Not 100% of the time but you will always have to justify why it does not have tests, or has the sort of tests that it has. This justification is useful for auditing code later, it also serves as a guide to anyone trying to reproduce an old bug that has no test cases.

https://llvm.org/docs/DeveloperPolicy.html#test-cases is written from the llvm perspective but applies to lldb as well https://lldb.llvm.org/resources/contributing.html#test-infrastructure.

In this case because no one does lldb -> gdbserver testing within the llvm project, so my worry is that this "easy fix" will get forgotten over time and someone will remove it as dead code or forget it in a refactoring. This is not a case of "if it has no tests it will get deleted" but it will certainly be at higher risk.

We have had cases like this before for example, someone fixed a bug handling an msp430 simulator and the test for that replays the packets that it sent to lldb so we don't have to have the actual simulator to hand.

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Aug 12, 2024

AArch64 for example would be:

<?xml version="1.0"?>
<target version="1.0">
  <architecture>aarch64</architecture>
  <feature name="org.gnu.gdb.aarch64.core">
    <reg name="pc" bitsize="64"/>
    <reg name="x0" regnum="0" bitsize="64"/>
    <reg name="cpsr" regnum="33" bitsize="32"/>
  </feature>
</target>

Which I got by working out what lldb/test/API/functionalities/gdb_remote_client/TestXMLRegisterFlags.py's XML expands to.

@xusheng6
Copy link
Contributor Author

or do we already have some infra that I can use?

lldb/test/API/functionalities/gdb_remote_client/TestStopPCs.py is an example of that infrastructure. The threadStopInfo is where you would insert the swbreak hwbreak key.

In fact, if i386 is an architecture where this correction of PC might happen (or not) you could just extend that test. The result should be the same whether the mock gdbserver returns "swbreak" "hwbreak" or leaves that field out completely.

If i386 isn't an architecture where correction would be done, you should be able to find some other target xml stubs elsewhere in tests. I don't think this test case is i386 specific. It can be tricky because that XML needs to include a minimal set of registers before lldb will function, so if in doubt, copy paste a working one.

By the way, do I always need some unit test change to get a PR accepted? In this particular case I do not see a compelling reason to add a new test, though if this is a LLVM development policy then I will start looking into it.

Not 100% of the time but you will always have to justify why it does not have tests, or has the sort of tests that it has. This justification is useful for auditing code later, it also serves as a guide to anyone trying to reproduce an old bug that has no test cases.

https://llvm.org/docs/DeveloperPolicy.html#test-cases is written from the llvm perspective but applies to lldb as well https://lldb.llvm.org/resources/contributing.html#test-infrastructure.

In this case because no one does lldb -> gdbserver testing within the llvm project, so my worry is that this "easy fix" will get forgotten over time and someone will remove it as dead code or forget it in a refactoring. This is not a case of "if it has no tests it will get deleted" but it will certainly be at higher risk.

We have had cases like this before for example, someone fixed a bug handling an msp430 simulator and the test for that replays the packets that it sent to lldb so we don't have to have the actual simulator to hand.

Yes, i386 is an architecture where the PC adjustment happens, and I have tentatively added a unit test for it in 55f9473

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Aug 13, 2024

The test steps are fine, but I was assuming that we'd extend the existing TestStopPCs.py to do the same thing in less code. If that test had 1 more thread, you could have normal / swbreak / hwbreak threads in one test case.

Copy link

github-actions bot commented Aug 13, 2024

✅ With the latest revision this PR passed the Python code formatter.

@DavidSpickett DavidSpickett merged commit 5dbec8c into llvm:main Aug 13, 2024
4 of 5 checks passed
@DavidSpickett
Copy link
Collaborator

Thanks for the contribution! The test especially, it will keep us honest and protect your downstream tools going forward.

Copy link

@xusheng6 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing up the patch. We do see some handling of this behavior for x86 in lldb, e.g. ProcessWindowsNative, and KDP protocol support with x86 Darwin kernel debugging, also breakpoint-pc-offset in ProcessGDBRemote. But we do assume that the pc value reported by a thread that hit a breakpoint will be the start of the breakpoint instruction, not the end.

The only question I have is -- if we are talking to a stub that supports swbreak/hwbreak, and we get a stop packet without those that hit a breakpoint, do we need to decr the pc value? I'm guessing gdbserver et al don't report a reason:breakpoint key-value pair in the stop packet. Should we treat the presence of swbreak or hwbreak as equivalent to receiving a reason:breakpoint in the packet (see the reason argument passed to ProcessGDBRemote::SetThreadStopInfo) to unambiguously indicate that a breakpoint was hit by this thread?

@@ -4245,6 +4245,10 @@ std::vector<std::string> GDBRemoteCommunicationServerLLGS::HandleFeatures(
.Case("vfork-events+", Extension::vfork)
.Default({});

// We consume lldb's swbreak/hwbreak feature, but it doesn't change the
// behaviour of lldb-server. We always adjust the program counter for targets
// like x86
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please end the comment with a period.

@@ -2354,6 +2354,9 @@ StateType ProcessGDBRemote::SetThreadStopInfo(StringExtractor &stop_packet) {
if (!key.getAsInteger(16, reg))
expedited_register_map[reg] = std::string(std::move(value));
}
// swbreak and hwbreak are also expected keys, but we don't need to
// change our behaviour for them because lldb always expects the remote
// to adjust the program counter (if relevant, e.g., for x86 targets)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please end the comment with a period.

@jasonmolenda
Copy link
Collaborator

oh lol I missed that this was already merged. But I am curious what you think about treating swbreak/hwbreak as equivalent to having received reason:breakpoint.

@jasonmolenda
Copy link
Collaborator

Just to be clear, if I'm understanding the packet we'll be getting back, we have no indication that we hit the breakpoint, we only show that we are stopped at an address which has a breakpoint. Current lldb stepping behavior will work -- because the rule is, when we stop at a breakpoint address, we will say we hit the breakpoint. I am refining a patch #96260 which changes this behavior -- we handle "instruction stepped / stopped at a breakpoint" differently than "hit a breakpoint". I worry this difference will be lost with a stub that reports swbreak/hwbreak, stepping won't work correctly, and we won't capture it in the testsuite or on any of our CI bots.

@xusheng6
Copy link
Contributor Author

Just to be clear, if I'm understanding the packet we'll be getting back, we have no indication that we hit the breakpoint, we only show that we are stopped at an address which has a breakpoint. Current lldb stepping behavior will work -- because the rule is, when we stop at a breakpoint address, we will say we hit the breakpoint. I am refining a patch #96260 which changes this behavior -- we handle "instruction stepped / stopped at a breakpoint" differently than "hit a breakpoint". I worry this difference will be lost with a stub that reports swbreak/hwbreak, stepping won't work correctly, and we won't capture it in the testsuite or on any of our CI bots.

My work mainly concerns the case when lldb is connected to a gdbserver, and as far as I can tell, gdbserver does not send a reason:breakpoint packet at all. Maybe lldb-server sends it? I am not sure

Also, I do not think lldb itself is adjusting the PC -- I think lldb-server does it, and that is also the reason why my PR did not need to alter the lldb behavior at all. lldb always expects the remote to have adjusted the PC if necessary, and it just uses whatever value that is reported

@labath
Copy link
Collaborator

labath commented Aug 14, 2024

Just to be clear, if I'm understanding the packet we'll be getting back, we have no indication that we hit the breakpoint, we only show that we are stopped at an address which has a breakpoint. Current lldb stepping behavior will work -- because the rule is, when we stop at a breakpoint address, we will say we hit the breakpoint. I am refining a patch #96260 which changes this behavior -- we handle "instruction stepped / stopped at a breakpoint" differently than "hit a breakpoint". I worry this difference will be lost with a stub that reports swbreak/hwbreak, stepping won't work correctly, and we won't capture it in the testsuite or on any of our CI bots.

My work mainly concerns the case when lldb is connected to a gdbserver, and as far as I can tell, gdbserver does not send a reason:breakpoint packet at all. Maybe lldb-server sends it?

Yes, it does.

Also, I do not think lldb itself is adjusting the PC -- I think lldb-server does it, and that is also the reason why my PR did not need to alter the lldb behavior at all. lldb always expects the remote to have adjusted the PC if necessary, and it just uses whatever value that is reported

That is true, but I believe Jason is saying something slightly different. Currently you are able to use lldb+gdbserver because lldb treats a stop as a breakpoint hit every time we stop and the PC value is equal to one of the breakpoints. Jason's patch will change that and have lldb report a breakpoint hit only if it server tells it that was the reason for the stop. lldb-server does that (reason:breakpoint), and everything will work fine, but that will most likely break your use case.

To (preemtively) fix that, you should probably change lldb to recognize the gdb response (swbreak) and treat it the same way as the one from lldb-server. Ideally, you should also add a test to make sure this works (and stays working) -- for that, you could probably extend the test you added here to check that lldb reports the correct stop reason.

@xusheng6
Copy link
Contributor Author

Just to be clear, if I'm understanding the packet we'll be getting back, we have no indication that we hit the breakpoint, we only show that we are stopped at an address which has a breakpoint. Current lldb stepping behavior will work -- because the rule is, when we stop at a breakpoint address, we will say we hit the breakpoint. I am refining a patch #96260 which changes this behavior -- we handle "instruction stepped / stopped at a breakpoint" differently than "hit a breakpoint". I worry this difference will be lost with a stub that reports swbreak/hwbreak, stepping won't work correctly, and we won't capture it in the testsuite or on any of our CI bots.

My work mainly concerns the case when lldb is connected to a gdbserver, and as far as I can tell, gdbserver does not send a reason:breakpoint packet at all. Maybe lldb-server sends it?

Yes, it does.

Also, I do not think lldb itself is adjusting the PC -- I think lldb-server does it, and that is also the reason why my PR did not need to alter the lldb behavior at all. lldb always expects the remote to have adjusted the PC if necessary, and it just uses whatever value that is reported

That is true, but I believe Jason is saying something slightly different. Currently you are able to use lldb+gdbserver because lldb treats a stop as a breakpoint hit every time we stop and the PC value is equal to one of the breakpoints. Jason's patch will change that and have lldb report a breakpoint hit only if it server tells it that was the reason for the stop. lldb-server does that (reason:breakpoint), and everything will work fine, but that will most likely break your use case.

To (preemtively) fix that, you should probably change lldb to recognize the gdb response (swbreak) and treat it the same way as the one from lldb-server. Ideally, you should also add a test to make sure this works (and stays working) -- for that, you could probably extend the test you added here to check that lldb reports the correct stop reason.

Oh I see your point now, and I also see the lldb is handling a reason:xxxx packet from the lldb-server. In this sense, I think it is fair to treat "swbreak/hwbreak" in the same way as "reason:breakpoint".

@jasonmolenda do you think you can handle this along with your patch, or you think I should do something for it preemptively as suggested by labath? I personally prefer the former case because I am quite new to the lldb code base

@jasonmolenda
Copy link
Collaborator

To (preemtively) fix that, you should probably change lldb to recognize the gdb response (swbreak) and treat it the same way as the one from lldb-server. Ideally, you should also add a test to make sure this works (and stays working) -- for that, you could probably extend the test you added here to check that lldb reports the correct stop reason.

Oh I see your point now, and I also see the lldb is handling a reason:xxxx packet from the lldb-server. In this sense, I think it is fair to treat "swbreak/hwbreak" in the same way as "reason:breakpoint".

@jasonmolenda do you think you can handle this along with your patch, or you think I should do something for it preemptively as suggested by labath? I personally prefer the former case because I am quite new to the lldb code base

No problem, I'll make that change in ProcessGDBRemote when I re-land my patch (I have a few outstanding issues the CI bots found when I first landed it). My change will be untested, but it is a simple change. We already fake up a reason:watchpoint stop reason when we get watch/rwatch/awatch in a stop reply packet today, it will be the same kind of thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LLDB: GDB remote protocol feature/packet swbreak not supported
5 participants