Skip to content

[lldb] Skip remote PutFile when MD5 hashes equal #88812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 18, 2024

Conversation

Awfa
Copy link
Contributor

@Awfa Awfa commented Apr 15, 2024

This PR adds a check within PutFile to exit early when both local and destination files have matching MD5 hashes. If they differ, or there is trouble getting the hashes, the regular code path to put the file is run.

As I needed this to talk to an lldb-server which runs the gdb-remote protocol, I enabled CalculateMD5 within Platform/gdb-server and also found and fixed a parsing bug within it as well. Before this PR, the client is incorrectly parsing the response packet containing the checksum; after this PR, hopefully this is fixed. There is a test for the parsing behavior included in this PR.

@Awfa Awfa requested a review from JDevlieghere as a code owner April 15, 2024 22:55
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the lldb label Apr 15, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2024

@llvm/pr-subscribers-lldb

Author: Anthony Ha (Awfa)

Changes

This PR adds a check within PutFile to exit early when both local and destination files have matching MD5 hashes. If they differ, or there is trouble getting the hashes, the regular code path to put the file is run.

As I needed this to talk to an lldb-server which runs the gdb-remote protocol, I enabled CalculateMD5 within Platform/gdb-server and also found and fixed a parsing bug within it as well. Before this PR, the client is incorrectly parsing the response packet containing the checksum; after this PR, hopefully this is fixed. There is a test for the parsing behavior included in this PR.


Full diff: https://github.com/llvm/llvm-project/pull/88812.diff

5 Files Affected:

  • (modified) lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp (+9)
  • (modified) lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h (+3)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp (+34-2)
  • (modified) lldb/source/Target/Platform.cpp (+29-1)
  • (modified) lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp (+23)
diff --git a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
index 88f1ad15b6b485..d9f3e40019cf29 100644
--- a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
+++ b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
@@ -684,6 +684,15 @@ Status PlatformRemoteGDBServer::RunShellCommand(
                                           signo_ptr, command_output, timeout);
 }
 
+bool PlatformRemoteGDBServer::CalculateMD5(const FileSpec &file_spec,
+                                           uint64_t &low, uint64_t &high) {
+  if (IsConnected()) {
+    // Note that high/low are switched in the gdb remote communication client
+    return m_gdb_client_up->CalculateMD5(file_spec, high, low);
+  }
+  return false;
+}
+
 void PlatformRemoteGDBServer::CalculateTrapHandlerSymbolNames() {
   m_trap_handlers.push_back(ConstString("_sigtramp"));
 }
diff --git a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h
index 638f7db5ef800c..d83fc386f59427 100644
--- a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h
+++ b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h
@@ -146,6 +146,9 @@ class PlatformRemoteGDBServer : public Platform, private UserIDResolver {
 
   void CalculateTrapHandlerSymbolNames() override;
 
+  bool CalculateMD5(const FileSpec &file_spec, uint64_t &low,
+                    uint64_t &high) override;
+
   const lldb::UnixSignalsSP &GetRemoteUnixSignals() override;
 
   size_t ConnectToWaitingProcesses(lldb_private::Debugger &debugger,
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
index 1f6116967a4f64..8c79d5fecf12fe 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
@@ -3433,8 +3433,40 @@ bool GDBRemoteCommunicationClient::CalculateMD5(
       return false;
     if (response.Peek() && *response.Peek() == 'x')
       return false;
-    low = response.GetHexMaxU64(false, UINT64_MAX);
-    high = response.GetHexMaxU64(false, UINT64_MAX);
+
+    // GDBRemoteCommunicationServerCommon::Handle_vFile_MD5 concatenates low and
+    // high hex strings. We can't use response.GetHexMaxU64 because that can't
+    // handle the concatenated hex string. What would happen is parsing the low
+    // would consume the whole response packet - which is a bug. Instead, we get
+    // the byte string for each low and high hex separately, and parse them.
+    //
+    // An alternate way to handle this is to change the server to put a
+    // delimiter between the low/high parts, and change the client to parse the
+    // delimiter. However, we choose not to do this so existing lldb-servers
+    // don't have to be patched
+
+    // Get low part
+    auto part = response.GetStringRef().substr(response.GetFilePos(),
+                                               sizeof(uint64_t) * 2);
+    if (part.size() != sizeof(uint64_t) * 2)
+      return false;
+    response.SetFilePos(response.GetFilePos() + part.size());
+
+    bool conversionErrored = part.getAsInteger(16, low);
+    if (conversionErrored)
+      return false;
+
+    // Get high part
+    part = response.GetStringRef().substr(response.GetFilePos(),
+                                          sizeof(uint64_t) * 2);
+    if (part.size() != sizeof(uint64_t) * 2)
+      return false;
+    response.SetFilePos(response.GetFilePos() + part.size());
+
+    conversionErrored = part.getAsInteger(16, high);
+    if (conversionErrored)
+      return false;
+
     return true;
   }
   return false;
diff --git a/lldb/source/Target/Platform.cpp b/lldb/source/Target/Platform.cpp
index 4ce290dfbe035f..cdbafb17a5df4d 100644
--- a/lldb/source/Target/Platform.cpp
+++ b/lldb/source/Target/Platform.cpp
@@ -1184,7 +1184,7 @@ bool Platform::IsCompatibleArchitecture(const ArchSpec &arch,
 Status Platform::PutFile(const FileSpec &source, const FileSpec &destination,
                          uint32_t uid, uint32_t gid) {
   Log *log = GetLog(LLDBLog::Platform);
-  LLDB_LOGF(log, "[PutFile] Using block by block transfer....\n");
+  LLDB_LOGF(log, "[PutFile] Using block by block transfer....");
 
   auto source_open_options =
       File::eOpenOptionReadOnly | File::eOpenOptionCloseOnExec;
@@ -1197,6 +1197,34 @@ Status Platform::PutFile(const FileSpec &source, const FileSpec &destination,
   if (!source_file)
     return Status(source_file.takeError());
   Status error;
+
+  bool requires_upload = true;
+  {
+    uint64_t dest_md5_low, dest_md5_high;
+    bool success = CalculateMD5(destination, dest_md5_low, dest_md5_high);
+    if (!success) {
+      LLDB_LOGF(log, "[PutFile] couldn't get md5 sum of destination");
+    } else {
+      auto local_md5 = llvm::sys::fs::md5_contents(source.GetPath());
+      if (!local_md5) {
+        LLDB_LOGF(log, "[PutFile] couldn't get md5 sum of source");
+      } else {
+        uint64_t local_md5_high, local_md5_low;
+        std::tie(local_md5_high, local_md5_low) = local_md5->words();
+        LLDB_LOGF(log, "[PutFile] destination md5: %016" PRIx64 "%016" PRIx64,
+                  dest_md5_high, dest_md5_low);
+        LLDB_LOGF(log, "[PutFile]       local md5: %016" PRIx64 "%016" PRIx64,
+                  local_md5_high, local_md5_low);
+        requires_upload =
+            local_md5_high != dest_md5_high || local_md5_low != dest_md5_low;
+      }
+    }
+  }
+  if (!requires_upload) {
+    LLDB_LOGF(log, "[PutFile] skipping PutFile because md5sums match");
+    return error;
+  }
+
   uint32_t permissions = source_file.get()->GetPermissions(error);
   if (permissions == 0)
     permissions = lldb::eFilePermissionsFileDefault;
diff --git a/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp b/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
index f93cf8ce679c5b..8e0b2805e292e5 100644
--- a/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
+++ b/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
@@ -592,3 +592,26 @@ TEST_F(GDBRemoteCommunicationClientTest, WriteMemoryTags) {
                  std::vector<uint8_t>{0x99}, "QMemTags:456789,0:80000000:99",
                  "E03", false);
 }
+
+TEST_F(GDBRemoteCommunicationClientTest, CalculateMD5) {
+  FileSpec file_spec("/foo/bar", FileSpec::Style::posix);
+  uint64_t high, low;
+  std::future<bool> async_result = std::async(std::launch::async, [&] {
+    return client.CalculateMD5(file_spec, high, low);
+  });
+
+  lldb_private::StreamString stream;
+  stream.PutCString("vFile:MD5:");
+  stream.PutStringAsRawHex8(file_spec.GetPath(false));
+  HandlePacket(server, stream.GetString().str(),
+               "F,"
+               "deadbeef01020304"
+               "05060708deadbeef");
+  ASSERT_TRUE(async_result.get());
+
+  // Server and client puts/parses low, and then high
+  const uint64_t expected_low = 0xdeadbeef01020304;
+  const uint64_t expected_high = 0x05060708deadbeef;
+  EXPECT_EQ(expected_low, low);
+  EXPECT_EQ(expected_high, high);
+}

[lldb] Skip remote PutFile when MD5 hashes equal
@Awfa Awfa requested a review from bulbazord April 16, 2024 05:08
@DavidSpickett
Copy link
Collaborator

@weliveindetail I think this might fix the problems you were having remote debugging clang-repl (I can't seem to find the actual Discourse thread).

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better! Let's get the comments and style sorted out, but I think this is looking good :)

[lldb] Skip remote PutFile when MD5 hashes equal
@Awfa Awfa requested review from DavidSpickett and bulbazord April 17, 2024 00:23
@emaste
Copy link
Member

emaste commented Apr 17, 2024

MD5 is insufficient for claiming that files are identical; how do we migrate this to a secure hash?

@Awfa
Copy link
Contributor Author

Awfa commented Apr 17, 2024

MD5 is insufficient for claiming that files are identical; how do we migrate this to a secure hash?

Is there an attack vector you're concerned about?

Or are you wary of workflow friction when a file won't upload to the remote platform because the hashes accidently collide?

@emaste
Copy link
Member

emaste commented Apr 17, 2024

MD5 is insufficient for claiming that files are identical; how do we migrate this to a secure hash?

Is there an attack vector you're concerned about?

Or are you wary of workflow friction when a file won't upload to the remote platform because the hashes accidently collide?

It's the latter I'm thinking of. I don't imagine this particularly interesting or valuable to an attacker, but MD5 collisions are becoming quite easy to obtain and I think it's plausible some software will make use of intentional collisions. Mainly I think we should have a plan to move to a better hash -- is there any interop issue with GDB's protocol/stub?

@DavidSpickett
Copy link
Collaborator

If it were a GDB packet it would be in https://sourceware.org/gdb/current/onlinedocs/gdb.html/Host-I_002fO-Packets.html#Host-I_002fO-Packets but I see no sign of it, or in GDB's sources.

The first appearance of vFile:MD5 is e0f8f57 though it is not documented as one of our extensions in https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-platform-packets.txt.

So a good follow up PR would be to list it in that document. It's self explanatory but still, weird that it's not there.

This means we could add vFile:betterHash (or a more general hash:<kind>) and try sending it to the remote. If the remote responds that it doesn't support it, we ask it for MD5. Though we may want a way to say don't use MD5 even if the remote would accept it, if security is the concern (a bit like ssh key formats work).

In the case of this specific PR, it's fixing code that should have worked all along but didn't. So the issue of MD5 collision is worth looking at, but I wouldn't let it block this.

@emaste, since you know about the shortcomings of MD5, could you open an issue for this? And I'll add to that the internal lldb details.

@DavidSpickett
Copy link
Collaborator

@bulbazord please give the final approval if it looks good to you.

We can sort out moving from MD5 on an issue.

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emaste
Copy link
Member

emaste commented Apr 18, 2024

Ok, submitted #89271 for the MD5 migration. I agree that issue does not block this change.

@Awfa
Copy link
Contributor Author

Awfa commented Apr 18, 2024

Someone will have to merge for me because it says "Only those with write access to this repository can merge pull requests."

Am I supposed to request write access somewhere before making these PRs? I haven't contributed before.

@bulbazord
Copy link
Member

Someone will have to merge for me because it says "Only those with write access to this repository can merge pull requests."

Am I supposed to request write access somewhere before making these PRs? I haven't contributed before.

First time contributors aren't expected to have commit access, so no worries there. I approved it so I can land this change for you. Let's keep an eye on the build bots in case this inadvertently introduces any regressions. :)

@bulbazord bulbazord merged commit 22c26fa into llvm:main Apr 18, 2024
Copy link

@Awfa Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Comment on lines 3421 to +3422
bool GDBRemoteCommunicationClient::CalculateMD5(
const lldb_private::FileSpec &file_spec, uint64_t &high, uint64_t &low) {
const lldb_private::FileSpec &file_spec, uint64_t &low, uint64_t &high) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're changing the interface anyway, could we have this return an std::optional<llvm::MD5::MD5Result> to match what md5_contents returns?

if (!success) {
LLDB_LOGF(log, "[PutFile] couldn't get md5 sum of destination");
} else {
auto local_md5 = llvm::sys::fs::md5_contents(source.GetPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious from the right hand side what the return value is. Case in point, it returns an ErrorOr<MD5::MD5Result> and if I hadn't looked it up I wouldn't have noticed we're not actually logging the error.

@Awfa
Copy link
Contributor Author

Awfa commented Apr 19, 2024

I got an email that the build failed and I should revert because of this: https://lab.llvm.org/buildbot/#/builders/68/builds/72623

@bulbazord can you or someone with write permissions revert this PR so I have time to triage the issue?

@DavidSpickett
Copy link
Collaborator

So a good follow up PR would be to list it in that document. It's self explanatory but still, weird that it's not there.

#89357

@DavidSpickett
Copy link
Collaborator

@bulbazord can you or someone with write permissions revert this PR so I have time to triage the issue?

The next build is green, those DAP tests do fail once in a while so it's unrelated to your change.

@DavidSpickett
Copy link
Collaborator

Please address @JDevlieghere's comments in a new PR.

@bulbazord
Copy link
Member

I got an email that the build failed and I should revert because of this: https://lab.llvm.org/buildbot/#/builders/68/builds/72623

@bulbazord can you or someone with write permissions revert this PR so I have time to triage the issue?

As David said, no need since that was unrelated to your change. Jonas makes a good point in his review though (that I totally missed when I reviewed), please open a new PR that addresses his feedback.

@Awfa
Copy link
Contributor Author

Awfa commented Apr 19, 2024

Thanks for taking a look at the build. I will do a new PR addressing the issues.

JDevlieghere pushed a commit that referenced this pull request May 3, 2024
# Overview
In my previous PR: #88812,
@JDevlieghere suggested to match return types of the various calculate
md5 functions.

This PR achieves that by changing the various calculate md5 functions to
return `llvm::ErrorOr<llvm::MD5::MD5Result>`.
 
The suggestion was to go for `std::optional<>` but I opted for
`llvm::ErrorOr<>` because local calculate md5 was already possibly
returning `ErrorOr`.

To make sure I didn't break the md5 calculation functionality, I ran
some tests for the gdb remote client, and things seem to work.

# Testing
1. Remote file doesn't exist

![image](https://github.com/llvm/llvm-project/assets/1326275/b26859e2-18c3-4685-be8f-c6b6a5a4bc77)

1. Remote file differs

![image](https://github.com/llvm/llvm-project/assets/1326275/cbdb3c58-555a-401b-9444-c5ff4c04c491)

1. Remote file matches

![image](https://github.com/llvm/llvm-project/assets/1326275/07561572-22d1-4e0a-988f-bc91b5c2ffce)

## Test gaps
Unfortunately, I had to modify
`lldb/source/Plugins/Platform/MacOSX/PlatformDarwinDevice.cpp` and I
can't test the changes there. Hopefully, the existing test suite / code
review from whomever is reading this will catch any issues.

Co-authored-by: Anthony Ha <[email protected]>
JDevlieghere pushed a commit that referenced this pull request May 9, 2024
This is a retake of #90921
which got reverted because I forgot to modify the CalculateMD5 unit test
I had added in #88812

The prior failing build is here:
https://lab.llvm.org/buildbot/#/builders/68/builds/73622
To make sure this error doesn't happen, I ran `ninja
ProcessGdbRemoteTests` and then executed the resulting test binary and
observed the `CalculateMD5` test passed.

# Overview
In my previous PR: #88812,
@JDevlieghere suggested to match return types of the various calculate
md5 functions.

This PR achieves that by changing the various calculate md5 functions to
return `llvm::ErrorOr<llvm::MD5::MD5Result>`.
 
The suggestion was to go for `std::optional<>` but I opted for
`llvm::ErrorOr<>` because local calculate md5 was already possibly
returning `ErrorOr`.

To make sure I didn't break the md5 calculation functionality, I ran
some tests for the gdb remote client, and things seem to work.

# Testing
1. Remote file doesn't exist

![image](https://github.com/llvm/llvm-project/assets/1326275/b26859e2-18c3-4685-be8f-c6b6a5a4bc77)

1. Remote file differs

![image](https://github.com/llvm/llvm-project/assets/1326275/cbdb3c58-555a-401b-9444-c5ff4c04c491)

1. Remote file matches

![image](https://github.com/llvm/llvm-project/assets/1326275/07561572-22d1-4e0a-988f-bc91b5c2ffce)

## Test gaps
Unfortunately, I had to modify
`lldb/source/Plugins/Platform/MacOSX/PlatformDarwinDevice.cpp` and I
can't test the changes there. Hopefully, the existing test suite / code
review from whomever is reading this will catch any issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants