-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lldb] Skip remote PutFile when MD5 hashes equal #88812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb] Skip remote PutFile when MD5 hashes equal #88812
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-lldb Author: Anthony Ha (Awfa) ChangesThis PR adds a check within As I needed this to talk to an Full diff: https://github.com/llvm/llvm-project/pull/88812.diff 5 Files Affected:
diff --git a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
index 88f1ad15b6b485..d9f3e40019cf29 100644
--- a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
+++ b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
@@ -684,6 +684,15 @@ Status PlatformRemoteGDBServer::RunShellCommand(
signo_ptr, command_output, timeout);
}
+bool PlatformRemoteGDBServer::CalculateMD5(const FileSpec &file_spec,
+ uint64_t &low, uint64_t &high) {
+ if (IsConnected()) {
+ // Note that high/low are switched in the gdb remote communication client
+ return m_gdb_client_up->CalculateMD5(file_spec, high, low);
+ }
+ return false;
+}
+
void PlatformRemoteGDBServer::CalculateTrapHandlerSymbolNames() {
m_trap_handlers.push_back(ConstString("_sigtramp"));
}
diff --git a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h
index 638f7db5ef800c..d83fc386f59427 100644
--- a/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h
+++ b/lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.h
@@ -146,6 +146,9 @@ class PlatformRemoteGDBServer : public Platform, private UserIDResolver {
void CalculateTrapHandlerSymbolNames() override;
+ bool CalculateMD5(const FileSpec &file_spec, uint64_t &low,
+ uint64_t &high) override;
+
const lldb::UnixSignalsSP &GetRemoteUnixSignals() override;
size_t ConnectToWaitingProcesses(lldb_private::Debugger &debugger,
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
index 1f6116967a4f64..8c79d5fecf12fe 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
@@ -3433,8 +3433,40 @@ bool GDBRemoteCommunicationClient::CalculateMD5(
return false;
if (response.Peek() && *response.Peek() == 'x')
return false;
- low = response.GetHexMaxU64(false, UINT64_MAX);
- high = response.GetHexMaxU64(false, UINT64_MAX);
+
+ // GDBRemoteCommunicationServerCommon::Handle_vFile_MD5 concatenates low and
+ // high hex strings. We can't use response.GetHexMaxU64 because that can't
+ // handle the concatenated hex string. What would happen is parsing the low
+ // would consume the whole response packet - which is a bug. Instead, we get
+ // the byte string for each low and high hex separately, and parse them.
+ //
+ // An alternate way to handle this is to change the server to put a
+ // delimiter between the low/high parts, and change the client to parse the
+ // delimiter. However, we choose not to do this so existing lldb-servers
+ // don't have to be patched
+
+ // Get low part
+ auto part = response.GetStringRef().substr(response.GetFilePos(),
+ sizeof(uint64_t) * 2);
+ if (part.size() != sizeof(uint64_t) * 2)
+ return false;
+ response.SetFilePos(response.GetFilePos() + part.size());
+
+ bool conversionErrored = part.getAsInteger(16, low);
+ if (conversionErrored)
+ return false;
+
+ // Get high part
+ part = response.GetStringRef().substr(response.GetFilePos(),
+ sizeof(uint64_t) * 2);
+ if (part.size() != sizeof(uint64_t) * 2)
+ return false;
+ response.SetFilePos(response.GetFilePos() + part.size());
+
+ conversionErrored = part.getAsInteger(16, high);
+ if (conversionErrored)
+ return false;
+
return true;
}
return false;
diff --git a/lldb/source/Target/Platform.cpp b/lldb/source/Target/Platform.cpp
index 4ce290dfbe035f..cdbafb17a5df4d 100644
--- a/lldb/source/Target/Platform.cpp
+++ b/lldb/source/Target/Platform.cpp
@@ -1184,7 +1184,7 @@ bool Platform::IsCompatibleArchitecture(const ArchSpec &arch,
Status Platform::PutFile(const FileSpec &source, const FileSpec &destination,
uint32_t uid, uint32_t gid) {
Log *log = GetLog(LLDBLog::Platform);
- LLDB_LOGF(log, "[PutFile] Using block by block transfer....\n");
+ LLDB_LOGF(log, "[PutFile] Using block by block transfer....");
auto source_open_options =
File::eOpenOptionReadOnly | File::eOpenOptionCloseOnExec;
@@ -1197,6 +1197,34 @@ Status Platform::PutFile(const FileSpec &source, const FileSpec &destination,
if (!source_file)
return Status(source_file.takeError());
Status error;
+
+ bool requires_upload = true;
+ {
+ uint64_t dest_md5_low, dest_md5_high;
+ bool success = CalculateMD5(destination, dest_md5_low, dest_md5_high);
+ if (!success) {
+ LLDB_LOGF(log, "[PutFile] couldn't get md5 sum of destination");
+ } else {
+ auto local_md5 = llvm::sys::fs::md5_contents(source.GetPath());
+ if (!local_md5) {
+ LLDB_LOGF(log, "[PutFile] couldn't get md5 sum of source");
+ } else {
+ uint64_t local_md5_high, local_md5_low;
+ std::tie(local_md5_high, local_md5_low) = local_md5->words();
+ LLDB_LOGF(log, "[PutFile] destination md5: %016" PRIx64 "%016" PRIx64,
+ dest_md5_high, dest_md5_low);
+ LLDB_LOGF(log, "[PutFile] local md5: %016" PRIx64 "%016" PRIx64,
+ local_md5_high, local_md5_low);
+ requires_upload =
+ local_md5_high != dest_md5_high || local_md5_low != dest_md5_low;
+ }
+ }
+ }
+ if (!requires_upload) {
+ LLDB_LOGF(log, "[PutFile] skipping PutFile because md5sums match");
+ return error;
+ }
+
uint32_t permissions = source_file.get()->GetPermissions(error);
if (permissions == 0)
permissions = lldb::eFilePermissionsFileDefault;
diff --git a/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp b/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
index f93cf8ce679c5b..8e0b2805e292e5 100644
--- a/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
+++ b/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
@@ -592,3 +592,26 @@ TEST_F(GDBRemoteCommunicationClientTest, WriteMemoryTags) {
std::vector<uint8_t>{0x99}, "QMemTags:456789,0:80000000:99",
"E03", false);
}
+
+TEST_F(GDBRemoteCommunicationClientTest, CalculateMD5) {
+ FileSpec file_spec("/foo/bar", FileSpec::Style::posix);
+ uint64_t high, low;
+ std::future<bool> async_result = std::async(std::launch::async, [&] {
+ return client.CalculateMD5(file_spec, high, low);
+ });
+
+ lldb_private::StreamString stream;
+ stream.PutCString("vFile:MD5:");
+ stream.PutStringAsRawHex8(file_spec.GetPath(false));
+ HandlePacket(server, stream.GetString().str(),
+ "F,"
+ "deadbeef01020304"
+ "05060708deadbeef");
+ ASSERT_TRUE(async_result.get());
+
+ // Server and client puts/parses low, and then high
+ const uint64_t expected_low = 0xdeadbeef01020304;
+ const uint64_t expected_high = 0x05060708deadbeef;
+ EXPECT_EQ(expected_low, low);
+ EXPECT_EQ(expected_high, high);
+}
|
lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
Outdated
Show resolved
Hide resolved
lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
Outdated
Show resolved
Hide resolved
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
Outdated
Show resolved
Hide resolved
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
Outdated
Show resolved
Hide resolved
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
Outdated
Show resolved
Hide resolved
[lldb] Skip remote PutFile when MD5 hashes equal
@weliveindetail I think this might fix the problems you were having remote debugging clang-repl (I can't seem to find the actual Discourse thread). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking better! Let's get the comments and style sorted out, but I think this is looking good :)
lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
Outdated
Show resolved
Hide resolved
[lldb] Skip remote PutFile when MD5 hashes equal
MD5 is insufficient for claiming that files are identical; how do we migrate this to a secure hash? |
Is there an attack vector you're concerned about? Or are you wary of workflow friction when a file won't upload to the remote platform because the hashes accidently collide? |
It's the latter I'm thinking of. I don't imagine this particularly interesting or valuable to an attacker, but MD5 collisions are becoming quite easy to obtain and I think it's plausible some software will make use of intentional collisions. Mainly I think we should have a plan to move to a better hash -- is there any interop issue with GDB's protocol/stub? |
If it were a GDB packet it would be in https://sourceware.org/gdb/current/onlinedocs/gdb.html/Host-I_002fO-Packets.html#Host-I_002fO-Packets but I see no sign of it, or in GDB's sources. The first appearance of So a good follow up PR would be to list it in that document. It's self explanatory but still, weird that it's not there. This means we could add In the case of this specific PR, it's fixing code that should have worked all along but didn't. So the issue of MD5 collision is worth looking at, but I wouldn't let it block this. @emaste, since you know about the shortcomings of MD5, could you open an issue for this? And I'll add to that the internal lldb details. |
@bulbazord please give the final approval if it looks good to you. We can sort out moving from MD5 on an issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ok, submitted #89271 for the MD5 migration. I agree that issue does not block this change. |
Someone will have to merge for me because it says "Only those with write access to this repository can merge pull requests." Am I supposed to request write access somewhere before making these PRs? I haven't contributed before. |
First time contributors aren't expected to have commit access, so no worries there. I approved it so I can land this change for you. Let's keep an eye on the build bots in case this inadvertently introduces any regressions. :) |
@Awfa Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested Please check whether problems have been caused by your change specifically, as How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
bool GDBRemoteCommunicationClient::CalculateMD5( | ||
const lldb_private::FileSpec &file_spec, uint64_t &high, uint64_t &low) { | ||
const lldb_private::FileSpec &file_spec, uint64_t &low, uint64_t &high) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're changing the interface anyway, could we have this return an std::optional<llvm::MD5::MD5Result>
to match what md5_contents
returns?
if (!success) { | ||
LLDB_LOGF(log, "[PutFile] couldn't get md5 sum of destination"); | ||
} else { | ||
auto local_md5 = llvm::sys::fs::md5_contents(source.GetPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious from the right hand side what the return value is. Case in point, it returns an ErrorOr<MD5::MD5Result>
and if I hadn't looked it up I wouldn't have noticed we're not actually logging the error.
I got an email that the build failed and I should revert because of this: https://lab.llvm.org/buildbot/#/builders/68/builds/72623 @bulbazord can you or someone with write permissions revert this PR so I have time to triage the issue? |
|
The next build is green, those DAP tests do fail once in a while so it's unrelated to your change. |
Please address @JDevlieghere's comments in a new PR. |
As David said, no need since that was unrelated to your change. Jonas makes a good point in his review though (that I totally missed when I reviewed), please open a new PR that addresses his feedback. |
Thanks for taking a look at the build. I will do a new PR addressing the issues. |
# Overview In my previous PR: #88812, @JDevlieghere suggested to match return types of the various calculate md5 functions. This PR achieves that by changing the various calculate md5 functions to return `llvm::ErrorOr<llvm::MD5::MD5Result>`. The suggestion was to go for `std::optional<>` but I opted for `llvm::ErrorOr<>` because local calculate md5 was already possibly returning `ErrorOr`. To make sure I didn't break the md5 calculation functionality, I ran some tests for the gdb remote client, and things seem to work. # Testing 1. Remote file doesn't exist  1. Remote file differs  1. Remote file matches  ## Test gaps Unfortunately, I had to modify `lldb/source/Plugins/Platform/MacOSX/PlatformDarwinDevice.cpp` and I can't test the changes there. Hopefully, the existing test suite / code review from whomever is reading this will catch any issues. Co-authored-by: Anthony Ha <[email protected]>
This is a retake of #90921 which got reverted because I forgot to modify the CalculateMD5 unit test I had added in #88812 The prior failing build is here: https://lab.llvm.org/buildbot/#/builders/68/builds/73622 To make sure this error doesn't happen, I ran `ninja ProcessGdbRemoteTests` and then executed the resulting test binary and observed the `CalculateMD5` test passed. # Overview In my previous PR: #88812, @JDevlieghere suggested to match return types of the various calculate md5 functions. This PR achieves that by changing the various calculate md5 functions to return `llvm::ErrorOr<llvm::MD5::MD5Result>`. The suggestion was to go for `std::optional<>` but I opted for `llvm::ErrorOr<>` because local calculate md5 was already possibly returning `ErrorOr`. To make sure I didn't break the md5 calculation functionality, I ran some tests for the gdb remote client, and things seem to work. # Testing 1. Remote file doesn't exist  1. Remote file differs  1. Remote file matches  ## Test gaps Unfortunately, I had to modify `lldb/source/Plugins/Platform/MacOSX/PlatformDarwinDevice.cpp` and I can't test the changes there. Hopefully, the existing test suite / code review from whomever is reading this will catch any issues.
This PR adds a check within
PutFile
to exit early when both local and destination files have matching MD5 hashes. If they differ, or there is trouble getting the hashes, the regular code path to put the file is run.As I needed this to talk to an
lldb-server
which runs the gdb-remote protocol, I enabledCalculateMD5
withinPlatform/gdb-server
and also found and fixed a parsing bug within it as well. Before this PR, the client is incorrectly parsing the response packet containing the checksum; after this PR, hopefully this is fixed. There is a test for the parsing behavior included in this PR.