Skip to content

[lldb] Add Checksum to FileSpec #71457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 9, 2023

Conversation

JDevlieghere
Copy link
Member

Store a Checksum in FileSpec. Its purpose is to store the MD5 hash added
to the DWARF 5 line table.

@llvmbot llvmbot added the lldb label Nov 6, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2023

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Changes

Store a Checksum in FileSpec. Its purpose is to store the MD5 hash added
to the DWARF 5 line table.


Full diff: https://github.com/llvm/llvm-project/pull/71457.diff

8 Files Affected:

  • (added) lldb/include/lldb/Utility/Checksum.h (+36)
  • (modified) lldb/include/lldb/Utility/FileSpec.h (+17-2)
  • (modified) lldb/source/Utility/CMakeLists.txt (+1)
  • (added) lldb/source/Utility/Checksum.cpp (+46)
  • (modified) lldb/source/Utility/FileSpec.cpp (+6-3)
  • (modified) lldb/unittests/Utility/CMakeLists.txt (+1)
  • (added) lldb/unittests/Utility/ChecksumTest.cpp (+57)
  • (modified) lldb/unittests/Utility/FileSpecTest.cpp (+21-37)
diff --git a/lldb/include/lldb/Utility/Checksum.h b/lldb/include/lldb/Utility/Checksum.h
new file mode 100644
index 000000000000000..90a579b247636ac
--- /dev/null
+++ b/lldb/include/lldb/Utility/Checksum.h
@@ -0,0 +1,36 @@
+//===-- Checksum.h ----------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_UTILITY_CHECKSUM_H
+#define LLDB_UTILITY_CHECKSUM_H
+
+#include "llvm/Support/MD5.h"
+
+namespace lldb_private {
+class Checksum {
+public:
+  static llvm::MD5::MD5Result sentinel;
+
+  Checksum(llvm::MD5::MD5Result md5 = sentinel);
+  Checksum(const Checksum &checksum);
+  Checksum &operator=(const Checksum &checksum);
+
+  explicit operator bool() const;
+  bool operator==(const Checksum &checksum) const;
+  bool operator!=(const Checksum &checksum) const;
+
+  std::string digest() const;
+
+private:
+  void SetMD5(llvm::MD5::MD5Result);
+
+  llvm::MD5::MD5Result m_checksum;
+};
+} // namespace lldb_private
+
+#endif // LLDB_UTILITY_CHECKSUM_H
diff --git a/lldb/include/lldb/Utility/FileSpec.h b/lldb/include/lldb/Utility/FileSpec.h
index f06a8543a056e87..29506b01c56d40b 100644
--- a/lldb/include/lldb/Utility/FileSpec.h
+++ b/lldb/include/lldb/Utility/FileSpec.h
@@ -13,15 +13,18 @@
 #include <optional>
 #include <string>
 
+#include "lldb/Utility/Checksum.h"
 #include "lldb/Utility/ConstString.h"
 
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/MD5.h"
 #include "llvm/Support/Path.h"
 
 #include <cstddef>
 #include <cstdint>
+#include <optional>
 
 namespace lldb_private {
 class Stream;
@@ -72,7 +75,8 @@ class FileSpec {
   ///     The style of the path
   ///
   /// \see FileSpec::SetFile (const char *path)
-  explicit FileSpec(llvm::StringRef path, Style style = Style::native);
+  explicit FileSpec(llvm::StringRef path, Style style = Style::native,
+                    const Checksum &checksum = {});
 
   explicit FileSpec(llvm::StringRef path, const llvm::Triple &triple);
 
@@ -362,7 +366,11 @@ class FileSpec {
   ///
   /// \param[in] style
   ///     The style for the given path.
-  void SetFile(llvm::StringRef path, Style style);
+  ///
+  /// \param[in] checksum
+  ///     The checksum for the given path.
+  void SetFile(llvm::StringRef path, Style style,
+               const Checksum &checksum = {});
 
   /// Change the file specified with a new path.
   ///
@@ -420,6 +428,9 @@ class FileSpec {
   ///   The lifetime of the StringRefs is tied to the lifetime of the FileSpec.
   std::vector<llvm::StringRef> GetComponents() const;
 
+  /// Return the checksum for this FileSpec or all zeros if there is none.
+  const Checksum &GetChecksum() const { return m_checksum; };
+
 protected:
   // Convenience method for setting the file without changing the style.
   void SetFile(llvm::StringRef path);
@@ -427,6 +438,7 @@ class FileSpec {
   /// Called anytime m_directory or m_filename is changed to clear any cached
   /// state in this object.
   void PathWasModified() {
+    m_checksum = Checksum();
     m_is_resolved = false;
     m_absolute = Absolute::Calculate;
   }
@@ -443,6 +455,9 @@ class FileSpec {
   /// The unique'd filename path.
   ConstString m_filename;
 
+  /// The optional MD5 checksum of the file.
+  Checksum m_checksum;
+
   /// True if this path has been resolved.
   mutable bool m_is_resolved = false;
 
diff --git a/lldb/source/Utility/CMakeLists.txt b/lldb/source/Utility/CMakeLists.txt
index 16afab1113a970c..a3b0a405b4133f6 100644
--- a/lldb/source/Utility/CMakeLists.txt
+++ b/lldb/source/Utility/CMakeLists.txt
@@ -29,6 +29,7 @@ add_lldb_library(lldbUtility NO_INTERNAL_DEPENDENCIES
   Args.cpp
   Baton.cpp
   Broadcaster.cpp
+  Checksum.cpp
   CompletionRequest.cpp
   Connection.cpp
   ConstString.cpp
diff --git a/lldb/source/Utility/Checksum.cpp b/lldb/source/Utility/Checksum.cpp
new file mode 100644
index 000000000000000..70167e497a526c4
--- /dev/null
+++ b/lldb/source/Utility/Checksum.cpp
@@ -0,0 +1,46 @@
+//===-- Checksum.cpp ------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "lldb/Utility/Checksum.h"
+#include "llvm/ADT/SmallString.h"
+
+using namespace lldb_private;
+
+Checksum::Checksum(llvm::MD5::MD5Result md5) { SetMD5(md5); }
+
+Checksum::Checksum(const Checksum &checksum) { SetMD5(checksum.m_checksum); }
+
+Checksum &Checksum::operator=(const Checksum &checksum) {
+  SetMD5(checksum.m_checksum);
+  return *this;
+}
+
+void Checksum::SetMD5(llvm::MD5::MD5Result md5) {
+  std::uninitialized_copy_n(md5.begin(), 16, m_checksum.begin());
+}
+
+Checksum::operator bool() const {
+  return !std::equal(m_checksum.begin(), m_checksum.end(), sentinel.begin());
+}
+
+bool Checksum::operator==(const Checksum &checksum) const {
+  return std::equal(m_checksum.begin(), m_checksum.end(),
+                    checksum.m_checksum.begin());
+}
+
+bool Checksum::operator!=(const Checksum &checksum) const {
+  return !std::equal(m_checksum.begin(), m_checksum.end(),
+                     checksum.m_checksum.begin());
+}
+
+std::string Checksum::digest() const {
+  return std::string(m_checksum.digest().str());
+}
+
+llvm::MD5::MD5Result Checksum::sentinel = {0, 0, 0, 0, 0, 0, 0, 0,
+                                           0, 0, 0, 0, 0, 0, 0, 0};
diff --git a/lldb/source/Utility/FileSpec.cpp b/lldb/source/Utility/FileSpec.cpp
index eb34ef97cea0b2f..4bbfbb7c1fab5e6 100644
--- a/lldb/source/Utility/FileSpec.cpp
+++ b/lldb/source/Utility/FileSpec.cpp
@@ -68,8 +68,9 @@ void Denormalize(llvm::SmallVectorImpl<char> &path, FileSpec::Style style) {
 FileSpec::FileSpec() : m_style(GetNativeStyle()) {}
 
 // Default constructor that can take an optional full path to a file on disk.
-FileSpec::FileSpec(llvm::StringRef path, Style style) : m_style(style) {
-  SetFile(path, style);
+FileSpec::FileSpec(llvm::StringRef path, Style style, const Checksum &checksum)
+    : m_checksum(checksum), m_style(style) {
+  SetFile(path, style, checksum);
 }
 
 FileSpec::FileSpec(llvm::StringRef path, const llvm::Triple &triple)
@@ -171,9 +172,11 @@ void FileSpec::SetFile(llvm::StringRef pathname) { SetFile(pathname, m_style); }
 // Update the contents of this object with a new path. The path will be split
 // up into a directory and filename and stored as uniqued string values for
 // quick comparison and efficient memory usage.
-void FileSpec::SetFile(llvm::StringRef pathname, Style style) {
+void FileSpec::SetFile(llvm::StringRef pathname, Style style,
+                       const Checksum &checksum) {
   Clear();
   m_style = (style == Style::native) ? GetNativeStyle() : style;
+  m_checksum = checksum;
 
   if (pathname.empty())
     return;
diff --git a/lldb/unittests/Utility/CMakeLists.txt b/lldb/unittests/Utility/CMakeLists.txt
index 5c7003a156813dc..097dae860b15911 100644
--- a/lldb/unittests/Utility/CMakeLists.txt
+++ b/lldb/unittests/Utility/CMakeLists.txt
@@ -4,6 +4,7 @@ add_lldb_unittest(UtilityTests
   OptionsWithRawTest.cpp
   ArchSpecTest.cpp
   BroadcasterTest.cpp
+  ChecksumTest.cpp
   ConstStringTest.cpp
   CompletionRequestTest.cpp
   DataBufferTest.cpp
diff --git a/lldb/unittests/Utility/ChecksumTest.cpp b/lldb/unittests/Utility/ChecksumTest.cpp
new file mode 100644
index 000000000000000..7537d30b5ff5b84
--- /dev/null
+++ b/lldb/unittests/Utility/ChecksumTest.cpp
@@ -0,0 +1,57 @@
+//===-- ChecksumTest.cpp --------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "gtest/gtest.h"
+
+#include "lldb/Utility/Checksum.h"
+
+using namespace lldb_private;
+
+static llvm::MD5::MD5Result hash1 = {0, 1, 2,  3,  4,  5,  6,  7,
+                                     8, 9, 10, 11, 12, 13, 14, 15};
+
+static llvm::MD5::MD5Result hash2 = {0, 1, 2,  3,  4,  5,  6,  7,
+                                     8, 9, 10, 11, 12, 13, 14, 15};
+
+static llvm::MD5::MD5Result hash3 = {8, 9, 10, 11, 12, 13, 14, 15,
+                                     0, 1, 2,  3,  4,  5,  6,  7};
+
+TEST(ChecksumTest, TestConstructor) {
+  Checksum checksum1;
+  EXPECT_FALSE(static_cast<bool>(checksum1));
+  EXPECT_EQ(checksum1, Checksum());
+
+  Checksum checksum2 = Checksum(hash1);
+  EXPECT_EQ(checksum2, Checksum(hash1));
+
+  Checksum checksum3(checksum2);
+  EXPECT_EQ(checksum3, Checksum(hash1));
+}
+
+TEST(ChecksumTest, TestCopyConstructor) {
+  Checksum checksum1;
+  EXPECT_FALSE(static_cast<bool>(checksum1));
+  EXPECT_EQ(checksum1, Checksum());
+
+  Checksum checksum2 = checksum1;
+  EXPECT_EQ(checksum2, checksum1);
+
+  Checksum checksum3(checksum1);
+  EXPECT_EQ(checksum3, checksum1);
+}
+
+TEST(ChecksumTest, TestMD5) {
+  Checksum checksum1(hash1);
+  EXPECT_TRUE(static_cast<bool>(checksum1));
+
+  // Make sure two checksums with the same underlying hashes are the same.
+  EXPECT_EQ(Checksum(hash1), Checksum(hash2));
+
+  // Make sure two checksums with different underlying hashes are different.
+  EXPECT_NE(Checksum(hash1), Checksum(hash3));
+}
diff --git a/lldb/unittests/Utility/FileSpecTest.cpp b/lldb/unittests/Utility/FileSpecTest.cpp
index 2a62f6b1e76120f..ebe7bde7d9c2169 100644
--- a/lldb/unittests/Utility/FileSpecTest.cpp
+++ b/lldb/unittests/Utility/FileSpecTest.cpp
@@ -28,8 +28,8 @@ TEST(FileSpecTest, FileAndDirectoryComponents) {
 
   FileSpec fs_windows("F:\\bar", FileSpec::Style::windows);
   EXPECT_STREQ("F:\\bar", fs_windows.GetPath().c_str());
-  // EXPECT_STREQ("F:\\", fs_windows.GetDirectory().GetPath().c_str()); // It returns
-  // "F:/"
+  // EXPECT_STREQ("F:\\", fs_windows.GetDirectory().GetPath().c_str()); // It
+  // returns "F:/"
   EXPECT_STREQ("bar", fs_windows.GetFilename().GetCString());
 
   FileSpec fs_posix_root("/", FileSpec::Style::posix);
@@ -297,45 +297,18 @@ TEST(FileSpecTest, FormatFileSpec) {
 
 TEST(FileSpecTest, IsRelative) {
   llvm::StringRef not_relative[] = {
-    "/",
-    "/a",
-    "/a/",
-    "/a/b",
-    "/a/b/",
-    "//",
-    "//a/",
-    "//a/b",
-    "//a/b/",
-    "~",
-    "~/",
-    "~/a",
-    "~/a/",
-    "~/a/b",
-    "~/a/b/",
-    "/foo/.",
-    "/foo/..",
-    "/foo/../",
-    "/foo/../.",
+      "/",      "/a",     "/a/",     "/a/b",     "/a/b/",     "//",   "//a/",
+      "//a/b",  "//a/b/", "~",       "~/",       "~/a",       "~/a/", "~/a/b",
+      "~/a/b/", "/foo/.", "/foo/..", "/foo/../", "/foo/../.",
   };
-  for (const auto &path: not_relative) {
+  for (const auto &path : not_relative) {
     SCOPED_TRACE(path);
     EXPECT_FALSE(PosixSpec(path).IsRelative());
   }
   llvm::StringRef is_relative[] = {
-    ".",
-    "./",
-    ".///",
-    "a",
-    "./a",
-    "./a/",
-    "./a/",
-    "./a/b",
-    "./a/b/",
-    "../foo",
-    "foo/bar.c",
-    "./foo/bar.c"
-  };
-  for (const auto &path: is_relative) {
+      ".",    "./",    ".///",   "a",      "./a",       "./a/",
+      "./a/", "./a/b", "./a/b/", "../foo", "foo/bar.c", "./foo/bar.c"};
+  for (const auto &path : is_relative) {
     SCOPED_TRACE(path);
     EXPECT_TRUE(PosixSpec(path).IsRelative());
   }
@@ -421,7 +394,6 @@ TEST(FileSpecTest, Match) {
 
   EXPECT_TRUE(Match("", "/foo/bar"));
   EXPECT_TRUE(Match("", ""));
-
 }
 
 TEST(FileSpecTest, TestAbsoluteCaching) {
@@ -534,3 +506,15 @@ TEST(FileSpecTest, TestGetComponents) {
     EXPECT_EQ(file_spec.GetComponents(), pair.second);
   }
 }
+
+TEST(FileSpecTest, TestChecksum) {
+  Checksum checksum({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15});
+  FileSpec file_spec("/foo/bar", FileSpec::Style::posix, checksum);
+  EXPECT_TRUE(static_cast<bool>(file_spec.GetChecksum()));
+  EXPECT_EQ(file_spec.GetChecksum(), checksum);
+
+  FileSpec copy = file_spec;
+
+  EXPECT_TRUE(static_cast<bool>(copy.GetChecksum()));
+  EXPECT_EQ(file_spec.GetChecksum(), copy.GetChecksum());
+}

@JDevlieghere
Copy link
Member Author

Depends on #71456.

I considered an alternative where we don't store the Checksum in the FileSpec but create a new SupportFile class that wraps a Checksum and a FileSpec. The reason I didn't got with this approach is because of the FileSpecList. The latter has helper functions that are used in the context of "support files" which would have to duplicated in a SupportFileList and I wasn't sure we weren't vending a FileSpecList through the SB API that would now become a SupportFileList.

A checksum is 16 bytes. Please let me know if you think that the overhead of that is too much and outweighs the churn and code duplication to adopt a SupportFile and SupportFileList.

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You pushed a commit from a different PR btw

Comment on lines +78 to +80
explicit FileSpec(llvm::StringRef path, Style style = Style::native,
const Checksum &checksum = {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the doxygen

@medismailben
Copy link
Member

Depends on #71456.

I considered an alternative where we don't store the Checksum in the FileSpec but create a new SupportFile class that wraps a Checksum and a FileSpec. The reason I didn't got with this approach is because of the FileSpecList. The latter has helper functions that are used in the context of "support files" which would have to duplicated in a SupportFileList and I wasn't sure we weren't vending a FileSpecList through the SB API that would now become a SupportFileList.

A checksum is 16 bytes. Please let me know if you think that the overhead of that is too much and outweighs the churn and code duplication to adopt a SupportFile and SupportFileList.

I personally like the current implementation, I think it makes sense to have the Checksum in the FileSpec, I don't see the point of making new SupportFile{,List} classes if it's just wrapping a FileSpec and its Checksum.

Copy link
Member

@medismailben medismailben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comments.

#include "lldb/Utility/ConstString.h"

#include "llvm/ADT/StringRef.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/MD5.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you shouldn't need this since it's already included in Checksum.h

protected:
// Convenience method for setting the file without changing the style.
void SetFile(llvm::StringRef path);

/// Called anytime m_directory or m_filename is changed to clear any cached
/// state in this object.
void PathWasModified() {
m_checksum = Checksum();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to recompute the checksum if the path was changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the FileSpec is a system independent path description (i.e. the file doesn't need to exist on the host) there's no way to recompute it automatically, although the user can specify a new checksum if the path changes.

@clayborg
Copy link
Collaborator

clayborg commented Nov 7, 2023

So I debugged lldb with lldb and adding a counter into the FileSpec constructor and came up a max of 721833 files after trying to set a breakpoint in Driver.cpp, so that storage is about 16.5MB which isn't too bad. This goes up to 27.5MB with the FileSpec being 40 bytes, so not too bad. Hopefully this won't overwhelm some larger targets in LLDB memory wise

@medismailben
Copy link
Member

So I debugged lldb with lldb and adding a counter into the FileSpec constructor and came up a max of 721833 files after trying to set a breakpoint in Driver.cpp, so that storage is about 16.5MB which isn't too bad. This goes up to 27.5MB with the FileSpec being 40 bytes, so not too bad. Hopefully this won't overwhelm some larger targets in LLDB memory wise

@clayborg I think it would be worth having that stat for FileSpecs in-tree so everyone can have access to it using statistics dump for instance.

@clayborg
Copy link
Collaborator

clayborg commented Nov 8, 2023

So I debugged lldb with lldb and adding a counter into the FileSpec constructor and came up a max of 721833 files after trying to set a breakpoint in Driver.cpp, so that storage is about 16.5MB which isn't too bad. This goes up to 27.5MB with the FileSpec being 40 bytes, so not too bad. Hopefully this won't overwhelm some larger targets in LLDB memory wise

@clayborg I think it would be worth having that stat for FileSpecs in-tree so everyone can have access to it using statistics dump for instance.

I agree. Easy to add a static variable into FileSpec to do the counting and then hookup to stats. Might be nice to make a mix in class that any internal objects can inherit from that wouldn't change the size of any objects, but on construction would add to a count, and on destruction would decrement. It would be nice to track FileSpec, Symbol and other common objects in a debugger process.

Store a Checksum in FileSpec. Its purpose is to store the MD5 hash that
was added to the DWARF 5 line table.

This increases the size of a FileSpec from 24 to 40 bytes. The
alternative is to introduce a new SupportFile abstraction for a FileSpec
+ Checksum but that would require a corresponding SupportFileList class.
During review we decided that wasn't worth it, but that's something we
can revisit in the future.
@JDevlieghere JDevlieghere merged commit 919f5ef into llvm:main Nov 9, 2023
@JDevlieghere JDevlieghere deleted the ChecksumInFileSpec branch November 9, 2023 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants