Skip to content

[clang][deps][modules] Allocate input file paths lazily #114457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 11, 2024

Conversation

jansvoboda11
Copy link
Contributor

This PR builds on top of #113984 and attempts to avoid allocating input file paths eagerly. Instead, the InputFileInfo type used by ASTReader now only holds StringRefs that point into the PCM file buffer, and the full input file paths get resolved on demand.

The dependency scanner makes use of this in a bit of a roundabout way: ModuleDeps now only holds (an owning copy of) the short unresolved input file paths, which get resolved lazily. This can be a big win, I'm seeing up to a 5% speedup.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Oct 31, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 31, 2024

@llvm/pr-subscribers-clang

Author: Jan Svoboda (jansvoboda11)

Changes

This PR builds on top of #113984 and attempts to avoid allocating input file paths eagerly. Instead, the InputFileInfo type used by ASTReader now only holds StringRefs that point into the PCM file buffer, and the full input file paths get resolved on demand.

The dependency scanner makes use of this in a bit of a roundabout way: ModuleDeps now only holds (an owning copy of) the short unresolved input file paths, which get resolved lazily. This can be a big win, I'm seeing up to a 5% speedup.


Patch is 45.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114457.diff

29 Files Affected:

  • (modified) clang/include/clang/Serialization/ModuleFile.h (+7-2)
  • (modified) clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h (+12-4)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+17-24)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+25-16)
  • (modified) clang/test/ClangScanDeps/diagnostics.c (+1-1)
  • (modified) clang/test/ClangScanDeps/header-search-pruning-transitive.c (+9-9)
  • (modified) clang/test/ClangScanDeps/link-libraries.c (+6-6)
  • (modified) clang/test/ClangScanDeps/modules-context-hash.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-dep-args.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-extern-submodule.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-extern-unrelated.m (+7-7)
  • (modified) clang/test/ClangScanDeps/modules-file-path-isolation.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-fmodule-name-no-module-built.m (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-full-by-mod-name.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-full.cpp (+6-6)
  • (modified) clang/test/ClangScanDeps/modules-implicit-dot-private.m (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-incomplete-umbrella.c (+7-7)
  • (modified) clang/test/ClangScanDeps/modules-inferred.m (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-no-undeclared-includes.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-pch-common-submodule.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-pch-common-via-submodule.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-pch.c (+10-10)
  • (modified) clang/test/ClangScanDeps/modules-priv-fw-from-pub.m (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-redefinition.m (+1-1)
  • (modified) clang/test/ClangScanDeps/modules-symlink-dir-vfs.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-transitive.c (+2-2)
  • (modified) clang/test/ClangScanDeps/optimize-vfs.m (+6-6)
  • (modified) clang/test/ClangScanDeps/removed-args.c (+4-4)
  • (modified) clang/tools/clang-scan-deps/ClangScanDeps.cpp (+3-8)
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 29d3354d07a129..f20cb2f9f35ae8 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -62,8 +62,9 @@ enum ModuleKind {
 
 /// The input file info that has been loaded from an AST file.
 struct InputFileInfo {
-  std::string FilenameAsRequested;
-  std::string Filename;
+  StringRef UnresolvedImportedFilenameAsRequested;
+  StringRef UnresolvedImportedFilename;
+
   uint64_t ContentHash;
   off_t StoredSize;
   time_t StoredTime;
@@ -71,6 +72,10 @@ struct InputFileInfo {
   bool Transient;
   bool TopLevel;
   bool ModuleMap;
+
+  bool isValid() const {
+    return !UnresolvedImportedFilenameAsRequested.empty();
+  }
 };
 
 /// The input file that has been loaded from this AST file, along with
diff --git a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
index 3de19d756da00d..c2071a6eadedd1 100644
--- a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
+++ b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
@@ -120,10 +120,6 @@ struct ModuleDeps {
   /// additionally appear in \c FileDeps as a dependency.
   std::string ClangModuleMapFile;
 
-  /// A collection of absolute paths to files that this module directly depends
-  /// on, not including transitive dependencies.
-  llvm::StringSet<> FileDeps;
-
   /// A collection of absolute paths to module map files that this module needs
   /// to know about. The ordering is significant.
   std::vector<std::string> ModuleMapFileDeps;
@@ -143,13 +139,25 @@ struct ModuleDeps {
   /// an entity from this module is used.
   llvm::SmallVector<Module::LinkLibrary, 2> LinkLibraries;
 
+  /// Invokes \c Cb for all file dependencies of this module. Each provided
+  /// \c StringRef is only valid within the individual callback invocation.
+  void forEachFileDep(llvm::function_ref<void(StringRef)> Cb) const;
+
   /// Get (or compute) the compiler invocation that can be used to build this
   /// module. Does not include argv[0].
   const std::vector<std::string> &getBuildArguments();
 
 private:
+  friend class ModuleDepCollector;
   friend class ModuleDepCollectorPP;
 
+  /// The base directory for relative paths in \c FileDeps.
+  std::string FileDepsBaseDir;
+
+  /// A collection of paths to files that this module directly depends on, not
+  /// including transitive dependencies.
+  std::vector<std::string> FileDeps;
+
   std::variant<std::monostate, CowCompilerInvocation, std::vector<std::string>>
       BuildInfo;
 };
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index d383b4f29a5264..acccf77a5b6cb9 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2473,7 +2473,7 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
     return InputFileInfo();
 
   // If we've already loaded this input file, return it.
-  if (!F.InputFileInfosLoaded[ID - 1].Filename.empty())
+  if (F.InputFileInfosLoaded[ID - 1].isValid())
     return F.InputFileInfosLoaded[ID - 1];
 
   // Go find this input file.
@@ -2510,21 +2510,11 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
   R.Transient = static_cast<bool>(Record[4]);
   R.TopLevel = static_cast<bool>(Record[5]);
   R.ModuleMap = static_cast<bool>(Record[6]);
-  std::tie(R.FilenameAsRequested, R.Filename) = [&]() {
-    uint16_t AsRequestedLength = Record[7];
-
-    StringRef NameAsRequestedRef = Blob.substr(0, AsRequestedLength);
-    StringRef NameRef = Blob.substr(AsRequestedLength);
-
-    std::string NameAsRequested =
-        ResolveImportedPathAndAllocate(PathBuf, NameAsRequestedRef, F);
-    std::string Name = ResolveImportedPathAndAllocate(PathBuf, NameRef, F);
-
-    if (Name.empty())
-      Name = NameAsRequested;
-
-    return std::make_pair(std::move(NameAsRequested), std::move(Name));
-  }();
+  uint16_t AsRequestedLength = Record[7];
+  R.UnresolvedImportedFilenameAsRequested = Blob.substr(0, AsRequestedLength);
+  R.UnresolvedImportedFilename = Blob.substr(AsRequestedLength);
+  if (R.UnresolvedImportedFilename.empty())
+    R.UnresolvedImportedFilename = R.UnresolvedImportedFilenameAsRequested;
 
   Expected<llvm::BitstreamEntry> MaybeEntry = Cursor.advance();
   if (!MaybeEntry) // FIXME this drops errors on the floor.
@@ -2576,7 +2566,8 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
   time_t StoredTime = FI.StoredTime;
   bool Overridden = FI.Overridden;
   bool Transient = FI.Transient;
-  StringRef Filename = FI.FilenameAsRequested;
+  auto Filename =
+      ResolveImportedPath(PathBuf, FI.UnresolvedImportedFilenameAsRequested, F);
   uint64_t StoredContentHash = FI.ContentHash;
 
   // For standard C++ modules, we don't need to check the inputs.
@@ -2592,17 +2583,17 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
     Overridden = false;
   }
 
-  auto File = FileMgr.getOptionalFileRef(Filename, /*OpenFile=*/false);
+  auto File = FileMgr.getOptionalFileRef(*Filename, /*OpenFile=*/false);
 
   // For an overridden file, create a virtual file with the stored
   // size/timestamp.
   if ((Overridden || Transient || SkipChecks) && !File)
-    File = FileMgr.getVirtualFileRef(Filename, StoredSize, StoredTime);
+    File = FileMgr.getVirtualFileRef(*Filename, StoredSize, StoredTime);
 
   if (!File) {
     if (Complain) {
       std::string ErrorStr = "could not find file '";
-      ErrorStr += Filename;
+      ErrorStr += *Filename;
       ErrorStr += "' referenced by AST file '";
       ErrorStr += F.FileName;
       ErrorStr += "'";
@@ -2622,7 +2613,7 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
   if ((!Overridden && !Transient) && !SkipChecks &&
       SM.isFileOverridden(*File)) {
     if (Complain)
-      Error(diag::err_fe_pch_file_overridden, Filename);
+      Error(diag::err_fe_pch_file_overridden, *Filename);
 
     // After emitting the diagnostic, bypass the overriding file to recover
     // (this creates a separate FileEntry).
@@ -2714,7 +2705,7 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
       // The top-level PCH is stale.
       StringRef TopLevelPCHName(ImportStack.back()->FileName);
       Diag(diag::err_fe_ast_file_modified)
-          << Filename << moduleKindForDiagnostic(ImportStack.back()->Kind)
+          << *Filename << moduleKindForDiagnostic(ImportStack.back()->Kind)
           << TopLevelPCHName << FileChange.Kind
           << (FileChange.Old && FileChange.New)
           << llvm::itostr(FileChange.Old.value_or(0))
@@ -2723,7 +2714,7 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
       // Print the import stack.
       if (ImportStack.size() > 1) {
         Diag(diag::note_pch_required_by)
-          << Filename << ImportStack[0]->FileName;
+          << *Filename << ImportStack[0]->FileName;
         for (unsigned I = 1; I < ImportStack.size(); ++I)
           Diag(diag::note_pch_required_by)
             << ImportStack[I-1]->FileName << ImportStack[I]->FileName;
@@ -2971,8 +2962,10 @@ ASTReader::ReadControlBlock(ModuleFile &F,
         for (unsigned I = 0; I < N; ++I) {
           bool IsSystem = I >= NumUserInputs;
           InputFileInfo FI = getInputFileInfo(F, I + 1);
+          auto FilenameAsRequested = ResolveImportedPath(
+              PathBuf, FI.UnresolvedImportedFilenameAsRequested, F);
           Listener->visitInputFile(
-              FI.FilenameAsRequested, IsSystem, FI.Overridden,
+              *FilenameAsRequested, IsSystem, FI.Overridden,
               F.Kind == MK_ExplicitModule || F.Kind == MK_PrebuiltModule);
         }
       }
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 637416cd1fc621..2e97cac0796cee 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -21,6 +21,16 @@ using namespace clang;
 using namespace tooling;
 using namespace dependencies;
 
+void ModuleDeps::forEachFileDep(llvm::function_ref<void(StringRef)> Cb) const {
+  SmallString<0> PathBuf;
+  PathBuf.reserve(256);
+  for (StringRef FileDep : FileDeps) {
+    auto ResolvedFileDep =
+        ASTReader::ResolveImportedPath(PathBuf, FileDep, FileDepsBaseDir);
+    Cb(*ResolvedFileDep);
+  }
+}
+
 const std::vector<std::string> &ModuleDeps::getBuildArguments() {
   assert(!std::holds_alternative<std::monostate>(BuildInfo) &&
          "Using uninitialized ModuleDeps");
@@ -596,6 +606,7 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
   serialization::ModuleFile *MF =
       MDC.ScanInstance.getASTReader()->getModuleManager().lookup(
           *M->getASTFile());
+  MD.FileDepsBaseDir = MF->BaseDirectory;
   MDC.ScanInstance.getASTReader()->visitInputFileInfos(
       *MF, /*IncludeSystem=*/true,
       [&](const serialization::InputFileInfo &IFI, bool IsSystem) {
@@ -604,9 +615,9 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
         // actual on-disk module map file that allowed inferring the module,
         // which is what we need for building the module explicitly
         // Let's ignore this file.
-        if (StringRef(IFI.Filename).ends_with("__inferred_module.map"))
+        if (IFI.UnresolvedImportedFilename.ends_with("__inferred_module.map"))
           return;
-        MDC.addFileDep(MD, IFI.Filename);
+        MDC.addFileDep(MD, IFI.UnresolvedImportedFilename);
       });
 
   llvm::DenseSet<const Module *> SeenDeps;
@@ -614,15 +625,20 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
   addAllSubmoduleDeps(M, MD, SeenDeps);
   addAllAffectingClangModules(M, MD, SeenDeps);
 
+  SmallString<0> PathBuf;
+  PathBuf.reserve(256);
   MDC.ScanInstance.getASTReader()->visitInputFileInfos(
       *MF, /*IncludeSystem=*/true,
       [&](const serialization::InputFileInfo &IFI, bool IsSystem) {
         if (!(IFI.TopLevel && IFI.ModuleMap))
           return;
-        if (StringRef(IFI.FilenameAsRequested)
-                .ends_with("__inferred_module.map"))
+        if (IFI.UnresolvedImportedFilenameAsRequested.ends_with(
+                "__inferred_module.map"))
           return;
-        MD.ModuleMapFileDeps.emplace_back(IFI.FilenameAsRequested);
+        auto ResolvedFilenameAsRequested = ASTReader::ResolveImportedPath(
+            PathBuf, IFI.UnresolvedImportedFilenameAsRequested,
+            MF->BaseDirectory);
+        MD.ModuleMapFileDeps.emplace_back(*ResolvedFilenameAsRequested);
       });
 
   CowCompilerInvocation CI =
@@ -779,23 +795,16 @@ static StringRef makeAbsoluteAndPreferred(CompilerInstance &CI, StringRef Path,
 void ModuleDepCollector::addFileDep(StringRef Path) {
   if (IsStdModuleP1689Format) {
     // Within P1689 format, we don't want all the paths to be absolute path
-    // since it may violate the tranditional make style dependencies info.
-    FileDeps.push_back(std::string(Path));
+    // since it may violate the traditional make style dependencies info.
+    FileDeps.emplace_back(Path);
     return;
   }
 
   llvm::SmallString<256> Storage;
   Path = makeAbsoluteAndPreferred(ScanInstance, Path, Storage);
-  FileDeps.push_back(std::string(Path));
+  FileDeps.emplace_back(Path);
 }
 
 void ModuleDepCollector::addFileDep(ModuleDeps &MD, StringRef Path) {
-  if (IsStdModuleP1689Format) {
-    MD.FileDeps.insert(Path);
-    return;
-  }
-
-  llvm::SmallString<256> Storage;
-  Path = makeAbsoluteAndPreferred(ScanInstance, Path, Storage);
-  MD.FileDeps.insert(Path);
+  MD.FileDeps.emplace_back(Path);
 }
diff --git a/clang/test/ClangScanDeps/diagnostics.c b/clang/test/ClangScanDeps/diagnostics.c
index bc05594f7828dc..8e3cf4c9f9fa6d 100644
--- a/clang/test/ClangScanDeps/diagnostics.c
+++ b/clang/test/ClangScanDeps/diagnostics.c
@@ -34,8 +34,8 @@ module mod { header "mod.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_MOD:.*]]",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/mod.h"
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "mod"
diff --git a/clang/test/ClangScanDeps/header-search-pruning-transitive.c b/clang/test/ClangScanDeps/header-search-pruning-transitive.c
index 6c422650902689..1e829bb02ddc4a 100644
--- a/clang/test/ClangScanDeps/header-search-pruning-transitive.c
+++ b/clang/test/ClangScanDeps/header-search-pruning-transitive.c
@@ -72,8 +72,8 @@ module X { header "X.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_X:.*]]",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/X.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/X.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "X"
@@ -85,11 +85,11 @@ module X { header "X.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_Y_WITH_A]]",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/Y.h",
-// CHECK-NEXT:         "[[PREFIX]]/a/a.h",
 // CHECK-NEXT:         "[[PREFIX]]/begin/begin.h",
-// CHECK-NEXT:         "[[PREFIX]]/end/end.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/a/a.h",
+// CHECK-NEXT:         "[[PREFIX]]/end/end.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "Y"
@@ -128,8 +128,8 @@ module X { header "X.h" }
 // also has a different context hash from the first version of module X.
 // CHECK-NOT:        "context-hash": "[[HASH_X]]",
 // CHECK:            "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/X.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/X.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "X"
@@ -141,10 +141,10 @@ module X { header "X.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_Y_WITHOUT_A]]",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/Y.h",
 // CHECK-NEXT:         "[[PREFIX]]/begin/begin.h",
-// CHECK-NEXT:         "[[PREFIX]]/end/end.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/end/end.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "Y"
diff --git a/clang/test/ClangScanDeps/link-libraries.c b/clang/test/ClangScanDeps/link-libraries.c
index bc0b0c546ea032..cc2e223102024b 100644
--- a/clang/test/ClangScanDeps/link-libraries.c
+++ b/clang/test/ClangScanDeps/link-libraries.c
@@ -44,8 +44,8 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/Framework.framework/Headers/Framework.h"
-// CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [
 // CHECK-NEXT:         {
@@ -67,8 +67,8 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/direct.h"
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "direct"
@@ -89,10 +89,10 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/root.h",
+// CHECK-NEXT:         "[[PREFIX]]/root/textual.h",
 // CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/module.modulemap"
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
-// CHECK-NEXT:         "[[PREFIX]]/root.h"
-// CHECK-NEXT:         "[[PREFIX]]/root/textual.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "root"
@@ -104,7 +104,7 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/transitive.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [
diff --git a/clang/test/ClangScanDeps/modules-context-hash.c b/clang/test/ClangScanDeps/modules-context-hash.c
index 69fd8561a2b32a..9489563576d3b1 100644
--- a/clang/test/ClangScanDeps/modules-context-hash.c
+++ b/clang/test/ClangScanDeps/modules-context-hash.c
@@ -34,9 +34,9 @@
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_MOD_A:.*]]",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/a/dep.h",
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/mod.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/a/dep.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "mod"
@@ -72,9 +72,9 @@
 // CHECK:            ],
 // CHECK-NOT:        "context-hash": "[[HASH_MOD_A]]",
 // CHECK:            "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/b/dep.h",
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/mod.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/b/dep.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "mod"
diff --git a/clang/test/ClangScanDeps/modules-dep-args.c b/clang/test/ClangScanDeps/modules-dep-args.c
index 14de2e8f4594a1..19f915923b84cf 100644
--- a/clang/test/ClangScanDeps/modules-dep-args.c
+++ b/clang/test/ClangScanDeps/modules-dep-args.c
@@ -58,8 +58,8 @@ module Direct { header "direct.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/direct.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/direct.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "Direct"
diff --git a/clang/test/ClangScanDeps/modules-extern-submodule.c b/clang/test/ClangScanDeps/modules-extern-submodule.c
index 6e9082b0165fdf..01d3d6ba5e0d3a 100644
--- a/clang/test/ClangScanDeps/modules-extern-submodule.c
+++ b/clang/test/ClangScanDeps/modules-extern-submodule.c
@@ -49,8 +49,8 @@ module third {}
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/first/first/first.h",
 // CHECK-NEXT:         "[[PREFIX]]/first/first/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/first/first/first.h",
 // CHECK-NEXT:         "[[PREFIX]]/second/second/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/second/second/sub.modulemap"
 // CHECK-NEXT:       ],
@@ -73,8 +73,8 @@ module third {}
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
 // CHECK-NEXT:         "[[PREFIX]]/second/second/module.modulemap",
-// CHECK-NEXT:         "[[PREFIX]]/second/second/sub.h",
 // CHECK-NEXT:         "[[PREFIX]]/second/second/sub.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/second/second/sub.h",
 // CHECK-NEXT:         "[[PREFIX]]/third/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
diff --git a/clang/test/ClangScanDeps/modules-extern-unre...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Oct 31, 2024

@llvm/pr-subscribers-clang-modules

Author: Jan Svoboda (jansvoboda11)

Changes

This PR builds on top of #113984 and attempts to avoid allocating input file paths eagerly. Instead, the InputFileInfo type used by ASTReader now only holds StringRefs that point into the PCM file buffer, and the full input file paths get resolved on demand.

The dependency scanner makes use of this in a bit of a roundabout way: ModuleDeps now only holds (an owning copy of) the short unresolved input file paths, which get resolved lazily. This can be a big win, I'm seeing up to a 5% speedup.


Patch is 45.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114457.diff

29 Files Affected:

  • (modified) clang/include/clang/Serialization/ModuleFile.h (+7-2)
  • (modified) clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h (+12-4)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+17-24)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+25-16)
  • (modified) clang/test/ClangScanDeps/diagnostics.c (+1-1)
  • (modified) clang/test/ClangScanDeps/header-search-pruning-transitive.c (+9-9)
  • (modified) clang/test/ClangScanDeps/link-libraries.c (+6-6)
  • (modified) clang/test/ClangScanDeps/modules-context-hash.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-dep-args.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-extern-submodule.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-extern-unrelated.m (+7-7)
  • (modified) clang/test/ClangScanDeps/modules-file-path-isolation.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-fmodule-name-no-module-built.m (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-full-by-mod-name.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-full.cpp (+6-6)
  • (modified) clang/test/ClangScanDeps/modules-implicit-dot-private.m (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-incomplete-umbrella.c (+7-7)
  • (modified) clang/test/ClangScanDeps/modules-inferred.m (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-no-undeclared-includes.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-pch-common-submodule.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-pch-common-via-submodule.c (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-pch.c (+10-10)
  • (modified) clang/test/ClangScanDeps/modules-priv-fw-from-pub.m (+4-4)
  • (modified) clang/test/ClangScanDeps/modules-redefinition.m (+1-1)
  • (modified) clang/test/ClangScanDeps/modules-symlink-dir-vfs.c (+2-2)
  • (modified) clang/test/ClangScanDeps/modules-transitive.c (+2-2)
  • (modified) clang/test/ClangScanDeps/optimize-vfs.m (+6-6)
  • (modified) clang/test/ClangScanDeps/removed-args.c (+4-4)
  • (modified) clang/tools/clang-scan-deps/ClangScanDeps.cpp (+3-8)
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 29d3354d07a129..f20cb2f9f35ae8 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -62,8 +62,9 @@ enum ModuleKind {
 
 /// The input file info that has been loaded from an AST file.
 struct InputFileInfo {
-  std::string FilenameAsRequested;
-  std::string Filename;
+  StringRef UnresolvedImportedFilenameAsRequested;
+  StringRef UnresolvedImportedFilename;
+
   uint64_t ContentHash;
   off_t StoredSize;
   time_t StoredTime;
@@ -71,6 +72,10 @@ struct InputFileInfo {
   bool Transient;
   bool TopLevel;
   bool ModuleMap;
+
+  bool isValid() const {
+    return !UnresolvedImportedFilenameAsRequested.empty();
+  }
 };
 
 /// The input file that has been loaded from this AST file, along with
diff --git a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
index 3de19d756da00d..c2071a6eadedd1 100644
--- a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
+++ b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
@@ -120,10 +120,6 @@ struct ModuleDeps {
   /// additionally appear in \c FileDeps as a dependency.
   std::string ClangModuleMapFile;
 
-  /// A collection of absolute paths to files that this module directly depends
-  /// on, not including transitive dependencies.
-  llvm::StringSet<> FileDeps;
-
   /// A collection of absolute paths to module map files that this module needs
   /// to know about. The ordering is significant.
   std::vector<std::string> ModuleMapFileDeps;
@@ -143,13 +139,25 @@ struct ModuleDeps {
   /// an entity from this module is used.
   llvm::SmallVector<Module::LinkLibrary, 2> LinkLibraries;
 
+  /// Invokes \c Cb for all file dependencies of this module. Each provided
+  /// \c StringRef is only valid within the individual callback invocation.
+  void forEachFileDep(llvm::function_ref<void(StringRef)> Cb) const;
+
   /// Get (or compute) the compiler invocation that can be used to build this
   /// module. Does not include argv[0].
   const std::vector<std::string> &getBuildArguments();
 
 private:
+  friend class ModuleDepCollector;
   friend class ModuleDepCollectorPP;
 
+  /// The base directory for relative paths in \c FileDeps.
+  std::string FileDepsBaseDir;
+
+  /// A collection of paths to files that this module directly depends on, not
+  /// including transitive dependencies.
+  std::vector<std::string> FileDeps;
+
   std::variant<std::monostate, CowCompilerInvocation, std::vector<std::string>>
       BuildInfo;
 };
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index d383b4f29a5264..acccf77a5b6cb9 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -2473,7 +2473,7 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
     return InputFileInfo();
 
   // If we've already loaded this input file, return it.
-  if (!F.InputFileInfosLoaded[ID - 1].Filename.empty())
+  if (F.InputFileInfosLoaded[ID - 1].isValid())
     return F.InputFileInfosLoaded[ID - 1];
 
   // Go find this input file.
@@ -2510,21 +2510,11 @@ InputFileInfo ASTReader::getInputFileInfo(ModuleFile &F, unsigned ID) {
   R.Transient = static_cast<bool>(Record[4]);
   R.TopLevel = static_cast<bool>(Record[5]);
   R.ModuleMap = static_cast<bool>(Record[6]);
-  std::tie(R.FilenameAsRequested, R.Filename) = [&]() {
-    uint16_t AsRequestedLength = Record[7];
-
-    StringRef NameAsRequestedRef = Blob.substr(0, AsRequestedLength);
-    StringRef NameRef = Blob.substr(AsRequestedLength);
-
-    std::string NameAsRequested =
-        ResolveImportedPathAndAllocate(PathBuf, NameAsRequestedRef, F);
-    std::string Name = ResolveImportedPathAndAllocate(PathBuf, NameRef, F);
-
-    if (Name.empty())
-      Name = NameAsRequested;
-
-    return std::make_pair(std::move(NameAsRequested), std::move(Name));
-  }();
+  uint16_t AsRequestedLength = Record[7];
+  R.UnresolvedImportedFilenameAsRequested = Blob.substr(0, AsRequestedLength);
+  R.UnresolvedImportedFilename = Blob.substr(AsRequestedLength);
+  if (R.UnresolvedImportedFilename.empty())
+    R.UnresolvedImportedFilename = R.UnresolvedImportedFilenameAsRequested;
 
   Expected<llvm::BitstreamEntry> MaybeEntry = Cursor.advance();
   if (!MaybeEntry) // FIXME this drops errors on the floor.
@@ -2576,7 +2566,8 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
   time_t StoredTime = FI.StoredTime;
   bool Overridden = FI.Overridden;
   bool Transient = FI.Transient;
-  StringRef Filename = FI.FilenameAsRequested;
+  auto Filename =
+      ResolveImportedPath(PathBuf, FI.UnresolvedImportedFilenameAsRequested, F);
   uint64_t StoredContentHash = FI.ContentHash;
 
   // For standard C++ modules, we don't need to check the inputs.
@@ -2592,17 +2583,17 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
     Overridden = false;
   }
 
-  auto File = FileMgr.getOptionalFileRef(Filename, /*OpenFile=*/false);
+  auto File = FileMgr.getOptionalFileRef(*Filename, /*OpenFile=*/false);
 
   // For an overridden file, create a virtual file with the stored
   // size/timestamp.
   if ((Overridden || Transient || SkipChecks) && !File)
-    File = FileMgr.getVirtualFileRef(Filename, StoredSize, StoredTime);
+    File = FileMgr.getVirtualFileRef(*Filename, StoredSize, StoredTime);
 
   if (!File) {
     if (Complain) {
       std::string ErrorStr = "could not find file '";
-      ErrorStr += Filename;
+      ErrorStr += *Filename;
       ErrorStr += "' referenced by AST file '";
       ErrorStr += F.FileName;
       ErrorStr += "'";
@@ -2622,7 +2613,7 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
   if ((!Overridden && !Transient) && !SkipChecks &&
       SM.isFileOverridden(*File)) {
     if (Complain)
-      Error(diag::err_fe_pch_file_overridden, Filename);
+      Error(diag::err_fe_pch_file_overridden, *Filename);
 
     // After emitting the diagnostic, bypass the overriding file to recover
     // (this creates a separate FileEntry).
@@ -2714,7 +2705,7 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
       // The top-level PCH is stale.
       StringRef TopLevelPCHName(ImportStack.back()->FileName);
       Diag(diag::err_fe_ast_file_modified)
-          << Filename << moduleKindForDiagnostic(ImportStack.back()->Kind)
+          << *Filename << moduleKindForDiagnostic(ImportStack.back()->Kind)
           << TopLevelPCHName << FileChange.Kind
           << (FileChange.Old && FileChange.New)
           << llvm::itostr(FileChange.Old.value_or(0))
@@ -2723,7 +2714,7 @@ InputFile ASTReader::getInputFile(ModuleFile &F, unsigned ID, bool Complain) {
       // Print the import stack.
       if (ImportStack.size() > 1) {
         Diag(diag::note_pch_required_by)
-          << Filename << ImportStack[0]->FileName;
+          << *Filename << ImportStack[0]->FileName;
         for (unsigned I = 1; I < ImportStack.size(); ++I)
           Diag(diag::note_pch_required_by)
             << ImportStack[I-1]->FileName << ImportStack[I]->FileName;
@@ -2971,8 +2962,10 @@ ASTReader::ReadControlBlock(ModuleFile &F,
         for (unsigned I = 0; I < N; ++I) {
           bool IsSystem = I >= NumUserInputs;
           InputFileInfo FI = getInputFileInfo(F, I + 1);
+          auto FilenameAsRequested = ResolveImportedPath(
+              PathBuf, FI.UnresolvedImportedFilenameAsRequested, F);
           Listener->visitInputFile(
-              FI.FilenameAsRequested, IsSystem, FI.Overridden,
+              *FilenameAsRequested, IsSystem, FI.Overridden,
               F.Kind == MK_ExplicitModule || F.Kind == MK_PrebuiltModule);
         }
       }
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 637416cd1fc621..2e97cac0796cee 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -21,6 +21,16 @@ using namespace clang;
 using namespace tooling;
 using namespace dependencies;
 
+void ModuleDeps::forEachFileDep(llvm::function_ref<void(StringRef)> Cb) const {
+  SmallString<0> PathBuf;
+  PathBuf.reserve(256);
+  for (StringRef FileDep : FileDeps) {
+    auto ResolvedFileDep =
+        ASTReader::ResolveImportedPath(PathBuf, FileDep, FileDepsBaseDir);
+    Cb(*ResolvedFileDep);
+  }
+}
+
 const std::vector<std::string> &ModuleDeps::getBuildArguments() {
   assert(!std::holds_alternative<std::monostate>(BuildInfo) &&
          "Using uninitialized ModuleDeps");
@@ -596,6 +606,7 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
   serialization::ModuleFile *MF =
       MDC.ScanInstance.getASTReader()->getModuleManager().lookup(
           *M->getASTFile());
+  MD.FileDepsBaseDir = MF->BaseDirectory;
   MDC.ScanInstance.getASTReader()->visitInputFileInfos(
       *MF, /*IncludeSystem=*/true,
       [&](const serialization::InputFileInfo &IFI, bool IsSystem) {
@@ -604,9 +615,9 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
         // actual on-disk module map file that allowed inferring the module,
         // which is what we need for building the module explicitly
         // Let's ignore this file.
-        if (StringRef(IFI.Filename).ends_with("__inferred_module.map"))
+        if (IFI.UnresolvedImportedFilename.ends_with("__inferred_module.map"))
           return;
-        MDC.addFileDep(MD, IFI.Filename);
+        MDC.addFileDep(MD, IFI.UnresolvedImportedFilename);
       });
 
   llvm::DenseSet<const Module *> SeenDeps;
@@ -614,15 +625,20 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
   addAllSubmoduleDeps(M, MD, SeenDeps);
   addAllAffectingClangModules(M, MD, SeenDeps);
 
+  SmallString<0> PathBuf;
+  PathBuf.reserve(256);
   MDC.ScanInstance.getASTReader()->visitInputFileInfos(
       *MF, /*IncludeSystem=*/true,
       [&](const serialization::InputFileInfo &IFI, bool IsSystem) {
         if (!(IFI.TopLevel && IFI.ModuleMap))
           return;
-        if (StringRef(IFI.FilenameAsRequested)
-                .ends_with("__inferred_module.map"))
+        if (IFI.UnresolvedImportedFilenameAsRequested.ends_with(
+                "__inferred_module.map"))
           return;
-        MD.ModuleMapFileDeps.emplace_back(IFI.FilenameAsRequested);
+        auto ResolvedFilenameAsRequested = ASTReader::ResolveImportedPath(
+            PathBuf, IFI.UnresolvedImportedFilenameAsRequested,
+            MF->BaseDirectory);
+        MD.ModuleMapFileDeps.emplace_back(*ResolvedFilenameAsRequested);
       });
 
   CowCompilerInvocation CI =
@@ -779,23 +795,16 @@ static StringRef makeAbsoluteAndPreferred(CompilerInstance &CI, StringRef Path,
 void ModuleDepCollector::addFileDep(StringRef Path) {
   if (IsStdModuleP1689Format) {
     // Within P1689 format, we don't want all the paths to be absolute path
-    // since it may violate the tranditional make style dependencies info.
-    FileDeps.push_back(std::string(Path));
+    // since it may violate the traditional make style dependencies info.
+    FileDeps.emplace_back(Path);
     return;
   }
 
   llvm::SmallString<256> Storage;
   Path = makeAbsoluteAndPreferred(ScanInstance, Path, Storage);
-  FileDeps.push_back(std::string(Path));
+  FileDeps.emplace_back(Path);
 }
 
 void ModuleDepCollector::addFileDep(ModuleDeps &MD, StringRef Path) {
-  if (IsStdModuleP1689Format) {
-    MD.FileDeps.insert(Path);
-    return;
-  }
-
-  llvm::SmallString<256> Storage;
-  Path = makeAbsoluteAndPreferred(ScanInstance, Path, Storage);
-  MD.FileDeps.insert(Path);
+  MD.FileDeps.emplace_back(Path);
 }
diff --git a/clang/test/ClangScanDeps/diagnostics.c b/clang/test/ClangScanDeps/diagnostics.c
index bc05594f7828dc..8e3cf4c9f9fa6d 100644
--- a/clang/test/ClangScanDeps/diagnostics.c
+++ b/clang/test/ClangScanDeps/diagnostics.c
@@ -34,8 +34,8 @@ module mod { header "mod.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_MOD:.*]]",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/mod.h"
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "mod"
diff --git a/clang/test/ClangScanDeps/header-search-pruning-transitive.c b/clang/test/ClangScanDeps/header-search-pruning-transitive.c
index 6c422650902689..1e829bb02ddc4a 100644
--- a/clang/test/ClangScanDeps/header-search-pruning-transitive.c
+++ b/clang/test/ClangScanDeps/header-search-pruning-transitive.c
@@ -72,8 +72,8 @@ module X { header "X.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_X:.*]]",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/X.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/X.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "X"
@@ -85,11 +85,11 @@ module X { header "X.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_Y_WITH_A]]",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/Y.h",
-// CHECK-NEXT:         "[[PREFIX]]/a/a.h",
 // CHECK-NEXT:         "[[PREFIX]]/begin/begin.h",
-// CHECK-NEXT:         "[[PREFIX]]/end/end.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/a/a.h",
+// CHECK-NEXT:         "[[PREFIX]]/end/end.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "Y"
@@ -128,8 +128,8 @@ module X { header "X.h" }
 // also has a different context hash from the first version of module X.
 // CHECK-NOT:        "context-hash": "[[HASH_X]]",
 // CHECK:            "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/X.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/X.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "X"
@@ -141,10 +141,10 @@ module X { header "X.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_Y_WITHOUT_A]]",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/Y.h",
 // CHECK-NEXT:         "[[PREFIX]]/begin/begin.h",
-// CHECK-NEXT:         "[[PREFIX]]/end/end.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/end/end.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "Y"
diff --git a/clang/test/ClangScanDeps/link-libraries.c b/clang/test/ClangScanDeps/link-libraries.c
index bc0b0c546ea032..cc2e223102024b 100644
--- a/clang/test/ClangScanDeps/link-libraries.c
+++ b/clang/test/ClangScanDeps/link-libraries.c
@@ -44,8 +44,8 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/Framework.framework/Headers/Framework.h"
-// CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [
 // CHECK-NEXT:         {
@@ -67,8 +67,8 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/direct.h"
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "direct"
@@ -89,10 +89,10 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/root.h",
+// CHECK-NEXT:         "[[PREFIX]]/root/textual.h",
 // CHECK-NEXT:         "[[PREFIX]]/Inputs/frameworks/module.modulemap"
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
-// CHECK-NEXT:         "[[PREFIX]]/root.h"
-// CHECK-NEXT:         "[[PREFIX]]/root/textual.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "root"
@@ -104,7 +104,7 @@ module transitive {
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/transitive.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [
diff --git a/clang/test/ClangScanDeps/modules-context-hash.c b/clang/test/ClangScanDeps/modules-context-hash.c
index 69fd8561a2b32a..9489563576d3b1 100644
--- a/clang/test/ClangScanDeps/modules-context-hash.c
+++ b/clang/test/ClangScanDeps/modules-context-hash.c
@@ -34,9 +34,9 @@
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "[[HASH_MOD_A:.*]]",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/a/dep.h",
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/mod.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/a/dep.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "mod"
@@ -72,9 +72,9 @@
 // CHECK:            ],
 // CHECK-NOT:        "context-hash": "[[HASH_MOD_A]]",
 // CHECK:            "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/b/dep.h",
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/mod.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/b/dep.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "mod"
diff --git a/clang/test/ClangScanDeps/modules-dep-args.c b/clang/test/ClangScanDeps/modules-dep-args.c
index 14de2e8f4594a1..19f915923b84cf 100644
--- a/clang/test/ClangScanDeps/modules-dep-args.c
+++ b/clang/test/ClangScanDeps/modules-dep-args.c
@@ -58,8 +58,8 @@ module Direct { header "direct.h" }
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/direct.h",
-// CHECK-NEXT:         "[[PREFIX]]/module.modulemap"
+// CHECK-NEXT:         "[[PREFIX]]/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/direct.h"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
 // CHECK-NEXT:       "name": "Direct"
diff --git a/clang/test/ClangScanDeps/modules-extern-submodule.c b/clang/test/ClangScanDeps/modules-extern-submodule.c
index 6e9082b0165fdf..01d3d6ba5e0d3a 100644
--- a/clang/test/ClangScanDeps/modules-extern-submodule.c
+++ b/clang/test/ClangScanDeps/modules-extern-submodule.c
@@ -49,8 +49,8 @@ module third {}
 // CHECK:            ],
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
-// CHECK-NEXT:         "[[PREFIX]]/first/first/first.h",
 // CHECK-NEXT:         "[[PREFIX]]/first/first/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/first/first/first.h",
 // CHECK-NEXT:         "[[PREFIX]]/second/second/module.modulemap",
 // CHECK-NEXT:         "[[PREFIX]]/second/second/sub.modulemap"
 // CHECK-NEXT:       ],
@@ -73,8 +73,8 @@ module third {}
 // CHECK-NEXT:       "context-hash": "{{.*}}",
 // CHECK-NEXT:       "file-deps": [
 // CHECK-NEXT:         "[[PREFIX]]/second/second/module.modulemap",
-// CHECK-NEXT:         "[[PREFIX]]/second/second/sub.h",
 // CHECK-NEXT:         "[[PREFIX]]/second/second/sub.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/second/second/sub.h",
 // CHECK-NEXT:         "[[PREFIX]]/third/module.modulemap"
 // CHECK-NEXT:       ],
 // CHECK-NEXT:       "link-libraries": [],
diff --git a/clang/test/ClangScanDeps/modules-extern-unre...
[truncated]

Copy link

github-actions bot commented Oct 31, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jansvoboda11
Copy link
Contributor Author

Ping.

@jansvoboda11 jansvoboda11 merged commit 9d4837f into llvm:main Nov 11, 2024
8 checks passed
@jansvoboda11 jansvoboda11 deleted the file-dep-optimization branch November 11, 2024 17:46
jansvoboda11 added a commit to swiftlang/llvm-project that referenced this pull request Nov 11, 2024
jansvoboda11 added a commit to swiftlang/llvm-project that referenced this pull request Nov 11, 2024
…rder

[clang][deps] Adjust tests after upstream PR llvm#114457
jansvoboda11 added a commit to swiftlang/llvm-project that referenced this pull request Nov 11, 2024
This PR builds on top of llvm#113984 and attempts to avoid allocating input
file paths eagerly. Instead, the `InputFileInfo` type used by
`ASTReader` now only holds `StringRef`s that point into the PCM file
buffer, and the full input file paths get resolved on demand.

The dependency scanner makes use of this in a bit of a roundabout way:
`ModuleDeps` now only holds (an owning copy of) the short unresolved
input file paths, which get resolved lazily. This can be a big win, I'm
seeing up to a 5% speedup.

(cherry picked from commit 9d4837f)
jansvoboda11 added a commit to swiftlang/llvm-project that referenced this pull request Nov 11, 2024
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
This PR builds on top of llvm#113984 and attempts to avoid allocating input
file paths eagerly. Instead, the `InputFileInfo` type used by
`ASTReader` now only holds `StringRef`s that point into the PCM file
buffer, and the full input file paths get resolved on demand.

The dependency scanner makes use of this in a bit of a roundabout way:
`ModuleDeps` now only holds (an owning copy of) the short unresolved
input file paths, which get resolved lazily. This can be a big win, I'm
seeing up to a 5% speedup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants