[clang module] Current Working Directory Pruning #124786
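@llvm/pr-subscribers-clang
Author: Qiongsi Wu (qiongsiwu)
Changes
When computing the context hash, `clang` always includes the compiler's working directory. As a result, when the only difference between two compilations is the working directory, different module variants are generated. These variants are redundant. This PR implements an optimization that ignores the working directory when computing the context hash, when it is safe to do so.
Specifically, `clang` checks whether it is safe to ignore the working directory in `isSafeToIgnoreCWD`. The check goes through the compile command options to see if any of the specified paths are relative. A path is considered relative here if it is not empty and `llvm::sys::path::is_absolute` returns false for it. If none of the examined paths are relative, `clang` considers it safe to ignore the current working directory and leaves it out of the context hash.
Full diff: https://github.com/llvm/llvm-project/pull/124786.diff
5 Files Affected: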
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
index 4a343f2872d8d9..9ad8e68c33eb10 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
@@ -63,7 +63,10 @@ enum class ScanningOptimizations {
   /// Canonicalize -D and -U options.
   Macros = 8,
-  DSS_LAST_BITMASK_ENUM(Macros),
+  /// Ignore the compiler's working directory if it is safe.
+  IgnoreCWD = 0x10,
+
+  DSS_LAST_BITMASK_ENUM(IgnoreCWD),
   Default = All
 };
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 2e97cac0796cee..714efb86fa3796 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -397,9 +397,92 @@ void ModuleDepCollector::applyDiscoveredDependencies(CompilerInvocation &CI) {
   }
 }
+static bool isSafeToIgnoreCWD(const CowCompilerInvocation &CI) {
+  // Check if the command line input uses relative paths.
+  // It is not safe to ignore the current working directory if any of the
+  // command line inputs use relative paths.
+#define IF_RELATIVE_RETURN_FALSE(PATH) \
+  do { \
+    if (!PATH.empty() && !llvm::sys::path::is_absolute(PATH)) \
+      return false; \
+  } while (0)
+
+#define IF_ANY_RELATIVE_RETURN_FALSE(PATHS) \
+  do { \
+    if (std::any_of(PATHS.begin(), PATHS.end(), [](const auto &P) { \
+          return !P.empty() && !llvm::sys::path::is_absolute(P); \
+        })) \
+      return false; \
+  } while (0)
+
+  // Header search paths.
+  const auto &HeaderSearchOpts = CI.getHeaderSearchOpts();
+  IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.Sysroot);
+  for (auto &Entry : HeaderSearchOpts.UserEntries)
+    if (Entry.IgnoreSysRoot)
+      IF_RELATIVE_RETURN_FALSE(Entry.Path);
+  IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.ResourceDir);
+  IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.ModuleCachePath);
+  IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.ModuleUserBuildPath);
+  for (auto I = HeaderSearchOpts.PrebuiltModuleFiles.begin(),
+            E = HeaderSearchOpts.PrebuiltModuleFiles.end();
+       I != E;) {
+    auto Current = I++;
+    IF_RELATIVE_RETURN_FALSE(Current->second);
+  }
+  IF_ANY_RELATIVE_RETURN_FALSE(HeaderSearchOpts.PrebuiltModulePaths);
+  IF_ANY_RELATIVE_RETURN_FALSE(HeaderSearchOpts.VFSOverlayFiles);
+
+  // Preprocessor options.
+  const auto &PPOpts = CI.getPreprocessorOpts();
+  IF_ANY_RELATIVE_RETURN_FALSE(PPOpts.MacroIncludes);
+  IF_ANY_RELATIVE_RETURN_FALSE(PPOpts.Includes);
+  IF_RELATIVE_RETURN_FALSE(PPOpts.ImplicitPCHInclude);
+
+  // Frontend options.
+  const auto &FrontendOpts = CI.getFrontendOpts();
+  for (const FrontendInputFile &Input : FrontendOpts.Inputs) {
+    if (Input.isBuffer())
+      continue; // FIXME: Can this happen when parsing command-line?
+
+    IF_RELATIVE_RETURN_FALSE(Input.getFile());
+  }
+  IF_RELATIVE_RETURN_FALSE(FrontendOpts.CodeCompletionAt.FileName);
+  IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ModuleMapFiles);
+  IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ModuleFiles);
+  IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ModulesEmbedFiles);
+  IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ASTMergeFiles);
+  IF_RELATIVE_RETURN_FALSE(FrontendOpts.OverrideRecordLayoutsFile);
+  IF_RELATIVE_RETURN_FALSE(FrontendOpts.StatsFile);
+
+  // Filesystem options.
+  const auto &FileSystemOpts = CI.getFileSystemOpts();
+  IF_RELATIVE_RETURN_FALSE(FileSystemOpts.WorkingDir);
+
+  // Codegen options.
+  const auto &CodeGenOpts = CI.getCodeGenOpts();
+  IF_RELATIVE_RETURN_FALSE(CodeGenOpts.DebugCompilationDir);
+  IF_RELATIVE_RETURN_FALSE(CodeGenOpts.CoverageCompilationDir);
+
+  // Sanitizer options.
+  IF_ANY_RELATIVE_RETURN_FALSE(CI.getLangOpts().NoSanitizeFiles);
+
+  // Coverage mappings.
+  IF_RELATIVE_RETURN_FALSE(CodeGenOpts.ProfileInstrumentUsePath);
+  IF_RELATIVE_RETURN_FALSE(CodeGenOpts.SampleProfileFile);
+  IF_RELATIVE_RETURN_FALSE(CodeGenOpts.ProfileRemappingFile);
+
+  // Dependency output options.
+  for (auto &ExtraDep : CI.getDependencyOutputOpts().ExtraDeps)
+    IF_RELATIVE_RETURN_FALSE(ExtraDep.first);
+
+  return true;
+}
+
 static std::string getModuleContextHash(const ModuleDeps &MD,
                                         const CowCompilerInvocation &CI,
                                         bool EagerLoadModules,
+                                        bool IgnoreCWD,
                                         llvm::vfs::FileSystem &VFS) {
   llvm::HashBuilder<llvm::TruncatedBLAKE3<16>, llvm::endianness::native>
       HashBuilder;
@@ -410,7 +493,7 @@ static std::string getModuleContextHash(const ModuleDeps &MD,
   HashBuilder.add(getClangFullRepositoryVersion());
   HashBuilder.add(serialization::VERSION_MAJOR, serialization::VERSION_MINOR);
   llvm::ErrorOr<std::string> CWD = VFS.getCurrentWorkingDirectory();
-  if (CWD)
+  if (CWD && !IgnoreCWD)
     HashBuilder.add(*CWD);
   // Hash the BuildInvocation without any input files.
@@ -443,8 +526,11 @@ static std::string getModuleContextHash(const ModuleDeps &MD,
 void ModuleDepCollector::associateWithContextHash(
     const CowCompilerInvocation &CI, ModuleDeps &Deps) {
-  Deps.ID.ContextHash = getModuleContextHash(
-      Deps, CI, EagerLoadModules, ScanInstance.getVirtualFileSystem());
+  bool IgnoreCWD = any(OptimizeArgs & ScanningOptimizations::IgnoreCWD) &&
+                   isSafeToIgnoreCWD(CI);
+  Deps.ID.ContextHash =
+      getModuleContextHash(Deps, CI, EagerLoadModules, IgnoreCWD,
+                           ScanInstance.getVirtualFileSystem());
   bool Inserted = ModuleDepsByID.insert({Deps.ID, &Deps}).second;
   (void)Inserted;
   assert(Inserted && "duplicate module mapping");
diff --git a/clang/test/ClangScanDeps/modules-context-hash-cwd.c b/clang/test/ClangScanDeps/modules-context-hash-cwd.c
new file mode 100644
index 00000000000000..45be72301c635d
--- /dev/null
+++ b/clang/test/ClangScanDeps/modules-context-hash-cwd.c
@@ -0,0 +1,123 @@
+// Test current directory pruning when computing the context hash.
+
+// REQUIRES: shell
+
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb0.json.in > %t/cdb0.json
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb1.json.in > %t/cdb1.json
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb2.json.in > %t/cdb2.json
+// RUN: clang-scan-deps -compilation-database %t/cdb0.json -format experimental-full > %t/result0.json
+// RUN: clang-scan-deps -compilation-database %t/cdb1.json -format experimental-full > %t/result1.json
+// RUN: clang-scan-deps -compilation-database %t/cdb2.json -format experimental-full -optimize-args=header-search,system-warnings,vfs,canonicalize-macros > %t/result2.json
+// RUN: cat %t/result0.json %t/result1.json | FileCheck %s
+// RUN: cat %t/result0.json %t/result2.json | FileCheck %s -check-prefix=SKIPOPT
+
+//--- cdb0.json.in
+[{
+ "directory": "DIR",
+ "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+ "file": "DIR/tu.c"
+}]
+
+//--- cdb1.json.in
+[{
+ "directory": "DIR/a",
+ "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+ "file": "DIR/tu.c"
+}]
+
+//--- cdb2.json.in
+[{
+ "directory": "DIR/a/",
+ "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+ "file": "DIR/tu.c"
+}]
+
+//--- include/module.modulemap
+module mod {
+ header "mod.h"
+}
+
+//--- include/mod.h
+
+//--- tu.c
+#include "mod.h"
+
+// Check that result0 and result1 compute the same hash with optimization
+// on. The only difference between result0 and result1 is the compiler's
+// working directory.
+// CHECK: {
+// CHECK-NEXT: "modules": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "clang-module-deps": [],
+// CHECK: "context-hash": "[[HASH:.*]]",
+// CHECK: }
+// CHECK: "translation-units": [
+// CHECK: {
+// CHECK: "commands": [
+// CHECK: {
+// CHECK-NEXT: "clang-context-hash": "{{.*}}",
+// CHECK-NEXT: "clang-module-deps": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "context-hash": "[[HASH]]",
+// CHECK-NEXT: "module-name": "mod"
+// CHECK: }
+// CHECK: ],
+// CHECK: {
+// CHECK-NEXT: "modules": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "clang-module-deps": [],
+// CHECK: "context-hash": "[[HASH]]",
+// CHECK: }
+// CHECK: "translation-units": [
+// CHECK: {
+// CHECK: "commands": [
+// CHECK: {
+// CHECK-NEXT: "clang-context-hash": "{{.*}}",
+// CHECK-NEXT: "clang-module-deps": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "context-hash": "[[HASH]]",
+// CHECK-NEXT: "module-name": "mod"
+// CHECK: }
+// CHECK: ],
+
+// Check that result0 and result2 compute different hashes because
+// the working directory optimization is turned off for result2.
+// SKIPOPT: {
+// SKIPOPT-NEXT: "modules": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NEXT: "clang-module-deps": [],
+// SKIPOPT: "context-hash": "[[HASH0:.*]]",
+// SKIPOPT: }
+// SKIPOPT: "translation-units": [
+// SKIPOPT: {
+// SKIPOPT: "commands": [
+// SKIPOPT: {
+// SKIPOPT-NEXT: "clang-context-hash": "{{.*}}",
+// SKIPOPT-NEXT: "clang-module-deps": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NEXT: "context-hash": "[[HASH0]]",
+// SKIPOPT-NEXT: "module-name": "mod"
+// SKIPOPT: }
+// SKIPOPT: ],
+// SKIPOPT: {
+// SKIPOPT-NEXT: "modules": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NEXT: "clang-module-deps": [],
+// SKIPOPT-NOT: "context-hash": "[[HASH0]]",
+// SKIPOPT: "context-hash": "[[HASH2:.*]]",
+// SKIPOPT: }
+// SKIPOPT: "translation-units": [
+// SKIPOPT: {
+// SKIPOPT: "commands": [
+// SKIPOPT: {
+// SKIPOPT-NEXT: "clang-context-hash": "{{.*}}",
+// SKIPOPT-NEXT: "clang-module-deps": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NOT: "context-hash": "[[HASH0]]",
+// SKIPOPT-NEXT: "context-hash": "[[HASH2]]"
+// SKIPOPT-NEXT: "module-name": "mod"
+// SKIPOPT: }
+// SKIPOPT: ],
+
diff --git a/clang/test/ClangScanDeps/working-dir.m b/clang/test/ClangScanDeps/working-dir.m
index a04f8c2486b98d..c6b7b1988d3cf7 100644
--- a/clang/test/ClangScanDeps/working-dir.m
+++ b/clang/test/ClangScanDeps/working-dir.m
@@ -2,7 +2,7 @@
// RUN: split-file %s %t
// RUN: sed -e "s|DIR|%/t|g" %t/build/compile-commands.json.in > %t/build/compile-commands.json
// RUN: clang-scan-deps -compilation-database %t/build/compile-commands.json \
-// RUN: -j 1 -format experimental-full --optimize-args=all > %t/deps.db
+// RUN: -j 1 -format experimental-full --optimize-args=header-search,system-warnings,vfs,canonicalize-macros > %t/deps.db
// RUN: cat %t/deps.db | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t
// Check that there are two separate modules hashes. One for each working dir.
diff --git a/clang/tools/clang-scan-deps/ClangScanDeps.cpp b/clang/tools/clang-scan-deps/ClangScanDeps.cpp
index 709dc513be2811..8d429534a20073 100644
--- a/clang/tools/clang-scan-deps/ClangScanDeps.cpp
+++ b/clang/tools/clang-scan-deps/ClangScanDeps.cpp
@@ -164,6 +164,8 @@ static void ParseArgs(int argc, char **argv) {
.Case("system-warnings", ScanningOptimizations::SystemWarnings)
.Case("vfs", ScanningOptimizations::VFS)
.Case("canonicalize-macros", ScanningOptimizations::Macros)
+ .Case("ignore-current-working-dir",
+ ScanningOptimizations::IgnoreCWD)
.Case("all", ScanningOptimizations::All)
.Default(std::nullopt);
if (!Optimization) {
✅ With the latest revision this PR passed the C/C++ code formatter.
This looks like the right approach, but I think it would be good to have a test for the relative path checking. Not every option needs a test, just one should be fine.
Would it be better for this optimization to happen really early in the process, since you only visit the options in CI? In that case, you can just reset the CurrentWorkingDirectory in the VFS so all the searching is done without CWD. This avoids hard-to-debug issues if some options are not taken care of (they need CWD but are not checked), but the trade-off is more explicit errors during scanning.
It has to happen after the header search optimization in case that removes relative header search paths.
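The ordering constraint in the reply above can be illustrated with a small sketch. It is not the scanner's code: `SearchEntrySketch`, `pruneUnusedSketch`, and `allAbsoluteSketch` are hypothetical names, `std::filesystem` stands in for `llvm::sys::path::is_absolute`, and POSIX-style paths are assumed. The point is only that a relative search path which the header search optimization removes no longer blocks the working directory from being ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical search path entry; Used marks whether the scan needed it.
struct SearchEntrySketch {
  std::string Path;
  bool Used;
};

// Stand-in for the header search optimization: drop entries never used.
inline void pruneUnusedSketch(std::vector<SearchEntrySketch> &Entries) {
  Entries.erase(std::remove_if(Entries.begin(), Entries.end(),
                               [](const SearchEntrySketch &E) {
                                 return !E.Used;
                               }),
                Entries.end());
}

// True when no remaining entry is a non-empty, non-absolute path.
inline bool allAbsoluteSketch(const std::vector<SearchEntrySketch> &Entries) {
  return std::none_of(Entries.begin(), Entries.end(),
                      [](const SearchEntrySketch &E) {
                        return !E.Path.empty() &&
                               !std::filesystem::path(E.Path).is_absolute();
                      });
}

// Usage example: the unused relative entry blocks the check until pruned.
inline void exampleOrdering() {
  std::vector<SearchEntrySketch> Entries{{"/src/include", true},
                                         {"rel/include", false}};
  assert(!allAbsoluteSketch(Entries)); // checking first: not safe
  pruneUnusedSketch(Entries);
  assert(allAbsoluteSketch(Entries)); // checking after pruning: safe
}
```

Running the check before pruning would be conservative rather than wrong, but it would miss cases where the only relative path is one the module never needed.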
…ugh -working-directory.
Gentle ping for review. Thanks!
Looks good, thanks!
LGTM with some small style comments.
When computing the context hash, `clang` always includes the compiler's working directory. As a result, when the only difference between two compilations is the working directory, different module variants are generated. These variants are redundant. This PR implements an optimization that ignores the working directory when computing the context hash, when it is safe to do so. Specifically, `clang` checks whether it is safe to ignore the working directory in `isSafeToIgnoreCWD`. The check goes through the compile command options to see if any of the specified paths are relative. A path is considered relative here if it is not empty and `llvm::sys::path::is_absolute` returns false for it. If none of the examined paths are relative, `clang` considers it safe to ignore the current working directory and leaves it out of the context hash. (cherry picked from commit 54acda2)
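The definition of "relative" used by the check can be shown in a minimal sketch. This is an illustration under assumptions, not the code in the PR: `PathOptionsSketch` is a hypothetical struct covering only three of the many options the real `isSafeToIgnoreCWD` visits, `std::filesystem` replaces `llvm::sys::path::is_absolute`, and the example paths assume a POSIX system.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical stand-in for the path-carrying options the real check visits.
struct PathOptionsSketch {
  std::string Sysroot;
  std::string ResourceDir;
  std::vector<std::string> Includes;
};

// "Relative" as defined in the description: non-empty and not absolute.
// An empty path means the option was not given, so it is never relative.
inline bool isRelativeSketch(const std::string &P) {
  return !P.empty() && !std::filesystem::path(P).is_absolute();
}

// True when none of the examined paths depends on the working directory.
inline bool isSafeToIgnoreCWDSketch(const PathOptionsSketch &Opts) {
  if (isRelativeSketch(Opts.Sysroot) || isRelativeSketch(Opts.ResourceDir))
    return false;
  return std::none_of(Opts.Includes.begin(), Opts.Includes.end(),
                      isRelativeSketch);
}

// Usage example with POSIX-style paths.
inline void exampleRelativeCheck() {
  PathOptionsSketch AllAbsolute{"", "/usr/lib/clang", {"/src/include"}};
  PathOptionsSketch OneRelative{"", "/usr/lib/clang", {"include"}};
  assert(isSafeToIgnoreCWDSketch(AllAbsolute));
  assert(!isSafeToIgnoreCWDSketch(OneRelative));
}
```

A single relative path anywhere is enough to keep the working directory in the hash, which keeps the optimization conservative.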
…afe to ignore current working directory (#128446) This PR explicitly sets `DebugCompilationDir` to the system's root directory if it is safe to ignore the current working directory. This fixes a problem where a PCM file's embedded debug information can lead to compilation failure. The compiler may have decided it is indeed safe to ignore the current working directory. In this case, the PCM file's content is functionally correct regardless of the current working directory because no inputs use relative paths (see #124786). However, a PCM may contain debug info. If debug info is requested, the compiler uses the current working directory value to set `DW_AT_comp_dir`. This may lead to the following situation:
1. Two different compilations need the same PCM file.
2. The PCM file is compiled assuming a working directory, which is embedded in the debug info, but otherwise has no effect.
3. The second compilation assumes a different working directory, and expects an identically-sized PCM file. However, it cannot find such a PCM, because the existing PCM file has been compiled assuming a different `DW_AT_comp_dir`, which is embedded in the debug info.
This PR resets the `DebugCompilationDir` if it is functionally safe to ignore the working directory so the above situation is avoided, since all debug information will share the same working directory. rdar://145249881 (cherry picked from commit 7f482aa) Conflicts: clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
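The effect of resetting the debug compilation directory can be sketched as follows. This is a hypothetical helper, not the change in #128446: `debugCompilationDirSketch` is an invented name, `std::filesystem` is used to find the root directory, and POSIX-style absolute paths are assumed.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: choose the directory recorded as DW_AT_comp_dir.
// When the working directory is safe to ignore, every compilation records
// the filesystem root instead, so the embedded debug info no longer varies.
inline std::string debugCompilationDirSketch(const std::string &CWD,
                                             bool SafeToIgnoreCWD) {
  std::filesystem::path P(CWD);
  assert(P.is_absolute() && "sketch expects an absolute working directory");
  return SafeToIgnoreCWD ? P.root_path().string() : CWD;
}

// Usage example: two builds from different directories agree on the value.
inline void exampleDebugDir() {
  assert(debugCompilationDirSketch("/build/a", true) ==
         debugCompilationDirSketch("/build/b", true));
  assert(debugCompilationDirSketch("/build/a", false) == "/build/a");
}
```

With the root directory recorded in both cases, two compilations that differ only in working directory would embed the same `DW_AT_comp_dir`, which is the property the commit message relies on.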
… Optimization Off by Default (#129809) #124786 implemented the current working directory (CWD) optimization, and the optimization was on by default. We have discovered that the build system needs to be compatible with the CWD optimization, so off by default is the better behavior. The build system needs to be aware that the current working directory is ignored. Without a good way of notifying the build system, it is less risky to default to off. This PR implements the change. rdar://145860213 (cherry picked from commit 7bd492f) Conflicts: clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
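One way to express an off-by-default flag in a bitmask enum is sketched below. The enum `ScanOptSketch` and its helpers are hypothetical; only the values `Macros = 8` and `IgnoreCWD = 0x10` come from the diff in this PR, the other values and the `Default` expression are assumptions, and the actual change in #129809 may be written differently.

```cpp
#include <cstdint>

// Hypothetical mirror of the optimization flags.
enum class ScanOptSketch : std::uint8_t {
  None = 0,
  HeaderSearch = 1,   // assumed value
  SystemWarnings = 2, // assumed value
  VFS = 4,            // assumed value
  Macros = 8,
  IgnoreCWD = 0x10,
  All = HeaderSearch | SystemWarnings | VFS | Macros | IgnoreCWD,
  // Off by default: every optimization except IgnoreCWD.
  Default = All & ~IgnoreCWD
};

constexpr ScanOptSketch operator&(ScanOptSketch A, ScanOptSketch B) {
  return static_cast<ScanOptSketch>(static_cast<std::uint8_t>(A) &
                                    static_cast<std::uint8_t>(B));
}

constexpr bool any(ScanOptSketch V) { return V != ScanOptSketch::None; }

// The working directory is ignored only if the flag is enabled AND the
// command line was found safe (no relative paths).
constexpr bool shouldIgnoreCWD(ScanOptSketch Enabled, bool Safe) {
  return any(Enabled & ScanOptSketch::IgnoreCWD) && Safe;
}

static_assert(!shouldIgnoreCWD(ScanOptSketch::Default, true),
              "default leaves the CWD optimization off");
static_assert(shouldIgnoreCWD(ScanOptSketch::All, true),
              "explicit opt-in enables it when safe");
```

Under this arrangement a build system has to request the optimization explicitly, for example through the `ignore-current-working-dir` value that the diff adds to `-optimize-args`.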
When computing the context hash, `clang` always includes the compiler's working directory. As a result, when the only difference between two compilations is the working directory, different module variants are generated. These variants are redundant. This PR implements an optimization that ignores the working directory when computing the context hash, when it is safe to do so.
Specifically, `clang` checks whether it is safe to ignore the working directory in `isSafeToIgnoreCWD`. The check goes through the compile command options to see if any of the specified paths are relative. A path is considered relative here if it is not empty and `llvm::sys::path::is_absolute` returns false for it. If none of the examined paths are relative, `clang` considers it safe to ignore the current working directory and leaves it out of the context hash.
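How leaving the working directory out of the hash removes the redundant variants can be seen in a small sketch. It does not reproduce the real hashing: `combineSketch` and `contextHashSketch` are hypothetical, `std::hash` stands in for the `llvm::HashBuilder` with truncated BLAKE3 used in the diff, and the inputs are reduced to three strings.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical hash combiner standing in for llvm::HashBuilder.
inline void combineSketch(std::size_t &Seed, const std::string &Value) {
  Seed ^= std::hash<std::string>{}(Value) + 0x9e3779b9u + (Seed << 6) +
          (Seed >> 2);
}

// Hash the compiler version, optionally the working directory, and the
// remaining arguments. IgnoreCWD mirrors the new parameter in the diff.
inline std::size_t contextHashSketch(const std::string &CompilerVersion,
                                     const std::string &CWD,
                                     const std::string &Args,
                                     bool IgnoreCWD) {
  std::size_t Seed = 0;
  combineSketch(Seed, CompilerVersion);
  if (!IgnoreCWD)
    combineSketch(Seed, CWD);
  combineSketch(Seed, Args);
  return Seed;
}

// Usage example: same arguments, different working directories.
inline void exampleContextHash() {
  const std::string Args = "-fmodules -I/src/include";
  assert(contextHashSketch("clang-x", "/src", Args, true) ==
         contextHashSketch("clang-x", "/src/a", Args, true));
}
```

With `IgnoreCWD` set to false the two calls would typically produce different values, which corresponds to the two separate module variants the optimization is meant to avoid.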