Skip to content

[clang modules] Setting DebugCompilationDir when it is safe to ignore current working directory #128446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 26, 2025

Conversation

qiongsiwu
Copy link
Contributor

@qiongsiwu qiongsiwu commented Feb 24, 2025

This PR explicitly sets DebugCompilationDir to the system's root directory if it is safe to ignore the current working directory.

This fixes a problem where a PCM file's embedded debug information can lead to compilation failure. The compiler may have decided it is indeed safe to ignore the current working directory. In this case, the PCM file's content is functionally correct regardless of the current working directory because no inputs use relative paths (see #124786). However, a PCM may contain debug info. If debug info is requested, the compiler uses the current working directory value to set DW_AT_comp_dir. This may lead to the following situation:

  1. Two different compilations need the same PCM file.
  2. The PCM file is compiled assuming a working directory, which is embedded in the debug info, but otherwise has no effect.
  3. The second compilation assumes a different working directory, and expects an identically-sized pcm file. However, it cannot find such a PCM, because the existing PCM file has been compiled assuming a different DW_AT_comp_dir , which is embedded in the debug info.

This PR resets the DebugCompilationDir if it is functionally safe to ignore the working directory so the above situation is avoided, since all debug information will share the same working directory.

rdar://145249881

@qiongsiwu qiongsiwu marked this pull request as draft February 24, 2025 00:55
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Feb 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 24, 2025

@llvm/pr-subscribers-clang

Author: Qiongsi Wu (qiongsiwu)

Changes

This PR explicitly sets DebugCompilationDir to the system's root directory if it is safe to ignore the current working directory.

This fixes a problem where a PCM file's embedded debug information can lead to compilation failure. The compiler may have decided it is indeed safe to ignore the current working directory. In this case, the PCM file's content is functionally correct regardless of the current working directory because no inputs use relative paths (see #124786). However, a PCM may contain debug info. If debug info is requested, the compiler uses the current working directory value to set DW_AT_comp_dir. This may lead to the following situation:

  1. Two different compilations need the same PCM file.
  2. The PCM file is compiled assuming a working directory, which is embedded in the debug info, but otherwise has no effect.
  3. The second compilation assumes a different working directory, hence expects an identically-sized pcm file. However, it cannot find such a PCM, because the existing PCM file has been compiled assuming a different DW_AT_comp_dir , which is embedded in the debug info.

This PR resets the DebugCompilationDir if it is functionally safe to ignore the working directory so the above situation is avoided, since all debug information will share the same working directory.


Full diff: https://github.com/llvm/llvm-project/pull/128446.diff

2 Files Affected:

  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+16-1)
  • (added) clang/test/ClangScanDeps/modules-debug-dir.c (+22)
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 1c5f4c4b50ab6..8a94535d3806c 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -492,8 +492,23 @@ static std::string getModuleContextHash(const ModuleDeps &MD,
   auto &FSOpts = const_cast<FileSystemOptions &>(CI.getFileSystemOpts());
   if (CWD && !IgnoreCWD)
     HashBuilder.add(*CWD);
-  else
+  else {
     FSOpts.WorkingDir.clear();
+    auto &CGOpts = const_cast<CodeGenOptions &>(CI.getCodeGenOpts());
+    if (CGOpts.DwarfVersion && CWD) {
+      // It is necessary to explicitly set the DebugCompilationDir
+      // to a common directory (e.g. root) if IgnoreCWD is true.
+      // When IgnoreCWD is true, the module's content should not depend
+      // on the current working directory. However, if dwarf information
+      // is needed (when CGOpts.DwarfVersion is non-zero), and if
+      // CGOpts.DebugCompilationDir is not explicitly set,
+      // the current working directory will be automatically embedded
+      // in the dwarf information in the pcm, contradicting the assumption
+      // that it is safe to ignore the CWD. Thus in such cases,
+      // CGOpts.DebugCompilationDir is explicitly set to a common directory.
+      CGOpts.DebugCompilationDir = llvm::sys::path::root_path(*CWD);
+    }
+  }
 
   // Hash the BuildInvocation without any input files.
   SmallString<0> ArgVec;
diff --git a/clang/test/ClangScanDeps/modules-debug-dir.c b/clang/test/ClangScanDeps/modules-debug-dir.c
new file mode 100644
index 0000000000000..580f72205e68f
--- /dev/null
+++ b/clang/test/ClangScanDeps/modules-debug-dir.c
@@ -0,0 +1,22 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb.json.in > %t/cdb.json
+// RUN: clang-scan-deps -compilation-database %t/cdb.json -format \
+// RUN:   experimental-full > %t/result.json
+
+//--- cdb.json.in
+[{
+  "directory": "DIR",
+  "command": "clang -g -fdebug-info-for-profiling DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+  "file": "DIR/tu.c"
+}]
+
+//--- include/module.modulemap
+module mod {
+  header "mod.h"
+}
+
+//--- include/mod.h
+
+//--- tu.c
+#include "mod.h"

@qiongsiwu
Copy link
Contributor Author

qiongsiwu commented Feb 24, 2025

Note to reviewers:
This fixes rdar://145249881, but I'd like to get some input on three things.

  1. I am not sure if it is a good idea to set DebugCompilationDir to root. I chose it because I think it is safe and all systems should have the root directory. I am all ears for a different choice.
  2. I have not figured out a good way to test this. For some reason, I cannot get the module compilation command to produce debug information in the test (through -g, or -dwarf-version options). I don't think we always want to hardcode DebugCompilationDir to root, so I think we want to set it only if dwarf info is needed. May I get some pointers to get the compiler to generate debug info when compiling the pcm? The test is fixed. I need -gmodules.
  3. This fix feels very patchy to me. I worry that something else may somehow embed the CWD in the binary and we may have to play the cat and mouse game again. I think it may be fine for now, but I'd not want to leave it like this for too long and I'd love to get some suggestions.

Thanks!

@qiongsiwu qiongsiwu self-assigned this Feb 24, 2025
@qiongsiwu qiongsiwu marked this pull request as ready for review February 24, 2025 17:37
@cachemeifyoucan
Copy link
Collaborator

@adrian-prantl for the debug directory option since that might interact how debugger working with the option.

I think we need a solution approved by debugger forks for how to avoid stamping CWD into debug info and when it is safe to do so (for example, we guarantee all the paths in the TU/module are absolute path).

@qiongsiwu
Copy link
Contributor Author

qiongsiwu commented Feb 24, 2025

Or maybe it makes more sense to set the working directory to SYSROOT, or to the input header's path, or the directory that contains the output pcm?

Update: setting DebugCompilationDir to the pcms' output directory works. But as @cachemeifyoucan pointed out earlier this may not be the right thing to do.

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving the correctness of the change up for other reviewers. My only gripe is the const_cast.

Copy link
Contributor

@kastiglione kastiglione left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense on the surface. I'll pull this change and run lldb tests.

Comment on lines 502 to 507
// on the current working directory. However, if dwarf information
// is needed (when CGOpts.DwarfVersion is non-zero), and if
// CGOpts.DebugCompilationDir is not explicitly set,
// the current working directory will be automatically embedded
// in the dwarf information in the pcm, contradicting the assumption
// that it is safe to ignore the CWD. Thus in such cases,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading this comment, I was tripped up on "if CGOpts.DebugCompilationDir is not explicitly set", thinking that this should mean a if (!CGOpts.DebugCompilationDir) somewhere in this function.

But I understand now. What do you think of this wording?

Suggested change
// on the current working directory. However, if dwarf information
// is needed (when CGOpts.DwarfVersion is non-zero), and if
// CGOpts.DebugCompilationDir is not explicitly set,
// the current working directory will be automatically embedded
// in the dwarf information in the pcm, contradicting the assumption
// that it is safe to ignore the CWD. Thus in such cases,
// on the current working directory. However, if dwarf information
// is needed (when CGOpts.DwarfVersion is non-zero), then
// CGOpts.DebugCompilationDir must be populated, because otherwise
// the current working directory will be automatically embedded
// in the dwarf information in the pcm, contradicting the assumption
// that it is safe to ignore the CWD. Thus in such cases,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this updated wording (before I didn't understand the intention) the patch looks safe to me.

Copy link
Member

@cyndyishida cyndyishida Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does lldb use the CWD encoded in the PCM? Would it better/possible to avoid setting CGOpts.DebugCompilationDir when IgnoreCWD == true ?

I'm not as concerned now. The important factors seem to be that

  • the CWD in explicitly built PCM's shouldn't need to be directly referenced in the debugger.
  • Even if CGOpts.DebugCompilationDir was empty here, it would still be written later which may incur more side effects.

Thanks for the explanation folks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I understand now. What do you think of this wording?

Thanks @kastiglione ! The comment modification looks great. It is in the PR now.

@qiongsiwu
Copy link
Contributor Author

Ping for review. Thanks so much!

Copy link
Member

@cyndyishida cyndyishida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @jansvoboda11 should also sign off.

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@qiongsiwu qiongsiwu merged commit 7f482aa into llvm:main Feb 26, 2025
11 checks passed
@qiongsiwu qiongsiwu deleted the fix_145249881 branch February 26, 2025 21:21
qiongsiwu added a commit to qiongsiwu/llvm-project that referenced this pull request Feb 26, 2025
…re current working directory (llvm#128446)

This PR explicitly sets `DebugCompilationDir` to the system's root
directory if it is safe to ignore the current working directory.

This fixes a problem where a PCM file's embedded debug information can
lead to compilation failure. The compiler may have decided it is indeed
safe to ignore the current working directory. In this case, the PCM
file's content is functionally correct regardless of the current working
directory because no inputs use relative paths (see
llvm#124786). However, a PCM may
contain debug info. If debug info is requested, the compiler uses the
current working directory value to set `DW_AT_comp_dir`. This may lead
to the following situation:
1. Two different compilations need the same PCM file.
2. The PCM file is compiled assuming a working directory, which is
embedded in the debug info, but otherwise has no effect.
3. The second compilation assumes a different working directory, and
expects an identically-sized pcm file. However, it cannot find such a
PCM, because the existing PCM file has been compiled assuming a
different `DW_AT_comp_dir `, which is embedded in the debug info.

This PR resets the `DebugCompilationDir` if it is functionally safe to
ignore the working directory so the above situation is avoided, since
all debug information will share the same working directory.

rdar://145249881
(cherry picked from commit 7f482aa)

 Conflicts:
	clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
qiongsiwu added a commit to swiftlang/llvm-project that referenced this pull request Feb 27, 2025
[clang modules] Setting `DebugCompilationDir` when it is safe to ignore current working directory (llvm#128446)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants