Skip to content

[clang][modules] Allow including module maps to be non-affecting #89992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions clang/include/clang/Serialization/ASTWriter.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,10 @@ class StoredDeclsList;
class SwitchCase;
class Token;

namespace SrcMgr {
class FileInfo;
} // namespace SrcMgr

/// Writes an AST file containing the contents of a translation unit.
///
/// The ASTWriter class produces a bitstream containing the serialized
Expand Down Expand Up @@ -490,6 +494,11 @@ class ASTWriter : public ASTDeserializationListener,
/// during \c SourceManager serialization.
void computeNonAffectingInputFiles();

/// Some affecting files can be included from files that are not affecting.
/// This function erases source locations pointing into such files.
SourceLocation getAffectingIncludeLoc(const SourceManager &SourceMgr,
const SrcMgr::FileInfo &File);

/// Returns an adjusted \c FileID, accounting for any non-affecting input
/// files.
FileID getAdjustedFileID(FileID FID) const;
Expand Down
90 changes: 58 additions & 32 deletions clang/lib/Serialization/ASTWriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -173,54 +173,50 @@ GetAffectingModuleMaps(const Preprocessor &PP, Module *RootModule) {

const HeaderSearch &HS = PP.getHeaderSearchInfo();
const ModuleMap &MM = HS.getModuleMap();
const SourceManager &SourceMgr = PP.getSourceManager();

std::set<const FileEntry *> ModuleMaps;
auto CollectIncludingModuleMaps = [&](FileID FID, FileEntryRef F) {
if (!ModuleMaps.insert(F).second)
std::set<const Module *> ProcessedModules;
auto CollectModuleMapsForHierarchy = [&](const Module *M) {
M = M->getTopLevelModule();

if (!ProcessedModules.insert(M).second)
return;
SourceLocation Loc = SourceMgr.getIncludeLoc(FID);
// The include location of inferred module maps can point into the header
// file that triggered the inferring. Cut off the walk if that's the case.
while (Loc.isValid() && isModuleMap(SourceMgr.getFileCharacteristic(Loc))) {
FID = SourceMgr.getFileID(Loc);
F = *SourceMgr.getFileEntryRefForID(FID);
if (!ModuleMaps.insert(F).second)
break;
Loc = SourceMgr.getIncludeLoc(FID);
}
};

std::set<const Module *> ProcessedModules;
auto CollectIncludingMapsFromAncestors = [&](const Module *M) {
for (const Module *Mod = M; Mod; Mod = Mod->Parent) {
if (!ProcessedModules.insert(Mod).second)
break;
std::queue<const Module *> Q;
Q.push(M);
while (!Q.empty()) {
const Module *Mod = Q.front();
Q.pop();

// The containing module map is affecting, because it's being pointed
// into by Module::DefinitionLoc.
if (FileID FID = MM.getContainingModuleMapFileID(Mod); FID.isValid())
CollectIncludingModuleMaps(FID, *SourceMgr.getFileEntryRefForID(FID));
// For inferred modules, the module map that allowed inferring is not in
// the include chain of the virtual containing module map file. It did
// affect the compilation, though.
if (FileID FID = MM.getModuleMapFileIDForUniquing(Mod); FID.isValid())
CollectIncludingModuleMaps(FID, *SourceMgr.getFileEntryRefForID(FID));
if (auto FE = MM.getContainingModuleMapFile(Mod))
ModuleMaps.insert(*FE);
// For inferred modules, the module map that allowed inferring is not
// related to the virtual containing module map file. It did affect the
// compilation, though.
if (auto FE = MM.getModuleMapFileForUniquing(Mod))
ModuleMaps.insert(*FE);

for (auto *SubM : Mod->submodules())
Q.push(SubM);
}
};

// Handle all the affecting modules referenced from the root module.

CollectModuleMapsForHierarchy(RootModule);

std::queue<const Module *> Q;
Q.push(RootModule);
while (!Q.empty()) {
const Module *CurrentModule = Q.front();
Q.pop();

CollectIncludingMapsFromAncestors(CurrentModule);
for (const Module *ImportedModule : CurrentModule->Imports)
CollectIncludingMapsFromAncestors(ImportedModule);
CollectModuleMapsForHierarchy(ImportedModule);
for (const Module *UndeclaredModule : CurrentModule->UndeclaredUses)
CollectIncludingMapsFromAncestors(UndeclaredModule);
CollectModuleMapsForHierarchy(UndeclaredModule);

for (auto *M : CurrentModule->submodules())
Q.push(M);
Expand Down Expand Up @@ -249,9 +245,27 @@ GetAffectingModuleMaps(const Preprocessor &PP, Module *RootModule) {

for (const auto &KH : HS.findResolvedModulesForHeader(*File))
if (const Module *M = KH.getModule())
CollectIncludingMapsFromAncestors(M);
CollectModuleMapsForHierarchy(M);
}

// FIXME: This algorithm is not correct for module map hierarchies where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this working before, or always broken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole concept of mapping modules to module map files is broken. However, we would previously correctly mark MSub1.modulemap as affecting and we won't with this patch:

//--- M.modulemap
module M {}
extern module M.Sub "MSub1.modulemap"
//--- MSub1.modulemap
extern module M.Sub "MSub2.modulemap"
//--- MSub2.modulemap
module M.Sub {}

Previously we would walk up the module map file include hierarchy from MSub2.modulemap to MSub1.modulemap to M.modulemap.

Now, we will only walk the module hierarchy and only mark the defining MSub2.modulemap and M.modulemap as affecting, effectively creating a "hole" in the tree of module map file includes by omitting MSub1.modulemap.

Similar thing happens in this case (assuming M is affecting and N is not):

//--- N.modulemap
module N {}
extern module N.Sub "NSub.modulemap"
//--- NSub.modulemap
module N.Sub {}
module M {}

However, cases like the following were broken before and will remain broken with this patch:

//--- M.modulemap
module M {}
extern module M.Sub "MSub.modulemap"
//--- MSub.modulemap
module M.Sub {}
extern module N "N.modulemap"
//--- N.modulemap
module N {}
//--- M.modulemap
module M {}
module N {}
extern module M.Sub "MSub.modulemap"
extern module N.Sub "NSub.modulemap"
//--- MSub.modulemap
module M.Sub {}
//--- NSub.modulemap
module N.Sub {}

etc...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, we will only walk the module hierarchy and only mark the defining MSub2.modulemap and M.modulemap as affecting, effectively creating a "hole" in the tree of module map file includes by omitting MSub1.modulemap.

Does this cause a failure when it tries to include MSub1.modulemap? In some sense this scenario could be "fine" since you have all the modulemaps that contributed to the module definition.

However, cases like the following were broken before and will remain broken with this patch:

Would being lazier about the module definitions solve this?

I'm not clear what the path to actually fixing these looks like

Copy link
Contributor Author

@jansvoboda11 jansvoboda11 Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause a failure when it tries to include MSub1.modulemap? In some sense this scenario could be "fine" since you have all the modulemaps that contributed to the module definition.

Yes, exactly.

I thought this will break the AST file replay mode, but that doesn't actually reparse the module map file:

// RUN: %clang_cc1 -fmodules -fmodules-embed-all-files -emit-module -fmodule-name=M %t/M.modulemap -o %t/M.pcm
// RUN: rm %t/*.modulemap
// RUN: %clang_cc1 -E %t/M.pcm

One way of actually hitting this issue is compiling the module with a pruned file system that only contains files reported by the scanner. There are lots of different ways people compile modules, so maybe that is relevant.

However, cases like the following were broken before and will remain broken with this patch:

Would being lazier about the module definitions solve this?

What do you mean by this? Ignoring missing files in extern module X "<file>" until we actually need X?

I'm not clear what the path to actually fixing these looks like

I'm a fan of defining this away by posing restrictions on the shape of the module map file hierarchies (so that they more or less correspond to the module graph they describe).

// module map file defining a (sub)module of a top-level module X includes
// a module map file that defines a (sub)module of another top-level module Y.
// Whenever X is affecting and Y is not, "replaying" this PCM file will fail
// when parsing module map files for X due to not knowing about the `extern`
// module map for Y.
//
// We don't have a good way to fix it here. We could mark all children of
// affecting module map files as being affecting as well, but that's
// expensive. SourceManager does not model the edge from parent to child
// SLocEntries, so instead, we would need to iterate over leaf module map
// files, walk up their include hierarchy and check whether we arrive at an
// affecting module map.
//
// Instead of complicating and slowing down this function, we should probably
// just ban module map hierarchies where module map defining a (sub)module X
// includes a module map defining a module that's not a submodule of X.

return ModuleMaps;
}

Expand Down Expand Up @@ -1639,6 +1653,18 @@ struct InputFileEntry {

} // namespace

SourceLocation ASTWriter::getAffectingIncludeLoc(const SourceManager &SourceMgr,
const SrcMgr::FileInfo &File) {
SourceLocation IncludeLoc = File.getIncludeLoc();
if (IncludeLoc.isValid()) {
FileID IncludeFID = SourceMgr.getFileID(IncludeLoc);
assert(IncludeFID.isValid() && "IncludeLoc in invalid file");
if (!IsSLocAffecting[IncludeFID.ID])
IncludeLoc = SourceLocation();
}
return IncludeLoc;
}

void ASTWriter::WriteInputFiles(SourceManager &SourceMgr,
HeaderSearchOptions &HSOpts) {
using namespace llvm;
Expand Down Expand Up @@ -1692,7 +1718,7 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr,
Entry.IsSystemFile = isSystem(File.getFileCharacteristic());
Entry.IsTransient = Cache->IsTransient;
Entry.BufferOverridden = Cache->BufferOverridden;
Entry.IsTopLevel = File.getIncludeLoc().isInvalid();
Entry.IsTopLevel = getAffectingIncludeLoc(SourceMgr, File).isInvalid();
Entry.IsModuleMap = isModuleMap(File.getFileCharacteristic());

auto ContentHash = hash_code(-1);
Expand Down Expand Up @@ -2219,7 +2245,7 @@ void ASTWriter::WriteSourceManagerBlock(SourceManager &SourceMgr,
SLocEntryOffsets.push_back(Offset);
// Starting offset of this entry within this module, so skip the dummy.
Record.push_back(getAdjustedOffset(SLoc->getOffset()) - 2);
AddSourceLocation(File.getIncludeLoc(), Record);
AddSourceLocation(getAffectingIncludeLoc(SourceMgr, File), Record);
Record.push_back(File.getFileCharacteristic()); // FIXME: stable encoding
Record.push_back(File.hasLineDirectives());

Expand Down
20 changes: 8 additions & 12 deletions clang/test/ClangScanDeps/modules-extern-unrelated.m
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
// This test checks that only module map files defining affecting modules are
// affecting.

// RUN: rm -rf %t
// RUN: split-file %s %t

Expand All @@ -22,15 +25,8 @@
//--- second/second.h
#include "first_other.h"

//--- cdb.json.template
[{
"directory": "DIR",
"file": "DIR/tu.m",
"command": "clang -fmodules -fmodules-cache-path=DIR/cache -I DIR/zeroth -I DIR/first -I DIR/second -c DIR/tu.m -o DIR/tu.o"
}]

// RUN: sed -e "s|DIR|%/t|g" -e "s|INPUTS|%/S/Inputs|g" %t/cdb.json.template > %t/cdb.json
// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/result.json
// RUN: clang-scan-deps -format experimental-full -o %t/result.json \
// RUN: -- %clang -fmodules -fmodules-cache-path=%t/cache -I %t/zeroth -I %t/first -I %t/second -c %t/tu.m -o %t/tu.o
// RUN: cat %t/result.json | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t

// CHECK: {
Expand Down Expand Up @@ -67,11 +63,11 @@
// CHECK-NEXT: ],
// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/second/second.modulemap",
// CHECK-NEXT: "command-line": [
// CHECK-NOT: "-fmodule-map-file=[[PREFIX]]/second/module.modulemap"
// CHECK: ],
// CHECK-NEXT: "context-hash": "{{.*}}",
// CHECK-NEXT: "file-deps": [
// CHECK-NEXT: "[[PREFIX]]/first/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/second.h",
// CHECK-NEXT: "[[PREFIX]]/second/second.modulemap"
// CHECK-NEXT: ],
Expand All @@ -90,11 +86,11 @@
// CHECK-NEXT: ],
// CHECK-NEXT: "clang-modulemap-file": "[[PREFIX]]/zeroth/module.modulemap",
// CHECK-NEXT: "command-line": [
// CHECK-NOT: "-fmodule-map-file=[[PREFIX]]/second/module.modulemap"
// CHECK: ],
// CHECK-NEXT: "context-hash": "{{.*}}",
// CHECK-NEXT: "file-deps": [
// CHECK-NEXT: "[[PREFIX]]/first/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/second/second.modulemap",
// CHECK-NEXT: "[[PREFIX]]/zeroth/module.modulemap",
// CHECK-NEXT: "[[PREFIX]]/zeroth/zeroth.h"
Expand All @@ -115,7 +111,7 @@
// CHECK-NEXT: ],
// CHECK-NEXT: "command-line": [
// CHECK: ],
// CHECK-NEXT: "executable": "clang",
// CHECK-NEXT: "executable": "{{.*}}",
// CHECK-NEXT: "file-deps": [
// CHECK-NEXT: "[[PREFIX]]/tu.m"
// CHECK-NEXT: ],
Expand Down