Skip to content

Add driver flag to precompile Swift-compatible explicit Clang modules. #28107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 14, 2019

Conversation

allevato
Copy link
Member

@allevato allevato commented Nov 6, 2019

Some additional context/discussion can be found in this Swift Forums thread.

This change adds two flags to the Swift driver:

  • -emit-pcm, which takes as input a module map and generates an explicit precompiled module using a Clang compiler instance from Swift's ClangImporter.
  • -dump-pcm, which takes as input a .pcm file generated by the Swift compiler (either by -emit-pcm, or one from the implicit module cache) and prints info about it as a debugging aid (equivalent to clang -module-file-info)

Such a .pcm file can be passed to swiftc using -Xcc -fmodule-file=path/to/module.pcm and imported like any other C/Objective-C module.

Example usage

swiftc -emit-pcm -module-name TheNameOfAModuleInTheModuleMap \
  -target arm64-apple-ios11.0.0 \
  -o MyModule.pcm \
  -Xcc -DADDITIONAL_CLANG_FLAGS \
  path/to/module.modulemap

swiftc -dump-pcm MyModule.pcm

Motivation

When distributing Swift builds consisting of large numbers of Objective-C and Swift modules remotely with build systems like Bazel, we can't rely on the implicit module cache to reduce repeated work—the cache won't be the same from machine to machine, and it's not possible to share it because absolute paths to the workspaces will be different (and those paths are written into the modules).

This means that each swiftc job's ClangImporter must redo the work of parsing the C/Objective-C headers used by a Swift module, which greatly slows down our remote builds. For dependency graphs with large amounts of Objective-C, we a significant amount of time reported by the compiler spent in Name Binding, and file system traces indicate that much of that is processing headers.

Instead, we'd like to explicitly precompile those C modules and propagate the .pcm artifacts as part of the build graph, thus doing the work only once. Since precompiled modules are not binary-compatible across different versions of Clang or across invocations where the non-benign invocation options differ, the only stable way to make a .pcm that is compatible with Swift is to extend swiftc to have its own ClangImporter instance do it.

@allevato
Copy link
Member Author

allevato commented Nov 6, 2019

cc @DougGregor, could you add a reviewer to this please? Thanks!

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you're working on this, because it's really important for us to be able to separately build the Clang modules on which a Swift compilation job depends. I feel like there is a bunch of code duplication here where the new build-a-Clang-module path is kept separate from similar code paths (in particular, bridging PCH generation) that is likely to stray. Specifically, I think I prefer integrating this path as another "frontend" action (the way we do for emitting a bridging PCH) so we aren't duplicating so much command-line parsing and setup logic, and get tools like output file maps and such for free. What do you think?

return LangOpts->CPlusPlus ? clang::Language::CXX : clang::Language::C;
}

bool ClangImporter::emitPrecompiledModule(StringRef moduleMapPath,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a significant amount of overlap between this and ClangImporter::emitBridgingPCH. Can we unify these two functions, so the somewhat arcane knowledge of how to get Clang to do this properly isn't duplicated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// This is functionally equivalent to Clang's -module-file-info frontend flag,
// but is provided here to guarantee compatibility with modules emitted by the
// Swift compiler since those modules will not be compatible with a different
// version of Clang (even for debug-printing).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought Clang's -module-file-info was based on relatively stable parts of the Clang module file format? Personally, Could we rely on that for the "dump a module's metadata for debugging" case, rather than have this much Clang-specific logic in Swift?

Copy link
Member Author

@allevato allevato Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the Clang used to generate the module must be the same version as the one that runs -module-file-info on it, or you get an error like:

error: PCH file built from a different branch () than the compiler ((clang-1100.0.33.8))

And Xcode provides no guarantee that its clang binary and the Clang linked into its swiftc binary will be the same version; over the past few versions of Swift, it's more often been the case that they're slightly different revisions. For example, Xcode 11's clang is clang-1100.0.33.8 but swiftc's clang is clang-1100.0.33.7.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for correcting my faulty memory. I agree it makes sense to have this functionality in Swift.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've turned this into a -dump-pcm no-output mode in the standard driver/frontend.

}

StringRef Filename = Invocation.ModuleMapFilename;
auto ErrOrBuf = llvm::MemoryBuffer::getFile(Filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be going through the virtual file system rather than poking directly at the file system?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, probably. I copied this from the modulewrap main, but that might be a different situation since the input file to that action is always an output from a previous action? (Or maybe that should also be updated at some point?)

If I migrate this to a frontend action, then this separate check can probably go away and be folded into something more appropriate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think modulewrap main should also be updated at some point; it's part of the reason that I think it's useful to make this a frontend action, because it's easier to get this stuff set up properly.

}
};

int emit_pcm_main(ArrayRef<const char *> Args, const char *Argv0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a large amount of code to effectively make this its own driver, which I'm concerned will get out of sync with the way we import the resulting Clang modules in subtle ways. Could we instead handle this the way the emission of the precompiled bridging header is handled, as a frontend command that relies on all of the same command-line option parsing and setup as a compile job?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it similar to -emit-pch makes sense as a frontend job, especially if you want the Swift driver to at some point in the future be able to construct explicit PCM jobs as part of a compilation, which is a use case I hadn't considered.

For my use case (integration with Bazel), I want to invoke this completely independently, so I'd want it to be a stable driver invocation as well; for example, -emit-pch is a frontend-only flag because that job is only created under the hood when -enable-bridging-pch is passed to the driver, so I wouldn't want Bazel's integration to rely on invoking the frontend directly since that's considered unstable. As a standalone invocation, I'd want swiftc -emit-pcm -module-name Foo path/to/some/module.modulemap to construct only the PCM job and ensure that only modulemap files are supplied as inputs.

Are there any frontend actions other than compilation that I can look at for guidance to model this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up emulating what was done in FrontendTool for -emit-pch, but also made the flag available directly to the driver. So now if -emit-pcm (or -dump-pcm) is used, it's treated as a SingleCompile action type, and FrontendTool checks that flag and hands off control to ClangImporter.

I thought I would also need to create a special FrontendAction subclass like GeneratePCHAction, but that ended up not being necessary—it looks like that would only be needed if the driver wanted to register PCM actions separately as part of a larger compilation job?

Please let me know if this updated version is closer to what you had in mind!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having swiftc -emit-pcm -module-name Foo path/to/some/module.modulemap effectively create only a PCM-emitting job makes sense. The "compilation" path is really the right one for the driver here, because you want a lot of the same code to pass through (VFS, output file maps, most compilation options, etc.).

@allevato
Copy link
Member Author

This seems to be working for some basic modules (like the ones in the test case in the PR), but I'm seeing some odd behavior (an assertion failure) when I try to build and then consume explicit modules for some of the Apple SDK system frameworks/modules. (We need to turn on -fno-implicit-modules so that the module cache doesn't affect the content of the modules, meaning tht even the system frameworks have to be explicit.)

In order to do this, I have the following modules that I build explicitly with -emit-pcm invocations:

  • _Builtin_stddef_max_align_t (defined in <clang-resource-dir>/include/module.modulemap)
  • Darwin (defined in $SDKROOT/usr/include/module.modulemap, depends on [_Builtin_stddef_max_align_t])
  • ObjectiveC (defined in $SDKROOT/usr/include/objc/module.modulemap, depends on [_Builtin_stddef_max_align_t, Darwin])
  • os_object (defined in $SDKROOT/usr/include/module.modulemap, depends on [_Builtin_stddef_max_align_t, Darwin, ObjectiveC])
  • Dispatch (defined in $SDKROOT/usr/include/dispatch/module.modulemap, depends on [_Builtin_stddef_max_align_t, Darwin, ObjectiveC, os_object])

Building Dispatch is where things go south. I pass all the dependencies to using swiftc -emit-pcm -Xcc -fmodule-file=/path/to/module.pcm, and I get a crash like this:

Assertion failed: (D == Ptr && "Didn't find this decl on its identifier's chain!"), function RemoveDecl, file /REDACTED/Swift/llvm/tools/clang/lib/Sema/IdentifierResolver.cpp, line 221.

0  swift                    0x00000001116fc655 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1  swift                    0x00000001116fb945 llvm::sys::RunSignalHandlers() + 85
2  swift                    0x00000001116fcc38 SignalHandler(int) + 264
3  libsystem_platform.dylib 0x00007fff6f68eb5d _sigtramp + 29
4  swift                    0x00000001139ec06d cmark_strbuf__initbuf + 126920
5  libsystem_c.dylib        0x00007fff6f5486a6 abort + 127
6  libsystem_c.dylib        0x00007fff6f51120d basename_r + 0
7  swift                    0x0000000112476bb3 clang::IdentifierResolver::RemoveDecl(clang::NamedDecl*) (.cold.5) + 35
8  swift                    0x0000000110a6bc08 clang::IdentifierResolver::RemoveDecl(clang::NamedDecl*) + 232
9  swift                    0x000000011093de90 clang::ASTReader::pushExternalDeclIntoScope(clang::NamedDecl*, clang::DeclarationName) + 160
10 swift                    0x000000011093dc08 clang::ASTReader::InitializeSema(clang::Sema&) + 280
11 swift                    0x0000000110a9e595 clang::Sema::Initialize() + 85
12 swift                    0x00000001108cde55 clang::Parser::Initialize() + 4421
13 swift                    0x0000000110832f6d clang::ParseAST(clang::Sema&, bool, bool) + 381
14 swift                    0x00000001106da74e clang::FrontendAction::Execute() + 78
15 swift                    0x0000000110696c31 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1473
16 swift                    0x000000010e9d11b0 swift::ClangImporter::emitPrecompiledModule(llvm::StringRef, llvm::StringRef, llvm::StringRef) + 1280
17 swift                    0x000000010defd669 performCompile(swift::CompilerInstance&, swift::CompilerInvocation&, llvm::ArrayRef<char const*>, int&, swift::FrontendObserver*, swift::UnifiedStatsReporter*) + 1785
18 swift                    0x000000010defbfa2 swift::performFrontend(llvm::ArrayRef<char const*>, char const*, void*, swift::FrontendObserver*) + 2978
19 swift                    0x000000010dea6388 main + 696
20 libdyld.dylib            0x00007fff6f4a33d5 start + 1
21 libdyld.dylib            0x000000000000001d start + 2427833417

I threw some D->dump() calls inside RemoveDecl and saw some stuff about "undeserialized declarations", but I don't really have any idea what that means.

Then, I noticed the differing semantics of -fmodule-file=MODULE_NAME=path vs. -fmodule-file=path and decided to try the former form instead, and that succeeded (at least, it didn't crash).

This behavior happened both at HEAD, and in the 5.1 branch. I didn't observe it in an older prototype that I've been experimenting with internally, where I'm using sources that date back to sometime in June (I'll have to fish out more specific timestamps).

I don't think what I'm seeing is related to my PR specifically, but it's something I'd like to figure out before we're able to make real use of this. Is the behavior I'm observing (especially the difference between -fmodule-file forms) expected?

@allevato
Copy link
Member Author

Following up to my previous comment, I observe the same behavior if I build and consume the same explicit modules using the standalone clang binary built from apple/llvm-project: crash when using -fmodule-file=path, success when using -fmodule-file=module_name=path, but not from clang distributed with Xcode 11.

So, at least the issue appears to be unrelated to this PR or to Swift/ClangImporter. I'll see if I can get the same behavior on upstream clang next.

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks great to me!

@allevato
Copy link
Member Author

Thanks Doug! If there's nothing else blocking, can you kick off the tests and merge if successful? I don't have permissions to do so.

@benlangmuir
Copy link
Contributor

@swift-ci please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants