Skip to content

[clang][Driver] Don't ignore -gmodules .gch files #77711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 16, 2024

Conversation

Bigcheese
Copy link
Contributor

A previous commit (82f75ed) made clang ignore .gch files that were not Clang AST files. This broke -gmodules, which embeds the Clang AST into an object file containing debug info.

This changes the probing to detect any file format recognized by llvm::identify_magic() as potentially containing a Clang AST.

Previous PR: #69204

@Bigcheese Bigcheese requested review from MaskRay and zmodem January 11, 2024 01:03
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm:binary-utilities labels Jan 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-binary-utilities

Author: Michael Spencer (Bigcheese)

Changes

A previous commit (82f75ed) made clang ignore .gch files that were not Clang AST files. This broke -gmodules, which embeds the Clang AST into an object file containing debug info.

This changes the probing to detect any file format recognized by llvm::identify_magic() as potentially containing a Clang AST.

Previous PR: #69204


Full diff: https://github.com/llvm/llvm-project/pull/77711.diff

6 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+8-4)
  • (modified) clang/test/PCH/gch-probe.c (+4)
  • (modified) llvm/include/llvm/BinaryFormat/Magic.h (+1)
  • (modified) llvm/lib/BinaryFormat/Magic.cpp (+2)
  • (modified) llvm/lib/Object/Binary.cpp (+1)
  • (modified) llvm/lib/Object/ObjectFile.cpp (+1)
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 2d8ef841d4f6be..36f1545e285e14 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -43,6 +43,7 @@
 #include "clang/Driver/XRayArgs.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/StringExtras.h"
+#include "llvm/BinaryFormat/Magic.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/Option/ArgList.h"
 #include "llvm/Support/CodeGen.h"
@@ -948,10 +949,13 @@ static void handleAMDGPUCodeObjectVersionOptions(const Driver &D,
   }
 }
 
-static bool hasClangPchSignature(const Driver &D, StringRef Path) {
+static bool maybeHasClangPchSignature(const Driver &D, StringRef Path) {
   if (llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> MemBuf =
           D.getVFS().getBufferForFile(Path))
-    return (*MemBuf)->getBuffer().starts_with("CPCH");
+    // Return true for both raw Clang AST files and object files which may
+    // contain a __clangast section.
+    return llvm::identify_magic((*MemBuf)->getBuffer()) !=
+           llvm::file_magic::unknown;
   return false;
 }
 
@@ -964,14 +968,14 @@ static bool gchProbe(const Driver &D, StringRef Path) {
     std::error_code EC;
     for (llvm::vfs::directory_iterator DI = D.getVFS().dir_begin(Path, EC), DE;
          !EC && DI != DE; DI = DI.increment(EC)) {
-      if (hasClangPchSignature(D, DI->path()))
+      if (maybeHasClangPchSignature(D, DI->path()))
         return true;
     }
     D.Diag(diag::warn_drv_pch_ignoring_gch_dir) << Path;
     return false;
   }
 
-  if (hasClangPchSignature(D, Path))
+  if (maybeHasClangPchSignature(D, Path))
     return true;
   D.Diag(diag::warn_drv_pch_ignoring_gch_file) << Path;
   return false;
diff --git a/clang/test/PCH/gch-probe.c b/clang/test/PCH/gch-probe.c
index 8b1e1fab5ad97b..0905c9baebdae0 100644
--- a/clang/test/PCH/gch-probe.c
+++ b/clang/test/PCH/gch-probe.c
@@ -2,6 +2,10 @@
 // RUN: %clang -x c-header -c %s -o %t.h.gch
 // RUN: %clang -fsyntax-only -include %t.h %s
 
+// -gmodules embeds the Clang AST file in an object file.
+// RUN: %clang -x c-header -c %s -gmodules -o %t.h.gch
+// RUN: %clang -fsyntax-only -include %t.h %s
+
 // gch probing should ignore files which are not clang pch files.
 // RUN: %clang -fsyntax-only -include %S/Inputs/gch-probe.h %s 2>&1 | FileCheck %s
 // CHECK: warning: precompiled header '{{.*}}gch-probe.h.gch' was ignored because it is not a clang PCH file
diff --git a/llvm/include/llvm/BinaryFormat/Magic.h b/llvm/include/llvm/BinaryFormat/Magic.h
index c635a269576587..6978c066bda468 100644
--- a/llvm/include/llvm/BinaryFormat/Magic.h
+++ b/llvm/include/llvm/BinaryFormat/Magic.h
@@ -21,6 +21,7 @@ struct file_magic {
   enum Impl {
     unknown = 0,       ///< Unrecognized file
     bitcode,           ///< Bitcode file
+    clang_ast,         ///< Clang PCH or PCM
     archive,           ///< ar style archive file
     elf,               ///< ELF Unknown type
     elf_relocatable,   ///< ELF Relocatable object file
diff --git a/llvm/lib/BinaryFormat/Magic.cpp b/llvm/lib/BinaryFormat/Magic.cpp
index 45a0b7e11452b4..bd378337ed3338 100644
--- a/llvm/lib/BinaryFormat/Magic.cpp
+++ b/llvm/lib/BinaryFormat/Magic.cpp
@@ -98,6 +98,8 @@ file_magic llvm::identify_magic(StringRef Magic) {
   case 'C':
     if (startswith(Magic, "CCOB"))
       return file_magic::offload_bundle_compressed;
+    if (startswith(Magic, "CPCH"))
+      return file_magic::clang_ast;
     break;
   case '!':
     if (startswith(Magic, "!<arch>\n") || startswith(Magic, "!<thin>\n"))
diff --git a/llvm/lib/Object/Binary.cpp b/llvm/lib/Object/Binary.cpp
index 0b9d95485287dc..2dfae8ab5d3c64 100644
--- a/llvm/lib/Object/Binary.cpp
+++ b/llvm/lib/Object/Binary.cpp
@@ -84,6 +84,7 @@ Expected<std::unique_ptr<Binary>> object::createBinary(MemoryBufferRef Buffer,
     // PDB does not support the Binary interface.
     return errorCodeToError(object_error::invalid_file_type);
   case file_magic::unknown:
+  case file_magic::clang_ast:
   case file_magic::cuda_fatbinary:
   case file_magic::coff_cl_gl_object:
   case file_magic::dxcontainer_object:
diff --git a/llvm/lib/Object/ObjectFile.cpp b/llvm/lib/Object/ObjectFile.cpp
index c05eb0a0468e2b..6a226a3bbdbca3 100644
--- a/llvm/lib/Object/ObjectFile.cpp
+++ b/llvm/lib/Object/ObjectFile.cpp
@@ -155,6 +155,7 @@ ObjectFile::createObjectFile(MemoryBufferRef Object, file_magic Type,
   switch (Type) {
   case file_magic::unknown:
   case file_magic::bitcode:
+  case file_magic::clang_ast:
   case file_magic::coff_cl_gl_object:
   case file_magic::archive:
   case file_magic::macho_universal_binary:

@zmodem
Copy link
Collaborator

zmodem commented Jan 11, 2024

Thanks for looking into this. I wasn't aware of -gmodules relying on "gch probing" when I wrote the previous patch.

It seems unfortunate to make maybeHasClangPchSignature so broad as to return true for any file format that we recognize. Would it be possible to at least tighten it to only consider the kind of object files which could have a clangast section? Or is that list too long? What does the code that reads these files look like, could we leverage that somehow?

An alternative would be turn the logic around, and only ignore GCC PCH files (I believe they all start with the file magic gpch). However I do think that the current approach of "whitelisting" the kind of file we're looking for is better.

@Bigcheese
Copy link
Contributor Author

What does the code that reads these files look like, could we leverage that somehow?

You can call clang::ObjectFilePCHContainerReader::ExtractPCH() and then check the magic. This lives in the CodeGen library which I don't think the driver currently (or should) links against, but this is the best way to know if something is valid.

An alternative would be turn the logic around, and only ignore GCC PCH files (I believe they all start with the file magic gpch). However I do think that the current approach of "whitelisting" the kind of file we're looking for is better.

I would be fine with this approach, but agree that it's best if we can be more selective. My concern is that clang::ObjectFilePCHContainerReader::ExtractPCH() and this detection may get out of sync, as support for -gmodules is automatic anytime someone adds a new object format.

@zmodem
Copy link
Collaborator

zmodem commented Jan 11, 2024

What does the code that reads these files look like, could we leverage that somehow?

You can call clang::ObjectFilePCHContainerReader::ExtractPCH() and then check the magic. This lives in the CodeGen library which I don't think the driver currently (or should) links against, but this is the best way to know if something is valid.

Agreed we probably don't want that. Maybe a compromise would be to call llvm::object::ObjectFile::createObjectFile just to check that it's at least an object file? It looks like Driver is linking against Object already.

An alternative would be turn the logic around, and only ignore GCC PCH files (I believe they all start with the file magic gpch). However I do think that the current approach of "whitelisting" the kind of file we're looking for is better.

I would be fine with this approach, but agree that it's best if we can be more selective. My concern is that clang::ObjectFilePCHContainerReader::ExtractPCH() and this detection may get out of sync, as support for -gmodules is automatic anytime someone adds a new object format.

Speaking of keeping things in sync, I see we also have printRawClangAST in llvm-objdump.cpp duplicating some of ExtractPCH's logic :)

@Bigcheese
Copy link
Contributor Author

I updated the patch to try to actually open the object file.

Copy link
Collaborator

@zmodem zmodem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

A previous commit (82f75ed) made clang ignore .gch files that were not
Clang AST files. This broke `-gmodules`, which embeds the Clang AST
into an object file containing debug info.

This changes the probing to detect any valid object file as a
potential Clang PCH.
@Bigcheese Bigcheese merged commit 894c224 into llvm:main Jan 16, 2024
@jakeegan
Copy link
Member

Hi, on AIX we are seeing

error: input is not a PCH file: '/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/tools/clang/test/PCH/Output/gch-probe.c.tmp.h.gch'
fatal error: file '/home/powerllvm/powerllvm_env/aix-ppc64/clang-ppc64-aix/build/tools/clang/test/PCH/Output/gch-probe.c.tmp.h.gch' is not a valid precompiled PCH file: file too small to contain AST file magic

https://lab.llvm.org/buildbot/#/builders/214/builds/11018/steps/6/logs/FAIL__Clang__gch-probe_c

Could you take a look please?

@Bigcheese
Copy link
Contributor Author

Yep, I'll take a look.

@Bigcheese
Copy link
Contributor Author

Ah, looks like all the other gmodules tests have:

// Unsupported on AIX because we don't support the requisite "__clangast"
// section in XCOFF yet.
// UNSUPPORTED: target={{.*}}-aix{{.*}}

I'll just add that.

Bigcheese added a commit to Bigcheese/llvm-project that referenced this pull request Jan 17, 2024
Bigcheese added a commit that referenced this pull request Jan 17, 2024
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
A previous commit (82f75ed) made clang ignore .gch files that were not
Clang AST files. This broke `-gmodules`, which embeds the Clang AST into
an object file containing debug info.

This changes the probing to detect any file format recognized by
`llvm::identify_magic()` as potentially containing a Clang AST.

Previous PR: llvm#69204
@MaskRay MaskRay mentioned this pull request Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:binary-utilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants