Skip to content

[LLD][COFF] Create COFFObjectFile instance when constructing ObjFile (NFC) #120144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Dec 16, 2024

This change moves the creation of COFFObjectFile to the construction of ObjFile, instead of delaying it until parsing.

This change moves the creation of COFFObjectFile to the construction of ObjFile, instead of delaying it until parsing.
@llvmbot
Copy link
Member

llvmbot commented Dec 16, 2024

@llvm/pr-subscribers-platform-windows
@llvm/pr-subscribers-lld-coff

@llvm/pr-subscribers-lld

Author: Jacek Caban (cjacek)

Changes

This change moves the creation of COFFObjectFile to the construction of ObjFile, instead of delaying it until parsing.


Full diff: https://github.com/llvm/llvm-project/pull/120144.diff

4 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+3-3)
  • (modified) lld/COFF/InputFiles.cpp (+18-17)
  • (modified) lld/COFF/InputFiles.h (+4-2)
  • (modified) lld/COFF/LTO.cpp (+1-1)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 0705f1c1be9992..b588086a3c98b8 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -230,7 +230,7 @@ void LinkerDriver::addBuffer(std::unique_ptr<MemoryBuffer> mb,
     break;
   case file_magic::coff_object:
   case file_magic::coff_import_library:
-    ctx.symtab.addFile(make<ObjFile>(ctx, mbref, lazy));
+    ctx.symtab.addFile(ObjFile::create(ctx, mbref, lazy));
     break;
   case file_magic::pdb:
     ctx.symtab.addFile(make<PDBInputFile>(ctx, mbref));
@@ -313,7 +313,7 @@ void LinkerDriver::addArchiveBuffer(MemoryBufferRef mb, StringRef symName,
 
   InputFile *obj;
   if (magic == file_magic::coff_object) {
-    obj = make<ObjFile>(ctx, mb);
+    obj = ObjFile::create(ctx, mb);
   } else if (magic == file_magic::bitcode) {
     obj =
         make<BitcodeFile>(ctx, mb, parentName, offsetInArchive, /*lazy=*/false);
@@ -1390,7 +1390,7 @@ void LinkerDriver::convertResources() {
     return;
   }
   ObjFile *f =
-      make<ObjFile>(ctx, convertResToCOFF(resources, resourceObjFiles));
+      ObjFile::create(ctx, convertResToCOFF(resources, resourceObjFiles));
   ctx.symtab.addFile(f);
   f->includeResourceChunks();
 }
diff --git a/lld/COFF/InputFiles.cpp b/lld/COFF/InputFiles.cpp
index 9ff9346ed598b6..6ce075be798e4a 100644
--- a/lld/COFF/InputFiles.cpp
+++ b/lld/COFF/InputFiles.cpp
@@ -162,13 +162,26 @@ lld::coff::getArchiveMembers(COFFLinkerContext &ctx, Archive *file) {
   return v;
 }
 
-ObjFile::ObjFile(COFFLinkerContext &ctx, MemoryBufferRef m, bool lazy)
-    : InputFile(ctx.symtab, ObjectKind, m, lazy) {}
+ObjFile::ObjFile(SymbolTable &symtab, COFFObjectFile *coffObj, bool lazy)
+    : InputFile(symtab, ObjectKind, coffObj->getMemoryBufferRef(), lazy),
+      coffObj(coffObj) {}
+
+ObjFile *ObjFile::create(COFFLinkerContext &ctx, MemoryBufferRef m, bool lazy) {
+  // Parse a memory buffer as a COFF file.
+  Expected<std::unique_ptr<Binary>> bin = createBinary(m);
+  if (!bin)
+    Fatal(ctx) << "Could not parse " << m.getBufferIdentifier();
+
+  auto *obj = dyn_cast<COFFObjectFile>(bin->get());
+  if (!obj)
+    Fatal(ctx) << m.getBufferIdentifier() << " is not a COFF file";
+
+  bin->release();
+  return make<ObjFile>(ctx.symtab, obj, lazy);
+}
 
 void ObjFile::parseLazy() {
   // Native object file.
-  std::unique_ptr<Binary> coffObjPtr = CHECK(createBinary(mb), this);
-  COFFObjectFile *coffObj = cast<COFFObjectFile>(coffObjPtr.get());
   uint32_t numSymbols = coffObj->getNumberOfSymbols();
   for (uint32_t i = 0; i < numSymbols; ++i) {
     COFFSymbolRef coffSym = check(coffObj->getSymbol(i));
@@ -219,16 +232,6 @@ void ObjFile::initializeECThunks() {
 }
 
 void ObjFile::parse() {
-  // Parse a memory buffer as a COFF file.
-  std::unique_ptr<Binary> bin = CHECK(createBinary(mb), this);
-
-  if (auto *obj = dyn_cast<COFFObjectFile>(bin.get())) {
-    bin.release();
-    coffObj.reset(obj);
-  } else {
-    Fatal(symtab.ctx) << toString(this) << " is not a COFF file";
-  }
-
   // Read section and symbol tables.
   initializeChunks();
   initializeSymbols();
@@ -807,9 +810,7 @@ std::optional<Symbol *> ObjFile::createDefined(
 }
 
 MachineTypes ObjFile::getMachineType() const {
-  if (coffObj)
-    return static_cast<MachineTypes>(coffObj->getMachine());
-  return IMAGE_FILE_MACHINE_UNKNOWN;
+  return static_cast<MachineTypes>(coffObj->getMachine());
 }
 
 ArrayRef<uint8_t> ObjFile::getDebugSection(StringRef secName) {
diff --git a/lld/COFF/InputFiles.h b/lld/COFF/InputFiles.h
index fd2e409ada30f4..956e4dd8bc4cf6 100644
--- a/lld/COFF/InputFiles.h
+++ b/lld/COFF/InputFiles.h
@@ -136,8 +136,10 @@ class ArchiveFile : public InputFile {
 // .obj or .o file. This may be a member of an archive file.
 class ObjFile : public InputFile {
 public:
-  explicit ObjFile(COFFLinkerContext &ctx, MemoryBufferRef m,
-                   bool lazy = false);
+  static ObjFile *create(COFFLinkerContext &ctx, MemoryBufferRef mb,
+                         bool lazy = false);
+  explicit ObjFile(SymbolTable &symtab, COFFObjectFile *coffObj, bool lazy);
+
   static bool classof(const InputFile *f) { return f->kind() == ObjectKind; }
   void parse() override;
   void parseLazy();
diff --git a/lld/COFF/LTO.cpp b/lld/COFF/LTO.cpp
index d7e003ce3e2d88..a8cecb39ac614c 100644
--- a/lld/COFF/LTO.cpp
+++ b/lld/COFF/LTO.cpp
@@ -259,7 +259,7 @@ std::vector<InputFile *> BitcodeCompiler::compile() {
     if (llvm::is_contained(ctx.config.saveTempsArgs, "prelink") || emitASM)
       saveBuffer(buf[i].second, ltoObjName);
     if (!emitASM)
-      ret.push_back(make<ObjFile>(ctx, MemoryBufferRef(objBuf, ltoObjName)));
+      ret.push_back(ObjFile::create(ctx, MemoryBufferRef(objBuf, ltoObjName)));
   }
 
   return ret;

@cjacek
Copy link
Contributor Author

cjacek commented Dec 16, 2024

This change prepares for re-landing #119294. Determining the machine type of the object is necessary when constructing InputFile, as it dictates which symbol table to use. Initially, the machine type was read directly from the memory buffer, which works for regular COFF files but requires different handling for bigobj files. While it would be feasible to implement that logic directly, the necessary handling already exists in the COFFObjectFile implementation (and identify_magic), making it seems better to leverage that instead.

Currently, the creation of COFFObjectFile instances is postponed until the file is parsed. This PR moves the creation earlier in the process. Since these instances are created regardless, this change does not introduce additional work but simply schedules it earlier.

This change introduces no memory overhead for fully parsed files. However, it does affect objects that are lazily parsed, such as those in thin libraries or wholearchive imports. Currently, a temporary COFFObjectFile is created in parseLazy and recreated if the file is later used. With this PR, parseLazy reuses the object stored in ObjFile, making it non-temporary. While this changes memory usage, the impact is minimal since COFFObjectFile is primarily a set of pointers to the memory buffer, which is already stored. For fully parsed files, this PR avoids constructing the object twice, offering a slight performance improvement.

Creating the underlying object to represent the binary type is not a new approach, as it is already done for LTO input files.

With this PR, it becomes possible to use COFFObjectFile::getMachine() directly in ObjFile::create as part of #119294.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - this is kinda complicated, but your explanation sounds reasonable, and I think I've managed to follow it, so this sounds ok to me.

@aganea
Copy link
Member

aganea commented Dec 16, 2024

How does this do in terms of link time when linking large monolithic/static binaries? Are you able to profile? I am thinking about PEs that bring in lots of libraries during link, ie. clang.exe linked with /MT, chrome.exe in non-component mode or Unreal Engine Editor in monolithic mode. The change seems ok, but better be sure.

@cjacek
Copy link
Contributor Author

cjacek commented Dec 17, 2024

I benchmarked this change by cross-compiling clang.exe with llvm-mingw, and the difference was within the margin of error. I also checked -start-lib/-end-lib code path by hacking a script to use that for linking clang.exe and observed similar results. According to perf report, COFFObjectFile::create accounts for approximately 0.01% of execution time, and the change does not affect measured peak memory usage.

@cjacek cjacek merged commit 9c8214f into llvm:main Dec 17, 2024
12 checks passed
@cjacek cjacek deleted the obj-create branch December 17, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants