Skip to content

[lld-macho] Support archives without index #132942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 10, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 5 additions & 6 deletions lld/MachO/Driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -314,8 +314,6 @@ static InputFile *addFile(StringRef path, LoadType loadType,
std::unique_ptr<object::Archive> archive = CHECK(
object::Archive::create(mbref), path + ": failed to parse archive");

if (!archive->isEmpty() && !archive->hasSymbolTable())
error(path + ": archive has no index; run ranlib to add one");
file = make<ArchiveFile>(std::move(archive), isForceHidden);

if (tar && file->getArchive().isThin())
Expand Down Expand Up @@ -362,9 +360,11 @@ static InputFile *addFile(StringRef path, LoadType loadType,
": Archive::children failed: " + toString(std::move(e)));
}
} else if (isCommandLineLoad && config->forceLoadObjC) {
for (const object::Archive::Symbol &sym : file->getArchive().symbols())
if (sym.getName().starts_with(objc::symbol_names::klass))
file->fetch(sym);
if (file->getArchive().hasSymbolTable()) {
for (const object::Archive::Symbol &sym : file->getArchive().symbols())
if (sym.getName().starts_with(objc::symbol_names::klass))
file->fetch(sym);
}

// TODO: no need to look for ObjC sections for a given archive member if
// we already found that it contains an ObjC symbol.
Expand Down Expand Up @@ -394,7 +394,6 @@ static InputFile *addFile(StringRef path, LoadType loadType,
": Archive::children failed: " + toString(std::move(e)));
}
}

file->addLazySymbols();
loadedArchives[path] = ArchiveFileInfo{file, isCommandLineLoad};
newFile = file;
Expand Down
60 changes: 43 additions & 17 deletions lld/MachO/InputFiles.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2159,9 +2159,31 @@ ArchiveFile::ArchiveFile(std::unique_ptr<object::Archive> &&f, bool forceHidden)
void ArchiveFile::addLazySymbols() {
// Avoid calling getMemoryBufferRef() on zero-symbol archive
// since that crashes.
if (file->isEmpty() || file->getNumberOfSymbols() == 0)
if (file->isEmpty() ||
(file->hasSymbolTable() && file->getNumberOfSymbols() == 0))
return;

if (!file->hasSymbolTable()) {
// No index, treat each child as a lazy object file.
Error e = Error::success();
for (const object::Archive::Child &c : file->children(e)) {
// Check `seen` but don't insert so a future eager load can still happen.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for when we first read a .a but it's then -all_load / -force_loaded later, yes?

Do we parse all .o files a second time then? (In any case, this isn't something that we expect to happen in practice, yes?)

(ps: good tests for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going off the logic in Driver.cpp, specifically:

      // Command-line loads take precedence. If file is previously loaded via
      // command line, or is loaded via LC_LINKER_OPTION and being loaded via
      // LC_LINKER_OPTION again, using the cached archive is enough.

so I guess it's your scenario but s/force_load/LC_LINKER_OPTION. I think we would actually drop the force load in that case. Is that a bug?

if (seen.contains(c.getChildOffset()))
continue;
if (!seenLazy.insert(c.getChildOffset()).second)
continue;
auto file = childToObjectFile(c, /*lazy=*/true);
if (!file)
error(toString(this) +
": couldn't process child: " + toString(file.takeError()));
inputFiles.insert(*file);
}
if (e)
error(toString(this) +
": Archive::children failed: " + toString(std::move(e)));
return;
}

Error err = Error::success();
auto child = file->child_begin(err);
// Ignore the I/O error here - will be reported later.
Expand Down Expand Up @@ -2191,16 +2213,17 @@ void ArchiveFile::addLazySymbols() {

static Expected<InputFile *>
loadArchiveMember(MemoryBufferRef mb, uint32_t modTime, StringRef archiveName,
uint64_t offsetInArchive, bool forceHidden, bool compatArch) {
uint64_t offsetInArchive, bool forceHidden, bool compatArch,
bool lazy) {
if (config->zeroModTime)
modTime = 0;

switch (identify_magic(mb.getBuffer())) {
case file_magic::macho_object:
return make<ObjFile>(mb, modTime, archiveName, /*lazy=*/false, forceHidden,
return make<ObjFile>(mb, modTime, archiveName, lazy, forceHidden,
compatArch);
case file_magic::bitcode:
return make<BitcodeFile>(mb, archiveName, offsetInArchive, /*lazy=*/false,
return make<BitcodeFile>(mb, archiveName, offsetInArchive, lazy,
forceHidden, compatArch);
default:
return createStringError(inconvertibleErrorCode(),
Expand All @@ -2212,19 +2235,7 @@ loadArchiveMember(MemoryBufferRef mb, uint32_t modTime, StringRef archiveName,
Error ArchiveFile::fetch(const object::Archive::Child &c, StringRef reason) {
if (!seen.insert(c.getChildOffset()).second)
return Error::success();

Expected<MemoryBufferRef> mb = c.getMemoryBufferRef();
if (!mb)
return mb.takeError();

Expected<TimePoint<std::chrono::seconds>> modTime = c.getLastModified();
if (!modTime)
return modTime.takeError();

Expected<InputFile *> file =
loadArchiveMember(*mb, toTimeT(*modTime), getName(), c.getChildOffset(),
forceHidden, compatArch);

auto file = childToObjectFile(c, /*lazy=*/false);
if (!file)
return file.takeError();

Expand All @@ -2251,6 +2262,21 @@ void ArchiveFile::fetch(const object::Archive::Symbol &sym) {
toMachOString(symCopy) + ": " + toString(std::move(e)));
}

Expected<InputFile *>
ArchiveFile::childToObjectFile(const llvm::object::Archive::Child &c,
bool lazy) {
Expected<MemoryBufferRef> mb = c.getMemoryBufferRef();
if (!mb)
return mb.takeError();

Expected<TimePoint<std::chrono::seconds>> modTime = c.getLastModified();
if (!modTime)
return modTime.takeError();

return loadArchiveMember(*mb, toTimeT(*modTime), getName(),
c.getChildOffset(), forceHidden, compatArch, lazy);
}

static macho::Symbol *createBitcodeSymbol(const lto::InputFile::Symbol &objSym,
BitcodeFile &file) {
StringRef name = saver().save(objSym.getName());
Expand Down
3 changes: 3 additions & 0 deletions lld/MachO/InputFiles.h
Original file line number Diff line number Diff line change
Expand Up @@ -297,10 +297,13 @@ class ArchiveFile final : public InputFile {
static bool classof(const InputFile *f) { return f->kind() == ArchiveKind; }

private:
Expected<InputFile *> childToObjectFile(const llvm::object::Archive::Child &c,
bool lazy);
std::unique_ptr<llvm::object::Archive> file;
// Keep track of children fetched from the archive by tracking
// which address offsets have been fetched already.
llvm::DenseSet<uint64_t> seen;
llvm::DenseSet<uint64_t> seenLazy;
// Load all symbols with hidden visibility (-load_hidden).
bool forceHidden;
};
Expand Down
23 changes: 23 additions & 0 deletions lld/test/MachO/archive-no-index.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
; REQUIRES: x86
; RUN: rm -rf %t; split-file %s %t

; RUN: llvm-as %t/lib.ll -o %t/lib.o
; RUN: llvm-mc -filetype=obj -triple=x86_64-apple-macos -o %t/main.o %t/main.s
; RUN: llvm-ar rcST %t/lib.a %t/lib.o
; RUN: %lld %t/main.o %t/lib.a -o %t/out

;--- main.s
.global _main
_main:
call _foo
mov $0, %rax
ret

;--- lib.ll
target triple = "x86_64-apple-darwin"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

define void @foo() {
entry:
ret void
}
43 changes: 43 additions & 0 deletions lld/test/MachO/archive-no-index.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# REQUIRES: x86
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a bitcode test too, since this adds support for that as well as far as I can tell?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# RUN: rm -rf %t; split-file %s %t
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-macos -o %t/main.o %t/main.s
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-macos -o %t/lib.o %t/lib.s
# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-macos -o %t/lib2.o %t/lib2.s
# RUN: llvm-ar crST %t/lib.a %t/lib.o %t/lib2.o
# RUN: %lld %t/main.o %t/lib.a -o %t/out
# RUN: llvm-nm %t/out | FileCheck %s

# CHECK-NOT: T _bar
# CHECK: T _foo

## Test that every kind of eager load mechanism still works.
# RUN: %lld %t/main.o %t/lib.a -all_load -o %t/all_load
# RUN: llvm-nm %t/all_load | FileCheck %s --check-prefix FORCED-LOAD
# RUN: %lld %t/main.o -force_load %t/lib.a -o %t/force_load
# RUN: llvm-nm %t/force_load | FileCheck %s --check-prefix FORCED-LOAD
# RUN: %lld %t/main.o %t/lib.a -ObjC -o %t/objc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re -ObjC: How expensive would it be to look at objc symbols in all LazyObjects?

(But fine to punt on this for now: As you say, it's not clear if it's needed, and the current behavior is consistent with -start-lib / -end-lib.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatively cheap if we do it ObjFile::parseLazy. The annoying part would be going back and turning every lazy we just created into a defined partway through the iteration. I'm still confused on if this can actually happen though.

# RUN: llvm-nm %t/objc | FileCheck %s --check-prefix FORCED-LOAD

# FORCED-LOAD: T _bar

#--- lib.s
.global _foo
_foo:
ret

#--- lib2.s
.section __DATA,__objc_catlist
.quad 0x1234

.section __TEXT,__text
.global _bar
_bar:
ret

#--- main.s
.global _main
_main:
call _foo
mov $0, %rax
ret
32 changes: 0 additions & 32 deletions lld/test/MachO/invalid/archive-no-index.s

This file was deleted.