Skip to content

Object: Don't error out on malformed bitcode files. #96848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions llvm/include/llvm/Object/ArchiveWriter.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,25 +48,30 @@ enum class SymtabWritingMode {
BigArchive64 // Only write the 64-bit symbol table.
};

void warnToStderr(Error Err);

// Write an archive directly to an output stream.
Error writeArchiveToStream(raw_ostream &Out,
ArrayRef<NewArchiveMember> NewMembers,
SymtabWritingMode WriteSymtab,
object::Archive::Kind Kind, bool Deterministic,
bool Thin, std::optional<bool> IsEC = std::nullopt);
bool Thin, std::optional<bool> IsEC = std::nullopt,
function_ref<void(Error)> Warn = warnToStderr);

Error writeArchive(StringRef ArcName, ArrayRef<NewArchiveMember> NewMembers,
SymtabWritingMode WriteSymtab, object::Archive::Kind Kind,
bool Deterministic, bool Thin,
std::unique_ptr<MemoryBuffer> OldArchiveBuf = nullptr,
std::optional<bool> IsEC = std::nullopt);
std::optional<bool> IsEC = std::nullopt,
function_ref<void(Error)> Warn = warnToStderr);

// writeArchiveToBuffer is similar to writeArchive but returns the Archive in a
// buffer instead of writing it out to a file.
Expected<std::unique_ptr<MemoryBuffer>>
writeArchiveToBuffer(ArrayRef<NewArchiveMember> NewMembers,
SymtabWritingMode WriteSymtab, object::Archive::Kind Kind,
bool Deterministic, bool Thin);
bool Deterministic, bool Thin,
function_ref<void(Error)> Warn = warnToStderr);
}

#endif
64 changes: 51 additions & 13 deletions llvm/lib/Object/ArchiveWriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -482,16 +482,45 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
}

static Expected<std::unique_ptr<SymbolicFile>>
getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
object::Archive::Kind Kind, function_ref<void(Error)> Warn) {
const file_magic Type = identify_magic(Buf.getBuffer());
// Don't attempt to read non-symbolic file types.
if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
return nullptr;
if (Type == file_magic::bitcode) {
auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
Buf, file_magic::bitcode, &Context);
if (!ObjOrErr)
return ObjOrErr.takeError();
// An error reading a bitcode file most likely indicates that the file
// was created by a compiler from the future. Normally we don't try to
// implement forwards compatibility for bitcode files, but when creating an
// archive we can implement best-effort forwards compatibility by treating
// the file as a blob and not creating symbol index entries for it. lld and
// mold ignore the archive symbol index, so provided that you use one of
// these linkers, LTO will work as long as lld or the gold plugin is newer
// than the compiler. We only ignore errors if the archive format is one
// that is supported by a linker that is known to ignore the index,
// otherwise there's no chance of this working so we may as well error out.
// We print a warning on read failure so that users of linkers that rely on
// the symbol index can diagnose the issue.
//
// This is the same behavior as GNU ar when the linker plugin returns an
// error when reading the input file. If the bitcode file is actually
// malformed, it will be diagnosed at link time.
if (!ObjOrErr) {
switch (Kind) {
case object::Archive::K_BSD:
case object::Archive::K_GNU:
case object::Archive::K_GNU64:
Warn(ObjOrErr.takeError());
return nullptr;
case object::Archive::K_AIXBIG:
case object::Archive::K_COFF:
case object::Archive::K_DARWIN:
case object::Archive::K_DARWIN64:
return ObjOrErr.takeError();
}
}
return std::move(*ObjOrErr);
} else {
auto ObjOrErr = object::SymbolicFile::createSymbolicFile(Buf);
Expand Down Expand Up @@ -751,7 +780,7 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,
object::Archive::Kind Kind, bool Thin, bool Deterministic,
SymtabWritingMode NeedSymbols, SymMap *SymMap,
LLVMContext &Context, ArrayRef<NewArchiveMember> NewMembers,
std::optional<bool> IsEC) {
std::optional<bool> IsEC, function_ref<void(Error)> Warn) {
static char PaddingData[8] = {'\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n'};
uint64_t MemHeadPadSize = 0;
uint64_t Pos =
Expand Down Expand Up @@ -819,8 +848,10 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,

if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) {
for (const NewArchiveMember &M : NewMembers) {
Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
getSymbolicFile(M.Buf->getMemBufferRef(), Context);
Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr = getSymbolicFile(
M.Buf->getMemBufferRef(), Context, Kind, [&](Error Err) {
Warn(createFileError(M.MemberName, std::move(Err)));
});
if (!SymFileOrErr)
return createFileError(M.MemberName, SymFileOrErr.takeError());
SymFiles.push_back(std::move(*SymFileOrErr));
Expand Down Expand Up @@ -1001,7 +1032,8 @@ Error writeArchiveToStream(raw_ostream &Out,
ArrayRef<NewArchiveMember> NewMembers,
SymtabWritingMode WriteSymtab,
object::Archive::Kind Kind, bool Deterministic,
bool Thin, std::optional<bool> IsEC) {
bool Thin, std::optional<bool> IsEC,
function_ref<void(Error)> Warn) {
assert((!Thin || !isBSDLike(Kind)) && "Only the gnu format has a thin mode");

SmallString<0> SymNamesBuf;
Expand All @@ -1023,7 +1055,7 @@ Error writeArchiveToStream(raw_ostream &Out,

Expected<std::vector<MemberData>> DataOrErr = computeMemberData(
StringTable, SymNames, Kind, Thin, Deterministic, WriteSymtab,
isCOFFArchive(Kind) ? &SymMap : nullptr, Context, NewMembers, IsEC);
isCOFFArchive(Kind) ? &SymMap : nullptr, Context, NewMembers, IsEC, Warn);
if (Error E = DataOrErr.takeError())
return E;
std::vector<MemberData> &Data = *DataOrErr;
Expand Down Expand Up @@ -1266,19 +1298,23 @@ Error writeArchiveToStream(raw_ostream &Out,
return Error::success();
}

void warnToStderr(Error Err) {
llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(), "warning: ");
}

Error writeArchive(StringRef ArcName, ArrayRef<NewArchiveMember> NewMembers,
SymtabWritingMode WriteSymtab, object::Archive::Kind Kind,
bool Deterministic, bool Thin,
std::unique_ptr<MemoryBuffer> OldArchiveBuf,
std::optional<bool> IsEC) {
std::optional<bool> IsEC, function_ref<void(Error)> Warn) {
Expected<sys::fs::TempFile> Temp =
sys::fs::TempFile::create(ArcName + ".temp-archive-%%%%%%%.a");
if (!Temp)
return Temp.takeError();
raw_fd_ostream Out(Temp->FD, false);

if (Error E = writeArchiveToStream(Out, NewMembers, WriteSymtab, Kind,
Deterministic, Thin, IsEC)) {
Deterministic, Thin, IsEC, Warn)) {
if (Error DiscardError = Temp->discard())
return joinErrors(std::move(E), std::move(DiscardError));
return E;
Expand All @@ -1302,12 +1338,14 @@ Error writeArchive(StringRef ArcName, ArrayRef<NewArchiveMember> NewMembers,
Expected<std::unique_ptr<MemoryBuffer>>
writeArchiveToBuffer(ArrayRef<NewArchiveMember> NewMembers,
SymtabWritingMode WriteSymtab, object::Archive::Kind Kind,
bool Deterministic, bool Thin) {
bool Deterministic, bool Thin,
function_ref<void(Error)> Warn) {
SmallVector<char, 0> ArchiveBufferVector;
raw_svector_ostream ArchiveStream(ArchiveBufferVector);

if (Error E = writeArchiveToStream(ArchiveStream, NewMembers, WriteSymtab,
Kind, Deterministic, Thin, std::nullopt))
if (Error E =
writeArchiveToStream(ArchiveStream, NewMembers, WriteSymtab, Kind,
Deterministic, Thin, std::nullopt, Warn))
return std::move(E);

return std::make_unique<SmallVectorMemoryBuffer>(
Expand Down
44 changes: 37 additions & 7 deletions llvm/test/Object/archive-malformed-object.test
Original file line number Diff line number Diff line change
@@ -1,45 +1,75 @@
## Show that the archive library emits error messages when adding malformed
## objects.
## object files and skips symbol tables for "malformed" bitcode files, which
## are assumed to be bitcode files generated by compilers from the future.

# RUN: rm -rf %t.dir
# RUN: split-file %s %t.dir
# RUN: cd %t.dir

## Malformed bitcode object is the first file member of archive if the symbol table is required.
## Create a malformed bitcode object.
# RUN: llvm-as input.ll -o input.bc
# RUN: cp input.bc good.bc
# RUN: %python -c "with open('input.bc', 'a') as f: f.truncate(10)"
# RUN: not llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1

## Malformed bitcode object is the last file member of archive if the symbol table is required.
## Malformed bitcode objects either warn or error depending on the archive format
## (see switch in getSymbolicFile). If the archive was created with a warning,
## we want to check that the archive map is empty. llvm-nm will fail when it
## tries to read the malformed bitcode file, but it's supposed to print the
## archive map first, which in this case it won't because there won't be one.
# RUN: rm -rf bad.a
# RUN: not llvm-ar rc bad.a good.bc input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
# RUN: llvm-ar --format=bsd rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=WARN1
# RUN: not llvm-nm --print-armap bad.a | count 0
# RUN: rm -rf bad.a
# RUN: llvm-ar --format=gnu rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=WARN1
# RUN: not llvm-nm --print-armap bad.a | count 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2>&1 and check the stderr as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we only want to test that there is no archive map. The test happens to use the functionality of llvm-nm. We don't particularly care whether (how) llvm-nm fails afterwards, because this is not a test of llvm-nm. We need to use not but that's only necessary to prevent the llvm-nm failure from causing a test failure.

With 2>&1 we would need to test that the first line of llvm-nm output is something like

llvm-nm: error: bad.a(input.bc): Invalid bitcode signature

But I don't think there's a way to use FileCheck to check that the first line of output matches a pattern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit, I thought a CHECK-FIRST directive was added to FileCheck, to allow that, but evidently not. If you did want to check that something is the first line though, I think you can do something like this:

# CHECK: {{^}}
# CHECK-SAME: thing I'm interested in

This works, because the {{^}} matches the start of the line, and since every line has a start, it matches specifically the first line. The CHECK-SAME then pins the thing it's checking to the previously-matched line, i.e. the first line.

(I am ambivalent on whether you should do this for this specific case, but I think it may be worth a comment explaining why you're doing this llvm-nm invocation here either way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment. That's an interesting trick, but it's probably not worth it here.

# RUN: rm -rf bad.a
# RUN: not llvm-ar --format=bigarchive rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
# RUN: rm -rf bad.a
# RUN: not llvm-ar --format=coff rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
# RUN: rm -rf bad.a
# RUN: not llvm-ar --format=darwin rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1

## Malformed bitcode object is the last file member of archive and
## the symbol table is required. In this case we check that the
## symbol table contains entries for the good object only.
# RUN: rm -rf bad.a
# RUN: llvm-ar rc bad.a good.bc input.bc 2>&1 | FileCheck %s --check-prefix=WARN1
# RUN: not llvm-nm --print-armap bad.a | FileCheck %s --check-prefix=ARMAP

## Malformed bitcode object if the symbol table is not required for big archive.
## For big archives we print an error instead of a warning because the AIX linker
## presumably requires the index.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diggerlin, could you confirm that the big archive format does require the archive symbol index, please?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EsmeYi / @hubert-reinterpretcast are either of you able to advise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with XCOFF, but it presumably requires the index.
I believe all of PE/COFF, XCOFF, Mach-O linkers require the index.

ELF linkers like lld and mold have ignored the index completely (https://maskray.me/blog/2022-01-16-archives-and-start-lib). lld's wasm port has followed up, but other ports keep using the index. (I do want to help, but changing the other ports has a very low priority in my task list...)

# RUN: rm -rf bad.a
# RUN: not llvm-ar --format=bigarchive rcS bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
# RUN: rm -rf bad.a
# RUN: not llvm-ar --format=bigarchive rcS bad.a good.bc input.bc 2>&1 | FileCheck %s --check-prefix=ERR1

# ERR1: error: bad.a: 'input.bc': Invalid bitcode signature
# WARN1: warning: 'input.bc': Invalid bitcode signature

## Non-bitcode malformed file.
# RUN: yaml2obj input.yaml -o input.o
# RUN: not llvm-ar rc bad.a input.o 2>&1 | FileCheck %s --check-prefix=ERR2

# ERR2: error: bad.a: 'input.o': section header table goes past the end of the file: e_shoff = 0x9999

## Don't emit an error if the symbol table is not required for formats other than the big archive format.
# RUN: llvm-ar --format=gnu rcS good.a input.o input.bc
## Don't emit an error or warning if the symbol table is not required for formats other than the big archive format.
# RUN: llvm-ar --format=gnu rcS good.a input.o input.bc 2>&1 | count 0
# RUN: llvm-ar t good.a | FileCheck %s --check-prefix=CONTENTS

# CONTENTS: input.o
# CONTENTS-NEXT: input.bc

# ARMAP: Archive map
# ARMAP-NEXT: foo in good.bc
# ARMAP-EMPTY:

#--- input.ll
target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux"

@foo = global i32 1

#--- input.yaml
--- !ELF
FileHeader:
Expand Down
Loading