Skip to content

[LLD][COFF] Add support for ARM64EC import call thunks. #107931

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions lld/COFF/Chunks.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1093,4 +1093,26 @@ void CHPERedirectionChunk::writeTo(uint8_t *buf) const {
}
}

ImportThunkChunkARM64EC::ImportThunkChunkARM64EC(ImportFile *file)
: ImportThunkChunk(file->ctx, file->impSym), file(file) {}

void ImportThunkChunkARM64EC::writeTo(uint8_t *buf) const {
memcpy(buf, importThunkARM64EC, sizeof(importThunkARM64EC));
applyArm64Addr(buf, file->impSym->getRVA(), rva, 12);
applyArm64Ldr(buf + 4, file->impSym->getRVA() & 0xfff);

// The exit thunk may be missing. This can happen if the application only
// references a function by its address (in which case the thunk is never
// actually used, but is still required to fill the auxiliary IAT), or in
// cases of hand-written assembly calling an imported ARM64EC function (where
// the exit thunk is ignored by __icall_helper_arm64ec). In such cases, MSVC
// link.exe uses 0 as the RVA.
uint32_t exitThunkRVA = exitThunk ? exitThunk->getRVA() : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is 0, won't this end up with a rather bogus instruction sequence? Shouldn't we replace the adrp+add with a mov x0, #0 or similar instead, in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this behavior would be bogus. I followed MSVC's approach in these cases and implemented it the same way (since it's an RVA, it sets x10 to the image base, which requires addrp + add). We could either zero x10 or leave it unchanged in these cases. It's unlikely to matter for the OS loader, so I can make this change if you prefer.

Emitting a warning in such scenarios might be tempting, but we can’t do that because there are valid situations where an exit thunk may be missing. For example, if an application only ever takes the function’s address and never calls it directly via the import table. In these cases, the application uses __imp_aux_* symbols instead of the regular __imp_* symbols. We still need to populate the auxiliary IAT, which requires those thunks, even if they’re never used.

If we tracked __imp_ and __imp_aux_ usage separately, we might be able to skip the thunk and fill the auxiliary IAT with zeroes or something similar. However, since this behavior isn’t documented, skipping thunks would require ensuring the OS loader can handle it in all cases, which is tricky to verify with black-box testing. Therefore, it feels safer to follow MSVC’s behavior.

Another situation where exit thunks may not be available is when hand-written assembly is used to make the call. In such cases, if the callee is ARM64EC, it should still work because the thunk would usually be skipped. It’s possible the thunk wouldn’t be skipped (for example, if patching is detected in the callee’s area but not the callee itself). If the call goes through the thunk, __icall_helper_arm64ec wouldn’t use an exit thunk if the callee is ARM64EC. It would only fail if the callee isn’t ARM64EC, which the developer may know will never be the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough. So if the symbol is missing, we end up with RVA 0, which ends up in producing an adrp+add pair that produces the image base address, which is passed to __icall_helper_arm64ec, which wouldn't use it in this case - that sounds ok with me.

Perhaps a comment, saying that this 0 case can happen, but in these cases, the address ends up ignored - if I'm understanding the issue correctly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it may happen in valid cases, but be either ignored by __icall_helper_arm64ec or the thole thunk may be unused/skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment, thanks!

applyArm64Addr(buf + 8, exitThunkRVA, rva + 8, 12);
applyArm64Imm(buf + 12, exitThunkRVA & 0xfff, 0);

Defined *helper = cast<Defined>(file->ctx.config.arm64ECIcallHelper);
applyArm64Branch26(buf + 16, helper->getRVA() - rva - 16);
}

} // namespace lld::coff
24 changes: 24 additions & 0 deletions lld/COFF/Chunks.h
Original file line number Diff line number Diff line change
Expand Up @@ -544,6 +544,14 @@ static const uint8_t importThunkARM64[] = {
0x00, 0x02, 0x1f, 0xd6, // br x16
};

static const uint32_t importThunkARM64EC[] = {
0x9000000b, // adrp x11, 0x0
0xf940016b, // ldr x11, [x11]
0x9000000a, // adrp x10, 0x0
0x9100014a, // add x10, x10, #0x0
0x14000000 // b 0x0
};

// Windows-specific.
// A chunk for DLL import jump table entry. In a final output, its
// contents will be a JMP instruction to some __imp_ symbol.
Expand Down Expand Up @@ -599,6 +607,22 @@ class ImportThunkChunkARM64 : public ImportThunkChunk {
MachineTypes getMachine() const override { return ARM64; }
};

// ARM64EC __impchk_* thunk implementation.
// Performs an indirect call to an imported function pointer
// using the __icall_helper_arm64ec helper function.
class ImportThunkChunkARM64EC : public ImportThunkChunk {
public:
explicit ImportThunkChunkARM64EC(ImportFile *file);
size_t getSize() const override { return sizeof(importThunkARM64EC); };
MachineTypes getMachine() const override { return ARM64EC; }
void writeTo(uint8_t *buf) const override;

Defined *exitThunk;

private:
ImportFile *file;
};

class RangeExtensionThunkARM : public NonSectionCodeChunk {
public:
explicit RangeExtensionThunkARM(COFFLinkerContext &ctx, Defined *t)
Expand Down
1 change: 1 addition & 0 deletions lld/COFF/Config.h
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,7 @@ struct Configuration {
std::set<std::string> delayLoads;
std::map<std::string, int> dllOrder;
Symbol *delayLoadHelper = nullptr;
Symbol *arm64ECIcallHelper = nullptr;

bool saveTemps = false;

Expand Down
7 changes: 6 additions & 1 deletion lld/COFF/Driver.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1383,6 +1383,11 @@ void LinkerDriver::createECExportThunks() {
}
}

void LinkerDriver::pullArm64ECIcallHelper() {
if (!ctx.config.arm64ECIcallHelper)
ctx.config.arm64ECIcallHelper = addUndefined("__icall_helper_arm64ec");
}

// In MinGW, if no symbols are chosen to be exported, then all symbols are
// automatically exported by default. This behavior can be forced by the
// -export-all-symbols option, so that it happens even when exports are
Expand Down Expand Up @@ -2685,7 +2690,7 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
if (auto *arg = args.getLastArg(OPT_print_symbol_order))
config->printSymbolOrder = arg->getValue();

ctx.symtab.initializeEntryThunks();
ctx.symtab.initializeECThunks();

// Identify unreferenced COMDAT sections.
if (config->doGC) {
Expand Down
2 changes: 2 additions & 0 deletions lld/COFF/Driver.h
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,8 @@ class LinkerDriver {

std::unique_ptr<llvm::TarWriter> tar; // for /linkrepro

void pullArm64ECIcallHelper();

private:
// Searches a file from search paths.
std::optional<StringRef> findFileIfNew(StringRef filename);
Expand Down
7 changes: 7 additions & 0 deletions lld/COFF/InputFiles.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,8 @@ void ObjFile::initializeECThunks() {
ctx.symtab.addEntryThunk(getSymbol(entry->src), getSymbol(entry->dst));
break;
case Arm64ECThunkType::Exit:
ctx.symtab.addExitThunk(getSymbol(entry->src), getSymbol(entry->dst));
break;
case Arm64ECThunkType::GuestExit:
break;
default:
Expand Down Expand Up @@ -1088,6 +1090,11 @@ void ImportFile::parse() {
thunkSym = ctx.symtab.addImportThunk(
name, impSym, make<ImportThunkChunkX64>(ctx, impSym));
// FIXME: Add aux IAT symbols.

StringRef impChkName = saver().save("__impchk_" + name);
impchkThunk = make<ImportThunkChunkARM64EC>(this);
ctx.symtab.addImportThunk(impChkName, impSym, impchkThunk);
ctx.driver.pullArm64ECIcallHelper();
}
}
}
Expand Down
2 changes: 2 additions & 0 deletions lld/COFF/InputFiles.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ class DefinedImportData;
class DefinedImportThunk;
class DefinedRegular;
class ImportThunkChunk;
class ImportThunkChunkARM64EC;
class SectionChunk;
class Symbol;
class Undefined;
Expand Down Expand Up @@ -349,6 +350,7 @@ class ImportFile : public InputFile {

DefinedImportData *impSym = nullptr;
Symbol *thunkSym = nullptr;
ImportThunkChunkARM64EC *impchkThunk = nullptr;
std::string dllName;

private:
Expand Down
22 changes: 16 additions & 6 deletions lld/COFF/MarkLive.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,13 +43,23 @@ void markLive(COFFLinkerContext &ctx) {
worklist.push_back(c);
};

auto addSym = [&](Symbol *b) {
if (auto *sym = dyn_cast<DefinedRegular>(b))
std::function<void(Symbol *)> addSym;

auto addImportFile = [&](ImportFile *file) {
file->live = true;
if (file->impchkThunk && file->impchkThunk->exitThunk)
addSym(file->impchkThunk->exitThunk);
};

addSym = [&](Symbol *b) {
if (auto *sym = dyn_cast<DefinedRegular>(b)) {
enqueue(sym->getChunk());
else if (auto *sym = dyn_cast<DefinedImportData>(b))
sym->file->live = true;
else if (auto *sym = dyn_cast<DefinedImportThunk>(b))
sym->wrappedSym->file->live = sym->wrappedSym->file->thunkLive = true;
} else if (auto *sym = dyn_cast<DefinedImportData>(b)) {
addImportFile(sym->file);
} else if (auto *sym = dyn_cast<DefinedImportThunk>(b)) {
addImportFile(sym->wrappedSym->file);
sym->wrappedSym->file->thunkLive = true;
}
};

// Add GC root chunks.
Expand Down
16 changes: 15 additions & 1 deletion lld/COFF/SymbolTable.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -557,7 +557,11 @@ void SymbolTable::addEntryThunk(Symbol *from, Symbol *to) {
entryThunks.push_back({from, to});
}

void SymbolTable::initializeEntryThunks() {
void SymbolTable::addExitThunk(Symbol *from, Symbol *to) {
exitThunks[from] = to;
}

void SymbolTable::initializeECThunks() {
for (auto it : entryThunks) {
auto *to = dyn_cast<Defined>(it.second);
if (!to)
Expand All @@ -573,6 +577,16 @@ void SymbolTable::initializeEntryThunks() {
}
from->getChunk()->setEntryThunk(to);
}

for (ImportFile *file : ctx.importFileInstances) {
if (!file->impchkThunk)
continue;

Symbol *sym = exitThunks.lookup(file->thunkSym);
if (!sym)
sym = exitThunks.lookup(file->impSym);
file->impchkThunk->exitThunk = dyn_cast_or_null<Defined>(sym);
}
}

Symbol *SymbolTable::addUndefined(StringRef name, InputFile *f,
Expand Down
4 changes: 3 additions & 1 deletion lld/COFF/SymbolTable.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,8 @@ class SymbolTable {
ImportThunkChunk *chunk);
void addLibcall(StringRef name);
void addEntryThunk(Symbol *from, Symbol *to);
void initializeEntryThunks();
void addExitThunk(Symbol *from, Symbol *to);
void initializeECThunks();

void reportDuplicate(Symbol *existing, InputFile *newFile,
SectionChunk *newSc = nullptr,
Expand Down Expand Up @@ -141,6 +142,7 @@ class SymbolTable {
std::unique_ptr<BitcodeCompiler> lto;
bool ltoCompilationDone = false;
std::vector<std::pair<Symbol *, Symbol *>> entryThunks;
llvm::DenseMap<Symbol *, Symbol *> exitThunks;

COFFLinkerContext &ctx;
};
Expand Down
2 changes: 2 additions & 0 deletions lld/COFF/Writer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1248,6 +1248,8 @@ void Writer::appendImportThunks() {
DefinedImportThunk *thunk = cast<DefinedImportThunk>(file->thunkSym);
if (file->thunkLive)
textSec->addChunk(thunk->getChunk());
if (file->impchkThunk)
textSec->addChunk(file->impchkThunk);
}

if (!delayIdata.empty()) {
Expand Down
2 changes: 2 additions & 0 deletions lld/test/COFF/Inputs/loadconfig-arm64ec.s
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ __os_arm64x_dispatch_ret:
.xword 0
__os_arm64x_check_call:
.xword 0
.globl __os_arm64x_dispatch_icall
__os_arm64x_dispatch_icall:
__os_arm64x_check_icall:
.xword 0
__os_arm64x_get_x64_information:
Expand Down
84 changes: 74 additions & 10 deletions lld/test/COFF/arm64ec-import.test
Original file line number Diff line number Diff line change
Expand Up @@ -2,50 +2,73 @@ REQUIRES: aarch64, x86
RUN: split-file %s %t.dir && cd %t.dir

RUN: llvm-mc -filetype=obj -triple=arm64ec-windows test.s -o test.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows icall.s -o icall.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows hybmp.s -o hybmp.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows %S/Inputs/loadconfig-arm64ec.s -o loadconfig-arm64ec.obj
RUN: llvm-lib -machine:arm64ec -def:test.def -out:test-arm64ec.lib
RUN: llvm-lib -machine:arm64ec -def:test2.def -out:test2-arm64ec.lib
RUN: llvm-lib -machine:x64 -def:test.def -out:test-x86_64.lib

Link using ARM64EC import library:
RUN: lld-link -machine:arm64ec -dll -noentry -out:out.dll loadconfig-arm64ec.obj \
RUN: lld-link -machine:arm64ec -dll -noentry -out:out.dll loadconfig-arm64ec.obj icall.obj hybmp.obj \
RUN: test.obj test-arm64ec.lib test2-arm64ec.lib

Link using x86_64 import library:
RUN: lld-link -machine:arm64ec -dll -noentry -out:out2.dll loadconfig-arm64ec.obj \
RUN: lld-link -machine:arm64ec -dll -noentry -out:out2.dll loadconfig-arm64ec.obj icall.obj hybmp.obj \
RUN: test.obj test-x86_64.lib test2-arm64ec.lib

RUN: llvm-readobj --coff-imports out.dll | FileCheck --check-prefix=IMPORTS %s
RUN: llvm-readobj --coff-imports out2.dll | FileCheck --check-prefix=IMPORTS %s
IMPORTS: Import {
IMPORTS-NEXT: Name: test.dll
IMPORTS-NEXT: ImportLookupTableRVA:
IMPORTS-NEXT: ImportAddressTableRVA: 0x2000
IMPORTS-NEXT: ImportAddressTableRVA: 0x3000
IMPORTS-NEXT: Symbol: data (0)
IMPORTS-NEXT: Symbol: func (0)
IMPORTS-NEXT: Symbol: func2 (0)
IMPORTS-NEXT: }
IMPORTS-NEXT: Import {
IMPORTS-NEXT: Name: test2.dll
IMPORTS-NEXT: ImportLookupTableRVA:
IMPORTS-NEXT: ImportAddressTableRVA: 0x2020
IMPORTS-NEXT: ImportAddressTableRVA: 0x3020
IMPORTS-NEXT: Symbol: t2func (0)
IMPORTS-NEXT: }

RUN: llvm-objdump -d out.dll | FileCheck --check-prefix=DISASM %s
RUN: llvm-objdump -d out2.dll | FileCheck --check-prefix=DISASM %s

DISASM: 0000000180001000 <.text>:
DISASM-NEXT: 180001000: ff 25 02 10 00 00 jmpq *0x1002(%rip) # 0x180002008
DISASM: 180001000: 52800000 mov w0, #0x0 // =0
DISASM-NEXT: 180001004: d65f03c0 ret
DISASM-NEXT: 180001008: d000000b adrp x11, 0x180003000
DISASM-NEXT: 18000100c: f940056b ldr x11, [x11, #0x8]
DISASM-NEXT: 180001010: 9000000a adrp x10, 0x180001000 <.text>
DISASM-NEXT: 180001014: 9101114a add x10, x10, #0x44
DISASM-NEXT: 180001018: 17fffffa b 0x180001000 <.text>
DISASM-NEXT: 18000101c: d000000b adrp x11, 0x180003000
DISASM-NEXT: 180001020: f940096b ldr x11, [x11, #0x10]
DISASM-NEXT: 180001024: f0ffffea adrp x10, 0x180000000
DISASM-NEXT: 180001028: 9100014a add x10, x10, #0x0
DISASM-NEXT: 18000102c: 17fffff5 b 0x180001000 <.text>
DISASM-NEXT: 180001030: d000000b adrp x11, 0x180003000
DISASM-NEXT: 180001034: f940116b ldr x11, [x11, #0x20]
DISASM-NEXT: 180001038: 9000000a adrp x10, 0x180001000 <.text>
DISASM-NEXT: 18000103c: 9101314a add x10, x10, #0x4c
DISASM-NEXT: 180001040: 17fffff0 b 0x180001000 <.text>
DISASM-NEXT: 180001044: 52800020 mov w0, #0x1 // =1
DISASM-NEXT: 180001048: d65f03c0 ret
DISASM-NEXT: 18000104c: 52800040 mov w0, #0x2 // =2
DISASM-NEXT: 180001050: d65f03c0 ret
DISASM-NEXT: ...
DISASM-NEXT: 180002000: ff 25 02 10 00 00 jmpq *0x1002(%rip) # 0x180003008

RUN: llvm-readobj --hex-dump=.test out.dll | FileCheck --check-prefix=TESTSEC %s
RUN: llvm-readobj --hex-dump=.test out2.dll | FileCheck --check-prefix=TESTSEC %s
TESTSEC: 0x180005000 08200000 00200000 10200000 20200000
TESTSEC-NEXT: 0x180005010 00100000
TESTSEC: 0x180006000 08300000 00300000 10300000 20300000
TESTSEC-NEXT: 0x180006010 08100000 1c100000 00200000

RUN: llvm-readobj --headers out.dll | FileCheck -check-prefix=HEADERS %s
HEADERS: LoadConfigTableRVA: 0x3008
HEADERS: IATRVA: 0x2000
HEADERS: LoadConfigTableRVA: 0x4010
HEADERS: IATRVA: 0x3000
HEADERS: IATSize: 0x1000

#--- test.s
Expand All @@ -57,8 +80,49 @@ arm64ec_data_sym:
.rva __imp_data
.rva __imp_func2
.rva __imp_t2func
.rva __impchk_func
.rva __impchk_func2
.rva func

#--- icall.s
.text
.globl __icall_helper_arm64ec
.p2align 2, 0x0
__icall_helper_arm64ec:
mov w0, #0
ret

#--- hybmp.s
.section .hybmp$x, "yi"
// __imp_func exit thunk is ignored when func is defined as well
.symidx __imp_func
.symidx dead_exit_thunk
.word 4
.symidx func
.symidx func_exit_thunk
.word 4
.symidx __imp_t2func
.symidx t2func_exit_thunk
.word 4

.section .wowthk$aa,"xr",discard,func_exit_thunk
.globl func_exit_thunk
func_exit_thunk:
mov w0, #1
ret

.section .wowthk$aa,"xr",discard,t2func_exit_thunk
.globl t2func_exit_thunk
t2func_exit_thunk:
mov w0, #2
ret

.section .wowthk$aa,"xr",discard,dead_exit_thunk
.globl dead_exit_thunk
dead_exit_thunk:
mov w0, #0xdead
ret

#--- test.def
NAME test.dll
EXPORTS
Expand Down
Loading