Skip to content

[LLD][COFF] Generate redirection metadata for custom ARM64EC export thunks. #105901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 26, 2024

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Aug 23, 2024

This allows using custom export thunks instead of default generated ones. This is useful for performance in cases where transfering between JIT and ARM64EC code is more expensive than just emulating the whole function (but it's still useful to have ARM64EC version so that ARM64EC callers don't call into the emulator). It's also useful for compatibility, where applications have specific expectations about function contents (like syscall functions).

@llvmbot
Copy link
Member

llvmbot commented Aug 23, 2024

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld-coff

Author: Jacek Caban (cjacek)

Changes

This allows using custom export thunks instead of default generated ones. This is useful for performance in cases where transfering between JIT and ARM64EC code is more expensive than just emulating the whole function (but it's still useful to have ARM64EC version so that ARM64EC callers don't call into the emulator). It's also useful for compatibility, where applications have specific expectations about function contents (like syscall functions).


Full diff: https://github.com/llvm/llvm-project/pull/105901.diff

2 Files Affected:

  • (modified) lld/COFF/Writer.cpp (+14)
  • (added) lld/test/COFF/arm64ec-cust-export-thunk.s (+82)
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 0360e186ecf0cf..5fc05d7556f4f8 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -2062,6 +2062,20 @@ void Writer::createECChunks() {
     if (auto thunk = dyn_cast<ECExportThunkChunk>(sym->getChunk())) {
       hexpthkSec->addChunk(thunk);
       exportThunks.push_back({thunk, thunk->target});
+    } else if (auto def = dyn_cast<DefinedRegular>(sym)) {
+      // Allow section chunk to be treated as an export thunk if it looks like
+      // one.
+      SectionChunk *chunk = def->getChunk();
+      if (!chunk->live || chunk->getMachine() != AMD64)
+        continue;
+      assert(sym->getName().starts_with("EXP+"));
+      StringRef targetName = sym->getName().substr(strlen("EXP+"));
+      Symbol *targetSym = ctx.symtab.find((targetName + "$hp_target").str());
+      if (!targetSym)
+        targetSym = ctx.symtab.find(targetName);
+      Defined *t = dyn_cast_or_null<Defined>(targetSym);
+      if (t && isArm64EC(t->getChunk()->getMachine()))
+        exportThunks.push_back({chunk, t});
     }
   }
 
diff --git a/lld/test/COFF/arm64ec-cust-export-thunk.s b/lld/test/COFF/arm64ec-cust-export-thunk.s
new file mode 100644
index 00000000000000..e56f34c8752173
--- /dev/null
+++ b/lld/test/COFF/arm64ec-cust-export-thunk.s
@@ -0,0 +1,82 @@
+# REQUIRES: aarch64, x86
+# RUN: split-file %s %t.dir && cd %t.dir
+
+# Test that metadata is generated when a custom export thunk is supplied.
+
+# RUN: llvm-mc -filetype=obj -triple=arm64ec-windows func.s -o func.obj
+# RUN: llvm-mc -filetype=obj -triple=arm64ec-windows hp-func.s -o hp-func.obj
+# RUN: llvm-mc -filetype=obj -triple=x86_64-windows thunk.s -o thunk.obj
+# RUN: llvm-mc -filetype=obj -triple=arm64ec-windows %S/Inputs/loadconfig-arm64ec.s -o loadconfig-arm64ec.obj
+
+# RUN: lld-link -out:out.dll -machine:arm64ec func.obj thunk.obj loadconfig-arm64ec.obj -dll -noentry "-export:#func,EXPORTAS,func"
+
+# RUN: llvm-objdump -d out.dll | FileCheck --check-prefixes=DISASM,DISASM-EXP %s
+# DISASM:      Disassembly of section .text:
+# DISASM-EMPTY:
+# DISASM-NEXT: 0000000180001000 <.text>:
+# DISASM-NEXT: 180001000: 52800040     mov     w0, #0x2                // =2
+# DISASM-NEXT: 180001004: d65f03c0     ret
+# DISASM-NEXT:                 ...
+# DISASM-EXP-EMPTY:
+# DISASM-EXP-NEXT: 0000000180002000 <func>:
+# DISASM-NEXT: 180002000: b8 03 00 00 00               movl    $0x3, %eax
+# DISASM-NEXT: 180002005: c3                           retq
+
+# RUN: llvm-objdump -p out.dll | FileCheck --check-prefix=EXPORT %s
+# EXPORT:      Ordinal      RVA  Name
+# EXPORT-NEXT:       1   0x2000  func
+
+# RUN: llvm-readobj --coff-load-config out.dll | FileCheck --check-prefix=CHPE %s
+# CHPE:       CodeMap [
+# CHPE-NEXT:    0x1000 - 0x1008  ARM64EC
+# CHPE-NEXT:    0x2000 - 0x2006  X64
+# CHPE-NEXT:  ]
+# CHPE-NEXT:  CodeRangesToEntryPoints [
+# CHPE-NEXT:    0x2000 - 0x2006 -> 0x2000
+# CHPE-NEXT:  ]
+# CHPE-NEXT:  RedirectionMetadata [
+# CHPE-NEXT:    0x2000 -> 0x1000
+# CHPE-NEXT:  ]
+
+# RUN: lld-link -out:out2.dll -machine:arm64ec hp-func.obj thunk.obj loadconfig-arm64ec.obj -dll -noentry
+# RUN: llvm-objdump -d out2.dll | FileCheck --check-prefix=DISASM %s
+# RUN: llvm-readobj --coff-load-config out2.dll | FileCheck --check-prefix=CHPE %s
+
+#--- func.s
+    .globl "#func"
+    .p2align 2, 0x0
+"#func":
+    mov w0, #2
+    ret
+
+#--- hp-func.s
+    .section .text,"xr",discard,"#func$hp_target"
+    .globl "#func$hp_target"
+    .p2align 2, 0x0
+"#func$hp_target":
+    mov w0, #2
+    ret
+
+    .def "EXP+#func"
+    .scl 2
+    .type 32
+    .endef
+    .weak func
+.set func, "EXP+#func"
+    .weak "#func"
+.set "#func", "#func$hp_target"
+
+    .data
+    .rva func
+
+#--- thunk.s
+    .def "EXP+#func"
+    .scl 2
+    .type 32
+    .endef
+    .section .wowthk$aa,"xr",discard,"EXP+#func"
+    .globl "EXP+#func"
+    .p2align 2, 0x0
+"EXP+#func":
+    movl $3, %eax
+    retq

@cjacek
Copy link
Contributor Author

cjacek commented Aug 23, 2024

A more detailed explanation of a scenario where this impacts performance can be found in this blog article (search for "EXP+#").

During my experimentation, I had limited success trying to get the MSVC linker to generate executables with the required thunks. While it is possible to pass these thunks similarly to how they are handled in tests, this approach seems less flexible than the implementation I’ve provided here. For instance, exporting the thunks leads to a duplicated symbol error, which I worked around by using the DATA export type to prevent the linker from generating another thunk. However, I wasn’t able to find a way to control the redirection target directly. It appears that MSVC linker reads the contents at an offset expected in EXExportThunkCode and blindly uses that as if it were a generated thunk.

This patch, on the other hand, uses symbol lookup, ensuring consistent behavior akin to how generated code would function. It identifies and treats any thunk chunk resembling an export thunk accordingly.

It depends on #105897 for a corner case when EXP+ symbol is a weak alias.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks!

StringRef targetName = sym->getName().substr(strlen("EXP+"));
Symbol *targetSym = ctx.symtab.find((targetName + "$hp_target").str());
if (!targetSym)
targetSym = ctx.symtab.find(targetName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here, that for an export thunk EXP+#foo, we're first looking for #foo$hp_target, but secondarily also for #foo. (I was wondering about this and scratching my head, overlooking the fact that we do look for the unsuffixed version two lines below...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I added a comment and merged. Thanks for reviews!

…hunks.

This allows using custom export thunks instead of default generated ones.
This is useful for performance in cases where transfering between JIT and
ARM64EC code is more expensive than just emulating the whole function (but
it's still useful to have ARM64EC version so that ARM64EC callers don't
call into the emulator). It's also useful for compatibility, where
applications have specific expectations about function contents (like
syscall functions).
@cjacek cjacek force-pushed the arm64ec-exp-thunk branch from 2c7be0a to 539517c Compare August 26, 2024 18:38
@cjacek cjacek merged commit a35398d into llvm:main Aug 26, 2024
6 of 7 checks passed
@cjacek cjacek deleted the arm64ec-exp-thunk branch August 26, 2024 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants