[LLD][AArch64] Mark .plt with PURECODE flag if all input sections also have it #132224

Closed
wants to merge 2 commits into from

Conversation

Il-Capitano
Contributor

Mark the synthetic .plt section with the SHF_AARCH64_PURECODE section flag if all executable input sections also have that flag.

Without this change, if we compile a binary with -mexecute-only, .plt is the only executable section in the final executable that lacks the section flag, so it is placed in a separate load segment. This unnecessarily costs an extra page of memory when running the executable.

A similar issue occurs if we always set the section flag on .plt and compile a binary without -mexecute-only, so the SHF_AARCH64_PURECODE flag on .plt should match that of all other executable sections.
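
As a minimal sketch of the matching rule described above (a model on plain flag words, not the LLD implementation), the check could look like the following. The helper name `pltShouldBePurecode` is hypothetical, and the numeric value used for `SHF_AARCH64_PURECODE` is an assumption that should be checked against LLVM's ELF headers:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ELF section flag bits. The SHF_AARCH64_PURECODE value is assumed here.
constexpr uint64_t SHF_ALLOC = 0x2;
constexpr uint64_t SHF_EXECINSTR = 0x4;
constexpr uint64_t SHF_AARCH64_PURECODE = 0x20000000;

// Hypothetical helper: returns true when .plt should carry the PURECODE flag,
// i.e. when no executable input section lacks it. Non-executable sections are
// ignored, and an empty input list yields true (same as all_of on an empty range).
bool pltShouldBePurecode(const std::vector<uint64_t> &inputSectionFlags) {
  for (uint64_t flags : inputSectionFlags)
    if ((flags & SHF_EXECINSTR) && !(flags & SHF_AARCH64_PURECODE))
      return false;
  return true;
}
```

With this model, a link whose executable inputs are all `"axy"` sections gives a PURECODE `.plt`, while a single `"ax"` section is enough to leave the flag off.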

@llvmbot
Member

llvmbot commented Mar 20, 2025

@llvm/pr-subscribers-lld-elf

Author: Csanád Hajdú (Il-Capitano)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/132224.diff

2 Files Affected:

  • (modified) lld/ELF/SyntheticSections.cpp (+12)
  • (added) lld/test/ELF/aarch64-execute-only-plt.s (+115)
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index b03c4282ab1aa..a7ff8ed9b16d1 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -2610,6 +2610,18 @@ PltSection::PltSection(Ctx &ctx)
   // modify the instructions in the PLT entries.
   if (ctx.arg.emachine == EM_SPARCV9)
     this->flags |= SHF_WRITE;
+
+  // On AArch64, PLT entries only do loads from the .got.plt section, so the
+  // .plt section can be marked with the SHF_AARCH64_PURECODE section flag. We
+  // only do this if all other executable sections also have the same section
+  // flag set, because otherwise .plt can't be allocated in the same segment as
+  // the other executable sections.
+  if (ctx.arg.emachine == EM_AARCH64 &&
+      all_of(ctx.inputSections, [](InputSectionBase *sec) {
+        return !(sec->flags & SHF_EXECINSTR) ||
+               (sec->flags & SHF_AARCH64_PURECODE);
+      }))
+    this->flags |= SHF_AARCH64_PURECODE;
 }
 
 void PltSection::writeTo(uint8_t *buf) {
diff --git a/lld/test/ELF/aarch64-execute-only-plt.s b/lld/test/ELF/aarch64-execute-only-plt.s
new file mode 100644
index 0000000000000..08e69fba8fb0c
--- /dev/null
+++ b/lld/test/ELF/aarch64-execute-only-plt.s
@@ -0,0 +1,115 @@
+// REQUIRES: aarch64
+// RUN: rm -rf %t && split-file %s %t && cd %t
+
+// RUN: llvm-mc -filetype=obj -triple=aarch64 start.s -o start.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-xo-same-section.s -o foo-xo-same-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-rx-same-section.s -o foo-rx-same-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-xo-different-section.s -o foo-xo-different-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-rx-different-section.s -o foo-rx-different-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 %p/Inputs/plt-aarch64.s -o plt.o
+// RUN: ld.lld -shared plt.o -soname=t2.so -o plt.so
+// RUN: ld.lld start.o foo-xo-same-section.o plt.so -o xo-same-section
+// RUN: ld.lld start.o foo-rx-same-section.o plt.so -o rx-same-section
+// RUN: ld.lld start.o foo-xo-different-section.o plt.so -o xo-different-section
+// RUN: ld.lld start.o foo-rx-different-section.o plt.so -o rx-different-section
+// RUN: llvm-readobj -S -l xo-same-section | FileCheck --check-prefix=CHECK-XO %s
+// RUN: llvm-readobj -S -l rx-same-section | FileCheck --check-prefix=CHECK-RX %s
+// RUN: llvm-readobj -S -l xo-different-section | FileCheck --check-prefix=CHECK-XO %s
+// RUN: llvm-readobj -S -l rx-different-section | FileCheck --check-prefix=CHECK-RX %s
+// RUN: llvm-objdump -d --no-show-raw-insn xo-same-section | FileCheck --check-prefix=DISASM %s
+// RUN: llvm-objdump -d --no-show-raw-insn rx-same-section | FileCheck --check-prefix=DISASM %s
+// RUN: llvm-objdump -d --no-show-raw-insn xo-different-section | FileCheck --check-prefix=DISASM %s
+// RUN: llvm-objdump -d --no-show-raw-insn rx-different-section | FileCheck --check-prefix=DISASM %s
+
+// CHECK-XO:         Name: .plt
+// CHECK-XO-NEXT:    Type: SHT_PROGBITS
+// CHECK-XO-NEXT:    Flags [
+// CHECK-XO-NEXT:      SHF_AARCH64_PURECODE
+// CHECK-XO-NEXT:      SHF_ALLOC
+// CHECK-XO-NEXT:      SHF_EXECINSTR
+// CHECK-XO-NEXT:    ]
+// CHECK-XO-NEXT:    Address: 0x2102E0
+
+/// The address of .plt above should be within this program header.
+// CHECK-XO:         VirtualAddress: 0x2102C8
+// CHECK-XO-NEXT:    PhysicalAddress: 0x2102C8
+// CHECK-XO-NEXT:    FileSize: 88
+// CHECK-XO-NEXT:    MemSize: 88
+// CHECK-XO-NEXT:    Flags [
+// CHECK-XO-NEXT:      PF_X
+// CHECK-XO-NEXT:    ]
+
+// CHECK-RX:         Name: .plt
+// CHECK-RX-NEXT:    Type: SHT_PROGBITS
+// CHECK-RX-NEXT:    Flags [
+// CHECK-RX-NEXT:      SHF_ALLOC
+// CHECK-RX-NEXT:      SHF_EXECINSTR
+// CHECK-RX-NEXT:    ]
+// CHECK-RX-NEXT:    Address: 0x2102E0
+
+/// The address of .plt above should be within this program header.
+// CHECK-RX:         VirtualAddress: 0x2102C8
+// CHECK-RX-NEXT:    PhysicalAddress: 0x2102C8
+// CHECK-RX-NEXT:    FileSize: 88
+// CHECK-RX-NEXT:    MemSize: 88
+// CHECK-RX-NEXT:    Flags [
+// CHECK-RX-NEXT:      PF_R
+// CHECK-RX-NEXT:      PF_X
+// CHECK-RX-NEXT:    ]
+
+// DISASM-LABEL: Disassembly of section .plt:
+// DISASM-LABEL: <.plt>:
+// DISASM-NEXT:  2102e0: stp  x16, x30, [sp, #-0x10]!
+// DISASM-NEXT:          adrp x16, 0x230000 <weak+0x230000>
+// DISASM-NEXT:          ldr  x17, [x16, #0x400]
+// DISASM-NEXT:          add  x16, x16, #0x400
+// DISASM-NEXT:          br   x17
+// DISASM-NEXT:          nop
+// DISASM-NEXT:          nop
+// DISASM-NEXT:          nop
+
+// DISASM-LABEL: <bar@plt>:
+// DISASM-NEXT:  210300: adrp x16, 0x230000 <weak+0x230000>
+// DISASM-NEXT:          ldr  x17, [x16, #0x408]
+// DISASM-NEXT:          add  x16, x16, #0x408
+// DISASM-NEXT:          br   x17
+
+// DISASM-LABEL: <weak@plt>:
+// DISASM-NEXT:  210310: adrp x16, 0x230000 <weak+0x230000>
+// DISASM-NEXT:          ldr  x17, [x16, #0x410]
+// DISASM-NEXT:          add  x16, x16, #0x410
+// DISASM-NEXT:          br   x17
+
+//--- start.s
+.section .text,"axy",@progbits,unique,0
+.global _start, foo, bar
+.weak weak
+_start:
+  bl foo
+  bl bar
+  bl weak
+  ret
+
+//--- foo-xo-same-section.s
+.section .text,"axy",@progbits,unique,0
+.global foo
+foo:
+  ret
+
+//--- foo-rx-same-section.s
+.section .text,"ax",@progbits,unique,0
+.global foo
+foo:
+  ret
+
+//--- foo-xo-different-section.s
+.section .foo,"axy",@progbits,unique,0
+.global foo
+foo:
+  ret
+
+//--- foo-rx-different-section.s
+.section .foo,"ax",@progbits,unique,0
+.global foo
+foo:
+  ret

@Il-Capitano
Contributor Author

I had some concern about the performance impact of looping through all input sections when creating the .plt section, so I did some measurements of building Clang with -mexecute-only -ffunction-sections on an AArch64 machine (I verified that the final binary has the correct flags set).

I couldn't measure any time difference in the linking step between the old and new versions; it was within noise. So I don't think the performance of the loop is a concern. In the common, non-execute-only case, short-circuiting in all_of should also prevent any noticeable time difference.
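
To illustrate the short-circuiting argument, here is a small sketch that counts how many sections the predicate inspects. It uses `std::all_of` on plain flag words rather than LLD's range-based `all_of` over `ctx.inputSections`; the helper name `sectionsInspected` is hypothetical and the `SHF_AARCH64_PURECODE` value is assumed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t SHF_EXECINSTR = 0x4;
constexpr uint64_t SHF_AARCH64_PURECODE = 0x20000000; // assumed value

// Hypothetical helper: applies the PR's predicate to a list of section flags
// and reports how many sections were inspected before std::all_of returned.
std::size_t sectionsInspected(const std::vector<uint64_t> &sectionFlags) {
  std::size_t calls = 0;
  const bool allPurecode =
      std::all_of(sectionFlags.begin(), sectionFlags.end(),
                  [&calls](uint64_t flags) {
                    ++calls;
                    return !(flags & SHF_EXECINSTR) ||
                           (flags & SHF_AARCH64_PURECODE) != 0;
                  });
  (void)allPurecode; // Only the call count matters for this demonstration.
  return calls;
}
```

On the common standard library implementations, the scan stops at the first executable section without the flag, so a non-execute-only link should pay for only a handful of predicate calls, while a fully execute-only link visits every section once.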

@Il-Capitano Il-Capitano requested review from smithp35 and MaskRay March 20, 2025 14:52
Collaborator

@smithp35 smithp35 left a comment

I've made a suggestion that I think will work for non-degenerate cases that avoids the loop.

@@ -2610,6 +2610,18 @@ PltSection::PltSection(Ctx &ctx)
   // modify the instructions in the PLT entries.
   if (ctx.arg.emachine == EM_SPARCV9)
     this->flags |= SHF_WRITE;

Collaborator

Alternatively, it should be possible to set SHF_AARCH64_PURECODE universally and then handle this in Writer.cpp::createPhdrs()

https://github.com/llvm/llvm-project/blob/main/lld/ELF/Writer.cpp#L2381

    uint64_t newFlags = computeFlags(ctx, sec->getPhdrFlags());
    // When --no-rosegment is specified, RO and RX sections are compatible.
    uint32_t incompatible = flags ^ newFlags;
    if (ctx.arg.singleRoRx && !(newFlags & PF_W))
      incompatible &= ~PF_X;

Something like:

  if (sec == ctx.in.plt && (flags & PF_R))
    newFlags |= PF_R;

It is true that the .plt could in theory be the first section, but that would normally take a linker script making it the first OutputSection. I think that's unlikely, and it could be fixed with PHDRS.

I did think we might do this for all OutputSections but I guess for bare-metal there's still a use case for separate XO and non-XO segments.

Another possibility is to record any non-XO OutputSection that we see in ctx.

Contributor Author

@Il-Capitano Il-Capitano Mar 21, 2025

sec can't be compared with ctx.in.plt there, because it is an output section, and ctx.in.plt is an input section. We'd have to do findSection(ctx, ".plt") in order to get the .plt output section.

Another concern I have with manipulating the output sections directly is that a non-synthetic input section might be placed in the .plt output section. I'm not sure if this really happens with real code, but I'd rather write a solution that works in every case by using SHF_AARCH64_PURECODE correctly on the input sections.

For an alternate solution, another possible spot I found where we can modify the flags of ctx.in.plt is in this loop inside addOrphanSections:

  // For further --emit-reloc handling code we need target output section
  // to be created before we create relocation output section, so we want
  // to create target sections first. We do not want priority handling
  // for synthetic sections because them are special.
  size_t n = 0;
  for (InputSectionBase *isec : ctx.inputSections) {
    // Process InputSection and MergeInputSection.
    if (LLVM_LIKELY(isa<InputSection>(isec)))
      ctx.inputSections[n++] = isec;
    // In -r links, SHF_LINK_ORDER sections are added while adding their parent
    // sections because we need to know the parent's output section before we
    // can select an output section for the SHF_LINK_ORDER section.
    if (ctx.arg.relocatable && (isec->flags & SHF_LINK_ORDER))
      continue;
    if (auto *sec = dyn_cast<InputSection>(isec))
      if (InputSectionBase *rel = sec->getRelocatedSection())
        if (auto *relIS = dyn_cast_or_null<InputSectionBase>(rel->parent))
          add(relIS);
    add(isec);
    if (ctx.arg.relocatable)
      for (InputSectionBase *depSec : isec->dependentSections)
        if (depSec->flags & SHF_LINK_ORDER)
          add(depSec);
  }

Adding something like this here works:

  // Only check for PURECODE flag on AArch64 to decide if .plt should have the
  // flag as well or not.
  bool isAllPurecode = ctx.arg.emachine == EM_AARCH64;
  for (InputSectionBase *isec : ctx.inputSections) {
    isAllPurecode = isAllPurecode && (isa<SyntheticSection>(isec) ||
                                      !(isec->flags & SHF_EXECINSTR) ||
                                      (isec->flags & SHF_AARCH64_PURECODE));
    // ...
  }
  if (isAllPurecode)
    ctx.in.plt->flags |= SHF_AARCH64_PURECODE;

We can save looping through the input sections an extra time in the PltSection constructor, but the logic gets decoupled from PltSection, which I'm not a fan of. What do you think about this?

Collaborator

Apologies for the long comment!

It is possible to find the OutputSection that contains the .plt; it would be something like .in.plt->parent. That would mean that we would only need to check OutputSections rather than input sections. If the .plt is mixed with non-XO InputSections, then the OutputSection it is in will be non-XO. However ...

Taking a step back, I think it will be worth thinking through what the heuristics for Program Header generation are when it comes to XO. Apologies, I didn't have time to write this up yesterday evening. I think there could be more than just the .plt that is affected.

In principle any orphan section with SHF_PURECODE (that generates an OutputSection) will propagate SHF_PURECODE to the OutputSection, which is going to auto-generate an XO program header on a transition from non-XO, which isn't going to be helpful for a non-XO program. How much of a problem this is I don't know. For an Android/Linux system needing full XO, there may be a non-zero number of libraries that need SHF_PURECODE just in case they are used in an XO context. In a contrived worst case we have alternate XO, non-XO output sections and get a separate program header for each OutputSection.

Thinking of a model for how this would be used, I think we have two (possibly three) cases:

  • Bare-metal system (how XO is currently used on Arm), no PLT, no dynamic linking, linker script, potential mix of XO (my code) and non XO (library code).
  • An OS that can support all XO or non-XO for a particular program, PLT highly likely, default linker script, dynamic linking. I'm guessing this is where Android will be heading.
  • An OS that can support separate parts of the program being XO and non-XO (presumably separated by a page boundary). I don't think that anyone needs/wants this level of flexibility.

For the bare-metal system we would like to have separate XO and non-XO program headers for the same output file. It is up to the user to write the linker script to separate out the XO and non-XO into distinct memory regions, and possibly use PHDRS to make sure they get what they need.

For the OS that can only have a program that's XO or not XO, we ideally want all executable OutputSections to be XO before generating an XO program header.

For the OS that can have multiple XO and non XO parts, then there's no good simple heuristic that I can think of that's always going to work. However I think we can probably rule this use case out.

With that in mind I propose that we do something like:

  • Unconditionally add SHF_PURECODE to the .plt.
  • For a program using an OS (defined as having a dynamic section or a PLT), when auto-generating program headers (no linker script PHDRS), clear SHF_PURECODE from all executable OutputSections if at least one executable OutputSection is not XO.
  • Leave behaviour as it is for bare-metal programs (that don't have a PLT or dynamic section).

Not sure I've got that completely right, but it should be close. I think that could be applied in createPhdrs().
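
The three bullets above could be sketched roughly as follows. This is only a model under stated assumptions, not code from LLD: `OutSec`, `clearPurecodeIfMixed`, and the two boolean parameters are hypothetical stand-ins for the output-section list, the "has a PLT or dynamic section" test, and the "linker script uses PHDRS" test, and the `SHF_AARCH64_PURECODE` value is assumed:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t SHF_EXECINSTR = 0x4;
constexpr uint64_t SHF_AARCH64_PURECODE = 0x20000000; // assumed value

// Hypothetical stand-in for an output section; only the flags matter here.
struct OutSec {
  uint64_t flags;
};

// Sketch of the proposed heuristic: for OS-style links with auto-generated
// program headers, drop PURECODE from every executable output section as soon
// as one executable output section is not XO. Bare-metal links (no PLT or
// dynamic section) and links with explicit PHDRS are left untouched.
void clearPurecodeIfMixed(std::vector<OutSec> &sections, bool hasPltOrDynamic,
                          bool hasPhdrsCommand) {
  if (!hasPltOrDynamic || hasPhdrsCommand)
    return;
  bool anyNonXo = false;
  for (const OutSec &sec : sections)
    if ((sec.flags & SHF_EXECINSTR) && !(sec.flags & SHF_AARCH64_PURECODE))
      anyNonXo = true;
  if (!anyNonXo)
    return;
  for (OutSec &sec : sections)
    if (sec.flags & SHF_EXECINSTR)
      sec.flags &= ~SHF_AARCH64_PURECODE;
}
```

In this model the `.plt` can carry the flag unconditionally, because the flag is stripped again whenever it would otherwise force a separate XO program header in a non-XO program.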

The alternative view is that this is too complicated and it is only the PLT that the linker should care about; getting XO right is the user's responsibility.

In that case it may simplify to:

  • Unconditionally add SHF_PURECODE to the .plt
  • If at least one executable OutputSection has non-XO, then find the OutputSection containing the PLT (.in.plt->parent) and clear SHF_PURECODE from that OutputSection.

Again this could be done at the start of createPhdrs().

Contributor Author

@Il-Capitano Il-Capitano Mar 21, 2025

Thanks for the thorough reply! It really helped refine my understanding of the problem.

You're right that the main use case we care about is the whole program being XO or RX. What do you think about doing the following:

  • Unconditionally set SHF_AARCH64_PURECODE for .plt.
  • When auto generating program headers, consider XO and RX sections compatible, allowing them to be placed in the same segment. We could also add a flag similar to --rosegment to control this behaviour.
  • At this point we don't need to strip the PURECODE flag from the output sections, they'll just be placed in a program header that is RX instead of XO. Leaving the section flag intact shouldn't cause any issues I think.

We can do this by just adding the following snippet in createPhdrs():

    if (newFlags & PF_X)
      incompatible &= ~PF_R;

For bare-metal targets, this wouldn't allow separate auto-generated program headers with XO and RX code, though; a linker script (or just a flag?) would be required to separate those out into different program headers. I don't have any experience working with bare-metal; do you think this is a reasonable requirement? If not, can we detect in the linker whether we're linking for a target with an OS or not?

If you think this would be a good approach, I'll open a separate PR superseding this one, as it's a more general solution.
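
To make the effect of the proposed compatibility rule concrete, here is a simplified model of the segment-splitting loop. It is a sketch, not the actual `createPhdrs()` code: `layoutSegments` and its `mergeXoRx` parameter are hypothetical, and only the permission bits of each section are modelled:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ELF program header permission bits.
constexpr uint32_t PF_X = 1;
constexpr uint32_t PF_W = 2;
constexpr uint32_t PF_R = 4;

// Hypothetical model: walks sections in layout order and starts a new load
// segment whenever the permission flags are incompatible with the section
// that opened the current segment. Returns the flags of each segment.
std::vector<uint32_t> layoutSegments(const std::vector<uint32_t> &sectionFlags,
                                     bool mergeXoRx) {
  std::vector<uint32_t> segments;
  uint32_t flags = 0;
  for (uint32_t newFlags : sectionFlags) {
    uint32_t incompatible = flags ^ newFlags;
    // Proposed rule: for executable sections, a difference in PF_R alone
    // does not force a new segment.
    if (mergeXoRx && (newFlags & PF_X))
      incompatible &= ~PF_R;
    if (segments.empty() || incompatible != 0) {
      segments.push_back(0);
      flags = newFlags;
    }
    // A segment that absorbs an RX section ends up RX rather than XO.
    segments.back() |= newFlags;
  }
  return segments;
}
```

For an alternating XO, RX, XO, RX layout, this model produces four segments without the rule and a single RX segment with it, which is the contrived worst case mentioned earlier in the thread.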

Collaborator

For the bare-metal case with a linker script, I think that would be OK. I expect that in a majority of cases a MEMORY region would be set up for the XO and non-XO memory. These would have distinct addresses, such that a separate program header would be created anyway. If that weren't the case, PHDRS could be used to force the separation.

We'd need to release note the change in behaviour but I think that it is worth it to get the merging case right.

Contributor Author

I'll open a separate PR then with my proposed approach. Thank you for your insights!

Il-Capitano added a commit to Il-Capitano/llvm-project that referenced this pull request Mar 21, 2025
…t` flag

Following from the discussion in llvm#132224, this seems like the best
approach to deal with a mix of XO and RX output sections in the same
binary. This change will also simplify the implementation of the
PURECODE section flag for AArch64.

To control this behaviour, the `--[no-]xosegment` flag is added to LLD
(similarly to `--[no-]rosegment`), which determines whether to allow
merging XO and RX sections in the same segment. The default value is
`--no-xosegment`, which is a breaking change compared to the previous
behaviour.

Release notes are also added, since this will be a breaking change.
@Il-Capitano
Contributor Author

I opened #132412 as a general approach of dealing with a mix of XO and RX sections in the same binary. I'll close this PR because of that. I'll do a separate change regarding the section flags for .plt and .iplt.

@Il-Capitano Il-Capitano deleted the execute-only-plt branch March 21, 2025 15:43
Il-Capitano added a commit to Il-Capitano/llvm-project that referenced this pull request Apr 7, 2025
Il-Capitano added a commit that referenced this pull request Apr 8, 2025
…t` flag (#132412)
