[LLD][AArch64] Mark .plt with PURECODE flag if all input sections also have it #132224

Closed
wants to merge 2 commits into from
12 changes: 12 additions & 0 deletions lld/ELF/SyntheticSections.cpp
@@ -2610,6 +2610,18 @@ PltSection::PltSection(Ctx &ctx)
  // modify the instructions in the PLT entries.
  if (ctx.arg.emachine == EM_SPARCV9)
    this->flags |= SHF_WRITE;

Collaborator

Alternatively it should be possible to universally set SHF_AARCH64_PURECODE and then
handle this in Writer.cpp::createPhdrs()

https://github.com/llvm/llvm-project/blob/main/lld/ELF/Writer.cpp#L2381

    uint64_t newFlags = computeFlags(ctx, sec->getPhdrFlags());
    // When --no-rosegment is specified, RO and RX sections are compatible.
    uint32_t incompatible = flags ^ newFlags;
    if (ctx.arg.singleRoRx && !(newFlags & PF_W))
      incompatible &= ~PF_X;

Something like:

  if (sec == ctx.in.plt && (flags & PF_R))
    newFlags |= PF_R;

It is true that the .plt could in theory be the first section, but that would normally require a linker script that makes it the first OutputSection. I think that's unlikely, and it could be fixed with PHDRS.

I did think we might do this for all OutputSections but I guess for bare-metal there's still a use case for separate XO and non-XO segments.

Another possibility is to record any non-XO OutputSection that we see in ctx.

Contributor Author

@Il-Capitano Il-Capitano Mar 21, 2025

sec can't be compared with ctx.in.plt there, because it is an output section, and ctx.in.plt is an input section. We'd have to do findSection(ctx, ".plt") in order to get the .plt output section.

Another concern I have with manipulating the output sections directly is that maybe one of the non-synthetic input sections might be placed in the .plt output section? I'm not sure if this really happens with real code, but I'd rather write a solution that works in every case by using SHF_AARCH64_PURECODE correctly on the input sections.

For an alternate solution, another possible spot I found where we can modify the flags of ctx.in.plt is in this loop inside addOrphanSections:

  // For further --emit-reloc handling code we need target output section
  // to be created before we create relocation output section, so we want
  // to create target sections first. We do not want priority handling
  // for synthetic sections because them are special.
  size_t n = 0;
  for (InputSectionBase *isec : ctx.inputSections) {
    // Process InputSection and MergeInputSection.
    if (LLVM_LIKELY(isa<InputSection>(isec)))
      ctx.inputSections[n++] = isec;
    // In -r links, SHF_LINK_ORDER sections are added while adding their parent
    // sections because we need to know the parent's output section before we
    // can select an output section for the SHF_LINK_ORDER section.
    if (ctx.arg.relocatable && (isec->flags & SHF_LINK_ORDER))
      continue;
    if (auto *sec = dyn_cast<InputSection>(isec))
      if (InputSectionBase *rel = sec->getRelocatedSection())
        if (auto *relIS = dyn_cast_or_null<InputSectionBase>(rel->parent))
          add(relIS);
    add(isec);
    if (ctx.arg.relocatable)
      for (InputSectionBase *depSec : isec->dependentSections)
        if (depSec->flags & SHF_LINK_ORDER)
          add(depSec);
  }

Adding something like this here works:

  // Only check for PURECODE flag on AArch64 to decide if .plt should have the
  // flag as well or not.
  bool isAllPurecode = ctx.arg.emachine == EM_AARCH64;
  for (InputSectionBase *isec : ctx.inputSections) {
    isAllPurecode = isAllPurecode && (isa<SyntheticSection>(isec) ||
                                      !(isec->flags & SHF_EXECINSTR) ||
                                      (isec->flags & SHF_AARCH64_PURECODE));
    // ...
  }
  if (isAllPurecode)
    ctx.in.plt->flags |= SHF_AARCH64_PURECODE;

We can save looping through the input sections an extra time in the PltSection constructor, but the logic gets decoupled from PltSection, which I'm not a fan of. What do you think about this?
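To make the predicate in the snippet above easier to reason about outside of LLD, here is a minimal standalone sketch. `ModelSection`, `allExecutablePurecode`, and the `k…` constants are hypothetical stand-ins introduced for illustration only (the `kPurecode` value is assumed to match `SHF_AARCH64_PURECODE`); none of this is LLD's actual API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Local stand-ins for ELF section flag values; not taken from LLD headers.
constexpr uint64_t kExecInstr = 0x4;       // SHF_EXECINSTR
constexpr uint64_t kPurecode = 0x20000000; // assumed SHF_AARCH64_PURECODE value

// Hypothetical, stripped-down model of an input section.
struct ModelSection {
  uint64_t flags;
  bool synthetic;
};

// True when every non-synthetic executable section carries the PURECODE flag.
// Synthetic and non-executable sections never block the result.
bool allExecutablePurecode(const std::vector<ModelSection> &sections) {
  return std::all_of(sections.begin(), sections.end(),
                     [](const ModelSection &s) {
                       return s.synthetic || !(s.flags & kExecInstr) ||
                              (s.flags & kPurecode);
                     });
}
```

In this model a single RX input section is enough to make the result false, which is the case where `.plt` would be left without the flag.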

Collaborator

Apologies for the long comment!

It is possible to find the OutputSection that contains the .plt; it would be something like .in.plt->parent. That would mean that we would only need to check OutputSections rather than input sections. If the .plt is mixed with non-XO InputSections then the OutputSection it is in will be non-XO. However ...

Taking a step back, I think it is worth thinking through what the heuristics for program header generation are when it comes to XO. Apologies that I didn't have time to write this up yesterday evening. I think there could be more than just the .plt that is affected.

In principle any orphan section with SHF_PURECODE (that generates an OutputSection) will propagate SHF_PURECODE to the OutputSection, which is going to auto-generate an XO program header on a transition from non-XO, which isn't going to be helpful for a non-XO program. How much of a problem this is I don't know. For an Android/Linux system needing full XO, there may be a non-zero number of libraries that need SHF_PURECODE just in case they are used in an XO context. In a contrived worst case we have alternate XO, non-XO output sections and get a separate program header for each OutputSection.
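The contrived worst case can be illustrated with a simplified model in which a new program header starts whenever the segment flags change between consecutive output sections. This is only a sketch: `modelPhdrFlags` and `countLoadSegments` are hypothetical helpers, the constants are local stand-ins, and the real `createPhdrs()` logic considers more than flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins for ELF constants; not taken from LLD headers.
constexpr uint32_t kPfX = 1, kPfW = 2, kPfR = 4;
constexpr uint64_t kWrite = 0x1, kExecInstr = 0x4;
constexpr uint64_t kPurecode = 0x20000000; // assumed SHF_AARCH64_PURECODE value

// Simplified mapping from section flags to segment flags: a section is
// readable unless it is marked PURECODE.
uint32_t modelPhdrFlags(uint64_t sectionFlags) {
  uint32_t flags = 0;
  if (!(sectionFlags & kPurecode))
    flags |= kPfR;
  if (sectionFlags & kWrite)
    flags |= kPfW;
  if (sectionFlags & kExecInstr)
    flags |= kPfX;
  return flags;
}

// Counts segments when every change in segment flags starts a new segment.
std::size_t countLoadSegments(const std::vector<uint64_t> &outputSections) {
  std::size_t segments = 0;
  uint32_t current = 0;
  for (std::size_t i = 0; i < outputSections.size(); ++i) {
    uint32_t next = modelPhdrFlags(outputSections[i]);
    if (i == 0 || next != current) {
      ++segments;
      current = next;
    }
  }
  return segments;
}
```

With four output sections alternating XO, RX, XO, RX this model yields four segments, while grouping them as XO, XO, RX, RX yields two.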

Thinking of a model for how this would be used, I think we have two (possibly three) cases:

  • Bare-metal system (how XO is currently used on Arm), no PLT, no dynamic linking, linker script, potential mix of XO (my code) and non XO (library code).
  • An OS that can support all XO or non-XO for a particular program, PLT highly likely, default linker script, dynamic linking. I'm guessing this is where Android will be heading.
  • An OS that can support separate parts of the program being XO and non-XO (presumably separated by a page boundary). I don't think that anyone needs/wants this level of flexibility.

For the bare-metal system we would like to have separate XO and non-XO program headers for the same output file. It is up to the user to write the linker script to separate out the XO and non-XO into distinct memory regions, and possibly use PHDRS to make sure they get what they need.

For the OS that can only have a program that's XO or not XO, we ideally want all executable OutputSections to be XO before generating an XO program header.

For the OS that can have multiple XO and non XO parts, then there's no good simple heuristic that I can think of that's always going to work. However I think we can probably rule this use case out.

With that in mind I propose that we do something like:

  • Unconditionally add SHF_PURECODE to the .plt.
  • For a program using an OS (defined as having a dynamic section or a PLT), when auto-generating program headers (no linker script PHDRS), clear SHF_PURECODE from all executable OutputSections if at least one executable OutputSection is not XO.
  • Leave behaviour as it is for bare-metal programs (that don't have a PLT or dynamic section).

Not sure I've got that completely right, but it should be close. I think that could be applied in createPhdrs().
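The three-step proposal above could look roughly like the following standalone sketch. `ModelOutputSection`, `dropPartialPurecode`, and the `hasDynamicOrPlt` parameter are hypothetical names used only for illustration; how LLD would actually detect "a program using an OS" is left open in the discussion.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Local stand-ins for ELF section flag values; not taken from LLD headers.
constexpr uint64_t kExecInstr = 0x4;       // SHF_EXECINSTR
constexpr uint64_t kPurecode = 0x20000000; // assumed SHF_AARCH64_PURECODE value

// Hypothetical, stripped-down model of an output section.
struct ModelOutputSection {
  uint64_t flags;
};

// If the program has a dynamic section or PLT and at least one executable
// output section is not XO, clear PURECODE from all executable sections.
void dropPartialPurecode(std::vector<ModelOutputSection> &sections,
                         bool hasDynamicOrPlt) {
  if (!hasDynamicOrPlt)
    return; // bare-metal case: leave behaviour unchanged
  bool anyNonXo = std::any_of(
      sections.begin(), sections.end(), [](const ModelOutputSection &s) {
        return (s.flags & kExecInstr) && !(s.flags & kPurecode);
      });
  if (!anyNonXo)
    return; // everything executable is XO already
  for (ModelOutputSection &s : sections)
    if (s.flags & kExecInstr)
      s.flags &= ~kPurecode;
}
```

The effect in this model is all-or-nothing: either every executable output section stays XO, or none does.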

The alternative view is that this is too complicated and it is only the PLT that the linker should care about; getting XO right is the user's responsibility.

In that case it may simplify to

  • Unconditionally add SHF_PURECODE to the .plt
  • If at least one executable OutputSection has non-XO, then find the OutputSection containing the PLT (.in.plt->parent) and clear SHF_PURECODE from that OutputSection.

Again this could be done at the start of createPhdrs().
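A sketch of this simpler variant, assuming a `parent` pointer from the PLT to its containing output section as mentioned earlier in the thread. `ModelOutputSection`, `ModelInputSection`, and `clearPltPurecodeIfMixed` are hypothetical names for illustration, not LLD types.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Local stand-ins for ELF section flag values; not taken from LLD headers.
constexpr uint64_t kExecInstr = 0x4;       // SHF_EXECINSTR
constexpr uint64_t kPurecode = 0x20000000; // assumed SHF_AARCH64_PURECODE value

// Hypothetical, stripped-down models of an output and an input section.
struct ModelOutputSection {
  uint64_t flags;
};
struct ModelInputSection {
  ModelOutputSection *parent; // output section that contains this section
};

// If any executable output section is non-XO, clear PURECODE only from the
// output section that contains the PLT; other sections are left untouched.
void clearPltPurecodeIfMixed(const std::vector<ModelOutputSection> &outputs,
                             const ModelInputSection &plt) {
  bool anyNonXo = std::any_of(
      outputs.begin(), outputs.end(), [](const ModelOutputSection &s) {
        return (s.flags & kExecInstr) && !(s.flags & kPurecode);
      });
  if (anyNonXo && plt.parent)
    plt.parent->flags &= ~kPurecode;
}
```

Unlike the broader proposal, user-provided XO sections keep their flag here, so a mixed program can still end up with separate XO and RX segments.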

Contributor Author

@Il-Capitano Il-Capitano Mar 21, 2025

Thanks for the thorough reply! It really helped refine my understanding of the problem.

You're right that the main use case we care about is the whole program being XO or RX. What do you think about doing the following:

  • Unconditionally set SHF_AARCH64_PURECODE for .plt.
  • When auto generating program headers, consider XO and RX sections compatible, allowing them to be placed in the same segment. We could also add a flag similar to --rosegment to control this behaviour.
  • At this point we don't need to strip the PURECODE flag from the output sections, they'll just be placed in a program header that is RX instead of XO. Leaving the section flag intact shouldn't cause any issues I think.

We can do this by just adding the following snippet in createPhdrs():

    if (newFlags & PF_X)
      incompatible &= ~PF_R;

For bare-metal targets, though, this wouldn't allow separate auto-generated program headers for XO and RX code; a linker script (or just a flag?) would be required to separate those out into different program headers. I don't have any experience working with bare-metal; do you think this is a reasonable requirement? If not, can we detect in the linker whether we're linking for a target with an OS or not?

If you think this would be a good approach, I'll open a separate PR superseding this one, as it's a more general solution.
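For reference, the effect of the proposed two-line change can be modelled in isolation. The sketch below mirrors the flag comparison quoted from `createPhdrs()` earlier in the thread; `needsNewSegment` and the `mergeXoRx` switch are hypothetical names for illustration, and the real function has further conditions for starting a new segment.

```cpp
#include <cstdint>

// Local stand-ins for the ELF segment flag values; not taken from LLD headers.
constexpr uint32_t kPfX = 1, kPfW = 2, kPfR = 4;

// Returns true when a section with segment flags `newFlags` cannot join a
// segment whose flags are `flags`, considering flags only.
bool needsNewSegment(uint32_t flags, uint32_t newFlags, bool singleRoRx,
                     bool mergeXoRx) {
  uint32_t incompatible = flags ^ newFlags;
  // When --no-rosegment is specified, RO and RX sections are compatible.
  if (singleRoRx && !(newFlags & kPfW))
    incompatible &= ~kPfX;
  // Proposed relaxation: for an executable section, a difference in
  // readability alone does not force a new segment.
  if (mergeXoRx && (newFlags & kPfX))
    incompatible &= ~kPfR;
  return incompatible != 0;
}
```

With `mergeXoRx` enabled, an RX section can follow an XO section (and vice versa) without starting a new segment in this model, while a writable data section following executable code still does.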

Collaborator

For the bare-metal case with a linker script I think that would be OK. I expect that in the majority of cases a MEMORY region would be set up for the XO and non-XO memory. These would have distinct addresses such that a separate program header would be created anyway. If that weren't the case, then PHDRS could be used to force the separation.

We'd need to release note the change in behaviour but I think that it is worth it to get the merging case right.

Contributor Author

I'll open a separate PR then with my proposed approach. Thank you for your insights!

  // On AArch64, PLT entries only do loads from the .got.plt section, so the
  // .plt section can be marked with the SHF_AARCH64_PURECODE section flag. We
  // only do this if all other executable sections also have the same section
  // flag set, because otherwise .plt can't be allocated in the same segment as
  // the other executable sections.
  if (ctx.arg.emachine == EM_AARCH64 &&
      all_of(ctx.inputSections, [](InputSectionBase *sec) {
        return !(sec->flags & SHF_EXECINSTR) ||
               (sec->flags & SHF_AARCH64_PURECODE);
      }))
    this->flags |= SHF_AARCH64_PURECODE;
}

void PltSection::writeTo(uint8_t *buf) {
88 changes: 88 additions & 0 deletions lld/test/ELF/aarch64-execute-only-plt.s
@@ -0,0 +1,88 @@
// REQUIRES: aarch64
// RUN: rm -rf %t && split-file %s %t && cd %t

// RUN: llvm-mc -filetype=obj -triple=aarch64 start.s -o start.o
// RUN: llvm-mc -filetype=obj -triple=aarch64 xo-same-section.s -o xo-same-section.o
// RUN: llvm-mc -filetype=obj -triple=aarch64 rx-same-section.s -o rx-same-section.o
// RUN: llvm-mc -filetype=obj -triple=aarch64 xo-different-section.s -o xo-different-section.o
// RUN: llvm-mc -filetype=obj -triple=aarch64 rx-different-section.s -o rx-different-section.o
// RUN: llvm-mc -filetype=obj -triple=aarch64 %p/Inputs/plt-aarch64.s -o plt.o
// RUN: ld.lld -shared plt.o -soname=t2.so -o plt.so
// RUN: ld.lld start.o xo-same-section.o plt.so -o xo-same-section
// RUN: ld.lld start.o rx-same-section.o plt.so -o rx-same-section
// RUN: ld.lld start.o xo-different-section.o plt.so -o xo-different-section
// RUN: ld.lld start.o rx-different-section.o plt.so -o rx-different-section
// RUN: llvm-readelf -S -l xo-same-section | FileCheck --check-prefix=CHECK-XO %s
// RUN: llvm-readelf -S -l rx-same-section | FileCheck --check-prefix=CHECK-RX %s
// RUN: llvm-readelf -S -l xo-different-section | FileCheck --check-prefix=CHECK-XO %s
// RUN: llvm-readelf -S -l rx-different-section | FileCheck --check-prefix=CHECK-RX %s
// RUN: llvm-objdump -d --no-show-raw-insn xo-same-section | FileCheck --check-prefix=DISASM %s
// RUN: llvm-objdump -d --no-show-raw-insn rx-same-section | FileCheck --check-prefix=DISASM %s
// RUN: llvm-objdump -d --no-show-raw-insn xo-different-section | FileCheck --check-prefix=DISASM %s
// RUN: llvm-objdump -d --no-show-raw-insn rx-different-section | FileCheck --check-prefix=DISASM %s

/// Name Type Address Off Size ES Flg Lk Inf Al
// CHECK-XO: .plt PROGBITS 00000000002102e0 0002e0 000040 00 AXy 0 0 16
// CHECK-RX: .plt PROGBITS 00000000002102e0 0002e0 000040 00 AX 0 0 16

/// The address of .plt above should be within this program header.
/// Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
// CHECK-XO: LOAD 0x0002c8 0x00000000002102c8 0x00000000002102c8 0x000058 0x000058 E 0x10000
// CHECK-RX: LOAD 0x0002c8 0x00000000002102c8 0x00000000002102c8 0x000058 0x000058 R E 0x10000

// DISASM-LABEL: Disassembly of section .plt:
// DISASM-LABEL: <.plt>:
// DISASM-NEXT: 2102e0: stp x16, x30, [sp, #-0x10]!
// DISASM-NEXT: adrp x16, 0x230000 <weak+0x230000>
// DISASM-NEXT: ldr x17, [x16, #0x400]
// DISASM-NEXT: add x16, x16, #0x400
// DISASM-NEXT: br x17
// DISASM-NEXT: nop
// DISASM-NEXT: nop
// DISASM-NEXT: nop

// DISASM-LABEL: <bar@plt>:
// DISASM-NEXT: 210300: adrp x16, 0x230000 <weak+0x230000>
// DISASM-NEXT: ldr x17, [x16, #0x408]
// DISASM-NEXT: add x16, x16, #0x408
// DISASM-NEXT: br x17

// DISASM-LABEL: <weak@plt>:
// DISASM-NEXT: 210310: adrp x16, 0x230000 <weak+0x230000>
// DISASM-NEXT: ldr x17, [x16, #0x410]
// DISASM-NEXT: add x16, x16, #0x410
// DISASM-NEXT: br x17

//--- start.s
.section .text,"axy",@progbits,unique,0
.global _start, foo, bar
.weak weak
_start:
bl foo
bl bar
bl weak
ret

//--- xo-same-section.s
.section .text,"axy",@progbits,unique,0
.global foo
foo:
ret

//--- rx-same-section.s
.section .text,"ax",@progbits,unique,0
.global foo
foo:
ret

//--- xo-different-section.s
.section .foo,"axy",@progbits,unique,0
.global foo
foo:
ret

//--- rx-different-section.s
.section .foo,"ax",@progbits,unique,0
.global foo
foo:
ret