[LLD][AArch64] Mark .plt with PURECODE flag if all input sections also have it #132224
…o have it

Mark the synthetic `.plt` section with the `SHF_AARCH64_PURECODE` section flag if all executable input sections also have that flag.

Without this change, if we compile a binary with `-mexecute-only`, `.plt` is the only executable section in the final executable that is not marked with the section flag, which causes it to be placed in a different load segment. This unnecessarily costs an extra page of memory when running the executable.

A similar issue happens if we always set the section flag on `.plt` and compile a binary without `-mexecute-only`, so the solution is to match the `SHF_AARCH64_PURECODE` section flag on `.plt` with that of all other executable sections.
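The matching rule described above can be illustrated with a small standalone sketch. This is not lld code: `Section`, `allExecutableSectionsArePurecode`, and `pltFlags` are hypothetical names used only for illustration, and the flag values are assumed to match the ELF and AArch64 ELF definitions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Section flag values (assumed to match the ELF / AArch64 ELF definitions).
constexpr uint64_t SHF_ALLOC = 0x2;
constexpr uint64_t SHF_EXECINSTR = 0x4;
constexpr uint64_t SHF_AARCH64_PURECODE = 0x20000000;

// Hypothetical stand-in for an input section; only the flags matter here.
struct Section {
  uint64_t flags;
};

// True when every executable section is also marked execute-only.
// Non-executable sections (data, read-only data) are ignored.
bool allExecutableSectionsArePurecode(const std::vector<Section> &sections) {
  return std::all_of(sections.begin(), sections.end(), [](const Section &s) {
    return !(s.flags & SHF_EXECINSTR) || (s.flags & SHF_AARCH64_PURECODE);
  });
}

// Flags a synthetic .plt would receive under the rule in this PR.
uint64_t pltFlags(const std::vector<Section> &sections) {
  uint64_t flags = SHF_ALLOC | SHF_EXECINSTR;
  if (allExecutableSectionsArePurecode(sections))
    flags |= SHF_AARCH64_PURECODE;
  return flags;
}
```

Note that with this predicate a link with no executable input sections at all would also mark `.plt` as execute-only, since `std::all_of` is true for an empty match set.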
@llvm/pr-subscribers-lld-elf
Author: Csanád Hajdú (Il-Capitano)
Full diff: https://github.com/llvm/llvm-project/pull/132224.diff
2 Files Affected:
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index b03c4282ab1aa..a7ff8ed9b16d1 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -2610,6 +2610,18 @@ PltSection::PltSection(Ctx &ctx)
   // modify the instructions in the PLT entries.
   if (ctx.arg.emachine == EM_SPARCV9)
     this->flags |= SHF_WRITE;
+
+  // On AArch64, PLT entries only do loads from the .got.plt section, so the
+  // .plt section can be marked with the SHF_AARCH64_PURECODE section flag. We
+  // only do this if all other executable sections also have the same section
+  // flag set, because otherwise .plt can't be allocated in the same segment as
+  // the other executable sections.
+  if (ctx.arg.emachine == EM_AARCH64 &&
+      all_of(ctx.inputSections, [](InputSectionBase *sec) {
+        return !(sec->flags & SHF_EXECINSTR) ||
+               (sec->flags & SHF_AARCH64_PURECODE);
+      }))
+    this->flags |= SHF_AARCH64_PURECODE;
 }
 void PltSection::writeTo(uint8_t *buf) {
diff --git a/lld/test/ELF/aarch64-execute-only-plt.s b/lld/test/ELF/aarch64-execute-only-plt.s
new file mode 100644
index 0000000000000..08e69fba8fb0c
--- /dev/null
+++ b/lld/test/ELF/aarch64-execute-only-plt.s
@@ -0,0 +1,115 @@
+// REQUIRES: aarch64
+// RUN: rm -rf %t && split-file %s %t && cd %t
+
+// RUN: llvm-mc -filetype=obj -triple=aarch64 start.s -o start.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-xo-same-section.s -o foo-xo-same-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-rx-same-section.s -o foo-rx-same-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-xo-different-section.s -o foo-xo-different-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 foo-rx-different-section.s -o foo-rx-different-section.o
+// RUN: llvm-mc -filetype=obj -triple=aarch64 %p/Inputs/plt-aarch64.s -o plt.o
+// RUN: ld.lld -shared plt.o -soname=t2.so -o plt.so
+// RUN: ld.lld start.o foo-xo-same-section.o plt.so -o xo-same-section
+// RUN: ld.lld start.o foo-rx-same-section.o plt.so -o rx-same-section
+// RUN: ld.lld start.o foo-xo-different-section.o plt.so -o xo-different-section
+// RUN: ld.lld start.o foo-rx-different-section.o plt.so -o rx-different-section
+// RUN: llvm-readobj -S -l xo-same-section | FileCheck --check-prefix=CHECK-XO %s
+// RUN: llvm-readobj -S -l rx-same-section | FileCheck --check-prefix=CHECK-RX %s
+// RUN: llvm-readobj -S -l xo-different-section | FileCheck --check-prefix=CHECK-XO %s
+// RUN: llvm-readobj -S -l rx-different-section | FileCheck --check-prefix=CHECK-RX %s
+// RUN: llvm-objdump -d --no-show-raw-insn xo-same-section | FileCheck --check-prefix=DISASM %s
+// RUN: llvm-objdump -d --no-show-raw-insn rx-same-section | FileCheck --check-prefix=DISASM %s
+// RUN: llvm-objdump -d --no-show-raw-insn xo-different-section | FileCheck --check-prefix=DISASM %s
+// RUN: llvm-objdump -d --no-show-raw-insn rx-different-section | FileCheck --check-prefix=DISASM %s
+
+// CHECK-XO: Name: .plt
+// CHECK-XO-NEXT: Type: SHT_PROGBITS
+// CHECK-XO-NEXT: Flags [
+// CHECK-XO-NEXT: SHF_AARCH64_PURECODE
+// CHECK-XO-NEXT: SHF_ALLOC
+// CHECK-XO-NEXT: SHF_EXECINSTR
+// CHECK-XO-NEXT: ]
+// CHECK-XO-NEXT: Address: 0x2102E0
+
+/// The address of .plt above should be within this program header.
+// CHECK-XO: VirtualAddress: 0x2102C8
+// CHECK-XO-NEXT: PhysicalAddress: 0x2102C8
+// CHECK-XO-NEXT: FileSize: 88
+// CHECK-XO-NEXT: MemSize: 88
+// CHECK-XO-NEXT: Flags [
+// CHECK-XO-NEXT: PF_X
+// CHECK-XO-NEXT: ]
+
+// CHECK-RX: Name: .plt
+// CHECK-RX-NEXT: Type: SHT_PROGBITS
+// CHECK-RX-NEXT: Flags [
+// CHECK-RX-NEXT: SHF_ALLOC
+// CHECK-RX-NEXT: SHF_EXECINSTR
+// CHECK-RX-NEXT: ]
+// CHECK-RX-NEXT: Address: 0x2102E0
+
+/// The address of .plt above should be within this program header.
+// CHECK-RX: VirtualAddress: 0x2102C8
+// CHECK-RX-NEXT: PhysicalAddress: 0x2102C8
+// CHECK-RX-NEXT: FileSize: 88
+// CHECK-RX-NEXT: MemSize: 88
+// CHECK-RX-NEXT: Flags [
+// CHECK-RX-NEXT: PF_R
+// CHECK-RX-NEXT: PF_X
+// CHECK-RX-NEXT: ]
+
+// DISASM-LABEL: Disassembly of section .plt:
+// DISASM-LABEL: <.plt>:
+// DISASM-NEXT: 2102e0: stp x16, x30, [sp, #-0x10]!
+// DISASM-NEXT: adrp x16, 0x230000 <weak+0x230000>
+// DISASM-NEXT: ldr x17, [x16, #0x400]
+// DISASM-NEXT: add x16, x16, #0x400
+// DISASM-NEXT: br x17
+// DISASM-NEXT: nop
+// DISASM-NEXT: nop
+// DISASM-NEXT: nop
+
+// DISASM-LABEL: <bar@plt>:
+// DISASM-NEXT: 210300: adrp x16, 0x230000 <weak+0x230000>
+// DISASM-NEXT: ldr x17, [x16, #0x408]
+// DISASM-NEXT: add x16, x16, #0x408
+// DISASM-NEXT: br x17
+
+// DISASM-LABEL: <weak@plt>:
+// DISASM-NEXT: 210310: adrp x16, 0x230000 <weak+0x230000>
+// DISASM-NEXT: ldr x17, [x16, #0x410]
+// DISASM-NEXT: add x16, x16, #0x410
+// DISASM-NEXT: br x17
+
+//--- start.s
+.section .text,"axy",@progbits,unique,0
+.global _start, foo, bar
+.weak weak
+_start:
+ bl foo
+ bl bar
+ bl weak
+ ret
+
+//--- foo-xo-same-section.s
+.section .text,"axy",@progbits,unique,0
+.global foo
+foo:
+ ret
+
+//--- foo-rx-same-section.s
+.section .text,"ax",@progbits,unique,0
+.global foo
+foo:
+ ret
+
+//--- foo-xo-different-section.s
+.section .foo,"axy",@progbits,unique,0
+.global foo
+foo:
+ ret
+
+//--- foo-rx-different-section.s
+.section .foo,"ax",@progbits,unique,0
+.global foo
+foo:
+ ret
I had some concern about the performance impact of looping through all input sections when creating the I couldn't measure any time difference in the linking step between the old and new versions, it was within noise. So I don't think performance of the loop is a concern. In the common, non-execute-only case, short circuiting in |
I've made a suggestion that I think will work for non-degenerate cases that avoids the loop.
@@ -2610,6 +2610,18 @@ PltSection::PltSection(Ctx &ctx)
  // modify the instructions in the PLT entries.
  if (ctx.arg.emachine == EM_SPARCV9)
    this->flags |= SHF_WRITE;
Alternatively it should be possible to universally set SHF_AARCH64_PURECODE and then handle this in Writer.cpp::createPhdrs()
https://github.com/llvm/llvm-project/blob/main/lld/ELF/Writer.cpp#L2381
    uint64_t newFlags = computeFlags(ctx, sec->getPhdrFlags());
    // When --no-rosegment is specified, RO and RX sections are compatible.
    uint32_t incompatible = flags ^ newFlags;
    if (ctx.arg.singleRoRx && !(newFlags & PF_W))
      incompatible &= ~PF_X;
Something like:
    if (sec == ctx.in.plt && (flags & PF_R))
      newFlags |= PF_R;
It is true that the .plt could in theory be the first section, but this would normally take a linker script making it the first OutputSection, but I think that's unlikely, and could be fixed with PHDRS.
I did think we might do this for all OutputSections but I guess for bare-metal there's still a use case for separate XO and non-XO segments.
Another possibility is to record any non-XO OutputSection that we see in ctx.
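As a rough illustration of the suggestion above, the compatibility check can be modelled in isolation. This is a simplified sketch, not the code in `Writer.cpp`: `needsNewSegment` and its `isPlt` and `singleRoRx` parameters are hypothetical, and the real `createPhdrs()` takes more conditions into account than permission flags.

```cpp
#include <cstdint>

// Program header permission bits, as defined by the ELF specification.
constexpr uint32_t PF_X = 1;
constexpr uint32_t PF_W = 2;
constexpr uint32_t PF_R = 4;

// Returns true if a section with permissions `secFlags` cannot join the
// current segment, whose permissions are `segFlags`.
// `isPlt` marks the section that holds the PLT (hypothetical parameter).
// `singleRoRx` models --no-rosegment.
bool needsNewSegment(uint32_t segFlags, uint32_t secFlags, bool isPlt,
                     bool singleRoRx) {
  // Suggested tweak: an execute-only PLT that follows a readable segment is
  // treated as readable, so it does not force a segment split.
  if (isPlt && (segFlags & PF_R))
    secFlags |= PF_R;

  uint32_t incompatible = segFlags ^ secFlags;
  // When --no-rosegment is specified, RO and RX sections are compatible.
  if (singleRoRx && !(secFlags & PF_W))
    incompatible &= ~PF_X;
  return incompatible != 0;
}
```

In this model, `needsNewSegment(PF_R | PF_X, PF_X, true, false)` is false, while the same call with `isPlt` set to false reports a split. The sketch only helps when the PLT follows other executable code, which is the caveat raised above about `.plt` being the first section.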
`sec` can't be compared with `ctx.in.plt` there, because it is an output section, and `ctx.in.plt` is an input section. We'd have to do `findSection(ctx, ".plt")` in order to get the `.plt` output section.
Another concern I have with manipulating the output sections directly is that maybe one of the non-synthetic input sections might be placed in the `.plt` output section? I'm not sure if this really happens with real code, but I'd rather write a solution that works in every case by using `SHF_AARCH64_PURECODE` correctly on the input sections.
For an alternate solution, another possible spot I found where we can modify the flags of `ctx.in.plt` is in this loop inside `addOrphanSections`:
llvm-project/lld/ELF/LinkerScript.cpp, lines 1015 to 1040 in 77edfbb
  // For further --emit-reloc handling code we need target output section
  // to be created before we create relocation output section, so we want
  // to create target sections first. We do not want priority handling
  // for synthetic sections because them are special.
  size_t n = 0;
  for (InputSectionBase *isec : ctx.inputSections) {
    // Process InputSection and MergeInputSection.
    if (LLVM_LIKELY(isa<InputSection>(isec)))
      ctx.inputSections[n++] = isec;
    // In -r links, SHF_LINK_ORDER sections are added while adding their parent
    // sections because we need to know the parent's output section before we
    // can select an output section for the SHF_LINK_ORDER section.
    if (ctx.arg.relocatable && (isec->flags & SHF_LINK_ORDER))
      continue;
    if (auto *sec = dyn_cast<InputSection>(isec))
      if (InputSectionBase *rel = sec->getRelocatedSection())
        if (auto *relIS = dyn_cast_or_null<InputSectionBase>(rel->parent))
          add(relIS);
    add(isec);
    if (ctx.arg.relocatable)
      for (InputSectionBase *depSec : isec->dependentSections)
        if (depSec->flags & SHF_LINK_ORDER)
          add(depSec);
  }
Adding something like this here works:
  // Only check for PURECODE flag on AArch64 to decide if .plt should have the
  // flag as well or not.
  bool isAllPurecode = ctx.arg.emachine == EM_AARCH64;
  for (InputSectionBase *isec : ctx.inputSections) {
    isAllPurecode = isAllPurecode && (isa<SyntheticSection>(isec) ||
                                      !(isec->flags & SHF_EXECINSTR) ||
                                      (isec->flags & SHF_AARCH64_PURECODE));
    // ...
  }
  if (isAllPurecode)
    ctx.in.plt->flags |= SHF_AARCH64_PURECODE;
We can save looping through the input sections an extra time in the `PltSection` constructor, but the logic gets decoupled from `PltSection`, which I'm not a fan of. What do you think about this?
Apologies for the long comment!
It is possible to find the OutputSection that contains the .plt; it would be something like `.in.plt->parent`. That would mean that we would only need to check OutputSections rather than input sections. If the .plt is mixed with non-XO InputSections then the OutputSection it is in will be non-XO. However ...
Taking a step back, I think it will be worth thinking through what the heuristics for program header generation are when it comes to XO. Apologies, I didn't have time to write this up yesterday evening. I think there could be more than just the `.plt` that is affected.
In principle any orphan section with SHF_PURECODE (that generates an OutputSection) will propagate SHF_PURECODE to the OutputSection, which is going to auto-generate an XO program header on a transition from non-XO, which isn't going to be helpful for a non-XO program. How much of a problem this is I don't know. For an Android/Linux system needing full XO, there may be a non-zero number of libraries that need SHF_PURECODE just in case they are used in an XO context. In a contrived worst case we have alternate XO, non-XO output sections and get a separate program header for each OutputSection.
Thinking of a model for how this would be used, I think we have two (possibly three) cases:
- Bare-metal system (how XO is currently used on Arm), no PLT, no dynamic linking, linker script, potential mix of XO (my code) and non XO (library code).
- An OS that can support all XO or non-XO for a particular program, PLT highly likely, default linker script, dynamic linking. I'm guessing this is where Android will be heading.
- An OS that can support separate parts of the program being XO and non-XO (presumably separated by a page boundary). I don't think that anyone needs/wants this level of flexibility.
For the bare-metal system we would like to have separate XO and non-XO program headers for the same output file. It is up to the user to write the linker script to separate out the XO and non-XO into distinct memory regions, and possibly use PHDRS to make sure they get what they need.
For the OS that can only have a program that's XO or not XO, we ideally want all executable OutputSections to be XO before generating an XO program header.
For the OS that can have multiple XO and non XO parts, then there's no good simple heuristic that I can think of that's always going to work. However I think we can probably rule this use case out.
With that in mind I propose that we do something like:
- Unconditionally add SHF_PURECODE to the .plt.
- For a program using an OS (defined as having a dynamic section, or a PLT), when auto generating program headers (no linker script PHDRS), clear SHF_PURECODE from all executable OutputSections if at least one executable OutputSection is not XO.
- Leave behaviour as it is for bare-metal programs (that don't have a PLT or dynamic section).
Not sure I've got that completely right, but it should be close. I think that could be applied in createPhdrs().
The alternative view is that this is too complicated and it is only the PLT that the linker should care about; getting XO right is the user's responsibility.
In that case it may simplify to
- Unconditionally add SHF_PURECODE to the .plt
- If at least one executable OutputSection has non-XO, then find the OutputSection containing the PLT (.in.plt->parent) and clear SHF_PURECODE from that OutputSection.
Again this could be done at the start of createPhdrs().
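The first proposal above can be sketched as a small pass over output sections. This is only an illustration under stated assumptions: `OutputSec`, `clearPurecodeIfMixed`, and the two boolean parameters are hypothetical names, not lld APIs, and the flag values are assumed to match the ELF and AArch64 ELF definitions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Section flag values (assumed to match the ELF / AArch64 ELF definitions).
constexpr uint64_t SHF_ALLOC = 0x2;
constexpr uint64_t SHF_EXECINSTR = 0x4;
constexpr uint64_t SHF_AARCH64_PURECODE = 0x20000000;

// Hypothetical stand-in for an output section; only the flags matter here.
struct OutputSec {
  uint64_t flags;
};

// If the program targets an OS (has a PLT or dynamic section) and program
// headers are auto generated (no PHDRS command), drop the execute-only flag
// from every executable output section as soon as one of them is not XO.
// Bare-metal links and links with PHDRS are left untouched.
void clearPurecodeIfMixed(std::vector<OutputSec> &secs, bool hasPltOrDynamic,
                          bool hasPhdrsCommand) {
  if (!hasPltOrDynamic || hasPhdrsCommand)
    return;

  bool anyNonXO =
      std::any_of(secs.begin(), secs.end(), [](const OutputSec &s) {
        return (s.flags & SHF_EXECINSTR) && !(s.flags & SHF_AARCH64_PURECODE);
      });
  if (!anyNonXO)
    return;

  for (OutputSec &s : secs)
    if (s.flags & SHF_EXECINSTR)
      s.flags &= ~SHF_AARCH64_PURECODE;
}
```

The simpler alternative in the second list would replace the final loop with a single update of the output section that contains the PLT.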
Thanks for the thorough reply! It really helped refine my understanding of the problem.
You're right that the main use case we care about is the whole program being XO or RX. What do you think about doing the following:
- Unconditionally set `SHF_AARCH64_PURECODE` for `.plt`.
- When auto generating program headers, consider XO and RX sections compatible, allowing them to be placed in the same segment. We could also add a flag similar to `--rosegment` to control this behaviour.
- At this point we don't need to strip the PURECODE flag from the output sections, they'll just be placed in a program header that is RX instead of XO. Leaving the section flag intact shouldn't cause any issues I think.
We can do this by just adding the following snippet in `createPhdrs()`:
    if (newFlags & PF_X)
      incompatible &= ~PF_R;
For bare-metal targets, this wouldn't allow separate auto-generated program headers with XO and RX code though, a linker script (or just a flag?) would be required to separate those out into different program headers. I don't have any experience working with bare-metal, do you think this is a reasonable requirement? If not, can we detect in the linker whether we're linking for a target with an OS or not?
If you think this would be a good approach, I'll open a separate PR superseding this one, as it's a more general solution.
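To see the effect of treating XO and RX as compatible, here is a toy model of segment grouping. It is a sketch under simplifying assumptions, not the lld implementation: `buildSegments` and `mergeXoRx` are hypothetical names, sections are reduced to their permission bits, and a segment's permissions are taken to be the union of its sections' permissions.

```cpp
#include <cstdint>
#include <vector>

// Program header permission bits, as defined by the ELF specification.
constexpr uint32_t PF_X = 1;
constexpr uint32_t PF_W = 2;
constexpr uint32_t PF_R = 4;

// Groups consecutive sections (given by their permission bits) into
// segments and returns the permission bits of each resulting segment.
// With mergeXoRx set, an executable section ignores a mismatch in PF_R,
// so execute-only and read-execute code end up in one segment.
std::vector<uint32_t> buildSegments(const std::vector<uint32_t> &secFlags,
                                    bool mergeXoRx) {
  std::vector<uint32_t> segments;
  for (uint32_t newFlags : secFlags) {
    if (!segments.empty()) {
      uint32_t incompatible = segments.back() ^ newFlags;
      if (mergeXoRx && (newFlags & PF_X))
        incompatible &= ~PF_R;
      if (incompatible == 0) {
        // Compatible: the section joins the current segment.
        segments.back() |= newFlags;
        continue;
      }
    }
    // First section, or incompatible permissions: start a new segment.
    segments.push_back(newFlags);
  }
  return segments;
}
```

For an execute-only `.text` followed by a read-execute `.plt`, the model yields one RX segment when `mergeXoRx` is set and two segments otherwise. Read-only data followed by execute-only code still splits, because only the `PF_R` difference is masked and the `PF_X` difference remains.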
For the bare metal case with linker script I think that would be OK. I expect that in a majority of cases a MEMORY region would be setup for the XO and non-XO memory. These would have distinct addresses such that a separate program header would be created anyway. If it weren't then PHDRS could be used to force the separation.
We'd need to release note the change in behaviour but I think that it is worth it to get the merging case right.
I'll open a separate PR then with my proposed approach. Thank you for your insights!
…t` flag Following from the discussion in llvm#132224, this seems like the best approach to deal with a mix of XO and RX output sections in the same binary. This change will also simplify the implementation of the PURECODE section flag for AArch64. To control this behaviour, the `--[no-]xosegment` flag is added to LLD (similarly to `--[no-]rosegment`), which determines whether to allow merging XO and RX sections in the same segment. The default value is `--no-xosegment`, which is a breaking change compared to the previous behaviour. Release notes are also added, since this will be a breaking change.
I opened #132412 as a general approach of dealing with a mix of XO and RX sections in the same binary. I'll close this PR because of that. I'll do a separate change regarding the section flags for |
…t` flag (#132412) Following from the discussion in #132224, this seems like the best approach to deal with a mix of XO and RX output sections in the same binary. This change will also simplify the implementation of the PURECODE section flag for AArch64. To control this behaviour, the `--[no-]xosegment` flag is added to LLD (similarly to `--[no-]rosegment`), which determines whether to allow merging XO and RX sections in the same segment. The default value is `--no-xosegment`, which is a breaking change compared to the previous behaviour. Release notes are also added, since this will be a breaking change.