[llvm-objcopy] Add --compress-sections #85036

Closed
8 changes: 8 additions & 0 deletions llvm/docs/CommandGuide/llvm-objcopy.rst
@@ -309,6 +309,14 @@ them.
Compress DWARF debug sections in the output, using the specified format.
Supported formats are ``zlib`` and ``zstd``. Use ``zlib`` if ``<format>`` is omitted.

.. option:: --compress-sections <section>=<format>

Compress or decompress sections matched by ``<section>`` using the specified
format. Supported formats are ``zlib`` and ``zstd``. Specify ``none`` for
decompression. When a section is matched by multiple options, the last one
wins. A wildcard ``<section>`` starting with '!' is disallowed.
Sections within a segment cannot be (de)compressed.

.. option:: --decompress-debug-sections

Decompress any compressed DWARF debug sections in the output.
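The "last one wins" rule documented for ``--compress-sections`` can be sketched in standalone C++ as below. This is a simplified model, not the llvm-objcopy implementation: `Compression`, `Rules`, and `resolve` are hypothetical names, and patterns are compared as exact section names instead of globs.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::DebugCompressionType.
enum class Compression { None, Zlib, Zstd };

// Each entry models one --compress-sections option, in command-line order.
// Assumption: the "pattern" is an exact section name, not a glob.
using Rules = std::vector<std::pair<std::string, Compression>>;

// Returns the format requested for Section, or std::nullopt when no option
// names it. Later options overwrite earlier ones, so the last match wins.
std::optional<Compression> resolve(const Rules &Options,
                                   const std::string &Section) {
  std::optional<Compression> Result;
  for (const auto &[Name, Type] : Options)
    if (Name == Section)
      Result = Type;
  return Result;
}
```

With the rules `foo0=zlib` followed by `foo0=none`, `resolve` yields `Compression::None` for `foo0`, and an unmatched name yields no request at all.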
4 changes: 4 additions & 0 deletions llvm/docs/ReleaseNotes.rst
@@ -182,6 +182,10 @@ Changes to the LLVM tools
for ELF input to skip the specified symbols when executing other options
that can change a symbol's name, binding or visibility.

* llvm-objcopy now supports ``--compress-sections`` to compress or decompress
arbitrary sections not within a segment.
(`#85036 <https://github.com/llvm/llvm-project/pull/85036>`_.)

* llvm-profgen now supports COFF+DWARF binaries. This enables Sample-based PGO
on Windows using Intel VTune's SEP. For details on usage, see the `end-user
documentation for SPGO
3 changes: 3 additions & 0 deletions llvm/include/llvm/ObjCopy/CommonConfig.h
@@ -262,6 +262,9 @@ struct CommonConfig {
bool DecompressDebugSections = false;

DebugCompressionType CompressionType = DebugCompressionType::None;

SmallVector<std::pair<NameMatcher, llvm::DebugCompressionType>, 0>
compressSections;
};

} // namespace objcopy
34 changes: 26 additions & 8 deletions llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
@@ -215,23 +215,41 @@ static Error dumpSectionToFile(StringRef SecName, StringRef Filename,
}

Error Object::compressOrDecompressSections(const CommonConfig &Config) {
- // Build a list of the debug sections we are going to replace.
- // We can't call `AddSection` while iterating over sections,
// Build a list of sections we are going to replace.
// We can't call `addSection` while iterating over sections,
// because it would mutate the sections array.
SmallVector<std::pair<SectionBase *, std::function<SectionBase *()>>, 0>
ToReplace;
for (SectionBase &Sec : sections()) {
- if ((Sec.Flags & SHF_ALLOC) || !StringRef(Sec.Name).starts_with(".debug"))
std::optional<DebugCompressionType> CType;
for (auto &[Matcher, T] : Config.compressSections)
Collaborator

A thought: does it make sense to ignore sections specified with compressSections if their compression state already matches the requested one? Alternatively, report a specific error for that case? So e.g. if requested to decompress the already decompressed .symtab, we'd either do nothing or emit an error.

Member Author

With recent GNU objcopy, --compress-debug-sections=zstd on a zlib-compressed section will recompress the content with zstd. This is where our behavior differs from GNU. I actually tried implementing this behavior, but in the end the benefit wasn't clear and the complexity seemed quite high (I couldn't even find a good way to implement it with our CompressedSection abstraction).

I think the scenario is likely very rare and users might not expect a specific behavior.

Collaborator

Should we perhaps add a test to show the current behaviour then? I'm not sure either way.

Member Author

Added
## If a section is already compressed, compression request for another format is ignored.

if (Matcher.matches(Sec.Name))
CType = T;
// Handle --compress-debug-sections and --decompress-debug-sections, which
// apply to non-ALLOC debug sections.
if (!(Sec.Flags & SHF_ALLOC) && StringRef(Sec.Name).starts_with(".debug")) {
if (Config.CompressionType != DebugCompressionType::None)
CType = Config.CompressionType;
else if (Config.DecompressDebugSections)
CType = DebugCompressionType::None;
}
if (!CType)
continue;

if (Sec.ParentSegment)
return createStringError(
errc::invalid_argument,
"section '" + Sec.Name +
"' within a segment cannot be (de)compressed");

if (auto *CS = dyn_cast<CompressedSection>(&Sec)) {
- if (Config.DecompressDebugSections) {
if (*CType == DebugCompressionType::None)
ToReplace.emplace_back(
&Sec, [=] { return &addSection<DecompressedSection>(*CS); });
- }
- } else if (Config.CompressionType != DebugCompressionType::None) {
- ToReplace.emplace_back(&Sec, [&, S = &Sec] {
} else if (*CType != DebugCompressionType::None) {
ToReplace.emplace_back(&Sec, [=, S = &Sec] {
return &addSection<CompressedSection>(
- CompressedSection(*S, Config.CompressionType, Is64Bits));
CompressedSection(*S, *CType, Is64Bits));
});
}
}
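The per-section decision made by this hunk, including the behaviour discussed in the review thread (a compression request on an already compressed section is ignored), can be modelled roughly as follows. This is a sketch under stated assumptions: `SectionInfo`, `Options`, `Action`, and `decide` are hypothetical names, globs are replaced by exact-name rules, and the error is reported with an exception instead of `llvm::Error`.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Compression { None, Zlib, Zstd };
enum class Action { Keep, Compress, Decompress };

// Hypothetical, simplified view of a section; not the ObjCopy SectionBase.
struct SectionInfo {
  std::string Name;
  bool IsAlloc = false;      // SHF_ALLOC is set
  bool InSegment = false;    // section has a parent segment
  bool IsCompressed = false; // section is already compressed
};

struct Options {
  // Exact-name rules standing in for --compress-sections globs, in order.
  std::vector<std::pair<std::string, Compression>> CompressSections;
  Compression DebugCompression = Compression::None; // --compress-debug-sections
  bool DecompressDebug = false; // --decompress-debug-sections
};

Action decide(const SectionInfo &Sec, const Options &Opts) {
  std::optional<Compression> CType;
  for (const auto &[Name, Type] : Opts.CompressSections)
    if (Name == Sec.Name)
      CType = Type; // last match wins
  // The debug-section options are applied afterwards, so they override
  // --compress-sections for non-ALLOC sections named .debug*.
  if (!Sec.IsAlloc && Sec.Name.rfind(".debug", 0) == 0) {
    if (Opts.DebugCompression != Compression::None)
      CType = Opts.DebugCompression;
    else if (Opts.DecompressDebug)
      CType = Compression::None;
  }
  if (!CType)
    return Action::Keep; // no option applies to this section
  if (Sec.InSegment)
    throw std::invalid_argument("section '" + Sec.Name +
                                "' within a segment cannot be (de)compressed");
  if (Sec.IsCompressed) // a request for another format is ignored
    return *CType == Compression::None ? Action::Decompress : Action::Keep;
  return *CType == Compression::None ? Action::Keep : Action::Compress;
}
```

Note that the segment check only fires for sections some option actually selects, which is why unrelated sections inside a segment do not trigger the error.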
@@ -0,0 +1,38 @@
## Disallow (de)compression for sections within a segment as they are
## effectively immutable.
# RUN: rm -rf %t && mkdir %t && cd %t
# RUN: yaml2obj %s -o a
# RUN: not llvm-objcopy a /dev/null --compress-sections .text=zlib 2>&1 | FileCheck %s --implicit-check-not=error:

# CHECK: error: 'a': section '.text' within a segment cannot be (de)compressed

# RUN: not llvm-objcopy a /dev/null --compress-sections foo=none 2>&1 | FileCheck %s --check-prefix=CHECK2 --implicit-check-not=error:

# CHECK2: error: 'a': section 'foo' within a segment cannot be (de)compressed

## There is an error even if 'foo' is already compressed with zlib.
# RUN: not llvm-objcopy a /dev/null --compress-sections foo=zlib 2>&1 | FileCheck %s --check-prefix=CHECK3 --implicit-check-not=error:

# CHECK3: error: 'a': section 'foo' within a segment cannot be (de)compressed

--- !ELF
FileHeader:
Class: ELFCLASS64
Data: ELFDATA2LSB
Type: ET_EXEC
Machine: EM_X86_64
ProgramHeaders:
- Type: PT_LOAD
FirstSec: .text
LastSec: foo
Align: 0x1000
Offset: 0x1000
Sections:
- Name: .text
Type: SHT_PROGBITS
Offset: 0x1000
Content: C3
- Name: foo
Type: SHT_PROGBITS
Flags: [ SHF_COMPRESSED ]
Content: 010000000000000040000000000000000100000000000000789cd36280002d3269002f800151
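The `Content` of section `foo` above begins with a 24-byte ELF64 compression header followed by the compressed stream. A small decoder such as the one below can make the header fields visible; it is an illustrative sketch, with `Chdr64`, `readLE`, and `parseChdr64` being hypothetical helper names and the input assumed to be a little-endian hex string like the one in the YAML.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Fields of an ELF64 compression header (Elf64_Chdr). The 4-byte reserved
// word between ch_type and ch_size is skipped.
struct Chdr64 {
  uint32_t Type;      // ch_type: 1 = zlib, 2 = zstd
  uint64_t Size;      // ch_size: uncompressed size in bytes
  uint64_t AddrAlign; // ch_addralign: uncompressed alignment
};

// Reads a little-endian integer of Bytes bytes starting at byte offset Off
// of a hex string (two hex digits per byte).
uint64_t readLE(const std::string &Hex, std::size_t Off, std::size_t Bytes) {
  if (Hex.size() < 2 * (Off + Bytes))
    throw std::out_of_range("hex string too short");
  uint64_t Value = 0;
  for (std::size_t I = 0; I < Bytes; ++I) {
    uint64_t Byte = std::stoul(Hex.substr(2 * (Off + I), 2), nullptr, 16);
    Value |= Byte << (8 * I);
  }
  return Value;
}

Chdr64 parseChdr64(const std::string &Hex) {
  return {static_cast<uint32_t>(readLE(Hex, 0, 4)), readLE(Hex, 8, 8),
          readLE(Hex, 16, 8)};
}
```

Applied to the `Content` string of `foo`, the leading bytes decode as a zlib header (`ch_type` 1) describing 0x40 bytes of uncompressed data with alignment 1.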
128 changes: 128 additions & 0 deletions llvm/test/tools/llvm-objcopy/ELF/compress-sections.s
@@ -0,0 +1,128 @@
# REQUIRES: x86-registered-target, zlib, zstd

# RUN: rm -rf %t && mkdir %t && cd %t
# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o a.o
## '*0=none' wins because it is the last. '*0' sections are decompressed (if originally compressed) or kept unchanged (if uncompressed).
## No section is named 'nomatch'. The third option is a no-op.
# RUN: llvm-objcopy a.o out --compress-sections='*0=zlib' --compress-sections '*0=none' --compress-sections 'nomatch=none' 2>&1 | count 0
# RUN: llvm-readelf -S out | FileCheck %s --check-prefix=CHECK1

# CHECK1: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK1: .text PROGBITS [[#%x,TEXT:]] [[#%x,]] [[#%x,]] 00 AX 0 0 4
# CHECK1: foo0 PROGBITS [[#%x,FOO0:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK1-NEXT: .relafoo0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 3 8
# CHECK1-NEXT: foo1 PROGBITS [[#%x,FOO1:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK1-NEXT: .relafoo1 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 5 8
# CHECK1: nonalloc0 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK1-NEXT: .relanonalloc0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 7 8
# CHECK1-NEXT: nonalloc1 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK1-NEXT: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS 0 0 1

## Mixing zlib and zstd.
# RUN: llvm-objcopy a.o out2 --compress-sections '*c0=zlib' --compress-sections .debug_str=zstd
# RUN: llvm-readelf -Sr -x nonalloc0 -x .debug_str out2 2>&1 | FileCheck %s --check-prefix=CHECK2
# RUN: llvm-readelf -z -x nonalloc0 -x .debug_str out2 | FileCheck %s --check-prefix=CHECK2DE

# CHECK2: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK2: .text PROGBITS [[#%x,TEXT:]] [[#%x,]] [[#%x,]] 00 AX 0 0 4
# CHECK2: foo0 PROGBITS [[#%x,FOO0:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK2-NEXT: .relafoo0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 3 8
# CHECK2-NEXT: foo1 PROGBITS [[#%x,FOO1:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK2-NEXT: .relafoo1 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 5 8
# CHECK2: nonalloc0 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 8
# CHECK2-NEXT: .relanonalloc0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 IC 11 7 8
# CHECK2-NEXT: nonalloc1 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK2-NEXT: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MSC 0 0 8

## llvm-readelf -r doesn't support SHF_COMPRESSED SHT_RELA.
# CHECK2: warning: {{.*}}: unable to read relocations from SHT_RELA section with index 8: section [index 8] has an invalid sh_size ([[#]]) which is not a multiple of its sh_entsize (24)

# CHECK2: Hex dump of section 'nonalloc0':
## zlib with ch_size=0x10
# CHECK2-NEXT: 01000000 00000000 10000000 00000000
# CHECK2-NEXT: 08000000 00000000 {{.*}}
# CHECK2: Hex dump of section '.debug_str':
## zstd with ch_size=0x38
# CHECK2-NEXT: 02000000 00000000 38000000 00000000
# CHECK2-NEXT: 01000000 00000000 {{.*}}

# CHECK2DE: Hex dump of section 'nonalloc0':
# CHECK2DE-NEXT: 0x00000000 00000000 00000000 00000000 00000000 ................
# CHECK2DE-EMPTY:
# CHECK2DE-NEXT: Hex dump of section '.debug_str':
# CHECK2DE-NEXT: 0x00000000 41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA

## --decompress-debug-sections takes precedence, even if it is before --compress-sections.
# RUN: llvm-objcopy a.o out3 --decompress-debug-sections --compress-sections .debug_str=zstd
# RUN: llvm-readelf -S out3 | FileCheck %s --check-prefix=CHECK3

# CHECK3: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS 0 0 1

# RUN: llvm-objcopy a.o out4 --compress-sections '*0=zlib'
# RUN: llvm-readelf -S out4 | FileCheck %s --check-prefix=CHECK4

# CHECK4: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK4: .text PROGBITS [[#%x,TEXT:]] [[#%x,]] [[#%x,]] 00 AX 0 0 4
# CHECK4: foo0 PROGBITS [[#%x,FOO0:]] [[#%x,]] [[#%x,]] 00 AC 0 0 8
# CHECK4-NEXT: .relafoo0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 IC 11 3 8
# CHECK4-NEXT: foo1 PROGBITS [[#%x,FOO1:]] [[#%x,]] [[#%x,]] 00 A 0 0 8
# CHECK4-NEXT: .relafoo1 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 I 11 5 8
# CHECK4: nonalloc0 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 8
# CHECK4-NEXT: .relanonalloc0 RELA [[#%x,]] [[#%x,]] [[#%x,]] 18 IC 11 7 8
# CHECK4-NEXT: nonalloc1 PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 8
# CHECK4-NEXT: .debug_str PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 01 MS 0 0 1

## If a section is already compressed, compression request for another format is ignored.
# RUN: llvm-objcopy a.o out5 --compress-sections 'nonalloc0=zlib'
# RUN: llvm-readelf -x nonalloc0 out5 | FileCheck %s --check-prefix=CHECK5
# RUN: llvm-objcopy out5 out5a --compress-sections 'nonalloc0=zstd'
# RUN: cmp out5 out5a

# CHECK5: Hex dump of section 'nonalloc0':
## zlib with ch_size=0x10
# CHECK5-NEXT: 01000000 00000000 10000000 00000000
# CHECK5-NEXT: 08000000 00000000 {{.*}}

# RUN: not llvm-objcopy --compress-sections=foo a.o out 2>&1 | \
# RUN: FileCheck %s --check-prefix=ERR1 --implicit-check-not=error:
# ERR1: error: --compress-sections: parse error, not 'section-glob=[none|zlib|zstd]'

# RUN: llvm-objcopy --compress-sections 'a[=zlib' a.o out 2>&1 | \
# RUN: FileCheck %s --check-prefix=ERR2 --implicit-check-not=error:
# ERR2: warning: invalid glob pattern, unmatched '['

# RUN: not llvm-objcopy a.o out --compress-sections='.debug*=zlib-gabi' --compress-sections='.debug*=' 2>&1 | \
# RUN: FileCheck -check-prefix=ERR3 %s
# ERR3: error: invalid or unsupported --compress-sections format: .debug*=zlib-gabi

# RUN: not llvm-objcopy a.o out --compress-sections='!.debug*=zlib' 2>&1 | \
# RUN: FileCheck -check-prefix=ERR4 %s
# ERR4: error: --compress-sections: negative pattern is unsupported

.globl _start
_start:
ret

.section foo0,"a"
.balign 8
.quad .text-.
.quad .text-.
.section foo1,"a"
.balign 8
.quad .text-.
.quad .text-.
.section nonalloc0,""
.balign 8
.quad .text+1
.quad .text+2
sym0:
.section nonalloc1,""
.balign 8
.quad 42
sym1:

.section .debug_str,"MS",@progbits,1
.Linfo_string0:
.asciz "AAAAAAAAAAAAAAAAAAAAAAAAAAA"
.Linfo_string1:
.asciz "BBBBBBBBBBBBBBBBBBBBBBBBBBB"
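To see which sections of this test input the patterns `*0` and `*c0` select, a minimal wildcard matcher is enough. The sketch below is an assumption-laden stand-in, not the matcher llvm-objcopy uses: the hypothetical `globMatch` handles only `*` and `?`, while the real glob syntax supports more (for example bracket expressions, as the `a[` warning case shows).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal matcher: '*' matches any run of characters (including none) and
// '?' matches exactly one character. Everything else matches literally.
bool globMatch(const std::string &Pat, const std::string &Str) {
  const std::size_t NoStar = std::string::npos;
  std::size_t P = 0, S = 0, StarP = NoStar, StarS = 0;
  while (S < Str.size()) {
    if (P < Pat.size() && Pat[P] == '*') {
      StarP = P++; // remember the star and first try matching it to nothing
      StarS = S;
    } else if (P < Pat.size() && (Pat[P] == '?' || Pat[P] == Str[S])) {
      ++P;
      ++S;
    } else if (StarP != NoStar) {
      P = StarP + 1; // backtrack: let the last star absorb one more character
      S = ++StarS;
    } else {
      return false;
    }
  }
  while (P < Pat.size() && Pat[P] == '*')
    ++P; // trailing stars match the empty string
  return P == Pat.size();
}

// Returns the names matched by Pat, preserving their order.
std::vector<std::string> matching(const std::string &Pat,
                                  const std::vector<std::string> &Names) {
  std::vector<std::string> Out;
  for (const std::string &Name : Names)
    if (globMatch(Pat, Name))
      Out.push_back(Name);
  return Out;
}
```

For the section names in this test, `*0` selects `foo0`, `.relafoo0`, `nonalloc0`, and `.relanonalloc0`, which lines up with the sections carrying the `C` flag in the CHECK4 lines, while `*c0` selects only the two `nonalloc0` sections, as in CHECK2.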
29 changes: 29 additions & 0 deletions llvm/test/tools/llvm-objcopy/ELF/decompress-sections.test
@@ -4,13 +4,42 @@
# RUN: yaml2obj %s -o %t
# RUN: llvm-objcopy --decompress-debug-sections %t %t.de
# RUN: llvm-readelf -S %t.de | FileCheck %s
# RUN: llvm-objcopy --compress-sections '*nonalloc=none' --compress-sections .debugx=none %t %t.1.de
# RUN: cmp %t.de %t.1.de

# CHECK: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK: .debug_alloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 AC 0 0 0
# CHECK-NEXT: .debug_nonalloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK-NEXT: .debugx PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK-NEXT: nodebug PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 0

# RUN: llvm-objcopy --compress-sections '.debug*=none' %t %t2.de
# RUN: llvm-readelf -S -x .debug_alloc -x .debug_nonalloc -x .debugx %t2.de | FileCheck %s --check-prefix=CHECK2

# CHECK2: Name Type Address Off Size ES Flg Lk Inf Al
# CHECK2: .debug_alloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 A 0 0 1
# CHECK2-NEXT: .debug_nonalloc PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK2-NEXT: .debugx PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 0 0 1
# CHECK2-NEXT: nodebug PROGBITS 0000000000000000 [[#%x,]] [[#%x,]] 00 C 0 0 0

# CHECK2: Hex dump of section '.debug_alloc':
# CHECK2-NEXT: 0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-EMPTY:
# CHECK2: Hex dump of section '.debug_nonalloc':
# CHECK2-NEXT: 0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-EMPTY:
# CHECK2-NEXT: Hex dump of section '.debugx':
# CHECK2-NEXT: 0x00000000 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000010 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000020 2a000000 00000000 2a000000 00000000 *.......*.......
# CHECK2-NEXT: 0x00000030 2a000000 00000000 2a000000 00000000 *.......*.......

--- !ELF
FileHeader:
Class: ELFCLASS64
36 changes: 36 additions & 0 deletions llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
@@ -736,6 +736,42 @@ objcopy::parseObjcopyOptions(ArrayRef<const char *> RawArgsArr,
return createStringError(errc::invalid_argument, Reason);
}

for (const auto *A : InputArgs.filtered(OBJCOPY_compress_sections)) {
SmallVector<StringRef, 0> Fields;
StringRef(A->getValue()).split(Fields, '=');
if (Fields.size() != 2 || Fields[1].empty()) {
return createStringError(
errc::invalid_argument,
A->getSpelling() +
": parse error, not 'section-glob=[none|zlib|zstd]'");
}

auto Type = StringSwitch<DebugCompressionType>(Fields[1])
.Case("zlib", DebugCompressionType::Zlib)
.Case("zstd", DebugCompressionType::Zstd)
.Default(DebugCompressionType::None);
if (Type == DebugCompressionType::None && Fields[1] != "none") {
return createStringError(
errc::invalid_argument,
"invalid or unsupported --compress-sections format: %s",
A->getValue());
}

auto &P = Config.compressSections.emplace_back();
P.second = Type;
auto Matcher =
NameOrPattern::create(Fields[0], SectionMatchStyle, ErrorCallback);
// =none allows overriding a previous =zlib or =zstd. Reject negative
// patterns, which would be confusing.
if (Matcher && !Matcher->isPositiveMatch()) {
return createStringError(
errc::invalid_argument,
"--compress-sections: negative pattern is unsupported");
}
if (Error E = P.first.addMatcher(std::move(Matcher)))
return std::move(E);
}

Config.AddGnuDebugLink = InputArgs.getLastArgValue(OBJCOPY_add_gnu_debuglink);
// The gnu_debuglink's target is expected to not change or else its CRC would
// become invalidated and get rejected. We can avoid recalculating the
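The validation performed by the new option-parsing loop can be approximated without LLVM types as shown below. This is a hedged sketch: `parseCompressSections` and `Compression` are hypothetical names, errors are thrown as exceptions instead of being returned as `llvm::Error`, and a negative pattern is assumed to be one that starts with '!', as the documentation text in this PR describes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

enum class Compression { None, Zlib, Zstd };

// Parses "<section-glob>=<format>" and returns the glob with the requested
// format. Throws std::invalid_argument for the rejected cases.
std::pair<std::string, Compression>
parseCompressSections(const std::string &Arg) {
  const std::size_t Eq = Arg.find('=');
  // Splitting on '=' must give exactly two fields, the second non-empty.
  if (Eq == std::string::npos ||
      Arg.find('=', Eq + 1) != std::string::npos || Eq + 1 == Arg.size())
    throw std::invalid_argument("--compress-sections: parse error, not "
                                "'section-glob=[none|zlib|zstd]'");
  const std::string Glob = Arg.substr(0, Eq);
  const std::string Format = Arg.substr(Eq + 1);

  Compression Type;
  if (Format == "none")
    Type = Compression::None;
  else if (Format == "zlib")
    Type = Compression::Zlib;
  else if (Format == "zstd")
    Type = Compression::Zstd;
  else
    throw std::invalid_argument(
        "invalid or unsupported --compress-sections format: " + Arg);

  // Assumption: a negative pattern is one whose glob starts with '!'.
  if (!Glob.empty() && Glob[0] == '!')
    throw std::invalid_argument(
        "--compress-sections: negative pattern is unsupported");
  return {Glob, Type};
}
```

The three failure paths correspond to the ERR1, ERR3, and ERR4 cases in the test file; the malformed-glob case (ERR2) is only a warning there and is not modelled.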
6 changes: 6 additions & 0 deletions llvm/tools/llvm-objcopy/ObjcopyOpts.td
@@ -35,6 +35,12 @@ def : Flag<["--"], "compress-debug-sections">, Alias<compress_debug_sections>,
AliasArgs<["zlib"]>;
def decompress_debug_sections : Flag<["--"], "decompress-debug-sections">,
HelpText<"Decompress DWARF debug sections">;
defm compress_sections
: Eq<"compress-sections",
"Compress or decompress sections using specified format. Supported "
"formats: zlib, zstd. Specify 'none' for decompression">,
MetaVarName<"<section-glob>=<format>">;

defm split_dwo
: Eq<"split-dwo", "Equivalent to --extract-dwo and <dwo-file> as the output file and no other options, "
"and then --strip-dwo on the input file">,