[lldb][Mach-O] Handle shared cache binaries correctly #117832
The Mach-O load commands have an LC_SYMTAB / struct symtab_command which represents the offset of the symbol table (nlist records) and string table for this binary. In a Mach-O binary on disk, these are file offsets. If a Mach-O binary is loaded in memory with all segments consecutive, the `symoff` and `stroff` are the offsets from the TEXT segment (aka the Mach-O header) virtual address to the virtual address of the start of these tables.

However, if a Mach-O binary is part of the shared cache, then the segments will be separated -- they will have different slide values. It is possible for the LINKEDIT segment to be more than 4GB away from the TEXT segment in the virtual address space, so these 32-bit offsets cannot express the offset from the TEXT segment to these tables.

Create separate uint64_t variables to track the offset to the symbol table and string table, instead of reusing the 32-bit ones in the symtab_command structure.

rdar://140432279
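To make the 32-bit limitation concrete, here is a minimal sketch of what happens when a slide is folded into a `uint32_t` offset versus a `uint64_t` one. The helper names `SlideOffset32` and `SlideOffset64` and the sample values are hypothetical and exist only for illustration; they are not part of LLDB.

```cpp
#include <cstdint>

// Hypothetical helpers, not LLDB code: apply a slide to a table offset.

// Keeps the result in 32 bits, as reusing the uint32_t field would.
constexpr uint32_t SlideOffset32(uint32_t offset, uint64_t slide) {
  return static_cast<uint32_t>(offset + slide); // wraps modulo 2^32
}

// Keeps the result in 64 bits, as a separate uint64_t variable does.
constexpr uint64_t SlideOffset64(uint32_t offset, uint64_t slide) {
  return static_cast<uint64_t>(offset) + slide;
}

// A slide of exactly 4 GiB disappears entirely in the 32-bit version.
static_assert(SlideOffset32(0x1000, 0x100000000ULL) == 0x1000, "");
static_assert(SlideOffset64(0x1000, 0x100000000ULL) == 0x100001000ULL, "");
```

The 32-bit result silently points back near the start of the image, which is why the wider variables are needed once segments can be more than 4GB apart.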
@llvm/pr-subscribers-lldb
Author: Jason Molenda (jasonmolenda)
Full diff: https://github.com/llvm/llvm-project/pull/117832.diff
1 Files Affected:
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 079fd905037d45..5f047d84d53e73 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -2244,6 +2244,18 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
// code.
typedef AddressDataArray<lldb::addr_t, bool, 100> FunctionStarts;
+ // The virtual address offset from TEXT to the symbol/string tables
+ // in the LINKEDIT section. The LC_SYMTAB symtab_command `symoff` and
+ // `stroff` are uint32_t's that give the file offset in the binary.
+ // If the binary is laid down in memory with all segments consecutive,
+ // then these are the offsets from the mach-o header aka TEXT segment
+ // to the tables' virtual addresses.
+ // But if the binary is loaded in virtual address space with different
+ // slides for the segments (e.g. a shared cache), the LINKEDIT may be
+ // more than 4GB away from TEXT, and a 32-bit offset is not sufficient.
+ offset_t symbol_table_offset_from_TEXT = 0;
+ offset_t string_table_offset_from_TEXT = 0;
+
// Record the address of every function/data that we add to the symtab.
// We add symbols to the table in the order of most information (nlist
// records) to least (function starts), and avoid duplicating symbols
@@ -2282,6 +2294,8 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
if (m_data.GetU32(&offset, &symtab_load_command.symoff, 4) ==
nullptr) // fill in symoff, nsyms, stroff, strsize fields
return;
+ string_table_offset_from_TEXT = symtab_load_command.stroff;
+ symbol_table_offset_from_TEXT = symtab_load_command.symoff;
break;
case LC_DYLD_INFO:
@@ -2403,9 +2417,9 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
const addr_t linkedit_file_offset = linkedit_section_sp->GetFileOffset();
const addr_t symoff_addr = linkedit_load_addr +
- symtab_load_command.symoff -
+ symbol_table_offset_from_TEXT -
linkedit_file_offset;
- strtab_addr = linkedit_load_addr + symtab_load_command.stroff -
+ strtab_addr = linkedit_load_addr + string_table_offset_from_TEXT -
linkedit_file_offset;
// Always load dyld - the dynamic linker - from memory if we didn't
@@ -2473,17 +2487,17 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
lldb::addr_t linkedit_offset = linkedit_section_sp->GetFileOffset();
lldb::offset_t linkedit_slide =
linkedit_offset - m_linkedit_original_offset;
- symtab_load_command.symoff += linkedit_slide;
- symtab_load_command.stroff += linkedit_slide;
+ symbol_table_offset_from_TEXT += linkedit_slide;
+ string_table_offset_from_TEXT += linkedit_slide;
dyld_info.export_off += linkedit_slide;
dysymtab.indirectsymoff += linkedit_slide;
function_starts_load_command.dataoff += linkedit_slide;
exports_trie_load_command.dataoff += linkedit_slide;
}
- nlist_data.SetData(m_data, symtab_load_command.symoff,
+ nlist_data.SetData(m_data, symbol_table_offset_from_TEXT,
nlist_data_byte_size);
- strtab_data.SetData(m_data, symtab_load_command.stroff,
+ strtab_data.SetData(m_data, string_table_offset_from_TEXT,
strtab_data_byte_size);
// We shouldn't have exports data from both the LC_DYLD_INFO command
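The in-memory address arithmetic in the diff above can be sketched in isolation as follows. The parameter names follow the diff, but the function `TableLoadAddress` and the sample addresses are hypothetical; this is an illustration under those assumptions, not the LLDB implementation.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the arithmetic in the diff above.
// (table_offset - linkedit_file_offset) is the table's position inside
// LINKEDIT; adding the LINKEDIT load address gives its address in memory.
constexpr uint64_t TableLoadAddress(uint64_t linkedit_load_addr,
                                    uint64_t table_offset,
                                    uint64_t linkedit_file_offset) {
  return linkedit_load_addr + table_offset - linkedit_file_offset;
}

// Made-up values: LINKEDIT loaded above 4 GiB, table 0x1000 bytes into it.
static_assert(TableLoadAddress(0x280000000ULL, 0x9000, 0x8000) ==
                  0x280001000ULL,
              "");
```

All three operands are 64-bit here, so the result stays correct however far LINKEDIT is from TEXT.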
I have two criticisms of the patch as-is. We read the LC_SYMTAB load command into

The second criticism is that I don't have a good way to test it. I have a request to have ProcessMachCore treat an LC_SEGMENT that has a virtual address and size, but no file size, as an all-zeroes segment. In that case it would be possible to create a Mach-O corefile that is larger than 4GB in size but only uses a couple hundred KB on disk (and doesn't fill the CI filesystems), and then we'd have to hand-write a Mach-O file with a LINKEDIT 4GB away from the TEXT segment. There are a couple of pieces that don't exist to do all of this right now, though.
Is changing
Yes, this is one possibility, and maybe the best. The current structure reflects the in-binary layout, so it can be read (with endian fixing) in one go.
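As a rough illustration of the trade-off described in that comment, the sketch below keeps a structure that matches the six `uint32_t` fields of `symtab_command` so it can be copied out of the file in one read, and holds the widened offsets in a separate value. The names `SymtabCommandOnDisk`, `TableOffsets`, `ReadSymtabCommand`, and `WidenOffsets` are hypothetical, and the sketch assumes host byte order (no endian fixing).

```cpp
#include <cstdint>
#include <cstring>

// Mirrors the six uint32_t fields of symtab_command (type name hypothetical).
struct SymtabCommandOnDisk {
  uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
};
static_assert(sizeof(SymtabCommandOnDisk) == 24, "must match on-disk size");

// 64-bit working copies that can absorb slides larger than 4GB.
struct TableOffsets {
  uint64_t symbol_table;
  uint64_t string_table;
};

// Copy the load command out of a byte buffer in one go.
// Assumes host byte order; a real reader would also handle byte swapping.
inline SymtabCommandOnDisk ReadSymtabCommand(const unsigned char *bytes) {
  SymtabCommandOnDisk lc;
  std::memcpy(&lc, bytes, sizeof(lc));
  return lc;
}

// Widen the two offsets without modifying the on-disk-shaped structure.
inline TableOffsets WidenOffsets(const SymtabCommandOnDisk &lc) {
  return TableOffsets{lc.symoff, lc.stroff};
}
```

Later adjustments are then applied to the `TableOffsets` copy, while the structure read from the file keeps the values exactly as they were stored.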
Updated patch to have a local
I like @Michael137's suggestion and the name makes it obvious what's going on. LGTM.
The Mach-O load commands have an LC_SYMTAB / struct symtab_command which represents the offset of the symbol table (nlist records) and string table for this binary. In a mach-o binary on disk, these are file offsets. If a mach-o binary is loaded in memory with all segments consecutive, the `symoff` and `stroff` are the offsets from the TEXT segment (aka the mach-o header) virtual address to the virtual address of the start of these tables. However, if a Mach-O binary is a part of the shared cache, then the segments will be separated -- they will have different slide values. And it is possible for the LINKEDIT segment to be greater than 4GB away from the TEXT segment in the virtual address space, so these 32-bit offsets cannot express the offset from TEXT segment to these tables. Create separate uint64_t variables to track the offset to the symbol table and string table, instead of reusing the 32-bit ones in the symtab_command structure. rdar://140432279 (cherry picked from commit 448ac7d)
…-vm-offset-shared-cache-segments-61 [lldb][Mach-O] Handle shared cache binaries correctly (llvm#117832)
…-vm-offset-shared-cache-segments [lldb][Mach-O] Handle shared cache binaries correctly (llvm#117832)