Skip to content

[lld-macho] Fix duplicate GOT entries for personality functions #95054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 11, 2024

Conversation

BertalanD
Copy link
Member

As stated in UnwindInfoSectionImpl::prepareRelocations's comments, the unwind info uses section+addend relocations for personality functions defined in the same file as the function itself. As personality functions are always accessed via the GOT, we need to resolve those to a symbol. Previously, we did this by keeping a map which resolves these to symbols, creating a synthetic symbol if we didn't find it in the map.

This approach has an issue: if we process the object file containing the personality function before any external uses, the entry in the map remains unpopulated, so we create a synthetic symbol and a corresponding GOT entry. If we encounter a relocation to it in a later file which requires GOT (such as in __eh_frame), we add that symbol to the GOT, too, effectively creating two entries which point to the same piece of code.

This commit fixes that by searching the personality function's section for a symbol at that offset which already has a GOT entry, and only creating a synthetic symbol if there is none. As all non-unwind sections are already processed by this point, it ensures no duplication.

This should only really affect our tests (and make them clearer), as personality functions are usually defined in platform runtime libraries.

As stated in `UnwindInfoSectionImpl::prepareRelocations`'s comments, the
unwind info uses section+addend relocations for personality functions
defined in the same file as the function itself. As personality
functions are always accessed via the GOT, we need to resolve those to a
symbol. Previously, we did this by keeping a map which resolves these to
symbols, creating a synthetic symbol if we didn't find it in the map.

This approach has an issue: if we process the object file containing the
personality function before any external uses, the entry in the map
remains unpopulated, so we create a synthetic symbol and a corresponding
GOT entry. If we encounter a relocation to it in a later file which
requires GOT (such as in `__eh_frame`), we add that symbol to the GOT,
too, effectively creating two entries which point to the same piece of
code.

This commit fixes that by searching the personality function's section
for a symbol at that offset which already has a GOT entry, and only
creating a synthetic symbol if there is none. As all non-unwind sections
are already processed by this point, it ensures no duplication.

This should only really affect our tests (and make them clearer), as
personality functions are usually defined in platform runtime libraries.
@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-macho

Author: Daniel Bertalan (BertalanD)

Changes

As stated in UnwindInfoSectionImpl::prepareRelocations's comments, the unwind info uses section+addend relocations for personality functions defined in the same file as the function itself. As personality functions are always accessed via the GOT, we need to resolve those to a symbol. Previously, we did this by keeping a map which resolves these to symbols, creating a synthetic symbol if we didn't find it in the map.

This approach has an issue: if we process the object file containing the personality function before any external uses, the entry in the map remains unpopulated, so we create a synthetic symbol and a corresponding GOT entry. If we encounter a relocation to it in a later file which requires GOT (such as in __eh_frame), we add that symbol to the GOT, too, effectively creating two entries which point to the same piece of code.

This commit fixes that by searching the personality function's section for a symbol at that offset which already has a GOT entry, and only creating a synthetic symbol if there is none. As all non-unwind sections are already processed by this point, it ensures no duplication.

This should only really affect our tests (and make them clearer), as personality functions are usually defined in platform runtime libraries.


Full diff: https://github.com/llvm/llvm-project/pull/95054.diff

3 Files Affected:

  • (modified) lld/MachO/UnwindInfoSection.cpp (+23-11)
  • (modified) lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s (+8-7)
  • (modified) lld/test/MachO/compact-unwind.s (+1-1)
diff --git a/lld/MachO/UnwindInfoSection.cpp b/lld/MachO/UnwindInfoSection.cpp
index 0ac2f39a6180c..7033481d6014b 100644
--- a/lld/MachO/UnwindInfoSection.cpp
+++ b/lld/MachO/UnwindInfoSection.cpp
@@ -298,19 +298,31 @@ void UnwindInfoSectionImpl::prepareRelocations(ConcatInputSection *isec) {
       assert(!isCoalescedWeak(referentIsec));
       // Personality functions can be referenced via section relocations
       // if they live in the same object file. Create placeholder synthetic
-      // symbols for them in the GOT.
+      // symbols for them in the GOT. If the corresponding symbol is already
+      // in the GOT, use that to avoid creating a duplicate entry. All GOT
+      // entries needed by non-unwind sections will have already been added
+      // by this point.
       Symbol *&s = personalityTable[{referentIsec, r.addend}];
       if (s == nullptr) {
-        // This runs after dead stripping, so the noDeadStrip argument does not
-        // matter.
-        s = make<Defined>("<internal>", /*file=*/nullptr, referentIsec,
-                          r.addend, /*size=*/0, /*isWeakDef=*/false,
-                          /*isExternal=*/false, /*isPrivateExtern=*/false,
-                          /*includeInSymtab=*/true,
-                          /*isReferencedDynamically=*/false,
-                          /*noDeadStrip=*/false);
-        s->used = true;
-        in.got->addEntry(s);
+        Defined *const *gotEntry =
+            llvm::find_if(referentIsec->symbols, [&](Defined const *d) {
+              return d->value == static_cast<uint64_t>(r.addend) &&
+                     d->isInGot();
+            });
+        if (gotEntry != referentIsec->symbols.end()) {
+          s = *gotEntry;
+        } else {
+          // This runs after dead stripping, so the noDeadStrip argument does
+          // not matter.
+          s = make<Defined>("<internal>", /*file=*/nullptr, referentIsec,
+                            r.addend, /*size=*/0, /*isWeakDef=*/false,
+                            /*isExternal=*/false, /*isPrivateExtern=*/false,
+                            /*includeInSymtab=*/true,
+                            /*isReferencedDynamically=*/false,
+                            /*noDeadStrip=*/false);
+          s->used = true;
+          in.got->addEntry(s);
+        }
       }
       r.referent = s;
       r.addend = 0;
diff --git a/lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s b/lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
index 676577d6b17e9..35f39ba5fb1e2 100644
--- a/lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
+++ b/lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
@@ -42,19 +42,20 @@
 
 # RUN: llvm-objdump --macho --indirect-symbols --unwind-info --bind %t/d.out | FileCheck %s --check-prefixes=D -D#%x,OFF=0x100000000
 
-# A:      Indirect symbols for (__DATA_CONST,__got)
+# A:      Indirect symbols for (__DATA_CONST,__got) 4 entries
 # A-NEXT: address                    index name
 # A:      0x[[#%x,GXX_PERSONALITY_LO:]] [[#]] ___gxx_personality_v0
+# A:      0x[[#%x,PERSONALITY_1:]]      [[#]] _personality_1
+# A:      0x[[#%x,PERSONALITY_2:]]      [[#]] _personality_2
 # A:      0x[[#%x,GXX_PERSONALITY_HI:]] [[#]] ___gxx_personality_v0
-# A:      0x[[#%x,PERSONALITY_1:]]  LOCAL
-# A:      0x[[#%x,PERSONALITY_2:]]  LOCAL
 
 # BC:      Indirect symbols for (__DATA_CONST,__got)
 # BC-NEXT: address                    index name
-# C:       0x[[#%x,GXX_PERSONALITY_HI:]] LOCAL
 # BC:      0x[[#%x,GXX_PERSONALITY_LO:]] LOCAL
-# BC:      0x[[#%x,PERSONALITY_1:]]      LOCAL
-# BC:      0x[[#%x,PERSONALITY_2:]]      LOCAL
+# C:       0x[[#%x,GXX_PERSONALITY_HI:]] [[#]] ___gxx_personality_v0
+# BC:      0x[[#%x,PERSONALITY_1:]]      [[#]] _personality_1
+# BC:      0x[[#%x,PERSONALITY_2:]]      [[#]] _personality_2
+# BC-EMPTY:
 
 # CHECK:        Personality functions: (count = 3)
 # CHECK-DAG:     personality[{{[0-9]+}}]: 0x{{0*}}[[#GXX_PERSONALITY_LO-OFF]]
@@ -66,7 +67,7 @@
 # A-NEXT: __DATA_CONST __got        0x[[#GXX_PERSONALITY_LO-0]] pointer         0 libc++abi        ___gxx_personality_v0
 
 
-# D:      Indirect symbols for (__DATA_CONST,__got)
+# D:      Indirect symbols for (__DATA_CONST,__got) 6 entries
 # D-NEXT: address                    index name
 # D:      0x[[#%x,GXX_PERSONALITY_HI:]] [[#]] ___gxx_personality_v0
 # D:      0x[[#%x,PERSONALITY_1:]] [[#]] _personality_1
diff --git a/lld/test/MachO/compact-unwind.s b/lld/test/MachO/compact-unwind.s
index fa73ccb10a32a..27e4b44dc0b09 100644
--- a/lld/test/MachO/compact-unwind.s
+++ b/lld/test/MachO/compact-unwind.s
@@ -29,7 +29,7 @@
 # FIRST:      Indirect symbols for (__DATA_CONST,__got)
 # FIRST-NEXT: address                    index name
 # FIRST-DAG:  0x[[#%x,GXX_PERSONALITY:]] [[#]] ___gxx_personality_v0
-# FIRST-DAG:  0x[[#%x,MY_PERSONALITY:]]  LOCAL
+# FIRST-DAG:  0x[[#%x,MY_PERSONALITY:]]
 
 # SECOND:      Indirect symbols for (__DATA_CONST,__got)
 # SECOND-NEXT: address                    index name

Copy link
Contributor

@nico nico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, thanks!

@oontvoo has looked at personality functions in lld in the past, and @speednoisemovement is generally looking at lld things, so cc'ing them for their information. Feel free to hit "merge" in a few hours though :)

@oontvoo
Copy link
Member

oontvoo commented Jun 11, 2024

as personality functions are usually defined in platform runtime libraries.

Just wanted to point out that, we have seen a few use-cases that involve custom personality symbols (IOWs., they're user-defined symbols and NOT in the runtime libraries) . In fact, the compact-unwind-both-local-and-dylib-personality.s was testing some of that.

The change itself LGTM, though.

@BertalanD BertalanD merged commit 6afbda7 into llvm:main Jun 11, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants