
[LTO][Legacy] Add new C APIs to query undefined symbols in assembly #145413



cachemeifyoucan
Collaborator

Add new APIs to the legacy LTO C API to surface undefined symbols that are
parsed from module-level inline assembly. ThinLTO needs this information
to determine which symbols must not be dead-stripped; full LTO already
forwards it automatically. The linker needs to fetch this information and
add the undefined symbols from assembly to the MustPreserveSymbols list,
just as it does for undefined symbols from a non-LTO object file.

Resolves: #29340
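A minimal sketch of the linker-side change the description asks for, assuming a linker that keeps its own must-preserve list (the `preserve_set` type and its helpers are hypothetical). The two libLTO functions are replaced by in-file stand-ins with the signatures from the diff so the example compiles without LLVM; in a real ThinLTO linker each collected name would then be handed to libLTO, for example via `thinlto_codegen_add_must_preserve_symbol`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libLTO's opaque module handle. A real linker
 * gets an lto_module_t from libLTO and links against libLTO instead of
 * defining the two functions below. */
typedef struct {
  const char **asm_undefs;
  unsigned int count;
} mock_module;
typedef mock_module *lto_module_t;

/* Stand-ins with the signatures added to llvm-c/lto.h in this PR. */
unsigned int lto_module_get_num_asm_undef_symbols(lto_module_t mod) {
  return mod->count;
}

const char *lto_module_get_asm_undef_symbol_name(lto_module_t mod,
                                                 unsigned int index) {
  return index < mod->count ? mod->asm_undefs[index] : NULL;
}

/* Hypothetical linker-side must-preserve list (fixed capacity for brevity). */
#define MAX_PRESERVE 64
typedef struct {
  const char *names[MAX_PRESERVE];
  unsigned int count;
} preserve_set;

static void preserve_add(preserve_set *set, const char *name) {
  for (unsigned int i = 0; i < set->count; ++i)
    if (strcmp(set->names[i], name) == 0)
      return; /* already preserved */
  assert(set->count < MAX_PRESERVE);
  set->names[set->count++] = name;
}

/* Treat every undefined symbol referenced from module asm like an undefined
 * symbol from a native object file: it must survive internalization. */
static void add_asm_undefs_to_preserve(lto_module_t mod, preserve_set *set) {
  unsigned int n = lto_module_get_num_asm_undef_symbols(mod);
  for (unsigned int i = 0; i < n; ++i) {
    const char *name = lto_module_get_asm_undef_symbol_name(mod, i);
    if (name != NULL)
      preserve_add(set, name);
  }
}
```

The list is deduplicated because, when several modules are loaded, more than one of them may reference the same symbol from assembly.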

Created using spr 1.3.6
@llvmbot llvmbot added the LTO Link time optimization (regular/full LTO or ThinLTO) label Jun 23, 2025
@llvmbot
Member

llvmbot commented Jun 23, 2025

@llvm/pr-subscribers-lto

Author: Steven Wu (cachemeifyoucan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/145413.diff

6 Files Affected:

  • (modified) llvm/include/llvm-c/lto.h (+16-1)
  • (modified) llvm/include/llvm/LTO/legacy/LTOModule.h (+8)
  • (added) llvm/test/LTO/AArch64/module-asm.ll (+21)
  • (modified) llvm/tools/llvm-lto/llvm-lto.cpp (+2)
  • (modified) llvm/tools/lto/lto.cpp (+9)
  • (modified) llvm/tools/lto/lto.exports (+2)
diff --git a/llvm/include/llvm-c/lto.h b/llvm/include/llvm-c/lto.h
index 5ceb02224d2bb..91195b0fe9458 100644
--- a/llvm/include/llvm-c/lto.h
+++ b/llvm/include/llvm-c/lto.h
@@ -46,7 +46,7 @@ typedef bool lto_bool_t;
  * @{
  */
 
-#define LTO_API_VERSION 29
+#define LTO_API_VERSION 30
 
 /**
  * \since prior to LTO_API_VERSION=3
@@ -286,6 +286,21 @@ lto_module_get_symbol_name(lto_module_t mod, unsigned int index);
 extern lto_symbol_attributes
 lto_module_get_symbol_attribute(lto_module_t mod, unsigned int index);
 
+/**
+ * Returns the number of asm undefined symbols in the object module.
+ *
+ * \since prior to LTO_API_VERSION=30
+ */
+extern unsigned int lto_module_get_num_asm_undef_symbols(lto_module_t mod);
+
+/**
+ * Returns the name of the ith asm undefined symbol in the object module.
+ *
+ * \since prior to LTO_API_VERSION=30
+ */
+extern const char *lto_module_get_asm_undef_symbol_name(lto_module_t mod,
+                                                        unsigned int index);
+
 /**
  * Returns the module's linker options.
  *
diff --git a/llvm/include/llvm/LTO/legacy/LTOModule.h b/llvm/include/llvm/LTO/legacy/LTOModule.h
index 2b6a8734e78f6..e4d52ac067a6d 100644
--- a/llvm/include/llvm/LTO/legacy/LTOModule.h
+++ b/llvm/include/llvm/LTO/legacy/LTOModule.h
@@ -143,6 +143,14 @@ struct LTOModule {
     return StringRef();
   }
 
+  uint32_t getAsmUndefSymbolCount() { return _asm_undefines.size(); }
+
+  StringRef getAsmUndefSymbolName(uint32_t index) {
+    if (index < _asm_undefines.size())
+      return _asm_undefines[index];
+    return StringRef();
+  }
+
   const GlobalValue *getSymbolGV(uint32_t index) {
     if (index < _symbols.size())
       return _symbols[index].symbol;
diff --git a/llvm/test/LTO/AArch64/module-asm.ll b/llvm/test/LTO/AArch64/module-asm.ll
new file mode 100644
index 0000000000000..321c7890e9df0
--- /dev/null
+++ b/llvm/test/LTO/AArch64/module-asm.ll
@@ -0,0 +1,21 @@
+; RUN: llvm-as %s -o %t.o
+; RUN: llvm-lto %t.o --list-symbols-only | FileCheck %s
+
+; CHECK: ___foo    { function defined hidden }
+; CHECK: ___bar    { function defined default }
+; CHECK: _foo    { data defined default }
+; CHECK: ___foo    { asm extern }
+
+target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "arm64-apple-macosx12.0.0"
+
+module asm ".globl _foo"
+module asm "_foo = ___foo"
+
+define hidden i32 @__foo() {
+  ret i32 0
+}
+
+define i32 @__bar() {
+  ret i32 0
+}
diff --git a/llvm/tools/llvm-lto/llvm-lto.cpp b/llvm/tools/llvm-lto/llvm-lto.cpp
index 21953ee98d6a0..05e9502e3abbe 100644
--- a/llvm/tools/llvm-lto/llvm-lto.cpp
+++ b/llvm/tools/llvm-lto/llvm-lto.cpp
@@ -475,6 +475,8 @@ static void testLTOModule(const TargetOptions &Options) {
         printLTOSymbolAttributes(Module->getSymbolAttributes(I));
         outs() << "\n";
       }
+      for (int I = 0, E = Module->getAsmUndefSymbolCount(); I != E; ++I)
+        outs() << Module->getAsmUndefSymbolName(I) << "    { asm extern }\n";
     }
     if (QueryHasCtorDtor)
       outs() << Filename
diff --git a/llvm/tools/lto/lto.cpp b/llvm/tools/lto/lto.cpp
index bb64d42ccced1..467a4da27dcd8 100644
--- a/llvm/tools/lto/lto.cpp
+++ b/llvm/tools/lto/lto.cpp
@@ -322,6 +322,15 @@ lto_symbol_attributes lto_module_get_symbol_attribute(lto_module_t mod,
   return unwrap(mod)->getSymbolAttributes(index);
 }
 
+unsigned int lto_module_get_num_asm_undef_symbols(lto_module_t mod) {
+  return unwrap(mod)->getAsmUndefSymbolCount();
+}
+
+const char *lto_module_get_asm_undef_symbol_name(lto_module_t mod,
+                                                 unsigned int index) {
+  return unwrap(mod)->getAsmUndefSymbolName(index).data();
+}
+
 const char* lto_module_get_linkeropts(lto_module_t mod) {
   return unwrap(mod)->getLinkerOpts().data();
 }
diff --git a/llvm/tools/lto/lto.exports b/llvm/tools/lto/lto.exports
index 4164c3919a97f..850e159b725a3 100644
--- a/llvm/tools/lto/lto.exports
+++ b/llvm/tools/lto/lto.exports
@@ -14,6 +14,8 @@ lto_module_get_macho_cputype
 lto_module_get_num_symbols
 lto_module_get_symbol_attribute
 lto_module_get_symbol_name
+lto_module_get_num_asm_undef_symbols
+lto_module_get_asm_undef_symbol_name
 lto_module_get_target_triple
 lto_module_set_target_triple
 lto_module_is_object_file
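For a client of the C API, the two new functions follow the same count-then-index pattern as `lto_module_get_num_symbols` and `lto_module_get_symbol_name`. The sketch below mirrors the loop added to `llvm-lto` and produces the same `    { asm extern }` lines that the new test checks. It uses a hypothetical in-memory stand-in for the module handle (`fake_module` and its accessors are illustrative, not libLTO names). Because `getAsmUndefSymbolName` returns an empty `StringRef` for an out-of-range index, the C wrapper's `.data()` appears to yield NULL in that case; the stand-in imitates this, but that behavior is an assumption worth checking against the real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for libLTO's opaque module handle. */
typedef struct {
  const char **asm_undefs;
  unsigned int count;
} fake_module;

static unsigned int fake_num_asm_undefs(const fake_module *m) {
  return m->count;
}

/* Mirrors the bounds check in LTOModule::getAsmUndefSymbolName. */
static const char *fake_asm_undef_name(const fake_module *m, unsigned int i) {
  return i < m->count ? m->asm_undefs[i] : NULL;
}

/* Writes one "<name>    { asm extern }" line per asm undef into buf, like
 * `llvm-lto --list-symbols-only`. Returns the number of complete lines
 * written; stops early (dropping the partial line) if buf is too small. */
static unsigned int list_asm_undefs(const fake_module *m, char *buf,
                                    size_t cap) {
  size_t used = 0;
  unsigned int lines = 0;
  assert(buf != NULL || cap == 0);
  if (cap == 0)
    return 0;
  buf[0] = '\0';
  for (unsigned int i = 0, n = fake_num_asm_undefs(m); i < n; ++i) {
    const char *name = fake_asm_undef_name(m, i);
    if (name == NULL)
      continue;
    int w = snprintf(buf + used, cap - used, "%s    { asm extern }\n", name);
    if (w < 0 || (size_t)w >= cap - used) {
      buf[used] = '\0'; /* discard the truncated line */
      break;
    }
    used += (size_t)w;
    ++lines;
  }
  return lines;
}
```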

Contributor

@teresajohnson teresajohnson left a comment


This seems OK for making legacy ThinLTO behave the same as legacy full LTO. I don't know whether the new LTO API handles this case for full or thin LTO. Does that issue remain unresolved for lld/gold?

@cachemeifyoucan
Collaborator Author


lld is fine, but it would be good if you could confirm that. There are two problems in legacy LTO, which uses LTOModule to vend symbols to the linker. First, the linker cannot distinguish undefs that come from assembly from those that come from bitcode. Second, if an undef can be resolved within the same object file, LTOModule shows only the defined symbol, not the undefined one. lld sees all symbols in the object file and knows which ones are prevailing, so it does not matter to lld whether a symbol comes from assembly or bitcode; it can tell which ones to keep. The legacy API doesn't track prevailing symbols, so it needs to preserve every edge from assembly code to bitcode; otherwise the definition in bitcode might get internalized and dead-stripped.
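The failure mode described above can be illustrated with a toy model: a legacy-LTO linker only knows the names it was told to preserve, and any other bitcode definition is a candidate for internalization. This is not LLVM's internalization logic; `in_list` and `may_internalize` are hypothetical helpers. The names come from the new `module-asm.ll` test, where `_foo` is defined in assembly as an alias of `___foo`, so the only edge keeping the bitcode function `___foo` alive is the reference from assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: with the legacy API the linker cannot see which copy of
 * a symbol prevails, so a bitcode definition stays external only if its
 * name is on the must-preserve list; otherwise it may be internalized and
 * later dead-stripped. */
static bool in_list(const char *const *list, unsigned int n,
                    const char *name) {
  for (unsigned int i = 0; i < n; ++i)
    if (strcmp(list[i], name) == 0)
      return true;
  return false;
}

static bool may_internalize(const char *bitcode_def,
                            const char *const *preserve, unsigned int n) {
  assert(bitcode_def != NULL);
  return !in_list(preserve, n, bitcode_def);
}
```

In this model, a preserve list containing only `_foo` leaves `___foo` eligible for internalization, while adding the asm undefs (`___foo`) keeps it external, which is the behavior the new API lets a legacy linker achieve.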

I don't see any reports about lld in the issue, but I can leave it open if you see remaining issues in lld/gold.

@cachemeifyoucan cachemeifyoucan merged commit a93eb14 into main Jun 24, 2025
7 of 8 checks passed
@cachemeifyoucan cachemeifyoucan deleted the users/cachemeifyoucan/spr/ltolegacy-add-new-c-apis-to-query-undefined-symbols-in-assembly branch June 24, 2025 22:01
anthonyhatran pushed a commit to anthonyhatran/llvm-project that referenced this pull request Jun 26, 2025
…lvm#145413)

Development

Successfully merging this pull request may close these issues.

Inline ASM IR Considered Harmful