Skip to content

[LLD] [COFF] Handle manually defined __imp_ pointers in LTO #70777

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 4, 2023

Conversation

mstorsjo
Copy link
Member

@mstorsjo mstorsjo commented Oct 31, 2023

Such pointers are often used by the core parts of mingw-w64, to locally define a function that might have been referred to with dllimport.

(MSVC style linkers can automatically provide such pointers, if there are undefined references to __imp_<func> left but a definition of <func> is available - although this prints the warning LNK4217. GNU ld doesn't do this, so in mingw-w64, such things are generally handled by manually providing the relevant __imp_ pointers.)

Make sure that a full LTO build, that does LTO of both the __imp_ pointer and the object file referencing it, successfully resolves such symbols.

This solution admittedly probably reduces the effect of the LTO compilation if there would happen to be __imp_ prefixed symbols included, in LTO objects, that aren't actually used. Such symbols are mostly used in the base toolchain, not often in user code, and usually only the relevant object files are linked in anyway.

This fixes #57982.

@llvmbot
Copy link
Member

llvmbot commented Oct 31, 2023

@llvm/pr-subscribers-lto
@llvm/pr-subscribers-lld
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld-coff

Author: Martin Storsjö (mstorsjo)

Changes

Such pointers are often used by the core parts of mingw-w64, to locally define a function that might have been referred to with dllimport.

(MSVC style linkers can automatically provide such pointers, if there are undefined references to __imp_&lt;func&gt; left but a definition of &lt;func&gt; is available - although this prints the warning LNK4217. GNU ld doesn't do this, so in mingw-w64, such things are generally handled by manually providing the relevant __imp_ pointers.)

Make sure that a full LTO build, that does LTO of both the __imp_ pointer and the object file referencing it, successfully resolves such symbols.

This solution admittedly probably reduces the effect of the LTO compilation if there would happen to be __imp_ prefixed symbols included, in LTO objects, that aren't actually used. Such symbols are mostly used in the base toolchain, not often in user code, and usually only the relevant object files are linked in anyway.


Full diff: https://github.com/llvm/llvm-project/pull/70777.diff

2 Files Affected:

  • (modified) lld/COFF/InputFiles.cpp (+5)
  • (added) lld/test/COFF/lto-imp-prefix.ll (+39)
diff --git a/lld/COFF/InputFiles.cpp b/lld/COFF/InputFiles.cpp
index 38ce29e6ab68c04..ba569488af64009 100644
--- a/lld/COFF/InputFiles.cpp
+++ b/lld/COFF/InputFiles.cpp
@@ -1061,6 +1061,11 @@ void BitcodeFile::parse() {
     } else {
       sym = ctx.symtab.addRegular(this, symName, nullptr, fakeSC, 0,
                                   objSym.isWeak());
+      // If one LTO object defines a __imp_<foo> symbol, and another LTO
+      // object refers to <foo> with dllimport, make sure the __imp_ symbol
+      // is kept.
+      if (symName.starts_with("__imp_"))
+        sym->isUsedInRegularObj = true;
     }
     symbols.push_back(sym);
     if (objSym.isUsed())
diff --git a/lld/test/COFF/lto-imp-prefix.ll b/lld/test/COFF/lto-imp-prefix.ll
new file mode 100644
index 000000000000000..503e4fd0649e41d
--- /dev/null
+++ b/lld/test/COFF/lto-imp-prefix.ll
@@ -0,0 +1,39 @@
+; REQUIRES: x86
+
+; RUN: rm -rf %t.dir
+; RUN: split-file %s %t.dir
+; RUN: llvm-as %t.dir/main.ll -o %t.main.obj
+; RUN: llvm-as %t.dir/other.ll -o %t.other.obj
+
+; RUN: lld-link /entry:entry %t.main.obj %t.other.obj /out:%t.exe /subsystem:console
+
+;--- main.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define void @entry() {
+entry:
+  tail call void @importedFunc()
+  tail call void @other()
+  ret void
+}
+
+declare dllimport void @importedFunc()
+
+declare void @other()
+
+;--- other.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+@__imp_importedFunc = global ptr @importedFuncReplacement
+
+define internal void @importedFuncReplacement() {
+entry:
+  ret void
+}
+
+define void @other() {
+entry:
+  ret void
+}

@mstorsjo mstorsjo added the LTO Link time optimization (regular/full LTO or ThinLTO) label Oct 31, 2023
@@ -1061,6 +1061,11 @@ void BitcodeFile::parse() {
} else {
sym = ctx.symtab.addRegular(this, symName, nullptr, fakeSC, 0,
objSym.isWeak());
// If one LTO object defines a __imp_<foo> symbol, and another LTO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case you've described is very specific, can you please lead with a higher-level comment about what this code does? As a suggestion, you can start with:
// Model all symbols with the __imp_ prefix as having external references. If one LTO object ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done - thanks!

Such pointers are often used by the core parts of mingw-w64,
to locally define a function that might have been referred to with
dllimport.

(MSVC style linkers can automatically provide such pointers, if
there are undefined references to __imp_<func> left but a
definition of <func> is available - although this prints the warning
LNK4217. GNU ld doesn't do this, so in mingw-w64, such things are
generally handled by manually providing the relevant __imp_
pointers.)

Make sure that a full LTO build, that does LTO of both the
__imp_ pointer and the object file referencing it, successfully
resolves such symbols.

This solution admittedly probably reduces the effect of the LTO
compilation if there would happen to be __imp_ prefixed symbols
included that aren't actually used. Such symbols are mostly used
in the base toolchain, not often in user code, and usually only
the relevant object files are linked in anyway.

This fixes llvm#57982.
@mstorsjo mstorsjo merged commit 3ab6209 into llvm:main Nov 4, 2023
@mstorsjo mstorsjo deleted the lld-lto-imp-prefix branch November 4, 2023 21:49
@MaskRay
Copy link
Member

MaskRay commented Nov 18, 2023

This patch forces __imp_sym to be retained in the output. Am I right that this could be relaxed to when sym is referenced? But that would introduce some complexity as __imp_sym sym->isUsedInRegularObj = true; needs to be done after all input files are parsed.

@mstorsjo
Copy link
Member Author

This patch forces __imp_sym to be retained in the output. Am I right that this could be relaxed to when sym is referenced?

Yes, that's correct.

But that would introduce some complexity as __imp_sym sym->isUsedInRegularObj = true; needs to be done after all input files are parsed.

Indeed - something like what I do in #71376 could perhaps help here as well - but this case is a bit trickier. When LLD looks at the bitcode object files, it sees one with an undefined __imp_sym and one with a defined __imp_sym - i.e. a perfectly normal situation, nothing to do about that. But when the LTO engine compiles it, it sees one defined __imp_sym with no references within the set of bitcode files, and one undefined sym with a dllimport attribute.

In practice I don't think this affects the overall end result much, because this kind of trickery with manually defined __imp_ symbols is very rare in normal code; it's used a bunch within mingw-w64-crt, but also there, the link would mostly include those object files that actually are needed. But perhaps it does end up retaining a handful of __imp_ pointers that might not be needed. I would think that MSVC like environments almost never use this construct.

Ideally, the optimal condition would be to only mark isUsedInRegularObj on any defined bitcode symbol with an __imp_ prefix, if any other of the bitcode input files has an undefined __imp_sym symbol. I'll see if I get time to take another shot at this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lld:COFF lld LTO Link time optimization (regular/full LTO or ThinLTO) platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

COFF LTO fails to map locally defined __imp_name symbols
4 participants