Skip to content

[LLD][COFF] Process all live import symbols in MapFile's getSymbols() #109117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 19, 2024

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Sep 18, 2024

The current logic assumes that the import file is pulled by object files, and the loop for import files only needs to handle cases where the __imp_ symbol is implicitly pulled by an import thunk. This is fragile, as the symbol may also be pulled through other means, such as the -export argument in tests. Additionally, this logic is insufficient for ARM64EC, which exposes multiple symbols through an import file, and referencing any one of them causes all of them to be defined.

With this change, import symbols are added to syms more often, but we ensure that output symbols remain unique later in the process

@cjacek
Copy link
Contributor Author

cjacek commented Sep 18, 2024

Depends on #109115.

@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-platform-windows

Author: Jacek Caban (cjacek)

Changes

The current logic assumes that the import file is pulled by object files, and the loop for import files only needs to handle cases where the __imp_ symbol is implicitly pulled by an import thunk. This is fragile, as the symbol may also be pulled through other means, such as the -export argument in tests. Additionally, this logic is insufficient for ARM64EC, which exposes multiple symbols through an import file, and referencing any one of them causes all of them to be defined.

With this change, import symbols are added to syms more often, but we ensure that output symbols remain unique later in the process


Full diff: https://github.com/llvm/llvm-project/pull/109117.diff

5 Files Affected:

  • (modified) lld/COFF/InputFiles.h (+2-2)
  • (modified) lld/COFF/MapFile.cpp (+3-11)
  • (modified) lld/COFF/SymbolTable.cpp (+3-3)
  • (modified) lld/COFF/SymbolTable.h (+2-2)
  • (added) lld/test/COFF/export-imp.test (+11)
diff --git a/lld/COFF/InputFiles.h b/lld/COFF/InputFiles.h
index 5fa93f57ef9e3a..a20b097cbe04af 100644
--- a/lld/COFF/InputFiles.h
+++ b/lld/COFF/InputFiles.h
@@ -349,7 +349,7 @@ class ImportFile : public InputFile {
   MachineTypes getMachineType() const override;
 
   DefinedImportData *impSym = nullptr;
-  Symbol *thunkSym = nullptr;
+  Defined *thunkSym = nullptr;
   ImportThunkChunkARM64EC *impchkThunk = nullptr;
   std::string dllName;
 
@@ -365,7 +365,7 @@ class ImportFile : public InputFile {
   // Auxiliary IAT symbols and chunks on ARM64EC.
   DefinedImportData *impECSym = nullptr;
   Chunk *auxLocation = nullptr;
-  Symbol *auxThunkSym = nullptr;
+  Defined *auxThunkSym = nullptr;
   DefinedImportData *auxImpCopySym = nullptr;
   Chunk *auxCopyLocation = nullptr;
 
diff --git a/lld/COFF/MapFile.cpp b/lld/COFF/MapFile.cpp
index 52e9ce996f2390..b9ef32da113c4f 100644
--- a/lld/COFF/MapFile.cpp
+++ b/lld/COFF/MapFile.cpp
@@ -122,17 +122,9 @@ static void getSymbols(const COFFLinkerContext &ctx,
     if (!file->live)
       continue;
 
-    if (!file->thunkSym)
-      continue;
-
-    if (!file->thunkSym->isLive())
-      continue;
-
-    if (auto *thunkSym = dyn_cast<Defined>(file->thunkSym))
-      syms.push_back(thunkSym);
-
-    if (auto *impSym = dyn_cast_or_null<Defined>(file->impSym))
-      syms.push_back(impSym);
+    syms.push_back(file->impSym);
+    if (file->thunkSym && file->thunkSym->isLive())
+      syms.push_back(file->thunkSym);
   }
 
   sortUniqueSymbols(syms, ctx.config.imageBase);
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index efea16ccbbfea0..fc169ec211a9fe 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -815,13 +815,13 @@ DefinedImportData *SymbolTable::addImportData(StringRef n, ImportFile *f,
   return nullptr;
 }
 
-Symbol *SymbolTable::addImportThunk(StringRef name, DefinedImportData *id,
-                                    ImportThunkChunk *chunk) {
+Defined *SymbolTable::addImportThunk(StringRef name, DefinedImportData *id,
+                                     ImportThunkChunk *chunk) {
   auto [s, wasInserted] = insert(name, nullptr);
   s->isUsedInRegularObj = true;
   if (wasInserted || isa<Undefined>(s) || s->isLazy()) {
     replaceSymbol<DefinedImportThunk>(s, ctx, name, id, chunk);
-    return s;
+    return cast<Defined>(s);
   }
 
   reportDuplicate(s, id->file);
diff --git a/lld/COFF/SymbolTable.h b/lld/COFF/SymbolTable.h
index bf97cf442039e0..e3f674b8098f8b 100644
--- a/lld/COFF/SymbolTable.h
+++ b/lld/COFF/SymbolTable.h
@@ -105,8 +105,8 @@ class SymbolTable {
                     CommonChunk *c = nullptr);
   DefinedImportData *addImportData(StringRef n, ImportFile *f,
                                    Chunk *&location);
-  Symbol *addImportThunk(StringRef name, DefinedImportData *s,
-                         ImportThunkChunk *chunk);
+  Defined *addImportThunk(StringRef name, DefinedImportData *s,
+                          ImportThunkChunk *chunk);
   void addLibcall(StringRef name);
   void addEntryThunk(Symbol *from, Symbol *to);
   void addExitThunk(Symbol *from, Symbol *to);
diff --git a/lld/test/COFF/export-imp.test b/lld/test/COFF/export-imp.test
new file mode 100644
index 00000000000000..db71ef74641639
--- /dev/null
+++ b/lld/test/COFF/export-imp.test
@@ -0,0 +1,11 @@
+; REQUIRES: x86
+
+; RUN: llvm-lib -machine:amd64 -out:%t.imp.lib -def:%s
+; RUN: lld-link -machine:amd64 -out:%t.dll %t.imp.lib -dll -noentry -export:__imp_func,DATA -map
+
+; FileCheck %s < %t.imp.map
+; CHECK: 0001:00000098       __imp_func                 0000000180001098     export-imp-thunk.test.tmp.imp:imp.dll
+
+LIBRARY imp.dll
+EXPORTS
+        func

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


syms.push_back(thunkSym);

if (auto *impSym = dyn_cast_or_null<Defined>(file->impSym))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So technically, it's not possible for impSym to be null, or to be anything else than a Defined here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rechecked that. In case of duplication, addImportData reports an error. However, it's possible to turn that error into a warning with -force, so it may be a null after all. I restored the null check.

impSym is already declared as DefinedImportData and as far as I can tell there is no way it can be replaced, so the cast is not needed.


if (auto *impSym = dyn_cast_or_null<Defined>(file->impSym))
syms.push_back(impSym);
syms.push_back(file->impSym);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked about this in #108459 (comment) before, but I didn't quite grasp the response then, but now I understand.

So if we have another object file referencing e.g. __imp_func (which is an undefined reference from the scope of that object file, originally), that defined symbol will get picked up and added to the list of symbols to include in the loop above, looping over ctx.objFileInstances.

That's why a regular reference to __imp_func, without referencing func at all, would have been listed in the map file just fine. But if we pull in __imp_func via an -export: option, it isn't so we need to include it here. Now I get it!

@mstorsjo
Copy link
Member

Oh btw, please update the patch/PR subject to mention MapFile in some form - getSymbols() on its own doesn't really say much about the scope/context about what it does.

The current logic assumes that the import file is pulled by object files,
and the loop for import files only needs to handle cases where the __imp_
symbol is implicitly pulled by an import thunk. This is fragile, as the
symbol may also be pulled through other means, such as the -export argument
in tests. Additionally, this logic is insufficient for ARM64EC, which
exposes multiple symbols through an import file, and referencing any one
of them causes all of them to be defined.

With this change, import symbols are added to syms more often, but we
ensure that output symbols remain unique later in the process.
@cjacek cjacek changed the title [LLD][COFF] Process all live import symbols in getSymbols() [LLD][COFF] Process all live import symbols in MapFile's getSymbols() Sep 18, 2024
@cjacek cjacek merged commit 912e821 into llvm:main Sep 19, 2024
8 checks passed
@cjacek cjacek deleted the imp-map branch September 19, 2024 11:20
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
…llvm#109117)

The current logic assumes that the import file is pulled by object
files, and the loop for import files only needs to handle cases where
the `__imp_` symbol is implicitly pulled by an import thunk. This is
fragile, as the symbol may also be pulled through other means, such as
the -export argument in tests. Additionally, this logic is insufficient
for ARM64EC, which exposes multiple symbols through an import file, and
referencing any one of them causes all of them to be defined.
    
With this change, import symbols are added to `syms` more often, but we
ensure that output symbols remain unique later in the process
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants