Skip to content

[lld][WebAssembly] Return 0 for synthetic function offsets #96134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 21, 2024

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Jun 20, 2024

When two or more functions' signatures differ, one of them is selected and for other signatures unreachable stubs are generated:

replaceWithUnreachable(f, *f->signature, debugName);
// Replace the given symbol body with an unreachable function.
// This is used by handleWeakUndefines in order to generate a callable
// equivalent of an undefined function and also handleSymbolVariants for
// undefined functions that don't match the signature of the definition.
InputFunction *SymbolTable::replaceWithUnreachable(Symbol *sym,
const WasmSignature &sig,
StringRef debugName) {
auto *func = make<SyntheticFunction>(sig, sym->getName(), debugName);
func->setBody(unreachableFn);
ctx.syntheticFunctions.emplace_back(func);
// Mark new symbols as local. For relocatable output we don't want them
// to be exported outside the object file.
replaceSymbol<DefinedFunction>(sym, debugName, WASM_SYMBOL_BINDING_LOCAL,
nullptr, func);
// Ensure the stub function doesn't get a table entry. Its address
// should always compare equal to the null pointer.
sym->isStub = true;
return func;
}

And when these SyntheticFunctions are generated, this constructor is used,

InputFunction(StringRef name, const WasmSignature &s)
: InputChunk(nullptr, InputChunk::Function, name), signature(s) {
assert(s.Kind == WasmSignature::Function);
}
which does not set its function field:
const WasmFunction *function;
As a result, the function field contains a garbage value for these stub functions.

InputFunction::getFunctionCodeOffset() is called when relocations are resolved for .debug_info section to get functions' PC locations. But because these stub functions don't have their function field set, this function segfaults:

uint32_t getFunctionCodeOffset() const { return function->CodeOffset; }

This bug seems to be triggered when these conditions are met:

  • There is a signature mismatch warning with multiple different definitions (one definition with other declarations is not sufficient) with weak linkage with the same name
  • The 'stub' function containing unreachable has a callsite, meaning it isn't DCE'd
  • .debug_info section is generated (i.e., DWARF is used)

This PR initializes the field with nullptr, and in InputFunction::getFunctionCodeOffset, checks if function is nullptr, and if so, just returns 0. This function is called only for resolving relocations in the .debug_info section, and addresses of these stub functions, which are not the functions users wrote in the first place, are not really meaningful anyway.

When two or more functions' signatures differ, one of them is selected
and for other signatures `unreachable` stubs are generated:
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L975
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L852-L870

And when these `SyntheticFunction`s are generated, this constructor is
used,
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L266-L269
which does not set its `function` field:
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L304
As a result, the `function` field contains a garbage value for these
stub functions.

`InputFunction::getFunctionCodeOffset()` is called when relocations are
resolved for `.debug_info` section to get functions' PC locations. But
because these stub functions don't have their `function` field set, this
function segfaults:
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L282

This PR initializes the field with `nullptr`, and in
`InputFunction::getFunctionCodeOffset`, checks if `function` is
`nullptr`, and if so, just returns 0. This function is called only for
resolving relocations in the `.debug_info` section, and addresses of
these stub functions, which are not the functions users wrote in the
first place, are not really meaningful anyway.
@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2024

@llvm/pr-subscribers-lld-wasm

@llvm/pr-subscribers-lld

Author: Heejin Ahn (aheejin)

Changes

When two or more functions' signatures differ, one of them is selected and for other signatures unreachable stubs are generated:

replaceWithUnreachable(f, *f->signature, debugName);
// Replace the given symbol body with an unreachable function.
// This is used by handleWeakUndefines in order to generate a callable
// equivalent of an undefined function and also handleSymbolVariants for
// undefined functions that don't match the signature of the definition.
InputFunction *SymbolTable::replaceWithUnreachable(Symbol *sym,
const WasmSignature &sig,
StringRef debugName) {
auto *func = make<SyntheticFunction>(sig, sym->getName(), debugName);
func->setBody(unreachableFn);
ctx.syntheticFunctions.emplace_back(func);
// Mark new symbols as local. For relocatable output we don't want them
// to be exported outside the object file.
replaceSymbol<DefinedFunction>(sym, debugName, WASM_SYMBOL_BINDING_LOCAL,
nullptr, func);
// Ensure the stub function doesn't get a table entry. Its address
// should always compare equal to the null pointer.
sym->isStub = true;
return func;
}

And when these SyntheticFunctions are generated, this constructor is used,

InputFunction(StringRef name, const WasmSignature &s)
: InputChunk(nullptr, InputChunk::Function, name), signature(s) {
assert(s.Kind == WasmSignature::Function);
}
which does not set its function field:
const WasmFunction *function;
As a result, the function field contains a garbage value for these stub functions.

InputFunction::getFunctionCodeOffset() is called when relocations are resolved for .debug_info section to get functions' PC locations. But because these stub functions don't have their function field set, this function segfaults:

uint32_t getFunctionCodeOffset() const { return function->CodeOffset; }

This PR initializes the field with nullptr, and in InputFunction::getFunctionCodeOffset, checks if function is nullptr, and if so, just returns 0. This function is called only for resolving relocations in the .debug_info section, and addresses of these stub functions, which are not the functions users wrote in the first place, are not really meaningful anyway.


Full diff: https://github.com/llvm/llvm-project/pull/96134.diff

5 Files Affected:

  • (added) lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll (+31)
  • (added) lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll (+31)
  • (added) lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll (+30)
  • (added) lld/test/wasm/signature-mismatch-debug-info.test (+8)
  • (modified) lld/wasm/InputChunks.h (+9-2)
diff --git a/lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll b/lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll
new file mode 100644
index 0000000000000..9ebc5c4b9c922
--- /dev/null
+++ b/lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll
@@ -0,0 +1,31 @@
+target triple = "wasm32-unknown-emscripten"
+
+define void @foo(i32 %a) !dbg !6 {
+  ret void
+}
+
+define void @test0() !dbg !10 {
+entry:
+  call void @foo(i32 3), !dbg !13
+  ret void, !dbg !14
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 19.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "a.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{!"clang version 19.0.0git"}
+!6 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, type: !7, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!7 = !DISubroutineType(types: !8)
+!8 = !{null, !9}
+!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!10 = distinct !DISubprogram(name: "test0", scope: !1, file: !1, line: 7, type: !11, scopeLine: 7, spFlags: DISPFlagDefinition, unit: !0)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 8, column: 3, scope: !10)
+!14 = !DILocation(line: 9, column: 1, scope: !10)
diff --git a/lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll b/lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll
new file mode 100644
index 0000000000000..7b8295363c802
--- /dev/null
+++ b/lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll
@@ -0,0 +1,31 @@
+target triple = "wasm32-unknown-emscripten"
+
+define void @foo(i32 %a, i32 %b) !dbg !6 {
+  ret void
+}
+
+define void @test1() !dbg !10 {
+entry:
+  call void @foo(i32 4, i32 5), !dbg !13
+  ret void, !dbg !14
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 19.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "b.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{!"clang version 19.0.0git"}
+!6 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, type: !7, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!7 = !DISubroutineType(types: !8)
+!8 = !{null, !9, !9}
+!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!10 = distinct !DISubprogram(name: "test1", scope: !1, file: !1, line: 7, type: !11, scopeLine: 7, spFlags: DISPFlagDefinition, unit: !0)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 8, column: 3, scope: !10)
+!14 = !DILocation(line: 9, column: 1, scope: !10)
diff --git a/lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll b/lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll
new file mode 100644
index 0000000000000..3d2f8c2e0a941
--- /dev/null
+++ b/lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll
@@ -0,0 +1,30 @@
+target triple = "wasm32-unknown-emscripten"
+
+define i32 @main() !dbg !6 {
+entry:
+  call void @test0(), !dbg !10
+  call void @test1(), !dbg !11
+  ret i32 0, !dbg !12
+}
+
+declare void @test0()
+
+declare void @test1()
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 19.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "main.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{!"clang version 19.0.0git"}
+!6 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 4, type: !7, scopeLine: 4, spFlags: DISPFlagDefinition, unit: !0)
+!7 = !DISubroutineType(types: !8)
+!8 = !{!9}
+!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!10 = !DILocation(line: 5, column: 3, scope: !6)
+!11 = !DILocation(line: 6, column: 3, scope: !6)
+!12 = !DILocation(line: 7, column: 3, scope: !6)
diff --git a/lld/test/wasm/signature-mismatch-debug-info.test b/lld/test/wasm/signature-mismatch-debug-info.test
new file mode 100644
index 0000000000000..fe1e8c2dbe579
--- /dev/null
+++ b/lld/test/wasm/signature-mismatch-debug-info.test
@@ -0,0 +1,8 @@
+# This is a regression test that checks whether a function signature mismatch
+# in functions with debug info does not cause does not cause a segmentation
+# fault when writing .debug_info section.
+
+; RUN: llc -filetype=obj %p/Inputs/signature-mismatch-debug-info-a.ll -o %t.a.o
+; RUN: llc -filetype=obj %p/Inputs/signature-mismatch-debug-info-b.ll -o %t.b.o
+; RUN: llc -filetype=obj %p/Inputs/signature-mismatch-debug-info-main.ll -o %t.main.o
+; RUN: wasm-ld -o %t.wasm %t.a.o %t.b.o %t.main.o --export=main --no-entry
diff --git a/lld/wasm/InputChunks.h b/lld/wasm/InputChunks.h
index cf8a5249b19a0..5174439facc67 100644
--- a/lld/wasm/InputChunks.h
+++ b/lld/wasm/InputChunks.h
@@ -279,7 +279,14 @@ class InputFunction : public InputChunk {
   }
   void setExportName(std::string exportName) { this->exportName = exportName; }
   uint32_t getFunctionInputOffset() const { return getInputSectionOffset(); }
-  uint32_t getFunctionCodeOffset() const { return function->CodeOffset; }
+  uint32_t getFunctionCodeOffset() const {
+    // For generated synthetic functions, such as unreachable stubs generated
+    // for signature mismatches, 'function' reference does not exist. This
+    // function is used to get function offsets for .debug_info section, and for
+    // those generated stubs function offsets are not meaningful anyway. So just
+    // return 0 in those cases.
+    return function ? function->CodeOffset : 0;
+  }
   uint32_t getFunctionIndex() const { return *functionIndex; }
   bool hasFunctionIndex() const { return functionIndex.has_value(); }
   void setFunctionIndex(uint32_t index);
@@ -301,7 +308,7 @@ class InputFunction : public InputChunk {
     return compressedSize;
   }
 
-  const WasmFunction *function;
+  const WasmFunction *function = nullptr;
 
 protected:
   std::optional<std::string> exportName;

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work finding this! Do you know what C/C++ is needed to reproduce? i.e. why haven't we seen it before?

}

define void @test0() !dbg !10 {
entry:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this function have entry: but not the one above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generated these tests from .c file and manually removed some unnecessary parts, and I think I left some entrys. Removed them for consistency.

@@ -0,0 +1,31 @@
target triple = "wasm32-unknown-emscripten"

define void @foo(i32 %a, i32 %b) !dbg !6 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not result in a link error due to multiple definitions of foo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. They are not marked as weak, so it's weird... I tested with system (x86) clang and gcc and they all seem to error out.

When we do this by LTO, meaning running wasm-ld directly on bitcode, they error out saying duplicate definition. When does this "signature mismatch" occur usually? When a definition and a callsite do not match in the signature?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, regardless of signature... you can't have two global definitions of foo in the same program unless one of them is weak. So I can't see why this wouldn't fail, even with matching signatures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why the current non-weak functions don't error out, but changed the test function linkages to be weak, which should not error out.

!11 = !DISubroutineType(types: !12)
!12 = !{null}
!13 = !DILocation(line: 8, column: 3, scope: !10)
!14 = !DILocation(line: 9, column: 1, scope: !10)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests include a lot of stuff that is perhaps not completely relevant to the bug I guess? Can we remove any of these lines that are not needed? Or is it better to keep them more like clang-generated bitcode?

One thing that would be great would be if we could write this in .s format... how does the .s format look for these tests? is it even more verbose?

Copy link
Member Author

@aheejin aheejin Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I got after I spent some time deleting unnecessary debug info already... I can further delete DILocation nodes and the line debug info attached to instructions. But we at least need DICompileUnit and DISubprogram and other nodes that are (transitively) referenced by those nodes.

Can I attach debug info in .s files? If so, can you point me to an example?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think simply converting these files to .s will include the debug info. I'm not sure it will be harder to easier to read.

Copy link
Member Author

@aheejin aheejin Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting these files to .s files generated this, which was too verbose: 6f7d8cd

So I just rewrote them just enough to reproduce the error: 165246a
(Turns out just mentioning the function name in .debug_info section was sufficient to reproduce the error)

@aheejin
Copy link
Member Author

aheejin commented Jun 20, 2024

Nice work finding this! Do you know what C/C++ is needed to reproduce?

a.c:

__attribute__((weak)) void foo(int a) {}
void test0() {
  foo(3);
}

b.c:

__attribute__((weak)) void foo(int a, int b) {}
void test1() {
  foo(4, 5);
}

main.c:

void test0();
void test1();
int main() {
  test0();
  test1();
  return 0;
}

And I manually removed instructions and debug info that were not necessary in the generated files.

i.e. why haven't we seen it before?

I'm not very sure. This happens when all of these are satisfied:

  1. There is a signature mismatch warning with multiple different definitions (one definition with other declarations is not sufficient) with weak linkage with the same name
  2. The 'stub' function containing unreachable has a callsite, meaning it isn't DCE'd
  3. .debug_info section is generated (i.e., DWARF is used)

I'm not sure how easy or hard to meet all three conditions.

Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
having the tests in .s form makes the test very nice and simple

@aheejin
Copy link
Member Author

aheejin commented Jun 21, 2024

@sbc100 Do you have any more concerns?

@aheejin
Copy link
Member Author

aheejin commented Jun 21, 2024

Will land this. Please let me know if you have more comments, which can be addressed as a follow-up.

@aheejin aheejin merged commit 7d2c2af into llvm:main Jun 21, 2024
7 checks passed
@aheejin aheejin deleted the mismatch_stub_fix branch June 21, 2024 22:56
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
When two or more functions' signatures differ, one of them is selected
and for other signatures `unreachable` stubs are generated:
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L975
https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L852-L870

And when these `SyntheticFunction`s are generated, this constructor is
used,

https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L266-L269
which does not set its `function` field:

https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L304
As a result, the `function` field contains a garbage value for these
stub functions.

`InputFunction::getFunctionCodeOffset()` is called when relocations are
resolved for `.debug_info` section to get functions' PC locations. But
because these stub functions don't have their `function` field set, this
function segfaults:

https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L282

This bug seems to be triggered when these conditions are met:
- There is a signature mismatch warning with multiple different
definitions (one definition with other declarations is not sufficient)
with weak linkage with the same name
- The 'stub' function containing unreachable has a callsite, meaning it
isn't DCE'd
- .debug_info section is generated (i.e., DWARF is used)

This PR initializes the field with `nullptr`, and in
`InputFunction::getFunctionCodeOffset`, checks if `function` is
`nullptr`, and if so, just returns 0. This function is called only for
resolving relocations in the `.debug_info` section, and addresses of
these stub functions, which are not the functions users wrote in the
first place, are not really meaningful anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants