-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lld][WebAssembly] Return 0 for synthetic function offsets #96134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When two or more functions' signatures differ, one of them is selected and for other signatures `unreachable` stubs are generated: https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L975 https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L852-L870 And when these `SyntheticFunction`s are generated, this constructor is used, https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L266-L269 which does not set its `function` field: https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L304 As a result, the `function` field contains a garbage value for these stub functions. `InputFunction::getFunctionCodeOffset()` is called when relocations are resolved for `.debug_info` section to get functions' PC locations. But because these stub functions don't have their `function` field set, this function segfaults: https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L282 This PR initializes the field with `nullptr`, and in `InputFunction::getFunctionCodeOffset`, checks if `function` is `nullptr`, and if so, just returns 0. This function is called only for resolving relocations in the `.debug_info` section, and addresses of these stub functions, which are not the functions users wrote in the first place, are not really meaningful anyway.
@llvm/pr-subscribers-lld-wasm @llvm/pr-subscribers-lld Author: Heejin Ahn (aheejin) ChangesWhen two or more functions' signatures differ, one of them is selected and for other signatures llvm-project/lld/wasm/SymbolTable.cpp Line 975 in 57778ec
llvm-project/lld/wasm/SymbolTable.cpp Lines 852 to 870 in 57778ec
And when these llvm-project/lld/wasm/InputChunks.h Lines 266 to 269 in 57778ec
function field:llvm-project/lld/wasm/InputChunks.h Line 304 in 57778ec
function field contains a garbage value for these stub functions.
llvm-project/lld/wasm/InputChunks.h Line 282 in 57778ec
This PR initializes the field with Full diff: https://github.com/llvm/llvm-project/pull/96134.diff 5 Files Affected:
diff --git a/lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll b/lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll
new file mode 100644
index 0000000000000..9ebc5c4b9c922
--- /dev/null
+++ b/lld/test/wasm/Inputs/signature-mismatch-debug-info-a.ll
@@ -0,0 +1,31 @@
+target triple = "wasm32-unknown-emscripten"
+
+define void @foo(i32 %a) !dbg !6 {
+ ret void
+}
+
+define void @test0() !dbg !10 {
+entry:
+ call void @foo(i32 3), !dbg !13
+ ret void, !dbg !14
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 19.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "a.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{!"clang version 19.0.0git"}
+!6 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, type: !7, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!7 = !DISubroutineType(types: !8)
+!8 = !{null, !9}
+!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!10 = distinct !DISubprogram(name: "test0", scope: !1, file: !1, line: 7, type: !11, scopeLine: 7, spFlags: DISPFlagDefinition, unit: !0)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 8, column: 3, scope: !10)
+!14 = !DILocation(line: 9, column: 1, scope: !10)
diff --git a/lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll b/lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll
new file mode 100644
index 0000000000000..7b8295363c802
--- /dev/null
+++ b/lld/test/wasm/Inputs/signature-mismatch-debug-info-b.ll
@@ -0,0 +1,31 @@
+target triple = "wasm32-unknown-emscripten"
+
+define void @foo(i32 %a, i32 %b) !dbg !6 {
+ ret void
+}
+
+define void @test1() !dbg !10 {
+entry:
+ call void @foo(i32 4, i32 5), !dbg !13
+ ret void, !dbg !14
+}
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 19.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "b.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{!"clang version 19.0.0git"}
+!6 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, type: !7, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
+!7 = !DISubroutineType(types: !8)
+!8 = !{null, !9, !9}
+!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!10 = distinct !DISubprogram(name: "test1", scope: !1, file: !1, line: 7, type: !11, scopeLine: 7, spFlags: DISPFlagDefinition, unit: !0)
+!11 = !DISubroutineType(types: !12)
+!12 = !{null}
+!13 = !DILocation(line: 8, column: 3, scope: !10)
+!14 = !DILocation(line: 9, column: 1, scope: !10)
diff --git a/lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll b/lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll
new file mode 100644
index 0000000000000..3d2f8c2e0a941
--- /dev/null
+++ b/lld/test/wasm/Inputs/signature-mismatch-debug-info-main.ll
@@ -0,0 +1,30 @@
+target triple = "wasm32-unknown-emscripten"
+
+define i32 @main() !dbg !6 {
+entry:
+ call void @test0(), !dbg !10
+ call void @test1(), !dbg !11
+ ret i32 0, !dbg !12
+}
+
+declare void @test0()
+
+declare void @test1()
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4}
+!llvm.ident = !{!5}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 19.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+!1 = !DIFile(filename: "main.c", directory: "")
+!2 = !{i32 7, !"Dwarf Version", i32 4}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{!"clang version 19.0.0git"}
+!6 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 4, type: !7, scopeLine: 4, spFlags: DISPFlagDefinition, unit: !0)
+!7 = !DISubroutineType(types: !8)
+!8 = !{!9}
+!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!10 = !DILocation(line: 5, column: 3, scope: !6)
+!11 = !DILocation(line: 6, column: 3, scope: !6)
+!12 = !DILocation(line: 7, column: 3, scope: !6)
diff --git a/lld/test/wasm/signature-mismatch-debug-info.test b/lld/test/wasm/signature-mismatch-debug-info.test
new file mode 100644
index 0000000000000..fe1e8c2dbe579
--- /dev/null
+++ b/lld/test/wasm/signature-mismatch-debug-info.test
@@ -0,0 +1,8 @@
+# This is a regression test that checks whether a function signature mismatch
+# in functions with debug info does not cause does not cause a segmentation
+# fault when writing .debug_info section.
+
+; RUN: llc -filetype=obj %p/Inputs/signature-mismatch-debug-info-a.ll -o %t.a.o
+; RUN: llc -filetype=obj %p/Inputs/signature-mismatch-debug-info-b.ll -o %t.b.o
+; RUN: llc -filetype=obj %p/Inputs/signature-mismatch-debug-info-main.ll -o %t.main.o
+; RUN: wasm-ld -o %t.wasm %t.a.o %t.b.o %t.main.o --export=main --no-entry
diff --git a/lld/wasm/InputChunks.h b/lld/wasm/InputChunks.h
index cf8a5249b19a0..5174439facc67 100644
--- a/lld/wasm/InputChunks.h
+++ b/lld/wasm/InputChunks.h
@@ -279,7 +279,14 @@ class InputFunction : public InputChunk {
}
void setExportName(std::string exportName) { this->exportName = exportName; }
uint32_t getFunctionInputOffset() const { return getInputSectionOffset(); }
- uint32_t getFunctionCodeOffset() const { return function->CodeOffset; }
+ uint32_t getFunctionCodeOffset() const {
+ // For generated synthetic functions, such as unreachable stubs generated
+ // for signature mismatches, 'function' reference does not exist. This
+ // function is used to get function offsets for .debug_info section, and for
+ // those generated stubs function offsets are not meaningful anyway. So just
+ // return 0 in those cases.
+ return function ? function->CodeOffset : 0;
+ }
uint32_t getFunctionIndex() const { return *functionIndex; }
bool hasFunctionIndex() const { return functionIndex.has_value(); }
void setFunctionIndex(uint32_t index);
@@ -301,7 +308,7 @@ class InputFunction : public InputChunk {
return compressedSize;
}
- const WasmFunction *function;
+ const WasmFunction *function = nullptr;
protected:
std::optional<std::string> exportName;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work finding this! Do you know what C/C++ is needed to reproduce? i.e. why haven't we seen it before?
} | ||
|
||
define void @test0() !dbg !10 { | ||
entry: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this function have entry:
but not the one above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generated these tests from .c
file and manually removed some unnecessary parts, and I think I left some entry
s. Removed them for consistency.
@@ -0,0 +1,31 @@ | |||
target triple = "wasm32-unknown-emscripten" | |||
|
|||
define void @foo(i32 %a, i32 %b) !dbg !6 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this not result in a link error due to multiple definitions of foo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. They are not marked as weak
, so it's weird... I tested with system (x86) clang and gcc and they all seem to error out.
When we do this by LTO, meaning running wasm-ld
directly on bitcode, they error out saying duplicate definition. When does this "signature mismatch" occur usually? When a definition and a callsite do not match in the signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, regardless of signature... you can't have two global definitions of foo in the same program unless one of them is weak. So I can't see why this wouldn't fail, even with matching signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why the current non-weak functions don't error out, but changed the test function linkages to be weak
, which should not error out.
!11 = !DISubroutineType(types: !12) | ||
!12 = !{null} | ||
!13 = !DILocation(line: 8, column: 3, scope: !10) | ||
!14 = !DILocation(line: 9, column: 1, scope: !10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests include a lot of stuff that is perhaps not completely relevant to the bug I guess? Can we remove any of these lines that are not needed? Or is it better to keep them more like clang-generated bitcode?
One thing that would be great would be if we could write this in .s
format... how does the .s
format look for these tests? is it even more verbose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I got after I spent some time deleting unnecessary debug info already... I can further delete DILocation
nodes and the line debug info attached to instructions. But we at least need DICompileUnit
and DISubprogram
and other nodes that are (transitively) referenced by those nodes.
Can I attach debug info in .s
files? If so, can you point me to an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think simply converting these files to .s
will include the debug info. I'm not sure it will be harder to easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a.c: __attribute__((weak)) void foo(int a) {}
void test0() {
foo(3);
} b.c: __attribute__((weak)) void foo(int a, int b) {}
void test1() {
foo(4, 5);
} main.c: void test0();
void test1();
int main() {
test0();
test1();
return 0;
} And I manually removed instructions and debug info that were not necessary in the generated files.
I'm not very sure. This happens when all of these are satisfied:
I'm not sure how easy or hard to meet all three conditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
having the tests in .s form makes the test very nice and simple
@sbc100 Do you have any more concerns? |
Will land this. Please let me know if you have more comments, which can be addressed as a follow-up. |
When two or more functions' signatures differ, one of them is selected and for other signatures `unreachable` stubs are generated: https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L975 https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/SymbolTable.cpp#L852-L870 And when these `SyntheticFunction`s are generated, this constructor is used, https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L266-L269 which does not set its `function` field: https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L304 As a result, the `function` field contains a garbage value for these stub functions. `InputFunction::getFunctionCodeOffset()` is called when relocations are resolved for `.debug_info` section to get functions' PC locations. But because these stub functions don't have their `function` field set, this function segfaults: https://github.com/llvm/llvm-project/blob/57778ec36c9c7e96b76a167f19dccbe00d49c9d4/lld/wasm/InputChunks.h#L282 This bug seems to be triggered when these conditions are met: - There is a signature mismatch warning with multiple different definitions (one definition with other declarations is not sufficient) with weak linkage with the same name - The 'stub' function containing unreachable has a callsite, meaning it isn't DCE'd - .debug_info section is generated (i.e., DWARF is used) This PR initializes the field with `nullptr`, and in `InputFunction::getFunctionCodeOffset`, checks if `function` is `nullptr`, and if so, just returns 0. This function is called only for resolving relocations in the `.debug_info` section, and addresses of these stub functions, which are not the functions users wrote in the first place, are not really meaningful anyway.
When two or more functions' signatures differ, one of them is selected and for other signatures
unreachable
stubs are generated:llvm-project/lld/wasm/SymbolTable.cpp
Line 975 in 57778ec
llvm-project/lld/wasm/SymbolTable.cpp
Lines 852 to 870 in 57778ec
And when these
SyntheticFunction
s are generated, this constructor is used,llvm-project/lld/wasm/InputChunks.h
Lines 266 to 269 in 57778ec
function
field:llvm-project/lld/wasm/InputChunks.h
Line 304 in 57778ec
function
field contains a garbage value for these stub functions.InputFunction::getFunctionCodeOffset()
is called when relocations are resolved for.debug_info
section to get functions' PC locations. But because these stub functions don't have theirfunction
field set, this function segfaults:llvm-project/lld/wasm/InputChunks.h
Line 282 in 57778ec
This bug seems to be triggered when these conditions are met:
This PR initializes the field with
nullptr
, and inInputFunction::getFunctionCodeOffset
, checks iffunction
isnullptr
, and if so, just returns 0. This function is called only for resolving relocations in the.debug_info
section, and addresses of these stub functions, which are not the functions users wrote in the first place, are not really meaningful anyway.