-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[WebAssembly] Support parsing .lto_set_conditional #126546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a specific criteria is moved to the second module. And if there is an originally local-linkage global value defined in the first module and referenced in the second module or the vice versa, that value is _promoted_ by attaching a module has ID to their names in order to prevent name clashes because now they can be referenced from other modules. And when that promoted global value is a function, a `.lto_set_conditional` entry is written to the first module to avoid breaking references from inline assembly: https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91 The syntax of this is, if the original function name is `a` and the module ID is `123`, ```ll module asm ".lto_set_conditional symbolA,symbolA.123" ``` These symbols are parsed here: https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467 The first function symbol in this `.lto_set_conditional` do not exist as a function in the bitcode anymore because it was renamed to the second. So they are not assigned as function symbols but they are not really data either, so the object writer crashes here: https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820 This PR makes the object writer just skip those symbols. --- This problem was discovered when I was testing with `-fwhole-program-vtables`. The reason we didn't have this problem before with ThinLTO was because `-fsplit-lto-unit`, which splits LTO units when possible, defaults to false, but it defaults to true when `-fwhole-program-vtables` is used.
@llvm/pr-subscribers-mc @llvm/pr-subscribers-backend-webassembly Author: Heejin Ahn (aheejin) ChangesIn the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a specific criteria is moved to the second module. And if there is an originally local-linkage global value defined in the first module and referenced in the second module or the vice versa, that value is promoted by attaching a module has ID to their names in order to prevent name clashes because now they can be referenced from other modules. And when that promoted global value is a function, a llvm-project/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp Lines 84 to 91 in d21fc58
The syntax of this is, if the original function name is module asm ".lto_set_conditional symbolA,symbolA.123" These symbols are parsed here: llvm-project/llvm/lib/MC/MCParser/AsmParser.cpp Line 6467 in 648981f
The first function symbol in this llvm-project/llvm/lib/MC/WasmObjectWriter.cpp Line 1820 in 5b9e6c7
This PR makes the object writer just skip those symbols. This problem was discovered when I was testing with Full diff: https://github.com/llvm/llvm-project/pull/126546.diff 2 Files Affected:
diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index c5a95cb3da54331..d51bbd642196acb 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1817,7 +1817,18 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
assert(WasmIndices.count(&WS) > 0);
Info.ElementIndex = WasmIndices.find(&WS)->second;
} else if (WS.isDefined()) {
- assert(DataLocations.count(&WS) > 0);
+ if (!DataLocations.count(&WS))
+ // In bitcode generated by split-LTO-unit mode in ThinLTO, these lines
+ // can appear:
+ // module asm ".lto_set_conditional symbolA,symbolA.[moduleId]"
+ // ...
+ // (Here [moduleId] will be replaced by a real module hash ID)
+ //
+ // Here the original symbols (symbolA here) have been renamed to symbol
+ // new names created by attaching their module IDs and the original
+ // symbols do not appear in the bitcode anymore, and thus not in
+ // DataLocations. We should ignore them.
+ continue;
Info.DataRef = DataLocations.find(&WS)->second;
}
WS.setIndex(SymbolInfos.size());
diff --git a/llvm/test/MC/WebAssembly/lto-set-conditional.ll b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
new file mode 100644
index 000000000000000..84270aea60bdebe
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
@@ -0,0 +1,13 @@
+; RUN: llc %s -filetype=obj
+
+target triple = "wasm32-unknown-unknown"
+
+; In .lto_set_conditional, the first symbol is renamed to the second symbol, so
+; the first symbol does not exist anymore in the file. Object writer should not
+; crash when .lto_set_conditional is present.
+
+module asm ".lto_set_conditional a,a.new"
+
+define hidden void @a.new() {
+ ret void
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work tracking this down!
; the first symbol does not exist anymore in the file. Object writer should not | ||
; crash when .lto_set_conditional is present. | ||
|
||
module asm ".lto_set_conditional a,a.new" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we wring this in asm directly instead? (i.e could this test be a .s file?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
llvm/lib/MC/WasmObjectWriter.cpp
Outdated
// Here the original symbols (symbolA here) have been renamed to symbol | ||
// new names created by attaching their module IDs and the original | ||
// symbols do not appear in the bitcode anymore, and thus not in | ||
// DataLocations. We should ignore them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should instead add an early return right that top of this function (before LLVM_DEBUG(dbgs() << "adding to symtab: " << WS << "\n");
).
Something like:
if (WS.isData() && !DataLocations.count(&WS)) contionue
Or maybe we could have isInSymtab
return false for these symbols which is already handled with a continue above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
In the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a specific criteria is moved to the split module. https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L315-L366 And if there is an originally local-linkage global value defined in the original module and referenced in the split module or the vice versa, that value is _promoted_ by attaching a module ID to their names in order to prevent name clashes because now they can be referenced from other modules. https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L46-L100 And when that promoted global value is a function, a `.lto_set_conditional` entry is written to the original module to avoid breaking references from inline assembly: https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91 The syntax of this is, if the original function name is `symbolA` and the module ID is `123`, ```ll module asm ".lto_set_conditional symbolA,symbolA.123" ``` These symbols are parsed here: https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467 The first function symbol in this `.lto_set_conditional` do not exist as a function in the bitcode anymore because it was renamed to the second. So they are not assigned as function symbols but they are not really data either, so the object writer crashes here: https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820 This PR makes the object writer just skip those symbols. --- This problem was discovered when I was testing with `-fwhole-program-vtables`. The reason we didn't have this problem before with ThinLTO was because `-fsplit-lto-unit`, which splits LTO units when possible, defaults to false, but it defaults to true when `-fwhole-program-vtables` is used.
In the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a specific criteria is moved to the split module.
And if there is an originally local-linkage global value defined in the original module and referenced in the split module or the vice versa, that value is promoted by attaching a module ID to their names in order to prevent name clashes because now they can be referenced from other modules.
And when that promoted global value is a function, a
.lto_set_conditional
entry is written to the original module to avoid breaking references from inline assembly:llvm-project/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
Lines 84 to 91 in d21fc58
The syntax of this is, if the original function name is
symbolA
and the module ID is123
,These symbols are parsed here:
llvm-project/llvm/lib/MC/MCParser/AsmParser.cpp
Line 6467 in 648981f
The first function symbol in this
.lto_set_conditional
do not exist as a function in the bitcode anymore because it was renamed to the second. So they are not assigned as function symbols but they are not really data either, so the object writer crashes here:llvm-project/llvm/lib/MC/WasmObjectWriter.cpp
Line 1820 in 5b9e6c7
This PR makes the object writer just skip those symbols.
This problem was discovered when I was testing with
-fwhole-program-vtables
. The reason we didn't have this problem before with ThinLTO was because-fsplit-lto-unit
, which splits LTO units when possible, defaults to false, but it defaults to true when-fwhole-program-vtables
is used.