Skip to content

[WebAssembly] Support parsing .lto_set_conditional #126546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 1, 2025

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Feb 10, 2025

In the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a specific criteria is moved to the split module.

And if there is an originally local-linkage global value defined in the original module and referenced in the split module or the vice versa, that value is promoted by attaching a module ID to their names in order to prevent name clashes because now they can be referenced from other modules.

And when that promoted global value is a function, a .lto_set_conditional entry is written to the original module to avoid breaking references from inline assembly:

if (isa<Function>(&ExportGV) && allowPromotionAlias(OldName)) {
// Create a local alias with the original name to avoid breaking
// references from inline assembly.
std::string Alias =
".lto_set_conditional " + OldName + "," + NewName + "\n";
ExportM.appendModuleInlineAsm(Alias);
}
}

The syntax of this is, if the original function name is symbolA and the module ID is 123,

module asm ".lto_set_conditional symbolA,symbolA.123"

These symbols are parsed here:

Sym = Parser.getContext().getOrCreateSymbol(Name);

The first function symbol in this .lto_set_conditional do not exist as a function in the bitcode anymore because it was renamed to the second. So they are not assigned as function symbols but they are not really data either, so the object writer crashes here:

assert(DataLocations.count(&WS) > 0);

This PR makes the object writer just skip those symbols.


This problem was discovered when I was testing with -fwhole-program-vtables. The reason we didn't have this problem before with ThinLTO was because -fsplit-lto-unit, which splits LTO units when possible, defaults to false, but it defaults to true when -fwhole-program-vtables is used.

In the split-LTO-unit mode in ThinLTO, a compilation module is split
into two and global variables that meet a specific criteria is moved to
the second module.

And if there is an originally local-linkage global value defined in the
first module and referenced in the second module or the vice versa, that
value is _promoted_ by attaching a module has ID to their names in order
to prevent name clashes because now they can be referenced from other
modules.

And when that promoted global value is a function, a
`.lto_set_conditional` entry is written to the first module to avoid
breaking references from inline assembly:
https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91

The syntax of this is, if the original function name is `a` and the
module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:
https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467

The first function symbol in this `.lto_set_conditional` do not exist as
a function in the bitcode anymore because it was renamed to the second.
So they are not assigned as function symbols but they are not really
data either, so the object writer crashes here:
https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820

This PR makes the object writer just skip those symbols.

---

This problem was discovered when I was testing with
`-fwhole-program-vtables`.
The reason we didn't have this problem before with ThinLTO was because
`-fsplit-lto-unit`, which splits LTO units when possible, defaults to
false, but it defaults to true when `-fwhole-program-vtables` is used.
@aheejin aheejin requested a review from sbc100 February 10, 2025 17:09
@llvmbot llvmbot added backend:WebAssembly mc Machine (object) code labels Feb 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 10, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-webassembly

Author: Heejin Ahn (aheejin)

Changes

In the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a specific criteria is moved to the second module.

And if there is an originally local-linkage global value defined in the first module and referenced in the second module or the vice versa, that value is promoted by attaching a module has ID to their names in order to prevent name clashes because now they can be referenced from other modules.

And when that promoted global value is a function, a .lto_set_conditional entry is written to the first module to avoid breaking references from inline assembly:

if (isa<Function>(&ExportGV) && allowPromotionAlias(OldName)) {
// Create a local alias with the original name to avoid breaking
// references from inline assembly.
std::string Alias =
".lto_set_conditional " + OldName + "," + NewName + "\n";
ExportM.appendModuleInlineAsm(Alias);
}
}

The syntax of this is, if the original function name is a and the module ID is 123,

module asm ".lto_set_conditional symbolA,symbolA.123"

These symbols are parsed here:

Sym = Parser.getContext().getOrCreateSymbol(Name);

The first function symbol in this .lto_set_conditional do not exist as a function in the bitcode anymore because it was renamed to the second. So they are not assigned as function symbols but they are not really data either, so the object writer crashes here:

assert(DataLocations.count(&WS) > 0);

This PR makes the object writer just skip those symbols.


This problem was discovered when I was testing with -fwhole-program-vtables.
The reason we didn't have this problem before with ThinLTO was because -fsplit-lto-unit, which splits LTO units when possible, defaults to false, but it defaults to true when -fwhole-program-vtables is used.


Full diff: https://github.com/llvm/llvm-project/pull/126546.diff

2 Files Affected:

  • (modified) llvm/lib/MC/WasmObjectWriter.cpp (+12-1)
  • (added) llvm/test/MC/WebAssembly/lto-set-conditional.ll (+13)
diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index c5a95cb3da54331..d51bbd642196acb 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1817,7 +1817,18 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       assert(WasmIndices.count(&WS) > 0);
       Info.ElementIndex = WasmIndices.find(&WS)->second;
     } else if (WS.isDefined()) {
-      assert(DataLocations.count(&WS) > 0);
+      if (!DataLocations.count(&WS))
+        // In bitcode generated by split-LTO-unit mode in ThinLTO, these lines
+        // can appear:
+        // module asm ".lto_set_conditional symbolA,symbolA.[moduleId]"
+        // ...
+        // (Here [moduleId] will be replaced by a real module hash ID)
+        //
+        // Here the original symbols (symbolA here) have been renamed to symbol
+        // new names created by attaching their module IDs and the original
+        // symbols do not appear in the bitcode anymore, and thus not in
+        // DataLocations. We should ignore them.
+        continue;
       Info.DataRef = DataLocations.find(&WS)->second;
     }
     WS.setIndex(SymbolInfos.size());
diff --git a/llvm/test/MC/WebAssembly/lto-set-conditional.ll b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
new file mode 100644
index 000000000000000..84270aea60bdebe
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
@@ -0,0 +1,13 @@
+; RUN: llc %s -filetype=obj
+
+target triple = "wasm32-unknown-unknown"
+
+; In .lto_set_conditional, the first symbol is renamed to the second symbol, so
+; the first symbol does not exist anymore in the file. Object writer should not
+; crash when .lto_set_conditional is present.
+
+module asm ".lto_set_conditional a,a.new"
+
+define hidden void @a.new() {
+  ret void
+}

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work tracking this down!

; the first symbol does not exist anymore in the file. Object writer should not
; crash when .lto_set_conditional is present.

module asm ".lto_set_conditional a,a.new"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we wring this in asm directly instead? (i.e could this test be a .s file?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Here the original symbols (symbolA here) have been renamed to symbol
// new names created by attaching their module IDs and the original
// symbols do not appear in the bitcode anymore, and thus not in
// DataLocations. We should ignore them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should instead add an early return right that top of this function (before LLVM_DEBUG(dbgs() << "adding to symtab: " << WS << "\n");).

Something like:

if (WS.isData() && !DataLocations.count(&WS)) contionue

Or maybe we could have isInSymtab return false for these symbols which is already handled with a continue above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@aheejin aheejin merged commit 4d1c827 into llvm:main Apr 1, 2025
11 checks passed
@aheejin aheejin deleted the lto_set_conditional_fix branch April 1, 2025 18:15
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request Apr 2, 2025
In the split-LTO-unit mode in ThinLTO, a compilation module is split
into two and global variables that meet a specific criteria is moved to
the split module.
https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L315-L366

And if there is an originally local-linkage global value defined in the
original module and referenced in the split module or the vice versa,
that value is _promoted_ by attaching a module ID to their names in
order to prevent name clashes because now they can be referenced from
other modules.
https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L46-L100

And when that promoted global value is a function, a
`.lto_set_conditional` entry is written to the original module to avoid
breaking references from inline assembly:

https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91

The syntax of this is, if the original function name is `symbolA` and
the module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:

https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467

The first function symbol in this `.lto_set_conditional` do not exist as
a function in the bitcode anymore because it was renamed to the second.
So they are not assigned as function symbols but they are not really
data either, so the object writer crashes here:
https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820

This PR makes the object writer just skip those symbols.

---

This problem was discovered when I was testing with
`-fwhole-program-vtables`. The reason we didn't have this problem before
with ThinLTO was because `-fsplit-lto-unit`, which splits LTO units when
possible, defaults to false, but it defaults to true when
`-fwhole-program-vtables` is used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:WebAssembly mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants