[llvm-debuginfo-analyzer] Add support for WebAssembly binary format. #82588

Merged
316 changes: 313 additions & 3 deletions llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
@@ -16,7 +16,7 @@ DESCRIPTION
binary object files and prints their contents in a logical view, which
is a human readable representation that closely matches the structure
of the original user source code. Supported object file formats include
-ELF, Mach-O, PDB and COFF.
+ELF, Mach-O, WebAssembly, PDB and COFF.

The **logical view** abstracts the complexity associated with the
different low-level representations of the debugging information that
@@ -468,8 +468,9 @@ If the <pattern> criteria is too general, a more selective option can
be specified to target a particular category of elements:
lines (:option:`--select-lines`), scopes (:option:`--select-scopes`),
symbols (:option:`--select-symbols`) and types (:option:`--select-types`).

These options require knowledge of the debug information format (DWARF,
-CodeView, COFF), as the given **kind** describes a very specific type
+CodeView), as the given **kind** describes a very specific type
of element.

LINES
@@ -598,7 +599,7 @@ When comparing logical views created from different debug formats, its
accuracy depends on how close the debug information represents the
user code. For instance, a logical view created from a binary file with
DWARF debug information may include more detailed data than a logical
-view created from a binary file with CodeView/COFF debug information.
+view created from a binary file with CodeView debug information.

The following options describe the elements to compare.

@@ -1952,6 +1953,315 @@ The **{Coverage}** and **{Location}** attributes describe the debug
location and coverage for logical symbols. For optimized code, the
coverage value decreases and it affects the program debuggability.

WEBASSEMBLY SUPPORT
~~~~~~~~~~~~~~~~~~~
The example below shows the WebAssembly output generated by
:program:`llvm-debuginfo-analyzer`. The example was compiled for a
WebAssembly 32-bit target with Clang (-O0 -g --target=wasm32):

.. code-block:: c++

1 using INTPTR = const int *;
2 int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
3 if (ParamBool) {
4 typedef int INTEGER;
5 const INTEGER CONSTANT = 7;
6 return CONSTANT;
7 }
8 return ParamUnsigned;
9 }

PRINT BASIC DETAILS
^^^^^^^^^^^^^^^^^^^
The following command prints basic details for all the logical elements,
sorted by their internal offset in the debug information; each element
is shown with its lexical level and debug info format.

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level,format
--output-sort=offset
--print=scopes,symbols,types,lines,instructions
test-clang.wasm

Review comment (Member):

Nit: In LLVM clang -c emits .o and after wasm-ld we usually create .wasm. Given that the wasm file looks like a single object, would it be better to use .o here and below?

Reply (Member Author):

Good point. Changing .wasm to .o.


or

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level,format
--output-sort=offset
--print=elements
test-clang.wasm

Each row represents an element that is present within the debug
information. The first column represents the scope level, followed by
the associated line number (if any), and finally the description of
the element.

.. code-block:: none

Logical View:
[000] {File} 'test-clang.wasm' -> WASM

[001] {CompileUnit} 'test.cpp'
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
[003] 2 {Parameter} 'ParamBool' -> 'bool'
[003] {Block}
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
[004] 5 {Line}
[004] {Code} 'i32.const 7'
[004] {Code} 'local.set 10'
[004] {Code} 'local.get 5'
[004] {Code} 'local.get 10'
[004] {Code} 'i32.store 12'
[004] 6 {Line}
[004] {Code} 'i32.const 7'
[004] {Code} 'local.set 11'
[004] {Code} 'local.get 5'
[004] {Code} 'local.get 11'
[004] {Code} 'i32.store 28'
[004] {Code} 'br 1'
[004] - {Line}
[004] {Code} 'end'
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
[003] 2 {Line}
[003] {Code} 'nop'
[003] {Code} 'end'
[003] {Code} 'i64.div_s'
[003] {Code} 'global.get 0'
[003] {Code} 'local.set 3'
[003] {Code} 'i32.const 32'
[003] {Code} 'local.set 4'
[003] {Code} 'local.get 3'
[003] {Code} 'local.get 4'
[003] {Code} 'i32.sub'
[003] {Code} 'local.set 5'
[003] {Code} 'local.get 5'
[003] {Code} 'local.get 0'
[003] {Code} 'i32.store 24'
[003] {Code} 'local.get 5'
[003] {Code} 'local.get 1'
[003] {Code} 'i32.store 20'
[003] {Code} 'local.get 2'
[003] {Code} 'local.set 6'
[003] {Code} 'local.get 5'
[003] {Code} 'local.get 6'
[003] {Code} 'i32.store8 19'
[003] 3 {Line}
[003] {Code} 'local.get 5'
[003] {Code} 'i32.load8_u 19'
[003] {Code} 'local.set 7'
[003] 3 {Line}
[003] {Code} 'i32.const 1'
[003] {Code} 'local.set 8'
[003] {Code} 'local.get 7'
[003] {Code} 'local.get 8'
[003] {Code} 'i32.and'
[003] {Code} 'local.set 9'
[003] {Code} 'block'
[003] {Code} 'block'
[003] {Code} 'local.get 9'
[003] {Code} 'i32.eqz'
[003] {Code} 'br_if 0'
[003] 8 {Line}
[003] {Code} 'local.get 5'
[003] {Code} 'i32.load 20'
[003] {Code} 'local.set 12'
[003] 8 {Line}
[003] {Code} 'local.get 5'
[003] {Code} 'local.get 12'
[003] {Code} 'i32.store 28'
[003] - {Line}
[003] {Code} 'end'
[003] 9 {Line}
[003] {Code} 'local.get 5'
[003] {Code} 'i32.load 28'
[003] {Code} 'local.set 13'
[003] {Code} 'local.get 13'
[003] {Code} 'return'
[003] {Code} 'end'
[003] 9 {Line}
[003] {Code} 'unreachable'
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
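
The row layout described above (scope level, optional line number, element description) can be illustrated with a small parser. This is only a sketch for reading the printed text: ``ViewRow`` and ``parseViewRow`` are hypothetical names, the layout is inferred from the sample output rather than from a documented format, and rows carrying a leading ``+`` or ``-`` comparison marker are not handled.

```cpp
#include <optional>
#include <regex>
#include <string>

// One parsed row of the printed logical view (hypothetical helper type).
struct ViewRow {
  int Level;                // Scope level from the "[NNN]" column.
  std::optional<int> Line;  // Source line, absent for rows without one.
  std::string Kind;         // Element kind, e.g. "Function" or "Code".
  std::string Description;  // Remaining text after the "{Kind}" tag.
};

// Parses a row such as "[002] 2 {Function} extern not_inlined 'foo' -> 'int'".
// Returns std::nullopt for lines that do not follow that layout.
std::optional<ViewRow> parseViewRow(const std::string &Text) {
  static const std::regex RowPattern(
      R"re(\s*\[(\d+)\]\s+(?:(\d+|-)\s+)?\{(\w+)\}\s*(.*))re");
  std::smatch Match;
  if (!std::regex_match(Text, Match, RowPattern))
    return std::nullopt;

  ViewRow Row;
  Row.Level = std::stoi(Match[1].str());
  // A "-" placeholder or a missing column both mean "no line number".
  if (Match[2].matched && Match[2].str() != "-")
    Row.Line = std::stoi(Match[2].str());
  Row.Kind = Match[3].str();
  Row.Description = Match[4].str();
  return Row;
}
```

Calling ``parseViewRow("[003] {Block}")`` yields a row with level 3, no line number and kind ``Block``; a header such as ``Logical View:`` is rejected.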

SELECT LOGICAL ELEMENTS
^^^^^^^^^^^^^^^^^^^^^^^
The following prints all *instructions*, *symbols* and *types* that
contain **'block'** or **'.store'** in their names or types, using a tab
layout and giving the number of matches.

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level
--select-nocase --select-regex
--select=BLOCK --select=.store
--report=list
--print=symbols,types,instructions,summary
test-clang.wasm

Logical View:
[000] {File} 'test-clang.wasm'

[001] {CompileUnit} 'test.cpp'
[003] {Code} 'block'
[003] {Code} 'block'
[004] {Code} 'i32.store 12'
[003] {Code} 'i32.store 20'
[003] {Code} 'i32.store 24'
[004] {Code} 'i32.store 28'
[003] {Code} 'i32.store 28'
[003] {Code} 'i32.store8 19'

-----------------------------
Element Total Printed
-----------------------------
Scopes 3 0
Symbols 4 0
Types 2 0
Lines 62 8
-----------------------------
Total 71 8
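
The selection above combines several patterns, treats them as regular expressions and ignores case. A minimal sketch of that matching rule follows; it is not the tool's implementation. ``selectByPatterns`` is a hypothetical helper, and ``std::regex`` with ECMAScript syntax stands in for whatever regular expression engine the tool uses.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the names matched by at least one pattern. Patterns are treated as
// regular expressions and compared case-insensitively; a match anywhere in
// the name is enough.
std::vector<std::string>
selectByPatterns(const std::vector<std::string> &Names,
                 const std::vector<std::string> &Patterns) {
  std::vector<std::regex> Compiled;
  for (const std::string &Pattern : Patterns)
    Compiled.emplace_back(Pattern, std::regex::ECMAScript | std::regex::icase);

  std::vector<std::string> Selected;
  for (const std::string &Name : Names) {
    for (const std::regex &Regex : Compiled) {
      if (std::regex_search(Name, Regex)) {
        Selected.push_back(Name);
        break; // One matching pattern is sufficient.
      }
    }
  }
  return Selected;
}
```

With the patterns ``BLOCK`` and ``.store``, the names ``block``, ``i32.store 12`` and ``i32.store8 19`` are selected, while ``local.get 5`` is not. In ``.store`` the dot is a regex wildcard, so it matches any character followed by ``store``.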

COMPARISON MODE
^^^^^^^^^^^^^^^
In the previous example, the debug information issue (the invalid scope
location for **'typedef int INTEGER'**) was found by comparing against
the output of another compiler.

Using GCC to generate test-dwarf-gcc.o, we can apply a selection pattern
with the printing mode to obtain the following logical view output.

Review comment (Member):

copypasta? (gcc doesn't actually support wasm, right? have you tried a use case like this?)

Reply (Member Author, @CarlosAlbertoEnciso, Mar 5, 2024):

You are correct; GCC does not support wasm.

What llvm-debuginfo-analyzer does is process the binary file created by GCC and create its logical view (scopes, symbols, types, lines, etc.).

Note: The use cases described in the documentation are used as tests for the tool (DWARF, COFF and WebAssembly) (llvm/test/tools/llvm-debuginfo-analyzer).

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level
--select-regex --select-nocase --select=INTe
--report=list
--print=symbols,types
test-clang.wasm test-dwarf-gcc.o

Logical View:
[000] {File} 'test-clang.wasm'

[001] {CompileUnit} 'test.cpp'
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'

Logical View:
[000] {File} 'test-dwarf-gcc.o'

[001] {CompileUnit} 'test.cpp'
[004] 4 {TypeAlias} 'INTEGER' -> 'int'
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'

The output shows that both objects contain the same elements, but the
**'typedef INTEGER'** is located at a different scope level. The
GCC-generated object shows **'4'**, which is the correct value.

There are 2 comparison methods: logical view and logical elements.

LOGICAL VIEW
""""""""""""
It compares the logical view as a whole unit; for a match, each compared
logical element must have the same parents and children.

The output shows in view form the **missing (-), added (+)** elements,
giving more context by swapping the reference and target object files.

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level
--compare=types
--report=view
--print=symbols,types
test-clang.wasm test-dwarf-gcc.o

Reference: 'test-clang.wasm'
Target: 'test-dwarf-gcc.o'

Logical View:
[000] {File} 'test-clang.wasm'

[001] {CompileUnit} 'test.cpp'
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
[003] {Block}
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'
[003] 2 {Parameter} 'ParamBool' -> 'bool'
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'

The output shows the merging view path (reference and target) with the
missing and added elements.

LOGICAL ELEMENTS
""""""""""""""""
It compares individual logical elements without considering whether their
parents are the same. For both comparison methods, the equality criteria
include the name, source code location, type and lexical scope level.

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level
--compare=types
--report=list
--print=symbols,types,summary
test-clang.wasm test-dwarf-gcc.o

Reference: 'test-clang.wasm'
Target: 'test-dwarf-gcc.o'

(1) Missing Types:
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'

(1) Added Types:
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'

----------------------------------------
Element Expected Missing Added
----------------------------------------
Scopes 4 0 0
Symbols 0 0 0
Types 2 1 1
Lines 0 0 0
----------------------------------------
Total 6 1 1

Changing the *Reference* and *Target* order:

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level
--compare=types
--report=list
--print=symbols,types,summary
test-dwarf-gcc.o test-clang.wasm

Reference: 'test-dwarf-gcc.o'
Target: 'test-clang.wasm'

(1) Missing Types:
-[004] 4 {TypeAlias} 'INTEGER' -> 'int'

(1) Added Types:
+[003] 4 {TypeAlias} 'INTEGER' -> 'int'

----------------------------------------
Element Expected Missing Added
----------------------------------------
Scopes 4 0 0
Symbols 0 0 0
Types 2 1 1
Lines 0 0 0
----------------------------------------
Total 6 1 1

Review comment (Member), on lines +2252 to +2260:

The table looks the same as the previous one after the reference and the target switched. Is that correct?

Reply (Member Author):

The results are correct.

They match the case described in: https://llvm.org/docs/CommandGuide/llvm-debuginfo-analyzer.html#comparison-mode

In the case of WebAssembly vs DWARF, the number of missing and added types is the same:

 (1) Missing Types:
  -[004]     4     {TypeAlias} 'INTEGER' -> 'int'

 (1) Added Types:
  +[003]     4     {TypeAlias} 'INTEGER' -> 'int'


As the *Reference* and *Target* are switched, the *Added Types* from
the first case now are listed as *Missing Types*.
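
The symmetry between the two runs follows from how missing and added elements are defined. The sketch below illustrates the logical elements comparison under simplifying assumptions: ``Element``, ``Comparison`` and ``compareElements`` are hypothetical names, and equality uses only the criteria listed above (name, line, type, scope level).

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for a logical element.
struct Element {
  std::string Name;
  std::string Type;
  int Line;
  int Level;
};

bool operator==(const Element &LHS, const Element &RHS) {
  return std::tie(LHS.Name, LHS.Type, LHS.Line, LHS.Level) ==
         std::tie(RHS.Name, RHS.Type, RHS.Line, RHS.Level);
}

struct Comparison {
  std::vector<Element> Missing; // In the reference but not in the target.
  std::vector<Element> Added;   // In the target but not in the reference.
};

Comparison compareElements(const std::vector<Element> &Reference,
                           const std::vector<Element> &Target) {
  auto Contains = [](const std::vector<Element> &Set, const Element &E) {
    return std::find(Set.begin(), Set.end(), E) != Set.end();
  };
  Comparison Result;
  for (const Element &E : Reference)
    if (!Contains(Target, E))
      Result.Missing.push_back(E);
  for (const Element &E : Target)
    if (!Contains(Reference, E))
      Result.Added.push_back(E);
  return Result;
}
```

Because the two loops are mirror images, calling ``compareElements`` with the arguments swapped exchanges the ``Missing`` and ``Added`` lists, while the counts stay the same.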

EXIT STATUS
-----------
:program:`llvm-debuginfo-analyzer` returns 0 if the input files were
42 changes: 42 additions & 0 deletions llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
@@ -122,6 +122,48 @@ class LVBinaryReader : public LVReader {
std::unique_ptr<MCContext> MC;
std::unique_ptr<MCInstPrinter> MIP;

// https://yurydelendik.github.io/webassembly-dwarf/
// 2. Consuming and Generating DWARF for WebAssembly Code
// Note: Some DWARF constructs don't map one-to-one onto WebAssembly
// constructs. We strive to enumerate and resolve any ambiguities here.
//
// 2.1. Code Addresses
// Note: DWARF associates various bits of debug info
// with particular locations in the program via its code address (instruction
// pointer or PC). However, WebAssembly's linear memory address space does not
// contain WebAssembly instructions.
//
// Wherever a code address (see 2.17 of [DWARF]) is used in DWARF for
// WebAssembly, it must be the offset of an instruction relative within the
// Code section of the WebAssembly file. The DWARF is considered malformed if
// a PC offset is between instruction boundaries within the Code section.
//
// Note: It is expected that a DWARF consumer does not know how to decode
// WebAssembly instructions. The instruction pointer is selected as the offset
// in the binary file of the first byte of the instruction, and it is
// consistent with the WebAssembly Web API conventions definition of the code
// location.
//
// EXAMPLE: .DEBUG_LINE INSTRUCTION POINTERS
// The .debug_line DWARF section maps instruction pointers to source
// locations. With WebAssembly, the .debug_line section maps Code
// section-relative instruction offsets to source locations.
//
// EXAMPLE: DW_AT_* ATTRIBUTES
// For entities with a single associated code address, DWARF uses
// the DW_AT_low_pc attribute to specify the associated code address value.
// For WebAssembly, the DW_AT_low_pc's value is a Code section-relative
// instruction offset.
//
// For entities with a single contiguous range of code, DWARF uses a
// pair of DW_AT_low_pc and DW_AT_high_pc attributes to specify the associated
// contiguous range of code address values. For WebAssembly, these attributes
// are Code section-relative instruction offsets.
//
// For entities with multiple ranges of code, DWARF uses the DW_AT_ranges
// attribute, which refers to the array located at the .debug_ranges section.
LVAddress WasmCodeSectionOffset = 0;

// Loads all info for the architecture of the provided object file.
Error loadGenericTargetInfo(StringRef TheTriple, StringRef TheFeatures);
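
The comment block above says that DWARF code addresses for WebAssembly are offsets relative to the Code section. How the reader uses `WasmCodeSectionOffset` is not visible in this hunk, so the following is only a sketch of the arithmetic that the comment implies. `Address`, `toFileOffset` and `toDwarfAddress` are hypothetical names, and `CodeSectionOffset` is assumed to be the file offset that a Code section-relative address of 0 refers to.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the reader's address type.
using Address = std::uint64_t;

// Converts a Code section-relative DWARF address to an offset in the file.
// CodeSectionOffset is assumed to be the file offset that address 0 refers to.
Address toFileOffset(Address CodeSectionOffset, Address DwarfAddress) {
  return CodeSectionOffset + DwarfAddress;
}

// Inverse mapping; offsets located before the Code section have no
// Code section-relative address.
std::optional<Address> toDwarfAddress(Address CodeSectionOffset,
                                      Address FileOffset) {
  if (FileOffset < CodeSectionOffset)
    return std::nullopt;
  return FileOffset - CodeSectionOffset;
}
```

For example, with the Code section at file offset 0x40, a `DW_AT_low_pc` of 0x10 corresponds to file offset 0x50.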
