Commit 0b88de8

Add a compiler/interpreter of LLDB data formatter bytecode to examples
1 parent 2e0506f commit 0b88de8

6 files changed

+870
-0
lines changed

lldb/docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -164,6 +164,7 @@ interesting areas to contribute to lldb.
164164
resources/fuzzing
165165
resources/sbapi
166166
resources/dataformatters
167+
resources/formatterbytecode
167168
resources/extensions
168169
resources/lldbgdbremote
169170
resources/lldbplatformpackets
Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
1+
Formatter Bytecode
2+
==================
3+
4+
Background
5+
----------
6+
7+
LLDB provides very rich customization options to display data types (see :doc:`/use/variable/`). To use custom data formatters, developers need to edit the global `~/.lldbinit` file to make sure they are found and loaded. In addition to this rather manual workflow, developers or library authors can ship data formatters with their code in a format that allows LLDB to automatically find them and run them securely.
8+
9+
An end-to-end example of such a workflow is the Swift `DebugDescription` macro (see https://www.swift.org/blog/announcing-swift-6/#debugging ) that translates Swift string interpolation into LLDB summary strings, and puts them into a `.lldbsummaries` section, where LLDB can find them.
10+
11+
This document describes a minimal bytecode tailored to running LLDB formatters. It defines a human-readable assembler representation for the language, an efficient binary encoding, a virtual machine for evaluating it, and a format for embedding formatters into binary containers.
12+
13+
Goals
14+
~~~~~
15+
16+
Provide an efficient and secure encoding for data formatters that can be used as a compilation target from user-friendly representations (such as DIL, Swift DebugDescription, or NatVis).
17+
18+
Non-goals
19+
~~~~~~~~~
20+
21+
While humans could write the assembler syntax, making it user-friendly is not a goal.
22+
23+
Design of the virtual machine
24+
-----------------------------
25+
26+
The LLDB formatter virtual machine uses a stack-based bytecode, comparable with DWARF expressions, but with higher-level data types and functions.
27+
28+
The virtual machine has two stacks, a data and a control stack. The control stack is kept separate to make it easier to reason about the security aspects of the virtual machine.
29+
30+
Data types
31+
~~~~~~~~~~
32+
33+
All objects on the data stack must have one of the following data types. These data types are "host" data types, in LLDB parlance.
34+
35+
* *String* (UTF-8)
36+
* *Int* (64 bit)
37+
* *UInt* (64 bit)
38+
* *Object* (Basically an `SBValue`)
39+
* *Type* (Basically an `SBType`)
40+
* *Selector* (One of the predefined functions)
41+
42+
*Object* and *Type* are opaque; they can only be used as parameters of `call`.
43+
44+
Instruction set
45+
---------------
46+
47+
Stack operations
48+
~~~~~~~~~~~~~~~~
49+
50+
These instructions manipulate the data stack directly.
51+
52+
======== ========== ===========================
53+
Opcode Mnemonic Stack effect
54+
-------- ---------- ---------------------------
55+
0x00 `dup` `(x -> x x)`
56+
0x01 `drop` `(x y -> x)`
57+
0x02 `pick` `(x ... UInt -> x ... x)`
58+
0x03 `over` `(x y -> x y x)`
59+
0x04 `swap` `(x y -> y x)`
60+
0x05 `rot` `(x y z -> z x y)`
61+
======== ========== ===========================
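
As an illustration of the stack effects above, the following Python sketch models the data stack as a list whose last element is the top. The helper name `apply_stack_op` is hypothetical, and the indexing convention for `pick` (after popping the index, 0 selects the new top of the stack) is an assumption, since the table does not spell it out.

```python
def apply_stack_op(stack, mnemonic):
    """Apply one stack operation in place; the top of the stack is stack[-1]."""
    if mnemonic == "dup":      # (x -> x x)
        stack.append(stack[-1])
    elif mnemonic == "drop":   # (x y -> x)
        stack.pop()
    elif mnemonic == "pick":   # (x ... UInt -> x ... x)
        n = stack.pop()        # assumed: index counted down from the new top
        stack.append(stack[-1 - n])
    elif mnemonic == "over":   # (x y -> x y x)
        stack.append(stack[-2])
    elif mnemonic == "swap":   # (x y -> y x)
        stack[-1], stack[-2] = stack[-2], stack[-1]
    elif mnemonic == "rot":    # (x y z -> z x y)
        z = stack.pop()
        stack.insert(-2, z)    # place the old top below the two remaining items
    else:
        raise ValueError(f"unknown stack operation: {mnemonic}")
    return stack


print(apply_stack_op([1, 2, 3], "rot"))  # [3, 1, 2]
```

Under these assumptions, `0u pick` behaves like `dup` and `1u pick` behaves like `over`.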
62+
63+
Control flow
64+
~~~~~~~~~~~~
65+
66+
These manipulate the control stack and program counter.
67+
68+
======== ========== ============================================================
69+
Opcode Mnemonic Description
70+
-------- ---------- ------------------------------------------------------------
71+
0x10 `{` push a code block address onto the control stack
72+
-- `}` (technically not an opcode) syntax for end of code block
73+
0x11 `if` pop a block from the control stack,
74+
if the top of the data stack is nonzero, execute it
75+
0x12 `ifelse` pop two blocks from the control stack, if the top of
76+
the data stack is nonzero, execute the first,
77+
otherwise the second.
78+
======== ========== ============================================================
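
The interaction between the two stacks can be sketched in Python as follows. This is a simplified model, not the LLDB implementation: a `{ ... }` block is represented as a nested list instead of a code address, the function name `run` is hypothetical, and two details the table leaves open are assumed here: the condition is popped from the data stack, and for `ifelse` the "first" block is the one that appears first in the program.

```python
def run(program, data, control=None):
    """Run a nested-list program and return the data stack."""
    control = [] if control is None else control
    for token in program:
        if isinstance(token, list):
            # `{ ... }`: push the block onto the control stack.
            control.append(token)
        elif token == "if":
            block = control.pop()
            if data.pop() != 0:          # assumed: the condition is consumed
                run(block, data, control)
        elif token == "ifelse":
            else_block = control.pop()   # pushed last, so popped first
            then_block = control.pop()
            chosen = then_block if data.pop() != 0 else else_block
            run(chosen, data, control)
        else:
            # Anything else is treated as a literal in this sketch.
            data.append(token)
    return data


# Roughly `1u { "yes" } { "no" } ifelse` in assembler syntax.
print(run([1, ["yes"], ["no"], "ifelse"], []))  # ['yes']
```

Keeping blocks on a separate control stack means a program can never turn a value from the data stack into a jump target, which is the property the design section relies on.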
79+
80+
Literals for basic types
81+
~~~~~~~~~~~~~~~~~~~~~~~~
82+
83+
======== =========== ============================================================
84+
Opcode Mnemonic Description
85+
-------- ----------- ------------------------------------------------------------
86+
0x20 `123u` `( -> UInt)` push an unsigned 64-bit host integer
87+
0x21 `123` `( -> Int)` push a signed 64-bit host integer
88+
0x22 `"abc"` `( -> String)` push a UTF-8 host string
89+
0x23 `@strlen` `( -> Selector)` push one of the predefined function
90+
selectors. See `call`.
91+
======== =========== ============================================================
92+
93+
Arithmetic, logic, and comparison operations
94+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
95+
96+
======== ========== ===========================
97+
Opcode Mnemonic Stack effect
98+
-------- ---------- ---------------------------
99+
0x30 `+` `(x y -> [x+y])`
100+
0x31 `-` etc ...
101+
0x32 `*`
102+
0x33 `/`
103+
0x34 `%`
104+
0x35 `<<`
105+
0x36 `>>`
106+
0x37 `shra` (arithmetic shift right)
107+
0x40 `~`
108+
0x41 `|`
109+
0x42 `^`
110+
0x50 `=`
111+
0x51 `!=`
112+
0x52 `<`
113+
0x53 `>`
114+
0x54 `=<`
115+
0x55 `>=`
116+
======== ========== ===========================
117+
118+
Function calls
119+
~~~~~~~~~~~~~~
120+
121+
For security reasons the list of functions callable with `call` is predefined. The supported functions are either existing methods on `SBValue`, or string formatting operations.
122+
123+
======== ========== ============================================
124+
Opcode Mnemonic Stack effect
125+
-------- ---------- --------------------------------------------
126+
0x60 `call` `(Object argN ... arg0 Selector -> retval)`
127+
======== ========== ============================================
128+
129+
The method to call is identified by one of a predefined set of *Selectors*.
130+
131+
==== ============================ =================================================== ==================================
132+
Sel. Mnemonic Stack Effect Description
133+
---- ---------------------------- --------------------------------------------------- ----------------------------------
134+
0x00 `summary` `(Object @summary -> String)` `SBValue::GetSummary`
135+
0x01 `type_summary` `(Object @type_summary -> String)` `SBValue::GetTypeSummary`
136+
0x10 `get_num_children` `(Object @get_num_children -> UInt)` `SBValue::GetNumChildren`
137+
0x11 `get_child_at_index` `(Object UInt @get_child_at_index -> Object)` `SBValue::GetChildAtIndex`
138+
0x12 `get_child_with_name` `(Object String @get_child_with_name -> Object)` `SBValue::GetChildMemberWithName`
139+
0x13 `get_child_index` `(Object String @get_child_index -> UInt)` `SBValue::GetChildIndex`
140+
0x15 `get_type` `(Object @get_type -> Type)` `SBValue::GetType`
141+
0x16 `get_template_argument_type` `(Object UInt @get_template_argument_type -> Type)` `SBValue::GetTemplateArgumentType`
142+
0x17 `cast` `(Object Type @cast -> Object)` `SBValue::Cast`
143+
0x20 `get_value` `(Object @get_value -> Object)` `SBValue::GetValue`
144+
0x21 `get_value_as_unsigned` `(Object @get_value_as_unsigned -> UInt)` `SBValue::GetValueAsUnsigned`
145+
0x22 `get_value_as_signed` `(Object @get_value_as_signed -> Int)` `SBValue::GetValueAsSigned`
146+
0x23 `get_value_as_address` `(Object @get_value_as_address -> UInt)` `SBValue::GetValueAsAddress`
147+
0x24 `get_value_as_address` `(Object @get_value_as_address -> UInt)` `SBValue::GetValueAsAddress`
148+
0x40 `read_memory_byte` `(UInt @read_memory_byte -> UInt)` `Target::ReadMemory`
149+
0x41 `read_memory_uint32` `(UInt @read_memory_uint32 -> UInt)` `Target::ReadMemory`
150+
0x42 `read_memory_int32` `(UInt @read_memory_int32 -> Int)` `Target::ReadMemory`
151+
0x43 `read_memory_uint64` `(UInt @read_memory_uint64 -> UInt)` `Target::ReadMemory`
152+
0x44 `read_memory_int64` `(UInt @read_memory_int64 -> Int)` `Target::ReadMemory`
153+
0x45 `read_memory_address` `(UInt @read_memory_address -> UInt)` `Target::ReadMemory`
154+
0x46 `read_memory` `(UInt Type @read_memory -> Object)` `Target::ReadMemory`
155+
0x50 `fmt` `(String arg0 ... @fmt -> String)` `llvm::format`
156+
0x51 `sprintf` `(String arg0 ... @sprintf -> String)` `sprintf`
157+
0x52 `strlen` `(String @strlen -> UInt)` `strlen in bytes`
158+
==== ============================ =================================================== ==================================
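
The idea of a closed selector table can be sketched in a few lines of Python. Only the string selector `strlen` (0x52) is modeled, because the `SBValue` selectors need a live debugger session; the names `SELECTORS` and `call` are hypothetical, and the result type of `strlen` is assumed to be an unsigned integer.

```python
# Closed table of callable functions: selector -> (name, argument count, function).
SELECTORS = {
    0x52: ("strlen", 1, lambda s: len(s.encode("utf-8"))),  # length in bytes
}


def call(data):
    """Pop a selector and its arguments from the data stack, push the result."""
    selector = data.pop()
    if selector not in SELECTORS:
        # Anything outside the predefined set is rejected, never executed.
        raise ValueError(f"unknown selector: {selector:#x}")
    _name, argc, function = SELECTORS[selector]
    args = [data.pop() for _ in range(argc)][::-1]
    data.append(function(*args))
    return data


print(call(["héllo", 0x52]))  # [6]
```

Because the program can only name entries of this table, it cannot reach arbitrary debugger or host functionality.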
159+
160+
Byte Code
161+
~~~~~~~~~
162+
163+
Most instructions are just a single byte opcode. The only exceptions are the literals:
164+
165+
* *String*: Length in bytes encoded as ULEB128, followed by that many bytes
166+
* *Int*: LEB128
167+
* *UInt*: ULEB128
168+
* *Selector*: ULEB128
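
The literal encodings above can be sketched in Python as follows. The function names are hypothetical, and the *Int* encoding is assumed to be signed LEB128 as used in DWARF. The string opcode `0x22` is taken from the literals table.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value):
    """Encode a signed integer as signed LEB128 (assumed encoding for Int)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: Python ints keep their sign
        sign_bit_set = bool(byte & 0x40)
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def encode_string_literal(text):
    """Opcode 0x22, then the UTF-8 length as ULEB128, then the bytes."""
    payload = text.encode("utf-8")
    return bytes([0x22]) + encode_uleb128(len(payload)) + payload


print(encode_uleb128(624485).hex())        # e58e26
print(encode_sleb128(-123456).hex())       # c0bb78
print(encode_string_literal("abc").hex())  # 2203616263
```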
169+
170+
Embedding
171+
~~~~~~~~~
172+
173+
Expression programs are embedded into an `.lldbformatters` section (an evolution of the Swift `.lldbsummaries` section) that is a dictionary of type names/regexes and descriptions. It consists of a list of records. Each record starts with the following header:
174+
175+
* Version number (ULEB128)
176+
* Remaining size of the record (minus the header) (ULEB128)
177+
178+
The version number is increased whenever an incompatible change is made. Adding new opcodes is not an incompatible change since consumers can unambiguously detect this and report an error.
179+
180+
Space between two records may be padded with NULL bytes.
181+
182+
In version 1, a record starts with a dictionary key, which is a type name or regex.
183+
184+
* Length of the key in bytes (ULEB128)
185+
* The key (UTF-8)
186+
187+
A regex has to start with `^`, which is part of the regular expression.
188+
189+
This is followed by one or more dictionary values that immediately follow each other and entirely fill out the record size from the header. Each expression program has the following layout:
190+
191+
* Function signature (1 byte)
192+
* Length of the program (ULEB128)
193+
* The program bytecode
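
To make the record layout concrete, here is a minimal Python sketch of a parser for a single version 1 record. The names `read_uleb128` and `parse_record` are hypothetical, padding between records is not handled, and the sample type name `MyOptional` is only an illustration.

```python
def read_uleb128(buf, pos):
    """Decode one ULEB128 value at buf[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_record(buf, pos=0):
    """Parse one record; return (version, key, programs, end position)."""
    version, pos = read_uleb128(buf, pos)
    size, pos = read_uleb128(buf, pos)   # size of everything after the header
    end = pos + size
    key_len, pos = read_uleb128(buf, pos)
    key = buf[pos:pos + key_len].decode("utf-8")
    pos += key_len
    programs = {}
    while pos < end:                     # programs fill the rest of the record
        signature = buf[pos]
        pos += 1
        length, pos = read_uleb128(buf, pos)
        programs[signature] = buf[pos:pos + length]
        pos += length
    return version, key, programs, end


# A record for the type "MyOptional" with one @summary program (signature 0x00)
# whose bytecode pushes the string "hi" (opcode 0x22, length 2, payload).
key = b"MyOptional"
program = bytes([0x22, 0x02]) + b"hi"
body = bytes([len(key)]) + key + bytes([0x00, len(program)]) + program
record = bytes([0x01, len(body)]) + body
print(parse_record(record))
```

All lengths in the sample are below 128, so each ULEB128 value fits in a single byte.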
194+
195+
The possible function signatures are:
196+
197+
========= ====================== ==========================
198+
Signature Mnemonic Stack Effect
199+
--------- ---------------------- --------------------------
200+
0x00 `@summary` `(Object -> String)`
201+
0x01 `@init` `(Object -> Object+)`
202+
0x02 `@get_num_children` `(Object+ -> UInt)`
203+
0x03 `@get_child_index` `(Object+ String -> UInt)`
204+
0x04 `@get_child_at_index` `(Object+ UInt -> Object)`
205+
0x05 `@get_value` `(Object+ -> Object)`
206+
========= ====================== ==========================
207+
208+
If not specified, the init function defaults to an empty function that just passes the Object along. Its results may be cached and allow common prep work to be done for an Object that can be reused by subsequent calls to the other methods. This way subsequent calls to `@get_child_at_index` can avoid recomputing shared information, for example.
209+
210+
While it is more efficient to store multiple programs per type key, this is not a requirement. LLDB will merge all entries. If there are conflicts, the result is undefined.
211+
212+
Execution model
213+
~~~~~~~~~~~~~~~
214+
215+
Execution begins at the first byte in the program. The program counter of the virtual machine starts at offset 0 of the bytecode and may never move outside the range of the program as defined in the header. The data stack starts with one Object or the result of the `@init` function (`Object+` in the table above).
216+
217+
Error handling
218+
~~~~~~~~~~~~~~
219+
220+
In version 1, errors are unrecoverable: the entire expression fails if any kind of error is encountered.
221+
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
all: test
2+
3+
.PHONY: test
4+
test:
5+
	python3 compiler.py
6+
	mkdir -p _test
7+
	clang++ -std=c++17 test/MyOptional.cpp -g -o _test/MyOptional
8+
	lldb _test/MyOptional -o "command script import test/formatter.py" -o "b -p here" -o "r" -o "v x" -o "v y" -o q
