Skip to content

[clang][docs] Update bytecode interpreter docs #124252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
119 changes: 16 additions & 103 deletions clang/docs/ConstantInterpreter.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@ by the evaluator. The interpreter is activated using the following flags:
Bytecode Compilation
====================

Bytecode compilation is handled in ``ByteCodeStmtGen.h`` for statements
and ``ByteCodeExprGen.h`` for expressions. The compiler has two different
Bytecode compilation is handled in ``Compiler.h`` for statements
and for expressions. The compiler has two different
backends: one to generate bytecode for functions (``ByteCodeEmitter``) and
one to directly evaluate expressions as they are compiled, without
generating bytecode (``EvalEmitter``). All functions are compiled to
Expand All @@ -44,11 +44,11 @@ Primitive Types
Signed or unsigned integers of a specific bit width, implemented using
the ```Integral``` type.

* ``PT_{U|S}intFP``
* ``PT_IntAP{S}``

Signed or unsigned integers of an arbitrary, but fixed width used to
implement integral types which are required by the target, but are not
supported by the host. Under the hood, they rely on APValue. The
supported by the host. Under the hood, they rely on ``APInt``. The
``Integral`` specialisation for these types is required by opcodes to
share an implementation with fixed integrals.

Expand All @@ -57,38 +57,29 @@ Primitive Types
Representation for boolean types, essentially a 1-bit unsigned
``Integral``.

* ``PT_RealFP``
* ``PT_Float``

Arbitrary, but fixed precision floating point numbers. Could be
specialised in the future similarly to integers in order to improve
floating point performance.

* ``PT_Ptr``

Pointer type, defined in ``"Pointer.h"``. A pointer can be either null,
reference interpreter-allocated memory (``BlockPointer``) or point to an
address which can be derived, but not accessed (``ExternPointer``).
Pointer type, defined in ``"Pointer.h"``. The most common type of
pointer is a "BlockPointer", which points to an ``interp::Block``.
But other pointer types exist, such as typeid pointers or
integral pointers.

* ``PT_FnPtr``

Function pointer type, can also be a null function pointer. Defined
in ``"FnPointer.h"``.
in ``"FunctionPointer.h"``.

* ``PT_MemPtr``
* ``PT_MemberPtr``

Member pointer type, can also be a null member pointer. Defined
in ``"MemberPointer.h"``

* ``PT_VoidPtr``

Void pointer type, can be used for round-trip casts. Represented as
the union of all pointers which can be cast to void.
Defined in ``"VoidPointer.h"``.

* ``PT_ObjCBlockPtr``

Pointer type for ObjC blocks. Defined in ``"ObjCBlockPointer.h"``.

Composite types
---------------

Expand Down Expand Up @@ -219,35 +210,21 @@ Pointers
--------

Pointers, implemented in ``Pointer.h`` are represented as a tagged union.
Some of these may not yet be available in upstream ``clang``.

* **BlockPointer**: used to reference memory allocated and managed by the
interpreter, being the only pointer kind which allows dereferencing in the
interpreter
* **ExternPointer**: points to memory which can be addressed, but not read by
the interpreter. It is equivalent to APValue, tracking a declaration and a path
of fields and indices into that allocation.
* **TargetPointer**: represents a target address derived from a base address
through pointer arithmetic, such as ``((int *)0x100)[20]``. Null pointers are
target pointers with a zero offset.
* **TypeInfoPointer**: tracks information for the opaque type returned by
* **TypeIDPointer**: tracks information for the opaque type returned by
``typeid``
* **InvalidPointer**: is dummy pointer created by an invalid operation which
allows the interpreter to continue execution. Does not allow pointer
arithmetic or dereferencing.
* **IntegralPointer**: a pointer formed from an integer,
think ``(int*)123``.

Besides the previously mentioned union, a number of other pointer-like types
have their own type:

* **ObjCBlockPointer** tracks Objective-C blocks
* **FnPointer** tracks functions and lazily caches their compiled version
* **FunctionPointer** tracks functions.
* **MemberPointer** tracks C++ object members

Void pointers, which can be built by casting any of the aforementioned
pointers, are implemented as a union of all pointer types. The ``BitCast``
opcode is responsible for performing all legal conversions between these
types and primitive integers.

BlockPointer
~~~~~~~~~~~~

Expand Down Expand Up @@ -311,73 +288,9 @@ of ``a.c``, but its offset would point to ``&a.c[1]``. The
array-to-pointer decay operation adjusts a pointer to an array (where
the offset is equal to the base) to a pointer to the first element.

ExternPointer
~~~~~~~~~~~~~

Extern pointers can be derived, pointing into symbols which are not
readable from constexpr. An external pointer consists of a base
declaration, along with a path designating a subobject, similar to
the ``LValuePath`` of an APValue. Extern pointers can be converted
to block pointers if the underlying variable is defined after the
pointer is created, as is the case in the following example:

.. code-block:: c

extern const int a;
constexpr const int *p = &a;
const int a = 5;
static_assert(*p == 5, "x");

TargetPointer
~~~~~~~~~~~~~

While null pointer arithmetic or integer-to-pointer conversion is
banned in constexpr, some expressions on target offsets must be folded,
replicating the behaviour of the ``offsetof`` builtin. Target pointers
are characterised by 3 offsets: a field offset, an array offset and a
base offset, along with a descriptor specifying the type the pointer is
supposed to refer to. Array indexing adjusts the array offset, while the
field offset is adjusted when a pointer to a member is created. Casting
an integer to a pointer sets the value of the base offset. As a special
case, null pointers are target pointers with all offsets set to 0.

TypeInfoPointer
~~~~~~~~~~~~~~~

``TypeInfoPointer`` tracks two types: the type assigned to
``std::type_info`` and the type which was passed to ``typeinfo``.

InvalidPointer
~~~~~~~~~~~~~~

Such pointers are built by operations which cannot generate valid
pointers, allowing the interpreter to continue execution after emitting
a warning. Inspecting such a pointer stops execution.

TODO
====

Missing Language Features
-------------------------

* Changing the active field of unions
* ``volatile``
* ``__builtin_constant_p``
* ``dynamic_cast``
* ``new`` and ``delete``
* Fixed Point numbers and arithmetic on Complex numbers
* Several builtin methods, including string operations and
``__builtin_bit_cast``
* Continue-after-failure: a form of exception handling at the bytecode
level should be implemented to allow execution to resume. As an example,
argument evaluation should resume after the computation of an argument fails.
* Pointer-to-Integer conversions
* Lazy descriptors: the interpreter creates a ``Record`` and ``Descriptor``
when it encounters a type: ones which are not yet defined should be lazily
created when required

Known Bugs
----------

* If execution fails, memory storing APInts and APFloats is leaked when the
stack is cleared
It is part of the taged union in ``Pointer``.
11 changes: 0 additions & 11 deletions clang/www/OpenProjects.html
Original file line number Diff line number Diff line change
Expand Up @@ -90,17 +90,6 @@ <h1>Open Clang Projects</h1>
performance as well as to find ways to proactively alert us when we've
introduced a change that has significant negative impact on build times.</li>

<li><b>Complete support for the experimental constant expression interpreter
</b>: Clang's production constant expression interpreter computes a constant
expression result by walking over AST nodes, performing calculations as it
goes. This does not have good performance properties, and so we've begun work
on an <a href="https://clang.llvm.org/docs/ConstantInterpreter.html">
experimental constant expression interpreter</a> that works by converting the
AST into bytecode that is interpreted. This effort has a long tail of work left
to complete because it requires implementing byte code for every kind of
expression and type that can be used in a constant expression for C++ and C.
</li>

<li><b>Improve clang-doc</b>: Clang's library-based design allows it to be used
by a variety of tools that reason about source code.
<a href="https://clang.llvm.org/extra/clang-doc.html">clang-doc</a> is one
Expand Down
Loading