Commit 9712e45

[ownership] Add a section to SIL.rst that describes the semantics of safe interior pointers in Ownership SIL.
This is just part of my ongoing effort to document Ownership SSA more explicitly in SIL.rst. NOTE: I realized while writing this that we might also be able to do this form of verification with owned values. I don't know how easy/hard it would be though. It's nice that owned values do not need to look for these when optimizing, so that would need to be balanced against expanding this.
1 parent b785a5f commit 9712e45

1 file changed

+171
-0
lines changed

docs/SIL.rst

Lines changed: 171 additions & 0 deletions
@@ -2043,6 +2043,177 @@ parts::
    return %1 : $Klass
  }

Borrowed Object based Safe Interior Pointers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What is an "Unsafe Interior Pointer"
````````````````````````````````````

An unsafe interior pointer is a bare pointer into the innards of an object. A
simple example of this in C++ would be using the method std::vector::data() to
get to the innards of a std::vector. In general interior pointers are unsafe to
use since languages do not provide any guarantees that the interior pointer will
not be used after the underlying object has been deallocated. To see this,
consider the following C++ example::

  #include <cstdio>
  #include <vector>

  int unfortunateFunction() {
    int *unsafeInteriorPointer = nullptr;
    {
      std::vector<int> vector;
      vector.push_back(5);
      unsafeInteriorPointer = vector.data();
      printf("%d\n", *unsafeInteriorPointer); // Prints "5".
    } // vector is destroyed and its storage is deallocated here.
    return *unsafeInteriorPointer; // Kaboom: use after free.
  }

In words, C++ allows us to get the interior pointer into the vector, but
then lets us do whatever we want with the pointer, including using it after the
underlying memory has been invalidated.

From a user's perspective, interior pointers are really useful: one can use
them to pass data to other APIs that only expect a pointer, and one can
sometimes use them to get better performance. But from a language designer's
perspective, this sort of API is verboten since it leads to bugs, crashes, and
security vulnerabilities. That being said, users clearly have a need for such
functionality, so we, as language designers, should figure out ways to express
these sorts of patterns in our various languages in a safe way that prevents
users from foot-gunning themselves. In SIL, we have solved this problem via the
direct modeling of interior pointer instructions as a high level concept in our
IR.
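
To make the idea of confining an interior pointer to a scope concrete, here is
a minimal C++ sketch. The helper ``withInteriorPointer`` is hypothetical (it is
not part of the C++ standard library or of SIL); it only illustrates handing out
the pointer while the container is borrowed. Unlike SIL, C++ does not verify
that the callback refrains from leaking the pointer, so this is a convention
rather than a guarantee.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: passes the vector's interior pointer to `body` and to
// nothing else. `storage` is borrowed by const reference for the whole call,
// so it cannot be destroyed while `body` runs.
template <typename T, typename Body>
auto withInteriorPointer(const std::vector<T> &storage, Body body) {
  return body(storage.data(), storage.size());
}

int sumThroughInteriorPointer() {
  std::vector<int> vector{5, 7};
  // The pointer is only named inside the lambda, which plays the role of the
  // borrow scope of `vector`.
  return withInteriorPointer(vector, [](const int *ptr, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
      sum += ptr[i];
    return sum;
  });
}
```

Calling ``sumThroughInteriorPointer()`` reads both elements through the interior
pointer and returns their sum, 12, without the pointer ever outliving the
vector.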

Safe Interior Pointers in SIL
`````````````````````````````

In contrast to LLVM-IR, SIL provides mechanisms that language designers can use
to express concepts like the above in a manner that allows the language to
define away compiler generated unsafe interior pointer usage using "Safe
Interior Pointers". This is implemented in SIL by:

1. Classifying a set of instructions as being "interior pointer" instructions.
2. Enforcing in the SILVerifier that all "interior pointer" instructions can
   only have operands with `Guaranteed`_ ownership.
3. Enforcing in the SILVerifier that any transitive address use of the interior
   pointer is a liveness requirement of the "interior pointer"'s
   operand.

Note that the transitive address use verifier from (3) does not attempt to
classify uses directly. Instead the verifier:

1. Has an explicit list of instructions that it understands as requiring
   liveness of the base object.

2. Has a second list of instructions that require liveness and produce an
   address whose transitive uses need to be recursively processed.

3. Asserts on any instructions that are not known to the verifier. This ensures
   that the verifier is kept up to date with new instructions.

Note that typically instructions in category (1) are instructions whose uses do
not propagate the pointer value, so they are safe. In contrast, some other
instructions in category (1) are escaping uses of the address such as
`address_to_pointer`_. Those uses are unsafe--the user is responsible for
managing unsafe pointer lifetimes and the compiler must not extend those pointer
lifetimes.
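
The three-way classification described above can be sketched as a small
worklist algorithm. This is a toy model written in C++ for illustration only:
the names ``UseKind``, ``Inst``, ``collectLivenessUses``, and ``exampleChain``
are hypothetical and do not correspond to the SILVerifier's actual data
structures. It also assumes the address use graph is acyclic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of an instruction that uses an address.
enum class UseKind {
  Leaf,       // list (1): needs the base live, produces no new address
  Projection, // list (2): needs the base live and yields a derived address
  Unknown     // on neither list
};

struct Inst {
  UseKind kind;
  std::vector<int> users; // indices of instructions using this one's result
};

// Collects every instruction that transitively uses the address produced by
// insts[interiorPointer]. Each collected instruction is a point at which the
// base object must still be live.
std::vector<int> collectLivenessUses(const std::vector<Inst> &insts,
                                     int interiorPointer) {
  std::vector<int> result;
  std::vector<int> worklist = insts[interiorPointer].users;
  while (!worklist.empty()) {
    int user = worklist.back();
    worklist.pop_back();
    const Inst &inst = insts[user];
    // Rule (3): refuse to guess about instructions we do not know.
    assert(inst.kind != UseKind::Unknown && "unhandled address user");
    result.push_back(user);
    if (inst.kind == UseKind::Projection) {
      // Rule (2): the users of the derived address are processed as well.
      worklist.insert(worklist.end(), inst.users.begin(), inst.users.end());
    }
  }
  return result;
}

// Models the chain: 0 = ref_element_addr, 1 = struct_element_addr, 2 = load.
std::vector<Inst> exampleChain() {
  return {{UseKind::Projection, {1}},
          {UseKind::Projection, {2}},
          {UseKind::Leaf, {}}};
}
```

For ``exampleChain()``, ``collectLivenessUses(insts, 0)`` yields instructions 1
and 2: both the projection and the load through it count as liveness
requirements on the base object.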

These rules ensure statically that any uses of the address that are not escaped
explicitly by an instruction like `address_to_pointer`_ are within the
guaranteed value's scope, where the guaranteed value is statically known to be
live. As a result, in SIL it is impossible to express such a bug in compiler
generated code. As an example, consider the following unsafe interior pointer
SIL::
2125+
2126+
class Klass { var k: KlassField }
2127+
struct KlassWrapper { var k: Klass }
2128+
2129+
// ...
2130+
2131+
// Today SIL restricts interior pointer instructions to only have operands
2132+
// with guaranteed ownership.
2133+
%1 = begin_borrow %0 : $Klass
2134+
2135+
// %2 is an interior pointer into %1. Since %2 is an address, it's uses are
2136+
// not treated as uses of underlying borrowed object %1 in the ownership
2137+
// system. This is because at the ownership level objects with None
2138+
// ownership are not verified and do not have any constraints on how they
2139+
// are used from the ownership system.
2140+
//
2141+
// Instead the ownership verifier gathers up all such uses and treats them
2142+
// as uses of the object from which the interior pointer was projected from
2143+
// transitively. This means that this is a constraint on the guaranteed
2144+
// objects use, not on the trivial values.
2145+
%2 = ref_element_addr %1 : $Klass, #Klass.k // %2 is a $*KlassWrapper
2146+
%3 = struct_element_addr %2 : $*KlassWrapper, #KlassWrapper.k // %3 is a $*Klass
2147+
2148+
// So if we end the borrow %1 at this point, invalidating the addresses
2149+
// ``%2`` and ``%3``.
2150+
end_borrow %1 : $Klass
2151+
2152+
// We would here be loading from an invalidated address. This would cause a
2153+
// verifier error since %3's use here is a regular use that is inferred up
2154+
// on %1.
2155+
%4 = load [copy] %3 : $*KlassWrapper
2156+
2157+
// ...

Notice how, due to a possible bug in the compiler, we are loading from
potentially invalidated memory through ``%3``. This would have caused a
verifier error stating that ``%4`` was an interior pointer based use-after-free
of ``%1``, implying this is malformed SIL.

NOTE: This is a constraint on the base object, not on the addresses themselves
which are viewed as outside of the ownership system since they have `None`_
ownership.

In contrast to the previous example, the following example follows ownership
invariants and is valid SIL::

  class Klass { var k: KlassWrapper }
  struct KlassWrapper { var k: Klass }

  // ...

  %1 = begin_borrow %0 : $Klass
  // %2 is an interior pointer into the Klass k. Since %2 is an address and
  // addresses have None ownership, its uses are not treated as uses of the
  // underlying object %1.
  %2 = ref_element_addr %1 : $Klass, #Klass.k // %2 is a $*KlassWrapper

  // Destroying %1 at this location would result in a verifier error since
  // %2's uses are considered to be uses of %1.
  //
  // end_lifetime %1 : $Klass

  // We are statically not loading from an invalidated address here since we
  // are within the lifetime of ``%1``.
  %3 = struct_element_addr %2 : $*KlassWrapper, #KlassWrapper.k
  %4 = load [copy] %3 : $*Klass // %1 must be live here transitively

  // ``%1``'s lifetime ends. Importantly we know that within the lifetime of
  // ``%1``, ``%0``'s lifetime can not shrink past this point, implying
  // transitive static safety.
  end_borrow %1 : $Klass

In the second example, we show a well-formed SIL program showing off SIL's Safe
Interior Pointers. All of the uses of ``%2``, the interior pointer, are
transitively uses of the base underlying object, ``%0``.
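
The difference between the two SIL examples can be illustrated with a toy scope
check. This C++ sketch is not the real ownership verifier: it assumes a single
straight-line block in which each instruction is identified by its position,
and ``BorrowScope``, ``firstUseOutsideScope``, and the two ``check...`` helpers
are hypothetical names introduced only for this illustration.

```cpp
#include <vector>

// Toy model of a borrow scope in a straight-line function.
struct BorrowScope {
  int beginBorrow; // position of begin_borrow
  int endBorrow;   // position of end_borrow
};

// Returns the position of the first address use that falls outside the borrow
// scope, or -1 if every use is inside it.
int firstUseOutsideScope(const BorrowScope &scope,
                         const std::vector<int> &addressUses) {
  for (int use : addressUses) {
    if (use <= scope.beginBorrow || use >= scope.endBorrow)
      return use;
  }
  return -1;
}

// First example: 0 begin_borrow, 1 ref_element_addr, 2 struct_element_addr,
// 3 end_borrow, 4 load. The load at position 4 is after the end_borrow.
int checkInvalidExample() {
  return firstUseOutsideScope(BorrowScope{0, 3}, {2, 4});
}

// Second example: 0 begin_borrow, 1 ref_element_addr, 2 struct_element_addr,
// 3 load, 4 end_borrow. Every transitive use is inside the scope.
int checkValidExample() {
  return firstUseOutsideScope(BorrowScope{0, 4}, {2, 3});
}
```

Under these assumptions ``checkInvalidExample()`` returns 4, the position of the
offending load, while ``checkValidExample()`` returns -1. A real verifier has to
reason about control flow rather than a single position range.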

The current list of interior pointer SIL instructions is:

* `project_box`_ - projects a pointer out of a reference counted box. (*)
* `ref_element_addr`_ - projects a field out of a reference counted class.
* `ref_tail_addr`_ - projects out a pointer to a class’s tail allocated array
  memory (assuming the class was initialized to have such an array).
* `open_existential_box`_ - projects the address of the value out of a boxed
  existential container using the current function context/protocol conformance
  to create an "opened archetype".
* `project_existential_box`_ - projects a pointer to the value inside a boxed
  existential container. Must be the type for which the box was initially
  allocated and not an "opened" archetype.

(*) We still need to finish adding support for project_box, but all other
interior pointers are guarded already.
2216+
20462217
Runtime Failure
20472218
---------------
20482219
