
Commit ec80b2a

[IR] Introduce captures attribute
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420

This initial patch only introduces the IR/bitcode support for the attribute and its in-memory representation as `CaptureInfo`. This will be followed by a patch to remove (and upgrade) the `nocapture` attribute, and then by actual inference/analysis support.

Based on the RFC feedback, I've used a syntax similar to the `memory` attribute, though the only "location" that can be specified right now is `ret`.

I've added some pretty extensive documentation to LangRef on the semantics. One non-obvious bit here is that using ptrtoint will not result in a "return-only" capture, even if the ptrtoint result is only used in the return value. Without this requirement we wouldn't be able to continue ordinary capture analysis on the return value.
1 parent c2979c5 commit ec80b2a

File tree

19 files changed

+469
-9
lines changed


llvm/docs/LangRef.rst

Lines changed: 120 additions & 9 deletions
@@ -1397,6 +1397,36 @@ Currently, only the following parameter attributes are defined:
13971397
function, returning a pointer to allocated storage disjoint from the
13981398
storage for any other object accessible to the caller.
13991399

1400+
``captures(...)``
1401+
This attribute restricts the ways in which the callee may capture the
1402+
pointer. This is not a valid attribute for return values. This attribute
1403+
applies only to the particular copy of the pointer passed in this argument.
1404+
1405+
The arguments of ``captures`` are a list of captured pointer components,
1406+
which may be ``none``, or a combination of:
1407+
1408+
- ``address``: The integral address of the pointer.
1409+
- ``provenance``: The ability to access the pointer for both read and write
1410+
after the function returns.
1411+
- ``read_provenance``: The ability to access the pointer only for reads
1412+
after the function returns.
1413+
1414+
Additionally, it is possible to specify that the pointer is captured via
1415+
the return value only, by using ``captures(ret: ...)``.
1416+
1417+
The :ref:`pointer capture section <pointercapture>` discusses these semantics
1418+
in more detail.
1419+
1420+
Some examples of how to use the attribute:
1421+
1422+
- ``captures(none)``: Pointer not captured.
1423+
- ``captures(address, provenance)``: Equivalent to omitting the attribute.
1424+
- ``captures(address)``: Address may be captured, but not provenance.
1425+
- ``captures(address, read_provenance)``: Both address and provenance
1426+
captured, but only for read-only access.
1427+
- ``captures(ret: address, provenance)``: Pointer captured through return
1428+
value only.
1429+
14001430
.. _nocapture:
14011431

14021432
``nocapture``
@@ -3339,10 +3369,91 @@ Pointer Capture
33393369
---------------
33403370

33413371
Given a function call and a pointer that is passed as an argument or stored in
3342-
the memory before the call, a pointer is *captured* by the call if it makes a
3343-
copy of any part of the pointer that outlives the call.
3344-
To be precise, a pointer is captured if one or more of the following conditions
3345-
hold:
3372+
memory before the call, the call may capture two components of the pointer:
3373+
3374+
* The address of the pointer, which is its integral value. This also includes
3375+
parts of the address or any information about the address, including the
3376+
fact that it does not equal one specific value.
3377+
* The provenance of the pointer, which is the ability to perform memory
3378+
accesses through the pointer, in the sense of the :ref:`pointer aliasing
3379+
rules <pointeraliasing>`. We further distinguish whether only read accesses
3380+
are allowed, or both reads and writes.
3381+
3382+
For example, the following function captures the address of ``%a``, because
3383+
it is compared to a pointer, leaking information about the identity of the
3384+
pointer:
3385+
3386+
.. code-block:: llvm
3387+
3388+
@glb = global i8 0
3389+
3390+
define i1 @f(ptr %a) {
3391+
%c = icmp eq ptr %a, @glb
3392+
ret i1 %c
3393+
}
3394+
3395+
The function does not capture the provenance of the pointer, because the
3396+
``icmp`` instruction only operates on the pointer address. The following
3397+
function captures both the address and provenance of the pointer, as both
3398+
may be read from ``@glb`` after the function returns:
3399+
3400+
.. code-block:: llvm
3401+
3402+
@glb = global ptr null
3403+
3404+
define void @f(ptr %a) {
3405+
store ptr %a, ptr @glb
3406+
ret void
3407+
}
3408+
3409+
The following function captures *neither* the address nor the provenance of
3410+
the pointer:
3411+
3412+
.. code-block:: llvm
3413+
3414+
define i32 @f(ptr %a) {
3415+
%v = load i32, ptr %a
3416+
ret i32 %v
3417+
}
3418+
3419+
While address capture includes uses of the address within the body of the
3420+
function, provenance capture refers exclusively to the ability to perform
3421+
accesses *after* the function returns. Memory accesses within the function
3422+
itself are not considered pointer captures.
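
As a rough source-level analogy (not part of the patch), the three IR functions above can be sketched in C++. The namespace, function, and variable names here are hypothetical, and what a compiler actually infers for such code is a separate question; the sketch only illustrates which information about the pointer survives the call.

```cpp
#include <cassert>

namespace capture_sketch {

char glb = 0;          // plays the role of @glb in the first IR example
int *saved = nullptr;  // plays the role of @glb in the second IR example

// Address capture only: the result tells the caller whether `a` is &glb.
bool comparesAddress(const char *a) { return a == &glb; }

// Address and provenance capture: `a` can be reloaded from `saved` and
// dereferenced after the call has returned.
void storesPointer(int *a) { saved = a; }

// No capture: the load happens inside the call, and only the loaded value
// leaves the function.
int loadsThrough(const int *a) { return *a; }

// Small driver exercising the three cases.
inline void demoCaptureKinds() {
  char local = 'x';
  assert(!comparesAddress(&local));
  assert(comparesAddress(&glb));

  int value = 42;
  storesPointer(&value);
  *saved = 7;  // write through the captured pointer after the call returned
  assert(value == 7);
  assert(loadsThrough(&value) == 7);
}

} // namespace capture_sketch
```

Only `storesPointer` leaves the caller able to access the pointee through a copy of the pointer once the call is over, which is what provenance capture refers to.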
3423+
3424+
We can further say that the capture only occurs through a specific location.
3425+
In the following example, the pointer (both address and provenance) is captured
3426+
through the return value only:
3427+
3428+
.. code-block:: llvm
3429+
3430+
define ptr @f(ptr %a) {
3431+
%gep = getelementptr i8, ptr %a, i64 4
3432+
ret ptr %gep
3433+
}
3434+
3435+
However, we always consider direct inspection of the pointer address
3436+
(e.g. using ``ptrtoint``) to be location-independent. The following example
3437+
is *not* considered a return-only capture, even though the ``ptrtoint``
3438+
ultimately only contributes to the return value:
3439+
3440+
.. code-block:: llvm
3441+
3442+
@lookup = constant [4 x i8] [i8 0, i8 1, i8 2, i8 3]
3443+
3444+
define ptr @f(ptr %a) {
3445+
%a.addr = ptrtoint ptr %a to i64
3446+
%mask = and i64 %a.addr, 3
3447+
%gep = getelementptr i8, ptr @lookup, i64 %mask
3448+
ret ptr %gep
3449+
}
3450+
3451+
This definition is chosen to allow capture analysis to continue with the return
3452+
value in the usual fashion.
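
The distinction between the last two IR examples can be mirrored in a small C++ sketch, with a cast to `std::uintptr_t` standing in for `ptrtoint`. All names are hypothetical and the code is an illustration of the idea, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

namespace return_only_sketch {

const unsigned char lookup[4] = {0, 1, 2, 3};

// Return-only capture: the result is derived from `a` by pointer arithmetic,
// and the only way `a` leaves the function is through the return value.
const unsigned char *offsetByFour(const unsigned char *a) { return a + 4; }

// Not a return-only capture under the LangRef wording: the address is
// inspected as an integer, even though the integer only feeds the result.
const unsigned char *selectByLowBits(const void *a) {
  auto addr = reinterpret_cast<std::uintptr_t>(a);  // analogue of ptrtoint
  return &lookup[addr & 3];
}

// Small driver showing what each result tells the caller.
inline void demoReturnOnly() {
  unsigned char buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};
  const unsigned char *p = offsetByFour(buf);
  assert(p == buf + 4 && *p == 4);

  // The returned pointer is not based on `buf`, yet it reveals its low bits.
  const unsigned char *q = selectByLowBits(buf);
  assert(*q == (reinterpret_cast<std::uintptr_t>(buf) & 3));
}

} // namespace return_only_sketch
```

In the second function the returned pointer points into `lookup`, so an analysis that follows the return value would learn nothing about `a`; treating the integer inspection as location-independent avoids that blind spot.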
3453+
3454+
The following describes possible ways to capture a pointer in more detail,
3455+
where unqualified uses of the word "capture" refer to capturing both address
3456+
and provenance.
33463457

33473458
1. The call stores any bit of the pointer carrying information into a place,
33483459
and the stored bits can be read from the place by the caller after this call
@@ -3381,30 +3492,30 @@ hold:
33813492
@lock = global i1 true
33823493

33833494
define void @f(ptr %a) {
3384-
store ptr %a, ptr* @glb
3495+
store ptr %a, ptr @glb
33853496
store atomic i1 false, ptr @lock release ; %a is captured because another thread can safely read @glb
33863497
store ptr null, ptr @glb
33873498
ret void
33883499
}
33893500

3390-
3. The call's behavior depends on any bit of the pointer carrying information.
3501+
3. The call's behavior depends on any bit of the pointer carrying information
3502+
(address capture only).
33913503

33923504
.. code-block:: llvm
33933505

33943506
@glb = global i8 0
33953507

33963508
define void @f(ptr %a) {
33973509
%c = icmp eq ptr %a, @glb
3398-
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; escapes %a
3510+
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; captures address of %a only
33993511
BB_EXIT:
34003512
call void @exit()
34013513
unreachable
34023514
BB_CONTINUE:
34033515
ret void
34043516
}
34053517

3406-
4. The pointer is used in a volatile access as its address.
3407-
3518+
4. The pointer is used as the pointer operand of a volatile access.
34083519

34093520
.. _volatile:
34103521

llvm/include/llvm/AsmParser/LLParser.h

Lines changed: 1 addition & 0 deletions
@@ -379,6 +379,7 @@ namespace llvm {
379379
bool inAttrGrp, LocTy &BuiltinLoc);
380380
bool parseRangeAttr(AttrBuilder &B);
381381
bool parseInitializesAttr(AttrBuilder &B);
382+
bool parseCapturesAttr(AttrBuilder &B);
382383
bool parseRequiredTypeAttr(AttrBuilder &B, lltok::Kind AttrToken,
383384
Attribute::AttrKind AttrKind);
384385

llvm/include/llvm/AsmParser/LLToken.h

Lines changed: 5 additions & 0 deletions
@@ -207,6 +207,11 @@ enum Kind {
207207
kw_inaccessiblememonly,
208208
kw_inaccessiblemem_or_argmemonly,
209209

210+
// Captures attribute:
211+
kw_address,
212+
kw_provenance,
213+
kw_read_provenance,
214+
210215
// nofpclass attribute:
211216
kw_all,
212217
kw_nan,

llvm/include/llvm/Bitcode/LLVMBitCodes.h

Lines changed: 1 addition & 0 deletions
@@ -788,6 +788,7 @@ enum AttributeKindCodes {
788788
ATTR_KIND_NO_EXT = 99,
789789
ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
790790
ATTR_KIND_SANITIZE_TYPE = 101,
791+
ATTR_KIND_CAPTURES = 102,
791792
};
792793

793794
enum ComdatSelectionKindCodes {

llvm/include/llvm/IR/Attributes.h

Lines changed: 7 additions & 0 deletions
@@ -284,6 +284,9 @@ class Attribute {
284284
/// Returns memory effects.
285285
MemoryEffects getMemoryEffects() const;
286286

287+
/// Returns information from captures attribute.
288+
CaptureInfo getCaptureInfo() const;
289+
287290
/// Return the FPClassTest for nofpclass
288291
FPClassTest getNoFPClass() const;
289292

@@ -436,6 +439,7 @@ class AttributeSet {
436439
UWTableKind getUWTableKind() const;
437440
AllocFnKind getAllocKind() const;
438441
MemoryEffects getMemoryEffects() const;
442+
CaptureInfo getCaptureInfo() const;
439443
FPClassTest getNoFPClass() const;
440444
std::string getAsString(bool InAttrGrp = false) const;
441445

@@ -1260,6 +1264,9 @@ class AttrBuilder {
12601264
/// Add memory effect attribute.
12611265
AttrBuilder &addMemoryAttr(MemoryEffects ME);
12621266

1267+
/// Add captures attribute.
1268+
AttrBuilder &addCapturesAttr(CaptureInfo CI);
1269+
12631270
// Add nofpclass attribute
12641271
AttrBuilder &addNoFPClassAttr(FPClassTest NoFPClassMask);
12651272

llvm/include/llvm/IR/Attributes.td

Lines changed: 3 additions & 0 deletions
@@ -183,6 +183,9 @@ def NoCallback : EnumAttr<"nocallback", IntersectAnd, [FnAttr]>;
183183
/// Function creates no aliases of pointer.
184184
def NoCapture : EnumAttr<"nocapture", IntersectAnd, [ParamAttr]>;
185185

186+
/// Specify how the pointer may be captured.
187+
def Captures : IntAttr<"captures", IntersectCustom, [ParamAttr]>;
188+
186189
/// Function is not a source of divergence.
187190
def NoDivergenceSource : EnumAttr<"nodivergencesource", IntersectAnd, [FnAttr]>;
188191

llvm/include/llvm/Support/ModRef.h

Lines changed: 87 additions & 0 deletions
@@ -273,6 +273,93 @@ raw_ostream &operator<<(raw_ostream &OS, MemoryEffects RMRB);
273273
// Legacy alias.
274274
using FunctionModRefBehavior = MemoryEffects;
275275

276+
/// Components of the pointer that may be captured.
277+
enum class CaptureComponents : uint8_t {
278+
None = 0,
279+
Address = (1 << 0),
280+
ReadProvenance = (1 << 1),
281+
Provenance = (1 << 2) | ReadProvenance,
282+
All = Address | Provenance,
283+
LLVM_MARK_AS_BITMASK_ENUM(Provenance),
284+
};
285+
286+
inline bool capturesNothing(CaptureComponents CC) {
287+
return CC == CaptureComponents::None;
288+
}
289+
290+
inline bool capturesAnything(CaptureComponents CC) {
291+
return CC != CaptureComponents::None;
292+
}
293+
294+
inline bool capturesAddress(CaptureComponents CC) {
295+
return (CC & CaptureComponents::Address) != CaptureComponents::None;
296+
}
297+
298+
inline bool capturesReadProvenanceOnly(CaptureComponents CC) {
299+
return (CC & CaptureComponents::Provenance) ==
300+
CaptureComponents::ReadProvenance;
301+
}
302+
303+
inline bool capturesFullProvenance(CaptureComponents CC) {
304+
return (CC & CaptureComponents::Provenance) == CaptureComponents::Provenance;
305+
}
306+
307+
raw_ostream &operator<<(raw_ostream &OS, CaptureComponents CC);
308+
309+
/// Represents which components of the pointer may be captured and whether
310+
/// the capture is via the return value only. This represents the captures(...)
311+
/// attribute in IR.
312+
///
313+
/// For more information on the precise semantics see LangRef.
314+
class CaptureInfo {
315+
CaptureComponents Components;
316+
bool ReturnOnly;
317+
318+
public:
319+
CaptureInfo(CaptureComponents Components, bool ReturnOnly = false)
320+
: Components(Components),
321+
ReturnOnly(capturesAnything(Components) && ReturnOnly) {}
322+
323+
/// Create CaptureInfo that may capture all components of the pointer.
324+
static CaptureInfo all() { return CaptureInfo(CaptureComponents::All); }
325+
326+
/// Get the potentially captured components of the pointer.
327+
operator CaptureComponents() const { return Components; }
328+
329+
/// Whether the pointer is captured through the return value only.
330+
bool isReturnOnly() const { return ReturnOnly; }
331+
332+
bool operator==(CaptureInfo Other) const {
333+
return Components == Other.Components && ReturnOnly == Other.ReturnOnly;
334+
}
335+
336+
bool operator!=(CaptureInfo Other) const { return !(*this == Other); }
337+
338+
/// Compute union of CaptureInfos.
339+
CaptureInfo operator|(CaptureInfo Other) const {
340+
return CaptureInfo(Components | Other.Components,
341+
ReturnOnly && Other.ReturnOnly);
342+
}
343+
344+
/// Compute intersection of CaptureInfos.
345+
CaptureInfo operator&(CaptureInfo Other) const {
346+
return CaptureInfo(Components & Other.Components,
347+
ReturnOnly || Other.ReturnOnly);
348+
}
349+
350+
static CaptureInfo createFromIntValue(uint32_t Data) {
351+
return CaptureInfo(CaptureComponents(Data >> 1), Data & 1);
352+
}
353+
354+
/// Convert CaptureInfo into an encoded integer value (used by captures
355+
/// attribute).
356+
uint32_t toIntValue() const {
357+
return (uint32_t(Components) << 1) | ReturnOnly;
358+
}
359+
};
360+
361+
raw_ostream &operator<<(raw_ostream &OS, CaptureInfo Info);
362+
276363
} // namespace llvm
277364

278365
#endif
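
To make the encoding and the set operations in this hunk easier to follow, here is a standalone sketch that mirrors the bit layout without depending on LLVM headers. The namespace, the `Info` struct, and the `unionWith`/`intersectWith`/`fromIntValue` names are hypothetical stand-ins for `CaptureInfo`, `operator|`, `operator&`, and `createFromIntValue`; the arithmetic follows the diff above.

```cpp
#include <cassert>
#include <cstdint>

namespace capture_info_sketch {

// Same bit assignment as CaptureComponents in the diff.
enum Components : std::uint8_t {
  None = 0,
  Address = 1 << 0,
  ReadProvenance = 1 << 1,
  Provenance = (1 << 2) | ReadProvenance,
  All = Address | Provenance,
};

struct Info {
  std::uint8_t components;
  bool returnOnly;

  // As in the diff, "return only" is dropped when nothing is captured.
  Info(std::uint8_t c, bool r = false)
      : components(c), returnOnly(c != None && r) {}

  // Bit 0 holds the return-only flag, the remaining bits the components.
  std::uint32_t toIntValue() const {
    return (std::uint32_t(components) << 1) | (returnOnly ? 1u : 0u);
  }
  static Info fromIntValue(std::uint32_t data) {
    return Info(static_cast<std::uint8_t>(data >> 1), (data & 1) != 0);
  }

  // Union: more components, and return-only only if both sides agree.
  Info unionWith(Info o) const {
    return Info(static_cast<std::uint8_t>(components | o.components),
                returnOnly && o.returnOnly);
  }
  // Intersection: fewer components, return-only if either side says so.
  Info intersectWith(Info o) const {
    return Info(static_cast<std::uint8_t>(components & o.components),
                returnOnly || o.returnOnly);
  }
};

// Small driver checking a few encodings by hand.
inline void demoEncoding() {
  assert(Info(None, true).toIntValue() == 0);   // captures(none)
  assert(Info(Address).toIntValue() == 2);      // captures(address)
  assert(Info(All, true).toIntValue() == 15);   // captures(ret: address, provenance)

  Info roundTrip = Info::fromIntValue(Info(All, true).toIntValue());
  assert(roundTrip.components == All && roundTrip.returnOnly);

  assert(Info(Address, true).unionWith(Info(ReadProvenance)).toIntValue() == 6);
  assert(Info(All, true).intersectWith(Info(Address)).toIntValue() == 3);
}

} // namespace capture_info_sketch
```

Because `Provenance` includes the `ReadProvenance` bit, a bitwise union or intersection of two component sets yields a valid component set again, which is why the operators in the diff can work directly on the raw bits.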

llvm/lib/AsmParser/LLLexer.cpp

Lines changed: 3 additions & 0 deletions
@@ -704,6 +704,9 @@ lltok::Kind LLLexer::LexIdentifier() {
704704
KEYWORD(argmemonly);
705705
KEYWORD(inaccessiblememonly);
706706
KEYWORD(inaccessiblemem_or_argmemonly);
707+
KEYWORD(address);
708+
KEYWORD(provenance);
709+
KEYWORD(read_provenance);
707710

708711
// nofpclass attribute
709712
KEYWORD(all);
