Add Python API, semantics and implementation details for DLPack #106
Merged
Commits (12)
- `30276e6` Update data interchange section with Python API and semantics (rgommers)
- `79bb689` temporary commit to enable MyST feature (rgommers)
- `4e2c49d` Add DLPack synchronization semantics; add from_dlpack/__dlpack__ to API (rgommers)
- `0b701c4` Update stream numbering for `__dlpack__` (rgommers)
- `a4549af` Add __dlpack__ device and update description of stream=None (rgommers)
- `a719b18` Add more device-specific notes for CUDA/ROCm stream handling (rgommers)
- `693b15a` Fix issue where producer/consumer were reversed (rgommers)
- `897ca2e` Improve the description of the stream keyword for `__dlpack__` (rgommers)
- `d3b9a79` Update __dlpack_device__ to use IntEnum for device type (rgommers)
- `75261cc` Add -1 as a sentinel value for DLPack stream handling (rgommers)
- `5cde9aa` Add supported DLPack version range. (rgommers)
- `603ad2e` Add details on strides null and size 0 arrays. (rgommers)
Conversations
**rgommers:** Note that this is different from how `stream` is specified in https://numba.readthedocs.io/en/latest/cuda/cuda_array_interface.html#python-interface-specification. I actually don't understand that spec: it says for `None` that no synchronization is needed, it uses `1`/`2` for the legacy/per-thread default streams, and other integers for non-default streams. Which seems odd: what if the stream number of a non-default stream in use is `2`, for example? Using:

- `None`: legacy default stream
- `0`: per-thread default stream
- `1, 2, ...`: non-default stream numbers

seems to make more sense. @leofang am I missing something there?
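To make the two conventions easier to compare side by side, here is a small illustrative sketch; neither mapping is a real API, the names are made up for this comparison, and the entries only restate the semantics described above.

```python
# Illustration only: neither mapping is an actual API.
# cai_v3 = Numba's __cuda_array_interface__ v3 convention;
# proposed = the alternative numbering floated in this comment.
cai_v3_stream_meaning = {
    None: "no synchronization needed",
    1: "legacy default stream",
    2: "per-thread default stream",
    # 3, 4, ...: non-default streams; but what if a live
    # non-default stream happens to have the number 2?
}
proposed_stream_meaning = {
    None: "legacy default stream",
    0: "per-thread default stream",
    # 1, 2, ...: non-default stream numbers
}
```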
**leofang:** @rgommers `None` is actually confusing across libraries. For example, in Numba `None` means there is no Numba default stream (to be distinguished from "CUDA's (whichever) default stream"), whereas in CuPy `None` simply refers to CUDA's default stream, which in turn is `1` (the legacy default stream), though we're in the process of adopting `2` (the per-thread default stream). `0` is not acceptable either, for the same reason: it's semantically unclear depending on how the libraries containing CUDA code are compiled and the runtime behavior defined in the Python hooks.

Note that in CUDA you don't get to choose the stream numbers: CUDA macro-defines `1` for `cudaStreamLegacy` and `2` for `cudaStreamPerThread`, which CAI v3 followed. Any user/non-default stream created via `cudaStreamCreate()` is guaranteed to start on or after `3`. (In fact, the CUDA driver reserves a stream pool internally, so the actual start number is well after `3`.) I hope this makes CAI v3 clearer to you.

I'll try to catch up on the rest of the discussions here, as well as in the DLPack repo, after Monday (tomorrow)... 😅
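A short sketch of the CAI v3 numbering just described; `describe_stream` is a hypothetical helper written for this thread, not part of any library, and it only restates the values above.

```python
# Hypothetical helper; restates the CAI v3 convention described above.
CUDA_STREAM_LEGACY = 1      # cudaStreamLegacy is macro-defined to 1 by CUDA
CUDA_STREAM_PER_THREAD = 2  # cudaStreamPerThread is macro-defined to 2

def describe_stream(stream):
    """Classify a `stream` value as used by __cuda_array_interface__ v3."""
    if stream is None:
        return "no synchronization needed"
    if stream == CUDA_STREAM_LEGACY:
        return "legacy default stream"
    if stream == CUDA_STREAM_PER_THREAD:
        return "per-thread default stream"
    # cudaStreamCreate() handles start at 3 or later, so user streams
    # never collide with the two macro-defined values.
    return "user-created (non-default) stream"
```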
**leofang:** btw, fun fact: in HIP, syncing over stream `1` or `2` would lead to a segfault, as HIP does not support them: cupy/cupy#4458 (comment).
**rgommers:** Thanks for the explanation and links @leofang. I updated it to match `__cuda_array_interface__`.
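For concreteness, a minimal sketch of what a producer's `__dlpack__` could look like under the CAI-v3-style numbering adopted here, including the `-1` "do not synchronize" sentinel added in commit `75261cc`. This is not the spec's reference implementation; `_wait_on_consumer_stream` and `_to_dlpack_capsule` are hypothetical producer internals.

```python
# Minimal sketch, assuming the CAI-v3-style numbering adopted in this PR:
# 1 = legacy default stream, 2 = per-thread default stream, > 2 = user
# stream, and -1 = consumer requests no synchronization (commit 75261cc).
class Array:
    def __dlpack__(self, *, stream=None):
        if stream == 0:
            # 0 is semantically ambiguous, as discussed above.
            raise ValueError("stream=0 is ambiguous and not allowed")
        if stream is not None and stream != -1:
            # Make the consumer's stream wait on the producer's pending
            # work before the memory is handed over.
            self._wait_on_consumer_stream(stream)  # hypothetical internal
        return self._to_dlpack_capsule()  # hypothetical internal
```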
**leofang:** Thanks, @rgommers. Apologies, I realized my first sentence wasn't complete; it should have been