<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo Connect

Dynamo Connect provides a Pythonic interface to the NIXL-based RDMA subsystem via a set of Python classes.
The primary goal of this library is to simplify the integration of NIXL-based RDMA into inference applications.

All operations using the Connect library begin with the [`Connector`](connector.md) class and the type of operation required.
There are four supported operation types:

 1. **Register local readable memory**:

    Register local memory buffer(s) with the RDMA subsystem so that a remote worker can read from them.

 2. **Register local writable memory**:

    Register local memory buffer(s) with the RDMA subsystem so that a remote worker can write to them.

 3. **Read from registered, remote memory**:

    Read remote memory buffer(s), registered by a remote worker to be readable, into local memory buffer(s).

 4. **Write to registered, remote memory**:

    Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker to be writable.

Correctly pairing these operations enables high-throughput GPU Direct RDMA data transfers.
Given the list above, the valid pairings are 1 & 3 and 2 & 4:
one side registers a "(read|write)-able" buffer while the other performs the matching "(read|write)" operation.
Specifically, a read operation must be paired with a readable operation, and a write operation must be paired with a writable operation.
The sequence diagram below illustrates the exchange, and a code sketch follows it.

```mermaid
sequenceDiagram
    participant LocalWorker
    participant RemoteWorker
    participant NIXL

    LocalWorker ->> NIXL: Register memory (Descriptor)
    RemoteWorker ->> NIXL: Register memory (Descriptor)
    LocalWorker ->> LocalWorker: Create Readable/WritableOperation
    LocalWorker ->> RemoteWorker: Send RDMA metadata (via HTTP/TCP+NATS)
    RemoteWorker ->> NIXL: Begin Read/WriteOperation with metadata
    NIXL -->> RemoteWorker: Data transfer (RDMA)
    RemoteWorker -->> LocalWorker: Notify completion (unblock awaiter)
```
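
To make the pairing concrete, here is a minimal sketch of operations 1 and 3: a readable registration on one worker and a read on the other. This is illustrative rather than authoritative; the `Connector` and `Descriptor` classes come from the class list below, but the import path, the method names `create_readable`, `begin_read`, `metadata`, and `wait_for_completion`, and the `send_to_peer` side-channel helper are assumptions modeled on the `create_writable`/`begin_write` calls shown in the diagrams.

```python
# Minimal sketch of pairing operations 1 & 3. The import path, method
# names, and signatures below are assumptions, not the confirmed API.
import torch

from dynamo.connect import Connector, Descriptor  # assumed import path


async def send_to_peer(metadata) -> None:
    """Placeholder for the application's side channel (HTTP, NATS, ...)."""
    raise NotImplementedError


async def source_worker(connector: Connector) -> None:
    # Operation 1: register a local GPU buffer that a remote worker may read.
    tensor = torch.full((1024,), 42.0, device="cuda")
    readable = connector.create_readable(Descriptor(tensor))  # assumed name
    # Ship the operation's RDMA metadata to the peer out-of-band.
    await send_to_peer(readable.metadata())  # assumed accessor
    # Unblocks once the remote worker's read completes.
    await readable.wait_for_completion()  # assumed completion API


async def sink_worker(connector: Connector, metadata) -> None:
    # Operation 3: read the remote buffer into local memory via the metadata.
    local = torch.empty((1024,), device="cuda")
    read_op = connector.begin_read(metadata, Descriptor(local))  # assumed
    await read_op.wait_for_completion()  # `local` now holds the remote data
```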

## Examples

### Generic Example

In the diagram below, Local creates a [`WritableOperation`](writable_operation.md) intended to receive data from Remote.
Local then sends metadata about the requested RDMA operation to Remote.
Remote then uses the metadata to create a [`WriteOperation`](write_operation.md), which performs the GPU Direct RDMA memory transfer from Remote's GPU memory to Local's GPU memory.
A sketch of Local's side of this exchange follows the diagram.

```mermaid
---
title: Write Operation Between Two Workers
---
flowchart LR
    c1[Remote] --"3: .begin_write()"--- WriteOperation
    WriteOperation e1@=="4: GPU Direct RDMA"==> WritableOperation
    WritableOperation --"1: .create_writable()"--- c2[Local]
    c2 e2@--"2: RDMA Metadata via HTTP"--> c1
    e1@{ animate: true; }
    e2@{ animate: true; }
```
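
In code, Local's side of this flow might look like the sketch below, under the same caveats as the earlier sketch: `create_writable` appears in step 1 of the diagram, while the metadata accessor, the HTTP helper, and the completion await are illustrative assumptions. Remote's side (step 3, `begin_write`) is sketched in the multimodal code examples below.

```python
# Sketch of the Local side of the write flow above; names marked
# "assumed" or "hypothetical" are illustrative, not the confirmed API.
import torch

from dynamo.connect import Connector, Descriptor  # assumed import path


async def post_metadata(metadata) -> None:
    """Hypothetical stand-in for the application's HTTP side channel."""
    raise NotImplementedError


async def local_worker(connector: Connector) -> None:
    # Step 1: reserve GPU memory for the incoming data and register it
    # with the RDMA subsystem as writable by a remote worker.
    buffer = torch.empty((4096,), device="cuda")
    writable = connector.create_writable(Descriptor(buffer))
    # Step 2: send the operation's RDMA metadata to Remote over HTTP.
    await post_metadata(writable.metadata())  # assumed accessor
    # Step 4 arrives via GPU Direct RDMA; awaiting unblocks once Remote's
    # write has completed.
    await writable.wait_for_completion()  # assumed completion API
    # `buffer` now contains the data Remote wrote.
```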

### Multimodal Example

In the case of the [Dynamo Multimodal Disaggregated Example](../../examples/multimodal/README.md):

 1. The HTTP frontend accepts a text prompt and a URL to an image.

 2. The prompt and URL are then enqueued with the Processor before being dispatched to the first available Decode Worker.

 3. The Decode Worker then requests that a Prefill Worker provide key-value data for the LLM powering it.

 4. The Prefill Worker then requests that the image be processed and provided as embeddings by the Encode Worker.

 5. The Encode Worker acquires the image, processes it, performs inference on it using a specialized vision model, and provides the resulting embeddings to the Prefill Worker.

 6. The Prefill Worker receives the embeddings from the Encode Worker, generates a key-value cache (KV$) update for the Decode Worker's LLM, and writes the update directly to the GPU memory reserved for the data.

 7. Finally, the Decode Worker performs the requested inference.

```mermaid
---
title: Multimodal Disaggregated Workflow
---
flowchart LR
    p0[HTTP Frontend] i0@--"text prompt"-->p1[Processor]
    p0 i1@--"url"-->p1
    p1 i2@--"prompt"-->dw[Decode Worker]
    p1 i3@--"url"-->dw
    dw i4@--"prompt"-->pw[Prefill Worker]
    dw i5@--"url"-->pw
    pw i6@--"url"-->ew[Encode Worker]
    ew o0@=="image embeddings"==>pw
    pw o1@=="kv_cache updates"==>dw
    dw o2@--"inference results"-->p0

    i0@{ animate: true; }
    i1@{ animate: true; }
    i2@{ animate: true; }
    i3@{ animate: true; }
    i4@{ animate: true; }
    i5@{ animate: true; }
    i6@{ animate: true; }
    o0@{ animate: true; }
    o1@{ animate: true; }
    o2@{ animate: true; }
```

> [!Note]
> In this example, it is the data transfer between the Prefill Worker and the Encode Worker that utilizes the Dynamo Connect library.
> The KV cache transfer between the Decode Worker and the Prefill Worker utilizes the NIXL-based RDMA subsystem directly, without the Dynamo Connect library.

#### Code Examples

See [prefill_worker](../../examples/multimodal/components/prefill_worker.py#L199) or [decode_worker](../../examples/multimodal/components/decode_worker.py#L239)
for how they coordinate directly with the Encode Worker by creating a [`WritableOperation`](writable_operation.md),
sending the operation's metadata via Dynamo's round-robin dispatcher, and awaiting the operation's completion before making use of the transferred data.

See [encode_worker](../../examples/multimodal/components/encode_worker.py#L190)
for how the resulting embeddings are registered with the RDMA subsystem by creating a [`Descriptor`](descriptor.md),
how a [`WriteOperation`](write_operation.md) is created using the metadata provided by the requesting worker,
and how the worker awaits the transfer's completion before yielding a response.
A hedged sketch of that writer side appears below.

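The flow in this sketch, wrapping the embeddings in a `Descriptor`, creating a `WriteOperation` from the requester's metadata, and awaiting completion before responding, follows the linked source; the import path, method names, and exact signatures remain assumptions.

```python
# Sketch of the remote (writer) side, loosely following encode_worker;
# the import path and signatures are assumptions, not the confirmed API.
import torch

from dynamo.connect import Connector, Descriptor  # assumed import path


async def write_embeddings(
    connector: Connector,
    embeddings: torch.Tensor,
    request_metadata,  # RDMA metadata received from the requesting worker
) -> None:
    # Register the freshly computed embeddings with the RDMA subsystem.
    descriptor = Descriptor(embeddings)
    # Begin the write against the requester's WritableOperation using the
    # metadata it sent; `begin_write` is taken from the diagrams, and its
    # exact signature is assumed.
    write_op = connector.begin_write(descriptor, request_metadata)
    # Await the GPU Direct RDMA transfer before yielding a response, so the
    # requester never observes a partially written buffer.
    await write_op.wait_for_completion()  # assumed completion API
```
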
## Python Classes

 - [Connector](connector.md)
 - [Descriptor](descriptor.md)
 - [Device](device.md)
 - [ReadOperation](read_operation.md)
 - [ReadableOperation](readable_operation.md)
 - [SerializedRequest](serialized_request.md)
 - [WritableOperation](writable_operation.md)
 - [WriteOperation](write_operation.md)

## References

 - [NVIDIA Dynamo](https://developer.nvidia.com/dynamo) @ [GitHub](https://github.com/ai-dynamo/dynamo)
 - [NVIDIA Dynamo Connect](https://github.com/ai-dynamo/dynamo/tree/main/components/connect)
 - [NVIDIA Inference Transfer Library (NIXL)](https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/#nvidia_inference_transfer_library_nixl_low-latency_hardware-agnostic_communication%C2%A0) @ [GitHub](https://github.com/ai-dynamo/nixl)
 - [Dynamo Multimodal Example](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal)
 - [NVIDIA GPU Direct](https://developer.nvidia.com/gpudirect)