
Commit 582b976

feat: Deliver Connect Library
This change delivers the Connect library. Connect (aka dynamo.connect) is a Pythonic wrapper of NIXL with Dynamo-specific support, which specializes in worker-to-worker data transfer (as opposed to KV cache data transfer). Includes README and API documentation. Renamed `connect.SerializedRequest` to `connect.RdmaMetadata` for type-purpose clarity. This change will require updating the Multimodal Example prior to deleting the previous incarnation of the connect library.
1 parent 3363d8b commit 582b976

File tree

12 files changed

+2326
-0
lines changed


components/connect/README.md

Lines changed: 147 additions & 0 deletions
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Dynamo Connect

Dynamo Connect provides a Pythonic interface to the NIXL-based RDMA subsystem via a set of Python classes.
The primary goal of this library is to simplify the integration of NIXL-based RDMA into inference applications.

All operations using the Connect library begin with the [`Connector`](connector.md) class and the type of operation required.
There are four types of supported operations:

1. **Register local readable memory**:

   Register local memory buffer(s) with the RDMA subsystem so that a remote worker can read from them.

2. **Register local writable memory**:

   Register local memory buffer(s) with the RDMA subsystem so that a remote worker can write to them.

3. **Read from registered, remote memory**:

   Read remote memory buffer(s), registered by a remote worker to be readable, into local memory buffer(s).

4. **Write to registered, remote memory**:

   Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker to be writable.

By correctly pairing operations, high-throughput GPU Direct RDMA data transfers can be completed.
Given the list above, the correct pairings are 1 & 3 and 2 & 4:
a read operation must be paired with a readable operation, and a write operation must be paired with a writable operation.

## Examples

### Generic Example

In the diagram below, Local creates a [`WritableOperation`](writable_operation.md) intended to receive data from Remote.
Local then sends metadata about the requested RDMA operation to Remote.
Remote then uses the metadata to create a [`WriteOperation`](write_operation.md) which will perform the GPU Direct RDMA memory transfer from Remote's GPU memory to Local's GPU memory.

```mermaid
---
title: Write Operation Between Two Workers
---
flowchart LR
  c1[Remote] --"3: .begin_write()"--- WriteOperation
  WriteOperation e1@=="4: GPU Direct RDMA"==> WritableOperation
  WritableOperation --"1: .create_writable()"--- c2[Local]
  c2 e2@--"2: RDMA Metadata via HTTP"--> c1
  e1@{ animate: true; }
  e2@{ animate: true; }
```
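The handshake in the diagram can be sketched with plain-Python stand-ins. These classes are illustrative only and are not the real `dynamo.connect` API, which registers GPU or host memory with NIXL and moves bytes via RDMA; here a `bytearray` stands in for registered memory and an in-process object reference stands in for the RDMA metadata:

```python
import asyncio


# Illustrative stand-ins only -- NOT the real dynamo.connect classes.
class WritableOperation:
    def __init__(self, buffer: bytearray) -> None:
        self.buffer = buffer
        self._done = asyncio.Event()

    def metadata(self) -> dict:
        # Real RDMA metadata carries addresses and security keys,
        # not a live object reference.
        return {"op": self}

    async def wait_for_completion(self) -> None:
        await self._done.wait()


class WriteOperation:
    def __init__(self, metadata: dict, source: bytes) -> None:
        self._target: WritableOperation = metadata["op"]
        self._source = source

    async def wait_for_completion(self) -> None:
        # Stands in for the GPU Direct RDMA transfer.
        self._target.buffer[: len(self._source)] = self._source
        self._target._done.set()


async def main() -> bytes:
    # 1: Local creates a writable operation over its receive buffer.
    local_buffer = bytearray(13)
    writable = WritableOperation(local_buffer)

    # 2: Local ships the operation's metadata to Remote (HTTP in the diagram).
    metadata = writable.metadata()

    # 3 & 4: Remote begins the write; both sides await completion.
    write = WriteOperation(metadata, b"hello, world!")
    await write.wait_for_completion()
    await writable.wait_for_completion()
    return bytes(local_buffer)


print(asyncio.run(main()))
```

Note how Local never initiates the transfer itself: it only exposes memory and waits, while Remote drives the write once it holds the metadata.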

### Multimodal Example

In the case of the [Dynamo Multimodal Disaggregated Example](../../examples/multimodal/README.md):

1. The HTTP frontend accepts a text prompt and a URL to an image.

2. The prompt and URL are then enqueued with the Processor before being dispatched to the first available Decode Worker.

3. Decode Worker then requests a Prefill Worker to provide key-value data for the LLM powering the Decode Worker.

4. Prefill Worker then requests that the image be processed and provided as embeddings by the Encode Worker.

5. Encode Worker acquires the image, processes it, performs inference on the image using a specialized vision model, and finally provides the embeddings to Prefill Worker.

6. Prefill Worker receives the embeddings from Encode Worker, generates a key-value cache (KV$) update for Decode Worker's LLM, and writes the update directly to the GPU memory reserved for the data.

7. Finally, Decode Worker performs the requested inference.

```mermaid
---
title: Multimodal Disaggregated Workflow
---
flowchart LR
  p0[HTTP Frontend] i0@--"text prompt"-->p1[Processor]
  p0 i1@--"url"-->p1
  p1 i2@--"prompt"-->dw[Decode Worker]
  p1 i3@--"url"-->dw
  dw i4@--"prompt"-->pw[Prefill Worker]
  dw i5@--"url"-->pw
  pw i6@--"url"-->ew[Encode Worker]
  ew o0@=="image embeddings"==>pw
  pw o1@=="kv_cache updates"==>dw
  dw o2@--"inference results"-->p0

  i0@{ animate: true; }
  i1@{ animate: true; }
  i2@{ animate: true; }
  i3@{ animate: true; }
  i4@{ animate: true; }
  i5@{ animate: true; }
  i6@{ animate: true; }
  o0@{ animate: true; }
  o1@{ animate: true; }
  o2@{ animate: true; }
```

> [!Note]
> In this example, it is the data transfer between the Prefill Worker and the Encode Worker that utilizes the Dynamo Connect library.
> The KV cache transfer between Decode Worker and Prefill Worker utilizes the NIXL-based RDMA subsystem directly, without the Dynamo Connect library.

#### Code Examples

See [prefill_worker](../../examples/multimodal/components/prefill_worker.py#L199) or [decode_worker](../../examples/multimodal/components/decode_worker.py#L239)
for how they coordinate directly with the Encode Worker by creating a [`WritableOperation`](writable_operation.md),
sending the operation's metadata via Dynamo's round-robin dispatcher, and awaiting the operation's completion before making use of the transferred data.

See [encode_worker](../../examples/multimodal/components/encode_worker.py#L190)
for how the resulting embeddings are registered with the RDMA subsystem by creating a [`Descriptor`](descriptor.md),
how a [`WriteOperation`](write_operation.md) is created using the metadata provided by the requesting worker,
and how the worker awaits the data transfer's completion before yielding a response.

## Python Classes

- [Connector](connector.md)
- [Descriptor](descriptor.md)
- [Device](device.md)
- [OperationStatus](operation_status.md)
- [RdmaMetadata](rdma_metadata.md)
- [ReadOperation](read_operation.md)
- [ReadableOperation](readable_operation.md)
- [WritableOperation](writable_operation.md)
- [WriteOperation](write_operation.md)


## References

- [NVIDIA Dynamo](https://developer.nvidia.com/dynamo) @ [GitHub](https://github.com/ai-dynamo/dynamo)
- [NVIDIA Dynamo Connect](https://github.com/ai-dynamo/dynamo/tree/main/components/connect)
- [NVIDIA Inference Transfer Library (NIXL)](https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/#nvidia_inference_transfer_library_nixl_low-latency_hardware-agnostic_communication%C2%A0) @ [GitHub](https://github.com/ai-dynamo/nixl)
- [Dynamo Multimodal Example](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal)
- [NVIDIA GPU Direct](https://developer.nvidia.com/gpudirect)

components/connect/connector.md

Lines changed: 133 additions & 0 deletions
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# dynamo.connect.Connector

Core class for managing connections between workers in a distributed environment.
Use this class to create readable and writable operations, or to read and write data to remote workers.

This class is responsible for interfacing with the NIXL-based RDMA subsystem and providing a "Pythonic" interface
with which to utilize GPU Direct RDMA accelerated data transfers between models hosted by different workers in a Dynamo pipeline.
The connector provides two methods of moving data between workers:

- Preparing local memory to be written to by a remote worker.

- Preparing local memory to be read by a remote worker.

In both cases, local memory is registered with the NIXL-based RDMA subsystem via the [`Descriptor`](descriptor.md) class and provided to the connector.
The connector then configures the RDMA subsystem to expose the memory for the requested operation and returns an operation control object.
The operation control object, either a [`ReadableOperation`](readable_operation.md) or a [`WritableOperation`](writable_operation.md),
provides RDMA metadata ([`RdmaMetadata`](rdma_metadata.md)) via its `.metadata()` method, functionality to query the operation's current state, and the ability to cancel the operation prior to its completion.

The RDMA metadata must be provided to the remote worker expected to complete the operation.
The metadata contains the required information (identifiers, keys, etc.) which enables the remote worker to interact with the provided memory.

> [!Warning]
> RDMA metadata contains a worker's address as well as security keys to access specific registered memory descriptors.
> This data provides direct memory access between workers, and should therefore be considered sensitive and handled accordingly.


## Example Usage

```python
@async_on_start
async def async_init(self):
    runtime = dynamo_context["runtime"]

    self.connector = dynamo.connect.Connector(runtime=runtime)
    await self.connector.initialize()
```

> [!Tip]
> See [`ReadOperation`](read_operation.md#example-usage), [`ReadableOperation`](readable_operation.md#example-usage),
> [`WritableOperation`](writable_operation.md#example-usage), and [`WriteOperation`](write_operation.md#example-usage)
> for additional examples.


## Methods

### `begin_read`

Creates a [`ReadOperation`](read_operation.md) for transferring data from a remote worker.

To create the operation, the RDMA metadata from a remote worker's [`ReadableOperation`](readable_operation.md),
along with a matching set of local memory descriptors referencing the memory intended to receive data from the remote worker,
must be provided.
The metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.

Once created, data transfer will begin immediately.

Disposal of the object will instruct the RDMA subsystem to cancel the operation;
therefore the operation should be awaited until completed unless cancellation is intended.

Use [`.wait_for_completion()`](read_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.

### `begin_write`

Creates a [`WriteOperation`](write_operation.md) for transferring data to a remote worker.

To create the operation, the RDMA metadata from a remote worker's [`WritableOperation`](writable_operation.md),
along with a matching set of local memory descriptors referencing the memory to be transferred to the remote worker,
must be provided.
The metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.

Once created, data transfer will begin immediately.

Disposal of the object will instruct the RDMA subsystem to cancel the operation;
therefore the operation should be awaited until completed unless cancellation is intended.

Use [`.wait_for_completion()`](write_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.

### `create_readable`

Creates a [`ReadableOperation`](readable_operation.md) for transferring data to a remote worker.

To create the operation, a set of local memory descriptors must be provided that reference the memory intended to be transferred to a remote worker.
Once created, the memory referenced by the provided descriptors becomes immediately readable by any remote worker with the necessary metadata.
The metadata required to access the memory referenced by the provided descriptors is available via the operation's `.metadata()` method.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.

Disposal of the object will instruct the RDMA subsystem to cancel the operation;
therefore the operation should be awaited until completed unless cancellation is intended.

Use [`.wait_for_completion()`](readable_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.

### `create_writable`

Creates a [`WritableOperation`](writable_operation.md) for transferring data from a remote worker.

To create the operation, a set of local memory descriptors must be provided which reference the memory intended to receive data from a remote worker.
Once created, the memory referenced by the provided descriptors becomes immediately writable by any remote worker with the necessary metadata.
The metadata required to access the memory referenced by the provided descriptors is available via the operation's `.metadata()` method.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.

Disposal of the object will instruct the RDMA subsystem to cancel the operation;
therefore the operation should be awaited until completed unless cancellation is intended.

Use [`.wait_for_completion()`](writable_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.

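The secondary-channel handoff these methods rely on can be sketched with stand-ins. The metadata fields and the in-process queue standing in for the channel are hypothetical; in practice the payload comes from an operation's `.metadata()` method and travels over HTTP or TCP+NATS:

```python
import asyncio
import json


async def local_worker(channel: asyncio.Queue) -> None:
    # Stand-in for: op = connector.create_writable(descriptors)
    # followed by shipping op.metadata() to the remote worker.
    metadata = {"worker_addr": "10.0.0.1:7000", "keys": ["k0"], "op": "write"}
    await channel.put(json.dumps(metadata))  # HTTP or TCP+NATS in practice


async def remote_worker(channel: asyncio.Queue) -> dict:
    # Stand-in for: connector.begin_write(metadata, descriptors)
    return json.loads(await channel.get())


async def main() -> dict:
    channel: asyncio.Queue = asyncio.Queue()  # stands in for the side channel
    await local_worker(channel)
    return await remote_worker(channel)


print(asyncio.run(main())["worker_addr"])
```

The point of the sketch is that the RDMA subsystem never transports its own metadata: some ordinary serializable channel must carry it before the paired operation can begin.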

## Related Classes

- [Descriptor](descriptor.md)
- [Device](device.md)
- [OperationStatus](operation_status.md)
- [RdmaMetadata](rdma_metadata.md)
- [ReadOperation](read_operation.md)
- [ReadableOperation](readable_operation.md)
- [WritableOperation](writable_operation.md)
- [WriteOperation](write_operation.md)

components/connect/descriptor.md

Lines changed: 45 additions & 0 deletions
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# dynamo.connect.Descriptor

Memory descriptor that ensures memory is registered with the NIXL-based RDMA subsystem.
Memory must be registered with the RDMA subsystem before remote workers can interact with it.

Descriptor objects are administrative and do not copy, move, or otherwise modify the registered memory.

There are four ways to create a descriptor:

1. From a `torch.Tensor` object. Device information will be derived from the provided object.

2. From a `tuple` containing either a NumPy or CuPy `ndarray` and information describing where the memory resides (host/CPU vs. GPU).

3. From a Python `bytes` object. Memory is assumed to reside in CPU-addressable host memory.

4. From a `tuple` comprised of the address of the memory, its size in bytes, and device information.
   An optional reference to a Python object can be provided to avoid garbage collection issues.

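For the fourth form, the raw address and byte length of a host buffer can be obtained with the standard library alone. A minimal sketch; the commented `Descriptor` call is illustrative only, assuming host memory:

```python
import array

# A host-resident buffer of 1024 float32 values.
payload = array.array("f", range(1024))

# buffer_info() returns (address, element_count); convert the count to bytes.
address, count = payload.buffer_info()
size_in_bytes = count * payload.itemsize  # 1024 elements * 4 bytes each

# Hypothetical descriptor creation from the fourth form described above:
#   descriptor = dynamo.connect.Descriptor((address, size_in_bytes, device))
# Keeping a reference to `payload` alive prevents the buffer from being
# garbage collected while the RDMA subsystem may still access it.
print(address != 0, size_in_bytes)
```

This is also why the fourth form accepts an optional object reference: the descriptor holds only a raw address, so something must keep the backing allocation alive.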
## Related Classes

- [Connector](connector.md)
- [Device](device.md)
- [OperationStatus](operation_status.md)
- [RdmaMetadata](rdma_metadata.md)
- [ReadOperation](read_operation.md)
- [ReadableOperation](readable_operation.md)
- [WritableOperation](writable_operation.md)
- [WriteOperation](write_operation.md)

components/connect/device.md

Lines changed: 53 additions & 0 deletions
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# dynamo.connect.Device

The `Device` class describes the device a given memory allocation resides in,
usually host (`"cpu"`) or GPU (`"cuda"`) memory.

When a system contains multiple GPU devices, a specific GPU can be identified by including its ordinal index.
For example, to reference the second GPU in a system, `"cuda:1"` can be used.

By default, when `"cuda"` is provided, it is assumed to be `"cuda:0"`: the first GPU enumerated by the system.

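The device-string convention above can be illustrated with a small, self-contained parser (a hypothetical helper for illustration, not part of the library):

```python
def parse_device(spec: str) -> tuple[str, int]:
    """Split a device string like "cpu", "cuda", or "cuda:1" into
    (kind, ordinal). A bare "cuda" defaults to ordinal 0."""
    kind, _, ordinal = spec.partition(":")
    if kind not in ("cpu", "cuda"):
        raise ValueError(f"unknown device kind: {kind!r}")
    return kind, int(ordinal) if ordinal else 0


print(parse_device("cuda:1"))  # the second GPU in the system
print(parse_device("cuda"))    # defaults to the first GPU, cuda:0
print(parse_device("cpu"))     # host memory; ordinal is always 0
```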
## Properties

### `id`

Gets the identity, or ordinal, of the device.

When the device is the [`HOST`](device_kind.md#host), this value is always `0`.

When the device is a [`GPU`](device_kind.md#cuda), this value identifies a specific GPU.

### `kind`

Gets the [`DeviceKind`](device_kind.md) of the device the instance references.


## Related Classes

- [Connector](connector.md)
- [Descriptor](descriptor.md)
- [OperationStatus](operation_status.md)
- [RdmaMetadata](rdma_metadata.md)
- [ReadOperation](read_operation.md)
- [ReadableOperation](readable_operation.md)
- [WritableOperation](writable_operation.md)
- [WriteOperation](write_operation.md)
