Commit 1d229e3

Merge branch 'main' into feature/state_management_rebase

2 parents: 50b34ac + f3d03d9
File tree: 2 files changed (+269, −72 lines)

2 files changed

+269
-72
lines changed

README.md (22 additions, 0 deletions)
@@ -206,6 +206,28 @@ complex execution modes and dynamic shapes. If not specified, all are enabled by
`ENABLE_TENSOR_FUSER`

### Support

#### Model Instance Group Kind

The PyTorch backend supports the following kinds of
[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups),
where the input tensors are placed as follows:
* `KIND_GPU`: Inputs are prepared on the GPU device associated with the model
  instance.

* `KIND_CPU`: Inputs are prepared on the CPU.

* `KIND_MODEL`: Inputs are prepared on the CPU. When loading the model, the
  backend does not choose a GPU device for the model; instead, it respects the
  device(s) specified in the model and uses them as they are during inference.
  This is useful when the model internally utilizes multiple GPUs, as
  demonstrated in this
  [example model](https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_instance_group_kind_model/gen_models.py)
  and sketched after this list. If no device is specified in the model, the
  backend uses the first available GPU device. This feature is available
  starting in the 23.06 release.
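
To make `KIND_MODEL` concrete, here is a minimal `instance_group` stanza in
Triton's `config.pbtxt` format; the instance count is a placeholder, not a
value taken from this change:

```
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]
```

And a minimal sketch of the kind of model `KIND_MODEL` is intended for,
assuming a machine with at least two GPUs; the class name, layer shapes, and
file name below are illustrative and are not taken from the linked
`gen_models.py`:

```python
import torch
import torch.nn as nn


class TwoDeviceModel(nn.Module):
    """Illustrative model that pins its submodules to specific GPUs."""

    def __init__(self):
        super().__init__()
        # Device placement is baked into the model itself; under
        # KIND_MODEL the backend uses these assignments as they are.
        self.stage1 = nn.Linear(16, 32).to("cuda:0")
        self.stage2 = nn.Linear(32, 8).to("cuda:1")

    def forward(self, x):
        # Under KIND_MODEL, inputs arrive on the CPU, so the model is
        # responsible for moving them to the right devices.
        h = torch.relu(self.stage1(x.to("cuda:0")))
        return self.stage2(h.to("cuda:1"))


# The PyTorch (libtorch) backend loads TorchScript models, so the model
# would be scripted and saved into the model repository, e.g.:
# torch.jit.script(TwoDeviceModel()).save("model.pt")
```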
### Important Notes

* The execution of a PyTorch model on GPU is asynchronous in nature. See
