complex execution modes and dynamic shapes. If not specified, all are enabled by
`ENABLE_TENSOR_FUSER`
### Support

#### Model Instance Group Kind

The PyTorch backend supports the following kinds of
[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups),
where the input tensors are placed as follows:

* `KIND_GPU`: Inputs are prepared on the GPU device associated with the model
  instance.

* `KIND_CPU`: Inputs are prepared on the CPU.

* `KIND_MODEL`: Inputs are prepared on the CPU. When loading the model, the
  backend does not choose a GPU device for the model; instead, it respects the
  device(s) specified in the model and uses them as they are during inference.
  This is useful when the model internally utilizes multiple GPUs, as
  demonstrated in this
  [example model](https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_instance_group_kind_model/gen_models.py).
  If no device is specified in the model, the backend uses the first available
  GPU device. This feature is available starting in the 23.06 release.
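As a sketch, the instance group kind is selected in the model's `config.pbtxt`. The model name and `count` below are illustrative, not taken from the example above:

```
# config.pbtxt (fragment) -- illustrative values
name: "libtorch_multi_gpu_model"
backend: "pytorch"

instance_group [
  {
    # KIND_MODEL: the backend honors the device placement
    # already baked into the model itself
    kind: KIND_MODEL
    count: 1
  }
]
```

With `KIND_GPU`, a `gpus: [ ... ]` field could additionally pin the instance to specific devices.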
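To illustrate the `KIND_MODEL` case, below is a minimal sketch of a model that hard-codes its own device placement; the module structure and device choices are hypothetical and not taken from the linked example model. With `KIND_MODEL`, the backend would use these placements as-is (the sketch falls back to CPU when fewer GPUs are present, so it stays runnable anywhere):

```python
import torch
import torch.nn as nn

# Illustrative device selection with CPU fallback; in a real multi-GPU
# deployment these would simply be "cuda:0" and "cuda:1".
dev0 = torch.device("cuda:0" if torch.cuda.device_count() > 0 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else dev0)

class TwoDeviceModel(nn.Module):
    """Hypothetical model that pins its submodules to specific devices."""

    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(8, 8).to(dev0)
        self.part2 = nn.Linear(8, 4).to(dev1)

    def forward(self, x):
        # Under KIND_MODEL, inputs arrive on the CPU; the model moves
        # them to each submodule's device itself.
        h = self.part1(x.to(dev0))
        return self.part2(h.to(dev1))

model = TwoDeviceModel().eval()
out = model(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```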
### Important Notes
* The execution of a PyTorch model on GPU is asynchronous in nature. See