
Commit 326eaeb

committed
amending deployment on triton docs
1 parent c2cb969 commit 326eaeb

File tree

1 file changed (+11, -11 lines)

docsrc/tutorials/deploy_torch_tensorrt_to_triton.rst

Lines changed: 11 additions & 11 deletions
@@ -2,8 +2,8 @@ Deploying a Torch-TensorRT model (to Triton)
 ============================================
 
 Optimization and deployment go hand in hand in a discussion about Machine
-Learning infrastructure. For a Torch-TensorRT user, network level optimzation
-to get the maximum performance would already be an area of expertize.
+Learning infrastructure. Once network-level optimizations are done
+to get the maximum performance, the next step would be to deploy it.
 
 However, serving this optimized model comes with its own set of considerations
 and challenges like: building an infrastructure to support concurrent model
@@ -18,7 +18,7 @@ Step 1: Optimize your model with Torch-TensorRT
 -----------------------------------------------
 
 Most Torch-TensorRT users will be familiar with this step. For the purpose of
-this demoonstration, we will be using a ResNet50 model from Torchhub.
+this demonstration, we will be using a ResNet50 model from Torchhub.
 
 Let’s first pull the NGC PyTorch Docker container. You may need to create
 an account and get the API key from `here <https://ngc.nvidia.com/setup/>`__.
@@ -30,7 +30,7 @@ Sign up and login with your key (follow the instructions
    # <xx.xx> is the yy.mm publishing tag for NVIDIA's PyTorch
    # container; e.g. 22.04
 
-   docker run -it --gpus all -v /path/to/folder:/resnet50_eg nvcr.io/nvidia/pytorch:<xx.xx>-py3
+   docker run -it --gpus all -v /path/to/local/folder/to/copy/model:/resnet50_eg nvcr.io/nvidia/pytorch:<xx.xx>-py3
 
 Once inside the container, we can proceed to download a ResNet model from
 Torchhub and optimize it with Torch-TensorRT.
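
The compilation code this paragraph refers to sits outside the changed lines,
so it does not appear in the diff. As a rough sketch only (the input shape,
precision, and save path below are illustrative assumptions, not taken from
this commit), the step being described might look like::

    # Sketch of the optimization step; shape, precision, and paths are assumed
    import torch
    import torch_tensorrt

    # Download a pretrained ResNet50 from Torchhub
    model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)
    model = model.eval().cuda()

    # Compile with Torch-TensorRT for an assumed (1, 3, 224, 224) FP32 input
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.float32},
    )

    # Save as TorchScript so Triton's PyTorch backend can load it
    torch.jit.save(trt_model, "model.pt")
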
@@ -180,25 +180,25 @@ with the Triton Inference Server.
 ::
 
     # Setting up client
-    triton_client = httpclient.InferenceServerClient(url="localhost:8000")
+    client = httpclient.InferenceServerClient(url="localhost:8000")
 
 Secondly, we specify the names of the input and output layer(s) of our model.
 
 ::
 
-    test_input = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
-    test_input.set_data_from_numpy(transformed_img, binary_data=True)
+    inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
+    inputs.set_data_from_numpy(transformed_img, binary_data=True)
 
-    test_output = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)
+    outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)
 
 Lastly, we send an inference request to the Triton Inference Server.
 
 ::
 
     # Querying the server
-    results = triton_client.infer(model_name="resnet50", inputs=[test_input], outputs=[test_output])
-    test_output_fin = results.as_numpy('output__0')
-    print(test_output_fin[:5])
+    results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
+    inference_output = results.as_numpy('output__0')
+    print(inference_output[:5])
 
 The output should look like the following:
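
Read together, the renamed client snippets above assume a preprocessed NumPy
array named transformed_img that is produced earlier in the tutorial. A hedged,
self-contained sketch of the whole client (the preprocessing values and the
file name img1.jpg are illustrative assumptions; the metadata call is one way
to double-check the input__0/output__0 layer names) might look like::

    # Sketch of the full client; preprocessing and file name are assumed
    import tritonclient.http as httpclient
    from PIL import Image
    from torchvision import transforms

    # Standard ImageNet preprocessing (assumed to match the tutorial)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    img = Image.open("img1.jpg")  # hypothetical input image
    transformed_img = preprocess(img).unsqueeze(0).numpy()

    # Setting up client
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Optional: confirm the layer names instead of hard-coding them
    print(client.get_model_metadata(model_name="resnet50"))

    inputs = httpclient.InferInput("input__0", transformed_img.shape, datatype="FP32")
    inputs.set_data_from_numpy(transformed_img, binary_data=True)
    outputs = httpclient.InferRequestedOutput("output__0", binary_data=True, class_count=1000)

    # Querying the server
    results = client.infer(model_name="resnet50", inputs=[inputs], outputs=[outputs])
    inference_output = results.as_numpy('output__0')
    print(inference_output[:5])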
