
Commit f8285ba

refactor: Refactor perf_run and add internal benchmark scripts
Signed-off-by: Dheeraj Peri <[email protected]>
1 parent 46d0e86 commit f8285ba

4 files changed: +241 −139 lines changed

tools/perf/README.md

Lines changed: 129 additions & 90 deletions
# Performance Benchmarking

This is a comprehensive Python benchmark suite for running performance tests using different supported backends. The following backends are supported:

1. Torch
2. Torch-TensorRT
3. FX-TRT
4. TensorRT

Note: For ONNX models, users can convert the ONNX model to a TensorRT serialized engine and then use this package.

## Prerequisites

The benchmark scripts depend on the following Python packages in addition to those in requirements.txt:

1. Torch-TensorRT
2. Torch
3. TensorRT

## Structure

```
./
├── config
│   ├── vgg16_trt.yml
│   └── vgg16.yml
├── models
├── perf_run.py
├── hub.py
├── custom_models.py
├── utils.py
├── requirements.txt
├── benchmark.sh
└── README.md
```

* `config` - Directory containing sample YAML configuration files for the VGG network.
* `models` - Model directory
* `perf_run.py` - Performance benchmarking script which supports the torch, torch_tensorrt, fx2trt and tensorrt backends
* `hub.py` - Script to download TorchScript models for VGG16, Resnet50, EfficientNet-B0, VIT and HF-BERT
* `custom_models.py` - Script which includes custom models other than those from torchvision and timm (eg: HF BERT)
* `utils.py` - Utility functions script
* `benchmark.sh` - Script used for internal performance testing of VGG16, Resnet50, EfficientNet-B0, VIT and HF-BERT.

## Usage

There are two ways to run a performance benchmark.

### Using YAML config files

To run the benchmark for a given configuration file:

```
python perf_run.py --config=config/vgg16.yml
```

Two sample configuration files are provided:

* vgg16.yml demonstrates a configuration with all the supported backends (Torch, Torch-TensorRT, TensorRT)
* vgg16_trt.yml demonstrates how to use an external TensorRT serialized engine file directly.

### Supported fields

| Name              | Supported Values                     | Description                                                                        |
| ----------------- | ------------------------------------ | ---------------------------------------------------------------------------------- |
| backend           | all, torch, torch_tensorrt, tensorrt | Supported backends for inference.                                                   |
| input             | -                                    | Input binding names. Expected to list the shapes of each input binding.             |
| model             | -                                    | Configures the model filename and name.                                             |
| filename          | -                                    | Model filename to load from disk.                                                   |
| name              | -                                    | Model name.                                                                         |
| runtime           | -                                    | Runtime configurations.                                                             |
| device            | 0                                    | Target device ID to run inference on. Range depends on available GPUs.              |
| precision         | fp32, fp16 or half, int8             | Target precision to run inference in. int8 cannot be used with the 'all' backend.   |
| calibration_cache | -                                    | Calibration cache file, expected for the torch_tensorrt runtime at int8 precision.  |

An additional sample use case:

```
backend:
  - torch
  - torch_tensorrt
  - tensorrt
input:
  input0:
    - 3
    - 224
    - 224
  num_inputs: 1
model:
  filename: model.plan
  name: vgg16
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```

Note:

1. Measuring INT8 performance is only supported via a `calibration cache` file or QAT mode for the `torch_tensorrt` backend.
2. A TensorRT engine filename should end with `.plan`; otherwise it will be treated as a TorchScript module.
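
As an illustration, such a config file could be consumed along these lines. This is a minimal sketch using PyYAML with hypothetical field access mirroring the table above; the actual parsing lives in perf_run.py and may differ:

```python
import yaml  # PyYAML

# Minimal sketch only; perf_run.py's real config handling may differ.
with open("config/vgg16.yml") as f:
    params = yaml.safe_load(f)

backends = params["backend"]                 # e.g. ["torch", "torch_tensorrt", "tensorrt"]
model_file = params["model"]["filename"]     # e.g. "model.plan"
device_id = params["runtime"]["device"]      # e.g. 0
precisions = params["runtime"]["precision"]  # e.g. ["fp32", "fp16"]
```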

### Using CompileSpec options via CLI

Here is the list of `CompileSpec` options that can be provided directly to compile the PyTorch module:

* `--backends` : Comma-separated string of backends. Eg: torch,torch_tensorrt,tensorrt
* `--model` : Name of the model file (can be a TorchScript module or a TensorRT engine ending in a `.plan` extension)
* `--inputs` : List of input shapes & dtypes. Eg: (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT (a parsing sketch follows the example below)
* `--batch_size` : Batch size
* `--precision` : Comma-separated list of precisions to build the TensorRT engine with. Eg: fp32,fp16
* `--device` : Device ID
* `--truncate` : Truncate long and double weights in the network in Torch-TensorRT
* `--is_trt_engine` : Boolean flag to be enabled if the model file provided is a TensorRT engine.
* `--report` : Path of the output file where the performance summary is written.

Eg:

```
python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
                   --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
                   --batch_size 1 \
                   --backends torch,torch_tensorrt,tensorrt \
                   --report "vgg_perf_bs1.txt"
```
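
The `--inputs` format is a semicolon-separated list of shape@dtype entries. A small illustrative parser shows the idea; this is hypothetical and perf_run.py's actual parsing may differ:

```python
# Illustrative sketch only; perf_run.py's actual --inputs parsing may differ.
def parse_inputs(spec):
    """Parse a spec like "(1, 3, 224, 224)@fp32;(1, 128)@int32" into
    a list of (shape, dtype) pairs; dtype defaults to fp32 when omitted."""
    parsed = []
    for entry in spec.split(";"):
        shape_str, _, dtype = entry.partition("@")
        shape = tuple(int(dim) for dim in shape_str.strip("() ").split(","))
        parsed.append((shape, dtype or "fp32"))
    return parsed

print(parse_inputs("(1, 3, 224, 224)@fp32"))
# [((1, 3, 224, 224), 'fp32')]
print(parse_inputs("(1, 128)@int32;(1, 128)@int32"))
# [((1, 128), 'int32'), ((1, 128), 'int32')]
```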

tools/perf/benchmark.sh

Lines changed: 49 additions & 46 deletions
#!/bin/bash

MODELS_DIR="models"

# Download the TorchScript models
python hub.py

batch_sizes=(1 2 4 8 16 32 64 128 256)

# Benchmark VGG16 model
echo "Benchmarking VGG16 model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "vgg_perf_bs${bs}.txt"
done

# Benchmark Resnet50 model
echo "Benchmarking Resnet50 model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/resnet50_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "rn50_perf_bs${bs}.txt"
done

# Benchmark VIT model
echo "Benchmarking VIT model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/vit_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "vit_perf_bs${bs}.txt"
done

# Benchmark EfficientNet-B0 model
echo "Benchmarking EfficientNet-B0 model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/efficientnet_b0_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "eff_b0_perf_bs${bs}.txt"
done

# Benchmark BERT model
echo "Benchmarking Huggingface BERT base model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/bert_base_uncased_traced.jit.pt \
        --precision fp32 --inputs="(${bs}, 128)@int32;(${bs}, 128)@int32" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt \
        --truncate \
        --report "bert_base_perf_bs${bs}.txt"
done
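
The same sweep pattern could also be driven from Python instead of bash. A minimal hypothetical sketch (the repository itself uses benchmark.sh), assuming perf_run.py accepts the flags listed above:

```python
import subprocess

# Hypothetical Python driver mirroring one loop of benchmark.sh.
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]

for bs in batch_sizes:
    subprocess.run(
        [
            "python", "perf_run.py",
            "--model", "models/vgg16_scripted.jit.pt",
            "--precision", "fp32,fp16",
            "--inputs", f"({bs}, 3, 224, 224)",
            "--batch_size", str(bs),
            "--backends", "torch,torch_tensorrt,tensorrt",
            "--report", f"vgg_perf_bs{bs}.txt",
        ],
        check=True,  # stop the sweep if any run fails
    )
```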

tools/perf/perf_run.py

Lines changed: 4 additions & 3 deletions
@@ -86,7 +86,7 @@ def run_torch_tensorrt(model, input_tensors, params, precision, truncate_long_an
     if precision == 'int8':
         compile_settings.update({"calib": params.get('calibration_cache')})
 
-    with torchtrt.logging.debug():
+    with torchtrt.logging.errors():
         model = torchtrt.compile(model, **compile_settings)
 
     iters = params.get('iterations', 20)
@@ -307,9 +307,10 @@ def load_model(params):
 arg_parser.add_argument("--model", type=str, help="Name of the model file")
 arg_parser.add_argument("--inputs", type=str, help="List of input shapes. Eg: (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT")
 arg_parser.add_argument("--batch_size", type=int, default=1, help="Batch size to build and run")
-arg_parser.add_argument("--precision", default="fp32", type=str, help="Precision of TensorRT engine")
+arg_parser.add_argument("--precision", default="fp32", type=str, help="Comma separated list of precisions to build TensorRT engine Eg: fp32,fp16")
+arg_parser.add_argument("--calibration_cache", type=str, help="Name of the calibration cache file")
 arg_parser.add_argument("--device", type=int, help="device id")
-arg_parser.add_argument("--truncate", action='store_true', help="Truncate long and double weights in the network")
+arg_parser.add_argument("--truncate", action='store_true', help="Truncate long and double weights in the network in Torch-TensorRT")
 arg_parser.add_argument("--is_trt_engine", action='store_true', help="Boolean flag to determine if the user provided model is a TRT engine or not")
 arg_parser.add_argument("--report", type=str, help="Path of the output file where performance summary is written.")
 args = arg_parser.parse_args()
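
Since --precision and --backends take comma-separated strings, downstream code presumably splits them into lists. A minimal illustrative sketch (not the repository's code):

```python
import argparse

# Minimal sketch (not the repository's code): how the comma-separated
# --backends and --precision strings are presumably split downstream.
parser = argparse.ArgumentParser()
parser.add_argument("--backends", type=str, default="torch")
parser.add_argument("--precision", type=str, default="fp32")
args = parser.parse_args(["--backends", "torch,tensorrt", "--precision", "fp32,fp16"])

backends = [b.strip() for b in args.backends.split(",")]
precisions = [p.strip() for p in args.precision.split(",")]
print(backends, precisions)  # ['torch', 'tensorrt'] ['fp32', 'fp16']
```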
