
Commit f8285ba

refactor: Refactor perf_run and add internal benchmark scripts
Signed-off-by: Dheeraj Peri <[email protected]>
1 parent 46d0e86 commit f8285ba

4 files changed: +241 −139 lines changed

tools/perf/README.md

Lines changed: 129 additions & 90 deletions
# Performance Benchmarking

This is a comprehensive Python benchmark suite for running performance tests using different supported backends. The following backends are supported:

1. Torch
2. Torch-TensorRT
3. FX-TRT
4. TensorRT

Note: For ONNX models, users can convert the ONNX model to a TensorRT serialized engine and then use this package.

## Prerequisites

The benchmark scripts depend on the following Python packages in addition to those in requirements.txt:

1. Torch-TensorRT
2. Torch
3. TensorRT

## Structure

```
./
├── config
│   ├── vgg16_trt.yml
│   └── vgg16.yml
├── models
├── perf_run.py
├── hub.py
├── custom_models.py
├── utils.py
├── requirements.txt
├── benchmark.sh
└── README.md
```

* `config` - Directory containing sample YAML configuration files for the VGG network.
* `models` - Model directory
* `perf_run.py` - Performance benchmarking script which supports the torch, torch_tensorrt, fx2trt and tensorrt backends
* `hub.py` - Script to download TorchScript models for VGG16, Resnet50, EfficientNet-B0, VIT and HF-BERT
* `custom_models.py` - Script which includes custom models other than those from torchvision and timm (eg: HF BERT)
* `utils.py` - Utility functions script
* `benchmark.sh` - Script used for internal performance testing of VGG16, Resnet50, EfficientNet-B0, VIT and HF-BERT.

## Usage

There are two ways to run a performance benchmark.

### Using YAML config files

To run the benchmark for a given configuration file:

```
python perf_run.py --config=config/vgg16.yml
```

Two sample configuration files are provided:

* vgg16.yml demonstrates a configuration with all the supported backends (Torch, Torch-TensorRT, TensorRT)
* vgg16_trt.yml demonstrates how to use an external TensorRT serialized engine file directly.

### Supported fields

| Name              | Supported Values                     | Description                                                                        |
| ----------------- | ------------------------------------ | ---------------------------------------------------------------------------------- |
| backend           | all, torch, torch_tensorrt, tensorrt | Supported backends for inference.                                                   |
| input             | -                                    | Input binding names. Expected to list the shapes of each input binding.             |
| model             | -                                    | Configures the model filename and name.                                             |
| filename          | -                                    | Model filename to load from disk.                                                   |
| name              | -                                    | Model name.                                                                         |
| runtime           | -                                    | Runtime configurations.                                                             |
| device            | 0                                    | Target device ID to run inference on. Range depends on available GPUs.              |
| precision         | fp32, fp16 or half, int8             | Target precision to run inference in. int8 cannot be used with the 'all' backend.   |
| calibration_cache | -                                    | Calibration cache file, expected for the torch_tensorrt runtime at int8 precision.  |

An additional sample use case:

```
backend:
  - torch
  - torch_tensorrt
  - tensorrt
input:
  input0:
    - 3
    - 224
    - 224
  num_inputs: 1
model:
  filename: model.plan
  name: vgg16
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```

Note:

1. Measuring INT8 performance is only supported via a `calibration cache` file or QAT mode for the `torch_tensorrt` backend.
2. A TensorRT engine filename should end with `.plan`; otherwise it will be treated as a TorchScript module.
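
As an illustration, such a config file could be consumed along these lines. This is a minimal sketch using PyYAML with hypothetical field access mirroring the table above; the actual parsing lives in perf_run.py and may differ:

```python
import yaml  # PyYAML

# Minimal sketch only; perf_run.py's real config handling may differ.
with open("config/vgg16.yml") as f:
    params = yaml.safe_load(f)

backends = params["backend"]                 # e.g. ["torch", "torch_tensorrt", "tensorrt"]
model_file = params["model"]["filename"]     # e.g. "model.plan"
device_id = params["runtime"]["device"]      # e.g. 0
precisions = params["runtime"]["precision"]  # e.g. ["fp32", "fp16"]
```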

### Using CompileSpec options via CLI

Here is the list of `CompileSpec` options that can be provided directly to compile the PyTorch module:

* `--backends` : Comma-separated string of backends. Eg: torch,torch_tensorrt,tensorrt
* `--model` : Name of the model file (can be a TorchScript module or a TensorRT engine ending in a `.plan` extension)
* `--inputs` : List of input shapes & dtypes. Eg: (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT (a parsing sketch follows the example below)
* `--batch_size` : Batch size
* `--precision` : Comma-separated list of precisions to build the TensorRT engine with. Eg: fp32,fp16
* `--device` : Device ID
* `--truncate` : Truncate long and double weights in the network in Torch-TensorRT
* `--is_trt_engine` : Boolean flag to be enabled if the model file provided is a TensorRT engine.
* `--report` : Path of the output file where the performance summary is written.

Eg:

```
python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
                   --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
                   --batch_size 1 \
                   --backends torch,torch_tensorrt,tensorrt \
                   --report "vgg_perf_bs1.txt"
```
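
The `--inputs` format is a semicolon-separated list of shape@dtype entries. A small illustrative parser shows the idea; this is hypothetical and perf_run.py's actual parsing may differ:

```python
# Illustrative sketch only; perf_run.py's actual --inputs parsing may differ.
def parse_inputs(spec):
    """Parse a spec like "(1, 3, 224, 224)@fp32;(1, 128)@int32" into
    a list of (shape, dtype) pairs; dtype defaults to fp32 when omitted."""
    parsed = []
    for entry in spec.split(";"):
        shape_str, _, dtype = entry.partition("@")
        shape = tuple(int(dim) for dim in shape_str.strip("() ").split(","))
        parsed.append((shape, dtype or "fp32"))
    return parsed

print(parse_inputs("(1, 3, 224, 224)@fp32"))
# [((1, 3, 224, 224), 'fp32')]
print(parse_inputs("(1, 128)@int32;(1, 128)@int32"))
# [((1, 128), 'int32'), ((1, 128), 'int32')]
```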

tools/perf/benchmark.sh

Lines changed: 49 additions & 46 deletions
#!/bin/bash

MODELS_DIR="models"

# Download the TorchScript models
python hub.py

batch_sizes=(1 2 4 8 16 32 64 128 256)

# Benchmark VGG16 model
echo "Benchmarking VGG16 model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "vgg_perf_bs${bs}.txt"
done

# Benchmark Resnet50 model
echo "Benchmarking Resnet50 model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/resnet50_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "rn50_perf_bs${bs}.txt"
done

# Benchmark VIT model
echo "Benchmarking VIT model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/vit_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "vit_perf_bs${bs}.txt"
done

# Benchmark EfficientNet-B0 model
echo "Benchmarking EfficientNet-B0 model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/efficientnet_b0_scripted.jit.pt \
        --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt,tensorrt \
        --report "eff_b0_perf_bs${bs}.txt"
done

# Benchmark BERT model
echo "Benchmarking Huggingface BERT base model"
for bs in "${batch_sizes[@]}"
do
    python perf_run.py --model ${MODELS_DIR}/bert_base_uncased_traced.jit.pt \
        --precision fp32 --inputs="(${bs}, 128)@int32;(${bs}, 128)@int32" \
        --batch_size ${bs} \
        --backends torch,torch_tensorrt \
        --truncate \
        --report "bert_base_perf_bs${bs}.txt"
done
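
The same sweep pattern could also be driven from Python instead of bash. A minimal hypothetical sketch (the repository itself uses benchmark.sh), assuming perf_run.py accepts the flags listed above:

```python
import subprocess

# Hypothetical Python driver mirroring one loop of benchmark.sh.
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]

for bs in batch_sizes:
    subprocess.run(
        [
            "python", "perf_run.py",
            "--model", "models/vgg16_scripted.jit.pt",
            "--precision", "fp32,fp16",
            "--inputs", f"({bs}, 3, 224, 224)",
            "--batch_size", str(bs),
            "--backends", "torch,torch_tensorrt,tensorrt",
            "--report", f"vgg_perf_bs{bs}.txt",
        ],
        check=True,  # stop the sweep if any run fails
    )
```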

tools/perf/perf_run.py

Lines changed: 4 additions & 3 deletions
@@ -86,7 +86,7 @@ def run_torch_tensorrt(model, input_tensors, params, precision, truncate_long_an
     if precision == 'int8':
         compile_settings.update({"calib": params.get('calibration_cache')})
 
-    with torchtrt.logging.debug():
+    with torchtrt.logging.errors():
         model = torchtrt.compile(model, **compile_settings)
 
     iters = params.get('iterations', 20)
@@ -307,9 +307,10 @@ def load_model(params):
 arg_parser.add_argument("--model", type=str, help="Name of the model file")
 arg_parser.add_argument("--inputs", type=str, help="List of input shapes. Eg: (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT")
 arg_parser.add_argument("--batch_size", type=int, default=1, help="Batch size to build and run")
-arg_parser.add_argument("--precision", default="fp32", type=str, help="Precision of TensorRT engine")
+arg_parser.add_argument("--precision", default="fp32", type=str, help="Comma separated list of precisions to build TensorRT engine Eg: fp32,fp16")
+arg_parser.add_argument("--calibration_cache", type=str, help="Name of the calibration cache file")
 arg_parser.add_argument("--device", type=int, help="device id")
-arg_parser.add_argument("--truncate", action='store_true', help="Truncate long and double weights in the network")
+arg_parser.add_argument("--truncate", action='store_true', help="Truncate long and double weights in the network in Torch-TensorRT")
 arg_parser.add_argument("--is_trt_engine", action='store_true', help="Boolean flag to determine if the user provided model is a TRT engine or not")
 arg_parser.add_argument("--report", type=str, help="Path of the output file where performance summary is written.")
 args = arg_parser.parse_args()
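
Since --precision and --backends take comma-separated strings, downstream code presumably splits them into lists. A minimal illustrative sketch (not the repository's code):

```python
import argparse

# Minimal sketch (not the repository's code): how the comma-separated
# --backends and --precision strings are presumably split downstream.
parser = argparse.ArgumentParser()
parser.add_argument("--backends", type=str, default="torch")
parser.add_argument("--precision", type=str, default="fp32")
args = parser.parse_args(["--backends", "torch,tensorrt", "--precision", "fp32,fp16"])

backends = [b.strip() for b in args.backends.split(",")]
precisions = [p.strip() for p in args.precision.split(",")]
print(backends, precisions)  # ['torch', 'tensorrt'] ['fp32', 'fp16']
```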
