Commit 965402d

feat(trtorchc): Adding flag for sparse weights
Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>
1 parent 78c7b76 commit 965402d

File tree: 13 files changed, +130 −114 lines

CHANGELOG.md

Lines changed: 13 additions & 12 deletions
```diff
@@ -1,7 +1,7 @@
 # Changelog
 
 
-# v0.0.1 (2020-03-31)
+# 0.0.1 (2020-03-31)
 
 
 ### Bug Fixes
@@ -24,7 +24,7 @@
 * **CheckMethodOperatorSupport:** A new API which will check the graph ([28ee445](https://github.com/NVIDIA/TRTorch/commit/28ee445)), closes [#26](https://github.com/NVIDIA/TRTorch/issues/26)
 * **hardtanh:** Adds support for the the hard tanh operator ([391af52](https://github.com/NVIDIA/TRTorch/commit/391af52))
 
-# v0.0.2 (2020-05-17)
+# 0.0.2 (2020-05-17)
 
 
 ### Bug Fixes
@@ -73,7 +73,7 @@
 * **conv2d_to_convolution:** A pass to map aten::conv2d to _convolution ([2c5c0d5](https://github.com/NVIDIA/TRTorch/commit/2c5c0d5))
 
 
-# v0.0.3 (2020-07-18)
+# 0.0.3 (2020-07-18)
 
 
 * feat!: Lock bazel version ([25f4371](https://github.com/NVIDIA/TRTorch/commit/25f4371))
@@ -232,7 +232,7 @@ Signed-off-by: Naren Dasan <[email protected]>
 Signed-off-by: Naren Dasan <[email protected]>
 
 
-# v0.1.0 (2020-10-23)
+# 0.1.0 (2020-10-23)
 
 
 ### Bug Fixes
@@ -303,7 +303,7 @@ Signed-off-by: Naren Dasan <[email protected]>
 
 
 
-# v0.2.0 (2021-02-25)
+# 0.2.0 (2021-02-25)
 
 
 * refactor!: Update bazel and trt versions ([0618b6b](https://github.com/NVIDIA/TRTorch/commit/0618b6b))
@@ -355,7 +355,7 @@ Signed-off-by: Naren Dasan <[email protected]>
 Signed-off-by: Naren Dasan <[email protected]>
 
 
-# v0.3.0 (2021-05-13)
+# 0.3.0 (2021-05-13)
 
 
 ### Bug Fixes
@@ -430,7 +430,7 @@ source with the dependencies in WORKSPACE changed
 Signed-off-by: Naren Dasan <[email protected]>
 Signed-off-by: Naren Dasan <[email protected]>
 
-# v0.4.0 (2021-08-24)
+# 0.4.0 (2021-08-24)
 
 
 * feat(serde)!: Refactor CudaDevice struct, implement ABI versioning, ([9327cce](https://github.com/NVIDIA/TRTorch/commit/9327cce))
@@ -491,26 +491,27 @@ Signed-off-by: Naren Dasan <[email protected]>
 * **aten::ones:** Adding support for aten::ones ([2b45a3d](https://github.com/NVIDIA/TRTorch/commit/2b45a3d))
 * **aten::slice:** Patching slice for new optional params ([a11287f](https://github.com/NVIDIA/TRTorch/commit/a11287f))
 * **aten::sqrt:** Adding support for sqrt evaluators ([6aaba3b](https://github.com/NVIDIA/TRTorch/commit/6aaba3b))
-* Support fallback options in trtorchc ([ad966b7](https://github.com/NVIDIA/TRTorch/commit/ad966b7))
 * **aten::std|aten::masked_fill:** Implement masked_fill, aten::std ([a086a5b](https://github.com/NVIDIA/TRTorch/commit/a086a5b))
 * **aten::std|aten::masked_fill:** Implement masked_fill, aten::std ([2866627](https://github.com/NVIDIA/TRTorch/commit/2866627))
+* **jetson:** Support for Jetpack 4.6 ([9760fe3](https://github.com/NVIDIA/TRTorch/commit/9760fe3))
+* **to_backend:** Updating backend integration preproc function ([080b594](https://github.com/NVIDIA/TRTorch/commit/080b594))
+* Enable sparsity support in TRTorch ([f9e1f2b](https://github.com/NVIDIA/TRTorch/commit/f9e1f2b))
+* **trtorchc:** Adding flag for sparse weights ([bfdc6f5](https://github.com/NVIDIA/TRTorch/commit/bfdc6f5))
 * Add aten::full converter, quantization ops testcases ([9f2ffd0](https://github.com/NVIDIA/TRTorch/commit/9f2ffd0))
 * Add aten::type_as lowering pass ([b57a6dd](https://github.com/NVIDIA/TRTorch/commit/b57a6dd))
 * Add functionality for QAT workflow ([fc8eafb](https://github.com/NVIDIA/TRTorch/commit/fc8eafb))
 * Add functionality for QAT workflow ([f776e76](https://github.com/NVIDIA/TRTorch/commit/f776e76))
 * Add support for providing input datatypes in TRTorch ([a3f4a3c](https://github.com/NVIDIA/TRTorch/commit/a3f4a3c))
 * Adding automatic casting to compare layers ([90af26e](https://github.com/NVIDIA/TRTorch/commit/90af26e))
-* Enable sparsity support in TRTorch ([f9e1f2b](https://github.com/NVIDIA/TRTorch/commit/f9e1f2b))
 * Enable sparsity support in TRTorch ([decd0ed](https://github.com/NVIDIA/TRTorch/commit/decd0ed))
-* **jetson:** Support for Jetpack 4.6 ([9760fe3](https://github.com/NVIDIA/TRTorch/commit/9760fe3))
 * Enable TRT 8.0 QAT functionality in TRTorch ([c76a28a](https://github.com/NVIDIA/TRTorch/commit/c76a28a))
-* **to_backend:** Updating backend integration preproc function ([080b594](https://github.com/NVIDIA/TRTorch/commit/080b594))
 * Makefile for trtorchrt.so example ([c60c521](https://github.com/NVIDIA/TRTorch/commit/c60c521))
 * show pytorch code of unsupported operators ([2ee2a84](https://github.com/NVIDIA/TRTorch/commit/2ee2a84))
 * support aten::Int ([5bc977d](https://github.com/NVIDIA/TRTorch/commit/5bc977d))
-* Using shared_ptrs to manage TRT resources in runtime ([e336630](https://github.com/NVIDIA/TRTorch/commit/e336630))
 * **trtorchc:** Adding more dtype aliases ([652fb13](https://github.com/NVIDIA/TRTorch/commit/652fb13))
 * **trtorchc:** Adding new support for dtypes and formats in ([c39bf81](https://github.com/NVIDIA/TRTorch/commit/c39bf81))
+* Support fallback options in trtorchc ([ad966b7](https://github.com/NVIDIA/TRTorch/commit/ad966b7))
+* Using shared_ptrs to manage TRT resources in runtime ([e336630](https://github.com/NVIDIA/TRTorch/commit/e336630))
 * **trtorchc:** Embedding engines in modules from the CLI ([2b4b9e3](https://github.com/NVIDIA/TRTorch/commit/2b4b9e3))
 
 
```

cpp/bin/trtorchc/README.md

Lines changed: 99 additions & 97 deletions
Hunk `@@ -14,106 +14,108 @@` re-indents the usage text reproduced in the README; the only substantive change is the new `--sparse-weights` flag. The updated block:

```
trtorchc [input_file_path] [output_file_path]
    [input_specs...] {OPTIONS}

    TRTorch is a compiler for TorchScript, it will compile and optimize
    TorchScript programs to run on NVIDIA GPUs using TensorRT

  OPTIONS:

      -h, --help                        Display this help menu
      Verbiosity of the compiler
      -v, --verbose                     Dumps debugging information about the
                                        compilation process onto the console
      -w, --warnings                    Disables warnings generated during
                                        compilation onto the console (warnings
                                        are on by default)
      --i, --info                       Dumps info messages generated during
                                        compilation onto the console
      --build-debuggable-engine         Creates a debuggable engine
      --use-strict-types                Restrict operating type to only use set
                                        operation precision
      --allow-gpu-fallback              (Only used when targeting DLA
                                        (device-type)) Lets engine run layers on
                                        GPU if they are not supported on DLA
      --allow-torch-fallback            Enable layers to run in torch if they
                                        are not supported in TensorRT
      --disable-tf32                    Prevent Float32 layers from using the
                                        TF32 data format
      --sparse-weights                  Enable sparsity for weights of conv and
                                        FC layers
      -p[precision...],
      --enabled-precision=[precision...]
                                        (Repeatable) Enabling an operating
                                        precision for kernels to use when
                                        building the engine (Int8 requires a
                                        calibration-cache argument) [ float |
                                        float32 | f32 | fp32 | half | float16 |
                                        f16 | fp16 | int8 | i8 | char ]
                                        (default: float)
      -d[type], --device-type=[type]    The type of device the engine should be
                                        built for [ gpu | dla ] (default: gpu)
      --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
                                        (defaults to 0)
      --dla-core=[dla_core]             DLACore id if running on available DLA
                                        (defaults to 0)
      --engine-capability=[capability]  The type of device the engine should be
                                        built for [ standard | safety |
                                        dla_standalone ]
      --calibration-cache-file=[file_path]
                                        Path to calibration cache file to use
                                        for post training quantization
      --ffo=[forced_fallback_ops...],
      --forced-fallback-op=[forced_fallback_ops...]
                                        (Repeatable) Operator in the graph that
                                        should be forced to fallback to Pytorch
                                        for execution (allow torch fallback must
                                        be set)
      --ffm=[forced_fallback_mods...],
      --forced-fallback-mod=[forced_fallback_mods...]
                                        (Repeatable) Module that should be
                                        forced to fallback to Pytorch for
                                        execution (allow torch fallback must be
                                        set)
      --embed-engine                    Whether to treat input file as a
                                        serialized TensorRT engine and embed it
                                        into a TorchScript module (device spec
                                        must be provided)
      --num-min-timing-iter=[num_iters] Number of minimization timing iterations
                                        used to select kernels
      --num-avg-timing-iters=[num_iters]
                                        Number of averaging timing iterations
                                        used to select kernels
      --workspace-size=[workspace_size] Maximum size of workspace given to
                                        TensorRT
      --max-batch-size=[max_batch_size] Maximum batch size (must be >= 1 to be
                                        set, 0 means not set)
      -t[threshold],
      --threshold=[threshold]           Maximum acceptable numerical deviation
                                        from standard torchscript output
                                        (default 2e-5)
      --no-threshold-check              Skip checking threshold compliance
      --truncate-long-double,
      --truncate, --truncate-64bit      Truncate weights that are provided in
                                        64bit to 32bit (Long, Double to Int,
                                        Float)
      --save-engine                     Instead of compiling a full a
                                        TorchScript program, save the created
                                        engine to the path specified as the
                                        output path
      input_file_path                   Path to input TorchScript file
      output_file_path                  Path for compiled TorchScript (or
                                        TensorRT engine) file
      input_specs...                    Specs for inputs to engine, can either
                                        be a single size or a range defined by
                                        Min, Optimal, Max sizes, e.g.
                                        "(N,..,C,H,W)"
                                        "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
                                        Data Type and format can be specified by
                                        adding an "@" followed by dtype and "%"
                                        followed by format to the end of the
                                        shape spec. e.g. "(3, 3, 32,
                                        32)@f16%NHWC"
      "--" can be used to terminate flag options and force all following
      arguments to be treated as positional options
```

e.g.
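The input-spec grammar described above (a shape or a `[min;opt;max]` range, optionally suffixed with `@dtype` and `%format`) is simple enough to sketch a parser for. The helper below is purely illustrative — trtorchc's real parsing lives in its C++ sources — but it shows how the pieces of a spec string relate:

```python
def parse_input_spec(spec: str):
    """Split a trtorchc-style input spec such as "(3, 3, 32, 32)@f16%NHWC"
    into (shape_or_range, dtype, format).

    Illustrative sketch only; not the parser trtorchc actually uses.
    """
    fmt = dtype = None
    if "%" in spec:                      # trailing "%NHWC" -> memory format
        spec, fmt = spec.rsplit("%", 1)
    if "@" in spec:                      # trailing "@f16" -> data type
        spec, dtype = spec.rsplit("@", 1)
    if spec.startswith("["):             # "[(min);(opt);(max)]" dynamic range
        shapes = tuple(
            tuple(int(d) for d in part.strip(" ()").split(","))
            for part in spec.strip("[]").split(";")
        )
    else:                                # "(N,..,C,H,W)" static shape
        shapes = tuple(int(d) for d in spec.strip("()").split(","))
    return shapes, dtype, fmt

print(parse_input_spec("(3, 3, 32, 32)@f16%NHWC"))
# -> ((3, 3, 32, 32), 'f16', 'NHWC')
```

When no `@` or `%` suffix is present, dtype and format come back as `None`, matching the README's statement that they are optional additions to the shape spec.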

cpp/bin/trtorchc/main.cpp

Lines changed: 7 additions & 0 deletions
```diff
@@ -246,6 +246,9 @@ int main(int argc, char** argv) {
   args::Flag disable_tf32(
       parser, "disable-tf32", "Prevent Float32 layers from using the TF32 data format", {"disable-tf32"});
 
+  args::Flag sparse_weights(
+      parser, "sparse-weights", "Enable sparsity for weights of conv and FC layers", {"sparse-weights"});
+
   args::ValueFlagList<std::string> enabled_precision(
       parser,
       "precision",
@@ -464,6 +467,10 @@ int main(int argc, char** argv) {
     compile_settings.disable_tf32 = true;
   }
 
+  if (sparse_weights) {
+    compile_settings.sparse_weights = true;
+  }
+
   std::string calibration_cache_file_path = "";
   if (calibration_cache_file) {
     calibration_cache_file_path = resolve_path(args::get(calibration_cache_file));
```
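The change follows the tool's existing pattern: an opt-in boolean flag that defaults to off and, when present, flips the matching field in the compile settings. A minimal Python sketch of that pattern (names are illustrative; the real tool is C++ using the `args` library as in the diff above):

```python
import argparse

# Sketch of the flag-to-settings wiring shown in the diff; illustrative
# names, not the real trtorchc implementation.
parser = argparse.ArgumentParser(prog="trtorchc")
parser.add_argument(
    "--sparse-weights",
    action="store_true",  # False unless the flag is passed, like args::Flag
    help="Enable sparsity for weights of conv and FC layers",
)

compile_settings = {"sparse_weights": False}
args = parser.parse_args(["--sparse-weights"])
if args.sparse_weights:
    compile_settings["sparse_weights"] = True

print(compile_settings["sparse_weights"])  # -> True
```

Downstream, the `sparse_weights` setting lets TensorRT's builder select sparse kernels for eligible convolution and fully-connected weights (2:4 structured sparsity on Ampere-class GPUs), which is why the flag is opt-in rather than default.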

docs/._index.html — binary file, −4 KB (not shown)

docs/v0.4.0/._index.html — binary file, −4 KB (not shown)

docs/v0.4.0/_notebooks/Resnet50-example.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -716,7 +716,7 @@
 </div>
 </div>
 <p>
-<img alt="70b640c0c3dd4f07b7ff43006c6adb49" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
+<img alt="ca63aba6e3dc4251aef5ad05ec7c326d" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
 </p>
 <h1 id="notebooks-resnet50-example--page-root">
 TRTorch Getting Started - ResNet 50
```

docs/v0.4.0/_notebooks/lenet-getting-started.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -810,7 +810,7 @@
 </div>
 </div>
 <p>
-<img alt="466e6cbb236d47c5b3895a9f0d8b4046" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
+<img alt="1b2a3097f3d048229ca0b49a1e3bf61a" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
 </p>
 <h1 id="notebooks-lenet-getting-started--page-root">
 TRTorch Getting Started - LeNet
```

0 commit comments
