chore: v0.4.0 version bump (PR #600)

Merged 2 commits on Aug 24, 2021
404 changes: 141 additions & 263 deletions CHANGELOG.md

Large diffs are not rendered by default.

196 changes: 99 additions & 97 deletions cpp/bin/trtorchc/README.md
@@ -14,106 +14,108 @@ to standard TorchScript. Load with `torch.jit.load()` and run like you would run

```
trtorchc [input_file_path] [output_file_path]
[input_specs...] {OPTIONS}

TRTorch is a compiler for TorchScript; it will compile and optimize
TorchScript programs to run on NVIDIA GPUs using TensorRT

OPTIONS:

-h, --help Display this help menu
Verbosity of the compiler
-v, --verbose Dumps debugging information about the
compilation process onto the console
-w, --warnings Disables warnings generated during
compilation onto the console (warnings
are on by default)
--i, --info Dumps info messages generated during
compilation onto the console
--build-debuggable-engine Creates a debuggable engine
--use-strict-types Restrict operating type to only use set
operation precision
--allow-gpu-fallback (Only used when targeting DLA
(device-type)) Lets engine run layers on
GPU if they are not supported on DLA
--allow-torch-fallback Enable layers to run in torch if they
are not supported in TensorRT
--disable-tf32 Prevent Float32 layers from using the
TF32 data format
--sparse-weights Enable sparsity for weights of conv and
FC layers
-p[precision...],
--enabled-precision=[precision...]
(Repeatable) Enabling an operating
precision for kernels to use when
building the engine (Int8 requires a
calibration-cache argument) [ float |
float32 | f32 | fp32 | half | float16 |
f16 | fp16 | int8 | i8 | char ]
(default: float)
-d[type], --device-type=[type] The type of device the engine should be
built for [ gpu | dla ] (default: gpu)
--gpu-id=[gpu_id] GPU id if running on multi-GPU platform
(defaults to 0)
--dla-core=[dla_core] DLACore id if running on available DLA
(defaults to 0)
--engine-capability=[capability] The capability the engine should be
built for [ standard | safety |
dla_standalone ]
--calibration-cache-file=[file_path]
Path to calibration cache file to use
for post training quantization
--ffo=[forced_fallback_ops...],
--forced-fallback-op=[forced_fallback_ops...]
(Repeatable) Operator in the graph that
should be forced to fall back to PyTorch
for execution (--allow-torch-fallback
must be set)
--ffm=[forced_fallback_mods...],
--forced-fallback-mod=[forced_fallback_mods...]
(Repeatable) Module that should be
forced to fall back to PyTorch for
execution (--allow-torch-fallback must
be set)
--embed-engine Whether to treat input file as a
serialized TensorRT engine and embed it
into a TorchScript module (device spec
must be provided)
--num-min-timing-iter=[num_iters] Number of minimization timing iterations
used to select kernels
--num-avg-timing-iters=[num_iters]
Number of averaging timing iterations
used to select kernels
--workspace-size=[workspace_size] Maximum size of workspace given to
TensorRT
--max-batch-size=[max_batch_size] Maximum batch size (must be >= 1 to be
set, 0 means not set)
-t[threshold],
--threshold=[threshold] Maximum acceptable numerical deviation
from standard TorchScript output
(default 2e-5)
--no-threshold-check Skip checking threshold compliance
--truncate-long-double,
--truncate, --truncate-64bit Truncate weights that are provided in
64bit to 32bit (Long, Double to Int,
Float)
--save-engine Instead of compiling a full
TorchScript program, save the created
engine to the path specified as the
output path
input_file_path Path to input TorchScript file
output_file_path Path for compiled TorchScript (or
TensorRT engine) file
input_specs... Specs for inputs to engine, can either
be a single size or a range defined by
Min, Optimal, Max sizes, e.g.
"(N,..,C,H,W)"
"[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
Data Type and format can be specified by
adding an "@" followed by dtype and "%"
followed by format to the end of the
shape spec. e.g. "(3, 3, 32,
32)@f16%NHWC"
"--" can be used to terminate flag options and force all following
arguments to be treated as positional options
```
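For illustration, here is a hypothetical invocation (file names and shapes are placeholders, not taken from this PR) compiling a module with a dynamic batch dimension, FP16 enabled, and the new `--sparse-weights` option:

```
trtorchc model.ts compiled_model.ts \
    "[(1,3,224,224);(8,3,224,224);(32,3,224,224)]@f16" \
    --enabled-precision=fp16 --sparse-weights
```

The bracketed spec supplies the Min, Optimal, and Max sizes for a dynamic input, and the `@f16` suffix sets its data type, as described in the help text above.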

7 changes: 7 additions & 0 deletions cpp/bin/trtorchc/main.cpp
@@ -246,6 +246,9 @@ int main(int argc, char** argv) {
args::Flag disable_tf32(
parser, "disable-tf32", "Prevent Float32 layers from using the TF32 data format", {"disable-tf32"});

args::Flag sparse_weights(
parser, "sparse-weights", "Enable sparsity for weights of conv and FC layers", {"sparse-weights"});

args::ValueFlagList<std::string> enabled_precision(
parser,
"precision",
@@ -464,6 +467,10 @@ int main(int argc, char** argv) {
compile_settings.disable_tf32 = true;
}

if (sparse_weights) {
compile_settings.sparse_weights = true;
}

std::string calibration_cache_file_path = "";
if (calibration_cache_file) {
calibration_cache_file_path = resolve_path(args::get(calibration_cache_file));
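The flag maps directly onto the corresponding field of the compile settings. For reference, a minimal sketch of reaching the same setting through the C++ API, assuming the v0.4.0 `trtorch::CompileSpec`, which exposes the `sparse_weights` member used above (paths and shapes are placeholders):

```
#include <torch/script.h>
#include "trtorch/trtorch.h"

int main() {
  // Load a TorchScript module (placeholder path)
  torch::jit::Module mod = torch::jit::load("model.ts");

  // Fixed input shape, equivalent to "(1,3,224,224)" on the CLI
  trtorch::CompileSpec compile_settings({{1, 3, 224, 224}});

  // Equivalent of passing --sparse-weights to trtorchc: lets
  // TensorRT select sparse kernels for conv/FC layer weights
  compile_settings.sparse_weights = true;

  auto trt_mod = trtorch::CompileGraph(mod, compile_settings);
  trt_mod.save("compiled_model.ts");
  return 0;
}
```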
Binary file removed docs/._index.html
Empty file added docs/v0.4.0/.nojekyll