Commit 965402d

feat(trtorchc): Adding flag for sparse weights
Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>
1 parent 78c7b76 commit 965402d

File tree: 13 files changed, +130 −114 lines

CHANGELOG.md

Lines changed: 13 additions & 12 deletions
```diff
@@ -1,7 +1,7 @@
 # Changelog
 
 
-# v0.0.1 (2020-03-31)
+# 0.0.1 (2020-03-31)
 
 
 ### Bug Fixes
@@ -24,7 +24,7 @@
 * **CheckMethodOperatorSupport:** A new API which will check the graph ([28ee445](https://github.com/NVIDIA/TRTorch/commit/28ee445)), closes [#26](https://github.com/NVIDIA/TRTorch/issues/26)
 * **hardtanh:** Adds support for the the hard tanh operator ([391af52](https://github.com/NVIDIA/TRTorch/commit/391af52))
 
-# v0.0.2 (2020-05-17)
+# 0.0.2 (2020-05-17)
 
 
 ### Bug Fixes
@@ -73,7 +73,7 @@
 * **conv2d_to_convolution:** A pass to map aten::conv2d to _convolution ([2c5c0d5](https://github.com/NVIDIA/TRTorch/commit/2c5c0d5))
 
 
-# v0.0.3 (2020-07-18)
+# 0.0.3 (2020-07-18)
 
 
 * feat!: Lock bazel version ([25f4371](https://github.com/NVIDIA/TRTorch/commit/25f4371))
@@ -232,7 +232,7 @@ Signed-off-by: Naren Dasan <[email protected]>
 Signed-off-by: Naren Dasan <[email protected]>
 
 
-# v0.1.0 (2020-10-23)
+# 0.1.0 (2020-10-23)
 
 
 ### Bug Fixes
@@ -303,7 +303,7 @@ Signed-off-by: Naren Dasan <[email protected]>
 
 
 
-# v0.2.0 (2021-02-25)
+# 0.2.0 (2021-02-25)
 
 
 * refactor!: Update bazel and trt versions ([0618b6b](https://github.com/NVIDIA/TRTorch/commit/0618b6b))
@@ -355,7 +355,7 @@ Signed-off-by: Naren Dasan <[email protected]>
 Signed-off-by: Naren Dasan <[email protected]>
 
 
-# v0.3.0 (2021-05-13)
+# 0.3.0 (2021-05-13)
 
 
 ### Bug Fixes
@@ -430,7 +430,7 @@ source with the dependencies in WORKSPACE changed
 Signed-off-by: Naren Dasan <[email protected]>
 Signed-off-by: Naren Dasan <[email protected]>
 
-# v0.4.0 (2021-08-24)
+# 0.4.0 (2021-08-24)
 
 
 * feat(serde)!: Refactor CudaDevice struct, implement ABI versioning, ([9327cce](https://github.com/NVIDIA/TRTorch/commit/9327cce))
@@ -491,26 +491,27 @@ Signed-off-by: Naren Dasan <[email protected]>
 * **aten::ones:** Adding support for aten::ones ([2b45a3d](https://github.com/NVIDIA/TRTorch/commit/2b45a3d))
 * **aten::slice:** Patching slice for new optional params ([a11287f](https://github.com/NVIDIA/TRTorch/commit/a11287f))
 * **aten::sqrt:** Adding support for sqrt evaluators ([6aaba3b](https://github.com/NVIDIA/TRTorch/commit/6aaba3b))
-* Support fallback options in trtorchc ([ad966b7](https://github.com/NVIDIA/TRTorch/commit/ad966b7))
 * **aten::std|aten::masked_fill:** Implement masked_fill, aten::std ([a086a5b](https://github.com/NVIDIA/TRTorch/commit/a086a5b))
 * **aten::std|aten::masked_fill:** Implement masked_fill, aten::std ([2866627](https://github.com/NVIDIA/TRTorch/commit/2866627))
+* **jetson:** Support for Jetpack 4.6 ([9760fe3](https://github.com/NVIDIA/TRTorch/commit/9760fe3))
+* **to_backend:** Updating backend integration preproc function ([080b594](https://github.com/NVIDIA/TRTorch/commit/080b594))
+* Enable sparsity support in TRTorch ([f9e1f2b](https://github.com/NVIDIA/TRTorch/commit/f9e1f2b))
+* **trtorchc:** Adding flag for sparse weights ([bfdc6f5](https://github.com/NVIDIA/TRTorch/commit/bfdc6f5))
 * Add aten::full converter, quantization ops testcases ([9f2ffd0](https://github.com/NVIDIA/TRTorch/commit/9f2ffd0))
 * Add aten::type_as lowering pass ([b57a6dd](https://github.com/NVIDIA/TRTorch/commit/b57a6dd))
 * Add functionality for QAT workflow ([fc8eafb](https://github.com/NVIDIA/TRTorch/commit/fc8eafb))
 * Add functionality for QAT workflow ([f776e76](https://github.com/NVIDIA/TRTorch/commit/f776e76))
 * Add support for providing input datatypes in TRTorch ([a3f4a3c](https://github.com/NVIDIA/TRTorch/commit/a3f4a3c))
 * Adding automatic casting to compare layers ([90af26e](https://github.com/NVIDIA/TRTorch/commit/90af26e))
-* Enable sparsity support in TRTorch ([f9e1f2b](https://github.com/NVIDIA/TRTorch/commit/f9e1f2b))
 * Enable sparsity support in TRTorch ([decd0ed](https://github.com/NVIDIA/TRTorch/commit/decd0ed))
-* **jetson:** Support for Jetpack 4.6 ([9760fe3](https://github.com/NVIDIA/TRTorch/commit/9760fe3))
 * Enable TRT 8.0 QAT functionality in TRTorch ([c76a28a](https://github.com/NVIDIA/TRTorch/commit/c76a28a))
-* **to_backend:** Updating backend integration preproc function ([080b594](https://github.com/NVIDIA/TRTorch/commit/080b594))
 * Makefile for trtorchrt.so example ([c60c521](https://github.com/NVIDIA/TRTorch/commit/c60c521))
 * show pytorch code of unsupported operators ([2ee2a84](https://github.com/NVIDIA/TRTorch/commit/2ee2a84))
 * support aten::Int ([5bc977d](https://github.com/NVIDIA/TRTorch/commit/5bc977d))
-* Using shared_ptrs to manage TRT resources in runtime ([e336630](https://github.com/NVIDIA/TRTorch/commit/e336630))
 * **trtorchc:** Adding more dtype aliases ([652fb13](https://github.com/NVIDIA/TRTorch/commit/652fb13))
 * **trtorchc:** Adding new support for dtypes and formats in ([c39bf81](https://github.com/NVIDIA/TRTorch/commit/c39bf81))
+* Support fallback options in trtorchc ([ad966b7](https://github.com/NVIDIA/TRTorch/commit/ad966b7))
+* Using shared_ptrs to manage TRT resources in runtime ([e336630](https://github.com/NVIDIA/TRTorch/commit/e336630))
 * **trtorchc:** Embedding engines in modules from the CLI ([2b4b9e3](https://github.com/NVIDIA/TRTorch/commit/2b4b9e3))
 
 
```

cpp/bin/trtorchc/README.md

Lines changed: 99 additions & 97 deletions
Hunk `@@ -14,106 +14,108 @@` re-indents the usage text reproduced in the README; the only substantive change is the new `--sparse-weights` flag. The updated block:

```
trtorchc [input_file_path] [output_file_path]
    [input_specs...] {OPTIONS}

    TRTorch is a compiler for TorchScript, it will compile and optimize
    TorchScript programs to run on NVIDIA GPUs using TensorRT

  OPTIONS:

      -h, --help                        Display this help menu
      Verbiosity of the compiler
      -v, --verbose                     Dumps debugging information about the
                                        compilation process onto the console
      -w, --warnings                    Disables warnings generated during
                                        compilation onto the console (warnings
                                        are on by default)
      --i, --info                       Dumps info messages generated during
                                        compilation onto the console
      --build-debuggable-engine         Creates a debuggable engine
      --use-strict-types                Restrict operating type to only use set
                                        operation precision
      --allow-gpu-fallback              (Only used when targeting DLA
                                        (device-type)) Lets engine run layers on
                                        GPU if they are not supported on DLA
      --allow-torch-fallback            Enable layers to run in torch if they
                                        are not supported in TensorRT
      --disable-tf32                    Prevent Float32 layers from using the
                                        TF32 data format
      --sparse-weights                  Enable sparsity for weights of conv and
                                        FC layers
      -p[precision...],
      --enabled-precision=[precision...]
                                        (Repeatable) Enabling an operating
                                        precision for kernels to use when
                                        building the engine (Int8 requires a
                                        calibration-cache argument) [ float |
                                        float32 | f32 | fp32 | half | float16 |
                                        f16 | fp16 | int8 | i8 | char ]
                                        (default: float)
      -d[type], --device-type=[type]    The type of device the engine should be
                                        built for [ gpu | dla ] (default: gpu)
      --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
                                        (defaults to 0)
      --dla-core=[dla_core]             DLACore id if running on available DLA
                                        (defaults to 0)
      --engine-capability=[capability]  The type of device the engine should be
                                        built for [ standard | safety |
                                        dla_standalone ]
      --calibration-cache-file=[file_path]
                                        Path to calibration cache file to use
                                        for post training quantization
      --ffo=[forced_fallback_ops...],
      --forced-fallback-op=[forced_fallback_ops...]
                                        (Repeatable) Operator in the graph that
                                        should be forced to fallback to Pytorch
                                        for execution (allow torch fallback must
                                        be set)
      --ffm=[forced_fallback_mods...],
      --forced-fallback-mod=[forced_fallback_mods...]
                                        (Repeatable) Module that should be
                                        forced to fallback to Pytorch for
                                        execution (allow torch fallback must be
                                        set)
      --embed-engine                    Whether to treat input file as a
                                        serialized TensorRT engine and embed it
                                        into a TorchScript module (device spec
                                        must be provided)
      --num-min-timing-iter=[num_iters] Number of minimization timing iterations
                                        used to select kernels
      --num-avg-timing-iters=[num_iters]
                                        Number of averaging timing iterations
                                        used to select kernels
      --workspace-size=[workspace_size] Maximum size of workspace given to
                                        TensorRT
      --max-batch-size=[max_batch_size] Maximum batch size (must be >= 1 to be
                                        set, 0 means not set)
      -t[threshold],
      --threshold=[threshold]           Maximum acceptable numerical deviation
                                        from standard torchscript output
                                        (default 2e-5)
      --no-threshold-check              Skip checking threshold compliance
      --truncate-long-double,
      --truncate, --truncate-64bit      Truncate weights that are provided in
                                        64bit to 32bit (Long, Double to Int,
                                        Float)
      --save-engine                     Instead of compiling a full a
                                        TorchScript program, save the created
                                        engine to the path specified as the
                                        output path
      input_file_path                   Path to input TorchScript file
      output_file_path                  Path for compiled TorchScript (or
                                        TensorRT engine) file
      input_specs...                    Specs for inputs to engine, can either
                                        be a single size or a range defined by
                                        Min, Optimal, Max sizes, e.g.
                                        "(N,..,C,H,W)"
                                        "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
                                        Data Type and format can be specified by
                                        adding an "@" followed by dtype and "%"
                                        followed by format to the end of the
                                        shape spec. e.g. "(3, 3, 32,
                                        32)@f16%NHWC"
      "--" can be used to terminate flag options and force all following
      arguments to be treated as positional options
```

e.g.
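The input-spec grammar described above (a shape or a `[min;opt;max]` range, optionally suffixed with `@dtype` and `%format`) is simple enough to sketch a parser for. The helper below is purely illustrative — trtorchc's real parsing lives in its C++ sources — but it shows how the pieces of a spec string relate:

```python
def parse_input_spec(spec: str):
    """Split a trtorchc-style input spec such as "(3, 3, 32, 32)@f16%NHWC"
    into (shape_or_range, dtype, format).

    Illustrative sketch only; not the parser trtorchc actually uses.
    """
    fmt = dtype = None
    if "%" in spec:                      # trailing "%NHWC" -> memory format
        spec, fmt = spec.rsplit("%", 1)
    if "@" in spec:                      # trailing "@f16" -> data type
        spec, dtype = spec.rsplit("@", 1)
    if spec.startswith("["):             # "[(min);(opt);(max)]" dynamic range
        shapes = tuple(
            tuple(int(d) for d in part.strip(" ()").split(","))
            for part in spec.strip("[]").split(";")
        )
    else:                                # "(N,..,C,H,W)" static shape
        shapes = tuple(int(d) for d in spec.strip("()").split(","))
    return shapes, dtype, fmt

print(parse_input_spec("(3, 3, 32, 32)@f16%NHWC"))
# -> ((3, 3, 32, 32), 'f16', 'NHWC')
```

When no `@` or `%` suffix is present, dtype and format come back as `None`, matching the README's statement that they are optional additions to the shape spec.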

cpp/bin/trtorchc/main.cpp

Lines changed: 7 additions & 0 deletions
```diff
@@ -246,6 +246,9 @@ int main(int argc, char** argv) {
   args::Flag disable_tf32(
       parser, "disable-tf32", "Prevent Float32 layers from using the TF32 data format", {"disable-tf32"});
 
+  args::Flag sparse_weights(
+      parser, "sparse-weights", "Enable sparsity for weights of conv and FC layers", {"sparse-weights"});
+
   args::ValueFlagList<std::string> enabled_precision(
       parser,
       "precision",
@@ -464,6 +467,10 @@ int main(int argc, char** argv) {
     compile_settings.disable_tf32 = true;
   }
 
+  if (sparse_weights) {
+    compile_settings.sparse_weights = true;
+  }
+
   std::string calibration_cache_file_path = "";
   if (calibration_cache_file) {
     calibration_cache_file_path = resolve_path(args::get(calibration_cache_file));
```
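The change follows the tool's existing pattern: an opt-in boolean flag that defaults to off and, when present, flips the matching field in the compile settings. A minimal Python sketch of that pattern (names are illustrative; the real tool is C++ using the `args` library as in the diff above):

```python
import argparse

# Sketch of the flag-to-settings wiring shown in the diff; illustrative
# names, not the real trtorchc implementation.
parser = argparse.ArgumentParser(prog="trtorchc")
parser.add_argument(
    "--sparse-weights",
    action="store_true",  # False unless the flag is passed, like args::Flag
    help="Enable sparsity for weights of conv and FC layers",
)

compile_settings = {"sparse_weights": False}
args = parser.parse_args(["--sparse-weights"])
if args.sparse_weights:
    compile_settings["sparse_weights"] = True

print(compile_settings["sparse_weights"])  # -> True
```

Downstream, the `sparse_weights` setting lets TensorRT's builder select sparse kernels for eligible convolution and fully-connected weights (2:4 structured sparsity on Ampere-class GPUs), which is why the flag is opt-in rather than default.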

docs/._index.html — binary file, −4 KB (not shown)

docs/v0.4.0/._index.html — binary file, −4 KB (not shown)

docs/v0.4.0/_notebooks/Resnet50-example.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -716,7 +716,7 @@
 </div>
 </div>
 <p>
-<img alt="70b640c0c3dd4f07b7ff43006c6adb49" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
+<img alt="ca63aba6e3dc4251aef5ad05ec7c326d" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
 </p>
 <h1 id="notebooks-resnet50-example--page-root">
 TRTorch Getting Started - ResNet 50
```

docs/v0.4.0/_notebooks/lenet-getting-started.html

Lines changed: 1 addition & 1 deletion
```diff
@@ -810,7 +810,7 @@
 </div>
 </div>
 <p>
-<img alt="466e6cbb236d47c5b3895a9f0d8b4046" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
+<img alt="1b2a3097f3d048229ca0b49a1e3bf61a" src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png"/>
 </p>
 <h1 id="notebooks-lenet-getting-started--page-root">
 TRTorch Getting Started - LeNet
```

0 commit comments
