
Commit e837b7f

Merge pull request #48 from NVIDIA/pytorch_1.5.0
Upgrading to LibTorch 1.5.0 (CUDA 10.2, cuDNN 7.6.5, TensorRT 7.0.0)
2 parents 36d27da + 9d0bdaf commit e837b7f

85 files changed: +1142 additions, -423 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
```
@@ -24,3 +24,4 @@ cpp/ptq/datasets/data/
 tests/accuracy/datasets/data/*
 ._.DS_Store
 *.tar.gz
+*.tgz
```

BUILD

Lines changed: 1 addition & 1 deletion
```
@@ -11,7 +11,7 @@ pkg_tar(
         "//core/conversion/evaluators:include",
         "//core/execution:include",
         "//core/lowering:include",
-        "//core/lowering/irfusers:include",
+        "//core/lowering/passes:include",
         "//core/util:include",
         "//core/util/logging:include"
     ],
```

README.md

Lines changed: 75 additions & 10 deletions
```
@@ -2,7 +2,7 @@

 > Ahead of Time (AOT) compiling for PyTorch JIT

-TRTorch is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, TRTorch is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into an module targeting a TensorRT engine. TRTorch operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/F16) and other settings for your module.
+TRTorch is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, TRTorch is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into an module targeting a TensorRT engine. TRTorch operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/F16/INT8) and other settings for your module.

 More Information / System Architecture:
```
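For context, the compile-then-deploy workflow that paragraph describes looks roughly like the sketch below. This is a minimal illustration rather than code from this commit: the `trtorch::CompileGraph`/`ExtraInfo` names are taken from the symbols visible in `core/compiler.cpp` further down, while the public header path, the `ExtraInfo` constructor shape, and the `op_precision` field are assumptions.

```cpp
// Minimal sketch of the AOT flow (assumed public API, see note above).
#include <iostream>
#include <torch/script.h>
#include "trtorch/trtorch.h"  // assumed public header shipped with the tarball

int main() {
  // A standard TorchScript module produced by torch.jit.trace/script in Python
  auto mod = torch::jit::load("model.ts");
  mod.to(torch::kCUDA);

  // Describe the expected input shape; operating precision (FP32/F16/INT8)
  // is chosen here, at compile time
  auto info = trtorch::ExtraInfo({{1, 3, 224, 224}});  // assumed constructor shape
  info.op_precision = torch::kHalf;                    // assumed field name

  // Explicit AOT step: TorchScript program -> module wrapping a TensorRT engine
  auto trt_mod = trtorch::CompileGraph(mod, info);

  // The compiled module is used like any other TorchScript module
  auto in_tensor = torch::randn({1, 3, 224, 224}, torch::kCUDA).to(torch::kHalf);
  auto out = trt_mod.forward({in_tensor});
  std::cout << out.toTensor().sizes() << std::endl;
  return 0;
}
```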

```
@@ -35,28 +35,89 @@ auto results = trt_mod.forward({in_tensor});
 | Platform | Support |
 | -------- | ------- |
 | Linux AMD64 / GPU | **Supported** |
-| Linux aarch64 / GPU | **Planned/Possible with Native Compiation and small modifications to the build system** |
+| Linux aarch64 / GPU | **Planned/Possible with Native Compiation but untested** |
 | Linux aarch64 / DLA | **Planned/Possible with Native Compilation but untested** |
 | Windows / GPU | - |
 | Linux ppc64le / GPU | - |

 ### Dependencies

-- Libtorch 1.4.0
-- CUDA 10.1
-- cuDNN 7.6
-- TensorRT 6.0.1
+- Libtorch 1.5.0
+- CUDA 10.2
+- cuDNN 7.6.5
+- TensorRT 7.0.0

 ## Prebuilt Binaries

 Releases: https://github.com/NVIDIA/TRTorch/releases

 ## Compiling TRTorch

-Install TensorRT, CUDA and cuDNN on the system before starting to compile.
+### Installing Dependencies

+You need to start by having CUDA installed on the system, Libtorch will automatically be pulled for you by bazel,
+then you have two options.
+
+#### 1. Building using cuDNN & TensorRT tarball distributions
+
+> This is recommended so as to build TRTorch hermetically and insures any bugs are not caused by version issues
+
+> Make sure when running TRTorch that these versions of the libraries are prioritized in your `$LD_LIBRARY_PATH`
+
+1. You need to download the tarball distributions of TensorRT and cuDNN from the NVIDIA website.
+    - https://developer.nvidia.com/cudnn
+    - https://developer.nvidia.com/tensorrt
+2. Place these files in a directory (the directories `thrid_party/distdir/[x86_64-linux-gnu | aarch64-linux-gnu]` exist for this purpose)
+3. Compile using:
+``` shell
+bazel build //:libtrtorch --compilation_mode opt --distdir thrid_party/distdir/[x86_64-linux-gnu | aarch64-linux-gnu]
+```
+
+#### 2. Building using locally installed cuDNN & TensorRT
+
+> If you find bugs and you compiled using this method please disclose it in the issue
+> (an `ldd` dump would be nice too)
+
+1. Install TensorRT, CUDA and cuDNN on the system before starting to compile.
+2. In `WORKSPACE` comment out
+```py
+# Downloaded distributions to use with --distdir
+http_archive(
+    name = "cudnn",
+    urls = ["<URL>",],
+
+    build_file = "@//third_party/cudnn/archive:BUILD",
+    sha256 = "<TAR SHA256>",
+    strip_prefix = "cuda"
+)
+
+http_archive(
+    name = "tensorrt",
+    urls = ["<URL>",],
+
+    build_file = "@//third_party/tensorrt/archive:BUILD",
+    sha256 = "<TAR SHA256>",
+    strip_prefix = "TensorRT-<VERSION>"
+)
+```
+and uncomment
+```py
+# Locally installed dependencies
+new_local_repository(
+    name = "cudnn",
+    path = "/usr/",
+    build_file = "@//third_party/cudnn/local:BUILD"
+)
+
+new_local_repository(
+    name = "tensorrt",
+    path = "/usr/",
+    build_file = "@//third_party/tensorrt/local:BUILD"
+)
+```
+3. Compile using:
 ``` shell
-bazel build //:libtrtorch --compilation_mode=opt
+bazel build //:libtrtorch --compilation_mode opt
 ```

 ### Debug build
```
```
@@ -84,9 +145,13 @@ Thanks for wanting to contribute! There are two main ways to handle supporting a

 ### In my application?

-> The Node Converter Registry is not exposed in the top level API but you can try using the internal headers shipped with the tarball.
+> The Node Converter Registry is not exposed in the top level API but in the internal headers shipped with the tarball.
+
+You can register a converter for your op using the `NodeConverterRegistry` inside your application.
+
+## Known Limitations

-You can register a converter for your op using the NodeConverterRegistry inside your application.
+- You cannot use both Adaptive Pooling in PyTorch and also use TRTorch Dynamic input shape

 ## Structure of the repo
```
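To make the "In my application?" note concrete, an application-side converter registration could look roughly like the sketch below. It is illustrative only: the `RegisterNodeConversionPatterns` helper, the `ConversionCtx`/`args` parameter types, and the header path are assumptions about the internal headers shipped with the tarball, and the exact signatures may differ from what this commit ships.

```cpp
// Hypothetical converter registration against the internal, unstable API.
// Every name below is an assumption (see the note above).
#include "core/conversion/converters/converters.h"  // assumed internal header from the tarball

namespace {
using namespace trtorch::core::conversion;

// Constructing this static object is what inserts the pattern into the
// NodeConverterRegistry before any compilation runs.
auto my_op_registration = converters::RegisterNodeConversionPatterns()
    .pattern({
        "aten::my_op(Tensor self) -> (Tensor)",  // hypothetical schema string
        [](ConversionCtx* ctx, const torch::jit::Node* n, converters::args& args) -> bool {
          // Build the equivalent TensorRT layers on ctx->net here and associate
          // each of n's outputs with the resulting ITensor / IValue.
          return true;
        }});
}  // namespace
```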

WORKSPACE

Lines changed: 37 additions & 17 deletions
```
@@ -16,15 +16,6 @@ py_repositories()
 load("@rules_python//python:pip.bzl", "pip_repositories", "pip_import")
 pip_repositories()

-http_archive(
-    name = "libtorch",
-    build_file = "@//third_party/libtorch:BUILD",
-    strip_prefix = "libtorch",
-    urls = ["https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.4.0.zip"],
-    sha256 = "f214bfde532877aa5d4e0803e51a28fa8edd97b6a44b6615f75a70352b6b542e"
-)
-
-load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
 http_archive(
     name = "rules_pkg",
     url = "https://github.com/bazelbuild/rules_pkg/releases/download/0.2.4/rules_pkg-0.2.4.tar.gz",
@@ -34,24 +25,53 @@ http_archive(
 load("@rules_pkg//:deps.bzl", "rules_pkg_dependencies")
 rules_pkg_dependencies()

+# CUDA should be installed on the system locally
 new_local_repository(
     name = "cuda",
-    path = "/usr/local/cuda-10.1/targets/x86_64-linux/",
+    path = "/usr/local/cuda-10.2/targets/x86_64-linux/",
     build_file = "@//third_party/cuda:BUILD",
 )

-new_local_repository(
+http_archive(
+    name = "libtorch",
+    build_file = "@//third_party/libtorch:BUILD",
+    strip_prefix = "libtorch",
+    urls = ["https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.5.0.zip"],
+    sha256 = "0efdd4e709ab11088fa75f0501c19b0e294404231442bab1d1fb953924feb6b5"
+)
+
+# Downloaded distributions to use with --distdir
+http_archive(
     name = "cudnn",
-    path = "/usr/",
-    build_file = "@//third_party/cudnn:BUILD"
+    urls = ["https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.2_20191118/cudnn-10.2-linux-x64-v7.6.5.32.tgz",],
+
+    build_file = "@//third_party/cudnn/archive:BUILD",
+    sha256 = "600267f2caaed2fd58eb214ba669d8ea35f396a7d19b94822e6b36f9f7088c20",
+    strip_prefix = "cuda"
 )

-new_local_repository(
-    name = "tensorrt",
-    path = "/usr/",
-    build_file = "@//third_party/tensorrt:BUILD"
+http_archive(
+    name = "tensorrt",
+    urls = ["https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/7.0/7.0.0.11/tars/TensorRT-7.0.0.11.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz",],
+
+    build_file = "@//third_party/tensorrt/archive:BUILD",
+    sha256 = "c7d73b2585b18aae68b740249efa8c8ba5ae852abe9a023720595432a8eb4efd",
+    strip_prefix = "TensorRT-7.0.0.11"
 )

+## Locally installed dependencies
+# new_local_repository(
+#    name = "cudnn",
+#    path = "/usr/",
+#    build_file = "@//third_party/cudnn/local:BUILD"
+#)
+
+# new_local_repository(
+#    name = "tensorrt",
+#    path = "/usr/",
+#    build_file = "@//third_party/tensorrt/local:BUILD"
+#)
+
 git_repository(
     name = "googletest",
     remote = "https://github.com/google/googletest",
```

core/compiler.cpp

Lines changed: 22 additions & 35 deletions
```
@@ -7,12 +7,11 @@

 #include "ATen/core/function_schema.h"

-#include "torch/csrc/jit/ir.h"
-#include "torch/csrc/jit/pass_manager.h"
+#include "torch/csrc/jit/frontend/function_schema_parser.h"
+#include "torch/csrc/jit/ir/ir.h"
+#include "torch/csrc/jit/passes/pass_manager.h"
 #include "torch/csrc/jit/passes/lower_graph.h"
 #include "torch/csrc/jit/passes/graph_fuser.h"
-#include "torch/csrc/jit/script/module.h"
-#include "torch/csrc/jit/script/function_schema_parser.h"

 #include "core/util/prelude.h"
 #include "core/compiler.h"
@@ -42,71 +41,59 @@ c10::FunctionSchema GenerateGraphSchema(torch::jit::script::Module mod, std::str

 void AddEngineToGraph(torch::jit::script::Module mod, std::shared_ptr<torch::jit::Graph>& g, std::string& serialized_engine) {
   execution::EngineID uid = execution::RegisterEngineFromSerializedEngine(serialized_engine);
-  auto schema = execution::GetEngineFunctionSchema(uid);
   auto num_io = execution::GetEngineIO(uid);

   auto self = g->addInput("self.1");
   self->setType(mod.type());
-  std::vector<torch::jit::Value*> graph_inputs;
+
+  auto id_val = g->insertConstant(uid);
+
+  std::vector<torch::jit::Value*> engine_inputs;
+  engine_inputs.push_back(id_val);
+
   for (uint64_t i = 0; i < num_io.first; i++) {
     auto in_val = g->addInput("");
     in_val->setType(c10::TensorType::get());
-    graph_inputs.push_back(in_val);
+    engine_inputs.push_back(in_val);
   }

-  auto engine_node = g->create(c10::Symbol::fromQualString(schema.name()), torch::jit::ArrayRef<torch::jit::Value*>(graph_inputs), num_io.second);
+  auto engine_node = g->create(c10::Symbol::fromQualString("trt::execute_engine"), torch::jit::ArrayRef<torch::jit::Value*>(engine_inputs), num_io.second);
   g->block()->appendNode(engine_node);

   for (auto o : engine_node->outputs()) {
     g->registerOutput(o);
   }

+  LOG_DEBUG(*g << "(AddEngineToGraph)\n");
+
   return;
 }

 bool CheckMethodOperatorSupport(const torch::jit::script::Module& mod,
                                 std::string method_name) {
-  auto g = mod.get_method(method_name).graph();
-  // Go through PyTorch Lowering to simplify graph and extract weight parameters
-  auto graph_and_parameters = torch::jit::LowerGraph(*g, mod._ivalue());
+  // Go through Lowering to simplify graph and extract weight parameters
+  auto graph_and_parameters = lowering::Lower(mod, method_name);

-  g = graph_and_parameters.first;
-
-  // Go through TRTorch Lowering to reformat graph to be conversion friendly
-  // and also segment for accelerators and executors (TRT-DLA, TRT-GPU, PYT)
-  lowering::LowerGraph(g);
-
-  auto params = graph_and_parameters.second;
-  auto named_params = conversion::get_named_params(g->inputs(), params);
+  auto g = graph_and_parameters.first;
   LOG_DEBUG(*g << "(CheckMethodOperatorSupport)\n");

-  // Is this necessary?
-  lowering::LowerBlock(g->block());
-
   return conversion::VerifyConverterSupportForBlock(g->block());
 }

 std::string ConvertGraphToTRTEngine(const torch::jit::script::Module& mod,
                                     std::string method_name,
                                     ExtraInfo cfg) {
-  auto convert_cfg = std::move(cfg.convert_info);
-
-  auto g = mod.get_method(method_name).graph();
-  // Go through PyTorch Lowering to simplify graph and extract weight parameters
-  auto graph_and_parameters = torch::jit::LowerGraph(*g, mod._ivalue());

-  g = graph_and_parameters.first;
-
-  // Go through TRTorch Lowering to reformat graph to be conversion friendly
-  // and also segment for accelerators and executors (TRT-DLA, TRT-GPU, PYT)
-  lowering::LowerGraph(g);
+  // Go through Lowering to simplify graph and extract weight parameters
+  auto graph_and_parameters = lowering::Lower(mod, method_name);

+  auto convert_cfg = std::move(cfg.convert_info);
+  auto g = graph_and_parameters.first;
   auto params = graph_and_parameters.second;
   auto named_params = conversion::get_named_params(g->inputs(), params);
+
   LOG_INFO(*g << "(CompileGraph)\n");

-  // Is this necessary?
-  lowering::LowerBlock(g->block());
   auto engine = ConvertBlockToEngine(g->block(), convert_cfg, named_params);
   return std::move(engine);
 }
@@ -115,7 +102,7 @@ torch::jit::script::Module CompileGraph(const torch::jit::script::Module& mod,
                                         ExtraInfo cfg) {
   // TODO: Should be doing a functional transform but need PR #31978
   // [jit] More robust mangling
-  // torch::jit::script::Module new_mod = mod.clone();
+  //torch::jit::script::Module new_mod = mod.clone();
   torch::jit::script::Module new_mod(mod._ivalue()->name() + "_trt");
   std::vector<std::shared_ptr<torch::jit::Graph>> graphs;
   for (const torch::jit::script::Method& method : mod.get_methods()) {
```
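The practical effect of the `AddEngineToGraph` change above is that the registered engine id now travels as the first input of a generic `trt::execute_engine` node instead of being baked into a per-engine schema name. Roughly, and purely as an illustration (not IR captured from this commit), a compiled method body now reduces to:

```cpp
// Illustrative TorchScript IR after AddEngineToGraph; names and ids are placeholders.
//
//   graph(%self.1 : __torch__.<module name>_trt,
//         %input : Tensor):
//     %1 : int = prim::Constant[value=<registered engine uid>]()
//     %2 : Tensor = trt::execute_engine(%1, %input)
//     return (%2)
```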

core/compiler.h

Lines changed: 1 addition & 1 deletion
```
@@ -1,7 +1,7 @@
 #pragma once

 #include <vector>
-#include "torch/csrc/jit/script/module.h"
+#include "torch/csrc/jit/api/module.h"
 #include "core/conversion/conversion.h"

 namespace trtorch {
```

core/conversion/conversion.cpp

Lines changed: 12 additions & 8 deletions
```
@@ -14,9 +14,7 @@ namespace conversion {
 bool isNodeConversionBlacklisted(const torch::jit::Node* n);

 bool OpSupported(const torch::jit::Node* n) {
-  bool evalable = evaluators::shouldEvalAtConversionTime(n);
-  bool convertable = converters::node_is_convertable(n);
-  return evalable || convertable;
+  return evaluators::shouldEvalAtConversionTime(n) || converters::node_is_convertable(n);
 }

 c10::optional<torch::jit::IValue> EvaluateNode(ConversionCtx* ctx, const torch::jit::Node* n, int level=0, int limit=10) {
@@ -75,8 +73,12 @@ void AddLayer(ConversionCtx* ctx, const torch::jit::Node* n) {
       LOG_DEBUG(ctx->logger, "Node input is a value that needs to be evaluated");
       auto eval = EvaluateNode(ctx, input_node);
       if (eval) {
-        LOG_DEBUG(ctx->logger, "Found the value to be: " << eval.value());
-        ctx->evaluated_value_map[input] = std::move(eval.value());
+        if (!eval.value().isTensor()) {
+          LOG_DEBUG(ctx->logger, "Found the value to be: " << eval.value());
+        } else {
+          LOG_DEBUG(ctx->logger, "Found the value to be a tensor (shape " << eval.value().toTensor().sizes() << ')');
+        }
+        ctx->AssociateValueAndIValue(input, eval.value());
         node_args.push_back(&(ctx->evaluated_value_map[input]));
       } else {
         LOG_DEBUG(ctx->logger, "Found the value is None");;
@@ -158,6 +160,10 @@ void AddInputs(ConversionCtx* ctx,
   TRTORCH_CHECK(profile->isValid(), "Optimization profile is invalid, please check the input range provided (conversion.AddInputs)");

   ctx->cfg->addOptimizationProfile(profile);
+  // TODO: Enable in TRT 7.1
+  // if (ctx->op_precision == nvinfer1::DataType::kINT8) {
+  //   ctx->cfg->setCalibrationProfile(profile);
+  // }
 }

 void MarkOutputs(ConversionCtx* ctx, at::ArrayRef<const torch::jit::Value*> outputs) {
@@ -208,9 +214,7 @@ void ConvertBlockToNetDef(ConversionCtx* ctx, const torch::jit::Block* b, Conver
   }

   for (const auto n : nodes) {
-    if (converters::node_is_convertable(n)) {
-      ctx->CheckLayerAddition(n);
-    }
+    ctx->CheckLayerAddition(n);
   }

   auto outputs = b->outputs();
```
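As a usage note for the simplified `OpSupported` above: a caller can walk a lowered graph and report anything that neither the evaluator nor the converter registries handle before attempting a full conversion. A minimal sketch follows; the fully qualified `trtorch::core::conversion` namespace is an assumption based on this file, and how the check is re-exported publicly is outside this diff.

```cpp
// Minimal sketch: list every node the conversion phase cannot handle.
#include <iostream>
#include <memory>
#include "torch/csrc/jit/ir/ir.h"
#include "core/conversion/conversion.h"

bool ReportUnsupportedOps(const std::shared_ptr<torch::jit::Graph>& g) {
  bool all_supported = true;
  for (const auto n : g->block()->nodes()) {
    if (!trtorch::core::conversion::OpSupported(n)) {  // assumed qualified name
      std::cerr << "Unsupported op: " << n->kind().toQualString() << std::endl;
      all_supported = false;
    }
  }
  return all_supported;
}
```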

core/conversion/conversion.h

Lines changed: 1 addition & 1 deletion
```
@@ -3,7 +3,7 @@
 #include <map>

 #include "NvInfer.h"
-#include "torch/csrc/jit/ir.h"
+#include "torch/csrc/jit/ir/ir.h"
 #include "core/conversion/conversionctx/ConversionCtx.h"

 namespace torch {
```
