
Commit 221449e: restructuring

1 parent 497bfab

File tree: 1 file changed (+80, -14 lines)

README.md

Lines changed: 80 additions & 14 deletions
@@ -1,11 +1,88 @@
<div align="center">

Torch-TensorRT
===========================
<h4> Easily achieve the best inference performance for any PyTorch model on the NVIDIA platform. </h4>

[![Documentation](https://img.shields.io/badge/docs-master-brightgreen)](https://nvidia.github.io/Torch-TensorRT/)
[![pytorch](https://img.shields.io/badge/PyTorch-2.2-green)](https://www.python.org/downloads/release/python-31013/)
[![cuda](https://img.shields.io/badge/cuda-12.1-green)](https://developer.nvidia.com/cuda-downloads)
[![trt](https://img.shields.io/badge/TensorRT-8.6.1-green)](https://github.com/nvidia/tensorrt-llm)
[![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)
[![CircleCI](https://circleci.com/gh/pytorch/TensorRT.svg?style=svg)](https://app.circleci.com/pipelines/github/pytorch/TensorRT)

---
<div align="left">

Torch-TensorRT brings the performance of NVIDIA TensorRT to PyTorch: accelerate inference for any PyTorch model on the NVIDIA platform by changing just a single line in your existing code.

</div></div>

## Installation

Stable versions are published on PyPI:

```bash
pip install torch-tensorrt
```

Nightly versions are published on the PyTorch package index:

```bash
pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121
```
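
After either install, a quick import check along these lines (illustrative, not part of the original README) confirms that the package and its CUDA/TensorRT dependencies load:

```python
# Minimal sanity check that torch and torch_tensorrt import cleanly.
import torch
import torch_tensorrt

print(torch.__version__)
print(torch_tensorrt.__version__)
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
```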

Torch-TensorRT is also distributed in the ready-to-run [NVIDIA NGC PyTorch Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), which has all dependencies with the proper versions and example notebooks included.

For more advanced installation methods, please see here < TO FILL>

## Quickstart

### Option 1: torch.compile
You can use Torch-TensorRT anywhere you use `torch.compile`:

```python
import torch
import torch_tensorrt

model = <YOUR MODEL HERE>
x = <YOUR INPUT HERE>

optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x)  # compiled on first run

optimized_model(x)  # this will be fast!
```
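
For a concrete sense of the workflow, here is a minimal sketch using a torchvision ResNet; the model choice, weights, and input shape are illustrative and not part of the original quickstart:

```python
import torch
import torch_tensorrt  # importing this registers the "tensorrt" backend for torch.compile
import torchvision.models as models

# Illustrative model and input; any traceable PyTorch module works the same way.
model = models.resnet18(weights=None).eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()

optimized_model = torch.compile(model, backend="tensorrt")

with torch.no_grad():
    out = optimized_model(x)  # first call triggers TensorRT engine compilation
    out = optimized_model(x)  # subsequent calls reuse the compiled engine
```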

### Option 2: Export
If you want to optimize your model ahead-of-time and/or run inference in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).

#### Step 1: Optimize + serialize
```python
import torch
import torch_tensorrt

model = <YOUR MODEL HERE>
x = <YOUR INPUT HERE>

optimized_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[x])

# Serialize the optimized module for deployment. torch_tensorrt.save and its
# output_format argument are assumed here; check the serialization docs for your version.
torch_tensorrt.save(optimized_model, "trt_model.ep", inputs=[x])  # ExportedProgram for PyTorch deployment
torch_tensorrt.save(optimized_model, "trt_model.ts", output_format="torchscript", inputs=[x])  # TorchScript for C++ deployment
```

#### Step 2: Deploy
##### Deployment in PyTorch:
```python
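# A minimal sketch of loading the serialized module back into Python. The file name
# "trt_model.ep" and the use of torch.export.load follow from the Step 1 sketch above
# and are illustrative rather than the project's official example.
import torch
import torch_tensorrt  # ensures the TensorRT runtime ops are registered

x = <YOUR INPUT HERE>

model = torch.export.load("trt_model.ep").module()
model(x)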
```

##### Deployment in C++:
```cpp
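// A minimal sketch, assuming the TorchScript artifact ("trt_model.ts") from Step 1 and an
// application linked against libtorch plus the Torch-TensorRT runtime library.
#include <torch/script.h>

int main() {
  // Load the serialized, TensorRT-optimized TorchScript module.
  auto module = torch::jit::load("trt_model.ts");

  // Illustrative input; the shape must match what the module was compiled for.
  auto input = torch::randn({1, 3, 224, 224}, torch::kCUDA);
  auto output = module.forward({input}).toTensor();
  return 0;
}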
```
## Further resources
- [Optimize models from Hugging Face with Torch-TensorRT]() \[coming soon\]
- [Run your model in FP8 with Torch-TensorRT]() \[coming soon\]
- []

Torch-TensorRT compiles PyTorch models by natively converting FX graphs, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler: before you deploy your code, you go through an explicit compile step to convert a standard PyTorch, TorchScript, or FX program into a module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the runtime seamlessly; after compilation, using the optimized graph should feel no different from running any other PyTorch module. You also have access to TensorRT's suite of configurations at compile time, so you can specify the operating precision (FP32/FP16/INT8) and other settings for your module.
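
As a sketch of what those compile-time settings can look like, the call below enables FP16 kernels alongside FP32 for a module (the input shape and precision choice are illustrative):

```python
import torch
import torch_tensorrt

model = <YOUR MODEL HERE>

# Let TensorRT choose FP16 kernels in addition to FP32 ones for this module.
trt_module = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # illustrative static input shape
    enabled_precisions={torch.float32, torch.half},
)
```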

Resources:
- [Documentation](https://nvidia.github.io/Torch-TensorRT/)
@@ -14,9 +91,6 @@ Resources:
- [Comprehensive Discussion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
- [Pre-built Docker Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). To use this container, make an NGC account and sign in to NVIDIA's registry with an API key. Refer to [this guide](https://docs.nvidia.com/ngc/ngc-catalog-user-guide/index.html#registering-activating-ngc-account) for instructions.

## Building a docker container for Torch-TensorRT

We provide a `Dockerfile` in the `docker/` directory. It expects a PyTorch NGC container as a base but can easily be modified to build on top of any container that provides PyTorch, CUDA, cuDNN and TensorRT. The dependency libraries in the container can be found in the <a href="https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html">release notes</a>.
@@ -121,14 +195,6 @@ These are the following dependencies used to verify the testcases. Torch-TensorR
- cuDNN 8.9.5
- TensorRT 8.6.1

## Compiling Torch-TensorRT
### Installing Dependencies
