chore: Add documentation for dynamo.compile backend #2389
.. _dynamo_export:

Torch-TensorRT (Dynamo) Backend
===============================
This guide presents the Torch-TensorRT dynamo backend, which compiles PyTorch programs
into TensorRT engines through torch dynamo. PyTorch 2.1 introduced the ``torch.export`` APIs, which
can export graphs from PyTorch programs using torch dynamo. The Torch-TensorRT dynamo
backend compiles these exported graphs and optimizes them using TensorRT. Here is a simple
usage of the dynamo backend:

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()
    inputs = torch.randn((1, 3, 224, 224), dtype=torch.float32).cuda()
    exp_program = torch.export.export(model, (inputs,))
    trt_gm = torch_tensorrt.dynamo.compile(exp_program, [inputs])  # Output is a torch.fx.GraphModule
    trt_gm(inputs)

``torch_tensorrt.dynamo.compile`` is the main API for users to interact with Torch-TensorRT.
The input to this API should be an ``ExportedProgram`` (ideally the output of ``torch.export``) and the output is a ``torch.fx.GraphModule`` object.

Customizations
--------------

There are many options for users to customize their settings when optimizing with TensorRT.
Some of the frequently used options are as follows:

* ``inputs`` - For static shapes, this can be a list of torch tensors or ``torch_tensorrt.Input`` objects. For dynamic shapes, this should be a list of ``torch_tensorrt.Input`` objects.
* ``enabled_precisions`` - Set of precisions that the TensorRT builder can use during optimization.
* ``truncate_long_and_double`` - Truncates long and double values to int and float values respectively.
* ``torch_executed_ops`` - Operators that are forced to be executed by Torch.
* ``min_block_size`` - Minimum number of consecutive operators required to be executed as a TensorRT segment.

The complete list of options can be found `here <https://github.com/pytorch/TensorRT/blob/123a486d6644a5bbeeec33e2f32257349acc0b8f/py/torch_tensorrt/dynamo/compile.py#L51-L77>`_.

Note: We do not support INT precision currently in Dynamo. Support for this currently exists in
our TorchScript IR. We plan to implement similar support for dynamo in our next release.

Under the hood
--------------

Under the hood, ``torch_tensorrt.dynamo.compile`` performs the following on the graph:

* Lowering - Applies lowering passes to add/remove operators for optimal conversion.
* Partitioning - Partitions the graph into PyTorch and TensorRT segments based on the ``min_block_size`` and ``torch_executed_ops`` fields.
* Conversion - PyTorch ops get converted into TensorRT ops in this phase.
* Optimization - Post conversion, we build the TensorRT engine and embed it inside the PyTorch graph.
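To give an intuition for the partitioning step, here is a simplified, hypothetical sketch in plain Python of how ``min_block_size`` and ``torch_executed_ops`` interact. The real pass operates on a ``torch.fx`` graph rather than a flat list of operator names, so the ``partition`` helper below is purely illustrative:

.. code-block:: python

    def partition(ops, torch_executed_ops=frozenset(), min_block_size=3):
        """Split a flat list of op names into (backend, ops) segments."""
        segments = []
        current, backend = [], None
        for op in ops:
            target = "torch" if op in torch_executed_ops else "tensorrt"
            if backend in (None, target):
                current.append(op)
                backend = target
            else:
                segments.append((backend, current))
                current, backend = [op], target
        if current:
            segments.append((backend, current))
        # Demote TensorRT segments shorter than min_block_size back to Torch:
        # building an engine for a tiny segment is not worth the overhead.
        return [("torch", seg) if b == "tensorrt" and len(seg) < min_block_size
                else (b, seg) for b, seg in segments]

    ops = ["conv", "relu", "conv", "topk", "relu", "add"]
    print(partition(ops, torch_executed_ops={"topk"}, min_block_size=3))
    # [('tensorrt', ['conv', 'relu', 'conv']), ('torch', ['topk']), ('torch', ['relu', 'add'])]

Note how forcing ``topk`` onto Torch splits the graph, and the trailing two-op TensorRT segment falls below ``min_block_size`` and is therefore also executed by Torch.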

Tracing
-------

``torch_tensorrt.dynamo.trace`` can be used to trace a PyTorch graph and produce an ``ExportedProgram``.
This internally performs some decompositions of operators for downstream optimization.
The ``ExportedProgram`` can then be used with the ``torch_tensorrt.dynamo.compile`` API.
If your model has dynamic input shapes, you can use ``torch_tensorrt.dynamo.trace`` to export
the model with dynamic shapes. Alternatively, you can use ``torch.export`` `with constraints <https://pytorch.org/docs/stable/export.html#expressing-dynamism>`_ directly as well.

.. code-block:: python

    import torch
    import torch_tensorrt

    inputs = torch_tensorrt.Input(min_shape=(1, 3, 224, 224),
                                  opt_shape=(4, 3, 224, 224),
                                  max_shape=(8, 3, 224, 224),
                                  dtype=torch.float32)
    model = MyModel().eval()
    exp_program = torch_tensorrt.dynamo.trace(model, inputs)