.. _conversion:

FX Converters
==================
The converter library in Torch-TensorRT is located in ``TensorRT/py/torch_tensorrt/fx/converters``.
The converters are categorized into ``aten_ops_converters``, ``acc_ops_converters`` and ``nn_ops_converters``.
The individual converters present are useful for the quantization workflow.
The converters are registered using the ``tensorrt_converter`` decorator.

Steps
==================

External Interface
------------------
Depending on whether the operation is generated using acc_trace, aten_trace or fx_trace, the converter is included in
``acc_ops_converters``, ``aten_ops_converters`` or ``nn_ops_converters`` respectively. The converters are registered using the ``tensorrt_converter`` decorator. The decorated function
has the arguments ``network, target, args, kwargs, name``, which are common across all operator schemas. The registration pattern is sketched after the list below.

    * acc_ops_converters
        * acc_trace is produced by ``torch_tensorrt.fx.tracer.acc_tracer.acc_tracer.trace``.
    * aten_ops_converters
        There are two options for this at present:
        * Dynamo: aten_trace is produced by ``torch_tensorrt.dynamo.backend.compile``. The second round of trace is produced by ``aot_torch_tensorrt_aten_backend`` by invoking ``aot_module_simplified`` from ``torch._functorch.aot_autograd``.
        * FX: aten_trace is produced by ``torch_tensorrt.fx.tracer.dispatch_tracer.aten_tracer.trace``. This flow is currently the more common one, but it will soon be deprecated in torch_tensorrt.
    * nn_ops_converters
        * symbolic_trace is produced by ``torch.fx._symbolic_trace``.
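
Regardless of which of the three files a converter lives in, the registration pattern is the same. A minimal sketch is shown below; the import path and the ``torch.ops.aten.relu.default`` key are assumptions for illustration, not the exact in-tree registration:

.. code-block:: python

    import torch

    # assumed import path for the registration decorator
    from torch_tensorrt.fx.converter_registry import tensorrt_converter


    @tensorrt_converter(torch.ops.aten.relu.default)  # opcode emitted by the aten trace
    def aten_ops_relu(network, target, args, kwargs, name):
        # args/kwargs carry the traced operands; the body builds TensorRT layers
        # on `network` and returns the resulting TRT tensor(s)
        ...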

Converter implementation
------------------------
Three different kinds of implementations are illustrated below.

    * activation type
        We illustrate the implementation of the ``leaky_relu`` operator.
        * The implementation is present in ``py/torch_tensorrt/fx/impl/activation.py``

            .. code-block:: python

                def leaky_relu(
                    network: TRTNetwork,
                    target: Target,
                    source_ir: Optional[SourceIR],
                    name: str,
                    input_val: TRTTensor,
                    alpha: Optional[Any],
                ):
                    # implementation

            As illustrated above, we define the function with the args: network, name (defined in the trace), target (defined in the trace), and source_ir, which
            can take the enum values SourceIR.ATEN, SourceIR.ACC or SourceIR.NN. The input_val, alpha and beta (in some cases) are extracted from the created trace.
            In the case of ``acc_trace`` the input parameters are in kwargs, whereas in ``aten_trace`` the input parameters are in args.
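
            The ``# implementation`` placeholder above elides the TensorRT layer construction. A minimal sketch of what such a body can look like, written directly against the TensorRT Python API (the in-tree implementation routes through a shared activation helper instead), is shown below; the default slope value is an assumption:

            .. code-block:: python

                import tensorrt as trt


                def leaky_relu(network, target, source_ir, name, input_val, alpha):
                    # Add an IActivationLayer configured as LEAKY_RELU and return its output.
                    layer = network.add_activation(input_val, trt.ActivationType.LEAKY_RELU)
                    layer.alpha = alpha if alpha is not None else 0.01  # negative slope
                    layer.name = name
                    return layer.get_output(0)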

        * The converter should be registered with the appropriate opcode in the ``tensorrt_converter`` decorator.

            The acc op converter is defined in ``py/torch_tensorrt/fx/converters/acc_ops_converters``

            .. code-block:: python

                def acc_ops_leaky_relu(
                    network: TRTNetwork,
                    target: Target,
                    args: Tuple[Argument, ...],
                    kwargs: Dict[str, Argument],
                    name: str,
                ) -> Union[TRTTensor, Sequence[TRTTensor]]:
                    # acc_trace passes the operands through kwargs
                    input_val = kwargs["input"]
                    negative_slope = kwargs["negative_slope"]
                    return activation.leaky_relu(
                        network, target, SourceIR.ACC, name, input_val, negative_slope
                    )

            The aten op converter is defined in ``py/torch_tensorrt/fx/converters/aten_ops_converters``

            .. code-block:: python

                def aten_ops_leaky_relu(
                    network: TRTNetwork,
                    target: Target,
                    args: Tuple[Argument, ...],
                    kwargs: Dict[str, Argument],
                    name: str,
                ) -> Union[TRTTensor, Sequence[TRTTensor]]:
                    # aten_trace passes the operands positionally through args
                    return activation.leaky_relu(
                        network, target, SourceIR.ATEN, name, args[0], args[1]
                    )

            As pointed out above, acc_trace and aten_trace supply the input arguments in different forms: acc converters read them from kwargs, while aten converters read them positionally from args.
    * operation type
        We illustrate the implementation of the ``fmod`` operator.
        * The implementation is present in ``py/torch_tensorrt/fx/impl/elementwise/ops.py``

            .. code-block:: python

                def fmod(
                    network: TRTNetwork,
                    target: Target,
                    source_ir: Optional[SourceIR],
                    name: str,
                    input: TRTTensor,
                    other: TRTTensor,
                ) -> TRTTensor:
                    # implementation

            The implementations of these operators are segregated according to the operator type:

            * Since fmod is elementwise, it is present in ``py/torch_tensorrt/fx/impl/elementwise``.
              ``py/torch_tensorrt/fx/impl/elementwise/base`` contains the base functions for elementwise operators.
            * Operators like sqrt are present in ``py/torch_tensorrt/fx/impl/unary``.
              ``py/torch_tensorrt/fx/impl/unary/base`` contains the base functions for unary operators.
            * Operators like softmax, layer_norm and batch_norm are present in ``py/torch_tensorrt/fx/impl/normalization``.
              Since there are no base operations common to all of them, there is no base file.
            * Operators like slice, select, where and embedding are present in ``py/torch_tensorrt/fx/impl/*.py``.
              They have individual operator implementations with the same API structure as above, but with different arguments.

        * The converter should be registered with the appropriate opcode, in the same way as in the activation example (see the sketch below).
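
            A minimal sketch of what such a registration can look like for the aten trace is shown below; the ``torch.ops.aten.fmod.Tensor`` key is an example, and the decorator, ``fmod`` and ``SourceIR`` imports are omitted as in the snippets above:

            .. code-block:: python

                @tensorrt_converter(torch.ops.aten.fmod.Tensor)  # example opcode key
                def aten_ops_fmod(network, target, args, kwargs, name):
                    # aten_trace passes the operands positionally through args
                    return fmod(network, target, SourceIR.ATEN, name, args[0], args[1])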

    * lowering type
        Some converters can be decomposed into sub-operations and do not need a separate converter registration.
        Such converters can be implemented via ``lowering passes``.
        We illustrate this via the ``addmm`` operator.
        * The decompositions are registered via ``register_decomposition``. We define ``addmm_replacement`` and replace it with the torch ops, which will have their corresponding converters called (a sketch of the registration itself follows the code block).

            .. code-block:: python

                def addmm_replacement(
                    input_: torch.Tensor, mat1: torch.Tensor, mat2: torch.Tensor, *, beta=1, alpha=1
                ) -> torch.Tensor:
                    return torch.add(
                        torch.mul(input_, beta), torch.mul(torch.matmul(mat1, mat2), alpha)
                    )
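
            A minimal sketch of how such a replacement can be registered is shown below; the decomposition table name ``DECOMPOSITIONS`` and the exact op key are assumptions for illustration:

            .. code-block:: python

                import torch
                from torch._decomp import register_decomposition

                # hypothetical registry holding the decompositions applied before conversion
                DECOMPOSITIONS = {}


                @register_decomposition(torch.ops.aten.addmm, registry=DECOMPOSITIONS)
                def addmm_replacement(
                    input_: torch.Tensor, mat1: torch.Tensor, mat2: torch.Tensor, *, beta=1, alpha=1
                ) -> torch.Tensor:
                    # addmm(input, mat1, mat2) == beta * input + alpha * (mat1 @ mat2)
                    return torch.add(
                        torch.mul(input_, beta), torch.mul(torch.matmul(mat1, mat2), alpha)
                    )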

Tests
----------------

* FX testing
    * The fx tests are present in ``test/converters/aten_op``.
    * The test class is derived from ``DispatchTestCase``, with parameterized testing used to implement the different test cases.
    * The results for ``dispatch_tracer.aten_trace`` and torch are compared.
    * The ``expected_op`` is also tested. This op will be called by the model and needs to be specified so that the appropriate converter is invoked (a sketch of the test pattern follows this list).
    * The tests throw an error if any of the above conditions fail.
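
    A minimal sketch of such a test is shown below; the import path, ``run_test`` and ``expected_ops`` follow the pattern of the existing converter tests, so treat the exact names as assumptions:

    .. code-block:: python

        import torch
        import torch.nn as nn

        # assumed import path for the test harness used by the fx converter tests
        from torch_tensorrt.fx.tools.common_fx2trt import DispatchTestCase


        class TestLeakyReLUConverter(DispatchTestCase):
            def test_leaky_relu(self):
                class TestModule(nn.Module):
                    def forward(self, x):
                        return nn.functional.leaky_relu(x, negative_slope=0.05)

                inputs = [torch.randn(1, 10)]
                # run_test traces the module, converts it, and compares TRT vs. torch
                # outputs while checking that the expected op appears in the trace
                self.run_test(
                    TestModule(), inputs, expected_ops={torch.ops.aten.leaky_relu.default}
                )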

* Dynamo testing
    * The dynamo tests are present for the lowering ops. The converter tests above will soon be ported to dynamo tests.
    * The lowering op test is present in ``dynamo/backend/test/test_decompositions.py``.
    * The results for ``fx.symbolic_trace`` and ``torch_tensorrt.compile`` are compared.
    * The tests also check for the ``expected_op`` and the ``unexpected_op``. ``expected_op`` refers to the ops that the operation is lowered to, while ``unexpected_op`` is the original op (a sketch of the pattern follows this list).
    * The tests throw an error if any of the above conditions fail.
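
    A minimal sketch of the output-comparison part of such a test is shown below; the ``ir="dynamo"`` argument and the tolerances are placeholders, and the op presence/absence checks rely on the helpers in the test file, so they are only indicated by comments here:

    .. code-block:: python

        import torch
        import torch_tensorrt


        class AddmmModule(torch.nn.Module):
            def forward(self, x, mat1, mat2):
                return torch.addmm(x, mat1, mat2)

        inputs = [torch.randn(2, 2).cuda() for _ in range(3)]

        # reference: plain fx trace run eagerly
        fx_graph = torch.fx.symbolic_trace(AddmmModule())
        ref = fx_graph(*inputs)

        # compiled: the lowering pass decomposes addmm into mul/matmul/add,
        # whose converters are then invoked during conversion
        trt_module = torch_tensorrt.compile(AddmmModule(), ir="dynamo", inputs=inputs)
        out = trt_module(*inputs)

        # the real test also asserts that the expected (decomposed) ops are present
        # in the lowered graph and that the original addmm op is absent
        assert torch.allclose(ref, out, atol=1e-3, rtol=1e-3)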