Commit 5555623

Write new top-level export and lowering documentation
1 parent d60e12d commit 5555623

8 files changed: +186 -11 lines changed

docs/source/api-life-cycle.md

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-# ExecuTorch API Life Cycle and Deprecation Policy
+# API Life Cycle and Deprecation Policy

## API Life Cycle

docs/source/compiler-delegate-and-partitioner.md

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-# Backend and Delegate
+# Backends and Delegates

Audience: Vendors, Backend Delegate developers, who are interested in integrating their own compilers and hardware as part of ExecuTorch

docs/source/executorch-runtime-api-reference.rst

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-ExecuTorch Runtime API Reference
+Runtime API Reference
================================

The ExecuTorch C++ API provides an on-device execution framework for exported PyTorch models.

docs/source/export-to-executorch-api-reference.rst

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-Export to ExecuTorch API Reference
+Export API Reference
----------------------------------

For detailed information on how APIs evolve and the deprecation process, please refer to the `ExecuTorch API Life Cycle and Deprecation Policy <api-life-cycle.html>`__.

docs/source/getting-started.md

Lines changed: 2 additions & 2 deletions

@@ -59,7 +59,7 @@ with open("model.pte", "wb") as f:
    f.write(et_program.buffer)

-If the model requires varying input sizes, you will need to specify the varying dimensions and bounds as part of the `export` call. See [Exporting a Model for ExecuTorch](/TODO.md) for more information.
+If the model requires varying input sizes, you will need to specify the varying dimensions and bounds as part of the `export` call. See [Model Export and Lowering](using-executorch-export.md) for more information.

The hardware backend to target is controlled by the partitioner parameter to to\_edge\_transform\_and\_lower. In this example, the XnnpackPartitioner is used to target mobile CPUs. See the delegate-specific documentation for a full description of the partitioner and available options.

@@ -198,7 +198,7 @@ For more information on the C++ APIs, see [Running an ExecuTorch Model Using the

## Next Steps
ExecuTorch provides a high-degree of customizability to support diverse hardware targets. Depending on your use cases, consider exploring one or more of the following pages:

-- [Exporting a Model to ExecuTorch](/TODO.md) for advanced model conversion options.
+- [Export and Lowering](using-executorch-export.md) for advanced model conversion options.
- [Delegates](/TODO.md) for available backends and configuration options.
- [Using ExecuTorch on Android](/TODO.md) and [Using ExecuTorch on iOS](TODO.md) for mobile runtime integration.
- [Using ExecuTorch with C++](/TODO.md) for embedded and mobile native development.

docs/source/index.rst

Lines changed: 2 additions & 2 deletions

@@ -53,7 +53,7 @@ Topics in this section will help you get started with ExecuTorch.
   .. grid-item-card:: :octicon:`file-code;1em`
      Getting started with ExecuTorch
      :img-top: _static/img/card-background.svg
-     :link: getting-started-setup.html
+     :link: getting-started.html
      :link-type: url

      A step-by-step tutorial on how to get started with

@@ -190,6 +190,7 @@ Topics in this section will help you get started with ExecuTorch.

   backend-delegates-integration
   backend-delegates-dependencies
+  compiler-delegate-and-partitioner
   debug-backend-delegate

.. toctree::

@@ -207,7 +208,6 @@ Topics in this section will help you get started with ExecuTorch.
   :caption: Compiler Entry Points
   :hidden:

-  compiler-delegate-and-partitioner
   compiler-backend-dialect
   compiler-custom-compiler-passes
   compiler-memory-planning

docs/source/runtime-python-api-reference.rst

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-ExecuTorch Runtime Python API Reference
+Runtime Python API Reference
----------------------------------

The Python ``executorch.runtime`` module wraps the C++ ExecuTorch runtime. It can load and execute serialized ``.pte`` program files: see the `Export to ExecuTorch Tutorial <tutorials/export-to-executorch-tutorial.html>`__ for how to convert a PyTorch ``nn.Module`` to an ExecuTorch ``.pte`` program file. Execution accepts and returns ``torch.Tensor`` values, making it a quick way to validate the correctness of the program.

Lines changed: 177 additions & 2 deletions
@@ -1,3 +1,178 @@
-# Model Export
-Placeholder for top-level export documentation

# Model Export and Lowering

This section describes the process of taking a PyTorch model and converting it to the runtime format used by ExecuTorch. This process is commonly known as "exporting", as it uses the PyTorch export functionality to convert a PyTorch model into a format suitable for on-device execution. It yields a .pte file which is optimized for on-device execution using a particular backend.

## Prerequisites

Exporting requires the ExecuTorch Python libraries to be installed, typically by running `pip install executorch`. See [Installation](getting-started.md#installation) for more information. This process assumes you have a PyTorch model, can instantiate it from Python, and can provide example input tensors to run the model.

## The Export and Lowering Process

The process to export and lower a model to the .pte format typically involves the following steps, sketched in code after the list:

1) Select a backend to target.
2) Prepare the PyTorch model, including inputs and shape specification.
3) Export the model using torch.export.export.
4) Optimize the model for the target backend using to_edge_transform_and_lower.
5) Create the .pte file by calling to_executorch and serializing the output.
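
The following is a compact sketch of these steps, assuming a hypothetical `MyModel` class; the full worked example appears in the [Export and Lowering](#export-and-lowering) section below.

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner  # 1) target backend
from executorch.exir import to_edge_transform_and_lower

model = MyModel().eval()                                # 2) prepare the model (MyModel is a placeholder)
example_inputs = (torch.randn(1, 3, 224, 224),)         #    and example inputs
exported = torch.export.export(model, example_inputs)   # 3) export with torch.export
edge = to_edge_transform_and_lower(                     # 4) optimize for the chosen backend
    exported, partitioner=[XnnpackPartitioner()]
)
with open("model.pte", "wb") as f:                      # 5) serialize the .pte file
    f.write(edge.to_executorch().buffer)
```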

<br/>

Quantization - the process of using reduced precision to reduce inference time and memory footprint - is also commonly done at this stage. See [Quantization Overview](quantization-overview.md) for more information.
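
As a rough sketch only, the PT2E post-training quantization flow with the XNNPACK quantizer looks roughly like the following, run before lowering. Import paths and the capture API differ across PyTorch and ExecuTorch versions, so treat the specifics here as assumptions and follow the Quantization Overview for the supported recipe.

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Assumes a `model` and example `inputs` tuple like those defined in the
# Model Preparation section below. The capture API name varies by version.
captured = torch.export.export_for_training(model, inputs).module()

# Annotate the graph with 8-bit symmetric quantization for XNNPACK.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)

# Calibrate with representative inputs, then convert to a quantized graph.
prepared(*inputs)
quantized = convert_pt2e(prepared)

# The quantized module is then exported and lowered as described below.
exported_program = torch.export.export(quantized, inputs)
```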

## Hardware Backends

ExecuTorch backends provide hardware acceleration for a specific hardware target. In order to achieve maximum performance on target hardware, ExecuTorch optimizes the model for a specific backend during the export and lowering process. This means that the resulting .pte file is specialized for the specific hardware. In order to deploy to multiple backends, such as Core ML on iOS and Arm CPU on Android, it is common to generate a dedicated .pte file for each.

The choice of hardware backend is informed by the hardware that the model is intended to be deployed on. Each backend has specific hardware requirements and levels of model support. See the documentation for each hardware backend for more details.

As part of the .pte file creation process, ExecuTorch identifies portions of the model (partitions) that are supported by the given backend. These sections are processed by the backend ahead of time to support efficient execution. Portions of the model that are not supported by the delegate, if any, are executed using the portable fallback implementation on CPU. This allows for partial model acceleration when not all model operators are supported on the backend, but it may have negative performance implications. In addition, multiple partitioners can be specified in order of priority. This allows, for example, operators not supported on GPU to run on CPU via XNNPACK.
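
For illustration, the following is a minimal sketch of specifying partitioners in priority order so that operators the GPU backend cannot handle fall back to XNNPACK on CPU. The `VulkanPartitioner` import path is an assumption based on that backend's Python package layout; check the Vulkan backend documentation for the exact name.

```python
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner  # assumed import path
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Assumes `exported_program` was produced by torch.export.export, as shown below.
# Partitioners are applied in order: Vulkan (GPU) first, then XNNPACK (CPU);
# any remaining operators run on the portable CPU fallback.
executorch_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VulkanPartitioner(), XnnpackPartitioner()],
).to_executorch()
```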

### Available Backends

Commonly used hardware backends are listed below. For mobile, consider using XNNPACK for Android and XNNPACK or Core ML for iOS. To create a .pte file for a specific backend, pass the appropriate partitioner class to `to_edge_transform_and_lower`. See the appropriate backend documentation and the [Export and Lowering](#export-and-lowering) section below for more information.

- [XNNPACK (Mobile CPU)](native-delegates-executorch-xnnpack-delegate.md)
- [Core ML (iOS)](native-delegates-executorch-coreml-delegate.md)
- [Metal Performance Shaders (iOS GPU)](native-delegates-executorch-mps-delegate.md)
- [Vulkan (Android GPU)](native-delegates-executorch-vulkan-delegate.md)
- [Qualcomm NPU](native-delegates-executorch-qualcomm-delegate.md)
- [MediaTek NPU](native-delegates-executorch-mediatek-delegate.md)
- [Arm Ethos-U NPU](native-delegates-executorch-arm-ethos-u-delegate.md)
- [Cadence DSP](native-delegates-executorch-cadence-delegate.md)

## Model Preparation

The export process takes in a standard PyTorch model, typically a `torch.nn.Module`. This can be a custom model definition or a model from an existing source, such as TorchVision or Hugging Face. See [Getting Started with ExecuTorch](getting-started.md) for an example of lowering a TorchVision model.

Model export is done from Python, commonly through a Python script or an interactive notebook such as Jupyter or Colab. The example below shows the instantiation and inputs for a simple PyTorch model. The inputs are prepared as a tuple of torch.Tensors, and the model can run with these inputs.

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Conv2d(1, 8, 3),
            torch.nn.ReLU(),
            torch.nn.Conv2d(8, 16, 3),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d([1,1])
        )
        self.linear = torch.nn.Linear(16, 10)

    def forward(self, x):
        y = self.seq(x)
        y = torch.flatten(y, 1)
        y = self.linear(y)
        return y

model = Model()
inputs = (torch.randn(1,1,16,16),)
outputs = model(*inputs)
print(f"Model output: {outputs}")
```

## Export and Lowering

To actually export and lower the model, call `export`, `to_edge_transform_and_lower`, and `to_executorch` in sequence. This yields an ExecuTorch program which can be serialized to a file. Putting it all together, lowering the example model above using the XNNPACK delegate for mobile CPU performance can be done as follows:

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower
from torch.export import Dim, export

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Conv2d(1, 8, 3),
            torch.nn.ReLU(),
            torch.nn.Conv2d(8, 16, 3),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d([1,1])
        )
        self.linear = torch.nn.Linear(16, 10)

    def forward(self, x):
        y = self.seq(x)
        y = torch.flatten(y, 1)
        y = self.linear(y)
        return y

model = Model()
inputs = (torch.randn(1,1,16,16),)
dynamic_shapes = {
    "x": {
        2: Dim("h", min=16, max=1024),
        3: Dim("w", min=16, max=1024),
    }
}

exported_program = export(model, inputs, dynamic_shapes=dynamic_shapes)
executorch_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("model.pte", "wb") as file:
    file.write(executorch_program.buffer)
```

This yields a `model.pte` file which can be run on mobile devices.

### Supporting Varying Input Sizes (Dynamic Shapes)

The PyTorch export process uses the example inputs provided to trace through the model and reason about the size and type of tensors at each step. Unless told otherwise, export will assume a fixed input size equal to the example inputs and will use this information to optimize the model.

Many models require support for varying input sizes. To support this, export takes a `dynamic_shapes` parameter, which informs the compiler of which dimensions can vary and their bounds. This takes the form of a nested dictionary, where keys correspond to input names and values specify which dimensions are dynamic and the bounds for each.

In the example model, inputs are provided as 4-dimensional tensors following the standard convention of batch, channels, height, and width (NCHW). An input with the shape `[1, 1, 16, 16]` indicates 1 batch, 1 channel, and a height and width of 16.

Suppose your model supports images with sizes between 16x16 and 1024x1024. The shape bounds can be specified as follows:

```python
dynamic_shapes = {
    "x": {
        2: Dim("h", min=16, max=1024),
        3: Dim("w", min=16, max=1024),
    }
}

ep = torch.export.export(model, inputs, dynamic_shapes=dynamic_shapes)
```

In the above example, `"x"` corresponds to the parameter name in `Model.forward`. The 2 and 3 keys correspond to dimensions 2 and 3, which are height and width. As there are no specifications for the batch and channel dimensions, these values are fixed according to the example inputs.

ExecuTorch uses the shape bounds both to optimize the model and to plan memory for model execution. For this reason, it is advised to set the dimension upper bounds no higher than needed, as higher bounds increase memory consumption.

For more complex use cases, the dynamic shape specification allows for mathematical relationships between dimensions. For more information, see [Expressing Dynamism](https://pytorch.org/docs/stable/export.html#expressing-dynamism).
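
As a hypothetical illustration, a derived `Dim` expression can tie one input's dimension to another's. The two-input model and its example tensors below are placeholders, not part of the example above.

```python
from torch.export import Dim, export

h = Dim("h", min=16, max=512)
w = Dim("w", min=16, max=512)

# Hypothetical model taking two NCHW images, with forward(self, x, y), where
# the second image is always twice as wide as the first. Batch and channel
# dimensions stay fixed to the example inputs.
dynamic_shapes = {
    "x": {2: h, 3: w},
    "y": {2: h, 3: 2 * w},
}

ep = export(two_input_model, (example_x, example_y), dynamic_shapes=dynamic_shapes)
```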

## Testing the Model

Before integrating the runtime code, it is common to test the exported model from Python. This can be used to evaluate model accuracy and sanity-check behavior before moving to the target device. Note that not all hardware backends are available from Python, as they may require specialized hardware to function. See the specific backend documentation for more information on hardware requirements and the availability of simulators. The XNNPACK delegate used in this example is always available on host machines.

```python
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()

input_tensor = torch.randn(1, 1, 32, 32)
program = runtime.load_program("model.pte")
method = program.load_method("forward")
outputs = method.execute([input_tensor])
```
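
As a quick numerical sanity check, the runtime output can be compared against eager PyTorch execution. The snippet below assumes the `model` instance from the export example above is still in scope; the tolerance is an arbitrary choice.

```python
# Compare the ExecuTorch runtime output against eager PyTorch execution.
eager_output = model(input_tensor)
print(torch.allclose(eager_output, outputs[0], atol=1e-4))
```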

For more information, see [Runtime API Reference](executorch-runtime-api-reference.md).

## Next Steps

The PyTorch and ExecuTorch export and lowering APIs provide a high level of customizability to meet the needs of diverse hardware and models. See [torch.export](https://pytorch.org/docs/main/export.html) and [Export API Reference](export-to-executorch-api-reference.md) for more information.

For advanced use cases, see the following:
- [Quantization Overview](quantization-overview.md) for information on quantizing models to reduce inference time and memory footprint.
- [Memory Planning](compiler-memory-planning.md) for information on controlling memory placement and planning.
- [Custom Compiler Passes](compiler-custom-compiler-passes.md) for information on writing custom compiler passes.
- [Export IR Specification](ir-exir.md) for information on the intermediate representation generated by export.
