
[release/0.1] Cherrypick docs-only commits from main #923


Merged 11 commits on Oct 15, 2023
6 changes: 6 additions & 0 deletions README.md
@@ -30,6 +30,12 @@ and feedback about ExecuTorch using the tag **#executorch** and
our [GitHub repository](https://github.com/pytorch/executorch/issues)
for bug reporting.

The ExecuTorch code and APIs are still changing quickly, and there are not yet
any guarantees about forward/backward source compatibility. We recommend using
the latest `v#.#.#` release tag from the
[Releases](https://github.com/pytorch/executorch/releases) page when
experimenting with this preview release.

## Directory Structure [WIP]

2 changes: 1 addition & 1 deletion docs/source/compiler-backend-dialect.md
@@ -42,7 +42,7 @@ To lower edge ops to backend ops, a pass will perform pattern matching to identify…
* `transform()`. An API on ExportProgram that allows users to provide custom passes. Note that this is not guarded by any validator so the soundness of the program is not guaranteed.
* [ExecutorchBackendConfig.passes](https://github.com/pytorch/executorch/blob/main/exir/capture/_config.py#L40). If added here, the pass will be part of the lowering process from backend dialect to ExecutorchProgram.

Example: one such pass is QuantFusion. This pass takes a "canonical quantization pattern", i.e. "dequant - some_op - quant", and fuses it into a single backend-specific operator, i.e. `quantized_decomposed::some_op`. Another, simpler example is [here](https://github.com/pytorch/executorch/blob/main/exir/passes/replace_edge_with_backend_pass.py#L20), where we replace `sym_size` operators with ones that ExecuTorch understands.
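
To make the mechanics concrete, here is a minimal `torch.fx` sketch of such a fusion pass. It is not the actual QuantFusion implementation; `dequant`, `quant`, and `fused_relu_quantized` are hypothetical stand-ins for the real quantized operators.

```python
import torch
import torch.fx as fx

# Hypothetical stand-ins for the real quantize/dequantize ops and the fused
# backend op; fx.wrap keeps them as opaque calls in the traced graph.
@fx.wrap
def dequant(x):
    return x.to(torch.float32)

@fx.wrap
def quant(x):
    return x.to(torch.int8)

@fx.wrap
def fused_relu_quantized(x):
    return torch.relu(x.to(torch.float32)).to(torch.int8)

def fuse_dequant_relu_quant(gm: fx.GraphModule) -> fx.GraphModule:
    """Rewrite quant(relu(dequant(x))) into a single fused call."""
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target is not quant:
            continue
        relu_node = node.args[0]
        if not (isinstance(relu_node, fx.Node) and relu_node.target is torch.relu):
            continue
        dq_node = relu_node.args[0]
        if not (isinstance(dq_node, fx.Node) and dq_node.target is dequant):
            continue
        # Insert the fused op and reroute every user of the pattern to it.
        with gm.graph.inserting_after(node):
            fused = gm.graph.call_function(fused_relu_quantized, (dq_node.args[0],))
        node.replace_all_uses_with(fused)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```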


### Pattern Binding Decorator
26 changes: 14 additions & 12 deletions docs/source/concepts.md
@@ -11,9 +11,9 @@ Fundamentally, it is a tensor library on top of which almost all other Python and…

## [ATen Dialect](./ir-exir.md#aten-dialect)

ATen dialect is the immediate result of exporting an eager module to a graph representation. It is the entry point of the ExecuTorch compilation pipeline; after exporting to ATen dialect, subsequent passes can lower to [Core ATen dialect](./concepts.md#core-aten-dialect) and [Edge dialect](./concepts.md#edge-dialect).

ATen dialect is a valid [EXIR](./concepts.md#exir) with additional properties. It consists of functional ATen operators, higher order operators (like control flow operators) and registered custom operators.

The goal of ATen dialect is to capture users’ programs as faithfully as possible.
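
A small sketch of entering ATen dialect via `torch.export` (API surface as of this release; the toy module is illustrative):

```python
import torch
from torch.export import export

class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# The exported program's graph is in ATen dialect.
ep = export(Add(), (torch.randn(2), torch.randn(2)))
print(ep.graph_module.graph)
```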

Expand All @@ -26,15 +26,15 @@ ATen mode uses the ATen implementation of Tensor (`at::Tensor`) and related type

## Autograd safe ATen Dialect

Autograd safe ATen dialect includes only differentiable ATen operators, along with higher order operators (control flow ops) and registered custom operators.

## Backend

Specific hardware (like a GPU or NPU) or a software stack (like XNNPACK) that consumes a graph, or part of it, with performance and efficiency benefits.

## [Backend Dialect](./ir-exir.md#backend-dialect)

Backend dialect is the immediate result of exporting Edge dialect to a specific backend. It’s target-aware, and may contain operators or submodules that are only meaningful to the target backend. This dialect allows the introduction of target-specific operators that do not conform to the schema defined in the Core ATen Operator Set and are not shown in ATen or Edge dialect.

## Backend registry

@@ -56,7 +56,7 @@ An open-source, cross-platform family of tools designed to build, test and package…

In ExecuTorch, code generation is used to generate the [kernel registration library](./kernel-library-selective_build.md).

## [Core ATen Dialect](https://pytorch.org/docs/stable/torch.compiler_ir.html#irs)

Core ATen dialect contains the core ATen operators along with higher order operators (control flow) and registered custom operators.

@@ -66,15 +66,11 @@ A select subset of the PyTorch ATen operator library. Core ATen operators will not…

## Core ATen Decomposition Table

Decomposing an operator means expressing it as a combination of other operators. During the AOT process, a default list of decompositions is employed, breaking down ATen operators into core ATen operators. This is referred to as the Core ATen Decomposition Table.

## [Custom operator](https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit?fbclid=IwAR1qLTrChO4wRokhh_wHgdbX1SZwsU-DUv1XE2xFq0tIKsZSdDLAe6prTxg#heading=h.ahugy69p2jmz)

These are operators that aren't part of the ATen library, but which appear in [eager mode](./concepts.md#eager-mode). Registered custom operators are registered into the current PyTorch eager mode runtime, usually with a `TORCH_LIBRARY` call. They are most likely associated with a specific target model or hardware platform. For example, [torchvision::roi_align](https://pytorch.org/vision/main/generated/torchvision.ops.roi_align.html) is a custom operator widely used by torchvision (it doesn't target specific hardware).
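
For illustration, a minimal sketch of registering a custom operator from Python via `torch.library` (the Python counterpart of a `TORCH_LIBRARY` call); the `my_ns::add_one` operator is hypothetical:

```python
import torch
from torch.library import Library, impl

# Define a hypothetical custom operator in the "my_ns" namespace.
my_lib = Library("my_ns", "DEF")
my_lib.define("add_one(Tensor x) -> Tensor")

# Register a CPU implementation so the op is callable from eager mode.
@impl(my_lib, "add_one", "CPU")
def add_one(x):
    return x + 1

print(torch.ops.my_ns.add_one(torch.zeros(3)))  # tensor([1., 1., 1.])
```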

## DataLoader

@@ -94,7 +90,7 @@ Data type, the type of data (e.g. float, integer, etc.) in a tensor.

## [Dynamic Quantization](https://pytorch.org/docs/main/quantization.html#general-quantization-flow)

A method of quantizing wherein tensors are quantized on the fly during inference. This is in contrast to [static quantization](./concepts.md#static-quantization), where tensors are quantized before inference.

## Dynamic shapes

@@ -160,6 +156,12 @@ An EXIR Graph is a PyTorch program represented in the form of a DAG (directed acyclic graph)…

In graph mode, operators are first synthesized into a graph, which will then be compiled and executed as a whole. This is in contrast to eager mode, where operators are executed as they are encountered. Graph mode typically delivers higher performance as it allows optimizations such as operator fusion.

## Higher Order Operators

A higher order operator (HOP) is an operator that accepts a Python function as input, returns a Python function as output, or both. Like all PyTorch operators, higher order operators have an optional implementation for backends and functionalities; this lets us, for example, register an autograd formula for a higher order operator, or define how it behaves under ProxyTensor tracing. A minimal example follows.
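
For example, `cond` is a higher order operator that takes both branch functions as inputs (a sketch; the import path for `cond` has moved across PyTorch releases):

```python
import torch
from functorch.experimental.control_flow import cond  # import path may vary by PyTorch version

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

class M(torch.nn.Module):
    def forward(self, x):
        # `cond` takes two functions; both branches are captured in the exported graph.
        return cond(x.sum() > 0, true_fn, false_fn, (x,))
```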

## Hybrid Quantization

A quantization technique where different parts of the model are quantized with different techniques based on computational complexity and sensitivity to accuracy loss. Some parts of the model may not be quantized to retain accuracy.
5 changes: 2 additions & 3 deletions docs/source/index.rst
@@ -143,8 +143,6 @@ Topics in this section will help you get started with ExecuTorch.

runtime-overview
runtime-backend-delegate-implementation-and-linking
runtime-platform-abstraction-layer

.. toctree::
@@ -153,7 +151,7 @@ Topics in this section will help you get started with ExecuTorch.
:caption: Quantization
:hidden:

quantization-overview

.. toctree::
:glob:
@@ -186,6 +184,7 @@ Topics in this section will help you get started with ExecuTorch.
sdk-profiling
sdk-inspector
sdk-delegate-integration
sdk-tutorial

Tutorials and Examples
~~~~~~~~~~~~~~~~~~~~~~
8 changes: 4 additions & 4 deletions docs/source/native-delegates-executorch-xnnpack-delegate.md
@@ -109,10 +109,10 @@ Here we initialize the `XNNPACKQuantizer` and set the quantization config to be…

We can then configure the `XNNPACKQuantizer` as we wish. We set the following configs below as an example:
```python
(quantizer
    .set_global(quantization_config)
    .set_object_type(torch.nn.Conv2d, quantization_config)  # can configure by module type
    .set_object_type(torch.nn.functional.linear, quantization_config)  # or by torch functional op type
    .set_module_name("foo.bar", quantization_config)  # or by module fully qualified name
)
```
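
For reference, the `quantization_config` used above can be produced by the quantizer's helper (a sketch assuming the XNNPACK quantizer utilities shipped with PyTorch at the time of writing):

```python
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
)

# Symmetric quantization with per-channel weights is a common starting point.
quantization_config = get_symmetric_quantization_config(is_per_channel=True)
```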

### Quantizing your model with the XNNPACKQuantizer
3 changes: 0 additions & 3 deletions docs/source/quantization-custom-quantization.md

This file was deleted.

16 changes: 16 additions & 0 deletions docs/source/quantization-overview.md
@@ -0,0 +1,16 @@
# Quantization Overview
Quantization is a process that reduces the precision of computations and lowers the memory footprint of a model. To learn more, please visit the [ExecuTorch concepts page](./concepts.md#quantization). Quantization is particularly useful for edge devices, which typically have limited resources such as processing power, memory, and battery life; it makes models more efficient and enables them to run effectively on these devices.

In terms of flow, quantization happens early in the ExecuTorch stack:

![ExecuTorch Entry Points](/_static/img/executorch-entry-points.png)

A more detailed workflow can be found in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).

Quantization is usually tied to execution backends that have quantized operators implemented. Thus each backend is opinionated about how the model should be quantized, which is expressed in a backend-specific ``Quantizer`` class. A ``Quantizer`` provides an API for modeling users to specify how they want their model to be quantized, and it passes the user's intention on to the quantization workflow.

Backend developers will need to implement their own ``Quantizer`` to express how different operators or operator patterns are quantized in their backend. This is accomplished via the [Annotation API](https://pytorch.org/tutorials/prototype/pt2e_quantizer.html) provided by the quantization workflow. Since a ``Quantizer`` is also user facing, it exposes specific APIs for modeling users to configure how they want the model to be quantized. Each backend should provide its own API documentation for its ``Quantizer``.

Modeling users use the ``Quantizer`` specific to their target backend, e.g. ``XNNPACKQuantizer``, to quantize their model.
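
As a concrete sketch of that flow (assuming the PT2E quantization APIs available at the time of writing; ``MyModel`` and the example input are placeholders):

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = MyModel().eval()                         # placeholder eager-mode model
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the model, annotate it with the backend-specific quantizer,
# calibrate on representative data, then convert to a quantized model.
exported = capture_pre_autograd_graph(model, example_inputs)
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)
quantized = convert_pt2e(prepared)
```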

For an example quantization flow with ``XNNPACKQuantizer``, and for more documentation and tutorials, please see the ``Performing Quantization`` section in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
3 changes: 0 additions & 3 deletions docs/source/runtime-custom-memory-allocator.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/runtime-error-handling.md

This file was deleted.

2 changes: 0 additions & 2 deletions docs/source/runtime-overview.md
@@ -159,8 +159,6 @@ For more details about the ExecuTorch runtime, please see:
* [Runtime API Tutorial](running-a-model-cpp-tutorial.md)
* [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md)
* [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md)
* [Runtime Profiling](sdk-profiling.md)
* [Backends and Delegates](compiler-delegate-and-partitioner.md)
* [Backend Delegate Implementation](runtime-backend-delegate-implementation-and-linking.md)
84 changes: 41 additions & 43 deletions docs/source/sdk-bundled-io.md
@@ -67,15 +67,13 @@ Here is a flow highlighting how to generate a `BundledProgram` given a PyTorch model…
```python

import torch
from torch.export import export

from executorch.bundled_program.config import BundledConfig
from executorch.bundled_program.core import create_bundled_program
from executorch.bundled_program.serialize import serialize_from_bundled_program_to_flatbuffer
from executorch.bundled_program.serialize import deserialize_from_flatbuffer_to_bundled_program


from executorch.exir import to_edge


# Step 1: ExecuTorch Program Export
@@ -118,13 +116,14 @@ capture_inputs = {
for m_name in method_names
}

# Export an FX graph for each method of the model, keyed by method name.
method_graphs = {
    m_name: export(getattr(model, m_name), capture_inputs[m_name])
    for m_name in method_names
}

# Emit the traced methods into an ExecuTorch program.
program = to_edge(method_graphs).to_executorch().executorch_program

# Step 2: Construct BundledConfig

@@ -291,12 +290,13 @@ Here's an example where the dtype of the test input does not meet the model's requirement:

```python
import torch
from torch.export import export

from executorch.bundled_program.config import BundledConfig
from executorch.bundled_program.core import create_bundled_program

from executorch.exir import to_edge


class Module(torch.nn.Module):
def __init__(self):
@@ -318,16 +318,14 @@ method_names = ['forward']
inputs = torch.ones(2, 2, dtype=torch.float)
print(model(inputs))

# Export an FX graph for each method of the model, keyed by method name.
method_graphs = {
    m_name: export(getattr(model, m_name), (inputs,))
    for m_name in method_names
}

# Emit the traced methods into an ExecuTorch program.
program = to_edge(method_graphs).to_executorch().executorch_program


# number of input sets to be verified
@@ -416,12 +414,13 @@ Another common error is a method name in `BundledConfig` that does not exist…

```python
import torch
from torch.export import export

from executorch.bundled_program.config import BundledConfig
from executorch.bundled_program.core import create_bundled_program

from executorch.exir import to_edge



class Module(torch.nn.Module):
@@ -440,23 +439,18 @@

model = Module()

method_names = ['forward']

inputs = torch.ones(2, 2, dtype=torch.float)
print(model(inputs))

# Export an FX graph for each method of the model, keyed by method name.
method_graphs = {
    m_name: export(getattr(model, m_name), (inputs,))
    for m_name in method_names
}

# Emit the traced methods into an ExecuTorch program.
program = to_edge(method_graphs).to_executorch().executorch_program

# Number of input sets to be verified
n_input = 10
@@ -476,7 +470,11 @@ expected_outputs = [
[[model(*x)] for x in inputs[0]]
]

# NOTE: MISSING_METHOD_NAME is not an inference method in the above model.
wrong_method_names = ['MISSING_METHOD_NAME']

bundled_config = BundledConfig(wrong_method_names, inputs, expected_outputs)

bundled_program = create_bundled_program(program, bundled_config)

@@ -518,6 +516,6 @@ File /executorch/bundled_program/core.py:147, in assert_valid_bundle(program, bundled_config)…
150 but {str(method_name_of_bundled_config - method_name_of_program)} does not include."
152 # check if has been sorted in ascending alphabetical order of method name.
153 for bp_plan_id in range(1, len(bundled_config.execution_plan_tests)):
AssertionError: All method names in bundled config should be found in program.execution_plan, but {'MISSING_METHOD_NAME'} does not include.
```
:::
18 changes: 6 additions & 12 deletions docs/source/sdk-inspector.rst
@@ -26,21 +26,21 @@ Inspector Methods
Constructor
~~~~~~~~~~~

.. autofunction:: sdk.Inspector.__init__

**Example Usage:**

.. code:: python

from executorch.sdk import Inspector

inspector = Inspector(etdump_path="/path/to/etdump.etdp", etrecord_path="/path/to/etrecord.bin")


print_data_tabular
~~~~~~~~~~~~~~~~~~

.. autofunction:: sdk.Inspector.print_data_tabular

.. _example-usage-1:

@@ -56,7 +56,7 @@ print_data_tabular
find_total_for_module
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: sdk.Inspector.find_total_for_module

.. _example-usage-2:

@@ -74,7 +74,7 @@ find_total_for_module
get_exported_program
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: sdk.Inspector.get_exported_program

.. _example-usage-3:

@@ -119,13 +119,7 @@ of an ``Inspector`` instance, for example:
~~~~~~~~~~~~~~~

Access ``Event`` instances through the ``events`` attribute of an
``EventBlock`` instance.
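
For instance, a minimal sketch (the ``name`` field is an assumption for
illustration; consult the ``Event`` class below for the actual fields):

.. code:: python

   for event_block in inspector.event_blocks:
       for event in event_block.events:
           print(event.name)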

.. autoclass:: sdk.inspector.inspector.Event

3 changes: 3 additions & 0 deletions docs/source/sdk-tutorial.md
@@ -0,0 +1,3 @@
## SDK usage tutorial

Please refer to the [SDK tutorial](./tutorials/sdk-integration-tutorial) for a walkthrough on how to profile a model in ExecuTorch using the SDK.