pytorch
diff --git a/backends/arm/CMakeLists.txt b/backends/arm/CMakeLists.txt
Lines changed: 2 additions & 2 deletions
diff --git a/backends/arm/README.md b/backends/arm/README.md
Lines changed: 12 additions & 11 deletions
diff --git a/backends/arm/arm_backend.py b/backends/arm/arm_backend.py
Lines changed: 20 additions & 132 deletions
diff --git a/backends/arm/arm_vela.py b/backends/arm/arm_vela.py
Lines changed: 2 additions & 3 deletions
diff --git a/backends/arm/ethosu_backend.py b/backends/arm/ethosu_backend.py
Lines changed: 86 additions & 0 deletions
@@ -1,4 +1,4 @@
-# Copyright 2023 Arm Limited and/or its affiliates.
+# Copyright 2023, 2025 Arm Limited and/or its affiliates.
 #
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
@@ -22,7 +22,7 @@ set(THIRD_PARTY_ROOT "${CMAKE_CURRENT_SOURCE_DIR}/third-party")
 set(DRIVER_ETHOSU_INCLUDE_DIR "${THIRD_PARTY_ROOT}/ethos-u-core-driver/include")
 include_directories(${DRIVER_ETHOSU_INCLUDE_DIR})
 
-set(_arm_baremetal_sources backends/arm/runtime/ArmBackendEthosU.cpp
+set(_arm_baremetal_sources backends/arm/runtime/EthosUBackend.cpp
                            backends/arm/runtime/VelaBinStream.cpp
 )
 list(TRANSFORM _arm_baremetal_sources PREPEND "${EXECUTORCH_ROOT}/")
 
@@ -15,7 +15,7 @@ ethos-u-vela compilation stack. which follows the fully AoT flow.
 ## Layout
 
 Export:
-- `arm_backend.py` - Main entrypoint for the ArmPartitioner and ArmBackend. For more information see the section on
+- `ethosu_backend.py` - Main entrypoint for the EthosUBackend. For more information see the section on
 [Arm Backend Architecture](#arm-backend-architecture). For examples of use see `executorch/examples/arm`.
 - `tosa_mapping.py` - utilities for mapping edge dialect to TOSA
 - `tosa_quant_utils.py` - utilities for mapping quantization information to TOSA encoding
@@ -29,11 +29,11 @@ Passes:
 - `*_pass.py` - Compiler passes derived from ExportPass
 
 Quantization:
-- `arm_quantizer.py` - Quantizer for Arm backend
+- `arm_quantizer.py` - Quantizers for Arm backend. Contains the EthosUQuantizer which inherits from the TOSAQuantizer
 - `arm_quantizer_utils.py` - Utilities for quantization
 
 Runtime:
-- `runtime/ArmBackendEthosU.cpp` - The Arm backend implementation of the ExecuTorch runtime backend (BackendInterface) for Ethos-U
+- `runtime/EthosUBackend.cpp` - The Arm backend implementation of the ExecuTorch runtime backend (BackendInterface) for Ethos-U
 
 Other:
 - `third-party/` - Dependencies on other code - in particular the TOSA serialization_lib for compiling to TOSA and the ethos-u-core-driver for the bare-metal backend supporting Ethos-U
@@ -177,6 +177,7 @@ create an issue on [github](https://www.github.com/pytorch/executorch/issues).
 # Arm Backend Architecture
 
 The broad principle with the Arm backend implemention for ExecuTorch is to support multiple Arm devices and device configurations through a largely Homogeneous flow with maximal sharing of class logic.
+The EthosUBackend is currently the only user-facing API that targets the Ethos-U55 and Ethos-U85 hardware IP. It uses the TOSABackend under the hood to share code and functionality, and to allow the TOSA flow itself to be tested separately.
 
 In practice for compilation, this means that the flow goes via [Arm TOSA](https://www.mlplatform.org/tosa/tosa_spec.html) to produce a common IR and quantization behaviour compatible with our various IP, and typically, device-specific backends to further lower to a device specific binary which can happen ahead of time (within the Python development flow) or at runtime (during a JIT compilation stage).
 
@@ -185,22 +186,22 @@ In practice for the runtime, this means we will share common runtime backend fun
 
 ## Arm Backend Status and Maturity
 
-The Arm Backend should be considered a prototype quality at this point, likely subject to significant change and improvement, and with a limited coverage of functionality. We are actively developing this codebase.
+The Arm EthosU Backend should be considered prototype quality at this point, likely subject to significant change and improvement, and with limited coverage of functionality. We are actively developing this codebase.
 
 ## Current flows
 
-The ArmBackend has a two stage process,
-- Compile to TOSA to rationalise the graph into known hardware support profiles. Currently this is to v0.80 TOSA BI with specific concern to a subset which gives support on Ethos-U55, the target of the initial prototype efforts.
+The EthosUBackend has a two stage process,
+- Compile to TOSA to rationalise the graph into known hardware support profiles. Currently this is to v0.80 TOSA BI with specific concern to a subset which gives support on Ethos-U55 and Ethos-U85, the targets of the initial prototype efforts. This calls into the TOSABackend.
 - Lower via the ethos-u-vela compilation flow which takes TOSA v0.80 as an input and produces a low level commandstream for the hardware which is then passed via the delegate to the ethos-u-core-driver for direct execution.
 
-The ArmPartitioner is currenly used to ensure the operations converted are Ethos-U compatible, but will be extended to offer spec-correct TOSA Base inference and TOSA Main Inference generation in future.
+The EthosUPartitioner is currently used to ensure the operations converted are Ethos-U compatible, but will be extended to offer spec-correct TOSA Base inference and TOSA Main Inference generation in future.
+
+There is also a generic TOSABackend with accompanying TOSAPartitioner and TOSAQuantizer, which are used by the EthosUBackend and friends. The Arm TOSA Backend can be used on its own to verify the lowering to the TOSA representation of the model (refer to the unit tests in backends/arm/test, which use the TOSA backend in the test suites).
 
 ### Controlling compilation
 
 It is possible to control the compilation flow to aid in development and debug of both networks and the code itself.
 
-Configuration of the ArmBackend export flow is controlled by CompileSpec information (essentially used as compilation flags) to determine which of these outputs is produced. In particular this allows for use of the tosa_reference_model to run intermediate output to check for correctness and quantization accuracy without a full loop via hardware implemntation.
-
-As this is in active development see the ArmBackend for accurate information on [compilation flags](https://github.com/pytorch/executorch/blob/29f6dc9353e90951ed3fae3c57ae416de0520067/backends/arm/arm_backend.py#L319-L324)
+Configuration of the EthosUBackend export flow is controlled by CompileSpec information (essentially used as compilation flags) to determine which of these outputs is produced. In particular this allows for use of the tosa_reference_model to run intermediate output to check for correctness and quantization accuracy without a full loop via hardware implementation.
 
-You can also refer to the [example TOSA end-to-end code](/examples/arm/arm_tosa_e2e.py)
+As this is in active development see the EthosUBackend for accurate information on [compilation flags](https://github.com/pytorch/executorch/blob/29f6dc9353e90951ed3fae3c57ae416de0520067/backends/arm/arm_backend.py#L319-L324)
@@ -12,35 +12,16 @@
 #
 
 import logging
-import os
-from typing import cast, final, List, Optional
 
-import serializer.tosa_serializer as ts  # type: ignore
-from executorch.backends.arm.arm_vela import vela_compile
-from executorch.backends.arm.operators.node_visitor import get_node_visitors
+from typing import List, Optional
 
 from executorch.backends.arm.tosa_specification import TosaSpecification
-from executorch.backends.arm._passes.arm_pass_manager import (
-    ArmPassManager,
-)  # usort: skip
-from executorch.backends.arm.process_node import (
-    process_call_function,
-    process_output,
-    process_placeholder,
-)
-from executorch.backends.arm.tosa_utils import dbg_fail, dbg_tosa_dump
-from executorch.exir.backend.backend_details import BackendDetails, PreprocessResult
+
 from executorch.exir.backend.compile_spec_schema import CompileSpec
-from torch.export.exported_program import ExportedProgram
-from torch.fx import Node
 
-# TOSA backend debug functionality
+
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.WARNING)
-TOSA_DBG_VERBOSE = os.environ.get("TOSA_DBG_VERBOSE") == "1"
-if TOSA_DBG_VERBOSE:
-    logging.basicConfig(level=logging.INFO)
-    logger.setLevel(logging.INFO)
 
 
 class ArmCompileSpecBuilder:
@@ -49,7 +30,6 @@ def __init__(self):
         self.compiler_flags = []
         self.output_format = None
         self.path_for_intermediates = None
-        self.tosa_version = None
         self.tosa_spec = None
         self.input_order = None
 
@@ -130,7 +110,7 @@ def build(self) -> List[CompileSpec]:
         assert self.tosa_spec
 
         # Always supply a TOSA version
-        self.compile_spec = [CompileSpec("tosa_version", str(self.tosa_spec).encode())]
+        self.compile_spec = [CompileSpec("tosa_spec", str(self.tosa_spec).encode())]
 
         if self.output_format == "vela":
             self.compile_spec += [
@@ -156,125 +136,33 @@ def build(self) -> List[CompileSpec]:
 
 
 def is_tosa(compile_spec: List[CompileSpec]) -> bool:
+    has_tosa_output = False
+    has_tosa_spec = False
+    for spec in compile_spec:
+        if spec.key == "output_format":
+            has_tosa_output = spec.value.decode() == "tosa"
+        if spec.key == "tosa_spec":
+            has_tosa_spec = True
+
+    return has_tosa_output and has_tosa_spec
+
+
+def is_ethosu(compile_spec: List[CompileSpec]) -> bool:
     for spec in compile_spec:
         if spec.key == "output_format":
-            return spec.value.decode() == "tosa"
+            return spec.value.decode() == "vela"
     return False
 
 
-def get_tosa_version(compile_spec: List[CompileSpec]) -> TosaSpecification:
+def get_tosa_spec(compile_spec: List[CompileSpec]) -> TosaSpecification:
     for spec in compile_spec:
-        if spec.key == "tosa_version":
+        if spec.key == "tosa_spec":
             return TosaSpecification.create_from_string(spec.value.decode())
-    raise RuntimeError("Could not find TOSA version in CompileSpec")
+    raise ValueError("Could not find TOSA version in CompileSpec")
 
 
 def get_intermediate_path(compile_spec: List[CompileSpec]) -> Optional[str]:
     for spec in compile_spec:
         if spec.key == "debug_artifact_path":
             return spec.value.decode()
     return None
-
-
-def _get_first_delegation_tag(graph_module) -> str | None:
-    """Get the first delegation tag from the graph_module or return None."""
-    for node in graph_module.graph.nodes:
-        tag = node.meta.get("delegation_tag")
-        if tag:
-            return tag
-
-    logger.debug("No delegation tag found in partition.")
-    return None
-
-
-@final
-class ArmBackend(BackendDetails):
-    @staticmethod
-    def preprocess(  # noqa: C901
-        edge_program: ExportedProgram,
-        compile_spec: List[CompileSpec],
-    ) -> PreprocessResult:
-        logger.info("ArmBackend::preprocess")
-
-        # if a debug/test build capture output files from TOSA stage
-        artifact_path = None
-        output_format = ""
-        compile_flags = []
-        input_order = []
-        for spec in compile_spec:
-            if spec.key == "debug_artifact_path":
-                artifact_path = spec.value.decode()
-            if spec.key == "output_format":
-                output_format = spec.value.decode()
-            if spec.key == "compile_flags":
-                compile_flags.append(spec.value.decode())
-            if spec.key == "input_order":
-                input_order = list(map(int, spec.value.decode().split(",")))
-
-        # Check that the output format is set in the compile spec
-        if not output_format:
-            raise RuntimeError("output format is required")
-
-        tosa_spec = TosaSpecification.create_from_compilespecs(compile_spec)
-        assert (
-            tosa_spec is not None
-        ), "TOSA backend needs a TOSA version specified in the CompileSpec!"
-
-        if output_format == "vela" and len(compile_flags) == 0:
-            # Not testing for compile_flags correctness here, just that they are
-            # present. The compiler will give errors if they are not valid.
-            raise RuntimeError("compile flags are required for vela output format")
-
-        logger.info(f"Converting ExportedProgram to TOSA: {tosa_spec}")
-
-        # Converted output for this subgraph, serializer needs path early as it emits
-        # const data directly. Path created and data written only in debug builds.
-        tosa_graph = ts.TosaSerializer(artifact_path)
-        graph_module = ArmPassManager(tosa_spec).transform_to_backend_pipeline(  # type: ignore
-            exported_program=edge_program
-        )
-
-        node_visitors = get_node_visitors(edge_program, tosa_spec)
-        input_count = 0
-        for node in graph_module.graph.nodes:
-            node = cast(Node, node)
-            if node.op == "call_function":
-                process_call_function(node, tosa_graph, node_visitors, tosa_spec)
-            elif node.op == "placeholder":
-                process_placeholder(node, tosa_graph, edge_program, tosa_spec)
-                if node.name in edge_program.graph_signature.user_inputs:
-                    input_count += 1
-            elif node.op == "output":
-                process_output(node, tosa_graph)
-            else:
-                # This will only happen if an unpartitioned graph is passed without
-                # any checking of compatibility.
-                dbg_fail(node, tosa_graph, artifact_path)
-
-        if len(input_order) > 0:
-            if input_count != len(input_order):
-                raise RuntimeError(
-                    "The rank of the input order is not equal to amount of input tensors"
-                )
-
-        if artifact_path:
-            tag = _get_first_delegation_tag(graph_module)
-            dbg_tosa_dump(
-                tosa_graph,
-                artifact_path,
-                suffix="{}".format(f"_{tag}" if tag else ""),
-            )
-
-        # Serialize and return the program. While we have always produced TOSA
-        # output as an intermediate, some flows compile to device binaries in
-        # preprocess and some consume TOSA fb directly.
-        if output_format == "vela":
-            # Emit vela_bin_stream format
-            binary = vela_compile(tosa_graph, compile_flags, input_order)
-        elif output_format == "tosa":
-            # Emit TOSA flatbuffer
-            binary = bytes(tosa_graph.serialize())
-        else:
-            raise RuntimeError(f"Unknown format {output_format}")
-
-        return PreprocessResult(processed_bytes=binary)
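
The reworked helpers in the hunk above can be exercised in isolation. The sketch below copies `is_tosa` and `is_ethosu` as they appear after this change and runs them against hand-built spec lists. Note the assumptions: the `CompileSpec` dataclass is a stand-in for `executorch.exir.backend.compile_spec_schema.CompileSpec` so the snippet runs without ExecuTorch installed, and the spec string and Vela flag values are illustrative placeholders rather than values taken from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompileSpec:
    """Stand-in for the ExecuTorch CompileSpec (a key plus a bytes value)."""

    key: str
    value: bytes


def is_tosa(compile_spec: List[CompileSpec]) -> bool:
    # After this change both an output_format of "tosa" and a tosa_spec
    # entry must be present for the list to count as a TOSA compile spec.
    has_tosa_output = False
    has_tosa_spec = False
    for spec in compile_spec:
        if spec.key == "output_format":
            has_tosa_output = spec.value.decode() == "tosa"
        if spec.key == "tosa_spec":
            has_tosa_spec = True
    return has_tosa_output and has_tosa_spec


def is_ethosu(compile_spec: List[CompileSpec]) -> bool:
    # The first output_format entry decides; "vela" selects the Ethos-U flow.
    for spec in compile_spec:
        if spec.key == "output_format":
            return spec.value.decode() == "vela"
    return False


# Placeholder values, for illustration only.
tosa_specs = [
    CompileSpec("tosa_spec", b"TOSA-0.80+BI"),
    CompileSpec("output_format", b"tosa"),
]
vela_specs = [
    CompileSpec("tosa_spec", b"TOSA-0.80+BI"),
    CompileSpec("output_format", b"vela"),
    CompileSpec("compile_flags", b"--accelerator-config=ethos-u55-128"),
]
missing_spec = [CompileSpec("output_format", b"tosa")]

print(is_tosa(tosa_specs), is_ethosu(tosa_specs))
print(is_tosa(vela_specs), is_ethosu(vela_specs))
print(is_tosa(missing_spec))
```

One behavioural difference worth noticing: a list that sets `output_format` to `tosa` but carries no `tosa_spec` entry was accepted by the old `is_tosa` and is rejected by the new one.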
@@ -39,13 +39,12 @@ def vela_bin_pack_io(prefix, data, shape_order=None):
 # Output via Vela to binary stream for ArmBackendEthosU
 # WARNING: Do not change this without changing VelaBinStream.cpp as that
 #          function consumes this format and the two need to align.
-def vela_compile(tosa_graph, args: List[str], shape_order=None):
+def vela_compile(tosa_flatbuffer: bytes, args: List[str], shape_order=None):
     with tempfile.TemporaryDirectory() as tmpdir:
         tosaname = "out.tosa"
-        flatbuffer = tosa_graph.serialize()
         tosa_path = os.path.join(tmpdir, tosaname)
         with open(tosa_path, "wb") as f:
-            f.write(flatbuffer)
+            f.write(tosa_flatbuffer)
 
         # invoke vela
         output_dir = os.path.join(tmpdir, "output")
 
@@ -0,0 +1,86 @@
+# Copyright 2025 Arm Limited and/or its affiliates.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+# pyre-unsafe
+
+#
+# Main implementation of AoT flow to partition and preprocess for Arm target
+# backends. Converts via TOSA as an intermediate form supported by AoT and
+# JIT compiler flows.
+#
+
+import logging
+from typing import final, List
+
+from executorch.backends.arm.arm_vela import vela_compile
+
+from executorch.backends.arm.tosa_backend import TOSABackend
+from executorch.exir.backend.backend_details import BackendDetails, PreprocessResult
+from executorch.exir.backend.compile_spec_schema import CompileSpec
+from torch.export.exported_program import ExportedProgram
+
+# debug functionality
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.WARNING)
+
+
+@final
+class EthosUBackend(BackendDetails):
+    """
+    BackendDetails subclass for delegation to Ethos-U. Deduce the TOSA lowering from
+    the compile spec list by filtering out the compile spec values that are of interest
+    for the TOSABackend.
+    """
+
+    @staticmethod
+    def _compile_tosa_flatbuffer(
+        tosa_flatbuffer: bytes, compile_spec: list[CompileSpec]
+    ) -> bytes:
+        """
+        Static helper method to do the compilation of the TOSA flatbuffer
+        representation to a target specific binary stream.
+        """
+        compile_flags = []
+        input_order = []
+        for spec in compile_spec:
+            if spec.key == "compile_flags":
+                compile_flags.append(spec.value.decode())
+            if spec.key == "input_order":
+                input_order = list(map(int, spec.value.decode().split(",")))
+
+        if len(compile_flags) == 0:
+            # Not testing for compile_flags correctness here, just that they are
+            # present. The compiler will give errors if they are not valid.
+            raise RuntimeError(
+                "compile_flags are required in the CompileSpec list for EthosUBackend"
+            )
+
+        # Pass on the TOSA flatbuffer to the vela compiler.
+        binary = vela_compile(tosa_flatbuffer, compile_flags, input_order)
+        return binary
+
+    @staticmethod
+    def preprocess(
+        edge_program: ExportedProgram,
+        compile_spec: List[CompileSpec],
+    ) -> PreprocessResult:
+        logger.info(f"{EthosUBackend.__name__} preprocess")
+
+        # deduce TOSA compile_spec from Ethos-U compile spec. We get a new
+        # compile spec list, containing only elements relevant for the
+        # TOSABackend.
+        tosa_compile_spec = TOSABackend.filter_tosa_compile_specs(compile_spec)
+
+        # Backends don't allow inheritance, as stated in comments in exir/backend/backend_api.py
+        # ('All backend implementation are final...'), so use composition instead.
+        # preprocess returns the serialized TOSA flatbuffer in .processed_bytes,
+        # which can be passed on to next compilation step.
+        tosa_preprocess = TOSABackend.preprocess(edge_program, tosa_compile_spec)
+
+        binary = EthosUBackend._compile_tosa_flatbuffer(
+            tosa_preprocess.processed_bytes, compile_spec
+        )
+
+        return PreprocessResult(processed_bytes=binary)
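
The composition pattern used by `EthosUBackend.preprocess` can be sketched without ExecuTorch or Vela installed. Everything below is a hypothetical stand-in: `CompileSpec` and `PreprocessResult` are minimal dataclasses, `SketchTOSABackend` and `fake_vela_compile` only tag bytes so the data flow is visible, and the body of `filter_tosa_compile_specs` is an assumption, since the real `TOSABackend` implementation is not part of this diff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompileSpec:
    """Stand-in for the ExecuTorch CompileSpec."""

    key: str
    value: bytes


@dataclass
class PreprocessResult:
    """Stand-in for the ExecuTorch PreprocessResult."""

    processed_bytes: bytes


def fake_vela_compile(tosa_flatbuffer: bytes, args: List[str]) -> bytes:
    # Hypothetical replacement for vela_compile: just tags the payload.
    return b"VELA:" + tosa_flatbuffer


class SketchTOSABackend:
    @staticmethod
    def filter_tosa_compile_specs(specs: List[CompileSpec]) -> List[CompileSpec]:
        # Assumption: keep the entries the TOSA stage reads and force the
        # output format to "tosa". The real filter may differ.
        kept = [s for s in specs if s.key in ("tosa_spec", "debug_artifact_path")]
        return kept + [CompileSpec("output_format", b"tosa")]

    @staticmethod
    def preprocess(edge_program, specs: List[CompileSpec]) -> PreprocessResult:
        # Pretend serialization: the real backend returns a TOSA flatbuffer.
        return PreprocessResult(b"TOSA:" + str(edge_program).encode())


class SketchEthosUBackend:
    @staticmethod
    def preprocess(edge_program, specs: List[CompileSpec]) -> PreprocessResult:
        # Step 1: delegate lowering to the TOSA backend (composition, since
        # backend classes are final and cannot be subclassed).
        tosa_specs = SketchTOSABackend.filter_tosa_compile_specs(specs)
        tosa_result = SketchTOSABackend.preprocess(edge_program, tosa_specs)

        # Step 2: compile the serialized TOSA bytes for the target.
        flags = [s.value.decode() for s in specs if s.key == "compile_flags"]
        if not flags:
            raise RuntimeError("compile_flags are required for the Ethos-U flow")
        binary = fake_vela_compile(tosa_result.processed_bytes, flags)
        return PreprocessResult(processed_bytes=binary)


specs = [
    CompileSpec("tosa_spec", b"TOSA-0.80+BI"),  # placeholder value
    CompileSpec("output_format", b"vela"),
    CompileSpec("compile_flags", b"--accelerator-config=ethos-u55-128"),
]
result = SketchEthosUBackend.preprocess("graph", specs)
print(result.processed_bytes)
```

The design point is that the Ethos-U stage only ever sees bytes: because `vela_compile` now takes the serialized flatbuffer rather than a serializer object, the two stages share nothing beyond `PreprocessResult.processed_bytes`.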