
# Commit 4d1618e

Update base for Update on "[ET-VK] Add coop shader for int8 linear"

Title says it all!

## Changes

* Apply co-operative shader for vector * matrix computations.

Differential Revision: [D73279548](https://our.internmc.facebook.com/intern/diff/D73279548/)

[ghstack-poisoned]

2 parents 191e6c4 + 334af4a, commit 4d1618e

245 files changed, +4375 −1597 lines


.ci/scripts/gather_benchmark_configs.py (1 addition, 0 deletions)

```diff
@@ -24,6 +24,7 @@
     "samsung_galaxy_s24": "arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/98f8788c-2e25-4a3c-8bb2-0d1e8897c0db",
     "google_pixel_8_pro": "arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/d65096ab-900b-4521-be8b-a3619b69236a",
     "google_pixel_3_private_rooted": "arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/98d23ca8-ea9e-4fb7-b725-d402017b198d",
+    "apple_iphone_15_private": "arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/55929353-2f28-4ee5-bdff-d1a95f58cb28",
 }
 
 # Predefined benchmark configurations
```

.github/workflows/android-release-artifacts.yml (10 additions, 0 deletions)

```diff
@@ -11,6 +11,11 @@ on:
         description: Upload the AAR to maven staging repository
         required: false
         type: boolean
+      flavor:
+        type: choice
+        options:
+          - "xnnpack"
+          - "vulkan+xnnpack"
   schedule:
     - cron: 0 10 * * *
 
@@ -86,6 +91,11 @@ jobs:
           sed -i "s/\(coordinates(\"org.pytorch\", \"executorch-android\", \"\)\([0-9]\+.[0-9]\+.[0-9]\+\)\(\")\)/\1$VERSION\3/" extension/android/executorch_android/build.gradle
           fi
 
+          FLAVOR="${{ inputs.flavor }}"
+          if [[ "$FLAVOR" == "vulkan+xnnpack" ]]; then
+            export EXECUTORCH_BUILD_VULKAN=ON
+          fi
+
           # Build AAR Package
           mkdir aar-out
           export BUILD_AAR_DIR=aar-out
```
.github/workflows/apple-perf-private-device-experiment.yml (64 additions, 0 deletions; new file)

```diff
@@ -0,0 +1,64 @@
+name: apple-perf (private devices)
+
+on:
+  # TODO (huydhn): Disable the schedule run until we land the change to add device pool and device name
+  # to separate between public and private iOS devices
+  # schedule:
+  #   - cron: 0 0,4,8,12,16,20 * * *
+  pull_request:
+    paths:
+      - .github/workflows/apple-perf-private-device-experiment.yml
+  # push:
+  #   branches:
+  #     - main
+  #   paths:
+  #     - .github/workflows/apple-perf-private-device-experiment.yml
+  # Note: GitHub has an upper limit of 10 inputs
+  workflow_dispatch:
+    inputs:
+      models:
+        description: Models to be benchmarked
+        required: false
+        type: string
+        default: mv3,meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8
+      devices:
+        description: Target devices to run benchmark
+        required: false
+        type: string
+        default: apple_iphone_15_private
+      benchmark_configs:
+        description: The list of configs used the benchmark
+        required: false
+        type: string
+  workflow_call:
+    inputs:
+      models:
+        description: Models to be benchmarked
+        required: false
+        type: string
+        default: mv3,meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8
+      devices:
+        description: Target devices to run benchmark
+        required: false
+        type: string
+        default: apple_iphone_15_private
+      benchmark_configs:
+        description: The list of configs used the benchmark
+        required: false
+        type: string
+
+concurrency:
+  group: apple-perf-private-devices-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }}
+  cancel-in-progress: true
+
+jobs:
+  apple:
+    uses: ./.github/workflows/apple-perf.yml
+    secrets: inherit
+    permissions:
+      id-token: write
+      contents: read
+    with:
+      models: ${{ inputs.models || 'mv3,meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8' }}
+      devices: apple_iphone_15_private
+      benchmark_configs: ${{ inputs.benchmark_configs }}
```

.github/workflows/doc-build.yml (14 additions, 0 deletions)

```diff
@@ -14,6 +14,20 @@ on:
     - cron: '0 0 * * *'
 
 jobs:
+  check-urls:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Check URLs
+        run: bash ./scripts/check_urls.sh
+
+  check-xrefs:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Check Links
+        run: bash ./scripts/check_xrefs.sh
+
   build:
     uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
     permissions:
```

.github/workflows/pull.yml (2 additions, 2 deletions)

```diff
@@ -399,7 +399,7 @@ jobs:
           size=${arr[4]}
           # threshold=48120 on devserver with gcc11.4
           # todo(lfq): update once binary size is below 50kb.
-          threshold="51504"
+          threshold="51408"
           if [[ "$size" -le "$threshold" ]]; then
             echo "Success $size <= $threshold"
           else
@@ -436,7 +436,7 @@ jobs:
           size=${arr[4]}
           # threshold=48120 on devserver with gcc11.4
           # todo(lfq): update once binary size is below 50kb.
-          threshold="51784"
+          threshold="47552"
           if [[ "$size" -le "$threshold" ]]; then
             echo "Success $size <= $threshold"
           else
```
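The pull.yml change above tightens two binary-size gates: the CI script reads the stripped binary size and fails the job when it exceeds a hard threshold. A minimal Python sketch of the same comparison logic (threshold values taken from the diff; the function name is illustrative, not part of the CI script):

```python
# Sketch of the binary-size gate from pull.yml, reimplemented in Python.
# The threshold values come from the diff; the function name is made up.

def check_binary_size(size: int, threshold: int) -> bool:
    """Return True when the binary fits under the CI threshold, mirroring
    the shell check `[[ "$size" -le "$threshold" ]]`."""
    if size <= threshold:
        print(f"Success {size} <= {threshold}")
        return True
    print(f"Fail {size} > {threshold}")
    return False

# The commit lowers the first gate from 51504 to 51408 bytes and the
# second from 51784 to 47552 bytes, so a size at the new limit still
# passes while the old slack no longer does.
assert check_binary_size(51408, 51408)
assert not check_binary_size(51504, 47552)
```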

CONTRIBUTING.md (4 additions, 4 deletions)

```diff
@@ -45,11 +45,11 @@ executorch
 │   └── <a href="devtools/visualization">visualization</a> - Visualization tools for representing model structure and performance metrics.
 ├── <a href="docs">docs</a> - Static docs tooling and documentation source files.
 ├── <a href="examples">examples</a> - Examples of various user flows, such as model export, delegates, and runtime execution.
-├── <a href="exir">exir</a> - Ahead-of-time library: model capture and lowering APIs. EXport Intermediate Representation (EXIR) is a format for representing the result of <a href="https://pytorch.org/docs/stable/export.html">torch.export</a>. This directory contains utilities and passes for lowering the EXIR graphs into different <a href="/docs/source/ir-exir.md">dialects</a> and eventually suitable to run on target hardware.
+├── <a href="exir">exir</a> - Ahead-of-time library: model capture and lowering APIs. EXport Intermediate Representation (EXIR) is a format for representing the result of <a href="https://pytorch.org/docs/stable/export.html">torch.export</a>. This directory contains utilities and passes for lowering the EXIR graphs into different <a href="docs/source/ir-exir.md">dialects</a> and eventually suitable to run on target hardware.
 │   ├── <a href="exir/_serialize">_serialize</a> - Serialize final export artifact.
 │   ├── <a href="exir/backend">backend</a> - Backend delegate ahead of time APIs.
 │   ├── <a href="exir/capture">capture</a> - Program capture.
-│   ├── <a href="exir/dialects">dialects</a> - Op sets for various dialects in the export process. Please refer to the <a href="/docs/source/ir-exir.md">EXIR spec</a> and the <a href="/docs/source/compiler-backend-dialect.md">backend dialect</a> doc for more details.
+│   ├── <a href="exir/dialects">dialects</a> - Op sets for various dialects in the export process. Please refer to the <a href="docs/source/ir-exir.md">EXIR spec</a> and the <a href="docs/source/compiler-backend-dialect.md">backend dialect</a> doc for more details.
 │   ├── <a href="exir/emit">emit</a> - Conversion from ExportedProgram to ExecuTorch execution instructions.
 │   ├── <a href="exir/operator">operator</a> - Operator node manipulation utilities.
 │   ├── <a href="exir/passes">passes</a> - Built-in compiler passes.
@@ -68,7 +68,7 @@ executorch
 │   ├── <a href="extension/memory_allocator">memory_allocator</a> - 1st party memory allocator implementations.
 │   ├── <a href="extension/module">module</a> - A simplified C++ wrapper for the runtime. An abstraction that deserializes and executes an ExecuTorch artifact (.pte file). Refer to the <a href="docs/source/extension-module.md">module documentation</a> for more information.
 │   ├── <a href="extension/parallel">parallel</a> - C++ threadpool integration.
-│   ├── <a href="extension/pybindings">pybindings</a> - Python API for executorch runtime. This is powering up the <a href="docs/source/runtime-python-api-reference.md">runtime Python API</a> for ExecuTorch.
+│   ├── <a href="extension/pybindings">pybindings</a> - Python API for executorch runtime. This is powering up the <a href="docs/source/runtime-python-api-reference.rst">runtime Python API</a> for ExecuTorch.
 │   ├── <a href="extension/pytree">pytree</a> - C++ and Python flattening and unflattening lib for pytrees.
 │   ├── <a href="extension/runner_util">runner_util</a> - Helpers for writing C++ PTE-execution tools.
 │   ├── <a href="extension/tensor">tensor</a> - Tensor maker and <code>TensorPtr</code>, details in <a href="docs/source/extension-tensor.md">this documentation</a>. For how to use <code>TensorPtr</code> and <code>Module</code>, please refer to the <a href="docs/source/using-executorch-cpp.md">"Using ExecuTorch with C++"</a> doc.
@@ -114,7 +114,7 @@ If you're completely new to open-source projects, GitHub, or ExecuTorch, please
 1. If you've changed APIs or added a new tool or feature, [update the
    documentation](#updating-documentation).
 1. If you added an experimental API or deprecated an existing API, follow the
-   [API Life Cycle and Deprecation Policy](/docs/source/api-life-cycle.md).
+   [API Life Cycle and Deprecation Policy](docs/source/api-life-cycle.md).
 1. Make sure your code follows the [style guides](#coding-style) and passes the
    [lint checks](#lintrunner).
 1. If you haven't already, complete the [Contributor License Agreement ("CLA")](#contributor-license-agreement-cla).
```

README-wheel.md (1 addition, 1 deletion)

```diff
@@ -25,6 +25,6 @@ tutorials and documentation. Here are some starting points:
 * [Exporting to ExecuTorch](https://pytorch.org/executorch/main/tutorials/export-to-executorch-tutorial)
   * Learn the fundamentals of exporting a PyTorch `nn.Module` to ExecuTorch, and
     optimizing its performance using quantization and hardware delegation.
-* Running LLaMA on [iOS](docs/source/llm/llama-demo-ios) and [Android](docs/source/llm/llama-demo-android) devices.
+* Running LLaMA on [iOS](docs/source/llm/llama-demo-ios.md) and [Android](docs/source/llm/llama-demo-android.md) devices.
   * Build and run LLaMA in a demo mobile app, and learn how to integrate models
     with your own apps.
```

backends/apple/coreml/runtime/test/setup.md (8 additions, 8 deletions)

````diff
@@ -4,18 +4,18 @@ This is a tutorial for setting up tests for the **Core ML** backend.
 
 ## Running tests
 
-1. Follow the instructions described in [Setting Up ExecuTorch](/docs/source/getting-started-setup.md) to set up ExecuTorch environment.
+1. Follow the instructions described in [Setting Up ExecuTorch](../../../../../docs/source/getting-started-setup.rst) to set up ExecuTorch environment.
 
 2. Run `install_requirements.sh` to install dependencies required by the **Core ML** backend.
 
 ```bash
 cd executorch
 
-sh backends/apple/coreml/scripts/install_requirements.sh 
+sh backends/apple/coreml/scripts/install_requirements.sh
 
-``` 
+```
 
-3. Follow the instructions described in [Building with CMake](/docs/source/runtime-build-and-cross-compilation.md#building-with-cmake) to set up CMake build system.
+3. Follow the instructions described in [Building with CMake](../../../../../docs/source/using-executorch-cpp.md#building-with-cmake) to set up CMake build system.
 
 4. Install [Xcode](https://developer.apple.com/xcode/).
 
@@ -26,7 +26,7 @@ sh backends/apple/coreml/scripts/install_requirements.sh
 ```bash
 cd executorch
 
-# Builds macOS universal test bundle. 
+# Builds macOS universal test bundle.
 
 sh backends/apple/coreml/srcipts/build_tests.sh
 
@@ -40,15 +40,15 @@ cd executorch
 sh backends/apple/coreml/srcipts/run_tests.sh
 
 ```
- 
+
 ## Updating tests
 
 1. Open the Xcode workspace.
 
 ```bash
 cd executorch
 
-# Builds macOS universal test bundle. 
+# Builds macOS universal test bundle.
 
 open backends/apple/coreml/runtime/workspace/executorchcoreml.xcworkspace
 
@@ -62,4 +62,4 @@ cd executorch
 # There is no need to build the tests.
 sh backends/apple/coreml/srcipts/run_tests.sh
 
-``` 
+```
````

backends/apple/coreml/setup.md (2 additions, 2 deletions)

```diff
@@ -4,7 +4,7 @@ This is a tutorial for setting up the Core ML backend.
 
 ## AOT Setup
 
-1. Follow the instructions described in [Setting Up ExecuTorch](/docs/source/getting-started-setup.md) to set up ExecuTorch environment.
+1. Follow the instructions described in [Setting Up ExecuTorch](../../../docs/source/getting-started-setup.rst) to set up ExecuTorch environment.
 
 
 2. Run the example script to validate that the **Core ML** backend is set up correctly.
@@ -28,7 +28,7 @@ delegated_program_manager = edge_program_manager.to_backend(CoreMLPartitioner())
 
 ## Integrating Core ML delegate into runtime.
 
-1. Follow the instructions described in [Building with CMake](/docs/source/runtime-build-and-cross-compilation.md#building-with-cmake) to set up CMake build system.
+1. Follow the instructions described in [Building with CMake](../../../docs/source/using-executorch-cpp.md#building-with-cmake) to set up CMake build system.
 
 2. Install [Xcode](https://developer.apple.com/xcode/).
 
```

backends/apple/mps/mps_preprocess.py (15 additions, 1 deletion)

```diff
@@ -6,6 +6,7 @@
 from typing import ClassVar, Dict, final, List, Tuple
 
 import torch
+from executorch import exir
 
 from executorch.backends.apple.mps.operators.node_visitor import (
     get_node_visitors,
@@ -35,6 +36,7 @@
 
 from executorch.exir.passes.memory_format_ops_pass import DimOrderOpsRevertPass
 from executorch.exir.program._program import _transform
+from executorch.exir.verification.verifier import EXIREdgeDialectVerifier
 from torch.export.exported_program import ExportedProgram
 
 FORMAT = "[%(levelname)s %(asctime)s %(filename)s:%(lineno)s] %(message)s"
@@ -87,7 +89,19 @@ def preprocess(
     # the `output_ids` array in the schema.
 
     # TODO: Remove this once we have a better support for the dim-order ops.
-    edge_program = _transform(edge_program, DimOrderOpsRevertPass())
+    # Need to override the verifier to skip the non dim-order ops from tripping the default verifier.
+    edge_program = _transform(
+        edge_program,
+        DimOrderOpsRevertPass(),
+        override_verifiers=[
+            EXIREdgeDialectVerifier(
+                edge_compile_config=exir.EdgeCompileConfig(
+                    _check_ir_validity=False,  # Disable the edge dialect verifier, since we are in the mps backend.
+                ),
+                class_only=True,
+            )
+        ],
+    )
 
     mps_graph = MPSGraph(
         version="0",
```
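The mps_preprocess.py change follows a common compiler pattern: a transform temporarily produces IR that the default strict verifier would reject, so the caller supplies a permissive verifier for that one transform. A self-contained toy sketch of the pattern (all names here are illustrative; this is not the ExecuTorch `_transform` API):

```python
# Toy sketch of the verifier-override pattern: a pass may emit ops the
# strict verifier rejects, so the caller swaps in a permissive verifier
# for that transform. Names are illustrative, not the ExecuTorch API.
from typing import Callable, List

def strict_verifier(ops: List[str]) -> None:
    # Rejects any op outside the "core" dialect, like the default
    # edge-dialect verifier would.
    assert all(not op.startswith("dim_order_") for op in ops), "non-core op found"

def permissive_verifier(ops: List[str]) -> None:
    # Validity checking disabled, mirroring _check_ir_validity=False.
    pass

def transform(
    ops: List[str],
    pass_fn: Callable[[List[str]], List[str]],
    verifier: Callable[[List[str]], None] = strict_verifier,
) -> List[str]:
    new_ops = pass_fn(ops)
    verifier(new_ops)  # the (possibly overridden) verifier checks the result
    return new_ops

def revert_pass(ops: List[str]) -> List[str]:
    # Stand-in for DimOrderOpsRevertPass: rewrites ops into a form the
    # strict verifier does not accept.
    return ["dim_order_" + op for op in ops]

# With the default strict verifier this transform would raise; the
# override lets the backend-specific IR through.
result = transform(["add", "mul"], revert_pass, verifier=permissive_verifier)
```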

backends/apple/mps/setup.md (7 additions, 7 deletions)

````diff
@@ -12,11 +12,11 @@ The MPS backend device maps machine learning computational graphs and primitives
 :::
 :::{grid-item-card} Tutorials we recommend you complete before this:
 :class-card: card-prerequisites
-* [Introduction to ExecuTorch](intro-how-it-works.md)
-* [Setting up ExecuTorch](getting-started-setup.md)
-* [Building ExecuTorch with CMake](runtime-build-and-cross-compilation.md)
-* [ExecuTorch iOS Demo App](demo-apps-ios.md)
-* [ExecuTorch iOS LLaMA Demo App](llm/llama-demo-ios.md)
+* [Introduction to ExecuTorch](../../../docs/source/intro-how-it-works.md)
+* [Setting up ExecuTorch](../../../docs/source/getting-started-setup.rst)
+* [Building ExecuTorch with CMake](../../../docs/source/using-executorch-cpp.md#building-with-cmake)
+* [ExecuTorch iOS Demo App](../../../docs/source/demo-apps-ios.md)
+* [ExecuTorch iOS LLaMA Demo App](../../../docs/source/llm/llama-demo-ios.md)
 :::
 ::::
 
@@ -111,12 +111,12 @@ python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --no-use_fp
 ```
 
 ### Profiling:
-1. [Optional] Generate an [ETRecord](./etrecord.rst) while you're exporting your model.
+1. [Optional] Generate an [ETRecord](../../../docs/source/etrecord.rst) while you're exporting your model.
 ```bash
 cd executorch
 python3 -m examples.apple.mps.scripts.mps_example --model_name="mv3" --generate_etrecord -b
 ```
-2. Run your Program on the ExecuTorch runtime and generate an [ETDump](./etdump.md).
+2. Run your Program on the ExecuTorch runtime and generate an [ETDump](../../../docs/source/etdump.md).
 ```
 ./cmake-out/examples/apple/mps/mps_executor_runner --model_path mv3_mps_bundled_fp16.pte --bundled_program --dump-outputs
 ```
````

backends/arm/__init__.py (10 additions, 0 deletions; new file)

```diff
@@ -0,0 +1,10 @@
+# Copyright 2025 Arm Limited and/or its affiliates.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from .arm_backend import ArmCompileSpecBuilder  # noqa # usort: skip
+from .tosa_backend import TOSABackend  # noqa # usort: skip
+from .tosa_partitioner import TOSAPartitioner  # noqa # usort: skip
+from .ethosu_backend import EthosUBackend  # noqa # usort: skip
+from .ethosu_partitioner import EthosUPartitioner  # noqa # usort: skip
```

backends/arm/_passes/convert_expand_copy_to_repeat.py (12 additions, 2 deletions)

```diff
@@ -1,16 +1,18 @@
-# Copyright 2024 Arm Limited and/or its affiliates.
-# All rights reserved.
+# Copyright 2024-2025 Arm Limited and/or its affiliates.
 #
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
 
 # pyre-unsafe
 
+import logging
 from typing import cast
 
 from executorch.exir.dialects._ops import ops as exir_ops
 from executorch.exir.pass_base import ExportPass
 
+logger = logging.getLogger(__name__)
+
 
 class ConvertExpandCopyToRepeatPass(ExportPass):
     """
@@ -41,6 +43,14 @@ def call_operator(self, op, args, kwargs, meta):
             multiples[i] if multiples[i] != -1 and extended_shape[i] == 1 else 1
             for i in range(expanded_rank)
         ]
+
+        if all((x == 1 for x in multiples)):
+            # All dimensions/repetitions occur only once. Remove node
+            # altogether since it's in practice just a copy.
+            logger.warning("Found redundant expand node (no-op). Removing it.")
+
+            return args[0]
+
         return super().call_operator(
             op=self.repeat, args=(args[0], multiples), kwargs=kwargs, meta=meta
         )
```
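The pass above computes repeat multiples from the expand target shape and now short-circuits when every multiple is 1, since such an expand is just a copy. The arithmetic can be sketched standalone (pure Python, no ExecuTorch imports; the helper name is made up):

```python
# Standalone sketch of the multiples computation in
# ConvertExpandCopyToRepeatPass, plus the new no-op check. Only the
# arithmetic mirrors the pass; the helper name is illustrative.
from typing import List, Optional

def expand_to_repeat_multiples(
    input_shape: List[int], expand_shape: List[int]
) -> Optional[List[int]]:
    """Return repeat multiples for an expand, or None when the expand
    is a pure copy (every multiple is 1) and the node can be dropped."""
    expanded_rank = len(expand_shape)
    # Left-pad the input shape with 1s so both shapes have the same rank.
    extended_shape = [1] * (expanded_rank - len(input_shape)) + list(input_shape)
    # A dim is repeated only when the input dim is 1 and the target
    # size is not the -1 "keep as is" marker.
    multiples = [
        expand_shape[i] if expand_shape[i] != -1 and extended_shape[i] == 1 else 1
        for i in range(expanded_rank)
    ]
    if all(x == 1 for x in multiples):
        return None  # redundant expand: the pass now removes the node
    return multiples

assert expand_to_repeat_multiples([1, 3], [4, 2, 3]) == [4, 2, 1]
assert expand_to_repeat_multiples([2, 3], [2, 3]) is None
```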

backends/arm/ethosu_backend.py (2 additions, 2 deletions)

```diff
@@ -14,9 +14,9 @@
 import logging
 from typing import final, List
 
-from executorch.backends.arm.arm_vela import vela_compile
+from executorch.backends.arm import TOSABackend
 
-from executorch.backends.arm.tosa_backend import TOSABackend
+from executorch.backends.arm.arm_vela import vela_compile
 from executorch.exir.backend.backend_details import BackendDetails, PreprocessResult
 from executorch.exir.backend.compile_spec_schema import CompileSpec
 from torch.export.exported_program import ExportedProgram
```

backends/arm/ethosu_partitioner.py (1 addition, 2 deletions)

```diff
@@ -10,8 +10,7 @@
 from executorch.backends.arm.arm_backend import (
     is_ethosu,
 )  # usort: skip
-from executorch.backends.arm.ethosu_backend import EthosUBackend
-from executorch.backends.arm.tosa_partitioner import TOSAPartitioner
+from executorch.backends.arm import EthosUBackend, TOSAPartitioner
 from executorch.exir.backend.compile_spec_schema import CompileSpec
 from executorch.exir.backend.partitioner import DelegationSpec
 from torch.fx.passes.operator_support import OperatorSupportBase
```
