
Commit aca7014

Update on "[ET-VK][8/n] Unsqueeze"
We exploit the fact that the unsqueeze operation can be reduced to a permute:

```
torch.all(torch.permute(x.unsqueeze(0), [1, 0, 2, 3]) == x.unsqueeze(1))
torch.all(torch.permute(x.unsqueeze(0), [1, 2, 0, 3]) == x.unsqueeze(2))
torch.all(torch.permute(x.unsqueeze(0), [1, 2, 3, 0]) == x.unsqueeze(3))
```

This diff introduces a minor change to the Permute implementation: it no longer requires the input dimension length to match the length of the permute array. This allows the `unsqueeze` operation to be implemented as a no-op `unsqueeze(0)` followed by a permute.

Differential Revision: [D56347734](https://our.internmc.facebook.com/intern/diff/D56347734/)

[ghstack-poisoned]
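The reduction can be illustrated with a minimal PyTorch sketch (illustration only; the diff itself implements this idea inside the Vulkan Permute op rather than in Python):

```
import torch

def unsqueeze_via_permute(x: torch.Tensor, dim: int) -> torch.Tensor:
    # Insert the new unit dimension at the front (a view, no data movement), then
    # move it into place with a permute array one element longer than x.dim().
    y = x.unsqueeze(0)
    order = list(range(1, x.dim() + 1))  # original dims, shifted up by one
    order.insert(dim, 0)                 # place the new unit dim at `dim`
    return y.permute(order)

x = torch.randn(2, 3, 4)
for d in range(x.dim() + 1):
    assert torch.equal(unsqueeze_via_permute(x, d), x.unsqueeze(d))
```

The permute array being longer than the input rank is exactly the relaxation the Permute change above allows.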
2 parents 8f7363d + 458f1d1 commit aca7014

83 files changed: +2105, -646 lines changed


.github/workflows/android.yml

Lines changed: 13 additions & 2 deletions
@@ -48,12 +48,23 @@ jobs:
       # Build Android demo app
       bash build/test_android_ci.sh

+      # Strip libraries for upload
+      strip cmake-out-android-arm64-v8a/lib/*.a cmake-out-android-arm64-v8a/extension/android/*.so
+      strip cmake-out-android-x86_64/lib/*.a cmake-out-android-x86_64/extension/android/*.so
+
       mkdir -p artifacts-to-be-uploaded
+      mkdir -p artifacts-to-be-uploaded/arm64-v8a/
+      mkdir -p artifacts-to-be-uploaded/x86_64/
+      # Copy the jar to S3
+      cp extension/android/build/libs/executorch.jar artifacts-to-be-uploaded/
       # Copy the app and its test suite to S3
       cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/debug/*.apk artifacts-to-be-uploaded/
       cp examples/demo-apps/android/LlamaDemo/app/build/outputs/apk/androidTest/debug/*.apk artifacts-to-be-uploaded/
-      # Also copy the share libraries
-      cp cmake-out-android/lib/*.a artifacts-to-be-uploaded/
+      # Also copy the libraries
+      cp cmake-out-android-arm64-v8a/lib/*.a artifacts-to-be-uploaded/arm64-v8a/
+      cp cmake-out-android-arm64-v8a/extension/android/*.so artifacts-to-be-uploaded/arm64-v8a/
+      cp cmake-out-android-x86_64/lib/*.a artifacts-to-be-uploaded/x86_64/
+      cp cmake-out-android-x86_64/extension/android/*.so artifacts-to-be-uploaded/x86_64/

       # Upload the app and its test suite to S3 so that they can be downloaded by the test job
   upload-artifacts:

.github/workflows/doc-build.yml

Lines changed: 15 additions & 31 deletions
@@ -46,13 +46,9 @@ jobs:
       # ET_VERSION_DOCS will be pulled during the doc build to add to the version dropdown
       # on the website. See docs/source/conf.py for details

-      REF_TYPE=${{ github.ref_type }}
-      REF_NAME=${{ github.ref_name }}
-
-      echo "$REF_TYPE"
-      echo "$REF_NAME"
-
-      ET_VERSION_DOCS="${REF_NAME}"
+      GITHUB_REF=${{ github.ref }}
+      echo "$GITHUB_REF"
+      ET_VERSION_DOCS="${GITHUB_REF}"
       echo "$ET_VERSION_DOCS"

       set -eux
@@ -69,7 +65,6 @@ jobs:
       cd ..

       # If it's main branch, add noindex tag to all .html files to exclude from Google Search indexing.
-      GITHUB_REF=${{ github.ref }}
       echo "GitHub Ref: ${GITHUB_REF}"
       if [[ "${{ github.ref }}" == 'refs/heads/main' ]]; then
         find docs/_build/html/ -name "*.html" -print0 | xargs -0 sed -i '/<head>/a \ \ <meta name="robots" content="noindex">';
@@ -83,7 +78,7 @@ jobs:

   upload-gh-pages:
     needs: build
-    if: github.repository == 'pytorch/executorch' && github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/release/') || startsWith(github.ref, 'refs/tags/v'))
+    if: github.repository == 'pytorch/executorch' && github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/v'))
     permissions:
       contents: write
     uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
@@ -95,28 +90,17 @@ jobs:
     script: |
       set -euo pipefail

-      REF_TYPE=${{ github.ref_type }}
-      REF_NAME=${{ github.ref_name }}
-
-      # If building for a release tag, branch, set the branch/tag name
-      # as the target folder in the gh-pages branch. The artifacts created
-      # during the build will be copied over to the target dir in the
-      # gh-pages branch.
-      if [[ "${REF_TYPE}" == branch ]]; then
-        TARGET_FOLDER="${REF_NAME}"
-      elif [[ "${REF_TYPE}" == tag ]]; then
-        # Strip the leading "v" as well as the trailing patch version and "-rc" suffix.
-        # For example: 'v0.1.2' -> '0.1' and 'v0.1.2-rc1' -> 0.1.
-        case "${REF_NAME}" in
-          *-rc*)
-            echo "Aborting upload since this is an RC tag: ${REF_NAME}"
-            # We don't generate -rc* documentation but for actual tag only.
-            exit 0
-            ;;
-          *)
-            TARGET_FOLDER=$(echo "${REF_NAME}" | sed 's/v\([0-9]\+\)\.\([0-9]\+\)\.[0-9]\+/\1.\2/')
-            ;;
-        esac
+      # Get github.ref for the output doc folder. By default "main"
+      # If matches a tag like refs/tags/v1.12.0-rc3 or
+      # refs/tags/v1.12.0 convert to 1.12
+      GITHUB_REF=${{ github.ref }}
+
+      # Convert refs/tags/v1.12.0rc3 into 1.12.
+      # Adopted from https://github.com/pytorch/pytorch/blob/main/.github/workflows/_docs.yml#L150C11-L155C13
+      if [[ "${GITHUB_REF}" =~ ^refs/tags/v([0-9]+\\.[0-9]+)\\. ]]; then
+        TARGET_FOLDER="${BASH_REMATCH[1]}"
+      else
+        TARGET_FOLDER="main"
       fi
       echo "Target Folder: ${TARGET_FOLDER}"
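For reference, the tag-to-folder mapping implemented by the new regex above can be sketched as follows (a Python illustration, assuming it mirrors the workflow's bash `[[ =~ ]]` pattern; it is not part of the workflow itself):

```
import re

def docs_target_folder(github_ref: str) -> str:
    # Mirrors the workflow's regex: refs/tags/vX.Y.Z (with or without an
    # -rcN suffix) maps to "X.Y"; anything else falls back to "main".
    match = re.match(r"^refs/tags/v([0-9]+\.[0-9]+)\.", github_ref)
    return match.group(1) if match else "main"

assert docs_target_folder("refs/tags/v0.2.0") == "0.2"
assert docs_target_folder("refs/tags/v0.2.0-rc1") == "0.2"
assert docs_target_folder("refs/heads/main") == "main"
```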

backends/apple/coreml/README.md

Lines changed: 90 additions & 25 deletions
@@ -6,58 +6,123 @@ Core ML is an optimized framework for running machine learning models on Apple d

 ## Layout
 - `compiler/` : Lowers a module to Core ML backend.
+- `partition/`: Partitions a module fully or partially to Core ML backend.
+- `quantizer/`: Quantizes a module in a Core ML-favored scheme.
 - `scripts/` : Scripts for installing dependencies and running tests.
 - `runtime/`: Core ML delegate runtime implementation.
 - `inmemoryfs`: InMemory filesystem implementation used to serialize/de-serialize AOT blob.
 - `kvstore`: Persistent Key-Value store implementation.
 - `delegate`: Runtime implementation.
 - `include` : Public headers.
-- `tests` : Tests for Core ML delegate.
-- `workspace` : Xcode workspace for tests.
+- `sdk` : SDK implementation.
+- `tests` : Unit tests.
+- `workspace` : Xcode workspace for the runtime.
 - `third-party/`: External dependencies.

-## Help & Improvements
-If you have problems or questions or have suggestions for ways to make
-implementation and testing better, please create an issue on [github](https://www.github.com/pytorch/executorch/issues).
+## Partition and Delegation

-## Delegation
-
-For delegating the Program to the **Core ML** backend, the client must be responsible for calling `to_backend` with the **CoreMLBackend** tag.
+To delegate a Program to the **Core ML** backend, the client must call `to_backend` with the **CoreMLPartitioner**.

 ```python
-import executorch.exir as exir
 import torch
-
-from torch.export import export
-
-from executorch.exir import to_edge
-
-from executorch.exir.backend.backend_api import to_backend
+import executorch.exir

 from executorch.backends.apple.coreml.compiler import CoreMLBackend
+from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner

-class LowerableSubModel(torch.nn.Module):
+class Model(torch.nn.Module):
     def __init__(self):
         super().__init__()

     def forward(self, x):
         return torch.sin(x)

-# Convert the lowerable module to Edge IR Representation
-to_be_lowered = LowerableSubModel()
-example_input = (torch.ones(1), )
-to_be_lowered_exir_submodule = to_edge(export(to_be_lowered, example_input))
+source_model = Model()
+example_inputs = (torch.ones(1), )
+
+# Export the source model to Edge IR representation
+aten_program = torch.export.export(source_model, example_inputs)
+edge_program_manager = executorch.exir.to_edge(aten_program)
+
+# Delegate to Core ML backend
+delegated_program_manager = edge_program_manager.to_backend(CoreMLPartitioner())

-# Lower to Core ML backend
-lowered_module = to_backend('CoreMLBackend', to_be_lowered_exir_submodule.exported_program, [])
+# Serialize the delegated program
+executorch_program = delegated_program_manager.to_executorch()
+with open("model.pte", "wb") as f:
+    f.write(executorch_program.buffer)
 ```

-Currently, the **Core ML** backend delegates the whole module to **Core ML**. If a specific op is not supported by the **Core ML** backend then the `to_backend` call would throw an exception. We will be adding a **Core ML Partitioner** to resolve the issue.
+The module will be fully or partially delegated to **Core ML**, depending on whether all or only some of its ops are supported by the **Core ML** backend. Users may force certain ops to be skipped via `CoreMLPartitioner(skip_ops_for_coreml_delegation=...)`.
+
+The `to_backend` implementation is a thin wrapper over [coremltools](https://apple.github.io/coremltools/docs-guides/); `coremltools` is responsible for converting an **ExportedProgram** to an **MLModel**. The converted **MLModel** data is saved, flattened, and returned as bytes to **ExecuTorch**.
+
+## Quantization

-The `to_backend` implementation is a thin wrapper over `coremltools`, `coremltools` is responsible for converting an **ExportedProgram** to a **MLModel**. The converted **MLModel** data is saved, flattened, and returned as bytes to **ExecuTorch**.
+To quantize a Program in a Core ML-favored way, the client may utilize **CoreMLQuantizer**.
+
+```python
+import torch
+import executorch.exir
+
+from torch._export import capture_pre_autograd_graph
+from torch.ao.quantization.quantize_pt2e import (
+    convert_pt2e,
+    prepare_pt2e,
+    prepare_qat_pt2e,
+)
+
+from executorch.backends.apple.coreml.quantizer.coreml_quantizer import CoreMLQuantizer
+from coremltools.optimize.torch.quantization.quantization_config import (
+    LinearQuantizerConfig,
+    QuantizationScheme,
+)
+
+class Model(torch.nn.Module):
+    def __init__(self) -> None:
+        super().__init__()
+        self.conv = torch.nn.Conv2d(
+            in_channels=3, out_channels=16, kernel_size=3, padding=1
+        )
+        self.relu = torch.nn.ReLU()
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        a = self.conv(x)
+        return self.relu(a)
+
+source_model = Model()
+example_inputs = (torch.randn((1, 3, 256, 256)), )
+
+pre_autograd_aten_dialect = capture_pre_autograd_graph(source_model, example_inputs)
+
+quantization_config = LinearQuantizerConfig.from_dict(
+    {
+        "global_config": {
+            "quantization_scheme": QuantizationScheme.symmetric,
+            "activation_dtype": torch.uint8,
+            "weight_dtype": torch.int8,
+            "weight_per_channel": True,
+        }
+    }
+)
+quantizer = CoreMLQuantizer(quantization_config)
+
+# For post-training quantization, use `prepare_pt2e`
+# For quantization-aware training, use `prepare_qat_pt2e`
+prepared_graph = prepare_pt2e(pre_autograd_aten_dialect, quantizer)
+
+prepared_graph(*example_inputs)
+converted_graph = convert_pt2e(prepared_graph)
+```
+
+The `converted_graph` is the quantized torch model and can be delegated to **Core ML** in the same way, through **CoreMLPartitioner**.

 ## Runtime

-To execute a **Core ML** delegated **Program**, the client must link to the `coremldelegate` library. Once linked there are no additional steps required, **ExecuTorch** when running the **Program** would call the **Core ML** runtime to execute the **Core ML** delegated part of the **Program**.
+To execute a Core ML delegated program, the application must link against the `coremldelegate` library. Once linked, no additional steps are required: when running the program, ExecuTorch will call the Core ML runtime to execute the Core ML delegated parts of the program.

 Please follow the instructions described in the [Core ML setup](/backends/apple/coreml/setup.md) to link the `coremldelegate` library.
+
+## Help & Improvements
+If you have problems or questions or have suggestions for ways to make
+implementation and testing better, please create an issue on [github](https://www.github.com/pytorch/executorch/issues).

backends/apple/coreml/setup.md

Lines changed: 8 additions & 8 deletions
@@ -29,8 +29,8 @@ python3 -m examples.apple.coreml.scripts.export --model_name add
 4. You can now integrate the **Core ML** backend in code.

 ```python
-# Lower to Core ML backend
-lowered_module = to_backend('CoreMLBackend', to_be_lowered_exir_submodule, [])
+# Delegate to Core ML backend
+delegated_program_manager = edge_program_manager.to_backend(CoreMLPartitioner())
 ```


@@ -46,15 +46,15 @@ lowered_module = to_backend('CoreMLBackend', to_be_lowered_exir_submodule, [])
 xcode-select --install
 ```

-2. Build **Core ML** delegate. The following will create a `executorch.xcframework` in `cmake-out` directory.
+4. Build **Core ML** delegate. The following will create `executorch.xcframework` and `coreml_backend.xcframework` in the `cmake-out` directory.

 ```bash
 cd executorch
 ./build/build_apple_frameworks.sh --Release --coreml
 ```
-3. Open the project in Xcode, and drag the `executorch.xcframework` generated from Step 2 to Frameworks.
+5. Open the project in Xcode, and drag `executorch.xcframework` and `coreml_backend.xcframework` frameworks generated from Step 2 to Frameworks.

-4. Go to project Target’s Build Phases - Link Binaries With Libraries, click the + sign, and add the following frameworks:
+6. Go to project Target’s Build Phases - Link Binaries With Libraries, click the + sign, and add the following frameworks:

 ```
 executorch.xcframework
@@ -63,9 +63,9 @@ coreml_backend.xcframework

 5. Go to project Target’s Build Phases - Link Binaries With Libraries, click the + sign, and add the following frameworks.
 ```
-- Accelerate.framework
-- CoreML.framework
-- libsqlite3.tbd
+Accelerate.framework
+CoreML.framework
+libsqlite3.tbd
 ```

 6. The target could now run a **Core ML** delegated **Program**.

backends/apple/mps/mps_preprocess.py

Lines changed: 12 additions & 0 deletions
@@ -18,6 +18,7 @@
 from executorch.backends.apple.mps.serialization.mps_graph_schema import (
     MPSGraph,
     MPSTensor,
+    OpType,
 )

 from executorch.backends.apple.mps.serialization.mps_graph_serialize import (
@@ -65,6 +66,7 @@ def preprocess(
             input_ids=[],
             output_ids=[],
             constant_ids=[],
+            graph_type=OpType.mps_graph,
         )

         convert_model_to_fp16 = True
@@ -111,6 +113,16 @@ def handle_call_function(
         mps_graph: MPSGraph,
     ) -> None:
         logging.info(f"Visiting: {node}, {node.target.__name__}")
+
+        if (
+            "delegation_tag" in node.meta
+            and "metal_kernel" in node.meta["delegation_tag"]
+        ):
+            logging.info(
+                f"Node '{node.target.__name__}' was marked as a Metal kernel by the MPSPartitioner!"
+            )
+            mps_graph.graph_type = OpType.metal_kernel
+
         if node.target.__name__ in node_visitors:
             node_visitors[node.target.__name__].define_node(node, mps_graph)
         else: