@@ -95,7 +95,7 @@ For more information, see [Setting Up ExecuTorch](../getting-started-setup.md).
## Running a Large Language Model Locally
- This example uses Karpathy’s [NanoGPT](https://github.com/karpathy/nanoGPT), which is a minimal implementation of
+ This example uses Karpathy’s [nanoGPT](https://github.com/karpathy/nanoGPT), which is a minimal implementation of
GPT-2 124M. This guide is applicable to other language models, as ExecuTorch is model-agnostic.

There are two steps to running a model with ExecuTorch:
@@ -113,7 +113,7 @@ ExecuTorch runtime.
Exporting takes a PyTorch model and converts it into a format that can run efficiently on consumer devices.
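To make that concrete before turning to nanoGPT, here is a minimal sketch of the generic export flow, assuming a toy module and an illustrative output file name (the nanoGPT-specific export script appears later in this guide):

```python
# Minimal export sketch; TinyModel and the file name are illustrative placeholders.
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 8),)

# 1. Capture the model as a graph with torch.export.
exported_program = torch.export.export(model, example_inputs)

# 2. Lower to the Edge dialect, then to an ExecuTorch program.
et_program = to_edge(exported_program).to_executorch()

# 3. Serialize to a .pte file that the ExecuTorch runtime can load.
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```
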
- For this example, you will need the NanoGPT model and the corresponding tokenizer vocabulary.
+ For this example, you will need the nanoGPT model and the corresponding tokenizer vocabulary.
::::{tab-set}
:::{tab-item} curl
@@ -426,12 +426,12 @@ specific hardware (delegation), and because it is doing all of the calculations
While ExecuTorch provides a portable, cross-platform implementation for all
operators, it also provides specialized backends for a number of different
targets. These include, but are not limited to, x86 and ARM CPU acceleration via
- the XNNPACK backend, Apple acceleration via the CoreML backend and Metal
+ the XNNPACK backend, Apple acceleration via the Core ML backend and Metal
Performance Shaders (MPS) backend, and GPU acceleration via the Vulkan backend.

Because optimizations are specific to a given backend, each PTE file is specific
to the backend(s) targeted at export. To support multiple devices, such as
- XNNPACK acceleration for Android and CoreML for iOS, export a separate PTE file
+ XNNPACK acceleration for Android and Core ML for iOS, export a separate PTE file
for each backend.
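As a rough sketch of that per-backend workflow (this relies on the `to_backend()` API described next; the helper name, file names, and the Core ML partitioner import are illustrative assumptions rather than code from this guide):

```python
# Illustrative only: write one .pte per target backend from an EdgeProgramManager.
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
# Assumed import path for the Core ML partitioner; verify against your ExecuTorch release.
# from executorch.backends.apple.coreml.partition import CoreMLPartitioner

def export_for_backend(edge_manager, partitioner, filename):
    # Delegate the subgraphs this partitioner supports, then serialize the program.
    delegated = edge_manager.to_backend(partitioner)
    with open(filename, "wb") as f:
        f.write(delegated.to_executorch().buffer)

# Each call should start from its own freshly created EdgeProgramManager:
# export_for_backend(edge_manager_android, XnnpackPartitioner(), "nanogpt_xnnpack.pte")
# export_for_backend(edge_manager_ios, CoreMLPartitioner(), "nanogpt_coreml.pte")
```
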
To delegate to a backend at export time, ExecuTorch provides the `to_backend()`
@@ -442,12 +442,12 @@ computation graph that can be accelerated by the target backend, and
acceleration and optimization. Any portions of the computation graph not
delegated will be executed by the ExecuTorch operator implementations.
- To delegate the exported model to the specific backend, we need to import its
- partitioner as well as edge compile config from ExecuTorch Codebase first, then
+ To delegate the exported model to a specific backend, we need to import its
+ partitioner as well as edge compile config from the ExecuTorch codebase first, then
call `to_backend` with an instance of the partitioner on the `EdgeProgramManager`
object created by the `to_edge` function.

- Here's an example of how to delegate NanoGPT to XNNPACK (if you're deploying to an Android Phone for instance):
+ Here's an example of how to delegate nanoGPT to XNNPACK (if you're deploying to an Android phone for instance):
```python
# export_nanogpt.py
@@ -466,7 +466,7 @@ from torch._export import capture_pre_autograd_graph
from model import GPT
- # Load the NanoGPT model.
+ # Load the nanoGPT model.
model = GPT.from_pretrained('gpt2')
# Create example inputs. This is used in the export process to provide
@@ -590,7 +590,7 @@ I'm not sure if you've heard of the "Curse of the Dragon" or not, but it's a ver
The delegated model should be noticeably faster compared to the non-delegated model.
For more information regarding backend delegation, see the ExecuTorch guides
- for the [XNNPACK Backend](../tutorial-xnnpack-delegate-lowering.md) and [CoreML
+ for the [XNNPACK Backend](../tutorial-xnnpack-delegate-lowering.md) and [Core ML
Backend](../build-run-coreml.md).

## Quantization
@@ -701,15 +701,15 @@ df = delegation_info.get_operator_delegation_dataframe()
print(tabulate(df, headers="keys", tablefmt="fancy_grid"))
```
- For NanoGPT targeting the XNNPACK backend, you might see the following:
+ For nanoGPT targeting the XNNPACK backend, you might see the following:
```
Total delegated subgraphs: 86
Number of delegated nodes: 473
Number of non-delegated nodes: 430
```
- | | op_type | occurrences_in_delegated_graphs | occurrences_in_non_delegated_graphs |
+ | | op_type | # in_delegated_graphs | # in_non_delegated_graphs |
|----|---------------------------------|-------|-----|
| 0 | aten__softmax_default | 12 | 0 |
| 1 | aten_add_tensor | 37 | 0 |
@@ -731,7 +731,7 @@ print(print_delegated_graph(graph_module))
This may generate a large amount of output for large models. Consider using "Control+F" or "Command+F" to locate the operator you’re interested in
(e.g. “aten_view_copy_default”). Observe which instances are not under lowered graphs.

- In the fragment of the output for NanoGPT below, observe that embedding and add operators are delegated to XNNPACK while the sub operator is not.
+ In the fragment of the output for nanoGPT below, observe that embedding and add operators are delegated to XNNPACK while the sub operator is not.

```
%aten_unsqueeze_copy_default_22 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.unsqueeze_copy.default](args = (%aten_arange_start_step_23, -2), kwargs = {})