
Commit e15f002

JacobSzwejbka authored and facebook-github-bot committed

Runtime API tutorial (#673)

Summary:
Pull Request resolved: #673

Step by step tutorial on performing a single inference of a simple model

Reviewed By: dbort
Differential Revision: D50037399
fbshipit-source-id: cd907818fb74e131c3e07bed768882d0a2ee281a
1 parent 04bd18b commit e15f002

5 files changed: +160 -14 lines changed


docs/source/index.rst

Lines changed: 9 additions & 2 deletions
@@ -144,7 +144,6 @@ Topics in this section will help you get started with ExecuTorch.
    runtime-overview
    runtime-build-and-cross-compilation
    runtime-backend-delegate-implementation-and-linking
-   runtime-api
    runtime-custom-memory-allocator
    runtime-error-handling
    runtime-platform-abstraction-layer
@@ -187,13 +186,14 @@ Topics in this section will help you get started with ExecuTorch.
    :hidden:

    tutorials/export-to-executorch-tutorial
+   running-a-model-cpp-tutorial
    build-run-xtensa
    tutorials/sdk-integration-tutorial

 Tutorials and Examples
 ~~~~~~~~~~~~~~~~~~~~~~

-Ready to experiment? Check out some of the interactive
+Ready to experiment? Check out some of the
 ExecuTorch tutorials.

 .. customcardstart::
@@ -212,4 +212,11 @@
    :link: tutorials/sdk-integration.html
    :tags: Export

+.. customcarditem::
+   :header: Running an ExecuTorch Model C++ Tutorial
+   :card_description: A tutorial for setting up memory pools, loading a model, setting inputs, executing the model, and retrieving outputs on device.
+   :image: _static/img/generic-pytorch-logo.png
+   :link: running-a-model-cpp.html
+   :tags:
+
 .. customcardend::
docs/source/running-a-model-cpp-tutorial.md (new file)

Lines changed: 145 additions & 0 deletions

# Running an ExecuTorch Model in C++ Tutorial

**Author:** [Jacob Szwejbka](https://github.com/JacobSzwejbka)

In this tutorial, we will cover the APIs to load an ExecuTorch model, prepare the MemoryManager, set inputs, execute the model, and retrieve outputs.

For a high level overview of the ExecuTorch Runtime please see [Runtime Overview](runtime-overview.md), and for more in-depth documentation on each API please see the [Runtime API Reference](executorch-runtime-api-reference.rst).
[Here](https://github.com/pytorch/executorch/blob/main/examples/portable/executor_runner/executor_runner.cpp) is a fully functional C++ model runner, and the [Setting up ExecuTorch](getting-started-setup.md) doc shows how to build and run it.
## Prerequisites

You will need an ExecuTorch model to follow along. We will be using the model `SimpleConv` generated from the [Exporting to ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
## Model Loading

The first step towards running your model is to load it. ExecuTorch uses an abstraction called a `DataLoader` to handle the specifics of retrieving the `.pte` file data, and a `Program` to represent the loaded model.

Users can define their own `DataLoader`s to fit the needs of their particular system. In this tutorial we will be using the `FileDataLoader`, but you can look under [Example Data Loader Implementations](https://github.com/pytorch/executorch/tree/main/extension/data_loader) to see other options provided by the ExecuTorch project (a sketch using one of them follows the example below).

For the `FileDataLoader` all we need to do is provide a file path to the constructor.
``` cpp
using namespace torch::executor;

// Create a DataLoader that reads the .pte file from disk.
Result<util::FileDataLoader> loader =
    util::FileDataLoader::from("/tmp/model.pte");
assert(loader.ok());

// Load the Program from the DataLoader; Program::load() takes a DataLoader*.
Result<Program> program = Program::load(&loader.get());
assert(program.ok());
```
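As an example of one of those other provided loaders, here is a minimal sketch of loading the same program from a buffer that is already in memory. The `model_pte_data` and `model_pte_size` symbols are hypothetical placeholders, not part of the original tutorial:

``` cpp
// Hypothetical symbols for a .pte that is already resident in memory,
// e.g. linked into the binary; not part of the original tutorial.
extern const uint8_t model_pte_data[];
extern const size_t model_pte_size;

// BufferDataLoader (from extension/data_loader) serves Program::load()
// directly from that in-memory buffer instead of reading a file.
util::BufferDataLoader buffer_loader(model_pte_data, model_pte_size);
Result<Program> program_from_buffer = Program::load(&buffer_loader);
assert(program_from_buffer.ok());
```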
## Setting Up the MemoryManager

Next we will set up the `MemoryManager`.

One of the principles of ExecuTorch is giving users control over where the memory used by the runtime comes from. Today (late 2023) users need to provide two different allocators:

* Method Allocator: A `MemoryAllocator` used to allocate runtime structures at `Method` load time. Things like Tensor metadata, the internal chain of instructions, and other runtime state come from this.

* Planned Memory: A `HierarchicalAllocator` containing one or more memory arenas where internal mutable tensor data buffers are placed. At `Method` load time internal tensors have their data pointers assigned to various offsets within these arenas. The positions of those offsets and the sizes of the arenas are determined ahead of time by memory planning.

For this example we will retrieve the sizes of the planned memory arenas dynamically from the `Program`, but for heapless environments users could retrieve this information from the `Program` ahead of time and allocate the arenas statically (a sketch of this follows the code below). We will also be using a malloc-based allocator for the method allocator.
``` cpp
// Method names map back to Python nn.Module method names. Most users will only
// have the singular method "forward".
const char* method_name = "forward";

// MethodMeta is a lightweight structure that lets us gather metadata
// information about a specific method. In this case we are looking to get the
// required size of the memory planned buffers for the method "forward".
Result<MethodMeta> method_meta = program->method_meta(method_name);
assert(method_meta.ok());

std::vector<std::unique_ptr<uint8_t[]>> planned_buffers; // Owns the memory
std::vector<Span<uint8_t>> planned_arenas; // Passed to the allocator

size_t num_memory_planned_buffers = method_meta->num_memory_planned_buffers();

// It is possible to have multiple layers in our memory hierarchy; for example,
// SRAM and DRAM.
for (size_t id = 0; id < num_memory_planned_buffers; ++id) {
  // .get() will always succeed because id < num_memory_planned_buffers.
  size_t buffer_size =
      static_cast<size_t>(method_meta->memory_planned_buffer_size(id).get());
  planned_buffers.push_back(std::make_unique<uint8_t[]>(buffer_size));
  planned_arenas.push_back({planned_buffers.back().get(), buffer_size});
}
HierarchicalAllocator planned_memory(
    {planned_arenas.data(), planned_arenas.size()});

// Version of MemoryAllocator that uses malloc to handle allocations rather
// than a fixed buffer.
util::MallocMemoryAllocator method_allocator;

// Assemble all of the allocators into the MemoryManager that the Executor
// will use.
MemoryManager memory_manager(&method_allocator, &planned_memory);
```
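For comparison, here is a minimal sketch of the heapless variant mentioned above. The pool sizes are hypothetical placeholders; in a real system they would be determined ahead of time from the `Program`, and this setup would replace the malloc-based one shown above:

``` cpp
// Hypothetical, build-time-known pool sizes (not taken from the tutorial).
static uint8_t method_allocator_pool[2 * 1024 * 1024]; // runtime structures
static uint8_t planned_memory_pool[4 * 1024 * 1024];   // planned tensor data

// Fixed-buffer allocator for Method load-time structures; no heap usage.
MemoryAllocator method_allocator(
    sizeof(method_allocator_pool), method_allocator_pool);

// A single statically allocated planned-memory arena.
Span<uint8_t> planned_arena_spans[] = {
    {planned_memory_pool, sizeof(planned_memory_pool)}};
HierarchicalAllocator planned_memory({planned_arena_spans, 1});

MemoryManager memory_manager(&method_allocator, &planned_memory);
```

Loading the method and running it are unchanged from this point on.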
## Loading a Method

In ExecuTorch we load and initialize from the `Program` at a method granularity. Many programs will only have one method, `forward`. `load_method` is where initialization is done, from setting up tensor metadata to initializing delegates, etc.

``` cpp
Result<Method> method = program->load_method(method_name, &memory_manager);
assert(method.ok());
```
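Not part of the original tutorial, but once the `Method` is loaded you can sanity-check its interface; a small sketch, assuming `Method::inputs_size()` and `Method::outputs_size()`:

``` cpp
#include <cstdio>

// Optional sanity check: ask the loaded method how many inputs and outputs it
// expects before wiring anything up. SimpleConv should report one of each.
printf(
    "Method has %zu inputs and %zu outputs\n",
    method->inputs_size(),
    method->outputs_size());
```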
## Setting Inputs

Now that we have our method, we need to set up its inputs before we can perform an inference. In this case we know our model takes a single (1, 3, 256, 256) sized float tensor.

Depending on how your model was memory planned, the planned memory may or may not contain buffer space for your inputs and outputs.

If the outputs were not memory planned, then users will need to set up the output data pointers with `set_output_data_ptr` (a sketch of this follows the example below). In this case we will just assume our model was exported with inputs and outputs handled by the memory plan.
``` cpp
// Create our input tensor. In a real application this buffer would be
// filled with the actual input data.
float data[1 * 3 * 256 * 256];
Tensor::SizesType sizes[] = {1, 3, 256, 256};
Tensor::DimOrderType dim_order[] = {0, 1, 2, 3};
TensorImpl impl(
    ScalarType::Float, // dtype
    4, // number of dimensions
    sizes,
    data,
    dim_order);
Tensor t(&impl);

// Implicitly casts t to EValue
Error set_input_error = method->set_input(t, 0);
assert(set_input_error == Error::Ok);
```
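If your outputs are not memory planned, a minimal sketch of pointing output 0 at user-owned memory with `set_output_data_ptr` might look like the following; the buffer size here is a hypothetical placeholder, and the (buffer, size, output index) argument order is assumed:

``` cpp
// Hypothetical user-owned output buffer; it must be large enough to hold the
// entire output tensor 0 of the method.
static float output_storage[1 * 3 * 256 * 256];

Error set_output_error = method->set_output_data_ptr(
    output_storage, sizeof(output_storage), /*output_idx=*/0);
assert(set_output_error == Error::Ok);
```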
## Perform an Inference

Now that our method is loaded and our inputs are set, we can perform an inference. We do this by calling `execute`.

``` cpp
Error execute_error = method->execute();
assert(execute_error == Error::Ok);
```
## Retrieve Outputs

Once our inference completes we can retrieve our output. We know that our model only returns a single output tensor. One potential pitfall here is that the output we get back is owned by the `Method`. Users should take care to clone their output before performing any mutations on it, or if they need it to have a lifespan separate from the `Method` (a sketch of copying the data out follows below).

``` cpp
EValue output = method->get_output(0);
assert(output.isTensor());
```
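As a rough illustration of that copy, and assuming the output is a float tensor as in this example, the data can be duplicated into user-owned storage before the `Method` is destroyed (a sketch, not part of the original tutorial):

``` cpp
#include <cstring>
#include <vector>

// Copy the output tensor's contents into memory we own so the values can
// outlive the Method that produced them.
Tensor out_tensor = output.toTensor();
std::vector<float> out_copy(out_tensor.numel());
std::memcpy(
    out_copy.data(), out_tensor.const_data_ptr<float>(), out_tensor.nbytes());
```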
## Conclusion

In this tutorial, we went over the APIs and steps required to load and perform an inference with an ExecuTorch model in C++.

docs/source/runtime-api.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/source/runtime-overview.md

Lines changed: 1 addition & 4 deletions
@@ -157,10 +157,7 @@ However, please note:

 For more details about the ExecuTorch runtime, please see:

-* The
-  [`executor_runner`](https://github.com/pytorch/executorch/blob/main/examples/portable/executor_runner/executor_runner.cpp)
-  example tool
-* [Runtime API](runtime-api.md)
+* [Runtime API Tutorial](./running-a-model-cpp-tutorial)
 * [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md)
 * [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md)
 * [Custom Memory Allocation](runtime-custom-memory-allocator.md)

docs/source/tutorials_source/export-to-executorch-tutorial.py

Lines changed: 5 additions & 5 deletions
@@ -70,7 +70,7 @@
 from torch.export import export, ExportedProgram


-class M(torch.nn.Module):
+class SimpleConv(torch.nn.Module):
     def __init__(self) -> None:
         super().__init__()
         self.conv = torch.nn.Conv2d(
@@ -84,7 +84,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:


 example_args = (torch.randn(1, 3, 256, 256),)
-pre_autograd_aten_dialect = capture_pre_autograd_graph(M(), example_args)
+pre_autograd_aten_dialect = capture_pre_autograd_graph(SimpleConv(), example_args)
 print("Pre-Autograd ATen Dialect Graph")
 print(pre_autograd_aten_dialect)

@@ -236,7 +236,7 @@ def f(x, y):
 # model properly for a specific backend.

 example_args = (torch.randn(1, 3, 256, 256),)
-pre_autograd_aten_dialect = capture_pre_autograd_graph(M(), example_args)
+pre_autograd_aten_dialect = capture_pre_autograd_graph(SimpleConv(), example_args)
 print("Pre-Autograd ATen Dialect Graph")
 print(pre_autograd_aten_dialect)

@@ -280,7 +280,7 @@ def f(x, y):
 from executorch.exir import EdgeProgramManager, to_edge

 example_args = (torch.randn(1, 3, 256, 256),)
-pre_autograd_aten_dialect = capture_pre_autograd_graph(M(), example_args)
+pre_autograd_aten_dialect = capture_pre_autograd_graph(SimpleConv(), example_args)
 print("Pre-Autograd ATen Dialect Graph")
 print(pre_autograd_aten_dialect)

@@ -338,7 +338,7 @@ def decode(x):
 # rather than the ``torch.ops.aten`` namespace.

 example_args = (torch.randn(1, 3, 256, 256),)
-pre_autograd_aten_dialect = capture_pre_autograd_graph(M(), example_args)
+pre_autograd_aten_dialect = capture_pre_autograd_graph(SimpleConv(), example_args)
 aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, example_args)
 edge_program: EdgeProgramManager = to_edge(aten_dialect)
 print("Edge Dialect Graph")
