Commit ddb5820

Updating the README to remove duplication (#924)
1 parent 4e1879c commit ddb5820

File tree

1 file changed: 37 additions, 25 deletions

README.md

@@ -238,46 +238,58 @@ export TORCHCHAT_ROOT=${PWD}
 ./scripts/install_et.sh
 ```
 
-### Test it out using our ExecuTorch runner
+
+### Export for mobile
+Similar to AOTI, to deploy onto device, we first export the PTE artifact, then we load the artifact for inference.
+
+The following example uses the Llama3 8B Instruct model.
+```
+# Export
+python3 torchchat.py export llama3 --quantize config/data/mobile.json --output-pte-path llama3.pte
+```
+
+> [!NOTE]
+> We use `--quantize config/data/mobile.json` to quantize the
+> llama3 model to reduce model size and improve performance for
+> on-device use cases.
+
+For more details on quantization and what settings to use for your use
+case, visit our [Quantization documentation](docs/quantization.md).
+
+### Deploy and run on Desktop
 
 While ExecuTorch does not focus on desktop inference, it is capable
-of building a runner to do so. This is handy for testing out PTE
+of doing so. This is handy for testing out PTE
 models without sending them to a physical device.
 
-Build the runner
-```bash
-scripts/build_native.sh et
-```
+Specifically, there are two ways of doing so: pure Python, or via a runner.
+
+<details>
+<summary>Deploying via Python</summary>
 
-Get a PTE file if you don't have one already
 ```
-python3 torchchat.py export llama3 --quantize config/data/mobile.json --output-pte-path llama3.pte
+# Execute
+python3 torchchat.py generate llama3 --device cpu --pte-path llama3.pte --prompt "Hello my name is"
 ```
 
-Execute using the runner
-```bash
-cmake-out/et_run llama3.pte -z `python3 torchchat.py where llama3`/tokenizer.model -i "Once upon a time"
-```
+</details>
 
-### Export for mobile
-The following example uses the Llama3 8B Instruct model.
 
+<details>
+<summary>Deploying via a Runner</summary>
+
+Build the runner:
+```bash
+scripts/build_native.sh et
 ```
-# Export
-python3 torchchat.py export llama3 --quantize config/data/mobile.json --output-pte-path llama3.pte
 
-# Execute
-python3 torchchat.py generate llama3 --device cpu --pte-path llama3.pte --prompt "Hello my name is"
+Execute using the runner:
+```bash
+cmake-out/et_run llama3.pte -z `python3 torchchat.py where llama3`/tokenizer.model -i "Once upon a time"
 ```
 
-> [!NOTE]
-> We use `--quantize config/data/mobile.json` to quantize the
-llama3 model to reduce model size and improve performance for
-on-device use cases.
+</details>
 
-For more details on quantization and what settings to use for your use
-case visit our [Quantization documentation](docs/quantization.md) or
-run `python3 torchchat.py export`
 
 [end default]: end
 
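The NOTE in the added text says quantization reduces model size for on-device use. A back-of-envelope estimate (hypothetical figures, assuming roughly 8 billion weights for Llama3 8B, 2 bytes per weight at fp16 versus half a byte at 4-bit, and ignoring quantization metadata such as scales) shows the order of magnitude involved; the actual PTE size depends on the scheme in `config/data/mobile.json`:

```shell
# Rough, hypothetical size estimate -- not a measurement of the real PTE file.
PARAMS=8000000000                           # ~8B weights (Llama3 8B Instruct)
FP16_GB=$((PARAMS * 2 / 1000000000))        # 2 bytes per weight at fp16
INT4_GB=$((PARAMS / 2 / 1000000000))        # 4 bits per weight, metadata ignored
echo "fp16: ~${FP16_GB} GB, 4-bit: ~${INT4_GB} GB"
```

A factor-of-four reduction like this is why the mobile export path quantizes by default.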
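The runner invocation in the diff relies on shell command substitution: the backquoted `python3 torchchat.py where llama3` expands to the directory holding the downloaded model artifacts, and `/tokenizer.model` is appended to it. A minimal sketch of that expansion, using a hypothetical stand-in directory instead of invoking torchchat:

```shell
# Stand-in for: MODEL_DIR=`python3 torchchat.py where llama3`
MODEL_DIR="/tmp/torchchat-demo/llama3"      # hypothetical path for illustration
TOKENIZER="$MODEL_DIR/tokenizer.model"      # what the backquoted form builds inline
echo "$TOKENIZER"
```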
