@@ -238,46 +238,58 @@ export TORCHCHAT_ROOT=${PWD}
./scripts/install_et.sh
```
- ### Test it out using our ExecuTorch runner
+
+ ### Export for mobile
+ Similar to AOTI, to deploy onto a device, we first export the PTE artifact, then load that artifact for inference.
+
+ The following example uses the Llama3 8B Instruct model.
+ ```
+ # Export
+ python3 torchchat.py export llama3 --quantize config/data/mobile.json --output-pte-path llama3.pte
+ ```
+
+ > [!NOTE]
+ > We use `--quantize config/data/mobile.json` to quantize the
+ llama3 model to reduce model size and improve performance for
+ on-device use cases.
+
+ For more details on quantization and what settings to use for your use
+ case, visit our [Quantization documentation](docs/quantization.md).
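+
+ For reference, a mobile config like this is a small JSON file of
+ quantization settings. The sketch below is illustrative only (the exact
+ contents of `config/data/mobile.json` may differ; see the quantization
+ docs for the real schema): it pairs 4-bit grouped embedding
+ quantization with 8-bit activation / 4-bit weight linear quantization.
+ ```json
+ {
+     "embedding": {"bitwidth": 4, "groupsize": 32},
+     "linear:a8w4dq": {"groupsize": 256}
+ }
+ ```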
+
+ ### Deploy and run on Desktop
While ExecuTorch does not focus on desktop inference, it is capable
- of building a runner to do so. This is handy for testing out PTE
+ of doing so. This is handy for testing out PTE
models without sending them to a physical device.
- Build the runner
- ```bash
- scripts/build_native.sh et
- ```
+ Specifically, there are two ways of doing so: pure Python, or via a runner.
+
+ <details>
+ <summary>Deploying via Python</summary>
- Get a PTE file if you don't have one already
```
- python3 torchchat.py export llama3 --quantize config/data/mobile.json --output-pte-path llama3.pte
+ # Execute
+ python3 torchchat.py generate llama3 --device cpu --pte-path llama3.pte --prompt "Hello my name is"
```
- Execute using the runner
- ```bash
- cmake-out/et_run llama3.pte -z `python3 torchchat.py where llama3`/tokenizer.model -i "Once upon a time"
- ```
+ </details>
- ### Export for mobile
- The following example uses the Llama3 8B Instruct model.
+ <details>
+ <summary>Deploying via a Runner</summary>
+
+ Build the runner
+ ```bash
+ scripts/build_native.sh et
```
- # Export
- python3 torchchat.py export llama3 --quantize config/data/mobile.json --output-pte-path llama3.pte
- # Execute
- python3 torchchat.py generate llama3 --device cpu --pte-path llama3.pte --prompt "Hello my name is"
+ Execute using the runner
+ ```bash
+ cmake-out/et_run llama3.pte -z `python3 torchchat.py where llama3`/tokenizer.model -i "Once upon a time"
```
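+
+ The same runner invocation, unpacked as a sketch for clarity:
+ `python3 torchchat.py where llama3` prints the directory the model was
+ downloaded to, `-z` passes the tokenizer path, and `-i` the prompt.
+ ```bash
+ # Resolve the model's download directory, then run the exported PTE
+ MODEL_DIR="$(python3 torchchat.py where llama3)"
+ cmake-out/et_run llama3.pte -z "${MODEL_DIR}/tokenizer.model" -i "Once upon a time"
+ ```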
- > [!NOTE]
- > We use `--quantize config/data/mobile.json` to quantize the
- llama3 model to reduce model size and improve performance for
- on-device use cases.
+ </details>
- For more details on quantization and what settings to use for your use
- case visit our [Quantization documentation](docs/quantization.md) or
- run `python3 torchchat.py export`
[end default]: end