
Commit 86c50df

mikekgfb, ianbarber, swolchok, larryliu0820, and kimishpatel authored and committed
Process readme (#665)
* executable README
* fix title of CI workflow
* markup commands in markdown
* extend the markup-markdown language
* Automatically identify cuda from nvidia-smi in install-requirements (#606)
  * Automatically identify cuda from nvidia-smi in install-requirements
  * Update README.md
  Co-authored-by: Michael Gschwind <[email protected]>
* Unbreak zero-temperature sampling (#599). Fixes #581.
* Improve process README
* [retake] Add sentencepiece tokenizer (#626)
  * Add sentencepiece tokenizer
  * Add white space
  * Handle white space
  * Handle control ids
  * More cleanup
  * Lint
  * Use unique_ptr
  * Use a larger runner
  * Debug
  * Debug
  * Cleanup
* Update install_utils.sh to use python3 instead of python (#636). On some devices `python` and `python3` point to different environments, so it is good to unify them.
* Fix quantization doc to specify dtype limitation on a8w4dq (#629)
  Co-authored-by: Kimish Patel <[email protected]>
* add desktop.json (#622)
  * add desktop.json
  * add fast
  * remove embedding
  * improvements
* update readme from doc branch
* tab/spc
* fix errors in updown language
* fix errors in updown language, and [skip]: begin/end
* fix errors in updown language, and [skip]: begin/end
* a storied run
* stories run on readme instructions does not need HF token
* increase timeout
* check for hang in hf_login
* executable README improvements
* typo
* typo

Co-authored-by: Ian Barber <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: Kimish Patel <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
1 parent 824a7ab commit 86c50df

File tree

6 files changed: +24 -14 lines changed


.github/workflows/run-readme2.yml

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ jobs:
 
 echo "::group::Create script"
 python3 scripts/process-readme.py > ./readme-commands.sh
+echo "exit 1" >> ./readme-commands.sh
 echo "::endgroup::"
 
 echo "::group::Run This"
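For context on the added line: the README is converted into readme-commands.sh, and the appended `exit 1` acts as a backstop. Together with the scripts/process-readme.py change at the bottom of this diff (which emits `exit 0` when the README's end marker is reached), a run that never reaches that marker appears to fall through to the `exit 1` and fail the job. Below is a rough Python restatement of this step, for illustration only; the actual workflow does it in shell, and the step that executes the generated script is outside this hunk.

```python
# Illustrative sketch only -- the real step is the shell shown in the hunk above.
import subprocess

with open("readme-commands.sh", "w") as f:
    # Equivalent of: python3 scripts/process-readme.py > ./readme-commands.sh
    subprocess.run(["python3", "scripts/process-readme.py"], stdout=f, check=True)
    # Backstop: if the extracted commands never reach the README's end marker
    # (which emits "exit 0"), the generated script falls through to this line.
    f.write("exit 1\n")
```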

README.md

Lines changed: 18 additions & 8 deletions
@@ -80,15 +80,18 @@ HuggingFace.
 python3 torchchat.py download llama3
 ```
 
-*NOTE: This command may prompt you to request access to llama3 via HuggingFace, if you do not already have access. Simply follow the prompts and re-run the command when access is granted.*
+*NOTE: This command may prompt you to request access to llama3 via
+HuggingFace, if you do not already have access. Simply follow the
+prompts and re-run the command when access is granted.*
 
 View available models with:
 ```
 python3 torchchat.py list
 ```
 
+You can also remove downloaded models with the remove command:
+`python3 torchchat.py remove llama3`
 
-You can also remove downloaded models with the remove command: `python3 torchchat.py remove llama3`
 
 
 ## Running via PyTorch / Python
@@ -111,15 +114,15 @@ python3 torchchat.py generate llama3 --prompt "write me a story about a boy and
 
 For more information run `python3 torchchat.py generate --help`
 
-[end default]:
 
 ### Browser
 
-[shell default]: if false; then
+[skip default]: begin
 ```
 python3 torchchat.py browser llama3
 ```
-[shell default]: fi
+[skip default]: end
+
 
 *Running on http://127.0.0.1:5000* should be printed out on the
 terminal. Click the link or go to
@@ -139,9 +142,15 @@ conversation.
 AOT compiles models before execution for faster inference
 
 The following example exports and executes the Llama3 8B Instruct
+<<<<<<< HEAD
+model. The first command performs the actual export, the second
+command loads the exported model into the Python interface to enable
+users to test the exported model.
+=======
 model. (The first command performs the actual export, the second
 command loads the exported model into the Python interface to enable
 users to test the exported model.)
+>>>>>>> cf83f45a1949b3d45e356d375486a4013badf4db
 
 ```
 # Compile
@@ -152,9 +161,10 @@ python3 torchchat.py export llama3 --output-dso-path exportedModels/llama3.so
 python3 torchchat.py generate llama3 --dso-path exportedModels/llama3.so --prompt "Hello my name is"
 ```
 
-NOTE: If you're machine has cuda add this flag for performance
+NOTE: If your machine has cuda add this flag for performance
 `--quantize config/data/cuda.json`
 
+[end default]: end
 ### Running native using our C++ Runner
 
 The end-to-end C++ [runner](runner/run.cpp) runs an `*.so` file
@@ -167,7 +177,7 @@ scripts/build_native.sh aoti
 
 Execute
 ```bash
-cmake-out/aoti_run exportedModels/llama3.so -z .model-artifacts/meta-llama/Meta-Llama-3-8B-Instruct/tokenizer.model -l 3 -i "Once upon a time"
+cmake-out/aoti_run exportedModels/llama3.so -z ~/.torchchat/model-cache/meta-llama/Meta-Llama-3-8B-Instruct/tokenizer.model -l 3 -i "Once upon a time"
 ```
 
 [end default]:
@@ -243,9 +253,9 @@ Now, follow the app's UI guidelines to pick the model and tokenizer files from t
 <img src="https://pytorch.org/executorch/main/_static/img/llama_ios_app.png" width="600" alt="iOS app running a LlaMA model">
 </a>
 
-
 ### Deploy and run on Android
 
+MISSING. TBD.
 
 
 

build/utils.py

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@
 import os
 from pathlib import Path
 from typing import Any, Callable, Dict, List, Tuple
-
 import torch
 
 ##########################################################################

config/data/desktop.json

Lines changed: 1 addition & 2 deletions
@@ -1,5 +1,4 @@
 {
-"executor": {"accelerator": "fast" },
+"executor": {"accelerator": "fast"},
 "precision": {"dtype" : "fast16"},
-"linear:int4": {"groupsize" : 256}
 }

docs/quantization.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ Due to the larger vocabulary size of llama3, we also recommend quantizing the em
 |--|--|--|--|--|--|--|--|
 | embedding (symmetric) | fp32, fp16, bf16 | [8, 4]* | [32, 64, 128, 256]** | ||||
 
-^ The a8w4dq quantization scheme requires inouts to be converted to fp32, due to lack of support for fp16 and bf16.
+^a8w4dq quantization scheme requires model to be converted to fp32, due to lack of support for fp16 and bf16 in the kernels provided with ExecuTorch.
 
 * These are the only valid bitwidth options.
 
@@ -82,7 +82,7 @@ python3 generate.py llama3 --dso-path llama3.dso --prompt "Hello my name is"
 ```
 ### ExecuTorch
 ```
-python3 torchchat.py export llama3 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
+python3 torchchat.py export llama3 --dtype fp32 --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:a8w4dq": {"groupsize" : 256}}' --output-pte-path llama3.pte
 
 python3 generate.py llama3 --pte-path llama3.pte --prompt "Hello my name is"
 ```

scripts/process-readme.py

Lines changed: 2 additions & 1 deletion
@@ -15,6 +15,7 @@ def print_between_triple_backticks(filename, predicate):
         elif line.startswith(command):
             print(line[len(command) :])
         elif line.startswith(end):
+            print("exit 0")
             return
         elif line.startswith(skip):
             keyword = line[len(skip):-1].strip()
@@ -34,6 +35,6 @@ def print_between_triple_backticks(filename, predicate):
 if len(sys.argv) > 1:
     predicate = sys.argv[1]
 else:
-    predicate = "default"
+    predicate="default"
 
 print_between_triple_backticks("README.md", predicate)
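The hunks above show only fragments of print_between_triple_backticks. Below is a minimal, self-contained sketch of how this kind of README-to-script extraction could work, including the `[skip default]: begin` / `[skip default]: end` markers introduced in the README diff and the new `exit 0` at the end marker. The exact marker syntax and internal variable names are assumptions pieced together from the fragments in this commit, not the actual implementation.

```python
# Minimal sketch, NOT the actual scripts/process-readme.py. Marker syntax
# ("[shell <predicate>]: <cmd>", "[end <predicate>]:", "[skip <predicate>]: begin/end")
# is inferred from the fragments in this commit and may differ from the real script.
import sys


def print_between_triple_backticks(filename, predicate="default"):
    fence = "```"
    command = f"[shell {predicate}]: "
    end = f"[end {predicate}]:"
    skip = f"[skip {predicate}]: "
    in_fence = False
    skipping = False
    with open(filename) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith(skip):
                # "begin" opens a skipped region, "end" closes it
                skipping = line[len(skip):].strip() == "begin"
            elif skipping:
                continue
            elif line.startswith(fence):
                in_fence = not in_fence  # toggle at opening/closing fences
            elif in_fence:
                print(line)  # fenced commands become lines of the generated script
            elif line.startswith(command):
                print(line[len(command):])  # explicit one-off shell lines
            elif line.startswith(end):
                print("exit 0")  # successful end of the extracted script
                return


if __name__ == "__main__":
    predicate = sys.argv[1] if len(sys.argv) > 1 else "default"
    print_between_triple_backticks("README.md", predicate)
```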

0 commit comments
