
Commit 5aee793

Advanced.md (#772)
* move gguf tests to script
* execute advanced instructions
1 parent e3db248 commit 5aee793

File tree

5 files changed: +145 −16 lines


.ci/scripts/run-docs

Lines changed: 10 additions & 1 deletion
@@ -55,4 +55,13 @@ if [ "$1" == "gguf" ]; then
   echo "*******************************************"
   bash -x ./run-gguf.sh
   echo "::endgroup::"
-fi
+<<<<<<< HEAD
+fi
+
+
+if [ "$1" == "advanced" ]; then
+  echo "TBD"
+fi
+=======
+fi
+>>>>>>> e3db2486f80b71b3143945a44f58d50c02488c90

.github/workflows/run-readme-pr.yml

Lines changed: 65 additions & 0 deletions
@@ -153,3 +153,68 @@ jobs:
         echo "tests complete"
         echo "*******************************************"
         echo "::endgroup::"
+
+  test-advanced-any:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    secrets: inherit
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      secrets-env: "HF_TOKEN_PERIODIC"
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.1"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        echo "::group::Install newer objcopy that supports --set-section-alignment"
+        yum install -y devtoolset-10-binutils
+        export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+        echo "::endgroup::"
+
+        echo "::group::Create script to run advanced"
+        python3 scripts/updown.py --file docs/ADVANCED-USERS.md --replace 'llama3:stories15M,-l 3:-l 2,meta-llama/Meta-Llama-3-8B-Instruct:stories15M' --suppress huggingface-cli,HF_TOKEN > ./run-advanced.sh
+        # for good measure, if something happened to updown processor,
+        # and it did not error out, fail with an exit 1
+        echo "exit 1" >> ./run-advanced.sh
+        echo "::endgroup::"
+
+        echo "::group::Run advanced"
+        echo "*******************************************"
+        cat ./run-advanced.sh
+        echo "*******************************************"
+        bash -x ./run-advanced.sh
+=======
+
+        echo "::group::Completion"
+        echo "tests complete"
+        echo "*******************************************"
+>>>>>>> e3db2486f80b71b3143945a44f58d50c02488c90
+        echo "::endgroup::"
+
+  test-gguf-cpu:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    secrets: inherit
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      secrets-env: "HF_TOKEN_PERIODIC"
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.1"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        echo "::group::Install newer objcopy that supports --set-section-alignment"
+        yum install -y devtoolset-10-binutils
+        export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+        echo "::endgroup::"
+
+        TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs gguf
+
+        echo "::group::Completion"
+        echo "tests complete"
+        echo "*******************************************"
+        echo "::endgroup::"
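In the `test-advanced-any` job above, the "Create script to run advanced" step turns docs/ADVANCED-USERS.md into an executable script: `--replace` takes comma-separated `old:new` pairs substituted into the documented commands (so heavyweight llama3 invocations become quick stories15M ones), and `--suppress` drops commands mentioning huggingface-cli or HF_TOKEN. A minimal sketch of the pair substitution, assuming plain string replacement (the helper name `apply_replacements` is illustrative, not updown.py's actual API):

```
def apply_replacements(line: str, replace_spec: str) -> str:
    # replace_spec looks like 'llama3:stories15M,-l 3:-l 2,...':
    # comma-separated old:new pairs applied to every emitted line
    for pair in replace_spec.split(","):
        old, new = pair.split(":", 1)
        line = line.replace(old, new)
    return line

print(apply_replacements(
    "python3 torchchat.py generate llama3 -l 3",
    "llama3:stories15M,-l 3:-l 2",
))
# -> python3 torchchat.py generate stories15M -l 2
```

The appended `echo "exit 1" >> ./run-advanced.sh` is a tripwire: the generated script (see run-advanced.sh below) normally ends in `exit 0`, so the trailing `exit 1` is only reached if the updown processor silently produced a truncated script.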

docs/ADVANCED-USERS.md

Lines changed: 31 additions & 15 deletions
@@ -8,7 +8,10 @@ Torchchat is currently in a pre-release state and under extensive development.
 
 [**Introduction**](#introduction) | [**Installation**](#installation) | [**Get Started**](#get-started) | [**Download**](#download) | [**Chat**](#chat) | [**Generate**](#generate) | [**Eval**](#eval) | [**Export**](#export) | [**Supported Systems**](#supported-systems) | [**Contributing**](#contributing) | [**License**](#license)
 
-&nbsp;
+[shell default]: HF_TOKEN="${SECRET_HF_TOKEN_PERIODIC}" huggingface-cli login
+
+[shell default]: TORCHCHAT_ROOT=${PWD} ./scripts/install_et.sh
+
 
 This is the advanced users guide, if you're looking to get started
 with LLMs, please refer to the README at the root directory of the
@@ -51,15 +54,16 @@ mistralai/Mistral-7B-v0.1 | 🚧 | ✅ | ✅ | ✅ | ✅ | ❹ |
 mistralai/Mistral-7B-Instruct-v0.1 | - | ✅ | ✅ | ✅ | ✅ | ❹ |
 mistralai/Mistral-7B-Instruct-v0.2 | - | ✅ | ✅ | ✅ | ✅ | ❹ |
 
-*Key:* ✅ works correctly; 🚧 work in progress; ❌ not supported; ❹ requires 4bit groupwise quantization; 📵 not on mobile (may fit some high-end devices such as tablets);
-
-&nbsp;
+*Key:* ✅ works correctly; 🚧 work in progress; ❌ not supported; ❹
+requires 4bit groupwise quantization; 📵 not on mobile (may fit some
+high-end devices such as tablets);
 
----
 
 ## Get Started
 
-Torchchat lets you access LLMs through an interactive interface, prompted single-use generation, model export (for use by AOT Inductor and ExecuTorch), and standalone C++ runtimes.
+Torchchat lets you access LLMs through an interactive interface,
+prompted single-use generation, model export (for use by AOT Inductor
+and ExecuTorch), and standalone C++ runtimes.
 
 | Function | Torchchat Command | Direct Command | Tested |
 |---|----|----|-----|
@@ -79,9 +83,11 @@ Mobile C++ runtime | n/a | app + AOTI | 🚧 |
 
 **Getting help:** Each command implements the --help option to give additional information about available options:
 
+[skip default]: begin
 ```
 python3 torchchat.py [ export | generate | chat | eval | ... ] --help
 ```
+[skip default]: end
 
 Exported models can be loaded back into torchchat for chat or text
 generation, letting you experiment with the exported model and valid
@@ -182,9 +188,12 @@ model from Andrej Karpathy's tinyllamas model family:
 
 ```
 MODEL_NAME=stories15M
-MODEL_DIR=<root to your heckpoints>/${MODEL_NAME}
-MODEL_PATH=${MODEL_OUT}/stories15M.pt
+MODEL_DIR=~/checkpoints/${MODEL_NAME}
+MODEL_PATH=${MODEL_DIR}/stories15M.pt
 MODEL_OUT=~/torchchat-exports
+
+mkdir -p ${MODEL_DIR}
+mkdir -p ${MODEL_OUT}
 ```
 
 When we export models with AOT Inductor for servers and desktops, and
@@ -242,7 +251,7 @@ ExecuTorch-exported PTE models.
 
 ## PyTorch eager mode and JIT-compiled execution
 ```
-python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --device [ cuda | cpu | mps]
+python3 generate.py [--compile] --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --device [ cuda | mps | cpu ]
 ```
 
 To improve performance, you can compile the model with `--compile`
@@ -306,7 +315,7 @@ using AOT Inductor for CPU or GPUs (the latter using Triton for
 optimizations such as operator fusion):
 
 ```
-python3 export.py --checkpoint-path ${MODEL_PATH} --device [ cuda | cpu] --output-dso-path ${MODEL_NAME}.so
+python3 export.py --checkpoint-path ${MODEL_PATH} --device [ cuda | cpu ] --output-dso-path ${MODEL_NAME}.so
 ```
 
 
@@ -334,7 +343,7 @@ tests against the exported model with the same interface, and support
 additional experiments to confirm model quality and speed.
 
 ```
-python3 generate.py --device {cuda,cpu} --dso-path ${MODEL_NAME}.so --prompt "Once upon a time"
+python3 generate.py --device [ cuda | cpu ] --dso-path ${MODEL_NAME}.so --prompt "Once upon a time"
 ```
 
 
@@ -389,12 +398,17 @@ linear operator (asymmetric) with GPTQ | n/a | 4b (group) | n/a |
 linear operator (asymmetric) with HQQ | n/a | work in progress | n/a |
 
 ## Model precision (dtype precision setting)
-On top of quantizing models with quantization schemes mentioned above, models can be converted to lower bit floating point precision to reduce the memory bandwidth requirement and take advantage of higher density compute available. For example, many GPUs and some of the CPUs have good support for bfloat16 and float16. This can be taken advantage of via `--dtype arg` as shown below.
+On top of quantizing models with quantization schemes mentioned above, models can be converted
+to lower precision floating point representations to reduce the memory bandwidth requirement and
+take advantage of higher density compute available. For example, many GPUs and some of the CPUs
+have good support for bfloat16 and float16. This can be taken advantage of via `--dtype arg` as shown below.
 
+[skip default]: begin
 ```
 python3 generate.py --dtype [bf16 | fp16 | fp32] ...
 python3 export.py --dtype [bf16 | fp16 | fp32] ...
 ```
+[skip default]: end
 
 You can find instructions for quantizing models in
 [docs/quantization.md](file:///./quantization.md). Advantageously,
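As a quick illustration of the bandwidth claim in the model-precision paragraph above, casting weights to bfloat16 halves the bytes that must be streamed from memory per parameter (a standalone PyTorch snippet for illustration, not part of this commit):

```
import torch

# a toy weight tensor in full precision, then cast down
w_fp32 = torch.randn(1024, 1024, dtype=torch.float32)
w_bf16 = w_fp32.to(torch.bfloat16)

# bytes occupied (and moved across the memory bus) per tensor
print(w_fp32.nelement() * w_fp32.element_size())  # 4194304
print(w_bf16.nelement() * w_bf16.element_size())  # 2097152
```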
@@ -412,9 +426,11 @@ GGUF is a nascent industry standard format and presently torchchat can
 read the F16, F32, Q4_0, and Q6_K formats natively and convert them
 into native torchchat models by using the load-gguf option:
 
+[skip default]: begin
 ```
 python3 [ export.py | generate.py | ... ] --gguf-path <gguf_filename>
 ```
+[skip default]: end
 
 You may then apply the standard quantization options, e.g., to add
 embedding table quantization as described under quantization. (You
@@ -441,7 +457,7 @@ start with the original FP16 or FP32 GGUF format.
 To use the quantize tool, install the GGML tools at ${GGUF}. Then,
 you can, for example, convert a quantized model to f16 format:
 
-
+[end default]: end
 ```
 ${GGUF}/quantize --allow-requantize your_quantized_model.gguf fake_unquantized_model.gguf f16
 ```
@@ -565,15 +581,15 @@ in a python-free environment with AOT Inductor and ExecuTorch.
 
 
 
-# Contributing to torchchat
+# CONTRIBUTING to torchchat
 
 We welcome any feature requests, bug reports, or pull requests from
 the community. See the [CONTRIBUTING](CONTRIBUTING.md) for
 instructions how to contribute to torchchat.
 
 
 
-# License
+# LICENSE
 
 Torchchat is released under the [BSD 3 license](./LICENSE). However
 you may have additional legal obligations that govern your use of other
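Taken together, the directives threaded through this file are what drive the script generation in CI: `[shell default]: CMD` emits CMD verbatim, `[skip default]: begin` / `[skip default]: end` wrap the enclosed commands in an `if false; then ... fi` guard, and `[end default]: end` appears to terminate the generated script with `exit 0` (run-advanced.sh below contains nothing from the sections after that marker). A toy emitter capturing this inferred behavior, not the real updown.py logic:

```
def emit_shell(doc_lines):
    # toy version: handles only the directives visible in this diff;
    # the real scripts/updown.py also applies replace/suppress lists
    # and resolves [ a | b | c ] option groups
    shell, in_code = [], False
    for line in doc_lines:
        if line.startswith("[shell default]: "):
            shell.append(line[len("[shell default]: "):])
        elif line.startswith("[skip default]: begin"):
            shell.append("if false; then")
        elif line.startswith("[skip default]: end"):
            shell.append("fi")
        elif line.startswith("[end default]: end"):
            shell.append("exit 0")
            break
        elif line.startswith("```"):
            in_code = not in_code
        elif in_code:
            shell.append(line)
    return "\n".join(shell)

doc = [
    "[shell default]: TORCHCHAT_ROOT=${PWD} ./scripts/install_et.sh",
    "[skip default]: begin",
    "```",
    "python3 torchchat.py [ export | generate | chat | eval | ... ] --help",
    "```",
    "[skip default]: end",
    "[end default]: end",
]
print(emit_shell(doc))
```

The output matches lines 2-5 and the final `exit 0` of the checked-in run-advanced.sh below, modulo the bracket-option resolution handled in scripts/updown.py (last file in this commit).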

run-advanced.sh

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+set -eou pipefail
+TORCHCHAT_ROOT=${PWD} ./scripts/install_et.sh
+if false; then
+python3 torchchat.py ... --help
+fi
+MODEL_NAME=stories15M
+MODEL_DIR=~/checkpoints/${MODEL_NAME}
+MODEL_PATH=${MODEL_DIR}/stories15M.pt
+MODEL_OUT=~/torchchat-exports
+
+mkdir -p ${MODEL_DIR}
+mkdir -p ${MODEL_OUT}
+python3 generate.py --checkpoint-path ${MODEL_PATH} --prompt "Hello, my name is" --device mps
+python3 export.py --checkpoint-path ${MODEL_PATH} --device cuda --output-pte-path ${MODEL_NAME}.pte
+python3 export.py --checkpoint-path ${MODEL_PATH} --device cpu --output-dso-path ${MODEL_NAME}.so
+python3 generate.py --checkpoint-path ${MODEL_PATH} --pte-path ${MODEL_NAME}.pte --device cpu --prompt "Once upon a time"
+python3 generate.py --device {cuda,cpu} --dso-path ${MODEL_NAME}.so --prompt "Once upon a time"
+if false; then
+python3 generate.py --dtype fp32 ...
+python3 export.py --dtype fp32 ...
+fi
+if false; then
+python3 ... --gguf-path <gguf_filename>
+fi
+exit 0

scripts/updown.py

Lines changed: 14 additions & 0 deletions
@@ -53,6 +53,14 @@ def output(*args, **kwargs):
 ###
 
 
+def select_first_option_between_brackets(text):
+    return re.sub(r"\[([^]|]*?)\|[^]]*]", r"\1", text)
+
+
+def select_last_option_between_brackets(text):
+    return re.sub(r"\[[^]]*\|([^]|]*)\]", r"\1", text)
+
+
 def remove_text_between_brackets(text):
     return re.sub(r"\[.*?\]", "", text)
 
@@ -78,6 +86,12 @@ def updown_process_line(
     # [ x1 | c2 | x3 ] means "pick one", so we may have to check that and pick one
     # of the options. Probably pick the last option because testing has more likely
     # been performed with the first option!
+    last = True
+    if last:
+        line = select_last_option_between_brackets(line)
+    else:
+        line = select_first_option_between_brackets(line)
+
     output(
         remove_text_between_brackets(line),
         replace_list=replace_list,
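The two new helpers resolve the `[ a | b | c ]` "pick one" markup before `remove_text_between_brackets` strips any remaining bracketed directives; with `last=True` hard-coded, the processor now always favors the final option. A standalone check of the committed regexes:

```
import re

def select_first_option_between_brackets(text):
    return re.sub(r"\[([^]|]*?)\|[^]]*]", r"\1", text)

def select_last_option_between_brackets(text):
    return re.sub(r"\[[^]]*\|([^]|]*)\]", r"\1", text)

line = "python3 generate.py --device [ cuda | mps | cpu ] --dso-path model.so"
print(select_first_option_between_brackets(line))
# -> python3 generate.py --device  cuda  --dso-path model.so
print(select_last_option_between_brackets(line))
# -> python3 generate.py --device  cpu  --dso-path model.so
```

This is presumably also why the ADVANCED-USERS.md diff reorders `--device [ cuda | cpu | mps]` to `[ cuda | mps | cpu ]`: with the last option selected, the generated CI script exercises the CPU path.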
