
Commit 99ebd39

mikekgfb authored and malfet committed
Run gguf (#686)
* improve updown parser, and use in README.md execution
* cut/paste errors
* typo: true -> false
* we scan each partial line, so need to suppress at partial line level :(
* make it twice as nice
* improved updown parsing
* special handling for lines w/o option
* enable run on quantization doc
* handle white space before trip backtick
* updates
* run gguf
* updates
* add gguf to periodic
* build et for gguf
* update updown options to handle llama3-8b on macos
* secrets
* updates
1 parent 392f9e9 commit 99ebd39

File tree

6 files changed: +157 -17 lines changed


.github/workflows/run-readme-periodic.yml

Lines changed: 33 additions & 4 deletions

````diff
@@ -42,10 +42,6 @@ jobs:
           bash -x ./run-readme.sh
           echo "::endgroup::"
 
-          echo "::group::Completion"
-          echo "tests complete"
-          echo "*******************************************"
-          echo "::endgroup::"
 
   test-quantization-any:
     uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
@@ -79,6 +75,39 @@ jobs:
           bash -x ./run-quantization.sh
           echo "::endgroup::"
 
+  test-gguf-any:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    secrets: inherit
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      secrets-env: "HF_TOKEN_PERIODIC"
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.1"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        echo "::group::Install newer objcopy that supports --set-section-alignment"
+        yum install -y devtoolset-10-binutils
+        export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+        echo "::endgroup::"
+
+        echo "::group::Create script to run gguf"
+        python3 scripts/updown.py --file docs/GGUF.md > ./run-gguf.sh
+        # for good measure, if something happened to updown processor,
+        # and it did not error out, fail with an exit 1
+        echo "exit 1" >> ./run-gguf.sh
+        echo "::endgroup::"
+
+        echo "::group::Run gguf"
+        echo "*******************************************"
+        cat ./run-gguf.sh
+        echo "*******************************************"
+        bash -x ./run-gguf.sh
+        echo "::endgroup::"
+
           echo "::group::Completion"
           echo "tests complete"
           echo "*******************************************"
````
.github/workflows/run-readme-pr-macos.yml

Lines changed: 58 additions & 9 deletions

````diff
@@ -7,7 +7,7 @@ on:
   workflow_dispatch:
 jobs:
   test-readme-macos:
-    runs-on: macos-14-xlarge
+    runs-on: macos-14-xlarge
     steps:
       - name: Checkout code
         uses: actions/checkout@v2
@@ -34,7 +34,7 @@ jobs:
         echo "::endgroup::"
 
         echo "::group::Create script to run README"
-        python3 scripts/updown.py --file README.md --replace 'llama3:stories15M,-l 3:-l 2,meta-llama/Meta-Llama-3-8B-Instruct:stories15M' --suppress huggingface-cli,HF_TOKEN > ./run-readme.sh
+        python3 scripts/updown.py --file README.md --replace 'llama3:stories15M,-l 3:-l 2,meta-llama/Meta-Llama-3-8B-Instruct:stories15M' --suppress huggingface-cli,HF_TOKEN > ./run-readme.sh
         # for good measure, if something happened to updown processor,
         # and it did not error out, fail with an exit 1
         echo "exit 1" >> ./run-readme.sh
@@ -47,12 +47,7 @@ jobs:
         bash -x ./run-readme.sh
         echo "::endgroup::"
 
-        echo "::group::Completion"
-        echo "tests complete"
-        echo "*******************************************"
-        echo "::endgroup::"
-
-
+
   test-quantization-macos:
     runs-on: macos-14-xlarge
     steps:
@@ -81,7 +76,7 @@ jobs:
         echo "::endgroup::"
 
         echo "::group::Create script to run quantization"
-        python3 scripts/updown.py --file docs/quantization.md --replace llama3:stories15M --suppress huggingface-cli,HF_TOKEN > ./run-quantization.sh
+        python3 scripts/updown.py --file docs/quantization.md --replace 'llama3:stories15M,-l 3:-l 2,meta-llama/Meta-Llama-3-8B-Instruct:stories15M' --suppress huggingface-cli,HF_TOKEN > ./run-quantization.sh
         # for good measure, if something happened to updown processor,
         # and it did not error out, fail with an exit 1
         echo "exit 1" >> ./run-quantization.sh
@@ -98,3 +93,57 @@ jobs:
         echo "tests complete"
         echo "*******************************************"
         echo "::endgroup::"
+
+
+  test-gguf-macos:
+    runs-on: macos-14-xlarge
+    secrets: inherit
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v2
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.10.11'
+      - name: Setup Xcode
+        if: runner.os == 'macOS'
+        uses: maxim-lobanov/setup-xcode@v1
+        with:
+          xcode-version: '15.3'
+      - name: Run script
+        secrets-env: "HF_TOKEN_PERIODIC"
+        run: |
+          set -x
+          # NS: Remove previous installation of torch first
+          # as this script does not isntall anything into conda env but rather as system dep
+          pip3 uninstall -y torch || true
+          set -eou pipefail
+
+          echo "::group::Print machine info"
+          uname -a
+          sysctl machdep.cpu.brand_string
+          sysctl machdep.cpu.core_count
+          echo "::endgroup::"
+
+          # echo "::group::Install newer objcopy that supports --set-section-alignment"
+          # yum install -y devtoolset-10-binutils
+          # export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+          # echo "::endgroup::"
+
+          echo "::group::Create script to run gguf"
+          python3 scripts/updown.py --file docs/GGUF.md > ./run-gguf.sh
+          # for good measure, if something happened to updown processor,
+          # and it did not error out, fail with an exit 1
+          echo "exit 1" >> ./run-gguf.sh
+          echo "::endgroup::"
+
+          echo "::group::Run gguf"
+          echo "*******************************************"
+          cat ./run-gguf.sh
+          echo "*******************************************"
+          bash -x ./run-gguf.sh
+          echo "::endgroup::"
+
+          echo "::group::Completion"
+          echo "tests complete"
+          echo "*******************************************"
+          echo "::endgroup::"
````
.github/workflows/run-readme-pr-mps.yml

Lines changed: 2 additions & 0 deletions

````diff
@@ -8,8 +8,10 @@ on:
 jobs:
   test-readme-mps-macos:
     uses: pytorch/test-infra/.github/workflows/macos_job.yml@main
+    secrets: inherit
     with:
       runner: macos-m1-14
+      secrets-env: "HF_TOKEN_PERIODIC"
       script: |
         conda create -y -n test-readme-mps-macos python=3.10.11
         conda activate test-readme-mps-macos
````

.github/workflows/run-readme-pr.yml

Lines changed: 34 additions & 0 deletions

````diff
@@ -10,6 +10,7 @@ on:
 jobs:
   test-readme-any:
     uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    secrets: inherit
     with:
       runner: linux.g5.4xlarge.nvidia.gpu
       secrets-env: "HF_TOKEN_PERIODIC"
@@ -76,6 +77,39 @@ jobs:
           bash -x ./run-quantization.sh
           echo "::endgroup::"
 
+  test-gguf-any:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    secrets: inherit
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      secrets-env: "HF_TOKEN_PERIODIC"
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.1"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        echo "::group::Install newer objcopy that supports --set-section-alignment"
+        yum install -y devtoolset-10-binutils
+        export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+        echo "::endgroup::"
+
+        echo "::group::Create script to run gguf"
+        python3 scripts/updown.py --file docs/GGUF.md --replace 'llama3:stories15M,-l 3:-l 2,meta-llama/Meta-Llama-3-8B-Instruct:stories15M' --suppress huggingface-cli,HF_TOKEN > ./run-gguf.sh
+        # for good measure, if something happened to updown processor,
+        # and it did not error out, fail with an exit 1
+        echo "exit 1" >> ./run-gguf.sh
+        echo "::endgroup::"
+
+        echo "::group::Run gguf"
+        echo "*******************************************"
+        cat ./run-gguf.sh
+        echo "*******************************************"
+        bash -x ./run-gguf.sh
+        echo "::endgroup::"
+
           echo "::group::Completion"
           echo "tests complete"
           echo "*******************************************"
````
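The `--replace` and `--suppress` flags used above rewrite the extracted commands so PR CI runs a small model (`stories15M`) instead of llama3. A sketch of the assumed semantics (comma-separated `old:new` substitution pairs, plus substrings whose lines are dropped; hypothetical `rewrite_line` helper, not the actual `scripts/updown.py` implementation):

```python
# Hypothetical sketch of the --replace/--suppress rewriting seen above
# (assumed semantics; not the actual scripts/updown.py implementation).
def rewrite_line(line, replace_spec="", suppress_spec=""):
    # --suppress: drop any line containing one of the listed substrings
    for token in filter(None, suppress_spec.split(",")):
        if token in line:
            return None
    # --replace: apply comma-separated old:new substitutions in order
    for pair in filter(None, replace_spec.split(",")):
        old, new = pair.split(":", 1)
        line = line.replace(old, new)
    return line

cmd = rewrite_line("python3 torchchat.py generate llama3 -l 3",
                   replace_spec="llama3:stories15M,-l 3:-l 2")
```

With the suppress list from the workflow, a `huggingface-cli login` line would be dropped entirely rather than rewritten.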

docs/GGUF.md

Lines changed: 16 additions & 3 deletions

````diff
@@ -1,16 +1,27 @@
 # Using GGUF Models
-We support parsing [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) files with the following tensor types:
+
+[shell default]: HF_TOKEN="${SECRET_HF_TOKEN_PERIODIC}" huggingface-cli login
+
+[shell default]: TORCHCHAT_ROOT=${PWD} ./scripts/install_et.sh
+
+We support parsing [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) files with
+the following tensor types:
 - F16
 - F32
 - Q4_0
 - Q6_K
 
-If an unsupported type is encountered while parsing a GGUF file, an exception is raised.
+If an unsupported type is encountered while parsing a GGUF file, an
+exception is raised.
 
 We now go over an example of using GGUF files in the torchchat flow.
 
 ### Download resources
-First download a GGUF model and tokenizer. In this example, we use a Q4_0 GGUF file. (Note that Q4_0 is only the dominant tensor type in the file, but the file also contains GGUF tensors of types Q6_K, F16, and F32.)
+
+First download a GGUF model and tokenizer. In this example, we use a
+Q4_0 GGUF file. (Note that Q4_0 is only the dominant tensor type in
+the file, but the file also contains GGUF tensors of types Q6_K, F16,
+and F32.)
 
 ```
 # Download resources
@@ -55,3 +66,5 @@ python3 torchchat.py export --gguf-path ${GGUF_MODEL_PATH} --output-pte-path ${G
 # Generate using the PTE model that was created by the export command
 python3 torchchat.py generate --gguf-path ${GGUF_MODEL_PATH} --pte-path ${GGUF_PTE_PATH} --tokenizer-path ${GGUF_TOKENIZER_PATH} --temperature 0 --prompt "Once upon a time" --max-new-tokens 15
 ```
+
+[end default]: end
````
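The doc's contract is simple: only F16, F32, Q4_0, and Q6_K tensors are parsed, and anything else raises. A minimal sketch of that rule (hypothetical `check_tensor_types` helper, not torchchat code):

```python
# Hypothetical validator (not torchchat code) mirroring the documented rule:
# GGUF parsing raises as soon as a tensor type outside the supported set
# is encountered.
SUPPORTED_GGUF_TENSOR_TYPES = {"F16", "F32", "Q4_0", "Q6_K"}

def check_tensor_types(tensor_types):
    for t in tensor_types:
        if t not in SUPPORTED_GGUF_TENSOR_TYPES:
            raise ValueError(f"unsupported GGUF tensor type: {t}")

# The example file mixes Q4_0 (dominant) with Q6_K, F16, and F32 tensors:
check_tensor_types(["Q4_0", "Q6_K", "F16", "F32"])
```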

scripts/updown.py

Lines changed: 14 additions & 1 deletion

````diff
@@ -140,7 +140,7 @@ def process_command(
         )
     elif keyword == "prefix":
         output(
-            trailing_command[:-1],
+            trailing_command,
             end="",
             replace_list=replace_list,
             suppress_list=suppress_list,
@@ -178,6 +178,19 @@ def process_command(
             suppress_list=suppress_list,
         )
         exit(0)
+    elif keyword == "comment":
+        output(
+            "# " + trailing_command,
+            suppress_list=None,
+            replace_list=None,
+        )
+    else:
+        output(
+            "echo 'unknown updown command'\nexit 1",
+            suppress_list=None,
+            replace_list=None,
+        )
+        exit(1)
 
     # We have processed this line as a command
     return True
````
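The `updown.py` change above turns previously ignored commands into hard failures. A reduced sketch of the hardened dispatch (simplified to return strings instead of calling `output`; the real function also handles `end` and the replace/suppress plumbing):

```python
# Reduced sketch of the dispatch this patch hardens: "comment" becomes a
# shell comment in the generated script, "prefix" is emitted verbatim
# (the patch stops trimming its last character), and an unknown keyword
# now yields a script that fails loudly instead of being dropped silently.
def emit_for_keyword(keyword, trailing_command):
    if keyword == "prefix":
        return trailing_command
    elif keyword == "comment":
        return "# " + trailing_command
    else:
        return "echo 'unknown updown command'\nexit 1"
```

This matches the patch's intent: a typo in an updown directive in the docs now fails CI rather than silently skipping a step.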
