
Commit 22f73aa

kimishpatel authored and facebook-github-bot committed
Update benchmarking numbers (pytorch#2881)
Summary: Pull Request resolved: pytorch#2881. ATT. Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: mergennachin, lucylq

Differential Revision: D55817614
1 parent fce75e7 commit 22f73aa

File tree

1 file changed (+4 −4 lines)


examples/models/llama2/README.md

Lines changed: 4 additions & 4 deletions
@@ -36,9 +36,9 @@ Performance was measured on Samsung Galaxy S22, S23, S24 and One Plus 12. Measur
 |Device | Groupwise 4-bit (128) | Groupwise 4-bit (256)
 |--------| ---------------------- | ---------------
-|Galaxy S22 | x | x |
-|Galaxy S24 | x | x |
-|One plus 12 | x | x |
+|Galaxy S22 | 8.15 tokens/second | 8.3 tokens/second |
+|Galaxy S24 | 10.66 tokens/second | 11.26 tokens/second |
+|One plus 12 | 11.55 tokens/second | 11.6 tokens/second |
 |iPhone 15 pro | x | x |

@@ -63,7 +63,7 @@ You can export and run the original Llama2 7B model.
 2. Export model and generate `.pte` file:
 ```
-python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w -d fp32
+python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32
 ```
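For context, the "Groupwise 4-bit (256)" column in the benchmark table above presumably comes from the same export with a larger group size. A minimal sketch, assuming `--group_size 256` is accepted the same way as the `--group_size 128` added in this diff:

```
# Hypothetical export for the group-size-256 benchmark column;
# only the --group_size value differs from the command in the diff above.
python -m examples.models.llama2.export_llama --checkpoint <checkpoint.pth> --params <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 256 -d fp32
```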
### Option B: Download and export stories110M model
