
Commit f2f6c96

lucylq authored and pytorchbot committed
Add llama3.1 to readme (#4378)
Summary: #4376

Pull Request resolved: #4378
Reviewed By: kirklandsign
Differential Revision: D60177343
Pulled By: lucylq
fbshipit-source-id: f8197e7af18785bfcca3c5c2980ec1bd7acdaf9d
(cherry picked from commit d6d691e)
1 parent fb2a1a7 commit f2f6c96

1 file changed: +15 -1 lines changed


examples/models/llama2/README.md

Lines changed: 15 additions & 1 deletion
@@ -5,7 +5,7 @@ For more details, see [Llama 2 repo](https://github.com/facebookresearch/llama)
Pretrained models are not included in this repo. Users are encouraged to download them [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

-# What are Llama 2 and 3?
+# What is Llama?
Llama is a collection of large language models trained on publicly available data. These models are based on the transformer architecture, which allows them to process input sequences and generate output sequences of variable length. A key strength of Llama models is their ability to generate coherent and contextually relevant text, achieved through attention mechanisms that let the model focus on different parts of the input sequence as it generates output. Llama models are pre-trained on a large corpus of text with a next-token prediction (autoregressive language modeling) objective, which teaches them to predict the word that follows a given context.

Llama models have been shown to perform well on a variety of natural language processing tasks, including language translation, question answering, and text summarization. They are also capable of generating human-like text, which makes them useful for creative writing and other applications where natural language generation is important.
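
The paragraph above leans on the attention mechanism, so a small illustration may help. Below is a minimal NumPy sketch of single-head causal scaled dot-product attention; it is a generic textbook construction, not Llama's actual implementation (which adds multi-head projections, rotary position embeddings, grouped-query attention, and KV caching):

```python
import numpy as np

def causal_attention(x, w_q, w_k, w_v):
    """Single-head causal scaled dot-product attention.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
    # Causal mask: a position may attend only to itself and earlier
    # positions, which is what makes generation autoregressive.
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))                           # 6 tokens, d_model=16
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(causal_attention(x, w_q, w_k, w_v).shape)        # (6, 8)
```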
@@ -47,6 +47,20 @@ Llama 2 7B performance was measured on the Samsung Galaxy S22, S24, and OnePlus
|Galaxy S24 | 10.66 tokens/second | 11.26 tokens/second |
|OnePlus 12 | 11.55 tokens/second | 11.6 tokens/second |

+
+### Llama3 8B
+Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus 12 devices. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-5-run-benchmark-on).
+
+Note that since Llama3's vocabulary size is 4x that of Llama2, we had to quantize the embedding lookup table as well. For these results, the embedding lookup table was groupwise quantized with 4 bits and a group size of 32.
+
+|Device | Groupwise 4-bit (128) | Groupwise 4-bit (256) |
+|--------|-----------------------|-----------------------|
+|Galaxy S22 | 7.85 tokens/second | 8.4 tokens/second |
+|Galaxy S24 | 10.91 tokens/second | 11.21 tokens/second |
+|OnePlus 12 | 10.85 tokens/second | 11.02 tokens/second |
+
+### Llama3.1
+> :warning: **use the main branch**: Llama3.1 is supported on the ExecuTorch main branch (not release 0.3).
# Instructions

## Tested on
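
The Llama3 8B note in the diff above quantizes the embedding lookup table groupwise with 4 bits and a group size of 32. To make the general idea concrete, here is a minimal NumPy sketch of symmetric groupwise 4-bit quantization; it only mirrors the group-size concept and is not ExecuTorch's actual kernel, rounding scheme, or packing format:

```python
import numpy as np

def quantize_groupwise_4bit(weights, group_size=32):
    """Symmetric 4-bit groupwise quantization.

    Each row of `weights` is split into groups of `group_size` values and
    every group gets its own scale, so an outlier only degrades its group.
    """
    rows, cols = weights.shape
    assert cols % group_size == 0, "columns must divide evenly into groups"
    groups = weights.reshape(rows, cols // group_size, group_size)
    # Per-group scale chosen so the largest magnitude maps to the int4 max (7).
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)        # avoid division by zero
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return (q * scales).reshape(q.shape[0], -1)

# Toy "embedding table": 4 tokens x 64 dims, group size 32 as in the note.
table = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
q, scales = quantize_groupwise_4bit(table, group_size=32)
print(np.abs(table - dequantize(q, scales)).max())     # small reconstruction error
```

The group size is the knob behind the 128 vs. 256 columns in the weight-quantization tables above: smaller groups track local weight statistics more closely at the cost of storing more scale metadata per tensor.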
