Move model section from issue template to README.md #421

Merged 2 commits on Mar 23, 2023

19 changes: 3 additions & 16 deletions .github/ISSUE_TEMPLATE/custom.md
@@ -44,20 +44,6 @@ $ make --version
$ g++ --version
```

# Models

* The LLaMA models are officially distributed by Facebook and will never be provided through this repository. See this [pull request in Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to obtain access to the model data.
* If your issue is with model conversion, please verify the `sha256sum` of each of your `consolidated*.pth` and `ggml-model-XXX.bin` files to confirm that you have the correct model data files before logging an issue. [Latest sha256 sums for your reference](https://github.com/ggerganov/llama.cpp/issues/238).
* If your issue is with model generation quality then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
* LLaMA:
* [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
* [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
* GPT-3
* [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
* GPT-3.5 / InstructGPT / ChatGPT:
* [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
* [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)

# Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
@@ -75,8 +61,9 @@ Please provide detailed steps for reproducing the issue. We are not sitting in f

Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.

Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [GitHub's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability. e.g.
Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [GitHub's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability.

Example environment info:
```
llama.cpp$ git log | head -1
commit 2af23d30434a677c6416812eea52ccc0af65119c
@@ -103,8 +90,8 @@ GNU Make 4.3
$ md5sum ./models/65B/ggml-model-q4_0.bin
dbdd682cce80e2d6e93cefc7449df487 ./models/65B/ggml-model-q4_0.bin
```
Here's a run with the Linux command [perf](https://www.brendangregg.com/perf.html)

Example run with the Linux command [perf](https://www.brendangregg.com/perf.html)
```
llama.cpp$ perf stat ./main -m ./models/65B/ggml-model-q4_0.bin -t 16 -n 1024 -p "Please close your issue when it has been answered."
main: seed = 1679149377
[...]
```

40 changes: 20 additions & 20 deletions README.md
@@ -191,17 +191,8 @@ Note the use of `--color` to distinguish between user input and generated text.

### Instruction mode with Alpaca

First, download the `ggml` Alpaca model into the `./models` folder:

```
# use one of these
# TODO: add a script to simplify the download
curl -o ./models/ggml-alpaca-7b-q4.bin -C - https://gateway.estuary.tech/gw/ipfs/QmUp1UGeQFDqJKvtjbSYPBiZZKRjLp8shVP9hT8ZB9Ynv1
curl -o ./models/ggml-alpaca-7b-q4.bin -C - https://ipfs.io/ipfs/QmUp1UGeQFDqJKvtjbSYPBiZZKRjLp8shVP9hT8ZB9Ynv1
curl -o ./models/ggml-alpaca-7b-q4.bin -C - https://cloudflare-ipfs.com/ipfs/QmUp1UGeQFDqJKvtjbSYPBiZZKRjLp8shVP9hT8ZB9Ynv1
```

Now run the `main` tool like this:
1. First, download the `ggml` Alpaca model into the `./models` folder
2. Run the `main` tool like this:

```
./main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins
[...]
```

@@ -228,26 +219,34 @@ cadaver, cauliflower, cabbage (vegetable), catalpa (tree) and Cailleach.

### Obtaining and verifying the Facebook LLaMA original model and Stanford Alpaca model data

* The LLaMA models are officially distributed by Facebook and will never be provided through this repository. See this [Pull Request in Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to obtain access to the model data.

* The LLaMA models are officially distributed by Facebook and will never be provided through this repository. See this [pull request in Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to obtain access to the model data.
* Please verify the sha256 checksums of all of your `consolidated*.pth` and corresponding converted `ggml-model-*.bin` model files to confirm that you have the correct model data files before creating an issue relating to your model files.
* The following command will verify if you have all possible latest files in your self-installed `./models` subdirectory (a sample of successful output is sketched after this list):

The following command will verify if you have all possible latest files in your self-installed `./models` subdirectory:

`sha256sum --ignore-missing -c SHA256SUMS` on Linux
    `sha256sum --ignore-missing -c SHA256SUMS` on Linux

or
    or

`shasum -a 256 --ignore-missing -c SHA256SUMS` on macOS
    `shasum -a 256 --ignore-missing -c SHA256SUMS` on macOS

* If your issue is with model generation quality then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
* LLaMA:
* [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
* [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
* GPT-3
* [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
* GPT-3.5 / InstructGPT / ChatGPT:
* [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
* [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
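
As an illustration of the verification step above, a successful check might look like the sketch below; the files listed depend on which models you have downloaded, so these paths and `OK` lines are assumptions rather than real output:

```
# hypothetical output; the files checked depend on the models present
$ sha256sum --ignore-missing -c SHA256SUMS
models/7B/consolidated.00.pth: OK
models/7B/ggml-model-q4_0.bin: OK
```

Any file reported as `FAILED` should be re-downloaded or re-converted before opening an issue.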

### Perplexity (Measuring model quality)

You can pass `--perplexity` as a command line option to measure perplexity over the given prompt. For more background,
see https://huggingface.co/docs/transformers/perplexity. As a rule of thumb, lower perplexity is better for LLMs.
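
As an illustration, a perplexity run over a local text file might look like the sketch below; the model path and the input file are placeholders chosen for this example, not files the repository ships:

```
# hypothetical invocation; the model and text-file paths are placeholders
./main -m ./models/7B/ggml-model-q4_0.bin -f ./wiki.test.raw --perplexity
```

Lower scores are better, and running the same input file across quantization levels is a quick way to see the quality cost of each format.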

#### Measurements
#### Latest measurements

https://github.com/ggerganov/llama.cpp/pull/270 is the unofficial tracking page for now. llama.cpp is measuring very well
The latest perplexity scores for the various model sizes and quantizations are being tracked in [discussion #406](https://github.com/ggerganov/llama.cpp/discussions/406). `llama.cpp` is measuring very well
compared to the baseline implementations. Quantization has a small negative impact on quality, but, as you can see, running
13B at q4_0 beats the 7B f16 model by a significant amount.

@@ -347,3 +346,4 @@ docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models
- There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
- Clean-up any trailing whitespaces, use 4 spaces indentation, brackets on same line, `void * ptr`, `int & a`
- See [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks suitable for first contributions