### Memory/Disk Requirements
As the models are currently fully loaded into memory, you will need adequate disk space to save them and sufficient RAM to load them. At the moment, memory and disk requirements are the same.

| model | original size | quantized size (4-bit) |
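
Before downloading a model, it can help to confirm how much free disk space and RAM your machine actually has. A minimal sketch using standard shell tools (the model path below is a placeholder):

```bash
# Free space on the current filesystem
df -h .

# Available memory (Linux; on macOS use vm_stat or Activity Monitor)
free -h

# Size of an already converted and quantized model file (placeholder path)
ls -lh ./models/7B/ggml-model-q4_0.bin
```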
### Interactive mode
If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
In this mode, you can always interrupt generation by pressing Ctrl+C and entering one or more lines of text, which will be converted into tokens and appended to the current context. You can also specify a *reverse prompt* with the parameter `-r "reverse prompt string"`. This will result in user input being prompted whenever the exact tokens of the reverse prompt string are encountered in the generation. A typical use is to provide a prompt that makes LLaMa emulate a chat between multiple users, say Alice and Bob, and to pass `-r "Alice:"`.

Here is an example of a few-shot interaction, invoked with the command
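
One plausible invocation is sketched below; the model path, the prompt file `prompts/alice-and-bob.txt`, and the generation length are illustrative assumptions rather than fixed requirements.

```bash
# Run interactively (-i) with a reverse prompt (-r): generation pauses and
# control returns to the user whenever the model emits "Alice:".
./main -m ./models/7B/ggml-model-q4_0.bin --color -i -r "Alice:" \
       -f prompts/alice-and-bob.txt -n 256
```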
### Using [GPT4All](https://github.com/nomic-ai/gpt4all)
- Obtain the `gpt4all-lora-quantized.bin` model
- It is distributed in the old `ggml` format, which is now obsoleted
- You have to convert it to the new format using [./convert-gpt4all-to-ggml.py](./convert-gpt4all-to-ggml.py). You may also need to convert the model from the old format to the new format with [./migrate-ggml-2023-03-30-pr613.py](./migrate-ggml-2023-03-30-pr613.py):
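
A conversion run might look roughly like the sketch below; the argument order and file locations are assumptions, so check each script's usage output for the exact form.

```bash
# Convert the old-format GPT4All checkpoint to ggml (tokenizer path assumed as second argument)
python3 ./convert-gpt4all-to-ggml.py ./models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model

# Migrate from the pre-2023-03-30 ggml layout to the current format
python3 ./migrate-ggml-2023-03-30-pr613.py ./models/gpt4all-7B/gpt4all-lora-quantized.bin \
        ./models/gpt4all-7B/gpt4all-lora-quantized-new.bin
```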
### Obtaining and verifying the Facebook LLaMA original model and Stanford Alpaca model data
- **Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**
- The LLaMA models are officially distributed by Facebook and will **never** be provided through this repository.
- Refer to [Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to request access to the model data.
- Please verify the [sha256 checksums](SHA256SUMS) of all downloaded model files to confirm that you have the correct model data files before creating an issue relating to your model files.
    - `shasum -a 256 --ignore-missing -c SHA256SUMS` on macOS
- If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
    - LLaMA:
        - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
        - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
    - GPT-3
        - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
    - GPT-3.5 / InstructGPT / ChatGPT:
        - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
        - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
### Perplexity (measuring model quality)
You can use the `perplexity` example to measure perplexity over the given prompt. For more background, see [https://huggingface.co/docs/transformers/perplexity](https://huggingface.co/docs/transformers/perplexity). However, in general, lower perplexity is better for LLMs.
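
A typical run looks something like the sketch below; the model path and input file are placeholders, and the flag set may differ between versions.

```bash
# Report perplexity of a quantized 7B model over a plain-text file
./perplexity -m ./models/7B/ggml-model-q4_0.bin -f your-text.txt
```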
#### Latest measurements
The latest perplexity scores for the various model sizes and quantizations are being tracked in [discussion #406](https://github.com/ggerganov/llama.cpp/discussions/406). `llama.cpp` is measuring very well compared to the baseline implementations. Quantization has a small negative impact on quality, but, as you can see, running 13B at q4_0 beats the 7B f16 model by a significant amount.

All measurements are done against the wikitext2 test dataset (https://paperswithcode.com/dataset/wikitext-2), with default options (512 length context).
Note that changing the context length will have a significant impact on perplexity (longer context = better perplexity).
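
To reproduce a comparable measurement, a rough sketch follows, assuming you have downloaded `wikitext-2-raw-v1.zip` from the dataset page linked above; the archive layout and flags are assumptions.

```bash
unzip wikitext-2-raw-v1.zip
# wiki.test.raw is the test split; -c sets the context length (512 is the default)
./perplexity -m ./models/13B/ggml-model-q4_0.bin -f wikitext-2-raw/wiki.test.raw -c 512
```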
* Docker must be installed and running on your system.
* Create a folder to store big models & intermediate files (e.g. `/llama/models`)
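
For example, using the same location as above:

```bash
mkdir -p /llama/models
```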
#### Images
We have two Docker images available for this project: `ghcr.io/ggerganov/llama.cpp:full` and `ghcr.io/ggerganov/llama.cpp:light`.

The easiest way to download the models, convert them to ggml and optimize them is with the `--all-in-one` command, which is included in the full Docker image.

Replace `/path/to/models` below with the actual path where you downloaded the models.
```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
```
On completion, you are ready to play!
```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
```
or with a light image:
```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
```
- Always consider cross-compatibility with other operating systems and architectures
- Avoid fancy looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
- There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
- Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
- See [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks suitable for first contributions