
Commit 3ae5a56

BarfingLemurs authored and yusiwen committed
readme : update models, cuda + ppl instructions (ggml-org#3510)
1 parent 00355d3 commit 3ae5a56

File tree

1 file changed (+14 -13 lines)


README.md

Lines changed: 14 additions & 13 deletions
@@ -95,6 +95,7 @@ as the main playground for developing new features for the [ggml](https://github
 - [X] [Aquila-7B](https://huggingface.co/BAAI/Aquila-7B) / [AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
 - [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
 - [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
 
 **Bindings:**
 
@@ -377,7 +378,7 @@ Building the program with BLAS support may lead to some performance improvements
 
 - #### cuBLAS
 
-  This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
+  This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
   - Using `make`:
     ```bash
    make LLAMA_CUBLAS=1
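
For reference, the updated cuBLAS instructions boil down to a shell session like the following (a minimal sketch, assuming a CUDA-capable Nvidia GPU on Debian/Ubuntu; the model path and the `-ngl 32` offload count are illustrative choices, not part of the diff):

```bash
# Install the CUDA toolkit via the distro package manager, as the new text suggests
sudo apt install nvidia-cuda-toolkit

# Build llama.cpp with cuBLAS-accelerated BLAS kernels
make LLAMA_CUBLAS=1

# Offload model layers to the GPU with -ngl; tune the count to fit your VRAM
./main -m models/7B/ggml-model-q4_0.gguf -ngl 32 \
  -p "Building a website can be done in 10 simple steps:"
```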
@@ -613,6 +614,18 @@ For more information, see [https://huggingface.co/docs/transformers/perplexity](
 The perplexity measurements in table above are done against the `wikitext2` test dataset (https://paperswithcode.com/dataset/wikitext-2), with context length of 512.
 The time per token is measured on a MacBook M1 Pro 32GB RAM using 4 and 8 threads.
 
+#### How to run
+
+1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
+2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
+3. Output:
+```
+perplexity : calculating perplexity over 655 chunks
+24.43 seconds per pass - ETA 4.45 hours
+[1]4.5970,[2]5.1807,[3]6.0382,...
+```
+And after 4.45 hours, you will have the final perplexity.
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
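
Taken together, the steps added here correspond roughly to this session (a sketch; it assumes the archive unpacks into `wikitext-2-raw/` and that a quantized 7B model already exists at the path shown):

```bash
# Fetch and unpack the wikitext-2 raw test set from step 1
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip

# Compute perplexity over the test split; expect a multi-hour run on CPU
./perplexity -m models/7B/ggml-model-q4_0.gguf -f wikitext-2-raw/wiki.test.raw
```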
@@ -775,18 +788,6 @@ If your issue is with model generation quality, then please at least scan the fo
 - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
 - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
 
-#### How to run
-
-1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
-2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
-3. Output:
-```
-perplexity : calculating perplexity over 655 chunks
-24.43 seconds per pass - ETA 4.45 hours
-[1]4.5970,[2]5.1807,[3]6.0382,...
-```
-And after 4.45 hours, you will have the final perplexity.
-
 ### Android
 
 #### Building the Project using Android NDK
