@@ -377,7 +378,7 @@ Building the program with BLAS support may lead to some performance improvements
#### cuBLAS

-This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDAToolkit](https://developer.nvidia.com/cuda-downloads).
+This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDAToolkit](https://developer.nvidia.com/cuda-downloads).

Using `make`:
```bash
make LLAMA_CUBLAS=1
```
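As context for this hunk, a minimal sketch of checking the toolkit and building with the flag might look like the following; the `nvcc` check and the clean rebuild are illustrative assumptions, not part of the diff:

```bash
# Confirm the CUDA toolkit is installed and nvcc is on PATH
# (illustrative check; any working toolkit install is fine)
nvcc --version

# Rebuild llama.cpp from a clean tree with cuBLAS enabled
make clean
make LLAMA_CUBLAS=1
```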
@@ -613,6 +614,18 @@ For more information, see [https://huggingface.co/docs/transformers/perplexity](https://huggingface.co/docs/transformers/perplexity)
The perplexity measurements in the table above are done against the `wikitext2` test dataset (https://paperswithcode.com/dataset/wikitext-2), with a context length of 512.
The time per token is measured on a MacBook M1 Pro 32GB RAM using 4 and 8 threads.
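For readers reproducing these numbers, a hedged sketch of the measurement command follows; the model path and the extracted dataset location are placeholder assumptions:

```bash
# Compute perplexity over the wikitext-2 test split with the bundled tool
# (model and dataset paths below are placeholders)
./perplexity -m models/7B/ggml-model-q4_0.bin -f wikitext-2-raw/wiki.test.raw
```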
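Similarly, the thread count for the timing runs is controlled with `-t`; a sketch assuming the usual `main` binary, with a placeholder model path and prompt:

```bash
# Time-per-token runs: same prompt and length, 4 threads vs. 8 threads
./main -m models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 4 -n 64
./main -m models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 64
```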