You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
The defaults are:

- `CUDA_VERSION` set to `11.7.1`
- `CUDA_DOCKER_ARCH` set to `all`
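For illustration, a minimal sketch of overriding these defaults at build time. The image tag and Dockerfile path assume the repository's `.devops` layout, and the specific `CUDA_VERSION` and `CUDA_DOCKER_ARCH` values below are placeholders, not recommendations:

```bash
# Example: build the full CUDA image with non-default build arguments.
# The version and architecture values are placeholders; pick ones matching
# your host's CUDA toolkit and your GPU (e.g. sm_86 for an RTX 30-series card).
docker build -t local/llama.cpp:full-cuda \
  --build-arg CUDA_VERSION=12.1.1 \
  --build-arg CUDA_DOCKER_ARCH=sm_86 \
  -f .devops/full-cuda.Dockerfile .
```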
The resulting images are essentially the same as the non-CUDA images:
1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.
#### Usage
After building locally, usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.
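As a sketch of a typical invocation with the light image (the model path, filename, and prompt are placeholders): `--gpus all` exposes the host GPUs to the container, and `--n-gpu-layers` controls how many model layers are offloaded to the GPU.

```bash
# Placeholder model path and prompt; adjust to your setup.
docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda \
  -m /models/7B/ggml-model-q4_0.bin \
  -p "Building a website can be done in 10 simple steps:" \
  --n-gpu-layers 32
```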