Run llama.cpp in a GPU-accelerated Docker container.
Options are specified as environment variables in the docker-compose.yml
file.
By default, the following options are set:
GGML_CUDA_NO_PINNED
: Disable pinned memory for compatibility (default is 1)

LLAMA_ARG_CTX_SIZE
: The context size to use (default is 2048)

LLAMA_ARG_N_GPU_LAYERS
: The number of layers to run on the GPU (default is 99)
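As an illustration, a minimal docker-compose.yml sketch that sets these variables might look like the following. The image tag, model path, and port mapping here are assumptions and will vary by setup; only the environment variables above come from this document.

```yaml
services:
  llama-server:
    # Image tag and model location are assumptions; adjust for your setup.
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
    environment:
      GGML_CUDA_NO_PINNED: 1       # disable pinned memory for compatibility
      LLAMA_ARG_CTX_SIZE: 2048     # context size
      LLAMA_ARG_N_GPU_LAYERS: 99   # layers to run on the GPU
      LLAMA_ARG_MODEL: /models/model.gguf  # hypothetical model path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```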
See the llama.cpp documentation for the complete list of server options.