
Commit 84525e7

docker : add support for CUDA in docker (#1461)
Co-authored-by: canardleteer <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
1 parent a7e20ed commit 84525e7

4 files changed: +104 −1 lines

.devops/full-cuda.Dockerfile

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.7.1

# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
    apt-get install -y build-essential python3 python3-pip

COPY requirements.txt requirements.txt

RUN pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]

.devops/main-cuda.Dockerfile

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.7.1
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the CUDA runtime image
ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
    apt-get install -y build-essential

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1

RUN make

FROM ${BASE_CUDA_RUN_CONTAINER} as runtime

COPY --from=build /app/main /main

ENTRYPOINT [ "/main" ]

Makefile

Lines changed: 7 additions & 1 deletion
@@ -163,7 +163,12 @@ ifdef LLAMA_CUBLAS
 	LDFLAGS += -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib
 	OBJS += ggml-cuda.o
 	NVCC = nvcc
-	NVCCFLAGS = --forward-unknown-to-host-compiler -arch=native
+	NVCCFLAGS = --forward-unknown-to-host-compiler
+ifdef CUDA_DOCKER_ARCH
+	NVCCFLAGS += -Wno-deprecated-gpu-targets -arch=$(CUDA_DOCKER_ARCH)
+else
+	NVCCFLAGS += -arch=native
+endif # CUDA_DOCKER_ARCH
 ifdef LLAMA_CUDA_FORCE_DMMV
 	NVCCFLAGS += -DGGML_CUDA_FORCE_DMMV
 endif # LLAMA_CUDA_FORCE_DMMV

@@ -187,6 +192,7 @@ ifdef LLAMA_CUDA_KQUANTS_ITER
 else
 	NVCCFLAGS += -DK_QUANTS_PER_ITERATION=2
 endif
+
 ggml-cuda.o: ggml-cuda.cu ggml-cuda.h
 	$(NVCC) $(NVCCFLAGS) $(CXXFLAGS) -Wno-pedantic -c $< -o $@
 endif # LLAMA_CUBLAS
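With this change the same Makefile serves both native and containerized builds: when `CUDA_DOCKER_ARCH` is set (as the CUDA Dockerfiles do via `ENV`), `nvcc` targets that architecture; otherwise it falls back to `-arch=native`. A sketch of both invocations on a host with CUDA installed:

```bash
# Host build: nvcc autodetects the local GPU via -arch=native
LLAMA_CUBLAS=1 make

# Fat build targeting all supported architectures, as the Dockerfiles do
LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=all make
```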

README.md

Lines changed: 32 additions & 0 deletions
@@ -731,6 +731,38 @@ or with a light image:
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
```

### Docker With CUDA

Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU-enabled cloud, `cuBLAS` should be accessible inside the container.
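One hedged way to confirm the toolkit is wired up before building anything (the `nvidia/cuda` tag here is an assumption; any CUDA base image matching your driver will do):

```bash
# Should print the host's GPU table from inside a container
docker run --gpus all --rm nvidia/cuda:11.7.1-base-ubuntu22.04 nvidia-smi
```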
#### Building Locally

```bash
docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .
docker build -t local/llama.cpp:light-cuda -f .devops/main-cuda.Dockerfile .
```

You may want to pass different `ARGS`, depending on the CUDA environment supported by your container host and on your GPU architecture; a sample override is sketched after the list below.

The defaults are:

- `CUDA_VERSION` set to `11.7.1`
- `CUDA_DOCKER_ARCH` set to `all`
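For instance, a minimal sketch of overriding the CUDA version (the `12.1.1` value is an assumption; pick a `CUDA_VERSION` for which matching `nvidia/cuda` devel and runtime tags exist):

```bash
# Hypothetical example: build the light image against a newer CUDA toolkit
docker build --build-arg CUDA_VERSION=12.1.1 \
  -t local/llama.cpp:light-cuda -f .devops/main-cuda.Dockerfile .
```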
The resulting images are essentially the same as the non-CUDA images:

1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.
2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.

#### Usage

After building locally, usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.

```bash
docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
```

### Contributing

- Contributors can open PRs
