Commit d010ea7

Fixed CUDA Dockerfile
Previously, models produced garbage output when running on the GPU with layers offloaded. The fix builds the llama-cpp-python wheel with cuBLAS actually enabled (CMAKE_ARGS="-DLLAMA_CUBLAS=on" with FORCE_CMAKE=1) instead of relying on a bare LLAMA_CUBLAS=1 environment variable during pip install. Similar to a related fix in another repo: bartowski1182/koboldcpp-docker@331326a
1 parent 66fb034 · commit d010ea7
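
For anyone trying the fix, here is a minimal build-and-run sketch. The image tag, model path, and environment values shown are illustrative, not from this commit; the server reads its settings (such as MODEL) from environment variables via pydantic-settings, and N_GPU_LAYERS is assumed to map to its n_gpu_layers setting.

    # Build the CUDA image (tag name is illustrative)
    docker build -t llama-cpp-cuda docker/cuda_simple

    # Run with GPU access; the model file name is a placeholder
    docker run --gpus all --rm -p 8000:8000 \
        -v /path/to/models:/models \
        -e MODEL=/models/model.gguf \
        -e N_GPU_LAYERS=35 \
        llama-cpp-cuda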

1 file changed (+14, −3):

docker/cuda_simple/Dockerfile

@@ -4,13 +4,24 @@ FROM nvidia/cuda:${CUDA_IMAGE}
 # We need to set the host to 0.0.0.0 to allow outside access
 ENV HOST 0.0.0.0
 
+RUN apt-get update && apt-get upgrade -y \
+    && apt-get install -y git build-essential \
+    python3 python3-pip gcc wget \
+    ocl-icd-opencl-dev opencl-headers clinfo \
+    libclblast-dev libopenblas-dev \
+    && mkdir -p /etc/OpenCL/vendors && echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
+
 COPY . .
 
-# Install the package
-RUN apt update && apt install -y python3 python3-pip
+# Set build-related env vars
+ENV CUDA_DOCKER_ARCH=all
+ENV LLAMA_CUBLAS=1
+
+# Install dependencies
 RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings
 
-RUN LLAMA_CUBLAS=1 pip install llama-cpp-python
+# Install llama-cpp-python (build with CUDA)
+RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
 
 # Run the server
 CMD python3 -m llama_cpp.server
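
Once the server is up, a quick sanity check against its OpenAI-compatible completions endpoint (the prompt and max_tokens below are arbitrary):

    curl -s http://localhost:8000/v1/completions \
        -H "Content-Type: application/json" \
        -d '{"prompt": "Q: What is the capital of France? A:", "max_tokens": 16}'

With the fixed build, the completion should come back as coherent text rather than the garbage tokens described above; watching nvidia-smi during the request is another way to confirm that layers were actually offloaded.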
