
Commit cf4931a

Working Open Llama 3B in a box
1 parent 217d783 commit cf4931a

File tree

6 files changed: +64 -14 lines changed

docker/README.md

Lines changed: 3 additions & 2 deletions

@@ -24,7 +24,7 @@
 - `Dockerfile` - a single OpenBLAS and CuBLAS combined Dockerfile that automatically installs a previously downloaded model `model.bin`

 ## Download a Llama Model from Hugging Face
-- To download a MIT licensed Llama model run: `python3 ./hug_model.py -a vihangd -s open_llama_7b_700bt_ggml`
+- To download a MIT licensed Llama model you can run: `python3 ./hug_model.py -a vihangd -s open_llama_7b_700bt_ggml -f ggml-model-q5_1.bin`
 - To select and install a restricted license Llama model run: `python3 ./hug_model.py -a TheBloke -t llama`
 - You should now have a model in the current directory and `model.bin` symlinked to it for the subsequent Docker build and copy step. e.g.
 ```

@@ -37,9 +37,10 @@ lrwxrwxrwx 1 user user 24 May 23 18:30 model.bin -> <downloaded-model-file>q5_

 | Model | Quantized size |
 |------:|----------------:|
+| 3B    | 3 GB            |
 | 7B    | 5 GB            |
 | 13B   | 10 GB           |
-| 30B   | 25 GB           |
+| 33B   | 25 GB           |
 | 65B   | 50 GB           |

 **Note #2:** If you want to pass or tune additional parameters, customise `./start_server.sh` before running `docker build ...`
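As a quick sanity check after the README's download step, a sketch like the following (standard library only; the exact downloaded filename is whatever `hug_model.py` produced) confirms that `model.bin` resolves and that its size is in line with the quantized-size table above:

```python
# Sketch: confirm the model.bin symlink resolves and report its size
# for comparison with the quantized-size table in the README.
from pathlib import Path

model = Path("model.bin")
target = model.resolve(strict=True)  # raises if the symlink is dangling
size_gb = target.stat().st_size / 10**9
print(f"{model} -> {target.name} ({size_gb:.1f} GB)")
```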
File renamed without changes.

docker/open_llama/build.sh

Lines changed: 14 additions & 0 deletions (new file)

@@ -0,0 +1,14 @@
+#!/bin/sh
+
+MODEL="open_llama_3b"
+# Get open_llama_3b_ggml q5_1 quantization
+python3 ./hug_model.py -a SlyEcho -s ${MODEL} -f "q5_1"
+ls -lh *.bin
+
+# Build the default OpenBLAS image
+docker build -t $MODEL .
+docker images | egrep "^(REPOSITORY|$MODEL)"
+
+echo
+echo "To start the docker container run:"
+echo "docker run -t -p 8000:8000 $MODEL"

docker/auto_docker/hug_model.py renamed to docker/open_llama/hug_model.py

Lines changed: 18 additions & 11 deletions

@@ -76,21 +76,23 @@ def main():

     # Arguments
     parser.add_argument('-v', '--version', type=int, default=0x0003,
-                        help='an integer for the version to be used')
+                        help='hexadecimal version number of ggml file')
     parser.add_argument('-a', '--author', type=str, default='TheBloke',
-                        help='an author to be filtered')
-    parser.add_argument('-t', '--tags', type=str, default='llama',
-                        help='tags for the content')
+                        help='HuggingFace author filter')
+    parser.add_argument('-t', '--tag', type=str, default='llama',
+                        help='HuggingFace tag filter')
     parser.add_argument('-s', '--search', type=str, default='',
-                        help='search term')
+                        help='HuggingFace search filter')
+    parser.add_argument('-f', '--filename', type=str, default='q5_1',
+                        help='HuggingFace model repository filename substring match')

     # Parse the arguments
     args = parser.parse_args()

     # Define the parameters
     params = {
         "author": args.author,
-        "tags": args.tags,
+        "tags": args.tag,
         "search": args.search
     }

@@ -108,25 +110,30 @@ def main():

         for sibling in model_info.get('siblings', []):
             rfilename = sibling.get('rfilename')
-            if rfilename and 'q5_1' in rfilename:
+            if rfilename and args.filename in rfilename:
                 model_list.append((model_id, rfilename))

     # Choose the model
-    if len(model_list) == 1:
+    model_list.sort(key=lambda x: x[0])
+    if len(model_list) == 0:
+        print("No models found")
+        exit(1)
+    elif len(model_list) == 1:
         model_choice = model_list[0]
     else:
         model_choice = get_user_choice(model_list)

     if model_choice is not None:
         model_id, rfilename = model_choice
         url = f"https://huggingface.co/{model_id}/resolve/main/{rfilename}"
-        download_file(url, rfilename)
-        _, version = check_magic_and_version(rfilename)
+        dest = f"{model_id.replace('/', '_')}_{rfilename}"
+        download_file(url, dest)
+        _, version = check_magic_and_version(dest)
         if version != args.version:
             print(f"Warning: Expected version {args.version}, but found different version in the file.")
     else:
         print("Error - model choice was None")
-        exit(1)
+        exit(2)

 if __name__ == '__main__':
     main()
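To make the control flow above easier to follow, here is a compact, self-contained sketch of what the patched script now does: query the Hugging Face Hub model listing with the author/tag/search filters, keep repository files whose name contains the `-f`/`--filename` substring, and validate the downloaded file's header. The `https://huggingface.co/api/models` endpoint, the `full` parameter, and the 8-byte header layout are illustrative assumptions; only the filtering and checking logic mirrors the diff.

```python
# Sketch of hug_model.py's search-and-check flow under the assumptions above.
import struct
import requests

def find_model_files(author="SlyEcho", tag="llama", search="open_llama_3b",
                     filename="q5_1"):
    # Same filter params the script builds from its CLI flags; "full" is
    # an assumption needed to get the per-file "siblings" list back.
    resp = requests.get("https://huggingface.co/api/models",
                        params={"author": author, "tags": tag,
                                "search": search, "full": "true"},
                        timeout=30)
    resp.raise_for_status()
    matches = []
    for model_info in resp.json():
        for sibling in model_info.get("siblings", []):
            rfilename = sibling.get("rfilename")
            if rfilename and filename in rfilename:  # substring match, as in the diff
                matches.append((model_info["id"], rfilename))
    return sorted(matches)  # sorted by model_id, as the patched script does

def check_magic_and_version(path):
    # Hypothetical layout for check_magic_and_version(): a 4-byte magic
    # followed by a uint32 version (cf. the script's default of 0x0003).
    with open(path, "rb") as f:
        magic, version = struct.unpack("<II", f.read(8))
    return magic, version

if __name__ == "__main__":
    for model_id, rfilename in find_model_files():
        print(f"https://huggingface.co/{model_id}/resolve/main/{rfilename}")
```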

docker/open_llama/start.sh

Lines changed: 28 additions & 0 deletions (new file)

@@ -0,0 +1,28 @@
+#!/bin/sh
+
+MODEL="open_llama_3b"
+
+# Start Docker container
+docker run --cap-add SYS_RESOURCE -p 8000:8000 -t $MODEL &
+sleep 10
+echo
+docker ps | egrep "(^CONTAINER|$MODEL)"
+
+# Test the model works
+echo
+curl -X 'POST' 'http://localhost:8000/v1/completions' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
+    "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
+    "stop": [
+        "\n",
+        "###"
+    ]
+}' | grep Paris
+if [ $? -eq 0 ]
+then
+    echo
+    echo "$MODEL is working!!"
+else
+    echo
+    echo "ERROR: $MODEL not replying."
+    exit 1
+fi
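For scripting beyond a `grep` for "Paris", the same smoke test can inspect the JSON response directly. This sketch assumes the server returns an OpenAI-style completion body (`choices[0].text`) on the `/v1/completions` route used above:

```python
# Sketch of start.sh's curl test in Python; assumes the container is
# already listening on localhost:8000 with an OpenAI-style response.
import sys
import requests

payload = {
    "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
    "stop": ["\n", "###"],
}
resp = requests.post("http://localhost:8000/v1/completions",
                     json=payload, timeout=60)
resp.raise_for_status()
text = resp.json()["choices"][0]["text"]
if "Paris" in text:
    print("open_llama_3b is working!!")
else:
    print("ERROR: open_llama_3b not replying.", file=sys.stderr)
    sys.exit(1)
```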

docker/auto_docker/start_server.sh renamed to docker/open_llama/start_server.sh

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 #!/bin/sh

-# For mmap support
+# For mlock support
 ulimit -l unlimited

 if [ "$IMAGE" = "python:3-slim-bullseye" ]; then
