
Commit 8352cdc

llava : fix bug in minicpm-v code (#11513)
* fix bug in minicpm-v code
* update readme of minicpm-v
1 parent 1e2f78a commit 8352cdc

File tree

6 files changed: +80 -175 lines changed


examples/llava/README-minicpmo2.6.md

Lines changed: 18 additions & 16 deletions
@@ -5,13 +5,25 @@ Currently, this readme only supports minicpm-omni's image capabilities, and we w
 
 Download [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6) PyTorch model from huggingface to "MiniCPM-o-2_6" folder.
 
+
+### Build llama.cpp
+Readme modification time: 20250206
+
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+
 Clone llama.cpp:
 ```bash
-git clone git@github.com:OpenBMB/llama.cpp.git
+git clone https://github.com/ggerganov/llama.cpp
 cd llama.cpp
-git checkout minicpm-omni
 ```
 
+Build llama.cpp using `CMake`:
+```bash
+cmake -B build
+cmake --build build --config Release
+```
+
+
 ### Usage of MiniCPM-o 2.6
 
 Convert PyTorch model to gguf files (You can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf) by us)
@@ -22,25 +34,15 @@ python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-
 python ./convert_hf_to_gguf.py ../MiniCPM-o-2_6/model
 
 # quantize int4 version
-./llama-quantize ../MiniCPM-o-2_6/model/ggml-model-f16.gguf ../MiniCPM-o-2_6/model/ggml-model-Q4_K_M.gguf Q4_K_M
+./build/bin/llama-quantize ../MiniCPM-o-2_6/model/ggml-model-f16.gguf ../MiniCPM-o-2_6/model/ggml-model-Q4_K_M.gguf Q4_K_M
 ```
 
-Build llama.cpp using `CMake`:
-https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
-
-```bash
-cmake -B build
-cmake --build build --config Release
-```
 
 Inference on Linux or Mac
-```
+```bash
 # run f16 version
-./llama-minicpmv-cli -m ../MiniCPM-o-2_6/model/ggml-model-f16.gguf --mmproj ../MiniCPM-o-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
+./build/bin/llama-minicpmv-cli -m ../MiniCPM-o-2_6/model/ggml-model-f16.gguf --mmproj ../MiniCPM-o-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
 
 # run quantized int4 version
-./llama-minicpmv-cli -m ../MiniCPM-o-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-o-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
-
-# or run in interactive mode
-./llama-minicpmv-cli -m ../MiniCPM-o-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-o-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i
+./build/bin/llama-minicpmv-cli -m ../MiniCPM-o-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-o-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
 ```

examples/llava/README-minicpmv2.5.md

Lines changed: 18 additions & 70 deletions
@@ -4,13 +4,26 @@
 
 Download [MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) PyTorch model from huggingface to "MiniCPM-Llama3-V-2_5" folder.
 
+
+### Build llama.cpp
+Readme modification time: 20250206
+
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+
 Clone llama.cpp:
 ```bash
 git clone https://github.com/ggml-org/llama.cpp
 cd llama.cpp
 ```
 
-### Usage
+Build llama.cpp using `CMake`:
+```bash
+cmake -B build
+cmake --build build --config Release
+```
+
+
+### Usage of MiniCPM-Llama3-V 2.5
 
 Convert PyTorch model to gguf files (You can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) by us)
@@ -20,80 +33,15 @@ python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-
 python ./convert_hf_to_gguf.py ../MiniCPM-Llama3-V-2_5/model
 
 # quantize int4 version
-./llama-quantize ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf Q4_K_M
+./build/bin/llama-quantize ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf Q4_K_M
 ```
 
-Build for Linux or Mac
-
-```bash
-make
-make llama-minicpmv-cli
-```
 
 Inference on Linux or Mac
-```
+```bash
 # run f16 version
-./llama-minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
+./build/bin/llama-minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
 
 # run quantized int4 version
-./llama-minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
-
-# or run in interactive mode
-./llama-minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i
-```
-
-### Android
-
-#### Build on Android device using Termux
-We found that build on Android device would bring better runtime performance, so we recommend to build on device.
-
-[Termux](https://github.com/termux/termux-app#installation) is a terminal app on Android device (no root required).
-
-Install tools in Termux:
-```
-apt update && apt upgrade -y
-apt install git make cmake
-```
-
-It's recommended to move your model inside the `~/` directory for best performance:
-```
-cd storage/downloads
-mv model.gguf ~/
-```
-
-#### Building the Project using Android NDK
-Obtain the [Android NDK](https://developer.android.com/ndk) and then build with CMake.
-
-Execute the following commands on your computer to avoid downloading the NDK to your mobile. Alternatively, you can also do this in Termux:
-
-```bash
-mkdir build-android
-cd build-android
-export NDK=/your_ndk_path
-cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
-make
-```
-
-Install [termux](https://github.com/termux/termux-app#installation) on your device and run `termux-setup-storage` to get access to your SD card (if Android 11+ then run the command twice).
-
-Finally, copy these built `llama` binaries and the model file to your device storage. Because the file permissions in the Android sdcard cannot be changed, you can copy the executable files to the `/data/data/com.termux/files/home/bin` path, and then execute the following commands in Termux to add executable permission:
-
-(Assumed that you have pushed the built executable files to the /sdcard/llama.cpp/bin path using `adb push`)
-```
-$cp -r /sdcard/llama.cpp/bin /data/data/com.termux/files/home/
-$cd /data/data/com.termux/files/home/bin
-$chmod +x ./*
-```
-
-Download models and push them to `/sdcard/llama.cpp/`, then move it to `/data/data/com.termux/files/home/model/`
-
-```
-$mv /sdcard/llama.cpp/ggml-model-Q4_K_M.gguf /data/data/com.termux/files/home/model/
-$mv /sdcard/llama.cpp/mmproj-model-f16.gguf /data/data/com.termux/files/home/model/
-```
-
-Now, you can start chatting:
-```
-$cd /data/data/com.termux/files/home/bin
-$./llama-minicpmv-cli -m ../model/ggml-model-Q4_K_M.gguf --mmproj ../model/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
+./build/bin/llama-minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
 ```

examples/llava/README-minicpmv2.6.md

Lines changed: 18 additions & 78 deletions
@@ -4,13 +4,25 @@
 
 Download [MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) PyTorch model from huggingface to "MiniCPM-V-2_6" folder.
 
+
+### Build llama.cpp
+Readme modification time: 20250206
+
+If there are differences in usage, please refer to the official build [documentation](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md)
+
 Clone llama.cpp:
 ```bash
-git clone git@github.com:OpenBMB/llama.cpp.git
+git clone https://github.com/ggerganov/llama.cpp
 cd llama.cpp
-git checkout minicpmv-main
 ```
 
+Build llama.cpp using `CMake`:
+```bash
+cmake -B build
+cmake --build build --config Release
+```
+
+
 ### Usage of MiniCPM-V 2.6
 
 Convert PyTorch model to gguf files (You can also download the converted [gguf](https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf) by us)
@@ -21,87 +33,15 @@ python ./examples/llava/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-
 python ./convert_hf_to_gguf.py ../MiniCPM-V-2_6/model
 
 # quantize int4 version
-./llama-quantize ../MiniCPM-V-2_6/model/ggml-model-f16.gguf ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf Q4_K_M
+./build/bin/llama-quantize ../MiniCPM-V-2_6/model/ggml-model-f16.gguf ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf Q4_K_M
 ```
 
-Build for Linux or Mac
-
-```bash
-make
-make llama-minicpmv-cli
-```
 
 Inference on Linux or Mac
-```
+```bash
 # run f16 version
-./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-f16.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
+./build/bin/llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-f16.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
 
 # run quantized int4 version
-./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
-
-# or run in interactive mode
-./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i
-```
-
-### Video
-Install FFmpeg
-```
-brew install ffmpeg
-brew install pkg-config
-```
-
-### Android
-
-#### Build on Android device using Termux
-We found that build on Android device would bring better runtime performance, so we recommend to build on device.
-
-[Termux](https://github.com/termux/termux-app#installation) is a terminal app on Android device (no root required).
-
-Install tools in Termux:
-```
-apt update && apt upgrade -y
-apt install git make cmake
-```
-
-It's recommended to move your model inside the `~/` directory for best performance:
-```
-cd storage/downloads
-mv model.gguf ~/
-```
-
-#### Building the Project using Android NDK
-Obtain the [Android NDK](https://developer.android.com/ndk) and then build with CMake.
-
-Execute the following commands on your computer to avoid downloading the NDK to your mobile. Alternatively, you can also do this in Termux:
-
-```bash
-mkdir build-android
-cd build-android
-export NDK=/your_ndk_path
-cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
-make
-```
-
-Install [termux](https://github.com/termux/termux-app#installation) on your device and run `termux-setup-storage` to get access to your SD card (if Android 11+ then run the command twice).
-
-Finally, copy these built `llama` binaries and the model file to your device storage. Because the file permissions in the Android sdcard cannot be changed, you can copy the executable files to the `/data/data/com.termux/files/home/bin` path, and then execute the following commands in Termux to add executable permission:
-
-(Assumed that you have pushed the built executable files to the /sdcard/llama.cpp/bin path using `adb push`)
-```
-$cp -r /sdcard/llama.cpp/bin /data/data/com.termux/files/home/
-$cd /data/data/com.termux/files/home/bin
-$chmod +x ./*
-```
-
-Download models and push them to `/sdcard/llama.cpp/`, then move it to `/data/data/com.termux/files/home/model/`
-
-```
-$mv /sdcard/llama.cpp/ggml-model-Q4_K_M.gguf /data/data/com.termux/files/home/model/
-$mv /sdcard/llama.cpp/mmproj-model-f16.gguf /data/data/com.termux/files/home/model/
-```
-
-Now, you can start chatting:
-```
-$cd /data/data/com.termux/files/home/bin
-$./llama-minicpmv-cli -m ../model/ggml-model-Q4_K_M.gguf --mmproj ../model/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
+./build/bin/llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
 ```

examples/llava/clip.cpp

Lines changed: 1 addition & 0 deletions
@@ -1378,6 +1378,7 @@ struct clip_ctx * clip_model_load(const char * fname, const int verbosity = 1) {
         LOG_INF("%s: vision_encoder: %d\n", __func__, new_clip->has_vision_encoder);
         LOG_INF("%s: llava_projector: %d\n", __func__, new_clip->has_llava_projector);
         LOG_INF("%s: minicpmv_projector: %d\n", __func__, new_clip->has_minicpmv_projector);
+        LOG_INF("%s: minicpmv_version: %d\n", __func__, new_clip->minicpmv_version);
         LOG_INF("%s: glm_projector: %d\n", __func__, new_clip->has_glm_projector);
         LOG_INF("%s: model size: %.2f MB\n", __func__, model_size / 1024.0 / 1024.0);
         LOG_INF("%s: metadata size: %.2f MB\n", __func__, ggml_get_mem_size(meta) / 1024.0 / 1024.0);

examples/llava/minicpmv-cli.cpp

Lines changed: 25 additions & 10 deletions
@@ -148,19 +148,34 @@ static void process_image(struct llava_context * ctx_llava, struct llava_image_e
     process_eval_image_embed(ctx_llava, embeds, params->n_batch, &n_past, idx++);
     eval_string(ctx_llava->ctx_llama, std::string("</image>").c_str(), params->n_batch, &n_past, false);
     if (num_image_embeds > 1) {
-        size_t num_image_embeds_col = clip_uhd_num_image_embeds_col(ctx_llava->ctx_clip);
-        eval_string(ctx_llava->ctx_llama, std::string("<slice>").c_str(), params->n_batch, &n_past, false);
-        for (size_t i = 0; i < (num_image_embeds-1)/num_image_embeds_col; ++i) {
-            for (size_t j = 0; j < num_image_embeds_col; ++j) {
-                eval_string(ctx_llava->ctx_llama, std::string("<image>").c_str(), params->n_batch, &n_past, false);
-                process_eval_image_embed(ctx_llava, embeds, params->n_batch, &n_past, idx++);
-                eval_string(ctx_llava->ctx_llama, std::string("</image>").c_str(), params->n_batch, &n_past, false);
-                if (j == num_image_embeds_col - 1) {
-                    eval_string(ctx_llava->ctx_llama, std::string("\n").c_str(), params->n_batch, &n_past, false);
+        if (has_minicpmv_projector == 2) {
+            size_t num_image_embeds_col = clip_uhd_num_image_embeds_col(ctx_llava->ctx_clip);
+            eval_string(ctx_llava->ctx_llama, std::string("<slice>").c_str(), params->n_batch, &n_past, false);
+            for (size_t i = 0; i < (num_image_embeds-1)/num_image_embeds_col; ++i) {
+                for (size_t j = 0; j < num_image_embeds_col; ++j) {
+                    eval_string(ctx_llava->ctx_llama, std::string("<image>").c_str(), params->n_batch, &n_past, false);
+                    process_eval_image_embed(ctx_llava, embeds, params->n_batch, &n_past, idx++);
+                    eval_string(ctx_llava->ctx_llama, std::string("</image>").c_str(), params->n_batch, &n_past, false);
+                    if (j == num_image_embeds_col - 1) {
+                        eval_string(ctx_llava->ctx_llama, std::string("\n").c_str(), params->n_batch, &n_past, false);
+                    }
+                }
+            }
+            eval_string(ctx_llava->ctx_llama, std::string("</slice>").c_str(), params->n_batch, &n_past, false);
+        }
+        else if (has_minicpmv_projector == 3 || has_minicpmv_projector == 4) {
+            size_t num_image_embeds_col = clip_uhd_num_image_embeds_col(ctx_llava->ctx_clip);
+            for (size_t i = 0; i < (num_image_embeds-1)/num_image_embeds_col; ++i) {
+                for (size_t j = 0; j < num_image_embeds_col; ++j) {
+                    eval_string(ctx_llava->ctx_llama, std::string("<slice>").c_str(), params->n_batch, &n_past, false);
+                    process_eval_image_embed(ctx_llava, embeds, params->n_batch, &n_past, idx++);
+                    eval_string(ctx_llava->ctx_llama, std::string("</slice>").c_str(), params->n_batch, &n_past, false);
+                    if (j == num_image_embeds_col - 1) {
+                        eval_string(ctx_llava->ctx_llama, std::string("\n").c_str(), params->n_batch, &n_past, false);
+                    }
                 }
             }
         }
-        eval_string(ctx_llava->ctx_llama, std::string("</slice>").c_str(), params->n_batch, &n_past, false);
     }
     LOG_INF("%s: image token past: %d\n", __func__, n_past);
 }
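To make the shape of the fix easier to see, here is a small, self-contained C++ sketch of the marker layout that the patched branch logic produces around the extra image tiles. The `slice_layout` helper, the `EMB` placeholder, and the 7-embedding / 3-column example grid are hypothetical illustrations; only the version test and the ordering of `<image>`, `<slice>`, and newline markers come from the diff above (the real code interleaves `eval_string` and `process_eval_image_embed` calls on the llama context instead of building a string).

```cpp
// Illustration only: reproduces the marker layout the patched process_image()
// emits around the extra image tiles, as a plain string instead of eval_string()
// calls. "EMB" stands in for one evaluated tile embedding.
#include <cstdio>
#include <string>

static std::string slice_layout(int version, size_t num_image_embeds, size_t num_image_embeds_col) {
    std::string out;
    if (num_image_embeds <= 1) {
        return out; // single embedding: no slice markers at all
    }
    const size_t rows = (num_image_embeds - 1) / num_image_embeds_col;
    if (version == 2) {
        // MiniCPM-Llama3-V 2.5: one outer <slice> block, each tile wrapped in <image>...</image>
        out += "<slice>";
        for (size_t i = 0; i < rows; ++i) {
            for (size_t j = 0; j < num_image_embeds_col; ++j) {
                out += "<image>EMB</image>";
                if (j == num_image_embeds_col - 1) {
                    out += "\n"; // newline after the last tile of each row
                }
            }
        }
        out += "</slice>";
    } else if (version == 3 || version == 4) {
        // MiniCPM-V 2.6 / MiniCPM-o 2.6: each tile gets its own <slice>...</slice>, no outer wrapper
        for (size_t i = 0; i < rows; ++i) {
            for (size_t j = 0; j < num_image_embeds_col; ++j) {
                out += "<slice>EMB</slice>";
                if (j == num_image_embeds_col - 1) {
                    out += "\n";
                }
            }
        }
    }
    return out;
}

int main() {
    // hypothetical example: 7 embeddings = 1 overview image + 6 tiles in a 2x3 grid (3 per row)
    std::printf("v2:\n%s\n", slice_layout(2, 7, 3).c_str());
    std::printf("v3/v4:\n%s\n", slice_layout(3, 7, 3).c_str());
    return 0;
}
```

Running the sketch makes the difference easy to compare: version 2 keeps a single outer `<slice>…</slice>` wrapper around `<image>`-tagged tiles, while versions 3 and 4 wrap every tile in its own `<slice>…</slice>` and drop the outer wrapper, which is what the previous unconditional code got wrong for the newer models.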

examples/llava/minicpmv-convert-image-encoder-to-gguf.py

Lines changed: 0 additions & 1 deletion
@@ -597,7 +597,6 @@ def bytes_to_unicode():
     fname_middle = "mmproj-"
     has_text_encoder = False
     has_minicpmv_projector = True
-    minicpmv_version = 4
 elif args.vision_only:
     fname_middle = "vision-"
     has_text_encoder = False
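Removing this assignment means the mmproj export branch no longer overrides `minicpmv_version` with a hard-coded 4, so whatever version value the script established earlier (not shown in this hunk) is left in place for non-2.6-o models.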

0 commit comments
