Commit 7b4b18d

Remove local paths from granite vision docs
Signed-off-by: Alex-Brooks <[email protected]>

Remove trailing whitespace

Signed-off-by: Alex-Brooks <[email protected]>

1 parent 3a1d98f · commit 7b4b18d

File tree: 1 file changed (+24, −20)

examples/llava/README-granitevision.md

Lines changed: 24 additions & 20 deletions
@@ -3,8 +3,8 @@
 Download the model and point your `GRANITE_MODEL` environment variable to the path.

 ```bash
-git clone https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview
-export GRANITE_MODEL=/Users/alexanderjbrooks/workspace/develop/llama.cpp/examples/llava/granite-vision-3.1-2b-preview
+$ git clone https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview
+$ export GRANITE_MODEL=./granite-vision-3.1-2b-preview
 ```


@@ -13,9 +13,13 @@ First, we need to run the llava surgery script as shown below:

 `python llava_surgery_v2.py -C -m $GRANITE_MODEL`

-You should see two new files (`llava.clip` and `llava.projector`) written into your model's directory. You can load them directly with pytorch and validate that they are nonempty using the snippet below.
+You should see two new files (`llava.clip` and `llava.projector`) written into your model's directory, as shown below.

-`ls $GRANITE_MODEL | grep -i llava`
+```bash
+$ ls $GRANITE_MODEL | grep -i llava
+llava.clip
+llava.projector
+```

 We should see that the projector and visual encoder get split out into the llava files. Quick check to make sure they aren't empty:
 ```python
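For reference, a minimal sketch of that nonempty check, assuming PyTorch is installed and `GRANITE_MODEL` is exported (the README's own snippet may differ in detail):

```python
# Hypothetical sanity check: load the two files written by llava_surgery_v2.py
# and confirm that each state dict actually contains tensors.
import os

import torch

model_path = os.environ["GRANITE_MODEL"]

encoder_tensors = torch.load(os.path.join(model_path, "llava.clip"), map_location="cpu")
projector_tensors = torch.load(os.path.join(model_path, "llava.projector"), map_location="cpu")

# Empty dicts would mean the surgery step failed to split anything out.
assert len(encoder_tensors) > 0, "llava.clip is empty"
assert len(projector_tensors) > 0, "llava.projector is empty"
print(f"encoder tensors: {len(encoder_tensors)}, projector tensors: {len(projector_tensors)}")
```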
@@ -37,7 +41,7 @@ If you actually inspect the `.keys()` of the loaded tensors, you should see a lo


 ### 2. Creating the Visual Component GGUF
-To create the GGUF for the visual components, we need to write a config for the visual encoder; make sure the config contains the correct `image_grid_pinpoints`
+To create the GGUF for the visual components, we need to write a config for the visual encoder; make sure the config contains the correct `image_grid_pinpoints`


 Note: we refer to this file as `$VISION_CONFIG` later on.
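As an optional extra check (not part of the README), one can confirm the hand-written config parses and actually carries `image_grid_pinpoints` before converting; this assumes `VISION_CONFIG` is exported:

```python
# Hypothetical check that the vision encoder config is valid JSON and
# contains the image_grid_pinpoints field mentioned above.
import json
import os

with open(os.environ["VISION_CONFIG"]) as f:
    vision_config = json.load(f)

assert "image_grid_pinpoints" in vision_config, "config is missing image_grid_pinpoints"
print(vision_config["image_grid_pinpoints"])
```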
@@ -83,7 +87,6 @@ Note: we refer to this file as `$VISION_CONFIG` later on.
 "num_attention_heads": 16,
 "num_hidden_layers": 27,
 "patch_size": 14,
-"transformers_version": "4.45.0.dev0",
 "layer_norm_eps": 1e-6,
 "hidden_act": "gelu_pytorch_tanh",
 "projection_dim": 0,
@@ -93,24 +96,24 @@ Note: we refer to this file as `$VISION_CONFIG` later on.

 Create a new directory to hold the visual components, and copy the llava.clip/projector files, as well as the vision config into it.

-```
-ENCODER_PATH=/Users/alexanderjbrooks/workspace/develop/llama.cpp/examples/llava/visual_encoder
-mkdir $ENCODER_PATH
+```bash
+$ ENCODER_PATH=$PWD/visual_encoder
+$ mkdir $ENCODER_PATH

-cp $GRANITE_MODEL/llava.clip $ENCODER_PATH/pytorch_model.bin
-cp $GRANITE_MODEL/llava.projector $ENCODER_PATH/
-cp $VISION_CONFIG $ENCODER_PATH/config.json
+$ cp $GRANITE_MODEL/llava.clip $ENCODER_PATH/pytorch_model.bin
+$ cp $GRANITE_MODEL/llava.projector $ENCODER_PATH/
+$ cp $VISION_CONFIG $ENCODER_PATH/config.json
 ```

 At which point you should have something like this:
 ```bash
-(venv) alexanderjbrooks@Alexanders-MacBook-Pro llava % ls $ENCODER_PATH
+$ ls $ENCODER_PATH
 config.json llava.projector pytorch_model.bin
 ```

 Now convert the components to GGUF; Note that we also override the image mean/std dev to `[.5,.5,.5]` since we use the siglip visual encoder - in the transformers model, you can find these numbers in the [preprocessor_config.json](https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview/blob/main/preprocessor_config.json).
 ```bash
-python convert_image_encoder_to_gguf.py \
+$ python convert_image_encoder_to_gguf.py \
 -m $ENCODER_PATH \
 --llava-projector $ENCODER_PATH/llava.projector \
 --output-dir $ENCODER_PATH \
@@ -123,11 +126,11 @@ this will create the first GGUF file at `$ENCODER_PATH/mmproj-model-f16.gguf`; w


 ### 3. Creating the LLM GGUF.
-The granite vision model contains a granite LLM as its language model. For now, the easiest way to get the GGUF for LLM is by loading the composite model in `transformers` and exporting the LLM so that it can be directly converted with the normal conversion path.
+The granite vision model contains a granite LLM as its language model. For now, the easiest way to get the GGUF for LLM is by loading the composite model in `transformers` and exporting the LLM so that it can be directly converted with the normal conversion path.

 First, set the `LLM_EXPORT_PATH` to the path to export the `transformers` LLM to.
 ```
-export LLM_EXPORT_PATH=/Users/alexanderjbrooks/workspace/develop/llama.cpp/examples/llava/granite_vision_llm
+$ export LLM_EXPORT_PATH=$PWD/granite_vision_llm
 ```

 ```python
@@ -153,12 +156,13 @@ model = transformers.AutoModelForImageTextToText.from_pretrained(MODEL_PATH, ign

 tokenizer.save_pretrained(LLM_EXPORT_PATH)
 model.language_model.save_pretrained(LLM_EXPORT_PATH)
-```
+```

 Now you can convert the exported LLM to GGUF with the normal converter in the root of the llama cpp project.
 ```bash
-LLM_GGUF_PATH=$LLM_EXPORT_PATH/granite_llm.gguf
-python convert_hf_to_gguf.py --outfile $LLM_GGUF_PATH $LLM_EXPORT_PATH
+$ LLM_GGUF_PATH=$LLM_EXPORT_PATH/granite_llm.gguf
+...
+$ python convert_hf_to_gguf.py --outfile $LLM_GGUF_PATH $LLM_EXPORT_PATH
 ```


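For orientation, a self-contained sketch of the export step shown partially in the snippet above; the tokenizer handling and the use of `GRANITE_MODEL` as the source path are assumptions, and the README's actual snippet may differ:

```python
# Hypothetical end-to-end version of the LLM export: load the composite
# granite vision model with transformers, then save only its language model
# (plus tokenizer) so convert_hf_to_gguf.py can treat it as a plain granite LLM.
import os

import transformers

MODEL_PATH = os.environ["GRANITE_MODEL"]
LLM_EXPORT_PATH = os.environ["LLM_EXPORT_PATH"]

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_PATH)
model = transformers.AutoModelForImageTextToText.from_pretrained(MODEL_PATH)

tokenizer.save_pretrained(LLM_EXPORT_PATH)
model.language_model.save_pretrained(LLM_EXPORT_PATH)
```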
@@ -168,7 +172,7 @@ Build llama cpp normally; you should have a target binary named `llama-llava-cli
 Note - the test image shown below can be found [here](https://github-production-user-asset-6210df.s3.amazonaws.com/10740300/415512792-d90d5562-8844-4f34-a0a5-77f62d5a58b5.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20250221%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250221T054145Z&X-Amz-Expires=300&X-Amz-Signature=86c60be490aa49ef7d53f25d6c973580a8273904fed11ed2453d0a38240ee40a&X-Amz-SignedHeaders=host).

 ```bash
-./build/bin/llama-llava-cli -m $LLM_GGUF_PATH \
+$ ./build/bin/llama-llava-cli -m $LLM_GGUF_PATH \
 --mmproj $VISUAL_GGUF_PATH \
 --image cherry_blossom.jpg \
 -c 16384 \
