First, we need to run the llava surgery script as shown below:

`python llava_surgery_v2.py -C -m $GRANITE_MODEL`

You should see two new files (`llava.clip` and `llava.projector`) written into your model's directory, as shown below.

```bash
$ ls $GRANITE_MODEL | grep -i llava
llava.clip
llava.projector
```
We should see that the projector and visual encoder get split out into the llava files. Quick check to make sure they aren't empty:
```python
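# A minimal sketch of such a check (the exact snippet may differ): load each
# file with torch and confirm it actually contains tensors.
import os

import torch

MODEL_PATH = os.environ["GRANITE_MODEL"]

encoder_tensors = torch.load(os.path.join(MODEL_PATH, "llava.clip"))
projector_tensors = torch.load(os.path.join(MODEL_PATH, "llava.projector"))

assert len(encoder_tensors) > 0, "llava.clip is empty"
assert len(projector_tensors) > 0, "llava.projector is empty"
```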
Note: we refer to this file as `$VISION_CONFIG` later on. It contains fields such as:

```json
    "num_attention_heads": 16,
    "num_hidden_layers": 27,
    "patch_size": 14,
    "layer_norm_eps": 1e-6,
    "hidden_act": "gelu_pytorch_tanh",
    "projection_dim": 0,
```
Create a new directory to hold the visual components, and copy the llava.clip/projector files, as well as the vision config into it.
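A minimal sketch of these steps is shown below, assuming `$ENCODER_PATH` points at the new directory (the directory name itself is arbitrary):

```bash
$ ENCODER_PATH=$PWD/visual_encoder
$ mkdir -p $ENCODER_PATH

# Sketch: llava.clip becomes pytorch_model.bin and the vision config becomes
# config.json, matching the listing shown below.
$ cp $GRANITE_MODEL/llava.clip $ENCODER_PATH/pytorch_model.bin
$ cp $GRANITE_MODEL/llava.projector $ENCODER_PATH/
$ cp $VISION_CONFIG $ENCODER_PATH/config.json
```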
At which point you should have something like this:
```bash
$ ls $ENCODER_PATH
config.json llava.projector pytorch_model.bin
```
Now convert the components to GGUF. Note that we also override the image mean/std dev to `[.5,.5,.5]`, since we use the SigLIP visual encoder; in the transformers model, you can find these numbers in the [preprocessor_config.json](https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview/blob/main/preprocessor_config.json).
```bash
$ python convert_image_encoder_to_gguf.py \
-m $ENCODER_PATH \
--llava-projector $ENCODER_PATH/llava.projector \
--output-dir $ENCODER_PATH \
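--image-mean 0.5 0.5 0.5 \
--image-std 0.5 0.5 0.5
# NOTE: any remaining flags are not shown here; the --image-mean/--image-std
# values above are the [.5,.5,.5] override described in the preceding paragraph.
```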
The granite vision model contains a granite LLM as its language model.

First, set `LLM_EXPORT_PATH` to the directory where the `transformers` LLM should be exported.
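For example (the directory below is just a placeholder; any empty directory works):

```bash
$ export LLM_EXPORT_PATH=$PWD/granite_vision_llm  # placeholder path
```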
Build llama.cpp normally; you should have a target binary named `llama-llava-cli`.

Note: the test image shown below can be found [here](https://github-production-user-asset-6210df.s3.amazonaws.com/10740300/415512792-d90d5562-8844-4f34-a0a5-77f62d5a58b5.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20250221%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250221T054145Z&X-Amz-Expires=300&X-Amz-Signature=86c60be490aa49ef7d53f25d6c973580a8273904fed11ed2453d0a38240ee40a&X-Amz-SignedHeaders=host).
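A sample invocation might look something like the sketch below; the binary location depends on your build directory, and the GGUF paths, image name, and prompt are placeholders rather than values taken from this guide:

```bash
# Placeholders: point -m at the converted language model GGUF, --mmproj at the
# visual encoder GGUF produced above, and --image at the downloaded test image.
$ ./build/bin/llama-llava-cli \
-m $LLM_GGUF_PATH \
--mmproj $VISUAL_GGUF_PATH \
--image ./test_image.jpg \
-p "What is shown in this image?"
```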