First, we need to run the llava surgery script as shown below:

`python llava_surgery_v2.py -C -m $GRANITE_MODEL`

You should see two new files (`llava.clip` and `llava.projector`) written into your model's directory, as shown below.

```bash
$ ls $GRANITE_MODEL | grep -i llava
llava.clip
llava.projector
```
The projector and visual encoder should now be split out into the llava files. Here is a quick check to make sure they aren't empty:
```python
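# A minimal sketch of the check (assumes the surgery outputs are plain
# torch-saved dicts, as they can be loaded directly with pytorch).
import os

import torch

MODEL_PATH = os.environ["GRANITE_MODEL"]

# Tensors split out of the composite checkpoint by llava_surgery_v2.py.
encoder_tensors = torch.load(os.path.join(MODEL_PATH, "llava.clip"), map_location="cpu")
projector_tensors = torch.load(os.path.join(MODEL_PATH, "llava.projector"), map_location="cpu")

# Both should be non-empty mappings of tensor names to weights; inspecting
# their .keys() shows which layers ended up in each file.
assert len(encoder_tensors) > 0
assert len(projector_tensors) > 0
```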
### 2. Creating the Visual Component GGUF

To create the GGUF for the visual components, we need to write a config for the visual encoder; make sure the config contains the correct `image_grid_pinpoints`.

Note: we refer to this file as `$VISION_CONFIG` later on. Among other fields, it should contain values like the following:

```json
"num_attention_heads": 16,
"num_hidden_layers": 27,
"patch_size": 14,
"layer_norm_eps": 1e-6,
"hidden_act": "gelu_pytorch_tanh",
"projection_dim": 0,
```
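Later steps reference this config through the `$VISION_CONFIG` variable; as a sketch (the filename here is just an example):

```bash
export VISION_CONFIG=$GRANITE_MODEL/vision_config.json
```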
Create a new directory (referred to as `$ENCODER_PATH` below) to hold the visual components, and copy the llava.clip/projector files, as well as the vision config, into it; a sketch of this copy step follows the listing below. At this point you should have something like this:

```bash
$ ls $ENCODER_PATH
config.json llava.projector pytorch_model.bin
```
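For reference, a sketch of the copy step that produces this layout, assuming the surgery outputs are still in `$GRANITE_MODEL` and using the renames implied by the listing (the clip weights become `pytorch_model.bin`, the vision config becomes `config.json`):

```bash
mkdir -p $ENCODER_PATH

# Copy the surgery outputs and the vision config into the encoder directory.
cp $GRANITE_MODEL/llava.clip $ENCODER_PATH/pytorch_model.bin
cp $GRANITE_MODEL/llava.projector $ENCODER_PATH/llava.projector
cp $VISION_CONFIG $ENCODER_PATH/config.json
```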
Now convert the components to GGUF. Note that we also override the image mean/std dev to `[.5,.5,.5]` since we use the SigLIP visual encoder; in the `transformers` model, you can find these numbers in the [preprocessor_config.json](https://huggingface.co/ibm-granite/granite-vision-3.1-2b-preview/blob/main/preprocessor_config.json).
```bash
$ python convert_image_encoder_to_gguf.py \
-m $ENCODER_PATH \
--llava-projector $ENCODER_PATH/llava.projector \
    --output-dir $ENCODER_PATH \
```

This will create the first GGUF file at `$ENCODER_PATH/mmproj-model-f16.gguf`.
### 3. Creating the LLM GGUF

The granite vision model contains a granite LLM as its language model. For now, the easiest way to get the GGUF for the LLM is by loading the composite model in `transformers` and exporting the LLM so that it can be directly converted with the normal conversion path.

First, set the `LLM_EXPORT_PATH` to the path to export the `transformers` LLM to.
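A minimal sketch of that export, assuming the composite model loads as a `LlavaNextForConditionalGeneration` and exposes its LLM as `model.language_model`:

```python
import os

from transformers import AutoTokenizer, LlavaNextForConditionalGeneration

MODEL_PATH = os.environ["GRANITE_MODEL"]
LLM_EXPORT_PATH = os.environ["LLM_EXPORT_PATH"]

# Load the composite (vision + language) model, then save only the language
# model and tokenizer so they can go through the normal GGUF conversion path.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = LlavaNextForConditionalGeneration.from_pretrained(MODEL_PATH)

tokenizer.save_pretrained(LLM_EXPORT_PATH)
model.language_model.save_pretrained(LLM_EXPORT_PATH)
```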
Build llama cpp normally; you should have a target binary named `llama-llava-cli`.
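For example, a standard CMake build (see the repository's build documentation for platform-specific options):

```bash
cmake -B build
cmake --build build --config Release
```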
Note - the test image shown below can be found [here](https://github-production-user-asset-6210df.s3.amazonaws.com/10740300/415512792-d90d5562-8844-4f34-a0a5-77f62d5a58b5.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20250221%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250221T054145Z&X-Amz-Expires=300&X-Amz-Signature=86c60be490aa49ef7d53f25d6c973580a8273904fed11ed2453d0a38240ee40a&X-Amz-SignedHeaders=host).
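Once both GGUFs exist, a test invocation might look like the following sketch; `$LLM_GGUF_PATH` and the local image filename are illustrative, and the visual GGUF is the `mmproj-model-f16.gguf` produced earlier:

```bash
./build/bin/llama-llava-cli \
    -m $LLM_GGUF_PATH \
    --mmproj $ENCODER_PATH/mmproj-model-f16.gguf \
    --image ./test_image.jpg \
    -p "What is happening in this image?"
```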