
Add Doc for Converting Granite Vision -> GGUF #12006


Merged: 2 commits into ggml-org:master on Feb 25, 2025

Conversation

alex-jw-brooks
Contributor

Adds example docs for converting a Granite Vision model, which is essentially a LLaVA Next model that uses multiple feature layers, with SigLIP as the visual encoder and a Granite language model as the LLM.
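For context, a minimal sketch of what that composition looks like from the Hugging Face side (my illustration, not part of this PR; the model id and the expected field values are assumptions based on the `LlavaNextConfig` schema):

```python
# Hedged sketch: inspect the HF config to see the llava-next-style layout
# described above. Model id and expected values are assumptions.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ibm-granite/granite-vision-3.1-2b-preview")

print(cfg.model_type)                # expected: "llava_next"
print(cfg.vision_config.model_type)  # expected: "siglip_vision_model"
print(cfg.text_config.model_type)    # expected: "granite"
print(cfg.vision_feature_layer)      # expected: a list of several encoder layers
```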

Depends on #11794

CC @danbev

Signed-off-by: Alex-Brooks <[email protected]>

Remove trailing whitespace

Signed-off-by: Alex-Brooks <[email protected]>
danbev merged commit 4d1051a into ggml-org:master on Feb 25, 2025
2 checks passed
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
* Add example docs for granite vision

Signed-off-by: Alex-Brooks <[email protected]>
@samkoesnadi
Contributor

samkoesnadi commented Feb 27, 2025

@alex-jw-brooks Hi Alex, thanks for the contribution here! I just want to understand this specific model better. As I understand it, this model comes from IBM; how does it compare to Qwen 2.5 VL?

@alex-jw-brooks
Contributor Author

alex-jw-brooks commented Feb 28, 2025

Hi @samkoesnadi! Thanks for your interest 🙂 I'm unfortunately not yet super familiar with all the details of Qwen2.5 VL, but hopefully I can explain some details about our model, clarify how it relates to existing architectures, and suggest which use-cases each is suited for.

The best way to understand Granite Vision is to compare it to LLaVA Next, because they are very similar architecturally. The main differences compared to other LLaVA Next models:

  • It uses multiple feature layers from the visual encoder (see the sketch after this list)
  • The visual encoder is SigLIP instead of CLIP, which means larger tiles in anyres and more image features per tile
  • It uses a Granite LLM as the language model
  • It handles a pretty wide variety of aspect ratios, i.e., more choices of image grid pinpoints for anyres to use
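To make the multi-feature-layer point concrete, here is a rough sketch (my own illustration, not code from this PR or from llama.cpp; the checkpoint id and the layer indices are assumptions) of how several SigLIP hidden states can be concatenated into the image features a LLaVA Next style projector would consume:

```python
# Illustrative only: combine several SigLIP hidden states into one feature
# tensor, as a multi-feature-layer llava next variant would. The checkpoint
# id and layer indices below are assumptions for the sake of the example.
import torch
from transformers import SiglipVisionModel

encoder = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
pixel_values = torch.randn(1, 3, 384, 384)  # one preprocessed 384x384 tile

with torch.no_grad():
    out = encoder(pixel_values, output_hidden_states=True)

feature_layers = [-24, -20, -12, -1]  # hypothetical multi-layer selection
features = torch.cat([out.hidden_states[i] for i in feature_layers], dim=-1)

# Each layer yields (1, 729, 1152) for this checkpoint (27x27 patches), so the
# concatenated features are (1, 729, 1152 * len(feature_layers)).
print(features.shape)
```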

In terms of use-cases, Granite Vision is largely fine-tuned for document understanding tasks.

For how it compares to Qwen2.5 VL, I'd suggest reading the Granite Vision technical report alongside the Qwen2.5 VL technical report to dig into the technical differences and model performance. Our 2B model is also Apache 2.0 licensed, whereas the 3B Qwen2.5 VL model is not (although the 7B is).

@samkoesnadi
Contributor


Thank you for your clear explanation, appreciate it :)

mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
* Add example docs for granite vision

Signed-off-by: Alex-Brooks <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
* Add example docs for granite vision

Signed-off-by: Alex-Brooks <[email protected]>
mostlyuseful pushed a commit to mostlyuseful/llama.cpp that referenced this pull request May 12, 2025
* Add example docs for granite vision

Signed-off-by: Alex-Brooks <[email protected]>