Add Doc for Converting Granite Vision -> GGUF #12006
Signed-off-by: Alex-Brooks <[email protected]>
Remove trailing whitespace Signed-off-by: Alex-Brooks <[email protected]>
ba79be7 to 7b4b18d
* Add example docs for granite vision Signed-off-by: Alex-Brooks <[email protected]>
@alex-jw-brooks Hi Alex, thanks for the contribution here! I'd like to understand this specific model better. As I understand it, this comes from IBM. How does it compare to Qwen 2.5 VL?
Hi @samkoesnadi! Thanks for your interest 🙂 I'm unfortunately not super familiar with all the details of Qwen2.5 VL quite yet, but hopefully I can help at least explain some details about our model to illuminate how it's similar to other existing model architectures and what use-cases you should use them for. The best way to understand granite vision is actually to compare it to Llava Next, because they are very similar architecturally. The main differences compared to some other llava next models:
In terms of use-cases, granite vision is largely fine-tuned for document understanding tasks. As for how it compares to Qwen2.5 VL, I'd suggest reading the granite vision technical report and comparing it with the Qwen 2.5 VL technical report to look more deeply into technical differences and model performance. Our 2b model is also Apache 2.0 licensed, whereas the 3B Qwen2.5 VL model is not (although the 7B is).
Thank you for your clear explanation, appreciate it :)
Adds example docs for converting a granite vision model, which is essentially a llava next model that uses multiple feature layers, siglip as the visual encoder, and a granite language model as the LLM.
Depends on #11794
CC @danbev
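For readers unfamiliar with the "multiple feature layers" mentioned above: a toy sketch of the general idea, assuming that per-patch features from several visual encoder layers are selected and concatenated along the feature dimension. This is an illustration only, not code from llama.cpp or from the granite vision model; the function name `select_and_concat`, the layer indices, and the toy values are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical helper for illustration; not part of llama.cpp or granite vision.
def select_and_concat(
    hidden_states: Sequence[List[List[float]]],
    layer_indices: Sequence[int],
) -> List[List[float]]:
    """Concatenate per-patch features from the chosen encoder layers.

    hidden_states[l][p] is the feature vector of patch p at layer l.
    Assumption: features are joined along the feature dimension.
    """
    selected = [hidden_states[i] for i in layer_indices]
    num_patches = len(selected[0])
    combined = []
    for p in range(num_patches):
        row: List[float] = []
        for layer in selected:
            row.extend(layer[p])  # append this layer's features for patch p
        combined.append(row)
    return combined

# Toy data: 3 encoder layers, 2 patches, 2 features per patch.
hidden_states = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[1.0, 1.1], [1.2, 1.3]],
    [[2.0, 2.1], [2.2, 2.3]],
]

# Pick the last two layers (made-up indices, not the model's real config).
features = select_and_concat(hidden_states, [-2, -1])
print(len(features), len(features[0]))
```

With a single feature layer each patch would keep its original width; selecting two layers doubles it, which is why a model using several layers needs a projector sized for the wider input.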