
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl #13434


Merged (2 commits) on May 10, 2025


@ngxson (Collaborator) commented on May 10, 2025

Fixes #13414

For models that accept dynamic resolution (Pixtral / Mistral Small / Qwen VL), we want to:

  • Enforce a maximum resolution: if an image is larger than the max resolution, it is downscaled.
  • Run the warmup at a more reasonable resolution instead of the max resolution; otherwise many users will hit OOM (tbh I'm not sure this is the best solution, but let's try it and add a custom max_image_size option in the future).
  • Resize the image to a multiple of patch_size (for Qwen VL, a multiple of patch_size * 2).
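
A minimal C++ sketch of the resize policy in the list above, under stated assumptions: the hard limit is expressed as a maximum side length (`max_side`), and `ImageSize`, `align_side` and `compute_target_size` are hypothetical names, not the actual mtmd/clip code. It downscales only when the image exceeds the limit, keeps the aspect ratio, and snaps each side to a multiple of `align` (patch_size, or patch_size * 2 for Qwen VL).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical type for illustration; not the actual mtmd/clip API.
struct ImageSize {
    int width;
    int height;
};

// Round x to the nearest multiple of `align`, then clamp it to
// [align, largest multiple of align that is <= max_side].
// Plain integer division is used, so `align` need not be a power of 2.
static int align_side(int x, int align, int max_side) {
    const int largest = std::max(align, (max_side / align) * align);
    const int rounded = ((x + align / 2) / align) * align;
    return std::clamp(rounded, align, largest);  // C++17
}

// Downscale (never upscale) so the longer side fits in `max_side`,
// keeping the aspect ratio, then snap both sides to multiples of `align`.
// Assumption: for Qwen VL, `align` = patch_size * 2; otherwise patch_size.
static ImageSize compute_target_size(ImageSize in, int max_side, int align) {
    assert(in.width > 0 && in.height > 0);
    assert(align > 0 && max_side >= align);

    const int longest = std::max(in.width, in.height);
    const double scale =
        longest > max_side ? static_cast<double>(max_side) / longest : 1.0;

    const int w = static_cast<int>(std::lround(in.width * scale));
    const int h = static_cast<int>(std::lround(in.height * scale));
    return {align_side(w, align, max_side), align_side(h, align, max_side)};
}
```

For example, `compute_target_size({4000, 3000}, 1024, 28)` gives 1008 x 756: the downscale produces 1024 x 768, and because 1024 is not a multiple of 28 the width is clamped to the largest multiple under the cap. Rounding to the nearest multiple and then clamping means the hard limit is never exceeded; the value 1024 is only an illustrative cap.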

Btw @ggerganov, while working on this I realized that GGML_PAD only works when the multiple is a power of 2. Is this expected?

@ngxson requested a review from @ggerganov on May 10, 2025 17:03

@ggerganov (Member) left a comment


Yes, GGML_PAD works only with powers of 2. We should add a comment to clarify that.
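
To illustrate the power-of-2 restriction, here is a small C++ check. `PAD_POW2` is a local macro that mirrors the bitmask form of GGML_PAD in ggml.h (`((x) + (n) - 1) & ~((n) - 1)`); `pad_any` and `is_pow2` are hypothetical helpers. With n = 28 (patch_size * 2, assuming a 14-pixel patch), the mask clears the wrong bits: `PAD_POW2(30, 28)` gives 32 and `PAD_POW2(100, 28)` gives 100, neither of which is a multiple of 28, while rounding with integer division gives 56 and 112.

```cpp
#include <cassert>

// Bitmask round-up, mirroring the form used by GGML_PAD in ggml.h.
// Clearing the low bits with ~(n - 1) only yields a multiple of n
// when n is a power of 2.
#define PAD_POW2(x, n) (((x) + (n) - 1) & ~((n) - 1))

// General round-up to a multiple of n; works for any positive n (e.g. 28).
static int pad_any(int x, int n) {
    assert(n > 0);
    return ((x + n - 1) / n) * n;
}

// True when n is a power of 2, i.e. when PAD_POW2 is safe to use.
static bool is_pow2(int n) {
    return n > 0 && (n & (n - 1)) == 0;
}
```

The bitmask version avoids a division, which is why it suits the power-of-2 alignments ggml uses internally; for alignments like patch_size * 2 the division-based form is the safe choice.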

@ngxson merged commit 15e6125 into ggml-org:master on May 10, 2025 (44 checks passed)

Linked issue: Eval bug: mtmd in server mode crashes on too big image