mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl #13434

ngxson · 2025-05-10T17:03:09Z

For models accepting dynamic resolution (Pixtral / Mistral Small / Qwen VL), we want to:

- Have a max resolution. If the image is bigger than the max resolution, it will be downscaled.
- Do the warmup with a more reasonable resolution instead of the max resolution, otherwise many users will get OOM (tbh I'm not quite sure this is the best solution, but let's try it and add a custom max_image_size in the future).
- Resize the image so that its dimensions are a multiple of patch_size (or, in the case of Qwen VL, a multiple of patch_size * 2).
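The downscale and alignment steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `image_size`, `round_up_to_multiple`, `fit_image_size`, and the `max_side` / `align` parameters are hypothetical names, and rounding up (rather than to nearest) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not taken from the mtmd / clip sources.
struct image_size {
    int width;
    int height;
};

// Round x up to the next multiple of n. Works for any positive n,
// including values that are not powers of 2 (e.g. patch_size * 2 = 28).
static int round_up_to_multiple(int x, int n) {
    return ((x + n - 1) / n) * n;
}

// Sketch of the two steps: (1) downscale so the longest side fits in
// max_side, keeping the aspect ratio; (2) align both sides to `align`.
// Assumption: max_side is itself a multiple of align, so rounding up
// in step (2) can never push a side past max_side again.
static image_size fit_image_size(image_size in, int max_side, int align) {
    assert(in.width > 0 && in.height > 0);
    assert(max_side > 0 && align > 0 && max_side % align == 0);

    image_size out = in;
    const int longest = std::max(in.width, in.height);
    if (longest > max_side) {
        // integer math avoids floating-point rounding surprises
        out.width  = std::max<int64_t>(1, (int64_t) in.width  * max_side / longest);
        out.height = std::max<int64_t>(1, (int64_t) in.height * max_side / longest);
    }
    out.width  = round_up_to_multiple(out.width,  align);
    out.height = round_up_to_multiple(out.height, align);
    return out;
}
```

For example, with an assumed patch_size of 14 (so `align = 28`) and `max_side = 1008`, a 4000x3000 input would come out as 1008x756, and a 100x50 input would only be padded to 112x56.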

Btw @ggerganov, while working on this I realized that GGML_PAD only works with ~~multiples~~ powers of 2. Is this expected?

ggerganov

Yes, GGML_PAD works with only powers of 2. Should add a comment to clarify that.
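To illustrate why a bit-mask style pad is limited to powers of 2, here is a small sketch. `PAD_POW2` is a local macro assumed to mirror the bit-mask form of GGML_PAD (check `ggml.h` for the actual definition), and `pad_any` is a hypothetical division-based alternative that accepts any positive alignment.

```cpp
#include <cassert>

// Local stand-in, assumed to have the same bit-mask shape as GGML_PAD.
// Clearing the low bits with ~(n - 1) only rounds correctly when n is
// a power of 2, because only then is (n - 1) a contiguous low-bit mask.
#define PAD_POW2(x, n) (((x) + (n) - 1) & ~((n) - 1))

// Hypothetical alternative: round up with integer division instead of
// a mask, which is valid for any positive n.
static int pad_any(int x, int n) {
    assert(n > 0);
    return ((x + n - 1) / n) * n;
}

// True if n is a power of 2; could be used to guard the mask version.
static bool is_power_of_2(int n) {
    return n > 0 && (n & (n - 1)) == 0;
}
```

With `n = 32` both versions agree (30 pads to 32). With `n = 28`, which is what patch_size * 2 gives for a patch size of 14, the mask version yields 32, which is not a multiple of 28, while the division version yields 56.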

ngxson added 2 commits May 10, 2025 18:57

mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl

f8d4e4c

fix typo

f4dcc24

ngxson requested a review from ggerganov May 10, 2025 17:03

github-actions bot added the examples label May 10, 2025

ggerganov approved these changes May 10, 2025

ngxson merged commit 15e6125 into ggml-org:master May 10, 2025
44 checks passed