ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. #9763
Conversation
- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added.
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <[email protected]>
Apologies for the delay. I don't think this is correct yet; I get this when running the unit tests:
Can you take a look? I'm still busy, but I should be able to take a closer look next week if you haven't figured it out by then.
It seems I made a mistake during the refactoring process. I will try to resolve it.
fix casting to int. Signed-off-by: Changyeon Kim <[email protected]>
@0cc4m Hello, I solved the problem. It was the wrong order of parameters.

```
PS C:\work\llm\cyzero\llama.cpp\build\bin\Release> ./test-backend-ops -o POOL_2D
Backend 1/2: Vulkan0
  POOL_2D(pool_type=avg,type_input=f32,ne_input=[10,10,3,1],k0=1,k1=1,s0=1,s1=1,p0=0,p1=0): OK
Backend 2/2: CPU
```
Thank you, I can confirm the tests go through now. Code looks good, too.
… MobileVLM model. (ggml-org#9763)

* ggml: Add POOL2D OP for GPU ACC to the Vulkan.
  - The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
  - A GGML_OP_POOL_2D (pooling) shader has been added.
  - The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
  Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Correct the incorrect order of the parameters. Fix casting to int.
  Signed-off-by: Changyeon Kim <[email protected]>

Signed-off-by: Changyeon Kim <[email protected]>
The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
A GGML_OP_POOL_2D (pooling) shader has been added.
The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
I have read the contributing guidelines
Self-reported review complexity:
test model : MobileVLM V2 1.7B (https://huggingface.co/ZiangWu/MobileVLM_V2-1.7B-GGUF)
Test image : https://raw.githubusercontent.com/neuralmagic/deepsparse/main/tests/deepsparse/pipelines/sample_images/buddy.jpeg
master (CPU):
master (Vulkan):
PR:
test-backend-ops
Full logs: