opencl: fix for small models #11950

lhez · 2025-02-18T23:25:28Z

Currently, small models such as qwen2.5 0.5B do not work properly with the OpenCL backend. This PR fixes that issue and also changes the subgroup size to 64 for all Adreno GPUs.

* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <[email protected]>
Co-authored-by: Skyler Szot <[email protected]>

shawngu-quic and others added 4 commits February 13, 2025 22:16

opencl: fix small shape gemv, remove unused extensions

b0a765c

opencl: fix transpose_16, dump_tensor, enforce subgroup size

097f869

opencl: fix for token length < 4

97151f4

opencl: use wave size of 64 for all Adreno GPUs

d55ea5e

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) label Feb 18, 2025

lhez marked this pull request as ready for review February 19, 2025 06:57

max-krasnyansky approved these changes Feb 24, 2025

max-krasnyansky merged commit 34a846b into ggml-org:master Feb 24, 2025
46 checks passed

arthw mentioned this pull request Feb 26, 2025

Cherry pick 20250224 arthw/llama.cpp#7

Merged
