Vulkan: add OP sigmoid #12056

Merged
merged 1 commit into from
Feb 25, 2025

@foldl (Contributor) commented Feb 25, 2025

OP_SIGMOID is used by DeepSeek-V3 and Moonlight. Implementing this operator makes these models run faster on the Vulkan backend.

Throughput (tokens/s) comparison on Moonlight Q8_0 with an RTX 2080 Ti, generating ~200 tokens:

Before: 34.83 t/s
After: 72.10 t/s
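
The speedup implied by these two measurements works out to roughly 2×:

$$
\text{speedup} = \frac{72.10\ \text{t/s}}{34.83\ \text{t/s}} \approx 2.07
$$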

@github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Feb 25, 2025
@jeffbolznv (Collaborator) commented:

LGTM. I verified the backend tests passed on RTX 4070.

Can you point me to the exact model you used for perf testing? I'd like to try it out.

@foldl (Contributor, Author) commented Feb 25, 2025

Moonlight from Moonshot, a lightweight version of DeepSeek-V3.

https://huggingface.co/moonshotai/Moonlight-16B-A3B

@jeffbolznv (Collaborator) commented:

Thanks. I tried Moonlight-16B-A3B-Instruct-Q4_K_M.gguf and see an improvement from 104 to 144 t/s with this change.
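
For comparison with the Q8_0 numbers in the PR description, this Q4_K_M result corresponds to about a 38% throughput gain (the two measurements use different quantizations, so the ratios are not directly comparable):

$$
\text{speedup} = \frac{144\ \text{t/s}}{104\ \text{t/s}} \approx 1.38
$$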

@0cc4m (Collaborator) left a comment

LGTM

@0cc4m merged commit c132239 into ggml-org:master on Feb 25, 2025
47 checks passed
@foldl deleted the vulkan_add_sigmoid branch on February 25, 2025 12:03
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
mostlyuseful pushed a commit to mostlyuseful/llama.cpp that referenced this pull request May 12, 2025