
POC: combined scale + diagonal mask infinity + soft max op #3121

Closed
wants to merge 1 commit into from

ikawrakow
Contributor

TL;DR: This PR is a POC that fuses scale + diagonal mask infinity + soft max, the three operations applied to K * Q in the attention part of the network, into a single op. The goal is to raise the question of whether we want to start fusing operations.
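Written out for one row $i$ of $X = KQ$, with scale factor $s$ and $n_{\text{past}}$ cached tokens, the combined op computes the following. This is a sketch that assumes ggml's diag_mask_inf convention, where column $j$ is masked when $j > n_{\text{past}} + i$:

$$
y_{ij} =
\begin{cases}
\dfrac{\exp\left(s\,x_{ij} - m_i\right)}{\sum_{k \le n_{\text{past}} + i} \exp\left(s\,x_{ik} - m_i\right)} & j \le n_{\text{past}} + i \\[2ex]
0 & j > n_{\text{past}} + i
\end{cases}
\qquad
m_i = \max_{k \le n_{\text{past}} + i} s\,x_{ik}
$$

Masked entries become $-\infty$ before the soft max, so they contribute $\exp(-\infty) = 0$ to the denominator and come out as exactly zero.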

It is a POC, so for now there is only a simple Metal implementation (i.e., none of the optimizations to these operations from #3084 are used). Despite this, fusing these 3 operations gives a small but measurable performance gain (see table). If the consensus is that this is worthwhile, I can add implementations for the other backends and see whether the kernel(s) for the combined scale + diagonal mask infinity + soft max operation can be optimized.

| model | backend | test | t/s (Master) | t/s (PR) | Speedup |
|---|---|---|---:|---:|---:|
| Falcon 7B mostly F16 | Metal | pp 512 | 379.71 ± 0.23 | 386.22 ± 0.43 | 1.017 |
| LLaMA 7B mostly F16 | Metal | pp 512 | 539.25 ± 0.30 | 543.57 ± 0.35 | 1.008 |
| LLaMA 7B mostly Q4_0 | Metal | pp 512 | 495.43 ± 0.59 | 499.20 ± 0.35 | 1.008 |
| Falcon 7B mostly F16 | Metal | tg 128 | 23.26 ± 0.01 | 23.48 ± 0.05 | 1.009 |
| LLaMA 7B mostly F16 | Metal | tg 128 | 24.14 ± 0.04 | 24.34 ± 0.10 | 1.008 |
| LLaMA 7B mostly Q4_0 | Metal | tg 128 | 62.82 ± 0.01 | 63.91 ± 0.04 | 1.017 |

ggerganov added the demo label (Demonstrate some concept or idea, not intended to be merged) on Sep 14, 2023
@ikawrakow
Contributor Author

Given the complete lack of interest, I'm closing this one.

ikawrakow closed this on Sep 14, 2023
ikawrakow deleted the ik/combined_attn_ops branch on September 14, 2023 at 18:11
3 participants