
CUDA: revert part of the RDNA1 optimizations #8309


Merged

Conversation

@daniandtheweb (Contributor) commented Jul 4, 2024

The change to the launch_bounds was causing a small performance drop in prompt processing; apparently that change was only beneficial before I tuned the mmq_y values.

| model | size | params | backend | ngl | test | t/s master | t/s PR | Speedup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q5_K - Small | 5.21 GiB | 8.03 B | ROCm | 99 | pp512 | 276.60 ± 0.41 | 300.60 ± 0.46 | 1.09 |
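For context, `__launch_bounds__` tells the CUDA/HIP compiler the maximum threads per block (and optionally a minimum number of resident blocks per SM), which constrains how many registers each thread may use. A minimal sketch of the kind of attribute being reverted here; the kernel name, tile macros, and values below are illustrative, not llama.cpp's actual configuration:

```cuda
// Illustrative tile-size macros (stand-ins for the tuned mmq_y values
// mentioned above; not the real llama.cpp constants).
#define MMQ_Y   128   // hypothetical tile height
#define NWARPS  8     // hypothetical warps per block
#define WARP_SIZE 32

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor):
// an overly tight bound can force register spills, while a looser one
// lets the compiler allocate more registers per thread. Which is faster
// depends on the tile sizes, which is why reverting it helped here.
__global__ void __launch_bounds__(WARP_SIZE * NWARPS, 2)
mul_mat_q_sketch(const void *x, const void *y, float *dst,
                 int ncols_x, int nrows_x, int ncols_y) {
    // ... tiled quantized matrix multiplication body ...
}
```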

The change to the launch_bounds was causing a small performance drop in prompt processing of about 25 t/s.
@github-actions github-actions bot added the Nvidia GPU Issues specific to Nvidia GPUs label Jul 4, 2024
@JohannesGaessler JohannesGaessler merged commit 0a42380 into ggml-org:master Jul 5, 2024
49 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024
The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s