CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 #12222

Merged

JohannesGaessler
Collaborator

See #12187.

As far as I can tell, the problem is the explicit check against CC 7.0. However, if the highest available PTX on a GPU with CC >= 7.5 is 7.0, the WMMA implementation should be used there as well.
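To illustrate the idea, here is a minimal host-side sketch of selecting the kernel from the highest compiled architecture the device can run, rather than from an exact match on the device's own CC. All names (`highest_compiled_arch`, `select_fa_kernel`, `FaKernel`, the `CC_*` constants) are hypothetical and are not taken from the llama.cpp source; compute capabilities are assumed to be encoded as major*100 + minor*10, and "Newer" stands in for whatever path is used when code for CC >= 7.5 was actually compiled.

```cpp
#include <vector>

// Hypothetical constants: compute capability encoded as major*100 + minor*10.
constexpr int CC_VOLTA  = 700;  // CC 7.0
constexpr int CC_TURING = 750;  // CC 7.5

// Hypothetical kernel families; only WMMA is named in the discussion.
enum class FaKernel { Fallback, Wmma, Newer };

// Highest architecture in compiled_archs that the device can run
// (i.e. <= device_cc), or 0 if there is none.
int highest_compiled_arch(int device_cc, const std::vector<int>& compiled_archs) {
    int best = 0;
    for (int arch : compiled_archs) {
        if (arch <= device_cc && arch > best) {
            best = arch;
        }
    }
    return best;
}

// Choose the kernel from the code that was actually compiled,
// not from an exact comparison of device_cc against 7.0.
FaKernel select_fa_kernel(int device_cc, const std::vector<int>& compiled_archs) {
    const int arch = highest_compiled_arch(device_cc, compiled_archs);
    if (arch >= CC_TURING) {
        return FaKernel::Newer;
    }
    if (arch >= CC_VOLTA) {
        return FaKernel::Wmma;  // also covers CC >= 7.5 devices running 7.0 code
    }
    return FaKernel::Fallback;
}
```

With this sketch, a CC 7.5 device and a build that only contains architectures up to 7.0 ends up on the WMMA path, which is the case described above.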

@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Mar 6, 2025
@JohannesGaessler JohannesGaessler merged commit 5220a16 into ggml-org:master Mar 6, 2025
47 checks passed
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
@LostRuins
Collaborator

Is there any logic to when the __CUDA_ARCH_LIST__ macro will be defined? I've noticed that it exists in some places and not in others, but I don't see it explicitly defined anywhere in the llama.cpp repo.

@JohannesGaessler
Collaborator Author

It's an NVCC macro. It's implicitly defined when building with CMake, e.g. via set(CMAKE_CUDA_ARCHITECTURES "50;61;70;75;80").
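For anyone who wants to check what the macro contains in a given translation unit, here is a small sketch. It relies on NVCC defining `__CUDA_ARCH_LIST__` as a comma-separated list of `__CUDA_ARCH__`-style values (e.g. `500,610,700,750,800` for the CMake setting above) and falls back to an empty list when the macro is absent, such as when the file is built by a plain host compiler. The function name `compiled_cuda_archs` is hypothetical.

```cpp
#include <vector>

// Returns the architectures the compiler was asked to build for.
// If __CUDA_ARCH_LIST__ is not defined (host-only compiler, or a
// compiler that does not provide the macro), the list is empty.
std::vector<int> compiled_cuda_archs() {
#ifdef __CUDA_ARCH_LIST__
    // The macro expands to a bare comma-separated list,
    // so it can be used directly as a braced initializer.
    return { __CUDA_ARCH_LIST__ };
#else
    return {};
#endif
}
```

Printing the returned list from a file compiled by NVCC and from one compiled by the host compiler is a quick way to see why the macro appears in some places and not in others.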

@LostRuins
Collaborator

I see. Do you know if it will be populated if I build with the makefile using something like -arch=all-major?

@JohannesGaessler
Collaborator Author

-arch=all-major should produce correct results, but the performance will be bad. If you need to define an arbitrary list of compute capabilities, you should use CMake.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025