CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 #12222

JohannesGaessler · 2025-03-06T12:21:07Z

As far as I can tell, the problem is the explicit check against CC 7.0. If the highest available PTX on a GPU with CC >= 7.5 is 7.0, the WMMA implementation should be used as well.
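To make the distinction concrete, here is a minimal sketch of the two checks. All names in it (`CC_VOLTA`, `CC_TURING`, `highest_compiled_arch`, `use_wmma_exact_check`, `use_wmma_effective_arch`) are hypothetical and chosen for illustration; this is not the code changed in this PR, only one way the described logic could look, assuming compute capabilities are encoded as major*100 + minor*10.

```cpp
#include <vector>

// Hypothetical constants: compute capabilities encoded as major*100 + minor*10.
constexpr int CC_VOLTA  = 700;
constexpr int CC_TURING = 750;

// Hypothetical helper: the highest compiled architecture the device can run,
// i.e. the largest entry that does not exceed the device's compute capability.
// Returns 0 if no compiled architecture is usable.
int highest_compiled_arch(int device_cc, const std::vector<int> & compiled) {
    int best = 0;
    for (int arch : compiled) {
        if (arch <= device_cc && arch > best) {
            best = arch;
        }
    }
    return best;
}

// The check described as the problem: only an exact CC 7.0 device selects WMMA.
bool use_wmma_exact_check(int device_cc) {
    return device_cc == CC_VOLTA;
}

// Sketch of the intended behavior: decide based on the code that will actually run.
// WMMA is selected when that code is Volta-level (at least 7.0 but below 7.5).
bool use_wmma_effective_arch(int device_cc, const std::vector<int> & compiled) {
    const int arch = highest_compiled_arch(device_cc, compiled);
    return arch >= CC_VOLTA && arch < CC_TURING;
}
```

With a binary compiled for `{500, 610, 700}`, a CC 7.5 device fails the exact check but passes the effective-architecture check, which is the case this PR is about.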

LostRuins · 2025-03-10T02:24:49Z

Is there any logic to when the __CUDA_ARCH_LIST__ macro is defined? I've noticed that it exists in some places and not in others, but I don't see it explicitly defined anywhere in the llama.cpp repo code.

JohannesGaessler · 2025-03-10T07:31:56Z

It's an NVCC macro; it's implicitly defined by CMake via e.g. set(CMAKE_CUDA_ARCHITECTURES "50;61;70;75;80").
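As a rough illustration of what that looks like from the code's side: for the architecture list above, NVCC should define `__CUDA_ARCH_LIST__` as the comma-separated values `500,610,700,750,800`. The sketch below turns the macro into a list that can be queried. `EXAMPLE_ARCH_LIST`, `compiled_archs`, and `compiled_for` are hypothetical names, and the fallback values are an assumption that only exists so the snippet also builds with a plain host compiler, where the macro is not defined.

```cpp
#include <algorithm>
#include <vector>

#ifdef __CUDA_ARCH_LIST__
// Defined by NVCC from the architectures passed on its command line.
#define EXAMPLE_ARCH_LIST __CUDA_ARCH_LIST__
#else
// Assumption for illustration only: mimic the list from the CMake example.
#define EXAMPLE_ARCH_LIST 500,610,700,750,800
#endif

// Hypothetical helper: the architectures this translation unit was compiled for.
std::vector<int> compiled_archs() {
    return {EXAMPLE_ARCH_LIST};
}

// Hypothetical helper: true if the given architecture is in the compiled list.
bool compiled_for(int arch) {
    const std::vector<int> archs = compiled_archs();
    return std::find(archs.begin(), archs.end(), arch) != archs.end();
}
```

Because the macro only exists when NVCC compiles the file, code that is also built by a host compiler needs a guard such as the `#ifdef` above.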

LostRuins · 2025-03-10T13:24:19Z

I see. Do you know if it will be populated if I build with the makefile using something like -arch=all-major?

JohannesGaessler · 2025-03-12T07:59:50Z

-arch=all-major should produce correct results, but the performance will be bad. If you need to define an arbitrary list of compute capabilities, you should use CMake.

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5

0f6242b

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 6, 2025

JohannesGaessler mentioned this pull request Mar 6, 2025

CUDA: Fix new mma detection for Turing cards with Volta PTX #12187

Closed

slaren approved these changes Mar 6, 2025

JohannesGaessler merged commit 5220a16 into ggml-org:master Mar 6, 2025
47 checks passed

mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (ggml-org#12222)

a39f529

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (ggml-org#12222)

288caf8
