
CUDA Graph Compute Function Refactor (precursor for performance improvements) #11042


Merged: 11 commits merged into ggml-org:master from akieslinger/refactor_cuda_backend on Jan 13, 2025

Conversation

aendk (Contributor) commented on Jan 2, 2025

Hi All,
I am working on improving llama.cpp's CUDA graph performance on behalf of NVIDIA.
In preliminary testing, we are seeing up to a 3% performance gain from overlapping CPU and GPU work and from improved CPU -> GPU copy scheduling on a high-end system. The changes are likely to be even more impactful on less capable hardware.

To pave the way for these changes (and to keep the diffs readable), I first isolated the cosmetic changes in this PR.
This PR contains no changes to the logic. It merely slims down ggml_backend_cuda_graph_compute() by moving certain loops and other subtasks of the original function into five new functions.

These changes considerably improve the readability and future maintainability of this part of the CUDA backend.
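For orientation, here is a structural sketch of the shape this aims for. maintain_cuda_graph() and the cuda_graph_update_required flag appear in the commit messages further down; every other helper name and all signatures are illustrative assumptions, not the PR's exact code.

```cpp
// Structural sketch only -- helper names besides maintain_cuda_graph()
// and cuda_graph_update_required are assumptions.
static bool check_node_graph_compatibility (ggml_backend_cuda_context * cuda_ctx, ggml_cgraph * cgraph);  // assumed
static bool is_cuda_graph_update_required  (ggml_backend_cuda_context * cuda_ctx, ggml_cgraph * cgraph);  // assumed
static void maintain_cuda_graph            (ggml_backend_cuda_context * cuda_ctx, bool update_required);  // name from commits, signature assumed
static void evaluate_and_capture_cuda_graph(ggml_backend_cuda_context * cuda_ctx, ggml_cgraph * cgraph,
                                            bool use_cuda_graph, bool update_required);                   // assumed

static enum ggml_status ggml_backend_cuda_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {
    ggml_backend_cuda_context * cuda_ctx = (ggml_backend_cuda_context *) backend->context;

    bool use_cuda_graph             = false;
    bool cuda_graph_update_required = false;
#ifdef USE_CUDA_GRAPH
    // 1. Can this graph be executed as a CUDA graph at all?
    use_cuda_graph = check_node_graph_compatibility(cuda_ctx, cgraph);
    if (use_cuda_graph) {
        // 2. Is the previously captured graph stale?
        cuda_graph_update_required = is_cuda_graph_update_required(cuda_ctx, cgraph);
        // 3. Re-capture the graph or patch copy parameters as needed.
        maintain_cuda_graph(cuda_ctx, cuda_graph_update_required);
    }
#endif
    // 4. Either launch the captured CUDA graph or evaluate node by node.
    evaluate_and_capture_cuda_graph(cuda_ctx, cgraph, use_cuda_graph, cuda_graph_update_required);
    return GGML_STATUS_SUCCESS;
}
```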

Should I add prefixes to the new function names, and if so, what do you suggest?

@agray3 @mtavenrath

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jan 2, 2025
aendk marked this pull request as a draft on January 2, 2025 15:47
aendk (Contributor, Author) commented on Jan 2, 2025

FYI: setting Status to Draft whilst I investigate the failed tests.

ggerganov (Member) replied:

> FYI: setting Status to Draft whilst I investigate the failed tests.

I think you just need to make the functions static.
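For context, a minimal sketch of that fix; the helper below and its signature are invented for illustration, but the mechanism is standard C++. A non-static function definition in a .cu file has external linkage, so a build with missing-declaration warnings treated as errors rejects it unless a prior declaration exists; static gives the helper internal linkage and confines it to its translation unit.

```cpp
// Hypothetical helper in ggml-cuda.cu. Without 'static', this definition has
// external linkage and can trigger missing-declaration warnings that CI
// treats as errors; 'static' limits it to this translation unit.
static bool graph_needs_recapture(const ggml_cgraph * cgraph, int prev_n_nodes) {
    // Illustrative check only: re-capture when the node count changed.
    return cgraph->n_nodes != prev_n_nodes;
}
```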

aendk force-pushed the akieslinger/refactor_cuda_backend branch from 3998c0d to 004ec3a on January 7, 2025 08:36
aendk force-pushed the akieslinger/refactor_cuda_backend branch from 004ec3a to 98d4e55 on January 9, 2025 13:15
aendk marked this pull request as ready for review on January 9, 2025 13:52
aendk (Contributor, Author) commented on Jan 9, 2025

OK, third time's a charm. My first attempt lacked the static keyword; my second revealed some issues around missing #ifdefs when CUDA graphs are disabled.

Now this refactor-only PR is ready for review.
Due to the #ifdefs, the overall diff looks quite messy.

I strongly suggest reviewing commit by commit; there it is easier to see that not much has changed.
I moved some code into five new functions and removed a few lines of unused/dead code.
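As an aside on those #ifdef failures, a minimal illustration of the kind of guard involved, assuming the usual ggml-cuda setup where CUDA-graph state only exists when USE_CUDA_GRAPH is defined. maintain_cuda_graph() is named in the commit messages below; the signature and body here are assumptions.

```cpp
// Illustrative only: CUDA-graph bookkeeping compiles solely when CUDA graphs
// are enabled, so any helper touching it needs the same guard, or builds
// with CUDA graphs disabled fail.
#ifdef USE_CUDA_GRAPH
static void maintain_cuda_graph(ggml_backend_cuda_context * cuda_ctx, bool cuda_graph_update_required) {
    // ... update the captured graph or patch copy parameters ...
}
#endif
```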

slaren (Member) left a comment


There are multiple consecutive #ifdef USE_CUDA_GRAPH blocks; consider consolidating them into a single one.
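To illustrate the request (a sketch with stand-in declarations, not the PR's actual code):

```cpp
// Before: adjacent blocks each carry their own guard.
#ifdef USE_CUDA_GRAPH
bool use_cuda_graph = true;
#endif
#ifdef USE_CUDA_GRAPH
bool cuda_graph_update_required = false;
#endif

// After: one consolidated guard covers the contiguous region.
#ifdef USE_CUDA_GRAPH
bool use_cuda_graph             = true;
bool cuda_graph_update_required = false;
#endif
```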

aendk (Contributor, Author) commented on Jan 13, 2025

I implemented your requested changes. The single test failure looks like a CI/runner issue.
Feel free to re-review or re-run the failed test.

slaren merged commit 39509fb into ggml-org:master on Jan 13, 2025
47 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
CUDA Graph Compute Function Refactor (precursor for performance improvements) (ggml-org#11042)

* Refactor: Moves cuda graph executable update step to separate function.

* Refactor: Moves cuda graph update check to separate function.

* Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability.

* Fix: Adds missing reference to maintain_cuda_graph() definition.

* Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function.

* Refactor: Moves node graph checks and copy ops into individual function for improved readability.

* Refactor: Removes code permanently excluded from compilation to increase readability.

* Style: Adds missing newline

* Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one

* Refactor: Makes 'cuda_graph_update_required' a local variable

* remove double lines between functions

---------

Co-authored-by: slaren <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025