-
Notifications
You must be signed in to change notification settings - Fork 12.2k
CUDA Graph Compute Function Refactor (precursor for performance improvements) #11042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA Graph Compute Function Refactor (precursor for performance improvements) #11042
Conversation
FYI: setting Status to Draft whilst I investigate the failed tests. |
I think you just need to make the functions |
3998c0d
to
004ec3a
Compare
…meters) to separate function for improved readability.
…aluation and capture to its own function.
…on for improved readability.
004ec3a
to
98d4e55
Compare
Ok, third time's a charm. My first try lacked the I strongly suggest to review commit by commit. Its easier to see there that not much has happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple consecutive #ifdef USE_CUDA_GRAPH
, consider consolidating into a single one.
I implemented your requested changes. The single test failure looks like a CI / runner issue. |
…e improvements) (ggml-org#11042) * Refactor: Moves cuda graph executable update step to separate function. * Refactor: Moves cuda graph update check to separate function. * Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. * Fix: Adds missing reference to maintain_cuda_graph() definition. * Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. * Refactor: Moves node graph checks and copy ops into individual function for improved readability. * Refactor: Removes code permanently excluded from compilation to increase readability. * Style: Adds missing newline * Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one * Refactor: Makes 'cuda_graph_update_required' a local variable * remove double lines between functions --------- Co-authored-by: slaren <[email protected]>
…e improvements) (ggml-org#11042) * Refactor: Moves cuda graph executable update step to separate function. * Refactor: Moves cuda graph update check to separate function. * Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. * Fix: Adds missing reference to maintain_cuda_graph() definition. * Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. * Refactor: Moves node graph checks and copy ops into individual function for improved readability. * Refactor: Removes code permanently excluded from compilation to increase readability. * Style: Adds missing newline * Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one * Refactor: Makes 'cuda_graph_update_required' a local variable * remove double lines between functions --------- Co-authored-by: slaren <[email protected]>
…e improvements) (ggml-org#11042) * Refactor: Moves cuda graph executable update step to separate function. * Refactor: Moves cuda graph update check to separate function. * Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability. * Fix: Adds missing reference to maintain_cuda_graph() definition. * Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function. * Refactor: Moves node graph checks and copy ops into individual function for improved readability. * Refactor: Removes code permanently excluded from compilation to increase readability. * Style: Adds missing newline * Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one * Refactor: Makes 'cuda_graph_update_required' a local variable * remove double lines between functions --------- Co-authored-by: slaren <[email protected]>
Hi All,
I am working on improving llama.cpp's CUDA graph performance on behalf of NVIDIA.
In preliminary testing, we are seeing up to 3% of performance gain by overlapping CPU and GPU work, and by improving CPU -> GPU copy scheduling on a high end system. The changes are likely to be even more impactful on less capable hardware.
To pave the way for these changes (and to provide readable diffs), I first isolated the cosmetic changes in this PR.
This PR does not contain any changes in the logic. It merely slims down the
ggml_backend_cuda_graph_compute()
by moving certain loops and other subtasks of the original function into 5 new functions.These changes considerably improve the readability and future maintainability of this part of the CUDA backend.
Should I add prefixes to the new function names, and if so, what do you suggest?
@agray3 @mtavenrath