Commit 3f9da22

agray3 and slaren authored
Simplify and improve CUDA graphs through use of indirect copy pointers (#9017)
* CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers

Previously there was complexity in the CUDA graphs implementation due to frequently changing parameters to copy kernels associated with K and V cache pointers. This patch simplifies the implementation by using indirection so that these parameters no longer change frequently, avoiding the need for frequent graph updates.

Fixes #12152

* Addressed comments
* fix HIP builds
* properly sync to stream
* removed ggml_cuda_cpy_fn_ptrs
* move stream sync before free
* guard to only use indirection with graphs
* style fixes
* check for errors

---------

Co-authored-by: slaren <[email protected]>
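
The indirection described in the commit message can be illustrated with a small host-only sketch. This is not the code from the patch: `CapturedCopy` and `replay_twice_with_retargeting` are hypothetical names, and a plain `memcpy` stands in for the copy kernel. The point is only that a node whose parameters are frozen at capture time can still write to a new destination if it reads that destination from a pointer table at execution time.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a captured copy-kernel launch. Its parameters are
// frozen once, the way kernel arguments are frozen inside a captured graph.
struct CapturedCopy {
    const char *   src;        // source bytes
    std::size_t    nbytes;     // number of bytes to copy
    char * const * dest_table; // address of the pointer table (never changes)
    int            slot;       // which table entry this copy reads

    // The destination is looked up when the node runs, not when it is captured.
    void replay() const { std::memcpy(dest_table[slot], src, nbytes); }
};

// Replays the same captured node twice, retargeting it through the table only.
std::string replay_twice_with_retargeting() {
    char src[4]   = "abc";
    char buf_a[4] = "---";
    char buf_b[4] = "---";
    char * table[1] = { buf_a };

    const CapturedCopy node = { src, 3, table, 0 };
    node.replay();              // writes "abc" into buf_a

    std::memcpy(src, "xyz", 3); // new data to copy
    table[0] = buf_b;           // only the table entry changes; node is untouched
    node.replay();              // writes "xyz" into buf_b

    return std::string(buf_a) + "|" + std::string(buf_b);
}
```

Because `node` is `const` and never modified between the two replays, nothing that was "captured" needs updating; only the table contents change.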
1 parent 2a0dc97 commit 3f9da22

4 files changed: +122 -121 lines changed

ggml/src/ggml-cuda/common.cuh

Lines changed: 7 additions & 1 deletion
@@ -729,7 +729,13 @@ struct ggml_cuda_graph {
     bool disable_due_to_failed_graph_capture = false;
     int number_consecutive_updates = 0;
     std::vector<ggml_graph_node_properties> ggml_graph_properties;
-    std::vector<char **> updated_kernel_arg;
+    bool use_cpy_indirection = false;
+    std::vector<char *> cpy_dest_ptrs;
+    char ** dest_ptrs_d;
+    int dest_ptrs_size = 0;
+    // Index to allow each cpy kernel to be aware of it's position within the graph
+    // relative to other cpy nodes.
+    int graph_cpynode_index = -1;
 #endif
 };

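One plausible way the new fields fit together is sketched below, host-only so it runs without a GPU. The field names mirror the diff, but everything else is an assumption for illustration: `CopyIndirectionState`, `upload_dest_ptrs`, `next_dest`, and the `reallocations` counter are hypothetical, and a `std::vector` stands in for the device-side table `dest_ptrs_d`. The other three files in this commit are not shown here, so the real logic may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirror of the fields added to ggml_cuda_graph.
struct CopyIndirectionState {
    std::vector<char *> cpy_dest_ptrs;     // destinations collected for this evaluation
    std::vector<char *> dest_ptrs_d;       // stand-in for the device-side pointer table
    int dest_ptrs_size      = 0;           // current capacity of the table
    int graph_cpynode_index = -1;          // slot the next copy node will read
    int reallocations       = 0;           // illustrative counter, not in the patch
};

// Publish the collected destinations. The table is reallocated only when it
// has to grow (assumed policy); otherwise its storage is reused.
void upload_dest_ptrs(CopyIndirectionState & s) {
    const int n = static_cast<int>(s.cpy_dest_ptrs.size());
    if (n > s.dest_ptrs_size) {
        s.dest_ptrs_d.assign(static_cast<std::size_t>(n), nullptr);
        s.dest_ptrs_size = n;
        s.reallocations++;
    }
    std::copy(s.cpy_dest_ptrs.begin(), s.cpy_dest_ptrs.end(), s.dest_ptrs_d.begin());
    s.graph_cpynode_index = 0; // the first copy node reads slot 0
}

// Each copy node takes the next slot, so it knows its position relative to
// the other copy nodes in the graph.
char * next_dest(CopyIndirectionState & s) {
    return s.dest_ptrs_d[static_cast<std::size_t>(s.graph_cpynode_index++)];
}
```

In this sketch, a second evaluation with the same number of copy nodes but different destinations reuses the table without reallocating, which is the kind of stability the commit message attributes to the indirection.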