
Allow multiple copy function pointers for CUDA graph kernel updates #7565


Merged

Conversation

@agray3 (Contributor) commented May 27, 2024

CUDA graphs require parameter updates to kernels associated with GGML_OP_CPY nodes. Previously, the implementation checked for only a single CUDA copy kernel in such nodes, which caused a bug when a graph contained two such kernels. This PR fixes the issue by storing the function pointers in a vector, so that any number of copy kernels can be checked against.

Fixes #7492
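
For readers unfamiliar with the mechanism, here is a minimal C++/CUDA-runtime sketch of the pattern described above. The names (`cpy_fn_ptrs`, `update_cpy_kernel_params`) and the kernel-argument layout are illustrative assumptions, not the actual llama.cpp code:

```cpp
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical: the set of copy-kernel function pointers recorded while
// building the graph. Storing a single pointer (the old approach) misses
// graphs that contain more than one distinct copy kernel.
static std::vector<void *> cpy_fn_ptrs;

// Patch the destination-pointer argument of every copy kernel in an
// instantiated graph, without re-capturing the graph. The argument index
// used below is an assumption for illustration.
static void update_cpy_kernel_params(cudaGraphExec_t graph_exec,
                                     cudaGraph_t graph,
                                     void ** new_dest_ptr) {
    size_t num_nodes = 0;
    cudaGraphGetNodes(graph, nullptr, &num_nodes);       // query node count
    std::vector<cudaGraphNode_t> nodes(num_nodes);
    cudaGraphGetNodes(graph, nodes.data(), &num_nodes);  // fetch the nodes

    for (cudaGraphNode_t node : nodes) {
        cudaGraphNodeType type;
        cudaGraphNodeGetType(node, &type);
        if (type != cudaGraphNodeTypeKernel) {
            continue;
        }

        cudaKernelNodeParams params;
        if (cudaGraphKernelNodeGetParams(node, &params) != cudaSuccess) {
            continue;  // params of some kernel nodes cannot be queried
        }

        // Check against *all* recorded copy-kernel pointers, not just one.
        if (std::find(cpy_fn_ptrs.begin(), cpy_fn_ptrs.end(),
                      params.func) == cpy_fn_ptrs.end()) {
            continue;
        }

        // Replace the destination argument and push the update into the
        // executable graph (parameter values are copied by this call).
        params.kernelParams[1] = new_dest_ptr;  // index 1: assumed dst arg
        cudaGraphExecKernelNodeSetParams(graph_exec, node, &params);
    }
}
```

During graph capture, each distinct copy-kernel pointer would be appended to `cpy_fn_ptrs` once; the vector then lets the update loop recognize every copy kernel in the graph instead of only the single one previously stored.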

Commit: Allow multiple copy function pointers for CUDA graph kernel updates

Fixes ggml-org#7492
@agray3 (Contributor, Author) commented May 27, 2024

@JohannesGaessler Can you check whether this works for #7527?


📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 529 iterations 🚀

Details (for performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8873.23ms p(95)=21807.9ms fails=, finish reason: stop=476 truncated=53
  • Prompt processing (pp): avg=105.19tk/s p(95)=468.51tk/s
  • Token generation (tg): avg=58.91tk/s p(95)=46.72tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=ag_allow_multiple_cuda_cpy_fn_ptrs commit=21826514dfac9237a32cad6d1f2312298800ebf9

[chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]
[chart: llamacpp:predicted_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]

[chart: llamacpp:kv_cache_usage_ratio — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]
[chart: llamacpp:requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 529 iterations]

@JohannesGaessler (Collaborator) left a comment


I can confirm that this fixes the issue both on master and for my PR.

@JohannesGaessler JohannesGaessler merged commit 197c006 into ggml-org:master May 27, 2024
71 checks passed
Development

Successfully merging this pull request may close these issues.

CUDA graphs break quantized K cache
2 participants