metal: concurrently dispatch commands #2358
Conversation
When `ggml_metal_graph_compute` is called for the first time, `ggml_metal_graph_find_concurrency` runs and writes the commands that can be issued concurrently into the Metal context's `concur_list` array.
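For illustration, here is a minimal, self-contained sketch of how a compute loop could consume such a list. The struct fields, the fixed array size, and the use of `-1` as a separator between concurrency levels are assumptions made for this example, not necessarily the exact layout the PR uses:

```c
#include <stdio.h>

// Hypothetical, simplified context; the real struct lives in ggml-metal.m
// and differs in detail.
struct metal_ctx_sketch {
    int concur_list[64]; // node indices; -1 assumed here to separate levels
    int concur_list_len;
};

// Nodes between two -1 separators are assumed to have no data dependencies,
// so a real backend could encode them into one concurrently-executing
// command encoder and only synchronize at each separator.
static void dispatch_concurrently(const struct metal_ctx_sketch * ctx) {
    for (int i = 0; i < ctx->concur_list_len; i++) {
        if (ctx->concur_list[i] == -1) {
            printf("-- barrier: wait for previous level --\n");
            continue;
        }
        printf("encode node %d (may run concurrently within its level)\n",
               ctx->concur_list[i]);
    }
}

int main(void) {
    struct metal_ctx_sketch ctx = {
        .concur_list = { 0, 1, -1, 2, -1 }, // two independent nodes, then one
        .concur_list_len = 5,
    };
    dispatch_concurrently(&ctx);
    return 0;
}
```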
ggml-metal.m
Outdated
if (!ctx->concur_list_len) {
    // lazily build the concurrency list the first time a graph is computed
    ggml_metal_graph_find_concurrency(ctx, gf);
}
Seems like this will break when computing graphs of different topologies.
I think for now we can assume that the graph topology won't change during the lifetime of the Metal `ctx`. In the future, when we need to change the graph topology and also have a mechanism to tell the backend that it has changed, we can easily add the code needed to handle the updated topology.
For llama that's true, but there are other users of ggml.
Maybe instead of letting ggml-metal.m automatically call `ggml_metal_graph_find_concurrency`, we let llama.cpp decide whether to call `ggml_metal_graph_find_concurrency` and set `metal_ctx->concur_list`?

The logic in ggml-metal.m will then be less intrusive: if `metal_ctx->concur_list` is set, dispatch ops concurrently; otherwise fall back to the original code path.

This adds backend-specific code to llama.cpp for now, but I imagine that in the future we can have a `ggml_backend_graph_optimize()` interface to unify them.
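As a rough, compilable sketch of that split (the type and function names are illustrative stand-ins for the real ggml/llama.cpp API, not the actual implementation), the caller opts in once and the backend merely branches on whether the list was populated:

```c
#include <stdio.h>

// Hypothetical, simplified stand-ins for the real types and functions.
struct metal_ctx {
    int concur_list[64];
    int concur_list_len; // 0 = caller did not request concurrent dispatch
};

// Stand-in for ggml_metal_graph_find_concurrency(ctx, gf), which in the
// proposal above would be called from llama.cpp, not from the backend.
static void find_concurrency(struct metal_ctx * ctx) {
    ctx->concur_list[0] = 0;
    ctx->concur_list[1] = 1;
    ctx->concur_list_len = 2;
}

// Backend side: the only added Metal logic is a branch on concur_list.
static void graph_compute(struct metal_ctx * ctx, int n_nodes) {
    if (ctx->concur_list_len > 0) {
        for (int i = 0; i < ctx->concur_list_len; i++) {
            printf("concurrent dispatch of node %d\n", ctx->concur_list[i]);
        }
    } else {
        for (int i = 0; i < n_nodes; i++) {
            printf("sequential dispatch of node %d\n", i); // original path
        }
    }
}

int main(void) {
    struct metal_ctx ctx = { .concur_list_len = 0 };
    find_concurrency(&ctx); // llama.cpp decides to opt in
    graph_compute(&ctx, 2);
    return 0;
}
```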
Sounds good to me. Eventually we will need a better solution, but for now that should do. @ggerganov
Fixed.
On M1 Pro with Q4_0 7B the times are:
- master: 28.5 ms/t
- PR: 28.2 ms/t
See #2309 for the discussion.
Tested on M1 Max 32c GPU with:
./main -m model_file -n 256 -c 512 -s 123 -p "I believe the meaning of life is" --ignore-eos -ngl 1 --no-mmap -t 8
Btw, since we already have much faster kernels for both `Q_K` and non-`Q_K` quantizations, is it a good time to do prompt evaluation on Metal? Although the current kernels are optimized for mat-vec multiplications, they still provide much better performance than the CPU. (EDIT: not true for M1 Pro)