Support broadcast add & mul on CUDA (fixed) #2192

li-plus · 2023-07-12T10:25:26Z

Should fix #2191. The previous version missed the ne11 term for ky. That is now fixed, and generation is correct.

JohannesGaessler · 2023-07-13T17:11:07Z

ggml-cuda.cu

-        // compute
-        mul_f32_cuda(src0_ddf_i01, src1_ddf_i01, dst_ddf_i01, ne00, ne10, cudaStream_main);
-    }
+    mul_f32_cuda(src0_ddf_i, src1_ddf_i, dst_ddf_i, ne00*i01_diff, ne10*ne11, cudaStream_main);


The broadcasting logic here is still incorrect. The implementation on master broadcasts the values per row, whereas this one broadcasts them after flattening both tensors. As long as ne11 == 1 this makes no difference, but I don't think this is the implementation that we should be using.

li-plus · 2023-07-13

I noticed that src0 and src1 are guaranteed to have the same number of columns by the ggml_can_repeat_rows check, so broadcasting after flattening the 2D sub-blocks should be the same as broadcasting every row. Did I miss something?

https://github.com/ggerganov/llama.cpp/blob/32c54116318929c90fd7ae814cf9b5232cd44c36/ggml.c#L5228-L5235

JohannesGaessler · 2023-07-13

Okay, according to the commit history the following seems to have happened: I implemented broadcasting for multiplication both in dimension 0 and dimension 1. @ggerganov then added the additional requirement that dimension 0 must be equal via ggml_can_repeat_rows, which effectively limits broadcasting to dimension 1. If that is indeed the specification to which broadcasting should be implemented, then the broadcasting logic in this PR is correct.
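
The equivalence discussed above can be checked with a small host-side sketch. The two functions below are hypothetical reference loops in plain C++, not the CUDA kernels from ggml-cuda.cu. The names `mul_per_row` and `mul_flattened` and the row-major layout (ne00 and ne10 being the row lengths) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-row variant (illustrative): src1 is repeated along dimension 1 (rows)
// and, within each row, along dimension 0 (columns).
// src0 is ne00 x ne01 and src1 is ne10 x ne11, both stored row-major.
std::vector<float> mul_per_row(const std::vector<float> & src0, const std::vector<float> & src1,
                               size_t ne00, size_t ne01, size_t ne10, size_t ne11) {
    assert(ne10 > 0 && ne11 > 0);
    assert(src0.size() == ne00*ne01 && src1.size() == ne10*ne11);
    std::vector<float> dst(src0.size());
    for (size_t i1 = 0; i1 < ne01; ++i1) {
        const size_t i11 = i1 % ne11;     // src1 row used for this src0 row
        for (size_t i0 = 0; i0 < ne00; ++i0) {
            const size_t i10 = i0 % ne10; // src1 column used for this src0 column
            dst[i1*ne00 + i0] = src0[i1*ne00 + i0] * src1[i11*ne10 + i10];
        }
    }
    return dst;
}

// Flattened variant (illustrative): both tensors are treated as 1D arrays
// and the src1 index wraps around after ne10*ne11 elements.
std::vector<float> mul_flattened(const std::vector<float> & src0, const std::vector<float> & src1) {
    assert(!src1.empty());
    std::vector<float> dst(src0.size());
    for (size_t i = 0; i < src0.size(); ++i) {
        dst[i] = src0[i] * src1[i % src1.size()];
    }
    return dst;
}
```

With ne10 == ne00 the flat index i = i1*ne00 + i0 satisfies i % (ne10*ne11) == (i1 % ne11)*ne10 + i0, so both functions return the same result. With ne10 == 1 and ne11 > 1 the results generally differ, which is the case the ggml_can_repeat_rows check rules out.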

ggerganov · 2023-07-14

For now let's broadcast only over dimension 1. Later we'll fix the TODOs and support dimension 0 broadcasts.
That would also let us make ggml_scale() obsolete.
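
To illustrate why a dimension 0 broadcast would make a separate scale operation redundant, here is a minimal host-side sketch. It assumes a hypothetical `mul_bcast` helper that repeats src1 along both dimensions and a plain `scale` reference; neither is the ggml API, and the row-major layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise multiply that repeats src1 along dimension 0
// (columns) and dimension 1 (rows). src0 is ne00 x ne01, src1 is ne10 x ne11.
std::vector<float> mul_bcast(const std::vector<float> & src0, size_t ne00, size_t ne01,
                             const std::vector<float> & src1, size_t ne10, size_t ne11) {
    assert(ne10 > 0 && ne11 > 0);
    assert(src0.size() == ne00*ne01 && src1.size() == ne10*ne11);
    std::vector<float> dst(src0.size());
    for (size_t i1 = 0; i1 < ne01; ++i1) {
        for (size_t i0 = 0; i0 < ne00; ++i0) {
            dst[i1*ne00 + i0] = src0[i1*ne00 + i0] * src1[(i1 % ne11)*ne10 + (i0 % ne10)];
        }
    }
    return dst;
}

// Plain scaling by a scalar, used only as a reference for comparison.
std::vector<float> scale(const std::vector<float> & src0, float s) {
    std::vector<float> dst(src0.size());
    for (size_t i = 0; i < src0.size(); ++i) {
        dst[i] = src0[i] * s;
    }
    return dst;
}
```

In this sketch, multiplying by a 1 x 1 src1 gives the same result as scaling by its single value, so scaling becomes a special case of broadcast multiplication.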

Support broadcast add & mul on CUDA (fixed)

a53a59a

li-plus mentioned this pull request Jul 13, 2023

ggml : revert CUDA broadcast changes from #2183 #2191

Merged

JohannesGaessler requested changes Jul 13, 2023

Merge branch 'master' into bcast-cuda

4fc4014

ggerganov approved these changes Jul 14, 2023

ggerganov merged commit 206e01d into ggml-org:master Jul 14, 2023

JohannesGaessler mentioned this pull request Jul 15, 2023

ggml backends interface, ggml-cuda refactor #2230

Closed

li-plus mentioned this pull request Jul 18, 2023

Speed up 3x for CUDA implementation li-plus/chatglm.cpp#56

Merged
