Skip to content

Commit 8a6502d

Browse files
danbevggerganov
authored andcommitted
llama : remove redundant reshape in build_kv_store (ggml-org#6369)
* llama: remove redundant reshape in build_kv_store This commit removes the reshape of the V matrix in the build_kv_store. The motivation for this is that V matrix has the shape: ```console (gdb) p *v_cur $46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU, buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608, 8388608}, op = GGML_OP_MUL_MAT, op_params = { 0 <repeats 16 times>}, flags = 0, grad = 0x0, src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0, view_src = 0x0, view_offs = 0, data = 0x0, name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0, padding = "\000\000\000\000\000\000\000"} ``` And after reshaping this tensor we get: ```console gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens) $44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU, buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608, 8388608}, op = GGML_OP_RESHAPE, op_params = { 0 <repeats 16 times>}, flags = 0, grad = 0x0, src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0, view_src = 0x7ffef1c40e00, view_offs = 0, data = 0x0, name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0, padding = "\000\000\000\000\000\000\000"} ``` I noticed that the `src` and `view_src` fields are different but that the dimensions are the same. From the code comment it seems like the reshape call is not needed and perhaps the above can motivate the removal of the reshape call. Signed-off-by: Daniel Bevenius <[email protected]> * llama : add assert --------- Signed-off-by: Daniel Bevenius <[email protected]> Co-authored-by: Georgi Gerganov <[email protected]>
1 parent 0b31544 commit 8a6502d

File tree

1 file changed

+2
-2
lines changed

1 file changed

+2
-2
lines changed

llama.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5523,8 +5523,8 @@ static void llm_build_kv_store(
55235523
GGML_ASSERT(kv.size == n_ctx);
55245524

55255525
// compute the transposed [n_tokens, n_embd] V matrix
5526-
struct ggml_tensor * v_cur_t = ggml_transpose(ctx, ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens));
5527-
//struct ggml_tensor * v_cur_t = ggml_transpose(ctx, v_cur); // TODO: reshape above is likely not needed
5526+
assert(v_cur->ne[0] == n_embd_v_gqa && v_cur->ne[1] == n_tokens);
5527+
struct ggml_tensor * v_cur_t = ggml_transpose(ctx, v_cur);
55285528
cb(v_cur_t, "v_cur_t", il);
55295529

55305530
struct ggml_tensor * k_cache_view = ggml_view_1d(ctx, kv.k_l[il], n_tokens*n_embd_k_gqa,

0 commit comments

Comments
 (0)