Commit 06a92a1

server : fix cache reuse logic (#12161)
The first KV shift offsets the positions of all tokens after head_c. A subsequent llama_kv_cache_seq_rm call that uses head_c then removes valid tokens, because their positions have already been offset.
1 parent a057897 commit 06a92a1

1 file changed: +1 -1 lines changed

examples/server/server.cpp

Lines changed: 1 addition & 1 deletion
@@ -3003,7 +3003,7 @@ struct server_context {
 const int64_t kv_shift = (int64_t) head_p - (int64_t) head_c;

 llama_kv_cache_seq_rm (ctx, slot.id, head_p, head_c);
-llama_kv_cache_seq_add(ctx, slot.id, head_c, -1, kv_shift);
+llama_kv_cache_seq_add(ctx, slot.id, head_c, head_c + n_match, kv_shift);

 for (size_t i = 0; i < n_match; i++) {
 slot.cache_tokens[head_p + i] = slot.cache_tokens[head_c + i];
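
To make the failure mode concrete, here is a minimal sketch of the position arithmetic using a toy model rather than the real llama.cpp cache. The names `toy_kv`, `toy_seq_rm`, `toy_seq_add`, `chunk`, and `simulate_reuse` are hypothetical, and the toy assumes that both operations act on the half-open position range [p0, p1), with a negative p1 meaning "to the end". The two reused chunks (head_p, head_c, n_match) are made-up values chosen only to trigger the problem.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for one sequence of a KV cache: one entry per cached token,
// holding that token's position. This is NOT the llama.cpp data structure.
struct toy_kv {
    std::vector<int> pos;
};

// Assumed semantics: remove every cell whose position lies in [p0, p1);
// p1 < 0 is treated as "no upper bound".
static void toy_seq_rm(toy_kv & kv, int p0, int p1) {
    kv.pos.erase(std::remove_if(kv.pos.begin(), kv.pos.end(),
                     [&](int p) { return p >= p0 && (p1 < 0 || p < p1); }),
                 kv.pos.end());
}

// Assumed semantics: add delta to every position in [p0, p1);
// p1 < 0 is treated as "no upper bound".
static void toy_seq_add(toy_kv & kv, int p0, int p1, int delta) {
    for (int & p : kv.pos) {
        if (p >= p0 && (p1 < 0 || p < p1)) {
            p += delta;
        }
    }
}

// One reusable chunk: n_match tokens that sit at cache index head_c
// and should end up at prompt index head_p.
struct chunk {
    int head_p;
    int head_c;
    int n_match;
};

// Runs two reuse steps on a cache holding positions 0..9 and returns the
// surviving positions, sorted. bounded_shift selects the post-commit call
// (shift only the matched chunk) or the pre-commit call (shift to the end).
static std::vector<int> simulate_reuse(bool bounded_shift) {
    toy_kv kv;
    for (int i = 0; i < 10; i++) {
        kv.pos.push_back(i);
    }

    const chunk chunks[] = { {2, 4, 2}, {4, 8, 2} };
    for (const chunk & c : chunks) {
        assert(c.head_p <= c.head_c);
        const int kv_shift = c.head_p - c.head_c;

        toy_seq_rm(kv, c.head_p, c.head_c);
        toy_seq_add(kv, c.head_c,
                    bounded_shift ? c.head_c + c.n_match : -1,
                    kv_shift);
    }

    std::sort(kv.pos.begin(), kv.pos.end());
    return kv.pos;
}
```

In this toy, `simulate_reuse(false)` leaves only positions 0 through 3: the first unbounded shift moves the tokens at 8 and 9 down to 6 and 7, so the second removal over [4, 8) deletes exactly the tokens that were meant to be reused. `simulate_reuse(true)` leaves positions 0 through 5, because tokens beyond the matched chunk keep their original positions until their own chunk is processed.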